[MapReduce] Streaming Job Failed!

    xiaoxiao  2021-04-12

    When the error occurred:

    I wrote an MR program in Python. It ran fine when tested locally on Linux, but failed as soon as I ran it on Hadoop.
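For context, a local test of a streaming job usually means imitating the pipeline `cat input | ./mapper.py | sort | ./reducer.py`. The post does not show its mapper.py or reducer.py, so the word-count mapper and reducer below are purely hypothetical stand-ins; this is only a minimal sketch of why a script can pass locally while the cluster run still fails for unrelated reasons.

```python
from itertools import groupby


def mapper(lines):
    """Hypothetical word-count mapper: emit (word, 1) for every word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Hypothetical reducer: sum the counts for each key.

    Expects pairs sorted by key, which is what the shuffle phase provides.
    """
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run_locally(lines):
    """Imitate `mapper | sort | reducer` inside a single process."""
    shuffled = sorted(mapper(lines), key=lambda kv: kv[0])
    return list(reducer(shuffled))


if __name__ == "__main__":
    for word, total in run_locally(["a b a", "b c"]):
        print(f"{word}\t{total}")
```

A local run like this exercises only the script logic. It never loads the streaming jar, so it cannot reveal a problem with the jar itself.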

    My environment:

    $ hadoop version
    Hadoop 2.5.2 ...

    Command executed:

    hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar \
        -file ./mapper.py -mapper ./mapper.py \
        -file ./reducer.py -reducer ./reducer.py \
        -input /data/poem/data_test \
        -output /data/poem/result

    Error output:

    packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
    17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
    17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
    17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
    17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
    17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
    17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
    17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
    17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
    17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
    17/04/13 15:10:59 INFO streaming.StreamJob: map 0% reduce 0%
    17/04/13 15:11:56 INFO streaming.StreamJob: map 50% reduce 0%
    17/04/13 15:11:57 INFO streaming.StreamJob: map 100% reduce 0%
    17/04/13 15:11:58 INFO streaming.StreamJob: map 0% reduce 0%
    17/04/13 15:12:27 INFO streaming.StreamJob: map 50% reduce 0%
    17/04/13 15:12:31 INFO streaming.StreamJob: map 0% reduce 0%
    17/04/13 15:13:08 INFO streaming.StreamJob: map 100% reduce 0%
    17/04/13 15:13:09 INFO streaming.StreamJob: map 0% reduce 0%
    17/04/13 15:13:30 INFO streaming.StreamJob: map 50% reduce 0%
    17/04/13 15:13:32 INFO streaming.StreamJob: map 100% reduce 0%
    17/04/13 15:13:33 INFO streaming.StreamJob: map 100% reduce 100%
    17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
    17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
    17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
    17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
    Streaming Job Failed!

    I tracked down the log files and found the specific error:

    Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
        at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
    Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
        ... 8 more
    Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
        ... 9 more

    The key part of the error is:

    java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
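In a nested Java stack trace, the innermost cause is the last `Caused by:` entry, which is how the key line above can be singled out. The helper below is a small illustrative sketch, not part of any Hadoop tooling; the function name `root_cause` and the regular expression are my own assumptions.

```python
import re

# Matches lines such as "Caused by: java.lang.Foo: some message".
CAUSED_BY = re.compile(r"Caused by: (?P<exc>[\w.$]+)(?:: (?P<msg>.*))?")


def root_cause(trace):
    """Return (exception class, message) of the last 'Caused by:' line, or None."""
    last = None
    for line in trace.splitlines():
        match = CAUSED_BY.match(line.strip())
        if match:
            last = (match.group("exc"), match.group("msg") or "")
    return last


if __name__ == "__main__":
    import sys

    # Usage: python root_cause.py < task.log
    print(root_cause(sys.stdin.read()))
```

Fed the task log from this post, it should point at the `ClassNotFoundException` rather than the outer `RuntimeException` wrappers.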

    Tracking down the error

    1. A bug in the MR script:

    The script ran correctly in the local test, so this was ruled out.

    2. A misconfigured environment:

    I ran a test with Hadoop's example jar and it worked, so this was ruled out as well.

    3. A problem with the jar:

    Since the error is a ClassNotFoundException, the jar should have been the first suspect. The jar may not match the Hadoop version.
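One quick sanity check on a suspect jar is to see whether it contains the class at all. A jar is a zip archive, and a class named `org.apache.hadoop.streaming.PipeMapRunner` is stored as the entry `org/apache/hadoop/streaming/PipeMapRunner.class`. The helper below is a hedged sketch with a hypothetical name (`jar_contains_class`). Note that it only checks presence: a jar built for a different Hadoop version might still contain the class and fail for other reasons.

```python
import sys
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


if __name__ == "__main__":
    # Usage: python check_jar.py /path/to/hadoop-streaming.jar
    target = "org.apache.hadoop.streaming.PipeMapRunner"
    print(jar_contains_class(sys.argv[1], target))
```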

    The fix:

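    I had downloaded my jar separately from the internet. Most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar, and when I could not find that path in my installation I assumed I had to download the jar myself.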

    In the end I found that in Hadoop 2.5.2 the corresponding jar lives in:

    $HADOOP_INSTALL_HOME/share/hadoop/tools/lib

    That is buried a bit too deep (′д` )…彡…彡! It took me ages to find!
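Rather than guessing the directory, the installation tree can be searched for the bundled jar. The sketch below assumes only that the jar file name matches the `hadoop-*streaming*.jar` pattern used in this post and that `HADOOP_INSTALL_HOME` points at the installation root. The function name `find_streaming_jars` is hypothetical.

```python
import glob
import os


def find_streaming_jars(install_home):
    """Recursively list files matching hadoop-*streaming*.jar under install_home."""
    pattern = os.path.join(install_home, "**", "hadoop-*streaming*.jar")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # HADOOP_INSTALL_HOME is the variable used in this post; yours may differ.
    home = os.environ.get("HADOOP_INSTALL_HOME", ".")
    for path in find_streaming_jars(home):
        print(path)
```

If the search returns more than one jar, that is already a hint that a stray copy may be shadowing the one shipped with the installation.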

    The revised command:

    hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
        -file ./mapper.py -mapper ./mapper.py \
        -file ./reducer.py -reducer ./reducer.py \
        -input /data/poem/data_test \
        -output /data/poem/result
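If the job is launched from a script, the same command can be assembled as an argument list so that the jar path is passed explicitly instead of relying on shell glob expansion. This is only a sketch: the function name `build_streaming_command` is hypothetical, the flags are exactly those used in the command above, and the jar file name in the example is an assumed placeholder.

```python
import shlex


def build_streaming_command(jar, mapper, reducer, input_path, output_path):
    """Build the argv list for the streaming command used in this post."""
    return [
        "hadoop", "jar", jar,
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        # Assumed jar name for illustration; use the jar found in your installation.
        "/opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.5.2.jar",
        "./mapper.py",
        "./reducer.py",
        "/data/poem/data_test",
        "/data/poem/result",
    )
    # Print instead of executing; pass cmd to subprocess.run(cmd, check=True) to launch.
    print(" ".join(shlex.quote(part) for part in cmd))
```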

    Lessons learned:

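    1. A ClassNotFoundException most likely means the jar does not match the Hadoop environment. My jar was too old.
    2. Officially released base jars such as hadoop-streaming*.jar normally ship with the installation.
    3. Paths can change between versions of the same software, so be ready to adapt. (Even CentOS 7 changed many commands compared with earlier releases.)
    4. The exception printed on screen is often neither detailed nor precise. Besides reading the on-screen error, check the run logs for the full error report.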
