When the error occurred:
I wrote an MR program in Python. It ran correctly when tested locally on Linux, but failed with an error when run on Hadoop.
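The post doesn't show mapper.py or reducer.py. As a hypothetical stand-in, the sketch below uses a word-count mapper and reducer and reproduces what a local shell test such as cat input | ./mapper.py | sort | ./reducer.py does: map, sort by key (standing in for Hadoop's shuffle/sort), then reduce. In real streaming scripts the functions would iterate over sys.stdin and print each output line. A local pipeline like this never touches the cluster's Java classpath, so it cannot catch a jar problem like the one described below.

```python
import io
import itertools


def mapper(lines):
    """Hypothetical word-count mapper: emit 'word<TAB>1' for each token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Hypothetical reducer: sum the counts for each key.

    Like a streaming reducer, it relies on its input being sorted by key,
    so that all lines for one key arrive next to each other.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in itertools.groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(text):
    """Mimic: cat input | ./mapper.py | sort | ./reducer.py"""
    mapped = sorted(mapper(io.StringIO(text)))  # sorted() stands in for the shuffle/sort
    return list(reducer(mapped))


print(run_locally("b a\na b a\n"))  # ['a\t3', 'b\t2']
```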
My environment:
$ hadoop version
Hadoop 2.5.2
...
Command executed:
hadoop jar $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test \
-output /data/poem/result
Error output:
packageJobJar: [mapper.py, reducer.py] [] /tmp/streamjob4957099323859594325.jar tmpDir=null
17/04/13 15:10:52 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/04/13 15:10:56 INFO mapred.FileInputFormat: Total input paths to process : 2
17/04/13 15:10:56 INFO mapreduce.JobSubmitter: number of splits:2
17/04/13 15:10:57 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492067422224_0001
17/04/13 15:10:57 INFO impl.YarnClientImpl: Submitted application application_1492067422224_0001
17/04/13 15:10:57 INFO mapreduce.Job: The url to track the job: http://chinahaoop0:8088/proxy/application_1492067422224_0001/
17/04/13 15:10:57 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
17/04/13 15:10:57 INFO streaming.StreamJob: Running job: job_1492067422224_0001
17/04/13 15:10:57 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:10:59 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:11:56 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:11:57 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:11:58 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:12:27 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:12:31 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:08 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:09 INFO streaming.StreamJob: map 0% reduce 0%
17/04/13 15:13:30 INFO streaming.StreamJob: map 50% reduce 0%
17/04/13 15:13:32 INFO streaming.StreamJob: map 100% reduce 0%
17/04/13 15:13:33 INFO streaming.StreamJob: map 100% reduce 100%
17/04/13 15:13:36 INFO streaming.StreamJob: Job running in-process (local Hadoop)
17/04/13 15:13:36 ERROR streaming.StreamJob: Job not Successful!
17/04/13 15:13:36 INFO streaming.StreamJob: killJob...
17/04/13 15:13:36 INFO impl.YarnClientImpl: Killed application application_1492067422224_0001
Streaming Job Failed!
Digging through the log files turned up the actual error:
Error: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1937)
	at org.apache.hadoop.mapred.JobConf.getMapRunnerClass(JobConf.java:1125)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1905)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
	... 8 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1811)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
	... 9 more
The key line of the error is:
java.lang.ClassNotFoundException: Class org.apache.hadoop.streaming.PipeMapRunner not found
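In a nested Java stack trace the innermost "Caused by:" line is usually the most specific message, which is how the key line above falls out of the longer trace. A small helper, written here as an illustration rather than any Hadoop tool, can pull it out of a pasted log:

```python
def root_cause(trace):
    """Return the innermost 'Caused by:' message of a Java stack trace.

    Falls back to the first non-empty line when there is no 'Caused by:'
    section, and to an empty string for empty input.
    """
    lines = [line.strip() for line in trace.splitlines() if line.strip()]
    causes = [line[len("Caused by:"):].strip()
              for line in lines if line.startswith("Caused by:")]
    if causes:
        return causes[-1]  # the last 'Caused by:' is the deepest one
    return lines[0] if lines else ""
```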
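Tracking down the cause
1. A bug in the MR scripts:
The scripts ran correctly in the local test, so this was ruled out.
2. A misconfigured environment:
A test run with Hadoop's bundled examples jar succeeded, so this was ruled out too.
3. A problem with the jar:
Because the exception is a ClassNotFoundException, the jar should have been the first suspect: it may not match the Hadoop version.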
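One way to test the jar hypothesis directly is to look inside the jar: a .jar is a zip archive, and a class such as org.apache.hadoop.streaming.PipeMapRunner is stored as the entry org/apache/hadoop/streaming/PipeMapRunner.class. The sketch below checks for that entry with Python's standard zipfile module; the JDK's jar tf command lists the same entries from the shell. The jar path you pass in is your own; none is assumed here.

```python
import zipfile


def jar_has_class(jar, class_name):
    """Return True if the jar contains the compiled class class_name.

    jar may be a file path or a binary file object. A class a.b.C is
    stored in the archive as the entry a/b/C.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as archive:
        return entry in archive.namelist()


# Usage (path is a placeholder):
# jar_has_class("/path/to/hadoop-streaming.jar",
#               "org.apache.hadoop.streaming.PipeMapRunner")
```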
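The fix:
I had downloaded my streaming jar separately from the internet. Most online tutorials give the path $HADOOP_INSTALL_HOME/contrib/streaming/hadoop-streaming.jar; when I couldn't find that path at first, I assumed I needed to download the jar myself.
It turned out that in Hadoop 2.5.2 the jar is located in:
$HADOOP_INSTALL_HOME/share/hadoop/tools/lib
It's hidden a bit too deep (′д` )…彡…彡! It took me ages to find!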
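Since the jar's location differs between versions, a recursive search of the install directory avoids guessing. This is a minimal sketch assuming the install root is in HADOOP_INSTALL_HOME, the variable used in this post (many setups use HADOOP_HOME instead); from the shell, find $HADOOP_INSTALL_HOME -name 'hadoop-*streaming*.jar' does the same job.

```python
import glob
import os


def find_streaming_jars(hadoop_home=None):
    """List every hadoop-*streaming*.jar under the Hadoop install directory.

    The search is recursive, so it finds the jar whether it sits in
    contrib/streaming (the path older tutorials cite) or in
    share/hadoop/tools/lib (the Hadoop 2.5.2 location).
    """
    if hadoop_home is None:
        hadoop_home = os.environ.get("HADOOP_INSTALL_HOME")
    if not hadoop_home:
        raise ValueError("pass hadoop_home or set HADOOP_INSTALL_HOME")
    # "**" with recursive=True matches zero or more directory levels
    pattern = os.path.join(hadoop_home, "**", "hadoop-*streaming*.jar")
    return sorted(glob.glob(pattern, recursive=True))
```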
The rewritten command:
hadoop jar $HADOOP_INSTALL_HOME/share/hadoop/tools/lib/hadoop-*streaming*.jar \
-file ./mapper.py -mapper ./mapper.py \
-file ./reducer.py -reducer ./reducer.py \
-input /data/poem/data_test \
-output /data/poem/result
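If the job is launched from Python instead of typed into a shell, building the command as an argument list removes the line-continuation backslashes entirely. This is a hypothetical wrapper mirroring the command above; note that an argument list is not glob-expanded by a shell, so the jar must be a concrete path rather than hadoop-*streaming*.jar.

```python
import subprocess


def streaming_args(jar, mapper, reducer, input_path, output_path):
    """Build the argv for a Hadoop streaming run with the options used above."""
    return [
        "hadoop", "jar", jar,
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", input_path,
        "-output", output_path,
    ]


def run_streaming(*args):
    # check=True raises CalledProcessError if the hadoop command exits non-zero
    return subprocess.run(streaming_args(*args), check=True)
```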
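Lessons learned:
A ClassNotFoundException quite likely means a jar doesn't match the Hadoop environment; mine was simply too old. Basic official jars such as hadoop-streaming*.jar usually come bundled with the installation, but their location can change between versions, so be ready to look around. (Even CentOS 7 changed many commands compared with earlier releases.) The exception printed on screen is often neither detailed nor precise, so besides reading the on-screen error, check the job's run logs for the detailed error report.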
When reposting, please credit the original: https://ju.6miu.com/read-667714.html