1. Initial job has not accepted any resources
16/08/13 17:05:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/08/13 17:05:57 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
This message tells us that the newly submitted job could not obtain any resources. Spark only asks the cluster for two kinds of resources, cores and memory, so this warning means at least one of them is exhausted. Open the Spark UI to check the cluster state:
In my case the UI showed that all cores were already taken, either by other running jobs or by an open spark-shell, which is exactly why the warning above appeared.
Reference: http://www.datastax.com/dev/blog/common-spark-troubleshooting
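If freeing up the cluster is not an option, another approach is to cap what the application itself requests so the scheduler can still place it. The following is only a minimal sketch of that idea, meant for your own application's driver code; the app name and values are illustrative, not from the original post, and use the standard spark.cores.max and spark.executor.memory settings.
// Minimal sketch: cap the resources this application requests so it can be
// scheduled even when other jobs or a spark-shell hold part of the cluster.
// App name and values are illustrative.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("resource-capped-app")     // hypothetical app name
  .set("spark.cores.max", "2")           // standalone mode: total cores this app may take
  .set("spark.executor.memory", "1g")    // must fit within what each worker advertises
val sc = new SparkContext(conf)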
2. Exception in thread "main" java.lang.ClassNotFoundException
Exception in thread "main" java.lang.ClassNotFoundException: Main
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
This exception comes up frequently when running spark-submit, and it can have many causes; the simplest is that the class really is missing from your JAR. Note the difference from other ClassNotFoundException cases: here it is the main class that cannot be found, not a referenced class. When a referenced class is missing, the cause is usually a classpath problem or a missing dependency library. In my case the main class itself could not be found, and, strangely, some main classes inside the same JAR resolved fine while others did not. It turned out that as soon as I placed the main class inside a package (with a package declaration), it could no longer be found; moving it to the root of the source tree made it resolvable again. So if your main class cannot be found, try placing it at the root of the source tree. At least that is what worked for me; everyone's situation is different, so good luck to you!
Solution: put the main class at the root of the source tree, i.e. directly under src.
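For reference, here is a minimal sketch of the layout that worked for me (the object body and app name are illustrative): the entry point sits directly under src with no package declaration, so it resolves as plain Main. If you do keep the class inside a package, the name you pass to spark-submit's --class flag should be the fully qualified one (for example com.example.Main).
// Minimal sketch of an entry point placed at the root of the source tree
// (no package declaration), so it is resolved simply as "Main".
import org.apache.spark.{SparkConf, SparkContext}

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("main-class-example"))
    println(sc.parallelize(1 to 10).sum())  // trivial job just to prove the class runs
    sc.stop()
  }
}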
3. When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment
hadoop@master:~$ ./shell/spark-submit.sh
16/09/03 10:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/03 10:35:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /jar/edu-cloud-assembly-1.0.jar
16/09/03 10:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Solution: edit the $SPARK_HOME/conf/spark-env.sh file:
hadoop@master:~$ vi spark-1.6.0-bin-hadoop2.4/conf/spark-env.sh
Add the following line:
HADOOP_CONF_DIR=/home/hadoop/hadoop-2.4.0/etc/hadoop/
Then push the updated file to every node in the cluster.
4. awaitResult Exception
Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult
Cause: this exception typically shows up when a broadcast join cannot finish broadcasting a table within spark.sql.broadcastTimeout, which defaults to 300 seconds.
Solution: increase the timeout from its 300 s default, for example:
spark.conf.set("spark.sql.broadcastTimeout", 1200)
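The same setting can also be applied when the session is created, which avoids changing it at run time. This is only a sketch, assuming a Spark 2.x SparkSession application; the app name is illustrative.
// Minimal sketch: raise spark.sql.broadcastTimeout (default 300 s) at session
// build time so large broadcast joins have more time to complete.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-timeout-example")          // hypothetical app name
  .config("spark.sql.broadcastTimeout", "1200")  // seconds
  .getOrCreate()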
5. Exception in thread "main" org.apache.spark.sql.AnalysisException: Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true
18/01/09 20:25:33 INFO FileSourceStrategy: Planning scan with bin packing, max size: 134217728 bytes, open cost is considered as scanning 4194304 bytes.
Exception in thread "main" org.apache.spark.sql.AnalysisException: Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true
at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.doPrepare(BroadcastNestedLoopJoinExec.scala:345)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:240)
at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:323)
at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2193)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2546)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2192)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2197)
at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2197)
at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2559)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2197)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2173)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/09 20:25:34 INFO SparkContext: Invoking stop() from shutdown hook
Solution: explicitly enable cross joins:
set spark.sql.crossJoin.enabled = true;
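The same flag can be set from code. Below is a minimal, self-contained sketch (object, app, and column names are illustrative, not from the original post); it also uses the explicit Dataset.crossJoin API, available in Spark 2.1 and later, so the Cartesian product is visible in the code rather than hidden behind a join with no usable keys.
// Minimal sketch: enable cross joins, then build an intentional Cartesian product.
import org.apache.spark.sql.SparkSession

object CrossJoinExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cross-join-example")
      .config("spark.sql.crossJoin.enabled", "true") // allow Cartesian products
      .getOrCreate()
    import spark.implicits._

    val colors = Seq("red", "green").toDF("color")
    val sizes  = Seq("S", "M", "L").toDF("size")

    // With the flag enabled, the analyzer no longer rejects this plan.
    colors.crossJoin(sizes).show()

    spark.stop()
  }
}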