A Collection of Assorted Spark Problems [Continuously Updated]


    1. Initial job has not accepted any resources

    16/08/13 17:05:42 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
    16/08/13 17:05:57 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

    This message tells us that the initial job could not acquire any resources. Spark looks for only two kinds of resources: cores and memory. So when this message appears, one of those two resources must be insufficient. Open the Spark UI and take a look:

    From the UI you can see that all the cores are already taken, i.e. some other job (possibly a running spark-shell) is holding those resources, which is why the warning above appears.
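
    If the cluster really is tight on resources, another option is to cap what the job asks for so it can still be scheduled. The sketch below is a minimal illustration, assuming standalone mode; the app name and numbers are placeholders, not recommendations:

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: explicitly cap the cores and memory this job requests,
    // so it can still be scheduled while the cluster is partly busy.
    val conf = new SparkConf()
      .setAppName("resource-capped-job")
      .set("spark.cores.max", "2")          // total cores across the cluster (standalone mode)
      .set("spark.executor.memory", "512m") // memory per executor

    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())
    sc.stop()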

    Reference: http://www.datastax.com/dev/blog/common-spark-troubleshooting


    2. Exception in thread "main" java.lang.ClassNotFoundException

    Exception in thread "main" java.lang.ClassNotFoundException: Main at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at org.apache.spark.util.Utils$.classForName(Utils.scala:174) at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:56) at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)

    You will often see this exception when running spark-submit, and it can have many causes. For example, the class may genuinely be missing from your JAR. Note how this differs from other "class not found" errors: here the class that cannot be found is the main class, not a referenced class. When a referenced class is missing, the likely culprit is the classpath or a missing dependency library; here the entry point itself cannot be located. At first this puzzled me: within the same JAR, some main classes were found and others were not. It turned out that once I declared a package for a main class it could no longer be found, but after moving it to the root of the source tree it was found again. So if your main class cannot be found, try moving it to the root of the source tree. At least that is what solved it in my case; everyone's situation is different, so good luck!

    Solution: put the main class at the root of the source tree, i.e. directly under src.
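
    For reference, the usual alternative to moving the class is to keep the package and pass the fully qualified name to spark-submit's --class option. A sketch, using a made-up package com.example purely for illustration:

    // Minimal sketch, assuming a hypothetical package com.example.
    // If the main class lives in a package, spark-submit must be given the
    // fully qualified name, e.g.:
    //   spark-submit --class com.example.Main your-app.jar
    // Passing just "Main" would reproduce the ClassNotFoundException above.
    package com.example

    import org.apache.spark.{SparkConf, SparkContext}

    object Main {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("classpath-check"))
        println(sc.parallelize(1 to 10).count())
        sc.stop()
      }
    }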

    3. When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment

    hadoop@master:~$ ./shell/spark-submit.sh
    16/09/03 10:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/09/03 10:35:46 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /jar/edu-cloud-assembly-1.0.jar
    16/09/03 10:35:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Exception in thread "main" java.lang.Exception: When running with master 'yarn' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.
        at org.apache.spark.deploy.SparkSubmitArguments.validateSubmitArguments(SparkSubmitArguments.scala:251)
        at org.apache.spark.deploy.SparkSubmitArguments.validateArguments(SparkSubmitArguments.scala:228)
        at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:109)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:114)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

    Solution: edit the $SPARK_HOME/conf/spark-env.sh file

    hadoop@master:~$ vi spark-1.6.0-bin-hadoop2.4/conf/spark-env.sh

    and add the following line:

    HADOOP_CONF_DIR=/home/hadoop/hadoop-2.4.0/etc/hadoop/

    Then push the updated file to every node in the cluster.

    4. awaitResult Exception

    Exception in thread "main" org.apache.spark.SparkException: Exception thrown in awaitResult

    Cause: the query was waiting on a broadcast (typically a broadcast join) that did not finish within the broadcast timeout, so the driver's awaitResult call threw.

    Solution: increase the timeout, which defaults to 300s, for example:

    spark.conf.set("spark.sql.broadcastTimeout", 1200)

    5. Exception in thread "main" org.apache.spark.sql.AnalysisException: Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true

    18/01/09 20:25:33 INFO FileSourceStrategy: Planning scan with bin packing, max size: 134217728 bytes, open cost is considered as scanning 4194304 bytes.
    Exception in thread "main" org.apache.spark.sql.AnalysisException: Both sides of this join are outside the broadcasting threshold and computing it could be prohibitively expensive. To explicitly enable it, please set spark.sql.crossJoin.enabled = true;
        at org.apache.spark.sql.execution.joins.BroadcastNestedLoopJoinExec.doPrepare(BroadcastNestedLoopJoinExec.scala:345)
        at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:199)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$prepare$1.apply(SparkPlan.scala:195)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.sql.execution.SparkPlan.prepare(SparkPlan.scala:195)
        at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:134)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
        at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:240)
        at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:323)
        at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
        at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2193)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
        at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2546)
        at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2192)
        at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2197)
        at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2197)
        at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2559)
        at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2197)
        at org.apache.spark.sql.Dataset.collect(Dataset.scala:2173)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    18/01/09 20:25:34 INFO SparkContext: Invoking stop() from shutdown hook

    Solution:

    set spark.sql.crossJoin.enabled = true;
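
    The same flag can be set from code instead of SQL. A sketch assuming a Spark 2.x SparkSession; note that enabling it only removes the safety check, the cross join itself is still expensive:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cross-join-example")
      // Allow the planner to produce cartesian products instead of failing.
      .config("spark.sql.crossJoin.enabled", "true")
      .getOrCreate()

    // On Spark 2.1+, an explicit Dataset.crossJoin should also work without the flag:
    // val result = left.crossJoin(right)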