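The official Nutch 2.x line currently ships as source only. Prebuilt binaries are no longer provided, so you have to compile it yourself.
Compiling Nutch 2.3.1 kept stalling at the network check, so these notes build version 2.2.1 instead.
Download: http://archive.apache.org/dist/nutch/2.2.1/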
Extract it into a directory of your choice: tar -xvf apache-nutch-2.2.1-src.tar.gz -C /usr/local
Edit ${NUTCH_HOME}/ivy/ivy.xml and uncomment these two dependencies:
<dependency org="mysql" name="mysql-connector-java" rev="5.1.18" conf="*->default"/>
<dependency org="org.apache.gora" name="gora-sql" rev="0.1.1-incubating" conf="*->default" />
Then change (gora-sql 0.1.1-incubating is not compatible with gora-core 0.3):
<dependency org="org.apache.gora" name="gora-core" rev="0.3" conf="*->default"/>
to:
<dependency org="org.apache.gora" name="gora-core" rev="0.2.1" conf="*->default"/>
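The gora-core downgrade can also be scripted. The following is a minimal sketch, not part of the original notes: downgrade_gora_core is a hypothetical helper name, and it assumes the gora-core dependency sits on a single line exactly as shown above. Uncommenting the two MySQL-related dependencies is still done by hand.

```shell
# Hypothetical helper: rewrite the gora-core revision in an ivy.xml file.
# Assumes the dependency is declared on one line, as in the snippet above.
downgrade_gora_core() {
  ivy_file="$1"
  tmp="$(mktemp)"
  # Only touch the line that declares gora-core; other dependencies are left alone.
  sed '/name="gora-core"/s/rev="0\.3"/rev="0.2.1"/' "$ivy_file" > "$tmp" \
    && cat "$tmp" > "$ivy_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

It would be called as downgrade_gora_core "${NUTCH_HOME}/ivy/ivy.xml". Writing through a temporary file instead of using sed -i keeps the sketch portable across sed implementations.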
Edit ${NUTCH_HOME}/conf/gora.properties: comment out the default database connection settings and add the following:
###############################
# Default MySQL properties    #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true
gora.sqlstore.jdbc.user=xxxx (your MySQL username)
gora.sqlstore.jdbc.password=xxxx (your MySQL password)
Then overwrite nutch-site.xml (in ${NUTCH_HOME}/conf) with the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>YourNutchSpider</value>
  </property>
  <property>
    <name>http.accept.language</name>
    <value>ja-jp, en-us,en-gb,en;q=0.7,*;q=0.3</value>
    <description>Value of the Accept-Language request header field. This allows selecting non-English language as default one to retrieve. It is a useful setting for search engines build for certain national group. </description>
  </property>
  <property>
    <name>storage.data.store.class</name>
    <value>org.apache.gora.sql.store.SqlStore</value>
    <description>The Gora DataStore class for storing and retrieving data. Currently the following stores are available:. </description>
  </property>
  <property>
    <name>parser.character.encoding.default</name>
    <value>utf-8</value>
    <description>The character encoding to fall back to when no other information is available</description>
  </property>
  <property>
    <name>generate.batch.id</name>
    <value>*</value>
  </property>
</configuration>
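The gora.properties change described earlier (comment out the default connection settings, then append the MySQL ones) could be automated roughly as follows. This is a sketch under assumptions: use_mysql_store is a hypothetical helper name, and it assumes every active default connection setting starts with gora.sqlstore.jdbc. at the beginning of a line. Check the resulting file by hand before building.

```shell
# Hypothetical helper: switch gora.properties to the MySQL settings from these notes.
# Arguments: path to gora.properties, MySQL username, MySQL password.
use_mysql_store() {
  props="$1"
  user="$2"
  pass="$3"
  tmp="$(mktemp)"
  # Assumption: active default JDBC settings start with "gora.sqlstore.jdbc.";
  # prefix each of them with "#" so they become comments.
  sed 's/^gora\.sqlstore\.jdbc\./#&/' "$props" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Append the MySQL connection settings.
  {
    printf '%s\n' 'gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver'
    printf '%s\n' 'gora.sqlstore.jdbc.url=jdbc:mysql://localhost:3306/nutch?createDatabaseIfNotExist=true'
    printf 'gora.sqlstore.jdbc.user=%s\n' "$user"
    printf 'gora.sqlstore.jdbc.password=%s\n' "$pass"
  } >> "$tmp"
  cat "$tmp" > "$props"
  rm -f "$tmp"
}
```

A call would look like use_mysql_store "${NUTCH_HOME}/conf/gora.properties" myuser mypassword, where the username and password are placeholders for your own MySQL credentials.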
Change into the apache-nutch-2.2.1 home directory and run the ant command. If the build reports:
Could not load definitions from resource org/sonar/ant/antlib.xml. It could not be found.
download sonar-ant-task-2.2.jar from http://repo2.maven.org/maven2/org/codehaus/sonar-plugins/sonar-ant-task/2.2/sonar-ant-task-2.2.jar, copy it into ${NUTCH_HOME}/lib, and edit ${NUTCH_HOME}/build.xml. Directly below
<taskdef uri="antlib:org.sonar.ant" resource="org/sonar/ant/antlib.xml">
add
<classpath><fileset dir="./lib" includes="sonar*.jar" /></classpath>
Build fails (BUILD FAILED)
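If BUILD FAILED is caused by some other dependency problem, it can be resolved by changing the Maven central repository URL.
In ${NUTCH_HOME}/ivy/ivysettings.xml, find:
<property name="repo.maven.org" value="http://repo1.maven.org/maven2/" override="false"/>
and change value to another central repository URL:
http://repo2.maven.org/maven2/ (this one is reliable)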
http://repository.sonatype.org/content/groups/public/
http://central.maven.org/maven2/
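Switching between these mirrors by hand gets tedious when several have to be tried. A minimal sketch of a helper that rewrites the repo.maven.org property is given here. set_maven_repo is a hypothetical name, and the sketch assumes the property is declared on one line as shown above and that the URL contains no & or | characters.

```shell
# Hypothetical helper: point the repo.maven.org property at another mirror.
# Arguments: path to ivysettings.xml, new repository URL.
set_maven_repo() {
  settings="$1"
  url="$2"
  tmp="$(mktemp)"
  # Replace only the value attribute on the line that defines repo.maven.org.
  # "|" is used as the sed delimiter because the URL contains slashes.
  sed '/name="repo\.maven\.org"/s|value="[^"]*"|value="'"$url"'"|' "$settings" > "$tmp" \
    && cat "$tmp" > "$settings"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

For example, set_maven_repo "${NUTCH_HOME}/ivy/ivysettings.xml" "http://repo2.maven.org/maven2/" selects the second mirror, after which ant can be run again.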
Build hangs
If the build stays stuck at the following output:
resolve-default:
[ivy:resolve] :: Apache Ivy 2.3.0 - 20130110142753 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /opt/apache-nutch-2.3.1/ivy/ivysettings.xml
wait patiently for about two minutes. If nothing moves, rerun ant, preferably on a stable network connection.
If the following message appears during the build:
You probably access the destination server through a proxy server that is not well configured.
rerun ant.
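Since the remedy for these transient network problems is simply to run ant again, the reruns can be wrapped in a small loop. The sketch below is an illustration only: retry is a hypothetical helper, and it reruns a command only when that command exits with a failure status, so a build that hangs without exiting still has to be interrupted by hand.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
# Usage: retry <max_attempts> <command> [args...]
retry() {
  max="$1"
  shift
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    # Stop as soon as the command exits successfully.
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max failed" >&2
    attempt=$((attempt + 1))
  done
  return 1
}
```

From the Nutch home directory this would be used as retry 3 ant, which gives the dependency download three chances before giving up.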
If the following appears:
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: commons-httpclient#commons-httpclient;3.1: configuration not found in commons-httpclient#commons-httpclient;3.1: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: :: log4j#log4j;1.2.15: configuration not found in log4j#log4j;1.2.15: 'master'. It was required from org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: FAILED DOWNLOADS ::
[ivy:resolve] WARN: :: ^ see resolution messages for details ^ ::
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] WARN: :: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[ivy:resolve] WARN: ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve] report for org.apache.nutch#nutch;working@damao-TravelMate-P256-MG default produced in /root/.ivy2/cache/org.apache.nutch-nutch-default.xml
[ivy:resolve] resolve done (2940ms resolve - 4576ms download)
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve] [FAILED ] org.mortbay.jetty#jetty;6.1.26!jetty.zip: (0ms)
[ivy:resolve] ==== local: tried
[ivy:resolve] /root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip
[ivy:resolve] ==== maven2: tried
[ivy:resolve] http://central.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve] ==== sonatype: tried
[ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
If FAILED DOWNLOADS appears as above, simply rerun ant.
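If the package really is missing from the Maven central repository, download it manually and place it at /root/.ivy2/local/org.mortbay.jetty/jetty/6.1.26/zips/jetty.zip (the exact path is listed in the "==== local: tried" part of the error output above).
If UNRESOLVED DEPENDENCIES appears as above, first check whether the package is already among the downloaded libraries, which live under /root/.ivy2/cache or /home/<username>/.ivy2/cache.
If the package is already there, delete it and rerun ant.
If it is not there, edit ${NUTCH_HOME}/ivy/ivy.xml. Locating commons-httpclient in that file shows that its conf attribute maps to master:
<dependency org="commons-httpclient" name="commons-httpclient" rev="3.1" conf="*->master" />
Change the conf attribute so that it maps to default instead of master.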
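The conf change can be scripted in the same spirit. This is a minimal sketch, assuming the dependency is declared on a single line as shown above. use_default_conf is a hypothetical helper name, not something shipped with Nutch or Ivy.

```shell
# Hypothetical helper: map a dependency's conf to "default" instead of "master".
# Arguments: path to ivy.xml, module name (the value of the name attribute).
use_default_conf() {
  ivy_file="$1"
  module="$2"
  tmp="$(mktemp)"
  # Restrict the substitution to the line declaring the given module.
  sed '/name="'"$module"'"/s/conf="\*->master"/conf="*->default"/' "$ivy_file" > "$tmp" \
    && cat "$tmp" > "$ivy_file"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

It would be invoked as use_default_conf "${NUTCH_HOME}/ivy/ivy.xml" commons-httpclient. The log above reports the same 'master' problem for log4j, so the same call with log4j as the module name may be needed as well.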
Reference: http://blog.csdn.net/u010317005/article/details/51090175