Prerequisites:
1. JDK installed successfully (http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html)
2. hadoop-1.2.1.tar.gz downloaded (https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-1.2.1/)
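Both prerequisites can be sanity-checked from a shell. A minimal sketch (assumes wget is installed; the URL is the Apache dist link above):

# confirm the JDK is installed and on the PATH
java -version

# fetch the Hadoop 1.2.1 tarball
wget https://dist.apache.org/repos/dist/release/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz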
Install Hadoop
First copy hadoop-1.2.1.tar.gz into /usr/local, then extract it there; a command sketch and the resulting directory listing follow.
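A minimal sketch of the copy-and-extract step (assuming the tarball was downloaded to /root; adjust the source path to wherever you saved it):

cp /root/hadoop-1.2.1.tar.gz /usr/local
cd /usr/local
tar -zxvf hadoop-1.2.1.tar.gz

After extraction, ls -l /usr/local shows: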
drwxr-xr-x.  2 root root     4096 Sep 23  2011 bin
drwxr-xr-x.  2 root root     4096 Sep 23  2011 etc
drwxr-xr-x.  2 root root     4096 Sep 23  2011 games
drwxr-xr-x. 16 root root     4096 Sep 14 16:42 hadoop-1.2.1
-rw-r--r--.  1 root root 63851630 Sep 14 21:20 hadoop-1.2.1.tar.gz
drwxr-xr-x.  2 root root     4096 Sep 23  2011 include
drwxr-xr-x.  8 uucp  143      4096 Jun 22 17:50 jdk1.8.0_101
drwxr-xr-x.  2 root root     4096 Sep 23  2011 lib
drwxr-xr-x.  2 root root     4096 Sep 23  2011 libexec
drwxr-xr-x.  2 root root     4096 Sep 23  2011 sbin
drwxr-xr-x.  5 root root     4096 Sep 14  2016 share
drwxr-xr-x.  2 root root     4096 Sep 23  2011 src
Configure Hadoop
0. Take a look at what is inside the hadoop-1.2.1 directory (note hadoop-examples-1.2.1.jar, which the wordcount demo below uses):
drwxr-xr-x.  2 root root    4096 Sep 14 16:32 bin
-rw-rw-r--.  1 root root  121130 Jul 23  2013 build.xml
drwxr-xr-x.  4 root root    4096 Jul 23  2013 c++
-rw-rw-r--.  1 root root  493744 Jul 23  2013 CHANGES.txt
drwxr-xr-x.  2 root root    4096 Sep 14 21:30 conf
drwxr-xr-x. 10 root root    4096 Jul 23  2013 contrib
drwxr-xr-x.  6 root root    4096 Sep 14 16:31 docs
-rw-rw-r--.  1 root root    6842 Jul 23  2013 hadoop-ant-1.2.1.jar
-rw-rw-r--.  1 root root     414 Jul 23  2013 hadoop-client-1.2.1.jar
-rw-rw-r--.  1 root root 4203147 Jul 23  2013 hadoop-core-1.2.1.jar
-rw-rw-r--.  1 root root  142726 Jul 23  2013 hadoop-examples-1.2.1.jar
-rw-rw-r--.  1 root root     417 Jul 23  2013 hadoop-minicluster-1.2.1.jar
-rw-rw-r--.  1 root root 3126576 Jul 23  2013 hadoop-test-1.2.1.jar
-rw-rw-r--.  1 root root  385634 Jul 23  2013 hadoop-tools-1.2.1.jar
drwxr-xr-x.  2 root root    4096 Sep 14 16:31 ivy
-rw-rw-r--.  1 root root   10525 Jul 23  2013 ivy.xml
drwxr-xr-x.  5 root root    4096 Sep 14 16:31 lib
drwxr-xr-x.  2 root root    4096 Sep 14 16:32 libexec
-rw-rw-r--.  1 root root   13366 Jul 23  2013 LICENSE.txt
drwxr-xr-x.  4 root root    4096 Sep 14 22:03 logs
-rw-rw-r--.  1 root root     101 Jul 23  2013 NOTICE.txt
-rw-rw-r--.  1 root root    1366 Jul 23  2013 README.txt
drwxr-xr-x.  2 root root    4096 Sep 14 16:32 sbin
drwxr-xr-x.  3 root root    4096 Jul 23  2013 share
drwxr-xr-x. 16 root root    4096 Sep 14 16:32 src
drwxr-xr-x.  9 root root    4096 Jul 23  2013 webapps
1. Open conf/hadoop-env.sh:
vi /usr/local/hadoop-1.2.1/conf/hadoop-env.sh
Add the following lines:

export JAVA_HOME=/usr/local/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH=$PATH:/usr/local/hadoop-1.2.1/bin
The relevant section of the file after editing:
---------------------------------------------------------------------------------------------
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/jdk1.8.0_101
export HADOOP_HOME=/usr/local/hadoop-1.2.1
export PATH=$PATH:/usr/local/hadoop-1.2.1/bin
---------------------------------------------------------------------------------------------
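Note that hadoop-env.sh is only sourced by the Hadoop launcher scripts, so the PATH export above does not put hadoop on your interactive shell's PATH by itself. To invoke hadoop directly, as the later steps do, you may also want the export in /etc/profile (a sketch, using the same paths as above):

echo 'export PATH=$PATH:/usr/local/hadoop-1.2.1/bin' >> /etc/profile
source /etc/profile
hadoop version    # should report Hadoop 1.2.1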
2. Open conf/core-site.xml and configure it as follows:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>

(dfs.replication conventionally lives in conf/hdfs-site.xml, but it also takes effect here, since every daemon loads core-site.xml.)
3. Open conf/mapred-site.xml and configure it as follows:
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>
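Both files must remain well-formed XML or the daemons will refuse to start. If xmllint (from libxml2) happens to be installed, a quick well-formedness check:

xmllint --noout /usr/local/hadoop-1.2.1/conf/core-site.xml /usr/local/hadoop-1.2.1/conf/mapred-site.xml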
Run and test:
1. Format the namenode:
hadoop namenode -format
[root@linux-01 hadoop-1.2.1]# hadoop namenode -format
Warning: $HADOOP_HOME is deprecated.

16/09/14 22:10:18 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = linux-01/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.2.1
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG:   java = 1.8.0_101
************************************************************/
Re-format filesystem in /home/hadoop/tmp/dfs/name ? (Y or N) Y
16/09/14 22:10:23 INFO util.GSet: Computing capacity for map BlocksMap
16/09/14 22:10:23 INFO util.GSet: VM type       = 32-bit
16/09/14 22:10:23 INFO util.GSet: 2.0% max memory = 1013645312
16/09/14 22:10:23 INFO util.GSet: capacity      = 2^22 = 4194304 entries
16/09/14 22:10:23 INFO util.GSet: recommended=4194304, actual=4194304
16/09/14 22:10:23 INFO namenode.FSNamesystem: fsOwner=root
16/09/14 22:10:23 INFO namenode.FSNamesystem: supergroup=supergroup
16/09/14 22:10:23 INFO namenode.FSNamesystem: isPermissionEnabled=true
16/09/14 22:10:23 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/09/14 22:10:23 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/09/14 22:10:23 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/09/14 22:10:23 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/09/14 22:10:23 INFO common.Storage: Image file /home/hadoop/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
16/09/14 22:10:23 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/hadoop/tmp/dfs/name/current/edits
16/09/14 22:10:23 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/hadoop/tmp/dfs/name/current/edits
16/09/14 22:10:23 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.
16/09/14 22:10:23 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at linux-01/127.0.0.1
************************************************************/
If you have been through this process a few times already, the format step may fail here. A common remedy, sketched below, is to clear out the old hadoop.tmp.dir contents and run the format command again.
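A minimal cleanup sketch (assumes hadoop.tmp.dir is /home/hadoop/tmp as configured in core-site.xml above; note this wipes all HDFS data):

rm -rf /home/hadoop/tmp
hadoop namenode -format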
2. Start Hadoop:
./bin/start-all.sh
[root@linux-01 hadoop-1.2.1]# ./bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-namenode-linux-01.out
root@localhost's password:
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting datanode, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-datanode-linux-01.out
root@localhost's password:
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-secondarynamenode-linux-01.out
starting jobtracker, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-jobtracker-linux-01.out
root@localhost's password:
localhost: Warning: $HADOOP_HOME is deprecated.
localhost:
localhost: starting tasktracker, logging to /usr/local/hadoop-1.2.1/logs/hadoop-root-tasktracker-linux-01.out
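The repeated root@localhost's password: prompts appear because start-all.sh uses ssh to launch each daemon, even on a single machine. They can be avoided with passwordless SSH to localhost (the standard setup, sketched here for the root user):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without prompting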
3. Verify that Hadoop started successfully, using jps:
[root@linux-01 hadoop-1.2.1]# jps
25991 DataNode
26361 TaskTracker
24827 FsShell
26428 Jps
26204 JobTracker
26124 SecondaryNameNode
25855 NameNode
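Besides jps, Hadoop 1.x serves status pages on its default web ports, which you can open in a browser or probe with curl:

curl http://localhost:50070    # NameNode web UI
curl http://localhost:50030    # JobTracker web UI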
Run the bundled wordcount example (exciting!)
1. Prepare a file to run wordcount on (type an arbitrary string into test.txt, then save and quit):
[root@linux-01 tmp]# touch test.txt
[root@linux-01 tmp]# vi test.txt
hello , welcome hadoop !!!

Save and quit with :wq.
2. Upload the test file from the previous step into the firstTest directory on the DFS file system (if firstTest does not yet exist in DFS, a directory of that name is created automatically; use bin/hadoop dfs -ls to see which directories DFS already contains):
bin/hadoop dfs -copyFromLocal /tmp/test.txt firstTest
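To confirm the upload, list the directory; test.txt should appear under firstTest:

bin/hadoop dfs -ls firstTest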
3. Run wordcount (this runs wordcount over every file under firstTest and writes the word counts to the result directory, which is created automatically if it does not exist):
bin/hadoop jar hadoop-examples-1.2.1.jar wordcount firstTest result
[root@linux-01 hadoop-1.2.1]# bin/hadoop jar hadoop-examples-1.2.1.jar wordcount firstTest result
Warning: $HADOOP_HOME is deprecated.

16/09/14 22:41:42 INFO input.FileInputFormat: Total input paths to process : 1
16/09/14 22:41:42 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/09/14 22:41:42 WARN snappy.LoadSnappy: Snappy native library not loaded
16/09/14 22:41:42 INFO mapred.JobClient: Running job: job_201609142236_0004
16/09/14 22:41:43 INFO mapred.JobClient:  map 0% reduce 0%
16/09/14 22:41:47 INFO mapred.JobClient:  map 100% reduce 0%
16/09/14 22:41:55 INFO mapred.JobClient:  map 100% reduce 33%
16/09/14 22:41:56 INFO mapred.JobClient:  map 100% reduce 100%
16/09/14 22:41:57 INFO mapred.JobClient: Job complete: job_201609142236_0004
16/09/14 22:41:57 INFO mapred.JobClient: Counters: 29
16/09/14 22:41:57 INFO mapred.JobClient:   Map-Reduce Framework
16/09/14 22:41:57 INFO mapred.JobClient:     Spilled Records=10
16/09/14 22:41:57 INFO mapred.JobClient:     Map output materialized bytes=63
16/09/14 22:41:57 INFO mapred.JobClient:     Reduce input records=5
16/09/14 22:41:57 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=589844480
16/09/14 22:41:57 INFO mapred.JobClient:     Map input records=1
16/09/14 22:41:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=106
16/09/14 22:41:57 INFO mapred.JobClient:     Map output bytes=47
16/09/14 22:41:57 INFO mapred.JobClient:     Reduce shuffle bytes=63
16/09/14 22:41:57 INFO mapred.JobClient:     Physical memory (bytes) snapshot=190750720
16/09/14 22:41:57 INFO mapred.JobClient:     Reduce input groups=5
16/09/14 22:41:57 INFO mapred.JobClient:     Combine output records=5
16/09/14 22:41:57 INFO mapred.JobClient:     Reduce output records=5
16/09/14 22:41:57 INFO mapred.JobClient:     Map output records=5
16/09/14 22:41:57 INFO mapred.JobClient:     Combine input records=5
16/09/14 22:41:57 INFO mapred.JobClient:     CPU time spent (ms)=760
16/09/14 22:41:57 INFO mapred.JobClient:     Total committed heap usage (bytes)=177016832
16/09/14 22:41:57 INFO mapred.JobClient:   File Input Format Counters
16/09/14 22:41:57 INFO mapred.JobClient:     Bytes Read=27
16/09/14 22:41:57 INFO mapred.JobClient:   FileSystemCounters
16/09/14 22:41:57 INFO mapred.JobClient:     HDFS_BYTES_READ=133
16/09/14 22:41:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=109333
16/09/14 22:41:57 INFO mapred.JobClient:     FILE_BYTES_READ=63
16/09/14 22:41:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=37
16/09/14 22:41:57 INFO mapred.JobClient:   Job Counters
16/09/14 22:41:57 INFO mapred.JobClient:     Launched map tasks=1
16/09/14 22:41:57 INFO mapred.JobClient:     Launched reduce tasks=1
16/09/14 22:41:57 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8646
16/09/14 22:41:57 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/14 22:41:57 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=4553
16/09/14 22:41:57 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/09/14 22:41:57 INFO mapred.JobClient:     Data-local map tasks=1
16/09/14 22:41:57 INFO mapred.JobClient:   File Output Format Counters
16/09/14 22:41:57 INFO mapred.JobClient:     Bytes Written=37
4. View the result:
bin/hadoop dfs -cat result/part-r-00000
[root@linux-01 hadoop-1.2.1]# bin/hadoop dfs -cat result/part-r-00000
Warning: $HADOOP_HOME is deprecated.

!!!        1
,          1
hadoop     1
hello      1
welcome    1
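If you want the counts as a local file instead of reading them out of HDFS, -get copies them down (a sketch; /tmp/result.txt is an arbitrary local destination):

bin/hadoop dfs -get result/part-r-00000 /tmp/result.txt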