Part 1: YARN Overview
YARN is the cluster resource management system; it is responsible for unified management and scheduling of cluster resources.
Part 2: YARN Architecture and Principles
ResourceManager (RM): there is only one per cluster; it is responsible for unified resource management and scheduling.
- Handles client requests
- Monitors NodeManagers
- Starts and monitors ApplicationMasters
- Allocates and schedules resources

NodeManager (NM): one per node; it is responsible for resource management on that node.
- Manages and allocates the resources of a single node
- Handles requests from the ResourceManager
- Handles requests from ApplicationMasters

ApplicationMaster (AM): one per application; it manages and schedules that application.
- Requests resources for the application and further assigns them (memory and so on) to its tasks
- Handles monitoring and fault tolerance inside the application

Container: the collective term for a task's runtime environment, which bundles:
- The task's runtime resources (CPU, memory, node, ...)
- The task's launch command
- The task's runtime environment

YARN provides several resource schedulers:
- FIFO
- Fair Scheduler
- Capacity Scheduler
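These components can be observed on a running cluster with the yarn command-line client; a brief sketch follows (the NodeManager id hadoopB:45454 is a made-up example, not taken from the cluster described below):

# List the NodeManagers registered with the ResourceManager,
# together with their state and number of running containers
[hadoop@hadoopa hadoop-2.7.3]$ bin/yarn node -list

# Show the memory and vcores used/available on a single node;
# "hadoopB:45454" is a hypothetical NodeManager id
[hadoop@hadoopa hadoop-2.7.3]$ bin/yarn node -status hadoopB:45454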
Part 3: YARN Resource Management and Scheduling
Resource scheduling flow: Client -> ResourceManager -> ApplicationMaster, then NodeManager -> ResourceManager -> Container.

Scheduling method 1: FIFO
- All applications are placed in a single queue
- Applications at the head of the queue get resources first
- Resource utilization is low and jobs cannot run interleaved; for example, an urgent job cannot jump the queue

Scheduling method 2: multi-queue scheduling
- Applications are distributed across multiple queues
- Each queue applies its own scheduling policy
- There are two multi-queue schedulers: Capacity Scheduler and Fair Scheduler (a minimal Capacity Scheduler example is sketched below)
- Label-based scheduling can only be used with the Capacity Scheduler
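As a point of comparison with the Fair Scheduler configuration used in the project below, here is a minimal capacity-scheduler.xml sketch for the Capacity Scheduler; the queue names "prod" and "dev" are purely illustrative, and the capacities of sibling queues under root must add up to 100:

<configuration>
  <!-- Child queues of root; each application is submitted to one of these -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- Guaranteed share of cluster resources for each queue, in percent -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>30</value>
  </property>
</configuration>

To activate it, yarn.resourcemanager.scheduler.class would be set to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler instead of the FairScheduler class used in yarn-site.xml below.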
Part 4: Compute Frameworks Running on YARN
- Offline (batch) computing framework: MapReduce
- DAG computing framework: Tez
- Stream computing framework: Storm
- In-memory computing framework: Spark
Part 5: Hands-on Project
1. Project description
The Fair Scheduler policy is used to allocate tasks across three queues: tool, infrastructure, and sentiment.
2. Configure YARN
yarn-site.xml
[hadoop@hadoopa hadoop-2.7.3]$ cat etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <description>The hostname of the RM.</description>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoopA</value>
  </property>
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>${yarn.resourcemanager.hostname}:8032</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>${yarn.resourcemanager.hostname}:8030</value>
  </property>
  <property>
    <description>The http address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>${yarn.resourcemanager.hostname}:8088</value>
  </property>
  <property>
    <description>The https address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>${yarn.resourcemanager.hostname}:8090</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>${yarn.resourcemanager.hostname}:8031</value>
  </property>
  <property>
    <description>The address of the RM admin interface.</description>
    <name>yarn.resourcemanager.admin.address</name>
    <value>${yarn.resourcemanager.hostname}:8033</value>
  </property>
  <property>
    <description>The class to use as the resource scheduler.</description>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <description>fair-scheduler conf location</description>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/home/hadoop/hadoop-2.7.3/etc/hadoop/fairscheduler.xml</value>
  </property>
  <property>
    <description>List of directories to store localized files in. An
      application's localized file directory will be found in:
      ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
      Individual containers' work directories, called container_${contid}, will
      be subdirectories of this.
    </description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/home/hadoop/yarn/local</value>
  </property>
  <property>
    <description>Whether to enable log aggregation</description>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
  </property>
  <property>
    <description>Amount of physical memory, in MB, that can be allocated
      for containers.
    </description>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8720</value>
  </property>
  <property>
    <description>Number of CPU cores that can be allocated
      for containers.
    </description>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>2</value>
  </property>
  <property>
    <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
fairscheduler.xml
[hadoop@hadoopa hadoop-2.7.3]$ cat etc/hadoop/fairscheduler.xml
<?xml version="1.0"?>
<allocations>
  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores</minResources>
    <maxResources>153600 mb, 100 vcores</maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>
  <queue name="tool">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>
  <queue name="sentiment">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>
</allocations>
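After restarting YARN with this allocation file (the Fair Scheduler also re-reads the file periodically, so later edits are usually picked up without a restart), the queue setup can be checked from the command line; a quick check, with the output omitted here:

# Lists the queues known to the ResourceManager, with their
# scheduling information and current state
[hadoop@hadoopa hadoop-2.7.3]$ bin/mapred queue -list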
3. Verify the results
Submit several jobs at once to different queues:
[hadoop@hadoopa hadoop-2.7.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi -Dmapreduce.job.queuename=tool 2 5000
[hadoop@hadoopa hadoop-2.7.3]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi -Dmapreduce.job.queuename=sentiment 2 5000
Open http://192.168.1.201:8088/cluster/scheduler?openQueues and you can see that both the tool and sentiment queues have applications running.
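The queue placement can also be confirmed from the command line while the jobs are running; for example:

# Lists running applications; the output includes the queue
# each application was submitted to
[hadoop@hadoopa hadoop-2.7.3]$ bin/yarn application -list -appStates RUNNING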