A small MapReduce example that processes data with map only, reducing data skew on the reduce side

xiaoxiao 2021-03-25

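A MapReduce job normally splits the work between a map stage and a reduce stage. Sometimes, though, the map output is unevenly distributed after partitioning, which causes data skew: a single reduce task receives a disproportionate share of the data and runs slowly. This example does the work that reduce would normally do (a join) inside map instead, which avoids the skew.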
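To make the skew concrete, here is a minimal sketch of how records are assigned to reducers. The `partition` method uses the same formula as Hadoop's default `HashPartitioner`, but it hashes a plain `String` as a stand-in for a Hadoop `Text` key, so the actual partition numbers would differ in a real job. The class name `SkewDemo` and the sample keys are hypothetical.

```java
import java.util.Arrays;

public class SkewDemo {
    // Same formula as Hadoop's default HashPartitioner.getPartition,
    // applied here to String.hashCode() as a stand-in for Text.hashCode().
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many records each reducer would receive.
    static int[] recordsPerReducer(String[] keys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String key : keys) {
            counts[partition(key, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical skewed input: 997 records share one hot key,
        // 3 records have rare keys.
        String[] keys = new String[1000];
        Arrays.fill(keys, "hot");
        keys[0] = "a";
        keys[1] = "b";
        keys[2] = "c";
        System.out.println(Arrays.toString(recordsPerReducer(keys, 4)));
    }
}
```

Every record with the same key lands on the same reducer, so at least 997 of the 1000 records end up in one partition no matter how many reducers are configured. A map-only job never shuffles by key, so this imbalance cannot occur.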

The source code is as follows:

package MR_mapside_join;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class mapSideJoin {

    static class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        // Lookup table built once per map task in setup():
        // first column of name.txt -> second column
        Map<String, String> prodect = new HashMap<String, String>();
        Text k = new Text();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // The cached file is read from the task's working directory by its file name
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream("name.txt")))) {
                String line;
                // Stops at end of file or at the first empty line
                while (StringUtils.isNotEmpty(line = reader.readLine())) {
                    String[] word = line.split("\t");
                    prodect.put(word[0], word[1]);
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String info = value.toString();
            String[] detail = info.split("\t");
            // The second field of each input record is the join key
            String name = prodect.get(detail[1]);
            // Append the looked-up name to the original record
            k.set(info + "\t" + name);
            context.write(k, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        // Load the configuration
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        // Locate the jar through this class
        job.setJarByClass(mapSideJoin.class);
        // Set the mapper class for the job
        job.setMapperClass(MapSideJoinMapper.class);
        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        // Set the input path
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Distribute a cache file to the working directory of every map task
        job.addCacheFile(new URI("/info/name.txt"));
        // Set the number of reduce tasks to 0
        job.setNumReduceTasks(0);
        // Exit when the job completes
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
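The join logic of the job above can be tried without a cluster. The following is a minimal sketch in plain Java that mirrors what `setup()` and `map()` do: build a lookup table from tab-separated lines, then append the looked-up name to each record. The class name `JoinSketch`, the method names, and the sample rows are hypothetical and are not the data used in the original run. Unlike the original, the sketch also guards against lines with fewer than two fields.

```java
import java.util.HashMap;
import java.util.Map;

public class JoinSketch {
    // Build the lookup table from "id<TAB>name" lines, as setup() does.
    static Map<String, String> loadTable(String[] lines) {
        Map<String, String> table = new HashMap<>();
        for (String line : lines) {
            String[] word = line.split("\t");
            if (word.length >= 2) {   // skip malformed lines instead of throwing
                table.put(word[0], word[1]);
            }
        }
        return table;
    }

    // Join one record with the table, as map() does.
    // The second tab-separated field of the record is the join key.
    static String join(String record, Map<String, String> table) {
        String[] detail = record.split("\t");
        String name = detail.length > 1 ? table.get(detail[1]) : null;
        return record + "\t" + name;
    }

    public static void main(String[] args) {
        // Hypothetical sample data
        Map<String, String> table = loadTable(new String[] {"p1\tapple", "p2\torange"});
        System.out.println(join("1001\tp1\t3", table));
        System.out.println(join("1002\tp9\t1", table));
    }
}
```

Note that a record whose key is missing from the table gets the literal text `null` appended, because Java string concatenation renders a `null` reference that way. The original mapper behaves the same; a real job might prefer to skip or flag such records.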

The input and the output of the run are shown below.

Input file:

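For some reason my CentOS system keeps displaying the orange line as garbled text. I tried to fix it several times without success, but the meaning should still be clear, so don't worry about it.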

Result:

That's it: the map stage alone produced the result.

When reposting, please cite the original URL: https://ju.6miu.com/read-4451.html