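This article is part of the series "Spark Large-Scale E-commerce Project in Practice". It implements step five of the top-10 popular categories module: secondary sort.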
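Secondary sort here means ranking categories by click count, breaking ties by order count and then by payment count. The step-five code depends on a CategorySortKey class that this article does not show. Below is a minimal, hypothetical sketch of such a key. It assumes the key only needs to be Comparable and Serializable: Spark's Java `sortByKey(boolean)` sorts by the key's natural ordering, and keys are shipped between executors. The project's actual class may be implemented differently (for example via Scala's `Ordered` trait), and `SecondarySortDemo` is an illustrative name only.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the project's CategorySortKey; the real class is not shown in the article.
class CategorySortKey implements Comparable<CategorySortKey>, Serializable {
    private static final long serialVersionUID = 1L;

    private final long clickCount;
    private final long orderCount;
    private final long payCount;

    CategorySortKey(long clickCount, long orderCount, long payCount) {
        this.clickCount = clickCount;
        this.orderCount = orderCount;
        this.payCount = payCount;
    }

    // Compare by click count first; on a tie fall back to order count, then pay count.
    @Override
    public int compareTo(CategorySortKey other) {
        int byClick = Long.compare(clickCount, other.clickCount);
        if (byClick != 0) {
            return byClick;
        }
        int byOrder = Long.compare(orderCount, other.orderCount);
        if (byOrder != 0) {
            return byOrder;
        }
        return Long.compare(payCount, other.payCount);
    }

    @Override
    public String toString() {
        return clickCount + "/" + orderCount + "/" + payCount;
    }
}

class SecondarySortDemo {
    public static void main(String[] args) {
        List<CategorySortKey> keys = new ArrayList<>();
        keys.add(new CategorySortKey(10, 2, 1));
        keys.add(new CategorySortKey(10, 5, 0));
        keys.add(new CategorySortKey(3, 9, 9));
        // Descending order, analogous to sortByKey(false) on a JavaPairRDD.
        keys.sort(Collections.reverseOrder());
        System.out.println(keys); // [10/5/0, 10/2/1, 3/9/9]
    }
}
```

`Collections.reverseOrder()` plays the role of `sortByKey(false)` in this local demo. The two categories with 10 clicks are ordered by their order counts, which is exactly what the tie-breaking fields are for.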
Code implementation
/**
 * Step 5: map the data into an RDD of <SortKey, info> pairs, then perform
 * a secondary sort in descending order
 */
JavaPairRDD<CategorySortKey, String> sortKey2countRDD = categoryid2countRDD.mapToPair(
        new PairFunction<Tuple2<Long, String>, CategorySortKey, String>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<CategorySortKey, String> call(Tuple2<Long, String> tuple)
                    throws Exception {
                // tuple._2 is the "|"-delimited count info string for one category
                String countInfo = tuple._2;
                long clickCount = Long.valueOf(StringUtils.getFieldFromConcatString(
                        countInfo, "\\|", Constants.FIELD_CLICK_COUNT));
                long orderCount = Long.valueOf(StringUtils.getFieldFromConcatString(
                        countInfo, "\\|", Constants.FIELD_ORDER_COUNT));
                long payCount = Long.valueOf(StringUtils.getFieldFromConcatString(
                        countInfo, "\\|", Constants.FIELD_PAY_COUNT));

                // Composite sort key built from the three counts
                CategorySortKey sortKey = new CategorySortKey(clickCount, orderCount, payCount);

                return new Tuple2<CategorySortKey, String>(sortKey, countInfo);
            }
        });

// ascending = false: sort in descending order so the hottest categories come first
JavaPairRDD<CategorySortKey, String> sortedCategoryCountRDD =
        sortKey2countRDD.sortByKey(false);
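Each count above is extracted from countInfo with the project's StringUtils.getFieldFromConcatString, which this article does not show. The following is a minimal, hypothetical sketch of what such a helper might do. It assumes countInfo is a "|"-delimited list of name=value pairs and that the Constants.FIELD_* values are names such as clickCount; both are assumptions, and `ConcatFieldParser` and the sample string are illustrative only.

```java
// Hypothetical reimplementation; the project's StringUtils.getFieldFromConcatString is not shown here.
class ConcatFieldParser {
    // Splits str on the regex delimiter (e.g. "\\|") and returns the value of the
    // "name=value" pair whose name equals field, or null if no such pair exists.
    static String getFieldFromConcatString(String str, String delimiter, String field) {
        for (String pair : str.split(delimiter)) {
            String[] kv = pair.split("=", 2);
            if (kv.length == 2 && kv[0].equals(field)) {
                return kv[1];
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed format of countInfo; field names are illustrative.
        String countInfo = "categoryid=1|clickCount=10|orderCount=5|payCount=3";
        long clickCount = Long.parseLong(getFieldFromConcatString(countInfo, "\\|", "clickCount"));
        System.out.println(clickCount); // 10
    }
}
```

With a helper that returns null for a missing field, as this sketch does, `Long.valueOf(null)` throws `NumberFormatException`, so every countInfo string fed into the mapToPair step needs all three count fields.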
Source code for "Spark Large-Scale E-commerce Project in Practice": https://github.com/Erik-ly/SprakProject
More articles in the series: Spark Large-Scale E-commerce Project in Practice: http://blog.csdn.net/u012318074/article/category/6744423