How do we aggregate partial counts efficiently?
The "pairs" approach. This algorithm illustrates the use of complex keys to coordinate a distributed computation.
Each mapper takes a sentence; reducers sum up the counts associated with each pair.

// "pairs" approach
class MAPPER
    method MAP(docid a, doc d)
        for all term w ∈ doc d do
            for all term u ∈ NEIGHBORS(w) do
                EMIT(pair (w, u), count 1)    // emit a count of 1 for each co-occurrence

class REDUCER
    method REDUCE(pair p, counts [c1, c2, ...])
        s = 0
        for all count c ∈ counts [c1, c2, ...] do
            s = s + c
        EMIT(pair p, count s)

For each term, emit pairs of the form ((a, b), 1); the key is the pair (a, b) itself.
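As a concrete illustration, the following is a minimal single-process Python sketch of the pairs approach, assuming a window-based definition of NEIGHBORS and an in-memory dictionary standing in for the sort/shuffle; the helper names (neighbors, map_pairs, reduce_pairs) are illustrative, not part of any MapReduce framework.

from collections import defaultdict

def neighbors(terms, i, window=2):
    # Assumed definition of NEIGHBORS: terms within `window` positions of position i.
    lo, hi = max(0, i - window), min(len(terms), i + window + 1)
    return [terms[j] for j in range(lo, hi) if j != i]

def map_pairs(doc):
    # MAP: emit ((w, u), 1) for every co-occurring pair in the document.
    terms = doc.split()
    for i, w in enumerate(terms):
        for u in neighbors(terms, i):
            yield (w, u), 1

def reduce_pairs(emitted):
    # Stand-in for the shuffle + REDUCE: group identical keys and sum their counts.
    counts = defaultdict(int)
    for key, c in emitted:
        counts[key] += c
    return dict(counts)

docs = ["a b c b", "b c d"]
emitted = [kv for d in docs for kv in map_pairs(d)]
print(reduce_pairs(emitted))   # co-occurrence count per (w, u) pair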
Advantages
Easy to implement and easy to understand: the map step just finds pairs, the reduce step just counts them.

Disadvantages
Lots of pairs to sort and shuffle around: the key space is the set of all word pairs, upper-bounded by n² distinct keys for a vocabulary of n words.
Not many opportunities for combiners to work.

In the "stripes" approach, co-occurrence information is first stored in an associative array, denoted H. The mapper emits key-value pairs with words as keys and corresponding associative arrays as values, where each associative array encodes the co-occurrence counts of the neighbors of a particular word.
Each mapper takes a sentence; reducers perform an element-wise sum of the associative arrays.

// "stripes" approach
class MAPPER
    method MAP(docid a, doc d)
        for all term w ∈ doc d do
            H = new ASSOCIATIVEARRAY
            for all term u ∈ NEIGHBORS(w) do
                H{u} = H{u} + 1    // tally words co-occurring with w
            EMIT(term w, stripe H)

class REDUCER
    method REDUCE(term w, stripes [H1, H2, H3, ...])
        Hf = new ASSOCIATIVEARRAY
        for all stripe H ∈ stripes [H1, H2, H3, ...] do
            SUM(Hf, H)    // element-wise sum of associative arrays
        EMIT(term w, stripe Hf)

For each term a, emit a stripe: a → {b: 1, c: 2, d: 2, ...}; the key is the term "a" itself.
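A matching single-process Python sketch of the stripes approach, under the same assumptions as the pairs sketch above (window-based neighbors, in-memory stand-in for the shuffle, illustrative helper names); here each emitted value is a Counter playing the role of the associative array H.

from collections import Counter, defaultdict

def neighbors(terms, i, window=2):
    # Same assumed NEIGHBORS definition as in the pairs sketch.
    lo, hi = max(0, i - window), min(len(terms), i + window + 1)
    return [terms[j] for j in range(lo, hi) if j != i]

def map_stripes(doc):
    # MAP: for each term w, emit one associative array (stripe) of neighbor counts.
    terms = doc.split()
    for i, w in enumerate(terms):
        stripe = Counter(neighbors(terms, i))   # tally words co-occurring with w
        yield w, stripe

def reduce_stripes(emitted):
    # Stand-in for the shuffle + REDUCE: element-wise sum of all stripes per term.
    final = defaultdict(Counter)
    for w, stripe in emitted:
        final[w].update(stripe)                 # Counter.update adds counts element-wise
    return {w: dict(h) for w, h in final.items()}

docs = ["a b c b", "b c d"]
emitted = [kv for d in docs for kv in map_stripes(d)]
print(reduce_stripes(emitted))   # one merged stripe per term, e.g. 'b' → {'a': 1, 'c': 3, 'b': 2, 'd': 1}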
Advantages
Far less sorting and shuffling of key-value pairs.
Can make better use of combiners (see the sketch after these lists).

Disadvantages
More difficult to implement.
Underlying object is more heavyweight.
Fundamental limitation in terms of the size of the event space: each stripe must hold counts for every neighbor of a term, so it has to fit in memory.
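To make the combiner point concrete, here is a minimal sketch under the same assumptions and hypothetical helper names as the Python sketches above: because the element-wise sum of stripes is associative and commutative, the reducer logic can be reused as a combiner that merges a mapper's local stripes before the shuffle.

from collections import Counter

def combine_stripes(local_output):
    # Merge one mapper's stripes per term before they are shuffled,
    # using the same element-wise sum the reducer performs.
    partial = {}
    for w, stripe in local_output:
        partial[w] = partial.get(w, Counter()) + stripe
    return list(partial.items())

# Usage (assuming map_stripes from the earlier sketch): fewer, larger
# key-value pairs leave each mapper, which is what the combiner buys us.
# combined = combine_stripes(list(map_stripes("a b c b")))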