my slice of pizza: MapReduce Prefix Sums

Sunday, September 05, 2010

MapReduce Prefix Sums

Input is a set of items (i, v_i), for i = 1, ..., n, distributed in arbitrary order among m machines with memory M each. Output should be (i, s_i = \sum_{j \leq i} v_j), for all i, again distributed in arbitrary order on the machines. This is the prefix sums problem for massive, unordered, distributed (MUD) data. We need to solve this using MapReduce. I can design an algorithm that takes 3 phases of Map+Reduce with suitable assumptions about n vs M, maybe even 2. Can someone show that this cannot be done in one phase of Map+Reduce?
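To pin down the target output, here is a minimal sequential reference in Python. It ignores the distribution across machines entirely and only fixes what a correct answer looks like; the function name `prefix_sums_reference` and the sample data are illustrative assumptions, not part of the post.

```python
from itertools import accumulate

def prefix_sums_reference(items):
    """Sequential reference: items is an iterable of (i, v_i) in any order.

    Returns a dict mapping i -> s_i, the sum of v_j over all j <= i.
    """
    ordered = sorted(items)                      # sort by index i
    indices = [i for i, _ in ordered]
    running = accumulate(v for _, v in ordered)  # running totals in index order
    return dict(zip(indices, running))

# Unordered input, as in the problem statement (made-up values).
example = [(3, 5), (1, 2), (4, 1), (2, 7)]
print(prefix_sums_reference(example))  # {1: 2, 2: 9, 3: 14, 4: 15}
```

Any MapReduce solution should produce the same (i, s_i) pairs, in whatever order and on whatever machines.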

Labels: aggregator

5 Comments:

Anonymous said...: Why don't you post the question on http://cstheory.stackexchange.com?; 1:51 PM
Mihai said...: Hi Muthu! I guess the model is not 100% clear in my head.

Do you care about the work? How does your algorithm run? Does it do O(n^2) work?; 11:45 AM
Hagit said...: Another model clarification question:

Can you do summation, i.e., compute \sum_{i=1,..,n} v_i, in a single map-reduce phase? (If so, how?); 1:42 PM
Francesco said...: Each reduce can compute at most M (partial) prefix sums in a round. Then if M<n, you need at least two rounds. In general, at least (\log_M n) rounds?; 1:53 PM
Anonymous said...: Dear Mihai, Hagit and Francesco,

Apologies for the late response. Someone (you know who) told me "there is no bigger impediment to scientific progress than the stroller in the hallway."

* What I had in mind was the trivial algorithm. Each machine j works on items jn/M + 1, ..., (j+1)n/M and computes the required partial prefix sums. Then all prefix sums for the n/M-sized problem are computed (I assumed this will fit into the memory of one machine: M like 1000's, internal memory like xGB, n like xTB), and finally each piece is finished up. So, total "work" is O(n).

* This does not take advantage of fun things Reduce can do. For example, we can assume \sum_{i=1,..,n} v_i can be done in a single MR phase. Can other computations be piggybacked on the underlying tree computation?

Part of the problem is to fix some capability for MR.

-- Metoo; 3:02 AM
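The three-phase "trivial algorithm" from the last comment can be sketched as a single-process Python simulation. This is one possible reading, not the author's code: it assumes the indices are 1..n, that round 1 routes each item to a contiguous index block of `block_size` items, and that the list of block totals fits in one reducer's memory. The helper `run_round`, the record tags ("local", "total", "offset") and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import accumulate

def run_round(records, mapper, reducer):
    """Simulate one Map+Reduce round: map, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output

def prefix_sums_three_rounds(items, block_size):
    """items: unordered (i, v_i) pairs with i in 1..n; returns {i: s_i}."""
    # Round 1: route each item to its index block; each reducer computes
    # the prefix sums local to its block, plus the block total.
    def map1(item):
        i, v = item
        yield (i - 1) // block_size, (i, v)

    def reduce1(block, pairs):
        pairs.sort()
        local = list(accumulate(v for _, v in pairs))
        for (i, _), s in zip(pairs, local):
            yield ("local", block, i, s)
        yield ("total", block, None, local[-1])

    round1 = run_round(items, map1, reduce1)

    # Round 2: send all block totals to a single reducer (assumed to fit in
    # its memory) and compute, per block, the sum of all earlier blocks.
    def map2(rec):
        _, block, _, total = rec
        yield 0, (block, total)

    def reduce2(_key, block_totals):
        block_totals.sort()
        offset = 0
        for block, total in block_totals:
            yield ("offset", block, None, offset)
            offset += total

    totals = [r for r in round1 if r[0] == "total"]
    round2 = run_round(totals, map2, reduce2)

    # Round 3: join each block's offset with its local prefix sums and add.
    def map3(rec):
        kind, block, i, value = rec
        yield block, (kind, i, value)

    def reduce3(block, values):
        offset = next(v for kind, _, v in values if kind == "offset")
        for kind, i, s in values:
            if kind == "local":
                yield i, s + offset

    locals_ = [r for r in round1 if r[0] == "local"]
    return dict(run_round(locals_ + round2, map3, reduce3))

print(prefix_sums_three_rounds([(3, 5), (1, 2), (4, 1), (2, 7), (5, -3)], 2))
```

Apart from sorting inside each block, every item is touched a constant number of times per round, which matches the O(n) "work" claimed in the comment. Whether rounds 2 and 3 can be merged, or the whole thing done in one round, is exactly the open question of the post.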
