### Parallel Models Again

Recent discussions reminded me of three papers, new or old, on parallel models:

- MapReduce has inspired many in the past few years. The upcoming SODA paper by Karloff, Suri and Vassilvitskii has a new computational model for it. For example, the model allows n^{2/3} machines each with n^{2/3} memory for a problem of size n, and works in a series of polytime map and reduce steps. There are a couple of interesting results in this model for s-t connectivity and MST.
- PRAM has been "fixed" many times. A variation of PRAM that allows log n processors is in Dorrigiv, Lopez-Ortiz and Salinger. There are some general partitioning results for dynamic programming and other techniques.
- There is the agenda of using Graphics Processing Units (GPUs) as parallel machines. Suresh wrote a great note abstracting a model in 2003. Since then, more communities are following this agenda. A few weeks ago I heared about GPUs being used for medical data analysis.

Labels: aggregator

## 5 Comments:

I may be stupid, but don't you always have higher than sub-linear amount of machines available, because you have to store the data somewhere? (look at chapter 3 and especially 3.1.1). Of course, each machine can store lots of data but still.

One typically thinks of PRAMs with poly(n) processors, but this MapReduce model works with sublinear number of processors with total memory (super)linear in the input size. Sergei can say more I am sure.

-- Metoo

don't you always have higher than sub-linear amount of machines availableIdeally you would use as few machines as possible (communication costs increase very quickly as we increase the number of machines). If the data is accessed in a streaming-like fashion, it would typically be on disk. Each machine can store very large amounts (terabytes). Even if you keep all data in memory, it is still a few gigabytes per machine. So the number of machines is still about 7-9 orders of magnitude smaller than the number of records.

While we will have (even) more processors per machine soon, the amount of memory will probably grow at least as fast.

In general, one would use a parallel algorithm only if data doesn't fit in memory on one machine (network performance is much worse than in-memory performance). So linear number of machines seems exorbitant in most settings (e.g. I don't know of any jobs running on a 10^9 machine cluster).

I tend to disagree with the "ideally you would use as few machines as possible" statement.

What if you have an algorithm that takes O(n) time for one machine, but O(n/(m^{1/2})) for m machines; where the speed up is not linear due to communication costs, etc. How many machines should you use then?

I think the assumption that you only use parallelism when the data doesn't fit into memory is severely flawed. What if something takes 10 hours to compute using one machine, but 1 hour using 100 machines? What if it's 10 days using one machine but 8 hours (read:overnight) using 1000 machines? The answer is not so clear.

The sweet spot lies in between. You don't want too much parallelism, since then communication costs dominate; but also you don't want too little, even if the speed up you are getting is far from optimal (as in the examples above).

sergei

Since communication costs tend to grow superlinearly with the number of processors, reducing the processor count is generally a good idea. Our LoPRAM (low-processor-count PRAM) model is based on that observation. It turns out that problems which are rather hard for Omega(n) processors, are easy for O(log n) or even O(n^{1-epsilon}) processors.

Post a Comment

<< Home