Linear complexity in External Memory
In the standard RAM model, one hits the sorting bottleneck of \Omega(n\log n) sometimes, and researchers find ways around by using say word operations that avoid comparisons. In the PRAM world, people tried to get around certain sorting bottleneck by "padded sorting", where items can appear in sorted order but with empty memory locations between them.
I was recently drawn to this issue when working with Gianni Franceschini and Roberto Grossi. Consider the external memory (just the vanilla version with input of size N on disk, internal memory of size M, and block size B) model. Even problems like permuting an array that takes O(N) time in the RAM model, and therefore, should ideally just take linear number O(N/B) of block accesses, seem to take essentially \Omega(N/B \log_{M/B} N/B) block accesses in the external memory model; sorting, convex hulls, MST and other problems then face similar bottleneck.
I think this is fundamental and one would think researchers would have by now figured out different ways to circumvent this bottleneck with additional assumptions on the input or on the model. But I haven't found anything. Any pointers?
ps: Btw, for the problem we were working on, we did find an algorithm with linear number O(N/B) of accesses. The algorithm is too long to be described in this blog.
I was recently drawn to this issue when working with Gianni Franceschini and Roberto Grossi. Consider the external memory (just the vanilla version with input of size N on disk, internal memory of size M, and block size B) model. Even problems like permuting an array that takes O(N) time in the RAM model, and therefore, should ideally just take linear number O(N/B) of block accesses, seem to take essentially \Omega(N/B \log_{M/B} N/B) block accesses in the external memory model; sorting, convex hulls, MST and other problems then face similar bottleneck.
I think this is fundamental and one would think researchers would have by now figured out different ways to circumvent this bottleneck with additional assumptions on the input or on the model. But I haven't found anything. Any pointers?
ps: Btw, for the problem we were working on, we did find an algorithm with linear number O(N/B) of accesses. The algorithm is too long to be described in this blog.
1 Comments:
I personally think the permutation bound is real, as long as it is sublinear... To be more precise, I believe that if you want to sort n values in external memory, the optimal running time is the minimum of the RAM running time, and the permutation bound.
The way the permutation bound is shown, it depends on elements being "indivisible". I am aware of several attempts to get around this by doing some sort of network coding (xoring the values intelligently, as opposed to treating them as black boxes). But all these attempts have failed in the worst case...
Post a Comment
<< Home