Thursday, August 20, 2009

Streaming Algorithms

I once asked my favorite sushi chef (You know who, David Applegate) what he would like to do in his life. After climbing and conquering the cruel mountain that is the NY Sushi Scene, what is left. He said, "I would like to specialize in one fish." That is poetry.

I have been thinking about streaming algorithms again some, and the question is, is it necessary to limit the model to sublinear space for it to be nontrivial, interesting. Doesn't it just suffice to limit per-item and query processing times to be sublinear? Now this question never occurred to me before because I have been focusing on one fish, that is, vector signals on streams. In that case, if we allow linear space, many of the problems become simple, sub-FOCS/STOC I mean. The past week I thought about other fish , ie., streaming problems in general (graphs, matrices, etc) and realized that even without restricting the space, the problems are nontrivial. The problems become full space data structure problems where one still needs approximations to be sublinear in time. This is not a novel realization, but makes me take a deeper breath when explaining the stream model of computation.

ps: I will be at the 34th MFCS conference next week, speaking on algorithms for stochastic streams.

Labels:

Mihai said...

Perhaps I am missing something --- but are you saying "I just realized the dynamic data structures are non-trivial" ? :)

7:27 AM
Dan said...

Perhaps I am missing something too, but I thought all the time that these are the old "on-line algorithms", and streaming is just a special case of them.

1:06 PM
Anonymous said...

Mihai:
Stream model is like dynamic data structure with sublinear space, is a valid view. Instead streaming can be said to be like dynamic data structures with sublinear update/query time, and you will still have the nontrivial things for streaming with graphs, matrices and other objects (but not for many of the problems with vector statistics). Eg., we will need+have approximations which is not traditionally big in data structures.

Dan:
Streaming is like Online algorithms with resource bounds is a valid view. Question is what resources one has to bound. For vector streaming problems one bounds space and possibly update/query times. For graph streaming and others, one can just bound update+query times and still have nontrivial special cases of online algorithms.

Summary: May be what I am saying is not a big deal, you have to bound resources, time, space, whatever. But one could make an argument that in trying to justify the streaming model, I had to justify sublinear space as well as sublinear time, but in some applications, I believe we can only justify sublinear update+query time and that is easier, and suffices.

2:19 PM