### Stochastic Data Streams

I am revisiting the area of streaming algorithms, and thought I will use the MFCS talk to push in a new direction. These days, I do research like cows chew their cud: I swallow an idea or a thought as it pops up during the busy day, and much later, when I have time, say during travel, I revisit the idea and chew on it. Much, much later, a paper or talk emerges.

The new-ish direction is that of stochastic streams where a distribution is given a priori. Then, items arrive, drawn from this distribution, instantiating a stream. The performance of a streaming algorithm on this instance is calibrated against the OPT on this instance, or its expected behavior over all streams. We need nice problems in this model, or separate it from the plain streaming model where the distribution is not known, or from plain stochastic model in which the distribution is known, but the algorithms do not necessarily have to be streaming, etc.

I have two potential examples in the writeup for MFCS, and one explained a little in the talk. Here are the slides (go to the end, this is a preview, it may change before the talk on Wednesday).

The new-ish direction is that of stochastic streams where a distribution is given a priori. Then, items arrive, drawn from this distribution, instantiating a stream. The performance of a streaming algorithm on this instance is calibrated against the OPT on this instance, or its expected behavior over all streams. We need nice problems in this model, or separate it from the plain streaming model where the distribution is not known, or from plain stochastic model in which the distribution is known, but the algorithms do not necessarily have to be streaming, etc.

I have two potential examples in the writeup for MFCS, and one explained a little in the talk. Here are the slides (go to the end, this is a preview, it may change before the talk on Wednesday).

Labels: aggregator

## 7 Comments:

sounds like a good place to apply prophet inequalities?

Indeed. The resulting algorithm should be implementable on stream, though.

-- Metoo

Hi Muthu,

I'm surprised that in the last slide you labeled classical data streaming as "well-understood". I still feel like many known results are ad-hoc, without much unifying principle explaining "why" we should expect certain results to be true. Or am I being too hopeful in thinking such principles should exist?

Hi Jelani,

Apologies: I was overzealous, or may be mentally, I hoped all things will be resolved in your thesis. :) In any case, your point is valid, I will tone down that claim.

Thanks!

Metoo

Looks pretty nice. It seems like this might have some connections to the field of online learning and online to batch conversion bounds -- have you thought about that?

--kamalika

Similar thought as above, that is, your setting reminds me of online algorithm and competitive analysis.

Hi K and T,

It is online algorithms, but with resource constraints of streaming algorithms and additionally knowing the distribution a priori.

Hi K,

There may be online learning, but I am unable to pull out a precise connection.

- Metoo

Post a Comment

<< Home