Extreme Streaming
I am making my way back into researching streaming problems.
One of the directions I am focusing on: using not the polylog memory that is standard in streaming algorithms, but even less, say O(1) memory. My coauthors and I have such algorithms for estimating the H-index of a stream (to appear in PODS 2017; it will be on arXiv soon) and for estimating heavy hitters in a stream-of-streams model (to appear in SDN 2017).
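To give a feel for what O(1) memory means here (this is not the algorithm from either paper, just a classical illustration of the regime), the Boyer-Moore majority vote finds any item occurring in more than half of the stream while storing only one candidate and one counter:

```python
# A minimal sketch of constant-memory streaming (hypothetical example,
# not the algorithms from the PODS/SDN papers): Boyer-Moore majority vote.
def majority_candidate(stream):
    candidate, count = None, 0
    for item in stream:
        if count == 0:
            candidate, count = item, 1   # adopt a new candidate
        elif item == candidate:
            count += 1                   # candidate reinforced
        else:
            count -= 1                   # candidate challenged
    return candidate  # guaranteed correct only if a true majority exists

print(majority_candidate([3, 1, 3, 3, 2, 3, 3]))  # -> 3
```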
I was sort of pushed into this model the way I like to find problems in general: by looking at modern applications, which impose some real constraints. For example, in SDNs (Software-Defined Networks) there are memory pipelines that packets percolate through; each memory stage can be thought of as a row of a standard sketch, and one then needs to compute something on top of these row estimates using only memory that fits into a single packet header (see the sketch below). Another example: streaming analyses are often run for a very large number of groups (say, one per source IP address or internet user), and in that case polylog memory per group is already far too much.
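A hypothetical picture of that pipeline, under my own assumptions about the setup (names like `CountMinRow` are illustrative, not from any paper): each stage holds one row of a Count-Min sketch, and the packet header only has to carry the running minimum of the row estimates, i.e. a constant number of words beyond the rows themselves.

```python
import random

# Illustrative only: one memory stage = one Count-Min row; the final
# "compute on top of the row estimates" step is a constant-memory min.
class CountMinRow:
    def __init__(self, width, seed):
        self.width = width
        self.seed = seed
        self.counters = [0] * width

    def _bucket(self, key):
        return hash((self.seed, key)) % self.width

    def update(self, key, delta=1):
        self.counters[self._bucket(key)] += delta

    def estimate(self, key):
        return self.counters[self._bucket(key)]  # never underestimates

rows = [CountMinRow(width=256, seed=s) for s in range(4)]
stream = [random.randint(0, 50) for _ in range(10000)]
for x in stream:
    for row in rows:
        row.update(x)

# Aggregation across stages: O(1) memory beyond the rows.
key = 7
print(key, min(row.estimate(key) for row in rows), stream.count(key))
```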
I call these extreme streaming problems, inspired by Extreme Classification in Machine Learning, which studies ML problems with a very large number of labels. I think there is more to mill here.