Tuesday, October 10, 2006

Prime groups

Streaming algorithms use sublinear space to summarize a series of updates and answer queries on the input approximately. If the input consists of a series of inserts only, most problems that can be solved in this model such as heavy hitters and quantiles can be solved deterministically. If the input consists of both inserts and deletes, then most algorithms are randomized, in the spirit of the seminal Alon, Matias and Szegedy work of 1996.

About two years ago, I visited Leszek Gasieniec at Liverpool, and we worked out this simple scheme of grouping the input items modulo primes and keeping all such modulo groups. The scheme used consecutive primes in a suitable range, but the total number of such groups was still sublinear. We solved some niche problem in data streams in presence of both inserts and deletes.

Recently, Sumit Ganguly and Anirban Mazumdar made a nifty observation that the prime groups can in fact be used to answer point queries, that is to estimate any input approximately. This is a basic indexing query using which a number of other queries can be answered! A slew of results now follow that show that heavy hitters, quantiles, histograms, etc, can in fact be computed deterministically in presence of inserts and deletes. Their paper (with other results too) can be found here.

ps: When I gave a talk recently at the Allerton conference, Venkat Guruswami asked if a family of hash functions will do, in place of these prime groups. May well be the case. The proofs only seem to use the fact that the groups should separate any subset of k items, for a given k known a priori.

3 Comments:

Anonymous Anonymous said...

An earlier paper with related somewhat similar ideas is:

Cheqing Jin, Weining Qian, Chaofeng Sha, Jeffrey X. Yu, Aoying Zhou. "Dynamically maintaining frequent items over a data stream". ACM CIKM '03.

7:47 AM  
Anonymous Anonymous said...

I guess the referred CIKM paper is similar to the Count Min sketch proposed and analysed in,

Graham Cormode, S. Muthukrishnan: An Improved Data Stream Summary: The Count-Min Sketch and Its Applications. LATIN 2004: 29-38

Both of them are randomized

11:29 PM  
Anonymous viagra online said...

Well, a set of mathematical instructions that must be followed in a fixed order, and that, especially if given to a computer, will help to calculate an answer to a mathematical problem is must be an algorithm because a program is only an algorithm if it stops eventually and it would be nice if you add annex something about it. 23jj

1:09 PM  

Post a Comment

<< Home