Friday, August 18, 2017

Heavy Hitters Continued

Recently I revisited the classical heavy hitters problem on streams (I think I can call it "classical", I remember the meeting where we chose to call overwhelmingly large items as heavy hitters, in the shadow of the baseball scandals in early 2000's)  but now looked at the high dimensional version, where each stream item is a d dimensional vector.  

There is a d^2 space bound that is inherent, and we manage to circumvent it, by using graphical model (in our case, Naive Bayes) on the dimensions. This is in general a fruitful direction, I think, using a dependency model on dimensions to half-circle around lower bounds we get with high dimensional analyses.  Our paper is in arXiv, joint work with Brano and Hoa. 

This work was motivated by collaboration with Adobe that needs to analyze high  dimensional web/ad analytics data. That is covered in the Ad Exchanger article, as part of the coverage of the university funding program that Anil Kamath spearheads at Adobe (and does a superb job of eliciting very specific projects but drawn from a wide set of areas).



Post a Comment

<< Home