I gave a talk at the MineNet Workshop associated with SIGCOMM 2007 a few weeks ago in Pisa, Italy. I had never been to a SIGCOMM before, but it was good to see a lot of friends there, mainly from the AT&T and Sprint networking teams, the inimitable Bala included. A couple of algorithmii had sneaked in: Michael Mitzenmacher and John Byers.

I have seen many talks and papers that turn data analysis into some sort of lab experiment: "I collected this data, removed the records that were bad, then projected it onto these features, and did a mutual information-based plot, and voilà, there is a pattern". As a database person, I find these less interesting than thinking about how to abstract such tasks into a few primitives that may help with a variety of analyses in different domains. So I chose to talk about how to build the system infrastructure to support the different data analyses people wish to do.

I used the examples of analyzing cellphone data, IP traffic data, and web data to show the different system infrastructures one needs to build and their uses.
I really enjoyed giving this talk. Anja Feldmann helped me hone these slides; thank you.


