Tuesday, June 25, 2013

Data Science Summit in SF

Aggregate Knowledge Inc (AK) and Foundation Capital (FC) ran a one-day meeting on Data Science, focused on streaming. The program:
  • I spoke about the CountMin sketch and how it gets used in a variety of ways.
  • Sam Ritchie of Twitter spoke about SummingBird and some of their open source software on streaming Hadoop, including the CountMin sketch. In particular, I like some of the monoid primitives they offer: a quick combination of them will let one implement a sketch of sketches and other higher-order aggregations on top of their base sketch implementations. Sam's talk was a quintessential high-quality Engineering talk: sparse slides with a few lines of code or a picture that brought to life a host of issues, design choices, and solutions, and hid the hours of details and work behind the Engineering that led to the talk.
  • David Woodruff of IBM Research and Alex Andoni of MSR spoke about theory results on matrix problems like regression and low rank approximation, and nearest neighbor search via LSH, respectively. There was a lot of interest in the audience in code for these results. There is some for LSH that Alex developed a while ago. Given how much we (the theory community) have invested in and improved approximate results for matrix problems, we really should put some code out there. This is a shout out to you, Michael Mahoney!
  • Armon Dadgar spoke about sketching at the startup Kiip. Check out GitHub for his code.
  • The duo of Blake Mizerany and Timon Karnezos gave a fascinating pair of talks about making research results into code. Blake, from Heroku, spoke about the challenges an Engineer faces while reading a research paper with a view to converting it into code, and gave examples like unspecified parameters that come from proofs, reliance on black boxes from prior work, pseudocode that is NOT, etc. Timon, in contrast, spoke about how an Engineer SHOULD read a research paper before coding, and prescribed iterative reading, reaching out to authors, consulting prior or followup work, etc. Both speakers gave great performances: Blake with a scintillating sense of timing in delivery and humor, Timon with a call to a higher sense of purpose, urging Engineers to go beyond code and launch and to develop a coding community around their product. I would recommend the duo to any research conference that wants to be reminded that the path from research to products goes through Engineers, and that researchers and Engineers need to communicate better.
  • Finally, Jeremie Lumbroso gave a heartfelt homage of a talk about Philippe Flajolet and his contributions to sublinear algorithms and their analyses.
  • There was a panel comprising Pete Skomoroch of LinkedIn, Joseph Turian of MetaOptimize LLC, Rob Grzywinski of AK, and Ashu Garg of Foundation Capital. Discussions included how to motivate and recognize engineers as well as managers, the difference between code written by PhDs and by others, the state of the Big Data industry, etc.
The whole day had several lessons for me. First, I was surprised by how many folks have implemented and used the CountMin sketch data structure. I usually know when it gets used in research publications and in corporate research labs that have streaming/theory/db researchers, but there is a whole set of startup companies with very smart Engineers who find this data structure useful and use it. It was gratifying to see, for example, Steven Noble whip out his laptop and show a dashboard using the CountMin sketch to track certain heavy hitters at his company, Stripe. Second, there is a community of Engineers who care about cool, powerful methods and want to code them and try things out, but who are not well served by even SODA or ALENEX. Finally, companies like AK are trying to seriously engage the community of researchers and engineers. I was impressed with the amount of work Rob and his team put into the AK blog so they can get their insights (and numbers!) out to the larger community.
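
For readers who have not seen the data structure, here is a minimal CountMin sketch in Python. It is a sketch under stated assumptions, not any of the implementations shown at the meeting: the class name, the method names, and the use of salted BLAKE2b as a stand-in for the pairwise-independent hash functions of the analysis are all my choices for illustration. The `merge` method adds two sketches cell by cell, which is the kind of associative combine that a monoid-style framework can build on.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative CountMin sketch; all names here are hypothetical."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Standard sizing: width = ceil(e / epsilon), depth = ceil(ln(1 / delta)).
        self.epsilon = epsilon
        self.delta = delta
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0  # sum of all counts added so far

    def _index(self, item, row):
        # Assumption: salted BLAKE2b stands in for the pairwise-independent
        # hash family used in the analysis; each row gets its own salt.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item, count=1):
        # Add `count` to one cell per row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count
        self.total += count

    def estimate(self, item):
        # With non-negative updates, every cell the item hits is an
        # overestimate, so the minimum over rows is the tightest one.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )

    def merge(self, other):
        # Cell-wise sum of two sketches with identical dimensions.
        if (self.width, self.depth) != (other.width, other.depth):
            raise ValueError("sketches must have the same dimensions")
        merged = CountMinSketch(self.epsilon, self.delta)
        for row in range(self.depth):
            merged.table[row] = [
                x + y for x, y in zip(self.table[row], other.table[row])
            ]
        merged.total = self.total + other.total
        return merged


def is_heavy_hitter(sketch, item, phi):
    # Flag items whose estimated count is at least a phi fraction of the total.
    return sketch.estimate(item) >= phi * sketch.total
```

With non-negative updates the estimate never undercounts, and under the usual hashing assumptions it overcounts by at most epsilon times the total count with probability at least 1 - delta. A dashboard of the kind described above could keep one such sketch per time window and merge them to answer queries over longer ranges.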

Btw, the meeting was at the 111 Minna Gallery, so there was artistic fun as backdrop. Special thanks to Matt Curcio for conceiving of, executing, and pulling off this meeting. You should invite him to give a talk if you want to hear about the use of streaming algorithms AND/OR internet advertising systems.

Comments:

Anonymous said...

Could you make your slides available?

4:48 PM  
Blogger Unknown said...

Thanks for the kind words, Muthu! I'm definitely tickled pink by how well it went. I think it should be a great launching point for more regular conversations about these issues. (Indeed I find that regularity is often the biggest hurdle to gaining legitimacy and recognition in these things.)

It's informative to hear (read) your perspective on things like CountMin, that you've obviously moved well past in your career, but that seem to have had a recent resurgence. It makes me think about when and why and how to approach research that I once ruled out or that fell out of fashion. Of course, this is all in the service of gathering inspiration from old solutions to new, different problems, but I wonder if it constrains my thinking in a way too. I guess this is the fundamental problem of invention and borrowing, which probably won't be resolved in the comment field of a blog ;-)

In any case, thanks again for all your help!

9:44 AM  
