Sunday, June 30, 2013

An Ending

Someone asked me whether I had a story ending (not a beginning) that I haven't filled out yet. Indeed I do:

"Now I have told you a story. It is a very simple story. A boy meets a girl, they fall in love, learn to smile, later their smiles don't hold, they part, splitting their friends between them. It was set in NY and they met over a slice of pizza. Now, why did I tell you the story?

Because my story is post-love; it begins where you think the love and the story ended. I swore the day we parted that I would remain in her life and not fail her, but that I would be invisible. It has not been easy. I had to speak in a way she would know but not hear anything of the past or the future; I had to breathe in a way that she would not feel lonely, yet she wouldn't look or find anybody near her. And I have managed to do that. I was with her when her boy was born and when she breathed her last. I watched over her baby when he was under observation in the hospital, and I shut her eyelids before her family arrived at her deathbed, yet no one remembers I was there or wonders why I was not there. I have simply filled the space like the air they breathe.

You wanted a story, and I have told you one, but the post-story is what I cannot tell you, because it is every moment of my life. I have developed an eerie stillness (of mind, matter, and most things) because of how I have lived these past three decades."

----------- New Version --------------
"Now I have told you a story. It is a very simple story. A boy meets a girl, they fall in love, learn to smile, later their smiles don't hold, they part, splitting their friends between them. It was set in NY and they met over a slice of pizza."

His friend says, "Haruki, that is good. The last time I saw you was in high school. Now we are here at this jazz club you run, we are sipping sake, and you have told me a story. It is a simple story. But I can tell you have become a storyteller. Now I have to leave. I know I will look in when I am in town next time. But something tells me you won't be here. Stories, once told, don't leave you. Love and lovers do, but not stories. See you around, Haruki."

Wednesday, June 26, 2013


In high school I painted portraits. These days, I am far from a brush or a pencil, but recently I got less than a minute at a console in the Children's Museum facing a mirror.

Tuesday, June 25, 2013

Data Science Summit in SF

Aggregate Knowledge Inc (AK) and Foundation Capital (FC) ran a one-day meeting on Data Science, focused on streaming. The program:
  • I spoke about the CountMin sketch and how it gets used in a variety of ways.
  • Sam Ritchie of Twitter spoke about Summingbird and some of their open source software on streaming Hadoop, including the CountMin sketch. In particular, I like some of the monoid primitives they offer: a quick combination of them lets one implement sketches of sketches and other higher-order aggregations on top of their base sketch implementations. Sam's talk was a quintessential high-quality Engineering talk: sparse slides, each with a few lines of code or a picture, that brought to life a host of issues, design choices, and solutions, while hiding the hours of detailed Engineering work that led to the talk.
  • David Woodruff of IBM Research and Alex Andoni of MSR spoke about theory results on matrix problems like regression and low rank approximation, and on nearest neighbor search via LSH, respectively. There was a lot of interest in the audience in code for these results. There is some for LSH that Alex developed a while ago. Given how much we (the theory community) have invested in and improved approximation results for matrix problems, we really should put some code out there. This is a shout-out to you, Michael Mahoney!
  • Armon Dadgar spoke about sketching at the startup Kiip. Check out his code on GitHub.
  • The duo of Blake Mizerany and Timon Karnezos gave a fascinating pair of talks about turning research results into code. Blake, from Heroku, spoke about the challenges an Engineer faces while reading a research paper with the goal of converting it into code, and gave examples like unspecified parameters that come from proofs, reliance on black boxes from prior work, pseudocode that is NOT, etc. Timon, in contrast, spoke about how an Engineer SHOULD read a research paper before coding, and prescribed iterative reading, reaching out to authors, consulting prior or follow-up work, etc. Both speakers gave great performances: Blake with a scintillating sense of timing and humor, Timon with a call to a higher sense of purpose, urging Engineers to go beyond code and launch and to develop a coding community around their product. I would recommend the duo to any research conference that wants to be reminded that the path from research to products goes through Engineers, and that the two sides need to communicate better.
  • Finally, Jeremie Lumbroso gave a heartfelt talk in homage to Philippe Flajolet and his contributions to sublinear algorithms and their analyses.
  • There was a panel comprising Pete Skomoroch of LinkedIn, Joseph Turian of MetaOptimize LLC, Rob Grzywinski of AK, and Ashu Garg of Foundation Capital. Discussions included how to motivate and recognize engineers as well as managers, the difference between code written by PhDs and by others, the state of the Big Data industry, etc.
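For readers who have not seen the CountMin sketch, here is a minimal Python illustration of the data structure and of the monoid-style combination mentioned in the talks. It is a sketch under stated assumptions, not Twitter's code or a reference implementation: each row hashes with a salted call to Python's built-in `hash` (a stand-in for the pairwise-independent hash families the analysis assumes), and the names `CountMinSketch`, `update`, `query`, and `merge` are hypothetical. `merge` plays the role of the monoid "plus": two sketches with the same width, depth, and seeds combine by adding cells, which is what makes per-shard or per-window sketches easy to aggregate.

```python
import random


class CountMinSketch:
    """Minimal CountMin sketch: `depth` rows of `width` counters.

    Assumption: salted built-in hashing stands in for pairwise-independent hashes.
    """

    def __init__(self, width, depth, seed=0):
        self.width = width
        self.depth = depth
        rng = random.Random(seed)
        # One salt per row; equal seeds give equal hash functions, so sketches can merge.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        # Add the count to one cell in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def query(self, item):
        # With nonnegative counts every row can only overestimate, so take the minimum.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))

    def merge(self, other):
        # Monoid "plus": cell-wise addition of two compatible sketches.
        if (self.width, self.depth, self.salts) != (other.width, other.depth, other.salts):
            raise ValueError("sketches must share width, depth and hash seeds")
        merged = CountMinSketch(self.width, self.depth)
        merged.salts = list(self.salts)
        merged.table = [[x + y for x, y in zip(r1, r2)]
                        for r1, r2 in zip(self.table, other.table)]
        return merged


# Two shards of a stream, sketched separately and then combined.
a = CountMinSketch(width=272, depth=5, seed=42)
b = CountMinSketch(width=272, depth=5, seed=42)
for word in ["pizza"] * 30 + ["jazz"] * 5:
    a.update(word)
for word in ["pizza"] * 12 + ["sake"] * 7:
    b.update(word)
both = a.merge(b)
print(both.query("pizza"))  # never below the true count of 42
```

The parameters follow the usual CountMin sizing, width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, here with ε = δ = 0.01; with properly independent hashes, an estimate then exceeds the true count by at most ε times the total count with probability at least 1 − δ. Because merging is just addition, the merged sketch is identical to one built over the concatenated stream.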
The whole day had several lessons for me. First, I was surprised by how many folks implemented and used the CountMin sketch data structure. I usually know when this gets used in research publications and in corporate research labs that have streaming/theory/db researchers, but there is a whole set of startup companies with very smart Engineers who find this data structure useful and use it. It was gratifying to see, for example, Steven Noble whip out his laptop and show a dashboard using the CountMin sketch to track certain heavy hitters at his company Stripe. Second, there is a community of Engineers who care about cool, powerful methods, want to code them and try things out, but are not well served by even SODA or ALENEX. Finally, companies like AK are trying to seriously engage the community of researchers and engineers. I was impressed with the amount of work Rob and his team put into the AK blog so they can get their insights (and numbers!) out to the larger community.

Btw, the meeting was at the 111 Minna Gallery, so there was artistic fun as a backdrop. Special thanks to Matt Curcio for conceiving, executing, and pulling off this meeting. You should invite him to give a talk if you want to hear about the use of streaming algorithms AND/OR internet advertising systems.

Past and Painting

Driving back from Big Basin Park, I saw sunlight streaming through the Redwoods. The moment and angle have to be just right to catch a low morning Sun below the towering Redwoods. It reminded me of when, in my early teens and very far from this moment in space and time, I painted something similar in a contest, including the depth of the trees and the curve of the road ahead. At that time, the only trees I knew were stumpy and sick, and the roads were red mud. But on the day of the contest, I badly wanted to imagine a world where trees were outsized, tall, and plentiful, where they would swallow me and I would be lost. I didn't know such trees existed. I won a prize at that contest and forgot that moment.

Monday, June 24, 2013

PODS 13 Big Data Panel

Dan Suciu and Chris Re ran a panel at PODS yesterday on Big Data. Their premise was that there is a need for Big Data everywhere from industry to government, that many CS research communities are already engaged, and that the PODS community needs to get involved and develop theoretical underpinnings.

  • Joe Hellerstein went first and talked about the core database perspective on Big Data, including synchronicity, distributed computing platforms, Datalog specifications, and separating computing and communication from declarative data management. He also spoke about his CALM conjecture in the context of consistency. Joe referred to himself as "really junior" and was, as you know him, agile both mentally and physically throughout the panel.
  • Carlos Guestrin went next and spoke about machine learning at scale, in particular graphical models and learning. He spoke about GraphLab and graph-based machine learning methods, and presented impressive results on triangle counting for large graphs. He also emphasized the vertex-centric view of computation for graphical machine learning and wondered about formulating the precise power of this approach in a logic language.
  • Sergei Vassilvitskii went next and spoke about the algorithmic perspective. While we have a bag of algorithmic tools for sequential algorithms (greedy, dynamic programming, LP rounding, etc.), we don't have significant tools for mapreduce++ algorithms. He proposed coresets as a potential tool for the divide and conquer one needs with distributed machines, and spoke about applications in set cover, k-means, and so on. Sergei was metaphorical, often referring to a donut and a cup (as the same thing), and I made a mental note to look for Krispy Kreme after the panel.
  • Jeff Ullman followed and spoke about certain basic mapreduce algorithms. He discussed in detail the VLDB13 result on one-round mapreduce with tradeoffs between reducer size and replication rate for the Hamming distance problem, and left open the problem(s) for a larger number of rounds. The lower bound in the main result is reminiscent of the comparison-based sorting lower bound.
  • Andrew McCallum followed and spoke about information extraction from a large corpus of academic research papers. One needs deep techniques to resolve the many conceptual problems that arise, from text understanding to entity resolution, so the talk pointed to a variety of challenges in reasoning with probability and uncertainty, conditional random fields, etc. He also discussed their experiments with the peer review process, as another front for improving the progress of science by focusing on the scientific community.
  • I went last and mainly spoke about how Big Data is different from Massive Data because Big Data seems to deal with people. So, we need to accept that data is generated by strategic agents and query results are consumed by strategic agents, which may ultimately affect whether the database gets more, and higher-quality, data or not; we should draw a circle around data to include these aspects, i.e., privacy, economics, and game theory. Further, instead of a BDDB that is general purpose, we could focus on Big Purpose databases.
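To make the reducer-size/replication-rate tradeoff in the Hamming distance problem concrete, here is a small in-memory simulation of one simple one-round scheme for the distance-1 case. It is offered as an illustration, not as the exact algorithm from the talk or the paper. Assumptions: all inputs are bit strings of the same even length b, plain dictionaries stand in for a MapReduce framework, and the function names are hypothetical. Each string goes to two reducers, one keyed by its left half and one by its right half. Two strings at Hamming distance 1 differ in exactly one half, so they share the other half and meet at exactly one reducer. The replication rate is therefore 2, while a reducer may receive up to 2^(b/2) strings.

```python
from collections import defaultdict
from itertools import combinations


def hamming_distance(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(u, v))


def map_phase(strings):
    """Send each string to two reducers: one keyed by its left half, one by its right half."""
    for s in strings:
        half = len(s) // 2
        yield ("left", s[:half]), s
        yield ("right", s[half:]), s


def reduce_phase(groups):
    """In each reducer, compare all pairs and keep those at Hamming distance exactly 1."""
    pairs = set()
    for members in groups.values():
        for u, v in combinations(sorted(members), 2):
            if hamming_distance(u, v) == 1:
                pairs.add((u, v))
    return pairs


def distance_one_pairs(strings):
    """One simulated round: group mapper output by key (the shuffle), then run the reducers."""
    groups = defaultdict(list)
    for key, s in map_phase(strings):
        groups[key].append(s)
    return reduce_phase(groups)


strings = ["0000", "0001", "0011", "1000", "1111"]
print(sorted(distance_one_pairs(strings)))
# [('0000', '0001'), ('0000', '1000'), ('0001', '0011')]
```

Splitting each string into k pieces instead of two, and keying each copy by the string with one piece removed, raises the replication rate to k while shrinking the maximum reducer size to 2^(b/k). That is the kind of tradeoff the lower bound speaks to.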
The talks were followed by a panel discussion, which was lively. Tina Eliassi-Rad asked if we consider generative models of data, and Joe pointed out his 10-year-old paper that does this for the case of acquiring sensor data. Christoph asked Carlos about the relative emphasis on ML vs DB in ML and DB conferences, and Carlos said each community needs to be more like the other. Christoph also pointed out that DB folks may have to become knowledgeable about a lot of other areas in order to deal with Big Data problems. I asked the audience to think about whether venture funding goes toward Big Data or Big Data applications. C. Mohan mentioned that, from a recent Facebook meeting, it seemed like there was significant VC activity in this area. Carlos and Joe are embarking on their adventures, thanks to VCs.

It is always great to see the audience at a database conference, with Ron Fagin and Mihalis Yannakakis in the front row.

Friday, June 07, 2013

In NY Rain

A girl in a short skirt, thick legs, and makeup, with a heavy roller bag,
waits for the taxi at the corner, with her leather purse, long boots and umbrella.
This NYer, like others, is not going to let the rain change her plans or her outfit.

I had a quick lunch at La Maison Du Croque Monsieur, a homage to writers worldwide. A young one pointed to the typewriters and asked, "What is it?", and the answer was, "The great grandfather of an iPad."
Finally, here is what is going on in one place, one afternoon in NY. Shinsuke Ogawa's movie The Sea of Youth, about oppressive correspondence-course education, "completed by selling off books and blood". A sound of silent music festival, with new music compositions performed live to modern silent films, from Scorsese to Gus Van Sant and Jan Švankmajer. And Ogawa's 1967 movie Forest of Oppression, which "networked social movements and film fans across Japan to create an alternative distribution route", about the phenomenon of students barricading themselves inside schools for various political ends, at the Takasaki City University of Economics. One Place, One Afternoon.