VLDB 2011 Notes
I managed to make it to VLDB in Seattle. The trip this time (from SFO to SEA) was a treat, including a taxi ride with Raghu, discussing what each of us was looking forward to learning at VLDB. On Tuesday, I got there in time for the receptions: the first, hosted by Microsoft, in the evening, and the second, hosted by Google, later. The party went on later still. I stayed at the Mayflower Park, an old-fashioned hotel with, as is standard, a grand lobby, a very helpful bartender at the corner bar, staff in glittering buttons, and small rooms.
On Wednesday, David Campbell gave a plenary talk. He said that data was ambient, models and modeling methods were rare, and one could be very creative with questions. He used the example of a "digital shoebox": drop in all the data you can collect, say, with your smartphone, and then run interesting analyses over the shoebox ("Can you figure out how much time you spend in Florida in a year, for tax purposes?"). Someone in the audience asked, "What about a digital shipping container?", that is, when you get data from many people, will the methods scale? He also spoke about the lifecycle of a query and how to validate data, hypotheses, etc. The audience asked about DB concerns: what happens when models change, how patterns mature, and how to administer the DB.
I attended the panel on maximizing impact with Ed Lazowska, David DeWitt, Juliana Freire, Sam Madden, and Jennifer Widom.
- Ed set up the discussion by pointing to Bob Lucky's article about the diminishing influence of electrical engineers and the iconic "last engineer on the planet." Is that where database engineers are heading, even though this is the decade of data? He also pointed the audience to Kaggle for data analysis competitions.
- Sam did the crowd-pleasing exercise of making word clouds (Wordles) of VLDB paper titles from the past three decades, and no one resisted the temptation to draw conclusions. Then he ran a crowdsourcing exercise: (a) a list of the top 10 most impactful ideas/achievements in database research in the past 10 years, and (b) the top 10 new ideas for the future. Stream processing made the top 4 in (a), yesss! In (b), there were several suggestions, including "data crumbs" by Alon Halevy: analyzing the data we leave behind on our trail through the web.
- Jennifer then led the discussion, focusing more on education and visibility. Why are AI and machine learning more popular than database research among incoming students in this decade of data? The explanations ranged from technical (students don't know the real world and don't see the need for DBs until much later) to marketing (students see the driverless car as AI, which is a great image; in contrast, as Natasha put it, a DB product feels like a screwdriver or a laptop, not necessarily fun), self-reflective (shouldn't we take responsibility for going from data analysis ideas all the way to studies and code in an applied area?), and educational (we need better examples than the employee/student examples in textbooks). Joe Hellerstein, who has an incredible ability to capture and hold the intellectual attention of a crowd, kept the conversation focused on what the DB community has to offer: getting inspiring ideas from other communities, and working DB magic with declarative languages, parallelization tools, etc. He also called for a simple language for teaching DB basics and early programming in the first undergraduate years.
- David steered the conversation towards the positives (DB students are finding jobs, startup companies are making deals, the $50B database community is humming, etc.) and lamented the challenges of having impact in practice, even in industry.
I had terrific conversations with Amr, Divy, Hector, Laks, Joe H., and Surajit about research, and with Amol and Jignesh about the challenges of mentoring students.