Wednesday, May 26, 2010

LaTeX I need

A couple of things I need to do automatically in LaTeX (pointers appreciated):
  • Put an index from bib items to the text so I know where a paper is cited within the body of the paper. This will of course be a list per bib item.
  • Number each line of the body of the paper, so I can refer to (page no, line no) rather than, say, (3rd line of paragraph 5).
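A possible starting point, sketched under assumptions rather than tested on any real submission: the lineno package with its pagewise option should give per-page line numbers, and hyperref's pagebackref option should list, after each bib item, the pages where it is cited. The citation key and the bibliography entry below are placeholders.

```latex
\documentclass{article}
\usepackage[pagewise]{lineno}        % line numbers restart on every page
\usepackage[pagebackref]{hyperref}   % each bib item lists the pages that cite it

\begin{document}
\linenumbers   % number every line of the body from here on

Some claim that needs a citation~\cite{placeholder}.

\begin{thebibliography}{9}
% "placeholder" is a made-up entry, for illustration only.
\bibitem{placeholder} A.~Author. \emph{A placeholder title}. 2010.

% backref expects each entry to end with a blank line (or \par), as above.
\end{thebibliography}
\end{document}
```

Both features rely on the .aux file, so expect to run LaTeX at least twice before the numbers and back-references settle.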
Well, I have been P-Cee-ing. For theory conferences, I like to read all the papers, with no (or \epsilon) subrefereeing. I set up the pile, and frequently reorder it so I switch topics (data structures, scheduling, auctions, string algorithms, ...) and mix it up: papers that are easy to read (clear accept or reject), papers that one has to plow through because of the details, papers that need checking against prior work, thin papers or thick appendices, native English or labored writing, ...


Saturday, May 22, 2010


  • I signed up to be a CI Fellows mentor this time. Deadline is Monday, May 24th.
  • I heard Benny Pinkas talk about some of his recent results. Starting from the simple task of backing up one's files on the cloud (e.g., Mozy, Dropbox), he motivated the security pitfalls of deduplication (the cloud does not keep multiple copies of the same file). Very interesting!
  • Consider the DJIA plunge a couple of weeks ago. What modeling or theoretical analysis will give us better insight into how financial market dynamics work?
  • Kishore Papineni told me about the certainty equivalence principle in stochastic optimal control: conditions under which forecasting and optimization may be separated. Note to self: need to follow up!
  • And I heard a local wise researcher call someone the "go through" guy. :)


Friday, May 14, 2010

Collections, Memories and Friends

Friends and family point out things, and they remain with you:
  • Some time ago, I got a much-appreciated gift of the 4-volume set The World of Mathematics. A gem, with the original papers that have become folklore results (from Descartes and Galileo to Turing, von Neumann and others) together with thoughtful commentary on them.
  • Recently, I got an email pointer to this collection: A collection a day, serene arrangements of objects, drawn, photographed, imagined or painted.

Saturday, May 08, 2010

Subhash Slams

We all (should) know Subhash Khot won NSF's Alan T. Waterman Award. Here is the NSF press release for the event where he was given the award, and video of an interview with Subhash.

I remember seeing this young Subhash in my office at Rutgers once when he was a student at Princeton, and wondering what this ultra sharp and ultra nice researcher would produce in his life: we now have a few datapoints and they are amazing. Congratulations to Subhash!


Monday, May 03, 2010

On the Poetry of Data

Point out a cute observation about an individual in a store, and immediately the data miner wants to rearrange the store to optimize revenue, or the researcher wants to tap into the privacy angst. What happened to the subtle reliance on observational data to nudge a detective story, or for poetry?

Anyway, I was in CB2, a furnishing and home accessories store, and watched:
  • A young man with a shopping basket. In it were 2 wine glasses and 1 candle. A single man planning his evening, with the leisure of being far past the first date.
  • The woman measures the shelves, consults her notes, clucks that it might not fit. Her partner grabs books from the staged display, opens them only to find they are fake, but he keeps reaching for more. Definitely a researcher lost in CB2, far from his printed matter.
  • A man in chinos sits on an outdoor chair, leans back and stretches his legs. It is spring, the financier has his bonus and wants to update his roof deck for summer parties.

One liners

The Onion: Bits of food somehow stuck in the iPhone keyboard.

A researcher: I am interested in the cloud.

Yet another researcher: You have to keep history, and that is hard.

Sunday, May 02, 2010

Report on WWW 2010 Conference

Here is some spotty reportage on the WWW2010 conf.

PC Chair Juliana Freire presented statistics, including tag clouds of words in the accepted abstracts as well as in all the submitted abstracts, and pointed out that the two had many similarities, so stringing a paper together from these words should not help. Soumen Chakrabarti, the other PC Chair, who did real-time coding during the PC meeting to machine-analyze the papers, said that there was a lot of intersection across tracks, an argument that may point to fewer track-based silos in the future. He also said, "... it is your own submissions that determine the quality of papers in the conf." Ominous.

Vint Cerf gave the first plenary talk. He has a good sense of humor ("Mayans must have known something because IPv4 addresses will run out on ...", an IP-enabled surfboard while you wait for the waves, how sensors can monitor wine in your cellar when you are away, etc.). After surveying the spread of the Internet, he highlighted some work: Nick McKeown's flowrouter, Jeff Jonas's database work with DHS. He also mentioned some big research challenges: inter-cloud interaction (telnet between clouds?), 3D rendering in the cloud, how to archive applications and avoid bit rot (will Windows 3000 interpret your '97 PPT file?), etc.

One of the other plenary talks was given by Danah Boyd, titled: Privacy and Publicity in the context of Big Data. I tend to use "Big Data" as in the bad "Big Pharma", but her talk emphasized the badness of "Big Data Analyzers". Here is the full text of her talk. Her talk was polished, premeditated, and had plenty of pictures as well as clever phrases ("uncertainty principle" of social behavior analysis). She discussed issues in dealing with person-related data: sampling biases, using surrogate measures for the underlying phenomenon (such as frequency of contact for meaningful tie strength), ethics, etc. Facebook, alas, was the focus for privacy challenges. She concluded: "Big Data is made of people", meaning we have to apply people rules in dealing with it.

Some interesting conversations: Ravi Kumar referred to Berkeley as the "Theory City", rightly. We wondered if Twitter has short forms for use with the sesquipedalian German language. Over lunch, Vanja verbalized a big challenge, automated data access across platforms beyond EC2 and Google Apps, and the advantages of systems like Hive. Pablo Rodriguez (his tweets here) and I discussed how to set metrics for innovation in academia and corporations. One of my theory coauthors from the far past told me, "we should try to get back together to think of some problems, sans students even". What a pleasant thought!