Sunday, June 25, 2006


Seriously, what do we call a person who does Algorithms research? Algorithmicist is unwieldy. And rhymes with spooky things. I used Algorithmer. It did not get much support. And it rhymes with materials, which may also be undesirable.

The MMDS Workshop

I was at the workshop on Algorithms for Modern Massive Data Sets (MMDS), organized by Petros Drineas and Michael Mahoney with Gene Golub and Lek-Heng Kim, June 21--24, at Stanford U. (yeah, yeah, good weather, resort-like campus, students wearing rich clothes). The workshop was highly oversubscribed and somewhat dense with talks. Still, it is among one of the best workshops I have attended, with a terrific collection of speakers and lot of good talks on math---numerical analysis, linear algebra, algorithms---and applications.

- Diane Leary gave an overview of matrix decomposition methods, much appreciated! She also used the word "downdate" (as opposed to the positive "update"), therefore giving me the license to late use "postcursor" in my talk. She pointed out a mathematician with the carplate "Prof. SVD".
- Michael Mahoney described interesting way to apply row and column sampling methods for low rank approximation to a variety of problems from genetics, medical imaging and recommendation systems. Cool!
- Prabhakar Raghavan gave his talk from STOC. He did a very good job of not just representing Yahoo! but rather, presenting large areas for research in the domain of internet search, services and software. First, he spoke about Flickr and posed label inference problems on graphs. Second, he spoke about auctions for keyword searches on the internet. He mainly described the properties of Yahoo! and Google auctions, with bulk of the technical results from the paper Internet Advertising and Generalized Second Price Auction by Edelman, Ostrovsky and Schwarz. He mentioned that Google auctions were locally envy-free, but did not go further to describe other properties such as bidder-optimal (ie, total paid by advertisers is minimum among such equilibria). He mentioned interesting problem of determining the granularity of the market to bid. Third, he described Yahoo Answers, and spoke about a speculative p2p version of it from his paper on Query incentive networks with Kleinberg. There are a lot of interesting theoretical open problems here, and a lot of interesting mechanism design questions in general in the area. Organizationally, companies have traditionally had three different groups: research, development and marketing. Some of the companies are succeeding in merging research and development, not just in name. Prabhakar pointed out that in the future all the groups may have to be integrated since pricing/economics and marketing are inherently part of research and development of systems and services.
- There was a session of topology and learning. I finally understood a little about persistent homology that Edelsbrunner and others have worked on, and the excitement around it in the applied math community. A speaker in this session said, "So you go ahead and build the simplicial complex because it is a common thing to do." Smale said, "I have trouble telling the left from the right, or > from <."
- I spoke about algorithmic theory of sparse approximation problems with vectors and in particular, discussed Compressed Sensing problems. Since my talk was mainly on vectors and the audience was primed for matrices and tensors, they had to adjust to the notation. I was happy to hear from several applied mathematicians that they were surprised that even in the simplest vector context, some of the very basic problems were open from the point of view of algorithms and complexity. I was trying to communicate just that with for example, the nonuniform sparse approx problems.

Alas, I could not go to a lot of other talks. I had to work too. :) The organizers have set up a nice site with slides of the talks. I wish the organizers would write a report on this meeting for wide circulation, because it succeeded so well in bringing the applied math communities together. Finally, it was great to see so many students at this meeting! Bravo.


After 4 days at Palo Alto, I returned this AM to New York. Every return to NY needs a period of acclimatizing, more so after a red-eye flight. I went to work out, wading through a parade crowd that I did not parse in my sleepy state. But when I exited the gym, I was alive and my pores and eyes were open. I realized it was the pride parade. The normally theatrical village had an extra umph and the NY's Finest were there with the NY's Ordinary, watching the thunderous parade. World cup soccer be damned, the west village was loud and rowdy. Everyone (ignore the tourists, include children, bicycles and dogs) had a message---mild or buff, hidden or out in the open---on their belts, buttons, bras or briefs. I wanted to be part of the parade and part of the protest, but was unprepared. Somewhat deflated, I walked along the parade cheering when I could, and a couple of blocks later, I suddenly realized I was wearing a shirt that says "Dagegen!", a birthday gift from 1996, its ten-year old message still potent and simple (translation: Against!). Hurray to fortuitous moments of wearing old shirts.

Monday, June 19, 2006


Problem: Suppose you see a stream of n blue points and n red points on a line. The problem is to estimate the size of the best blue-red matching, where the size of the matching is the total length of the lines connecting the matched pair of blue-red points. As is standard in streaming algorithms, let us assume the points are from {1,..,U} and the space one is allowed is say polylog in U, 1/\eps, 1/\delta, where \eps is the approximation and \delta is the probability of error.

This is easy to do via L_1 embedding to get 1+\eps approximation. Piotr Indyk posed the open problem of doing this with points in two dimensions, and getting better than the \log U approximation.

Movies and Algorithms

At the workshop in Bertinoro last week, we had a few social dinners. At one of those dinners, I introduced the topic of coming up with titles for movies about Algorithms/TCS. Some suggestions:

The Story of O,
Prime and prejudice (or Prime and Punishment),
La Curriculum Vita (not to do with Algorithms per se, but Amit, you scored!),
Random Menace,
The World According to Karp,
Sliding Windows^{NT},
The O(1) Gardener.

The dinner conversation moved on to dialogs in movies and the table ended the conversation with: You want Knuth? You can not handle Knuth. Equivalently, You want the Truth Table? You can not handle the Truth Table. Sigh.

Space Consciousness

I was away for a week at the Workshop on Space-Conscious Algorithms at Bertinoro. The meeting brought together algorithmicists interested in streaming and cache-oblivious algorithms, and succinct data structures.

Phil Gibbons spoke about open problems, in particular, in modeling parallel, multi-level caches. He also mentioned that "systems research is about making the common cases faster" and sought algorithmic analyses that reflected it. Piotr Indyk spoke about new developments in approximate nearest neighbors, including random projections onto two and higher dimensional spaces (rather than the line as in many previous papers) and use of the Leech lattice. Martin Farach-Colton argued that cache-obliviousness really should be about disk-obliviousness. He also (tongue-in-cheek) suggested the need to forego "motivations" in theory papers that were mainly marketing and not reality. This released a lot of speakers who followed from the burden of explaining the motivation for their work! :) On the other hand, Andrew McGregor, in response to some adversarial ribbing about the random-order stream model, presented a very nice motivation for studying it, namely, that it fits neatly the study of statistical problems that take random samples of the underlying distribution. Frank Ruskey's subset encoding result was talked about in compression results.

Wolfgang Tyler of the European Patent & Trademarks Office gave a talk on building suffix arrays for searching full-text patent documents. He gave a nice motivation for going beyond reverse index with just the words to build a suffix array that looks at subwords: in German, words are concatenated to form larger words and during prior art search, one often searches for only parts of words. Martin Strauss spoke about his recent results in Compressed Sensing that improve the decoding time of the results of Donoho, Candes and Tao, Vershynin and Rudelphson, and others. Petros Drineas and Michael Mahoney spoke about my work with them on matrix low-rank approximations among other things; there is so much work in this area, just wait for their tutorial at VLDB and elsewhere. Alex Lopez-Ortiz spoke about a new performance measure that in theory showed the superiority of LRU for online caching which has been previously observed in practice: the proof technique of bisimulation seemed interesting.

Graham Cormode and Amit Chakrabarti presented streaming algorithms for biased quantiles and entropy estimation, and together with Andrew McGregor, ended up improving some of the bounds for the entropy estimation, I hear. Unfortunately I missed some of the other talks because of (a) interesting puzzles and (b) catching up with people.

There are a lot of reasons to go to these workshops (next one on Massive Data Sets, at Stanford this week). Phil Gibbons pointed out that I was a coauthor on so many results presented in the workshop, but I was busy learning about the more than the 50% of the results I did not already know! On a selfish note, I learnt that my O(n) time in-place integer sorting result from a couple of months ago may be of some interest after all.

Monday, June 05, 2006


There is something fascinating about a duel, be it with swords or words, out in the open or hidden within social exchanges. But a duel, even if fully witnessed, does not reveal the psyche of the duelists. Borges has a short story called Guayaquil that describes a duel between two scholars. Two scholars get chosen for a travel and a great honor. They meet and discuss. One emerges the victor and proceeds to travel, and the other abdicates. The story describes the entire meeting. The writing provides nuances, academic references and scholarly hints. Still, the reader does not understand why one abdicated. That is it. Now, embedded in the story is a Celtic legend of a duel between two bards in which one bard sings all day, accompanied by his harp and hands it over to the other when the stars and the moon come out; the other bard then lays the harp aside and stands up, a victor! Borges' story itself is the story of a yet another elusive duel. Gen. Jose de San Martin and Simon Bolivar meet at Guayaquil. At the end of the meeting, Gen Martin hands over his army to Bolivar who goes on to defeat the spaniards. What happened at the meeting? Perhaps the answer lies in a letter of Bolivar. The duel between Borges' scholars is for the honor of reading that letter. Borges uses this context to remind us of other duels, none fully explained.


The incomparable Beat Takeshi (aka Takeshi Kitano) taps more than one rhythm, both in his life and in his movies: Hanabi, Brother, Sonatine, Zatoichi, and others. He can combine animal heads and flower stems into beautiful paintings, and weave that into a story of a loving journey along the river or a sudden bloodbath amidst the snow. Now for something completely different! Here he taps out to Ken Shimura's lovely Shamisen.