Tuesday, July 31, 2007

Mihai Slams

I met Mihai at FUN 2003 in Isla de Elba. Since then, he has let his hair grow, got a bike, and has amassed enormous street cred, on and off the field of theoretical CS. Here is Mihai!

Saturday, July 28, 2007

No Names

I wonder how many apples fell on his head before he finally got it!

Math 10012

Subhash Khot, the quiet thinker, moves to Courant Institute, NYU, this fall and will join the faculty. With Assaf Naor already there, and the visitors at IAS this year (Luca, Venkat, Noga, Tali and many, many others) who will no doubt want to spend some time in the neighborhood, 10012 will be rocking, math-wise!

A Triptych of Trains

A train, belching smoke from the 50's, lures the siblings to a moment of avid distraction one afternoon in Ray's classic; A train spirits a child away set some time not the past, present or the future; A train, rattling and noisy, brings the french together, very much today, this moment.

The persistent fountain

Seated at the Father Demo square, the week's worth of newspapers on my lap, a cup of coffee in my hand, all of Saturday ahead, I watched the city come out of the early morning stupor, stalls being set up for the street fair, the steadily increasing flow of cars, half full tourist buses on prowl, the swirling paper plates left behind from the pizza last nite, the water fountain persists and in sporadic bursts of silence within the throb of the city, one can hear its splashing water.

Data Offensive

During meetings in universities, companies or with govt agencies, while developing products, writing grants or simply plotting papers, the question of "data" arises. Researchers will lament, "if only we had data", or some would say, "just give us data". Typically, these researchers approach the Pandora of "data" as a blackbox, as if, just opening it will solve their problems. I get reminded of a moment in a ridiculous movie (youtube version above, in french) when Number 5, the robot with incredible intelligence thanks to some accident, becomes a voracious reader of books, pursues books wherever it sees them, chanting in a trance, "data, data,..." Data is not a blackbox, database is not a blackbox, neither can be invoked to solve one's problem with a wave of the wand.

Defense of Data Analysis

Michael pushed, Suresh and Piotr pulled, the algorithmii in the northeast and beyond pored over data, and someone asked, why slice, dice, aggregate and group by. Is it actionable?

John Graunt wrote "Foundations of Vital Statistics" in 1661/2 (his notation), thus founding Statistics. He describes his data (birth/death announcements in Parish Hall records, starting circa Plague time), how they were collected (eg ringing of the bell brings the data collector to the site so they can determine the cause of death, published every thursday for a fee), his first analyses are simple groupbys (cause of death and number of fatalities due to each), and then he proceeds to observations, some simple, some that forces him to lament the lack of data and curiosities of data collection, and some elliptical, hanging without much support. The paper is remarkably modern (modulo the quirks of the language from 300 years ago), and culminates in the soul-searching data analysts do even now: "It may now be asked, to what purpose tends all this laborious buzzling and groping", and after aggressive answers, concludes with:

That a clear knowledge of all these particulars, and many more, whereat I have shot but at rovers, is necessary in order to good, certain, and easie Government, and even to balance Parties, and factions both in Church and State. But whether the knowledge thereof be necessary to many, or fit for others, then the Sovereign, and his chief Ministers, I leave to consideration.

Did Michael, Suresh and Piotr know their python scripts (?) will have political, religious and other implications?

Tuesday, July 24, 2007

Counters, metrics and milestones

You are on a treadmill, running, working out, and the display counters help, sometimes you target the next closest mile to motivate yourself, and then having reached that, use the calories counter to target the next milestone and then finally round up the overall minutes counter to be the target and eke out more from your body; if you are perverse, this will not be the end, and when you just hit one of the milestones, this or the other counter will be tauntingly close to the next marker you can so reach, if you only push yourself a little more....

That is the deal with the counters in academia, we have many (publications, awards, grants, salary, bonuses/raises, students graduated, committees, whatever). Each counter ticks and teases, presents a target, and you have to learn not to be perverse.

Sunday, July 22, 2007

Research problem in Streaming

Observe a stream of tuples (i,j) where the same tuple may be seen many times over the stream. For each i, define d_i to be the number of distinct j's in tuples (i,j). Define M_2= sum_{i=1}{n} d_i^2. Problem: Estimate M_2 in space polylogarithmic in n.

This is the generalization of the second frequency moment F_2 studied in the classical Alon-Matias-Szegedy paper and elsewhere. The best known space bound for M_2 is O(1/\eps^4 \sqrt{n} \log n) for 1+\eps approximation, while that for F_2 is O(1/\eps^2 \log n). Any improvements to M_2 estimation will be great!

Some weeks

Never been on a chariot, but seen them race in westerns and classics.
Some days I feel like I am racing a chariot, the right wheel splintering, the nuts and bolts loosening and giving away, and the rattling chariot beginning to disintegrate under me, and all the while, I am racing, watching with the corner of my eye for a boulder to slam into me. Luckily, days like that pass, and a new week emerges.


Rejection is part of life. Moma rejects Andy Warhol in a polite letter on the left, circa 1956. We have anecdotes of rejected conf papers that have come back to knock out our collective psyche. Curiously, the two examples I know are the papers in compression: Lempel-Ziv and Burrows-Wheeler.

Tuesday, July 17, 2007

Mountain Biking, Beaching and Research

In the flurry that was the last week, I got an iPhone. The pinch surface is fabulous; maps work superbly, notwithstanding the poor coverage for wireless data; YouTube works fine (only) with the built-in wifi.

I went mountain-biking in Montauk on Saturday. The curse of the researcher is that one finds analogy between research, ie., the journey to solve a problem, and whatever arduous journey one takes, physical or not. In The Zen and the Art of Motorcycle Maintenance, my guidebook for several months during a summer a long time ago, the author goes on a hike with his son, tussels for speed and stamina, and summits in his discussion of Philosophy even as the drama of the family unfolds. In mountain-biking, I am tempted to say as in my research, I dislike riding on flat portions: I like the climbs even if I am humiliated, and the downs, because it is exhilarating. You have to switch gears, break, twist, find the gaps, fly over tree trunks, barks, stones, sand whatever, and watch from the corner of the eye, the fleeing deer, wild turkey, the blue bay and the bluer ocean. I lost the rhythm for a tiny moment during the several hours of ride and gashed my arm, shoulder and leg.

On Sunday, I beached. Watched people step in towards the waves, step back, restep in, tease the waters of the Ocean and themselves, and finally work themselves to a spot where a confluence of waves splashes them fully, they emerge from under it, cough, call it even, and proceed to swim. When they leave, the waves pursue, chase them until they get out of its reach and they plunk down on beach towels, go to sleep, or in my case, read a research paper. Alas, I could not find "my working week and my Sunday rest", aka, NY Times.

Sunday, July 15, 2007

CPM + A Research Problem

I enjoy being at CPM (Combinatorial Pattern Matching) since I have a sense of being at home with friends; the conference more or less deals with string, array and tree matching, and applications.

CPM conf has been good at connecting with suitable
  • math (periodicity properties of strings, decomposition theorems),
  • data structures (dynamic, compressed, cache oblivious),
  • approximation/randomization algorithms (sequence comparison, string embeddings, Karp-Rabin fingerprints), and going out to find
  • applications (computational biology, compression, databases, information retrieval).

CPM has also been good at training some top class algorithmers; another niche area with this claim is Scheduling Theory. Still, CPM is less successful than it could become. Connection to more approx/random main stream algorithms, finding applications for the techniques in web retrieval, and perhaps collocation in the future with other more visible confs could bring more attention to CPM.

Instead of focusing on applications, I spoke about fundamental problems in string matching: extending non-standard string matching framework to develop a theory of alignments, 2 dimensional indexing, edit distance embeddings and others. Here is a research problem:

Input: Given k pairs of strings (fn_i, ln_i) of each length l. May be preprocessed.
Output: Given a query (p,q), find all i's such that fn_i contains p and ln_i contains q.
Target: With linear preprocessing we can answer the query in O(k+l) time by indexing each dimension separately and merging the two output lists. With O(kl^2) preprocessing, query time comes down to O(l + polylog (k)). What we need is a solution with better tradeoff between preprocessing time/space and query (same flavor as the vaunted nearest neighbors problem in high dimensions). Check out the reference (this is written in external memory parlance, but can be easily massaged to the standard RAM).

Smallest singular value of a random matrix.

Say A is an n X n random matrix, eg iid +/- 1. Its smallest singular value s_n(A)= inf_{x; x_2=1} Ax_2. A conjecture, derived from von Neumann, Smale and others, has been that s_n(A) \tilde n^{-1/2} whp. In an improvement to a recent result of Tao and Vu that proved n^{-B}, B not quite a 1/2, Rudelson and Vershynin finally nail it and prove the conjecture. Check out the paper and the talks. Roman Vershynin's talk was totally inspiring and the papers seem they must be studied, full frontal. The proof involves nice intuition from sparse representations and embeding into arithmetic progressions.

von Neumann Symposium at SLC

The von Neumann Symposium on Sparse Representation and High Dimensional Geometry was at Salt Lake City early last week; the background in the group photo looks like a painting.

I could speak about data streams (von Neumann defined computational models) or about game theory for search auctions (von Neumann did game theory too), but instead chose to talk mainly on low rank matrix approximations (I am sure he used matrices in his work). I now have a clean, simple story with proofs starting with approximate l_2 regression and building it to low rank matrix approximation and beyond.

People from math, signal processing, hardware and others, looking at different aspects of sparse representation problems. Rich Baraniuk gave a very engaging talk on using compressed sensing ideas to go from analog singals to information, and spoke about his one pixel camera, camera across large spectrum, etc. Cool! Emmanuel Candes gave a terrific talk, full of intuition on the structure of the LP that arises in compressed sensing. Roman gave a fantastic talk, more on that later. There was one communication theory talk I remember, that studied compressed sensing and quantization. In general, notwithstanding some beautiful math that was talked about, I thought there were a lot of people -- students, scientists and professors -- who did care about algorithms -- designing, implementing and using them---but had very little training in algorithm design techniques, which is interesting! Alas, I left after 2 days, I am sure there were many other great talks.

I planted the idea that we need to do a brainstorming meeting, gathering a few from network coding, communication & information theory, algorithms, and sparse representations, and trying to figure out what are the big open problems in transforming analog signals to information over networks. I have combinatorial versions of stochastic problems communication theorists study and think there is a nice, large problem lurking there, when merged with network coding and sparse representations.

Btw, why runs meetings at the Snowbird resort? It is very expensive and one feels trapped there and has to pay $3 for coffee or bagel (that is, exclusive OR).


A week off from blogging is serious inertia to overcome! I was thinking of startups. My high school friend Anurag Acharya (not this) had many questions and found most answers at Google. My college friend, the inimitable Sudeshna Sarkar, has her company, minekey. My grad school friend, the talented Ramesh Hariharan, has been successful for a few years with his Strand Genomics. India may yet generate companies based on ideas and technology, brain not keyboard brawn.

And then I was speaking with someone who asked me about our common friend's startup: are they doing cash oblivious things? :)

Saturday, July 07, 2007

Rap in the AM

Trying to catch up after a two-day hike at Mt. Washington, I was on the subway circa 4.30 AM on Friday, on my way to work. This young man sauntered down the platform, wearing a Tupac hood, dragging his feet in an elaborately stylistic gesture, and he was rapping, somewhat reflectively. When this visage of inner-city cool passed me, I heard his rap: "Peanut butter stuck in my hands, Jelly on my hood, Ketchup squirted on my face..." Even the street-cool have children, change diapers, and draw inspiration from everyday acts for their rap, find themselves sandwiched between R&B and P&J. In a curious way, his prose, poetry and singing tells the tale of travails and traces itself back to the American blues.

Btw, here is a distraction: the patent for sealed crustless sandwich, granted quite recently.

Sunday, July 01, 2007

Any given sunday

Well, it is not the season, but we can still keep connected. What will raise several thousand pounds of the defensive and offensive line of a football team and send them out to the field to fight for each inch? The voice of Al Pacino.