Monday, August 27, 2007
May be it is being away from WSJ and being amidst the TV coverage of world athetics championships, I was reminded of a soaring outside the stock market: the sprinting, propping oneself up, powering past the pole, twisting, turning and flipping over the bar, and the joyous descent down, the wild celebration afterwards, and the world record: Yelena Isinbayeva, world record in pole vault, 5.01 m, from 2005. Enjoy.
Sunday, August 26, 2007
Hypothetical Center for Algorithm Engineering
I was thinking the other day: we have great examples of centers for theory/math (IAS, DIMACS, IMA, IPAM), but what would be an interesting center for algorithm engineering? Do we need one, and assuming we did, how would we organize it? Say we do. Here is a potential organization: a few resident professors with responsibility, some visitors, dozens of well-paid, top class research programmers.
- Top algorithms researchers visit the center for say 3 or 6 months at a time, and spend time interacting with the research programmers to implement a set of algorithms for a problem. This would include "their" algorithms of course, but ideally, would force the researchers to spend time understanding other competing algorithms and working with the research programmers to find the best implementations. It will also help the researchers learn a few tricks how to build programs in practice and may be ego soothing (my algorithm triumphs) or ego seething (my algorithm is no better than others').
- Graduating PhD students who don't want research career, but want to do more thinking than one finds in average software engineer job with a software company, may find useful careers as research programmers.
PhD Defenses
I have been thinking PhD defenses recently. Two of my students (Vlad, Irina) finished theirs and another (Yihua) will finish within a month. They will work in corporate labs (AT&T, Google, Google, resp).
At Copenhagen, I was on the PhD Committee when Philip Bille defended his thesis the past Thursday. Philip worked on hard problems (regular expression matching, tree matching and others) and eked out interesting improvements. In particular, he got improved space bounds in cases where dynamic programming has been the norm. It is reassuring that there are classical problems still lingering out there and that PhD students can, do and will study them.
Regex matching: we should nail down a faster algorithm for regex matching, a faster approximate algorithm, or simply show some hardness results since general lower bounds are too much to hope for? Testing regular languages is doable, but we don't have even remotely interesting approximate algorithms. Maybe show a reduction from matrix multiplication to regular expression recognition with a suitable encoding?
At Copenhagen, I was on the PhD Committee when Philip Bille defended his thesis the past Thursday. Philip worked on hard problems (regular expression matching, tree matching and others) and eked out interesting improvements. In particular, he got improved space bounds in cases where dynamic programming has been the norm. It is reassuring that there are classical problems still lingering out there and that PhD students can, do and will study them.
Regex matching: we should nail down a faster algorithm for regex matching, a faster approximate algorithm, or simply show some hardness results since general lower bounds are too much to hope for? Testing regular languages is doable, but we don't have even remotely interesting approximate algorithms. Maybe show a reduction from matrix multiplication to regular expression recognition with a suitable encoding?
Friday, August 24, 2007
Open Problem: Graph traversal in small memory
We have a graph G and a pool of k locations. Each location can hold one node of the graph. We say that an edge is evaluated if both its endpoints reside in the pool. In order to evaluate all the edges of $G$, one has to take turns and move nodes in to the pool whenever needed. The cost of an evaluation is the total number of times nodes are loaded on to the pool. The problem is find an evaluation of minimum cost. Approximation algorithms are ok.
MAD ALGO morning
I am at MADALGO inauguration. Jeff Vitter gave a nice overview of external memory algorithms, using a card trick to illustrate the Gilbreath principle. Charles Leiserson spoke about Cache-Oblivious algorithms, somewhat obliquely, by running a Jeopardy style quiz show on facts and factoids of CO algorithms. I am glad to report that the US team of Jeff Vitter and Mike Goodrich won over Danish and German teams. The clincher question was, how many hits are shown in Google for CO algorithms? Hint: order related to Pi. The answer they gave was like 227k, but I get a different answer now.
I spoke about streaming algorithms, but dispensed with standard stuff in the first half, and mainly focused on distributed streaming and its relationship to MapReduce, continuous communication complexity problems in sensor streams and analyzing blog streams. I hope the audience discover(ed) new research problems. Peter Sanders spoke about challenges in algorithm engineering, an independently funded effort in Germany, with significant goals; among other things, he did a terrific job of emphasizing the need to implement competitors' algorithms with at least as much zeal (if not more) as one implements their own algorithms.
It is great that EU is willing to invest in foundational algorithms through MADALGO. Lars Arge who heads the center appeared on local newspaper. Apparently an earlier version of the article mentioned that he was an expert on logarithms. (al- dropped).
I spoke about streaming algorithms, but dispensed with standard stuff in the first half, and mainly focused on distributed streaming and its relationship to MapReduce, continuous communication complexity problems in sensor streams and analyzing blog streams. I hope the audience discover(ed) new research problems. Peter Sanders spoke about challenges in algorithm engineering, an independently funded effort in Germany, with significant goals; among other things, he did a terrific job of emphasizing the need to implement competitors' algorithms with at least as much zeal (if not more) as one implements their own algorithms.
It is great that EU is willing to invest in foundational algorithms through MADALGO. Lars Arge who heads the center appeared on local newspaper. Apparently an earlier version of the article mentioned that he was an expert on logarithms. (al- dropped).
Wednesday, August 22, 2007
Saturday, August 18, 2007
Response, on the side
I planned this post 2 days ago, but on catching up with the state of the world, emails and blogs after a week, it would seem like a response, however subtle, to Michael' s posting.
On Thursday, 7 of the many research interns at Google NY gave talks on their summer research. The talks were on auction and mechanism design, algorithms and optimization. Petros Drineas, Dana Ron, Ronitt Rubinfeld, Mike Saks, Rocco Servedio, Avi Wigderson, and Moti Yung came to listen to some of the talks. The results will make their way to publications over the next few months, the phenomenon and impact will linger longer and will play out over a longer period, 20+ years or not.
Johnson-Lindenstrauss, updated
Jiri Matousek has a nice writeup with a variant of JL that I find useful. In standard JL, generating a k X n matrix T of random, independent \pm 1 suffices to map n points in n dimensions to points in k dimensions and preserve Euclidean distances upto 1\pm \eps, where k is about 1/\eps^2 \log n. This results in a dense T, and the quest has been to sparsify it. Ailon and Chazelle made terrific progress, and Matousek has a version that is interesting. Generate T by picking each entry to be
- +q^{-1/2} with prob q/2,
- -q^{-1/2} with prob q/2 and
- 0 with prob 1-q.
Prizes
More than a decade ago, when theoretical computer scientists romped on to Computational Biology (now Bioinformatics?), a questions was, will a computer scientist eventually get a Nobel Prize via Biology? Maybe there is a Nobel prize for having sequenced the human genome, or may be, the work is still something out there, yet to be done.
The past few years have seen theoretical computer scientists rummage through mechanism design and internet economics. Will one of the computer scientists eventually get a Nobel prize, this time via Economics? May be there is a Nobel prize for having created and sustaining an entirely new market, eg. sponsored search, or maybe the work is still something out there., yet to be done.
Personally I am not focused on prizes, but sometimes questions like the ones above sustain a research activity, a funding goal or quite simply, dinner conversations.
The past few years have seen theoretical computer scientists rummage through mechanism design and internet economics. Will one of the computer scientists eventually get a Nobel prize, this time via Economics? May be there is a Nobel prize for having created and sustaining an entirely new market, eg. sponsored search, or maybe the work is still something out there., yet to be done.
Personally I am not focused on prizes, but sometimes questions like the ones above sustain a research activity, a funding goal or quite simply, dinner conversations.
Europe, UK Included.
Everyone complains about things in this country when they travel here (lack of good bread or good coffee or whatever). I thought I will start whining about travel "over there", for now, Europe, UK included.
- When will they learn to stand in one line, no matter how many servers? There must be a queueing theory result that picking from multiple lines using bizarre calculations and angsting about the relative progress rates of different lines is inefficient.
- Hotel: Need towels, big, fluffy and plenty please. Need showers where I can move comfortably and not have to turn around in-place like a rotisserie.
- More sensitive BBC please. They had a show that "explored" the topic of how asians in UK had to be coconuts, brown outside and white inside. As is typical of BBC, the issue is phrased as a discussion exploring social trends, but ends up asking a guy on the show if he was a coconut!
Me, I am brown outside, black, white, yellow and even brown inside.
Timing in Travels
I always tend to think I am 30 sec too early or 30 sec too late in most aspects of my life, or may be it is a month, year, whatever. But when I travel, I seem to get it just right.
I was in Zurich on a spur of a moment thing last weekend, and that happened to coincide with the annual Street Parade, a Berlinesque techno-love thing (more techno, less love), or New Yorkesque Pride (less pride, more alcohol on the street) events. What looked like 100 thousand or more are on the street and they shed clothes, show skin or fading tattoo or the bleeding piercing, acquire fur, color face and hair, tort, twist, transcend the days, grow impossibly tall on stilts, or whatever. The ultra clean city of Zurich flexes and for a day, gathers beer cans, wrappers and flyers without a sigh, and time slows down notwithstanding the numerous public clocks. The trick in such festivals is to get the energy just taut, the buzz grown to a point, a bow pulled to its extreme, a gun loaded and cocked, hand on the trigger, and of course, hold that as long as one can, don't let the buzz go over, the arrow fly or the gun explode.
A year ago (?), I was in Pisa, Italy for a day, and coincidentally, that was the festival of the Saint Patron of Pisa in Italy. As the night settles, electric lights get shut down, and candles on windows of mansions along the Arno river get lit up, fireworks explode and the Italian style shows itself in stunning men and women, young and not-so young, strutting. There is more to the festival I am told, but I am happy that for a few moment, I felt Italian, strutting on the Lungarni.
I was in Zurich on a spur of a moment thing last weekend, and that happened to coincide with the annual Street Parade, a Berlinesque techno-love thing (more techno, less love), or New Yorkesque Pride (less pride, more alcohol on the street) events. What looked like 100 thousand or more are on the street and they shed clothes, show skin or fading tattoo or the bleeding piercing, acquire fur, color face and hair, tort, twist, transcend the days, grow impossibly tall on stilts, or whatever. The ultra clean city of Zurich flexes and for a day, gathers beer cans, wrappers and flyers without a sigh, and time slows down notwithstanding the numerous public clocks. The trick in such festivals is to get the energy just taut, the buzz grown to a point, a bow pulled to its extreme, a gun loaded and cocked, hand on the trigger, and of course, hold that as long as one can, don't let the buzz go over, the arrow fly or the gun explode.
A year ago (?), I was in Pisa, Italy for a day, and coincidentally, that was the festival of the Saint Patron of Pisa in Italy. As the night settles, electric lights get shut down, and candles on windows of mansions along the Arno river get lit up, fireworks explode and the Italian style shows itself in stunning men and women, young and not-so young, strutting. There is more to the festival I am told, but I am happy that for a few moment, I felt Italian, strutting on the Lungarni.
Top X
There are top 10 lists, top 5's and other top X's. I don't read Communications of ACM magazine in general, but a tip led me to take a look. The Aug 2007 issue lists top 10 downloaded papers from ACM Digital Library on Page 100, and at the top of this list is a paper of mine. No, I can not think of a single good explanation. There must be tons of data quality problems somewhere. I will go back to not looking at these lists.
Friday, August 17, 2007
Graph Labeling
In applied algorithms, occasionally you strike gold --- formulation of a nice abstract problem: some of the times, people manage to solve it and it becomes a mini-industry (ex: packet classification in IP networks), other times, the problems haunt you.
About 10 years ago, I (and am sure many others before and since) struck a problem: Given a graph G=(V,E) with a subset of nodes labeled from label set L, determine a suitable labeling for the remaining nodes.
The nodes may be say telephone numbers, the edges the calls between them, the labels say the monthly bill on some of the customers, and the question is, what can you infer about the monthly bill of the others. As technology has evolved, the problem continues to matter. The nodes may be IP addresses, webpages, blog nodes, and others; the labels may be the age of the blogger, how spammy a page, etc.
10 years ago was the time of HITS and PageRank and I thought of simple iterative algorithms. Since then, the problem has been studied using techniques of machine learning (ex), stochastic relational labeling (ex), partitioning on random graph models (ex), decoding (ex), etc. No matter the approach, the problem is still fluttering, with lots to do.
About 10 years ago, I (and am sure many others before and since) struck a problem: Given a graph G=(V,E) with a subset of nodes labeled from label set L, determine a suitable labeling for the remaining nodes.
The nodes may be say telephone numbers, the edges the calls between them, the labels say the monthly bill on some of the customers, and the question is, what can you infer about the monthly bill of the others. As technology has evolved, the problem continues to matter. The nodes may be IP addresses, webpages, blog nodes, and others; the labels may be the age of the blogger, how spammy a page, etc.
10 years ago was the time of HITS and PageRank and I thought of simple iterative algorithms. Since then, the problem has been studied using techniques of machine learning (ex), stochastic relational labeling (ex), partitioning on random graph models (ex), decoding (ex), etc. No matter the approach, the problem is still fluttering, with lots to do.
Thursday, August 16, 2007
Bhangra, Haka and Capoeira
To "relax" after a long day, week, the summer of 07, I went to catch some Bhangra. It is a remarkly celebrative dance from Punjab (India+Pakistan) that more or less puts men and women on the same level --- the men still tend to lead with their shoulders, the women with their hips, and the subcontinent-ians with their head movements as usual --- but these are second order effects on the central thema of rhythmic hopping that is focused on legs, that swiftly moves from wild bhangra to fierce haka to Ultimate Capoeira.
Tuesday, August 14, 2007
Technological Message
At the movies recently, a cellphone company advertised in a holier-than-rest attitude and asked me to power off my cellphone before the feature presentation. I hate advertisements before movies (not so with previews, which I like, or even better, a couple of shorts, to showcase indies). In this case, the advertisement was superfluous in any case since the signal does not work within the theater for the customers of that cellphone company! (check out the youtube feature at the end.)
Friday, August 10, 2007
Reductions
We have all heard of the mathematician who poured out the water from the tea kettle, to reduce the problem of making tea starting from a filled kettle to one starting from an empty kettle that was already known to be solvable. Jokes aside, I really like reductions, they tie one problem within, by, to, over another.
There is a COLT 07 paper by Balcan, Bansal, Beygelzimer, Coppersmith, Langford, and Sorkin that shows how to reduce ranking (outputting the ranked order of entities) to classification (putting them into classes). Both problems are seen as machine learning problems, so one has to argue about generalization error, regret, AUC, or other suitable measures. This paper looks like a nice start (does one get useful algorithms? The algorithm seems to be to rank nodes by wins after running a tournament), the area sorely needs more work.
Dan Spielman, ever the thinker, after a day of talking about what are theoretical, fundamental problems in networking ("theory of networking" rather than "theory of computing"), suggested reductions. Could we take some fundamental networking problems such as convergence of asynchronous, distributed routing protocols and build some structure of relative hardness amongst them, via reductions?
I guess what I am saying is that reductions have already been used in theory of computing very nicely. time to use them in theory of machine learning, networking, games, whatever.
There is a COLT 07 paper by Balcan, Bansal, Beygelzimer, Coppersmith, Langford, and Sorkin that shows how to reduce ranking (outputting the ranked order of entities) to classification (putting them into classes). Both problems are seen as machine learning problems, so one has to argue about generalization error, regret, AUC, or other suitable measures. This paper looks like a nice start (does one get useful algorithms? The algorithm seems to be to rank nodes by wins after running a tournament), the area sorely needs more work.
Dan Spielman, ever the thinker, after a day of talking about what are theoretical, fundamental problems in networking ("theory of networking" rather than "theory of computing"), suggested reductions. Could we take some fundamental networking problems such as convergence of asynchronous, distributed routing protocols and build some structure of relative hardness amongst them, via reductions?
I guess what I am saying is that reductions have already been used in theory of computing very nicely. time to use them in theory of machine learning, networking, games, whatever.
Saturday, August 04, 2007
Saturday
Week days are overwhelming, and fly past in a whirl. Saturday is the time for the first sigh of freedom. So, one paints, conversates, musics, whatever.
Brooklyn Museum's free series on first saturday of every month is "one of those things New Yorkers always talk about attending, and when they finally get around to it, they can't believe they waited so long" (NYT). Today there is a Caribbean dance party after 9 PM, with bachata and zouk, and the movie "When the spirits dance mambo" about a Cuban tradition. Bachateros, enjoy!
Scrabblers
Amit Chakrabarti came to visit and reminded me that everyone belongs to a cult, or two. Amit, the ever scrabbler (Google search suggests "scrambler"), reveals the tendency among scrabblers to construct anagrams of names of the people they meet. Also, Adam Klivans visits, and reveals himself to be a (travelling) scrabbler, trapped by Jon Feldman to play. Some like words strung together, others like them in symmetric boxes, and yet others, seem to like them quixotic (336 points) in squares with points.