Friday, October 27, 2006

Reversals

Problem: Let S be a string of length n over alphabet set {1,...,k}. The operation one is allowed to do is to replace any X X^R by X, where X^R is the reverse of X. Determine the smallest resulting string by applying this operation any number of times.

Commentary: If k=2, this can be solved. Also, if the operation is to delete the entire X X^R, then the problem is solvable. The problem may belong to some general algebraic theory?

Monday, October 23, 2006

Antarctica, A Partial View

From my travel to Antarctica from a few months ago, here are some pictures. I have more detailed pictures of these and others, and would like to do an exhibition in the future.

There are K's of other pictures for me to sift through, select and project on to this page over time.

Saturday, October 21, 2006

Innovators

I went to the awards ceremony for R&D 100 2006. It is awards for top technology/engineering research. Many national research labs had winners, so did a few companies including big ones such as Toyota. My brother featured to the left was a winner too for some research I hope to understand some day. An algorithmus, Michael Bender, was also a winner, for his scheduling work done at Sandia National Labs.

The innovator of the year award went to Dean Kamen, among other things, the inventor of Segway. Dean knows how to get a hold on the audience and retain it. He described his program FIRST: a technology competition for high school kids, making technology as exciting as sports; it is run like a tournament in a superdome with cheerleaders I hear. He spoke passionately about using technology to solve biggest problems in the world such as the lack of clean water and electricity in third world countries. He is doing something about it.

ps: In the story of David and Goliath, Dean said the moral he learned was "Technology (i.e., slingshot) helps". For theorist, it is more like "changing the computational model" to beat the lower bound?

Tuesday, October 17, 2006

Infrastructure

I gave a talk at the MineNet Workshop associated with SIGCOMM 2007 a few weeks ago at Pisa, Italy. Never been to a SIGCOMM, still good to see a lot of friends there, mainly from AT&T and Sprint networking teams, inimitable Bala included. A couple of algorithmii had sneaked in, Michael Mitzenmacher and John Byers.

I have seen many talks/papers which make data analysis into some sort of a lab experiment: "I collected this data, removed the records which were bad, then projected it onto these features, and did a mutual information-based plot, and voila, there is a pattern". As a database person, I find these less interesting than thinking about how to abstract these tasks into a few primitives that may help with a variety of analyses in different domains. So, I chose to talk about how to build the system infrastructure to support different data analyses people wish to do.

I used the examples of analyzing cellphone data, IP traffic data and web data, to show the different system infrastructures one needs to build and their uses.
I really enjoyed giving this talk. Slides (sans paintings) are here. Anja Feldmann helped me hone these slides, thank you. More acknowledgements in the slides.

Sunday, October 15, 2006

49 and other squares

A couple of days ago, I saw the latest movie in the UP series: 49 Up. The UP series follows the lives of a fixed set of people, via a movie every 7 years. This "reality" series has successfully represented the characters and their evolution, together with the development of UK over the past 50 years. I was somewhat disappointed to see idealists, revolutionaries and the shy ones in their 20's, all turn more or less sentimental over their (grand)children in their late 40's. Relationships fall apart, people redefine themselves, and the 49 year olds look to the future through their children.

ps: My friend married the well-physiqued Mathematician who claimed his age was a perfect square and she mistakenly thought he was 36.

Compressed Sensing

There is some excitement about what is called Compressed Sensing (CS) in signal processing, harmonic analysis and applied math.

David Donoho came up with this question that since signals were going to be approximated anyway (using few coefficients in some suitable basis), can one just make a small number of measurements rather than measuring at every point? He showed how to do it. Did this have a precursor? Indeed it did, in a kingdom by the sea (gratuitous reference to Nabokov). Not an exact precursor, but some related results were known in learning theory, streaming algorithms, etc. Donoho's twist is interesting and has generated a lot of research the past few years. Check out the CS site for the early papers by Candes, Tao (yes, field medalist Terence Tao), Romberg, Ruselson, Vershynin et al, and the other papers since.

I was lucky to get to a meeting with Ingrid Daubechies, Ron DeVore, Anna Gilbert and Martin Strauss (thanks to a NSF FRG) where we tried to figure out CS and its relationship to other problems. In particular, I am indebted to Ron for gamely explaining elements of the CS conditions in Donoho's paper and the proofs, on the white board.

Recently, I went to the Allerton conference (my first, it IS in the middle of nowhere, but it was a reassuringly warm math conf) to speak about algorithmic issues in compressed sensing. I eked out some research time to write down a few results I have been mulling over, and some new research directions in compressed sensing (in particular, what I call functional compressed sensing; the distributed, continous version of compressed sensing and its intriguing connection to the classical Slepian-Wolf). Check out a copy of the paper here. I hope to refine this over time, so comments are welcome.

Tuesday, October 10, 2006

Cheap NY

Those who know me, know I prefer the 50c morning coffee from the sidewalk vendors in NYC. Micro living in the big city. But one can do better. Gray's Papaya on 8th st and 6th Avenue has morning coffee for only 25c from 6 AM to 11 AM (one per person only). Problem is one can not resist having two franks and a hearty papaya juice for \$2.99, no matter the time of the day. Sigh. From franks to wursts: Munich style beer garden, in micro scale, at Lederhosen, 39 Grove Street. Fantastic wursts and beer in klein, gross, mass, or pitcher.

Prime groups

Streaming algorithms use sublinear space to summarize a series of updates and answer queries on the input approximately. If the input consists of a series of inserts only, most problems that can be solved in this model such as heavy hitters and quantiles can be solved deterministically. If the input consists of both inserts and deletes, then most algorithms are randomized, in the spirit of the seminal Alon, Matias and Szegedy work of 1996.

About two years ago, I visited Leszek Gasieniec at Liverpool, and we worked out this simple scheme of grouping the input items modulo primes and keeping all such modulo groups. The scheme used consecutive primes in a suitable range, but the total number of such groups was still sublinear. We solved some niche problem in data streams in presence of both inserts and deletes.

Recently, Sumit Ganguly and Anirban Mazumdar made a nifty observation that the prime groups can in fact be used to answer point queries, that is to estimate any input approximately. This is a basic indexing query using which a number of other queries can be answered! A slew of results now follow that show that heavy hitters, quantiles, histograms, etc, can in fact be computed deterministically in presence of inserts and deletes. Their paper (with other results too) can be found here.

ps: When I gave a talk recently at the Allerton conference, Venkat Guruswami asked if a family of hash functions will do, in place of these prime groups. May well be the case. The proofs only seem to use the fact that the groups should separate any subset of k items, for a given k known a priori.

Sunday, October 01, 2006

Teaching

Teaching is a challenge. I think of the classroom as a performance space and expect the students to be actors too, not the audience. In the past few years I have developed a few tricks to create that dynamics, not always with full success. Giving puzzles in the class helps.

For graduate students, this may be math problems. They typically listen, register the problem, tilt their heads, scratch their chin or stare nowhere, scribble on the notepad, do whatever stance each individual adapts when they solve problems, and typically don't talk unless they have questions or ideas. This *is* being part of the performance that is the classroom.

The undergrads are a different challenge and math puzzle may not help. I typically use "lateral puzzles". These are descriptions of instances in everyday life; the instance sounds strange and inexplicable, but there is a rational explanation. The puzzle is to find the explanation. The solutions are more grey than black and white math puzzles. Further, students can not typically proceed to solve these puzzles without asking questions for more clarity which gets them to talk in the class, a big step forward. Finally, when you give math puzzles, during the course of the semester, one or two well-trained solve most of them; with lateral puzzles, any one can solve them even without training, and many of the students in the class end up solving a puzzle or two during a semester, which is cool.

ps: There are six eggs in the basket. Six people each take one of the eggs. How can it be that one egg is left in the basket?

pps: Once, a long time ago, I remember playing a game with cards that involved lateral puzzles with Peter Winkler, Joel Spencer, Robin Pemantle and others, but don't remember how the game ended.

Camels

Mike Paterson gave me this puzzle a long time ago (some of the fondest moments of the time I spent being sent to Coventry was solving puzzles with Mike during lazy afternoons). You have 3000 bananas and a camel at one end of the desert that is 1000 miles wide. The camel can carry at most 1000 bananas at any time. How many bananas can you get across on the camel's back if it eats one banana before each mile of the trek across the desert?

Most of us can come up with the "right" upper bound and be convinced we can't do better. Is there is a nifty way to show the lower bound without branching off into case analysis. Is there a nice potential function perhaps?

ps: I stated this puzzle *before* my talk at a meeting at IMA and alas, lost my audience who preferred to mull over the camel munching bananas rather than focus on random projections and pseudorandom number generation.

pps: I am sure there is a version of this puzzle with tanks and gallons of gas, but the bovine camel with bananas is more enticing.

Multiple-free

Let S = {1,2,...,n}. Define T_k to be a subset of S such that if i belongs to T_k, k*i does not belong to T_k. What is the largest size of T_k? It was (is) an exercise in some Discrete Math textbook to show that T_3 = 3n/4. The upper bound is a simple recursive construction giving T(n)= T(n/9)+ 2n/3; the lower bound will argue that any other "largest set" can be converted to the one given by this recursive construction without decreasing its size.

Say we extend this to look at subset T_{k,l} of S such that if i belongs to T_{k,l} then k*i and l*i do not belong to T_{k,l}, what is the largest size of T_{k,l}? In particular, T_{3,5} is a simple start.

Humiliation: I was a TA for a discrete math course a long time ago when a student
asked me about T_3. I approached it without much respect since it was an undergrad HW problem and guessed silly things like n/2, 2n/3 etc before slowly seeing that one could do better. Ultimately, I had to stand in the front of the board and scratch it out before the smug student.