Tuesday, December 26, 2006
Monday, December 25, 2006
Seats, permutations and (re)arrangements
We need something similar for hotel rooms too, where my fading cheat-sheets are not triumphant. Best views?, far from the elevators or the vending machines and the associated noise?, weird layout?, used to be smoking ones?, smelly carpet?, working wifi? ... This will avoid the series of room re-assignments I go through when I check in.
Things you learn during travel
Sunday, December 24, 2006
Compressed Sensing in Media, Contd.
Saturday, December 23, 2006
Pens in India
Notes from India
- Stray dogs have vicious faces and maligned bodies, and on close inspection, are unique in the scars and persona they carry. I recently checked in to a hotel that gave me a standalone, rotunda of a room, with a flat roof. Before entering the room, I noticed a stray dog on the roof and spent the night on a bed listening to the pacing dog. I was glad at least for a night the dog was kept off the streets.
- Some of the reasons why people ask to change seats with me when I am traveling alone and have just settled into extra-leg room, aisle seat on an airplane: "I am 6 ft tall and can't fit my legs in this space". "I am claustrophobic."
Friday, December 22, 2006
Sugarcane
Sampling, Sketching or One Following the Other
I have a way to formalize this question as a theory problem. Consider each item and ask: should I skip this item, or should I sketch it? If this can be decided quickly (ie., o(sketching time)), and we can still guarantee approximation of desired functions, we have a win. The results I have with my coauthors Supratik Bhattacharyya, Andre Madeira and Tao Ye are that this is possible for many functions. There are two tricks:
- you have to use the norm of the skipped part to make this decision, and
- you need to estimate the norm of the skipped portion quickly ie in o(sketching time) per item. this is somewhat circular?
The approach provides a way to actually design nontrivial algorithms and prove theorems about approximations. For example, we get a factor 4+ eps approximation for estimating F_2 and get constant factor speedup over best known F_2 estimation on streams (which gets 1+eps approximation). Improvements to this will be cool. Our paper has other results, theoretical and applied. In the applied world, one of the biggest win we get is in finding heavy-hitters within contents of IP packets, where this method allows us to skip entire packets without processing them. Better understanding of this can help tremendously where people have to resort to specialized hardware to solve this problem.
Disclaimer: I have some randomized skipping ideas that gives better results than ones in this paper for some cases, but more on that later.
Thursday, December 14, 2006
Continuous Communication Complexity
Notes: One should allow Carole to approximate |A_t \union B_t|. This seems related to Slepian-Wolf where A and B independently communicate their individual streams to C and the goal is to make use of mutual information in their streams. Things get tricky when we allow A, B or C use models to predict how A_t, B_t vary over time t.
Refs: There has been a few papers recently in the database community that give upper bounds on the number of bits communicated for different problems. It is called distributed, continuous streaming. There is a survey and a tutorial by Graham Cormode and Minos Garofalakis. What we need is a good theory and a semblance of lower bounds.
Probabilistic Data Streams
This is a nice abstract model to study a bunch of problems that arise: sometimes input data stream is probabilistic, some times the data stream is probabilistic because the sensors that do the measurement introduce error and noise, some times the input stream is deterministic but we derive stream from the input by say tagging each item with a bunch of values it may possibly represent, and the result is a probabilistic stream.
A fundamental problem here is, are there any issues beyond just instantiating a bunch of deterministic streams by instantiating each item of the data stream with the given pdf? Other problems include: can we take into account some concentration property of the individual pdfs to get some smooth complexity for problems that goes from deterministic streams to probabilistic streams? is there a suitable model of smooth pdfs for say geometric data streams with interesting algorithmic problems?
Sunday, December 10, 2006
Ranking CS Conferences.
Amuse: Saw a reference to *Googol* Page Rank system in the best practices memo at the CRA site.
Saturday, December 09, 2006
Sporting, Sporty, Sports.
Saturday AM
... Pack up the moon and dismantle the sun;
Pour away the ocean and sweep up the wood ....
mood, wrapping up the year. Alas, travel, reimbursements, holiday shopping and other chores loom. But for now I can take an AM off, read, lounge or be silly. I managed to read all of an article in the New Yorker on the growing community of Somalis in Lewiston, Maine, a fascinating account of somalis finding a new life ("secondary migration" in technical parlance, the primary migration is when the federal govertment settles them, in Atlanta, in this case, and then the population flows elsewhere), discovering the lines that separate them from the local people and rediscovering the lines that separate them among themselves, along their clan lines and away from the Bantus with their legacy of slavery.
... I thought that "this break" would last forever: I was wrong
I must get back to work and life I suppose. Just one more pointer: Shouts and Murmurs, in NYer, discusses Excessive Article Disorder, the tendency of many (me too) to put "the" wherever. the dance, the work, the Google, ...
Sunday, December 03, 2006
Lost in Movies
ED
What's your axe?
FRED
Tenor... Tenor saxophone. Do you...
ED
(shakes his head)
Tone deaf.