Saturday, February 25, 2012

On Softwaring...

Sometimes you produce algorithms that you believe should be useful, and then you might go a little further and produce pseudocode or even code. You don't want to do this for every theory result you produce, only for the ones you think are worth the effort. And then you do outreach: you have to take the work to the applied communities that will care. Here are two examples.
  • Count-Min Sketch. Many pieces of code exist for the count-min sketch, but we thought it would be worthwhile to draw the attention of software professionals. So, we wrote an expository article in IEEE Software recently, in a special theme on "Algorithms for Today's Practitioners", edited by Giuseppe Prencipe (University of Pisa), Cesare Zavattari (CrowdEngineering), Alessandro Tommasi (CrowdEngineering), and John Favaro (Intecs SpA).
  • Approximating the Fourier Transform. Over the past two decades, there have been several algorithms to approximate the Fourier representation of signals. I really liked the expository article "A Tutorial on Fourier Sampling, How to apply it to problems" in IEEE Signal Processing Magazine, by Anna Gilbert, Martin Strauss, and Joel Tropp, who took our complicated algorithms and developed them into workable pseudocode.
  • Going beyond the above, there is very nice recent related work by Haitham Hassanieh, Piotr Indyk, Dina Katabi, and Eric Price in which they improve the FFT for k-sparse signals. This is big. See their page on sFFT for details. This paper got a lot of press (e.g., IEEE Spectrum news; the authors promise code soon), deservedly.
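For readers who have not met the count-min sketch, here is a minimal Python sketch of the data structure itself. This is my own illustrative version, not the code from the IEEE Software article; salting a cryptographic hash with the row index is a simplification standing in for the pairwise-independent hash families the analysis assumes.

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch: depth rows of width counters, one hash per row."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _hash(self, item, row):
        # Derive a per-row hash by salting the item with the row index.
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def update(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._hash(item, row)] += count

    def query(self, item):
        # Never undercounts; overcounts only when other items collide in every row taken.
        return min(self.table[row][self._hash(item, row)]
                   for row in range(self.depth))
```

The point-query estimate is always an upper bound on the true count, with the overestimate controlled by the width and depth; that one-sided guarantee is what makes the sketch so convenient in practice.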


Friday, February 24, 2012

Scienceography Contd

The Scienceography paper will appear in FUN 2012. Since our scienceography paper studies, among other things, comments in the LaTeX source of papers on arXiv, and our paper was itself uploaded to arXiv, self-referentiality demanded that of course we leave a sneaky comment in our paper! Those who discovered the comment and followed it through were part of an interesting experiment. :)
ps: People have suggested many things for us to study. Thanks for the suggestions; we will follow up.


Sneaking into Cities

I sneaked into cities the past few days.
  • First there was Paris. I have never seen another culture so obviously and immediately focused on being creative or different. The undulating escalators and angled tube walkways of CDG are the first signal as you land in Paris.
  • Then there was NY, where I thought I heard a man in a bar say, when asked what he wanted in a city (I may have heard it wrong, but I like my version), "Nothing much, a decent newspaper, folks to make the news in it, and when I put my hand up at the street corner, there should be a taxi."
  • Finally, there was Las Vegas; I came back with the stink of cigarettes from the carpets and not a mot in sight.

Tuesday, February 14, 2012

NITRD Symposium in DC

This week there is an invitation-only symposium in Washington, DC, that explores the accomplishments and prospects of the coordinated effort -- now referred to as the Federal Networking and Information Technology Research and Development (NITRD) Program -- involving 15 Federal agencies as full partners. The program seems interesting, and seems to represent a great group of agencies (national archives, yes!) with topics spanning privacy, network security, health, high performance computing, and so on. Also, it looks like there will be some focus on big data and "big ideas" for the future. Finally, it is all webcast live. Enjoy!

ps: The two decades of game-changing breakthroughs in networking and information technology celebrated by this meeting didn't seem to have overlapped with Algorithms/Theory/Databases; maybe the next two decades will be different?

pps: Btw, check out the invitation-only symposium, "Computing Research that Changed the World: Reflections and Perspectives," at the Library of Congress in 2009. Great speakers, and their talks are online.


On How Science gets Written

Here is some recent research. The starting point is the observation that arXiv stores papers in LaTeX form! So we crawl them and analyze what we can mine from the LaTeX form --- the “source code” of scientific papers --- that is difficult or impossible to get from the "finished product" like PDF or PS (think "comments", for example. :)). Beyond this amusement, there is a bigger point: there is a large pipeline of how science is
  • conceived (apple hits Newton's head or Archimedes runs the streets of Syracuse: the stuff for science novels),
  • executed (experiments, proofs in cafes: stuff that movies show with soundtrack),
  • communicated (written, published, presented) and
  • ultimately impacts the world (citation analysis, analysis of social network of researchers: the stuff that holds academia together).
Traditionally, we have had little visibility into large parts of this pipeline; our argument is that now, with emerging large-scale data, we can make this pipeline itself an object of principled study.
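As a toy illustration of the kind of mining LaTeX source allows, here is a short Python sketch that pulls comment lines out of a .tex file. It is purely illustrative and not our actual pipeline; a real crawler must also handle verbatim environments and other corner cases this ignores.

```python
import re

def extract_comments(tex_source):
    """Collect LaTeX comment text: everything after an unescaped % on each line."""
    comments = []
    for line in tex_source.splitlines():
        # Remove escaped percent signs (\%) so they are not mistaken for comments.
        stripped = re.sub(r"\\%", "", line)
        idx = stripped.find("%")
        if idx != -1:
            comments.append(stripped[idx + 1:].strip())
    return comments
```

Run over a crawl of arXiv source files, even something this crude surfaces the author-to-author chatter that never survives into the PDF.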

We focus on how science is written, an area we call Scienceography. Our study identifies broad patterns and trends in two example areas—computer science and mathematics—and highlights key differences in the way science is written in these fields. (Party puzzle: which math operator symbol is most commonly used in CS vs. Math?) Comments/suggestions for other studies are very welcome (yes, this is only the beginning). Also, a serious caveat: this is a data analysis paper; don't expect well-packaged principles or theorems.


Sunday, February 12, 2012

On Compression

Compression is a research area with many branches. I will focus on lossless compression; lossy compression has a huge history of its own, with a recent renaissance thanks to compressed sensing, a not-too-recent resurgence due to wavelets for image and video standards, and other constant hums.

In lossless compression, there was the early pioneering work of Lempel and Ziv (LZ) in the '70s. Since then there have been various extensions: Ziv worked on extending the "context" into various forms of universality; the LZ method was applied to objects other than strings, from file versions to tables, graphs, the web, grammars, etc.; heuristics were developed using different chunking techniques or suitable dictionaries; and so on. But the general consensus, when you spoke to people, was that at the highest level compression involved replacing copies of an object with references to a single copy, and you could perhaps eke out some small constant-factor tradeoffs, but no further.
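The copies-to-references idea is easy to see in code. Here is a tiny LZ78-style sketch in Python (my own illustrative version, far from the tuned variants mentioned above): each new phrase is stored once in a dictionary, and later occurrences become references to earlier entries.

```python
def lz78_compress(s):
    """LZ78: emit (index of longest known prefix, next char) pairs."""
    dictionary = {"": 0}          # phrase -> dictionary index
    phrase, output = "", []
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending a known phrase
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                    # leftover phrase is already in the dictionary
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

def lz78_decompress(pairs):
    """Rebuild the string by replaying the dictionary construction."""
    phrases, out = [""], []
    for idx, ch in pairs:
        phrases.append(phrases[idx] + ch)
        out.append(phrases[-1])
    return "".join(out)
```

On repetitive input the pair list grows much more slowly than the input itself, which is exactly the "replace copies with references" intuition.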

Are there major breakthroughs in compression? Doubtless one breakthrough has been the Burrows-Wheeler (BW) approach. The first time I saw it was in the mid '90s, and it was very unusual, almost magical. I couldn't tell how far the method would go. Now, not only is there a commonly used compression scheme based on BW (bzip2), but we have a clear theoretical analysis of its compression ratio, which is asymptotically better than LZ's, and, interestingly, it has been used for indexing and made available in far more applications than LZ or plain compression. That is a good story of Algorithms in the Field.
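To see why BW felt magical: the transform just sorts all rotations of the string and keeps the last column, yet the output clusters similar characters (making it very compressible) and is fully invertible. A naive Python sketch follows; it is quadratic and purely illustrative, while bzip2 and FM-indexes use far more efficient constructions.

```python
def bwt(s):
    """Burrows-Wheeler transform via sorted rotations ('$' marks end of string)."""
    s = s + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)

def inverse_bwt(last_column):
    """Invert by repeatedly prepending the last column and sorting (naive O(n^2 log n))."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    # The row ending in '$' is the original string plus its sentinel.
    return next(row for row in table if row.endswith("$"))[:-1]
```

The classic example: "banana" transforms to "annb$aa", with the repeated characters pulled together, and the inverse recovers the input exactly.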

I am working on the next steps following last year's workshop on Algorithms in the Field (8F), and on determining whether there are other new ideas brewing in compression.


Saturday, February 11, 2012

Travel and Art

Here is the challenge: Visit all eleven Gagosian Gallery locations --- NY to Europe to Hong Kong and back via LA --- during the exhibition The Complete Spot Paintings 1986–2011 and receive a signed spot print by Damien Hirst, dedicated personally to you. My zany friend Gilles did this, go Gilles! What of Damien Hirst? The New Yorker says, "Buying one, you can hang it on your wall like a framed diploma from Smartypants U."