Friday, September 27, 2013

Big Real Data

Big Data and Economics go together. Here are two unconnected fun links:
  • A profile of Susan Athey, economist at Stanford and Microsoft, that is more insightful than most such pieces.
  • On the practical side, a nice blog post on San Francisco real estate in which Misha Weidman uses data and great visualizations to bring some insight to the madness.

Monday, September 23, 2013

On Poverty

World economics is in a strange place: a few are doing very well, and the 99% are not sure they belong to the world around them.

A comment on Krugman's piece caught my attention:
I've got a big box super-saver type grocery in my neighborhood, the type place where most product is put on shelves in the shipping boxes with the top cut off, where you bag your own to save a few pennies. Low cost, low quality, big bulk packs of generic products. I frequently go by and the parking lot looks empty, except on the first of the month. It's packed with crowds of shoppers piling in. I was coming home from a night out last month when I saw the parking lot filled to capacity at 11 pm on a Saturday night. People milling around by the hundreds. Why? At midnight on the 1st of the month, their accounts would be credited with their monthly SNAP payments. And people could afford to buy their children food.

Life is hard in the US. Elsewhere, in India, home to a quarter of the world's hungry, the government has launched a program to give highly subsidized grains to nearly 70% of the country! This has serious pros and cons. But on a lighter note, apparently there was an ad for this program that said, "No one should go to bed hungry", and the response from many folks was, "What is a bed?"

Wednesday, September 18, 2013

NY is NYCE: Call for Participation

The 6th annual New York Computer Science and Economics Day (NYCE) will be held on November 1st at the Simons Foundation in New York City. The goal of NYCE Day is to bring together researchers in New York and surrounding areas who are interested in problems at the intersection of economics and computation.  Our invited speakers this year are 
  • Nicole Immorlica, 
  • Panos Ipeirotis, 
  • Christos Papadimitriou, and 
  • Rakesh Vohra
More details about the program can be found on the website.
If you plan to attend the workshop, please register online before October 15th. Please note that the venue for NYCE 2013 has limited space, and on-site registration may only be available on a (limited) first-come, first-served basis.

The deadline for submissions for short talks and posters is October 8th. Topics of interest to the NYCE community include (but are not limited to) the economics of Internet activity such as search, user-generated content, or social networks; the economics of Internet advertising and marketing; the design and analysis of electronic markets; algorithmic game theory; mechanism design; and other subfields of algorithmic economics.  We welcome posters and short talks on theoretical, modeling, algorithmic, and empirical work. You can submit your abstract via our submission form at:


Beyond Streaming: Frugal streaming

In streaming, we have a nice model: compute things in one pass with polylog space. I have been thinking about how to go beyond that: can some of these things be computed using much less space? I call this "frugal streaming". Qiang Ma, Mark Sandler and I wrote a paper on frugal streaming for quantiles. I could reference the paper, but akblog has a great entry that not only describes the algorithms in the paper but --- in their typical style --- also provides a web interface to simulate them! Very cool.
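To make "much less space" concrete: the median estimator in the paper keeps a single integer and nudges it by one unit per item. Here is a minimal sketch in that spirit (my own simplified rendering, not the paper's exact pseudocode):

```python
import random

def frugal_1u(stream, q=0.5, init=0):
    """Estimate the q-quantile of an integer stream with one unit of memory.

    Sketch in the spirit of the Frugal-1U idea: on each item, nudge the
    single stored estimate up with probability q (if the item is above it)
    or down with probability 1 - q (if below). Only `m` is ever stored.
    """
    m = init
    for s in stream:
        if s > m and random.random() < q:
            m += 1
        elif s < m and random.random() < 1 - q:
            m -= 1
    return m

random.seed(0)
data = [random.randint(0, 100) for _ in range(200000)]
est = frugal_1u(data, q=0.5)   # should hover near the true median, 50
```

With q = 0.5 this is the one-unit median estimator; for other quantiles the up/down probabilities are biased accordingly. Convergence is slow, but the memory footprint is a single word.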


Tuesday, September 17, 2013

Nationalism Misplaced, in Comedy and Sports

I watched the 1965 movie "Those Magnificent Men in their Flying Machines". It is about a contest in the early 1900s, organized by a newspaperman, to get pilots to fly solo on state-of-the-art airplanes (bizarre!) from London to Paris. It is a great engineering feat and the pilots do it, with the usual twists in their tales. But the one aspect I found interesting was that it branded each pilot as a caricature of his national image. I am uncomfortable when nationalism is thrust forward, even when set in comedy. In a running gag, "Irina Demick plays a series of flirts who are all pursued by the French pilot. First, she is Brigitte (French), then Marlene (German), Ingrid (Swedish), Françoise (Belgian), Yvette (Bulgarian), and finally, Betty (British)."

Nationalism is also misplaced in tennis. We see individual tennis players play, say, Wimbledon. They don't represent their nations as in the Davis Cup; there are no limits on how many players from any given country can play at Wimbledon. Yet commentators routinely refer to "the Swiss" and "the Spaniard". I cringe when I hear that. Why not just say Nadal and Federer, or even Rafa or Fed-Ex (for Federer)?

Saturday, September 14, 2013

Suffix Trees Trimmed

The suffix tree is a beautiful data structure. It encodes the O(n^2) total size of the substrings of a string of length n in an O(n)-sized tree, and can be constructed in O(n) time. It was invented by Peter Weiner in 1973 and is one of the earliest nontrivial sub-sorting-time algorithms. There are now half a dozen or more linear-time algorithms for constructing suffix trees, including Martin Farach-Colton's big breakthrough in the mid-90's showing that this could be done even for integer alphabets.
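To see concretely what the tree encodes, here is a deliberately naive illustration (nothing like Weiner's linear-time construction): insert every suffix into an uncompressed trie. Every distinct substring then corresponds to exactly one non-root node, which is why the uncompressed trie can be Θ(n^2) in size; the suffix tree recovers O(n) by compressing unary paths into single edges.

```python
def build_suffix_trie(s):
    """Naively insert every suffix of s into a trie: O(n^2) time and space.

    A suffix tree stores the same information in O(n) space by collapsing
    chains of single-child nodes into one edge; this sketch only shows
    what is being encoded, not how to build it fast.
    """
    root = {}
    for i in range(len(s)):
        node = root
        for ch in s[i:]:
            node = node.setdefault(ch, {})
    return node if False else root

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.values())

trie = build_suffix_trie("banana$")
# Each non-root node is a distinct prefix of some suffix,
# i.e., a distinct substring of "banana$".
distinct_substrings = count_nodes(trie) - 1   # 22 for "banana$"
```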

Combinatorial Pattern Matching (CPM) 2013 celebrated the 40th anniversary of this remarkable invention.  Amazing galaxy of speakers --- Vaughan Pratt, Peter Weiner, Ed McCreight, Martin Farach-Colton, Jacob Ziv --- onstage!  Check out the video channel of talks.  Incredible window into the history of (string) algorithms, including a referee's perspective. Of course, Vaughan Pratt is an extraordinary referee. See more details about the event here. There is a research paper on Forty Years of Text Indexing, in the proceedings.


CFP for WWW 2014, Internet Monetization Track

WWW 2014, April 7-11, Seoul, Korea
Abstract deadline: Oct 1, 2013. Paper submission deadline: Oct 8, 2013
Internet Economics and Monetization Track

The Web has become a major economic phenomenon, serving both as a new platform for traditional transactions such as business-to-business and business-to-consumer commerce, and as an arena for a variety of new economic activities, ranging from online advertising and digital payment systems to bandwidth provisioning, peer-to-peer lending, and crowdsourcing.

The WWW track on Internet Economics and Monetization is a forum for theoretical and applied research related to the modeling, analysis, and design of web-specific economic activities and incentive systems, and computational advertising and ad selection for all available online advertising formats. The track will be interdisciplinary in nature, welcoming any research related to economic aspects of the Web.

Topics: Relevant topics include (but are not limited to) :
  • Internet auctions, markets, and exchanges
  • Computational advertising: Sponsored search, content match, graphical ads delivery, targeting
  • Machine learning and data mining in the context of Internet monetization
  • Economic aspects of online reviews, reputations, and ratings
  • Monetizing digital media, user generated content, and the social web
  • User-experience design aspects of Web monetization mechanisms
  • Web analytics for e-commerce
  • Economics of data and digital goods
  • Incentives in crowdsourcing and human computation
  • Advertising infrastructure: tools, platforms, networks, exchanges, automation, audience intelligence
  • Economic approaches to spam/fraud control
  • E-commerce issues in cloud computing and Web apps
  • Mobile web advertising and location-based e-commerce
  • Decision-theoretic and game-theoretic modeling of online behavior
  • Empirical and theoretical analysis of online labor markets

Track chairs:  Arpita Ghosh, Cornell University and Vanja Josifovski, Google 


Wednesday, September 04, 2013

Simons Big Data Boot Camp

I am in a boot camp, but this is at Simons, Berkeley, so more sandals than shoes.

Dick Karp and Alistair Sinclair started off the Big Data program this Fall by setting the meeting in context: the broader one of Simons' effort to support theory (Dick), and the narrower one of the Big Data program bringing researchers from many different areas together via a series of tutorials during this boot camp (Alistair). Michael Jordan, while on sabbatical across the Atlantic, managed to organize not only this boot camp but the entire program. Amazing! He has pulled together an enormous trove of talented researchers, and everyone I talked to was pumped up and felt inspired to get something done this semester.

The scientific program started with Michael Jordan's two-part talk: first a breezy overview of Stats, CS, and their interface, and then a dive into a couple of results. This was phenomenal. Deep insights in simple sentences.

  • Big data, he said, is data at great granularity: say, data about a lot of people, with a model needed per person. While in CS more of the traditional resources (time/space/energy) lets one do more, it is not clear that more data is necessarily better --- more columns means more hypotheses --- if one is more ambitious than estimating only the mean. He contrasted the confirmatory (model and fit) vs. exploratory (collect and mine) approaches to Big Data. He stated the ultimate goal: given an inferential goal and a computational bound, provide a procedure with guarantees, whose quality of inference increases monotonically as data accrue. It is not even clear such procedures exist across goals. He then contrasted frequentist (fixed model parameter, average over data) vs. Bayesian (fixed data, average over model parameter) approaches and discussed sampling, consistency, convergence rates, etc.
  • In the second half, he spoke about the computation vs. statistics tradeoff, which, alas, I sorta spaced out on, and then described his recent work, which I have followed for a year, on the Bag of Little Bootstraps (BLB), which is very interesting. BLB looks at statistical problems with large support and is a very general estimation procedure that breaks the problem down into a small number of subproblems with roughly \sqrt{n}-sized support, in two levels. There are a bunch of TCS-type research problems of interest to me regarding BLB that I hope to follow up on.
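Here is a rough numerical sketch of the two-level idea as I understand it (my own toy rendering with hypothetical parameter choices, not the paper's): assess the standard error of an estimator by working inside small subsets, where each full-size resample is represented by multinomial counts over only b points, so no resample ever materializes n items.

```python
import numpy as np

def blb_stderr(data, estimator, b_exp=0.5, s=10, r=50, seed=None):
    """Bag of Little Bootstraps sketch: estimate the std error of `estimator`.

    Level 1: draw s subsets of size b ~ n^b_exp without replacement.
    Level 2: within each subset, draw r multinomial resamples of full
    size n (stored as counts, so each costs O(b) memory), evaluate the
    weighted estimator on each, and take the std dev of those r values.
    The final answer averages the per-subset std errors.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    b = int(n ** b_exp)
    subset_ses = []
    for _ in range(s):
        subset = rng.choice(data, size=b, replace=False)
        stats = []
        for _ in range(r):
            counts = rng.multinomial(n, np.full(b, 1.0 / b))
            stats.append(estimator(subset, counts))
        subset_ses.append(np.std(stats))
    return float(np.mean(subset_ses))

def weighted_mean(x, w):
    return np.average(x, weights=w)

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=100_000)
# True std error of the sample mean is about 1/sqrt(n) ~ 0.0032 here.
se = blb_stderr(data, weighted_mean, seed=1)
```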
Audience questions included the role of the dimension d vs. the data size n, lower bounds for estimators' risk/bias/variance, etc.

Joel Tropp, in a superhuman effort, spoke for 3 hours in one afternoon. He is a deliberate and precise speaker. He started by drawing the large arc of random matrices, from the early days to now. Random matrix analyses are hard. He presented two simple, abstract principles (break a random matrix into a sum of independent ones, and use exponential tail bounds on spectral properties) that, he posited, suffice for many random matrix analyses: perhaps not to the finest bounds, but good enough and far more doable. He gave a few examples and then spent the rest of the lecture presenting matrix versions of a bunch of inequalities we know from the scalar context (Bernstein, Markov, etc.): these get remarkably subtle in form and proof. While many of us in TCS have worked with inequalities like this for specific problems, proving them as needed in the form we like, Joel has done a splendid job of abstracting them into well-framed tools and polishing/refining/applying them. Focusing only on the expository potential of this work, it is akin to what the Alon/Spencer book and lecture notes do for the probabilistic method with the traditional tools. You can check out the video to see his highly polished, superb performance.
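As a toy illustration of the two principles (my own example, not from the lecture), here is a numerical check of an expected-norm consequence of the matrix Bernstein inequality on a Rademacher sum of fixed symmetric matrices:

```python
import numpy as np

def matrix_bernstein_bound(v, L, d):
    """Expected-norm style bound from matrix Bernstein:
    E||sum X_i|| <= sqrt(2 v log(2d)) + (L/3) log(2d),
    for independent, zero-mean, symmetric d x d matrices X_i with
    ||X_i|| <= L and matrix variance v = ||sum E[X_i^2]||."""
    return np.sqrt(2 * v * np.log(2 * d)) + (L / 3) * np.log(2 * d)

rng = np.random.default_rng(0)
d, k = 20, 500
A = rng.normal(size=(k, d, d))
A = (A + A.transpose(0, 2, 1)) / 2            # symmetrize each A_i
signs = rng.choice([-1.0, 1.0], size=k)       # Rademacher signs
S = np.einsum('i,ijk->jk', signs, A)          # S = sum_i eps_i A_i

norm_S = np.linalg.norm(S, 2)                 # spectral norm of the sum
L = max(np.linalg.norm(Ai, 2) for Ai in A)    # uniform bound on ||X_i||
v = np.linalg.norm(sum(Ai @ Ai for Ai in A), 2)  # since E[X_i^2] = A_i^2
bound = matrix_bernstein_bound(v, L, d)       # realized norm stays below it
```

The decomposition into independent summands is the first principle; the tail/expectation bound in terms of the variance parameter v and the uniform bound L is the second.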

ps: It was good to see Josh Bloom again. He is a man unconsciously full of remarkable quotes. He said, during our chat about the community of astronomers, "Astronomy is tiny".