Two hats from Twitter
People observe twitter for various reasons. I recently observed that in the research world, twitter, because of its policies that enable easy data access, has a great mindshare (just search for twi in ICWSM 2011). This blog is about two talks on twitter, by Ashish Goel, with two different hats worn comfortably.
Talk 1 was at Workshop on Ad Auctions on Monetization. He started with the observation that twitter supports both highly individualistic view so you can follow an individual or the aggregated view via tags and trends of many. He spoke about 3 products: promoted accounts (on the right, to follow), tweets in search (on top) and trends (on the right, via hashtags). The guiding philosophy with promotions is that they should conform to user's expectations of tweets and have follow-on effect. Ex, promoted trend leads to conversations, promoted accounts continue to deliver future impressions, promoted tweets lead to retweets, etc. Ashish released some impressive numbers on how these efforts were going, and few examples.
He then spoke about challenges. On the strategic side, there is need for novel promotion ideas. Twitter is not about time spent on the site, so these should be beyond display ads, and encourage twitter experience and follow-on. On the algorithm side, challenge is to work with a short window of time/data, and estimate one's current interest as well as estimate future follow-on effects of promotions, machine learning at the speed of thoughts/tweets. On the auction side, the challenge is to design pricing and allocation for extremes like promoted trends which has to reach many, and promoted accounts that are highly liquid. This talk was structured perfectly, with an overview of the products, main philosophy shown through the interaction between these products, specific numbers and examples, followed by what the audience wanted, namely, technical challenges. Nearly everyone in the audience appreciated one slide, where Ashish gave examples of suggestions from the research community that will NOT be valuable, these are throwoff ideas like, use keywords in tweets to target ads or show promotions in timeline or offer coupons, etc. Chances are twitter which has smart full time folks thinking about these issues would presumably have analyzed the many immediate suggestions already.
Talk 2 was at SIGMOD, where Ashish gave the keynote industry talk, on data models and management issues at Twitter. He started with the mission statement: instantly connect people everywhere to what's meaningful to them.
Problem 1: How to deliver tweets (several k's a sec, each going to more than avg degree of users, and has to be aggregated into timeline). A natual data model will compile tweets into tuples and apply map and reduce (MR) operations. However one needs some sort of continuous mapreduce (MR) system to be real time, but such systems are mainly in academic stage and robust versions have to be designed and made available at large. You could also discover duplicate tweets with a combination of continuous MR and locality-sensitive hashing (see some inspiration here). Problem 2: Search, real time relevance ranking of users and tweets, personalized. In the future, social graph and resonance scores. Some discussion here. Problem 3: Incremental Pagerank, maintain pagerank as edges, nodes arrive. He referenced the VLDB paper with roughly O(nlog n) solution where n is number of nodes, and m, the number of edges has minimal role, improving on the trivial O(nm).
The second data model Ashish explored for these problems was active DHTs (distributed hash tables with user-specified code) which are like continuous MR but with reducers able to communicate with each other. He then discussed maintaining suitable random walks on social graph via active DHTs (see here for db details) and its uses from incremental to personal pagerank. The third model Ashish spoke about was semi-streaming where there is memory to store row-IDs and some properties, but not all matrix entries. This seemed more suitable than "clever" graph partitioning because social graphs seem to be expanders that change rapidly. Many recommendations problems will be of interest in semi-streaming. He summarized with the excitement of mix of database and algorithmic challenges at Twitter.
Summary: These two talks, juxtaposed in time, were among the most informative and insightful I have heard of a tech business, able to move between sound business and solid technical concepts as needed, a rare and impressive achievement for a theory researcher. Ashish is one of those speakers who you appreciate when you hear their arguments, but with every question you ask, he peels more technical layers, and you appreciate how much more he knows and how much more depth there is, in other words, his talk is more than the total of his slides.