Sunday, October 31, 2010

A Walk

On a down Sunday, I skipped work and went for a short walk. Saw a poem spread out over a block, giant graffiti, h'weeners waiting for the parade to begin, and the Soho store of blurb, the make your own book co. Now I can get back to work.

Thursday, October 28, 2010

Visiting Columbia IEOR Dept

Researchers of interest to us are hidden in many depts in a typical university (CS, IS, OR, Business, EE, whatever), some even 100 city blocks uptown. I visited the Industrial Engineering and Operations Research dept at Columbia U, and presented an hour long talk on Ad Exchanges. I bring a CS/Algorithms perspective to problems, but the researchers in OR/Stochastic optimization have different perspective that will be particularly useful for problems in AdX and I look forward to learning from them.
  • Nicolas Stier: Spoke about nonlinear optimization with complements and relationship to transportation systems; endogenizing smooth cost functions a la Klemperer's work on supply function equilibria; principle agents problem and contract theory for employer-employee relationship; models for net neutrality.
  • Mariana Olvera, Assaf Zeevi and Ciamac C. Moallemi, over lunch. Relative sizes of display markets.
  • Santiago Balseiro: Stochastic control modeling of AdX optimization problems.
  • Mariana Olvera: Stochastic recursion and explanations for power law distribution of PageRank based on indegrees. Mariana noticed that short ads are repeated sometimes to fill in long slots, ad scheduling can do better.
  • Vineet Goyal: Stochastic versions of query optimization problems. Electricity markets.
  • Chris Wiggins: Applied mathematician, physicist, machine learner, computational biologist and more. Among other things, we managed to catch up on Mark Hansen's sabbatical at NY Times R&D and art installations.
  • Jay Sethuraman: Reminisced about a SODA talk long time ago on combinatorial approach to computing determinants.
  • Garud Iyengar: He pointed out that AdX standardizes the goods that are traded in spot auctions, and we discussed how to standardize goods to trade in futures market and pricing challenges.
  • Garud Iyengar and Victoria Stodden ( a CS PhD + law, combining sparse approximation and legal/policy aspects of data, both of great interest to me): Dinner conversation on challenges with NSF's new data sharing policy.
Full day even without meeting my CS colleagues.


Wednesday, October 27, 2010


NY Area CS and Economics day was on Friday, Oct 15th at the NY Academy of Sciences. Organizers Christian Borgs, Michael Kearns, Sebastien Lahaie and Vahab Mirrokni put together a great program. The Academy and in particular, Jamie Kass, did a fabulous job of coordinating and hosting the meeting. There were 136 attendees. Besides researchers from local neighborhood as far away as Chicago, IL and Redmond, WA, there were preregistered attendees from Opera Solutions, Deutsche Bank, Benedetto Laskay LLC, Knight Equity Markets LP, Analytical Synthesis, Mayera Software Solutions, Delta Vega LLC, SpeakToMe Inc., Bloomberg LP etc.

Mihalis Yannakakis gave an excellent overarching talk on games from existence of equilibrium, uniqueness/multiplicity/selection of equilibrium, dynamics and convergence to computational issues of finding them. He started with the simple example of neural networks in which each node takes \pm 1, and a node is stable if it agrees with weighted majority of neighbors (edges have weights.) Hopfield 82 shows that all undirected networks have one or more stable configurations. Further, asynchronous dynamics in which nodes individually adjust their label eventually reaches stable config that can be seen using the potential \sum_over_edges w(u,v)s(u)s(v). So, stable config = local optima under node flips. This motivates class PLS: poly local search from JPY85. Neural network stable config finding is PLS-complete. Dynamics takes exponential time for any pivoting (local improvement) rule. [SY] Further, PLS can't be NP hard unless NP=co-NP. This is a game in which stable config == pure nash equilibria. The second example was congestion games. strategy: subset of resources; cost is sum of all costs of resources in a strategy; cost of a resource is number of players using it. Rosenthal: all congestion games have one or more pure equilibria. Again potential argument. There are games in PLS, but don't know if they are in P. Simple stochastic games due to Condon. 2 player, random max/min game. Equilibrium is a pair of strategies that are best responses to each other, value of the game = prob (one wins). There is pure stationary dist. Open: is it in P? The next example: every finite game has mixed nash eq.; proof uses Brouwer's FPT. Fixed points == mixed equilibria. Message is that many games => FPT. Basically solve x=F(x). Challenge is computation: continuous prob, irrational numbers in solutions. Class FIXP [EY]: problems cast as FPT for Brouwer functions computed in poly number of steps in a restricted computing model. 3-player nash is FIXP complete. tech: equate FP to nash equilibrium by translation+scaling only. It is in PSPACE, proof uses existential theory of reals. Lower bound: NP? open. Related to square-root-sum problem, also power of unit-cost arithmetic model (even with exp word size). We know unit cost integer arith model == PSPACE. For P-time piecewise linear functions => there exists rational FP. Can be achieved by not allowing mult, div == linear FIXP == PPAD. Simple stochastic game, market eq with separable piecewise concave utilities, etc. are examples. For market dynamics, complexity of price adjustment mechanisms -> PY10. Bunch of prob, math understood, not algorithmics. (During question time: q [cole]: neural networks, weights should have large number of bits? Y: natural dynamics will converge in time proportional to weights. q: will synchronic updates work with random updates for neural networks? Y: yes. will break cycles. exp lower bound holds for all of them, no matter what. speed of convergence of MDP open for 25 years, exponential lower bound.) Mihalis is a Master, gliding easily between algorithms, structural complexity and game theory in his presentation and answers to questions!

Sham Kakade spoke about optimal dynamic mechanisms. The problem is, seller wants maximum revenue, they can do whatever. Ignore incentives => dynamic stochastic optimization problem. Static => Myerson. Dynamic: maximize social efficiency has been considered by Parkes+Singh, Eso and Szentes? The talk was about dynamic mechanisms that maximize revenue. Membership mechanism: buyer picks price p, pays membership M(p), then gets all items at price p. Not direct mechanism, Buyer still faces hard problem because the value may change with time. Privacy: only scalar values revealed by Buyer, so has some privacy. q: explicit construction of M(p)? yes. delicate. reduces to dynamic VCG! M(p) somehow translates revenue to efficiency. In general, designing optimal dynamic mechanism is a hard problem, Sham obviously has a feel for the nuances of the problem, and presented a lot of insights. Robert Almgren gave a nice talk about ranking portfolios. There are multiple rankings. There exists a set of linear inequalities to express a return vector r. There is a cone of satisfiable r. Compare portfolios 1: w_1 > w_2 for all possible r's. Then, portfolios not always comparable. Compare portfolios 2: w_1 > w_2 for more r's than the other, so, compare in measures. All performances are in ratios, so even if different units along directions, doesn't matter. Use centroids of cones to rank. This talk generated a lot of questions, including one by Boaz Barak exploring the robustness of these measures when one splits a portfolio into smaller units.

From Vahab and Sebastien:
The rump session featured short talks spanning many different areas including auction design, computational game theory, social networks, and recommendation systems. Azarakhsh Malekian talked about computing Walrasian equilibria with general non-quasi-linear utility functions; Vasilis Gkatzelis talked about coordination mechanisms for machine scheduling; Gagan Goel talked about revenue maximization in TV ad auctions with submodular utility functions; Sharad Goel talked about learning social influence functions; Xi Chen talked about computing market equilibria in the presence of network externalities; Sanmay Das discussed experimental design to compare prediction market structures; Bhaskar DasGupta talked about stochastic budget optimization in ad auctions; Mohammad Irfan talked about influence games over social networks; Beibei Li presented a personalized hotel recommendation system; Tanmoy Chakraborty discussed market making in mean reversion price models; and Sven Seuken talked about user interface design for markets.

Bobby Kleinberg presented work on modifying algorithms to make them incentive-compatible in Bayesian multi-parameter settings. He argued that in many situations, we may come up with a sophisticated allocation algorithm to maximize social welfare, and we would like to convert our algorithm to a mechanism with the right incentives and the same social welfare. If we insist on dominant strategy equilibria, this is not possible, but if we settle with Bayes-Nash equilibria, it might be possible. Bobby described joint work with Jason Hartline and Azaraksh Malekian in which they give a way to convert any approximation algorithm for maximizing social welfare to an incentive-compatible mechanism with the same approximation factor under Bayes-Nash equilibrium. He described how they built on earlier work of Hartline and Lucier for the single-parameter setting, and gave an approximate scheme for the multi-parameter setting. He had this take-away message: "Converting approximation algorithms to incentive compatible-mechanisms might be easier than we thought, if we are happy with Bayes-Nash equilibrium as a solution concept."

Hal Varian gave a two-part talk about two cute problems in ad auctions and general mechanism design. First he talked about reformulating the standard sponsored search auction so that in the event of a click, other than charging the clicked advertiser the next bid, the auctioneer pays a fixed amount to the bidders who are ranked higher than the advertiser. This turns out to be an ingenious way to implement the VCG payment rule, and removes the motivation for underbidding as advertisers get paid by clicks from advertisers below them. In the second half, Hal discussed optimal prior-free auctions and the idea of computing an optimal price based on the bids of other users. This was inspired by the auction by the work of Baliga and Vohra. Through a simple family of instances, Hal argued that this can give a simple prior-free auction that incurs the right incentives and optimal revenue in most scenarios.


Sunday, October 17, 2010


NIPS 2010 Workshop on Machine Learning in Online ADvertising (MLOAD), December 10, 2010, Whistler, B.C. Canada

IMPORTANT DATES: Submission deadline: Oct. 23, 2010 Notification of Acceptance: Nov. 11, 2010 Camera ready: Nov. 22, 2010 Workshop Date: Dec. 10, 2010.

Online advertising, a form of advertising that utilizes the Internet and World Wide Web to deliver marketing messages and attract customers, has seen exponential growth since its inception over 15 years ago, resulting in a $65 billion market worldwide in 2008; it has been pivotal to the success of the World Wide Web. This success has arisen largely from the transformation of the advertising industry from a low-tech, human intensive, “Mad Men” (ref. HBO TV Series) way of doing work (that were common place for much of the 20th century and the early days of online advertising) to highly optimized, mathematical, machine learning-centric processes (some of which have been adapted from Wall Street) that form the backbone of many current online advertising systems.

The dramatic growth of online advertising poses great challenges to the machine learning research community and calls for new technologies to be developed. Online advertising is a complex problem, especially from machine learning point of view. It contains multiple parties (i.e., advertisers, users, publishers, and ad platforms such as ad exchanges), which interact with each other harmoniously but exhibit a conflict of interest when it comes to risk and revenue objectives. It is highly dynamic in terms of the rapid change of user information needs, non-stationary bids of advertisers, and the frequent modifications of ads campaigns. It is very large scale, with billions of keywords, tens of millions of ads, billions of users, millions of advertisers where events such as clicks and actions can be extremely rare. In addition, the field lies at intersection of machine learning, economics, optimization, distributed systems and information science all very advanced and complex fields in their own right. For such a complex problem, conventional machine learning technologies and evaluation methodologies are not be sufficient, and the development of new algorithms and theories is sorely needed.

The goal of this workshop is to overview the state of the art in online advertising, and to discuss future directions and challenges in research and development, from a machine learning point of view. We expect the workshop to help develop a community of researchers who are interested in this area, and yield future collaboration and exchanges.

Saturday, October 16, 2010

Ad Exchange (AdX) Update

Well, it has been a year+ since DoubleClick Ad Exchange launched. Here is an update: it made the top 10 for 2010 digital hotlist. Others on the list include facebook, twitter, zynga, iPad... Understandably, AdX is down in the list in this great company (just made the 10th). Go AdX! Here is the entry:

Advertisers are abandoning traditional buying models on the Web in favor of buying specific audiences all over the Internet using a wealth of data. Nearly every ad agency worth its salt has established a trading desk to purchase this highly targeted ad inventory at scale in real time. Ad exchanges are at the heart of this macro trend, with Google's DoubleClick Ad Exchange having emerged as the clear leader. The exchange, which rolled out last fall, trades inventory from 50-plus ad networks that represent thousands of Web publishers large and small. Naturally, individual publishers also come to the exchange to directly unload leftover avails. Per Google, those publishers enjoy prices for their ad space that are an average of 130 percent higher than those achieved by ad networks. As inventory becomes more abundant across the Web, the importance of exchanges will only expand. Google's competition in the space will grow too, as independent exchanges like OpenX gain traction.

• Hundreds of thousands of advertisers
• Number of transactions tripled over past year
• Average CPM lift of 136 percent compared with standard ad inventory, per Google research

Congratulations to the team. Here is a link to all my previous posts on AdX.


Wednesday, October 13, 2010

NY Area Meetings

NY Area Computer Science and Economics III will meet this Friday, Oct 15th at NY Academy of Sciences. From what I sense of the buzz, I expect a very good audience.

NY theory day will be on Nov 12. Great set of speakers!

New York Area Theory Day. Organized by: IBM/NYU/Columbia. External sponsorship by: Google. Friday, November 12, 2010

The Theory Day will be held at Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, Auditorium 109, New York.

9:30 - 10:00 Coffee and bagels
10:00 - 10:55 Prof. Boaz Barak. Subexponential Algorithms for Unique Games and Related Problems
10:55 - 11:05 Short break
11:05 - 12:00 Dr. Matthew Andrews. Edge-Disjoint Paths via Raecke Decompositions
12:00 - 2:00 Lunch break
2:00 - 2:55 Prof. Ryan O'Donnell. Optimal Lower Bounds for Locality Sensitive Hashing (except when q is tiny)
2:55 - 3:15 Coffee break
3:15 - 4:10 Prof. Toniann Pitassi. Pan Privacy and Differentially Private Communication Complexity

Organizers: Yevgeniy Dodis, Tal Malkin, Tal Rabin, Baruch Schieber


Tuesday, October 05, 2010

Romance Leads to Insights

So, there is a pattern.
  • In "A Beautiful Mind", John Nash is shown having his key insight for game theory pop up when discussing dating. "If we all go for the blonde and block each other, not a single one of us is going to get her. So then we go for her friends, but they will all give us the cold shoulder because no on likes to be second choice. But what if none of us goes for the blonde? We won't get in each other's way and we won't insult the other girls. It's the only way to win."
  • In "Social Network", Mark Zuckerberg rushes out when he gets the idea for the "relationship status" option in a conversation about someone being single.
  • I am sure Newton was thinking about something when the apple (making a cameo since Eve) hit him on his head.
ps: Btw, "Social Network" is engaging (sharp dialog, good acting).
ps: Simpsons 22-2, Loan-a-Lisa, has a Mark Zuckerberg portrayal. Not as good as the Mapple store episode.

Sunday, October 03, 2010

NY: Relying on Strangers

Friday, October 01, 2010

Video Diptych

I missed doing triptychs. Here is a diptych, the third leg would have to wait. On the left, Cirque dazzles; on the right, Rebekah Del Rio crushes illusions.

Update: Here is the third leg, Nina Simone talks, 6 min into it and it is no longer just talkin.

Multi objective Optimization: The String Example

In algorithms research, we formulate problems as optimizations. In reality, there are many objectives. If world were simple, we would optimize the right parameter. Else, we have to bound some of the parameters, and optimize wrt the others.

In a recent plenary talk at ESA 2010, Paolo Ferragina explores the multiple parameters in string indexing data structures: running time, memory hierarchy, compression, and now, it seems energy efficiency, and mobile. The talk connects the theoretical string data structure problems to practical performance and to modern data systems like Cassandra (facebook), Cosmos (Bing) and HyperTable (Bing).