Sunday, December 07, 2014

Data Science and Online Ads: Panel in NYCE 2014

Thanks to Arash Asadpour, Mohammad Hossein Bateni, and Alex Slivkins for organizing the 2014 NY Area CS and Econ (NYCE) day in NY. I will let the organizers blog about the day; it was a superb program.

I organized a panel on Data Science in Online Ads. The panelists are stars and represented a constellation of perspectives in this complex ecosystem.

  • Matt Curcio is from Neustar (via Aggregate Knowledge). He builds and supports a neutral data platform for advertisers to gather and analyze ads data. His work is remarkably broad, spanning everything from data streaming to data privacy. He spoke about the challenge of getting data scientists to collaborate, no matter which company they work for.
  • Chris Wiggins is now the Chief Data Scientist at NY Times. He spoke about data products at NYT and estimating the Long Term Value (LTV) of users. He also talked about placing house ads as an example of reinforcement learning.
  • Aparna Pappu runs AdX at Google. She focused on AdX and described the goal of a fair transfer of value from advertisers to publishers. She mentioned many specific data issues: there are gaps in their data since they don't observe all online events; data viz is hard; there is an asymmetry of information since advertisers may know more about users than any specific publisher; AdX cannot share data equally with all parties; and finally, the data they have is so diverse that it is hard to find natural segmentations of publishers and advertisers.
  • Paul Barford is now the Chief Scientist at Comscore, Inc., following its acquisition of his company Mdot. He described his consulting experience at BIM, a publisher network, which led him to the problem of detecting click fraud. He observed that when systems are complex and money is involved, there will be bad actors, i.e., fraud is a problem. Further, he noted that data science starts with measurement, and that it is hard to gather data from ad platforms.
  • Neal Richter is now the CTO at Rubicon Project. He spoke about how ad sales are becoming automated. With petabytes of analyses over 200B transactions a day, one of his challenges is making the analyses and conclusions explainable to others, including biz folks.
  • Catherine Williams is Head of Data Sciences at AppNexus. She described AppNexus as the largest independent (non-FB/GOOG) programmatic media company, one that is not involved with PII. AppNexus has a performance marketplace, which is nice. She spoke about the challenge of setting suitable incentives for various types of content and pushed us to consider freedom-of-speech issues when we favor one type of content over others.
  • Claudia Perlich is Chief Scientist at Dstillery. She spoke about the predictive modeling and machine learning challenges in prospecting for ad targets. In particular, she pointed out that it is not so much about predicting whether you will buy X as about convincing/converting you to buy X. She quipped that, judging from their lat/long data on users, 30% of the US population travels faster than the speed of sound! She also cautioned against optimizing artificial metrics on ad platforms, and talked about the challenges of getting performance data from networks.
  • Jon Krohn is a Data Scientist in the orbit of Omnicom, a large media company. He started with the observation that you need data to spend money well, and went to the board to draw the "river of money" flowing from advertisers to agencies and media companies like his, and eventually to publishers, with the dollars dwindling along the way as 20-30 hands touch each transaction.
I summarized their presentations, and a discussion ensued:
  • Costis Maglaras asked whether the ad market is going to be like DJ, with a small transaction cost, or like Christie's, with an XX% cut. Goods in ads are ephemeral, valued differently by different parties, and can't be re-traded, so it is not clear that financial analogies apply.
  • Vahab Mirrokni asked whether the ad market is converging to reservations/allocation or to auctions. Catherine mentioned that platforms like AppNexus support many different types of markets, from reservations to private packages/deals to RTB and performance.
  • I asked whether a large distributed ML package that automatically searches over models and parameters would suffice for the ad business. The answer was no, because information is incomplete, players may not be rational, there is no single optimization objective, the signal is weak, the targets keep moving, etc.
  • I asked why more microeconomic concepts, like substitutable goods, hadn't penetrated ad markets. One reason is that publishers don't think their inventory is substitutable; another is that there are handcuffs around data ownership and privacy issues, so data permissions don't allow this information to be used. Paul Barford said Comscore is an exception when it comes to data, and he was willing to work with academics on data access.
I now have the formula for a great panel: recruit great professionals, let them go, and sit back. I enjoyed the panel immensely. I hope researchers connect with the folks above; there is a lot we can gain. It was good to sneak into the NY academic scene, if only briefly.