Sunday, July 31, 2011
Algorithms in The Field (Update)
- Algorithms in The Field continues to be in focus. The new call from the Algorithmic Foundations division of NSF has it.
- Thanks to Tanya for the tip, there is a center emphasizing "field academics" with an interesting graphic.
Two hats from Twitter
Friday, July 29, 2011
Dan Boneh (who organizers credited with quarterbacking this meeting) started the workshop and introduced the amazing crypto+security group at Stanford, seminar series, mailing list, and link to course notes and certificate course programs.
Ed Felten spoke next, communicated his personal goal to "write a few lines of code everyday" (in my case, when I was a graduate student, the mantra was, "learn one new lemma a day") but talked about his policy work at FTC for monitoring businesses. He communicated how FTC can enforce: deceptive use (misleading claims or use by companies) is easy to regulate, unfair use is harder to define, and in general, before an issue can come up for FTC enforcement, researchers have to go beyond research and develop support from general population for their case.
One of the anticipated talks was by Omar Tawakol of Bluekai. He spoke about how context and sites are no longer the key in Internet advertising, and user data was the new fuel. He stated the premise of his company that data is too much in shadow, consumers are smart and can handle the complexity of data, and how they support opt-out cookies, open source and so on. How much will users pay for ads free: little. He phrased a technical argument that the plumbing you need for (a) user conversion tracking where one measures whether user actually bought/converted some time after seeing an ad, (b) retargeting, the most effective form of advertising after search, where a user seen at once place is targeted elsewhere, and (c ) frequency capping, where an advertiser limits how many times a user sees an ad, are all the same. (a) and (c) are very legitimate uses for advertisers. Does that justify (b) too? Without 3rd party cookies behind (a-c), in particular (b), he claimed that 70--80% of 7B spend on display ads will be at risk. Tradeoff of free content vs ads is not fair because users don't know what they are trading for the free content; if you aren't paying for it, you aren't the customer, you are the product; why don't we turn off targeting?; why not turn off ads altogether? He proposed that these were all good questions. He tried out a technical argument that if 50% turn off ads, other end up paying for their free loaders content and that is not fair. Not clear this argument flies. He said Bluekai has a dashboard so users know what is known about them. He further pointed out that k-anonymity of a dataset does not help quantify the damage when two or more datasets are combined; he argued for some "smart noise" that will obfuscate the data and still keep it useful for advertising. Questions: common pipe may be avoided by doing (a) and (c) at the client browser, without doing (b); q: what is the marginal value of tracking users; etc.
Russell Glass of Bizo spoke about targeting business professionals. He pointed out also that you have to see lot of ads to make small amount of $'s (margins are small in data/display ads). He argued that what the industry needed was certainly regulation, but that will not provide "trust" which is sorely needed.
Dean Hachamovitch spoke about Internet Explorer and argued it was good at blocking malware and pointed to IETestDrive.com for consumer protection. He spoke about the list data structure behind how users can specify track/dont track sites. Knowing he was talking to research audience, he posed some directions: How to describing and visualize information flow between sites; how to generating, validate and value tracking lists for curators; etc. Questions: couldnt multiple tracking list conflict each other in white/blacklisting same site? Q: Do you see visualizing information across sites as a browser problem? He wanted a more general solution across vagrancies of the browser.
Narayanan spoke about Adnostic where behavioral profiling and targeting takes place in the user's browser. The ad network remains agnostic to the user's interests. Jonathan Mayer spoke zealously about DONT Track feature (8 byte request in http header?). Paul Francis spoke about privad, a privacy-preserving way to see and be shown ads. Nina Taft did a great job of describing research goals of the new Technicolor Labs in Palo Alto (great location, people and interns!). She had math on her slides, describing differentially private SVM algorithms. She also posed problems: how to model users in a home entertainment scenario; how to do social recommendation at home where users share resources; etc.
Altogether, a good set of talks, a very large audience (200+?), and an important set of topics, I enjoyed the workshop.
Sunday, July 17, 2011
In a NY/NJ train, the youth in baggy clothes sprawled on the seat in front of me, turns to me and asks urgently (which I understood after asking him to repeat it a couple of times), "how do you spell, shield, man, shield, you know, shield?". I spell it, and he types it into his SMS. Shield? Um....
The blonde woman dressed for the club sits next to me at Starbucks and yells into the phone (profanity withheld), "I am a lousy driver, I dont drive, why dont you just tell me where to go, am parked like right here". Long conversation like this, help from me, checking GPS etc, and then she tells me, "Oh, I am so close to my friend's place, it is only 3 miles". In NY city, in 3 miles, you can go from Upper West to Upper East, Village to Midtown, Apollo theater to City center, all along wondering why you would want to do it!
On a NY/NJ train, at a certain station that locals can guess, a man gets on, sits behind me, looks around to take in the scene, watches me read the newspaper, leans over in the little while and asks, "that the wall street journal?", I nod yes, and he says, "they shrunk the size?", I say, "yes, a while ago", and he says, "yeah, been inside for long" and chuckles.
In San Francisco, I get a drink, and this old gentleman engages me in a conversation. We talk about many things and at one point, speaking of why his PhDed son didnt choose to work in research, he says, "you have to really want to write papers you know, and publish them." Simply said, and true. Later, I ask this 70+ year old man what he wants to do, and he says, "Go to NY city, don't know, stay there for long, for two weeks or something, and soak it in."
Thursday, July 14, 2011
Demos, Bs and word humor
Today over lunch I faced beer pressure. (no typos there)
I get confused between the B "triangle of beaches: Belize, Bermuda, Bahamas and Barbados.
Tuesday, July 05, 2011
SIGMOD/PODS at Athens
- Tova Milo gave a nice plenary talk on business process data management (imagine queries to travel agency like, can I get a price quote without giving first my credit card details?), nearly all aspects from data models to query language to logic, optimization, provenance and structural/module privacy. More info here.
- I spent some time with Raghu and Divy. Raghu has done everything, and it is great to pick his brain about the potential evolution curves of MapReduce, BigTable/Cassandra/ and others. Divy is super open to discussion on any topic, and we continued our past discussions on auction and pricing, this time more focused on selling data. I also spent time with Divesh, charming, incredibly knowledgeable with footprint all over SIGMOD as usual, and the ever-clever Alon Halevy (I cant break his story yet!).
- In PODS business meeting, it was suggested that number of emails be used as a measure of complexity of organizing a meeting.
- Finally, PODS overlapped with the day of protests and general strike in Athens. Normally the Athenians (and Greeks in general) have an alluring air of anarchy about them, and this day exaggerated the effect. I mistakenly walked into the intersection with tear gas, and not only teared and had a serious headache afterwards, but also spent the flight back to NY with no feelings on my left leg. Notwithstanding that, I was happy to be in Athens where the modern world began.
Monday, July 04, 2011
FCRC (Google Meeting)
- Infrastructure: A web service request gets translated into 100's of parallel machine requests. Rare events (1 in million) happen! Fiuring out timing of processes across machines is a challenge. Performance problems in reality is hard to reproduce in labs because of the scale and complexity of real life.
- Cluster management: external view of the cluster is simple, internal view not as much. Each service depends on scores of others. Anything new introduced makes some people's life better, everyone's life simple bit worse. How do you move an application from one data center to another, when each depends on scores of other services? Failures are a reality, make them first class objects in thinking about systems. Load variations are enormous, 100B to 6 TB per request in some cases.
- Profiling systems at scale. Goal: 0% overhead for profiling across systems, platforms, languages, versions, and it has to be stable and everywhere? Very few benchmarks because world constantly changes and cant be reproduced. Use lot of open source tools and publish them.
- Social: social interactions are more than 2 people, or edges. Social behavior does not happen on graphs. Directionality, symmetry, organizational structure, etc. Are users comfortable with sharing: any measure of this and associated utility?
- Android security: Malware problem due to being popular and also open (due to large numbers). A problem: find andorid malware (client side? power-aware?). Malware perform redundant operations to be deceptive. Prob: Strong authentication methods for cellphone as a personal authenticated device?
- Bayesian vs worst case values. Of course! We discussed Leonard Savage's work on Bayesian assumption. Do advertisers know values? Who knows priors, bidders or mechanism. Can we design mechanisms which will work in adversarial as well as Bayesian priors nicely?
- Multidimensional revenue optimal mechanisms, problem to solve. Correlated bidders.
- Auctions that optimize not expected revenue but risk aversion (see eg. Mukund + Qiqi's work), say concave utilities and say other stochastic measures besides expectation? Challenge is also conceptual, what is a suitable benchmark?
- Automated mechanism design?
- Repeated games. 2 bidders with budgets, repeated second price auctions, what happens? Bidders learn, CSists might have a better handle on this theory than say Economists.
- How to integrate auctions and machine learning? Of course! Variations in input vs predictions needs to be modelled. Online solutions vs where ML applies.
- Big problems, eg., can we provide guidance on how science budget should be allocated among various disciplines, or NSF CS budget among different areas? Given a subset of researchers, say we can estimate their impact on the society when funded. Given this oracle, can we allocate funds to people to maximize social welfare? Can we model people switching teams in second round or open bid systems for reallocating funds? Q: Why doesn't NSF give $'s to 2 teams for the same project and get them to compete? For some recent work, see the work of Shay Kutten, Ron Lavi and Amitabh Trehan.
- Level k thinking and dynamics. We discussed 2/3rd of the average game (did you know, there was a museum of money?). If you have a wrong model of competition (say you think you play against MIT students, but we are not), then you can do wrong things even if you are rational. Vahab mentioned several results on level k dynamics and ad auctions.
- Assumption about others. Can one design a mechanism that is independent of assumption about others, robust to beliefs, solution concepts, and despite sophistication of bidders. For ex, some black box where myopic learning leads to good equilibrium?
- Abstract a model, deisgn incentive-compatible mechanism, go to real world, property doesnt hold. We need a notion of how incentive-compatible are bidders. Need a useful analog of approx for incentive compatibility.