Sunday, March 01, 2015

Few Good Lines

Over the past few weeks, people asked me for a few lines on ...
  • The best line from a movie, "Boards don't hit back". 
  • On Valentine's: every time I would buy coffee beans, I would buy a new kind, and I would come downstairs in the morning, grind them, make myself a cup, and make sure it tasted OK before the rest of the family woke up. 
  • On Family: Spending 3 hrs washing, wiping, drying and transporting $100 IKEA wood tiles, doing it together as a family. 
  • On Being a Parent: When you are any fraction x into the hike, be prepared for the remaining 1-x, to carry the food, drink, jackets, and her scooter on your back, and your kid on your shoulder while she sings "Let it go". 

Weekend Update

What was my Saturday like? I made myself some black coffee and pancakes. I couldn't bring myself to read the news; far too many unfolding events in the world had the potential to distract me. Instead, I washed, sanded and primed the roughly 50 ft x 5 ft fence. I needed something on the sidewalk to catch the drips and decided to buy a weekend WSJ. And while I may have been Tom, and the art of painting is lustrously seductive, no neighbors volunteered to swap their treasures for my brush. And when I was done, I cleaned up with thinners, choosing to leave some white primer on my beard, which no one noticed. 

Sunday, January 25, 2015

Rob's 60th

Rob Calderbank is a mathematician of great caliber in coding and information theory, including its algebraic aspects. He is the winner of the Shannon Award and the Hamming Medal. Rob's 60th birthday celebration will be in La Jolla this weekend. He was a terrific mentor to me at AT&T in years long behind us, and I am looking forward to being at the celebration!

Monday, January 19, 2015

Again, On Haruki

I am a schoolteacher in Ashiya (Hyogo Prefecture). Each year, while the students stay the same, I age towards something, maybe my end, who knows.

I teach the English I know, and mostly avoid the new. This year, as usual, I discussed the Tell-Tale Heart by Edgar Allan Poe and asked the students to write something inspired by it. Tell-Tale Heart is the story of the narrator who decides to murder the Old Man, and slowly, surreptitiously puts his head inside the Old Man's bedroom in preparation: "Oh, you would have laughed to see how cunningly I thrust it in! I moved it slowly --very, very slowly, so that I might not disturb the old man's sleep. It took me an hour to place my whole head within the opening so far that I could see him as he lay upon his bed. Ha! would a madman have been so wise as this? And then, when my head was well in the room, I undid the lantern cautiously --oh, so cautiously --cautiously (for the hinges creaked) --I undid it just so much that a single thin ray fell upon the vulture eye." Thus begins the tale that ends in a scream of "tear up the planks! here, here! --It is the beating of his hideous heart!". The students came up with the usual (a first-person narrative, an unreliable protagonist's plan for a murder) and the usual unusual (a reverse perspective of the Old Man sitting absolutely still in darkness, anticipating the single thin ray of the lantern). I then did what I have done each year since the early '60s: read to the class a few selected student pieces from my past. When I read "Elevator, Silence" by my student Haruki, the class excitedly told me that Haruki had just published that story.

Well, "Elevator, Silence" is a short story about a man riding an elevator: he can't tell if it is still or moving, it has no control panel, it seems hermetically sealed, his coughing and whistling produce no sound; altogether a scary predicament. He whiles away the time by counting the change in his pocket: "I always come prepared with pockets full of loose change. In my right pocket I keep one-hundred and five-hundred yen coins, in my left fifties and tens. One-yen and five-yen coins I carry in a back pocket, but as a rule these don't enter into the count". The story ends with terror, "The only possibility was that they had intentionally placed me in this particular situation. They wanted the elevator's motions to be opaque to me. They wanted the elevator to move so slow that I wouldn't be able to tell if it is going up or down."

Now it is alright as a student piece, but I don't know how to make that into a published story. Maybe Haruki added more to it, introduced a chubby woman or a unicorn, or maybe a plot about the End of the World. He didn't strike me as an imaginative boy, but people grow up, they start operating bars or writing novels. I haven't read Haruki's story yet, but I am not sure modern writing has anything to teach these children. I don't let decades intrude on my classroom.

Sunday, December 21, 2014

Setting 2014 down

Happy holidays everyone! I hope you set 2014 down gently and meet 2015 with energy.

Spurred by an offline conversation, let me add: Some people make the days of their lives and their instant decisions sound difficult, some infuse them with gravitas, yet others with visions of achievement, drama, seriousness, etc. I work very hard to package my grimed fingernails, sleepless sumping and sweat of research, work and relationships, submerge the package, stand on top of it, and sketch aperçus of fun, art, smiles and puzzles. That is what I am, every year. 

Monday, December 08, 2014

On Urban Planning and Story Telling

When I was in University (it doesn't matter which prefecture), I enrolled in a class in Urban Planning. Now I can imagine many reasons why I might have done that: I was 20 yrs old, and I focused more on easy grades and good-looking fellow students than on learning. Whatever the reason, I didn't really make it to the classroom all semester except once. That day, I was drinking my tea in the student center as usual, reading the Monkey King, and puzzling over not being able to recall the monkey's name. I happened to talk to a student, she was easy on the eyes, our conversation flowed and before I knew it, I was accompanying her to her class, which coincidentally turned out to be Urban Planning. Her father was a government official in charge of the local Dept of Buildings, and she really cared about Urban Planning. I don't know why, but to this day I remember what happened in the class. The professor taught us about zoning (how buildings have to be set back a fixed amount from the street) and water runoff (how to build catch basins and French drains to capture runoff from neighbors).

I told this to my friend Haruki in college, and he later told me he wrote a short story about it. I didn't think I had much of a story but I read Haruki's and you know, he is a real writer, he can imagine things I can't even contemplate, his story was creative and went places my mind couldn't be dragged. In the end, it was not my story at all, it could only have come out of Haruki's mind.

But my story continues. Years later, I bought a place that needed a lot of work. I could easily build my own fence because I knew what the setback was, and I built a catch basin too and watched the runoff from my neighbor's property.

Sunday, December 07, 2014

Data Science and Online Ads: Panel in NYCE 2014

Thanks to Arash Asadpour, Mohammad Hossein Bateni and Alex Slivkins for organizing the 2014 NY Area CS and Econ (NYCE) day in NY. I will let the organizers blog about the day; it was a superb program.

I organized a panel on Data Science in Online Ads. The panelists are stars and represented a constellation of perspectives in this complex ecosystem.

  • Matt Curcio is from Neustar (via Aggregate Knowledge). He builds and supports a neutral data platform for advertisers to gather and analyze ads data. He is remarkably broad, using everything from data streaming to data privacy in this work. He spoke about the challenge of getting data scientists to collaborate, no matter the company they worked for. 
  • Chris Wiggins is now the Chief Data Scientist at NY Times. He spoke about data products at NYT and estimating Long Term Value (LTV) of users. He also talked about placing house ads as an example of reinforcement learning. 
  • Aparna Pappu runs AdX at Google. She focused on AdX and described the goal of fair transfer of value from advertisers to publishers. She mentioned many specific data issues: there are gaps in their data since they don't observe all online events; data viz is hard; there is asymmetry of info since advertisers may know more about users than any specific publisher; AdX cannot share data equally with all parties; and finally, she spoke about the great diversity of data they have, which makes it hard to find natural segmentations of publishers and advertisers. 
  • Paul Barford is now the Chief Scientist at Comscore, Inc. following their acquisition of his Mdot. He described his consulting experience at BIM, which is a publisher network, that led him to the problem of detecting fraud clicks. He said, when systems are complex and there is money involved, there are bad actors, i.e., fraud is a problem. Further, he noted that data science starts with measurement and it is hard to gather data from ads platforms. 
  • Neal Richter is now the CTO at Rubicon Project. He spoke about how ad sales are becoming automated, and said a challenge in the petabytes of analyses he does with 200B transactions a day is to make the analyses and conclusions explainable to others, including biz folks. 
  • Catherine Williams is Head of Data Sciences at AppNexus. She described AppNexus as the largest independent (non FB/GOOG) programmatic media co. and not involved with PII. AppNexus has a performance marketplace which is nice. She spoke about the challenge of suitable incentives for various types of content and stretched us to consider freedom of speech issues when we emphasize one type of content over the others. 
  • Claudia Perlich is Chief Scientist at Dstillery. She spoke about the predictive modeling and machine learning challenges in prospecting for ad targets. In particular she pointed out that it is not as much about predicting if you will buy X as it is about convincing/converting you to buy X. She quipped that, from their lat/long data of users, 30% of the US population travels above the speed of sound! She also talked about how not to look at artificial metrics to improve in ad platforms, and the challenges of getting performance data from networks. 
  • Jon Krohn is a Data Scientist in the orbit of Omnicom, a large media company. He started with the observation that you need data to spend money well, and went to the board to draw the "river of money" from advertisers to agencies and media companies like his, and eventually to publishers, with $s dwindling along the way as 20 to 30 hands touch the transaction. 
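Claudia's quip about supersonic users is easy to reproduce in miniature. Here is a minimal sketch, assuming each location record is a hypothetical `(timestamp_seconds, lat, lon)` tuple (none of these names come from the panel): compute the great-circle distance between two consecutive pings with the haversine formula and compare the implied speed to roughly 343 m/s, the speed of sound in air at about 20 C.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, in meters
SPEED_OF_SOUND_M_S = 343.0     # approximate, dry air at about 20 C


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/long points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to guard against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def implied_speed_m_s(ping_a, ping_b):
    """Speed implied by two pings; each ping is a hypothetical (t_seconds, lat, lon)."""
    t1, lat1, lon1 = ping_a
    t2, lat2, lon2 = ping_b
    dist = haversine_m(lat1, lon1, lat2, lon2)
    dt = abs(t2 - t1)
    if dt == 0:
        # Same timestamp: no motion if same place, otherwise an impossible jump.
        return math.inf if dist > 0 else 0.0
    return dist / dt


def is_supersonic(ping_a, ping_b):
    """True if the user would have had to move faster than sound between pings."""
    return implied_speed_m_s(ping_a, ping_b) > SPEED_OF_SOUND_M_S
```

A user seen in New York and then in Los Angeles a minute later is flagged; the same pair six hours apart is not. Pairs like the first usually say more about the data (coarse or wrong geolocation, shared IDs) than about the user.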
I summarized their presentations. Discussions ensued:
  • Costis Maglaras asked, is the ad market going to be like DJ with small transaction cost or like Christie's with XX% cut? Goods in ads are ephemeral, valued differently by different parties and can't be retraded, so it is not clear that financial analogies apply. 
  • Vahab Mirrokni asked, is the ad market converging to reservations/allocation or auctions? Catherine mentioned that platforms like AppNexus are supporting many different types of markets from reservations to private packages/deals to RTB and performance. 
  • I asked if a large distributed ML package that searches automatically over models and parameters will suffice for the ad business. No, because information is not complete, players may not be rational, it is not a single-objective optimization, the signal is weak, the targets are moving, etc. 
  • I asked why more microeconomic concepts, like substitutable goods, didn't penetrate ad markets. This is because publishers don't think their inventory is substitutable, and there are handcuffs around who owns data and privacy issues, so data permissions don't let this info be usable. Paul Barford said Comscore is an exception when it comes to data, and he was willing to work with academics on data access. 
I now have the formula for a great panel: recruit great professionals, let them go, and sit back. I enjoyed the panel immensely. I hope researchers connect with the folks above; there is a lot we can gain. It was good to sneak into the NY academic scene, if only briefly. 

Saturday, November 29, 2014

Workshop on Graph Streams (Sandia/DIMACS)

Here are some notes from the workshop, superbly organized by the Sandia team.

  • Workshops are hard to organize, and you have to have a large purpose to do the work. The Sandia team of Bruce Hendrickson, Jon Berry, Cynthia Phillips, and others has a scholarly attitude, which was truly refreshing. There is genuine interest at Sandia, from US Govt IP network (mix of classified, unclassified, specialized) monitoring applications to new theoretical graph stream models, and an empirical approach based on setting up synthetic datasets, benchmark tools and infrastructure systems. I didn't know Livermore has a Sandia lab, with Kevin Matulef, C. Seshadri and others. This is some nice research horsepower and brain power for streaming research at Sandia. They had an uber-data context: some stored data, some sampled, some streaming hose, and how to process them all with a combination of multiple machines, cloud, etc. I will wait for Jon to put his slides online, where this model was clearer. 
  • Attending a workshop even for a day is a welcome break to think about problems. Here are some vague questions. Somebody out there may have something to say (incl. shooting down the problems): (a) Say characters of a string arrive online; produce a uniformly random sample substring. Detail: it should be good for the string seen thus far, and represent the substring by an O(1)-sized representation of its left and right endpoints, ... (b) The contents of a file are sent by breaking it into substrings in IP packets, but substrings are sometimes repeated, and sometimes substrings overlap in arbitrary ways (due to TCP resend). Is there a coding/decoding solution that trades off coding quality for sublinear-space reconstruction? (c) Each new stream item is a string. We have to find substrings of each stream item that have appeared many times thus far. If the length of each string is L, can you avoid doing O(L^2) work per item and/or use space less than exponential in L? 
  • Distractions. Cindy said, "that is the last edge that broke the camel's back". Sudipto used the phrase, "the right side of Buddha". Madhav could not attend the workshop because he had to respond to the Ebola threat. 
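To make question (a) concrete, here is a minimal sketch of the most literal reading, where "uniform" means uniform over all n(n+1)/2 non-empty (left, right) endpoint pairs of the n characters seen so far. This is only a reservoir-style baseline under that assumption, not an answer to whatever harder variant is hiding behind the "..."; the function name and interface are hypothetical. When the n-th character arrives, n new substrings (those ending at n) come into existence, so the current sample is replaced with probability n / (n(n+1)/2) = 2/(n+1), with a left endpoint drawn uniformly from 1..n.

```python
import random


def sample_substring_endpoints(stream, rng=None):
    """Online uniform sample over all non-empty substrings of the prefix seen so far.

    After each character, yields (left, right): 1-indexed inclusive endpoints.
    Keeps O(1) state: a counter and one pair of endpoints.
    """
    rng = rng or random.Random()
    sample = None
    for n, _ch in enumerate(stream, start=1):
        # n substrings end at position n, out of n(n+1)/2 in total,
        # so switch to one of the new ones with probability 2/(n+1).
        if rng.random() < 2.0 / (n + 1):
            sample = (rng.randint(1, n), n)
        yield sample


# Usage: endpoints of a uniformly random substring of "streaming".
endpoints = None
for endpoints in sample_substring_endpoints("streaming"):
    pass
left, right = endpoints
print("streaming"[left - 1:right])
```

By induction, an older substring survives step n with probability (n-1)/(n+1), which takes its probability from 2/((n-1)n) to 2/(n(n+1)), the same as each new substring. Note that the characters themselves are never stored, so recovering the sampled substring's content still needs the stream, or some further idea.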

Algorithms in the Field (8F)

NSF announces a new funding program for Algorithms in the Field. The deadline is Feb 9, 2015. One of the metrics in the Algorithms community is the ultimate use of our algorithms, "use" being broadly interpreted, and this often needs us to go more than halfway to meet other communities. This program is an opportunity to codify the process some. When one does meet the other communities, almost always it leads to new theories and algorithms, and more than pays for the journey. I hope you will respond.

Here is more info on the workshop we organized 2 years ago. The videos of the talks are here.
