Sunday, October 30, 2011

Hiring in Ads

An NYT article reports that data analysts, mathematicians and engineers are needed to run modern ad machinery:

“You have to get very close to technology,” Ms. Kent-Smith said. “You have to get your hands in it.”

The Ad:tech conference will be held at the Jacob K. Javits Convention Center in Manhattan from Nov. 8 to 10, and will include a panel on how marketers can build a digitally skilled “brand dream team.”

Video hangout with the World: Update

Last week was the first experiment with two hosts focused on a topic. The topic was Compressed Sensing (CS), and the main host was Igor Carron. There were 10+ people trying to get on the call at various times, and not all could; Igor was in and out due to technical problems. Bottom line: our technique for video hangouts with the world will keep improving, and we need another session later where Igor has a platform to address, with his encyclopedic insights into CS, the many issues that came up.

Last week, we talked about:
  • Functional CS. Minimize the number of measurements needed not to reconstruct the signal, but to estimate various functions of it. Streaming algorithms can be seen as being in this genre, but they don't provide the typical for-all-signals guarantee, or insight into what a suitable notion of the class of all "compressible" signals is for a function of interest. Eric Tramel, who was on the call and has an image-analysis background, proposed "smoothness", or total variation, as a function to estimate. Defined as \sum_i (A[i]-A[i-1])^2, this does not seem to be a new problem: it is the squared L_2 norm of the difference signal, hence an inner product estimate. But some variation of this may be of interest. Some old thoughts on functional CS are here.
  • Linear measurements are the key to compressed sensing. What is the status of building hardware that takes linear measurements of analog signals faster or more efficiently than standard Nyquist sampling?
  • What is the status of CS for arbitrary dictionaries (not necessarily orthonormal)? Have any new algorithmic techniques beyond the usual pursuit and group-testing algorithms been developed?
  • What are the latest developments in CS for matrix approximation?
  • What are recent CS conferences? Examples: 1, 2, ..
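On the smoothness functional above: since the difference operator D is linear, \sum_i (A[i]-A[i-1])^2 = ||DA||^2 can itself be estimated from a small number of non-adaptive linear measurements of A via a Johnson-Lindenstrauss / AMS-style sketch. A minimal NumPy illustration of this point (my own sketch, not something from the call):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 10_000, 400               # signal length, number of measurements

# The smoothness functional sum_i (A[i+1]-A[i])^2 is ||D A||^2 for the
# linear difference operator D, so M = G @ D is itself a valid k x n
# linear measurement matrix for A when G is a JL/AMS-style sketch.
G = rng.standard_normal((k, n - 1)) / np.sqrt(k)
M = np.zeros((k, n))
M[:, 1:] += G                    # build M = G @ D without forming D densely
M[:, :-1] -= G

A = np.cumsum(rng.standard_normal(n))   # a random-walk test signal

y = M @ A                        # k non-adaptive linear measurements of A
estimate = np.sum(y ** 2)        # ||G D A||^2 concentrates around ||D A||^2
truth = np.sum(np.diff(A) ** 2)
print(estimate / truth)          # close to 1
```

With k measurements the relative error is roughly sqrt(2/k), so k = 400 gives a few percent; of course this sketch does not exploit any compressibility of A, which is exactly the open question in the bullet.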


Saturday, October 29, 2011

Lo Lee Ta, The Three Trips

Long straight hair and glasses framing her face,
her long longs draped over,
she lolled by the tall windows,
with flickering eyelids, hair tugs and occasional smiles,
reading Lolita between sideways looks.

In the morning light of a Saturday at Starbucks on Sharon Heights,
anyone can be Lolita and anybody can be Nabokov.

+ for LA

On why I connect with LA: murals galore (more like the one on the left). Now, murals and ads, artists and patrons collide.

Saturday, October 22, 2011

A New Yorker Snaps

This New Yorker snapped and drove 30+ minutes to a place called "A Slice of New York" in San Jose. The pizza was a third as good as in NY, and the place used three times as many employees as a standard pizza place in NY. I am sure Californians looking for their slice in NY will get pizza only a third as good as in California, but it will be served by only a third as many employees. :)

Video Hangout With the World: Expert Guests

I will have help in the weekly video hangout with the world.
  • Wed Oct 26: Compressed Sensing blog's Igor Carron will be the maestro discussing compressed sensing: math, algorithms, applications.
  • Wed Nov 9: Algorithmic Game Theory/Economics blog's Noam Nisan will be the maestro discussing mechanism design, in particular, auctions: ads, combinatorial, whatever.
  • Wed Nov 16: Polylog blog's Andrew McGregor will be the maestro, discussing data stream algorithms and applications.
I am excited: Igor, Noam and Andrew are experts, and I invite people with questions, solutions, or discussion points to join us.
  • Instructions: invite me into a circle on your Google+ or email me, and I will add you to my video hangout circle; then, at the appointed time, go to Google+, where you will see me hanging out. Join the hangout!
  • Time: Wednesdays 8 AM -- 9 AM Pacific time in the US.


Wednesday, October 19, 2011

Video Hangout With the World, Update

In the past 3 weeks, we have discussed:
  • What is a suitable approximation for an SQL query? The point is, the correct output is a multi-set of tuples with several attribute values. A definition of an approximate output is not immediately clear. If one were to define a distance between possible outputs, weighted sum of errors over attributes is not necessarily the most informative. Set similarity is not necessarily the best answer either, for example because attribute values have a meaning. (Say the correct answer is {10,20,30}; {10,20} is a similar set, but {9,19,31} may be a good approximate output.) Some sort of "matching" constraint should be implicit. A related paper is from VLDB 99 by Ioannidis and Poosala called "Histogram-Based Approximation of Set-Valued Query-Answers". There are a lot of nice theory problems lurking here.
  • What are interesting algorithmic problems one can abstract from Informatics Olympiads and math puzzle sites? The solution hopefully involves something more than dynamic programming.
  • Data structure problems where outputs should be produced in some specified order. Say the output is of size k; then in O(k log k) additive time, one can produce any specified order, assuming output items can be compared. Are there problems where one can do it in only an additive O(k), or without knowing k ahead of time?
  • Specific problems: the 2D image listing (rather than document listing) problem. This leads to the following "simpler" problem. Given n lists of length l that can be preprocessed, at query time k < n lists are identified, and the goal is to output the intersection of these k lists in time polynomial in k and log l. Any ideas?
We have persevered through bad connections, too many or too few on the hangout and so on, and we are still rolling.



I spent the weekend riding the rocky mountains of Nevada in a 4WD pickup truck. These vehicles, innocent on the highways, can grip rocks, climb, spin, spit them out and forge ahead, up or down, treading where there are no tracks, just rocks, sagebrush and emptiness managed by the Federal Bureau of Land Management (plus empty shells from guns fired by a few pioneers who ventured there before). Note to self: must get one of these 4WDs!

Palo Alto Talks

Last week was a whirl of talks.
  • Bob Tarjan started his lectures on algorithms and data structures with amortized analysis. The simple example of counting the number of bit operations needed to increment a binary counter n times led to potential-function-based analysis and nice exercises in getting worst-case bounds (via redundant codes, still with O(log n) bits). Qiqi Yan asked if there was a way to appeal to LP duality to get worst-case bounds in general. Then Bob introduced different "heap" representations, from the standard to tournament (n values are leaves of a tournament), half full (replace winners by empty slots), right full (make all empties the left branch), compressed (keep links to successive winners), and finally the half-ordered representation obtained by linking the siblings. Those who know Bob or have read his monograph know he is superbly efficient with his words. I learned from his monograph when I was a graduate student, and this is now an in-person treat!
  • Vijay Vazirani spoke about markets. He started with the Arrow-Debreu (AD) model, which shows that there are prices for a supply that will satisfy every demand and clear the market precisely. This result has many implications and issues (the equilibrium is not unique even beyond scaling; Pareto optimality; every Pareto-optimal allocation comes from an equilibrium after redistribution of wealth; dynamics? etc.). Vijay spoke about his joint work with Kamal Jain on studying this problem with digital goods, which have no cost and effectively infinite supply; the challenge is to envision a nontrivial market that does not collapse into the equilibrium where everyone gets a copy of every digital good. He proposed a model that groups items into semantic categories with full substitutability within each at uniform prices, and a rich context with cardinal and ordinal preferences. Now supply/demand constraints become more intricate, some bundles are forbidden, and the entire edifice of AD has to be reconstructed. This is a grand problem, where Vijay's natural ebullience keeps the audience engaged through the myriad details of the formulations. He left open even grander problems of pricing items like theorems, financial advice and innovations.
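The amortized-analysis opener from Bob's lecture is easy to replay in code (a standard textbook illustration, not Bob's material): incrementing a binary counter n times costs at most 2n bit flips in total, which the potential function "number of 1-bits" certifies, even though a single increment can cost O(log n).

```python
def increment(bits):
    """Increment a little-endian binary counter in place, returning the
    number of bit flips: each trailing 1 flips to 0, then one 0 flips to 1."""
    flips, i = 0, 0
    while i < len(bits) and bits[i] == 1:
        bits[i] = 0
        flips += 1
        i += 1
    if i == len(bits):
        bits.append(0)          # grow the counter on overflow
    bits[i] = 1
    return flips + 1

n = 1000
bits, total = [0], 0
for _ in range(n):
    total += increment(bits)

# Worst-case single increment flips O(log n) bits, but amortized cost is
# at most 2: the potential (number of 1-bits) pays for each 1->0 cascade.
print(total, "<=", 2 * n)
```

The exact total is sum over j of floor(n / 2^j): bit j flips a 0 to 1 once every 2^j increments, and each of those 1s is later flipped back at most once.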
Talks are one-sided. I managed to have two-sided conversations not only with Bob and Vijay, but also with Ashish and Tim over dinner. Tim is going through energy spins, waxing and waning with the weekly cycles of his teaching, and he shared a key insight (or two! :)): some textbooks are for professors and others are for students. In Algorithms, we know which is which. My favorite professors' text is the monograph by Dexter Kozen, The Design and Analysis of Algorithms. Ashish, who can quietly float to the top of any topic, described the excitement of molecular machines, which can trot along biological material and repair strands.


Shout Out to "Occupy Wall Street"

This is a shout out. Finally, we hear more voices and narratives in the mix than what has consumed us the past 3 years of this financial mess. We are the 99%! (T-shirts).

Thursday, October 06, 2011


The Research on Algorithms and Incentives in Networks (RAIN) seminar at Stanford is on Wednesdays. Yesterday, Ashish Goel started off the series for the academic year by introducing the new faculty (Kostas and Mohsen) and introduced the first speaker of the series: Jure Leskovec.

Jure set out the goal nicely. Consider large networks (collaboration, citation, web, telecom whatever). They have some modular structure (overlapping or disjoint clusters), but how do we form a picture of their structure, how do we formalize this picture? He summarized his approach as "computational experiments", that is, start with data, formulate some atomic notions, tweak, identify some initial structure, peel and find more, all the while formulating and solving interesting computational problems on networks.

Technically, he started with the conductance of a set of nodes (number of cut edges / sum of degrees) and defined the network conductance profile (NCP) as the minimum conductance for each size k of the set of nodes. You can compute this using spectral, multicommodity-flow or SDP techniques, Metis, and others. He showed the NCP of many networks (V-shaped). Ramesh Johari asked why the profile doesn't contain, for each k, the number of sets of k nodes that achieve the minimum conductance. Yoav Shahom asked, more generally, why not look at the entire distribution of conductance versus sets. A: it doesn't change the main observations. Amin Saberi asked whether the picture showed the NCP or an intractability profile. A: you can use lower bounds on SDPs and optimizations to convince oneself that the curves are right. He then zoomed in on the down and up parts separately, and eventually worked the crowd into figuring out the structure of the network: it has a periphery of "whiskers"; you can peel them off (by focusing on the largest biconnected component) and you will find similar structure again, and so on. He proceeded to take labeled data of users belonging to various communities as ground truth, and showed a series of analyses of probabilistic parameters of potential models that would generate such networks. Among other things, the overlaps of these communities are dense.

Jure is a top data miner and, like any good data miner, he is more than the sum of the plots he shows: behind each plot is the sweat of many other plots and analyses that are mere carcasses on the way.

Sunday, October 02, 2011

NAE Frontiers Meeting

The National Academy of Engineering (NAE) runs a Frontiers of Engineering (FOE) meeting annually at the Beckman Center in Irvine, with 100+ bright engineers from bio to brick+mortar, computers and beyond, interacting with each other over 2.5 days. I was at this main meeting in 2002 (and found it intellectually very exciting), and also at the US/German version a few years later.

They have a US version of the FOE. It was recently held in Mountain View, CA, at the Google headquarters. At NAE, Janet Hunziker has been quarterbacking these meetings for a long time, and at Google, Lisa McCracken, who handles so many of our CS conferences so effectively, stepped up; together they seem to have pulled off an awesome meeting. When NAE discussed the 2011 version four years ago, I connected them with Google, so I scored a ticket to the reception dinner and got to meet the participants (Michael Cafarella, Evgeniy Gabrilovich and Lisa Zhang from our communities were invited to the meeting, among others).

Alfred Spector was the dinner speaker. He started with the observation that in the early days of CS, it was 50% math, logical proofs and analytics (think Knuth, Dijkstra); now it is a large percentage empirical, more like the natural sciences, because we use and enable an external world. A good example is spelling correction, which morphed from (dictionaries, cognitive science, algorithms) to (large data, parallelism) in an experiment with the world. He switched to discussing the set of n-grams from books through history, the study of regular vs. irregular verbs, changes in the uses of words, and eventually to laying out the agenda of "Digital Humanities" that CS researchers are carving out. He went on to discuss various challenges like security, healthcare, education (individualized education?), etc. Questions from the audience: the state of electrical power measurement by Google; why doesn't Google keep up healthcare software for people who continue to want it even though it is discontinued; personalization in the larger search context; the future of the smartphone; you provide sketch, language, picture and video tools, so when will 3D and shapes be more prevalent? A: Chrome supports 3D. Alfred, already an NAE Fellow, is a virtuoso; his talk and answers were well calibrated, and he timed it to catch his flight straight after the talk.

Quote from Thomas Edison on the dinner menu: "Everything comes to him who hustles while he waits." So I hustled after the dinner.