Thursday, May 21, 2015

Morning Madness: An Ode to Mercedes S

This is the morning madness, of a drive along Skyline.
Cutting through fog as thick as stew and waves of dew,
At 70mph, hugging the memorized curves without tapping the brakes,
the top and windows down, listening to the tires searing and splattering the film of water,
Mountains and cliffs shrouded in clouds in the clasp of prehistoric moments.
At last, I reached the Mission, the valley in SF, parked the car and let it sigh.

Friday, May 08, 2015

Microsoft Surface Projects

I thought this could be of interest to academic folks: the Academic RFP for research with Surface Hub.

Saturday, May 02, 2015

What does a trip to NY mean?

A two-day trip to NY means, among other things, meeting artists by proximity. Someone asked me what I like about NY. It is the shadows. The light is just so; even the shadows and silhouettes on a SoHo afternoon are theatrical.

Predicting Population Size

Computer engineers claim they predict impressions, clicks, conversions, prices, and other things billions of times a day; data mining researchers write papers on time series prediction; machine learning researchers use ensembles as the default answer. Still, it is very hard to predict things, even aggregate things, in reality. Here is a piece on prediction in practice.

Tuesday, April 28, 2015

Yelp Data Challenge

Yelp runs a data analysis challenge: Predict health metrics of restaurants. Enjoy. 

Sunday, April 26, 2015


My thoughts are with the family and friends of my neighbor, friend, and incredibly thoughtful soul Dan, who fell pursuing Mt. Everest, a victim of the quake.

Friday, April 24, 2015

Streaming H-Index

Items I_i (positive integers, I_i \in [1,U]) arrive one after another, and for any n, after having seen the first n items I_1,...,I_n, you have to return an approximation to H_n, the H-index of those n items. (The H-index of a set of numbers is the largest k such that at least k of the numbers are each >= k.) A student at Rutgers posed this problem.

This problem has a simple worst-case solution: using log_{1+\eps} U space, get a (1+\eps) one-sided-error approximation by maintaining an exponential histogram on the domain and counting the number of items in each bucket. (The same solution works if items arrive and depart as well.) Further, U can be replaced by max_{i <= n} I_i after n items, so the bound is data-sensitive. If items are drawn from a domain [L,R], you can splice out the lower buckets, and the same solution works with about log R - log L buckets (better than log(R-L)). None of this is particularly interesting theory-wise.
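As a sanity check, here is a minimal Python sketch of the exponential-histogram idea above (the class name and the default eps are mine, not from the problem statement): each item is bucketed by powers of (1+\eps), so only about log_{1+\eps} U counters are kept, and the H-index is read off by sweeping buckets from the top.

```python
import math

class StreamingHIndex:
    """(1+eps)-approximate streaming H-index via an exponential histogram.

    Bucket j counts items x with (1+eps)^j <= x < (1+eps)^(j+1), so at most
    about log_{1+eps} U buckets are ever populated. The answer is one-sided:
    it can undershoot the true H-index by at most a (1+eps) factor, because
    each item is effectively rounded down to its bucket's lower boundary.
    """

    def __init__(self, eps=0.1):
        self.eps = eps
        self.counts = {}  # bucket index j -> number of items seen in bucket j

    def add(self, x):
        j = int(math.log(x, 1 + self.eps))  # bucket index of positive integer x
        self.counts[j] = self.counts.get(j, 0) + 1

    def h_index(self):
        # Sweep buckets from the largest values down, accumulating how many
        # items are at least each bucket's lower boundary; the approximate
        # H-index is the best min(threshold, count-above) over candidates.
        best, above = 0, 0
        for j in sorted(self.counts, reverse=True):
            above += self.counts[j]
            t = math.ceil((1 + self.eps) ** j)  # every item in `above` is >= t
            best = max(best, min(t, above))
        return best
```

On the stream 3, 5, 6, for instance, this sketch happens to return the exact H-index, 3; in general it only guarantees a value within a (1+eps) factor from below.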

Is there an interesting version of this problem if the items are drawn i.i.d. from an unknown distribution? Curious readers should see the connection to the prophet inequality and frugal streaming results.


Sunday, April 19, 2015

Bernal Heights Park

Someone asked me to poet about the Bernal Heights Park in my SF neighborhood, so here it goes:

Climb any which way, this pyramid of a hill with three trees on the top.
Look up whilst climbing and see people's busts and the heads of unleashed dogs against the sky.
And head down to see the porcelain youth on Cortland Avenue.

Art Gravity Shifts: Whitney Museum among Meatpackers

The Whitney, a real American museum --- forget the century-old language of art elsewhere; here it is urgent, both now and the far future with its biennials --- moves to the Meatpacking District next month. My great hope is that visitors will come for the High Line, stay for a real lesson in the past 50 years of American art and the next 50, and the world will be a far better place.

That aside, the NY Times shows the depth of the art galleries in NY: 10 to see in Chelsea.