Monday, March 20, 2017

Self II

Here is how the world sees me.
  • There is a coffee place at SFO, tucked away in a spot people rarely make it to. I pick up coffee there when I fly in from EWR, and they give me a discount because "you are a taxi driver".
  • I went to a WeWork location in New York City to meet a friend, and as I walked in, the receptionist said, "Talk to that person; she knows who needs the handyman in the building," and I had to go through the handyman's check-in process just to enter the building.

Self I

People seem to like it when I poke at myself:

In a recent conversation, we discussed Dad Jeans, described precisely here, but more as a state of mind. A parent needs to think several steps ahead on behalf of their kid, who can swerve from disengaged to insightful; needs to be prepared for spills; and needs to be ready to head out the door the instant the kids unexpectedly decide they want the playground, even in winter. So Dad Jeans are the wear of choice: they communicate that you cannot be anything else or be anywhere else, for reasons beyond your control.

Being me, I have to find my own way to express that state of mind, so these days I am doing Dad Hair, baggy, ready to follow me instantly, and unable to be anything else. :)

Monday, March 06, 2017

CS Divisions

Thanks to a recommendation from Marc Donner, from the old Google days, who now runs Uber NYC, I am reading Sapiens by Yuval Harari. The author sets out to explain the history of humankind succinctly, and succeeds by taking an insightful view of anthropology, sociology, behavioral theory, and of course science and religion too. One of the interesting parts for me was the need humans felt to divide people into categories (think commoner/noble, castes, etc.). Alas, with division into categories comes an imposed order among them, and fights to invert that order. The author argues that this imagined order among humans keeps societies stable when it works, and makes them unstable when it doesn't.

I have always been suspicious of divisions. In CS, folks divide research into areas, but these are not islands. In any area of research (say AI, social networks, robotics, the brain, whatever), there are (a) theoretical foundations and optimizations, (b) new systems research into the hardware and software needed to program them, compile them into executables, and execute them efficiently, (c) new data and UI systems to use, analyze, report, mine, and troubleshoot, and so on. Great research will include conceptual breakthroughs, no more of a cacophony of math symbols than is needed, potential for pretty plots, and a storyline about societal impact for the NY Times. Most individuals' research doesn't hit all these metrics, and doesn't have to; we rely on the accumulation of research to hit all of them. Any research area is potentially less engaging without ALL of these elements, and no order among them is needed.

Extreme Streaming

I am making my way back into researching streaming problems.

One of the directions I am focusing on is how to use not the polylog memory that is standard in streaming algorithms, but even less, say O(1) memory. My coauthors and I have such algorithms for estimating the H-index on streams (to appear in PODS 2017, will be on arXiv soon) and for estimating heavy hitters in a stream-of-streams model (to appear in SDN 2017).
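To make "O(1) memory" concrete, here is a classic textbook example. It is not the H-index or stream-of-streams algorithm from the papers above, just an illustration of the regime: the Boyer-Moore majority vote finds the only item that could occur in more than half of a stream, keeping one stored candidate and one counter no matter how long the stream is. The function name `majority_candidate` is hypothetical.

```python
from typing import Hashable, Iterable, Optional, Tuple


def majority_candidate(stream: Iterable[Hashable]) -> Tuple[Optional[Hashable], int]:
    """Boyer-Moore majority vote: one pass, constant state (one item, one int).

    If some item occurs in more than half of the stream, it is returned as
    the candidate. Otherwise the candidate is arbitrary and a second pass
    would be needed to confirm it. The returned counter is a leftover
    vote margin, not the item's true frequency.
    """
    candidate: Optional[Hashable] = None
    counter = 0
    for item in stream:
        if counter == 0:
            # No surviving candidate: adopt the current item.
            candidate, counter = item, 1
        elif item == candidate:
            counter += 1
        else:
            # Pair off one occurrence of the candidate against a different item.
            counter -= 1
    return candidate, counter


print(majority_candidate("abaca"))  # ('a', 1)
```

A majority is the simplest kind of heavy hitter (frequency above n/2); generalizing to smaller thresholds, as in Misra-Gries, costs more counters, which is where the O(1) budget starts to bite.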

I was sort of pushed into this model the same way I like to find problems in general: by looking at the real constraints of modern applications. For example, in SDNs (Software Defined Networks), there are memory pipelines that packets percolate through; each memory stage can be thought of as a row of a standard sketch, and one then needs to compute something on top of these row estimates using only memory that fits into a single packet header. Another example is that streaming analyses are often done for a very large number of groups (say, for each source IP address or internet user), and in that case even polylog memory per group is far too much.
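The SDN pipeline above can be sketched as follows, under loudly stated assumptions: each stage holds one row of a Count-Min sketch, and the only state a packet carries between stages is a single integer "header" field holding the running minimum of the row estimates. The class and method names are hypothetical, and real data planes impose far more constraints than this toy model; it only shows how a per-row sketch plus header-sized state fit together.

```python
import hashlib
from typing import List, Optional


class CountMinPipeline:
    """Hypothetical model: a Count-Min sketch laid out as a pipeline of stages.

    Stage i owns row i of counters. A packet walks through the stages in order
    and carries one integer (its 'header') from stage to stage.
    """

    def __init__(self, depth: int, width: int) -> None:
        self.width = width
        self.rows: List[List[int]] = [[0] * width for _ in range(depth)]

    def _bucket(self, stage: int, key: str) -> int:
        # A different deterministic hash per stage, derived from SHA-256.
        digest = hashlib.sha256(f"{stage}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def process_packet(self, key: str) -> Optional[int]:
        """Update each stage for this key; return the header value at pipeline exit."""
        header: Optional[int] = None  # the only state that travels with the packet
        for stage, row in enumerate(self.rows):
            b = self._bucket(stage, key)
            row[b] += 1
            # Fold this stage's row estimate into the one-word header.
            header = row[b] if header is None else min(header, row[b])
        return header

    def estimate(self, key: str) -> int:
        """Read-only Count-Min query: minimum over all rows, never an underestimate."""
        return min(row[self._bucket(s, key)] for s, row in enumerate(self.rows))


p = CountMinPipeline(depth=4, width=64)
for _ in range(3):
    h = p.process_packet("10.0.0.1")
print(h)  # 3, since no other key has been seen yet
```

Because Count-Min never underestimates, the minimum over rows is the natural combiner, and a minimum can be folded in one stage at a time with one word of state. A median-based combiner, as in Count Sketch, does not fold into a single field this way; that mismatch is the flavor of constraint that makes the setting interesting.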

I call these extreme streaming problems, inspired by Extreme Classification in Machine Learning, which studies ML problems with a very large number of labels. I think there is more grist for the mill here.