Research on Algorithms and Incentives in Networks
(RAIN) seminar at Stanford is on Wednesdays. Yesterday, Ashish Goel started off the series for the Academic Year introducing the new faculty (
Kostas and
Mohsen) and introduced the first speaker of the series:
Jure Leskovec.
Jure set out the goal nicely. Consider large networks (collaboration, citation, web, telecom whatever). They have some modular structure (overlapping or disjoint clusters), but how do we form a picture of their structure, how do we formalize this picture? He summarized his approach as "computational experiments", that is, start with data, formulate some atomic notions, tweak, identify some initial structure, peel and find more, all the while formulating and solving interesting computational problems on networks.
Technically, he started with conductance of a set of nodes (number of cut edges/sum of degrees) and defined the network conductance profile (NCP) as min conductance for each size k of the set of nodes. You can compute this using spectral, multicommodity flow or SDP techniques, Metis and others. He showed the NCP of many networks (V shaped).
Ramesh Johari asked why doesnt the profile contain for each k, the number of sets of size k of nodes that achieve the min conductance.
Yoav Shahom asked more generally why not look at the entire distribution of conductance versus sets. A: doesnt change the main observations.
Amin Saberi asked whether the picture showed NCP or Intractability profile. A: you can use lower bounds on SDPs and optimizations to convince oneself that the curves are right. He then zoomed in on the down and up parts separately and eventually worked the crowd to figuring out the structure of the network that it has a periphery of "whiskers", you can peel then off (by focusing on largest biconnected components) and you will find similar structure again, and so on. He proceeded to taking labeled data of users belonging to various communities as ground truth, and showed series of analyses of probabilistic parameters of potential models that would generate such networks. Among other things, overlaps of these communities is dense.
Jure is a top data miner and like any good data miner, he is more than the sum of the plots he shows: behind each plot, is the sweat of many other plots and analyses that are mere carcasses on the way.