MSR-INRIAWorkshop on Community Detection at Institut Henri Poincaré on Feb 26-27

A A

21 January 2015

Workshop on Community Detection at Institut Henri Poincaré on Feb 26-27

Workshop is open but registration is mandatory, by sending an email to [email protected] .

Thursday February 26, 2015

9:30 Welcome

10am-10:45 Lenka Zdeborová (CEA), Asymptotically exact analysis of the sparse stochastic block model [slides]

10:45-11:30 Martin Weigt (UPMC), Inference of modular contact networks in protein-protein interaction

11:30-12:15 Charles Bordenave (CNRS), Non-backtracking spectrum of random graphs [slides]

12:15-2pm Lunch break

2pm-2:45 Joe Neeman (UT Austin), Detection and recovery in the two-part symmetric stochastic block model

2:45-3:30 Milan Vojnovic (MSR), Load Balancing Tasks with Overlapping Requirements

3:30-4pm Coffee Break

4pm-4:45 Pierre Borgnat (ENS-Lyon), Multiscale community mining in networks with wavelets [slides]

4:45-5:30 Emmanuel Abbe (Princeton), Recovery thresholds in block models [slides]

Friday February 27, 2015

9:30-10:15 Bruce Hajek (UIUC), Detecting a community in a relatively sparse graph: understanding the fundamental limits of polynomial time algorithms [slides]

10:15-11am Sofia Olhede (UCL), Understanding Large Networks using Blockmodels

11am-11:30 Coffee Break

11:30-12:15 Seyoung Yun (MSR-Inria), joint work with Alexandre Proutière (KTH), Community detection with sampling and streaming [slides]

12:15-1pm Yudong Chen (Berkeley), Statistical-Computational Tradeoffs in Community Detection

1pm-2pm Lunch Break

2pm-2:45 Florent Krzakala (ENS Ulm), The Bethe Hessian Operator [slides]

2:45-3:15 Closing Remarks (open problems / future directions)

Abstracts

Charles Bordenave, Non-backtracking spectrum of random graphs

The non-backtracking matrix of a graph is a non-symmetric matrix on the oriented edge of a graph which has interesting algebraic properties. It appears notably in connection with the Ihara zeta function and in some generalizations of Ramanujan graphs. Recently, Krzakala, Moore, Mossel, Neeman, Sly, Zdeborová and Zhang have advocated the use of this matrix in the context of community detection to bypass some limitations of usual spectral clustering methods on sparse graphs. In this talk, we will study the largest eigenvalues of this matrix for the Erdos-Renyi graph G(n,c/n) and for simple inhomogeneous random graphs (stochastic block model). Our results confirm predictions from Krzakala et al. This is a joint work with Marc Lelarge and Laurent Massoulié.

Joe Neeman, Detection and recovery in the two-part, symmetric stochastic block model

The stochastic block model is a simple model for communities in random graphs. Its simplest incarnation is the two-part, symmetric case, in which n vertices are divided into two classes of equal size; then edges are added independently, with probability p_n for an edge within a class and probability q_n for an edge between classes. We will discuss the problem of inferring the classes given the graph. Depending on p_n and q_n, there are four things that could happen:

1) nothing can be learned about the classes;

2) one can find a partition that is correlated with the classes;

3) one can recover the classes, up to o(n) errors; or

4) one can recover the classes exactly.

Milan Vojnovic, Load Balancing Tasks with Overlapping Requirements

In several data centre applications, computational tasks are processed in a distributed system such that each task requires a set of distinct data inputs. To ensure scalability, the assignment of tasks to machines should guarantee that tasks requiring similar data inputs are collocated, and that the load is balanced across machines. In this talk we shall consider the problem of load balancing tasks with overlapping requirements, specified by an input bipartite graph which defines the set of requirements needed by individual tasks, under different definitions of the processing load of a machine. For example, the load of a machine may be proportional to the number of distinct requirements needed to serve the set of tasks assigned to this machine. In the online task assignment problem, tasks arrive one at a time and each task needs to be irrevocably assigned to one of the machines at its arrival time. We shall present results on these type of generalized load balancing problems for both arbitrary and hidden cluster random graph inputs.

Bruce Hajek, Detecting a community in a relatively sparse graph: understanding the fundamental limits of polynomial time algorithms

This talk focuses on the problem of finding an underlying community, or dense subgraph, within a network using only knowledge of the network topology. We consider the planted cluster model, which is a simple extension of the classical Erdos-Renyi random graph. We derive a semidefinite programming (SDP) relaxation of the maximum likelihood estimator for recovering the community from the network. If the size of the community is linear in the total number of vertices, the performance guarantee of the SDP exactly matches the necessary information bound. However, if the community size is sub-linear in the total number of vertices, the performance guarantee of the SDP is far from the information limit. Building on average case reductions, we show there exists a significant gap between the information limit and what can be achieved by computationally efficient procedures, conditionally on the assumptions that some instances of the planted clique problem cannot be solved in randomized polynomial time, for both a community detection problem and a community recovery problem.

Based on joint work, available at arXiv 1406.6625 and 1412.6156, with Prof. Yihong Wu (Univ. Illinois) and Dr. Jiaming Xu (formerly Illinois, now at Wharton School of Statistics, U. Pennsylvania).

Sofia Olhede, Understanding Large Networks using Blockmodels

Networks have become pervasive in practical applications. Understanding large networks is hard, especially because of a number of typical features present in such observations create a number of technical analysis challenges. I will discuss some basic network models that are tractable for analysis, what sampling properties they can reproduce, and some results relating to their inference. I will especially touch on the importance of the stochastic block model as an analysis tool.

This is joint work with Patrick Wolfe (UCL)

Pierre Borgnat, Multiscale community mining in networks with wavelets

(joint work avec N. Tremblay)

A signal processing approach is developed for the multiscale detection of communities in networks. This method relies on a spectral graph wavelets that stands for a local and and ego-centered view of the graph seen from each node at a specific scale. This enables to find communities of nodes according to the scale of analysis. We will show how to make the method suitable for the analysis of real data (facing scalability issues and introducing a way to test the relevance of the clustering), how the method compares to other methods (for instance methods using random walks) and we will discuss some applications to analyses of social networks, simplified sensor networks or to study data in genomics.

Yudong Chen, Statistical-Computational Tradeoffs in Community Detection

The problem of graph clustering and community detection concerns with identifying densely connected groups of nodes in a graph. In this talk we show that one can use computationally more expensive algorithms to achieve better statistical performance. In particular, we consider the classical planted clustering model (a.k.a. stochastic blockmodel), and show that the parameter space can be partitioned into four regimes — “simple”, “easy”, “hard”, and “impossible” — which correspond to progressively noisier problems such that: (1) a near-linear time algorithm succeeds in the “simple” regime, but provably fails in the “easy” and “hard” regimes; (2) a polynomial time algorithm succeeds in the “easy” regime, but provably fails in the “hard” regime; (3) a super-polynomial time algorithm succeeds in the “hard” regime, for which no polynomial time algorithm is known; (4) all algorithms fail in the “impossible” regime. Our results apply to the setting with an unbounded number of clusters.

Bio: Yudong Chen is currently a postdoc in EECS at UC Berkeley with Martin Wainwright. He obtained his Ph.D. in ECE from the University of Texas at Austin in 2013. His research interests include high-dimensional and robust statistics, convex optimization, and applications in community detection.

Conferences & Workshops

Archives

21 January 2015