Author

Joe Neeman

Bio: Joe Neeman is an academic researcher from University of Texas at Austin. The author has contributed to research in topics: Gaussian measure & Random graph. The author has an h-index of 24 and has co-authored 67 publications receiving 3045 citations. Previous affiliations of Joe Neeman include University of Bonn & University of California.


Papers
Journal Article
TL;DR: This work presents a way of encoding sparse data using a “nonbacktracking” matrix and shows that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model.
Abstract: Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better-behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all of the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.

702 citations
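
As a rough illustration of the operator the abstract above refers to, the following minimal sketch builds the nonbacktracking matrix of a small undirected graph in plain Python. It is not the paper's clustering algorithm: it only constructs the matrix on directed edges and estimates its leading eigenvalue by power iteration. The function names `nonbacktracking_matrix` and `leading_eigenvalue` are hypothetical, and the graph is assumed to be simple (no self-loops or repeated edges).

```python
def nonbacktracking_matrix(edges):
    """Return (directed_edges, B) for an undirected simple graph.

    Each undirected edge {u, v} yields two directed edges u->v and v->u.
    B[i][j] = 1 when directed edge i = (u -> v) can be followed by
    directed edge j = (v -> w) with w != u (no immediate backtracking).
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))
    index = {e: i for i, e in enumerate(directed)}
    size = len(directed)
    B = [[0] * size for _ in range(size)]
    for (u, v), i in index.items():
        for (x, w), j in index.items():
            if x == v and w != u:
                B[i][j] = 1
    return directed, B


def leading_eigenvalue(B, iterations=200):
    """Estimate the largest eigenvalue of a nonnegative matrix B.

    Plain power iteration with max-norm scaling, started from the
    all-ones vector. Returns 0.0 if the iterate collapses to zero,
    which happens when B is nilpotent (for example, on a tree).
    """
    n = len(B)
    x = [1.0] * n
    value = 0.0
    for _ in range(iterations):
        y = [sum(B[i][j] * x[j] for j in range(n)) for i in range(n)]
        value = max(abs(c) for c in y)
        if value == 0.0:
            return 0.0
        x = [c / value for c in y]
    return value
```

On a d-regular graph every directed edge has exactly d - 1 nonbacktracking continuations, so every row of B sums to d - 1 and the leading eigenvalue equals d - 1. Community-related information, as the abstract describes, would be read from further eigenvectors, which this sketch does not compute.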

Journal Article
TL;DR: This work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem and provides a simple and efficient algorithm for estimating a and b when clustering is possible.
Abstract: The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $$n$$ nodes with two equal-sized clusters, with a between-class edge probability of $$q$$ and a within-class edge probability of $$p$$ . Although most of the literature on this model has focused on the case of increasing degrees (i.e. $$pn, qn \rightarrow \infty $$ as $$n \rightarrow \infty $$ ), the sparse case $$p, q = O(1/n)$$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová based on deep, non-rigorous ideas from statistical physics gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $$p = a/n$$ and $$q = b/n$$ , then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $$(a - b)^2 > 2(a + b)$$ , and impossible if $$(a - b)^2 < 2(a + b)$$ . By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $$(a - b)^2 > C (a + b)$$ for some sufficiently large $$C$$ . We prove half of their prediction, showing that it is indeed impossible to cluster if $$(a - b)^2 < 2(a + b)$$ . Furthermore we show that it is impossible even to estimate the model parameters from the graph when $$(a - b)^2 < 2(a + b)$$ ; on the other hand, we provide a simple and efficient algorithm for estimating $$a$$ and $$b$$ when $$(a - b)^2 > 2(a + b)$$ . Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems.

377 citations
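
To make the model and the conjectured threshold in the abstract above concrete, here is a minimal sketch in plain Python. It samples a two-cluster planted partition graph with within-class edge probability a/n and between-class edge probability b/n, and checks the condition (a - b)^2 > 2(a + b). The function names, the `seed` parameter, and the choice to place the first half of the nodes in one cluster are assumptions made for illustration; none of this is the paper's estimation algorithm.

```python
import random


def above_threshold(a, b):
    """True when (a - b)^2 > 2(a + b), the regime where the abstract
    says clustering correlated with the true partition is conjectured
    to be possible."""
    return (a - b) ** 2 > 2 * (a + b)


def sample_planted_partition(n, a, b, seed=0):
    """Sample a sparse planted partition graph on n nodes.

    Nodes 0 .. n//2 - 1 get label +1 and the rest get label -1
    (an illustrative convention). Each pair i < j is joined
    independently with probability a/n if the labels agree and
    b/n otherwise. Returns (labels, edges).
    """
    rng = random.Random(seed)
    labels = [1 if i < n // 2 else -1 for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = a / n if labels[i] == labels[j] else b / n
            if rng.random() < prob:
                edges.append((i, j))
    return labels, edges
```

For example, with a = 5 and b = 1 the left side is 16 and the right side is 12, so the parameters lie above the threshold; with a = 3 and b = 1 they are 4 and 8, so they lie below it. The pairwise loop is quadratic in n, which is fine for small experiments but not for large sparse graphs.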

Journal Article
TL;DR: In this article, it was shown that it is information theoretically impossible to cluster if $s^2 \le d$ and moreover that it is impossible even to estimate the model parameters from the graph when $s^2 < d$.
Abstract: We study a random graph model called the “stochastic block model” in statistics and the “planted partition model” in theoretical computer science. In its simplest form, this is a random graph with two equal-sized classes of vertices, with a within-class edge probability of $q$ and a between-class edge probability of $q'$. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová [9], based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $q = a/n$ and $q' = b/n$, $s = (a-b)/2$ and $d = (a+b)/2$, then Decelle et al. conjectured that it is possible to efficiently cluster in a way correlated with the true partition if $s^2 > d$ and impossible if $s^2 < d$. By comparison, the best previously known rigorous result showed that clustering is possible if $s^2 > C d \ln d$ for sufficiently large $C$. In a previous work, we proved that indeed it is information theoretically impossible to cluster if $s^2 \le d$ and moreover that it is information theoretically impossible to even estimate the model parameters from the graph when $s^2 < d$. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when $s^2 > d$. A different independent proof of the same result was recently obtained by Massoulie [20].

252 citations
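
The abstracts in this list state the same threshold in two parametrizations: one in terms of $a$ and $b$, the other in terms of $s = (a-b)/2$ and $d = (a+b)/2$. A short check, using only these definitions, shows the two conditions agree:

```latex
\begin{align*}
s^2 > d
  &\iff \frac{(a-b)^2}{4} > \frac{a+b}{2} \\
  &\iff (a-b)^2 > 2(a+b).
\end{align*}
```

The same substitution with the inequality reversed gives the impossibility regime, $s^2 < d \iff (a-b)^2 < 2(a+b)$.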

Posted Content
TL;DR: This work proves the rest of the conjecture of Decelle, Krzakala, Moore and Zdeborová by providing an efficient algorithm for clustering in a way that is correlated with the true partition when $s^2 > d$.
Abstract: We study a random graph model named the "block model" in statistics and the "planted partition model" in theoretical computer science. In its simplest form, this is a random graph with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, $s=(a-b)/2$ and $d=(a+b)/2$, then Decelle et al.\ conjectured that it is possible to efficiently cluster in a way correlated with the true partition if $s^2 > d$ and impossible if $s^2 < d$. By comparison, the best previously known rigorous result showed that clustering is possible if $s^2 > C d \ln d$ for some sufficiently large $C$. In a previous work, we proved that indeed it is information theoretically impossible to cluster if $s^2 < d$. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way that is correlated with the true partition when $s^2 > d$. A different independent proof of the same result was recently obtained by Laurent Massoulie.

220 citations

Posted Content
TL;DR: Following Decelle et al., this work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem.
Abstract: The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $n$ nodes with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. Although most of the literature on this model has focused on the case of increasing degrees (i.e.\ $pn, qn \to \infty$ as $n \to \infty$), the sparse case $p, q = O(1/n)$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová based on deep, non-rigorous ideas from statistical physics gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, then Decelle et al.\ conjectured that it is possible to cluster in a way correlated with the true partition if $(a - b)^2 > 2(a + b)$, and impossible if $(a - b)^2 < 2(a + b)$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $(a - b)^2 > C (a + b)$ for some sufficiently large $C$. We prove half of their prediction, showing that it is indeed impossible to cluster if $(a - b)^2 < 2(a + b)$. Furthermore we show that it is impossible even to estimate the model parameters from the graph when $(a - b)^2 < 2(a + b)$; on the other hand, we provide a simple and efficient algorithm for estimating $a$ and $b$ when $(a - b)^2 > 2(a + b)$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice and the so-called reconstruction problem. This connection points to fascinating applications and open problems.

183 citations


Cited by
Journal Article
TL;DR: This article reviews in a selective way the recent research on the interface between machine learning and the physical sciences, including conceptual developments in ML motivated by physical insights, applications of machine learning techniques to several domains in physics, and cross fertilization between the two fields.
Abstract: Machine learning (ML) encompasses a broad range of algorithms and modeling tools used for a vast array of data processing tasks, and it has entered most scientific disciplines in recent years. This article reviews in a selective way the recent research on the interface between machine learning and the physical sciences. This includes conceptual developments in ML motivated by physical insights, applications of machine learning techniques to several domains in physics, and cross fertilization between the two fields. After giving a basic notion of machine learning methods and principles, examples are described of how statistical physics is used to understand methods in ML. This review then describes applications of ML methods in particle physics and cosmology, quantum many-body physics, quantum computing, and chemical and material physics. Research and development into novel computing architectures aimed at accelerating ML are also highlighted. Each of the sections describes recent successes as well as domain-specific methodology and challenges.

1,504 citations

Journal Article
TL;DR: In this paper, the authors present a guided tour of the main aspects of community detection in networks and point out strengths and weaknesses of popular methods, and give directions to their use.

1,398 citations

Journal Article
08 Jul 2016-Science
TL;DR: A generalized framework for clustering networks on the basis of higher-order connectivity patterns provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges.
Abstract: Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks—at the level of small network subgraphs—remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns.

972 citations

Journal Article
06 Aug 2015-Nature
TL;DR: This work maps the problem onto optimal percolation in random networks to identify the minimal set of influencers, which arises by minimizing the energy of a many-body system, where the form of the interactions is fixed by the non-backtracking matrix of the network.
Abstract: The whole frame of interconnections in complex networks hinges on a specific set of structural nodes, much smaller than the total size, which, if activated, would cause the spread of information to the whole network, or, if immunized, would prevent the diffusion of a large scale epidemic. Localizing this optimal, that is, minimal, set of structural nodes, called influencers, is one of the most important problems in network science. Despite the vast use of heuristic strategies to identify influential spreaders, the problem remains unsolved. Here we map the problem onto optimal percolation in random networks to identify the minimal set of influencers, which arises by minimizing the energy of a many-body system, where the form of the interactions is fixed by the non-backtracking matrix of the network. Big data analyses reveal that the set of optimal influencers is much smaller than the one predicted by previous heuristic centralities. Remarkably, a large number of previously neglected weakly connected nodes emerges among the optimal influencers. These are topologically tagged as low-degree nodes surrounded by hierarchical coronas of hubs, and are uncovered only through the optimal collective interplay of all the influencers in the network. The present theoretical framework may hold a larger degree of universality, being applicable to other hard optimization problems exhibiting a continuous transition from a known phase.

960 citations