Author

Allan Sly

Bio: Allan Sly is an academic researcher at Princeton University. His research focuses on topics including random graphs and the Ising model. He has an h-index of 40 and has co-authored 175 publications receiving 6,064 citations. Previous affiliations of Allan Sly include the University of California, Berkeley.


Papers
Journal ArticleDOI
TL;DR: This work encodes sparse network data using a "nonbacktracking" matrix and shows that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model.
Abstract: Spectral algorithms are classic approaches to clustering and community detection in networks. However, for sparse networks the standard versions of these algorithms are suboptimal, in some cases completely failing to detect communities even when other algorithms such as belief propagation can do so. Here, we present a class of spectral algorithms based on a nonbacktracking walk on the directed edges of the graph. The spectrum of this operator is much better behaved than that of the adjacency matrix or other commonly used matrices, maintaining a strong separation between the bulk eigenvalues and the eigenvalues relevant to community structure even in the sparse case. We show that our algorithm is optimal for graphs generated by the stochastic block model, detecting communities all the way down to the theoretical limit. We also show the spectrum of the nonbacktracking operator for some real-world networks, illustrating its advantages over traditional spectral clustering.
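As a concrete illustration of the method, here is a minimal sketch of nonbacktracking spectral clustering, assuming an undirected simple graph given as an edge list with each edge listed once. The dense O(m^2) construction is only viable for small graphs, and the names below are ours, not the authors' implementation.

```python
import numpy as np

def nonbacktracking_split(edges, n):
    # Directed "darts": each undirected edge {u, v} yields u->v and v->u.
    darts = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {d: i for i, d in enumerate(darts)}
    m = len(darts)
    # B[(u->v), (v->w)] = 1 whenever w != u: the walk may continue along
    # any edge out of v except straight back to u.
    B = np.zeros((m, m))
    for (u, v) in darts:
        for (x, w) in darts:
            if x == v and w != u:
                B[index[(u, v)], index[(x, w)]] = 1.0
    # Sort eigenvalues by real part; the eigenvector of the second one
    # carries the community signal when detection is possible.
    vals, vecs = np.linalg.eig(B)
    order = np.argsort(-vals.real)
    second = vecs[:, order[1]].real
    # Aggregate the dart-indexed eigenvector to vertices via incoming darts.
    score = np.zeros(n)
    for (u, v), i in index.items():
        score[v] += second[i]
    return (score > 0).astype(int)  # sign split into two communities
```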

702 citations

Journal ArticleDOI
TL;DR: This work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice, and the so-called reconstruction problem, and provides a simple and efficient algorithm for estimating a and b when clustering is possible.
Abstract: The planted partition model (also known as the stochastic blockmodel) is a classical cluster-exhibiting random graph model that has been extensively studied in statistics, physics, and computer science. In its simplest form, the planted partition model is a model for random graphs on $n$ nodes with two equal-sized clusters, with a between-class edge probability of $q$ and a within-class edge probability of $p$. Although most of the literature on this model has focused on the case of increasing degrees (i.e., $pn, qn \rightarrow \infty$ as $n \rightarrow \infty$), the sparse case $p, q = O(1/n)$ is interesting both from a mathematical and an applied point of view. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová, based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if $p = a/n$ and $q = b/n$, then Decelle et al. conjectured that it is possible to cluster in a way correlated with the true partition if $(a - b)^2 > 2(a + b)$, and impossible if $(a - b)^2 < 2(a + b)$. By comparison, the best-known rigorous result is that of Coja-Oghlan, who showed that clustering is possible if $(a - b)^2 > C(a + b)$ for some sufficiently large $C$. We prove half of their prediction, showing that it is indeed impossible to cluster if $(a - b)^2 < 2(a + b)$. Furthermore, we show that it is impossible even to estimate the model parameters from the graph when $(a - b)^2 < 2(a + b)$; on the other hand, we provide a simple and efficient algorithm for estimating $a$ and $b$ when $(a - b)^2 > 2(a + b)$. Following Decelle et al., our work establishes a rigorous connection between the clustering problem, spin-glass models on the Bethe lattice, and the so-called reconstruction problem. This connection points to fascinating applications and open problems.
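The detectability boundary itself is a one-line check. A small worked example, purely illustrative, in the paper's parametrization $p = a/n$, $q = b/n$:

```python
# Check the conjectured detectability boundary: (a - b)^2 vs. 2(a + b).
def detectable(a: float, b: float) -> bool:
    return (a - b) ** 2 > 2 * (a + b)

print(detectable(5.0, 1.0))  # (5 - 1)^2 = 16 > 12 = 2 * (5 + 1) -> True
print(detectable(4.0, 2.0))  # (4 - 2)^2 = 4  < 12               -> False
```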

377 citations

Journal ArticleDOI
TL;DR: In this paper, it was shown that sequences of graphs uniformly chosen with a given degree sequence have graph limits in the sense of Lovász and Szegedy, with identifiable limits, and a fast, provably convergent algorithm for the maximum likelihood estimate (MLE) was derived.
Abstract: Large graphs are sometimes studied through their degree sequences (power law or regular graphs). We study graphs that are uniformly chosen with a given degree sequence. Under mild conditions, it is shown that sequences of such graphs have graph limits in the sense of Lovász and Szegedy with identifiable limits. This allows simple determination of other features such as the number of triangles. The argument proceeds by studying a natural exponential model having the degree sequence as a sufficient statistic. The maximum likelihood estimate (MLE) of the parameters is shown to be unique and consistent with high probability. Thus $n$ parameters can be consistently estimated based on a sample of size one. A fast, provably convergent, algorithm for the MLE is derived. These ingredients combine to prove the graph limit theorem. Along the way, a continuous version of the Erdős–Gallai characterization of degree sequences is derived.
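In this exponential model, edge $\{i, j\}$ appears independently with probability $e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$, so the MLE equations match each expected degree to the observed one, $d_i = \sum_{j \ne i} e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$, which rearranges into the fixed-point map $\beta_i \leftarrow \log d_i - \log \sum_{j \ne i} 1/(e^{-\beta_j} + e^{\beta_i})$. Below is a minimal sketch of that iteration, assuming all degrees are at least 1; the starting point, tolerance, and iteration cap are illustrative choices, not necessarily the paper's exact formulation.

```python
import numpy as np

def beta_model_mle(degrees, tol=1e-10, max_iter=5000):
    d = np.asarray(degrees, dtype=float)
    beta = np.log(d) - 0.5 * np.log(d.sum())  # a convenient finite start
    for _ in range(max_iter):
        # M[i, j] = 1 / (exp(-beta_j) + exp(beta_i)), diagonal excluded.
        M = 1.0 / (np.exp(-beta)[None, :] + np.exp(beta)[:, None])
        np.fill_diagonal(M, 0.0)
        new_beta = np.log(d) - np.log(M.sum(axis=1))
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta

# Degree sequence of a 4-cycle: the MLE is beta_i = (log 2)/2 for all i.
print(beta_model_mle([2, 2, 2, 2]))
```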

259 citations

Journal ArticleDOI
TL;DR: In this article, it was shown that it is information-theoretically impossible to cluster if s² ≤ d, and moreover impossible even to estimate the model parameters from the graph when s² < d.
Abstract: We study a random graph model called the “stochastic block model” in statistics and the “planted partition model” in theoretical computer science. In its simplest form, this is a random graph with two equal-sized classes of vertices, with a within-class edge probability of q and a between-class edge probability of q′. A striking conjecture of Decelle, Krzakala, Moore and Zdeborová [9], based on deep, non-rigorous ideas from statistical physics, gave a precise prediction for the algorithmic threshold of clustering in the sparse planted partition model. In particular, if q = a/n and q′ = b/n, s = (a−b)/2 and d = (a+b)/2, then Decelle et al. conjectured that it is possible to efficiently cluster in a way correlated with the true partition if s² > d, and impossible if s² < d. The best-known rigorous result was that of Coja-Oghlan, who showed that clustering is possible if s² > C d ln d for a sufficiently large constant C. In a previous work, we proved that indeed it is information-theoretically impossible to cluster if s² ≤ d, and moreover that it is information-theoretically impossible even to estimate the model parameters from the graph when s² < d. Here we prove the rest of the conjecture by providing an efficient algorithm for clustering in a way correlated with the true partition whenever s² > d. A different independent proof of the same result was recently obtained by Massoulié [20].
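Above the threshold, the parameters can be read off from graph statistics: the average degree concentrates on d = (a+b)/2, while short-cycle counts reveal s = (a−b)/2. The papers' estimators count cycles whose length grows slowly with n; the sketch below is a much cruder stand-in using triangles (whose expected number is roughly (d³ + s³)/6), purely to illustrate the idea, with all names ours.

```python
import numpy as np

def estimate_a_b(adj):
    # adj: dense symmetric 0/1 adjacency matrix of the observed graph.
    n = adj.shape[0]
    d_hat = adj.sum() / n                        # average degree ~ (a + b) / 2
    triangles = np.trace(adj @ adj @ adj) / 6.0  # closed 3-walks / 6
    s_cubed = 6.0 * triangles - d_hat ** 3       # from E[#triangles] ~ (d^3 + s^3) / 6
    s_hat = np.sign(s_cubed) * abs(s_cubed) ** (1.0 / 3.0)
    return d_hat + s_hat, d_hat - s_hat          # (a_hat, b_hat)
```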

252 citations

Proceedings ArticleDOI
Allan Sly
23 Oct 2010
TL;DR: In this paper, it was shown that unless NP$=$RP there is no polynomial time approximation scheme for the partition function (the sum of such weighted independent sets) on graphs of maximum degree $d$ for fugacity $\lambda_c(d) < \lambda < \lambda_c(d) + \varepsilon(d)$, where $\lambda_c(d)$ is the uniqueness threshold on the $d$-regular tree and $\varepsilon(d) > 0$.
Abstract: The hardcore model is a model of lattice gas systems which has received much attention in statistical physics, probability theory and theoretical computer science. It is the probability distribution over independent sets $I$ of a graph weighted proportionally to $\lambda^{|I|}$ with fugacity parameter $\lambda$. We prove that at the uniqueness threshold of the hardcore model on the $d$-regular tree, approximating the partition function becomes computationally hard on graphs of maximum degree $d$. Specifically, we show that unless NP$=$RP there is no polynomial time approximation scheme for the partition function (the sum of such weighted independent sets) on graphs of maximum degree $d$ for fugacity $\lambda_c(d) < \lambda < \lambda_c(d) + \varepsilon(d)$, where $\lambda_c(d) = (d-1)^{d-1}/(d-2)^d$ is the uniqueness threshold on the $d$-regular tree and $\varepsilon(d) > 0$. Weitz produced an FPTAS for approximating the partition function when $0 < \lambda < \lambda_c(d)$, so this result demonstrates that the computational transition coincides with the statistical physics phase transition.
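A small sketch of the quantities involved, assuming the tree threshold formula $\lambda_c(d) = (d-1)^{d-1}/(d-2)^d$ (valid for $d \ge 3$) and computing the partition function by brute-force enumeration, which is exponential in $n$ and feasible only for tiny graphs:

```python
from itertools import combinations

def lambda_c(d: int) -> float:
    # Uniqueness threshold of the hardcore model on the d-regular tree.
    return (d - 1) ** (d - 1) / (d - 2) ** d

def partition_function(n, edges, lam):
    # Z = sum over independent sets I of lam^|I|, by brute force.
    edge_set = {frozenset(e) for e in edges}
    Z = 0.0
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if all(frozenset(p) not in edge_set
                   for p in combinations(subset, 2)):
                Z += lam ** k
    return Z

print(lambda_c(6))                                   # 5**5 / 4**6 ~ 0.763
print(partition_function(3, [(0, 1), (1, 2)], 1.0))  # path on 3 vertices: 5.0
```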

241 citations


Cited by
Journal ArticleDOI
TL;DR: A thorough exposition of community structure, or clustering, is attempted, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists.
Abstract: The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e. the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such clusters, or communities, can be considered as fairly independent compartments of a graph, playing a role similar to that of, e.g., the tissues or the organs in the human body. Detecting communities is of great importance in sociology, biology and computer science, disciplines where systems are often represented as graphs. This problem is very hard and not yet satisfactorily solved, despite the huge effort of a large interdisciplinary community of scientists working on it over the past few years. We will attempt a thorough exposition of the topic, from the definition of the main elements of the problem, to the presentation of most methods developed, with a special focus on techniques designed by statistical physicists, from the discussion of crucial issues like the significance of clustering and how methods should be tested and compared against each other, to the description of applications to real networks.

9,057 citations

Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Book
01 Dec 2008
TL;DR: Markov Chains and Mixing Times is an introduction to the modern approach to the theory of Markov chains, whose main goal is to determine the rate of convergence of a Markov chain to the stationary distribution as a function of the size and geometry of the state space.
Abstract: This book is an introduction to the modern approach to the theory of Markov chains. The main goal of this approach is to determine the rate of convergence of a Markov chain to the stationary distribution as a function of the size and geometry of the state space. The authors develop the key tools for estimating convergence times, including coupling, strong stationary times, and spectral methods. Whenever possible, probabilistic methods are emphasized. The book includes many examples and provides brief introductions to some central models of statistical mechanics. Also provided are accounts of random walks on networks, including hitting and cover times, and analyses of several methods of shuffling cards. As a prerequisite, the authors assume a modest understanding of probability theory and linear algebra at an undergraduate level. "Markov Chains and Mixing Times" is meant to bring the excitement of this active area of research to a wide audience.
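An illustrative computation of the book's central quantity, total variation distance to stationarity as a function of time, for a lazy random walk on a small cycle (the chain, its size, and the time horizon are our choices, not an example from the book):

```python
import numpy as np

n = 8
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5                  # lazy step removes periodicity
    P[i, (i - 1) % n] = 0.25
    P[i, (i + 1) % n] = 0.25
pi = np.full(n, 1.0 / n)           # uniform stationary distribution

mu = np.zeros(n)
mu[0] = 1.0                        # start deterministically at state 0
for t in range(1, 33):
    mu = mu @ P
    if t in (1, 4, 16, 32):
        tv = 0.5 * np.abs(mu - pi).sum()   # total variation distance
        print(f"t={t:2d}  d_TV = {tv:.4f}")
```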

2,573 citations

Journal ArticleDOI
TL;DR: Baxter has inherited the mantle of Onsager, who started the process by solving the two-dimensional Ising model exactly in 1944, and there has been a growing belief that all the two-dimensional lattice statistical models will eventually be solved and that it will be Professor Baxter who solves them.
Abstract: R J Baxter 1982 London: Academic, xii + 486 pp, price £43.60. Over the past few years there has been a growing belief that all the two-dimensional lattice statistical models will eventually be solved and that it will be Professor Baxter who solves them. Baxter has inherited the mantle of Onsager, who started the process by solving the two-dimensional Ising model exactly in 1944.

1,658 citations