Journal ArticleDOI

Estimating and understanding exponential random graph models

01 Oct 2013-Annals of Statistics (Institute of Mathematical Statistics)-Vol. 41, Iss: 5, pp 2428-2461
TL;DR: In this paper, the authors introduce a method for the theoretical analysis of exponential random graph models based on a large deviation approximation to the normalizing constant shown to be consistent using theory developed by Chatterjee and Varadhan [European J. Combin. 32 (2011) 1000-1017].
Abstract: We introduce a method for the theoretical analysis of exponential random graph models. The method is based on a large-deviations approximation to the normalizing constant shown to be consistent using theory developed by Chatterjee and Varadhan [European J. Combin. 32 (2011) 1000–1017]. The theory explains a host of difficulties encountered by applied workers: many distinct models have essentially the same MLE, rendering the problems “practically” ill-posed. We give the first rigorous proofs of “degeneracy” observed in these models. Here, almost all graphs have essentially no edges or are essentially complete. We supplement recent work of Bhamidi, Bresler and Sly [2008 IEEE 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS) (2008) 803–812 IEEE] showing that for many models, the extra sufficient statistics are useless: most realizations look like the results of a simple Erdős–Rényi model. We also find classes of models where the limiting graphs differ from Erdős–Rényi graphs. A limitation of our approach, inherited from the limitation of graph limit theory, is that it works only for dense graphs.
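
For orientation, the normalizing constant mentioned in the abstract can be written out for a generic exponential random graph model. The notation below (parameters \beta_i, sufficient statistics T_i, normalizing constant \psi) is an assumption chosen for illustration and may differ from the scaling used in the paper itself.

```latex
% Generic k-parameter exponential random graph model on simple graphs G
% with n labeled vertices. The T_i are sufficient statistics, for example
% edge and triangle counts (illustrative notation, not taken from the paper).
\[
  p_\beta(G) = \exp\Big( \sum_{i=1}^{k} \beta_i T_i(G) - \psi(\beta) \Big),
  \qquad
  \psi(\beta) = \log \sum_{G'} \exp\Big( \sum_{i=1}^{k} \beta_i T_i(G') \Big),
\]
% The inner sum runs over all 2^{n(n-1)/2} simple graphs G' on n vertices.
```

Because the sum defining \psi has 2^{n(n-1)/2} terms, evaluating it exactly is infeasible for all but very small n, which is why an approximation to the normalizing constant is the central object of the analysis.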


Citations
Journal ArticleDOI
TL;DR: This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure" and achieves the minimax error rate up to a constant factor.
Abstract: Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candes and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure." Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation and generalized Bradley--Terry models for pairwise comparison.

405 citations

Journal ArticleDOI
TL;DR: The Universal Singular Value Thresholding (USVT) estimator as discussed by the authors achieves the minimax error rate up to a constant factor for any matrix that has a little bit of structure.
Abstract: Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candes and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has “a little bit of structure.” Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation and generalized Bradley–Terry models for pairwise comparison.

346 citations

Journal ArticleDOI
08 Jan 2019
TL;DR: This Review describes advances in the statistical physics of complex networks and provides a reference for the state of the art in theoretical network modelling and applications to real-world systems for pattern detection and network reconstruction.
Abstract: In the past 15 years, statistical physics has been successful as a framework for modelling complex networks. On the theoretical side, this approach has unveiled a variety of physical phenomena, such as the emergence of mixed distributions and ensemble non-equivalence, that are observed in heterogeneous networks but not in homogeneous systems. At the same time, thanks to the deep connection between the principle of maximum entropy and information theory, statistical physics has led to the definition of null models for networks that reproduce features of real-world systems but that are otherwise as random as possible. We review here the statistical physics approach and the null models for complex networks, focusing in particular on analytical frameworks that reproduce local network features. We show how these models have been used to detect statistically significant structural patterns in real-world networks and to reconstruct the network structure in cases of incomplete information. We further survey the statistical physics models that reproduce more complex, semilocal network features using Markov chain Monte Carlo sampling, as well as models of generalized network structures, such as multiplex networks, interacting networks and simplicial complexes. This Review describes advances in the statistical physics of complex networks and provides a reference for the state of the art in theoretical network modelling and applications to real-world systems for pattern detection and network reconstruction.

249 citations

Journal ArticleDOI
TL;DR: It is shown that consistency under sampling (projectivity), an apparently trivial condition, is in fact violated by many popular and scientifically appealing exponential random graph models (ERGMs), and that satisfying it drastically limits an ERGM's expressive power.
Abstract: The growing availability of network data and of scientific interest in distributed systems has led to the rapid development of statistical models of network structure. Typically, however, these are models for the entire network, while the data consists only of a sampled sub-network. Parameters for the whole network, which is what is of interest, are estimated by applying the model to the sub-network. This assumes that the model is consistent under sampling, or, in terms of the theory of stochastic processes, that it defines a projective family. Focusing on the popular class of exponential random graph models (ERGMs), we show that this apparently trivial condition is in fact violated by many popular and scientifically appealing models, and that satisfying it drastically limits ERGM's expressive power. These results are actually special cases of more general results about exponential families of dependent random variables, which we also prove. Using such results, we offer easily checked conditions for the consistency of maximum likelihood estimation in ERGMs, and discuss some possible constructive responses.

215 citations

Proceedings ArticleDOI
13 May 2013
TL;DR: This work finds that the space of subgraph frequencies is governed both by its combinatorial properties --- based on extremal results that constrain all graphs --- as well as by its empirical properties, manifested in the way that real social graphs appear to lie near a simple one-dimensional curve through this space.
Abstract: A growing set of on-line applications are generating data that can be viewed as very large collections of small, dense social graphs --- these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks. A natural question is how to usefully define a domain-independent 'coordinate system' for such a collection of graphs, so that the set of possible structures can be compactly represented and understood within a common space. In this work, we draw on the theory of graph homomorphisms to formulate and analyze such a representation, based on computing the frequencies of small induced subgraphs within each graph. We find that the space of subgraph frequencies is governed both by its combinatorial properties --- based on extremal results that constrain all graphs --- as well as by its empirical properties --- manifested in the way that real social graphs appear to lie near a simple one-dimensional curve through this space. We develop flexible frameworks for studying each of these aspects. For capturing empirical properties, we characterize a simple stochastic generative model, a single-parameter extension of Erdős–Rényi random graphs, whose stationary distribution over subgraphs closely tracks the one-dimensional concentration of the real social graph families. For the extremal properties, we develop a tractable linear program for bounding the feasible space of subgraph frequencies by harnessing a toolkit of known extremal graph theory. Together, these two complementary frameworks shed light on a fundamental question pertaining to social graphs: what properties of social graphs are 'social' properties and what properties are 'graph' properties? We conclude with a brief demonstration of how the coordinate system we examine can also be used to perform classification tasks, distinguishing between structures arising from different types of social graphs.

186 citations

References
Book
25 Nov 1994
TL;DR: This book covers the mathematical representation of social networks in the social and behavioral sciences, including structural and locational properties, roles and positions, dyadic and triadic methods, and statistical dyadic interaction models.
Abstract: Part I. Introduction: Networks, Relations, and Structure: 1. Relations and networks in the social and behavioral sciences 2. Social network data: collection and application Part II. Mathematical Representations of Social Networks: 3. Notation 4. Graphs and matrixes Part III. Structural and Locational Properties: 5. Centrality, prestige, and related actor and group measures 6. Structural balance, clusterability, and transitivity 7. Cohesive subgroups 8. Affiliations, co-memberships, and overlapping subgroups Part IV. Roles and Positions: 9. Structural equivalence 10. Blockmodels 11. Relational algebras 12. Network positions and roles Part V. Dyadic and Triadic Methods: 13. Dyads 14. Triads Part VI. Statistical Dyadic Interaction Models: 15. Statistical analysis of single relational networks 16. Stochastic blockmodels and goodness-of-fit indices Part VII. Epilogue: 17. Future directions.

17,104 citations

Journal ArticleDOI
TL;DR: In this article, a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability is presented.
Abstract: Let M(x) denote the expected value at level x of the response to a certain experiment. M(x) is assumed to be a monotone function of x but is unknown to the experimenter, and it is desired to find the solution x = θ of the equation M(x) = α, where α is a given constant. We give a method for making successive experiments at levels x1, x2, ··· in such a way that xn will tend to θ in probability.
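
The successive-experiment scheme described in this abstract can be illustrated with a minimal sketch. The step sizes c/n, the linear response M(x) = 2x, the Gaussian noise, and all function and parameter names below are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def robbins_monro(noisy_response, alpha, x0, n_steps=5000, c=1.0):
    """Approximate the root theta of M(x) = alpha from noisy observations.

    noisy_response(x) returns one noisy measurement whose mean is M(x).
    M is assumed to be increasing, so the level x is moved down when the
    response is above alpha and up when it is below. Step sizes c/n are a
    common illustrative choice (they sum to infinity, their squares do not).
    """
    x = x0
    for n in range(1, n_steps + 1):
        y = noisy_response(x)          # run the experiment at level x
        x = x - (c / n) * (y - alpha)  # shrinking correction toward the root
    return x


# Hypothetical example: M(x) = 2x with unit-variance Gaussian noise and
# alpha = 3, so the true root is theta = 1.5.
rng = random.Random(0)
theta_hat = robbins_monro(lambda x: 2.0 * x + rng.gauss(0.0, 1.0),
                          alpha=3.0, x0=0.0)
print(theta_hat)
```

The experimenter never evaluates M directly; only noisy responses are used, and the shrinking step sizes average the noise out over successive experiments.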

9,312 citations

Book
13 Mar 2000

2,591 citations

Journal ArticleDOI
TL;DR: This article considers a fixed system of n sites, labelled by the first n positive integers, with an associated vector x of observations, x1, ..., xn, presumed to be a realization of a vector X of (dependent) random variables, X1, ..., Xn. The sites may represent points or regions in space, and the random variables may be either continuous or discrete.
Abstract: In rather formal terms, the situation with which this paper is concerned may be described as follows. We are given a fixed system of n sites, labelled by the first n positive integers, and an associated vector x of observations, x1, ..., xn, which, in turn, is presumed to be a realization of a vector X of (dependent) random variables, X1, ..., Xn. In practice, the sites may represent points or regions in space and the random variables may be either continuous or discrete. The main statistical objectives are the following: firstly, to provide a means of using the available concomitant information, particularly the configuration of the sites, to attach a plausible probability distribution to the random vector X; secondly, to estimate any unknown parameters in the distribution from the realization x; thirdly, where possible, to quantify the extent of disagreement between hypothesis and observation.

1,716 citations

Journal ArticleDOI
TL;DR: An exponential family of distributions that can be used for analyzing directed graph data is described, and several special cases are discussed along with some possible substantive interpretations.
Abstract: Directed graph (or digraph) data arise in many fields, especially in contemporary research on structures of social relationships. We describe an exponential family of distributions that can be used for analyzing such data. A substantive rationale for the general model is presented, and several special cases are discussed along with some possible substantive interpretations. A computational algorithm based on iterative scaling procedures for use in fitting data is described, as are the results of a pilot simulation study. An example using previously reported empirical data is worked out in detail. An extension to multiple relationship data is discussed briefly.

1,238 citations

Trending Questions (1)
What are the advantages and disadvantages of exponential random graph models?

Advantages: Provides a theoretical analysis method. Disadvantages: Works only for dense graphs.