Sampling from large graphs

doi:10.1145/1150402.1150479

Proceedings Article

- pp 631-636

TLDR

The best performing methods are the ones based on random walks and "forest fire"; they match very accurately both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.

Abstract:

Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential.

The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success?

We answer the above questions, and test our answers by thorough experiments on several diverse datasets, spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample.

In addition to the theoretical contributions, the practical conclusions from our work are: sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best performing methods are the ones based on random walks and "forest fire"; they match very accurately both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
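The three node-based strategies named in the abstract (uniform random node selection, random walks, and "forest fire") can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the undirected adjacency-set representation, the `ring_graph` test graph, the restart probability of 0.15, and the burning probability of 0.7 are all assumptions chosen for illustration, and the sampled graph is taken to be the subgraph induced by the selected nodes.

```python
import random
from collections import deque


def ring_graph(n):
    """Hypothetical test graph: ring lattice, each node linked to 2 nodes on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in (1, 2):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def induced_subgraph(adj, nodes):
    """Keep only the chosen nodes and the edges among them."""
    return {u: {v for v in adj[u] if v in nodes} for u in nodes}


def random_node_sample(adj, k, rng):
    """Uniform random node selection: pick k distinct nodes at random."""
    return induced_subgraph(adj, set(rng.sample(sorted(adj), k)))


def random_walk_sample(adj, k, rng, restart_prob=0.15):
    """Random walk that flies back to its start node with probability restart_prob."""
    start = current = rng.choice(sorted(adj))
    visited = {start}
    steps, max_steps = 0, 100 * len(adj)  # assumed budget before re-seeding
    while len(visited) < k:
        if steps >= max_steps:
            # The walk appears stuck: re-seed from a node not yet visited.
            start = current = rng.choice(sorted(set(adj) - visited))
            visited.add(start)
            steps = 0
            continue
        if rng.random() < restart_prob or not adj[current]:
            current = start
        else:
            current = rng.choice(sorted(adj[current]))
            visited.add(current)
        steps += 1
    return induced_subgraph(adj, visited)


def forest_fire_sample(adj, k, rng, burn_prob=0.7):
    """Burn outward from a seed; each node ignites a geometric number of neighbors."""
    visited = set()
    while len(visited) < k:
        seed = rng.choice(sorted(set(adj) - visited))
        visited.add(seed)
        queue = deque([seed])
        while queue and len(visited) < k:
            u = queue.popleft()
            candidates = sorted(adj[u] - visited)
            rng.shuffle(candidates)
            # Geometric draw with mean burn_prob / (1 - burn_prob).
            n_burn = 0
            while rng.random() < burn_prob:
                n_burn += 1
            for v in candidates[:n_burn]:
                if len(visited) >= k:
                    break
                visited.add(v)
                queue.append(v)
    return induced_subgraph(adj, visited)


if __name__ == "__main__":
    rng = random.Random(0)
    graph = ring_graph(40)
    for sampler in (random_node_sample, random_walk_sample, forest_fire_sample):
        sample = sampler(graph, 10, rng)
        edges = sum(len(nbrs) for nbrs in sample.values()) // 2
        print(sampler.__name__, len(sample), "nodes,", edges, "edges")
```

All three functions return exactly `k` nodes (for `k` no larger than the graph), so their outputs can be compared on the same property, such as the degree distribution or clustering coefficient, against the original graph.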

Citations

Journal Article

Graph evolution: Densification and shrinking diameters

Jure Leskovec, +2 more

- 01 Mar 2007 -

ACM Transactions on Knowledge Discovery ...

TL;DR: In this paper, a new graph generator based on a forest fire spreading process was proposed, which has a simple, intuitive justification, requires very few parameters (like the flammability of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.

Journal Article

A Survey of Statistical Network Models

Anna Goldenberg, +3 more

TL;DR: The authors provide an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature; the subsequent discussion focuses on some prominent static and dynamic network models and their interconnections.

Posted Content

Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose

Fred Morstatter, +3 more

- 21 Jun 2013 -

arXiv: Social and Information Networks

TL;DR: Data collected using Twitter's sampled API service is compared with data collected using the full, albeit costly, Firehose stream that includes every single published tweet to help researchers and practitioners understand the implications of using the Streaming API.

Journal Article

Opinion Leadership and Social Contagion in New Product Diffusion

Raghuram Iyengar, +2 more

- 01 Jan 2009 -

ACR North American Advances

Journal Article

Opinion Leadership and Social Contagion in New Product Diffusion

Raghuram Iyengar, +2 more

- 01 Mar 2011 -

Marketing Science

TL;DR: There is evidence of contagion operating over network ties, even after controlling for marketing effort and arbitrary systemwide changes, and sociometric and self-reported measures of leadership are weakly correlated and associated with different kinds of adoption-related behaviors.

References

Journal Article

Collective dynamics of small-world networks

Duncan J. Watts, +1 more

- 04 Jun 1998 -

Nature

TL;DR: Simple models of networks that can be tuned through the middle ground between regular and random topologies are explored: regular networks 'rewired' to introduce increasing amounts of disorder. These systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.

Proceedings Article

On power-law relationships of the Internet topology

Michalis Faloutsos, +2 more

TL;DR: These power-laws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period, and can be used to generate and select realistic topologies for simulation purposes.

Proceedings Article

Graphs over time: densification laws, shrinking diameters and possible explanations

Jure Leskovec, +2 more

TL;DR: A new graph generator is provided, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.

Proceedings Article

R-MAT: A Recursive Model for Graph Mining

Deepayan Chakrabarti, +2 more

TL;DR: A simple, parsimonious model, the "recursive matrix" (R-MAT) model, is proposed; it can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters.

Book Chapter

Trust management for the semantic web

Matthew Richardson, +2 more

TL;DR: A web of trust is employed, in which each user maintains trusts in a small number of other users, and these trusts are composed into trust values for all other users.
