Proceedings ArticleDOI
Sampling from large graphs
Jure Leskovec, Christos Faloutsos +1 more
- pp 631-636
TLDR
The best-performing methods are the ones based on random walks and "forest fire"; they match both static and evolutionary graph patterns very accurately, with sample sizes down to about 15% of the original graph.
Abstract:
Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential. The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success? We answer the above questions, and test our answers by thorough experiments on several diverse datasets, spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original and the sample. In addition to the theoretical contributions, the practical conclusions from our work are: sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best-performing methods are the ones based on random walks and "forest fire"; they match both static and evolutionary graph patterns very accurately, with sample sizes down to about 15% of the original graph.
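To make two of the sampling families named in the abstract concrete, here is a minimal Python sketch of uniform random node selection and random-walk sampling. It is an illustration under stated assumptions, not the authors' implementation: the function names, the adjacency-dictionary representation, the choice to return the induced subgraph, the restart probability of 0.15, and the re-seeding rule for a stuck walk are all assumptions made for this example and are not taken from the abstract.

```python
import random


def induced_subgraph(adj, kept):
    """Keep only the chosen nodes and the edges that run between them."""
    return {u: {v for v in adj[u] if v in kept} for u in kept}


def random_node_sample(adj, fraction, rng):
    """Pick a fraction of the nodes uniformly at random (hypothetical helper)."""
    k = max(1, round(fraction * len(adj)))
    kept = set(rng.sample(sorted(adj), k))
    return induced_subgraph(adj, kept)


def random_walk_sample(adj, fraction, rng, restart_prob=0.15):
    """Collect nodes visited by a random walk that sometimes jumps back to
    its start node. restart_prob is an illustrative assumption."""
    k = max(1, round(fraction * len(adj)))
    nodes = sorted(adj)
    start = rng.choice(nodes)
    current = start
    visited = {start}
    steps = 0
    step_limit = 100 * len(adj)  # assumed cutoff before re-seeding the walk
    while len(visited) < k:
        steps += 1
        if steps > step_limit:
            # The walk is stuck (e.g. in a small component): pick a new start.
            start = rng.choice(nodes)
            current = start
            steps = 0
        elif rng.random() < restart_prob or not adj[current]:
            current = start
        else:
            current = rng.choice(sorted(adj[current]))
        visited.add(current)
    return induced_subgraph(adj, visited)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy undirected ring of 20 nodes, stored as node -> set of neighbours.
    ring = {i: {(i - 1) % 20, (i + 1) % 20} for i in range(20)}
    print(len(random_node_sample(ring, 0.25, rng)), "nodes kept by node sampling")
    print(len(random_walk_sample(ring, 0.25, rng)), "nodes kept by random walk")
```

Both functions take the target sample size as a fraction of the node count, so the roughly 15% sample size mentioned in the abstract would correspond to `fraction=0.15`. Comparing properties of the returned subgraph (for example its degree distribution) against those of the full graph is one way to explore the "how do we measure success?" question on a small scale.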
Citations
Journal ArticleDOI
Graph evolution: Densification and shrinking diameters
TL;DR: In this paper, a new graph generator based on a forest fire spreading process was proposed, which has a simple, intuitive justification, requires very few parameters (like the flammability of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
Journal ArticleDOI
A Survey of Statistical Network Models
TL;DR: In this paper, the authors provide an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature; the subsequent discussion focuses on some prominent static and dynamic network models and their interconnections.
Posted Content
Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose
TL;DR: Data collected using Twitter's sampled API service is compared with data collected using the full, albeit costly, Firehose stream that includes every single published tweet to help researchers and practitioners understand the implications of using the Streaming API.
Journal ArticleDOI
Opinion Leadership and Social Contagion in New Product Diffusion
TL;DR: There is evidence of contagion operating over network ties, even after controlling for marketing effort and arbitrary systemwide changes, and sociometric and self-reported measures of leadership are weakly correlated and associated with different kinds of adoption-related behaviors.
References
Journal ArticleDOI
Collective dynamics of small-world networks
TL;DR: Simple models of networks that can be tuned through the middle ground between completely regular and completely random are explored: regular networks ‘rewired’ to introduce increasing amounts of disorder. These systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Proceedings ArticleDOI
On power-law relationships of the Internet topology
TL;DR: These power-laws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period, and can be used to generate and select realistic topologies for simulation purposes.
Proceedings ArticleDOI
Graphs over time: densification laws, shrinking diameters and possible explanations
TL;DR: A new graph generator is provided, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
Proceedings Article
R-MAT: A Recursive Model for Graph Mining
TL;DR: A simple, parsimonious model, the “recursive matrix” (R-MAT) model, which can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters is proposed.
Book ChapterDOI
Trust management for the semantic web
TL;DR: A web of trust is employed, in which each user maintains trusts in a small number of other users, and these trusts are composed into trust values for all other users.