Proceedings Article

Sampling from large graphs

TLDR
The best-performing methods are the ones based on random walks and "forest fire"; they very accurately match both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
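The two sampler families singled out above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the adjacency-dict graph representation, the restart probability of 0.15, the stall limit of 100 steps, the forward-burning probability of 0.7, the undirected forward-burning-only variant of forest fire, and all function names are hypothetical choices made for the sketch. Both samplers return the subgraph induced on the selected nodes.

```python
import random
from collections import deque
from typing import Optional

# Assumed representation for this sketch: undirected graph as an adjacency
# dict mapping each node to a list of its neighbours.
Graph = dict[int, list[int]]


def induced_subgraph(g: Graph, nodes: set[int]) -> Graph:
    """Keep only the edges whose two endpoints were both sampled."""
    return {u: [v for v in g[u] if v in nodes] for u in nodes}


def random_walk_sample(g: Graph, n: int, restart: float = 0.15,
                       max_stall: int = 100, seed: Optional[int] = None) -> Graph:
    """Random walk with restart; jumps to a fresh start node if it stalls.

    restart=0.15 and max_stall=100 are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    n = min(n, len(g))
    all_nodes = list(g)
    start = cur = rng.choice(all_nodes)
    visited = {start}
    stall = 0  # consecutive steps that found no new node
    while len(visited) < n:
        if rng.random() < restart or not g[cur]:
            cur = start  # fly back to the walk's starting node
        else:
            cur = rng.choice(g[cur])
        if cur in visited:
            stall += 1
        else:
            visited.add(cur)
            stall = 0
        if stall > max_stall:  # trapped in a small region: restart elsewhere
            start = cur = rng.choice(all_nodes)
            visited.add(start)
            stall = 0
    return induced_subgraph(g, visited)


def forest_fire_sample(g: Graph, n: int, p_forward: float = 0.7,
                       seed: Optional[int] = None) -> Graph:
    """Forest Fire sampling (undirected, forward burning only): each burning
    node ignites a geometrically distributed number (mean p/(1-p)) of its
    unburned neighbours. p_forward=0.7 is an illustrative default (assumption).
    """
    rng = random.Random(seed)
    n = min(n, len(g))
    all_nodes = list(g)
    burned = set()
    while len(burned) < n:
        # Ignite at a random node, and again whenever the previous fire dies out.
        ignition = rng.choice(all_nodes)
        if ignition in burned:
            continue
        burned.add(ignition)
        frontier = deque([ignition])
        while frontier and len(burned) < n:
            u = frontier.popleft()
            k = 0  # successes before the first failure: P(k) = p^k * (1 - p)
            while rng.random() < p_forward:
                k += 1
            unburned = [v for v in g[u] if v not in burned]
            for v in rng.sample(unburned, min(k, len(unburned))):
                if len(burned) >= n:
                    break
                burned.add(v)
                frontier.append(v)
    return induced_subgraph(g, burned)
```

For a 15% sample one would call, for example, `forest_fire_sample(graph, int(0.15 * len(graph)))`. Restarting elsewhere when the walk stalls or the fire dies out keeps either sampler from getting trapped in a small component.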
Abstract
Given a huge real graph, how can we derive a representative sample? There are many known algorithms to compute interesting measures (shortest paths, centrality, betweenness, etc.), but several of them become impractical for large graphs. Thus graph sampling is essential.

The natural questions to ask are (a) which sampling method to use, (b) how small the sample size can be, and (c) how to scale up the measurements of the sample (e.g., the diameter) to get estimates for the large graph. The deeper, underlying question is subtle: how do we measure success?

We answer the above questions and test our answers with thorough experiments on several diverse datasets, spanning thousands of nodes and edges. We consider several sampling methods, propose novel methods to check the goodness of sampling, and develop a set of scaling laws that describe relations between the properties of the original graph and the sample.

In addition to the theoretical contributions, the practical conclusions from our work are: sampling strategies based on edge selection do not perform well; simple uniform random node selection performs surprisingly well. Overall, the best-performing methods are the ones based on random walks and "forest fire"; they very accurately match both static and evolutionary graph patterns, with sample sizes down to about 15% of the original graph.
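The abstract asks how to measure success when comparing a sample against the original graph. One common way to score the match of a single property, assumed here rather than taken from this abstract, is the two-sample Kolmogorov-Smirnov D-statistic: the largest vertical gap between the two empirical CDFs (0 for identical distributions, 1 for completely separated ones). The sketch below applies it to degree distributions; the function names and the adjacency-dict representation are hypothetical.

```python
from bisect import bisect_right


def ecdf(values: list[float]):
    """Return the empirical CDF F(x) = fraction of values <= x."""
    ordered = sorted(values)
    return lambda x: bisect_right(ordered, x) / len(ordered)


def ks_d_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample KS D-statistic: max |F_a(x) - F_b(x)| over all x."""
    fa, fb = ecdf(a), ecdf(b)
    # Both ECDFs are step functions that only jump at observed values,
    # so the supremum is attained at one of those points.
    return max(abs(fa(x) - fb(x)) for x in set(a) | set(b))


def degree_distribution(g: dict[int, list[int]]) -> list[int]:
    """Degree of every node in an adjacency-dict graph (assumed representation)."""
    return [len(nbrs) for nbrs in g.values()]
```

A smaller D means the sample reproduces the original distribution more closely. Because a sample is smaller than the original, some properties may need rescaling before they are compared, which is the question the abstract's scaling laws address.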

Citations
Journal Article

Graph evolution: Densification and shrinking diameters

TL;DR: This paper proposes a new graph generator based on a "forest fire" spreading process, which has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
Journal Article

A Survey of Statistical Network Models

TL;DR: In this paper, the authors provide an overview of the historical development of statistical network modeling and then introduce a number of examples that have been studied in the network literature; the subsequent discussion focuses on some prominent static and dynamic network models and their interconnections.
Posted Content

Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose

TL;DR: Data collected using Twitter's sampled API service is compared with data collected using the full, albeit costly, Firehose stream that includes every single published tweet to help researchers and practitioners understand the implications of using the Streaming API.
Journal Article

Opinion Leadership and Social Contagion in New Product Diffusion

TL;DR: There is evidence of contagion operating over network ties, even after controlling for marketing effort and arbitrary systemwide changes, and sociometric and self-reported measures of leadership are weakly correlated and associated with different kinds of adoption-related behaviors.