Open Access Proceedings Article

Distributed Graph Summarization

TLDR
The paper introduces three distributed graph summarization algorithms. Experimental results show that they produce good-quality summaries and scale well with increasing data sizes. To the authors' knowledge, this is the first work to study distributed graph summarization methods.
Abstract
Graphs are a ubiquitous and essential data representation for modeling real-world objects and their relationships. Today, various applications generate large amounts of graph data. Graph summarization techniques are crucial for uncovering useful insights about the patterns hidden in the underlying data. However, all existing work on graph summarization consists of single-process solutions, which as a result cannot scale to large graphs. In this paper, we introduce three distributed graph summarization algorithms to address this problem. Experimental results show that the proposed algorithms produce good-quality summaries and scale well with increasing data sizes. To the best of our knowledge, this is the first work to study distributed graph summarization methods.
