Distributed Graph Summarization
Xingjie Liu, Yuanyuan Tian, Qi He, Wang-Chien Lee, John McPherson
- pp 799-808
TLDR
Experimental results show that the proposed algorithms can produce good quality summaries and scale well with increasing data sizes, and this is the first work to study distributed graph summarization methods.
Abstract:
Graphs are a ubiquitous and essential data representation for modeling real-world objects and their relationships. Today, various applications generate large amounts of graph data. Graph summarization techniques are crucial for uncovering useful insights about the patterns hidden in the underlying data. However, all existing work on graph summarization consists of single-process solutions, which cannot scale to large graphs. In this paper, we introduce three distributed graph summarization algorithms to address this problem. Experimental results show that the proposed algorithms produce good-quality summaries and scale well with increasing data sizes. To the best of our knowledge, this is the first work to study distributed graph summarization methods.
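As a rough illustration of what a graph summary can look like, the sketch below collapses nodes into supernodes and counts the edges between them. It is a generic single-process example and not one of the three distributed algorithms from the paper. The function name `summarize` and the precomputed node grouping `group_of` are assumptions made for illustration; how nodes should be grouped is exactly what a summarization algorithm has to decide.

```python
from collections import Counter


def summarize(edges, group_of):
    """Collapse an undirected graph into supernodes and weighted superedges.

    edges:    iterable of (u, v) node pairs
    group_of: dict mapping each node to a supernode label
              (assumed to be given; hypothetical input for this sketch)
    """
    # Number of original nodes inside each supernode.
    sizes = Counter(group_of.values())

    # Number of original edges between each pair of supernodes.
    superedges = Counter()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        # Order the pair so (A, B) and (B, A) count as the same superedge.
        key = (gu, gv) if gu <= gv else (gv, gu)
        superedges[key] += 1
    return sizes, superedges


# Toy graph with five nodes grouped into three supernodes.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
group_of = {"a": "X", "b": "Y", "c": "Y", "d": "Z", "e": "Z"}
sizes, superedges = summarize(edges, group_of)
```

The single pass over `edges` is what makes the single-process version hard to scale: the whole edge list and the grouping must be reachable from one process, which is the limitation the abstract points to.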
Citations
Journal Article
Summarizing semantic graphs: a survey
Ejla Čebirić, François Goasdoué, Haridimos Kondylakis, Dimitris Kotzinos, Ioana Manolescu, Georgia Troullinou, Mussab Zneika
TL;DR: This survey is the first to comprehensively cover summarization methods for semantic RDF graphs. It proposes a taxonomy of existing work in this area, including some closely related work developed prior to the adoption of RDF in the data management community.
Journal Article
Efficient Distributed Density Peaks for Clustering Large Data Sets in MapReduce
Yanfeng Zhang, Shimin Chen, Ge Yu
TL;DR: This paper proposes LSH-DDP, an approximate algorithm that exploits Locality Sensitive Hashing for partitioning data, performs local computation, and aggregates local results to approximate the final results, and presents formal analysis of this algorithm.
Book Chapter
RDF Digest: Efficient Summarization of RDF/S KBs
TL;DR: Presents RDF Digest, a novel platform that automatically produces summaries of RDF/S Knowledge Bases (KBs) and exploits the semantics and structure of the schema and the distribution of the corresponding data/instances.
Posted Content
Survey and Taxonomy of Lossless Graph Compression and Space-Efficient Graph Representations
Maciej Besta, Torsten Hoefler
TL;DR: This survey exhaustively analyzes lossless graph compression and presents a taxonomy of existing lossless compression schemes.
Proceedings Article
SWeG: Lossless and Lossy Summarization of Web-Scale Graphs
TL;DR: SWeG is a fast parallel algorithm for summarizing graphs with compact representations. It is designed for not only shared-memory but also MapReduce settings, so that it can summarize graphs that are too large to fit in main memory.
References
Journal Article
Modeling by shortest data description
TL;DR: The number of digits it takes to write down an observed sequence x1,...,xN of a time series depends on the model with its parameters that one assumes to have generated the observed data.
Proceedings Article
Approximate nearest neighbors: towards removing the curse of dimensionality
Piotr Indyk, Rajeev Motwani
TL;DR: The authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces. For data sets of size n living in R^d, the algorithms require space that is only polynomial in n and d.
Proceedings Article
Pregel: a system for large-scale graph processing
Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, Grzegorz Czajkowski
TL;DR: A model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers, and its implied synchronicity makes reasoning about programs easier.
Journal Article
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Yucheng Low, Danny Bickson, Joseph E. Gonzalez, Carlos Guestrin, Aapo Kyrola, Joseph M. Hellerstein
TL;DR: This paper extends the shared-memory GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees, with techniques to reduce network congestion and mitigate the effect of network latency.
Proceedings Article
R-MAT: A Recursive Model for Graph Mining
TL;DR: A simple, parsimonious model, the “recursive matrix” (R-MAT) model, which can quickly generate realistic graphs, capturing the essence of each graph in only a few parameters is proposed.