Fast and Accurate Graph Stream Summarization
Xiangyang Gou,Lei Zou,Chenxingyu Zhao,Tong Yang +3 more
- pp 1118-1129
TLDR
The authors propose Graph Stream Sketch (GSS), a summary structure for graph streams with linear space cost O(|E|), where E is the edge set of the graph, and constant O(1) update time. It supports most kinds of queries over graph streams with controllable errors.
Abstract:
A graph stream is a continuous sequence of data items, in which each item indicates an edge, including its two endpoints and edge weight. It forms a dynamic graph that changes with every item. Graph streams play important roles in cyber security, social networks, cloud troubleshooting systems and more. Due to the vast volume and high update speed of graph streams, traditional data structures for graph storage such as the adjacency matrix and the adjacency list are no longer sufficient. However, prior work on graph stream summarization, such as CM sketches, gSketches, TCM and gMatrix, either supports only limited kinds of queries or suffers from poor query accuracy. In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize graph streams, which has linear space cost O(|E|) (E is the edge set of the graph) and constant update time cost O(1), and supports most kinds of queries over graph streams with controllable errors. Both theoretical analysis and experimental results confirm the superiority of our solution over the state of the art with regard to time/space complexity and the precision of query results.
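To make the setting in the abstract concrete, here is a minimal sketch of a graph stream being consumed in two ways: an exact edge-weight table, and a generic hash-compressed adjacency matrix. This is not the GSS structure from the paper, whose design is not described on this page. The class names `ExactGraph` and `HashedMatrixSketch`, the helper `bucket`, and the sample stream are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
import hashlib


def bucket(node: str, width: int) -> int:
    """Map a node id to a bucket in [0, width) with a deterministic hash."""
    digest = hashlib.sha256(node.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % width


class ExactGraph:
    """Exact baseline: one counter per distinct edge."""

    def __init__(self) -> None:
        self.weights = defaultdict(int)

    def update(self, src: str, dst: str, weight: int) -> None:
        self.weights[(src, dst)] += weight

    def edge_weight(self, src: str, dst: str) -> int:
        return self.weights.get((src, dst), 0)


class HashedMatrixSketch:
    """Illustrative summary (an assumption, not GSS): nodes are hashed into
    `width` buckets and edge weights are added into a width x width matrix.
    Colliding edges share a cell, so with non-negative weights a query can
    overestimate but never underestimate."""

    def __init__(self, width: int) -> None:
        self.width = width
        self.cells = [[0] * width for _ in range(width)]

    def update(self, src: str, dst: str, weight: int) -> None:
        # One hash per endpoint and one addition: constant work per item.
        self.cells[bucket(src, self.width)][bucket(dst, self.width)] += weight

    def edge_weight(self, src: str, dst: str) -> int:
        return self.cells[bucket(src, self.width)][bucket(dst, self.width)]


# Each stream item is (source, destination, weight); repeated edges accumulate.
stream = [("a", "b", 1), ("b", "c", 2), ("a", "b", 3), ("c", "a", 1)]

exact = ExactGraph()
sketch = HashedMatrixSketch(width=4)
for src, dst, weight in stream:
    exact.update(src, dst, weight)
    sketch.update(src, dst, weight)

print(exact.edge_weight("a", "b"), sketch.edge_weight("a", "b"))
```

The exact table grows with the number of distinct edges, while the matrix has a fixed size and trades accuracy for space through hash collisions. The abstract's claim is that GSS keeps update time constant while keeping such errors controllable.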
Citations
An improved data stream summary: The Count-Min Sketch and its applications
Graham Cormode,S. Muthukrishnan +1 more
TL;DR: In this paper, the authors introduce a sublinear-space data structure called the Count-Min sketch for summarizing data streams. It allows fundamental queries in data stream summarization, such as point, range, and inner product queries, to be answered approximately and very quickly. It can also be applied to several important data stream problems, such as finding quantiles and frequent items.
Posted Content
Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism.
TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency and parallelism, and for different graph updates as well as analytics workloads.
Posted Content
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems
TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency, and for different graph updates as well as analytics workloads.
Proceedings Article
Incremental Lossless Graph Summarization
TL;DR: MoSSo is the first incremental algorithm for lossless summarization of fully dynamic graphs; in response to each change, it updates the output representation by repeatedly moving nodes among supernodes.
Proceedings Article
Incremental Lossless Graph Summarization
TL;DR: This paper proposes MoSSo, the first incremental algorithm for lossless summarization of fully dynamic graphs. MoSSo is shown to be fast and 'any time': it processes each change in near-constant time, up to 7 orders of magnitude faster than running state-of-the-art batch methods.
References
Proceedings Article
Pregel: a system for large-scale graph processing
Grzegorz Malewicz,Matthew H. Austern,Aart J. C. Bik,James C. Dehnert,Ilan Horn,Naty Leiser,Grzegorz Czajkowski +6 more
TL;DR: This paper presents a model for processing large graphs, designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
Proceedings Article
PowerGraph: distributed graph-parallel computation on natural graphs
TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
Journal Article
Distributed GraphLab: a framework for machine learning and data mining in the cloud
Yucheng Low,Danny Bickson,Joseph E. Gonzalez,Carlos Guestrin,Aapo Kyrola,Joseph M. Hellerstein +5 more
TL;DR: This paper extends the GraphLab framework from the shared-memory setting to the substantially more challenging distributed setting while preserving strong data consistency guarantees, with techniques to reduce network congestion and mitigate the effect of network latency.
Proceedings Article
GraphX: graph processing in a distributed dataflow framework
TL;DR: This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system and demonstrates that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
Journal Article
New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice
Cristian Estan,George Varghese +1 more
TL;DR: Two novel and scalable algorithms for identifying the large flows are proposed: sample and hold and multistage filters, which take a constant number of memory references per packet and use a small amount of memory, and a new form of accounting called threshold accounting in which only flows above a threshold are charged by usage while the rest are charged a fixed fee.