Open Access, Proceedings Article

Fast and Accurate Graph Stream Summarization

TLDR
Wang et al. propose Graph Stream Sketch (GSS) to summarize graph streams. GSS has linear space cost O(|E|) (where E is the edge set of the graph) and constant update time cost O(1), and supports most kinds of queries over graph streams with controllable errors.
Abstract
A graph stream is a continuous sequence of data items, in which each item indicates an edge, including its two endpoints and edge weight. It forms a dynamic graph that changes with every item. Graph streams play important roles in cyber security, social networks, cloud troubleshooting systems and more. Due to the vast volume and high update speed of graph streams, traditional data structures for graph storage such as the adjacency matrix and the adjacency list are no longer sufficient. However, prior work on graph stream summarization, such as CM sketches, gSketches, TCM and gMatrix, either supports only limited kinds of queries or suffers from poor query accuracy. In this paper, we propose a novel Graph Stream Sketch (GSS for short) to summarize graph streams, which has linear space cost O(|E|) (where E is the edge set of the graph) and constant update time cost O(1), and supports most kinds of queries over graph streams with controllable errors. Both theoretical analysis and experimental results confirm the superiority of our solution with regard to time/space complexity and query precision compared with the state of the art.
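To make the graph stream model in the abstract concrete, here is a minimal sketch, under stated assumptions, of a stream of weighted edges applied to two structures: an exact edge table with O(|E|) space, and a small hash-compressed adjacency matrix loosely in the style of earlier matrix-based summaries. It is not the GSS structure itself, which the abstract does not specify. The names `ExactGraph`, `MatrixSketch`, `node_hash`, and the sample stream are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def node_hash(node: str, width: int) -> int:
    """Map a node ID to a bucket in [0, width) using a stable hash."""
    digest = hashlib.sha256(node.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % width


class ExactGraph:
    """Exact dynamic graph: one counter per distinct edge, so O(|E|) space."""

    def __init__(self) -> None:
        self.weights = defaultdict(int)

    def update(self, src: str, dst: str, weight: int) -> None:
        # Each stream item adds its weight to the edge (src, dst).
        self.weights[(src, dst)] += weight

    def edge_weight(self, src: str, dst: str) -> int:
        # .get avoids creating an entry for an edge that was never seen.
        return self.weights.get((src, dst), 0)


class MatrixSketch:
    """Hash-compressed adjacency matrix (illustrative assumption, not GSS).

    Nodes are hashed into `width` buckets, so distinct edges may collide in
    one cell. With non-negative weights the estimate never underestimates.
    """

    def __init__(self, width: int) -> None:
        self.width = width
        self.cells = [[0] * width for _ in range(width)]

    def update(self, src: str, dst: str, weight: int) -> None:
        row = node_hash(src, self.width)
        col = node_hash(dst, self.width)
        self.cells[row][col] += weight

    def edge_weight(self, src: str, dst: str) -> int:
        row = node_hash(src, self.width)
        col = node_hash(dst, self.width)
        return self.cells[row][col]


# A tiny hypothetical stream: each item is (source, destination, weight).
stream = [("a", "b", 1), ("b", "c", 2), ("a", "b", 3), ("c", "a", 5)]

exact = ExactGraph()
sketch = MatrixSketch(width=4)
for src, dst, weight in stream:
    exact.update(src, dst, weight)   # constant-time update per item
    sketch.update(src, dst, weight)  # constant-time update per item
```

Both structures process each item with a constant number of operations. The difference is that the matrix trades accuracy for fixed memory: a smaller `width` means more hash collisions and larger overestimates, which is the kind of accuracy problem the abstract attributes to earlier summaries.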


Citations

An improved data stream summary: The Count-Min Sketch and its applications

TL;DR: In this paper, the authors introduce a sublinear-space data structure called the Count-Min sketch for summarizing data streams. It allows fundamental queries in data stream summarization, such as point, range, and inner product queries, to be approximately answered very quickly. In addition, it can be applied to solve several important problems in data streams, such as finding quantiles and frequent items.
Posted Content

Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism.

TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency and parallelism, and for different graph updates as well as analytics workloads.
Posted Content

Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems

TL;DR: This work provides the first analysis and taxonomy of dynamic and streaming graph processing, focusing on identifying the fundamental system designs and on understanding their support for concurrency, and for different graph updates as well as analytics workloads.
Proceedings Article

Incremental Lossless Graph Summarization

TL;DR: MoSSo is the first incremental algorithm for lossless summarization of fully dynamic graphs, which updates the output representation by repeatedly moving nodes among supernodes and edges.
Proceedings Article

Incremental Lossless Graph Summarization

TL;DR: MoSSo is proposed, the first incremental algorithm for lossless summarization of fully dynamic graphs. It is shown to be fast and 'any time': it processes each change in near-constant time, up to 7 orders of magnitude faster than running state-of-the-art batch methods.
References
Proceedings Article

Pregel: a system for large-scale graph processing

TL;DR: This paper presents a model for processing large graphs that has been designed for efficient, scalable and fault-tolerant implementation on clusters of thousands of commodity computers; its implied synchronicity makes reasoning about programs easier.
Proceedings Article

PowerGraph: distributed graph-parallel computation on natural graphs

TL;DR: This paper describes the challenges of computation on natural graphs in the context of existing graph-parallel abstractions and introduces the PowerGraph abstraction which exploits the internal structure of graph programs to address these challenges.
Journal Article

Distributed GraphLab: a framework for machine learning and data mining in the cloud

TL;DR: This paper extends the GraphLab framework, originally designed for the shared-memory setting, to the substantially more challenging distributed setting while preserving strong data consistency guarantees, with techniques to reduce network congestion and mitigate the effect of network latency.
Proceedings Article

GraphX: graph processing in a distributed dataflow framework

TL;DR: This paper introduces GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. It demonstrates that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
Journal Article

New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice

TL;DR: Two novel and scalable algorithms for identifying large flows are proposed: sample and hold, and multistage filters. Both take a constant number of memory references per packet and use a small amount of memory. The paper also proposes a new form of accounting called threshold accounting, in which only flows above a threshold are charged by usage while the rest are charged a fixed fee.