Open AccessProceedings ArticleDOI

Data streaming algorithms for efficient and accurate estimation of flow size distribution

TLDR
A novel data streaming algorithm that provides much more accurate estimates of the flow size distribution using a "lossy data structure": an array of counters that fits into SRAM. The algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming.
Abstract
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has been focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of flow distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow-distribution.
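The update path described in the abstract (one counter increment per packet, with flows allowed to collide) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `CounterArray`, the BLAKE2b hash, and the string flow identifiers are all hypothetical choices. The `estimate_flow_count` method uses a standard occupancy-based (linear counting) estimate of the number of flows from the fraction of empty counters; it is included only to show that collided counters still carry recoverable information, and it is not the Expectation Maximization procedure the paper uses to infer the full flow size distribution.

```python
import hashlib
import math
from collections import Counter


class CounterArray:
    """Hypothetical sketch of a lossy counter array: one increment per packet."""

    def __init__(self, num_counters):
        self.m = num_counters
        self.counters = [0] * num_counters

    def _index(self, flow_id):
        # Assumption: any roughly uniform hash works; BLAKE2b is used here
        # only because it ships with the Python standard library.
        digest = hashlib.blake2b(flow_id.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.m

    def update(self, flow_id):
        # Per-packet work: hash the flow label, increment one counter.
        # Different flows may land on the same counter (the "lossy" part).
        self.counters[self._index(flow_id)] += 1

    def value_histogram(self):
        # Observed data for any later inference step: how many counters
        # hold each non-zero value.
        return Counter(v for v in self.counters if v > 0)

    def estimate_flow_count(self):
        # Linear-counting estimate: if n flows hash uniformly into m
        # counters, the expected fraction of empty counters is about
        # exp(-n/m), so n is roughly m * ln(m / zeros).
        zeros = self.counters.count(0)
        if zeros == 0:
            raise ValueError("no empty counters; array is too small to estimate")
        return self.m * math.log(self.m / zeros)
```

In use, each arriving packet triggers one `update(flow_id)` call, and the histogram returned by `value_histogram()` is what a statistical inference step such as EM would take as input. The sum of all counters always equals the number of packets seen, even though individual flow sizes are no longer directly readable.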


Citations
Journal ArticleDOI

A Survey on Software-Defined Networking

TL;DR: A generally accepted definition for SDN is presented, including decoupling the control plane from the data plane and providing programmability for network application development, and its three-layer architecture is described, including an infrastructure layer, a control layer, and an application layer.
Proceedings Article

Software defined traffic measurement with OpenSketch

TL;DR: This work proposes a software defined traffic measurement architecture OpenSketch, which separates the measurement data plane from the control plane and provides a measurement library that automatically configures the pipeline and allocates resources for different measurement tasks.
Journal ArticleDOI

A roadmap for traffic engineering in SDN-OpenFlow networks

TL;DR: This paper surveys the state-of-the-art in traffic engineering for SDNs, and mainly focuses on four thrusts including flow management, fault tolerance, topology update, and traffic analysis/characterization.
Proceedings ArticleDOI

One Sketch to Rule Them All: Rethinking Network Flow Monitoring with UnivMon

TL;DR: UnivMon is presented, a framework for flow monitoring which leverages recent theoretical advances and demonstrates that it is possible to achieve both generality and high accuracy, and evaluated using a range of trace-driven evaluations to show that it offers comparable (and sometimes better) accuracy relative to custom sketching solutions.
Proceedings ArticleDOI

Elastic sketch: adaptive and fast network-wide measurements

TL;DR: The Elastic sketch is proposed, which is adaptive to current traffic characteristics, generic to measurement tasks and platforms, and implemented on six platforms to process typical measurement tasks.
References
Journal ArticleDOI

Space/time trade-offs in hash coding with allowable errors

TL;DR: Analysis of the paradigm problem demonstrates that allowing a small number of test messages to be falsely identified as members of the given set will permit a much smaller hash area to be used without increasing reject time.
Journal ArticleDOI

Summary cache: a scalable wide-area web cache sharing protocol

TL;DR: This paper demonstrates the benefits of cache sharing, measures the overhead of the existing protocols, and proposes a new protocol called "summary cache", which reduces the number of intercache protocol messages, reduces the bandwidth consumption, and eliminates 30% to 95% of the protocol CPU overhead, all while maintaining almost the same cache hit ratios as ICP.
Journal ArticleDOI

Data streams: algorithms and applications

TL;DR: Data Streams: Algorithms and Applications surveys the emerging area of algorithms for processing data streams and associated applications, which rely on metric embeddings, pseudo-random computations, sparse approximation theory and communication complexity.
Book

Data Streams: Algorithms and Applications

TL;DR: In this paper, the authors present a survey of the basic mathematical foundations and algorithms for data stream processing.