A Statistical Analysis of Probabilistic Counting Algorithms
Peter Clifford, Ioana A. Cosma
Abstract
This article considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sketches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.
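The maximal-term approach can be illustrated with a short sketch: hash each item to a pseudo-uniform value in (0, 1), keep only the running maximum per sketch (an O(1) recursive update per item, which is the property the abstract highlights), and invert the distribution of the maximum to recover the cardinality. Under ideal hashing, -ln(max) for n distinct items is exactly Exp(n)-distributed, so pooling m independent maxima gives an unbiased estimate. This is an illustrative reconstruction, not the authors' exact estimator; the SHA-256 hash construction, the choice m = 128, and the pooled estimator (m - 1)/sum are assumptions for the example.

```python
import hashlib
import math

def uniform_hash(item: str, seed: int) -> float:
    """Deterministic pseudo-uniform value in the open interval (0, 1)."""
    h = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / float(2**64 + 1)

def estimate_cardinality(items, m: int = 128) -> float:
    """Maximal-term cardinality estimate from m replicate sketches.

    Assumes the stream is non-empty. Duplicates leave the maxima, and
    hence the estimate, unchanged.
    """
    maxima = [0.0] * m
    for x in items:
        for s in range(m):
            u = uniform_hash(x, s)
            if u > maxima[s]:       # O(1) recursive update per sketch
                maxima[s] = u
    # For n distinct items, each -ln(max) is Exp(n); the sum is
    # Gamma(m, rate n), so (m - 1) / sum is an unbiased estimate of n.
    total = sum(-math.log(mx) for mx in maxima)
    return (m - 1) / total
```

With m replicates the relative standard error is roughly 1/sqrt(m - 2), about 9% for m = 128 in this sketch.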
Citations
Proceedings ArticleDOI
HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm
TL;DR: Presents a series of improvements to the HyperLogLog algorithm that reduce its memory requirements and significantly increase its accuracy for an important range of cardinalities.
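As a rough illustration of the algorithm being engineered in this paper, here is a minimal HyperLogLog sketch covering the register update and the raw estimate with the small-range linear-counting correction. The SHA-1 hash and p = 10 are choices made for the example; the bias corrections and sparse representation that this paper contributes are omitted.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # alpha constant from Flajolet et al., valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)              # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit among the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:     # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw
```

With 2^10 registers the typical relative error of the raw estimate is about 1.04 / sqrt(m), roughly 3%.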
Journal ArticleDOI
Unknown Sparsity in Compressed Sensing: Denoising and Inference
TL;DR: Proposes a new deconvolution-based method for estimating unknown sparsity, introducing a family of entropy-based sparsity measures parameterized by q ∈ [0, ∞] with wider applicability and sharper theoretical guarantees than the original sparsity measure.
An improved data stream summary: The Count-Min Sketch and its applications
Graham Cormode, S. Muthukrishnan
TL;DR: Introduces the count-min sketch, a sublinear-space data structure for summarizing data streams. It allows fundamental queries (point, range, and inner-product queries) to be answered approximately and very quickly, and can also be applied to several important data stream problems, such as finding quantiles and frequent items.
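The point-query behavior described above can be sketched in a few lines: updates add to one counter per row, and a query takes the minimum across rows, so estimates never undercount and overcount only through hash collisions. The width, depth, and SHA-256-based hashing here are illustrative choices, not the paper's exact construction.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 500, depth: int = 5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent-looking hash per row, derived by salting with the row.
        h = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item: str) -> int:
        # Never underestimates; overestimates only via collisions in every row.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

The one-sided error guarantee is what makes the structure useful for frequent-item detection: a true heavy hitter can never be missed by thresholding the estimates.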
Proceedings ArticleDOI
Distributed size estimation of dynamic anonymous networks
TL;DR: This work derives an explicit estimation scheme for a particular peer-to-peer service network, starting from its statistical model, and considers quadratic regularization terms since they lead to closed-form solutions and intuitive design laws.
References
Book
The Art of Computer Programming
TL;DR: Knuth's multi-volume reference work on algorithms and their analysis.
Book
Numerical Recipes in C: The Art of Scientific Computing
TL;DR: A complete text and reference book on scientific computing, with over 100 new routines (now well over 300 in all), upgraded versions of many of the original routines, and many new topics presented at the same accessible level.
Book
Linear Statistical Inference and Its Applications
TL;DR: Covers the algebra of vectors and matrices, probability theory, statistical tools and techniques, and continuous probability models.
Journal ArticleDOI
A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations
TL;DR: Shows that error probabilities for tests based on sums of observations decrease exponentially in the sample size, that the likelihood ratio test for fixed sample size can be reduced to this form, and that for large samples a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $\rho n$ with the second test.
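The exponential bound this reference is known for, and which underlies the "exponentially decreasing error bounds" claimed in the abstract above, can be stated compactly in modern notation (a sketch, not the paper's original formulation):

```latex
% Chernoff's exponential bound for a sum S_n = X_1 + \dots + X_n of
% i.i.d. random variables: for any a > n\,\mathbb{E}[X_1],
\Pr(S_n \ge a) \;\le\; \min_{t > 0}\, e^{-ta}\,\bigl(\mathbb{E}[e^{tX_1}]\bigr)^{n},
% so the probability of a large deviation decays exponentially in n.
```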