Open Access · Journal Article · DOI

A Statistical Analysis of Probabilistic Counting Algorithms

TLDR
This article applies conventional statistical methods to compare probabilistic counting algorithms based on storing either selected order statistics or random projections, derives estimators of the cardinality in both cases, and shows that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds.
Abstract
This article considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sketches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.
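To make the order-statistics technique concrete, here is a minimal Python sketch (illustrative only; the names are hypothetical and the paper's maximal-term estimator differs in detail). Each element is hashed to a pseudo-uniform value on (0, 1) and only the running minimum is kept, so the summary is recursively computable; since the minimum of $n$ i.i.d. uniforms has mean $1/(n+1)$, inverting that relation yields a cardinality estimate.

```python
import hashlib

def uniform_hash(item: str) -> float:
    """Map an item to a pseudo-uniform value in (0, 1] via a 64-bit hash."""
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    return (h + 1) / 2.0**64  # the +1 shift avoids an exact zero

class MinSketch:
    """Single-cell order-statistic sketch: stores only the running minimum.

    Duplicates hash to the same value, so they never change the minimum;
    the sketch therefore tracks the number of *distinct* items.
    """
    def __init__(self) -> None:
        self.m = 1.0

    def add(self, item: str) -> None:
        self.m = min(self.m, uniform_hash(item))  # recursive one-value update

    def estimate(self) -> float:
        # E[min of n uniforms] = 1/(n+1)  =>  n ~ 1/min - 1
        return 1.0 / self.m - 1.0

sketch = MinSketch()
for i in range(100_000):
    sketch.add(f"user-{i}")
print(round(sketch.estimate()))  # noisy single-cell estimate; practical
                                 # sketches average many such cells
```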


Citations
Proceedings Article · DOI

HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm

TL;DR: A series of improvements to the HyperLogLog algorithm is presented that reduces its memory requirements and significantly increases its accuracy for an important range of cardinalities.
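For context, the core HyperLogLog recursion that the paper engineers can be sketched in a few lines of Python (an illustration under assumed parameters, omitting the small- and large-range corrections that the paper's improvements target):

```python
import hashlib

P = 14                             # assumed precision: 2**14 = 16384 registers
M = 1 << P
ALPHA = 0.7213 / (1 + 1.079 / M)   # standard bias-correction constant for large M
registers = [0] * M

def add(item: str) -> None:
    h = int.from_bytes(hashlib.sha256(item.encode()).digest()[:8], "big")
    idx = h >> (64 - P)                        # first P bits select a register
    rest = h & ((1 << (64 - P)) - 1)           # remaining 64 - P bits
    rank = (64 - P) - rest.bit_length() + 1    # position of the leftmost 1-bit
    registers[idx] = max(registers[idx], rank) # registers only ever grow

def raw_estimate() -> float:
    # normalized harmonic mean of 2**register; HyperLogLog++ layers bias
    # and range corrections on top of exactly this quantity
    z = sum(2.0 ** -r for r in registers)
    return ALPHA * M * M / z
```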
Journal Article · DOI

Unknown Sparsity in Compressed Sensing: Denoising and Inference

TL;DR: A new deconvolution-based method for estimating unknown sparsity is proposed, built on a family of entropy-based sparsity measures parameterized by q ∈ [0, ∞]; it has wider applicability and sharper theoretical guarantees than the original sparsity measure.
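A plausible rendering of that entropy-based family (reconstructed from the TL;DR, not quoted from the paper): normalize $x$ into a probability vector and exponentiate its Rényi entropy of order $q$.

```latex
% Sketch of an entropy-based sparsity family: pi(x) normalizes x,
% and s_q(x) exponentiates the Renyi entropy of order q of pi(x).
\[
  \pi_i(x) = \frac{|x_i|}{\lVert x \rVert_1}, \qquad
  s_q(x) = \exp\bigl(H_q(\pi(x))\bigr)
         = \Bigl( \sum_i \pi_i(x)^{q} \Bigr)^{\frac{1}{1-q}},
  \qquad q \in [0, \infty].
\]
% Limiting cases: q -> 0 recovers the support size ||x||_0, and q = 2
% gives the "numerical sparsity" ||x||_1^2 / ||x||_2^2.
```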

An improved data stream summary: The Count-Min Sketch and its applications

TL;DR: Introduces the count-min sketch, a sublinear-space data structure for summarizing data streams that answers fundamental queries such as point, range, and inner-product queries approximately and very quickly; it can also be applied to several important stream problems, such as finding quantiles and frequent items.
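As a concrete illustration of the data structure (a minimal sketch, not the authors' code): an update adds the count to one counter per row, and a point query takes the minimum across rows, since collisions can only inflate counters, never deflate them.

```python
import random

class CountMinSketch:
    """Minimal count-min sketch for non-negative counts over integer keys."""

    def __init__(self, width: int = 2000, depth: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]
        self.p = (1 << 61) - 1  # Mersenne prime for (a*x + b) mod p hashing
        self.params = [(rng.randrange(1, self.p), rng.randrange(self.p))
                       for _ in range(depth)]

    def _hash(self, row: int, x: int) -> int:
        a, b = self.params[row]
        return ((a * x + b) % self.p) % self.width

    def update(self, x: int, count: int = 1) -> None:
        for r in range(self.depth):
            self.table[r][self._hash(r, x)] += count

    def query(self, x: int) -> int:
        # never underestimates; overestimates only through hash collisions
        return min(self.table[r][self._hash(r, x)] for r in range(self.depth))

cms = CountMinSketch()
for x in [1, 1, 2, 7, 1]:
    cms.update(x)
print(cms.query(1))  # 3, unless a collision inflates it
```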
Proceedings Article · DOI

Distributed size estimation of dynamic anonymous networks

TL;DR: This work derives an explicit estimation scheme for a particular peer-to-peer service network, starting from its statistical model, and considers quadratic regularization terms since they lead to closed-form solutions and intuitive design laws.
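The remark about quadratic regularization can be illustrated with the standard regularized least-squares identity (a generic example, not the paper's specific estimator): a quadratic penalty keeps the objective quadratic, so the first-order condition is linear and the minimizer has a closed form.

```latex
% Generic example: ridge-type penalties admit closed-form minimizers.
\[
  \hat{x} \;=\; \arg\min_{x} \; \lVert A x - b \rVert_2^2
                + \lambda \lVert x \rVert_2^2
          \;=\; \bigl( A^{\top} A + \lambda I \bigr)^{-1} A^{\top} b,
  \qquad \lambda > 0 .
\]
```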
References
Book

The Art of Computer Programming

TL;DR: A comprehensive multi-volume reference on algorithms and their mathematical analysis, covering topics such as random number generation, sorting, and searching.
Book

Numerical Recipes in C: The Art of Scientific Computing

TL;DR: A complete text and reference book on scientific computing, with over 100 new routines (now well over 300 in all), upgraded versions of many of the original routines, and many new topics presented at the same accessible level.
Book

Linear statistical inference and its applications

TL;DR: Covers the algebra of vectors and matrices, probability theory, statistical tools and techniques, and continuous probability models.
Journal Article · DOI

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

TL;DR: In this paper, it is shown that the likelihood ratio test for a fixed sample size can be reduced to a test based on the sum of observations, and that, for large samples, a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $\rho n$ with the second test.
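In modern notation, the efficiency measure introduced here is the Chernoff information, which gives the best achievable exponent for both error probabilities when testing between two distributions from i.i.d. observations (a standard restatement, not a quotation from the paper):

```latex
% Chernoff information between P0 and P1: both error probabilities of the
% optimal test decay like exp(-n * C(P0, P1)) in the sample size n.
\[
  C(P_0, P_1) \;=\; - \min_{0 \le t \le 1} \,
  \log \sum_{x} P_0(x)^{t} \, P_1(x)^{1 - t}.
\]
```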