A Statistical Analysis of Probabilistic Counting Algorithms
Peter Clifford, Ioana A. Cosma
Abstract
This article considers the problem of cardinality estimation in data stream applications. We present a statistical analysis of probabilistic counting algorithms, focusing on two techniques that use pseudo-random variates to form low-dimensional data sketches. We apply conventional statistical methods to compare probabilistic algorithms based on storing either selected order statistics or random projections. We derive estimators of the cardinality in both cases, and show that the maximal-term estimator is recursively computable and has exponentially decreasing error bounds. Furthermore, we show that the estimators have comparable asymptotic efficiency, and explain this result by demonstrating an unexpected connection between the two approaches.
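The maximal-term approach can be illustrated with a short sketch: hash each item to a pseudo-uniform value in (0, 1), keep only the running maximum per sketch (an O(1) recursive update per item, which is the property the abstract highlights), and invert the distribution of the maximum to recover the cardinality. Under ideal hashing, -ln(max) for n distinct items is exactly Exp(n)-distributed, so pooling m independent maxima gives an unbiased estimate. This is an illustrative reconstruction, not the authors' exact estimator; the SHA-256 hash construction, the choice m = 128, and the pooled estimator (m - 1)/sum are assumptions for the example.

```python
import hashlib
import math

def uniform_hash(item: str, seed: int) -> float:
    """Deterministic pseudo-uniform value in the open interval (0, 1)."""
    h = hashlib.sha256(f"{seed}:{item}".encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / float(2**64 + 1)

def estimate_cardinality(items, m: int = 128) -> float:
    """Maximal-term cardinality estimate from m replicate sketches.

    Assumes the stream is non-empty. Duplicates leave the maxima, and
    hence the estimate, unchanged.
    """
    maxima = [0.0] * m
    for x in items:
        for s in range(m):
            u = uniform_hash(x, s)
            if u > maxima[s]:       # O(1) recursive update per sketch
                maxima[s] = u
    # For n distinct items, each -ln(max) is Exp(n); the sum is
    # Gamma(m, rate n), so (m - 1) / sum is an unbiased estimate of n.
    total = sum(-math.log(mx) for mx in maxima)
    return (m - 1) / total
```

With m replicates the relative standard error is roughly 1/sqrt(m - 2), about 9% for m = 128 in this sketch.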
Citations
Proceedings ArticleDOI
HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm
TL;DR: Presents a series of improvements to the HyperLogLog algorithm that reduce its memory requirements and significantly increase its accuracy for an important range of cardinalities.
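As a rough illustration of the algorithm being engineered in this paper, here is a minimal HyperLogLog sketch covering the register update and the raw estimate with the small-range linear-counting correction. The SHA-1 hash and p = 10 are choices made for the example; the bias corrections and sparse representation that this paper contributes are omitted.

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, p: int = 10):
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # alpha constant from Flajolet et al., valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)              # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # rank = position of the leftmost 1-bit among the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:     # small-range correction
            return self.m * math.log(self.m / zeros)
        return raw
```

With 2^10 registers the typical relative error of the raw estimate is about 1.04 / sqrt(m), roughly 3%.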
Journal ArticleDOI
Unknown Sparsity in Compressed Sensing: Denoising and Inference
TL;DR: Proposes a new deconvolution-based method for estimating unknown sparsity, introducing a family of entropy-based sparsity measures parameterized by q ∈ [0, ∞] with wider applicability and sharper theoretical guarantees than the original sparsity measure.
An improved data stream summary: The Count-Min Sketch and its applications
Graham Cormode, S. Muthukrishnan
TL;DR: Introduces the count-min sketch, a sublinear-space data structure for summarizing data streams. It allows fundamental queries (point, range, and inner-product queries) to be answered approximately and very quickly, and can also be applied to several important data stream problems, such as finding quantiles and frequent items.
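The point-query behavior described above can be sketched in a few lines: updates add to one counter per row, and a query takes the minimum across rows, so estimates never undercount and overcount only through hash collisions. The width, depth, and SHA-256-based hashing here are illustrative choices, not the paper's exact construction.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width: int = 500, depth: int = 5):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One independent-looking hash per row, derived by salting with the row.
        h = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(h[:8], "big") % self.width

    def update(self, item: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def query(self, item: str) -> int:
        # Never underestimates; overestimates only via collisions in every row.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))
```

The one-sided error guarantee is what makes the structure useful for frequent-item detection: a true heavy hitter can never be missed by thresholding the estimates.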
Proceedings ArticleDOI
Distributed size estimation of dynamic anonymous networks
TL;DR: This work derives an explicit estimation scheme for a particular peer-to-peer service network, starting from its statistical model, and considers quadratic regularization terms since they lead to closed-form solutions and intuitive design laws.
References
Book
The Art of Computer Programming
TL;DR: Knuth's multi-volume reference work on algorithms and their analysis.
Book
Numerical Recipes in C: The Art of Scientific Computing
TL;DR: A complete text and reference book on scientific computing, with over 100 new routines (now well over 300 in all), upgraded versions of many of the original routines, and many new topics presented at the same accessible level.
Book
Linear Statistical Inference and Its Applications
TL;DR: Covers the algebra of vectors and matrices, probability theory, statistical tools and techniques, and continuous probability models.
Journal ArticleDOI
A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations
TL;DR: Shows that error probabilities for tests based on sums of observations decrease exponentially in the sample size, that the likelihood ratio test for fixed sample size can be reduced to this form, and that for large samples a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $\rho n$ with the second test.
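The exponential bound this reference is known for, and which underlies the "exponentially decreasing error bounds" claimed in the abstract above, can be stated compactly in modern notation (a sketch, not the paper's original formulation):

```latex
% Chernoff's exponential bound for a sum S_n = X_1 + \dots + X_n of
% i.i.d. random variables: for any a > n\,\mathbb{E}[X_1],
\Pr(S_n \ge a) \;\le\; \min_{t > 0}\, e^{-ta}\,\bigl(\mathbb{E}[e^{tX_1}]\bigr)^{n},
% so the probability of a large deviation decays exponentially in n.
```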