Journal Article

A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data

Liam Paninski
01 Oct 2008
Vol. 54, Iss. 10, pp. 4750–4755
TLDR
The test for uniformity introduced here is based on the number of observed “coincidences” (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform.
Abstract
How many independent samples N do we need from a distribution p to decide that p is ε-distant from uniform in an $L_1$ sense, $\sum_{i=1}^{m} |p(i) - 1/m| > \epsilon$? (Here m is the number of bins on which the distribution is supported, and is assumed known a priori.) Somewhat surprisingly, we only need $N\epsilon^2 \gg m^{1/2}$ to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed “coincidences” (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform. Some connections to the classical birthday problem are noted.
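To make the counting concrete, here is a minimal sketch of such a coincidence-based test in Python. It illustrates the idea in the abstract rather than reproducing the paper's exact procedure: the function name, the halfway-point threshold, and the demo parameters are choices made here for illustration (the paper calibrates the test through explicit mean and variance bounds).

```python
import numpy as np

def coincidence_uniformity_test(samples, m, eps):
    """Return True if the sample looks uniform on m bins, False if it
    looks eps-far from uniform in L1.

    K = number of "coincidences", i.e., pairs of samples landing in the
    same bin.  Under uniformity, E[K] = C(N,2)/m.  If p is eps-far from
    uniform in L1, Cauchy-Schwarz gives sum_i p(i)^2 >= (1 + eps^2)/m,
    so E[K] >= C(N,2)(1 + eps^2)/m.  Threshold halfway between the two.
    """
    samples = np.asarray(samples)
    n = samples.size
    counts = np.bincount(samples, minlength=m)
    k = float(np.sum(counts * (counts - 1))) / 2.0  # observed coincidences
    pairs = n * (n - 1) / 2.0                       # C(N, 2)
    threshold = pairs * (1.0 + 0.5 * eps**2) / m
    return k <= threshold

# Demo: N of order sqrt(m)/eps^2 samples, as the abstract suggests.
rng = np.random.default_rng(0)
m, eps = 10_000, 0.5
n = int(8 * np.sqrt(m) / eps**2)          # N*eps^2 comfortably above sqrt(m)
uniform = rng.integers(0, m, size=n)
skewed = rng.integers(0, m // 2, size=n)  # mass on half the bins: L1 distance 1
print(coincidence_uniformity_test(uniform, m, eps))  # expected: True
print(coincidence_uniformity_test(skewed, m, eps))   # expected: False
```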



Citations
Book

Introduction to Property Testing

TL;DR: In this book, a wide range of algorithmic techniques is presented for the design and analysis of tests for algebraic properties, properties of Boolean functions, graph properties, and properties of distributions.
Journal Article

Testing Closeness of Discrete Distributions

TL;DR: In this article, the authors present an algorithm which uses a number of samples sublinear in n, specifically $O(n^{2/3}\epsilon^{-8/3}\log n)$ independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than $\max\{\epsilon^{4/3}n^{-1/3}/32,\ \epsilon n^{-1/2}/4\}$) or large (more than ε) in $\ell_1$ distance.
Proceedings Article

An Automatic Inequality Prover and Instance Optimal Identity Testing

TL;DR: This work gives a complete characterization of a general class of inequalities generalizing Cauchy–Schwarz, Hölder's inequality, and the monotonicity of $\ell_p$ norms, and it significantly generalizes and tightens previous results.
Proceedings Article

Optimal algorithms for testing closeness of discrete distributions

TL;DR: This work presents simple testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal up to constant factors, and establishes that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3},\ n^{1/2}/\epsilon^2\})$.
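As a quick sanity check on where each term of this rate dominates (a calculation from the displayed bound alone, not from the cited paper):

$$
\frac{n^{2/3}}{\epsilon^{4/3}} \ge \frac{n^{1/2}}{\epsilon^{2}}
\iff n^{1/6} \ge \epsilon^{-2/3}
\iff \epsilon \ge n^{-1/4},
$$

so the $n^{2/3}/\epsilon^{4/3}$ term governs when $\epsilon \gtrsim n^{-1/4}$, and the $n^{1/2}/\epsilon^{2}$ term takes over in the higher-accuracy regime $\epsilon \lesssim n^{-1/4}$.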
Journal Article

Introduction to the theory of error-correcting codes

TL;DR: Characterization of (2(q+1)+2, 2; t, q)-min-hypers in PG(t, q) (t ≥ 3, q ≥ 5) and its applications to error-correcting codes.
References
Book

Asymptotic methods in statistical decision theory

Lucien Le Cam
TL;DR: In this book, the author presents a framework for the analysis of decision spaces in decision theory, including the space of risk functions and the space of decision procedures, and proposes a method for measuring the suitability of a decision space.
Journal Article

A new upper bound on the minimal distance of self-dual codes

TL;DR: It is shown that the minimal distance d of a binary self-dual code of length n ≥ 74 is at most 2⌊(n+6)/10⌋.
Journal Article

Geometrizing Rates of Convergence, III

TL;DR: In this paper, it was shown that for well-behaved loss functions, the difficulty of the full infinite-dimensional composite testing problem is comparable to that of the hardest simple two-point testing subproblem.
Journal Article

Entropy and information in neural spike trains: Progress on the sampling problem

TL;DR: A recently introduced Bayesian entropy estimator is applied to synthetic data inspired by experiments, and to real experimental spike trains, and performs admirably even very deep in the undersampled regime, where other techniques fail.
Journal Article

An Efron-Stein inequality for nonsymmetric statistics

TL;DR: In this article, an Efron–Stein-type variance inequality for nonsymmetric functions of independent random variables is established and applied to sharpen known variance bounds in the longest common subsequence problem.