Journal Article

A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data

Liam Paninski
01 Oct 2008
Vol. 54, Iss. 10, pp. 4750–4755
TLDR
The test for uniformity introduced here is based on the number of observed “coincidences” (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform.
Abstract
How many independent samples N do we need from a distribution p to decide that p is ε-distant from uniform in an $L_1$ sense, $\sum_{i=1}^{m} |p(i) - 1/m| > \epsilon$? (Here m is the number of bins on which the distribution is supported, and is assumed known a priori.) Somewhat surprisingly, we only need $N\epsilon^2 \gg m^{1/2}$ to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed “coincidences” (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform. Some connections to the classical birthday problem are noted.
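To make the counting concrete, here is a minimal sketch of such a coincidence-based test in Python. It illustrates the idea in the abstract rather than reproducing the paper's exact procedure: the function name, the halfway-point threshold, and the demo parameters are choices made here for illustration (the paper calibrates the test through explicit mean and variance bounds).

```python
import numpy as np

def coincidence_uniformity_test(samples, m, eps):
    """Return True if the sample looks uniform on m bins, False if it
    looks eps-far from uniform in L1.

    K = number of "coincidences", i.e., pairs of samples landing in the
    same bin.  Under uniformity, E[K] = C(N,2)/m.  If p is eps-far from
    uniform in L1, Cauchy-Schwarz gives sum_i p(i)^2 >= (1 + eps^2)/m,
    so E[K] >= C(N,2)(1 + eps^2)/m.  Threshold halfway between the two.
    """
    samples = np.asarray(samples)
    n = samples.size
    counts = np.bincount(samples, minlength=m)
    k = float(np.sum(counts * (counts - 1))) / 2.0  # observed coincidences
    pairs = n * (n - 1) / 2.0                       # C(N, 2)
    threshold = pairs * (1.0 + 0.5 * eps**2) / m
    return k <= threshold

# Demo: N of order sqrt(m)/eps^2 samples, as the abstract suggests.
rng = np.random.default_rng(0)
m, eps = 10_000, 0.5
n = int(8 * np.sqrt(m) / eps**2)          # N*eps^2 comfortably above sqrt(m)
uniform = rng.integers(0, m, size=n)
skewed = rng.integers(0, m // 2, size=n)  # mass on half the bins: L1 distance 1
print(coincidence_uniformity_test(uniform, m, eps))  # expected: True
print(coincidence_uniformity_test(skewed, m, eps))   # expected: False
```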



Citations
Book

Introduction to Property Testing

TL;DR: In this book, a wide range of algorithmic techniques is presented for the design and analysis of tests for algebraic properties, properties of Boolean functions, graph properties, and properties of distributions.
Journal Article

Testing Closeness of Discrete Distributions

TL;DR: In this article, the authors present an algorithm which uses a number of samples sublinear in n, specifically $O(n^{2/3}\epsilon^{-8/3}\log n)$ independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than $\max\{\epsilon^{4/3}n^{-1/3}/32,\ \epsilon n^{-1/2}/4\}$) or large (more than ε) in $\ell_1$ distance.
Proceedings Article

An Automatic Inequality Prover and Instance Optimal Identity Testing

TL;DR: This work gives a complete characterization of a general class of inequalities generalizing Cauchy–Schwarz, Hölder's inequality, and the monotonicity of $\ell_p$ norms, and it significantly generalizes and tightens previous results.
Proceedings Article

Optimal algorithms for testing closeness of discrete distributions

TL;DR: This work presents simple testers for both the $\ell_1$ and $\ell_2$ settings, with sample complexity that is information-theoretically optimal up to constant factors, and establishes that the sample complexity is $\Theta(\max\{n^{2/3}/\epsilon^{4/3},\ n^{1/2}/\epsilon^2\})$.
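As a quick sanity check on where each term of this rate dominates (a calculation from the displayed bound alone, not from the cited paper):

$$
\frac{n^{2/3}}{\epsilon^{4/3}} \ge \frac{n^{1/2}}{\epsilon^{2}}
\iff n^{1/6} \ge \epsilon^{-2/3}
\iff \epsilon \ge n^{-1/4},
$$

so the $n^{2/3}/\epsilon^{4/3}$ term governs when $\epsilon \gtrsim n^{-1/4}$, and the $n^{1/2}/\epsilon^{2}$ term takes over in the higher-accuracy regime $\epsilon \lesssim n^{-1/4}$.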
Journal Article

Introduction to the theory of error-correcting codes

TL;DR: Characterization of (2(q+1)+2, 2; t, q)-min-hypers in PG(t, q) (t ≥ 3, q ≥ 5) and its applications to error-correcting codes.
References
Book

Asymptotic methods in statistical decision theory

Lucien Le Cam
TL;DR: In this book, the author presents a framework for the analysis of decision spaces in decision theory, including the space of risk functions and the space of decision procedures, and proposes a method for measuring the suitability of a decision space.
Journal Article

A new upper bound on the minimal distance of self-dual codes

TL;DR: It is shown that the minimal distance d of a binary self-dual code of length n ≥ 74 is at most 2⌊(n+6)/10⌋.
Journal Article

Geometrizing Rates of Convergence, III

TL;DR: In this paper, it was shown that for well-behaved loss functions, the difficulty of the full infinite-dimensional composite testing problem is comparable to that of the hardest simple two-point testing subproblem.
Journal Article

Entropy and information in neural spike trains: Progress on the sampling problem

TL;DR: A recently introduced Bayesian entropy estimator is applied to synthetic data inspired by experiments, and to real experimental spike trains, and performs admirably even very deep in the undersampled regime, where other techniques fail.
Journal Article

An Efron-Stein inequality for nonsymmetric statistics

TL;DR: In this article, an Efron–Stein-type variance inequality for nonsymmetric functions of independent random variables is established and applied to sharpen known variance bounds in the longest common subsequence problem.