Journal ArticleDOI
A Coincidence-Based Test for Uniformity Given Very Sparsely Sampled Discrete Data
TL;DR: The test for uniformity introduced here is based on the number of observed "coincidences" (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform.
Abstract: How many independent samples N do we need from a distribution p to decide that p is ε-distant from uniform in an L1 sense, Σ_{i=1}^{m} |p(i) − 1/m| > ε? (Here m is the number of bins on which the distribution is supported, and is assumed known a priori.) Somewhat surprisingly, we only need Nε² ≫ m^{1/2} to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed "coincidences" (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform. Some connections to the classical birthday problem are noted.
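The coincidence statistic described in the abstract can be sketched in a few lines: count the number of sample pairs that land in the same bin, compare it to its expectation under uniformity, and reject when the count is too large. The threshold below (a few standard deviations above the uniform-case mean, using a Poisson-like variance proxy) is an illustrative choice, not the paper's exact constant.

```python
import random
from collections import Counter

def coincidence_count(samples):
    """Number of 'coincidences': unordered pairs of samples in the same bin."""
    counts = Counter(samples)
    return sum(c * (c - 1) // 2 for c in counts.values())

def looks_uniform(samples, m, slack=3.0):
    """Sketch of a coincidence-based uniformity test on m bins.

    Under the uniform distribution, the expected number of coincidences
    among N samples is C(N, 2) / m.  Any distribution that is eps-far
    from uniform in L1 inflates the collision probability, so we reject
    uniformity when the observed count exceeds the uniform expectation
    by more than `slack` standard deviations (variance approximated as
    the mean, as for a Poisson count; `slack` is a hypothetical knob).
    """
    n = len(samples)
    expected = n * (n - 1) / 2 / m
    threshold = expected + slack * max(expected, 1.0) ** 0.5
    return coincidence_count(samples) <= threshold  # True = consistent with uniform

random.seed(0)
m = 1000
uniform_samples = [random.randrange(m) for _ in range(200)]
skewed_samples = [random.randrange(10) for _ in range(200)]  # mass on 10 of 1000 bins
print(looks_uniform(uniform_samples, m), looks_uniform(skewed_samples, m))
```

Note that N = 200 samples suffice here even though m = 1000 bins, far too few samples to estimate any individual p(i): this is the sublinear, birthday-problem flavor of the result.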
Citations
Book
Introduction to Property Testing
TL;DR: In this article, a wide range of algorithmic techniques for the design and analysis of tests for algebraic properties, properties of Boolean functions, graph properties, and properties of distributions are presented.
Journal ArticleDOI
Testing Closeness of Discrete Distributions
TL;DR: In this article, the authors present an algorithm that uses a number of independent samples from each distribution that is sublinear in n, specifically O(n^{2/3} ε^{−8/3} log n), runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max{ε^{4/3} n^{−1/3}/32, ε n^{−1/2}/4}) or large (more than ε) in ℓ1 distance.
Proceedings ArticleDOI
An Automatic Inequality Prover and Instance Optimal Identity Testing
Gregory Valiant,Paul Valiant +1 more
TL;DR: This work gives a complete characterization of a general class of inequalities, generalizing Cauchy–Schwarz, Hölder's inequality, and the monotonicity of Lp norms, and significantly generalizes and tightens previous results.
Proceedings ArticleDOI
Optimal algorithms for testing closeness of discrete distributions
TL;DR: This work presents simple testers for both the ℓ1 and ℓ2 settings, with sample complexity that is information-theoretically optimal up to constant factors, and establishes that the sample complexity is Θ(max{n^{2/3}/ε^{4/3}, n^{1/2}/ε²}).
Journal ArticleDOI
Introduction to the theory of error-correcting codes
TL;DR: Characterization of {2(q + 1) + 2, 2; t, q}-min-hypers in PG(t, q) (t ≥ 3, q ≥ 5) and its applications to error-correcting codes, and a transposition of the theory to the rank metric.
References
Book
Asymptotic methods in statistical decision theory
TL;DR: In this article, the authors present a framework for the analysis of decision spaces in statistical decision theory, including the space of risk functions and the space of decision procedures, and propose a method for measuring the suitability of a decision space.
Journal ArticleDOI
A new upper bound on the minimal distance of self-dual codes
John H. Conway,Neil J. A. Sloane +1 more
TL;DR: It is shown that the minimal distance d of a binary self-dual code of length n ≥ 74 is at most 2⌊(n + 6)/10⌋.
Journal ArticleDOI
Geometrizing Rates of Convergence, III
David L. Donoho,Richard C. Liu +1 more
TL;DR: In this paper, it was shown that for well-behaved loss functions, the complexity of the full infinite-dimensional composite testing problem is comparable to the difficulty of the hardest simple two-point testing subproblem.
Journal ArticleDOI
Entropy and information in neural spike trains: Progress on the sampling problem
TL;DR: A recently introduced Bayesian entropy estimator is applied to synthetic data inspired by experiments, and to real experimental spike trains, and performs admirably even very deep in the undersampled regime, where other techniques fail.
Journal ArticleDOI
An Efron-Stein inequality for nonsymmetric statistics
TL;DR: In this article, an Efron–Stein inequality for nonsymmetric statistics is established and applied to sharpen known variance bounds in the longest common subsequence problem.