Journal ArticleDOI

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

01 Dec 1952-Annals of Mathematical Statistics (Institute of Mathematical Statistics)-Vol. 23, Iss: 4, pp 493-507
TL;DR: In this paper, it is shown that the likelihood ratio test for fixed sample size can be reduced to a sum-of-observations test, and that, for large samples, a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
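The index described in the abstract can be checked numerically. The sketch below is an illustration, not taken from the paper: it assumes $X$ is Bernoulli with parameter $p = 0.5$ and takes $a = 0.4$, minimizes the moment generating function of $X - a$ over a grid of $t$ values to obtain $m$, and compares the bound $m^n$ with the exact tail probability $P(S_n \leq na)$:

```python
import math

def mgf_min(p, a):
    """Numerically minimize E[exp(t*(X - a))] for X ~ Bernoulli(p)
    over a grid of t values (crude but dependency-free)."""
    ts = (i / 1000.0 for i in range(-8000, 8001))
    return min(math.exp(-t * a) * ((1 - p) + p * math.exp(t)) for t in ts)

def exact_tail(n, p, a):
    """P(S_n <= n*a) for S_n ~ Binomial(n, p), computed exactly."""
    k_max = math.floor(n * a)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_max + 1))

n, p, a = 200, 0.5, 0.4
m = mgf_min(p, a)
bound = m ** n          # Chernoff's m^n dominates the exact tail
tail = exact_tail(n, p, a)
print(f"m = {m:.6f}, bound m^n = {bound:.3e}, exact P(S_n <= na) = {tail:.3e}")
```

For the Bernoulli case the minimized value agrees with the closed form $m = e^{-\mathrm{KL}(a \| p)}$, where $\mathrm{KL}$ is the Kullback-Leibler divergence between Bernoulli parameters, so the grid search can be sanity-checked against that expression.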
Citations
Journal ArticleDOI
TL;DR: In this article, the authors examine noisy radio (broadcast) networks in which every bit transmitted has a certain probability of being flipped, and show a protocol to compute any threshold function using only a linear number of transmissions.
Abstract: In this paper, we examine noisy radio (broadcast) networks in which every bit transmitted has a certain probability of being flipped. Each processor has some initial input bit, and the goal is to compute a function of these input bits. In this model, we show a protocol to compute any threshold function using only a linear number of transmissions.

81 citations

Journal ArticleDOI
TL;DR: The authors give a protocol in which the expected delay of any message is O(1) in a model where users are synchronized, and messages are generated according to a Poisson distribution with generation rate up to 1/e.
Abstract: We study contention resolution in a multiple-access channel such as the Ethernet channel. In the model that we consider, n users generate messages for the channel according to a probability distribution. Raghavan and Upfal have given a protocol in which the expected delay (time to get serviced) of every message is O(log n) when messages are generated according to a Bernoulli distribution with generation rate up to about 1/10. Our main results are the following protocols: (a) one in which the expected average message delay is O(1) when messages are generated according to a Bernoulli distribution with a generation rate smaller than 1/e, and (b) one in which the expected delay of any message is O(1) for an analogous model in which users are synchronized (i.e., they agree about the time), there are potentially an infinite number of users, and messages are generated according to a Poisson distribution with generation rate up to 1/e. (Each message constitutes a new user.) To achieve (a), we first show how to simulate (b) using n synchronized users, and then show how to build the synchronization into the protocol.

81 citations

Journal ArticleDOI
TL;DR: It is shown that a nonparametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error, both when the training and test data are drawn from the same distribution and when there is a mismatch between the training and test distributions.
Abstract: Information divergence functions play a critical role in statistics and information theory. In this paper we show that a nonparametric $f$ -divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm these theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.
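As a concrete instance of a divergence-based bound on the Bayes error (a standard textbook case, not the authors' nonparametric estimator): for two equal-variance 1-D Gaussian class-conditional densities with equal priors, the Bhattacharyya coefficient $\mathrm{BC} = e^{-(\mu_1-\mu_2)^2/(8\sigma^2)}$ yields the upper bound $P_e \leq \frac{1}{2}\mathrm{BC}$, which can be compared against the exact Bayes error:

```python
import math

def bhattacharyya_coeff(mu1, mu2, sigma):
    """Closed-form Bhattacharyya coefficient for two equal-variance
    1-D Gaussians: exp(-(mu1 - mu2)^2 / (8 sigma^2))."""
    return math.exp(-((mu1 - mu2) ** 2) / (8 * sigma ** 2))

def bayes_error(mu1, mu2, sigma):
    """Exact Bayes error for equal priors: Phi(-|mu1 - mu2| / (2 sigma))."""
    d = abs(mu1 - mu2) / (2 * sigma)
    return 0.5 * math.erfc(d / math.sqrt(2))

mu1, mu2, sigma = 0.0, 2.0, 1.0          # illustrative parameters
bc = bhattacharyya_coeff(mu1, mu2, sigma)
bound = 0.5 * bc                          # Bhattacharyya upper bound on P_e
err = bayes_error(mu1, mu2, sigma)
print(f"Bayes error = {err:.4f}, Bhattacharyya bound = {bound:.4f}")
```

The bound is loose here (roughly twice the true error for this separation), which is exactly the gap that tighter f-divergence bounds such as those in the paper aim to close.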

81 citations


Cites background or methods from "A Measure of Asymptotic Efficiency ..."

  • ...As an alternative, one can evaluate simpler expressions that specify bounds on the BER in terms of measures of distance or divergence between probability functions [3], [12]–[14]....


  • ...For classification problems, the well-known upper bound on the probability of error based on the Chernoff $\alpha$-divergence has been used in a number of statistical learning applications [3]....


  • ...This family includes the total variation distance, the Bhattacharyya distance [1], the Kullback-Leibler divergence [2], and, more generally, the Chernoff $\alpha$-divergence [3], [4]....


Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of learning an unknown Poisson binomial distribution with respect to the total variation distance, and gave an algorithm whose running time is quasilinear in the size of its input data.
Abstract: We consider a basic problem in unsupervised learning: learning an unknown Poisson binomial distribution. A Poisson binomial distribution (PBD) over $\{0,1,\ldots,n\}$ is the distribution of a sum of $n$ independent Bernoulli random variables which may have arbitrary, potentially non-equal, expectations. These distributions were first studied by Poisson (Recherches sur la Probabilité des jugements en matière criminelle et en matière civile. Bachelier, Paris, 1837) and are a natural $n$-parameter generalization of the familiar Binomial Distribution. Surprisingly, prior to our work this basic learning problem was poorly understood, and known results for it were far from optimal. We essentially settle the complexity of the learning problem for this basic class of distributions. As our first main result we give a highly efficient algorithm which learns to $\epsilon$-accuracy (with respect to the total variation distance) using $\tilde{O}(1/\epsilon^{3})$ samples independent of $n$. The running time of the algorithm is quasilinear in the size of its input data, i.e., $\tilde{O}(\log(n)/\epsilon^{3})$ bit-operations (we write $\tilde{O}(\cdot)$ to hide factors which are polylogarithmic in the argument to $\tilde{O}(\cdot)$; thus, for example, $\tilde{O}(a \log b)$ denotes a quantity which is $O(a \log b \cdot \log^c(a \log b))$ for some absolute constant $c$. Observe that each draw from the distribution is a $\log(n)$-bit string). Our second main result is a proper learning algorithm that learns to $\epsilon$-accuracy using $\tilde{O}(1/\epsilon^{2})$ samples, and runs in time $(1/\epsilon)^{\mathrm{poly}(\log(1/\epsilon))} \cdot \log n$. This sample complexity is nearly optimal, since any algorithm for this problem must use $\Omega(1/\epsilon^{2})$ samples.
We also give positive and negative results for some extensions of this learning problem to weighted sums of independent Bernoulli random variables.
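The learning objective above can be illustrated with a brute-force baseline (this is not the authors' $\tilde{O}(1/\epsilon^{3})$-sample algorithm, and the parameters are illustrative): draw samples from a small PBD, form the empirical distribution, and measure its total variation distance to the exact pmf computed by dynamic programming:

```python
import random
from collections import Counter

def sample_pbd(ps, m, rng):
    """Draw m samples from the Poisson binomial distribution with
    Bernoulli parameters ps (a sum of independent non-identical coins)."""
    return [sum(rng.random() < p for p in ps) for _ in range(m)]

def pbd_pmf(ps):
    """Exact PBD pmf over {0, ..., len(ps)} via dynamic programming."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1 - p)      # this coin comes up 0
            new[k + 1] += q * p        # this coin comes up 1
        pmf = new
    return pmf

def tv_distance(pmf, samples):
    """Total variation distance between the true pmf and the
    empirical distribution of the samples."""
    m = len(samples)
    emp = Counter(samples)
    return 0.5 * sum(abs(pmf[k] - emp.get(k, 0) / m) for k in range(len(pmf)))

rng = random.Random(0)
ps = [0.1, 0.3, 0.5, 0.7, 0.9]          # five non-identical coins
samples = sample_pbd(ps, 20000, rng)
tv = tv_distance(pbd_pmf(ps), samples)
print(f"TV(empirical, true) = {tv:.4f}")
```

The empirical distribution needs a sample size that grows with the support, whereas the paper's point is that $\tilde{O}(1/\epsilon^{3})$ samples, independent of $n$, already suffice.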

80 citations
