A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations
01 Dec 1952 - Annals of Mathematical Statistics (Institute of Mathematical Statistics) - Vol. 23, Iss. 4, pp. 493-507
TL;DR: Each test that rejects $H_0$ when $S_n = \sum^n_{j=1} X_j \leqq k$ (a form to which the likelihood ratio test for fixed sample size can be reduced) has an associated index $\rho$; for large samples, a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test, where $e = \log \rho_1/\log \rho_2$.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular, the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests, $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$, where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
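The tail behavior is easy to check numerically. Below is a minimal sketch (our own illustration, not code from the paper; the Bernoulli model and the parameters p, a, n are assumptions) that minimizes the moment generating function of X - a and compares the resulting m^n with the exact tail probability:

```python
# Minimal numerical sketch (ours, not from the paper): for
# X ~ Bernoulli(p) and a < p, P(S_n <= n*a) <= m**n, where
# m = min_t E[exp(t*(X - a))] is the minimum of the moment
# generating function of X - a.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

p, a, n = 0.5, 0.3, 200  # hypothetical parameters

def mgf_shifted(t):
    """E[exp(t*(X - a))] for X ~ Bernoulli(p)."""
    return np.exp(-t * a) * (1.0 - p + p * np.exp(t))

opt = minimize_scalar(mgf_shifted, bounds=(-50.0, 50.0), method="bounded")
m = opt.fun  # minimum value of the moment generating function

exact = binom.cdf(int(round(n * a)), n, p)  # exact P(S_n <= n*a)
print(f"m              = {m:.6f}")
print(f"Chernoff bound = {m**n:.3e}")
print(f"exact tail     = {exact:.3e}")
# log(exact)/n tends to log(m): the tail "behaves roughly like m^n".
```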
Citations
01 Jan 1998
TL;DR: This book attempts to give an overview of the different recent efforts to deal with dataset shift, a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages, and with covariate shift, the simpler special case in which only the input distribution changes.
Abstract: All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher. Overview Dataset shift is a challenging situation where the joint distribution of inputs and outputs differs between the training and test stages. Covariate shift is a simpler particular case of dataset shift where only the input distribution changes (covariate denotes input), while the conditional distribution of the outputs given the inputs p(y|x) remains unchanged. Dataset shift is present in most practical applications for reasons ranging from the bias introduced by experimental design to the mere irreproducibility of the testing conditions at training time. For example, in an image classification task, training data might have been recorded under controlled laboratory conditions, whereas the test data may show different lighting conditions. In other applications, the process that generates data is in itself adaptive. Some of our authors consider the problem of spam email filtering: successful "spammers" will try to build spam in a form that differs from the spam the automatic filter has been built on. Dataset shift seems to have raised relatively little interest in the machine learning community until very recently. Indeed, many machine learning algorithms are based on the assumption that the training data is drawn from exactly the same distribution as the test data on which the model will later be evaluated. Semi-supervised learning and active learning, two problems that seem very similar to covariate shift, have received much more attention. How do they differ from covariate shift? Semi-supervised learning is designed to take advantage of unlabeled data present at training time, but is not conceived to be robust against changes in the input distribution. In fact, one can easily construct examples of covariate shift for which common SSL strategies such as the "cluster assumption" will lead to disaster. In active learning the algorithm is asked to select from the available unlabeled inputs those for which obtaining the label will be most beneficial for learning. This is very relevant in contexts where labeling data is very costly, but active learning strategies are not specifically designed for dealing with covariate shift. This book attempts to give an overview of the different recent efforts that are being …
1,037 citations
Cites methods from "A Measure of Asymptotic Efficiency ..."
...Using this lemma we can now prove Chernoff’s inequality (Chernoff 1952; Okamoto 1958) which was later generalized by Hoeffding (1963)....
[...]
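As a concrete picture of the covariate-shift setting described in this entry, here is a minimal sketch (our own, not taken from the book; the Gaussian input distributions and the threshold classifier are assumptions) in which p(y|x) stays fixed, p(x) shifts, and importance weighting recovers test-time error from training data:

```python
# Minimal sketch of covariate shift (our illustration, not from the
# book): p(y|x) is fixed while p(x) shifts between training and test;
# the importance weights p_test(x)/p_train(x) recover test-time error
# from training data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def label(x):
    """Fixed conditional p(y|x): a noisy threshold at zero."""
    return (x + 0.3 * rng.standard_normal(x.shape) > 0).astype(int)

x_train = rng.normal(-1.0, 1.0, 5000)  # training inputs
x_test = rng.normal(+1.0, 1.0, 5000)   # shifted test inputs
y_train, y_test = label(x_train), label(x_test)

# Importance weights; both input densities are known in this toy example.
w = norm.pdf(x_train, 1.0, 1.0) / norm.pdf(x_train, -1.0, 1.0)

for c in (-0.5, 0.0):  # two fixed threshold classifiers sign(x - c)
    err_tr = np.mean((x_train > c) != y_train)
    err_rw = np.average((x_train > c) != y_train, weights=w)
    err_te = np.mean((x_test > c) != y_test)
    print(f"c={c:+.1f}  train={err_tr:.3f}  reweighted={err_rw:.3f}  test={err_te:.3f}")
```

The reweighted training error tracks the test error, while the naive training error does not.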
TL;DR: A result of Johnson and Lindenstrauss shows that a set of n points in high-dimensional Euclidean space can be mapped into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε).
Abstract: A result of Johnson and Lindenstrauss [13] shows that a set of n points in high dimensional Euclidean space can be mapped into an O(log n/ε²)-dimensional Euclidean space such that the distance between any two points changes by only a factor of (1 ± ε). In this note, we prove this theorem using elementary probabilistic techniques.
1,036 citations
Cites methods from "A Measure of Asymptotic Efficiency ..."
...The proof uses by now standard techniques used for proving large deviation bounds on sums of random variables [4, 9]....
[...]
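The construction behind the result above can be sketched in a few lines (our illustration; the Gaussian projection matrix and the constant in k are assumptions rather than the note's exact choices):

```python
# Minimal sketch (our illustration, not the note's exact construction):
# map n points into k = O(log(n)/eps**2) dimensions and check that all
# pairwise distances are preserved up to a factor of (1 +- eps).
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n, d, eps = 100, 1000, 0.3
k = int(np.ceil(8 * np.log(n) / eps**2))      # assumed constant 8

X = rng.standard_normal((n, d))               # n points in R^d
R = rng.standard_normal((d, k)) / np.sqrt(k)  # scaled random projection
Y = X @ R                                     # images in R^k

ratios = pdist(Y) / pdist(X)                  # distortion of every pair
print(f"k = {k}, distance ratios in [{ratios.min():.3f}, {ratios.max():.3f}]")
# With high probability every ratio falls inside [1 - eps, 1 + eps];
# the failure probability is controlled by large deviation bounds.
```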
TL;DR: In this paper, the authors study the relation between a class of 0–1 integer linear programs and their rational relaxations, and give a randomized algorithm for transforming an optimal solution of a relaxed problem into a provably good solution for the 0–1 problem.
Abstract: We study the relation between a class of 0–1 integer linear programs and their rational relaxations. We give a randomized algorithm for transforming an optimal solution of a relaxed problem into a provably good solution for the 0–1 problem. Our technique can be extended to provide bounds on the disparity between the rational and 0–1 optima for a given problem instance.
1,033 citations
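The rounding step in the entry above admits a short sketch (a hypothetical construction of ours, not the paper's exact algorithm), using the fractional vertex cover of a triangle as the relaxed problem:

```python
# Hypothetical sketch of randomized rounding (ours, not the paper's
# exact algorithm): solve the rational relaxation of a small 0-1
# covering program, then set each variable to 1 with probability equal
# to its fractional value, repeating a few rounds to boost coverage.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Fractional vertex cover of a triangle: min sum(x) s.t. A @ x >= 1.
A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 1]])
res = linprog(np.ones(3), A_ub=-A, b_ub=-np.ones(3), bounds=[(0, 1)] * 3)
x_frac = res.x                           # relaxation optimum (0.5, 0.5, 0.5)

x_int = np.zeros(3)
for _ in range(3):                       # independent rounding rounds
    x_int = np.maximum(x_int, (rng.random(3) < x_frac).astype(float))

print("fractional:", x_frac, "value", res.fun)
print("rounded:   ", x_int, "feasible?", bool(np.all(A @ x_int >= 1)))
# Chernoff-type tail bounds show the rounded value stays close to the
# relaxation optimum with high probability: the link to Chernoff (1952).
```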
19 Oct 2009
TL;DR: In this paper, the authors present a coherent and unified treatment of probabilistic techniques for obtaining high-probability estimates on the performance of randomized algorithms, from the basic tool kit of Chernoff-Hoeffding (CH) bounds to more sophisticated techniques like martingales and isoperimetric inequalities, as well as some recent developments like Talagrand's inequality, transportation cost inequalities, and log-Sobolev inequalities.
Abstract: Randomized algorithms have become a central part of the algorithms curriculum based on their increasingly widespread use in modern applications. This book presents a coherent and unified treatment of probabilistic techniques for obtaining high-probability estimates on the performance of randomized algorithms. It covers the basic tool kit from the Chernoff-Hoeffding (CH) bounds to more sophisticated techniques like martingales and isoperimetric inequalities, as well as some recent developments like Talagrand's inequality, transportation cost inequalities, and log-Sobolev inequalities. Along the way, variations on the basic theme are examined, such as CH bounds in dependent settings. The authors emphasize comparative study of the different methods, highlighting respective strengths and weaknesses in concrete example applications. The exposition is tailored to discrete settings sufficient for the analysis of algorithms, avoiding unnecessary measure-theoretic details, thus making the book accessible to computer scientists as well as probabilists and discrete mathematicians.
1,028 citations
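The most basic bound in the book's tool kit can be checked numerically; the following sketch (ours, not from the book; the uniform distribution and the parameters are assumptions) compares the empirical tail with exp(-2nt²):

```python
# Quick numeric check (our own, not from the book): the basic
# Chernoff-Hoeffding bound P(S_n - E[S_n] >= n*t) <= exp(-2*n*t**2)
# for independent X_i taking values in [0, 1].
import numpy as np

rng = np.random.default_rng(2)
n, t, trials = 100, 0.1, 50_000
x = rng.random((trials, n))          # X_i ~ Uniform[0, 1]
dev = x.mean(axis=1) - 0.5           # (S_n - E[S_n]) / n per trial
empirical = np.mean(dev >= t)
bound = np.exp(-2 * n * t**2)
print(f"empirical {empirical:.5f} <= bound {bound:.5f}")
```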
TL;DR: This paper considers the question of determining whether a function f has property P or is ε-far from any function with property P, and devises algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a p-clique (a clique of density p with respect to the vertex set).
Abstract: In this paper, we consider the question of determining whether a function f has property P or is ε-far from any function with property P. A property testing algorithm is given a sample of the value of f on instances drawn according to some distribution. In some cases, it is also allowed to query f on instances of its choice. We study this question for different properties and establish some connections to problems in learning theory and approximation. In particular, we focus our attention on testing graph properties. Given access to a graph G in the form of being able to query whether an edge exists or not between a pair of vertices, we devise algorithms to test whether the underlying graph has properties such as being bipartite, k-colorable, or having a p-clique (clique of density p with respect to the vertex set). Our graph property testing algorithms are probabilistic and make assertions that are correct with high probability, while making a number of queries that is independent of the size of the graph. Moreover, the property testing algorithms can be used to efficiently (i.e., in time linear in the number of vertices) construct partitions of the graph that correspond to the property being tested, if it holds for the input graph.
1,027 citations
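To make the query model above concrete, here is an illustrative sketch of a sampling-based bipartiteness tester (the sample size and helper names are our assumptions, not the paper's exact algorithm):

```python
# Illustrative sketch (our assumptions, not the paper's exact tester):
# test bipartiteness by sampling a few vertices, querying only the
# induced edges, and trying to 2-color the induced subgraph.  The
# number of queries depends on eps, not on the size of the graph.
import random

def induced_is_bipartite(sample, has_edge):
    """Greedily 2-color the subgraph induced on `sample`; has_edge(u, v)
    is the edge-query oracle."""
    color = {}
    for s in sample:
        if s in color:
            continue
        color[s] = 0
        stack = [s]
        while stack:
            u = stack.pop()
            for v in sample:
                if v != u and has_edge(u, v):
                    if v not in color:
                        color[v] = 1 - color[u]
                        stack.append(v)
                    elif color[v] == color[u]:
                        return False  # odd cycle inside the sample
    return True

def test_bipartite(n, has_edge, eps=0.1, seed=3):
    rng = random.Random(seed)
    sample = rng.sample(range(n), min(n, int(20 / eps)))  # assumed size
    return induced_is_bipartite(sample, has_edge)

print(test_bipartite(50, lambda u, v: True))                  # K_50 -> False
print(test_bipartite(50, lambda u, v: (u < 25) != (v < 25)))  # bipartite -> True
```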