Journal ArticleDOI

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

01 Dec 1952-Annals of Mathematical Statistics (Institute of Mathematical Statistics)-Vol. 23, Iss: 4, pp 493-507
TL;DR: In this paper, it is shown that the likelihood ratio test for fixed sample size can be reduced to a test based on the sum of observations, and that, for large samples, a sample of size $n$ with the first of two such tests gives about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
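The quantity driving the result is easy to evaluate numerically. The sketch below (an illustration only, not code from the paper) checks the stated fact that $P(S_n \leqq na)$ decays at the rate $m^n$, where $m$ is the minimum value of the moment generating function of $X - a$; the Bernoulli choice of $X$ and all numerical settings are assumptions made for the example.

```python
# Sketch (not from the paper): P(S_n <= n*a) vs. m^n, where m = min_t E[exp(t(X - a))].
# Illustrative choice: X ~ Bernoulli(p) with p = 0.6 and threshold a = 0.4 < E[X].
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

p, a = 0.6, 0.4

def mgf_shifted(t):
    """Moment generating function of X - a for X ~ Bernoulli(p)."""
    return np.exp(-t * a) * ((1 - p) + p * np.exp(t))

# m = minimum of the m.g.f. of X - a (attained at some t < 0 because a < E[X]).
m = minimize_scalar(mgf_shifted, bounds=(-20.0, 20.0), method="bounded").fun
print(f"m = {m:.6f},  exponential rate -log m = {-np.log(m):.4f}")

for n in (50, 200, 800):
    log_exact = binom.logcdf(np.floor(n * a), n, p)   # log P(S_n <= n*a)
    print(f"n = {n:4d}:  -log P(S_n <= n a) / n = {-log_exact / n:.4f}")
# The per-observation rates approach -log m as n grows, i.e. P(S_n <= n*a) ~ m^n.
```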
Citations
Dissertation
01 Jul 2003
TL;DR: The tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning are demonstrated, and generic schemes for automatic model selection with many (hyper)parameters are developed.
Abstract: Non-parametric models and techniques enjoy a growing popularity in the field of machine learning, and among these Bayesian inference for Gaussian process (GP) models has recently received significant attention. We feel that GP priors should be part of the standard toolbox for constructing models relevant to machine learning in the same way as parametric linear models are, and the results in this thesis help to remove some obstacles on the way towards this goal. In the first main chapter, we provide a distribution-free finite sample bound on the difference between generalisation and empirical (training) error for GP classification methods. While the general theorem (the PAC-Bayesian bound) is not new, we give a much simplified and somewhat generalised derivation and point out the underlying core technique (convex duality) explicitly. Furthermore, the application to GP models is novel (to our knowledge). A central feature of this bound is that its quality depends crucially on task knowledge being encoded faithfully in the model and prior distributions, so there is a mutual benefit between a sharp theoretical guarantee and empirically well-established statistical practices. Extensive simulations on real-world classification tasks indicate an impressive tightness of the bound, in spite of the fact that many previous bounds for related kernel machines fail to give non-trivial guarantees in this practically relevant regime. In the second main chapter, sparse approximations are developed to address the problem of the unfavourable scaling of most GP techniques with large training sets. Due to its high importance in practice, this problem has received a lot of attention recently. We demonstrate the tractability and usefulness of simple greedy forward selection with information-theoretic criteria previously used in active learning (or sequential design) and develop generic schemes for automatic model selection with many (hyper)parameters. We suggest two new generic schemes and evaluate some of their variants on large real-world classification and regression tasks. These schemes and their underlying principles (which are clearly stated and analysed) can be applied to obtain sparse approximations for a wide regime of GP models far beyond the special cases we studied here.
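The greedy forward selection mentioned above can be sketched generically. The snippet below is only an illustration: it scores candidate active-set points by their current GP posterior predictive variance (an information-gain style criterion, since the entropy of a Gaussian grows with its variance). It is not the thesis's specific criterion or code, and the RBF kernel and all parameters are assumptions.

```python
# Illustrative sketch (not the thesis's algorithm): greedy forward selection of a GP
# active set, adding at each step the point with largest posterior predictive variance.
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def greedy_active_set(X, n_active, noise=1e-2):
    """Pick n_active rows of X; each step adds the point with largest posterior variance."""
    active = [0]                                    # arbitrary starting point
    for _ in range(n_active - 1):
        K_aa = rbf_kernel(X[active], X[active]) + noise * np.eye(len(active))
        K_xa = rbf_kernel(X, X[active])
        # Posterior predictive variance of every candidate given the current active set.
        var = rbf_kernel(X, X).diagonal() - np.einsum(
            "ij,jk,ik->i", K_xa, np.linalg.inv(K_aa), K_xa)
        var[active] = -np.inf                       # never re-select chosen points
        active.append(int(np.argmax(var)))
    return active

X = np.random.default_rng(0).uniform(-3, 3, size=(200, 2))
print(greedy_active_set(X, n_active=10))
```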

202 citations


Cites methods from "A Measure of Asymptotic Efficiency ..."

  • ...The technique is often accredited to Chernoff [28], but has actually been used by Cramér before [35]: $\Pr\{X \geq x\} = \Pr\{e^{tX} \geq e^{tx}\} \leq e^{-tx}\, E[e^{tX}]$...


Journal ArticleDOI
TL;DR: In this paper, the authors consider a series of single-server queues with unlimited waiting space and the first-in first-out service discipline, in which the service times of all customers at all queues are i.i.d. with a general distribution, and they investigate the limiting behavior of the time required for all customers to complete service from all queues.
Abstract: We consider a series of $n$ single-server queues, each with unlimited waiting space and the first-in first-out service discipline. Initially, the system is empty; then $k$ customers are placed in the first queue. The service times of all the customers at all the queues are i.i.d. with a general distribution. We are interested in the time $D(k,n)$ required for all $k$ customers to complete service from all $n$ queues. In particular, we investigate the limiting behavior of $D(k,n)$ as $n \rightarrow \infty$ and/or $k \rightarrow \infty$. There is a duality implying that $D(k,n)$ is distributed the same as $D(n,k)$ so that results for large $n$ are equivalent to results for large $k$. A previous heavy-traffic limit theorem implies that $D(k,n)$ satisfies an invariance principle as $n \rightarrow \infty$, converging after normalization to a functional of $k$-dimensional Brownian motion. We use the subadditive ergodic theorem and a strong approximation to describe the limiting behavior of $D(k_n,n)$, where $k_n \rightarrow \infty$ as $n \rightarrow \infty$. The case of $k_n = \lbrack xn \rbrack$ corresponds to a hydrodynamic limit.
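The quantity $D(k,n)$ can be simulated directly. The sketch below uses the standard departure-time recursion for this model, $D(i,j) = \max\{D(i-1,j),\, D(i,j-1)\} + S(i,j)$, where $S(i,j)$ is the service time of customer $i$ at queue $j$; the exponential service distribution and the sample sizes are assumptions made for illustration. Because the recursion is symmetric in the two indices when the $S(i,j)$ are i.i.d., it also makes the duality $D(k,n) \overset{d}{=} D(n,k)$ visible.

```python
# Sketch: simulate D(k, n), the time for k customers (initially at queue 1) to clear
# n single-server FIFO queues in series, via the departure-time recursion
#   D(i, j) = max(D(i-1, j), D(i, j-1)) + S(i, j),  with D(0, .) = D(., 0) = 0.
# Exponential(1) service times and the sizes below are illustrative assumptions.
import numpy as np

def simulate_D(k, n, rng):
    S = rng.exponential(1.0, size=(k, n))   # S[i, j]: service time of customer i at queue j
    D = np.zeros((k + 1, n + 1))
    for i in range(1, k + 1):
        for j in range(1, n + 1):
            D[i, j] = max(D[i - 1, j], D[i, j - 1]) + S[i - 1, j - 1]
    return D[k, n]

rng = np.random.default_rng(1)
k, n, reps = 20, 50, 500
samples_kn = [simulate_D(k, n, rng) for _ in range(reps)]
samples_nk = [simulate_D(n, k, rng) for _ in range(reps)]
print("mean D(k,n):", np.mean(samples_kn))          # duality: same law as D(n,k)
print("mean D(n,k):", np.mean(samples_nk))
print("hydrodynamic scale D(k,n)/n:", np.mean(samples_kn) / n)
```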

202 citations

Journal ArticleDOI
Suguru Arimoto
TL;DR: A new definition of generalized information measures is introduced so as to investigate the finite-parameter estimation problem; it yields a class of generalized entropy functions which is useful for treating the error-probability of decision and other equivocation measures in the same framework.
Abstract: A new definition of generalized information measures is introduced so as to investigate the finite-parameter estimation problem. This definition yields a class of generalized entropy functions which is useful for treating the error-probability of decision and other equivocation measures, such as Shannon's logarithmic measure, in the same framework and, in particular, for deriving upper bounds to the error-probability. A few inequalities between these equivocation measures are presented, including an extension of Fano's inequality.
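As a concrete point of reference, the classical Fano inequality (which the paper extends to generalized measures) ties the equivocation $H(X \mid Y)$ to the error probability $P_e$ of the optimal decision rule: $H(X \mid Y) \leq h(P_e) + P_e \log(|\mathcal{X}| - 1)$. The sketch below checks this on a small joint distribution; the distribution itself is an arbitrary assumption, and the paper's generalized (non-logarithmic) measures are not implemented here.

```python
# Sketch: classical Fano inequality  H(X|Y) <= h(P_e) + P_e * log(|X| - 1),
# where P_e is the MAP error probability. The joint distribution is an arbitrary example.
import numpy as np

p_xy = np.array([[0.30, 0.05, 0.05],     # rows: x, columns: y
                 [0.05, 0.25, 0.05],
                 [0.05, 0.05, 0.15]])

p_y = p_xy.sum(axis=0)
# Equivocation H(X|Y) = -sum_{x,y} p(x,y) log p(x|y)   (natural log)
H_X_given_Y = -np.sum(p_xy * np.log(p_xy / p_y))

# MAP decision: for each y, guess the x maximising p(x, y).
P_e = 1.0 - p_xy.max(axis=0).sum()

def h(q):                                # binary entropy, natural log
    return -q * np.log(q) - (1 - q) * np.log(1 - q)

fano_rhs = h(P_e) + P_e * np.log(p_xy.shape[0] - 1)
print(f"H(X|Y)   = {H_X_given_Y:.4f} nats")
print(f"P_e(MAP) = {P_e:.4f}")
print(f"Fano RHS = {fano_rhs:.4f}  (>= H(X|Y))")
```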

201 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...8), is closely related to Chernoff's bound (Chernoff [1])....


Journal ArticleDOI
TL;DR: This paper addresses the problem of characterizing ensemble similarity from sample similarity in a principled manner by using a reproducing kernel as a characterization of sample similarity, and suggests a probabilistic distance measure in the reproducing kernel Hilbert space (RKHS) as the ensemble similarity.
Abstract: This paper addresses the problem of characterizing ensemble similarity from sample similarity in a principled manner. Using a reproducing kernel as a characterization of sample similarity, we suggest a probabilistic distance measure in the reproducing kernel Hilbert space (RKHS) as the ensemble similarity. Assuming normality in the RKHS, we derive analytic expressions for probabilistic distance measures that are commonly used in many applications, such as the Chernoff distance (or the Bhattacharyya distance as its special case), the Kullback-Leibler divergence, etc. Since the reproducing kernel implicitly embeds a nonlinear mapping, our approach presents a new way to study these distances; its feasibility and efficiency are demonstrated using experiments with synthetic and real examples. Further, we extend the ensemble similarity to the reproducing kernel for ensemble and study the ensemble similarity for more general data representations.
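Under the normality assumption, the probabilistic distances mentioned above have closed forms. The sketch below computes the Gaussian Chernoff distance $-\ln \int p_1^{1-s} p_2^{s}\, dx$ and its $s = 1/2$ special case, the Bhattacharyya distance, directly in the input space; it is only an illustration of the standard closed-form expressions, not the paper's RKHS construction, and the example means and covariances are assumptions.

```python
# Sketch: closed-form Chernoff distance between two Gaussians,
#   k(s) = s(1-s)/2 * dm^T [ (1-s)S1 + s S2 ]^{-1} dm
#          + 1/2 * ln( det((1-s)S1 + s S2) / (det(S1)^(1-s) det(S2)^s) ),
# with the Bhattacharyya distance as the s = 1/2 special case.
# Example parameters are illustrative; this is input-space, not the paper's RKHS version.
import numpy as np

def chernoff_distance(mu1, S1, mu2, S2, s=0.5):
    dm = mu2 - mu1
    Ss = (1 - s) * S1 + s * S2
    quad = 0.5 * s * (1 - s) * dm @ np.linalg.solve(Ss, dm)
    _, logdet_Ss = np.linalg.slogdet(Ss)
    _, logdet_S1 = np.linalg.slogdet(S1)
    _, logdet_S2 = np.linalg.slogdet(S2)
    return quad + 0.5 * (logdet_Ss - (1 - s) * logdet_S1 - s * logdet_S2)

mu1, S1 = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu2, S2 = np.array([1.0, 0.5]), np.array([[1.5, -0.3], [-0.3, 0.8]])
print("Bhattacharyya distance   :", chernoff_distance(mu1, S1, mu2, S2, s=0.5))
print("Chernoff distance, s=0.3 :", chernoff_distance(mu1, S1, mu2, S2, s=0.3))
```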

201 citations

Journal ArticleDOI
TL;DR: In this article, the large deviation principle is established for probability measures induced by stochastic processes with stationary and independent increments; the moment generating function of the increments is assumed to exist, so that the sample paths lie in the space of functions of bounded variation, and the LDP is obtained under the weak$^\ast$-topology.
Abstract: Let $\mathscr{X}$ be a topological space and $\mathscr{F}$ denote the Borel $\sigma$-field in $\mathscr{X}$. A family of probability measures $\{P_\lambda\}$ is said to obey the large deviation principle (LDP) with rate function $I(\cdot)$ if $P_\lambda(A)$ can be suitably approximated by $\exp\{-\lambda \inf_{x\in A}I(x)\}$ for appropriate sets $A$ in $\mathscr{F}$. Here the LDP is studied for probability measures induced by stochastic processes with stationary and independent increments which have no Gaussian component. It is assumed that the moment generating function of the increments exists and thus the sample paths of such stochastic processes lie in the space of functions of bounded variation. The LDP for such processes is obtained under the weak$^\ast$-topology. This covers a case which was ruled out in the earlier work of Varadhan (1966). As applications, the large deviation principle for the Poisson, Gamma and Dirichlet processes are obtained.
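For the one-dimensional marginal of one of the processes treated as an application, the LDP approximation can be checked directly. The sketch below uses the Poisson case: with $N_\lambda \sim \mathrm{Poisson}(\lambda)$, the rate function is the Legendre transform $I(x) = \sup_t\{tx - (e^t - 1)\} = x\log x - x + 1$, and $-\frac{1}{\lambda}\log P(N_\lambda \geq \lambda a)$ approaches $I(a)$ for $a > 1$. This scalar check is an illustration only; the paper's result is a path-level LDP in the weak$^\ast$-topology, which is not reproduced here.

```python
# Sketch: check P_lambda(A) ~ exp(-lambda * inf_A I) for the one-dimensional Poisson
# marginal, with rate I(x) = x*log(x) - x + 1 (Legendre transform of the cumulant
# generating function e^t - 1). Take A = [a, infinity) with a > 1.
import numpy as np
from scipy.stats import poisson

a = 2.0
I_a = a * np.log(a) - a + 1        # infimum of I over A is attained at the boundary a

for lam in (10, 100, 1000, 10000):
    # P(N_lambda >= lambda * a), computed on the log scale to avoid underflow.
    log_tail = poisson.logsf(np.ceil(lam * a) - 1, lam)
    print(f"lambda = {lam:6d}:  -log P / lambda = {-log_tail / lam:.4f}   I(a) = {I_a:.4f}")
```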

198 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...THEOREM 3.1 [Cramer (1937) and Chernoff (1952)]....


  • ...cases where $P_\lambda$ ($\lambda$ a positive integer) is either (i) the probability measure induced by the average of $\lambda$ i.i.d. random variables [see Cramer (1937), Chernoff (1952), Bahadur and Zabell (1979) and Varadhan (1984)] or (ii) the probability measure of the empirical distribution of $\lambda$ i.i.d. random…

