Journal ArticleDOI

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the sum of Observations

01 Dec 1952-Annals of Mathematical Statistics (Institute of Mathematical Statistics)-Vol. 23, Iss: 4, pp 493-507
TL;DR: In this paper, it was shown that the likelihood ratio test for fixed sample size can be reduced to this form, and that for large samples, a sample of size $n$ with the first test gives about the same probabilities of error as a sample of size $en$ with the second test.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
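The index ρ can be illustrated numerically. The sketch below (with illustrative parameters not taken from the paper: Bernoulli observations with p = 0.5 and threshold a = 0.3) minimizes the moment generating function of X − a by grid search and compares log ρ against the exact rate (1/n) log P(S_n ≤ na) for a moderate n:

```python
import math

# Chernoff's index for the test "reject H0 if S_n <= n*a" with
# X ~ Bernoulli(p): rho = min over t of E[exp(t*(X - a))].
# Illustrative parameters (assumptions, not from the paper):
p, a = 0.5, 0.3

def mgf_shifted(t):
    # E[exp(t*(X - a))] for X ~ Bernoulli(p)
    return math.exp(-t * a) * ((1 - p) + p * math.exp(t))

# Crude grid search over t in [-5, 5]; the minimizer lies at t < 0
# here because a < p (a left-tail event).
rho = min(mgf_shifted(-5 + 0.001 * i) for i in range(10001))
log_rho = math.log(rho)

# Exact left-tail probability P(S_n <= n*a) for S_n ~ Binomial(n, p).
n = 200
k = math.floor(n * a)
tail = sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n
rate = math.log(tail) / n

print(f"log rho     = {log_rho:.4f}")
print(f"(1/n) log P = {rate:.4f}")  # approaches log rho as n grows
assert tail <= rho ** n             # the bound P(S_n <= n*a) <= rho^n
```

The exact rate sits slightly below log ρ for finite n (the bound is exact only in the exponent); the gap shrinks like (log n)/n.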
Citations
Journal ArticleDOI
TL;DR: A new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.
Abstract: A central task in the study of molecular evolution is the reconstruction of a phylogenetic tree from sequences of current-day taxa. The most established approach to tree reconstruction is maximum likelihood (ML) analysis. Unfortunately, searching for the maximum likelihood phylogenetic tree is computationally prohibitive for large data sets. In this paper, we describe a new algorithm that uses Structural Expectation Maximization (EM) for learning maximum likelihood phylogenetic trees. This algorithm is similar to the standard EM method for edge-length estimation, except that during iterations of the Structural EM algorithm the topology is improved as well as the edge length. Our algorithm performs iterations of two steps. In the E-step, we use the current tree topology and edge lengths to compute expected sufficient statistics, which summarize the data. In the M-Step, we search for a topology that maximizes the likelihood with respect to these expected sufficient statistics. We show that searching for better topologies inside the M-step can be done efficiently, as opposed to standard methods for topology search. We prove that each iteration of this procedure increases the likelihood of the topology, and thus the procedure must converge. This convergence point, however, can be a suboptimal one. To escape from such "local optima," we further enhance our basic EM procedure by incorporating moves in the flavor of simulated annealing. We evaluate these new algorithms on both synthetic and real sequence data and show that for protein sequences even our basic algorithm finds more plausible trees than existing methods for searching maximum likelihood phylogenies. Furthermore, our algorithms are dramatically faster than such methods, enabling, for the first time, phylogenetic analysis of large protein data sets in the maximum likelihood framework.

125 citations


Additional excerpts

  • ...Lemma 6.1 (Variation of (Chernoff 1952)) Φ(x) < exp(x)...


Journal ArticleDOI
01 Sep 1988
TL;DR: Sequences k_n are characterized such that the Hill estimator of the tail index, based on the k_n largest order statistics of a sample of size n from a Pareto-type distribution, is strongly consistent as mentioned in this paper.
Abstract: We characterize sequences k_n such that the Hill estimator of the tail index, based on the k_n largest order statistics of a sample of size n from a Pareto-type distribution, is strongly consistent.
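A minimal sketch of the Hill estimator under the setting described above; the tail index, sample size, and the choice k_n = sqrt(n) are illustrative assumptions, with k_n → ∞ and k_n/n → 0 as the consistency condition requires:

```python
import math
import random

def hill_estimator(sample, k):
    # Hill's estimator of gamma = 1/alpha from the k largest
    # order statistics: mean of the top-k log-spacings above
    # the (k+1)-th largest observation.
    xs = sorted(sample, reverse=True)
    logs = [math.log(x) for x in xs[: k + 1]]
    return sum(logs[:k]) / k - logs[k]

random.seed(0)
alpha = 2.0        # true tail index (illustrative)
n = 100_000
# Pareto(alpha) via inverse transform: X = U**(-1/alpha), U ~ Uniform(0,1)
sample = [random.random() ** (-1.0 / alpha) for _ in range(n)]

k = int(n ** 0.5)  # k_n -> infinity, k_n / n -> 0
gamma_hat = hill_estimator(sample, k)
print(f"estimate of 1/alpha: {gamma_hat:.3f}  (true value {1 / alpha:.3f})")
```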

124 citations

Journal ArticleDOI
TL;DR: In this paper, the internal spatial structure of 16 open clusters in the Milky Way spanning a wide range of ages was analyzed by means of the minimum spanning tree method (Q parameter), King profile fitting, and the correlation dimension (Dc ) for those clusters with fractal patterns.
Abstract: The analysis of the distribution of stars in open clusters may yield important information on the star formation process and early dynamical evolution of stellar clusters. Here we address this issue by systematically characterizing the internal spatial structure of 16 open clusters in the Milky Way spanning a wide range of ages. Cluster stars have been selected from a membership probability analysis based on a nonparametric method that uses both positions and proper motions and does not make any a priori assumption on the underlying distributions. The internal structure is then characterized by means of the minimum spanning tree method (Q parameter), King profile fitting, and the correlation dimension (D_c) for those clusters with fractal patterns. On average, clusters with fractal-like structure are younger than those exhibiting radial star density profiles, and an apparent trend between Q and age is observed in agreement with previous ideas about the dynamical evolution of the internal spatial structure of stellar clusters. However, some new results are obtained from a more detailed analysis: (1) a clear correlation between Q and the concentration parameter of the King model for those clusters with radial density profiles, (2) the presence of spatial substructure in clusters as old as ~100 Myr, and (3) a significant correlation between fractal dimension and age for those clusters with internal substructure. Moreover, the lowest fractal dimensions seem to be considerably smaller than the average value measured in galactic molecular cloud complexes.

124 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...The statistical separation between any two types of populations can be described through the Chernoff probabilistic distance (Chernoff 1952), which is a measure of the difference between two probability distributions....


Journal ArticleDOI
TL;DR: In this paper, the authors consider the multivariate normal mean model in the situation that the mean vector is sparse in the nearly black sense and show that if the number of nonzero parameters of the mean vector is known, the horseshoe estimator attains the minimax risk, possibly up to a multiplicative constant.
Abstract: We consider the horseshoe estimator due to Carvalho, Polson and Scott (2010) for the multivariate normal mean model in the situation that the mean vector is sparse in the nearly black sense. We assume the frequentist framework where the data is generated according to a fixed mean vector. We show that if the number of nonzero parameters of the mean vector is known, the horseshoe estimator attains the minimax $\ell_2$ risk, possibly up to a multiplicative constant. We provide conditions under which the horseshoe estimator combined with an empirical Bayes estimate of the number of nonzero means still yields the minimax risk. We furthermore prove an upper bound on the rate of contraction of the posterior distribution around the horseshoe estimator, and a lower bound on the posterior variance. These bounds indicate that the posterior distribution of the horseshoe prior may be more informative than that of other one-component priors, including the Lasso.

124 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...For X ∼ Bin(n, p), we have the bound P(X ≥ k) ≤ (enp/k)^k as a consequence of Theorem 1 in (Chernoff, 1952)....

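The quoted binomial tail bound is easy to check numerically. The following sketch (with arbitrarily chosen values of n, p, and k, not taken from either paper) compares the exact upper tail with (enp/k)^k:

```python
import math

def binom_upper_tail(n, p, k):
    # Exact P(X >= k) for X ~ Binomial(n, p)
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j)
               for j in range(k, n + 1))

# Spot-check P(X >= k) <= (e*n*p/k)**k; the inequality follows from
# P(X >= k) <= C(n, k) p**k together with C(n, k) <= (e*n/k)**k,
# so it holds for every k >= 1 (it is only informative when k > e*n*p).
for n, p, k in [(100, 0.05, 20), (1000, 0.01, 40), (50, 0.1, 15)]:
    exact = binom_upper_tail(n, p, k)
    bound = (math.e * n * p / k) ** k
    print(f"n={n:4d} p={p} k={k:2d}  exact={exact:.3e}  bound={bound:.3e}")
    assert exact <= bound
```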

Proceedings ArticleDOI
08 Jul 2008
TL;DR: The coalitional manipulation problem for generalized scoring rules is studied, and it is proved that under certain natural assumptions, the probability that a random profile is manipulable (to any possible winner under the voting rule) tends to 1 exponentially fast in the number of voters.
Abstract: We introduce a class of voting rules called generalized scoring rules. Under such a rule, each vote generates a vector of k scores, and the outcome of the voting rule is based only on the sum of these vectors---more specifically, only on the order (in terms of score) of the sum's components. This class is extremely general: we do not know of any commonly studied rule that is not a generalized scoring rule. We then study the coalitional manipulation problem for generalized scoring rules. We prove that under certain natural assumptions, if the number of manipulators is Ω(n^p) (for any p > 1/2) and o(n), then the probability that a random profile is manipulable (to any possible winner under the voting rule) is 1 − O(e^(−Ω(n^(2p−1)))). We also show that common voting rules satisfy these conditions (for the uniform distribution). These results generalize earlier results by Procaccia and Rosenschein as well as even earlier results on the probability of an election being tied.

124 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...The next lemma is known as Chernoff's inequality [5]....

