Journal ArticleDOI

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

01 Dec 1952-Annals of Mathematical Statistics (Institute of Mathematical Statistics)-Vol. 23, Iss: 4, pp 493-507
TL;DR: It is shown that the likelihood ratio test for fixed sample size can be reduced to a threshold test on a sum of independent observations, and that each such test has an associated index $\rho$: for large samples, a sample of size $n$ with one test gives about the same probabilities of error as a sample of size $en$ with the other, where $e = \log \rho_1/\log \rho_2$.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
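
The index is computable directly for concrete distributions. Below is a minimal sketch, assuming $X \sim \mathrm{Bernoulli}(p)$ (the Bernoulli choice and all names are illustrative, not from the paper), that numerically finds $m = \inf_t E\lbrack e^{t(X-a)} \rbrack$, the quantity governing $P(S_n \leqq na) \approx m^n$:

```python
# Minimal sketch: numerically compute m = inf_t E[exp(t(X - a))] for
# X ~ Bernoulli(p), so that P(S_n <= n*a) decays roughly like m^n.
# Illustrative only; the Bernoulli choice and names are assumptions.
import numpy as np
from scipy.optimize import minimize_scalar

def chernoff_m(p: float, a: float) -> float:
    """Minimum of the moment generating function of X - a, X ~ Bernoulli(p)."""
    def mgf(t: float) -> float:
        # E[exp(t(X - a))] = (1 - p) * exp(-t*a) + p * exp(t*(1 - a))
        return (1 - p) * np.exp(-t * a) + p * np.exp(t * (1 - a))
    return minimize_scalar(mgf, bounds=(-50.0, 50.0), method="bounded").fun

m = chernoff_m(p=0.5, a=0.4)
print(m)  # ~0.980: P(S_n <= 0.4*n) ~ 0.980^n for fair-coin observations
```

For Bernoulli observations the minimum has the closed form $m = e^{-D(a \| p)}$, where $D$ is the Kullback-Leibler divergence between Bernoulli($a$) and Bernoulli($p$); the numerical value above matches it.
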
Citations
Proceedings Article
16 Jun 2012
TL;DR: This paper uses Karamata’s theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency, and derives a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator.
Abstract: This paper studies the problem of estimating the probability of symbols that have occurred very rarely, in samples drawn independently from an unknown, possibly infinite, discrete distribution. In particular, we study the multiplicative consistency of estimators, defined as the ratio of the estimate to the true quantity converging to one. We first show that the classical Good-Turing estimator is not universally consistent in this sense, despite enjoying favorable additive properties. We then use Karamata’s theory of regular variation to prove that regularly varying heavy tails are sufficient for consistency. At the core of this result is a multiplicative concentration that we establish both by extending the McAllester-Ortiz additive concentration for the missing mass to all rare probabilities and by exploiting regular variation. We also derive a family of estimators which, in addition to being consistent, address some of the shortcomings of the Good-Turing estimator. For example, they perform smoothing implicitly and have the absolute discounting structure of many heuristic algorithms. This also establishes a discrete parallel to extreme value theory, and many of the techniques therein can be adapted to the framework that we set forth.
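
For reference, the classical Good-Turing estimator discussed above assigns total probability $(r+1)N_{r+1}/n$ to the set of symbols seen exactly $r$ times, where $N_r$ is the number of symbols with frequency $r$. A minimal sketch of that baseline (illustrative names; this is not the authors' proposed family of estimators):

```python
# Sketch of the classical Good-Turing estimator: G_r = (r + 1) * N_{r+1} / n
# estimates the total probability of all symbols seen exactly r times
# (r = 0 gives the missing mass). Illustrative baseline only.
from collections import Counter

def good_turing_mass(sample):
    """Map r -> estimated total probability of symbols that occurred r times."""
    n = len(sample)
    counts = Counter(sample)                 # symbol -> frequency
    freq_of_freq = Counter(counts.values())  # r -> N_r
    return {r: (r + 1) * freq_of_freq.get(r + 1, 0) / n
            for r in range(max(freq_of_freq) + 1)}

print(good_turing_mass(list("abracadabra")))
# {0: 0.18..., 1: 0.36..., ...}: about 0.18 probability on unseen symbols
```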

47 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...It is embodied in the following statement, which traces back to Chernoff (1952)....


Proceedings ArticleDOI
03 Jul 2006
TL;DR: Analysis and simulation results show that, rather than deploying as many mobile sinks as possible, choosing an appropriate number, transmission range, velocity, and gathering mode for the sink nodes can significantly decrease the average end-to-end data delivery delay and improve energy conservation.
Abstract: Introducing heterogeneous mobile devices, such as mobile phones, into large-scale sparse wireless sensor networks is a promising research direction. Acting as mobile sinks, these devices offer many benefits to the network: they help to improve scalability, maintain load balance, conserve energy, prolong network lifetime, and facilitate commercial deployment. This paper investigates the impact of different features and behaviors of mobile sinks on hybrid wireless sensor networks. Analysis and simulation results show that, rather than deploying as many mobile sinks as possible, choosing an appropriate number, transmission range, velocity, and gathering mode for the sink nodes can significantly decrease the average end-to-end data delivery delay and improve energy conservation. Performance metrics for fixed and mobile sinks are also compared in sparse networks, with the results showing that mobile sinks achieve a higher data success rate and better energy balance.

47 citations

Proceedings ArticleDOI
21 Jun 1989
TL;DR: The authors present a general approach to fault diagnosis that is widely applicable and requires only a limited number of connections among units; the results indicate that fault diagnosis can in fact be achieved quite simply in multiprocessor systems using a low to moderate number of testing connections.
Abstract: The authors present a general approach to fault diagnosis that is widely applicable and requires only a limited number of connections among units. Each unit in the system forms a private opinion on the status of each of its neighboring units based on duplication of jobs and comparison of job results over time. A diagnosis algorithm that consists of simply taking a majority vote among the neighbors of a unit to determine the status of that unit is then executed. The performance of this simple majority-vote diagnosis algorithm is analyzed using a probabilistic model for the faults in the system. It is shown that with high probability, for systems composed of n units, the algorithm will correctly identify the status of all units when each unit is connected to O(log n) other units. It is also shown that the algorithm works with high probability in a class of systems in which the average number of neighbors of a unit is constant. The results indicate that fault diagnosis can in fact be achieved quite simply in multiprocessor systems containing a low to moderate number of testing conditions.
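
The voting step itself is easy to express. Below is a minimal sketch under an assumed data layout (a neighbor map plus each unit's binary opinion of each neighbor, formed beforehand by comparing duplicated job results); it is a sketch, not the authors' implementation:

```python
# Sketch of the majority-vote diagnosis step described above.
# `opinions[u][v]` is True if unit u judged its neighbor v faulty, based on
# comparing duplicated job results over time. Data layout is hypothetical.

def diagnose(neighbors, opinions):
    """Label each unit faulty iff a strict majority of its neighbors say so."""
    status = {}
    for u, nbrs in neighbors.items():
        votes_faulty = sum(1 for v in nbrs if opinions[v][u])
        status[u] = "faulty" if votes_faulty > len(nbrs) / 2 else "fault-free"
    return status

neighbors = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
# Every unit accuses unit 3 and no one else.
opinions = {u: {v: (v == 3) for v in neighbors[u]} for u in neighbors}
print(diagnose(neighbors, opinions))  # unit 3 voted faulty by all neighbors
```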

47 citations

Journal Article
TL;DR: In this paper, a natural k-round tournament over n=2^k players is analyzed, and it is demonstrated that the tournament possesses a surprisingly strong ranking property.
Abstract: A natural k-round tournament over n=2^k players is analyzed, and it is demonstrated that the tournament possesses a surprisingly strong ranking property. This ranking property is exploited as a building block for efficient parallel sorting algorithms under a variety of models of computation. Three important applications are provided. First, a sorting circuit of depth 7.44 log n, which sorts all but a superpolynomially small fraction of the n! possible input permutations, is defined. Second, a randomized sorting algorithm that runs in O(log n) word steps with very high probability is given for the hypercube and related parallel computers (the butterfly, cube-connected cycles, and shuffle-exchange). Third, a randomized algorithm that runs in O(m + log n) bit steps with very high probability is given for sorting n O(m)-bit records on an (n log n)-node butterfly.
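
One natural instance of such a tournament is sketched below, under the assumption that in every round players with identical win-loss records are paired and compared; the pairing rule and names are illustrative, not necessarily the paper's exact construction. After k rounds, each of the n=2^k players carries a k-bit record that induces an approximate ranking.

```python
# Sketch of a natural k-round tournament over n = 2**k players: each round,
# players with identical win-loss records are paired and compared, so after
# k rounds every player has a k-bit record. Illustrative construction only.
import random

def tournament(players, k):
    """Return {player: record}, record = tuple of round outcomes (1 = win)."""
    records = {p: () for p in players}
    for _ in range(k):
        groups = {}
        for p in players:                        # group by current record
            groups.setdefault(records[p], []).append(p)
        for group in groups.values():
            random.shuffle(group)
            for a, b in zip(group[::2], group[1::2]):
                w, l = (a, b) if a > b else (b, a)   # '>' = one comparison
                records[w] += (1,)
                records[l] += (0,)
    return records

k = 3
players = list(range(2**k))
records = tournament(players, k)
print(sorted(players, key=lambda p: records[p], reverse=True))
```

Ranking by record is only approximate (for example, the loser of the final round need not be the second-best player overall), which is why a provably strong ranking property is the interesting result.
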

47 citations

Journal ArticleDOI
TL;DR: Fractal Clustering (FC) places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least, which is a very natural way of clustering points.
Abstract: Clustering is a widely used knowledge discovery technique. It helps uncover structures in data that were not previously known. The clustering of large data sets has received a lot of attention in recent years; however, clustering is still a challenging task, since many published algorithms fail to scale well with the size of the data set and the number of dimensions that describe the points, to find arbitrarily shaped clusters, or to deal effectively with the presence of noise. In this paper, we present a new clustering algorithm based on the self-similarity properties of the data set. Self-similarity is the property of being invariant with respect to the scale used to look at the data set. While fractals are self-similar at every scale used to look at them, many data sets exhibit self-similarity over a range of scales. Self-similarity can be measured using the fractal dimension. The new algorithm, which we call Fractal Clustering (FC), places points incrementally in the cluster for which the change in the fractal dimension after adding the point is the least. This is a very natural way of clustering points, since points in the same cluster have a great degree of self-similarity among them (and much less self-similarity with respect to points in other clusters). FC requires only one scan of the data, is suspendable at will (providing the best possible answer at that point), and is incremental. We show via experiments that FC deals effectively with large data sets, high dimensionality, and noise, and is capable of recognizing clusters of arbitrary shape.
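
The placement rule at the core of FC can be sketched compactly: compute each cluster's box-counting fractal dimension with and without the candidate point, and pick the cluster whose dimension moves least. The two-scale box count below is a crude stand-in for the paper's computation, and every name and parameter is illustrative:

```python
# Sketch of the FC placement rule: assign each point to the cluster whose
# box-counting fractal dimension changes least when the point is added.
# Two-scale box counting is a crude stand-in; all names are illustrative.
import numpy as np

def box_count(points, eps):
    """Number of grid cells of side eps occupied by the points."""
    return len({tuple(np.floor(p / eps).astype(int)) for p in points})

def fractal_dim(points, eps1=0.1, eps2=0.05):
    n1, n2 = box_count(points, eps1), box_count(points, eps2)
    return np.log(n2 / n1) / np.log(eps1 / eps2)

def place(point, clusters):
    """Index of the cluster whose fractal dimension changes least."""
    deltas = [abs(fractal_dim(c + [point]) - fractal_dim(c))
              for c in clusters]
    return int(np.argmin(deltas))

rng = np.random.default_rng(0)
line = [np.array([x, 0.0]) for x in rng.random(200)]   # ~1-D cluster
blob = [rng.random(2) for _ in range(200)]             # ~2-D cluster
print(place(np.array([0.5, 0.001]), [line, blob]))     # 0: joins the line
```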

47 citations


Cites methods from "A Measure of Asymptotic Efficiency ..."

  • ...To add confidence to the stability of the clusters that are defined by this step, we can use the Chernoff bound (Chernoff, 1952) and the concept of adaptive sampling (Domingo et al....


  • ...To add confidence to the stability of the clusters that are defined by this step, we can use the Chernoff bound (Chernoff, 1952) and the concept of adaptive sampling (Domingo et al., 1998, 2000; Domingos and Hulten, 2000; Lipton and Naughton, 1995; Lipton et al., 1993), to find the minimum number…...

