Journal ArticleDOI

A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations

01 Dec 1952 - Annals of Mathematical Statistics (Institute of Mathematical Statistics) - Vol. 23, Iss. 4, pp. 493-507
TL;DR: Each test of the form "reject $H_0$ if $S_n \leqq k$" is assigned an index $\rho$; the likelihood ratio test for fixed sample size can be reduced to this form, and for large samples a sample of size $n$ with the first of two such tests gives about the same probabilities of error as a sample of size $en$ with the second, where $e = \log \rho_1/\log \rho_2$.
Abstract: In many cases an optimum or computationally convenient test of a simple hypothesis $H_0$ against a simple alternative $H_1$ may be given in the following form. Reject $H_0$ if $S_n = \sum^n_{j=1} X_j \leqq k,$ where $X_1, X_2, \cdots, X_n$ are $n$ independent observations of a chance variable $X$ whose distribution depends on the true hypothesis and where $k$ is some appropriate number. In particular the likelihood ratio test for fixed sample size can be reduced to this form. It is shown that with each test of the above form there is associated an index $\rho$. If $\rho_1$ and $\rho_2$ are the indices corresponding to two alternative tests $e = \log \rho_1/\log \rho_2$ measures the relative efficiency of these tests in the following sense. For large samples, a sample of size $n$ with the first test will give about the same probabilities of error as a sample of size $en$ with the second test. To obtain the above result, use is made of the fact that $P(S_n \leqq na)$ behaves roughly like $m^n$ where $m$ is the minimum value assumed by the moment generating function of $X - a$. It is shown that if $H_0$ and $H_1$ specify probability distributions of $X$ which are very close to each other, one may approximate $\rho$ by assuming that $X$ is normally distributed.
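As a concrete illustration of the tail approximation stated above, the following Python sketch numerically minimises the moment generating function of $X - a$ for a Bernoulli variable and compares $m^n$ with the exact lower-tail probability $P(S_n \leqq na)$. The function names and parameter choices are illustrative only, not from the paper.

```python
# Sketch: estimate m = min_t E[exp(t*(X - a))] for X ~ Bernoulli(p) and
# compare m**n with the exact tail probability P(S_n <= n*a).
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

def chernoff_m(p, a):
    """Minimum of the moment generating function of X - a, X ~ Bernoulli(p)."""
    mgf = lambda t: (1 - p + p * np.exp(t)) * np.exp(-t * a)
    res = minimize_scalar(mgf, bounds=(-50, 0), method="bounded")  # lower tail: t <= 0
    return res.fun

p, a, n = 0.5, 0.3, 200                     # threshold a below the mean of X
m = chernoff_m(p, a)
exact = binom.cdf(np.floor(n * a), n, p)    # exact P(S_n <= n*a)
print(f"m^n = {m**n:.3e}, exact tail = {exact:.3e}")
```

Both numbers are tiny and of comparable order, reflecting the statement that $P(S_n \leqq na)$ behaves roughly like $m^n$.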
Citations
Journal ArticleDOI
TL;DR: The requirements for clustering data streams are analyzed, and some of the latest algorithms in the literature are reviewed to assess whether they meet these requirements.
Abstract: Scientific and industrial examples of data streams abound in astronomy, telecommunication operations, banking and stock-market applications, e-commerce and other fields. A challenge imposed by continuously arriving data streams is to analyze them and to modify the models that explain them as new data arrives. In this paper, we analyze the requirements needed for clustering data streams. We review some of the latest algorithms in the literature and assess if they meet these requirements.

177 citations

Journal ArticleDOI
01 Jun 2011
TL;DR: This paper investigates a fundamental problem concerning uncertain graphs, which is the distance-constraint reachability (DCR) problem: Given two vertices s and t, what is the probability that the distance from s to t is less than or equal to a user-defined threshold d in the uncertain graph?
Abstract: Driven by the emerging network applications, querying and mining uncertain graphs has become increasingly important. In this paper, we investigate a fundamental problem concerning uncertain graphs, which we call the distance-constraint reachability (DCR) problem: Given two vertices s and t, what is the probability that the distance from s to t is less than or equal to a user-defined threshold d in the uncertain graph? Since this problem is #P-complete, we focus on efficiently and accurately approximating DCR online. Our main results include two new estimators for the probabilistic reachability. One is a Horvitz-Thompson type estimator based on an unequal probability sampling scheme, and the other is a novel recursive sampling estimator, which effectively combines a deterministic recursive computational procedure with a sampling process to boost the estimation accuracy. Both estimators can produce much smaller variance than the direct sampling estimator, which considers each trial to be either 1 or 0. We also present methods to make these estimators more computationally efficient. The comprehensive experimental evaluation on both real and synthetic datasets demonstrates the efficiency and accuracy of our new estimators.
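For orientation, here is a hedged sketch of the direct sampling baseline that the abstract contrasts its estimators against: sample possible worlds of the uncertain graph and count those in which s reaches t within distance d. It assumes unit-length edges, and the representation and names (`sample_world`, `dcr_direct`) are illustrative rather than taken from the paper.

```python
# Direct Monte Carlo baseline for distance-constraint reachability (DCR).
import random
from collections import deque

def sample_world(edges):
    """Keep each uncertain edge (u, v, prob) independently with its probability."""
    return [(u, v) for u, v, prob in edges if random.random() < prob]

def within_distance(kept_edges, s, t, d):
    """BFS: is t reachable from s within d hops (unit-length edges)?"""
    adj = {}
    for u, v in kept_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen, queue = {s}, deque([(s, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == t:
            return True
        if dist < d:
            for nxt in adj.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, dist + 1))
    return False

def dcr_direct(edges, s, t, d, trials=10000):
    hits = sum(within_distance(sample_world(edges), s, t, d) for _ in range(trials))
    return hits / trials   # each trial is 0 or 1, hence the estimator's high variance

# Example: a small uncertain triangle
edges = [("s", "a", 0.9), ("a", "t", 0.8), ("s", "t", 0.3)]
print(dcr_direct(edges, "s", "t", 2))   # true value is 1 - 0.7 * 0.28 = 0.804
```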

176 citations

Journal ArticleDOI
TL;DR: The main result is a randomized parallel algorithm for INTEGER_SORT, the first known to be optimal: the product of its time and processor bounds is bounded above by a linear function of the input size.
Abstract: This paper assumes a parallel RAM (random access machine) model which allows both concurrent reads and concurrent writes of a global memory. The main result is an optimal randomized parallel algorithm for INTEGER_SORT (i.e., for sorting n integers in the range $[1,n]$). This algorithm costs only logarithmic time and is the first known that is optimal: the product of its time and processor bounds is upper bounded by a linear function of the input size. Also given is a deterministic sublogarithmic time algorithm for prefix sum. In addition, this paper presents a sublogarithmic time algorithm for obtaining a random permutation of n elements in parallel. Finally, sublogarithmic time algorithms for GENERAL_SORT and INTEGER_SORT are presented. Our sublogarithmic GENERAL_SORT algorithm is also optimal.
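To make the optimality claim concrete, a sequential reference point (not the paper's PRAM algorithm) is counting sort: sorting n integers from the range [1, n] takes linear work, which is the time-processor product the parallel algorithm matches. A minimal Python sketch, with illustrative names:

```python
# Counting sort for keys in 1..n: linear work, the benchmark that an
# "optimal" parallel algorithm must match in time * processors.
def integer_sort(a):
    n = len(a)
    counts = [0] * (n + 1)            # keys are assumed to lie in 1..n
    for x in a:
        counts[x] += 1
    out = []
    for key in range(1, n + 1):
        out.extend([key] * counts[key])
    return out

print(integer_sort([3, 1, 4, 1, 5, 2, 6, 5, 3, 5]))   # keys in 1..10, n = 10
```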

175 citations


Cites background from "A Measure of Asymptotic Efficiency ..."

  • ...The distribution function of X can easily be seen to be $P(X \leq x) = \sum^x_{k=0} \binom{n}{k} p^k(1 - p)^{n-k}$. [Chernoff 52] and [Angluin and Valiant 79] have found ways of approximating the tail ends of a binomial distribution....


Journal ArticleDOI
TL;DR: It is shown that, as the overall intensity of the underlying signal increases, an upper bound on the reconstruction error decays at an appropriate rate, but that for a fixed signal intensity, the error bound actually grows with the number of measurements or sensors.
Abstract: This paper describes performance bounds for compressed sensing (CS) where the underlying sparse or compressible (sparsely approximable) signal is a vector of nonnegative intensities whose measurements are corrupted by Poisson noise. In this setting, standard CS techniques cannot be applied directly for several reasons. First, the usual signal-independent and/or bounded noise models do not apply to Poisson noise, which is nonadditive and signal-dependent. Second, the CS matrices typically considered are not feasible in real optical systems because they do not adhere to important constraints, such as nonnegativity and photon flux preservation. Third, the typical l2 - l1 minimization leads to overfitting in the high-intensity regions and oversmoothing in the low-intensity areas. In this paper, we describe how a feasible positivity- and flux-preserving sensing matrix can be constructed, and then analyze the performance of a CS reconstruction approach for Poisson data that minimizes an objective function consisting of a negative Poisson log likelihood term and a penalty term which measures signal sparsity. We show that, as the overall intensity of the underlying signal increases, an upper bound on the reconstruction error decays at an appropriate rate (depending on the compressibility of the signal), but that for a fixed signal intensity, the error bound actually grows with the number of measurements or sensors. This surprising fact is both proved theoretically and justified based on physical intuition.
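A rough sketch of the kind of objective analyzed here, assuming a simple $\ell_0$-style sparsity penalty and illustrative names (the paper's actual penalty and notation may differ):

```python
# Negative Poisson log-likelihood plus a sparsity penalty, evaluated for a
# candidate nonnegative signal f and a flux-preserving sensing matrix A.
import numpy as np

def neg_poisson_loglik(y, A, f):
    """y ~ Poisson(A f) elementwise; constant log(y!) terms are dropped."""
    rate = np.clip(A @ f, 1e-12, None)      # avoid log(0)
    return np.sum(rate - y * np.log(rate))

def penalized_objective(y, A, f, tau=1.0):
    sparsity = np.count_nonzero(f)          # simple l0-style sparsity measure
    return neg_poisson_loglik(y, A, f) + tau * sparsity

# Tiny example: 8 Poisson measurements of a sparse length-4 intensity vector
rng = np.random.default_rng(0)
A = rng.random((8, 4))
A /= A.sum(axis=0)                          # columns sum to 1: nonnegative, flux-preserving
f_true = np.array([50.0, 0.0, 0.0, 120.0])  # sparse, nonnegative intensities
y = rng.poisson(A @ f_true)
print(penalized_objective(y, A, f_true))
```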

175 citations

Book ChapterDOI
TL;DR: It is demonstrated that random graphs satisfy some interesting Ramsey-type properties; the paper deals only with finite, simple, undirected graphs.
Abstract: In this paper we shall demonstrate that random graphs satisfy some interesting Ramsey type properties. This paper deals with finite, simple and undirected graphs only. If G and H are graphs, write G → H to mean that if the edges of G are coloured by two colours, then G contains a monochromatic copy of H.
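The arrow notation can be illustrated with a brute-force check on complete graphs (not part of this paper): by the classical Ramsey result R(3,3) = 6, K_5 does not arrow K_3 while K_6 does. A small Python sketch with illustrative names:

```python
# Check whether every 2-colouring of the edges of K_n contains a
# monochromatic triangle, i.e. whether K_n -> K_3.
from itertools import combinations, product

def arrows_triangle(n):
    edges = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    for colouring in product((0, 1), repeat=len(edges)):
        colour = dict(zip(edges, colouring))
        mono = any(colour[(a, b)] == colour[(a, c)] == colour[(b, c)]
                   for a, b, c in triangles)
        if not mono:
            return False      # found a colouring with no monochromatic K_3
    return True

print(arrows_triangle(5), arrows_triangle(6))   # expected: False True
```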

173 citations

References