
Showing papers on "Statistical hypothesis testing published in 2005"


Journal ArticleDOI
TL;DR: This work presents tools for hierarchical clustering of imaged objects according to the shapes of their boundaries, learning of probability models for clusters of shapes, and testing of newly observed shapes under competing probability models.
Abstract: Using a differential-geometric treatment of planar shapes, we present tools for: 1) hierarchical clustering of imaged objects according to the shapes of their boundaries, 2) learning of probability models for clusters of shapes, and 3) testing of newly observed shapes under competing probability models. Clustering at any level of hierarchy is performed using a minimum variance type criterion and a Markov process. Statistical means of clusters provide shapes to be clustered at the next higher level, thus building a hierarchy of shapes. Using finite-dimensional approximations of spaces tangent to the shape space at sample means, we (implicitly) impose probability models on the shape space, and results are illustrated via random sampling and classification (hypothesis testing). Together, hierarchical clustering and hypothesis testing provide an efficient framework for shape retrieval. Examples are presented using shapes and images from ETH, Surrey, and AMCOM databases.

2,858 citations


Book
04 Aug 2005
TL;DR: This book presents methods for nonlinear time series analysis, covering smoothing, ARMA modeling and forecasting, parametric and nonparametric nonlinear models, hypothesis testing, continuous-time models in finance, and nonlinear prediction.
Abstract: Introduction.- Stationary Time Series.- Smoothing in Time Series.- ARMA Modeling and Forecasting.- Parametric Nonlinear Time Series Models.- Nonparametric Models.- Hypothesis Testing.- Continuous Time Models in Finance.- Nonlinear Prediction.

1,135 citations


Journal ArticleDOI
01 Mar 2005-Oikos
TL;DR: Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue.
Abstract: Popular procedures to control the chance of making type I errors when multiple statistical tests are performed come at a high cost: a reduction in power. As the number of tests increases, power for an individual test may become unacceptably low. This is a consequence of minimizing the chance of making even a single type I error, which is the aim of, for instance, the Bonferroni and sequential Bonferroni procedures. An alternative approach, control of the false discovery rate (FDR), has recently been advocated for ecological studies. This approach aims at controlling the proportion of significant results that are in fact type I errors. Keeping the proportion of type I errors low among all significant results is a sensible, powerful, and easy-to-interpret way of addressing the multiple testing issue. To encourage practical use of the approach, in this note we illustrate how the proposed procedure works, we compare it to more traditional methods that control the familywise error rate, and we discuss some recent useful developments in FDR control.

902 citations
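The FDR-controlling procedure advocated here is the Benjamini-Hochberg step-up rule, which is short enough to sketch. Below is a minimal Python illustration with made-up p-values; the function and variable names are ours, not the paper's.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of hypotheses rejected at FDR level q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                    # indices of p-values, smallest first
    ranked = p[order]
    # Largest k with p_(k) <= (k/m) * q; reject the k smallest p-values.
    thresholds = (np.arange(1, m + 1) / m) * q
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])     # last rank meeting the criterion
        reject[order[: k + 1]] = True
    return reject

# Illustrative p-values: BH rejects 0.001 and 0.008, while the Bonferroni
# cutoff of 0.05 / 10 = 0.005 would keep only 0.001.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.96]
print(benjamini_hochberg(pvals, q=0.05))
```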


Book
24 Feb 2005
TL;DR: This book covers the instrumental variable estimator in the linear regression model, GMM estimation in correctly specified and misspecified models, hypothesis testing, asymptotic theory and finite-sample behaviour, and moment selection in theory and practice.
Abstract: 1. Introduction 2. The Instrumental Variable Estimator in the Linear Regression Model 3. GMM Estimation in Correctly Specified Models 4. GMM Estimation in Misspecified Models 5. Hypothesis Testing 6. Asymptotic Theory and Finite Sample Behaviour 7. Moment Selection in Theory and in Practice 8. Alternative Approximations in Finite Sample Behaviour 9. Empirical Examples 10. Related Methods of Estimation Appendix: Mixing processes and Nonstationarity

596 citations


Book
02 Dec 2005
TL;DR: The authors describe in detail and systemize approaches, techniques, and methods for exploring spatial and temporal data in particular, developing a general view of data structures and characteristics and building on top of this a general task typology.
Abstract: Exploratory data analysis (EDA) is about detecting and describing patterns, trends, and relations in data, motivated by certain purposes of investigation. As something relevant is detected in data, new questions arise, causing specific parts to be viewed in more detail. So EDA has a significant appeal: it involves hypothesis generation rather than mere hypothesis testing. The authors describe in detail and systemize approaches, techniques, and methods for exploring spatial and temporal data in particular. They start by developing a general view of data structures and characteristics and then build on top of this a general task typology, distinguishing between elementary and synoptic tasks. This typology is then applied to the description of existing approaches and technologies, resulting not just in recommendations for choosing methods but in a set of generic procedures for data exploration. Professionals practicing analysis will profit from tested solutions - illustrated in many examples - for reuse in the catalogue of techniques presented. Students and researchers will appreciate the detailed description and classification of exploration techniques, which are not limited to spatial data only. In addition, the general principles and approaches described will be useful for designers of new methods for EDA.

562 citations



01 Jan 2005
TL;DR: In this paper, a modification of the Neyman-Pearson maximum-likelihood ratio test, suggested in an earlier paper for testing a composite null hypothesis against a composite alternative from a different parametric family, is developed further: general comments on the formulation of the problem, a general large-sample form of the test, and a number of examples are given.
Abstract: It is required to test a composite null hypothesis. High power is desired against a composite alternative hypothesis that is not in the same parametric family as the null hypothesis. In an earlier paper a modification of the Neyman-Pearson maximum-likelihood ratio test was suggested for this problem. The present paper gives some general comments on the formulation of the problem, a general large-sample form for the test, and, finally, a number of examples.

492 citations


Journal ArticleDOI
TL;DR: The statistic p_rep estimates the probability of replicating an effect, and provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.
Abstract: The statistic p_rep estimates the probability of replicating an effect. It captures traditional publication criteria for signal-to-noise ratio, while avoiding parametric inference and the resulting Bayesian dilemma. In concert with effect size and replication intervals, p_rep provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.

429 citations
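Killeen's p_rep is usually obtained from a conventional p-value through a normal-theory conversion; the sketch below uses that commonly cited formula and is not lifted from the paper itself.

```python
from scipy.stats import norm

def p_rep(p_one_tailed):
    """Probability of replicating an effect's direction, from a one-tailed p-value.

    Commonly cited normal-theory conversion: the observed effect's z-score is
    attenuated by sqrt(2) because sampling noise enters both the original
    study and the replication attempt.
    """
    z = norm.ppf(1.0 - p_one_tailed)     # z-score of the observed effect
    return norm.cdf(z / 2 ** 0.5)

print(round(p_rep(0.025), 3))            # conventional two-tailed p = .05
```

For a conventional two-tailed p of .05 (one-tailed .025), this conversion gives p_rep of roughly .92.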


Reference EntryDOI
15 Oct 2005
TL;DR: Some formulas are given to obtain insight into the design aspects that are most influential for standard errors and power in multilevel designs.
Abstract: Sample size determination in multilevel designs requires attention to the fact that statistical power depends on the total sample sizes for each level. It is usually desirable to have as many units as possible at the top level of the multilevel hierarchy. Some formulas are given to obtain insight into the design aspects that are most influential for standard errors and power. Keywords: power; statistical tests; design; multilevel analysis; sample size; multisite trial; cluster randomization

418 citations
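The entry's own formulas are not reproduced in the abstract, but the core point, that adding top-level units buys more power than enlarging them, can be illustrated with a standard two-arm cluster-randomized power approximation. This is our own sketch and notation, not the chapter's.

```python
from math import sqrt
from scipy.stats import norm

def power_cluster_randomized(delta, sigma_u2, sigma_e2, n_clusters, cluster_size,
                             alpha=0.05):
    """Approximate power of a two-arm cluster-randomized trial (two-sided z-test).

    delta        : true difference in means between the two arms
    sigma_u2     : between-cluster variance
    sigma_e2     : within-cluster (residual) variance
    n_clusters   : total number of clusters, split equally over the arms
    cluster_size : individuals per cluster
    """
    # Variance of the estimated treatment effect: extra clusters help more than
    # bigger clusters, because sigma_u2 is not divided by cluster_size.
    var_effect = 4.0 * (sigma_u2 + sigma_e2 / cluster_size) / n_clusters
    z_alpha = norm.ppf(1.0 - alpha / 2.0)
    return norm.cdf(delta / sqrt(var_effect) - z_alpha)

# Same total sample size (400), but spread over more clusters -> more power.
print(round(power_cluster_randomized(0.4, 0.1, 0.9, n_clusters=20, cluster_size=20), 2))
print(round(power_cluster_randomized(0.4, 0.1, 0.9, n_clusters=40, cluster_size=10), 2))
```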


Book
01 Jan 2005
TL;DR: In this article, a simple theory leads to the identification of a single omnibus measure of correlation for the z_i's order statistic, which relates to the correct choice of a null distribution for simultaneous significance testing and its effect on inference.
Abstract: Large-scale hypothesis testing problems, with hundreds or thousands of test statistics z_i to consider at once, have become familiar in current practice. Applications of popular analysis methods, such as false discovery rate techniques, do not require independence of the z_i's, but their accuracy can be compromised in high-correlation situations. This article presents computational and theoretical methods for assessing the size and effect of correlation in large-scale testing. A simple theory leads to the identification of a single omnibus measure of correlation for the z_i's order statistic. The theory relates to the correct choice of a null distribution for simultaneous significance testing and its effect on inference.

410 citations


Book
25 May 2005
TL;DR: This textbook introduces statistical methods for communication science, from measurement, sampling, probability, and reliability through hypothesis testing, tests for categorical variables, regression, analysis of variance, and analysis of covariance.
Abstract: Contents: Preface. Statistics and Communication Science. Fundamentals of Measurement. Sampling. Data Description and Visualization. Fundamentals of Probability. Assessing and Quantifying Reliability. Parameter Estimation. Hypothesis Testing Concepts. Testing a Hypothesis About a Single Mean. Comparing Two Independent Groups. Some Tests for Categorical Variables. Simple Linear Regression. Multiple Linear Regression. Single Factor Analysis of Variance. Analysis of Covariance: ANOVA With Statistical Controls. Interaction. Appendices.

Proceedings ArticleDOI
19 Oct 2005
TL;DR: It is explained here how any anomaly detection method can be viewed as a problem in statistical hypothesis testing, and four different methods for analyzing residuals, two of which are new, are studied and compared.
Abstract: In this work we develop an approach for anomaly detection for large scale networks such as that of an enterprise or an ISP. The traffic patterns we focus on for analysis are those of a network-wide view of the traffic state, called the traffic matrix. In the first step a Kalman filter is used to filter out the "normal" traffic. This is done by comparing our future predictions of the traffic matrix state to an inference of the actual traffic matrix that is made using more recent measurement data than those used for prediction. In the second step the residual filtered process is then examined for anomalies. We explain here how any anomaly detection method can be viewed as a problem in statistical hypothesis testing. We study and compare four different methods for analyzing residuals, two of which are new. These methods focus on different aspects of the traffic pattern change. One focuses on instantaneous behavior, another focuses on changes in the mean of the residual process, a third on changes in the variance behavior, and a fourth examines variance changes over multiple timescales. We evaluate and compare all of these methods using ROC curves that illustrate the full tradeoff between false positives and false negatives for the complete spectrum of decision thresholds.
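As a rough illustration of two of the residual analyses described (instantaneous behavior and changes in the mean), here is a generic sketch on simulated residuals. It is not the paper's Kalman-filter pipeline or its exact detectors.

```python
import numpy as np

def detect_spikes(residuals, z_thresh=3.0):
    """Flag instants where the standardized residual exceeds a threshold
    (an 'instantaneous behavior' detector)."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)
    return np.abs(z) > z_thresh

def detect_mean_shift(residuals, drift=0.5, threshold=5.0):
    """One-sided CUSUM on standardized residuals: flags a sustained shift in
    the mean of the residual process rather than a single spike."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std(ddof=1)
    s, alarms = 0.0, []
    for zt in z:
        s = max(0.0, s + zt - drift)
        alarms.append(s > threshold)
    return np.array(alarms)

rng = np.random.default_rng(0)
res = rng.normal(0, 1, 300)
res[200:] += 1.5                      # inject a sustained mean shift at t = 200
print(np.flatnonzero(detect_spikes(res, z_thresh=2.5))[:5])
print(np.flatnonzero(detect_mean_shift(res))[:5])
```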

Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the proportion of true null hypotheses, π0, in a multiple-hypothesis set-up is considered, and the tests are based on observed p-values.
Abstract: Summary. We consider the problem of estimating the proportion of true null hypotheses, π0, in a multiple-hypothesis set-up. The tests are based on observed p-values. We first review published estimators based on the estimator that was suggested by Schweder and Spjotvoll. Then we derive new estimators based on nonparametric maximum likelihood estimation of the p-value density, restricting to decreasing and convex decreasing densities. The estimators of π0 are all derived under the assumption of independent test statistics. Their performance under dependence is investigated in a simulation study. We find that the estimators are relatively robust with respect to the assumption of independence and work well also for test statistics with moderate dependence.
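The Schweder-Spjotvoll-type estimator that these authors take as their starting point is simple to state; a sketch follows. The paper's nonparametric maximum likelihood refinements are not reproduced here.

```python
import numpy as np

def pi0_schweder_spjotvoll(pvals, lam=0.5):
    """Estimate the proportion of true null hypotheses from p-values.

    P-values from true nulls are uniform on (0, 1), so the count of p-values
    above a cutoff lambda estimates pi0 * m * (1 - lambda).
    """
    p = np.asarray(pvals, dtype=float)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

# Illustration: 80% true nulls (uniform p-values), 20% alternatives (small p-values).
rng = np.random.default_rng(1)
p = np.concatenate([rng.uniform(size=800), rng.beta(0.5, 20, size=200)])
print(round(pi0_schweder_spjotvoll(p), 2))    # should land near 0.8
```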

Journal ArticleDOI
TL;DR: A new algorithm based on the multiple-instance learning framework and the naive Bayesian classifier (mi-NB) is developed which is specifically designed for the low false-alarm case, and is shown to have promising performance.
Abstract: We compare machine learning methods applied to a difficult real-world problem: predicting computer hard-drive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy and nonparametrically-distributed data. We develop a new algorithm based on the multiple-instance learning framework and the naive Bayesian classifier (mi-NB) which is specifically designed for the low false-alarm case, and is shown to have promising performance. Other methods compared are support vector machines (SVMs), unsupervised clustering, and non-parametric statistical tests (rank-sum and reverse arrangements). The failure-prediction performance of the SVM, rank-sum and mi-NB algorithm is considerably better than the threshold method currently implemented in drives, while maintaining low false alarm rates. Our results suggest that nonparametric statistical tests should be considered for learning problems involving detecting rare events in time series data. An appendix details the calculation of rank-sum significance probabilities in the case of discrete, tied observations, and we give new recommendations about when the exact calculation should be used instead of the commonly-used normal approximation. These normal approximations may be particularly inaccurate for rare event problems like hard drive failures.
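A minimal sketch of the rank-sum idea applied to a monitored drive attribute, using SciPy's Mann-Whitney U test (equivalent to the rank-sum test). The mi-NB algorithm and the exact tie-corrected calculation discussed in the paper's appendix are not shown; the attribute values below are invented.

```python
import numpy as np
from scipy.stats import mannwhitneyu

def attribute_alarm(baseline, recent, alpha=0.001):
    """Flag a drive if a recent window of an attribute differs from its baseline
    distribution according to a rank-sum style test.

    A very small alpha keeps the false-alarm rate down, mirroring the paper's
    emphasis on the low-false-alarm regime.
    """
    stat, p = mannwhitneyu(recent, baseline, alternative="two-sided")
    return p < alpha, p

rng = np.random.default_rng(2)
healthy = rng.poisson(2, size=200)      # e.g. counts of a monitored error attribute
failing = rng.poisson(6, size=30)       # elevated counts before failure
print(attribute_alarm(healthy, failing))
print(attribute_alarm(healthy, rng.poisson(2, size=30)))
```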

Journal ArticleDOI
TL;DR: These techniques can supplement current nonparametric statistical methods and should be included, where appropriate, in the armamentarium of data processing methodologies.

Journal ArticleDOI
TL;DR: In this article, the properties of least squares estimators for cross-section data with common shocks, such as macroeconomic shocks, have been analyzed, and necessary and sufficient conditions are given for consistency.
Abstract: This paper considers regression models for cross-section data that exhibit cross-section dependence due to common shocks, such as macroeconomic shocks. The paper analyzes the properties of least squares (LS) estimators in this context. The results of the paper allow for any form of cross-section dependence and heterogeneity across population units. The probability limits of the LS estimators are determined, and necessary and sufficient conditions are given for consistency. The asymptotic distributions of the estimators are found to be mixed normal after recentering and scaling. The t, Wald, and F statistics are found to have asymptotic standard normal, chi-squared, and scaled chi-squared distributions, respectively, under the null hypothesis when the conditions required for consistency of the parameter under test hold. However, the absolute values of t, Wald, and F statistics are found to diverge to infinity under the null hypothesis when these conditions fail. Confidence intervals exhibit similarly dichotomous behavior. Hence, common shocks are found to be innocuous in some circumstances, but quite problematic in others. Models with factor structures for errors and regressors are considered. Using the general results, conditions are determined under which consistency of the LS estimators holds and fails in models with factor structures. The results are extended to cover heterogeneous and functional factor structures in which common factors have different impacts on different population units.

Book ChapterDOI
06 Jul 2005
TL;DR: A statistical model checking algorithm that also verifies CSL formulas with unbounded untils is presented; it is based on Monte Carlo simulation of the model and hypothesis testing of the samples, as opposed to sequential hypothesis testing.
Abstract: Statistical methods to model check stochastic systems have been, thus far, developed only for a sublogic of continuous stochastic logic (CSL) that does not have steady state operator and unbounded until formulas. In this paper, we present a statistical model checking algorithm that also verifies CSL formulas with unbounded untils. The algorithm is based on Monte Carlo simulation of the model and hypothesis testing of the samples, as opposed to sequential hypothesis testing. We have implemented the algorithm in a tool called VESTA, and found it to be effective in verifying several examples.
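The fixed-sample (non-sequential) hypothesis test at the heart of this approach can be sketched with a one-sided binomial test on simulated runs. This is a generic illustration, not VESTA's implementation, and the run counts are invented.

```python
from scipy.stats import binomtest

def check_probability_at_least(satisfied, n_samples, theta, alpha=0.01):
    """Fixed-sample test of whether P(property holds) >= theta.

    satisfied : number of simulated runs on which the property held
    n_samples : total number of independent simulation runs
    Returns True if H0: p >= theta cannot be rejected at level alpha,
    i.e. the model 'passes' the probabilistic specification.
    """
    result = binomtest(satisfied, n_samples, theta, alternative="less")
    return result.pvalue >= alpha

# Suppose 941 of 1000 simulated trajectories satisfied a CSL path formula.
print(check_probability_at_least(941, 1000, theta=0.9))    # consistent with p >= 0.9
print(check_probability_at_least(941, 1000, theta=0.97))   # rejected: evidence p < 0.97
```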

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the effect of record selection and amplitude scaling on the nonlinear seismic response of structures and found little evidence to support the need for a careful site-specific record selection by magnitude and distance, and showed that concern over scenario-to-scenario record scaling may not be justified.
Abstract: This study addresses the question of selection and amplitude scaling of accelerograms for predicting the nonlinear seismic response of structures. Despite the current practices of record selection according to a specific magnitude-distance scenario and scaling to a common level, neither aspect of this process has received significant research attention to ascertain the benefits or effects of these practices on the conclusions. This paper hypothesizes that neither these usual principal seismological characteristics nor scaling of records matters to the nonlinear response of structures. It then investigates under what conditions this hypothesis may not be sustainable. Two classes of record sets are compared in several case studies: one class is carefully chosen to represent a specific magnitude and distance scenario, the other is chosen randomly from a large catalog. Results of time-history analyses are formally compared by a simple statistical hypothesis test to assess the difference, if any, between nonlinear demands of the two classes of records. The effect of the degree of scaling by first-mode spectral acceleration level is investigated in the same way. Results here show (1) little evidence to support the need for a careful site-specific process of record selection by magnitude and distance, and (2) that concern over scenario-to-scenario record scaling, at least within the limits tested, may not be justified. DOI: 10.1193/1.1990199
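The "simple statistical hypothesis test" comparing demands from the two record classes can be illustrated generically, for example as a two-sample t-test on log demands (drift demands are roughly lognormal). This is our sketch with invented numbers, not the study's exact procedure or data.

```python
import numpy as np
from scipy.stats import ttest_ind

def compare_record_sets(demands_a, demands_b, alpha=0.05):
    """Two-sample t-test on log demands from two ground-motion record sets.

    Failing to reject suggests the record selection (or scaling) did not
    noticeably bias the median nonlinear response.
    """
    t, p = ttest_ind(np.log(demands_a), np.log(demands_b), equal_var=False)
    return {"t": t, "p": p, "different": p < alpha}

rng = np.random.default_rng(3)
scenario_set = np.exp(rng.normal(np.log(0.020), 0.40, size=20))   # peak drift ratios
random_set = np.exp(rng.normal(np.log(0.021), 0.45, size=20))
print(compare_record_sets(scenario_set, random_set))
```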

Book
01 Jan 2005
TL;DR: This book introduces probability theory, statistical inference, and stochastic processes (Poisson processes, Markov chains, and random walks), and applies them to the analysis of DNA sequences, BLAST, hidden Markov models, evolutionary models, and phylogenetic tree estimation.
Abstract: An Introduction to Probability Theory: One Random Variable * An Introduction to Probability Theory: Many Random Variables * Statistics: An Introduction to Statistical Inference * Stochastic Processes: An Introduction to Poisson Processes and Markov Chains * The Analysis of DNA Sequence Patterns: One Sequence * The Analysis of DNA Sequences: Multiple Sequences * Stochastic Processes: Random Walks * Statistics: Classical Estimation and Hypothesis Testing * BLAST * Stochastic Processes: Markov Chains * Hidden Markov Models * Computationally Intensive Methods * Evolutionary Models * Phylogenetic Tree Estimation

Journal ArticleDOI
TL;DR: It is shown that when r is small, the proposed classes of quadratic form statistics based on the residuals of margins or multivariate moments up to order r have better small-sample properties and are asymptotically more powerful than X^2 for some useful multivariate binary models.
Abstract: High-dimensional contingency tables tend to be sparse, and standard goodness-of-fit statistics such as X^2 cannot be used without pooling categories. As an improvement on arbitrary pooling, for goodness of fit of large 2^n contingency tables, we propose classes of quadratic form statistics based on the residuals of margins or multivariate moments up to order r. These classes of test statistics are asymptotically chi-squared distributed under the null hypothesis. Further, the marginal residuals are useful for diagnosing lack of fit of parametric models. We show that when r is small (r = 2, 3), the proposed statistics have better small-sample properties and are asymptotically more powerful than X^2 for some useful multivariate binary models. Related to these test statistics is a class of limited-information estimators based on low-dimensional margins. We show that these estimators have high efficiency for one commonly used latent trait model for binary data.

Journal ArticleDOI
TL;DR: A crossover operator for real-valued evolutionary algorithms, based on the statistical distribution of the genes of the best individuals in the population, takes into account their localization and dispersion features with the objective that these features be inherited by the offspring.
Abstract: In this paper we propose a crossover operator for evolutionary algorithms with real values that is based on the statistical theory of population distributions. The operator is based on the theoretical distribution of the values of the genes of the best individuals in the population. The proposed operator takes into account the localization and dispersion features of the best individuals of the population with the objective that these features would be inherited by the offspring. Our aim is the optimization of the balance between exploration and exploitation in the search process. In order to test the efficiency and robustness of this crossover, we have used a set of functions to be optimized with regard to different criteria, such as multimodality, separability, regularity and epistasis. With this set of functions we can extract conclusions in function of the problem at hand. We analyze the results using ANOVA and multiple comparison statistical tests. As an example of how our crossover can be used to solve artificial intelligence problems, we have applied the proposed model to the problem of obtaining the weight of each network in an ensemble of neural networks. The results obtained are above the performance of standard methods.
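A simplified operator in this spirit, with offspring genes drawn from the per-gene mean and dispersion of the best individuals, can be sketched as follows. It illustrates the idea only and is not the authors' exact crossover.

```python
import numpy as np

def statistical_crossover(best_genomes, n_offspring, rng, spread=1.0):
    """Generate offspring from the per-gene mean and standard deviation of the
    best individuals, so offspring inherit the localization (mean) and
    dispersion (std) of the elite part of the population."""
    best = np.asarray(best_genomes, dtype=float)      # shape: (n_best, n_genes)
    mu = best.mean(axis=0)
    sigma = best.std(axis=0, ddof=1)
    return rng.normal(mu, spread * sigma, size=(n_offspring, best.shape[1]))

rng = np.random.default_rng(4)
elite = rng.uniform(-1, 1, size=(5, 3))      # the 5 best individuals, 3 genes each
children = statistical_crossover(elite, n_offspring=10, rng=rng)
print(children.shape)                         # (10, 3)
```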

Journal ArticleDOI
TL;DR: The methodology includes uncertainty in the experimental measurement, and the posterior and prior distributions of the model output are used to compute a validation metric based on Bayesian hypothesis testing.

Journal ArticleDOI
TL;DR: It is argued that two statistical tests based on random field theory (RFT) satisfy this need for smooth classification images and are illustrated on classification images representative of the literature.
Abstract: Despite an obvious demand for a variety of statistical tests adapted to classification images, few have been proposed. We argue that two statistical tests based on random field theory (RFT) satisfy this need for smooth classification images. We illustrate these tests on classification images representative of the literature from F. Gosselin and P. G. Schyns (2001) and from A. B. Sekuler, C. M. Gaspar, J. M. Gold, and P. J. Bennett (2004). The necessary computations are performed using the Stat4Ci Matlab toolbox.

Journal ArticleDOI
TL;DR: A theoretical framework for inference problems in benchmark experiments is introduced and it is shown that standard statistical test procedures can be used to test for differences in the performances.
Abstract: The assessment of the performance of learners by means of benchmark experiments is an established exercise. In practice, benchmark studies are a tool to compare the performance of several competing algorithms for a certain learning problem. Cross-validation or resampling techniques are commonly used to derive point estimates of the performances which are compared to identify algorithms with good properties. For several benchmarking problems, test procedures taking the variability of those point estimates into account have been suggested. Most of the recently proposed inference procedures are based on special variance estimators for the cross-validated performance. We introduce a theoretical framework for inference problems in benchmark experiments and show that standard statistical test procedures can be used to test for differences in the performances. The theory is based on well-defined distributions of performance measures which can be compared with established tests. To demonstrate the usefulness in p...
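A bare-bones version of the resampling framework, drawing bootstrap learning samples and handing the resulting performance distributions to a standard paired test, might look like the sketch below. It uses our own simplifications and toy learners, not the authors' procedure.

```python
import numpy as np
from scipy.stats import ttest_rel

def benchmark_compare(fit_a, fit_b, X, y, loss, n_reps=50, seed=0):
    """Compare two learners by resampling learning sets and applying a standard
    paired test to the resulting performance distributions.

    fit_a, fit_b : functions (X_train, y_train) -> prediction function
    loss         : function (y_true, y_pred) -> scalar performance value
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    perf_a, perf_b = [], []
    for _ in range(n_reps):
        idx = rng.integers(0, n, size=n)           # bootstrap learning sample
        oob = np.setdiff1d(np.arange(n), idx)      # held-out evaluation cases
        pa, pb = fit_a(X[idx], y[idx]), fit_b(X[idx], y[idx])
        perf_a.append(loss(y[oob], pa(X[oob])))
        perf_b.append(loss(y[oob], pb(X[oob])))
    return ttest_rel(perf_a, perf_b)               # paired test on performances

# Toy learners: predict the training mean vs. the training median.
fit_mean = lambda X, y: (lambda Xnew: np.full(len(Xnew), y.mean()))
fit_median = lambda X, y: (lambda Xnew: np.full(len(Xnew), np.median(y)))
mse = lambda y_true, y_pred: np.mean((y_true - y_pred) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)); y = rng.exponential(size=200)   # skewed target
print(benchmark_compare(fit_mean, fit_median, X, y, mse))
```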

Book
01 Jan 2005
TL;DR: This book covers the essentials of statistical inference, including decision theory, Bayesian methods, hypothesis testing, sufficiency, likelihood and higher-order theory, predictive inference, and bootstrap methods.
Abstract: 1. Introduction 2. Decision theory 3. Bayesian methods 4. Hypothesis testing 5. Special models 6. Sufficiency and completeness 7. Two-sided tests and conditional inference 8. Likelihood theory 9. Higher-order theory 10. Predictive inference 11. Bootstrap methods.

Journal ArticleDOI
TL;DR: Applications to simulated and real data demonstrate that the proposed Monte Carlo approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated.
Abstract: Motivation: Multiple hypothesis testing is a common problem in genome research, particularly in microarray experiments and genomewide association studies. Failure to account for the effects of multiple comparisons would result in an abundance of false positive results. The Bonferroni correction and Holm's step-down procedure are overly conservative, whereas the permutation test is time-consuming and is restricted to simple problems. Results: We developed an efficient Monte Carlo approach to approximating the joint distribution of the test statistics along the genome. We then used the Monte Carlo distribution to evaluate the commonly used criteria for error control, such as familywise error rates and positive false discovery rates. This approach is applicable to any data structures and test statistics. Applications to simulated and real data demonstrate that the proposed approach provides accurate error control, and can be substantially more powerful than the Bonferroni and Holm methods, especially when the test statistics are highly correlated. Contact: [email protected]
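The flavour of the approach, using a Monte Carlo approximation of the joint null distribution of correlated statistics to set a familywise threshold, can be sketched with a max-|Z| simulation. The paper's genome-specific construction is not reproduced; the correlation structure below is invented.

```python
import numpy as np
from scipy.stats import norm

def maxT_threshold(corr, alpha=0.05, n_sim=50_000, seed=0):
    """Familywise (1 - alpha) threshold for correlated z-statistics, by Monte Carlo.

    Simulates null statistics from a multivariate normal with the given
    correlation matrix and returns the (1 - alpha) quantile of max |Z|.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n_sim)
    return np.quantile(np.abs(z).max(axis=1), 1.0 - alpha)

# 100 tests whose statistics share a pairwise correlation of 0.8: the Monte
# Carlo cutoff is smaller (more powerful) than the Bonferroni cutoff.
m, rho = 100, 0.8
corr = np.full((m, m), rho) + (1.0 - rho) * np.eye(m)
print(round(maxT_threshold(corr), 2))                # Monte Carlo cutoff
print(round(norm.ppf(1 - 0.05 / (2 * m)), 2))        # Bonferroni cutoff, about 3.48
```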

Journal ArticleDOI
TL;DR: In this paper, a didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented, and use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interested in patterns of temporal change and its covariates and predictors.
Abstract: A didactic discussion of covariance structure modeling in longitudinal studies with missing data is presented. Use of the full-information maximum likelihood method is considered for model fitting, parameter estimation, and hypothesis testing purposes, particularly when interested in patterns of temporal change as well as its covariates and predictors. The approach is illustrated with an application of the popular level-and-shape model to data from a cognitive intervention study of elderly adults.

Book
01 Jan 2005
TL;DR: In this paper, the dependence structure of the individual test statistics is taken into account in order to improve the ability to detect false null hypotheses by controlling the probability of k or more false rejections, called the k-FWER.
Abstract: Consider the problem of testing s hypotheses simultaneously. The usual approach to dealing with the multiplicity problem is to restrict attention to procedures that control the probability of even one false rejection, the familiar familywise error rate (FWER). In many applications, particularly if s is large, one might be willing to tolerate more than one false rejection if the number of such cases is controlled, thereby increasing the ability of the procedure to reject false null hypotheses. One possibility is to replace control of the FWER by control of the probability of k or more false rejections, which is called the k-FWER. We derive both single-step and stepdown procedures that control the k-FWER in finite samples or asymptotically, depending on the situation. Lehmann and Romano (2005a) derive some exact methods for this purpose, which apply whenever p-values are available for individual tests; no assumptions are made on the joint dependence of the p-values. In contrast, we construct methods that implicitly take into account the dependence structure of the individual test statistics in order to further increase the ability to detect false null hypotheses. We also consider the false discovery proportion (FDP), defined as the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini and Hochberg (1995) controls E(FDP).
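The single-step p-value procedure of Lehmann and Romano that this work builds on rejects H_i whenever p_i <= k*alpha/s; a sketch follows. The resampling-based procedures that exploit the dependence structure of the test statistics are not shown, and the example p-values are simulated.

```python
import numpy as np

def k_fwer_single_step(pvals, k=3, alpha=0.05):
    """Generalized Bonferroni: reject H_i when p_i <= k * alpha / s.

    Controls the probability of k or more false rejections (the k-FWER) under
    arbitrary dependence among the p-values. Allowing k > 1 raises the
    per-test cutoff, so at least as many hypotheses are rejected.
    """
    p = np.asarray(pvals, dtype=float)
    return p <= k * alpha / p.size

rng = np.random.default_rng(5)
pvals = np.concatenate([rng.beta(0.2, 8, 20), rng.uniform(size=180)])  # 20 non-nulls
print(k_fwer_single_step(pvals, k=1).sum())   # classical Bonferroni (k = 1)
print(k_fwer_single_step(pvals, k=5).sum())   # tolerate up to 4 false rejections
```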

Journal ArticleDOI
TL;DR: In the case of no preference or avoidance, the distributions of the standard deviation of association indexes, or any other suitable test statistic, are not analytically tractable under the null hypothesis as discussed by the authors.
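In practice such null distributions are generated by permutation. Below is a bare-bones Monte Carlo sketch using a simplified half-weight association index and within-period shuffling; real analyses use swap-based permutations that also preserve each individual's number of sightings, which this sketch does not.

```python
import numpy as np

def half_weight_index(obs):
    """Pairwise simplified half-weight indices, 2x / (n_a + n_b), from a
    groups x individuals 0/1 observation matrix."""
    x = obs.T @ obs                           # joint sightings
    n = obs.sum(axis=0)                       # sightings per individual
    denom = 0.5 * (n[:, None] + n[None, :])
    with np.errstate(invalid="ignore", divide="ignore"):
        return x / denom

def sd_association_test(obs, n_perm=2000, seed=0):
    """Monte Carlo p-value for 'SD of association indices larger than chance'.

    Simplified null model: memberships are shuffled within each sampling
    period, preserving group sizes only.
    """
    rng = np.random.default_rng(seed)
    iu = np.triu_indices(obs.shape[1], k=1)
    observed = np.nanstd(half_weight_index(obs)[iu])
    exceed = 0
    for _ in range(n_perm):
        perm = np.array([rng.permutation(row) for row in obs])
        exceed += np.nanstd(half_weight_index(perm)[iu]) >= observed
    return observed, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
obs = (rng.random((40, 12)) < 0.3).astype(int)    # 40 sampling periods, 12 individuals
print(sd_association_test(obs))
```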

Book
01 Jan 2005
TL;DR: Recent proposals from the statistical literature for deciding which hypotheses to reject when several tests are carried out simultaneously are reviewed, and their application to the general problem of model selection is discussed.
Abstract: It is common in econometric applications that several hypothesis tests are carried out simultaneously. The problem then becomes how to decide which hypotheses to reject, accounting for the multitude of tests. The classical approach is to control the familywise error rate (FWE) which is the probability of one or more false rejections. But when the number of hypotheses under consideration is large, control of the FWE can become too demanding. As a result, the number of false hypotheses rejected may be small or even zero. This suggests replacing control of the FWE by a more liberal measure. To this end, we review a number of recent proposals from the statistical literature. We briefly discuss how these procedures apply to the general problem of model selection. A simulation study and two empirical applications illustrate the methods.