
Showing papers on "Statistical hypothesis testing" published in 2006


Journal Article
TL;DR: A set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers is recommended: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparisons of more classifiers over multiple data sets.
Abstract: While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust non-parametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
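As a rough R illustration of the two recommended tests (accuracy numbers hypothetical; the paper's post-hoc tests and CD diagrams are not shown):

# Hypothetical accuracies of three classifiers on ten data sets:
# rows = data sets (blocks), columns = classifiers (groups).
set.seed(1)
acc <- matrix(runif(30, 0.7, 0.95), nrow = 10,
              dimnames = list(NULL, c("A", "B", "C")))
# Comparing two classifiers over multiple data sets:
wilcox.test(acc[, "A"], acc[, "B"], paired = TRUE)
# Comparing more than two classifiers over multiple data sets:
friedman.test(acc)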

10,306 citations


01 Mar 2006
TL;DR: The coda package for R contains a set of functions designed to help the user check convergence of Markov Chain Monte Carlo (MCMC) output and answer questions such as how long the burn-in period should be and how many samples are required to accurately estimate posterior quantities of interest.
Abstract: At first sight, Bayesian inference with Markov Chain Monte Carlo (MCMC) appears to be straightforward. The user defines a full probability model, perhaps using one of the programs discussed in this issue; an underlying sampling engine takes the model definition and returns a sequence of dependent samples from the posterior distribution of the model parameters, given the supplied data. The user can derive any summary of the posterior distribution from this sample. For example, to calculate a 95% credible interval for a parameter α, it suffices to take 1000 MCMC iterations of α and sort them so that α(1) < α(2) < ... < α(1000). The credible interval estimate is then (α(25), α(975)). However, there is a price to be paid for this simplicity. Unlike most numerical methods used in statistical inference, MCMC does not give a clear indication of whether it has converged. The underlying Markov chain theory only guarantees that the distribution of the output will converge to the posterior in the limit as the number of iterations increases to infinity. The user is generally ignorant of how quickly convergence occurs, and therefore has to fall back on post hoc testing of the sampled output. By convention, the sample is divided into two parts: a "burn-in" period, during which all samples are discarded, and the remainder of the run, in which the chain is considered to have converged sufficiently close to the limiting distribution to be used. Two questions then arise: (1) How long should the burn-in period be? (2) How many samples are required to accurately estimate posterior quantities of interest? The coda package for R contains a set of functions designed to help the user answer these questions. Some of these convergence diagnostics are simple graphical ways of summarizing the data. Others are formal statistical tests.
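A minimal R sketch of this workflow, using a placeholder chain in place of real sampler output:

library(coda)
set.seed(1)
draws <- mcmc(rnorm(1000))   # placeholder for 1000 sampler iterations of alpha
s <- sort(as.numeric(draws))
c(s[25], s[975])             # the 95% credible interval (alpha_(25), alpha_(975))
# Convergence diagnostics provided by coda:
geweke.diag(draws)           # formal test comparing early vs. late parts of the chain
effectiveSize(draws)         # effective sample size of the dependent chain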

3,098 citations


Journal ArticleDOI
TL;DR: A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support and is shown to be accurate, powerful, and robust to certain violations of model assumptions.
Abstract: We revisit statistical tests for branches of evolutionary trees reconstructed upon molecular data. A new, fast, approximate likelihood-ratio test (aLRT) for branches is presented here as a competitive alternative to nonparametric bootstrap and Bayesian estimation of branch support. The aLRT is based on the idea of the conventional LRT, with the null hypothesis corresponding to the assumption that the inferred branch has length 0. We show that the LRT statistic is asymptotically distributed as a maximum of three random variables drawn from the ½χ₀² + ½χ₁² mixture distribution. The new aLRT of an interior branch uses this distribution for significance testing, but the test statistic is approximated in a slightly conservative but practical way as 2(ℓ₁ − ℓ₂), i.e., double the difference between the maximum log-likelihood values corresponding to the best tree and the second-best topological arrangement around the branch of interest. Such a test is fast because the log-likelihood value ℓ₂ is computed by optimizing only over the branch of interest and the four adjacent branches, whereas the other parameters are fixed at their optimal values corresponding to the best ML tree. The performance of the new test was studied on simulated 4-, 12-, and 100-taxon data sets with sequences of different lengths. The aLRT is shown to be accurate, powerful, and robust to certain violations of model assumptions. The aLRT is implemented within the algorithm used by the recent fast maximum likelihood tree estimation program PHYML (Guindon and Gascuel, 2003).
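A hedged R sketch of the significance computation: for x > 0, the tail of the ½χ₀² + ½χ₁² mixture is half the χ₁² tail; the max-of-three refinement from the paper is omitted, and the log-likelihood values are hypothetical:

l1 <- -12345.6; l2 <- -12349.8                   # hypothetical log-likelihoods
stat <- 2 * (l1 - l2)                            # aLRT statistic
0.5 * pchisq(stat, df = 1, lower.tail = FALSE)   # mixture tail probability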

2,369 citations


Journal ArticleDOI
TL;DR: In this article, a two-stage adaptive procedure is proposed to control the false discovery rate at the desired level q. This framework enables us to study analytically the properties of other procedures that exist in the literature.
Abstract: We provide a new two-stage procedure in which the linear step-up procedure is used in stage one to estimate m₀, the number of true null hypotheses, providing a new level q′ which is used in the linear step-up procedure in the second stage. We prove that a general form of the two-stage procedure controls the false discovery rate at the desired level q. This framework enables us to study analytically the properties of other procedures that exist in the literature. A simulation study is presented that shows that two-stage adaptive procedures improve in power over the original procedure, mainly because they provide tighter control of the false discovery rate. We further study the performance of the current suggestions, some variations of the procedures, and previous suggestions, in the case where the test statistics are positively dependent, a case for which the original procedure controls the false discovery rate. In the setting studied here the newly proposed two-stage procedure is the only one that controls the false discovery rate. The procedures are illustrated with two examples of biological importance.
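A minimal R sketch of a two-stage linear step-up procedure as read from the abstract (my reconstruction, not the authors' code):

two_stage_bh <- function(p, q = 0.05) {
  m  <- length(p)
  q1 <- q / (1 + q)                        # stage-one level q'
  r1 <- sum(p.adjust(p, "BH") <= q1)       # rejections of the linear step-up at q'
  if (r1 == 0) return(integer(0))          # reject no hypotheses
  if (r1 == m) return(seq_len(m))          # reject all hypotheses
  m0 <- m - r1                             # estimated number of true nulls
  which(p.adjust(p, "BH") <= q1 * m / m0)  # stage two at level q' * m / m0
}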

2,319 citations


Book
22 Nov 2006
TL;DR: A practical guide to linear mixed models for clustered, longitudinal, and repeated-measures data, with worked examples covering hypothesis tests, interpretation of parameter estimates, diagnostics for the final model, software notes and recommendations, and alternative analytic approaches.
Abstract (table of contents):
INTRODUCTION: What Are Linear Mixed Models (LMMs)? A Brief History of Linear Mixed Models.
LINEAR MIXED MODELS: AN OVERVIEW: Introduction. Specification of LMMs. The Marginal Linear Model. Estimation in LMMs. Computational Issues. Tools for Model Selection. Model-Building Strategies. Checking Model Assumptions (Diagnostics). Other Aspects of LMMs. Power Analysis for Linear Mixed Models. Chapter Summary.
TWO-LEVEL MODELS FOR CLUSTERED DATA: THE RAT PUP EXAMPLE: Introduction. The Rat Pup Study. Overview of the Rat Pup Data Analysis. Analysis Steps in the Software Procedures. Results of Hypothesis Tests. Comparing Results across the Software Procedures. Interpreting Parameter Estimates in the Final Model. Estimating the Intraclass Correlation Coefficients (ICCs). Calculating Predicted Values. Diagnostics for the Final Model. Software Notes and Recommendations.
THREE-LEVEL MODELS FOR CLUSTERED DATA: THE CLASSROOM EXAMPLE: Introduction. The Classroom Study. Overview of the Classroom Data Analysis. Analysis Steps in the Software Procedures. Results of Hypothesis Tests. Comparing Results across the Software Procedures. Interpreting Parameter Estimates in the Final Model. Estimating the Intraclass Correlation Coefficients (ICCs). Calculating Predicted Values. Diagnostics for the Final Model. Software Notes. Recommendations.
MODELS FOR REPEATED-MEASURES DATA: THE RAT BRAIN EXAMPLE: Introduction. The Rat Brain Study. Overview of the Rat Brain Data Analysis. Analysis Steps in the Software Procedures. Results of Hypothesis Tests. Comparing Results across the Software Procedures. Interpreting Parameter Estimates in the Final Model. The Implied Marginal Variance-Covariance Matrix for the Final Model. Diagnostics for the Final Model. Software Notes. Other Analytic Approaches. Recommendations.
RANDOM COEFFICIENT MODELS FOR LONGITUDINAL DATA: THE AUTISM EXAMPLE: Introduction. The Autism Study. Overview of the Autism Data Analysis. Analysis Steps in the Software Procedures. Results of Hypothesis Tests. Comparing Results across the Software Procedures. Interpreting Parameter Estimates in the Final Model. Calculating Predicted Values. Diagnostics for the Final Model. Software Note: Computational Problems with the D Matrix. An Alternative Approach: Fitting the Marginal Model with an Unstructured Covariance Matrix.
MODELS FOR CLUSTERED LONGITUDINAL DATA: THE DENTAL VENEER EXAMPLE: Introduction. The Dental Veneer Study. Overview of the Dental Veneer Data Analysis. Analysis Steps in the Software Procedures. Results of Hypothesis Tests. Comparing Results across the Software Procedures. Interpreting Parameter Estimates in the Final Model. The Implied Marginal Variance-Covariance Matrix for the Final Model. Diagnostics for the Final Model. Software Notes and Recommendations. Other Analytic Approaches.
MODELS FOR DATA WITH CROSSED RANDOM FACTORS: THE SAT SCORE EXAMPLE: Introduction. The SAT Score Study. Overview of the SAT Score Data Analysis. Analysis Steps in the Software Procedures. Results of Hypothesis Tests. Comparing Results across the Software Procedures. Interpreting Parameter Estimates in the Final Model. The Implied Marginal Variance-Covariance Matrix for the Final Model. Recommended Diagnostics for the Final Model. Software Notes and Additional Recommendations.
APPENDIX A: STATISTICAL SOFTWARE RESOURCES. APPENDIX B: CALCULATION OF THE MARGINAL VARIANCE-COVARIANCE MATRIX. APPENDIX C: ACRONYMS/ABBREVIATIONS. BIBLIOGRAPHY. INDEX.

1,680 citations


Journal ArticleDOI
TL;DR: This work argues that the Poisson-like variability observed in cortex reduces a broad class of Bayesian inference to simple linear combinations of populations of neural activity, and demonstrates that these results hold for arbitrary probability distributions over the stimulus, for tuning curves of arbitrary shape and for realistic neuronal variability.
Abstract: Recent psychophysical experiments indicate that humans perform near-optimal Bayesian inference in a wide variety of tasks, ranging from cue integration to decision making to motor control. This implies that neurons both represent probability distributions and combine those distributions according to a close approximation to Bayes' rule. At first sight, it would seem that the high variability in the responses of cortical neurons would make it difficult to implement such optimal statistical inference in cortical circuits. We argue that, in fact, this variability implies that populations of neurons automatically represent probability distributions over the stimulus, a type of code we call probabilistic population codes. Moreover, we demonstrate that the Poisson-like variability observed in cortex reduces a broad class of Bayesian inference to simple linear combinations of populations of neural activity. These results hold for arbitrary probability distributions over the stimulus, for tuning curves of arbitrary shape and for realistic neuronal variability.
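Where the abstract says Poisson-like variability reduces Bayesian inference to linear combinations of activity, a small R sketch can make the mechanism concrete (all numbers illustrative; the tuning curves and stimulus grid are my assumptions, not the paper's simulations):

# With Poisson spike counts r and tuning curves f_i(s) whose sum is roughly
# constant in s, the log posterior under a flat prior is, up to a constant,
# sum_i r_i * log f_i(s); summing two populations' spike counts therefore
# combines their posteriors multiplicatively, a linear operation on activity.
s_grid <- seq(-8, 8, length.out = 161)
pref   <- seq(-12, 12, length.out = 49)              # preferred stimuli
f <- function(s) 10 * exp(-(s - pref)^2 / 8) + 0.1   # Gaussian tuning curves
set.seed(1)
r1 <- rpois(length(pref), f(2))                      # response to cue 1 (s = 2)
r2 <- rpois(length(pref), f(2))                      # response to cue 2
post <- function(r) {
  lp <- sapply(s_grid, function(s) sum(r * log(f(s)) - f(s)))
  p  <- exp(lp - max(lp)); p / sum(p)
}
combined <- post(r1 + r2)                  # decode the summed activity
product  <- post(r1) * post(r2)
product  <- product / sum(product)         # product of single-cue posteriors
max(abs(combined - product))               # ~0 when sum_i f_i(s) is flat in s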

1,445 citations


Journal ArticleDOI
10 Jul 2006
TL;DR: A novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, which is fast, easy to implement, and works well, as confirmed by the experiments.
Abstract: Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments. Availability: Contact: [email protected]
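A minimal R sketch of the biased MMD² statistic with a Gaussian RBF kernel (my own toy implementation, not the authors' code; inputs x and y are illustrative):

mmd2 <- function(x, y, sigma = 1) {
  # Gaussian kernel matrix between the rows of a and the rows of b
  k <- function(a, b) {
    d2 <- outer(rowSums(a^2), rowSums(b^2), "+") - 2 * tcrossprod(a, b)
    exp(-d2 / (2 * sigma^2))
  }
  mean(k(x, x)) + mean(k(y, y)) - 2 * mean(k(x, y))
}
set.seed(1)
x <- matrix(rnorm(200), ncol = 2)             # sample from P
y <- matrix(rnorm(200, mean = 1), ncol = 2)   # sample from a shifted Q
mmd2(x, y)   # large values suggest the distributions differ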

1,315 citations


Journal ArticleDOI
TL;DR: The authors point out that even large changes in significance levels can correspond to small, nonsignificant changes in the underlying quantities, an error conceptually distinct from the observation that dichotomizing results into significant and nonsignificant encourages the dismissal of observed differences in favor of the usually less interesting null hypothesis of no difference.
Abstract: It is common to summarize statistical comparisons by declarations of statistical significance or nonsignificance. Here we discuss one problem with such declarations, namely that changes in statistical significance are often not themselves statistically significant. By this, we are not merely making the commonplace observation that any particular threshold is arbitrary—for example, only a small change is required to move an estimate from a 5.1% significance level to 4.9%, thus moving it into statistical significance. Rather, we are pointing out that even large changes in significance levels can correspond to small, nonsignificant changes in the underlying quantities. The error we describe is conceptually different from other oft-cited problems—that statistical significance is not the same as practical importance, that dichotomization into significant and nonsignificant results encourages the dismissal of observed differences in favor of the usually less interesting null hypothesis of no difference, and that...
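A toy computation in R in the spirit of the article (numbers hypothetical): one estimate is "significant", a second is not, yet the difference between them is itself not significant:

z1 <- 25 / 10                              # estimate 25, SE 10: z = 2.5, "significant"
z2 <- 10 / 10                              # estimate 10, SE 10: z = 1.0, "not significant"
z_diff <- (25 - 10) / sqrt(10^2 + 10^2)    # difference 15, SE ~14.1: z ~ 1.06
c(z1, z2, z_diff)                          # the difference is itself not significant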

845 citations


Journal ArticleDOI
TL;DR: In this paper, a nonparametric test for Granger non-causality was proposed to avoid the over-rejection observed in the frequently used test proposed by Hiemstra and Jones [1994].

794 citations


Journal ArticleDOI
TL;DR: In this paper, a step-by-step guide for performing bootstrap mediation analyses is provided, and the test of joint significance is also briefly described as an alternative to both the normal theory and bootstrap methods.
Abstract: P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some other approaches. The authors describe an alternative developed by P. E. Shrout and N. Bolger (2002) based on bootstrap resampling methods. An example and step-by-step guide for performing bootstrap mediation analyses are provided. The test of joint significance is also briefly described as an alternative to both the normal theory and bootstrap methods. The relative advantages and disadvantages of each approach in terms of precision in estimating confidence intervals of indirect effects, Type I error, and Type II error are discussed.
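A minimal R sketch of the bootstrap approach using the boot package (the data frame d and variables x, m, y are hypothetical):

library(boot)
ab <- function(data, i) {
  di <- data[i, ]                            # resampled data
  a <- coef(lm(m ~ x, data = di))["x"]       # path a: predictor -> mediator
  b <- coef(lm(y ~ m + x, data = di))["m"]   # path b: mediator -> outcome
  unname(a * b)                              # indirect effect
}
set.seed(1)
bt <- boot(d, ab, R = 2000)
boot.ci(bt, type = "perc")                   # percentile CI for the indirect effect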

776 citations


Journal ArticleDOI
TL;DR: This article reanalyzes four datasets by adapting the general conceptual framework to these challenging inference problems and using the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the “classical” test procedures.
Abstract: Conditioning on the observed data is an important and flexible design principle for statistical test procedures. Although generally applicable, permutation tests currently in use are limited to the treatment of special cases, such as contingency tables or K-sample problems. A new theoretical framework for permutation tests opens up the way to a unified and generalized view. This article argues that the transfer of such a theory to practical data analysis has important implications in many applications and requires tools that enable the data analyst to compute on the theoretical concepts as closely as possible. We reanalyze four datasets by adapting the general conceptual framework to these challenging inference problems and using the coin add-on package in the R system for statistical computing to show what one can gain from going beyond the “classical” test procedures.
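A usage sketch of the coin package (the data frame mydata, response y, and grouping factor g are hypothetical):

library(coin)
# An approximate (Monte Carlo) permutation test of independence,
# conditioning on the observed data as described in the abstract:
independence_test(y ~ g, data = mydata, distribution = "approximate")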

Journal ArticleDOI
TL;DR: In this article, a simple test for causality in the frequency domain, which can also be applied to cointegrated systems, is proposed and used to investigate the predictive content of the yield spread for future output growth.

Journal ArticleDOI
TL;DR: The paper deals with the f-divergences of Csiszar, generalizing the discrimination information of Kullback, the total variation distance, the Hellinger divergence, and the Pearson divergence; basic properties of f-divergences, including relations to the decision errors, are proved in a new manner, replacing the classical Jensen inequality by a new generalized Taylor expansion of convex functions.
Abstract: The paper deals with the f-divergences of Csiszar generalizing the discrimination information of Kullback, the total variation distance, the Hellinger divergence, and the Pearson divergence. All basic properties of f-divergences including relations to the decision errors are proved in a new manner replacing the classical Jensen inequality by a new generalized Taylor expansion of convex functions. Some new properties are proved too, e.g., relations to the statistical sufficiency and deficiency. The generalized Taylor expansion also shows very easily that all f-divergences are average statistical informations (differences between prior and posterior Bayes errors) mutually differing only in the weights imposed on various prior distributions. The statistical information introduced by De Groot and the classical information of Shannon are shown to be extremal cases corresponding to α = 0 and α = 1 in the class of the so-called Arimoto α-informations introduced in this paper for 0
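A minimal R sketch of the discrete f-divergence D_f(P||Q) = Σ_i q_i f(p_i/q_i) with a few of the classical choices named above (conventions vary up to constant factors; the vectors p and q are illustrative):

f_div   <- function(p, q, f) sum(q * f(p / q))
kl      <- function(t) t * log(t)         # Kullback's discrimination information
tv      <- function(t) abs(t - 1) / 2     # total variation distance
hell    <- function(t) (sqrt(t) - 1)^2    # squared Hellinger divergence (one convention)
pearson <- function(t) (t - 1)^2          # Pearson divergence
p <- c(0.2, 0.5, 0.3); q <- c(0.25, 0.25, 0.5)
c(KL = f_div(p, q, kl), TV = f_div(p, q, tv),
  Hellinger = f_div(p, q, hell), Pearson = f_div(p, q, pearson))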

Book
01 Jan 2006
TL;DR: This book discusses phi-divergence test statistics under sparseness assumptions, as well as tests of independence, symmetry, marginal homogeneity, quasi-symmetry, and homogeneity in contingency tables, and more.
Abstract (table of contents):
DIVERGENCE MEASURES: DEFINITION AND PROPERTIES: Introduction. Phi-divergence Measures between Two Probability Distributions: Definition and Properties. Other Divergence Measures between Two Probability Distributions. Divergence among k Populations. Phi-disparities. Exercises. Answers to Exercises.
ENTROPY AS A MEASURE OF DIVERSITY: SAMPLING DISTRIBUTIONS: Introduction. Phi-entropies: Asymptotic Distribution. Testing and Confidence Intervals for Phi-entropies. Multinomial Populations: Asymptotic Distributions. Maximum Entropy Principle and Statistical Inference on Condensed Ordered Data. Exercises. Answers to Exercises.
GOODNESS-OF-FIT: SIMPLE NULL HYPOTHESIS: Introduction. Phi-divergences and Goodness-of-fit with Fixed Number of Classes. Phi-divergence Test Statistics under Sparseness Assumptions. Nonstandard Problems: Test Statistics Based on Phi-divergences. Exercises. Answers to Exercises.
OPTIMALITY OF PHI-DIVERGENCE TEST STATISTICS IN GOODNESS-OF-FIT: Introduction. Asymptotic Efficiency. Exact and Asymptotic Moments: Comparison. A Second Order Approximation to the Exact Distribution. Exact Powers Based on Exact Critical Regions. Small Sample Comparisons for the Phi-divergence Test Statistics. Exercises. Answers to Exercises.
MINIMUM PHI-DIVERGENCE ESTIMATORS: Introduction. Maximum Likelihood and Minimum Phi-divergence Estimators. Properties of the Minimum Phi-divergence Estimator. Normal Mixtures: Minimum Phi-divergence Estimator. Minimum Phi-divergence Estimator with Constraints: Properties. Exercises. Answers to Exercises.
GOODNESS-OF-FIT: COMPOSITE NULL HYPOTHESIS: Introduction. Asymptotic Distribution with Fixed Number of Classes. Nonstandard Problems: Test Statistics Based on Phi-divergences. Exercises. Answers to Exercises.
TESTING LOGLINEAR MODELS USING PHI-DIVERGENCE TEST STATISTICS: Introduction. Loglinear Models: Definition. Asymptotic Results for Minimum Phi-divergence Estimators in Loglinear Models. Testing in Loglinear Models. Simulation Study. Exercises. Answers to Exercises.
PHI-DIVERGENCE MEASURES IN CONTINGENCY TABLES: Introduction. Independence. Symmetry. Marginal Homogeneity. Quasi-symmetry. Homogeneity. Exercises. Answers to Exercises.
TESTING IN GENERAL POPULATIONS: Introduction. Simple Null Hypotheses: Wald, Rao, Wilks and Phi-divergence Test Statistics. Composite Null Hypothesis. Multi-sample Problem. Some Topics in Multivariate Analysis. Exercises. Answers to Exercises.
REFERENCES. INDEX.

Journal ArticleDOI
TL;DR: In this article, the Nash-Sutcliffe efficiency index (Ef) is used for assessing the goodness of fit of hydrologic models. But, a method for estimating the statistical significance of sample values has not been documented; also, factors that contribute to poor sample values are not well understood.
Abstract: The Nash–Sutcliffe efficiency index ( Ef ) is a widely used and potentially reliable statistic for assessing the goodness of fit of hydrologic models; however, a method for estimating the statistical significance of sample values has not been documented. Also, factors that contribute to poor sample values are not well understood. This research focuses on the interpretation of sample values of Ef . Specifically, the objectives were to present an approximation of the sampling distribution of the index; provide a method for conducting hypothesis tests and computing confidence intervals for sample values; and identify the effects of factors that influence sample values of Ef including the sample size, outliers, bias in magnitude, time-offset bias of hydrograph models, and the sampling interval of hydrologic data. Actual hydrologic data and hypothetical analyses were used to show these effects. The analyses show that outliers can significantly influence sample values of Ef . Time-offset bias and bias in magnit...
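For reference, the index as commonly defined, in a short R sketch (obs and sim are illustrative observed and simulated series):

# Ef = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
nse <- function(obs, sim) 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2)
obs <- c(10, 14, 30, 22, 9); sim <- c(11, 15, 26, 25, 10)   # illustrative flows
nse(obs, sim)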

Book
01 Jan 2006
TL;DR: This book studies and applies modern flexible regression models for survival data with a special focus on extensions of the Cox model and alternative models with the specific aim of describing time-varying effects of explanatory variables.
Abstract: In survival analysis there has long been a need for models that go beyond the Cox model, as the proportional hazards assumption often fails in practice. This book studies and applies modern flexible regression models for survival data with a special focus on extensions of the Cox model and alternative models with the specific aim of describing time-varying effects of explanatory variables. One model that receives special attention is Aalen’s additive hazards model, which is particularly well suited for dealing with time-varying effects. The book covers the use of residuals and resampling techniques to assess the fit of the models and also points out how the suggested models can be utilised for clustered survival data. The authors demonstrate the practically important aspect of how to do hypothesis testing of time-varying effects, making backward model selection strategies possible for the flexible models considered. The use of the suggested models and methods is illustrated on real data examples. The methods are available in the R package timereg developed by the authors, which is applied throughout the book with worked examples for the data sets. This gives the reader a unique chance of obtaining hands-on experience. This book is well suited for statistical consultants as well as for those who would like to see more about the theoretical justification of the suggested procedures. It can be used as a textbook for a graduate/master course in survival analysis, and students will appreciate the exercises included after each chapter. The applied side of the book with many worked examples accompanied with R code shows in detail how one can analyse real data and at the same time gives a deeper understanding of the underlying theory.
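A usage sketch with the authors' timereg package (the data set mydata and its variables are hypothetical):

library(survival)
library(timereg)
# Aalen's additive hazards model with time-varying covariate effects:
fit <- aalen(Surv(time, status) ~ age + sex, data = mydata)
summary(fit)   # includes tests of time-varying effects
plot(fit)      # cumulative regression coefficients over time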

Book
01 Jan 2006
TL;DR: This book introduces foundational statistical concepts, including location, dispersion, and shape descriptors, bivariate relationships, statistical and practical significance, and general linear model (GLM) techniques such as multiple regression, ANOVA, and logistic models.
Abstract (table of contents):
Part I. Introductory Terms and Concepts: Definitions of Some Basic Terms. Levels of Scale. Some Experimental Design Considerations. Some Key Concepts. Reflection Problems.
Part II. Location: Reasonable Expectations for Statistics. Location Concepts. Three Classical Location Descriptive Statistics. Four Criteria for Evaluating Statistics. Two Robust Location Statistics. Some Key Concepts. Reflection Problems.
Part III. Dispersion: Quality of Location Descriptive Statistics. Important in Its Own Right. Measures of Score Spread. Variance. Situation-Specific Maximum Dispersion. Robust Dispersion Descriptive Statistics. Standardized Score World. Some Key Concepts. Reflection Problems.
Part IV. Shape: Two Shape Descriptive Statistics. Normal Distributions. Two Additional Univariate Graphics. Some Key Concepts. Reflection Problems.
Part V. Bivariate Relationships: Pearson's r. Three Features of r. Three Interpretation Contextual Factors. Psychometrics of the Pearson r. Spearman's rho. Two Other r-Equivalent Correlation Coefficients. Bivariate Normality. Some Key Concepts. Reflection Problems.
Part VI. Statistical Significance: Sampling Distributions. Hypothesis Testing. Properties of Sampling Distributions. Standard Error/Sampling Error. Test Statistics. Statistical Precision and Power. pCALCULATED. Some Key Concepts. Reflection Problems.
Part VII. Practical Significance: Effect Sizes. Confidence Intervals. Confidence Intervals for Effect Sizes. Some Key Concepts. Reflection Problems.
Part VIII. Multiple Regression Analysis: Basic GLM Concepts: Purposes of Regression. Simple Linear Prediction. Case #1: Perfectly Uncorrelated Predictors. Case #2: Correlated Predictors, No Suppressor Effects. Case #3: Correlated Predictors, Suppressor Effects Present. b Weights versus Structure Coefficients. A Final Comment on Collinearity. Some Key Concepts. Reflection Problems.
Part IX. A GLM Interpretation Rubric: Do I Have Anything? Where Does My Something Originate? Stepwise Methods. Invoking Some Alternative Models. Some Key Concepts. Reflection Problems.
Part X. One-way Analysis of Variance (ANOVA): Experimentwise Type I Error. ANOVA Terminology. The Logic of Analysis of Variance. Practical and Statistical Significance. The "Homogeneity of Variance" Assumption. Post Hoc Tests. Some Key Concepts. Reflection Problems.
Part XI. Multiway and Alternative ANOVA Models: Multiway Models. Factorial versus Nonfactorial Analyses. Fixed-, Random-, and Mixed-Effects Models. Brief Comment on ANCOVA. Some Key Concepts. Reflection Problems.
Part XII. The General Linear Model (GLM): ANOVA via Regression: Planned Contrasts. Trend/Polynomial Planned Contrasts. Repeated Measures ANOVA via Regression. GLM Lessons. Some Key Concepts. Reflection Problems.
Part XIII. Some Logistic Models: Model Fitting in a Logistic Context. Logistic Regression. Loglinear Analysis. Some Key Concepts. Reflection Problems.
Appendix: Scores (n = 100) with Near Normal Distributions.

Journal ArticleDOI
TL;DR: This paper illustrates the use of the Akaike information criterion (AIC) in model selection and inference, as well as the interpretation of results analysed in this framework, using two real herpetological data sets.
Abstract: In ecology, researchers frequently use observational studies to explain a given pattern, such as the number of individuals in a habitat patch, with a large number of explanatory (i.e., independent) variables. To elucidate such relationships, ecologists have long relied on hypothesis testing to include or exclude variables in regression models, although the conclusions often depend on the approach used (e.g., forward, backward, stepwise selection). Though better tools have surfaced in the mid-1970s, they are still underutilized in certain fields, particularly in herpetology. This is the case of the Akaike information criterion (AIC), which is remarkably superior in model selection (i.e., variable selection) to hypothesis-based approaches. It is simple to compute and easy to understand, but more importantly, for a given data set, it provides a measure of the strength of evidence for each model that represents a plausible biological hypothesis relative to the entire set of models considered. Using this approach, one can then compute a weighted average of the estimate and standard error for any given variable of interest across all the models considered. This procedure, termed model-averaging or multimodel inference, yields precise and robust estimates. In this paper, I illustrate the use of the AIC in model selection and inference, as well as the interpretation of results analysed in this framework, with two real herpetological data sets. The AIC and measures derived from it should be routinely adopted by herpetologists.
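A minimal R sketch of ΔAIC, Akaike weights, and model averaging (candidate models m1, m2, m3 and the coefficient name "x" are hypothetical):

aics  <- c(AIC(m1), AIC(m2), AIC(m3))
delta <- aics - min(aics)                        # Delta-AIC
w     <- exp(-delta / 2) / sum(exp(-delta / 2))  # Akaike weights
beta  <- sapply(list(m1, m2, m3), function(m) coef(m)["x"])
sum(w * beta)                                    # model-averaged estimate of "x"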

Journal ArticleDOI
TL;DR: The issues of prior specification for multiple tests, the computation of key posterior quantities, and useful ways to display these quantities are studied.

Journal ArticleDOI
TL;DR: In this article, the authors consider the null that a given series follows a zero-mean martingale difference against the alternative that it is linearly predictable. Although the two models' population mean squared prediction errors (MSPEs) are equal under the null, the authors show analytically and via simulations that the alternative model's sample MSPE is expected to be greater than the null's.
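A hedged R sketch of an MSPE-adjusted comparison in this spirit (my reconstruction, not the paper's code; y is the realized series and yhat the alternative model's out-of-sample forecasts, both hypothetical):

# Under the martingale difference null the benchmark forecast is 0, so its
# error is y itself; the yhat^2 term removes the noise introduced by
# estimating the alternative model's parameters.
f <- y^2 - ((y - yhat)^2 - yhat^2)     # adjusted loss differential per period
t.test(f, alternative = "greater")     # evidence of predictability if mean(f) > 0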

Journal ArticleDOI
TL;DR: A modification of the standard null hypothesis of zero difference in fit is proposed, allowing for testing an interval hypothesis that the difference in fit between models is small, rather than zero, and these developments are combined, yielding a procedure for estimating power of a test of a null hypothesis of small difference in fit versus an alternative hypothesis of larger difference.
Abstract: For comparing nested covariance structure models, the standard procedure is the likelihood ratio test of the difference in fit, where the null hypothesis is that the models fit identically in the population. A procedure for determining statistical power of this test is presented where effect size is based on a specified difference in overall fit of the models. A modification of the standard null hypothesis of zero difference in fit is proposed allowing for testing an interval hypothesis that the difference in fit between models is small, rather than zero. These developments are combined yielding a procedure for estimating power of a test of a null hypothesis of small difference in fit versus an alternative hypothesis of larger difference.
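A hedged R sketch of such a power computation, expressing the effect size through RMSEA values for nested models A (less constrained, dfA) and B (dfB > dfA); this is my reconstruction from the abstract's framework, and all inputs are illustrative:

power_diff_fit <- function(N, dfA, dfB, rmseaA, rmseaB, alpha = 0.05) {
  # overall fit F = df * RMSEA^2; noncentrality from the difference in fit
  ncp  <- (N - 1) * (dfB * rmseaB^2 - dfA * rmseaA^2)
  crit <- qchisq(1 - alpha, df = dfB - dfA)
  pchisq(crit, df = dfB - dfA, ncp = ncp, lower.tail = FALSE)
}
power_diff_fit(N = 500, dfA = 22, dfB = 26, rmseaA = 0.04, rmseaB = 0.06)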

Journal ArticleDOI
TL;DR: An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed, and its performance is compared with three potential competitors, including a procedure based on the Box–Cox power transformation.
Abstract: An easy-to-implement global procedure for testing the four assumptions of the linear model is proposed. The test can be viewed as a Neyman smooth test and it only relies on the standardized residual vector. If the global procedure indicates a violation of at least one of the assumptions, the components of the global test statistic can be utilized to gain insights into which assumptions have been violated. The procedure can also be used in conjunction with associated deletion statistics to detect unusual observations. Simulation results are presented indicating the sensitivity of the procedure in detecting model violations under a variety of situations, and its performance is compared with three potential competitors, including a procedure based on the Box-Cox power transformation. The procedure is demonstrated by applying it to a new car mileage data set and a water salinity data set that has been used previously to illustrate model diagnostics.
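The procedure is, to my knowledge, implemented in the R package gvlma; a short usage sketch on a built-in data set:

library(gvlma)
fit <- lm(mpg ~ wt + hp, data = mtcars)   # example linear model
gvlma(fit)   # global statistic plus the directional component tests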

Journal ArticleDOI
TL;DR: In this article, the authors present a method for multiple hypothesis testing that maintains control of the false discovery rate while incorporating prior information about the hypotheses, which takes the form of p-value weights.
Abstract: We present a method for multiple hypothesis testing that maintains control of the False Discovery Rate while incorporating prior information about the hypotheses. The prior information takes the form of p-value weights. If the assignment of weights is positively associated with the null hypotheses being false, the procedure improves power, except in cases where power is already near one. Even if the assignment of weights is poor, power is only reduced slightly, as long as the weights are not too large. We also provide a similar method to control False Discovery Exceedance.
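A minimal R sketch of p-value weighting combined with the linear step-up (BH) procedure (my reading of the abstract, not the authors' code):

weighted_fdr <- function(p, w, q = 0.05) {
  # weights w > 0 averaging 1; larger weights where the nulls are
  # believed more likely to be false
  stopifnot(all(w > 0), abs(mean(w) - 1) < 1e-8)
  which(p.adjust(pmin(p / w, 1), "BH") <= q)   # BH applied to weighted p-values
}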

Journal ArticleDOI
TL;DR: In this article, neural networks are applied to predict energy consumption in buildings, guided by statistical procedures such as hypothesis testing, information criteria, and cross-validation. The performance of the developed models and predictors is evaluated using two different data sets: the energy-use data of the Energy Prediction Shootout I contest and data from an office building located in Athens.

Journal ArticleDOI
01 Aug 2006 - Ecology
TL;DR: In this article, a simulation envelope is created by calculating, at every distance, the minimum and maximum results computed across the simulated patterns; evaluating where an observed pattern falls with respect to this envelope is shown to be invalid for inference, and a valid test is proposed.
Abstract: Spatial point pattern analysis provides a statistical method to compare an observed spatial pattern against a hypothesized spatial process model. The G statistic, which considers the distribution of nearest neighbor distances, and the K statistic, which evaluates the distribution of all neighbor distances, are commonly used in such analyses. One method of employing these statistics involves building a simulation envelope from the result of many simulated patterns of the hypothesized model. Specifically, a simulation envelope is created by calculating, at every distance, the minimum and maximum results computed across the simulated patterns. A statistical test is performed by evaluating where the results from an observed pattern fall with respect to the simulation envelope. However, this method, which differs from P. Diggle's suggested approach, is invalid for inference because it violates the assumptions of Monte Carlo methods and results in incorrect type I error rate performance. Similarly, using the simulation envelope to estimate the range of distances over which an observed pattern deviates from the hypothesized model is also suspect. The technical details of why the simulation envelope provides incorrect type I error rate performance are described. A valid test is then proposed, and details about how the number of simulated patterns impacts the statistical significance are explained. Finally, an example of using the proposed test within an exploratory data analysis framework is provided.
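A minimal R sketch of a valid Monte Carlo test that uses a single global deviation statistic rather than pointwise envelope exceedance (function and input names are hypothetical):

# obs_fun: observed summary function (e.g. K-hat) on a distance grid;
# sim_funs: one simulated summary function per row;
# theo_fun: theoretical value under the hypothesized model.
mc_pvalue <- function(obs_fun, sim_funs, theo_fun) {
  dev   <- function(f) max(abs(f - theo_fun))      # deviation over all distances
  d_sim <- apply(sim_funs, 1, dev)
  (1 + sum(d_sim >= dev(obs_fun))) / (length(d_sim) + 1)   # Monte Carlo p-value
}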

Journal ArticleDOI
TL;DR: The strengths of the error models in improving statistical power of microarray data analysis are presented, particularly in increasing expression detection sensitivity and specificity when the number of replicates is limited.
Abstract: Motivation: In microarray gene expression studies, the number of replicated microarrays is usually small because of cost and sample availability, resulting in unreliable variance estimation and thus unreliable statistical hypothesis tests. The unreliable variance estimation is further complicated by the fact that the technology-specific variance is intrinsically intensity-dependent. Results: The Rosetta error model captures the variance-intensity relationship for various types of microarray technologies, such as single-color arrays and two-color arrays. This error model conservatively estimates intensity error and uses this value to stabilize the variance estimation. We present two commonly used error models: the intensity error-model for single-color microarrays and the ratio error model for two-color microarrays or ratios built from two single-color arrays. We present examples to demonstrate the strength of our error models in improving statistical power of microarray data analysis, particularly, in increasing expression detection sensitivity and specificity when the number of replicates is limited. Availability: Rosetta error models are available in the Rosetta Resolver® system for gene expression analysis. These technology-specific error models are designed and optimized for different microarray technologies, such as Affymetrix® and Agilent Technologies. Contact: lee_weng@rosettabio.com Supplementary information: Supplementary data and Appendices are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: A new statistical method, called SOBER, is proposed, which automatically localizes software faults without any prior knowledge of the program semantics and models the predicate evaluation in both correct and incorrect executions.
Abstract: Manual debugging is tedious, as well as costly. The high cost has motivated the development of fault localization techniques, which help developers search for fault locations. In this paper, we propose a new statistical method, called SOBER, which automatically localizes software faults without any prior knowledge of the program semantics. Unlike existing statistical approaches that select predicates correlated with program failures, SOBER models the predicate evaluation in both correct and incorrect executions and regards a predicate as fault-relevant if its evaluation pattern in incorrect executions significantly diverges from that in correct ones. Featuring a rationale similar to that of hypothesis testing, SOBER quantifies the fault relevance of each predicate in a principled way. We systematically evaluate SOBER under the same setting as previous studies. The result clearly demonstrates the effectiveness: SOBER could help developers locate 68 out of the 130 faults in the Siemens suite by examining no more than 10 percent of the code, whereas the cause transition approach proposed by Cleve and Zeller [2005] and the statistical approach by Liblit et al. [2005] locate 34 and 52 faults, respectively. Moreover, the effectiveness of SOBER is also evaluated in an "imperfect world", where the test suite is either inadequate or only partially labeled. The experiments indicate that SOBER could achieve competitive quality under these harsh circumstances. Two case studies with grep 2.2 and bc 1.06 are reported, which shed light on the applicability of SOBER on reasonably large programs.
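A heavily simplified R sketch of the idea, not SOBER's actual statistic: rank predicates by how far their evaluation bias in failing runs diverges from that in passing runs (the matrices bias_pass and bias_fail are hypothetical, one row per run and one column per predicate, each entry being the fraction of that run's evaluations where the predicate was true):

rank_predicates <- function(bias_pass, bias_fail) {
  z <- sapply(seq_len(ncol(bias_pass)), function(j) {
    mu <- mean(bias_pass[, j])            # typical bias in correct executions
    s  <- sd(bias_pass[, j]) + 1e-9       # guard against zero variance
    # z-like divergence of the failing runs' mean bias from the passing baseline
    abs(mean(bias_fail[, j]) - mu) / (s / sqrt(nrow(bias_fail)))
  })
  order(z, decreasing = TRUE)             # most fault-relevant predicates first
}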


Journal ArticleDOI
TL;DR: Model and data support the SCH view of resource allocation; at the under 1000-ms level of analysis, mixtures of cognitive and perceptual-motor resources are adjusted based on their cost-benefit tradeoffs for interactive behavior.
Abstract: Soft constraints hypothesis (SCH) is a rational analysis approach that holds that the mixture of perceptual-motor and cognitive resources allocated for interactive behavior is adjusted based on temporal cost-benefit tradeoffs. Alternative approaches maintain that cognitive resources are in some sense protected or conserved in that greater amounts of perceptual-motor effort will be expended to conserve lesser amounts of cognitive effort. One alternative, the minimum memory hypothesis (MMH), holds that people favor strategies that minimize the use of memory. SCH is compared with MMH across 3 experiments and with predictions of an Ideal Performer Model that uses ACT-R’s memory system in a reinforcement learning approach that maximizes expected utility by minimizing time. Model and data support the SCH view of resource allocation; at the under 1000-ms level of analysis, mixtures of cognitive and perceptual-motor resources are adjusted based on their cost-benefit tradeoffs for interactive behavior.

Journal ArticleDOI
TL;DR: Alternatives to hypothesis testing are reviewed, including techniques for parameter estimation and model selection using likelihood and Bayesian techniques, which hold promise for new insight in ecology by encouraging thoughtful model building as part of inquiry.
Abstract: Statistical methods emphasizing formal hypothesis testing have dominated the analyses used by ecologists to gain insight from data. Here, we review alternatives to hypothesis testing including techniques for parameter estimation and model selection using likelihood and Bayesian techniques. These methods emphasize evaluation of weight of evidence for multiple hypotheses, multimodel inference, and use of prior information in analysis. We provide a tutorial for maximum likelihood estimation of model parameters and model selection using information theoretics, including a brief treatment of procedures for model comparison, model averaging, and use of data from multiple sources. We discuss the advantages of likelihood estimation, Bayesian analysis, and meta-analysis as ways to accumulate understanding across multiple studies. These statistical methods hold promise for new insight in ecology by encouraging thoughtful model building as part of inquiry, providing a unified framework for the empirical analysis of theoretical models, and by facilitating the formal accumulation of evidence bearing on fundamental questions.
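A minimal tutorial-style R sketch of maximum likelihood estimation with optim() and an AIC computation (illustrative data, not from the paper):

set.seed(1)
y   <- rnorm(50, mean = 2, sd = 1.5)     # illustrative data
# negative log-likelihood of a normal model (sd parameterized on log scale)
nll <- function(par) -sum(dnorm(y, mean = par[1], sd = exp(par[2]), log = TRUE))
fit <- optim(c(0, 0), nll)
c(mean = fit$par[1], sd = exp(fit$par[2]))   # ML estimates
2 * 2 + 2 * fit$value                        # AIC = 2k - 2*log-likelihood, k = 2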