
Showing papers on "Statistical hypothesis testing published in 2013"


Book
06 May 2013
TL;DR: In this book, the author introduces mediation, moderation, and conditional process analysis, addressing questions of whether, if, how, and when one variable mediates or moderates the effect of another.
Abstract: I. FUNDAMENTAL CONCEPTS 1. Introduction 1.1. A Scientist in Training 1.2. Questions of Whether, If, How, and When 1.3. Conditional Process Analysis 1.4. Correlation, Causality, and Statistical Modeling 1.5. Statistical Software 1.6. Overview of this Book 1.7. Chapter Summary 2. Simple Linear Regression 2.1. Correlation and Prediction 2.2. The Simple Linear Regression Equation 2.3. Statistical Inference 2.4. Assumptions for Interpretation and Statistical Inference 2.5. Chapter Summary 3. Multiple Linear Regression 3.1. The Multiple Linear Regression Equation 3.2. Partial Association and Statistical Control 3.3. Statistical Inference in Multiple Regression 3.4. Statistical and Conceptual Diagrams 3.5. Chapter Summary II. MEDIATION ANALYSIS 4. The Simple Mediation Model 4.1. The Simple Mediation Model 4.2. Estimation of the Direct, Indirect, and Total Effects of X 4.3. Example with Dichotomous X: The Influence of Presumed Media Influence 4.4. Statistical Inference 4.5. An Example with Continuous X: Economic Stress among Small Business Owners 4.6. Chapter Summary 5. Multiple Mediator Models 5.1. The Parallel Multiple Mediator Model 5.2. Example Using the Presumed Media Influence Study 5.3. Statistical Inference 5.4. The Serial Multiple Mediator Model 5.5. Complementarity and Competition among Mediators 5.6. OLS Regression versus Structural Equation Modeling 5.7. Chapter Summary III. MODERATION ANALYSIS 6. Miscellaneous Topics in Mediation Analysis 6.1. What About Baron and Kenny? 6.2. Confounding and Causal Order 6.3. Effect Size 6.4. Multiple Xs or Ys: Analyze Separately or Simultaneously? 6.5. Reporting a Mediation Analysis 6.6. Chapter Summary 7. Fundamentals of Moderation Analysis 7.1. Conditional and Unconditional Effects 7.2. An Example: Sex Discrimination in the Workplace 7.3. Visualizing Moderation 7.4. Probing an Interaction 7.5. Chapter Summary 8. Extending Moderation Analysis Principles 8.1. Moderation Involving a Dichotomous Moderator 8.2. Interaction between Two Quantitative Variables 8.3. Hierarchical versus Simultaneous Variable Entry 8.4. The Equivalence between Moderated Regression Analysis and a 2 x 2 Factorial Analysis of Variance 8.5. Chapter Summary 9. Miscellaneous Topics in Moderation Analysis 9.1. Truths and Myths about Mean Centering 9.2. The Estimation and Interpretation of Standardized Regression Coefficients in a Moderation Analysis 9.3. Artificial Categorization and Subgroups Analysis 9.4. More Than One Moderator 9.5. Reporting a Moderation Analysis 9.6. Chapter Summary IV. CONDITIONAL PROCESS ANALYSIS 10. Conditional Process Analysis 10.1. Examples of Conditional Process Models in the Literature 10.2. Conditional Direct and Indirect Effects 10.3. Example: Hiding Your Feelings from Your Work Team 10.4. Statistical Inference 10.5. Conditional Process Analysis in PROCESS 10.6. Chapter Summary 11. Further Examples of Conditional Process Analysis 11.1. Revisiting the Sexual Discrimination Study 11.2. Moderation of the Direct and Indirect Effects in a Conditional Process Model 11.3. Visualizing the Direct and Indirect Effects 11.4. Mediated Moderation 11.5. Chapter Summary 12. Miscellaneous Topics in Conditional Process Analysis 12.1. A Strategy for Approaching Your Analysis 12.2. Can a Variable Simultaneously Mediate and Moderate Another Variable's Effect? 12.3. Comparing Conditional Indirect Effects and a Formal Test of Moderated Mediation 12.4. The Pitfalls of Subgroups Analysis 12.5. Writing about Conditional Process Modeling 12.6. 
Chapter Summary Appendix A. Using PROCESS Appendix B. Monte Carlo Confidence Intervals in SPSS and SAS

26,144 citations


Journal ArticleDOI
TL;DR: It is argued that researchers using LMEMs for confirmatory hypothesis testing should minimally adhere to the standards that have been in place for many decades, and it is shown that LMEMs generalize best when they include the maximal random effects structure justified by the design.

6,878 citations


Journal ArticleDOI
TL;DR: It is demonstrated that power for 2SLS MR can be derived using the non-centrality parameter (NCP) of the statistical test employed to test whether the two-stage least squares regression coefficient is zero, and that previously published simulation-based power estimates can be represented theoretically using this NCP-based approach.
Abstract: In Mendelian randomization (MR) studies, where genetic variants are used as proxy measures for an exposure trait of interest, obtaining adequate statistical power is frequently a concern due to the small amount of variation in a phenotypic trait that is typically explained by genetic variants. A range of power estimates based on simulations and specific parameters for two-stage least squares (2SLS) MR analyses based on continuous variables has previously been published. However there are presently no specific equations or software tools one can implement for calculating power of a given MR study. Using asymptotic theory, we show that in the case of continuous variables and a single instrument, for example a single-nucleotide polymorphism (SNP) or multiple SNP predictor, statistical power for a fixed sample size is a function of two parameters: the proportion of variation in the exposure variable explained by the genetic predictor and the true causal association between the exposure and outcome variable. We demonstrate that power for 2SLS MR can be derived using the non-centrality parameter (NCP) of the statistical test that is employed to test whether the 2SLS regression coefficient is zero. We show that the previously published power estimates from simulations can be represented theoretically using this NCP-based approach, with similar estimates observed when the simulation-based estimates are compared with our NCP-based approach. General equations for calculating statistical power for 2SLS MR using the NCP are provided in this note, and we implement the calculations in a web-based application.
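A minimal sketch of the NCP-based power calculation described above, assuming standardized exposure and outcome so that NCP ≈ n · R² · β², where R² is the variance in the exposure explained by the genetic instrument and β is the true causal effect. The function name and this simplified NCP expression are ours for illustration; they are not the authors' exact equations or their web-based tool.

```python
from scipy.stats import ncx2, chi2

def mr_power(n, r2_gx, beta_xy, alpha=0.05):
    """Approximate power of a 2SLS Mendelian randomization test.

    Assumes standardized exposure and outcome, so that
    NCP ~= n * r2_gx * beta_xy**2 (a simplification; the paper's
    exact expression may include additional variance terms).
    """
    ncp = n * r2_gx * beta_xy ** 2
    crit = chi2.ppf(1 - alpha, df=1)      # critical value of the 1-df test
    return ncx2.sf(crit, df=1, nc=ncp)    # P(reject | true effect beta_xy)

# Example: 10,000 individuals, an instrument explaining 2% of exposure
# variance, and a true causal effect of 0.1 SD per SD of exposure.
print(round(mr_power(10_000, 0.02, 0.1), 3))
```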

859 citations


BookDOI
31 Oct 2013
TL;DR: Significance testing has been a controversial topic in the analysis of scientific data, as discussed by the authors, with many opponents arguing that statistical significance tests should be replaced by confidence intervals.
Abstract: Contents: Preface. Part I: Overview. L.L. Harlow, Significance Testing Introduction and Overview. Part II: The Debate: Against and For Significance Testing. J.Cohen, The Earth Is Round. F.L. Schmidt, J. Hunter, Eight Objections to the Discontinuation of Significance Testing in the Analysis of Research Data. S.A. Mulaik, N.S. Raju, R. Harshman, There Is a Time and Place for Significance Testing. R.P. Abelson, A Retrospective on the Significance Test Ban of 1999 (If There Were No Significance Tests, They Would Be Invented). Part III: Suggested Alternatives to Significance Testing. R.J. Harris, Reforming Significance Testing via Three-Valued Logic. J.S. Rossi, Spontaneous Recovery of Verbal Learning: A Case Study in the Failure of Psychology as a Cumulative Science. J.H. Steiger, R.T. Fouladi, Noncentrality Interval Estimation and the Evaluation of Statistical Models. R.P. McDonald, Goodness of Approximation in the Linear Model. Part IV: A Bayesian Approach to Hypothesis Testing. R.M. Pruzek, An Introduction to Bayesian Inference and Its Application. D. Rindskopf, Testing 'Small,' Not Null, Hypotheses: Classical and Bayesian Approaches. C.S. Reichardt, H.F. Gollob, When Confidence Intervals Should Be Used Instead of Statistical Significance Tests, and Vice Versa. Part V: Philosophy of Science Issues. W.W. Rozeboom, Good Science Is Abductive, Not Hypothetico-Deductive. P.E. Meehl, The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions.

699 citations


Journal ArticleDOI
TL;DR: Modifications of common standards of evidence are proposed to reduce the rate of nonreproducibility of scientific research by a factor of 5 or greater and to correct the problem of unjustifiably high levels of significance.
Abstract: Recent advances in Bayesian hypothesis testing have led to the development of uniformly most powerful Bayesian tests, which represent an objective, default class of Bayesian hypothesis tests that have the same rejection regions as classical significance tests. Based on the correspondence between these two classes of tests, it is possible to equate the size of classical hypothesis tests with evidence thresholds in Bayesian tests, and to equate P values with Bayes factors. An examination of these connections suggests that recent concerns over the lack of reproducibility of scientific studies can be attributed largely to the conduct of significance tests at unjustifiably high levels of significance. To correct this problem, evidence thresholds required for the declaration of a significant finding should be increased to 25–50:1, and to 100–200:1 for the declaration of a highly significant finding. In terms of classical hypothesis tests, these evidence standards mandate the conduct of tests at the 0.005 or 0.001 level of significance.
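For a one-sided z-test, the correspondence above can be made concrete: a uniformly most powerful Bayesian test with evidence threshold γ rejects when z > sqrt(2 ln γ), so each γ implies a matching classical significance level. The sketch below is an illustrative special case, not the paper's general derivation; it shows why thresholds of 25-50:1 land near α = 0.005 and 100-200:1 near α = 0.001.

```python
from math import log, sqrt
from scipy.stats import norm

# For a one-sided z-test, the UMPBT-style rejection region z > sqrt(2*ln(gamma))
# links a Bayes-factor evidence threshold gamma to a classical alpha level.
for gamma in (25, 50, 100, 200):
    z_crit = sqrt(2 * log(gamma))
    alpha = norm.sf(z_crit)      # one-sided tail probability beyond z_crit
    print(f"gamma = {gamma:>3}:1  ->  z > {z_crit:.2f},  alpha ~ {alpha:.4f}")
```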

671 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose using a combination of graphical results, absolute value error statistics (e.g., root mean square error), and normalized goodness-of-fit statistics (e.g., the Nash-Sutcliffe efficiency coefficient, NSE) to quantify the goodness of fit of model-calculated values against observations.
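The two measures named in the summary have standard definitions: RMSE is the root mean squared difference between observed and simulated values, and NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². The sketch below implements those textbook forms; the function names and toy data are ours, and the paper may recommend additional or modified statistics.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean square error between observations and model output."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2))

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 = perfect fit, 0 = no better than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

obs = [2.1, 3.4, 4.8, 5.0, 3.9]
sim = [2.4, 3.1, 4.5, 5.3, 4.2]
print(f"RMSE = {rmse(obs, sim):.3f}, NSE = {nse(obs, sim):.3f}")
```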

626 citations


Journal ArticleDOI
TL;DR: The method, called 'statistical parametric mapping' (SPM), uses random field theory to objectively identify field regions which co-vary significantly with the experimental design, thereby making it useful for objectively guiding analyses of complex biomechanical systems.

438 citations


Journal ArticleDOI
TL;DR: A general framework for assessing predictive stream learning algorithms is proposed, defending the use of prequential error with forgetting mechanisms to provide reliable error estimators; it is proved that, in stationary data and for consistent learning algorithms, the holdout estimator, the prequential error, and the prequential error estimated over a sliding window or using fading factors all converge to the Bayes error.
Abstract: Most streaming decision models evolve continuously over time, run in resource-aware environments, and detect and react to changes in the environment generating data. One important issue, not yet convincingly addressed, is the design of experimental work to evaluate and compare decision models that evolve over time. This paper proposes a general framework for assessing predictive stream learning algorithms. We defend the use of prequential error with forgetting mechanisms to provide reliable error estimators. We prove that, in stationary data and for consistent learning algorithms, the holdout estimator, the prequential error and the prequential error estimated over a sliding window or using fading factors all converge to the Bayes error. The use of prequential error with forgetting mechanisms proves advantageous in assessing performance and in comparing stream learning algorithms. It is also worthwhile to use the proposed methods for hypothesis testing and for change detection. In a set of experiments in drift scenarios, we evaluate the ability of a standard change detection algorithm to detect change using three prequential error estimators. These experiments point out that the use of forgetting mechanisms (sliding windows or fading factors) is required for fast and efficient change detection. In comparison to sliding windows, fading factors are faster and memoryless, both important requirements for streaming applications. Overall, this paper is a contribution to a discussion on best practice for performance assessment when learning is a continuous process and the decision models are dynamic and evolve over time.
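The fading-factor prequential estimator defended above can be maintained online with two discounted running sums, E_i = S_i / B_i with S_i = loss_i + α·S_{i-1} and B_i = 1 + α·B_{i-1}. The sketch below follows that general recipe; the class name and the 0/1 loss in the usage example are ours.

```python
class FadingPrequentialError:
    """Prequential (test-then-train) error with a fading factor.

    E_i = S_i / B_i, where S_i = loss_i + alpha * S_{i-1}
    and B_i = 1 + alpha * B_{i-1}; alpha slightly below 1 forgets old errors.
    """

    def __init__(self, alpha=0.999):
        self.alpha = alpha
        self.s = 0.0   # discounted sum of losses
        self.b = 0.0   # discounted count of examples

    def update(self, loss):
        self.s = loss + self.alpha * self.s
        self.b = 1.0 + self.alpha * self.b
        return self.s / self.b

# Usage: test-then-train loop over a stream (0/1 loss shown here).
estimator = FadingPrequentialError(alpha=0.995)
for y_true, y_pred in [(1, 1), (0, 1), (1, 1), (0, 0)]:
    error = estimator.update(float(y_true != y_pred))
print(f"current prequential error ~ {error:.3f}")
```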

432 citations


Book
20 Feb 2013
TL;DR: This Second edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests and is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology.
Abstract: The goal of this book is to introduce statistical methodology - estimation, hypothesis testing, and classification - to a wide applied audience through resampling from existing data via the bootstrap, and estimation or cross-validation methods. The book provides an accessible introduction and practical guide to the power, simplicity and versatility of the bootstrap, cross-validation and permutation tests. Industrial statistical consultants, professionals and researchers will find the book's methods and software immediately helpful. This second edition is a practical guide to data analysis using the bootstrap, cross-validation, and permutation tests. It is an essential resource for industrial statisticians, statistical consultants, and research professionals in science, engineering, and technology. Only requiring minimal mathematics beyond algebra, it provides a table-free introduction to data analysis utilizing numerous exercises, practical data sets, and freely available statistical shareware. Topics and features: *Thoroughly revised text features more practical examples plus an additional chapter devoted to regression and data mining techniques and their limitations *Uses a resampling approach to introduce statistics *A practical presentation that covers all three sampling methods - bootstrap, density-estimation, and permutations *Includes a systematic guide to help one select the correct procedure for a particular application *Detailed coverage of all three statistical methodologies - classification, estimation, and hypothesis testing *Suitable for classroom use and individual, self-study purposes *Numerous practical examples using popular computer programs such as SAS, Stata, and StatXact *Useful appendices with computer programs and code to develop one's own methods *Downloadable freeware from the author's website: http://users.oco.net/drphilgood/resamp.htm With its accessible style and intuitive topic development, the book is an excellent basic resource and guide to the power, simplicity and versatility of bootstrap, cross-validation and permutation tests. Students, professionals, and researchers will find it a particularly useful guide to modern resampling methods and their applications.
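As a flavour of the resampling methods the book covers, here is a simple two-sample permutation test of a difference in means. It is a generic sketch, not code from the book or its companion shareware; the function name, the add-one p-value correction, and the toy data are ours.

```python
import numpy as np

def permutation_test(x, y, n_perm=10_000, rng=None):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, float), np.asarray(y, float)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # relabel groups at random
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)             # add-one correction avoids p = 0

group_a = [4.1, 5.0, 6.2, 5.5, 4.8]
group_b = [3.2, 4.0, 3.8, 4.4, 3.5]
print(f"permutation p-value ~ {permutation_test(group_a, group_b, rng=0):.4f}")
```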

376 citations


Book
31 May 2013
TL;DR: In this paper, the authors establish a link between frontier estimation and extreme value theory and examine several approaches for introducing environmental variables into production models - both two-stage approaches, in which estimated efficiencies are regressed on environmental variables, and conditional efficiency measures - together with the underlying assumptions required for either approach.
Abstract: Nonparametric estimators are widely used to estimate the productive efficiency of firms and other organizations, but often without any attempt to make statistical inference. Recent work has provided statistical properties of these estimators as well as methods for making statistical inference, and a link between frontier estimation and extreme value theory has been established. New estimators that avoid many of the problems inherent with traditional efficiency estimators have also been developed; these new estimators are robust with respect to outliers and avoid the well-known curse of dimensionality. Statistical properties, including asymptotic distributions, of the new estimators have been uncovered. Finally, several approaches exist for introducing environmental variables into production models; both two-stage approaches, in which estimated efficiencies are regressed on environmental variables, and conditional efficiency measures, as well as the underlying assumptions required for either approach, are examined.

359 citations


Journal ArticleDOI
TL;DR: The purpose of this tutorial is to describe procedures for estimating sample size for a variety of different experimental designs that are common in strength and conditioning research, using the G*Power software package.
Abstract: The statistical power, or sensitivity of an experiment, is defined as the probability of rejecting a false null hypothesis. Only 3 factors can affect statistical power: (a) the significance level (α), (b) the magnitude or size of the treatment effect (effect size), and (c) the sample size (n). Of these 3 factors, only the sample size can be manipulated by the investigator because the significance level is usually selected before the study, and the effect size is determined by the effectiveness of the treatment. Thus, selection of an appropriate sample size is one of the most important components of research design but is often misunderstood by beginning researchers. The purpose of this tutorial is to describe procedures for estimating sample size for a variety of different experimental designs that are common in strength and conditioning research. Emphasis is placed on selecting an appropriate effect size because this step fully determines sample size when power and the significance level are fixed. There are many different software packages that can be used for sample size estimation. However, I chose to describe the procedures for the G*Power software package (version 3.1.4) because this software is freely downloadable and capable of estimating sample size for many of the different statistical tests used in strength and conditioning research. Furthermore, G*Power provides a number of different auxiliary features that can be useful for researchers when designing studies. It is my hope that the procedures described in this article will be beneficial for researchers in the field of strength and conditioning.
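The same kind of a priori sample-size calculation the tutorial carries out in G*Power can be reproduced programmatically. The sketch below uses statsmodels rather than G*Power (a substitution of convenience for illustration, not the software described in the article) for an independent-samples t-test with a medium effect size.

```python
from statsmodels.stats.power import TTestIndPower

# A priori sample size for an independent-samples t-test:
# medium effect (Cohen's d = 0.5), alpha = 0.05, desired power = 0.80.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05,
                                   power=0.80, ratio=1.0,
                                   alternative='two-sided')
print(f"required sample size per group ~ {n_per_group:.1f}")  # roughly 64 per group
```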

MonographDOI
01 Jan 2013
TL;DR: The author argues that traditional approaches to statistical analysis and regression analysis, as well as the logic of statistical inference, have changed in recent years and need to be rethought.
Abstract: PART ONE: GETTING STARTED WITH STATISTICAL ANALYSIS How Do I Prepare Data for Statistical Analysis? How Do I Examine Data Prior to Analysis? PART TWO: THE LOGIC OF STATISTICAL ANALYSIS: ISSUES REGARDING THE NATURE OF STATISTICS AND STATISTICAL TESTS Traditional Approaches to Statistical Analysis and the Logic of Statistical Inference Rethinking Traditional Paradigms Power, Effect Size and Hypothesis Testing Alternatives What Are the Assumptions of Statistical Testing? How Do I Select the Appropriate Statistical Test? PART THREE: ISSUES RELATED TO VARIABLES AND THEIR DISTRIBUTIONS How Do I Deal with Non-Normality, Missing Values and Outliers? Types of Variables and Their Treatment in Statistical Analysis PART FOUR: UNDERSTANDING THE BIG TWO: MAJOR QUESTIONS ABOUT ANALYSIS OF VARIANCE AND REGRESSION ANALYSIS Questions about Analysis of Variance Questions about Multiple Regression Analysis The Bigger Picture

Journal ArticleDOI
TL;DR: It is shown how the power may be standardized across different sample sizes in a wide range of models by considering the dependence of power on the number of groups used in the Hosmer-Lemeshow test.
Abstract: The Hosmer-Lemeshow test is a commonly used procedure for assessing goodness of fit in logistic regression. It has, for example, been widely used for evaluation of risk-scoring models. As with any statistical test, the power increases with sample size; this can be undesirable for goodness of fit tests because in very large data sets, small departures from the proposed model will be considered significant. By considering the dependence of power on the number of groups used in the Hosmer-Lemeshow test, we show how the power may be standardized across different sample sizes in a wide range of models. We provide and confirm mathematical derivations through simulation and analysis of data on 31,713 children from the Collaborative Perinatal Project. We make recommendations on how to choose the number of groups in the Hosmer-Lemeshow test based on sample size and provide example applications of the recommendations.
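The Hosmer-Lemeshow statistic itself is straightforward once fitted probabilities are grouped: C = Σ_g (O_g - E_g)² / [n_g · p̄_g · (1 - p̄_g)], referred to a chi-square distribution with g - 2 degrees of freedom. The sketch below implements that common decile-of-risk form and exposes the number of groups, the quantity the paper ties to power; grouping and tie-handling details may differ from the paper's implementation.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, n_groups=10):
    """Hosmer-Lemeshow goodness-of-fit test for a fitted logistic model.

    y: 0/1 outcomes; p_hat: fitted probabilities; n_groups: number of risk groups.
    Returns (chi-square statistic, p-value on n_groups - 2 df).
    """
    y, p_hat = np.asarray(y, float), np.asarray(p_hat, float)
    order = np.argsort(p_hat)
    groups = np.array_split(order, n_groups)      # roughly equal-sized risk groups
    stat = 0.0
    for g in groups:
        n_g = len(g)
        obs = y[g].sum()                          # observed events in the group
        exp = p_hat[g].sum()                      # expected events in the group
        p_bar = exp / n_g
        stat += (obs - exp) ** 2 / (n_g * p_bar * (1 - p_bar))
    df = n_groups - 2
    return stat, chi2.sf(stat, df)
```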

Journal ArticleDOI
TL;DR: A new test of the hypothesis H0 of equal covariance matrices is proposed and shown to enjoy certain optimality and to be especially powerful against sparse alternatives; applications to gene selection are also discussed.
Abstract: In the high-dimensional setting, this article considers three interrelated problems: (a) testing the equality of two covariance matrices Σ1 and Σ2; (b) recovering the support of Σ1 - Σ2; and (c) testing the equality of Σ1 and Σ2 row by row. We propose a new test for testing the hypothesis H0: Σ1 = Σ2 and investigate its theoretical and numerical properties. The limiting null distribution of the test statistic is derived and the power of the test is studied. The test is shown to enjoy certain optimality and to be especially powerful against sparse alternatives. The simulation results show that the test significantly outperforms the existing methods both in terms of size and power. Analysis of a prostate cancer dataset is carried out to demonstrate the application of the testing procedures. When the null hypothesis of equal covariance matrices is rejected, it is often of significant interest to further investigate how they differ from each other. Motivated by applications in genomics, we also consider recovering the support of Σ1 - Σ2 ...

01 Nov 2013
TL;DR: Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs, which suggests that many researchers do not know the correct interpretation of a CI.
Abstract: Null hypothesis significance testing (NHST) is undoubtedly the most common inferential technique used to justify claims in the social sciences. However, even staunch defenders of NHST agree that its outcomes are often misinterpreted. Confidence intervals (CIs) have frequently been proposed as a more useful alternative to NHST, and their use is strongly encouraged in the APA Manual. Nevertheless, little is known about how researchers interpret CIs. In this study, 120 researchers and 442 students—all in the field of psychology—were asked to assess the truth value of six particular statements involving different interpretations of a CI. Although all six statements were false, both researchers and students endorsed, on average, more than three statements, indicating a gross misunderstanding of CIs. Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever. Our findings suggest that many researchers do not know the correct interpretation of a CI. The misunderstandings surrounding p-values and CIs are particularly unfortunate because they constitute the main tools by which psychologists draw conclusions from data.

Journal ArticleDOI
TL;DR: It is demonstrated that optimal methods are based on continuity-corrected versions of the Wilson interval or Yates’ test, and that commonly-held beliefs about weaknesses of tests are misleading.
Abstract: Many statistical methods rely on an underlying mathematical model of probability based on a simple approximation, one that is simultaneously well-known and yet frequently misunderstood. The Normal approximation to the Binomial distribution underpins a range of statistical tests and methods, including the calculation of accurate confidence intervals, performing goodness of fit and contingency tests, line- and model-fitting, and computational methods based upon these. A common mistake is in assuming that, since the probable distribution of error about the “true value” in the population is approximately Normally distributed, the same can be said for the error about an observation. This paper is divided into two parts: fundamentals and evaluation. First, we examine the estimation of confidence intervals using three initial approaches: the “Wald” (Normal) interval, the Wilson score interval and the “exact” Clopper-Pearson Binomial interval. Whereas the first two can be calculated directly from formula...
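The Wilson score interval discussed above has a closed form. The sketch below implements the standard (uncorrected) version; the continuity-corrected variant the paper ultimately recommends is a modest extension of the same formula. The function name and example data are ours.

```python
from math import sqrt
from scipy.stats import norm

def wilson_interval(k, n, conf=0.95):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    z = norm.ppf(1 - (1 - conf) / 2)
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

# Example: 3 successes out of 10 trials -- contrast with the Wald interval,
# which behaves poorly this close to the boundary with so few observations.
print(tuple(round(v, 3) for v in wilson_interval(3, 10)))
```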

Journal ArticleDOI
TL;DR: It is proved theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, and it is recommended that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor.
Abstract: Authors have proposed new methodology in recent years for evaluating the improvement in prediction performance gained by adding a new predictor, Y, to a risk model containing a set of baseline predictors, X, for a binary outcome D. We prove theoretically that null hypotheses concerning no improvement in performance are equivalent to the simple null hypothesis that Y is not a risk factor when controlling for X, H0 : P(D = 1 | X,Y ) = P(D = 1 | X). Therefore, testing for improvement in prediction performance is redundant if Y has already been shown to be a risk factor. We also investigate properties of tests through simulation studies, focusing on the change in the area under the ROC curve (AUC). An unexpected finding is that standard testing procedures that do not adjust for variability in estimated regression coefficients are extremely conservative. This may explain why the AUC is widely considered insensitive to improvements in prediction performance and suggests that the problem of insensitivity has to do with use of invalid procedures for inference rather than with the measure itself. To avoid redundant testing and use of potentially problematic methods for inference, we recommend that hypothesis testing for no improvement be limited to evaluation of Y as a risk factor, for which methods are well developed and widely available. Analyses of measures of prediction performance should focus on estimation rather than on testing for no improvement in performance. Copyright © 2013 John Wiley & Sons, Ltd.
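Following the recommendation above, evaluating a new marker Y reduces to a standard test of Y's coefficient in a risk model that already contains X. The sketch below does this with a likelihood-ratio test in a logistic regression using statsmodels; the library choice, function name, and simulated data are ours, not the authors'.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def lr_test_new_marker(D, X, Y):
    """Likelihood-ratio test of H0: P(D=1 | X, Y) = P(D=1 | X)."""
    X = np.column_stack([np.ones(len(D)), X])            # intercept + baseline predictors
    full = sm.Logit(D, np.column_stack([X, Y])).fit(disp=0)
    reduced = sm.Logit(D, X).fit(disp=0)
    lr = 2 * (full.llf - reduced.llf)                    # likelihood-ratio statistic
    return lr, chi2.sf(lr, df=1)

# Simulated example: Y genuinely adds information beyond X.
rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = rng.normal(size=500)
D = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X + 0.4 * Y))))
print(lr_test_new_marker(D, X, Y))
```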

Journal ArticleDOI
01 Mar 2013 - Oikos
TL;DR: In this article, the authors compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey and found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy.
Abstract: Distribution models are used to predict the likelihood of occurrence or abundance of a species at locations where census data are not available. An integral part of modelling is the testing of model performance. We compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey. The four testing schemes we compared featured increasing independence between test and training data: resubstitution, random data hold-out and two spatially segregated data hold-out designs. The different testing measures also addressed different levels of information content in the dependent variable: regression R² for absolute abundance, squared correlation coefficient r² for relative abundance and AUC/Somers' D for presence/absence. We found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy. Even for data collected independently, spatial autocorrelation leads to dependence between random hold-out test data and training data, and thus to inflated measures of model performance. While there is a general awareness of the importance of autocorrelation to model building and hypothesis testing, its consequences via violation of independence between training and testing data have not been addressed systematically and comprehensively before. Furthermore, increasing information content (from correctly classifying presence/absence, to predicting relative abundance, to predicting absolute abundance) leads to decreasing predictive performance. The current tests for presence/absence distribution models are typically overly optimistic because a) the test and training data are not independent and b) the correct classification of presence/absence has a relatively low information content and thus capability to address ecological and conservation questions compared to a prediction of abundance. Meaningful evaluation of model performance requires testing on spatially independent data, if the intended application of the model is to predict into new geographic or climatic space, which arguably is the case for most applications of distribution models.

Journal ArticleDOI
TL;DR: The pitfalls of commonly used statistical techniques in dental research are described, recommendations for avoiding them are given, and the potential of some of the newer statistical techniques for dental research is explored.

Journal ArticleDOI
TL;DR: In this paper, a new scheme of hypothesis classification is proposed to facilitate and clarify the proper use of statistical hypothesis testing in empirical research, based on the explicated, sound relationship between the research and statistical hypotheses.

Journal ArticleDOI
TL;DR: This paper proposes a simultaneous multiple testing procedure for conditional dependence in GGMs that can control the false discovery rate (FDR) asymptotically; numerical results show that the method works quite well.
Abstract: This paper studies the estimation of a high-dimensional Gaussian graphical model (GGM). Typically, the existing methods depend on regularization techniques. As a result, it is necessary to choose the regularized parameter. However, the precise relationship between the regularized parameter and the number of false edges in GGM estimation is unclear. In this paper we propose an alternative method by a multiple testing procedure. Based on our new test statistics for conditional dependence, we propose a simultaneous testing procedure for conditional dependence in GGM. Our method can control the false discovery rate (FDR) asymptotically. The numerical performance of the proposed method shows that our method works quite well.
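The FDR-control step in a procedure of this kind is conceptually a Benjamini-Hochberg-style cutoff applied to the edge-wise tests. The sketch below shows plain Benjamini-Hochberg on a vector of p-values as a generic illustration only; the paper thresholds its own test statistics for conditional dependence and differs in detail.

```python
import numpy as np

def benjamini_hochberg(p_values, q=0.1):
    """Return a boolean mask of rejections controlling FDR at level q (BH step-up)."""
    p = np.asarray(p_values, float)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below))        # largest i with p_(i) <= q*i/m
        reject[order[:k + 1]] = True
    return reject

# Example: p-values from, say, edge-wise tests of conditional dependence.
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.6, 0.9]
print(benjamini_hochberg(pvals, q=0.1))
```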

Journal ArticleDOI
TL;DR: In this article, the authors extend the cross sectionally augmented panel unit root test proposed by Pesaran (2007) to the case of a multifactor error structure by exploiting information regarding the unobserved factors that are shared by other time series in addition to the variable under consideration.

Journal ArticleDOI
TL;DR: A test based on earlier work by Chernoff for binary hypothesis testing is shown to be first-order asymptotically optimal for multihypothesis testing in a strong sense, using the notion of decision-making risk in place of the overall probability of error.
Abstract: The problem of multiple hypothesis testing with observation control is considered in both fixed sample size and sequential settings. In the fixed sample size setting, for binary hypothesis testing, the optimal exponent for the maximal error probability corresponds to the maximum Chernoff information over the choice of controls, and a pure stationary open-loop control policy is asymptotically optimal within the larger class of all causal control policies. For multihypothesis testing in the fixed sample size setting, lower and upper bounds on the optimal error exponent are derived. It is also shown through an example with three hypotheses that the optimal causal control policy can be strictly better than the optimal open-loop control policy. In the sequential setting, a test based on earlier work by Chernoff for binary hypothesis testing is shown to be first-order asymptotically optimal for multihypothesis testing in a strong sense, using the notion of decision-making risk in place of the overall probability of error. Another test is designed to meet hard risk constraints while retaining asymptotic optimality. The role of past information and randomization in designing optimal control policies is discussed.

Book
01 Jan 2013
TL;DR: Roles of modeling in Statistical Inference: Likelihood construction and estimation, likelihood-based tests and confidence regions, Bayesian Inference, M-Estimation, Hypothesis Tests under Misspecification and Relaxed Assumptions.
Abstract: Roles of Modeling in Statistical Inference.- Likelihood Construction and Estimation.- Likelihood-Based Tests and Confidence Regions.- Bayesian Inference.- Large Sample Theory: The Basics.- Large Sample Results for Likelihood-Based Methods.- M-Estimation (Estimating Equations).- Hypothesis Tests under Misspecification and Relaxed Assumptions.- Monte Carlo Simulation Studies.- Jackknife.- Bootstrap.- Permutation and Rank Tests.- Appendix: Derivative Notation and Formulas.- References.- Author Index.- Example Index.- R-code Index.- Subject Index.

Journal ArticleDOI
TL;DR: A dimension reduction method is proposed that takes advantage of the hierarchical factor structure so that the integrals can be approximated far more efficiently, along with a new test statistic that can be substantially better calibrated and more powerful than the original M2 statistic when the test is long and the items are polytomous.
Abstract: In applications of item response theory, assessment of model fit is a critical issue. Recently, limited-information goodness-of-fit testing has received increased attention in the psychometrics literature. In contrast to full-information test statistics such as Pearson's X² or the likelihood ratio G², these limited-information tests utilize lower-order marginal tables rather than the full contingency table. A notable example is Maydeu-Olivares and colleagues' M2 family of statistics based on univariate and bivariate margins. When the contingency table is sparse, tests based on M2 retain better Type I error rate control than the full-information tests and can be more powerful. While in principle the M2 statistic can be extended to test hierarchical multidimensional item factor models (e.g., bifactor and testlet models), the computation is non-trivial. To obtain M2, a researcher often has to obtain (many thousands of) marginal probabilities, derivatives, and weights. Each of these must be approximated with high-dimensional numerical integration. We propose a dimension reduction method that can take advantage of the hierarchical factor structure so that the integrals can be approximated far more efficiently. We also propose a new test statistic that can be substantially better calibrated and more powerful than the original M2 statistic when the test is long and the items are polytomous. We use simulations to demonstrate the performance of our new methods and illustrate their effectiveness with applications to real data.

BookDOI
11 Jan 2013
TL;DR: Research Design and Statistical Analysis as discussed by the authors provides comprehensive coverage of the design principles and statistical concepts necessary to make sense of real data and provides a strong conceptual foundation to enable readers to generalize concepts to new research situations.
Abstract: Research Design and Statistical Analysis provides comprehensive coverage of the design principles and statistical concepts necessary to make sense of real data. The book’s goal is to provide a strong conceptual foundation to enable readers to generalize concepts to new research situations. Emphasis is placed on the underlying logic and assumptions of the analysis and what it tells the researcher, the limitations of the analysis, and the consequences of violating assumptions. Sampling, design efficiency, and statistical models are emphasized throughout. As per APA recommendations, emphasis is also placed on data exploration, effect size measures, confidence intervals, and using power analyses to determine sample size. "Real-world" data sets are used to illustrate data exploration, analysis, and interpretation. The book offers a rare blend of the underlying statistical assumptions, the consequences of their violations, and practical advice on dealing with them. Changes in the New Edition: Each section of the book concludes with a chapter that provides an integrated example of how to apply the concepts and procedures covered in the chapters of the section. In addition, the advantages and disadvantages of alternative designs are discussed. A new chapter (1) reviews the major steps in planning and executing a study, and the implications of those decisions for subsequent analyses and interpretations. A new chapter (13) compares experimental designs to reinforce the connection between design and analysis and to help readers achieve the most efficient research study. A new chapter (27) on common errors in data analysis and interpretation. Increased emphasis on power analyses to determine sample size using the G*Power 3 program. Many new data sets and problems. More examples of the use of SPSS (PASW) Version 17, although the analyses exemplified are readily carried out by any of the major statistical software packages. A companion website with the data used in the text and the exercises in SPSS and Excel formats; SPSS syntax files for performing analyses; extra material on logistic and multiple regression; technical notes that develop some of the formulas; and a solutions manual and the text figures and tables for instructors only. Part 1 reviews research planning, data exploration, and basic concepts in statistics including sampling, hypothesis testing, measures of effect size, estimators, and confidence intervals. Part 2 presents between-subject designs. The statistical models underlying the analysis of variance for these designs are emphasized, along with the role of expected mean squares in estimating effects of variables, the interpretation of interactions, and procedures for testing contrasts and controlling error rates. Part 3 focuses on repeated-measures designs and considers the advantages and disadvantages of different mixed designs. Part 4 presents detailed coverage of correlation and bivariate and multiple regression with emphasis on interpretation and common errors, and discusses the usefulness and limitations of these procedures as tools for prediction and for developing theory. This is one of the few books with coverage sufficient for a 2-semester course sequence in experimental design and statistics as taught in psychology, education, and other behavioral, social, and health sciences. Incorporating the analyses of both experimental and observational data provides continuity of concepts and notation. Prerequisites include courses on basic research methods and statistics.
The book is also an excellent resource for practicing researchers.

01 Jan 2013
TL;DR: In this paper, a Statistical Model Checking (SMC) approach based on Bayesian statistics is proposed for hybrid systems with stochastic transitions, a generalization of Simulink/Stateflow models.
Abstract: We address the problem of model checking stochastic systems, i.e., checking whether a stochastic system satisfies a certain temporal property with a probability greater (or smaller) than a fixed threshold. In particular, we present a Statistical Model Checking (SMC) approach based on Bayesian statistics. We show that our approach is feasible for a certain class of hybrid systems with stochastic transitions, a generalization of Simulink/Stateflow models. Standard approaches to stochastic discrete systems require numerical solutions for large optimization problems and quickly become infeasible with larger state spaces. Generalizations of these techniques to hybrid systems with stochastic effects are even more challenging. The SMC approach was pioneered by Younes and Simmons in the discrete and non-Bayesian case. It solves the verification problem by combining randomized sampling of system traces (which is very efficient for Simulink/Stateflow) with hypothesis testing (i.e., testing against a probability threshold).
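The core loop of the Bayesian SMC idea sketched in the abstract is sequential Bernoulli sampling: simulate traces, count how many satisfy the property, and stop once the Bayes factor for "satisfaction probability at least θ" versus "below θ" crosses a threshold. The sketch below uses a Beta prior and a user-supplied trace simulator; it illustrates the statistical test only, omits the hybrid-system machinery of the paper, and all names and defaults are ours.

```python
import random
from scipy.stats import beta

def bayesian_smc(simulate_trace, theta, threshold=100.0, prior=(1.0, 1.0),
                 max_samples=100_000):
    """Sequential Bayesian statistical model checking of P(property) >= theta.

    simulate_trace(): returns True if a sampled trace satisfies the property.
    Stops when the Bayes factor for H0: p >= theta vs H1: p < theta exceeds
    `threshold` (accept H0) or falls below 1/threshold (accept H1).
    """
    a, b = prior
    successes = failures = 0
    prior_h0 = beta.sf(theta, a, b)            # prior mass on p >= theta
    prior_h1 = 1.0 - prior_h0
    for n in range(1, max_samples + 1):
        if simulate_trace():
            successes += 1
        else:
            failures += 1
        post_h0 = beta.sf(theta, a + successes, b + failures)
        post_h1 = 1.0 - post_h0
        bayes_factor = (post_h0 / post_h1) * (prior_h1 / prior_h0)
        if bayes_factor >= threshold:
            return "accept H0 (p >= theta)", n
        if bayes_factor <= 1.0 / threshold:
            return "accept H1 (p < theta)", n
    return "undecided", max_samples

# Toy example: a "system" whose traces satisfy the property with probability 0.85.
print(bayesian_smc(lambda: random.random() < 0.85, theta=0.8))
```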

Journal ArticleDOI
TL;DR: It is demonstrated that simple extensions of TRCA can provide the most distinctive signals for two tasks and can integrate multiple modalities of information to remove task-unrelated artifacts.

Posted Content
TL;DR: In this paper, a multiple testing procedure is proposed to control the false discovery rate (FDR) asymptotically in Gaussian graphical model (GGM) estimation.
Abstract: This paper studies the estimation of high dimensional Gaussian graphical model (GGM). Typically, the existing methods depend on regularization techniques. As a result, it is necessary to choose the regularized parameter. However, the precise relationship between the regularized parameter and the number of false edges in GGM estimation is unclear. Hence, it is impossible to evaluate their performance rigorously. In this paper, we propose an alternative method by a multiple testing procedure. Based on our new test statistics for conditional dependence, we propose a simultaneous testing procedure for conditional dependence in GGM. Our method can control the false discovery rate (FDR) asymptotically. The numerical performance of the proposed method shows that our method works quite well.

Book ChapterDOI
01 Jan 2013
TL;DR: This chapter first reviews the fundamental concepts of machine learning such as feature assessment, unsupervised versus supervised learning and types of classification, and introduces some supervised learning methods.
Abstract: Traditional statistical tests are unable to handle large numbers of variables. The simplest method to reduce large numbers of variables is the use of add-up scores. But add-up scores do not account for the relative importance of the separate variables, their interactions, and differences in units. Machine learning can be defined as knowledge for making predictions as obtained from processing training data through a computer. If data sets involve multiple variables, data analyses will be complex, and modern computationally intensive methods will have to be applied for analysis.