Showing papers on "Statistical hypothesis testing published in 2018"

PDF

Open Access

Journal Article•DOI•

Correlation Coefficients: Appropriate Use and Interpretation.

[...]

Patrick Schober¹, Christa Boer, Lothar A. Schwarte•Institutions (1)

01 May 2018-Anesthesia & Analgesia

TL;DR: The aim of this tutorial is to guide researchers and clinicians in the appropriate use and interpretation of correlation coefficients.

...read moreread less

Abstract: Correlation in the broadest sense is a measure of an association between variables. In correlated data, the change in the magnitude of 1 variable is associated with a change in the magnitude of another variable, either in the same (positive correlation) or in the opposite (negative correlation) direction. Most often, the term correlation is used in the context of a linear relationship between 2 continuous variables and expressed as Pearson product-moment correlation. The Pearson correlation coefficient is typically used for jointly normally distributed data (data that follow a bivariate normal distribution). For nonnormally distributed continuous data, for ordinal data, or for data with relevant outliers, a Spearman rank correlation can be used as a measure of a monotonic association. Both correlation coefficients are scaled such that they range from -1 to +1, where 0 indicates that there is no linear or monotonic association, and the relationship gets stronger and ultimately approaches a straight line (Pearson correlation) or a constantly increasing or decreasing curve (Spearman correlation) as the coefficient approaches an absolute value of 1. Hypothesis tests and confidence intervals can be used to address the statistical significance of the results and to estimate the strength of the relationship in the population from which the data were sampled. The aim of this tutorial is to guide researchers and clinicians in the appropriate use and interpretation of correlation coefficients.

...read moreread less

3,452 citations

Journal Article•DOI•

Bayesian inference for psychology. Part II: Example applications with JASP

[...]

Eric-Jan Wagenmakers¹, Jonathon Love¹, Maarten Marsman¹, Tahira Jamil¹, Alexander Ly¹, Josine Verhagen¹, Ravi Selker¹, Quentin Frederik Gronau¹, Damian Dropmann¹, Bruno Boutin¹, Frans Meerhoff¹, Patrick Knight¹, Akash Raj², Erik-Jan van Kesteren¹, Johnny van Doorn¹, Martin Šmíra³, Sacha Epskamp¹, Alexander Etz⁴, Dora Matzke¹, Tim de Jong¹, Don van den Bergh¹, Alexandra Sarafoglou¹, Helen Steingroever¹, Koen Derks¹, Jeffrey N. Rouder⁵, Richard D. Morey⁶ - Show less +22 more•Institutions (6)

University of Amsterdam¹, Birla Institute of Technology and Science², Masaryk University³, University of California, Irvine⁴, University of Missouri⁵, Cardiff University⁶

01 Feb 2018-Psychonomic Bulletin & Review

TL;DR: This part of this series introduces JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems.

...read moreread less

Abstract: Bayesian hypothesis testing presents an attractive alternative to p value hypothesis testing. Part I of this series outlined several advantages of Bayesian hypothesis testing, including the ability to quantify evidence and the ability to monitor and update this evidence as data come in, without the need to know the intention with which the data were collected. Despite these and other practical advantages, Bayesian hypothesis tests are still reported relatively rarely. An important impediment to the widespread adoption of Bayesian tests is arguably the lack of user-friendly software for the run-of-the-mill statistical problems that confront psychologists for the analysis of almost every experiment: the t-test, ANOVA, correlation, regression, and contingency tables. In Part II of this series we introduce JASP (http://www.jasp-stats.org), an open-source, cross-platform, user-friendly graphical software package that allows users to carry out Bayesian hypothesis tests for standard statistical problems. JASP is based in part on the Bayesian analyses implemented in Morey and Rouder’s BayesFactor package for R. Armed with JASP, the practical advantages of Bayesian hypothesis testing are only a mouse click away.

...read moreread less

1,031 citations

Journal Article•DOI•

Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications.

[...]

Eric-Jan Wagenmakers¹, Maarten Marsman¹, Tahira Jamil¹, Alexander Ly¹, Josine Verhagen¹, Jonathon Love¹, Ravi Selker¹, Quentin Frederik Gronau¹, Martin Šmíra², Sacha Epskamp¹, Dora Matzke¹, Jeffrey N. Rouder³, Richard D. Morey⁴ - Show less +9 more•Institutions (4)

University of Amsterdam¹, Masaryk University², University of Missouri³, Cardiff University⁴

01 Feb 2018-Psychonomic Bulletin & Review

TL;DR: Ten prominent advantages of the Bayesian approach are outlined, and several objections to Bayesian hypothesis testing are countered.

...read moreread less

Abstract: Bayesian parameter estimation and Bayesian hypothesis testing present attractive alternatives to classical inference using confidence intervals and p values. In part I of this series we outline ten prominent advantages of the Bayesian approach. Many of these advantages translate to concrete opportunities for pragmatic researchers. For instance, Bayesian hypothesis testing allows researchers to quantify evidence and monitor its progression as data come in, without needing to know the intention with which the data were collected. We end by countering several objections to Bayesian hypothesis testing. Part II of this series discusses JASP, a free and open source software program that makes it easy to conduct Bayesian estimation and testing for a range of popular statistical scenarios (Wagenmakers et al. this issue).

...read moreread less

940 citations

Journal Article•DOI•

Equivalence Testing for Psychological Research: A Tutorial

[...]

Daniel Lakens, Anne M. Scheel, Peder M. Isager

01 Jun 2018

TL;DR: Two One-Sided Tests (TOSTs) as discussed by the authors were used to test both for the presence of an effect and for the absence of a effect in a test set.

...read moreread less

Abstract: Psychologists must be able to test both for the presence of an effect and for the absence of an effect. In addition to testing against zero, researchers can use the two one-sided tests (TOST) proce...

...read moreread less

721 citations

Journal Article•DOI•

Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests (tsfresh – A Python package)

[...]

Maximilian Christ, Nils Braun¹, Julius Neuffer, Andreas W. Kempa-Liehr², Andreas W. Kempa-Liehr³ - Show less +1 more•Institutions (3)

Karlsruhe Institute of Technology¹, University of Auckland², University of Freiburg³

13 Sep 2018-Neurocomputing

TL;DR: The Python package tsfresh (Time Series FeatuRe Extraction on basis of Scalable Hypothesis tests) accelerates this process by combining 63 time series characterization methods, which by default compute a total of 794 time series features, with feature selection on basis automatically configured hypothesis tests.

...read moreread less

626 citations

Journal Article•DOI•

The Bayesian New Statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective

[...]

John K. Kruschke¹, Torrin M. Liddell¹•Institutions (1)

Indiana University¹

01 Feb 2018-Psychonomic Bulletin & Review

TL;DR: In this paper, the authors compare Bayesian and frequentist approaches to hypothesis testing and estimation with confidence or credible intervals, and explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods.

...read moreread less

Abstract: In the practice of data analysis, there is a conceptual distinction between hypothesis testing, on the one hand, and estimation with quantified uncertainty on the other. Among frequentists in psychology, a shift of emphasis from hypothesis testing to estimation has been dubbed "the New Statistics" (Cumming 2014). A second conceptual distinction is between frequentist methods and Bayesian methods. Our main goal in this article is to explain how Bayesian methods achieve the goals of the New Statistics better than frequentist methods. The article reviews frequentist and Bayesian approaches to hypothesis testing and to estimation with confidence or credible intervals. The article also describes Bayesian approaches to meta-analysis, randomized controlled trials, and power analysis.

...read moreread less

562 citations

Raincloud plots: a multi-platform tool for robust data visualization.

[...]

Micah Allen¹, Micah Allen², Micah Allen³, Davide Poggiali⁴, Kirstie Whitaker³, Tom R. Marshall⁵, Rogier A. Kievit⁶ - Show less +3 more•Institutions (6)

Aarhus University Hospital¹, Aarhus University², University of Cambridge³, University of Padua⁴, University of Oxford⁵, Cognition and Brain Sciences Unit⁶

01 Jan 2018

TL;DR: This tutorial paper provides basic demonstrations of the strength of raincloud plots and similar approaches, outlines potential modifications for their optimal use, and provides open-source code for their streamlined implementation in R, Python and Matlab.

...read moreread less

Abstract: Across scientific disciplines, there is a rapidly growing recognition of the need for more statistically robust, transparent approaches to data visualization. Complementary to this, many scientists have called for plotting tools that accurately and transparently convey key aspects of statistical effects and raw data with minimal distortion. Previously common approaches, such as plotting conditional mean or median barplots together with error-bars have been criticized for distorting effect size, hiding underlying patterns in the raw data, and obscuring the assumptions upon which the most commonly used statistical tests are based. Here we describe a data visualization approach which overcomes these issues, providing maximal statistical information while preserving the desired 'inference at a glance' nature of barplots and other similar visualization devices. These "raincloud plots" can visualize raw data, probability density, and key summary statistics such as median, mean, and relevant confidence intervals in an appealing and flexible format with minimal redundancy. In this tutorial paper, we provide basic demonstrations of the strength of raincloud plots and similar approaches, outline potential modifications for their optimal use, and provide open-source code for their streamlined implementation in R, Python and Matlab ( https://github.com/RainCloudPlots/RainCloudPlots). Readers can investigate the R and Python tutorials interactively in the browser using Binder by Project Jupyter.

...read moreread less

505 citations

Journal Article•DOI•

Data-driven robust optimization

[...]

Dimitris Bertsimas¹, Vishal Gupta², Nathan Kallus³•Institutions (3)

Massachusetts Institute of Technology¹, University of Southern California², Cornell University³

01 Feb 2018-Mathematical Programming

TL;DR: This work proposes a novel schema for utilizing data to design uncertainty sets for robust optimization using statistical hypothesis tests, and shows that data-driven sets significantly outperform traditional robust optimization techniques whenever data is available.

...read moreread less

Abstract: The last decade witnessed an explosion in the availability of data for operations research applications. Motivated by this growing availability, we propose a novel schema for utilizing data to design uncertainty sets for robust optimization using statistical hypothesis tests. The approach is flexible and widely applicable, and robust optimization problems built from our new sets are computationally tractable, both theoretically and practically. Furthermore, optimal solutions to these problems enjoy a strong, finite-sample probabilistic guarantee whenever the constraints and objective function are concave in the uncertainty. We describe concrete procedures for choosing an appropriate set for a given application and applying our approach to multiple uncertain constraints. Computational evidence in portfolio management and queueing confirm that our data-driven sets significantly outperform traditional robust optimization techniques whenever data are available.

...read moreread less

430 citations

Posted Content•

Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

[...]

Sebastian Raschka¹•Institutions (1)

University of Wisconsin-Madison¹

13 Nov 2018-arXiv: Learning

TL;DR: Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible.

...read moreread less

Abstract: The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning. Common methods such as the holdout method for model evaluation and selection are covered, which are not recommended when working with small datasets. Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible. Common cross-validation techniques such as leave-one-out cross-validation and k-fold cross-validation are reviewed, the bias-variance trade-off for choosing k is discussed, and practical tips for the optimal choice of k are given based on empirical evidence. Different statistical tests for algorithm comparisons are presented, and strategies for dealing with multiple comparisons such as omnibus tests and multiple-comparison corrections are discussed. Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small.

...read moreread less

408 citations

Journal Article•DOI•

Rejecting or Accepting Parameter Values in Bayesian Estimation

[...]

John K. Kruschke¹•Institutions (1)

Indiana University¹

08 May 2018

TL;DR: In this paper, a decision rule that uses Bayesian posterior distributions as the basis for accepting or rejecting null values of parameters is presented, focusing on the range of plausible values indicated by the highest density interval of the posterior distribution and the relation between this range and a region of practical equivalence (ROPE) around the null value.

...read moreread less

Abstract: This article explains a decision rule that uses Bayesian posterior distributions as the basis for accepting or rejecting null values of parameters. This decision rule focuses on the range of plausible values indicated by the highest density interval of the posterior distribution and the relation between this range and a region of practical equivalence (ROPE) around the null value. The article also discusses considerations for setting the limits of a ROPE and emphasizes that analogous considerations apply to setting the decision thresholds for p values and Bayes factors.

...read moreread less

328 citations

Journal Article•DOI•

Applying the Model-Comparison Approach to Test Specific Research Hypotheses in Psychophysical Research Using the Palamedes Toolbox.

[...]

Nicolaas Prins¹, Frederick A. A. Kingdom²•Institutions (2)

University of Mississippi¹, McGill University²

23 Jul 2018-Frontiers in Psychology

TL;DR: A relatively new toolbox of Matlab routines called Palamedes is introduced which allows users to perform sophisticated model comparisons and generalize to any field in which statistical hypotheses are tested.

...read moreread less

Abstract: In the social sciences it is common practice to test specific theoretically motivated research hypotheses using formal statistical procedures. Typically, students in these disciplines are trained in such methods starting at an early stage in their academic tenure. On the other hand, in psychophysical research, where parameter estimates are generally obtained using a maximum-likelihood (ML) criterion and data do not lend themselves well to the least-squares methods taught in introductory courses, it is relatively uncommon to see formal model comparisons performed. Rather, it is common practice to estimate the parameters of interest (e.g., detection thresholds) and their standard errors individually across the different experimental conditions and to 'eyeball' whether the observed pattern of parameter estimates supports or contradicts some proposed hypothesis. We believe that this is at least in part due to a lack of training in the proper methodology as well as a lack of available software to perform such model comparisons when ML estimators are used. We introduce here a relatively new toolbox of Matlab routines called Palamedes which allows users to perform sophisticated model comparisons. In Palamedes, we implement the model-comparison approach to hypothesis testing. This approach allows researchers considerable flexibility in targeting specific research hypotheses. We discuss in a non-technical manner how this method can be used to perform statistical model comparisons when ML estimators are used. With Palamedes we hope to make sophisticated statistical model comparisons available to researchers who may not have the statistical background or the programming skills to perform such model comparisons from scratch. Note that while Palamedes is specifically geared toward psychophysical data, the core ideas behind the model-comparison approach that our paper discusses generalize to any field in which statistical hypotheses are tested.

...read moreread less

Journal Article•DOI•

Surrogate data for hypothesis testing of physical systems

[...]

Gemma Lancaster¹, Dmytro Iatsenko², Aleksandra Pidde¹, Aleksandra Pidde³, Valentina Ticcinelli¹, Aneta Stefanovska¹ - Show less +2 more•Institutions (3)

Lancaster University¹, Deutsche Bank², Pompeu Fabra University³

18 Jul 2018-Physics Reports

TL;DR: A detailed overview of a wide range of surrogate types is provided, which include Fourier transform based surrogates, which have since been developed to test increasingly varied null hypotheses while characterizing the dynamics of complex systems, including uncorrelated and correlated noise, coupling between systems, and synchronization.

...read moreread less

Journal Article•DOI•

Bayesian alternatives for common null-hypothesis significance tests in psychiatry: a non-technical guide using JASP

[...]

Daniel Quintana¹, Donald R. Williams²•Institutions (2)

Oslo University Hospital¹, University of California, Davis²

07 Jun 2018-BMC Psychiatry

TL;DR: An applied introduction to Bayesian inference with Bayes factors using JASP provides a straightforward means of performing reproducible Bayesian hypothesis tests using a graphical “point and click” environment that will be familiar to researchers conversant with other graphical statistical packages, such as SPSS.

...read moreread less

Abstract: Despite its popularity as an inferential framework, classical null hypothesis significance testing (NHST) has several restrictions. Bayesian analysis can be used to complement NHST, however, this approach has been underutilized largely due to a dearth of accessible software options. JASP is a recently developed open-source statistical package that facilitates both Bayesian and NHST analysis using a graphical interface. This article provides an applied introduction to Bayesian inference with Bayes factors using JASP. We use JASP to compare and contrast Bayesian alternatives for several common classical null hypothesis significance tests: correlations, frequency distributions, t-tests, ANCOVAs, and ANOVAs. These examples are also used to illustrate the strengths and limitations of both NHST and Bayesian hypothesis testing. A comparison of NHST and Bayesian inferential frameworks demonstrates that Bayes factors can complement p-values by providing additional information for hypothesis testing. Namely, Bayes factors can quantify relative evidence for both alternative and null hypotheses. Moreover, the magnitude of this evidence can be presented as an easy-to-interpret odds ratio. While Bayesian analysis is by no means a new method, this type of statistical inference has been largely inaccessible for most psychiatry researchers. JASP provides a straightforward means of performing reproducible Bayesian hypothesis tests using a graphical “point and click” environment that will be familiar to researchers conversant with other graphical statistical packages, such as SPSS.

...read moreread less

Journal Article•DOI•

A Survey of Statistical Model Checking

[...]

Gul Agha¹, Karl Palmskog¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

31 Jan 2018-ACM Transactions on Modeling and Computer Simulation

TL;DR: SMC provides a more widely applicable and scalable alternative to analysis of properties of stochastic systems using numerical and symbolic methods, while emphasizing current limitations and tradeoffs between precision and scalability.

...read moreread less

Abstract: Interactive, distributed, and embedded systems often behave stochastically, for example, when inputs, message delays, or failures conform to a probability distribution. However, reasoning analytically about the behavior of complex stochastic systems is generally infeasible. While simulations of systems are commonly used in engineering practice, they have not traditionally been used to reason about formal specifications. Statistical model checking (SMC) addresses this weakness by using a simulation-based approach to reason about precise properties specified in a stochastic temporal logic. A specification for a communication system may state that within some time bound, the probability that the number of messages in a queue will be greater than 5 must be less than 0.01. Using SMC, executions of a stochastic system are first sampled, after which statistical techniques are applied to determine whether such a property holds. While the output of sample-based methods are not always correct, statistical inference can quantify the confidence in the result produced. In effect, SMC provides a more widely applicable and scalable alternative to analysis of properties of stochastic systems using numerical and symbolic methods. SMC techniques have been successfully applied to analyze systems with large state spaces in areas such as computer networking, security, and systems biology. In this article, we survey SMC algorithms, techniques, and tools, while emphasizing current limitations and tradeoffs between precision and scalability.

...read moreread less

Journal Article•DOI•

On p-Values and Bayes Factors

[...]

Leonhard Held¹, Manuela Ott¹•Institutions (1)

University of Zurich¹

07 Mar 2018-Social Science Research Network

TL;DR: The relationship between p-values and minimum Bayes factors also depends on the sample size and on the dimension of the parameter of interest as discussed by the authors, and the relationship between the two-sided significance tests for a point null hypothesis in more detail.

...read moreread less

Abstract: The p-value quantifies the discrepancy between the data and a null hypothesis of interest, usually the assumption of no difference or no effect. A Bayesian approach allows the calibration of p-values by transforming them to direct measures of the evidence against the null hypothesis, so-called Bayes factors. We review the available literature in this area and consider two-sided significance tests for a point null hypothesis in more detail. We distinguish simple from local alternative hypotheses and contrast traditional Bayes factors based on the data with Bayes factors based on p-values or test statistics. A well-known finding is that the minimum Bayes factor, the smallest possible Bayes factor within a certain class of alternative hypotheses, provides less evidence against the null hypothesis than the corresponding p-value might suggest. It is less known that the relationship between p-values and minimum Bayes factors also depends on the sample size and on the dimension of the parameter of interest. We i...

...read moreread less

Journal Article•DOI•

Distributed testing and estimation under sparse high dimensional models.

[...]

Heather Battey¹, Heather Battey², Jianqing Fan², Jianqing Fan³, Han Liu², Junwei Lu², Ziwei Zhu² - Show less +3 more•Institutions (3)

Imperial College London¹, Princeton University², Fudan University³

03 May 2018-Annals of Statistics

TL;DR: This paper addresses the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible.

...read moreread less

Abstract: This paper studies hypothesis testing and parameter estimation in the context of the divide-and-conquer algorithm. In a unified likelihood based framework, we propose new test statistics and point estimators obtained by aggregating various statistics from k subsamples of size n/k, where n is the sample size. In both low dimensional and sparse high dimensional settings, we address the important question of how large k can be, as n grows large, such that the loss of efficiency due to the divide-and-conquer algorithm is negligible. In other words, the resulting estimators have the same inferential efficiencies and estimation rates as an oracle with access to the full sample. Thorough numerical results are provided to back up the theory.

...read moreread less

Journal Article•DOI•

Kernel-based Tests for Joint Independence

[...]

Niklas Pfister, Peter Bühlmann, Bernhard Schölkopf¹, Jonas Peters², Jonas Peters¹ - Show less +1 more•Institutions (2)

Max Planck Society¹, University of Copenhagen²

01 Jan 2018-Journal of The Royal Statistical Society Series B-statistical Methodology

TL;DR: This work embeds the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and defines the d‐variable Hilbert–Schmidt independence criterion dHSIC as the squared distance between the embeddings.

...read moreread less

Abstract: Summary We investigate the problem of testing whether d possibly multivariate random variables, which may or may not be continuous, are jointly (or mutually) independent. Our method builds on ideas of the two-variable Hilbert–Schmidt independence criterion but allows for an arbitrary number of variables. We embed the joint distribution and the product of the marginals in a reproducing kernel Hilbert space and define the d-variable Hilbert–Schmidt independence criterion dHSIC as the squared distance between the embeddings. In the population case, the value of dHSIC is 0 if and only if the d variables are jointly independent, as long as the kernel is characteristic. On the basis of an empirical estimate of dHSIC, we investigate three non-parametric hypothesis tests: a permutation test, a bootstrap analogue and a procedure based on a gamma approximation. We apply non-parametric independence testing to a problem in causal discovery and illustrate the new methods on simulated and real data sets.

...read moreread less

Journal Article•DOI•

Synthetic Control Method: Inference, Sensitivity Analysis and Confidence Sets

[...]

Sergio Firpo¹, Vítor Augusto Possebom²•Institutions (2)

Insper¹, Yale University²

25 Sep 2018

TL;DR: In this article, the authors propose parametric weights for the p-value that includes the equal weights benchmark of Abadie et al. and invert the test statistic to estimate confidence sets that quickly show the point-estimates' precision, and the test's significance and robustness.

...read moreread less

Abstract: We extend the inference procedure for the synthetic control method in two ways First, we propose parametric weights for the p-value that includes the equal weights benchmark of Abadie et al [1] By changing the value of this parameter, we can analyze the sensitivity of the test’s result to deviations from the equal weights benchmark Second, we modify the RMSPE statistic to test any sharp null hypothesis, including, as a specific case, the null hypothesis of no effect whatsoever analyzed by Abadie et al [1] Based on this last extension, we invert the test statistic to estimate confidence sets that quickly show the point-estimates’ precision, and the test’s significance and robustness We also extend these two tools to other test statistics and to problems with multiple outcome variables or multiple treated units Furthermore, in a Monte Carlo experiment, we find that the RMSPE statistic has good properties with respect to size, power and robustness Finally, we illustrate the usefulness of our proposed tools by reanalyzing the economic impact of ETA’s terrorism in the Basque Country, studied first by Abadie and Gardeazabal [2] and Abadie et al [3]

...read moreread less

Journal Article•DOI•

Choosing between methods of combining p-values

[...]

Nicholas A. Heard¹, Patrick Rubin-Delanchy²•Institutions (2)

Imperial College London¹, University of Bristol²

01 Mar 2018-Biometrika

TL;DR: A diverse range of p-value combination methods appear in the literature, each with different statistical properties as mentioned in this paper, and the final choice used in a meta-analysis can appear arbitrary, as if all effort has been expended building the models that gave rise to the pvalues.

...read moreread less

Abstract: Combining p-values from independent statistical tests is a popular approach to meta-analysis, particularly when the data underlying the tests are either no longer available or are difficult to combine. A diverse range of p-value combination methods appear in the literature, each with different statistical properties. Yet all too often the final choice used in a meta-analysis can appear arbitrary, as if all effort has been expended building the models that gave rise to the p-values. Birnbaum (1954) showed that any reasonable p-value combiner must be optimal against some alternative hypothesis. Starting from this perspective and recasting each method of combining p-values as a likelihood ratio test, we present theoretical results for some of the standard combiners which provide guidance about how a powerful combiner might be chosen in practice.

...read moreread less

Proceedings Article•

Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment

[...]

Masatoshi Tsuchiya¹•Institutions (1)

Toyohashi University of Technology¹

01 Apr 2018

TL;DR: The experimental result of the Stanford Natural Language Inference (SNLI) corpus indicates that the SNLI corpus has a hidden bias which allows prediction of textual entailment labels from hypothesis sentences even if no context information is given by a premise sentence.

...read moreread less

Abstract: The quality of training data is one of the crucial problems when a learning-centered approach is employed. This paper proposes a new method to investigate the quality of a large corpus designed for the recognizing textual entailment (RTE) task. The proposed method, which is inspired by a statistical hypothesis test, consists of two phases: the first phase is to introduce the predictability of textual entailment labels as a null hypothesis which is extremely unacceptable if a target corpus has no hidden bias, and the second phase is to test the null hypothesis using a Naive Bayes model. The experimental result of the Stanford Natural Language Inference (SNLI) corpus does not reject the null hypothesis. Therefore, it indicates that the SNLI corpus has a hidden bias which allows prediction of textual entailment labels from hypothesis sentences even if no context information is given by a premise sentence. This paper also presents the performance impact of NN models for RTE caused by this hidden bias.

...read moreread less

Journal Article•DOI•

Large-scale kernel methods for independence testing

[...]

Qinyi Zhang¹, Sarah Filippi², Arthur Gretton³, Dino Sejdinovic¹•Institutions (3)

University of Oxford¹, Imperial College London², University College London³

01 Jan 2018-Statistics and Computing

TL;DR: This contribution provides an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nyström and random Fourier feature approaches and demonstrates that the methods give comparable performance with existing methods while using significantly less computation time and memory.

...read moreread less

Abstract: Representations of probability measures in reproducing kernel Hilbert spaces provide a flexible framework for fully nonparametric hypothesis tests of independence, which can capture any type of departure from independence, including nonlinear associations and multivariate interactions. However, these approaches come with an at least quadratic computational cost in the number of observations, which can be prohibitive in many applications. Arguably, it is exactly in such large-scale datasets that capturing any type of dependence is of interest, so striking a favourable trade-off between computational efficiency and test performance for kernel independence tests would have a direct impact on their applicability in practice. In this contribution, we provide an extensive study of the use of large-scale kernel approximations in the context of independence testing, contrasting block-based, Nystrom and random Fourier feature approaches. Through a variety of synthetic data experiments, it is demonstrated that our large-scale methods give comparable performance with existing methods while using significantly less computation time and memory.

...read moreread less

Journal Article•DOI•

Small-Sample Methods for Cluster-Robust Variance Estimation and Hypothesis Testing in Fixed Effects Models

[...]

James E. Pustejovsky¹, Elizabeth Tipton²•Institutions (2)

University of Texas at Austin¹, Columbia University²

02 Oct 2018-Journal of Business & Economic Statistics

TL;DR: This paper proposed a generalization of BRL that can be applied in models with arbitrary sets of fixed effects, where the original BRL method is undefined, and describe how to apply the method when the regression is estimated after absorbing the fixed effects.

...read moreread less

Abstract: In panel data models and other regressions with unobserved effects, fixed effects estimation is often paired with cluster-robust variance estimation (CRVE) to account for heteroscedasticity and un-modeled dependence among the errors. Although asymptotically consistent, CRVE can be biased downward when the number of clusters is small, leading to hypothesis tests with rejection rates that are too high. More accurate tests can be constructed using bias-reduced linearization (BRL), which corrects the CRVE based on a working model, in conjunction with a Satterthwaite approximation for t-tests. We propose a generalization of BRL that can be applied in models with arbitrary sets of fixed effects, where the original BRL method is undefined, and describe how to apply the method when the regression is estimated after absorbing the fixed effects. We also propose a small-sample test for multiple-parameter hypotheses, which generalizes the Satterthwaite approximation for t-tests. In simulations covering a wide...

...read moreread less

Journal Article•DOI•

Inference for empirical Wasserstein distances on finite spaces

[...]

Max Sommerfeld¹, Axel Munk¹, Axel Munk²•Institutions (2)

University of Göttingen¹, Max Planck Society²

01 Jan 2018-Journal of The Royal Statistical Society Series B-statistical Methodology

TL;DR: The asymptotic distribution of empirical Wasserstein distances is derived as the optimal value of a linear programme with random objective function, which facilitates statistical inference in large generality.

...read moreread less

Abstract: Summary The Wasserstein distance is an attractive tool for data analysis but statistical inference is hindered by the lack of distributional limits. To overcome this obstacle, for probability measures supported on finitely many points, we derive the asymptotic distribution of empirical Wasserstein distances as the optimal value of a linear programme with random objective function. This facilitates statistical inference (e.g. confidence intervals for sample-based Wasserstein distances) in large generality. Our proof is based on directional Hadamard differentiability. Failure of the classical bootstrap and alternatives are discussed. The utility of the distributional results is illustrated on two data sets.

...read moreread less

Journal Article•DOI•

Social Learning and Distributed Hypothesis Testing

[...]

Anusha Lalitha¹, Tara Javidi¹, Anand D. Sarwate²•Institutions (2)

University of California, San Diego¹, Rutgers University²

15 May 2018-IEEE Transactions on Information Theory

TL;DR: In this article, the authors consider the problem of distributed hypothesis testing over a network and characterize the exponential rate of learning in terms of the nodes' influence of the network and the divergences between the observations' distributions.

...read moreread less

Abstract: This paper considers a problem of distributed hypothesis testing over a network. Individual nodes in a network receive noisy local (private) observations whose distribution is parameterized by a discrete parameter (hypothesis). The marginals of the joint observation distribution conditioned on each hypothesis are known locally at the nodes, but the true parameter/hypothesis is not known. An update rule is analyzed in which nodes first perform a Bayesian update of their belief (distribution estimate) of each hypothesis based on their local observations, communicate these updates to their neighbors, and then perform a “non-Bayesian” linear consensus using the log-beliefs of their neighbors. Under mild assumptions, we show that the belief of any node on a wrong hypothesis converges to zero exponentially fast. We characterize the exponential rate of learning, which we call the network divergence, in terms of the nodes’ influence of the network and the divergences between the observations’ distributions. For a broad class of observation statistics which includes distributions with unbounded support such as Gaussian mixtures, we show that rate of rejection of wrong hypothesis satisfies a large deviation principle, i.e., the probability of sample paths on which the rate of rejection of wrong hypothesis deviates from the mean rate vanishes exponentially fast and we characterize the rate function in terms of the nodes’ influence of the network and the local observation models.

...read moreread less

Journal Article•DOI•

Change Detection and the Causal Impact of the Yield Curve

[...]

Shuping Shi¹, Peter C.B. Phillips, Stan Hurn²•Institutions (2)

Macquarie University¹, Queensland University of Technology²

01 Nov 2018-Journal of Time Series Analysis

TL;DR: In this article, the authors developed a test for detecting changes in causal relationships based on a recursive evolving window, which is analogous to a procedure used in recent work on financial bubble detection.

...read moreread less

Abstract: Causal relationships in econometrics are typically based on the concept of predictability and are established by testing Granger causality. Such relationships are susceptible to change, especially during times of financial turbulence, making the real‐time detection of instability an important practical issue. This article develops a test for detecting changes in causal relationships based on a recursive evolving window, which is analogous to a procedure used in recent work on financial bubble detection. The limiting distribution of the test takes a simple form under the null hypothesis and is easy to implement in conditions of homoskedasticity and conditional heteroskedasticity of an unknown form. Bootstrap methods are used to control family‐wise size in implementation. Simulation experiments compare the efficacy of the proposed test with two other commonly used tests, the forward recursive and the rolling window tests. The results indicate that the recursive evolving approach offers the best finite sample performance, followed by the rolling window algorithm. The testing strategies are illustrated in an empirical application that explores the causal relationship between the slope of the yield curve and real economic activity in the United States over the period 1980–2015.

...read moreread less

Journal Article•DOI•

[...]

Pieter Vermeesch¹•Institutions (1)

University College London¹

01 Mar 2018-Earth-Science Reviews

TL;DR: This paper shows that the KFE-based approach produces sensible results, whereas the likeness and cross-correlation methods do not, and suggests that the added complexity of the K FE approach is only worth the effort in studies that combine data acquired on equipment with hugely variable analytical precision.

...read moreread less

Journal Article•DOI•

PCA in High Dimensions: An Orientation

[...]

Iain M. Johnstone¹, Debashis Paul²•Institutions (2)

Stanford University¹, University of California, Davis²

18 Jul 2018

TL;DR: The behavior of the bulk of the sample eigenvalues under weak distributional assumptions on the observations has been described and alternative classes of estimation procedures have been developed by exploiting sparsity of the eigenvectors or the covariance matrix.

...read moreread less

Abstract: When the data are high dimensional, widely used multivariate statistical methods such as principal component analysis can behave in unexpected ways. In settings where the dimension of the observations is comparable to the sample size, upward bias in sample eigenvalues and inconsistency of sample eigenvectors are among the most notable phenomena that appear. These phenomena, and the limiting behavior of the rescaled extreme sample eigenvalues, have recently been investigated in detail under the spiked covariance model. The behavior of the bulk of the sample eigenvalues under weak distributional assumptions on the observations has been described. These results have been exploited to develop new estimation and hypothesis testing methods for the population covariance matrix. Furthermore, partly in response to these phenomena, alternative classes of estimation procedures have been developed by exploiting sparsity of the eigenvectors or the covariance matrix. This paper gives an orientation to these areas.

...read moreread less

Journal Article•DOI•

Unconditional or Conditional Logistic Regression Model for Age-Matched Case-Control Data?

[...]

Chia-Ling Kuo¹, Yinghui Duan¹, James J. Grady¹•Institutions (1)

University of Connecticut Health Center¹

02 Mar 2018-Frontiers in Public Health

TL;DR: The results support the hypothesis that unconditional logistic regression is a proper method to perform, but the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status.

...read moreread less

Abstract: Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data when the matching between cases and controls is not unique and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose matching data and we hypothesize that unconditional logistic regression is a proper method to perform. To address the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to the matching distortion that the matching process not only makes cases and controls similar for matching variables but also for the exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored for negligible loss in testing and estimation if the distributions of matching variables are not extremely different between cases and controls.

...read moreread less

Journal Article•DOI•

Statistical Significance Versus Clinical Importance of Observed Effect Sizes: What Do P Values and Confidence Intervals Really Represent?

[...]

Patrick Schober¹, Sebastiaan M. Bossers, Lothar A. Schwarte•Institutions (1)

VU University Medical Center¹

01 Mar 2018-Anesthesia & Analgesia

TL;DR: This article reviewed different effect size measures and described how confidence intervals can be used to address not only the statistical significance but also the clinical significance of the observed effect or association, and discussed what P values actually represent and how they provide supplemental information about the significant versus nonsignificant dichotomy.

...read moreread less

Abstract: Effect size measures are used to quantify treatment effects or associations between variables. Such measures, of which >70 have been described in the literature, include unstandardized and standardized differences in means, risk differences, risk ratios, odds ratios, or correlations. While null hypothesis significance testing is the predominant approach to statistical inference on effect sizes, results of such tests are often misinterpreted, provide no information on the magnitude of the estimate, and tell us nothing about the clinically importance of an effect. Hence, researchers should not merely focus on statistical significance but should also report the observed effect size. However, all samples are to some degree affected by randomness, such that there is a certain uncertainty on how well the observed effect size represents the actual magnitude and direction of the effect in the population. Therefore, point estimates of effect sizes should be accompanied by the entire range of plausible values to quantify this uncertainty. This facilitates assessment of how large or small the observed effect could actually be in the population of interest, and hence how clinically important it could be. This tutorial reviews different effect size measures and describes how confidence intervals can be used to address not only the statistical significance but also the clinical significance of the observed effect or association. Moreover, we discuss what P values actually represent, and how they provide supplemental information about the significant versus nonsignificant dichotomy. This tutorial intentionally focuses on an intuitive explanation of concepts and interpretation of results, rather than on the underlying mathematical theory or concepts.

...read moreread less

Book Chapter•DOI•

Testing for autocorrelation in linear regression models: a survey

[...]

Maxwell L. King

05 Mar 2018

TL;DR: A survey of the literature concerned with testing for autocorrelation in the context of the linear regression model can be found in this article, where the authors attempt to survey the vast and varied literature.

...read moreread less

Abstract: The seminal work of Cochrane and Orcutt (1949) did much to alert econometricians to the difficulties of assuming uncorrelated disturbances in time series applications of the general linear model. Because the neglect of correlation between regression disturbances can lead to inefficient parameter estimates, misleading inferences from hypothesis tests and inefficient predictions, the desirability of testing for the presence of such correlation is widely accepted. This paper attempts to survey the vast and varied literature concerned with testing for autocorrelation in the context of the linear regression model. This particular testing problem must surely be the most intensely researched testing problem in econometrics. We will therefore be interested to see what lessons concerning hypothesis testing in econometrics can be learnt from the study of this literature.

...read moreread less

Collapse