
Showing papers on "Statistical hypothesis testing published in 2017"


Book
17 Apr 2017
TL;DR: In this book, a preliminary survey of heads and tails, the laws of large numbers, and the central limit theorem is followed by chapters on random processes with independent increments, inductive reasoning and statistical inference (covering the cases of independence, dependence, and exchangeability), and a Bayesian treatment of mathematical statistics.
Abstract: Part 7 A preliminary survey: heads and tails - preliminary considerations; heads and tails - the random process; laws of "large numbers"; the "central limit theorem". Part 8 Random processes with independent increments: the case of asymptotic normality; the Wiener-Levy process; behaviour and asymptotic behaviour; ruin problems; ballot problems. Part 9 An introduction to other types of stochastic process: Markov processes; stationary processes. Part 10 Problems in higher dimensions: second-order characteristics and the normal distribution; the discrete case; the continuous case; the case of spherical symmetry. Part 11 Inductive reasoning, statistical inference: the basic formulation and preliminary clarifications; the case of independence and the case of dependence; exchangeability. Part 12 Mathematical statistics: the scope and limits of the treatment; the likelihood principle and sufficient statistics; a Bayesian approach to "estimation" and "hypothesis testing"; the connections with decision theory.

614 citations


Journal ArticleDOI
TL;DR: This article recasts Judd et al.'s approach in the path-analytic framework that is now commonly used in between-participant mediation analysis, extends the method to models with multiple mediators operating in parallel and serially, and discusses the comparison of indirect effects in these more complex models.
Abstract: Researchers interested in testing mediation often use designs where participants are measured on a dependent variable Y and a mediator M in both of 2 different circumstances. The dominant approach to assessing mediation in such a design, proposed by Judd, Kenny, and McClelland (2001), relies on a series of hypothesis tests about components of the mediation model and is not based on an estimate of or formal inference about the indirect effect. In this article we recast Judd et al.'s approach in the path-analytic framework that is now commonly used in between-participant mediation analysis. By so doing, it is apparent how to estimate the indirect effect of a within-participant manipulation on some outcome through a mediator as the product of paths of influence. This path-analytic approach eliminates the need for discrete hypothesis tests about components of the model to support a claim of mediation, as Judd et al.'s method requires, because it relies only on an inference about the product of paths: the indirect effect. We generalize methods of inference for the indirect effect widely used in between-participant designs to this within-participant version of mediation analysis, including bootstrap confidence intervals and Monte Carlo confidence intervals. Using this path-analytic approach, we extend the method to models with multiple mediators operating in parallel and serially and discuss the comparison of indirect effects in these more complex models. We offer macros and code for SPSS, SAS, and Mplus that conduct these analyses.
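The authors supply SPSS, SAS, and Mplus macros, which are not reproduced here. The following is only a minimal Python sketch (numpy assumed) of the core idea: the indirect effect of a two-condition within-participant manipulation estimated as a product of paths, with a percentile bootstrap confidence interval, on simulated data with hypothetical variable names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical within-participant data: each row is one participant measured
# on mediator M and outcome Y under conditions 1 and 2.
n = 80
m1 = rng.normal(0.0, 1.0, n)
m2 = m1 + rng.normal(0.5, 1.0, n)                  # condition shifts the mediator
y1 = 0.4 * m1 + rng.normal(0.0, 1.0, n)
y2 = 0.4 * m2 + 0.2 + rng.normal(0.0, 1.0, n)

def indirect_effect(m1, m2, y1, y2):
    """Path-analytic indirect effect for a two-condition within-subject design.

    a = mean change in the mediator; b = slope of the change in Y on the change
    in M, controlling for the mean-centered sum of the mediators (a simplified
    version of the path-analytic formulation)."""
    dm, dy = m2 - m1, y2 - y1
    msum_c = (m1 + m2) - (m1 + m2).mean()
    a = dm.mean()
    X = np.column_stack([np.ones_like(dm), dm, msum_c])
    b = np.linalg.lstsq(X, dy, rcond=None)[0][1]
    return a * b

est = indirect_effect(m1, m2, y1, y2)

# Percentile bootstrap CI: resample participants with replacement.
boot = np.empty(5000)
for i in range(boot.size):
    idx = rng.integers(0, n, n)
    boot[i] = indirect_effect(m1[idx], m2[idx], y1[idx], y2[idx])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {est:.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```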

559 citations


Journal ArticleDOI
TL;DR: A brief review of the mathematical framework, general concepts, and common methods of adjustment for multiple comparisons is provided, which is expected to facilitate the understanding and selection of adjustment methods.
Abstract: In experimental research a scientific conclusion is always drawn from the statistical testing of hypotheses, in which an acceptable cutoff of probability, such as 0.05 or 0.01, is used for decision-making. However, the probability of committing false statistical inferences would considerably increase when more than one hypothesis is simultaneously tested (namely the multiple comparisons), which therefore requires proper adjustment. Although the adjustment for multiple comparisons is proposed to be mandatory in some journals, it still remains difficult to select a proper method suitable for the various experimental properties and study purposes, especially for researchers without a good background in statistics. In the present paper, we provide a brief review of the mathematical framework, general concepts and common methods of adjustment for multiple comparisons, which is expected to facilitate the understanding and selection of adjustment methods.
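As a concrete illustration of two of the adjustment methods such reviews typically cover, the sketch below (numpy assumed) implements the Bonferroni and Benjamini-Hochberg corrections by hand on a hypothetical set of p-values; it is not code from the paper.

```python
import numpy as np

def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg step-up adjustment controlling the false discovery rate."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity from the largest p-value downwards
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Hypothetical p-values from five simultaneous tests.
p = [0.003, 0.012, 0.021, 0.040, 0.310]
print("Bonferroni:", np.round(bonferroni(p), 3))
print("BH (FDR):  ", np.round(benjamini_hochberg(p), 3))
```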

398 citations


Journal ArticleDOI
TL;DR: This contribution investigates the properties of a procedure for Bayesian hypothesis testing that allows optional stopping with unlimited multiple testing, even after each participant, and examines the long-term rate of misleading evidence, the average expected sample sizes, and the bias of effect size estimates when an SBF design is applied to a test of mean differences between 2 groups.
Abstract: Unplanned optional stopping rules have been criticized for inflating Type I error rates under the null hypothesis significance testing (NHST) paradigm. Despite these criticisms, this research practice is not uncommon, probably because it appeals to researchers' intuition to collect more data to push an indecisive result into a decisive region. In this contribution, we investigate the properties of a procedure for Bayesian hypothesis testing that allows optional stopping with unlimited multiple testing, even after each participant. In this procedure, which we call Sequential Bayes Factors (SBFs), Bayes factors are computed until an a priori defined level of evidence is reached. This allows flexible sampling plans and is not dependent upon correct effect size guesses in an a priori power analysis. We investigated the long-term rate of misleading evidence, the average expected sample sizes, and the bias of effect size estimates when an SBF design is applied to a test of mean differences between 2 groups. Compared with optimal NHST, the SBF design typically needs 50% to 70% smaller samples to reach a conclusion about the presence of an effect, while having the same or lower long-term rate of wrong inference.
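A toy sketch of a Sequential Bayes Factor stopping rule follows (numpy only). It uses a BIC-based approximation to the Bayes factor for a two-group mean difference rather than the default Bayes factors studied in the paper; the evidence threshold, batch size, and effect size are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def bic_bayes_factor(x, y):
    """BIC-approximated Bayes factor BF10 for a difference in means between two
    groups (Gaussian likelihood, MLE variance); a rough stand-in for the default
    Bayes factors used in SBF designs."""
    z = np.concatenate([x, y])
    n = z.size
    rss0 = np.sum((z - z.mean()) ** 2)                                # null: one common mean
    rss1 = np.sum((x - x.mean()) ** 2) + np.sum((y - y.mean()) ** 2)  # alternative: two means
    bic0 = n * np.log(rss0 / n) + 2 * np.log(n)                       # mean + variance
    bic1 = n * np.log(rss1 / n) + 3 * np.log(n)                       # two means + variance
    return np.exp((bic0 - bic1) / 2)

threshold, batch, max_n = 10.0, 10, 500
x = rng.normal(0.4, 1.0, batch)      # true standardized effect of 0.4
y = rng.normal(0.0, 1.0, batch)
while True:
    bf = bic_bayes_factor(x, y)
    if bf > threshold or bf < 1 / threshold or x.size >= max_n:
        break
    x = np.append(x, rng.normal(0.4, 1.0, batch))   # optional stopping:
    y = np.append(y, rng.normal(0.0, 1.0, batch))   # collect another batch and re-test
print(f"stopped at n = {x.size} per group, BF10 = {bf:.2f}")
```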

327 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that in the presence of unit and time fixed effects, it is impossible to identify the linear component of the path of pre-trends and dynamic treatment effects.
Abstract: A broad empirical literature uses "event study" research designs for treatment effect estimation, a setting in which all units in the panel receive treatment but at random times. We make four novel points about identification and estimation of causal effects in this setting and show their practical relevance. First, we show that in the presence of unit and time fixed effects, it is impossible to identify the linear component of the path of pre-trends and dynamic treatment effects. Second, we propose graphical and statistical tests for pre-trends. Third, we consider commonly-used "static" regressions, with a treatment dummy instead of a full set of leads and lags around the treatment event, and we show that OLS does not recover a weighted average of the treatment effects: long-term effects are weighted negatively, and we introduce a different estimator that is robust to this issue. Fourth, we show that equivalent problems of under-identification and negative weighting arise in difference-in-differences settings when the control group is allowed to be on a different time trend or in the presence of unit-specific time trends. Finally, we show the practical relevance of these issues in a series of examples from the existing literature, with a focus on the estimation of the marginal propensity to consume out of tax rebates.

323 citations


Journal ArticleDOI
TL;DR: It is shown that permutations of the raw observational (or ‘pre‐network’) data consistently account for underlying structure in the generated social network, and thus can reduce both type I and type II error rates.
Abstract: Null models are an important component of the social network analysis toolbox. However, their use in hypothesis testing is still not widespread. Furthermore, several different approaches for constructing null models exist, each with their relative strengths and weaknesses, and often testing different hypotheses. In this study, I highlight why null models are important for robust hypothesis testing in studies of animal social networks. Using simulated data containing a known observation bias, I test how different statistical tests and null models perform if such a bias was unknown. I show that permutations of the raw observational (or 'pre-network') data consistently account for underlying structure in the generated social network, and thus can reduce both type I and type II error rates. However, permutations of pre-network data remain relatively uncommon in animal social network analysis because they are challenging to implement for certain data types, particularly those from focal follows and GPS tracking. I explain simple routines that can easily be implemented across different types of data, and supply R code that applies each type of null model to the same simulated dataset. The R code can easily be modified to test hypotheses with empirical data. Widespread use of pre-network data permutation methods will benefit researchers by facilitating robust hypothesis testing.
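The R code supplied with the paper is not reproduced here. As a rough Python sketch (numpy only) of the idea of pre-network permutations, the following permutes a synthetic group-by-individual matrix with checkerboard swaps that preserve row and column totals before rebuilding the network each time; the association index, swap routine, and test statistic are simplified illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical group-by-individual matrix: rows are sampling events,
# columns are individuals, 1 = individual observed in that event.
n_events, n_ind = 60, 20
gbi = (rng.random((n_events, n_ind)) < 0.3).astype(int)
trait = rng.normal(size=n_ind)                       # hypothetical individual trait

def network_strength(gbi):
    """Node strength from a simple co-occurrence association index."""
    counts = gbi.sum(axis=0)
    co = gbi.T @ gbi
    denom = counts[:, None] + counts[None, :] - co
    with np.errstate(divide="ignore", invalid="ignore"):
        assoc = np.where(denom > 0, co / denom, 0.0)
    np.fill_diagonal(assoc, 0.0)
    return assoc.sum(axis=1)

def statistic(gbi, trait):
    return np.corrcoef(network_strength(gbi), trait)[0, 1]

def checkerboard_swap(gbi, n_swaps, rng):
    """Pre-network permutation: swap 2x2 'checkerboard' submatrices so that row
    and column totals (event sizes, sightings per individual) are preserved."""
    m = gbi.copy()
    done = 0
    while done < n_swaps:
        r = rng.integers(0, m.shape[0], 2)
        c = rng.integers(0, m.shape[1], 2)
        sub = m[np.ix_(r, c)]
        if (sub[0, 0] == sub[1, 1] == 1 and sub[0, 1] == sub[1, 0] == 0) or \
           (sub[0, 0] == sub[1, 1] == 0 and sub[0, 1] == sub[1, 0] == 1):
            m[np.ix_(r, c)] = 1 - sub
            done += 1
    return m

obs = statistic(gbi, trait)
null = []
m = gbi
for _ in range(1000):
    m = checkerboard_swap(m, 20, rng)        # sequential swaps build the null chain
    null.append(statistic(m, trait))
p = np.mean(np.abs(null) >= abs(obs))
print(f"observed r = {obs:.3f}, pre-network permutation p = {p:.3f}")
```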

312 citations


Journal ArticleDOI
26 Jan 2017-Entropy
TL;DR: This work forms a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves, to multivariate tests involving energy statistics and kernel based maximum mean discrepancy, to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
Abstract: Nonparametric two-sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having been designed and analyzed, both for the unidimensional and the multivariate setting. In this short survey, we focus on test statistics that involve the Wasserstein distance. Using an entropic smoothing of the Wasserstein distance, we connect these to very different tests including multivariate methods involving energy statistics and kernel based maximum mean discrepancy and univariate methods like the Kolmogorov–Smirnov test, probability or quantile (PP/QQ) plots and receiver operating characteristic or ordinal dominance (ROC/ODC) curves. Some observations are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two-sample testing’s classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.
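A small illustration (numpy/scipy assumed) of a univariate two-sample test that uses the 1-Wasserstein distance as its statistic with a permutation null, shown next to the classical Kolmogorov-Smirnov test; the data and permutation scheme are illustrative and not taken from the survey.

```python
import numpy as np
from scipy.stats import wasserstein_distance, ks_2samp

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, 200)
y = rng.normal(0.3, 1.2, 200)          # shifted and rescaled alternative

# Two-sample test: 1-Wasserstein distance as the statistic, permutation null.
obs = wasserstein_distance(x, y)
pooled = np.concatenate([x, y])
null = np.empty(2000)
for i in range(null.size):
    rng.shuffle(pooled)
    null[i] = wasserstein_distance(pooled[:x.size], pooled[x.size:])
p_perm = (1 + np.sum(null >= obs)) / (1 + null.size)

ks = ks_2samp(x, y)
print(f"Wasserstein statistic = {obs:.3f}, permutation p = {p_perm:.4f}")
print(f"KS test: D = {ks.statistic:.3f}, p = {ks.pvalue:.4f}")
```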

287 citations


Journal ArticleDOI
TL;DR: The decorrelated score function can be used to construct point and confidence region estimators that are semiparametrically efficient, and the framework extends to handle high dimensional null hypotheses, where the number of parameters of interest can increase exponentially fast with the sample size.
Abstract: We consider the problem of uncertainty assessment for low dimensional components in high dimensional models. Specifically, we propose a novel decorrelated score function to handle the impact of high dimensional nuisance parameters. We consider both hypothesis tests and confidence regions for generic penalized M-estimators. Unlike most existing inferential methods which are tailored for individual models, our method provides a general framework for high dimensional inference and is applicable to a wide variety of applications. In particular, we apply this general framework to study five illustrative examples: linear regression, logistic regression, Poisson regression, Gaussian graphical model and additive hazards model. For hypothesis testing, we develop general theorems to characterize the limiting distributions of the decorrelated score test statistic under both null hypothesis and local alternatives. These results provide asymptotic guarantees on the type I errors and local powers. For confidence region construction, we show that the decorrelated score function can be used to construct point estimators that are asymptotically normal and semiparametrically efficient. We further generalize this framework to handle the settings of misspecified models. Thorough numerical results are provided to back up the developed theory.

231 citations


Journal Article
TL;DR: In this article, the authors argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better - more sound and useful - alternatives for NHST in machine learning.
Abstract: The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields however realized the shortcomings of frequentist reasoning and in the most radical cases even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better - more sound and useful - alternatives for it.

227 citations


Journal ArticleDOI
TL;DR: A novel estimation technique is presented that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables that results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale.
Abstract: We begin by reviewing the statistical framework of information theory as applicable to neuroimaging data analysis. A major factor hindering wider adoption of this framework in neuroimaging is the difficulty of estimating information theoretic quantities in practice. We present a novel estimation technique that combines the statistical theory of copulas with the closed form solution for the entropy of Gaussian variables. This results in a general, computationally efficient, flexible, and robust multivariate statistical framework that provides effect sizes on a common meaningful scale, allows for unified treatment of discrete, continuous, unidimensional and multidimensional variables, and enables direct comparisons of representations from behavioral and brain responses across any recording modality. We validate the use of this estimate as a statistical test within a neuroimaging context, considering both discrete stimulus classes and continuous stimulus features. We also present examples of analyses facilitated by these developments, including application of multivariate analyses to MEG planar magnetic field gradients, and pairwise temporal interactions in evoked EEG responses. We show the benefit of considering the instantaneous temporal derivative together with the raw values of M/EEG signals as a multivariate response, how we can separately quantify modulations of amplitude and direction for vector quantities, and how we can measure the emergence of novel information over time in evoked responses. Open-source Matlab and Python code implementing the new methods accompanies this article.
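The authors' open-source toolbox handles multivariate, discrete, and vector-valued cases; it is not reproduced here. The sketch below (numpy/scipy assumed) shows only the core estimation step for two univariate continuous variables: rank-transform each variable to Gaussian margins and apply the closed-form Gaussian mutual information.

```python
import numpy as np
from scipy.stats import rankdata, norm

def copula_normalize(x):
    """Rank-transform a 1-D variable to standard normal margins
    (the empirical copula step)."""
    r = rankdata(x) / (len(x) + 1)
    return norm.ppf(r)

def gcmi_1d(x, y):
    """Gaussian-copula estimate of mutual information (in bits) between two
    univariate continuous variables: Gaussian MI of the copula-normalized data."""
    cx, cy = copula_normalize(x), copula_normalize(y)
    r = np.corrcoef(cx, cy)[0, 1]
    return -0.5 * np.log2(1 - r ** 2)

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
y = np.tanh(x) + 0.5 * rng.normal(size=1000)    # nonlinear, monotonic dependence
print(f"Gaussian-copula MI estimate: {gcmi_1d(x, y):.3f} bits")
```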

222 citations


Journal ArticleDOI
TL;DR: It is found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions.
Abstract: Although the statistical tools most often used by researchers in the field of psychology over the last 25 years are based on frequentist statistics, it is often claimed that the alternative Bayesian approach to statistics is gaining in popularity. In the current article, we investigated this claim by performing the very first systematic review of Bayesian psychological articles published between 1990 and 2015 (n = 1,579). We aim to provide a thorough presentation of the role Bayesian statistics plays in psychology. This historical assessment allows us to identify trends and see how Bayesian methods have been integrated into psychological research in the context of different statistical frameworks (e.g., hypothesis testing, cognitive models, IRT, SEM, etc.). We also describe take-home messages and provide "big-picture" recommendations to the field as Bayesian statistics becomes more popular. Our review indicated that Bayesian statistics is used in a variety of contexts across subfields of psychology and related disciplines. There are many different reasons why one might choose to use Bayes (e.g., the use of priors, estimating otherwise intractable models, modeling uncertainty, etc.). We found in this review that the use of Bayes has increased and broadened in the sense that this methodology can be used in a flexible manner to tackle many different forms of questions. We hope this presentation opens the door for a larger discussion regarding the current state of Bayesian statistics, as well as future trends.

Journal ArticleDOI
TL;DR: In this article, a detailed exposition of the statistical notion of stationarity and of statistical testing for dynamic functional connectivity is presented; however, the authors do not consider the effect of the random sampling variability of static functional connectivity.

Journal ArticleDOI
TL;DR: It is suggested that, after sustained negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research.
Abstract: Null hypothesis significance testing (NHST) has several shortcomings that are likely contributing factors behind the widely debated replication crisis of (cognitive) neuroscience, psychology and biomedical science in general. We review these shortcomings and suggest that, after about 60 years of negative experience, NHST should no longer be the default, dominant statistical practice of all biomedical and psychological research. If theoretical predictions are weak we should not rely on all or nothing hypothesis tests. Different inferential methods may be most suitable for different types of research questions. Whenever researchers use NHST they should justify its use, and publish pre-study power calculations and effect sizes, including negative findings. Hypothesis-testing studies should be pre-registered and optimally raw data published. The current statistics lite educational approach for students that has sustained the widespread, spurious use of NHST should be phased out. Instead, we should encourage either more in-depth statistical training of more researchers and/or more widespread involvement of professional statisticians in all research.

Journal ArticleDOI
TL;DR: The results of analyses of the Type 1 error efficiency and power of standard parametric and non-parametric statistical tests when applied to non-normal data sets are summarised.
Abstract: There have been many changes in statistical theory in the past 30 years, including increased evidence that non-robust methods may fail to detect important results. The statistical advice available to software engineering researchers needs to be updated to address these issues. This paper aims both to explain the new results in the area of robust analysis methods and to provide a large-scale worked example of the new methods. We summarise the results of analyses of the Type 1 error efficiency and power of standard parametric and non-parametric statistical tests when applied to non-normal data sets. We identify parametric and non-parametric methods that are robust to non-normality. We present an analysis of a large-scale software engineering experiment to illustrate their use. We illustrate the use of kernel density plots, and parametric and non-parametric methods using four different software engineering data sets. We explain why the methods are necessary and the rationale for selecting a specific analysis. We suggest using kernel density plots rather than box plots to visualise data distributions. For parametric analysis, we recommend trimmed means, which can support reliable tests of the differences between the central location of two or more samples. When the distribution of the data differs among groups, or we have ordinal scale data, we recommend non-parametric methods such as Cliff's δ or a robust rank-based ANOVA-like method.
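A short Python sketch (numpy/scipy assumed) of two of the recommended robust quantities, Cliff's δ and a 20% trimmed-mean difference (here paired with a permutation p-value), applied to synthetic skewed samples; the rank-based ANOVA-like procedures discussed in the paper are not reproduced.

```python
import numpy as np
from scipy.stats import trim_mean

rng = np.random.default_rng(5)
a = rng.lognormal(0.0, 0.8, 40)        # skewed, non-normal samples
b = rng.lognormal(0.3, 0.8, 40)

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs."""
    diff = a[:, None] - b[None, :]
    return (np.sum(diff > 0) - np.sum(diff < 0)) / diff.size

def trimmed_diff(a, b, cut=0.2):
    """Difference of 20% trimmed means, a robust measure of central location shift."""
    return trim_mean(a, cut) - trim_mean(b, cut)

obs = trimmed_diff(a, b)
pooled = np.concatenate([a, b])
null = np.empty(4000)
for i in range(null.size):
    rng.shuffle(pooled)
    null[i] = trimmed_diff(pooled[:a.size], pooled[a.size:])
p = (1 + np.sum(np.abs(null) >= abs(obs))) / (1 + null.size)

print(f"Cliff's delta = {cliffs_delta(a, b):.3f}")
print(f"20% trimmed-mean difference = {obs:.3f}, permutation p = {p:.4f}")
```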

Proceedings Article
30 Oct 2017
TL;DR: PixelDefend purifies a maliciously perturbed image by moving it back towards the distribution seen in the training data and then runs the purified image through an unmodified classifier, making the method agnostic to both the classifier and the attacking method.
Abstract: Adversarial perturbations of normal images are usually imperceptible to humans, but they can seriously confuse state-of-the-art machine learning models. What makes them so special in the eyes of image classifiers? In this paper, we show empirically that adversarial examples mainly lie in the low probability regions of the training distribution, regardless of attack types and targeted models. Using statistical hypothesis testing, we find that modern neural density models are surprisingly good at detecting imperceptible image perturbations. Based on this discovery, we devised PixelDefend, a new approach that purifies a maliciously perturbed image by moving it back towards the distribution seen in the training data. The purified image is then run through an unmodified classifier, making our method agnostic to both the classifier and the attacking method. As a result, PixelDefend can be used to protect already deployed models and be combined with other model-specific defenses. Experiments show that our method greatly improves resilience across a wide variety of state-of-the-art attacking methods, increasing accuracy on the strongest attack from 63% to 84% for Fashion MNIST and from 32% to 70% for CIFAR-10.

Journal ArticleDOI
TL;DR: It is shown how randomization inference can best be conducted in Stata, and a new command, ritest, is introduced to simplify such analyses and to handle problems often faced by applied researchers.
Abstract: Randomization inference or permutation tests are only sporadically used in economics and other social sciences—this despite a steep increase in randomization in field and laboratory experiments that provide perfect experimental setups for applying randomization inference. In the context of causal inference, such tests can handle problems often faced by applied researchers, including issues arising in the context of small samples, stratified or clustered treatment assignments, or nonstandard randomization techniques. Standard statistical software packages have either no implementation of randomization tests or very basic implementations. Whenever researchers use randomization inference, they regularly code individual program routines, risking inconsistencies and coding mistakes. In this article, I show how randomization inference can best be conducted in Stata and introduce a new command, ritest, to simplify such analyses. I illustrate this approach’s usefulness by replicating the results in Fujiwara and Wantchekon (2013, American Economic Journal: Applied Economics 5: 241–255) and running simulations. The applications cover clustered and stratified assignments, with varying cluster sizes, pairwise randomization, and the computation of nonapproximate p-values. The applications also touch upon joint hypothesis testing with randomization inference.
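ritest is a Stata command and is not reproduced here. The sketch below is a Python analogue (numpy only) of the basic idea for a completely randomized design with hypothetical data: re-draw the treatment assignment under the design actually used and recompute the statistic while holding outcomes fixed.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical completely randomized experiment: 50 treated, 50 control.
n, n_treated = 100, 50
treat = np.zeros(n, dtype=int)
treat[rng.choice(n, n_treated, replace=False)] = 1
y = 0.3 * treat + rng.normal(size=n)

def effect(y, treat):
    return y[treat == 1].mean() - y[treat == 0].mean()

obs = effect(y, treat)

# Randomization inference under the sharp null: outcomes are fixed, only the
# assignment is re-randomized according to the original design.
null = np.empty(5000)
for i in range(null.size):
    fake = np.zeros(n, dtype=int)
    fake[rng.choice(n, n_treated, replace=False)] = 1
    null[i] = effect(y, fake)
p = (1 + np.sum(np.abs(null) >= abs(obs))) / (1 + null.size)
print(f"difference in means = {obs:.3f}, randomization p = {p:.4f}")
```

Clustered or stratified designs, as covered in the article, would re-randomize at the cluster level or within strata instead of over individual units.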

Journal ArticleDOI
TL;DR: The Bayesian advantages of the newly developed statistical software program JASP are discussed using real data on the relation between Quality of Life and Executive Functioning in children with Autism Spectrum Disorder.
Abstract: We illustrate the Bayesian approach to data analysis using the newly developed statistical software program JASP. With JASP, researchers are able to take advantage of the benefits that the Bayesian framework has to offer in terms of parameter estimation and hypothesis testing. The Bayesian advantages are discussed using real data on the relation between Quality of Life and Executive Functioning in children with Autism Spectrum Disorder.

Journal ArticleDOI
TL;DR: In this article, a generalized sparse representation-based classification (SRC) algorithm was proposed for open set recognition where not all classes presented during testing are known during training, and the SRC algorithm uses class reconstruction errors for classification.
Abstract: We propose a generalized Sparse Representation-based Classification (SRC) algorithm for open set recognition where not all classes presented during testing are known during training. The SRC algorithm uses class reconstruction errors for classification. As most of the discriminative information for open set recognition is hidden in the tail part of the matched and sum of non-matched reconstruction error distributions, we model the tail of those two error distributions using the statistical Extreme Value Theory (EVT). Then we simplify the open set recognition problem into a set of hypothesis testing problems. The confidence scores corresponding to the tail distributions of a novel test sample are then fused to determine its identity. The effectiveness of the proposed method is demonstrated using four publicly available image and object classification datasets and it is shown that this method can perform significantly better than many competitive open set recognition algorithms.

Journal ArticleDOI
TL;DR: This tutorial clarifies the concept of Fisher information as it manifests itself across three statistical paradigms: the frequentist paradigm, the Bayesian paradigm, and the minimum description length paradigm.

Journal ArticleDOI
TL;DR: The present study compared the nonparametric bootstrap test with pooled resampling, which may overcome problems related to small samples in hypothesis testing, against corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples.
Abstract: Experimental studies in biomedical research frequently pose analytical problems related to small sample size. In such studies, there are conflicting findings regarding the choice of parametric and nonparametric analysis, especially with non-normal data. In such instances, some methodologists questioned the validity of parametric tests and suggested nonparametric tests. In contrast, other methodologists found nonparametric tests to be too conservative and less powerful and thus preferred using parametric tests. Some researchers have recommended using a bootstrap test; however, this method also has a small-sample-size limitation. We used a pooled method in the nonparametric bootstrap test that may overcome the problem related to small samples in hypothesis testing. The present study compared the nonparametric bootstrap test with pooled resampling method to the corresponding parametric, nonparametric, and permutation tests through extensive simulations under various conditions and using real data examples. The nonparametric pooled bootstrap t-test provided equal or greater power for comparing two means as compared with unpaired t-test, Welch t-test, Wilcoxon rank sum test, and permutation test while maintaining type I error probability for any conditions except for Cauchy and extreme variable lognormal distributions. In such cases, we suggest using an exact Wilcoxon rank sum test. The nonparametric bootstrap paired t-test also provided better performance than other alternatives. The nonparametric bootstrap test provided benefit over the exact Kruskal-Wallis test. We suggest using the nonparametric bootstrap test with pooled resampling method for comparing paired or unpaired means and for validating the one way analysis of variance test results for non-normal data in small sample size studies.
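A minimal sketch (numpy/scipy assumed) of a pooled-resampling bootstrap test for two unpaired means: both bootstrap groups are drawn from the pooled sample to mimic the null hypothesis, and the Welch t statistic is recomputed each time. Sample sizes, distributions, and resample counts are illustrative choices, not the authors' simulation design.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(7)
x = rng.lognormal(0.0, 1.0, 12)      # small, skewed samples
y = rng.lognormal(0.5, 1.0, 12)

def welch_t(x, y):
    return ttest_ind(x, y, equal_var=False).statistic

obs = welch_t(x, y)
pooled = np.concatenate([x, y])

# Pooled-resampling bootstrap: under H0 both groups come from the same
# distribution, so resample each group (with replacement) from the pooled data.
null = np.empty(10000)
for i in range(null.size):
    bx = rng.choice(pooled, x.size, replace=True)
    by = rng.choice(pooled, y.size, replace=True)
    null[i] = welch_t(bx, by)
p = (1 + np.sum(np.abs(null) >= abs(obs))) / (1 + null.size)
print(f"Welch t = {obs:.3f}, pooled bootstrap p = {p:.4f}")
```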

Journal ArticleDOI
TL;DR: A new deep mutational scanning statistical model is developed that generates error estimates for each measurement, capturing both sampling error and consistency between replicates and demonstrates its superiority in removing noisy variants and conducting hypothesis testing.
Abstract: Deep mutational scanning is a widely used method for multiplex measurement of functional consequences of protein variants. We developed a new deep mutational scanning statistical model that generates error estimates for each measurement, capturing both sampling error and consistency between replicates. We apply our model to one novel and five published datasets comprising 243,732 variants and demonstrate its superiority in removing noisy variants and conducting hypothesis testing. Simulations show our model applies to scans based on cell growth or binding and handles common experimental errors. We implemented our model in Enrich2, software that can empower researchers analyzing deep mutational scanning data.

Journal ArticleDOI
TL;DR: A semiparametric problem of two-sample hypothesis testing for a class of latent position random graphs is considered; a notion of consistency is formulated, and a valid test is proposed for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions.
Abstract: Two-sample hypothesis testing for random graphs arises naturally in neuroscience, social networks, and machine learning. In this article, we consider a semiparametric problem of two-sample hypothesis testing for a class of latent position random graphs. We formulate a notion of consistency in this context and propose a valid test for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions or have generating latent positions that are scaled or diagonal transformations of one another. Our test statistic is a function of a spectral decomposition of the adjacency matrix for each graph and our test procedure is consistent across a broad range of alternatives. We apply our test procedure to real biological data: in a test-retest dataset of neural connectome graphs, we are able to distinguish between scans from different subjects; and in the C. elegans connectome, we are able to distinguish between chemical and electrical networks.
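A rough Python sketch (numpy/scipy assumed) of the spirit of such a procedure: adjacency spectral embeddings of two graphs on a common vertex set are aligned by orthogonal Procrustes and compared, with a parametric bootstrap null generated from estimated latent positions. The embedding dimension, latent-position model, and bootstrap calibration are simplified illustrative choices, not the authors' test.

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

rng = np.random.default_rng(8)

def ase(A, d):
    """Adjacency spectral embedding: top-d scaled eigenvectors of A."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def rdpg_sample(X, rng):
    """Sample an undirected random dot product graph with latent positions X."""
    P = np.clip(X @ X.T, 0, 1)
    A = np.triu((rng.random(P.shape) < P).astype(float), 1)
    return A + A.T

def test_stat(A1, A2, d):
    X1, X2 = ase(A1, d), ase(A2, d)
    R, _ = orthogonal_procrustes(X1, X2)       # align embeddings up to rotation
    return np.linalg.norm(X1 @ R - X2)

# Hypothetical latent positions shared by both graphs (null is true here).
n, d = 200, 2
X = rng.dirichlet([5, 5, 5], n)[:, :2]
A1, A2 = rdpg_sample(X, rng), rdpg_sample(X, rng)

obs = test_stat(A1, A2, d)
# Parametric bootstrap null: regenerate graph pairs from positions estimated
# from the first graph and recompute the statistic.
Xhat = ase(A1, d)
null = np.array([test_stat(rdpg_sample(Xhat, rng), rdpg_sample(Xhat, rng), d)
                 for _ in range(200)])
p = (1 + np.sum(null >= obs)) / (1 + null.size)
print(f"test statistic = {obs:.3f}, bootstrap p = {p:.3f}")
```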

Journal ArticleDOI
TL;DR: In this article, a geometric characterization of the space of graph Laplacian matri- ces and a nonparametric notion of averaging due to Fr echet was proposed for hypothesis testing in neural networks.
Abstract: In recent years, it has become common practice in neuroscience to use networks to summarize relational information in a set of measurements, typically assumed to be reflective of either functional or structural relationships between regions of interest in the brain. One of the most basic tasks of interest in the analysis of such data is the testing of hypotheses, in answer to questions such as "Is there a difference between the networks of these two groups of subjects?" In the classical setting, where the unit of interest is a scalar or a vector, such questions are answered through the use of familiar two-sample testing strategies. Networks, however, are not Euclidean objects, and hence classical methods do not directly apply. We address this challenge by drawing on concepts and techniques from geometry and high-dimensional statistical inference. Our work is based on a precise geometric characterization of the space of graph Laplacian matrices and a nonparametric notion of averaging due to Fréchet. We motivate and illustrate our resulting methodologies for testing in the context of networks derived from functional neuroimaging data on human subjects from the 1000 Functional Connectomes Project. In particular, we show that this global test is more statistically powerful than a mass-univariate approach. AMS 2000 subject classifications: Fréchet mean, fMRI, graph Laplacian, hypothesis testing, matrix manifold, network data, neuroscience.

Journal ArticleDOI
20 Dec 2017-PLOS ONE
TL;DR: It is shown that the existing asymptotic approach based on adjusted residuals is often more likely to reject the null hypothesis than the exact approach, due to inflated family-wise error rates, and that the proposed new exact p-values based on three commonly used test statistics are the same.
Abstract: This research is motivated by one of our survey studies to assess the potential influence of introducing zebra mussels to the Lake Mead National Recreation Area, Nevada. One research question in this study is to investigate the association between the boating activity type and the awareness of zebra mussels. A chi-squared test is often used for testing independence between two factors with nominal levels. When the null hypothesis of independence between two factors is rejected, we are often left wondering where the significance comes from. Cell residuals, including standardized residuals and adjusted residuals, are traditionally used in testing for cell significance, which is often known as a post hoc test after a statistically significant chi-squared test. In practice, the limiting distributions of these residuals are utilized for statistical inference. However, they may lead to different conclusions based on the calculated p-values, and their p-values could be over- or under-estimated due to the unsatisfactory performance of asymptotic approaches with regard to type I error control. In this article, we propose new exact p-values by using Fisher's approach based on three commonly used test statistics to order the sample space. We theoretically prove that the proposed new exact p-values based on these test statistics are the same. Based on our extensive simulation studies, we show that the existing asymptotic approach based on adjusted residuals is often more likely to reject the null hypothesis as compared to the exact approach due to the inflated family-wise error rates as observed. We would recommend the proposed exact p-value for use in practice as a valuable post hoc analysis technique for chi-squared analysis.
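The proposed exact p-values are not reproduced here. The sketch below (numpy/scipy assumed, with a made-up 3x2 table) shows the conventional asymptotic post hoc step, adjusted residuals with normal-approximation cell p-values, which the paper argues can inflate family-wise error relative to the exact approach.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Hypothetical 3x2 contingency table (e.g., boating activity type by awareness).
table = np.array([[30, 10],
                  [25, 25],
                  [15, 35]])

chi2, p, dof, expected = chi2_contingency(table)
print(f"overall chi-squared = {chi2:.2f}, p = {p:.4f}")

# Post hoc cell-level analysis via adjusted residuals (asymptotic version).
n = table.sum()
row = table.sum(axis=1, keepdims=True) / n
col = table.sum(axis=0, keepdims=True) / n
adj_resid = (table - expected) / np.sqrt(expected * (1 - row) * (1 - col))
cell_p = 2 * norm.sf(np.abs(adj_resid))
print("adjusted residuals:\n", np.round(adj_resid, 2))
print("asymptotic two-sided cell p-values:\n", np.round(cell_p, 4))
```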

Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate that the community weighted means correlation (CWM) and its parallel approach in linking trait variation to the environment, the species niche centroid correlation (SNC), have important shortcomings, arguing against their continuing application.
Abstract: Establishing trait–environment relationships has become routine in community ecology. Here, we demonstrate that the community weighted means correlation (CWM) and its parallel approach in linking trait variation to the environment, the species niche centroid correlation (SNC), have important shortcomings, arguing against their continuing application. Using mathematical derivations and simulations, we show that the two major issues are inconsistent parameter estimation and unacceptable significance rates when only the environment or only traits are structuring species distributions, but they themselves are not linked. We show how both CWM and SNC are related to the fourth-corner correlation and propose to replace all by the Chessel fourth-corner correlation, which is the fourth-corner correlation divided by its maximum attainable value. We propose an appropriate hypothesis testing procedure that is not only unbiased but also has much greater statistical power in detecting trait–environmental relationships. We derive an additive framework in which trait variation is partitioned among and within communities, which can be then modeled against the environment. We finish by presenting a contrast between methods and an application of our proposed framework across 85 lake-fish metacommunities.

Proceedings Article
17 Jul 2017
TL;DR: In this article, a theory of weak convergence for KSDs based on Stein's method is developed; the resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and are simpler to compute and parallelize than alternative Stein discrepancies.
Abstract: Approximate Markov chain Monte Carlo (MCMC) offers the promise of more rapid sampling at the cost of more biased inference. Since standard MCMC diagnostics fail to detect these biases, researchers have developed computable Stein discrepancy measures that provably determine the convergence of a sample to its target distribution. This approach was recently combined with the theory of reproducing kernels to define a closed-form kernel Stein discrepancy (KSD) computable by summing kernel evaluations across pairs of sample points. We develop a theory of weak convergence for KSDs based on Stein’s method, demonstrate that commonly used KSDs fail to detect non-convergence even for Gaussian targets, and show that kernels with slowly decaying tails provably determine convergence for a large class of target distributions. The resulting convergence-determining KSDs are suitable for comparing biased, exact, and deterministic sample sequences and simpler to compute and parallelize than alternative Stein discrepancies. We use our tools to compare biased samplers, select sampler hyperparameters, and improve upon existing KSD approaches to one-sample hypothesis testing and sample quality improvement.
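A minimal NumPy sketch of a kernel Stein discrepancy with an inverse multiquadric kernel for a standard Gaussian target, evaluated as a V-statistic over sample pairs; the kernel parameters, sample sizes, and derivations are illustrative assumptions rather than the authors' implementation, but they show why a biased sample scores worse than an exact one.

```python
import numpy as np

def imq_ksd(x, score, c=1.0, beta=-0.5):
    """Kernel Stein discrepancy (V-statistic) with the IMQ kernel
    k(x, y) = (c^2 + ||x - y||^2)^beta and the Langevin Stein operator.
    `score` returns the gradient of the log target density at each sample."""
    n, p = x.shape
    s = score(x)                                   # (n, p) score at each point
    diff = x[:, None, :] - x[None, :, :]           # pairwise differences x_i - x_j
    r2 = np.sum(diff ** 2, axis=-1)
    d2 = c ** 2 + r2
    k = d2 ** beta
    grad_x = 2 * beta * diff * d2[..., None] ** (beta - 1)   # gradient wrt first argument
    # trace of the mixed second derivative, derived analytically for the IMQ kernel
    trace = -2 * beta * p * d2 ** (beta - 1) - 4 * beta * (beta - 1) * r2 * d2 ** (beta - 2)
    k0 = (trace
          + np.einsum('ip,ijp->ij', s, -grad_x)    # s(x_i) . grad_y k
          + np.einsum('jp,ijp->ij', s, grad_x)     # s(x_j) . grad_x k
          + k * (s @ s.T))                         # k(x_i, x_j) s(x_i) . s(x_j)
    return np.sqrt(k0.mean())

def gaussian_score(x):
    """Score (gradient of log density) of a standard Gaussian target."""
    return -x

rng = np.random.default_rng(9)
good = rng.normal(size=(500, 2))                   # exact sample from the target
biased = rng.normal(loc=0.4, size=(500, 2))        # biased sample
print(f"KSD (exact sample):  {imq_ksd(good, gaussian_score):.4f}")
print(f"KSD (biased sample): {imq_ksd(biased, gaussian_score):.4f}")
```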

Journal ArticleDOI
TL;DR: The R package chngpt makes several unique contributions to the software for threshold regression models and will make these models more accessible to practitioners interested in modeling threshold effects.
Abstract: Threshold regression models are a diverse set of non-regular regression models that all depend on change points or thresholds. They provide a simple but elegant and interpretable way to model certain kinds of nonlinear relationships between the outcome and a predictor. The R package chngpt provides both estimation and hypothesis testing functionalities for four common variants of threshold regression models. All allow for adjustment of additional covariates not subjected to thresholding. We demonstrate the consistency of the estimating procedures and the type 1 error rates of the testing procedures by Monte Carlo studies, and illustrate their practical uses using an example from the study of immune response biomarkers in the context of Mother-To-Child-Transmission of HIV-1 viruses. chngpt makes several unique contributions to the software for threshold regression models and will make these models more accessible to practitioners interested in modeling threshold effects.
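The R package's estimation and testing machinery is not reproduced here. As a rough illustration of the idea behind a step-type threshold model, the following Python sketch (numpy assumed) profiles a change-point indicator over a grid of candidate thresholds on synthetic data, with one additional covariate not subjected to thresholding.

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical data with a step ("change point") effect at x = 1.0.
n = 300
x = rng.uniform(-2, 3, n)
z = rng.normal(size=n)                             # additional covariate
y = 0.5 + 0.3 * z + 1.0 * (x > 1.0) + rng.normal(0, 1, n)

def fit_step_model(x, y, z, threshold):
    """OLS fit of y ~ 1 + z + 1(x > threshold); returns the RSS and coefficients."""
    X = np.column_stack([np.ones_like(x), z, (x > threshold).astype(float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return rss, beta

# Profile the threshold over a grid of candidate change points.
grid = np.quantile(x, np.linspace(0.1, 0.9, 81))
fits = [fit_step_model(x, y, z, e) for e in grid]
best = min(range(len(grid)), key=lambda i: fits[i][0])
e_hat, beta_hat = grid[best], fits[best][1]
print(f"estimated threshold = {e_hat:.2f}, step effect = {beta_hat[2]:.2f}")
```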

Journal ArticleDOI
TL;DR: In this article, the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging are discussed, and the present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques.
Abstract: Brain-imaging research has predominantly generated insight by means of classical statistics, including regression-type analyses and null-hypothesis testing using t-test and ANOVA. Throughout recent years, statistical learning methods enjoy increasing popularity especially for applications in rich and complex data, including cross-validated out-of-sample prediction using pattern classification and sparsity-inducing regression. This concept paper discusses the implications of inferential justifications and algorithmic methodologies in common data analysis scenarios in neuroimaging. It is retraced how classical statistics and statistical learning originated from different historical contexts, build on different theoretical foundations, make different assumptions, and evaluate different outcome metrics to permit differently nuanced conclusions. The present considerations should help reduce current confusion between model-driven classical hypothesis testing and data-driven learning algorithms for investigating the brain with imaging techniques.

Journal ArticleDOI
TL;DR: Ca2+ fluxes in isolated cardiomyocytes show so much clustering that the common statistical approach that assumes independence of each data point will frequently give the false appearance of statistically significant changes.
Abstract: Aims: It is generally accepted that post-MI heart failure (HF) changes a variety of aspects of sarcoplasmic reticular Ca2+ fluxes but for some aspects there is disagreement over whether there is an increase or decrease. The commonest statistical approach is to treat data collected from each cell as independent, even though they are really clustered with multiple likely similar cells from each heart. In this study, we test whether this statistical assumption of independence can lead the investigator to draw conclusions that would be considered erroneous if the analysis handled clustering with specific statistical techniques (hierarchical tests). Methods and results: Ca2+ transients were recorded in cells loaded with Fura-2AM and sparks were recorded in cells loaded with Fluo-4AM. Data were analysed twice, once with the common statistical approach (assumption of independence) and once with hierarchical statistical methodologies designed to allow for any clustering. The statistical tests found that there was significant hierarchical clustering. This caused the common statistical approach to underestimate the standard error and report artificially small P values. For example, this would have led to the erroneous conclusion that time to 50% peak transient amplitude was significantly prolonged in HF. Spark analysis showed clustering, both within each cell and also within each rat, for morphological variables. This means that a three-level hierarchical model is sometimes required for such measures. Standard statistical methodologies, if used instead, erroneously suggest that spark amplitude is significantly greater in HF and spark duration is reduced in HF. Conclusion: Ca2+ fluxes in isolated cardiomyocytes show so much clustering that the common statistical approach that assumes independence of each data point will frequently give the false appearance of statistically significant changes. Hierarchical statistical methodologies need a little more effort, but are necessary for reliable conclusions. We present cost-free simple tools for performing these analyses.
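The authors supply their own analysis tools, which are not reproduced here. A minimal Python sketch of the comparison they describe follows, using synthetic clustered data and a random-intercept model fit with statsmodels' MixedLM; the simulation parameters and the two-level (cells within hearts) structure are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import ttest_ind

rng = np.random.default_rng(10)

# Hypothetical clustered design: 6 hearts per group, 30 cells per heart.
# Heart-level random effects induce correlation among cells from the same heart.
rows = []
for group in (0, 1):
    for heart in range(6):
        heart_effect = rng.normal(0, 0.8)          # between-heart variability
        cells = 0.2 * group + heart_effect + rng.normal(0, 0.5, 30)
        rows += [{"y": y, "group": group, "heart": f"{group}_{heart}"} for y in cells]
data = pd.DataFrame(rows)

# Common (non-hierarchical) approach: treat every cell as independent.
t, p_naive = ttest_ind(data.loc[data.group == 1, "y"], data.loc[data.group == 0, "y"])
print(f"naive cell-level t-test: p = {p_naive:.4f}")

# Hierarchical approach: random intercept for each heart.
model = smf.mixedlm("y ~ group", data, groups=data["heart"]).fit()
print(f"mixed model: group effect = {model.params['group']:.3f}, "
      f"p = {model.pvalues['group']:.4f}")
```

With between-heart variability of this size, the naive cell-level p-value is typically far smaller than the mixed-model p-value, mirroring the underestimated standard errors described above.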

Journal ArticleDOI
TL;DR: In this paper, the authors unify these methods in the same framework, generalize them to include multiple primary variables and multiple nuisance variables, and analyze their statistical properties.
Abstract: We consider large-scale studies in which thousands of significance tests are performed simultaneously. In some of these studies, the multiple testing procedure can be severely biased by latent confounding factors such as batch effects and unmeasured covariates that correlate with both primary variable(s) of interest (e.g., treatment variable, phenotype) and the outcome. Over the past decade, many statistical methods have been proposed to adjust for the confounders in hypothesis testing. We unify these methods in the same framework, generalize them to include multiple primary variables and multiple nuisance variables, and analyze their statistical properties. In particular, we provide theoretical guarantees for RUV-4 [Gagnon-Bartsch, Jacob and Speed (2013)] and LEAPP [Ann. Appl. Stat. 6 (2012) 1664–1688], which correspond to two different identification conditions in the framework: the first requires a set of “negative controls” that are known a priori to follow the null distribution; the second requires the true nonnulls to be sparse. Two different estimators which are based on RUV-4 and LEAPP are then applied to these two scenarios. We show that if the confounding factors are strong, the resulting estimators can be asymptotically as powerful as the oracle estimator which observes the latent confounding factors. For hypothesis testing, we show the asymptotic $z$-tests based on the estimators can control the type I error. Numerical experiments show that the false discovery rate is also controlled by the Benjamini–Hochberg procedure when the sample size is reasonably large.