Showing papers on "Statistical hypothesis testing published in 2021"

PDF

Open Access

Book•

Statistics for Linguistics with R : A Practical Introduction

[...]

10 May 2021

TL;DR: This book is the revised and extended second edition of Statistics for Linguistics with R, and contains a complete rewrite of the chapter on multifactorial approaches, which now contains sections on linear regression, binary and ordinal logistic regression, multinomial and Poisson regression, and repeated-measures ANOVA.

...read moreread less

Abstract: This book is an introduction to statistics for linguists using the open source software R. It is aimed at students and instructors/professors with little or no statistical background and is written in a non-technical and reader-friendly/accessible style. It first introduces in detail the overall logic underlying quantitative studies: exploration, hypothesis formulation and operationalization, and the notion and meaning of significance tests. It then introduces some basics of the software R relevant to statistical data analysis. A chapter on descriptive statistics explains how summary statistics for frequencies, averages, and correlations are generated with R and how they are graphically represented best. A chapter on analytical statistics explains how statistical tests are performed in R on the basis of many different linguistic case studies: For nearly every single example, it is explained what the structure of the test looks like, how hypotheses are formulated, explored, and tested for statistical significance, how the results are graphically represented, and how one would summarize them in a paper/article. A chapter on selected multifactorial methods introduces how more complex research designs can be studied: methods for the study of multifactorial frequency data, correlations, tests for means, and binary response data are discussed and exemplified step-by-step. Also, the exploratory approach of hierarchical cluster analysis is illustrated in detail. The book comes with many exercises, boxes with short think breaks and warnings, recommendations for further study, and answer keys as well as a statistics for linguists newsgroup on the companion website. The volume is aimed at beginners on every level of linguistic education: undergraduate students, graduate students, and instructors/professors and can be used in any research methods and statistics class for linguists. It presupposes no quantitative/statistical knowledge whatsoever and, unlike most competing books, begins at step 1 for every method and explains everything explicitly.

...read moreread less

246 citations

Journal Article•DOI•

Sample size determination and power analysis using the G*Power software.

[...]

Hyun Kang¹•Institutions (1)

Chung-Ang University¹

30 Jul 2021-Journal of Educational Evaluation for Health Professions

TL;DR: G*Power as mentioned in this paper is a free software that supports sample size and power calculation for various statistical methods (F, t, χ2, Z, and exact tests) and is easy to use and free.

...read moreread less

Abstract: Appropriate sample size calculation and power analysis have become major issues in research and publication processes. However, the complexity and difficulty of calculating sample size and power require broad statistical knowledge, there is a shortage of personnel with programming skills, and commercial programs are often too expensive to use in practice. The review article aimed to explain the basic concepts of sample size calculation and power analysis; the process of sample estimation; and how to calculate sample size using G*Power software (latest ver. 3.1.9.7; Heinrich-Heine-Universitat Dusseldorf, Dusseldorf, Germany) with 5 statistical examples. The null and alternative hypothesis, effect size, power, alpha, type I error, and type II error should be described when calculating the sample size or power. G*Power is recommended for sample size and power calculations for various statistical methods (F, t, χ2, Z, and exact tests), because it is easy to use and free. The process of sample estimation consists of establishing research goals and hypotheses, choosing appropriate statistical tests, choosing one of 5 possible power analysis methods, inputting the required variables for analysis, and selecting the “calculate” button. The G*Power software supports sample size and power calculation for various statistical methods (F, t, χ2, z, and exact tests). This software is helpful for researchers to estimate the sample size and to conduct power analysis.

...read moreread less

238 citations

Journal Article•DOI•

Rewriting results sections in the language of evidence

[...]

N. Sukumar¹, Stefanie Muff¹, Erlend B. Nilsen¹, Erlend B. Nilsen², Robert B. O'Hara¹, Chloé R. Nater¹ - Show less +2 more•Institutions (2)

Norwegian University of Science and Technology¹, Nord University²

16 Nov 2021-Trends in Ecology and Evolution

TL;DR: The authors suggest language of evidence that allows for a more nuanced approach to communicate scientific findings as a simple and intuitive alternative to statistical significance testing, and provide examples for rewriting results sections in research papers accordingly.

...read moreread less

Abstract: Despite much criticism, black-or-white null-hypothesis significance testing with an arbitrary P-value cutoff still is the standard way to report scientific findings. One obstacle to progress is likely a lack of knowledge about suitable alternatives. Here, we suggest language of evidence that allows for a more nuanced approach to communicate scientific findings as a simple and intuitive alternative to statistical significance testing. We provide examples for rewriting results sections in research papers accordingly. Language of evidence has previously been suggested in medical statistics, and it is consistent with reporting approaches of international research networks, like the Intergovernmental Panel on Climate Change, for example. Instead of re-inventing the wheel, ecology and evolution might benefit from adopting some of the 'good practices' that exist in other fields.

...read moreread less

149 citations

Journal Article•DOI•

Violating the normality assumption may be the lesser of two evils

[...]

Ulrich Knief¹, Wolfgang Forstmeier²•Institutions (2)

Ludwig Maximilian University of Munich¹, Max Planck Society²

07 May 2021-Behavior Research Methods

TL;DR: In this article, Monte Carlo simulations were used to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation.

...read moreread less

Abstract: When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data.

...read moreread less

123 citations

Journal Article•DOI•

Testing Statistical Significance of the Area under a Receiving Operating Characteristics Curve for Repeated Measures Design with Bootstrapping

[...]

Honghu Liu, Gang Li, William G. Cumberland, Tong Tong Wu

19 Jul 2021-Journal of data science

TL;DR: In this article, the authors estimate the area of a ROC curve under a repeated measures design through generalized linear mixed model (GLMM) using the predicted probability of a disease or positivity of a condition.

...read moreread less

Abstract: Receiver operating characteristic (ROC) curve is an effective and widely used method for evaluating the discriminating power of a diagnostic test or statistical model. As a useful statistical method, a wealth of literature about its theories and computation methods has been established. The research on ROC curves, however, has focused mainly on cross-sectional design. Very little research on estimating ROC curves and their summary statistics, especially significance testing, has been conducted for repeated measures design. Due to the complexity of estimating the standard error of a ROC curve, there is no currently established statistical method for testing the significance of ROC curves under a repeated measures design. In this paper, we estimate the area of a ROC curve under a repeated measures design through generalized linear mixed model (GLMM) using the predicted probability of a disease or positivity of a condition and propose a bootstrap method to estimate the standard error of the area under a ROC curve for such designs. Statistical significance testing of the area under a ROC curve is then conducted using the bootstrapped standard error. The validity of bootstrap approach and the statistical testing of the area under the ROC curve was validated through simulation analyses. A special statistical software written in SAS/IML/MACRO v8 was also created for implementing the bootstrapping algorithm, conducting the calculations and statistical testing.

...read moreread less

75 citations

Book•

Statistics and Data Analysis for Microarrays Using R and Bioconductor

[...]

Sorin Draghici¹•Institutions (1)

Wayne State University¹

01 Jan 2021

TL;DR: This best-selling book guides students from very basic notions to advanced analysis techniques in R and Bioconductor as well as how to choose and apply the proper data analysis tool to specific problems.

...read moreread less

Abstract: Introduction Bioinformatics - An Emerging Discipline The Cell and Its Basic Mechanisms The Cell The Building Blocks of Genomic Information Expression of Genetic Information The Need for High-Throughput Methods Microarrays Microarrays - Tools for Gene Expression Analysis Fabrication of Microarrays Applications of Microarrays Challenges in Using Microarrays in Gene Expression Studies Sources of Variability Reliability and Reproducibility Issues in DNA Microarray Measurements Introduction What Is Expected from Microarrays? Basic Considerations of Microarray Measurements Sensitivity Accuracy Reproducibility Cross Platform Consistency Sources of Inaccuracy and Inconsistencies in Microarray Measurements The MicroArray Quality Control (MAQC) Project Image Processing Introduction Basic Elements of Digital Imaging Microarray Image Processing Image Processing of cDNA Microarrays Image Processing of Affymetrix Arrays Introduction to R Introduction to R The Basic Concepts Data Structures and Functions Other Capabilities The R Environment Installing Bioconductor Graphics Control Structures in R Programming in R vs C/C++/Java Bioconductor: Principles and Illustrations Overview The Portal Some Explorations and Analyses Elements of Statistics Introduction Some Basic Concepts Elementary Statistics Degrees of Freedom Probabilities Bayes' Theorem Testing for (or Predicting) a Disease Probability Distributions Probability Distributions Central Limit Theorem Are Replicates Useful? Basic Statistics in R Introduction Descriptive Statistics in R Probabilities and Distributions in R Central Limit Theorem Statistical Hypothesis Testing Introduction The Framework Hypothesis Testing and Significance "I Do Not Believe God Does Not Exist" An Algorithm for Hypothesis Testing Errors in Hypothesis Testing Classical Approaches to Data Analysis Introduction Tests Involving a Single Sample Tests Involving Two Samples Analysis of Variance (ANOVA) Introduction One-Way ANOVA Two-Way ANOVA Quality Control Linear Models in R Introduction and Model Formulation Fitting Linear Models in R Extracting Information from a Fitted Model: Testing Hypotheses and Making Predictions Some Limitations of the Linear Models Dealing with Multiple Predictors and Interactions in the Linear Models, and Interpreting Model Coefficients Experiment Design The Concept of Experiment Design Comparing Varieties Improving the Production Process Principles of Experimental Design Guidelines for Experimental Design A Short Synthesis of Statistical Experiment Designs Some Microarray Specific Experiment Designs Multiple Comparisons Introduction The Problem of Multiple Comparisons A More Precise Argument Corrections for Multiple Comparisons Corrections for Multiple Comparisons in R Analysis and Visualization Tools Introduction Box Plots Gene Pies Scatter Plots Volcano Plots Histograms Time Series Time Series Plots in R Principal Component Analysis (PCA) Independent Component Analysis (ICA) Cluster Analysis Introduction Distance Metric Clustering Algorithms Partitioning around Medoids (PAM) Biclustering Clustering in R Quality Control Introduction Quality Control for Affymetrix Data Quality Control of Illumina Data Data Pre-Processing and Normalization Introduction General Pre-Processing Techniques Normalization Issues Specific to cDNA Data Normalization Issues Specific to Affymetrix Data Other Approaches to the Normalization of Affymetrix Data Useful Pre-Processing and Normalization Sequences Normalization Procedures in R Batch Pre-Processing Normalization Functions and Procedures for Illumina Data Methods for Selecting Differentially Regulated Genes Introduction Criteria Fold Change Unusual Ratio Hypothesis Testing, Corrections for Multiple Comparisons, and Resampling ANOVA Noise Sampling Model-Based Maximum Likelihood Estimation Methods Affymetrix Comparison Calls Significance Analysis of Microarrays (SAM) A Moderated t-Statistic Other Methods Reproducibility Selecting Differentially Expressed (DE) Genes in R The Gene Ontology (GO) Introduction The Need for an Ontology What Is the Gene Ontology (GO)? What Does GO Contain? Access to GO Other Related Resources Functional Analysis and Biological Interpretation of Microarray Data Over-Representation Analysis (ORA) Onto-Express Functional Class Scoring The Gene Set Enrichment Analysis (GSEA) Uses, Misuses, and Abuses in GO Profiling Introduction "Known Unknowns" Which Way Is Up? Negative Annotations Common Mistakes in Functional Profiling Using a Custom Level of Abstraction through the GO Hierarchy Correlation between GO Terms GO Slims and Subsets A Comparison of Several Tools for Ontological Analysis Introduction Existing tools for Ontological Analysis Comparison of Existing Functional Profiling Tools Drawbacks and Limitations of the Current Approach Focused Microarrays - Comparison and Selection Introduction Criteria for Array Selection Onto-Compare Some Comparisons ID Mapping Issues Introduction Name Space Issues in Annotation Databases A Comparison of Some ID Mapping Tools Pathway Analysis Terms and Problem Definition Over-Representation and Functional Class Scoring Approaches in Pathway Analysis An Approach for the Analysis of Metabolic Pathways An Impact Analysis of Signaling Pathways Variations on the Impact Analysis Theme Pathway Guide Kinetic models vs. Impact Analysis Conclusions Data Sets and Software Availability Machine Learning Techniques Introduction Main Concepts and Definitions Supervised Learning Practicalities Using R The Road Ahead What Next? References A Summary appears at the end of each chapter.

...read moreread less

66 citations

Journal Article•DOI•

When to adjust alpha during multiple testing: A consideration of disjunction, conjunction, and individual testing

[...]

Mark Rubin

06 Jul 2021-arXiv: Methodology

TL;DR: This article argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis.

...read moreread less

Abstract: Scientists often adjust their significance threshold (alpha level) during null hypothesis significance testing in order to take into account multiple testing and multiple comparisons. This alpha adjustment has become particularly relevant in the context of the replication crisis in science. The present article considers the conditions in which this alpha adjustment is appropriate and the conditions in which it is inappropriate. A distinction is drawn between three types of multiple testing: disjunction testing, conjunction testing, and individual testing. It is argued that alpha adjustment is only appropriate in the case of disjunction testing, in which at least one test result must be significant in order to reject the associated joint null hypothesis. Alpha adjustment is inappropriate in the case of conjunction testing, in which all relevant results must be significant in order to reject the joint null hypothesis. Alpha adjustment is also inappropriate in the case of individual testing, in which each individual result must be significant in order to reject each associated individual null hypothesis. The conditions under which each of these three types of multiple testing is warranted are examined. It is concluded that researchers should not automatically (mindlessly) assume that alpha adjustment is necessary during multiple testing. Illustrations are provided in relation to joint studywise hypotheses and joint multiway ANOVAwise hypotheses.

...read moreread less

60 citations

Journal Article•DOI•

Bayesian Data Analysis in Empirical Software Engineering Research

[...]

Carlo A. Furia¹, Robert Feldt², Richard Torkar²•Institutions (2)

University of Lugano¹, Chalmers University of Technology²

01 Sep 2021-IEEE Transactions on Software Engineering

TL;DR: In this paper, the authors present Bayesian data analysis techniques that provide tangible benefits, as they can provide clearer results that are simultaneously robust and nuanced, and demonstrate the concrete advantages of the latter.

...read moreread less

Abstract: Statistics comes in two main flavors: frequentist and Bayesian. For historical and technical reasons, frequentist statistics have traditionally dominated empirical data analysis, and certainly remain prevalent in empirical software engineering. This situation is unfortunate because frequentist statistics suffer from a number of shortcomings—such as lack of flexibility and results that are unintuitive and hard to interpret—that curtail their effectiveness when dealing with the heterogeneous data that is increasingly available for empirical analysis of software engineering practice. In this paper, we pinpoint these shortcomings, and present Bayesian data analysis techniques that provide tangible benefits—as they can provide clearer results that are simultaneously robust and nuanced. After a short, high-level introduction to the basic tools of Bayesian statistics, we present the reanalysis of two empirical studies on the effectiveness of automatically generated tests and the performance of programming languages. By contrasting the original frequentist analyses with our new Bayesian analyses, we demonstrate the concrete advantages of the latter. To conclude we advocate a more prominent role for Bayesian statistical techniques in empirical software engineering research and practice.

...read moreread less

58 citations

Journal Article•DOI•

E-values: Calibration, combination and applications

[...]

Vladimir Vovk¹, Ruodu Wang²•Institutions (2)

Royal Holloway, University of London¹, University of Waterloo²

01 Jun 2021-Annals of Statistics

TL;DR: It is demonstrated that e-values are often mathematically more tractable; in particular, in multiple testing of a single hypothesis, e- Values can be merged simply by averaging them, which allows to develop efficient procedures using e- values for testing multiple hypotheses.

...read moreread less

Abstract: Multiple testing of a single hypothesis and testing multiple hypotheses are usually done in terms of p-values. In this paper, we replace p-values with their natural competitor, e-values, which are closely related to betting, Bayes factors and likelihood ratios. We demonstrate that e-values are often mathematically more tractable; in particular, in multiple testing of a single hypothesis, e-values can be merged simply by averaging them. This allows us to develop efficient procedures using e-values for testing multiple hypotheses.

...read moreread less

52 citations

Journal Article•DOI•

How to design the fair experimental classifier evaluation

[...]

Katarzyna Stapor¹, Paweł Ksieniewicz², Salvador García³, Michał Woźniak²•Institutions (3)

Silesian University of Technology¹, Wrocław University of Technology², University of Granada³

01 Jun 2021-Applied Soft Computing

TL;DR: This paper aims to show the commonly used experimental protocols’ weaknesses and discuss if one can trust such evaluation methodology, if all presented evaluations are fair and if it is possible to manipulate the experimental results using well-known statistical evaluation methods.

...read moreread less

51 citations

Journal Article•DOI•

Multiple comparison test by Tukey’s honestly significant difference (HSD): Do the confident level control type I error

[...]

Anita Nanda, Bibhuti Bhusan Mohapatra, Abikesh Prasada Kumar Mahapatra, Abiresh Prasad Kumar Mahapatra, Abinash Prasad Kumar Mahapatra - Show less +1 more

01 Jan 2021

TL;DR: In this article, the authors explored the effect of committing type I error with selection of confidence level or error rate with citing suitable case study in a special education setting in a case study focusing cognitive effect of music on growth and enhancement of the motor behavior (running, jumping and sliding) of children with mild intellectual disability enrolled in the special school setting.

...read moreread less

Abstract: Examining a huge amount of data is a typical issue in any research process. However, different statistical processes and techniques play essential role to derive a meaningful conclusion from the presented enormous data. Control of type I error is highly essential for a researcher or statistician while dealing with comparisons test with more than two variables. Multiple testing statistical tests provides a structural system and minimizes the error rate by helping to derive meaningful accurate conclusions. Among the different multiple test procedures Tukey's honestly significant difference test (Tukey's HSD) is most common and popular techniques. The main objective of this study was to explore how significantly selection of confidence level or error rate can affect the rate of committing type I error while drawing conclusion. The effect of committing type I error with selection of confidence level or error rate was explored with citing suitable case study in a special education setting. The case study focuses cognitive effect of music on growth and enhancement of the motor behavior (running, jumping and sliding) of children with mild intellectual disability enrolled in the special school setting. ANOVA test was performed and the significance of selection of individual confidence level and simultaneous confidence level) in Tukey’s HSD test was described.

...read moreread less

Journal Article•DOI•

MultipleTesting.com: A tool for life science researchers for multiple hypothesis testing correction.

[...]

Otília Menyhárt¹, Boglárka Weltz, Balázs Győrffy¹•Institutions (1)

Semmelweis University¹

09 Jun 2021-PLOS ONE

TL;DR: A much needed practical synthesis of basic statistical concepts regarding multiple hypothesis testing in a comprehensible language with well-illustrated examples and an easy-to-follow guide for selecting the most suitable correction technique is provided.

...read moreread less

Abstract: Scientists from nearly all disciplines face the problem of simultaneously evaluating many hypotheses. Conducting multiple comparisons increases the likelihood that a non-negligible proportion of associations will be false positives, clouding real discoveries. Drawing valid conclusions require taking into account the number of performed statistical tests and adjusting the statistical confidence measures. Several strategies exist to overcome the problem of multiple hypothesis testing. We aim to summarize critical statistical concepts and widely used correction approaches while also draw attention to frequently misinterpreted notions of statistical inference. We provide a step-by-step description of each multiple-testing correction method with clear examples and present an easy-to-follow guide for selecting the most suitable correction technique. To facilitate multiple-testing corrections, we developed a fully automated solution not requiring programming skills or the use of a command line. Our registration free online tool is available at www.multipletesting.com and compiles the five most frequently used adjustment tools, including the Bonferroni, the Holm (step-down), the Hochberg (step-up) corrections, allows to calculate False Discovery Rates (FDR) and q-values. The current summary provides a much needed practical synthesis of basic statistical concepts regarding multiple hypothesis testing in a comprehensible language with well-illustrated examples. The web tool will fill the gap for life science researchers by providing a user-friendly substitute for command-line alternatives.

...read moreread less

Journal Article•DOI•

Common datastream permutations of animal social network data are not appropriate for hypothesis testing using regression models

[...]

Michael N. Weiss¹, Daniel W. Franks², Lauren J. N. Brent¹, Samuel Ellis¹, Matthew J. Silk¹, Darren P. Croft¹ - Show less +2 more•Institutions (2)

University of Exeter¹, University of York²

01 Feb 2021-Methods in Ecology and Evolution

TL;DR: It is shown that datastream permutations typically do not represent the null hypothesis of interest to researchers interfacing animal social network analysis with regression modelling, and simulations are used to demonstrate the potential pitfalls of using this methodology.

...read moreread less

Abstract: O_LISocial network methods have become a key tool for describing, modelling, and testing hypotheses about the social structures of animals. However, due to the non-independence of network data and the presence of confounds, specialized statistical techniques are often needed to test hypotheses in these networks. Datastream permutations, originally developed to test the null hypothesis of random social structure, have become a popular tool for testing a wide array of null hypotheses. In particular, they have been used to test whether exogenous factors are related to network structure by interfacing these permutations with regression models. C_LIO_LIHere, we show that these datastream permutations typically do not represent the null hypothesis of interest to researchers interfacing animal social network analysis with regression modelling, and use simulations to demonstrate the potential pitfalls of using this methodology. C_LIO_LIOur simulations show that utilizing common datastream permutations to test the coefficients of regression models can lead to extremely high type I (false-positive) error rates (> 30%) in the presence of non-random social structure. The magnitude of this problem is primarily dependent on the degree of non-randomness within the social structure and the intensity of sampling C_LIO_LIWe strongly recommend against utilizing datastream permutations to test regression models in animal social networks. We suggest that a potential solution may be found in regarding the problems of non-independence of network data and unreliability of observations as separate problems with distinct solutions. C_LI

...read moreread less

Journal Article•DOI•

Uncertain hypothesis test with application to uncertain regression analysis

[...]

Tingqing Ye¹, Baoding Liu¹•Institutions (1)

Tsinghua University¹

09 Jul 2021-Fuzzy Optimization and Decision Making

TL;DR: In this paper, uncertain hypothesis test is employed in uncertain regression analysis to test whether the estimated disturbance term and the fitted regression model are appropriate, in order to illustrate the test process.

...read moreread less

Abstract: This paper first establishes uncertain hypothesis test as a mathematical tool that uses uncertainty theory to help people rationally judge whether some hypotheses are correct or not, according to observed data. As an application, uncertain hypothesis test is employed in uncertain regression analysis to test whether the estimated disturbance term and the fitted regression model are appropriate. In order to illustrate the test process, some numerical examples are documented.

...read moreread less

Journal Article•DOI•

Attribute reduction methods in fuzzy rough set theory: An overview, comparative experiments, and new directions

[...]

Zhong Yuan¹, Hongmei Chen¹, Peng Xie¹, Pengfei Zhang¹, Jia Liu¹, Tianrui Li¹ - Show less +2 more•Institutions (1)

Southwest Jiaotong University¹

01 Aug 2021-Applied Soft Computing

TL;DR: In order to comprehensively investigate attribute reduction methods in fuzzy rough set theory, all methods are summarized through six different aspects including data sources, preprocessing methods, fuzzy similarity metrics, fuzzy operations, reduction rules, and evaluation methods.

...read moreread less

Journal Article•DOI•

Analysis goals, error-cost sensitivity, and analysis hacking: Essential considerations in hypothesis testing and multiple comparisons.

[...]

Sander Greenland¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 2021-Paediatric and Perinatal Epidemiology

TL;DR: Issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons are reviewed.

...read moreread less

Abstract: The "replication crisis" has been attributed to perverse incentives that lead to selective reporting and misinterpretations of P-values and confidence intervals. A crude fix offered for this problem is to lower testing cut-offs (α levels), either directly or in the form of null-biased multiple comparisons procedures such as naive Bonferroni adjustments. Methodologists and statisticians have expressed positions that range from condemning all such procedures to demanding their application in almost all analyses. Navigating between these unjustifiable extremes requires defining analysis goals precisely enough to separate inappropriate from appropriate adjustments. To meet this need, I here review issues arising in single-parameter inference (such as error costs and loss functions) that are often skipped in basic statistics, yet are crucial to understanding controversies in testing and multiple comparisons. I also review considerations that should be made when examining arguments for and against modifications of decision cut-offs and adjustments for multiple comparisons. The goal is to provide researchers a better understanding of what is assumed by each side and to enable recognition of hidden assumptions. Basic issues of goal specification and error costs are illustrated with simple fixed cut-off hypothesis testing scenarios. These illustrations show how adjustment choices are extremely sensitive to implicit decision costs, making it inevitable that different stakeholders will vehemently disagree about what is necessary or appropriate. Because decisions cannot be justified without explicit costs, resolution of inference controversies is impossible without recognising this sensitivity. Pre-analysis statements of funding, scientific goals, and analysis plans can help counter demands for inappropriate adjustments, and can provide guidance as to what adjustments are advisable. Hierarchical (multilevel) regression methods (including Bayesian, semi-Bayes, and empirical-Bayes methods) provide preferable alternatives to conventional adjustments, insofar as they facilitate use of background information in the analysis model, and thus can provide better-informed estimates on which to base inferences and decisions.

...read moreread less

Journal Article•DOI•

Concept Drift Detection via Equal Intensity k-Means Space Partitioning

[...]

Anjin Liu¹, Jie Lu¹, Guangquan Zhang¹•Institutions (1)

University of Technology, Sydney¹

18 May 2021-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: A cluster-based histogram, called equal intensity $k$ -means space partitioning (EI-kMeans) is proposed and a heuristic method to improve the sensitivity of drift detection is introduced.

...read moreread less

Abstract: The data stream poses additional challenges to statistical classification tasks because distributions of the training and target samples may differ as time passes. Such a distribution change in streaming data is called concept drift. Numerous histogram-based distribution change detection methods have been proposed to detect drift. Most histograms are developed on the grid-based or tree-based space partitioning algorithms which makes the space partitions arbitrary, unexplainable, and may cause drift blind spots. There is a need to improve the drift detection accuracy for the histogram-based methods with the unsupervised setting. To address this problem, we propose a cluster-based histogram, called equal intensity $k$ -means space partitioning (EI-kMeans). In addition, a heuristic method to improve the sensitivity of drift detection is introduced. The fundamental idea of improving the sensitivity is to minimize the risk of creating partitions in distribution offset regions. Pearson’s chi-square test is used as the statistical hypothesis test so that the test statistics remain independent of the sample distribution. The number of bins and their shapes, which strongly influence the ability to detect drift, are determined dynamically from the sample based on an asymptotic constraint in the chi-square test. Accordingly, three algorithms are developed to implement concept drift detection, including a greedy centroids initialization algorithm, a cluster amplify–shrink algorithm, and a drift detection algorithm. For drift adaptation, we recommend retraining the learner if a drift is detected. The results of experiments on the synthetic and real-world datasets demonstrate the advantages of EI-kMeans and show its efficacy in detecting concept drift.

...read moreread less

Journal Article•DOI•

Choosing the Level of Significance: A Decision‐theoretic Approach

[...]

Jae H. Kim¹, In Choi²•Institutions (2)

La Trobe University¹, Sogang University²

01 Mar 2021-Abacus

TL;DR: In this article, the authors present a decision-theoretic approach to choosing the optimal level of significance, with a consideration of the key factors of hypothesis testing, including sample size, prior belief, and losses from Type I and II errors.

...read moreread less

Abstract: In many areas of science, including business disciplines, statistical decisions are often made almost exclusively at a conventional level of significance. Serious concerns have been raised that this contributes to a range of poor practices such as p‐hacking and data‐mining that undermine research credibility. In this paper, we present a decision‐theoretic approach to choosing the optimal level of significance, with a consideration of the key factors of hypothesis testing, including sample size, prior belief, and losses from Type I and II errors. We present the method in the context of testing for linear restrictions in the linear regression model. From the empirical applications in accounting, economics, and finance, we find that the decisions made at the optimal significance levels are more sensible and unambiguous than those at a conventional level, providing inferential outcomes consistent with estimation results, descriptive analysis, and economic reasoning. Computational resources are provided with two R packages.

...read moreread less

Journal Article•DOI•

The Exact Equivalence of Distance and Kernel Methods for Hypothesis Testing

[...]

Cencheng Shen¹, Joshua T. Vogelstein²•Institutions (2)

University of Delaware¹, Johns Hopkins University²

01 Sep 2021-AStA Advances in Statistical Analysis

TL;DR: In this article, a simple and elegant bijection between metric and kernel is proposed to better preserve the similarity structure, allow distance correlation and Hilbert-Schmidt independence criterion to be always the same for hypothesis testing, streamlines the code base for implementation, and enables a rich literature of distance-based and kernel-based methodologies to directly communicate with each other.

...read moreread less

Abstract: Distance correlation and Hilbert-Schmidt independence criterion are widely used for independence testing, two-sample testing, and many inference tasks in statistics and machine learning. These two methods are tightly related, yet are treated as two different entities in the majority of existing literature. In this paper, we propose a simple and elegant bijection between metric and kernel. The bijective transformation better preserves the similarity structure, allows distance correlation and Hilbert-Schmidt independence criterion to be always the same for hypothesis testing, streamlines the code base for implementation, and enables a rich literature of distance-based and kernel-based methodologies to directly communicate with each other.

...read moreread less

Journal Article•DOI•

Asymptotically independent u-statistics in high-dimensional testing.

[...]

Yinqiu He¹, Gongjun Xu¹, Chong Wu², Wei Pan³•Institutions (3)

University of Michigan¹, Florida State University², University of Minnesota³

29 Jan 2021-Annals of Statistics

TL;DR: In this article, a family of U-statistics as unbiased estimators of the normalized features of high-dimensional joint distributions of the features of a highdimensional joint distribution is presented.

...read moreread less

Abstract: Many high-dimensional hypothesis tests aim to globally examine marginal or low-dimensional features of a high-dimensional joint distribution, such as testing of mean vectors, covariance matrices and regression coefficients This paper constructs a family of U-statistics as unbiased estimators of the $\ell_{p}$-norms of those features We show that under the null hypothesis, the U-statistics of different finite orders are asymptotically independent and normally distributed Moreover, they are also asymptotically independent with the maximum-type test statistic, whose limiting distribution is an extreme value distribution Based on the asymptotic independence property, we propose an adaptive testing procedure which combines $p$-values computed from the U-statistics of different orders We further establish power analysis results and show that the proposed adaptive procedure maintains high power against various alternatives

...read moreread less

Journal Article•DOI•

A novel probabilistic intuitionistic fuzzy set based model for high order fuzzy time series forecasting

[...]

Radha Mohan Pattanayak¹, Himansu Sekhar Behera¹, Sibarama Panigrahi²•Institutions (2)

Veer Surendra Sai University of Technology¹, Sambalpur University²

01 Mar 2021-Engineering Applications of Artificial Intelligence

TL;DR: In this article, a probabilistic intuitionistic fuzzy time series forecasting (PIFTSF) model using support vector machine (SVM) is proposed to address both uncertainty and non-determinism associated with real world time series data.

...read moreread less

Journal Article•DOI•

Statistical assumptions in L2 research: A systematic review

[...]

Yuhang Hu¹, Luke Plonsky¹•Institutions (1)

Northern Arizona University¹

01 Jan 2021-Second Language Research

TL;DR: Overall weak reporting of checks on assumptions in two major journals of second-language (L2) research over a span of six years is examined as well as implications for researcher training.

...read moreread less

Abstract: Statistical tests carry with them a number of assumptions that must be checked. Failing to do so and to report the results of such preliminary analyses introduce a potential threat to the internal ...

...read moreread less

Journal Article•DOI•

True scale-free networks hidden by finite size effects.

[...]

Matteo Serafino, Giulio Cimini¹, Amos Maritan², Andrea Rinaldo³, Andrea Rinaldo², Samir Suweis², Jayanth R. Banavar⁴, Guido Caldarelli - Show less +4 more•Institutions (4)

University of Rome Tor Vergata¹, University of Padua², École Polytechnique Fédérale de Lausanne³, University of Oregon⁴

12 Jan 2021-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: In this article, the authors analyze about 200 naturally occurring networks with distinct dynamical origins to formally test whether the commonly assumed hypothesis of an underlying scale-free structure is generally viable.

...read moreread less

Abstract: We analyze about 200 naturally occurring networks with distinct dynamical origins to formally test whether the commonly assumed hypothesis of an underlying scale-free structure is generally viable. This has recently been questioned on the basis of statistical testing of the validity of power law distributions of network degrees. Specifically, we analyze by finite size scaling analysis the datasets of real networks to check whether the purported departures from power law behavior are due to the finiteness of sample size. We find that a large number of the networks follows a finite size scaling hypothesis without any self-tuning. This is the case of biological protein interaction networks, technological computer and hyperlink networks, and informational networks in general. Marked deviations appear in other cases, especially involving infrastructure and transportation but also in social networks. We conclude that underlying scale invariance properties of many naturally occurring networks are extant features often clouded by finite size effects due to the nature of the sample data.

...read moreread less

Journal Article•DOI•

Algorithm Supported Induction for Building Theory: How Can We Use Prediction Models to Theorize?

[...]

Yash Raj Shrestha¹, Vivianna Fang He², Phanish Puranam³, Georg von Krogh¹•Institutions (3)

ETH Zurich¹, ESSEC Business School², INSEAD³

01 May 2021-Organization Science

TL;DR: Across many fields of social science, machine learning algorithms are rapidly advancing research as tools to support traditional hypothesis testing research through data reduction and a...

...read moreread less

Abstract: Across many fields of social science, machine learning (ML) algorithms are rapidly advancing research as tools to support traditional hypothesis testing research (e.g., through data reduction and a...

...read moreread less

Journal Article•DOI•

Training Students and Researchers in Bayesian Methods

[...]

Bruno Lecoutre

13 Jul 2021-Journal of data science

TL;DR: The conclusion is that teaching the Bayesian approach in the context of experimental data analysis appears both desirable and feasible.

...read moreread less

Abstract: Frequentist Null Hypothesis Significance Testing (NHST) is so an integral part of scientists' behavior that its uses cannot be discontinued by flinging it out of the window. Faced with this situation, the suggested strategy for training students and researchers in statistical inference meth- ods for experimental data analysis involves a smooth transition towards the Bayesian paradigm. Its general outlines are as follows. (1) To present nat- ural Bayesian interpretations of NHST outcomes to draw attention to their shortcomings. (2) To create as a result of this the need for a change of em- phasis in the presentation and interpretation of results. (3) Finally to equip users with a real possibility of thinking sensibly about statistical inference problems and behaving in a more reasonable manner. The conclusion is that teaching the Bayesian approach in the context of experimental data analysis appears both desirable and feasible. This feasibility is illustrated for analysis of variance methods.

...read moreread less

Journal Article•DOI•

A tutorial on assessing statistical power and determining sample size for structural equation models.

[...]

Lisa J. Jobst¹, Martina Bader¹, Morten Moshagen¹•Institutions (1)

University of Ulm¹

21 Oct 2021-Psychological Methods

TL;DR: A step-by-step illustration of how a priori, post hoc, and compromise power analyses can be conducted for a range of different structural equation modeling applications can be found in this paper.

...read moreread less

Abstract: Structural equation modeling (SEM) is a widespread approach to test substantive hypotheses in psychology and other social sciences. However, most studies involving structural equation models neither report statistical power analysis as a criterion for sample size planning nor evaluate the achieved power of the performed tests. In this tutorial, we provide a step-by-step illustration of how a priori, post hoc, and compromise power analyses can be conducted for a range of different SEM applications. Using illustrative examples and the R package semPower, we demonstrate power analyses for hypotheses regarding overall model fit, global model comparisons, particular individual model parameters, and differences in multigroup contexts (such as in tests of measurement invariance). We encourage researchers to yield reliable-and thus more replicable-results based on thoughtful sample size planning, especially if small or medium-sized effects are expected. (PsycInfo Database Record (c) 2021 APA, all rights reserved).

...read moreread less

Journal Article•DOI•

On the Length of Post-Model-Selection Confidence Intervals Conditional on Polyhedral Constraints

[...]

Danijel Kivaranovic¹, Hannes Leeb¹•Institutions (1)

University of Vienna¹

03 Apr 2021-Journal of the American Statistical Association

TL;DR: In this paper, the polyhedral method is used for valid inference after model selection, which is a very active area of research in the area of probabilistic model selection and inference.

...read moreread less

Abstract: Valid inference after model selection is currently a very active area of research. The polyhedral method, introduced in an article by Lee et al., allows for valid inference after model selection if...

...read moreread less

Journal Article•DOI•

Application of Bayesian networks to generate synthetic health data.

[...]

Dhamanpreet Kaur¹, Matthew Sobiesk¹, Shubham Patil², Jin Liu³, Puran Bhagat³, Amar Gupta¹, Natasha Markuzon³ - Show less +3 more•Institutions (3)

Massachusetts Institute of Technology¹, Rochester Institute of Technology², Philips³

18 Mar 2021-Journal of the American Medical Informatics Association

TL;DR: In this article, a Bayesian network was used to learn probabilistic graphical structures and simulated synthetic patient records from the learned structure, which could be used by medical organizations to distribute health data to researchers, reducing the need for access to real data.

...read moreread less

Journal Article•DOI•

Permutation tests for hypothesis testing with animal social network data: Problems and potential solutions

[...]

Damien R. Farine¹, Damien R. Farine², Gerald G. Carter³, Gerald G. Carter⁴•Institutions (4)

University of Zurich¹, University of Konstanz², Ohio State University³, Smithsonian Tropical Research Institute⁴

28 Oct 2021-Methods in Ecology and Evolution

TL;DR: The double permutation procedure provides one potential solution to issues arising from elevated type I and type II error rates when testing null hypotheses with social network data, and is suggested to be less likely to produce elevated error rates relative to using only node permutations, pre‐network permutations or node permutation with simple covariates.

...read moreread less

Journal Article•DOI•

Estimating the cumulative incidence of COVID-19 in the United States using influenza surveillance, virologic testing, and mortality data: Four complementary approaches.

[...]

Fred Lu¹, Andre T. Nguyen², Andre T. Nguyen³, Nicholas B. Link⁴, Mathieu Molina⁴, Jessica T. Davis⁵, Matteo Chinazzi⁵, Xinyue Xiong⁵, Alessandro Vespignani⁵, Marc Lipsitch⁶, Mauricio Santillana⁶, Mauricio Santillana⁴ - Show less +8 more•Institutions (6)

Stanford University¹, University of Maryland, Baltimore County², Booz Allen Hamilton³, Boston Children's Hospital⁴, Northeastern University⁵, Harvard University⁶

17 Jun 2021-PLOS Computational Biology

TL;DR: In this article, the authors introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia.

...read moreread less

Abstract: Effectively designing and evaluating public health responses to the ongoing COVID-19 pandemic requires accurate estimation of the prevalence of COVID-19 across the United States (US). Equipment shortages and varying testing capabilities have however hindered the usefulness of the official reported positive COVID-19 case counts. We introduce four complementary approaches to estimate the cumulative incidence of symptomatic COVID-19 in each state in the US as well as Puerto Rico and the District of Columbia, using a combination of excess influenza-like illness reports, COVID-19 test statistics, COVID-19 mortality reports, and a spatially structured epidemic model. Instead of relying on the estimate from a single data source or method that may be biased, we provide multiple estimates, each relying on different assumptions and data sources. Across our four approaches emerges the consistent conclusion that on April 4, 2020, the estimated case count was 5 to 50 times higher than the official positive test counts across the different states. Nationally, our estimates of COVID-19 symptomatic cases as of April 4 have a likely range of 2.3 to 4.8 million, with possibly as many as 7.6 million cases, up to 25 times greater than the cumulative confirmed cases of about 311,000. Extending our methods to May 16, 2020, we estimate that cumulative symptomatic incidence ranges from 4.9 to 10.1 million, as opposed to 1.5 million positive test counts. The proposed combination of approaches may prove useful in assessing the burden of COVID-19 during resurgences in the US and other countries with comparable surveillance systems.

...read moreread less

Collapse