
Statistical hypothesis testing

About: Statistical hypothesis testing is a research topic. Over its lifetime, 19,580 publications have been published within this topic, receiving 1,037,815 citations. The topic is also known as confirmatory data analysis.


Papers
Journal Article
TL;DR: This paper presents a Bayesian framework for exploratory data analysis based on posterior predictive checks, explains how posterior predictive simulations can be used to create reference distributions for EDA graphs, and shows how this approach resolves some theoretical problems in Bayesian data analysis.
Abstract: Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling), though generally considered unrelated statistical paradigms, can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data y^rep and replicated parameters θ^rep follows a long tradition of generalizations in Bayesian theory. On the theoretical level, we present a predictive Bayesian formulation of goodness-of-fit testing, distinguishing between p-values (posterior probabilities that specified antisymmetric discrepancy measures will exceed 0) and u-values (data summaries with uniform sampling distributions). We explain that p-values, unlike u-values, are Bayesian probability statements in that they condition on observed data. Having reviewed the general theoretical framework, we discuss the implications for statistical graphics and exploratory data analysis, with the goal of unifying exploratory data analysis with more formal statistical methods based on probability models. We interpret various graphical displays as posterior predictive checks and discuss how Bayesian inference can be used to determine reference distributions. The goal of this work is not to downgrade descriptive statistics, or to suggest they be replaced by Bayesian modeling, but rather to suggest how exploratory data analysis fits into the probability-modeling paradigm. We conclude with a discussion of the implications for practical Bayesian inference. In particular, we anticipate that Bayesian software can be generalized to draw simulations of replicated data and parameters from their posterior predictive distribution, which can in turn be used to calibrate EDA graphs.

239 citations
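A posterior predictive check of the kind described above is straightforward to simulate. The sketch below is a minimal illustration under assumed toy data, not the paper's own code: a normal model with the standard noninformative prior is fit to the data, replicated datasets y^rep are drawn from the posterior predictive distribution, and a test quantity T(y) = max|y_i| is compared against its replicated distribution to yield a posterior predictive p-value.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_t(df=3, size=100)   # toy data, heavier-tailed than the model

# Model: y_i ~ Normal(mu, sigma^2) with the standard noninformative prior.
# Posterior draws follow the textbook normal-model simulation:
# sigma^2 | y ~ (n-1) s^2 / chi2_{n-1}, then mu | sigma^2, y ~ N(ybar, sigma^2/n).
n, S = len(y), 4000
sigma2 = (n - 1) * y.var(ddof=1) / rng.chisquare(n - 1, S)
mu = rng.normal(y.mean(), np.sqrt(sigma2 / n))

# Replicated data y_rep from the posterior predictive distribution, and a
# test quantity T(y) = max |y_i| chosen to probe tail behaviour.
T_obs = np.abs(y).max()
T_rep = np.array([np.abs(rng.normal(m, np.sqrt(s2), size=n)).max()
                  for m, s2 in zip(mu, sigma2)])
print("posterior predictive p-value:", (T_rep >= T_obs).mean())
```

An extreme p-value (near 0 or 1) signals that the model's replicated data do not resemble the observed data in the chosen respect, which is exactly the comparison the paper proposes displaying as a reference distribution for EDA graphs.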

Journal Article
TL;DR: Design considerations and the role of randomization-based inference in randomized community intervention trials are discussed; longitudinal follow-up of cohorts within communities often yields useful information on the effects of intervention on individuals, whereas cross-sectional surveys can usefully assess the impact of intervention on group indices of health.
Abstract: This paper discusses design considerations and the role of randomization-based inference in randomized community intervention trials. We stress that longitudinal follow-up of cohorts within communities often yields useful information on the effects of intervention on individuals, whereas cross-sectional surveys can usefully assess the impact of intervention on group indices of health. We also discuss briefly special design considerations, such as sampling cohorts from targeted subpopulations (for example, heavy smokers), matching the communities, calculating sample size, and other practical issues. We present randomization tests for matched and unmatched cohort designs. As is well known, these tests necessarily have proper size under the strong null hypothesis that treatment has no effect on any community response. It is less well known, however, that the size of randomization tests can exceed nominal levels under the ‘weak’ null hypothesis that intervention does not affect the average community response. Because this weak null hypothesis is of interest in community intervention trials, we study the size of randomization tests by simulation under conditions in which the weak null hypothesis holds but the strong null hypothesis does not. In unmatched studies, size may exceed nominal levels under the weak null hypothesis if there are more intervention than control communities and if the variance among community responses is larger among control communities than among intervention communities; size may also exceed nominal levels if there are more control than intervention communities and if the variance among community responses is larger among intervention communities. Otherwise, size is likely near nominal levels. To avoid such problems, we recommend use of the same numbers of control and intervention communities in unmatched designs. Pair-matched designs usually have size near nominal levels, even under the weak null hypothesis. We have identified some extreme cases, unlikely to arise in practice, in which even the size of pair-matched studies can exceed nominal levels. These simulations, however, tend to confirm the robustness of randomization tests for matched and unmatched community intervention trials, particularly if the latter designs have equal numbers of intervention and control communities. We also describe adaptations of randomization tests to allow for covariate adjustment, missing data, and application to cross-sectional surveys. We show that covariate adjustment can increase power, but such power gains diminish as the random component of variation among communities increases, which corresponds to increasing intraclass correlation of responses within communities. We briefly relate our results to model-based methods of inference for community intervention trials that include hierarchical models such as an analysis of variance model with random community effects and fixed intervention effects. Although we have tailored this paper to the design of community intervention trials, many of the ideas apply to other experiments in which one allocates groups or clusters of subjects at random to intervention or control treatments.

239 citations
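The randomization tests presented in the paper are easy to sketch for the unmatched design. The following toy example (the community-level mean responses and the Monte Carlo approximation to the full randomization distribution are our assumptions) uses equal numbers of intervention and control communities, as the paper recommends:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy unmatched design: mean response per community, with equal numbers of
# intervention and control communities (the recommended configuration).
intervention = np.array([4.1, 3.8, 4.5, 4.0, 4.3, 3.9])
control = np.array([3.6, 3.9, 3.5, 3.8, 3.4, 3.7])
y = np.concatenate([intervention, control])
labels = np.array([1] * 6 + [0] * 6)

obs = y[labels == 1].mean() - y[labels == 0].mean()

# Randomization distribution: re-randomize the community assignments and
# recompute the difference in mean community response each time.
B = 20000
diffs = np.empty(B)
for b in range(B):
    perm = rng.permutation(labels)
    diffs[b] = y[perm == 1].mean() - y[perm == 0].mean()

p_value = (np.abs(diffs) >= abs(obs)).mean()
print(f"observed difference = {obs:.3f}, randomization p-value = {p_value:.4f}")
```

With 12 communities there are only C(12,6) = 924 distinct assignments, so the full randomization distribution could also be enumerated exactly; under the strong null hypothesis this test has proper size by construction.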

Journal Article
TL;DR: In this article, the authors introduce a family of goodness-of-fit statistics for testing composite null hypotheses in multidimensional contingency tables, which are quadratic forms in marginal residuals up to order r.
Abstract: We introduce a family of goodness-of-fit statistics for testing composite null hypotheses in multidimensional contingency tables. These statistics are quadratic forms in marginal residuals up to order r. They are asymptotically chi-square under the null hypothesis when parameters are estimated using any asymptotically normal consistent estimator. For a widely used item response model, when r is small and multidimensional tables are sparse, the proposed statistics have accurate empirical Type I errors, unlike Pearson’s X2. For this model in nonsparse situations, the proposed statistics are also more powerful than X2. In addition, the proposed statistics are asymptotically chi-square when applied to subtables, and can be used for a piecewise goodness-of-fit assessment to determine the source of misfit in poorly fitting models.

239 citations
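The quadratic-form construction can be illustrated in a toy setting. The sketch below (the item count, cell probabilities, and sample size are all our assumptions) tests a fully specified simple null for three binary items using margins up to order r = 2, where the plain multinomial covariance applies; the paper's statistics additionally adjust this covariance for parameter estimation, which the sketch omits:

```python
import numpy as np
from itertools import product
from scipy import stats

rng = np.random.default_rng(0)

# Simple null: three binary items with fully specified cell probabilities.
cells = list(product([0, 1], repeat=3))        # 8 response patterns
p = np.array([0.15, 0.10, 0.10, 0.10, 0.15, 0.10, 0.10, 0.20])

# Margins up to order 2: P(X_i = 1) for each item and P(X_i = X_j = 1) for
# each pair, collected via an indicator matrix A (margin probs = A @ cell probs).
margins = [(i,) for i in range(3)] + [(i, j) for i in range(3) for j in range(i + 1, 3)]
A = np.array([[all(c[k] == 1 for k in m) for c in cells] for m in margins], float)

# Asymptotic covariance of the marginal residuals under the simple null:
# sqrt(n) (p_hat_m - p_m) -> N(0, A (diag(p) - p p') A').
Sigma = A @ (np.diag(p) - np.outer(p, p)) @ A.T

n = 2000
counts = rng.multinomial(n, p)
resid = A @ (counts / n - p)                    # marginal residuals
T = n * resid @ np.linalg.solve(Sigma, resid)   # quadratic form in the residuals
print(f"T = {T:.2f}, p-value = {stats.chi2.sf(T, df=len(margins)):.3f}")
```

Under this simple null, T is asymptotically chi-square with 6 degrees of freedom (one per margin); because the statistic pools information into low-order margins rather than all 8 cells, it remains well behaved in sparse tables where Pearson's X² does not.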

Journal Article
Hyun Kang
TL;DR: G*Power, as described in this paper, is free, easy-to-use software that supports sample size and power calculations for various statistical methods (F, t, χ², z, and exact tests).
Abstract: Appropriate sample size calculation and power analysis have become major issues in research and publication processes. However, calculating sample size and power requires broad statistical knowledge, personnel with the necessary programming skills are in short supply, and commercial programs are often too expensive to use in practice. This review article aimed to explain the basic concepts of sample size calculation and power analysis; the process of sample estimation; and how to calculate sample size using G*Power software (latest ver. 3.1.9.7; Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany) with 5 statistical examples. The null and alternative hypotheses, effect size, power, alpha, type I error, and type II error should be described when calculating the sample size or power. G*Power is recommended for sample size and power calculations for various statistical methods (F, t, χ², z, and exact tests), because it is easy to use and free. The process of sample estimation consists of establishing research goals and hypotheses, choosing appropriate statistical tests, choosing one of 5 possible power analysis methods, inputting the required variables for analysis, and selecting the "Calculate" button. The G*Power software supports sample size and power calculation for various statistical methods (F, t, χ², z, and exact tests). This software is helpful for researchers to estimate the sample size and to conduct power analysis.

238 citations
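The kind of calculation the paper performs in G*Power can be cross-checked in code. The sketch below uses the power module of statsmodels rather than G*Power itself, for a standard two-sided independent-samples t test scenario (Cohen's d = 0.5, α = 0.05, power = 0.80; the scenario is our assumption, not one of the paper's five examples):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori analysis: solve for the per-group sample size of a two-sided
# independent-samples t test with a medium effect size and 80% power.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   ratio=1.0, alternative='two-sided')
print(f"required n per group: {n_per_group:.1f}")   # about 63.8, so 64 per group

# The reverse calculation (achieved power for a fixed n), which G*Power
# offers as post hoc power analysis.
power = analysis.solve_power(effect_size=0.5, nobs1=64, alpha=0.05, ratio=1.0)
print(f"power with n = 64 per group: {power:.3f}")
```

The two directions correspond to two of G*Power's five power analysis methods (a priori and post hoc); the remaining three (compromise, criterion, and sensitivity) rearrange the same relationship among effect size, alpha, power, and n.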

Journal Article
TL;DR: An algorithm, AFTER, is proposed to combine candidate models convexly for better predictive performance; the results show an advantage of combining via AFTER over model selection in terms of forecasting accuracy across several settings.

238 citations
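The convex combination behind AFTER can be sketched as exponential reweighting by cumulative forecast error. This is a stylized version under assumed toy data: the actual AFTER weights also involve estimated forecast variances, and the learning rate eta here is our simplification.

```python
import numpy as np

def after_style_combine(forecasts, y, eta=0.5):
    """Convexly combine candidate forecasts, AFTER-style: at each step the
    weights are proportional to exp(-eta * cumulative squared error), so
    models that have forecast well so far receive more weight."""
    T, K = forecasts.shape
    w = np.full(K, 1.0 / K)          # start from equal weights
    combined = np.empty(T)
    cum_loss = np.zeros(K)
    for t in range(T):
        combined[t] = w @ forecasts[t]            # convex combination
        cum_loss += (forecasts[t] - y[t]) ** 2    # update after observing y_t
        w = np.exp(-eta * (cum_loss - cum_loss.min()))   # stable reweighting
        w /= w.sum()
    return combined

rng = np.random.default_rng(3)
y = np.sin(np.arange(200) / 8) + 0.1 * rng.standard_normal(200)
candidates = np.column_stack([
    y + 0.3 * rng.standard_normal(200),   # better candidate forecaster
    y + 1.0 * rng.standard_normal(200),   # noisier candidate forecaster
])
combined = after_style_combine(candidates, y)
print("MSE combined:  ", np.mean((combined - y) ** 2))
print("MSE candidates:", np.mean((candidates - y[:, None]) ** 2, axis=0))
```

Because the weights are nonnegative and sum to one, the combined forecast is always a convex combination of the candidates; over time the weights concentrate on the better model, which is the sense in which combining can outperform one-shot selection.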


Network Information
Related Topics (5)

Estimator: 97.3K papers, 2.6M citations (88% related)
Linear model: 19K papers, 1M citations (88% related)
Inference: 36.8K papers, 1.3M citations (87% related)
Regression analysis: 31K papers, 1.7M citations (86% related)
Sampling (statistics): 65.3K papers, 1.2M citations (83% related)
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    267
2022    696
2021    959
2020    998
2019    1,033
2018    943