Topic

Statistical hypothesis testing

About: Statistical hypothesis testing is a research topic. Over the lifetime, 19,580 publications have been published within this topic, receiving 1,037,815 citations. The topic is also known as: confirmatory data analysis.


Papers
Journal ArticleDOI
01 Mar 2013-Oikos
TL;DR: In this article, the authors compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey (BBS) and found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy.
Abstract: Distribution models are used to predict the likelihood of occurrence or abundance of a species at locations where census data are not available. An integral part of modelling is the testing of model performance. We compared different schemes and measures for testing model performance using 79 species from the North American Breeding Bird Survey. The four testing schemes we compared featured increasing independence between test and training data: resubstitution, random data hold-out and two spatially segregated data hold-out designs. The different testing measures also addressed different levels of information content in the dependent variable: regression R² for absolute abundance, squared correlation coefficient r² for relative abundance and AUC/Somers' D for presence/absence. We found that higher levels of independence between test and training data lead to lower assessments of prediction accuracy. Even for data collected independently, spatial autocorrelation leads to dependence between random hold-out test data and training data, and thus to inflated measures of model performance. While there is a general awareness of the importance of autocorrelation to model building and hypothesis testing, its consequences via violation of independence between training and testing data have not been addressed systematically and comprehensively before. Furthermore, increasing information content (from correctly classifying presence/absence, to predicting relative abundance, to predicting absolute abundance) leads to decreasing predictive performance. The current tests for presence/absence distribution models are typically overly optimistic because a) the test and training data are not independent and b) the correct classification of presence/absence has a relatively low information content and thus capability to address ecological and conservation questions compared to a prediction of abundance. Meaningful evaluation of model performance requires testing on spatially independent data, if the intended application of the model is to predict into new geographic or climatic space, which arguably is the case for most applications of distribution models.
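
To make the autocorrelation argument concrete, here is a minimal sketch (not from the paper; the data, model, and variable names are all synthetic and illustrative) of how a random hold-out can overstate predictive accuracy relative to a spatially segregated hold-out when observations are spatially autocorrelated:

```python
# Sketch: random vs. spatially segregated hold-out for a distribution model.
# Everything here is illustrative: synthetic sites, a smooth environmental
# gradient that induces spatial autocorrelation, and a generic regressor.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000
coords = rng.uniform(0, 100, size=(n, 2))            # site locations
# A spatially smooth gradient drives abundance (spatial autocorrelation).
env = np.sin(coords[:, 0] / 15.0) + np.cos(coords[:, 1] / 20.0)
abundance = 5.0 * env + rng.normal(0.0, 1.0, n)
X = np.column_stack([coords, env])

def test_r2(train_idx, test_idx):
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X[train_idx], abundance[train_idx])
    return r2_score(abundance[test_idx], model.predict(X[test_idx]))

# Random hold-out: test sites are interleaved with training sites.
perm = rng.permutation(n)
print("random hold-out R^2: ", round(test_r2(perm[:n // 2], perm[n // 2:]), 3))

# Spatially segregated hold-out: train west of a divide, test east of it.
west = coords[:, 0] < 50
print("spatial hold-out R^2:", round(test_r2(np.where(west)[0], np.where(~west)[0]), 3))
```

With data like these, the random hold-out score is typically the higher of the two, illustrating the inflation the authors attribute to violated independence between training and test data.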

194 citations

Journal ArticleDOI
TL;DR: The following problem is addressed: given that the peripheral encoders that satisfy capacity constraints are scalar quantizers, how should they be designed so that the central test performed on their output indices is most powerful?
Abstract: In a decentralized hypothesis testing network, several peripheral nodes observe an environment and communicate their observations to a central node for the final decision. The presence of capacity constraints introduces theoretical and practical problems. The following problem is addressed: given that the peripheral encoders that satisfy these constraints are scalar quantizers, how should they be designed so that the central test performed on their output indices is most powerful? The scheme is called cooperative design-separate encoding, since the quantizers process separate observations but have a common goal: they seek to maximize a system-wide performance measure. The Bhattacharyya distance of the joint index space is suggested as such a criterion, and a design algorithm that optimizes arbitrarily many quantizers cyclically is proposed. A simplified version of the algorithm, namely an independent design-separate encoding scheme, where the correlation is either absent or neglected for the sake of simplicity, is outlined. Performances are compared through worked examples.
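
The cyclic design loop described above can be sketched roughly as follows; this is not the authors' algorithm, just an illustration of the idea. The correlated Gaussian observation model, the Monte Carlo estimate of the joint index distribution, and the threshold grid search are all my own assumptions:

```python
# Rough sketch: cyclically tune two sensors' scalar-quantizer thresholds to
# maximize the Bhattacharyya distance of the joint output-index distribution.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
cov = [[1.0, 0.5], [0.5, 1.0]]                       # correlated observations
x0 = rng.multivariate_normal([0.0, 0.0], cov, n)     # samples under H0
x1 = rng.multivariate_normal([1.0, 1.0], cov, n)     # samples under H1

def joint_pmf(thr, samples, levels):
    """Empirical pmf over the pair of quantizer output indices."""
    idx0 = np.digitize(samples[:, 0], thr[0])
    idx1 = np.digitize(samples[:, 1], thr[1])
    pmf = np.zeros((levels, levels))
    np.add.at(pmf, (idx0, idx1), 1.0)
    return pmf / len(samples)

def bhattacharyya(thr, levels):
    p0, p1 = joint_pmf(thr, x0, levels), joint_pmf(thr, x1, levels)
    return -np.log(np.sum(np.sqrt(p0 * p1)) + 1e-12)

levels = 4                                           # 2-bit quantizers
thr = [np.array([-0.5, 0.5, 1.5]), np.array([-0.5, 0.5, 1.5])]
grid = np.linspace(-2.0, 3.0, 41)

# Cyclic coordinate ascent: re-optimize one threshold at a time, holding
# every other threshold (including the other sensor's) fixed.
for sweep in range(3):
    for s in range(2):
        for k in range(levels - 1):
            def score(t):
                trial = [thr[0].copy(), thr[1].copy()]
                trial[s][k] = t
                trial[s].sort()
                return bhattacharyya(trial, levels)
            thr[s][k] = max(grid, key=score)
            thr[s].sort()
    print(f"sweep {sweep}: Bhattacharyya distance = {bhattacharyya(thr, levels):.4f}")
```

Because the criterion is computed on the joint index space, each sensor's thresholds are chosen with the other sensor's quantizer in view, which is the "cooperative design-separate encoding" idea; dropping the joint structure and optimizing each sensor against its own marginal would correspond to the simplified independent-design scheme.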

193 citations

Journal ArticleDOI
07 Sep 1996-BMJ
TL;DR: Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence.
Abstract: The recent controversy over the increased risk of venous thrombosis with third generation oral contraceptives illustrates the public policy dilemma that can be created by relying on conventional statistical tests and estimates: case-control studies showed a significant increase in risk and forced a decision either to warn or not to warn. Conventional statistical tests are an improper basis for such decisions because they dichotomise results according to whether they are or are not significant and do not allow decision makers to take explicit account of additional evidence—for example, of biological plausibility or of biases in the studies. A Bayesian approach overcomes both these problems. A Bayesian analysis starts with a “prior” probability distribution for the value of interest (for example, a true relative risk)—based on previous knowledge—and adds the new evidence (via a model) to produce a “posterior” probability distribution. Because different experts will have different prior beliefs, sensitivity analyses are important to assess the effects on the posterior distributions of these differences. Sensitivity analyses should also examine the effects of different assumptions about biases and about the model which links the data with the value of interest. One advantage of this method is that it allows such assumptions to be handled openly and explicitly. Data presented as a series of posterior probability distributions would be a much better guide to policy, reflecting the reality that degrees of belief are often continuous, not dichotomous, and often vary from one person to another in the face of inconclusive evidence. Every five to 10 years a “pill scare” hits the headlines. Imagine that you are the chairperson of the Committee on Safety of Medicines. You have been sent the galley proofs of four case-control studies showing that the leading brands of oral contraceptive, which have been widely used for some five years, …
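
As a toy version of the prior-to-posterior workflow the authors advocate, here is a conjugate normal sketch on the log relative risk, with a sensitivity analysis over several priors. All numbers are invented for illustration and are not taken from the contraceptive studies discussed in the paper:

```python
# Sketch of a Bayesian prior-to-posterior update on a log relative risk,
# with a sensitivity analysis over different experts' priors.
import numpy as np
from scipy import stats

# "New evidence": an observed log relative risk and its standard error,
# as might be extracted from a case-control study (invented numbers).
obs_log_rr, obs_se = np.log(2.0), 0.25

# Different experts hold different prior beliefs about the true log RR.
priors = {
    "sceptical (RR near 1)": (0.0, 0.2),
    "vague":                 (0.0, 1.0),
    "concerned (RR near 2)": (np.log(2.0), 0.3),
}

for name, (m0, s0) in priors.items():
    # Normal prior x normal likelihood -> normal posterior (precision-weighted).
    post_prec = 1.0 / s0**2 + 1.0 / obs_se**2
    post_mean = (m0 / s0**2 + obs_log_rr / obs_se**2) / post_prec
    post_sd = post_prec ** -0.5
    # Posterior probability that the true relative risk exceeds 1.5.
    p_gt = 1.0 - stats.norm.cdf(np.log(1.5), post_mean, post_sd)
    lo, hi = np.exp(post_mean + np.array([-1.96, 1.96]) * post_sd)
    print(f"{name:22s} posterior RR {np.exp(post_mean):.2f} "
          f"(95% interval {lo:.2f}-{hi:.2f}), P(RR > 1.5) = {p_gt:.2f}")
```

The point of the sensitivity analysis is visible in the output: the sceptical prior pulls the posterior toward a relative risk of 1, while the vague prior leaves it close to the observed data, making the dependence on prior beliefs open and explicit.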

193 citations

Journal ArticleDOI
TL;DR: The pitfalls of commonly used statistical techniques in dental research are described, recommendations for avoiding them are given, and the potential of some of the newer statistical techniques for dental research is explored.

193 citations

Journal ArticleDOI
TL;DR: The results of analyses of the Type 1 error efficiency and power of standard parametric and non-parametric statistical tests when applied to non-normal data sets are summarised.
Abstract: There have been many changes in statistical theory in the past 30 years, including increased evidence that non-robust methods may fail to detect important results. The statistical advice available to software engineering researchers needs to be updated to address these issues. This paper aims both to explain the new results in the area of robust analysis methods and to provide a large-scale worked example of the new methods. We summarise the results of analyses of the Type 1 error efficiency and power of standard parametric and non-parametric statistical tests when applied to non-normal data sets. We identify parametric and non-parametric methods that are robust to non-normality. We present an analysis of a large-scale software engineering experiment to illustrate their use. We illustrate the use of kernel density plots, and parametric and non-parametric methods using four different software engineering data sets. We explain why the methods are necessary and the rationale for selecting a specific analysis. We suggest using kernel density plots rather than box plots to visualise data distributions. For parametric analysis, we recommend trimmed means, which can support reliable tests of the differences between the central location of two or more samples. When the distribution of the data differs among groups, or we have ordinal scale data, we recommend non-parametric methods such as Cliff's δ or a robust rank-based ANOVA-like method.
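
Two of the recommended techniques are easy to sketch: a trimmed-mean summary and Cliff's δ. The 20% trimming proportion below is a common default assumed here, not necessarily the paper's setting, and the data are synthetic:

```python
# Sketch: a trimmed-mean summary and Cliff's delta on non-normal data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Two skewed samples, e.g. task completion times under two treatments.
a = rng.lognormal(mean=1.0, sigma=0.6, size=40)
b = rng.lognormal(mean=1.3, sigma=0.6, size=40)

# 20%-trimmed means are robust to outliers and heavy tails.
print("trimmed means:", stats.trim_mean(a, 0.2), stats.trim_mean(b, 0.2))

def cliffs_delta(x, y):
    """P(X > Y) - P(X < Y), estimated over all pairs; ranges from -1 to +1."""
    diff = np.subtract.outer(x, y)
    return (np.sum(diff > 0) - np.sum(diff < 0)) / diff.size

print("Cliff's delta:", cliffs_delta(a, b))
```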

192 citations


Network Information
Related Topics (5)
- Estimator: 97.3K papers, 2.6M citations (88% related)
- Linear model: 19K papers, 1M citations (88% related)
- Inference: 36.8K papers, 1.3M citations (87% related)
- Regression analysis: 31K papers, 1.7M citations (86% related)
- Sampling (statistics): 65.3K papers, 1.2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    267
2022    696
2021    959
2020    998
2019    1,033
2018    943