Topic

Statistical hypothesis testing

About: Statistical hypothesis testing is a research topic. Over the lifetime, 19,580 publications have been published within this topic, receiving 1,037,815 citations. The topic is also known as: confirmatory data analysis.


Papers
Journal ArticleDOI
TL;DR: GeneMerge is a web-based and standalone program written in PERL that returns a range of functional and genomic data for a given set of study genes and provides statistical rank scores for over-representation of particular functions or categories in the data set.
Abstract: Summary: GeneMerge is a web-based and standalone program written in PERL that returns a range of functional and genomic data for a given set of study genes and provides statistical rank scores for over-representation of particular functions or categories in the data set. Functional or categorical data of all kinds can be analyzed with GeneMerge, facilitating regulatory and metabolic pathway analysis, tests of population genetic hypotheses, cross-experiment comparisons, and tests of chromosomal clustering, among others. GeneMerge can perform analyses on a wide variety of genomic data quickly and easily and facilitates both data mining and hypothesis testing. Availability: GeneMerge is available free of charge for academic use over the web and for download from: http://www.oeb.harvard.edu/hartl/lab/publications/GeneMerge.html.

321 citations
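
The over-representation score a tool like GeneMerge reports is, at heart, a tail probability from the hypergeometric distribution with a multiple-testing correction across all categories tested. The sketch below is not GeneMerge's own code; it illustrates the kind of calculation involved, with gene counts and the number of categories invented for the example.

```python
# Sketch of a category over-representation test of the kind GeneMerge
# performs (hypothetical counts; not GeneMerge's actual implementation).
from scipy.stats import hypergeom

def overrepresentation_p(pop_total, pop_with_term, study_total, study_with_term):
    # P(X >= study_with_term) when study_total genes are drawn at random
    # from a population of pop_total genes, pop_with_term of which carry
    # the category label. sf(k - 1) gives P(X >= k) for a discrete X.
    return hypergeom.sf(study_with_term - 1, pop_total, pop_with_term, study_total)

# Hypothetical example: 40 of 12,000 genes carry some functional term,
# and 6 of the 200 genes in the study set carry it.
p = overrepresentation_p(12000, 40, 200, 6)
n_categories = 500  # hypothetical number of categories tested in parallel
print(f"raw p = {p:.3g}; Bonferroni-corrected = {min(1.0, p * n_categories):.3g}")
```

The Bonferroni-style `min(1, p * n_categories)` line mirrors the idea of correcting each category's score for the number of categories examined.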

Journal ArticleDOI
01 Sep 2007
TL;DR: GPower, as mentioned in this paper, is a free general power analysis program available in two essentially equivalent versions, one designed for Macintosh OS/OS X and the other for MS-DOS/Windows platforms. Psychological research examples are presented to illustrate the various features of the GPower software.
Abstract: The purpose of this paper is to promote statistical power analysis in the behavioral sciences by introducing the easy-to-use GPower software. GPower is a free general power analysis program available in two essentially equivalent versions, one designed for Macintosh OS/OS X and the other for MS-DOS/Windows platforms. Psychological research examples are presented to illustrate the various features of the GPower software. In particular, a priori, post-hoc, and compromise power analyses for t-tests, F-tests, and χ2-tests will be demonstrated. For all examples, the underlying statistical concepts as well as the implementation in GPower will be described. In the behavioral sciences, we routinely apply statistical tests, but control of statistical power cannot be taken for granted. However, neglecting statistical power (the probability of rejecting false null hypotheses) can have severe consequences. For example, without control of statistical power it is very difficult to interpret nonsignificant results. Statistical tests can produce nonsignificant results because (a) the null hypothesis (H0) holds and is retained correctly, or (b) the alternative hypothesis (H1) holds but the test has not been powerful enough to detect the deviations from H0. Obviously, there is no reasonable way to decide between interpretations (a) and (b) unless the statistical power of the test is known.

321 citations
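
To make the a priori and post-hoc analyses described above concrete, here is a minimal sketch using statsmodels rather than GPower itself. The effect size, alpha, and power targets are conventional illustrative choices (Cohen's d = 0.5, alpha = .05, power = .80), not values taken from the paper.

```python
# A priori and post-hoc power analysis for an independent-samples t-test,
# in the spirit of what GPower computes (statsmodels, not GPower itself).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# A priori: sample size per group needed to detect a medium effect
# (Cohen's d = 0.5) at alpha = .05 with power = .80, two-sided.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80,
                                   alternative='two-sided')
print(f"a priori: n per group = {n_per_group:.1f}")  # ~64 per group

# Post hoc: achieved power for n = 30 per group at the same effect size.
power = analysis.solve_power(effect_size=0.5, nobs1=30, alpha=0.05,
                             alternative='two-sided')
print(f"post hoc: power = {power:.3f}")              # well below .80
```

The post-hoc result illustrates the paper's warning: with 30 participants per group, a nonsignificant result for a medium effect is nearly uninterpretable, because the test had less than a coin flip's chance of detecting it.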

Journal ArticleDOI
01 Aug 2006-Ecology
TL;DR: In this article, a simulation envelope is created by calculating, at every distance, the minimum and maximum results computed across the simulated patterns, and a statistical test is performed by evaluating where the results from an observed pattern fall with respect to the simulation envelope.
Abstract: Spatial point pattern analysis provides a statistical method to compare an observed spatial pattern against a hypothesized spatial process model. The G statistic, which considers the distribution of nearest neighbor distances, and the K statistic, which evaluates the distribution of all neighbor distances, are commonly used in such analyses. One method of employing these statistics involves building a simulation envelope from the result of many simulated patterns of the hypothesized model. Specifically, a simulation envelope is created by calculating, at every distance, the minimum and maximum results computed across the simulated patterns. A statistical test is performed by evaluating where the results from an observed pattern fall with respect to the simulation envelope. However, this method, which differs from P. Diggle's suggested approach, is invalid for inference because it violates the assumptions of Monte Carlo methods and results in incorrect type I error rate performance. Similarly, using the simulation envelope to estimate the range of distances over which an observed pattern deviates from the hypothesized model is also suspect. The technical details of why the simulation envelope provides incorrect type I error rate performance are described. A valid test is then proposed, and details about how the number of simulated patterns impacts the statistical significance are explained. Finally, an example of using the proposed test within an exploratory data analysis framework is provided.

320 citations
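
The valid test the authors advocate reduces each pattern, observed and simulated alike, to a single summary statistic and uses its rank among the simulations, with p = rank / (m + 1), rather than checking pointwise envelope crossings. The sketch below illustrates that rank-based Monte Carlo logic with a deliberately simple stand-in statistic (mean nearest-neighbour distance under a uniform null); the paper's own statistic is a cumulative deviation of the G or K function, and the envelope is kept for display only.

```python
# Rank-based Monte Carlo test: one summary statistic per pattern, ranked
# against the simulations. Toy stand-in statistic; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def mc_rank_p(u_obs, u_sims):
    # Under H0 the observed pattern is exchangeable with the m simulated
    # ones, so the rank-based p-value has the correct type I error rate.
    m = len(u_sims)
    rank = 1 + np.sum(np.asarray(u_sims) >= u_obs)  # ties counted conservatively
    return rank / (m + 1)

def summary_stat(pts):
    # Mean nearest-neighbour distance: a simple stand-in for the paper's
    # cumulative G/K deviation statistic.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()

observed = rng.uniform(size=(50, 2))  # pretend this is the observed pattern
u_obs = summary_stat(observed)
u_sims = [summary_stat(rng.uniform(size=(50, 2))) for _ in range(999)]
print(f"Monte Carlo p = {mc_rank_p(u_obs, u_sims):.3f}")
```

Because the p-value's resolution is 1 / (m + 1), the number of simulated patterns directly limits the attainable significance level, which is the dependence on simulation count the abstract mentions.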

Book ChapterDOI
TL;DR: The Fisher and Neyman-Pearson approaches to testing statistical hypotheses are compared with respect to their attitudes to the interpretation of the outcome, to power, to conditioning, and to the use of fixed significance levels as discussed by the authors.
Abstract: The Fisher and Neyman-Pearson approaches to testing statistical hypotheses are compared with respect to their attitudes to the interpretation of the outcome, to power, to conditioning, and to the use of fixed significance levels. It is argued that despite basic philosophical differences, in their main practical aspects the two theories are complementary rather than contradictory and that a unified approach is possible that combines the best features of both. As applications, the controversies about the Behrens-Fisher problem and the comparison of two binomials (2 × 2 tables) are considered from the present point of view.

319 citations
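
One way to see the practical difference between the two attitudes the chapter compares is to run a single test and read the result both ways. The snippet below is an illustration on simulated data, not an example from the chapter.

```python
# Fisher vs. Neyman-Pearson readings of the same test (simulated data).
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=40)
b = rng.normal(0.4, 1.0, size=40)

t, p = ttest_ind(a, b)

# Fisher: report the exact p-value as a graded measure of evidence.
print(f"Fisher-style report: t = {t:.2f}, p = {p:.4f}")

# Neyman-Pearson: fix alpha in advance and act on a binary decision whose
# long-run error rates (alpha, beta) were controlled by design.
alpha = 0.05
print("Neyman-Pearson decision:", "reject H0" if p < alpha else "retain H0")
```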

Journal ArticleDOI
TL;DR: Investigation of the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses, suggest that model partitioning is important for large data sets.
Abstract: As larger, more complex data sets are being used to infer phylogenies, accuracy of these phylogenies increasingly requires models of evolution that accommodate heterogeneity in the processes of molecular evolution. We investigated the effect of improper data partitioning on phylogenetic accuracy, as well as the type I error rate and sensitivity of Bayes factors, a commonly used method for choosing among different partitioning strategies in Bayesian analyses. We also used Bayes factors to test empirical data for the need to divide data in a manner that has no expected biological meaning. Posterior probability estimates are misleading when an incorrect partitioning strategy is assumed. The error was greatest when the assumed model was underpartitioned. These results suggest that model partitioning is important for large data sets. Bayes factors performed well, giving a 5% type I error rate, which is remarkably consistent with standard frequentist hypothesis tests. The sensitivity of Bayes factors was found to be quite high when the across-class model heterogeneity reflected that of empirical data. These results suggest that Bayes factors represent a robust method of choosing among partitioning strategies. Lastly, results of tests for the inclusion of unexpected divisions in empirical data mirrored the simulation results, although the outcome of such tests is highly dependent on accounting for rate variation among classes. We conclude by discussing other approaches for partitioning data, as well as other applications of Bayes factors.

319 citations
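
The Bayes factor comparison at the center of this paper reduces to a difference of log marginal likelihoods between partitioning strategies. The sketch below shows only that arithmetic, on invented numbers; in a real analysis the estimates would come from MCMC output (e.g., harmonic-mean estimates, as was common in Bayesian phylogenetics at the time), and the 2 ln BF > 10 cutoff follows the widely used Kass and Raftery convention.

```python
# Choosing between partitioning strategies with a Bayes factor
# (hypothetical log marginal likelihood estimates, not real data).
log_ml_unpartitioned = -15234.7  # model M0: one model for all sites
log_ml_partitioned   = -15198.2  # model M1: separate models per data class

# Bayes factor of M1 over M0 on the 2*ln scale; values above ~10 are
# conventionally read as very strong support (Kass and Raftery).
two_ln_bf = 2.0 * (log_ml_partitioned - log_ml_unpartitioned)
print(f"2 ln BF(M1 vs M0) = {two_ln_bf:.1f}")
if two_ln_bf > 10:
    print("Very strong support for the partitioned model.")
```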


Network Information
Related Topics (5)
Estimator: 97.3K papers, 2.6M citations (88% related)
Linear model: 19K papers, 1M citations (88% related)
Inference: 36.8K papers, 1.3M citations (87% related)
Regression analysis: 31K papers, 1.7M citations (86% related)
Sampling (statistics): 65.3K papers, 1.2M citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:
Year: Papers
2023: 267
2022: 696
2021: 959
2020: 998
2019: 1,033
2018: 943