Journal ArticleDOI

Bayesian versus Orthodox statistics: which side are you on?

01 May 2011-Perspectives on Psychological Science (SAGE Publications)-Vol. 6, Iss: 3, pp 274-290
TL;DR: This article presents some common situations in which Bayesian and orthodox approaches to significance testing come to different conclusions; the reader is shown how to apply Bayesian inference in practice, using free online software, to allow more coherent inferences from data.
Abstract: Researchers are often confused about what can be inferred from significance tests. One problem occurs when people apply Bayesian intuitions to significance testing, two approaches that must be firmly separated. This article presents some common situations in which the approaches come to different conclusions; you can see where your intuitions initially lie. The situations include multiple testing, deciding when to stop running participants, and when a theory was thought of relative to finding out results. The interpretation of nonsignificant results has also been persistently problematic in a way that Bayesian inference can clarify. The Bayesian and orthodox approaches are placed in the context of different notions of rationality, and I accuse myself and others of having been irrational, on a key notion of rationality, in the way we have been using statistics. The reader is shown how to apply Bayesian inference in practice, using free online software, to allow more coherent inferences from data.
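
The free software referred to here accepts a sample mean, its standard error, and a prediction about how big an effect the theory allows. As a rough illustration of that style of calculation (not the calculator's actual code), here is a minimal Python sketch assuming a normal likelihood and a half-normal prior on the effect; the function name and all numbers are hypothetical.

```python
import numpy as np
from scipy import stats, integrate

def bayes_factor_halfnormal(mean_obs, se_obs, predicted_effect):
    """B for H1 (half-normal prior on the effect, scaled by the effect size
    the theory predicts) against H0 (no effect), from a mean and standard
    error, assuming an approximately normal likelihood."""
    # Probability density of the observed mean if the true effect is zero.
    like_null = stats.norm.pdf(mean_obs, loc=0.0, scale=se_obs)

    # Marginal likelihood under H1: average the likelihood over the prior.
    def integrand(effect):
        prior = 2.0 * stats.norm.pdf(effect, loc=0.0, scale=predicted_effect)
        return prior * stats.norm.pdf(mean_obs, loc=effect, scale=se_obs)

    like_alt, _ = integrate.quad(integrand, 0.0, np.inf)
    return like_alt / like_null

# Hypothetical numbers: observed effect 5 (SE 2), theory predicts effects
# of roughly 5. B > 3 is conventionally read as substantial support for H1.
print(bayes_factor_halfnormal(5.0, 2.0, 5.0))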


Citations
Journal ArticleDOI
TL;DR: It is argued that Bayes factors allow theory to be linked to data in a way that overcomes the weaknesses of the other approaches, and that they provide a coherent approach to determining whether non-significant results support a null hypothesis over a theory or whether the data are just insensitive.
Abstract: No scientific conclusion follows automatically from a statistically non-significant result, yet people routinely use non-significant results to guide conclusions about the status of theories (or the effectiveness of practices). To know whether a non-significant result counts against a theory, or if it just indicates data insensitivity, researchers must use one of: power, intervals (such as confidence or credibility intervals), or else an indicator of the relative evidence for one theory over another, such as a Bayes factor. I argue Bayes factors allow theory to be linked to data in a way that overcomes the weaknesses of the other approaches. Specifically, Bayes factors use the data themselves to determine their sensitivity in distinguishing theories (unlike power), and they make use of those aspects of a theory’s predictions that are often easiest to specify (unlike power and intervals, which require specifying the minimal interesting value in order to address theory). Bayes factors provide a coherent approach to determining whether non-significant results support a null hypothesis over a theory, or whether the data are just insensitive. They allow accepting and rejecting the null hypothesis to be put on an equal footing. Concrete examples are provided to indicate the range of application of a simple online Bayes calculator, which reveal both the strengths and weaknesses of Bayes factors.
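
One concrete payoff of the Bayes-factor approach described here is a principled stopping rule: keep collecting data until B > 3 (support for the theory) or B < 1/3 (support for the null), with anything in between read as insensitive. The sketch below simulates that rule under simplifying assumptions (a normal likelihood for the sample mean with a plug-in standard error, a normal prior on the effect, and made-up data); it is an illustration, not the paper's own code.

```python
import numpy as np
from scipy import stats

def bf10(sample, prior_sd):
    """Bayes factor for H1: effect ~ N(0, prior_sd**2) versus H0: effect = 0,
    treating the sample mean as normal with a plug-in standard error."""
    n = len(sample)
    se = sample.std(ddof=1) / np.sqrt(n)
    m = sample.mean()
    # Marginal of the mean: N(0, se^2) under H0, N(0, se^2 + prior_sd^2) under H1.
    return stats.norm.pdf(m, 0, np.hypot(se, prior_sd)) / stats.norm.pdf(m, 0, se)

rng = np.random.default_rng(1)
data = list(rng.normal(0.3, 1.0, size=10))  # true effect 0.3 (made up)

# Collect in batches of 10 until the data are sensitive: B > 3 or B < 1/3.
while True:
    b = bf10(np.asarray(data), prior_sd=0.5)
    if b > 3 or b < 1 / 3 or len(data) >= 500:
        break
    data.extend(rng.normal(0.3, 1.0, size=10))

verdict = "supports H1" if b > 3 else "supports H0" if b < 1 / 3 else "insensitive"
print(f"n = {len(data)}, B = {b:.2f}, {verdict}")
```

Unlike a p value, B can legitimately be monitored as data accumulate, which is the point of the stopping-rule excerpt quoted below.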

1,496 citations


Cites background or methods from "Bayesian versus Orthodox statistics..."

  • ...…with orthodox statistics, unless pre-specified as possible by one’s stopping rule (Armitage et al., 1969); by contrast, with Bayes, one can always collect more participants until the data are sensitive enough, that is, B < 1/3 or B > 3; see e.g., Berger and Wolpert (1988), Dienes (2008, 2011)....

    [...]

  • ...Further, the advantages of Bayes go well beyond the interpretation of non-significant results (e.g., Dienes, 2011)....

    [...]

  • ...It is open to public scrutiny and debate (unlike many of the factors that affect significance testing; see Dienes, 2011)....

    [...]

  • ...…and the criticisms of 1-tailed tests in an orthodox sense (criticizing a researcher using a 1-tailed test because he would have rejected the null if the results had been extreme in the other direction, even though they were not) hence do not apply to Bayes factors (cf. Royall, 1997; Dienes, 2011)....

    [...]

  • ...For that assertion to be relevant to a given scientific context, the minimal value must be relevant to that scientific context (i.e., it cannot be determined by properties of the data alone nor can it be a generic default)....

    [...]

Journal ArticleDOI
TL;DR: Bayesian estimation for 2 groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data.
Abstract: Bayesian estimation for 2 groups provides complete distributions of credible values for the effect size, group means and their difference, standard deviations and their difference, and the normality of the data. The method handles outliers. The decision rule can accept the null value (unlike traditional t tests) when certainty in the estimate is high (unlike Bayesian model comparison using Bayes factors). The method also yields precise estimates of statistical power for various research goals. The software and programs are free and run on Macintosh, Windows, and Linux platforms.
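
BEST itself places a t-distributed likelihood over each group and samples the posterior with MCMC (the published software runs in R/JAGS). The sketch below conveys only the flavor of the output, a full posterior for the difference of group means, using a normal likelihood, broad priors, a hand-rolled Metropolis sampler, and synthetic data; every number in it is hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g1 = rng.normal(1.0, 1.0, 40)   # synthetic group 1
g2 = rng.normal(0.4, 1.2, 40)   # synthetic group 2

def log_post(theta):
    """Log posterior for (mu1, mu2, log_sd1, log_sd2) under broad priors."""
    mu1, mu2, ls1, ls2 = theta
    lp = stats.norm.logpdf([mu1, mu2], 0, 10).sum()       # vague priors on means
    lp += stats.norm.logpdf([ls1, ls2], 0, 2).sum()       # vague priors on log-SDs
    lp += stats.norm.logpdf(g1, mu1, np.exp(ls1)).sum()   # group 1 likelihood
    lp += stats.norm.logpdf(g2, mu2, np.exp(ls2)).sum()   # group 2 likelihood
    return lp

# Random-walk Metropolis: propose small jumps, accept with the usual ratio.
theta = np.array([g1.mean(), g2.mean(), np.log(g1.std()), np.log(g2.std())])
lp = log_post(theta)
diffs = []
for _ in range(20_000):
    prop = theta + rng.normal(0, 0.05, size=4)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    diffs.append(theta[0] - theta[1])     # posterior draws of mu1 - mu2

draws = np.array(diffs[5_000:])           # discard burn-in
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"difference of means: {draws.mean():.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```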

1,214 citations


Cites background or methods from "Bayesian versus Orthodox statistics..."

  • ...Dienes (2008, 2011) provided another analytical solution, with a corresponding online calculator....

    [...]

  • ...Nevertheless, the BF can be highly sensitive to the choice of alternative-hypothesis prior distribution (e.g., Dienes, 2008, 2011; Kruschke, 2011a; Liu & Aitkin, 2008; Vanpaemel, 2010), even to such an extent that the BF can change from substantially favoring the null hypothesis to substantially…...

    [...]

  • ...Bayesian analysis is also more intuitive than traditional methods of null hypothesis significance testing (e.g., Dienes, 2011)....

    [...]

Book
14 Apr 2014
TL;DR: This book introduces the basics of Bayesian analysis and of getting started with WinBUGS, then covers parameter estimation, model selection, and case studies ranging from memory retention and signal detection theory to the SIMPLE model of memory and the BART model of risk taking.
Abstract: Part I. Getting Started: 1. The basics of Bayesian analysis 2. Getting started with WinBUGS Part II. Parameter Estimation: 3. Inferences with binomials 4. Inferences with Gaussians 5. Some examples of data analysis 6. Latent mixture models Part III. Model Selection: 7. Bayesian model comparison 8. Comparing Gaussian means 9. Comparing binomial rates Part IV. Case Studies: 10. Memory retention 11. Signal detection theory 12. Psychophysical functions 13. Extrasensory perception 14. Multinomial processing trees 15. The SIMPLE model of memory 16. The BART model of risk taking 17. The GCM model of categorization 18. Heuristic decision-making 19. Number concept development.
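
As a taste of the "Inferences with binomials" material listed above (the book itself works in WinBUGS), here is the conjugate beta-binomial update in Python; the counts are hypothetical.

```python
from scipy import stats

# Conjugate updating: prior Beta(1, 1), then 7 successes in 10 trials
# gives posterior Beta(1 + 7, 1 + 3).
k, n = 7, 10
posterior = stats.beta(1 + k, 1 + n - k)
print(posterior.mean(), posterior.interval(0.95))  # posterior mean and 95% interval
```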

1,192 citations

Book
17 Nov 2014
TL;DR: Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples.
Abstract: There is an explosion of interest in Bayesian statistics, primarily because recently created computational methods have finally made Bayesian analysis available to a wide audience. Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan provides an accessible approach to Bayesian data analysis, as material is explained clearly with concrete examples. The book begins with the basics, including essential concepts of probability and random sampling, and gradually progresses to advanced hierarchical modeling methods for realistic data. Included are step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBUGS. The book is intended for first-year graduate students or advanced undergraduates; it provides a bridge between undergraduate training and modern Bayesian methods for data analysis, which is becoming the accepted research standard. Knowledge of algebra and basic calculus is a prerequisite.

New to this edition (partial list):

  • There are all-new programs in JAGS and Stan, designed to be much easier to use than the scripts in the first edition. In particular, there are now compact high-level scripts that make it easy to run the programs on your own data sets. This new programming was a major undertaking by itself.
  • The introductory Chapter 2, regarding the basic ideas of how Bayesian inference re-allocates credibility across possibilities, is completely rewritten and greatly expanded.
  • There are completely new chapters on the programming languages R (Ch. 3), JAGS (Ch. 8), and Stan (Ch. 14). The lengthy new chapter on R includes explanations of data files and structures such as lists and data frames, along with several utility functions. (It also has a new poem that I am particularly pleased with.) The new chapter on JAGS includes explanation of the RunJAGS package, which executes JAGS on parallel computer cores. The new chapter on Stan provides a novel explanation of the concepts of Hamiltonian Monte Carlo and also explains conceptual differences in program flow between Stan and JAGS.
  • Chapter 5 on Bayes' rule is greatly revised, with a new emphasis on how Bayes' rule re-allocates credibility across parameter values from prior to posterior. The material on model comparison has been removed from all the early chapters and integrated into a compact presentation in Chapter 10.
  • What were two separate chapters on the Metropolis algorithm and Gibbs sampling have been consolidated into a single chapter on MCMC methods (Chapter 7). There is extensive new material on MCMC convergence diagnostics in Chapters 7 and 8, with explanations of autocorrelation and effective sample size, and exploration of the stability of the estimates of the HDI limits. New computer programs display the diagnostics as well.
  • Chapter 9 on hierarchical models includes extensive new and unique material on the crucial concept of shrinkage, along with new examples.
  • All the material on model comparison, which was spread across various chapters in the first edition, is now consolidated into a single focused chapter (Ch. 10) that emphasizes its conceptualization as a case of hierarchical modeling.
  • Chapter 11 on null hypothesis significance testing is extensively revised, with new material introducing the concept of a sampling distribution and new illustrations of sampling distributions for various stopping rules and for multiple tests.
  • Chapter 12, regarding Bayesian approaches to null value assessment, has new material about the region of practical equivalence (ROPE), new examples of accepting the null value by Bayes factors, and a new explanation of the Bayes factor in terms of the Savage-Dickey method.
  • Chapter 13, regarding statistical power and sample size, has an extensive new section on sequential testing and on making the research goal precision of estimation instead of rejecting or accepting a particular value.
  • Chapter 15, which introduces the generalized linear model, is fully revised, with more complete tables showing combinations of predicted and predictor variable types.
  • Chapter 16, regarding estimation of means, now includes extensive discussion of comparing two groups, along with explicit estimates of effect size.
  • Chapter 17, regarding regression on a single metric predictor, now includes extensive examples of robust regression in JAGS and Stan. New examples of hierarchical regression, including quadratic trend, graphically illustrate shrinkage in estimates of individual slopes and curvatures. The use of weighted data is also illustrated.
  • Chapter 18, on multiple linear regression, includes a new section on Bayesian variable selection, in which various candidate predictors are probabilistically included in the regression model.
  • Chapter 19, on one-factor ANOVA-like analysis, has all new examples, including a completely worked-out example analogous to analysis of covariance (ANCOVA) and a new example involving heterogeneous variances.
  • Chapter 20, on multi-factor ANOVA-like analysis, has all new examples, including a completely worked-out example of a split-plot design that combines a within-subjects factor and a between-subjects factor.
  • Chapter 21, on logistic regression, is expanded to include examples of robust logistic regression and examples with nominal predictors.
  • There is a completely new chapter (Ch. 22) on multinomial logistic regression, which fills in a case of the generalized linear model (a nominal predicted variable) that was missing from the first edition.
  • Chapter 23, regarding ordinal data, is greatly expanded. New examples illustrate single-group and two-group analyses and demonstrate how interpretations differ from treating ordinal data as if they were metric.
  • There is a new section (25.4) that explains how to model censored data in JAGS.
  • Many exercises are new or revised.

Other features:

  • Accessible, covering the essential concepts of probability and random sampling
  • Examples with the R programming language and JAGS software
  • Comprehensive coverage of all scenarios addressed by non-Bayesian textbooks: t tests, analysis of variance (ANOVA) and comparisons in ANOVA, multiple regression, and chi-square (contingency table analysis)
  • Coverage of experiment planning
  • R and JAGS computer programming code on the website
  • Exercises with explicit purposes and guidelines for accomplishment
  • Step-by-step instructions on how to conduct Bayesian data analyses in the popular and free software R and WinBUGS
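
The region of practical equivalence (ROPE) mentioned in the Chapter 12 note above gives a decision rule for accepting or rejecting a null value. A minimal sketch, using an equal-tailed 95% credible interval as a simple stand-in for the HDI and a hypothetical ROPE of ±0.1:

```python
import numpy as np

def rope_decision(posterior_samples, rope=(-0.1, 0.1), mass=0.95):
    """Classify a null value by comparing a credible interval with the ROPE
    (equal-tailed interval used here as a simple stand-in for the HDI)."""
    lo, hi = np.percentile(posterior_samples, [100 * (1 - mass) / 2,
                                               100 * (1 + mass) / 2])
    if hi < rope[0] or lo > rope[1]:
        return "reject the null value (interval falls outside the ROPE)"
    if rope[0] <= lo and hi <= rope[1]:
        return "accept the null value (interval falls inside the ROPE)"
    return "undecided (interval overlaps a ROPE boundary)"

# Hypothetical posterior draws for a standardized effect:
draws = np.random.default_rng(2).normal(0.03, 0.02, 10_000)
print(rope_decision(draws))
```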

1,190 citations

Journal ArticleDOI
11 Oct 2012-Nature
TL;DR: The main workshop recommendation is that at a minimum studies should report on sample-size estimation, whether and how animals were randomized, whether investigators were blind to the treatment, and the handling of data.
Abstract: The US National Institute of Neurological Disorders and Stroke convened major stakeholders in June 2012 to discuss how to improve the methodological reporting of animal studies in grant applications and publications. The main workshop recommendation is that at a minimum studies should report on sample-size estimation, whether and how animals were randomized, whether investigators were blind to the treatment, and the handling of data. We recognize that achieving a meaningful improvement in the quality of reporting will require a concerted effort by investigators, reviewers, funding agencies and journal editors. Requiring better reporting of animal studies will raise awareness of the importance of rigorous study design to accelerate scientific progress.

1,037 citations

References
Journal ArticleDOI
TL;DR: A fatal flaw of NHST is reviewed, some benefits of Bayesian data analysis are introduced, and illustrative examples of multiple comparisons in Bayesian analysis of variance and Bayesian approaches to statistical power are presented.
Abstract: Bayesian methods have garnered huge interest in cognitive science as an approach to models of cognition and perception. On the other hand, Bayesian methods for data analysis have not yet made much headway in cognitive science against the institutionalized inertia of 20th century null hypothesis significance testing (NHST). Ironically, specific Bayesian models of cognition and perception may not long endure the ravages of empirical verification, but generic Bayesian methods for data analysis will eventually dominate. It is time that Bayesian data analysis became the norm for empirical methods in cognitive science. This article reviews a fatal flaw of NHST and introduces the reader to some benefits of Bayesian data analysis. The article presents illustrative examples of multiple comparisons in Bayesian analysis of variance and Bayesian approaches to statistical power. Copyright © 2010 John Wiley & Sons, Ltd. For further resources related to this article, please visit the WIREs website.

6,081 citations


"Bayesian versus Orthodox statistics..." refers background or methods in this paper

  • ...(Note that I will not discuss confidence intervals and credibility intervals, the Bayesian equivalent to a confidence interval, in this article; see Dienes, 2008, and Kruschke 2010b, 2011, for detailed discussion and calculations.)...

    [...]

  • ...For this, credibility or likelihood intervals can be used (see Dienes, 2008; Kruschke, 2010b, 2011; Royall, 1997)....

    [...]

  • ...…is compared with a default theory (namely, the theory that effects may occur in either direction, scaled to a large standardized effect size); and Kruschke (2010a, 2010b, 2011) for Bayes factors for a set of default hypotheses (much like the default effects in analyses of variance) where…...

    [...]

  • ...…all the problems enumerated above for Neyman Pearson inference in general (unlike credibility or likelihood intervals): Because confidence intervals consist of all values nonsignificantly different from the sample mean, they inherit the arbitrariness of significance testing (e.g., Kruschke, 2010a).... (see the demonstration sketch after this list)

    [...]

  • ...Power can be calculated in the Bayesian approach to determine likely numbers of subjects needed to make a point, though this is a practical matter, and power does not figure in the inferential procedure itself, unlike in the Neyman Pearson approach (see Kruschke, 2010a, 2010b, 2010c; Royall, 1997)....

    [...]
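
The confidence-interval point in the excerpt above has a concrete form: a 95% confidence interval is (to numerical precision) the set of null values a t test fails to reject at α = .05. A small check of that duality on synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(0.5, 1.0, 30)                      # synthetic sample

# A 95% t-based confidence interval for the mean...
ci = stats.ttest_1samp(x, 0).confidence_interval(0.95)

# ...coincides with the null values a t test does not reject at alpha = .05.
grid = np.linspace(-1, 2, 3001)
kept = [m for m in grid if stats.ttest_1samp(x, m).pvalue > 0.05]
print((ci.low, ci.high), (min(kept), max(kept)))  # the two intervals agree
```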

BookDOI
TL;DR: This book presents probability theory as a system of plausible reasoning, progressing from the quantitative rules, elementary sampling theory, hypothesis testing, and parameter estimation to advanced applications such as ignorance priors, decision theory, paradoxes of probability theory, and model comparison.
Abstract: Foreword Preface Part I. Principles and Elementary Applications: 1. Plausible reasoning 2. The quantitative rules 3. Elementary sampling theory 4. Elementary hypothesis testing 5. Queer uses for probability theory 6. Elementary parameter estimation 7. The central, Gaussian or normal distribution 8. Sufficiency, ancillarity, and all that 9. Repetitive experiments, probability and frequency 10. Physics of 'random experiments' Part II. Advanced Applications: 11. Discrete prior probabilities, the entropy principle 12. Ignorance priors and transformation groups 13. Decision theory: historical background 14. Simple applications of decision theory 15. Paradoxes of probability theory 16. Orthodox methods: historical background 17. Principles and pathology of orthodox statistics 18. The Ap distribution and rule of succession 19. Physical measurements 20. Model comparison 21. Outliers and robustness 22. Introduction to communication theory References Appendix A. Other approaches to probability theory Appendix B. Mathematical formalities and style Appendix C. Convolutions and cumulants.

4,641 citations

Book
01 Jan 1963

4,061 citations

Journal ArticleDOI
TL;DR: Experiment 1 showed that participants whose concept of rudeness was primed interrupted the experimenter more quickly and frequently than did participants primed with polite-related stimuli.
Abstract: Previous research has shown that trait concepts and stereotypes become active automatically in the presence of relevant behavior or stereotyped-group features. Through the use of the same priming procedures as in previous impression formation research, Experiment 1 showed that participants whose concept of rudeness was primed interrupted the experimenter more quickly and frequently than did participants primed with polite-related stimuli. In Experiment 2, participants for whom an elderly stereotype was primed walked more slowly down the hallway when leaving the experiment than did control participants, consistent with the content of that stereotype. In Experiment 3, participants for whom the African American stereotype was primed subliminally reacted with more hostility to a vexatious request of the experimenter. Implications of this automatic behavior priming effect for self-fulfilling prophecies are discussed, as is whether social behavior is necessarily mediated by conscious choice processes.

3,392 citations


"Bayesian versus Orthodox statistics..." refers background in this paper

  • ...Mussweiler bases his effect on previous similar social priming explored by Bargh, Chen, and Burrows (1996), who found large effects (Cohen’s d of about one)....

    [...]

Journal ArticleDOI
TL;DR: A Bayes factor alternative to the conventional t test is highlighted that has a natural interpretation and better properties than other methods of inference advocated in the psychological literature; to facilitate its use, an easy-to-use, Web-based program that performs the necessary calculations is provided.
Abstract: Progress in science often comes from discovering invariances in relationships among variables; these invariances often correspond to null hypotheses. As is commonly known, it is not possible to state evidence for the null hypothesis in conventional significance testing. Here we highlight a Bayes factor alternative to the conventional t test that will allow researchers to express preference for either the null hypothesis or the alternative. The Bayes factor has a natural and straightforward interpretation, is based on reasonable assumptions, and has better properties than other methods of inference that have been advocated in the psychological literature. To facilitate use of the Bayes factor, we provide an easy-to-use, Web-based program that performs the necessary calculations.
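
For reference, the JZS Bayes factor the authors describe has a one-dimensional integral form (Rouder et al., 2009): the effect size δ is given a Cauchy prior, expressed as δ | g ~ N(0, g) with g ~ Inverse-Gamma(1/2, 1/2). Below is a sketch of that computation for a one-sample t statistic; the example values are hypothetical, and the authors' Web program remains the authoritative implementation.

```python
import numpy as np
from scipy import integrate

def jzs_bf10(t, n):
    """JZS Bayes factor (H1 over H0) for a one-sample t statistic with n
    observations, via the integral representation in Rouder et al. (2009)."""
    nu = n - 1
    # Marginal likelihood under H0, up to a constant shared with H1.
    m0 = (1 + t**2 / nu) ** (-(nu + 1) / 2)

    # Under H1: delta | g ~ N(0, g), g ~ Inverse-Gamma(1/2, 1/2); integrate g out.
    def integrand(g):
        return ((1 + n * g) ** -0.5
                * (1 + t**2 / ((1 + n * g) * nu)) ** (-(nu + 1) / 2)
                * (2 * np.pi) ** -0.5 * g ** -1.5 * np.exp(-1 / (2 * g)))

    m1, _ = integrate.quad(integrand, 0, np.inf)
    return m1 / m0

print(jzs_bf10(t=2.2, n=30))  # hypothetical t statistic and sample size
```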

3,012 citations


"Bayesian versus Orthodox statistics..." refers background or methods in this paper

  • ...For a p value, if the null is true, any value in the interval 0 to 1 is equally likely no matter how much data you collect (Rouder et al., 2009).... (see the simulation sketch after this list)

    [...]

  • ...One solution is to use a default Bayes factor for all occasions (Rouder et al., 2009; Wetzels et al., 2011), though this amounts to evaluating a default theory for all occasions, regardless of one’s actual theory....

    [...]

  • ...Different ways of using Bayes factors For a couple of other ways of using Bayes factors, see Rouder et al. (2009) and Wetzels et al. (2011) for a suggested ‘‘default’’ Bayes factor to be used on any data where the null hypothesis is compared with a default theory (namely, the theory that effects…...

    [...]
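
The first excerpt above states a fact worth seeing directly: when the null is true, the p value is uniformly distributed on [0, 1] at any sample size. A small simulation (the setup is hypothetical):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# 10,000 one-sample t tests on null data (true mean 0): collect the p values.
pvals = [stats.ttest_1samp(rng.normal(0, 1, 50), 0).pvalue for _ in range(10_000)]

# Each decile of [0, 1] holds roughly 10% of the p values: uniform under H0.
print(np.histogram(pvals, bins=10, range=(0, 1))[0] / len(pvals))
```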