Journal ArticleDOI

Using Effect Size-or Why the P Value Is Not Enough.

01 Sep 2012-Journal of Graduate Medical Education (J Grad Med Educ)-Vol. 4, Iss: 3, pp 279-282
TL;DR: Effect size helps readers understand the magnitude of differences found, whereas statistical significance examines whether the findings are likely to be due to chance; both are essential for readers to understand the full impact of your work.
Abstract: Effect size helps readers understand the magnitude of differences found, whereas statistical significance examines whether the findings are likely to be due to chance. Both are essential for readers to understand the full impact of your work. Report both in the Abstract and Results sections.
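As a minimal illustration of the advice to report both numbers, the sketch below pairs a two-sample t-test p-value (via SciPy) with a hand-rolled Cohen's d; the group data and names are invented for illustration.

```python
import math
from statistics import mean, variance

from scipy import stats  # assumed available; used only for the t-test

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Hypothetical exam scores under two teaching methods
control   = [72, 75, 68, 71, 74, 70, 73, 69]
treatment = [78, 80, 74, 77, 81, 76, 79, 75]

t, p = stats.ttest_ind(treatment, control)
d = cohens_d(treatment, control)
print(f"t = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
```

Reporting only p here would hide how large the difference actually is; reporting d alongside it conveys the magnitude directly, as the article recommends.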


Citations
01 Jan 2014
TL;DR: The main objectives of this contribution are to promote various effect size measures in sport sciences by once again bringing the benefits of reporting them to readers' attention, and to present examples of such estimates, with a greater focus on those that can be calculated for non-parametric tests.
Abstract: Recent years have witnessed a growing number of published reports that point out the need to report various effect size estimates in the context of null hypothesis testing (H0), in response to a tendency to report tests of statistical significance only, with less attention to other important aspects of statistical analysis. Despite considerable changes over the past several years, a neglect to report effect size estimates may still be noted in fields such as medical science, psychology, applied linguistics, and pedagogy. Nor have sport sciences managed to totally escape the grip of this suboptimal practice: statistical analyses in even some current research reports do not go much further than computing p-values. The p-value, however, is not meant to provide information on the actual strength of the relationship between variables, and does not allow the researcher to determine the effect of one variable on another. Effect size measures serve this purpose well. While the number of reports containing statistical estimates of effect sizes calculated after applying parametric tests is steadily increasing, reporting effect sizes with non-parametric tests is still very rare. Hence, the main objectives of this contribution are to promote various effect size measures in sport sciences by once again bringing the benefits of reporting them to readers' attention, and to present examples of such estimates, with a greater focus on those that can be calculated for non-parametric tests.
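For the non-parametric tests the abstract emphasizes, one straightforward effect size is the rank-biserial correlation obtained from the Mann-Whitney U statistic. A hedged sketch; the data and helper names are illustrative, not from the article:

```python
def mann_whitney_u(a, b):
    """U for group a: pairs with a_i > b_j count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in a for y in b)

def rank_biserial(a, b):
    """Rank-biserial correlation r = 2U/(n1*n2) - 1, ranging over [-1, 1]."""
    return 2 * mann_whitney_u(a, b) / (len(a) * len(b)) - 1

# Hypothetical symptom scores for two small groups
drug    = [14, 11, 12, 15, 13]
placebo = [10, 9, 12, 11, 8]
print(rank_biserial(drug, placebo))  # positive: drug ranks mostly higher
```

Because it is built from pairwise comparisons rather than means, this index stays meaningful for the ordinal or skewed data that motivate non-parametric tests in the first place.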

732 citations


Cites background from "Using Effect Size-or Why the P Valu..."

  • ...Relying on the p-value alone for statistical inference does not permit an evaluation of the magnitude and importance of the obtained result [10, 12, 13]....


  • ...Sometimes a result that is statistically significant mainly indicates that a huge sample size was used [10, 11]....


Journal ArticleDOI
TL;DR: In this article, the authors provided evidence-based recommendations on early intervention in clinical high risk (CHR) states of psychosis, assessed according to the EPA guidance on early detection, derived from a meta-analysis of current empirical evidence on the efficacy of psychological and pharmacological interventions in CHR samples.

432 citations

Journal ArticleDOI
TL;DR: This study describes and compares several Bayesian indices, provides intuitive visual representations of their "behavior" in relation to common sources of variance such as sample size, magnitude of effects, and frequentist significance, and contributes to an intuitive understanding of the values that researchers report, which is critical for the standardization of scientific reporting.
Abstract: Turmoil has engulfed psychological science. Causes and consequences of the reproducibility crisis are in dispute. With the hope of addressing some of its aspects, Bayesian methods are gaining increasing attention in psychological science. Some of their advantages, as opposed to the frequentist framework, are the ability to describe parameters in probabilistic terms and to explicitly incorporate prior knowledge about them into the model. These issues are crucial in particular regarding the current debate about statistical significance. Bayesian methods are not necessarily the only remedy against incorrect interpretations or wrong conclusions, but there is increasing agreement that they are one of the keys to avoiding such fallacies. Nevertheless, their flexible nature is both their power and their weakness, for there is no agreement about which indices of "significance" should be computed or reported. This lack of a consensual index or guidelines, analogous to the frequentist p-value, further contributes to the unnecessary opacity that many unfamiliar readers perceive in Bayesian statistics. Thus, this study describes and compares several Bayesian indices and provides intuitive visual representations of their "behavior" in relation to common sources of variance such as sample size, magnitude of effects, and frequentist significance. The results contribute to the development of an intuitive understanding of the values that researchers report, allowing sensible recommendations to be drawn for the description of Bayesian statistics, which is critical for the standardization of scientific reporting.
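Two of the indices such comparisons typically cover, the probability of direction (pd) and the proportion of the posterior inside a region of practical equivalence (ROPE), can be computed directly from posterior samples. A minimal sketch; the simulated Gaussian posterior stands in for real MCMC output, and the ROPE bounds are arbitrary defaults:

```python
import random

def probability_of_direction(samples):
    """Proportion of the posterior sharing the sign of its median (0.5 to 1)."""
    med = sorted(samples)[len(samples) // 2]
    sign = 1 if med >= 0 else -1
    return sum(1 for s in samples if s * sign > 0) / len(samples)

def rope_fraction(samples, low=-0.1, high=0.1):
    """Share of the posterior falling inside the equivalence region."""
    return sum(1 for s in samples if low <= s <= high) / len(samples)

random.seed(42)
posterior = [random.gauss(0.3, 0.2) for _ in range(10_000)]
pd = probability_of_direction(posterior)
rope = rope_fraction(posterior)
print(f"pd = {pd:.3f}, in ROPE = {rope:.3f}")
```

pd is closely tied to the frequentist p-value (a high pd corresponds to a small two-sided p), while the ROPE fraction speaks to the magnitude question that p cannot answer.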

327 citations


Cites background from "Using Effect Size-or Why the P Valu..."

  • ...It is interesting to note that this perspective unites significance testing with the focus on effect size (involving a discrete separation between at least two categories: negligible and non-negligible), which finds an echo in recent statistical recommendations (Ellis and Steyn, 2003; Sullivan and Feinn, 2012; Simonsohn et al., 2014)....


Journal ArticleDOI
TL;DR: In conclusion, insect tasting sessions are important to decrease food neophobia, as they encourage people to “take the first step” and become acquainted with entomophagy.

315 citations

01 Jan 2008

274 citations

References
Journal ArticleDOI
TL;DR: The use of effect size reporting in the analysis of social science data remains inconsistent, interpretation of effect size estimates continues to be confused, and clinicians may have little guidance in interpreting the effect sizes relevant to clinical practice.
Abstract: Increasing emphasis has been placed on the use of effect size reporting in the analysis of social science data. Nonetheless, the use of effect size reporting remains inconsistent, and interpretation of effect size estimates continues to be confused. Researchers are presented with numerous effect sizes estimate options, not all of which are appropriate for every research question. Clinicians also may have little guidance in the interpretation of effect sizes relevant for clinical practice. The current article provides a primer of effect size estimates for the social sciences. Common effect sizes estimates, their use, and interpretations are presented as a guide for researchers.
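Because the options the primer surveys span different effect size families, converting between the group-difference metric d and the association metric r is a common need. A sketch of the standard conversion formulas; the function names are mine, not from the primer:

```python
import math

def d_to_r(d, n1, n2):
    """Convert Cohen's d to a point-biserial r, with a = (n1+n2)^2 / (n1*n2)."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d * d + a)

def r_to_d(r):
    """Convert r back to d, assuming equal group sizes (so a = 4)."""
    return 2 * r / math.sqrt(1 - r * r)

print(round(d_to_r(0.5, 50, 50), 3))  # a "medium" d maps to a modest r
```

The asymmetry of the round trip is worth noting: `r_to_d` is exact only for equal group sizes, one reason not every estimate is appropriate for every research question.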

2,680 citations

Journal ArticleDOI
Jacob Cohen1
TL;DR: Cohen recounts what he has learned about the application of statistics to psychology and the other sociobiomedical sciences, including the principles "less is more" (fewer variables, more highly targeted issues, sharp rounding off), "simple is better" (graphic representation, unit weighting for linear composites), and "some things you learn aren't so."
Abstract: This is an account of what I have learned (so far) about the application of statistics to psychology and the other sociobiomedical sciences. It includes the principles "less is more" (fewer variables, more highly targeted issues, sharp rounding off), "simple is better" (graphic representation, unit weighting for linear composites), and "some things you learn aren't so." I have learned to avoid the many misconceptions that surround Fisherian null hypothesis testing. I have also learned the importance of power analysis and the determination of just how big (rather than how statistically significant) are the effects that we study. Finally, I have learned that there is no royal road to statistical induction, that the informed judgment of the investigator is the crucial element in the interpretation of data, and that things take time.
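Cohen's point about determining how big an effect is, rather than how significant, connects directly to the power analysis he stresses: the expected effect size drives how many participants a study needs. A rough sketch using the normal approximation (the exact t-based answer is slightly larger); the defaults are conventional values, not taken from this article:

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample comparison to detect effect d."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

# A "medium" effect (d = 0.5) needs about 63 per group;
# a "small" one (d = 0.2) needs nearly 400.
print(n_per_group(0.5), n_per_group(0.2))
```

The quadratic dependence on d makes Cohen's argument concrete: halving the expected effect size quadruples the required sample.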

1,764 citations

BookDOI
01 Jan 2004
TL;DR: Beyond Significance Testing offers integrative and clear presentations of the limitations of statistical tests and reviews alternative methods of data analysis, such as effect size estimation (at both the group and case levels) and interval estimation (i.e., confidence intervals).
Abstract: Practices of data analysis in psychology and related disciplines are changing. This is evident in the longstanding controversy about statistical tests in the behavioral sciences and the increasing number of journals requiring effect size information. Beyond Significance Testing offers integrative and clear presentations about the limitations of statistical tests and reviews alternative methods of data analysis, such as effect size estimation (at both the group and case levels) and interval estimation (i.e., confidence intervals). Written in a clear and accessible style, the book is intended for applied researchers and students who may not have strong quantitative backgrounds. Readers will learn how to measure effect size on continuous or dichotomous outcomes in comparative studies with independent or dependent samples. They will also learn how to calculate and correctly interpret confidence intervals for effect sizes. Numerous research examples from a wide range of areas illustrate the application of these principles and how to estimate substantive significance instead of just statistical significance.
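A short sketch of the interval-estimation idea the book advocates: an approximate confidence interval for Cohen's d built from its large-sample standard error. Exact noncentral-t intervals differ slightly, and the d and sample sizes below are invented for illustration:

```python
import math
from statistics import NormalDist

def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d via the large-sample standard error."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se

lo, hi = d_confidence_interval(0.6, 40, 40)
print(f"d = 0.60, 95% CI [{lo:.2f}, {hi:.2f}]")
```

An interval excluding zero conveys statistical significance, while its width shows how precisely the effect is estimated, information a bare p-value omits.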

924 citations

01 Jan 2008

274 citations

Journal ArticleDOI
TL;DR: Kline reviews the controversy regarding significance testing, offers methods for effect size and confidence interval estimation, and suggests alternative methodologies, while recognizing that there is no "magical alternative" to statistical tests and that such tests are appropriate in some circumstances when applied correctly.
Abstract: REX B. KLINE Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research Washington, DC: American Psychological Association, 2004, 336 pages (ISBN 1-59147-118-4, US$49.95 Hardcover) In 1999, a blue-ribbon task force assembled by the American Psychological Association published its findings with regard to the long-standing controversy over null hypothesis significance testing (NHST). The task force dictated that effect sizes and confidence intervals be reported, and that p values and dichotomous accept-reject decisions be given less weight. Editorial policies in a number of journals came to reflect the views of the task force, as did a subsequent revision to the American Psychological Association Publication Manual. Rex B. Kline wrote Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research as a follow-up to both the task force recommendations and the revision to the publication manual. Kline's 1998 book Principles and Practice of Structural Equation Modeling (Guilford Press) was well received, and a second edition is being published this fall. In Beyond Significance Testing, Kline reviews the controversy regarding significance testing, offers methods for effect size and confidence interval estimation, and suggests some alternative methodologies. There is an accompanying website that includes resources for instructors and students. Part I of the book is a review of fundamental concepts and the debate regarding significance testing. Part II provides statistics for effect size and confidence interval estimation for parametric and nonparametric two-group, one-way, and factorial designs. Part III examines meta-analysis, resampling, and Bayesian estimation procedures. In the first chapter, Kline provides a scholarly summary of the null hypothesis testing debate, concluding with the APA task force findings and what Kline regards as ambiguous recommendations in the publication manual.
Kline predicts the future will see a smaller role for traditional statistical testing (p values) in psychology. This change will take time and may not occur until the next generation of researchers is trained, but Kline anticipates the social sciences will then become more like the natural sciences in that "we will report the directions and magnitudes of our effects, determine whether they replicate, and evaluate them for their theoretical, clinical, or practical significance" (p. 15). Chapter 2 is a review of fundamental concepts of research design, including sampling and estimation, the logic of statistical significance testing, and t, F, and chi-square tests. The problems with statistical tests are revisited in Chapter 3. What follows is a long list of errors in the interpretation of p values and of conclusions drawn after null hypothesis testing. The emphasis on null hypothesis significance testing in psychology is also argued to inhibit the advancement of the discipline. To be fair, Kline recognizes that there is as yet no "magical alternative" to statistical tests and that such tests are appropriate in some circumstances when applied correctly. Nonetheless, Kline envisions a future where effect sizes and confidence intervals are reported, substantive rather than statistical significance predominates, and "NHST-centric" thinking has diminished. Part II covers effect size and confidence interval calculations. Chapter 4 is a presentation of parametric effect size indexes. Independent and dependent sample statistics are covered separately. The textbook's website has a supplementary chapter on two-group multivariate designs. Group difference indexes such as d are distinguished from measures of association such as r. Case-level analyses of group differences are also reviewed. Sections not relevant to a reader's needs can be skipped without loss of continuity.
Interpretive guidelines for effect size magnitude and how one might be fooled by effect size estimation are sections that should not be passed over. …

174 citations