
Misunderstandings and omissions in textbook accounts of effect sizes

Paul Morris
01 May 2020, Vol. 111, Iss. 2, pp. 395-410


Running head: MISUNDERSTANDINGS OF EFFECT SIZES
Misunderstandings and omissions in textbook accounts of effect sizes
Paul H. Morris
Abstract
There have been frequent attempts in psychology to reduce the reliance on null hypothesis
significance testing (NHST) as the criterion for establishing the importance of results. Many
authorities now recommend the reporting of effect sizes (ESs) as a supplement or alternative
to NHST. However, there is extensive specialist literature highlighting problems associated
with the use and interpretation of ESs. A review of the coverage of ESs in over 100 textbooks
on statistical analysis in behavioural science revealed widespread neglect of ESs and the
relevant critical issues that have widespread coverage in the more specialist literature. For
example, many textbooks claim that ESs should be interpreted as a simple measure of the
practical real-world importance of a result despite the fact that ESs are profoundly influenced
by features of design and analysis strategy. We seek to highlight areas of misunderstanding
about ESs found in the pedagogical literature in light of the more specialist literature, and
make recommendations to researchers for the appropriate use and interpretation of ESs. This
is critical as statistics textbooks have a crucial role in the education of researchers.
Keywords: effect size, the new statistics, NHST, textbooks, methodology, statistical
reporting

There is now widespread acceptance that there has been an over-reliance on null
hypothesis significance testing (NHST) as a means of determining the credibility and
importance of results from inferential tests in psychological research (Boring, 1919;
Chandler, 1957; Cohen, 1994; Cumming, Fidler, Kalinowski & Lai, 2012; Denis, 2003;
Gigerenzer, 1993; Meehl, 1978; Rozeboom, 1960). Indeed, hundreds of articles documenting
the shortcomings of NHST have been published over the decades, but as Rozeboom (1997)
states, “It is a sociology of science wonderment that this statistical practice has remained so
unresponsive to criticism” (p. 335). It is also noteworthy that there are consistent
misunderstandings about the true meaning of NHST (Berger, 2003).
Critics of NHST have proposed a number of alternative approaches (e.g. Bayesian
modelling), but the most common suggestion to improve statistical interpretation has been to
include effect sizes (ESs) in addition to p values (Cohen, 1962; 1988; 1994; Vaughan &
Corballis, 1969; Wilkinson & APA Task Force on Statistical Inference, 1999). The reporting
and interpretation of ESs also play a critical role in what Cumming (2014) terms “The New
Statistics” (p. 7). However, the calculation and interpretation of ESs is by no means
straightforward and there is an extensive specialist literature covering difficulties associated
with the use of ESs (Baguley, 2009; Cortina & Landis, 2009; Fern & Monroe, 1996;
Grissom & Kim, 2014; Morris & DeShon, 2002; O’Grady, 1982; Olejnik & Algina, 2003;
Onwuegbuzie & Levin, 2003; Osborne, 2003; Petrinovich, 1979; Richardson, 1996;
Rosenthal, Rosnow & Rubin, 2000; Sackett, Laczo & Arvey, 2002). A recent article by Pek
and Flora (2018) provides a guide to the appropriate reporting of ESs.
Despite widespread attempts to encourage researchers to include and interpret ESs in
research papers, change in relevant practice has been very slow (Cumming et al., 2007; Sun,
Pan & Wang, 2010). Gigerenzer, Krauss and Vitouch (2004) have identified the statistics
textbook as one important reason for the glacial nature of change. Indeed, there is a growing
literature on the influence of the information and misinformation found in psychology
textbooks (Costa & Shimp, 2011; Costall & Morris, 2015; Levine, Worboys & Taylor, 1973).
For most research psychologists, statistical analysis is a tool of the trade rather than a
research interest in itself. Many active researchers refer to statistics textbooks for guidance
with regard to the conduct and interpretation of statistical analyses, and textbooks may have
an important role in determining how many researchers use and interpret ESs. Textbooks also
often reflect the received wisdom about a particular topic. Therefore, a careful examination of
the treatment of ESs in textbooks may be very important in understanding how research
psychologists are being advised with regard to ESs. In this paper, we highlight the major
issues surrounding the use of ESs covered in the specialist literature and show how the
neglect of these issues is leading to consistent misunderstandings about ESs in textbooks.
We used a variety of methods to source information from textbooks, some systematic
and some less systematic. We looked for relevant books (books with ES in the title; general
books on research methods and statistical analysis for the behavioural sciences and related
disciplines) from all major publishers (education publishers in the top 10 by income, according
to Publishers Weekly). We also used “effect size” as a search term in Google
Books. In our analyses below, we include evidence from over 100 books.
Are ESs transparent measures of the practical “real world” importance of a result?
In this section we address the issue of whether ESs can be regarded as a measure of
practical importance. There are two types of ES, standardized and unstandardized.
Standardized ESs have been defined by Baguley (2009) as “a standardized measure of effect
[...] which has been scaled in terms of the variability of the sample or population from
which the measure was taken. In contrast, simple effect size (Frick, 1994) is unstandardized
and expressed in the original unit of analysis” (p. 604). In the overwhelming majority of
journal articles and books the use of the term ES refers to standardized ES. We focus on
standardized ESs in this section.
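As a rough illustration of the distinction, the sketch below computes a simple (unstandardized) ES, the raw mean difference, alongside one common standardized ES, Cohen's d with a pooled standard deviation. The choice of Cohen's d, the function names and the scores are assumptions made for illustration only; they are not taken from the sources cited above.

```python
from math import sqrt
from statistics import mean, variance


def raw_mean_difference(group_a, group_b):
    """Simple (unstandardized) ES: difference in means, in the original units."""
    return mean(group_b) - mean(group_a)


def cohens_d(group_a, group_b):
    """Standardized ES: mean difference scaled by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return raw_mean_difference(group_a, group_b) / sqrt(pooled_var)


# Hypothetical scores: both pairs of groups differ by 3 units on average,
# but the second pair is more variable than the first.
narrow_a, narrow_b = [10, 12, 14, 16, 18], [13, 15, 17, 19, 21]
wide_a, wide_b = [6, 10, 14, 18, 22], [9, 13, 17, 21, 25]

raw_narrow = raw_mean_difference(narrow_a, narrow_b)
raw_wide = raw_mean_difference(wide_a, wide_b)
d_narrow = cohens_d(narrow_a, narrow_b)
d_wide = cohens_d(wide_a, wide_b)

print(f"raw difference: narrow = {raw_narrow}, wide = {raw_wide}")
print(f"Cohen's d:      narrow = {d_narrow:.2f}, wide = {d_wide:.2f}")
```

With these made-up scores the raw difference is identical in both cases, while the standardized value is halved when the standard deviation doubles, which is what scaling by variability implies.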
One of the criticisms of NHST is that it provides no indication of the magnitude of any
effect, and therefore indicates nothing about the practical importance of a result. In contrast, it
seems intuitively reasonable that ESs may provide evidence for the practical importance of a
result. However, there is a fundamental problem with treating ESs as a measure of practical,
real-world importance: ESs are profoundly influenced by experimental design and the type
of analysis employed (Fern & Monroe, 1996; Onwuegbuzie & Levin, 2003). For example,
the use of an independent groups or repeated measures design can have a major influence on
ES. In an independent groups design, individual differences variance is included in the
general error term and is thus part of the calculation of the ES. In a repeated measures study,
the portion of variance attributable to individual differences is typically treated as a separate
effect and is thus excluded from the comparison of treatment and error variance in the
calculation of the ES (Maxwell & Delaney, 1990). Therefore, repeated measures designs
often produce larger ESs than independent groups designs (Keppel, 1991; O’Grady, 1982).
For example, data (Levy, 1973) producing the following means and standard deviations: M =
8.00, SD = 4.85; M = 11.00, SD = 5.43; M = 14.00, SD = 3.74¹ were analysed using a
one-way repeated measures analysis of variance (ANOVA) and then rearranged for analysis
using a one-way independent groups ANOVA. The eta squared produced from the repeated
measures ANOVA was .66, whereas the eta squared for the independent groups ANOVA was
less than half at .30. There are better measures of ES than eta squared; however, these are
the results that Statistical Package for the Social Sciences (SPSS [IBM], 2017) would have
produced, and we suspect that many researchers would have taken the eta squared values at
face value².
¹ It is typically the case that repeated measures designs produce larger ESs as the repeated
measures are often highly correlated.
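The design effect described above can be sketched with a small worked example. The scores below are invented and are not Levy's (1973) data, so the resulting values differ from the .66 and .30 reported in the text. The sketch also assumes that the repeated measures value is partial eta squared, SS_conditions / (SS_conditions + SS_error), with between-participant variability removed from the error term; all variable names are hypothetical.

```python
def sum_of_squares(values, centre):
    """Sum of squared deviations of values from a given centre."""
    return sum((v - centre) ** 2 for v in values)


# Hypothetical scores: rows are participants, columns are three conditions.
scores = [
    [3, 6, 9],
    [5, 9, 12],
    [8, 10, 14],
    [10, 14, 16],
    [14, 16, 19],
]

n_subjects = len(scores)
n_conditions = len(scores[0])
all_scores = [x for row in scores for x in row]
grand_mean = sum(all_scores) / len(all_scores)

condition_means = [
    sum(row[j] for row in scores) / n_subjects for j in range(n_conditions)
]
subject_means = [sum(row) / n_conditions for row in scores]

ss_total = sum_of_squares(all_scores, grand_mean)
ss_conditions = n_subjects * sum_of_squares(condition_means, grand_mean)
ss_subjects = n_conditions * sum_of_squares(subject_means, grand_mean)

# Independent groups: individual differences remain in the error term.
ss_error_independent = ss_total - ss_conditions
eta_sq_independent = ss_conditions / (ss_conditions + ss_error_independent)

# Repeated measures: individual differences are removed from the error term.
ss_error_repeated = ss_total - ss_conditions - ss_subjects
eta_sq_repeated = ss_conditions / (ss_conditions + ss_error_repeated)

print(f"independent groups: {eta_sq_independent:.2f}")
print(f"repeated measures:  {eta_sq_repeated:.2f}")
```

The same fifteen scores give a much larger value under the repeated measures analysis because the participants' scores are highly consistent across conditions, so most of the variability that inflates the independent groups error term is attributed to participants instead.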
A related problem is that, depending on the ES employed, the magnitude of an effect
associated with a particular independent variable can change when other variables are added
into the model. This is a particular issue for partial eta squared (the default ANOVA ES
measure in SPSS). We produced a data set where the variance associated with one
independent variable was 224.50 and the error variance was 10946.10. We added another
independent variable to the analysis that accounted for a large amount of variance, which
reduced the size of the error term from 10946.10 to 683.60. The partial eta squared for the first
independent variable increased from .02 to .25 with the inclusion of the second variable. We
would again stress that there are better measures of ES than partial eta squared; however, the
point is that the selection and interpretation of ES statistics is by no means straightforward.
The addition of levels within a variable can also change the magnitude of the ES (O’Grady,
1982). There have been attempts to correct for various aspects of design on ES, with varying
success (e.g. Cooper, Hedges & Valentine, 2009; Dunlap, Cortina, Vaslow & Burke, 1996;
Morris & DeShon, 2002; Olejnik & Algina, 2003).
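The partial eta squared figures reported above can be checked with a short calculation. The sketch assumes that the quoted "variance" values are sums of squares and uses the usual definition, partial eta squared = SS_effect / (SS_effect + SS_error); the function and variable names are hypothetical.

```python
def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: effect SS relative to effect SS plus error SS."""
    return ss_effect / (ss_effect + ss_error)


SS_EFFECT = 224.50  # first independent variable (value from the text)

# Error term before and after the second independent variable is added.
before = partial_eta_squared(SS_EFFECT, 10946.10)
after = partial_eta_squared(SS_EFFECT, 683.60)

print(f"one-factor model: {before:.2f}")
print(f"two-factor model: {after:.2f}")
```

Under that assumption the two values round to .02 and .25, matching the text. The effect term is the same in both models; only the error term in the denominator shrinks.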
There are other factors that can profoundly affect study ES. Range restriction and
attenuation both affect the accuracy of the estimate of the true magnitude of ESs (Bobko,
Roth & Bobko, 2001; Osborne, 2003; Sackett, Laczo & Arvey, 2002). Both these effects are
more traditionally associated with correlational techniques but can also affect tests of
difference (Bobko, Roth & Bobko, 2001). The range restriction effect is a function of
sampling technique. If the sample for a test is taken from the centre of the distribution,
² SPSS reports partial eta squared by default, but there is no difference between partial eta
squared and eta squared for simple one-factor designs.
