The standards for educational and psychological testing.

doi:10.1037/14047-013

Book Chapter•DOI•

The standards for educational and psychological testing.

Daniel R. Eignor¹•Institutions (1)

Princeton University¹

01 Jan 2013

TL;DR: In this article, the authors present a survey of sales in terms of total units sold in the United States for the years 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018

Abstract:
Period | Notes | No. of Units
FY 1999 | est. | 1,768
FY 2000 | est. | 3,797
FY 2001 | est. | 3,755
FY 2002 | est. | 5,592
FY 2003 | est. | 3,310
FY 2004 | est. | 3,218
FY 2005 | Actual | 3,803
FY 2006 | Actual | 3,888
7/1/06-12/31/06 | Actual | 2,144
FY 2007 | Actual | 3,077
FY 2008 | Actual | 3,358
FY 2009 | Actual | 2,590
FY 2010 | Actual | 3,043
FY 2011 | Actual | 2,132
FY 2012 | Actual | 1,649
FY 2013 | Actual | 1,732
FY 2014 | Actual | 855
Total Units Sold | | 49,710

Citations

Journal Article•DOI•

Estimating Ordinal Reliability for Likert-Type and Ordinal Item Response Data: A Conceptual, Empirical, and Practical Guide.

Anne M. Gadermann, Martin Guhn, Bruno D. Zumbo

01 Jan 2012-Practical Assessment, Research and Evaluation

TL;DR: In this article, the authors proposed a method for the first publication of first publication to the Practical Assessment, Research & Evaluation (PARE) journal for the purpose of obtaining a first publication license.

Abstract: Copyright is retained by the first or sole author, who grants right of first publication to the Practical Assessment, Research & Evaluation. Permission is granted to distribute this article for nonprofit, educational purposes if it is copied in its entirety and the journal is credited. PARE has the right to authorize third party reproduction of this article in print, electronic and database forms.

992 citations

Journal Article•DOI•

Servant leadership: A systematic review and call for future research.

Nathan Eva, Mulyadi Robin, Sen Sendjaya¹, Dirk van Dierendonck², Robert C. Liden³•Institutions (3)

Swinburne University of Technology¹, Erasmus University Rotterdam², University of Illinois at Urbana–Champaign³

01 Feb 2019-Leadership Quarterly

TL;DR: This article presents an integrative and comprehensive review of the 285 articles on servant leadership spanning 20 years (1998-2018), motivated by a lack of coherence and clarity around the construct that has impeded its theory development.

Abstract: Notwithstanding the proliferation of servant leadership studies with over 100 articles published in the last four years alone, a lack of coherence and clarity around the construct has impeded its theory development. We provide an integrative and comprehensive review of the 285 articles on servant leadership spanning 20 years (1998–2018), and in so doing extend the field in four different ways. First, we provide a conceptual clarity of servant leadership vis-a-vis other value-based leadership approaches and offer a new definition of servant leadership. Second, we evaluate 16 existing measures of servant leadership in light of their respective rigor of scale construction and validation. Third, we map the theoretical and nomological network of servant leadership in relation to its antecedents, outcomes, moderators, mediators. We finally conclude by presenting a detailed future research agenda to bring the field forward encompassing both theoretical and empirical advancement. All in all, our review paints a holistic picture of where the literature has been and where it should go into the future.

689 citations

Journal Article•DOI•

Measurement Matters: Assessing Personal Qualities Other Than Cognitive Ability for Educational Purposes

Angela L. Duckworth¹, David S. Yeager•Institutions (1)

University of Pennsylvania¹

01 May 2015-Educational Researcher

TL;DR: It is concluded that debate over the optimal name for this broad category of personal qualities obscures substantial agreement about the specific attributes worth measuring and medium-term innovations that may make measures of these personal qualities more suitable for educational purposes are highlighted.

Abstract: There has been perennial interest in personal qualities other than cognitive ability that determine success, including self-control, grit, growth mindset, and many others. Attempts to measure such qualities for the purposes of educational policy and practice, however, are more recent. In this article, we identify serious challenges to doing so. We first address confusion over terminology, including the descriptor "non-cognitive." We conclude that debate over the optimal name for this broad category of personal qualities obscures substantial agreement about the specific attributes worth measuring. Next, we discuss advantages and limitations of different measures. In particular, we compare self-report questionnaires, teacher-report questionnaires, and performance tasks, using self-control as an illustrative case study to make the general point that each approach is imperfect in its own way. Finally, we discuss how each measure's imperfections can affect its suitability for program evaluation, accountability, individual diagnosis, and practice improvement. For example, we do not believe any available measure is suitable for between-school accountability judgments. In addition to urging caution among policymakers and practitioners, we highlight medium-term innovations that may make measures of these personal qualities more suitable for educational purposes.

687 citations

Journal Article•

False Positives, False Negatives, and False Analyses: A Rejoinder to "Machine Bias: There's Software Used across the Country to Predict Future Criminals. And It's Biased against Blacks"

Anthony W. Flores, Kristin Bechtel, Christopher T. Lowenkamp

01 Sep 2016-Federal Probation

TL;DR: The authors pointed out that ProPublica's report was based on faulty statistics and data analysis, and that the report failed to show that the COMPAS itself is racially biased, let alone that other risk instruments are biased.

Abstract: The validity and intellectual honesty of conducting and reporting analysis are critical, since the ramifications of published data, accurate or misleading, may have consequences for years to come.
- Marco and Larkin, 2000, p. 692

ProPublica recently released a much-heralded investigative report claiming that a risk assessment tool (known as the COMPAS) used in criminal justice is biased against black defendants. The report heavily implied that such bias is inherent in all actuarial risk assessment instruments (ARAIs).

We think ProPublica's report was based on faulty statistics and data analysis, and that the report failed to show that the COMPAS itself is racially biased, let alone that other risk instruments are biased. Not only do ProPublica's results contradict several comprehensive existing studies concluding that actuarial risk can be predicted free of racial and/or gender bias, a correct analysis of the underlying data (which we provide below) sharply undermines ProPublica's approach.

Our reasons for writing are simple. It might be that the existing justice system is biased against poor minorities due to a wide variety of reasons (including economic factors, policing patterns, prosecutorial behavior, and judicial biases), and therefore, regardless of the degree of bias, risk assessment tools informed by objective data can help reduce racial bias from its current level. It would be a shame if policymakers mistakenly thought that risk assessment tools were somehow worse than the status quo. Because we are at a time in history when there appears to be bipartisan political support for criminal justice reform, one poorly executed study that makes such absolute claims of bias should not go unchallenged. The gravity of this study's erroneous conclusions is exacerbated by the large-market outlet in which it was published (ProPublica).

Before we expand further into our criticisms of the ProPublica piece, we describe some context and characteristics of the American criminal justice system and risk assessments.

Mass Incarceration and ARAIs

The United States is clearly the worldwide leader in imprisonment. The prison population in the United States has declined by small percentages in recent years and at year-end 2014 the prison population was the smallest it had been since 2004. Yet, we still incarcerated 1,561,500 individuals in federal and state correctional facilities (Carson, 2015). By sheer numbers, or rates per 100,000 inhabitants, the United States incarcerates more people than just about any country in the world that reports reliable incarceration statistics (Wagner & Walsh, 2016).

Further, it appears that there is a fair amount of racial disproportion when comparing the composition of the general population with the composition of the prison population. The 2014 United States Census population projection estimates that, across the U.S., the racial breakdown of the 318 million residents comprised 62.1 percent white, 13.2 percent black or African American, and 17.4 percent Hispanic. In comparison, 37 percent of the prison population was categorized as black, 32 percent was categorized as white, and 22 percent as Hispanic (Carson, 2015). Carson (2015:15) states that, "As a percentage of residents of all ages at yearend 2014, 2.7 percent of black males (or 2,724 per 100,000 black male residents) and 1.1 percent of Hispanic males (1,090 per 100,000 Hispanic males) were serving sentences of at least 1 year in prison, compared to less than 0.5 percent of white males (465 per 100,000 white male residents)."

Aside from the negative effects caused by imprisonment, there is a massive financial cost that extends beyond official correctional budgets. A recent report by The Vera Institute of Justice (Henrichson & Delaney, 2012) indicated that the cost of prison operations (including such things as pension and insurance contributions, capital costs, legal fees, and administrative fees) in 40 states participating in their study was 39. …

679 citations

Journal Article•DOI•

The Ability Model of Emotional Intelligence: Principles and Updates

John D. Mayer¹, David R. Caruso², Peter Salovey²•Institutions (2)

University of New Hampshire¹, Yale University²

18 Aug 2016-Emotion Review

TL;DR: The authors present seven principles that have guided their thinking about emotional intelligence, some of them new, and reformulate their original ability model in light of these principles.

Abstract: This article presents seven principles that have guided our thinking about emotional intelligence, some of them new. We have reformulated our original ability model here guided by these principles,...

642 citations

References

Journal Article•DOI•

Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives

Li-tze Hu, Peter M. Bentler¹•Institutions (1)

University of California, Los Angeles¹

01 Jan 1999-Structural Equation Modeling

TL;DR: In this article, the adequacy of the conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice were examined, and the results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and G...

Abstract: This article examines the adequacy of the “rules of thumb” conventional cutoff criteria and several new alternatives for various fit indexes used to evaluate model fit in practice. Using a 2‐index presentation strategy, which includes using the maximum likelihood (ML)‐based standardized root mean squared residual (SRMR) and supplementing it with either Tucker‐Lewis Index (TLI), Bollen's (1989) Fit Index (BL89), Relative Noncentrality Index (RNI), Comparative Fit Index (CFI), Gamma Hat, McDonald's Centrality Index (Mc), or root mean squared error of approximation (RMSEA), various combinations of cutoff values from selected ranges of cutoff criteria for the ML‐based SRMR and a given supplemental fit index were used to calculate rejection rates for various types of true‐population and misspecified models; that is, models with misspecified factor covariance(s) and models with misspecified factor loading(s). The results suggest that, for the ML method, a cutoff value close to .95 for TLI, BL89, CFI, RNI, and G...

76,383 citations

Journal Article•DOI•

Convergent and discriminant validation by the multitrait-multimethod matrix.

Donald T. Campbell, Donald W. Fiske

01 Mar 1959-Psychological Bulletin

TL;DR: This transmutability of the validation matrix argues for the comparisons within the heteromethod block as the most generally relevant validation data, and illustrates the potential interchangeability of trait and method components.

Abstract: [Correlation matrix residue; labels: Content, Memory (Learning Ability), Comprehension, Vocabulary; layout not recoverable.] As judged against these latter values, comprehension (.48) and vocabulary (.47), but not memory (.31), show some specific validity. This transmutability of the validation matrix argues for the comparisons within the heteromethod block as the most generally relevant validation data, and illustrates the potential interchangeability of trait and method components. Some of the correlations in Chi's (1937) prodigious study of halo effect in ratings are appropriate to a multitrait-multimethod matrix in which each rater might be regarded as representing a different method. While the published report does not make these available in detail because it employs averaged values, it is apparent from a comparison of his Tables IV and VIII that the ratings generally failed to meet the requirement that ratings of the same trait by different raters should correlate higher than ratings of different traits by the same rater. Validity is shown to the extent that of the correlations in the heteromethod block, those in the validity diagonal are higher than the average heteromethod-heterotrait values. A conspicuously unsuccessful multitrait-multimethod matrix is provided by Campbell (1953, 1956) for rating of the leadership behavior of officers by themselves and by their subordinates. Only one of 11 variables (Recognition Behavior) met the requirement of providing a validity diagonal value higher than any of the heterotrait-heteromethod values, that validity being .29. For none of the variables were the validities higher than heterotrait-monomethod values. A study of attitudes toward authority and nonauthority figures by Burwen and Campbell (1957) contains a complex multitrait-multimethod matrix, one symmetrical excerpt from which is shown in Table 6. Method variance was strong for most of the procedures in this study. Where validity was found, it was primarily at the level of validity diagonal values higher than heterotrait-heteromethod values. As illustrated in Table 6, attitude toward father showed this kind of validity, as did attitude toward peers to a lesser degree. Attitude toward boss showed no validity. There was no evidence of a generalized attitude toward authority which would include father and boss, although such values as the ...

15,795 citations

Journal Article•DOI•

Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance

Gordon W. Cheung, Roger B. Rensvold

01 Apr 2002-Structural Equation Modeling

TL;DR: In this paper, the authors examined the change in the goodness-of-fit index (GFI) when cross-group constraints are imposed on a measurement model and found that the change was independent of both model complexity and sample size.

Abstract: Measurement invariance is usually tested using Multigroup Confirmatory Factor Analysis, which examines the change in the goodness-of-fit index (GFI) when cross-group constraints are imposed on a measurement model. Although many studies have examined the properties of GFI as indicators of overall model fit for single-group data, there have been none to date that examine how GFIs change when between-group constraints are added to a measurement model. The lack of a consensus about what constitutes significant GFI differences places limits on measurement invariance testing. We examine 20 GFIs based on the minimum fit function. A simulation under the two-group situation was used to examine changes in the GFIs (ΔGFIs) when invariance constraints were added. Based on the results, we recommend using Δcomparative fit index, ΔGamma hat, and ΔMcDonald's Noncentrality Index to evaluate measurement invariance. These three ΔGFIs are independent of both model complexity and sample size, and are not correlated with the o...

10,597 citations

Journal Article•DOI•

A Review and Synthesis of the Measurement Invariance Literature: Suggestions, Practices, and Recommendations for Organizational Research

Robert J. Vandenberg, Charles E. Lance¹•Institutions (1)

University of Georgia¹

01 Jan 2000-Organizational Research Methods

TL;DR: The establishment of measurement invariance across groups is a logical prerequisite to conducting substantive cross-group comparisons (e.g., tests of group mean differences).

Abstract: The establishment of measurement invariance across groups is a logical prerequisite to conducting substantive cross-group comparisons (e.g., tests of group mean differences, invariance of structura...

6,086 citations

Principles and practice of structural equation modeling, 3rd ed.

Rex B. Kline¹•Institutions (1)

Concordia University¹

01 Jan 2011

5,475 citations