Home
/
Authors
/
Steven M. Downing

Author

Steven M. Downing

Other affiliations: National Board of Medical Examiners, Arizona State University, University of Illinois at Urbana–Champaign

Bio: Steven M. Downing is an academic researcher from University of Illinois at Chicago. The author has contributed to research in topics: Test (assessment) & Test validity. The author has an hindex of 38, co-authored 63 publications receiving 6649 citations. Previous affiliations of Steven M. Downing include National Board of Medical Examiners & Arizona State University.

Papers published on a yearly basis

2019
2015
2014
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
1997
1996
1995
1993
1989

Papers

PDF

Open Access

More filters

Journal Article•DOI•

Validity: on the meaningful interpretation of assessment data

[...]

Steven M. Downing¹•Institutions (1)

University of Illinois at Chicago¹

01 Sep 2003-Medical Education

TL;DR: Five sources – content, response process, internal structure, relationship to other variables and consequences – are noted by the Standards for Educational and Psychological Testing as fruitful areas to seek validity evidence.

...read moreread less

Abstract: support or fail to support the proposed score interpretations, at a given point in time. Data and logic are assembled into arguments – pro and con – for some specific interpretation of assessment data. Examples of types of validity evidence, data and information from each source are discussed in the context of a high-stakes written and performance examination in medical education. Conclusion All assessments require evidence of the reasonableness of the proposed interpretation, as test data in education have little or no intrinsic meaning. The constructs purported to be measured by our assessments are important to students, faculty, administrators, patients and society and require solid scientific evidence of their meaning.

...read moreread less

1,193 citations

Journal Article•DOI•

A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment

[...]

Thomas M. Haladyna¹, Steven M. Downing², Michael C. Rodriguez•Institutions (2)

Arizona State University¹, University of Illinois at Chicago²

01 Jul 2002-Applied Measurement in Education

TL;DR: A taxonomy of 31 multiple-choice item-writing guidelines was validated through a logical process that included two sources of evidence: the consensus achieved from reviewing what was found in 27 textbooks on educational testing and the results of 27 research studies and reviews published since 1990.

...read moreread less

Abstract: A taxonomy of 31 multiple-choice item-writing guidelines was validated through a logical process that included two sources of evidence: the consensus achieved from reviewing what was found in 27 textbooks on educational testing and the results of 27 research studies and reviews published since 1990. This taxonomy is mainly intended for classroom assessment. Because textbooks have potential to educate teachers and future teachers, textbook writers are encouraged to consider these findings in future editions of their textbooks. This taxonomy may also have usefulness for developing test items for large-scale assessments. Finally, research on multiple-choice item writing is discussed both from substantive and methodological viewpoints.

...read moreread less

771 citations

Journal Article•DOI•

Reliability: on the reproducibility of assessment data

[...]

Steven M. Downing¹•Institutions (1)

University of Illinois at Chicago¹

01 Sep 2004-Medical Education

TL;DR: This presentation explains how assessment data, like other scientific experimental data, must be reproducible in order to be meaningfully interpreted.

...read moreread less

Abstract: Context All assessment data, like other scientific experimental data, must be reproducible in order to be meaningfully interpreted. Purpose The purpose of this paper is to discuss applications of reliability to the most common assessment methods in medical education. Typical methods of estimating reliability are discussed intuitively and non-mathematically. Summary Reliability refers to the consistency of assessment outcomes. The exact type of consistency of greatest interest depends on the type of assessment, its purpose and the consequential use of the data. Written tests of cognitive achievement look to internal test consistency, using estimation methods derived from the test-retest design. Rater-based assessment data, such as ratings of clinical performance on the wards, require interrater consistency or agreement. Objective structured clinical examinations, simulated patient examinations and other performance-type assessments generally require generalisability theory analysis to account for various sources of measurement error in complex designs and to estimate the consistency of the generalisations to a universe or domain of skills. Conclusions Reliability is a major source of validity evidence for assessments. Low reliability indicates that large variations in scores can be expected upon retesting. Inconsistent assessment scores are difficult or impossible to interpret meaningfully and thus reduce validity evidence. Reliability coefficients allow the quantification and estimation of the random errors of measurement in assessments, such that overall assessment can be improved.

...read moreread less

621 citations

Book•DOI•

Handbook of test development

[...]

Steven M. Downing¹, Thomas M. Haladyna¹•Institutions (1)

Arizona State University¹

01 Jan 2006

TL;DR: In this article, the standards for educational and psychological testing: guidance in test development and test development strategies to minimize test fraud are discussed, as well as test content-related validity evidence for student achievement tests.

...read moreread less

Abstract: Twelve steps for effective test development / Steven M Downing -- The standards for educational and psychological testing: guidance in test development / Robert L Linn -- Contracting for testing services / E Roger Trent and Edward Roeber -- Evidence-centered assessment design / Robert J Mislevy and Michelle M Riconscente -- Item and test development strategies to minimize test fraud / James C Impara and David Foster -- Preparing examinees for test taking: guidelines for test developers and test users / Linda Crocker -- Content-related validity evidence in test development / Michael Kane -- Identifying content for student achievement tests / Norman L Webb -- Determining test content of credentialing examinations / Mark Raymond and Sandra Neustel -- Standard setting / Gregory J Cizek -- Computerized item banking / C David Vale -- Selected-response item formats in test development / Steven M Downing -- Item and prompt development in performance testing / Catherine Welch --^

...read moreread less

441 citations

Journal Article•DOI•

Construct-Irrelevant Variance in High-Stakes Testing.

[...]

Thomas M. Haladyna¹, Steven M. Downing²•Institutions (2)

Arizona State University¹, University of Illinois at Chicago²

25 Oct 2005-Educational Measurement: Issues and Practice

TL;DR: In this paper, the authors define construct-irrelevant variance (CIV) in the context of the contemporary, unitary view of validity and present logical arguments, hypotheses, and documentation for a variety of CIV sources that commonly threaten interpretations of test scores.

...read moreread less

Abstract: There are many threats to validity in high-stakes achievement testing. One major threat is construct-irrelevant variance (CIV). This article defines CIV in the context of the contemporary, unitary view of validity and presents logical arguments, hypotheses, and documentation for a variety of CIV sources that commonly threaten interpretations of test scores. A more thorough study of CIV is recommended.

...read moreread less

337 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13

Collapse

Cited by

PDF

Open Access

More filters

Journal Article•DOI•

Making sense of Cronbach's alpha

[...]

Mohsen Tavakol, Reg Dennick

27 Jun 2011-International Journal of Medical Education

TL;DR: The meaning of Cronbach’s alpha, the most widely used objective measure of reliability, is explained and the underlying assumptions behind alpha are explained in order to promote its more effective use.

...read moreread less

Abstract: Medical educators attempt to create reliable and valid tests and questionnaires in order to enhance the accuracy of their assessment and evaluations. Validity and reliability are two fundamental elements in the evaluation of a measurement instrument. Instruments can be conventional knowledge, skill or attitude tests, clinical simulations or survey questionnaires. Instruments can measure concepts, psychomotor skills or affective values. Validity is concerned with the extent to which an instrument measures what it is intended to measure. Reliability is concerned with the ability of an instrument to measure consistently.1 It should be noted that the reliability of an instrument is closely associated with its validity. An instrument cannot be valid unless it is reliable. However, the reliability of an instrument does not depend on its validity.2 It is possible to objectively measure the reliability of an instrument and in this paper we explain the meaning of Cronbach’s alpha, the most widely used objective measure of reliability. Calculating alpha has become common practice in medical education research when multiple-item measures of a concept or construct are employed. This is because it is easier to use in comparison to other estimates (e.g. test-retest reliability estimates)3 as it only requires one test administration. However, in spite of the widespread use of alpha in the literature the meaning, proper use and interpretation of alpha is not clearly understood. 2, 4, 5 We feel it is important, therefore, to further explain the underlying assumptions behind alpha in order to promote its more effective use. It should be emphasised that the purpose of this brief overview is just to focus on Cronbach’s alpha as an index of reliability. Alternative methods of measuring reliability based on other psychometric methods, such as generalisability theory or item-response theory, can be used for monitoring and improving the quality of OSCE examinations 6-10, but will not be discussed here. What is Cronbach alpha? Alpha was developed by Lee Cronbach in 195111 to provide a measure of the internal consistency of a test or scale; it is expressed as a number between 0 and 1. Internal consistency describes the extent to which all the items in a test measure the same concept or construct and hence it is connected to the inter-relatedness of the items within the test. Internal consistency should be determined before a test can be employed for research or examination purposes to ensure validity. In addition, reliability estimates show the amount of measurement error in a test. Put simply, this interpretation of reliability is the correlation of test with itself. Squaring this correlation and subtracting from 1.00 produces the index of measurement error. For example, if a test has a reliability of 0.80, there is 0.36 error variance (random error) in the scores (0.80×0.80 = 0.64; 1.00 – 0.64 = 0.36).12 As the estimate of reliability increases, the fraction of a test score that is attributable to error will decrease.2 It is of note that the reliability of a test reveals the effect of measurement error on the observed score of a student cohort rather than on an individual student. To calculate the effect of measurement error on the observed score of an individual student, the standard error of measurement must be calculated (SEM).13 If the items in a test are correlated to each other, the value of alpha is increased. However, a high coefficient alpha does not always mean a high degree of internal consistency. This is because alpha is also affected by the length of the test. If the test length is too short, the value of alpha is reduced.2, 14 Thus, to increase alpha, more related items testing the same concept should be added to the test. It is also important to note that alpha is a property of the scores on a test from a specific sample of testees. Therefore investigators should not rely on published alpha estimates and should measure alpha each time the test is administered.14 Use of Cronbach’s alpha Improper use of alpha can lead to situations in which either a test or scale is wrongly discarded or the test is criticised for not generating trustworthy results. To avoid this situation an understanding of the associated concepts of internal consistency, homogeneity or unidimensionality can help to improve the use of alpha. Internal consistency is concerned with the interrelatedness of a sample of test items, whereas homogeneity refers to unidimensionality. A measure is said to be unidimensional if its items measure a single latent trait or construct. Internal consistency is a necessary but not sufficient condition for measuring homogeneity or unidimensionality in a sample of test items. 5, 15 Fundamentally, the concept of reliability assumes that unidimensionality exists in a sample of test items16 and if this assumption is violated it does cause a major underestimate of reliability. It has been well documented that a multidimensional test does not necessary have a lower alpha than a unidimensional test. Thus a more rigorous view of alpha is that it cannot simply be interpreted as an index for the internal consistency of a test. 5, 15, 17 Factor Analysis can be used to identify the dimensions of a test.18 Other reliable techniques have been used and we encourage the reader to consult the paper “Applied Dimensionality and Test Structure Assessment with the START-M Mathematics Test” and to compare methods for assessing the dimensionality and underlying structure of a test.19 Alpha, therefore, does not simply measure the unidimensionality of a set of items, but can be used to confirm whether or not a sample of items is actually unidimensional.5 On the other hand if a test has more than one concept or construct, it may not make sense to report alpha for the test as a whole as the larger number of questions will inevitable inflate the value of alpha. In principle therefore, alpha should be calculated for each of the concepts rather than for the entire test or scale. 2, 3 The implication for a summative examination containing heterogeneous, case-based questions is that alpha should be calculated for each case. More importantly, alpha is grounded in the ‘tau equivalent model’ which assumes that each test item measures the same latent trait on the same scale. Therefore, if multiple factors/traits underlie the items on a scale, as revealed by Factor Analysis, this assumption is violated and alpha underestimates the reliability of the test.17 If the number of test items is too small it will also violate the assumption of tau-equivalence and will underestimate reliability.20 When test items meet the assumptions of the tau-equivalent model, alpha approaches a better estimate of reliability. In practice, Cronbach’s alpha is a lower-bound estimate of reliability because heterogeneous test items would violate the assumptions of the tau-equivalent model.5 If the calculation of “standardised item alpha” in SPSS is higher than “Cronbach’s alpha”, a further examination of the tau-equivalent measurement in the data may be essential. Numerical values of alpha As pointed out earlier, the number of test items, item inter-relatedness and dimensionality affect the value of alpha.5 There are different reports about the acceptable values of alpha, ranging from 0.70 to 0.95. 2, 21, 22 A low value of alpha could be due to a low number of questions, poor inter-relatedness between items or heterogeneous constructs. For example if a low alpha is due to poor correlation between items then some should be revised or discarded. The easiest method to find them is to compute the correlation of each test item with the total score test; items with low correlations (approaching zero) are deleted. If alpha is too high it may suggest that some items are redundant as they are testing the same question but in a different guise. A maximum alpha value of 0.90 has been recommended.14 Summary High quality tests are important to evaluate the reliability of data supplied in an examination or a research study. Alpha is a commonly employed index of test reliability. Alpha is affected by the test length and dimensionality. Alpha as an index of reliability should follow the assumptions of the essentially tau-equivalent approach. A low alpha appears if these assumptions are not meet. Alpha does not simply measure test homogeneity or unidimensionality as test reliability is a function of test length. A longer test increases the reliability of a test regardless of whether the test is homogenous or not. A high value of alpha (> 0.90) may suggest redundancies and show that the test length should be shortened.

...read moreread less

8,701 citations

Journal Article•

Experience and Education.

[...]

Robert Sommer

01 Jan 1974-Journal of Architectural Education

TL;DR: One of the books that can be recommended for new readers is experience and education as mentioned in this paper, which is not kind of difficult book to read and can be read and understand by the new readers.

...read moreread less

Abstract: Preparing the books to read every day is enjoyable for many people. However, there are still many people who also don't like reading. This is a problem. But, when you can support others to start reading, it will be better. One of the books that can be recommended for new readers is experience and education. This book is not kind of difficult book to read. It can be read and understand by the new readers.

...read moreread less

5,478 citations

Journal Article•

The power of feedback.

[...]

Larry Ambrose, Paula Moscinski

01 Sep 2002-Healthcare executive

4,293 citations

Standards for educational and psychological testing

[...]

Alija Kulenović, Vesna Buško

01 Jan 2006

TL;DR: For example, Standardi pružaju okvir koje ukazuju na ucinkovitost kvalitetnih instrumenata u onim situacijama u kojima je njihovo koristenje potkrijepljeno validacijskim podacima.

...read moreread less

Abstract: Pedagosko i psiholosko testiranje i procjenjivanje spadaju među najvažnije doprinose znanosti o ponasanju nasem drustvu i pružaju temeljna i znacajna poboljsanja u odnosu na ranije postupke. Iako se ne može ustvrditi da su svi testovi dovoljno usavrseni niti da su sva testiranja razborita i korisna, postoji velika kolicina informacija koje ukazuju na ucinkovitost kvalitetnih instrumenata u onim situacijama u kojima je njihovo koristenje potkrijepljeno validacijskim podacima. Pravilna upotreba testova može dovesti do boljih odluka o pojedincima i programima nego sto bi to bio slucaj bez njihovog koristenja, a također i ukazati na put za siri i pravedniji pristup obrazovanju i zaposljavanju. Međutim, losa upotreba testova može dovesti do zamjetne stete nanesene ispitanicima i drugim sudionicima u procesu donosenja odluka na temelju testovnih podataka. Cilj Standarda je promoviranje kvalitetne i eticne upotrebe testova te uspostavljanje osnovice za ocjenu kvalitete postupaka testiranja. Svrha objavljivanja Standarda je uspostavljanje kriterija za evaluaciju testova, provedbe testiranja i posljedica upotrebe testova. Iako bi evaluacija prikladnosti testa ili njegove primjene trebala ovisiti prvenstveno o strucnim misljenjima, Standardi pružaju okvir koji osigurava obuhvacanje svih relevantnih pitanja. Bilo bi poželjno da svi autori, sponzori, nakladnici i korisnici profesionalnih testova usvoje Standarde te da poticu druge da ih također prihvate.

...read moreread less

3,905 citations

Journal Article•DOI•

Features and uses of high-fidelity medical simulations that lead to effective learning: a BEME systematic review

[...]

S. Barry Issenberg¹, William C. McGaghie², Emil Petrusa³, David Lee Gordon¹, Ross J. Scalese¹ - Show less +1 more•Institutions (3)

University of Miami¹, Northwestern University², Duke University³

01 Jan 2005-Medical Teacher

TL;DR: While research in this field needs improvement in terms of rigor and quality, high-fidelity medical simulations are educationally effective and simulation-based education complements medical education in patient care settings.

...read moreread less

Abstract: SUMMARY Review date: 1969 to 2003, 34 years. Background and context: Simulations are now in widespread use in medical education and medical personnel evaluation. Outcomes research on the use and effectiveness of simulation technology in medical education is scattered, inconsistent and varies widely in methodological rigor and substantive focus. Objectives: Review and synthesize existing evidence in educational science that addresses the question, ‘What are the features and uses of high-fidelity medical simulations that lead to most effective learning?’. Search strategy: The search covered five literature databases (ERIC, MEDLINE, PsycINFO, Web of Science and Timelit) and employed 91 single search terms and concepts and their Boolean combinations. Hand searching, Internet searches and attention to the ‘grey literature’ were also used. The aim was to perform the most thorough literature search possible of peer-reviewed publications and reports in the unpublished literature that have been judged for academic quality. Inclusion and exclusion criteria: Four screening criteria were used to reduce the initial pool of 670 journal articles to a focused set of 109 studies: (a) elimination of review articles in favor of empirical studies; (b) use of a simulator as an educational assessment or intervention with learner outcomes measured quantitatively; (c) comparative research, either experimental or quasi-experimental; and (d) research that involves simulation as an educational intervention. Data extraction: Data were extracted systematically from the 109 eligible journal articles by independent coders. Each coder used a standardized data extraction protocol. Data synthesis: Qualitative data synthesis and tabular presentation of research methods and outcomes were used. Heterogeneity of research designs, educational interventions, outcome measures and timeframe precluded data synthesis using meta-analysis. Headline results: Coding accuracy for features of the journal articles is high. The extant quality of the published research is generally weak. The weight of the best available evidence suggests that high-fidelity medical simulations facilitate learning under the right conditions. These include the following:

...read moreread less

3,176 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse