
Reference:
Naumann, Alexander; Hartig, Johannes; Hochweber, Jan: Absolute and relative measures of instructional sensitivity. In: Journal of Educational and Behavioral Statistics 42 (2017) 6, pp. 678-705.
DOI: 10.3102/1076998617703649
URN: urn:nbn:de:0111-pedocs-156029 - DOI: 10.25656/01:15602
https://nbn-resolving.org/urn:nbn:de:0111-pedocs-156029
https://doi.org/10.25656/01:15602
Terms of use
We grant a non-exclusive, non-transferable, individual, and limited right to use this document. This document is solely intended for your personal, non-commercial use. Use of this document does not include any transfer of property rights, and it is subject to the following limitations: all copies of this document must retain all copyright information and other information regarding legal protection. You are not allowed to alter this document in any way, to copy it for public or commercial purposes, to exhibit the document in public, or to perform, distribute, or otherwise use it in public.
By using this particular document, you accept the above-stated conditions of use.
Contact:
peDOCS
DIPF | Leibniz-Institut für Bildungsforschung und Bildungsinformation
Informationszentrum (IZ) Bildung
E-Mail: pedocs@dipf.de
Internet: www.pedocs.de

Article
Absolute and Relative Measures
of Instructional Sensitivity
Alexander Naumann
Johannes Hartig
German Institute for International Educational Research (DIPF)
Jan Hochweber
University of Teacher Education St. Gallen (PHSG)
Valid inferences on teaching drawn from students' test scores require that tests
are sensitive to the instruction students received in class. Accordingly, measures
of the test items' instructional sensitivity provide empirical support for validity
claims about inferences on instruction. In the present study, we first introduce
the concepts of absolute and relative measures of instructional sensitivity.
Absolute measures summarize a single item's total capacity of capturing effects
of instruction, which is independent of the test's sensitivity. In contrast, relative
measures summarize a single item's capacity of capturing effects of instruction
relative to test sensitivity. Then, we propose a longitudinal multilevel item
response theory model that allows estimating both types of measures depending
on the identification constraints.
Keywords: instructional sensitivity; multilevel IRT; differential item functioning
Researchers as well as policymakers regularly rely on student performance data
to draw inferences on schools, teachers, or teaching (Creemers & Kyriakides,
2008; Pellegrino, 2002). Yet valid inferences drawn from student test scores
require that instruments are sensitive to the instruction that students have received
in class (Popham, 2007; Popham & Ryan, 2012). Accordingly, measures of test
items' instructional sensitivity may provide empirical support for validity claims
about the inferences on instruction derived from student test scores.
Instructional sensitivity is defined as the psychometric property of a test or a
single item to capture effects of instruction (Polikoff, 2010). Scores of instruc-
tionally sensitive tests are expected to increase with more or better teaching
(Baker, 1994). Students who received different instruction should produce dif-
ferent responses to highly instructionally sensitive items (Ing, 2008). Fundamen-
tally, instructional sensitivity relates to the observation of change in students'
responses on items as a consequence of instruction (Burstein, 1989). If item
responses do not change as a consequence of instruction, it may remain unclear

whether teaching was ineffective or the test was insensitive (Naumann, Hochweber,
& Hartig, 2014). To test the hypothesis of whether an item is instructionally
sensitive, various measures have been proposed (see Haladyna & Roid,
1981; Polikoff, 2010). Most commonly, these item sensitivity measures are
based on item parameters, that is, item difficulty or discrimination (Haladyna,
2004).
According to Naumann, Hochweber, and Klieme (2016), each item sensitivity
measure refers to one of the three perspectives on how to test the instructional
sensitivity of items. From the first perspective, instructional sensitivity is con-
ceived as change in item parameters between two time points of measurement,
while from the second perspective instructional sensitivity is conceived as dif-
ferences in item parameters between at least two groups (e.g., treatment and
control groups or classes) within a sample. The third perspective is a combination
of the two preceding ones, which allows deriving measures addressing two facets
of item sensitivity: global and differential sensitivity. Global sensitivity refers to
the extent to which item parameters change on average across time. Differential
sensitivity refers to the variation of change in parameters across groups,
indicating an item's capacity of detecting differences in group-specific learning.
Overall, these perspectives provide an elaborate framework for the measurement
of instructional sensitivity based on item statistics by highlighting the relevant
sources of variance: variance between (a) time points, (b) groups, and (c) groups
and time points. As item sensitivity measures rooted in different perspectives
target different sources of variance, they do not necessarily provide consistent
results (Naumann et al., 2014).
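The three sources of variance can be made concrete with a small simulation. The R sketch below is purely illustrative: the data, sample sizes, and effect structure are invented, and class-level proportions correct stand in for the model-based item parameters discussed in the article.

```r
## Illustrative simulation of the three variance sources; all values are invented.
set.seed(1)
n_class <- 20; n_stud <- 20; n_item <- 10
class_effect <- rnorm(n_class, 0, 0.5)              # class-specific instruction effect
item_gain    <- rnorm(n_item, 1, 0.3)               # average change per item (logits)
change <- outer(class_effect, item_gain, "+") +     # class-by-item change in logits,
  matrix(rnorm(n_class * n_item, 0, 0.2), n_class)  # plus item-by-class interaction

p_pre  <- matrix(NA, n_class, n_item)               # class-level proportion correct, pre
p_post <- matrix(NA, n_class, n_item)               # and post
for (j in seq_len(n_class)) for (i in seq_len(n_item)) {
  theta <- rnorm(n_stud)                            # student abilities in class j
  p_pre[j, i]  <- mean(rbinom(n_stud, 1, plogis(theta - 0.5)))
  p_post[j, i] <- mean(rbinom(n_stud, 1, plogis(theta - 0.5 + change[j, i])))
}

delta <- p_post - p_pre                   # class-by-item change in p-values
change_over_time <- colMeans(delta)       # (a) variance source: time points (global)
between_groups   <- apply(p_post, 2, var) # (b) variance source: groups
change_by_group  <- apply(delta, 2, var)  # (c) groups and time points (differential)
```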
Yet the three perspectives are not sufficient for describing common charac-
teristics and distinctions of instructional sensitivity measures. Actually, instruc-
tional sensitivity measures referring to the same perspective may address two
essentially different hypotheses regarding item sensitivity: Some measures relate
to the hypothesis of whether an item is sensitive at all, that is, absolute sensitivity,
while others relate to the hypothesis of whether an item substantially deviates
from the test's overall sensitivity, that is, relative sensitivity.
This additional distinction has important theoretical and practical implications
for the evaluation of instructional sensitivity. For example, studies have shown
that the most commonly applied approaches, the Pretest–Posttest Difference
Index (PPDI; Cox & Vargas, 1966) and differential item functioning (DIF)-based
methods (e.g., Linn & Harnisch, 1981; Robitzsch, 2009), are inconsistent in their
judgment of item sensitivity (Li, Ruiz-Primo, & Wills, 2012; Naumann et al.,
2014). One reason for this finding lies in the difference of the perspective taken
on instructional sensitivity by these approaches (Naumann, Hochweber, &
Klieme, 2016): While the PPDI focuses on change in item difficulties across
time points, DIF approaches focus on differences in item difficulty between at
least two groups of students (e.g., treatment groups, courses, or classes) within a
sample. Yet another reason is that the approaches differ in the way they measure

instructional sensitivity: While the PPDI is an absolute sensitivity measure, DIF
approaches provide relative measures of item sensitivity.
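To make the contrast concrete, the sketch below computes both kinds of indices under assumed inputs: 'pre' and 'post' are hypothetical person-by-item 0/1 response matrices for the same students, and 'p' is a hypothetical class-by-item matrix of proportions correct. The PPDI line follows Cox and Vargas (1966); the relative index is only a crude interaction-variance analogue of DIF approaches, not the Linn and Harnisch (1981) statistic.

```r
## PPDI (Cox & Vargas, 1966): posttest minus pretest proportion correct, per item.
## 'pre' and 'post' are hypothetical person-by-item 0/1 response matrices.
ppdi <- colMeans(post) - colMeans(pre)    # absolute: judged against zero, not the test

## Crude DIF-style relative index: how far each item's class-level difficulty
## departs from an additive baseline of class performance plus item difficulty.
## 'p' is a hypothetical class-by-item matrix of proportions correct.
baseline  <- outer(rowMeans(p), colMeans(p), "+") - mean(p)
dif_index <- apply(p - baseline, 2, var)  # item-by-class interaction variance
```

An item flagged by the PPDI (a large pre-post gain) may still show a dif_index near zero if all items gained alike, which is one way the two approaches can disagree.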
Thus, in the present study, we aim to contribute to the measurement frame-
work of instructional sensitivity by introducing the distinction between absolute
and relative measures. Absolute and relative measures may be distinguished
within each of the three perspectives on instructional sensitivity and provide
unique and valuable information on item functioning in educational assessments
when inferences on schools, teachers, or teaching are to be drawn. In the follow-
ing, we will first elaborate on the distinction of absolute and relative measures.
We will point out how absolute and relative measures relate to test sensitivity and
current approaches to the instructional sensitivity of items. Second, we will
provide a model-based approach that allows testing the hypothesis of whether
items are absolutely and/or relatively sensitive within a more general item
response theory (IRT) framework. For illustration purposes, we apply our
approach to simulated and empirical item response data. Finally, we will discuss
implications for the measurement of instructional sensitivity, test development,
and test score interpretation.
Extending the Measurement Framework of Instructional Sensitivity
Figure 1 depicts an extended measurement framework. The extended mea-
surement framework comprises the three perspectives as well as the two sensitivity
facets—global and differential sensitivity—that can be distinguished
within the groups and time points perspective following Naumann and col-
leagues (2016). In addition, we draw the distinction between absolute and rela-
tive item sensitivity measures within each perspective, making explicit that two
different hypotheses regarding item sensitivity may be tested via absolute and
relative measures.
Absolute measures address the hypothesis of whether a single item is sensitive
to instruction. In principle, absolute measures summarize a single item's total
capacity of capturing potential effects of instruction in terms of variation in item
parameters across time, groups, or both. Hence, absolute measures are expected
to approach zero the less sensitive an item is and depart from zero the higher the
item's sensitivity to instruction is.
In contrast, relative measures address the hypothesis of whether a single
item's sensitivity substantially deviates from test sensitivity. Test sensitivity is
a concept that so far has been used only implicitly in the measurement of instruc-
tional sensitivity. Consistent with the predominant statistical notion of item
sensitivity (see Haladyna & Roid, 1981; Haladyna, 2004; Polikoff, 2010), test
sensitivity may be defined as the overall (i.e., unconditional) variation of
test scores across either time points, groups, or both (cf. Naumann et al.,
2016). Test sensitivity then is a prerequisite for what is commonly conceived
as the instructional sensitivity of a test, which typically refers to the proportion of

variance in test scores explained by school, teacher, or teaching characteristics
(e.g., D'Agostino, Welsh, & Corson, 2007; Grossman, Cohen, Ronfeldt, &
Brown, 2014; Ing, 2008). Generally, test sensitivity captures the degree of item
sensitivity that is common to all items within a test. Technically speaking, the
more strongly item sensitivity correlates across all test items, the higher the test
sensitivity. Accordingly, relative measures express the degree to which a single
item's sensitivity differs from test sensitivity. More precisely, relative measures
are expected to approach zero the more an item's sensitivity is consistent
with test sensitivity and to be nonzero if the item's sensitivity deviates from test
sensitivity.
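Under the simplifying assumption that item sensitivity is operationalized as class-level change in proportion correct (a hypothetical class-by-item matrix 'delta', as in the simulation sketch above), the unconditional/conditional distinction might be written as:

```r
## Absolute: total class-to-class variation in an item's change,
## unconditional on test sensitivity.
absolute <- apply(delta, 2, var)

## Relative: variation left after removing each class's average change across
## all items, i.e., conditional on test sensitivity.
test_change <- rowMeans(delta)            # class-level change common to the test
relative    <- apply(delta - test_change, 2, var)
```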
In general, whether a specific item sensitivity measure is absolute or relative
depends on whether or not the underlying measurement model comprises one or
more parameters capturing test sensitivity. Absolute measures of sensitivity are
unconditional on test sensitivity while relative measures are conditional on test
sensitivity. That is, from each of the three perspectives, measures are obtainable
in two ways, either independently of (i.e., absolute) or depending on (i.e., relative)
test sensitivity. As a result, there are eight different ways of measuring an
item's instructional sensitivity.

FIGURE 1. Extended measurement framework of instructional sensitivity comprising the
three perspectives, the two facets, and the eight absolute and relative sensitivity measures.
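As a rough stand-in for a model of this kind (a sketch only, not the authors' actual model or estimation procedure), one might fit a Rasch-type logistic mixed model with lme4, where 'd' is hypothetical long-format data with a 0/1 response 'resp', a numeric 'time' variable (0 = pretest, 1 = posttest), and factors 'item', 'person', and 'class':

```r
## Sketch of a Rasch-type GLMM approximation; 'd' and its columns are assumed.
library(lme4)
fit <- glmer(
  resp ~ 0 + item + item:time +    # item difficulties; item-specific average change
    (1 | person) +                 # person ability
    (0 + time | class) +           # class-level change common to all items
    (0 + time | item:class),       # item-specific change deviations across classes
  family = binomial, data = d
)
```

In this sketch, the fixed item:time coefficients play the role of global sensitivity, and the variance of the item:class time slopes that of differential sensitivity. Because the (0 + time | class) term absorbs change common to all items, the item:class slopes behave like relative measures; dropping that term lets them absorb the common change as well, so they behave like absolute measures, which mirrors the dependence on identification constraints described in the abstract.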
