
PSYCHOMETRIKA--VOL. 54, NO. 2, 237-247
JUNE 1989
A MAXIMIN MODEL FOR TEST DESIGN
WITH PRACTICAL CONSTRAINTS
WIM J. VAN DER LINDEN
ELLEN BOEKKOOI-TIMMINGA
UNIVERSITY OF TWENTE
A maximin model for IRT-based test design is proposed. In the model only the relative shape of
the target test information function is specified. It serves as a constraint subject to which a linear
programming algorithm maximizes the information in the test. In the practice of test construction,
several demands with respect to the properties of the test may exist. The paper shows how these can
be formulated as linear constraints in the model. A worked example of a test construction problem
with practical constraints is presented. The paper concludes with a discussion of some alternative
models of test construction.
Key words: item response theory, test construction, linear programming.
In item response theory (IRT), test design is usually based on the concepts of item
and test information functions. In this paper, as an example, information functions
under the three-parameter logistic model for dichotomous responses are considered.
The results, however, hold for any IRT model.
The three-parameter logistic model is as follows:

p_i(θ) = c_i + (1 − c_i){1 + exp[−a_i(θ − b_i)]}⁻¹,  (1)

where θ ∈ (−∞, +∞) is the ability measured by the test items, a_i ∈ [0, +∞) and
b_i ∈ (−∞, +∞) are parameters for the discriminating power and difficulty of item i, and
c_i ∈ [0, 1] is the probability of solving item i for θ → −∞. The model gives the
probability of a correct response as a function of the ability parameter θ. For known
item parameters, it holds that Fisher's information about the unknown θ in a single
response, U_i (u_i = 0, 1), to item i is equal to

I(U_i; θ) = [p_i′(θ)]² / {p_i(θ)[1 − p_i(θ)]}.  (2)
For a sample of locally independent responses, U_1, …, U_I, to the items i = 1, …, I,
Fisher's information is equal to

I(U_1, …, U_I; θ) = Σ_{i=1}^{I} [p_i′(θ)]² / {p_i(θ)[1 − p_i(θ)]}.  (3)
For item parameters estimated from response data with sufficient precision, Birnbaum
(1968) introduced (2) and (3) as the item and test information functions.
The authors are indebted to Jos J. Adema for suggesting Equation 17 as a simplification of an earlier
version of this constraint. This research was supported in part by a grant from the Dutch Organization for
Research (NWO) through the Foundation for Psychological and Psychonomic Research in the Netherlands
(Psychon).
Requests for reprints should be sent to W. J. van der Linden, University of Twente, Department of
Education, P.O. Box 217, 7500 AE Enschede, THE NETHERLANDS.
0033-3123/89/0600-9521 $00.75/0
© 1989 The Psychometric Society

The additivity in (3) suggests the following procedure for test construction: A
target information function for the test is specified. Items are selected to fill the area
under the target function. The procedure is stopped as soon as the sum of the item
information functions exceeds the target. In Birnbaum's (1968) and Lord's (1980)
description of the procedure it is assumed that the selection is done by hand. In general,
however, finding an optimal solution (e.g., a test of minimal length exceeding the target)
by hand is practically impossible. Even an approximate solution may involve several
cycles of back-tracking. Also, specifying a target information function is not an easy
task. Although a test constructor may be able to provide the desired shape of the curve
(e.g., a flat curve for a diagnostic test or a peaked one for decision making), the
necessity to decide on its exact height is likely to create a problem, the reason being that
the metric of the information measure has no meaning to the average test constructor.
Recently, a series of papers has been published in which the Birnbaum-Lord
procedure is replaced by an algorithm from zero-one programming (Rao, 1985; Wagner,
1975). The idea to apply zero-one programming to test construction was already
suggested in Yen (1983). Theunissen (1985) was the first to present a zero-one programming
model for test construction with a target information function. In the model, the number
of items in the test is the objective function to be minimized subject to the (linear)
conditions that, at a number of 0-values, the information in the test is above the target.
The same idea has been explored in Boekkooi-Timminga (1987), Boekkooi-Timminga
and van der Linden (1987), Theunissen (1986), Theunissen and Verstralen (1986), and
van der Linden and Boekkooi-Timminga (1988). In all these papers, it is still assumed
that the test constructor is able to specify the exact height of the target information
function at a number of points. For a procedure enabling the test constructor to do so,
see Kelderman (1987).
It is the purpose of this paper to present a maximin model for test construction. In
this model only the relative shape of the target test information function has to be
provided. A simple experiment to elicit this shape from a test constructor is described
in the next section. The data from the experiment are then used to specify a linear
constraint in a model that maximizes the information in the test. Since the objective
function in the model does not contain any item or test parameter, all properties of the
test can be controlled by including additional constraints in the model. In the practice
of test construction, several demands with respect to the properties of the test may
exist, for example, with respect to the composition of the test, the administration time,
the curricular fit, and possible links between the contents of the items. It is another
purpose of this paper to show how such demands can be formulated as linear
constraints in the decision variables. A worked example of the model including several of
these constraints is given. The paper concludes with a discussion of some alternative
zero-one programming models of test construction.
A Maximin Model
Instead of considering target information functions over the whole range of
θ-values, zero-one programming models only assume target values at certain points. One
reason for this is that item information functions are continuous, well-behaved
functions for which the value of the sum at a certain point does not differ drastically from
those at neighboring points. Another, more practical motivation is that interest often
exists only at certain critical ability levels, ignoring the properties of the test at other
levels, for example, when the test is to be used for decision making. Hence, in this
paper it is also assumed that a discrete approach is appropriate, provided the number
of points and their positions are free.

The following experiment is proposed to elicit the relative shape of the target
information function from the test constructor. First, the test constructor is faced with
the ability scale underlying the item bank. This can be done by offering him or her a line
displaying the contents of items with locations at some well-chosen points. The same
practice is used in scale-score reporting of assessment data (e.g., Pandey, 1986). Then,
the constructor is asked to select a number of scale points he or she wants to consider.
The number of points and their spacing are free. Let θ_k, k = 1, …, K, denote these
points. Next, he or she is given a fixed number of chips (100, say) and requested to
distribute them over the scale points such that they reflect the relative distribution of
information wanted from the test. The final step then is to ask the test constructor for
the desired number of items in the test. The answer to this question can be facilitated
by providing some statistics about the time typically needed by the group of examinees
to complete items in the bank.
The Model
Now the idea is to select the items such that they maximize the information in the
test, while the resulting test information function still has the desired shape. Let r_k be
the number of chips the test constructor puts at point θ_k (k = 1, …, K). The relative
target information function is characterized by a series of lower bounds
(r_1 y, …, r_K y)
in which y is a dummy variable to be maximized subject to the constraint that test length
is equal to the value n specified by the test constructor. Finally, x_i (i = 1, …, I) is the
decision variable as to whether (x_i = 1) or not (x_i = 0) to include item i in the test. This
leads to the following model:
maximize y  (4)

subject to

Σ_{i=1}^{I} I_i(θ_k) x_i − r_k y ≥ 0,  k = 1, …, K,  (5)

Σ_{i=1}^{I} x_i = n,  (6)

x_i ∈ {0, 1},  i = 1, …, I,  (7)

y ≥ 0.  (8)

The constraints in (5) set a series of lower bounds, r_k y, to the test information
I(θ_k) = Σ_{i=1}^{I} I_i(θ_k) x_i at each of the points θ_k. The common factor y in these bounds is maximized
in (4). The constraint in (6) sets the test length equal to n.
If the left-hand side of the restrictions in (5) were divided by r_k, the model would
have new coefficients I_i(θ_k) r_k⁻¹ for the decision variables x_i and a coefficient equal to one
for variable y. In this representation it is clear that y can be considered a lower bound
to the weighted sums of decision variables Σ_{i=1}^{I} I_i(θ_k) r_k⁻¹ x_i, and that the values of x_i are
selected such that this lower bound is maximal. Hence, mathematically the model is of
the maximin type. To solve the model for the values of x_i, i = 1, …, I, and y, a
branch-and-bound algorithm from integer programming can be used (e.g., Wagner,
chap. 13). Such algorithms are readily available in computer code nowadays.
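For a very small item bank, the structure of the maximin model (4)-(8) can be seen by solving it with exhaustive enumeration instead of branch-and-bound: for a fixed selection x, the largest feasible y is min_k Σ_i I_i(θ_k) x_i / r_k, so one simply keeps the n-item subset maximizing that bound. A Python sketch (the four-item bank and target shape below are hypothetical):

```python
from itertools import combinations

def solve_maximin(info, r, n):
    """Exhaustively solve the maximin model (4)-(8) for a small item bank.

    info[i][k] holds the item information I_i(theta_k); r[k] are the chip
    counts; n is the required test length. For fixed x the best y is
    min_k sum_i info[i][k] * x_i / r[k], so we enumerate all n-item subsets.
    (For realistic bank sizes a branch-and-bound 0-1 solver is used instead.)
    """
    I, K = len(info), len(r)
    best_y, best_set = -1.0, None
    for subset in combinations(range(I), n):
        y = min(sum(info[i][k] for i in subset) / r[k] for k in range(K))
        if y > best_y:
            best_y, best_set = y, subset
    return best_set, best_y

# Hypothetical bank of 4 items evaluated at K = 2 ability points,
# with a flat relative target r = (1, 1) and test length n = 2:
bank = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.2, 0.2]]
items, y = solve_maximin(bank, [1.0, 1.0], n=2)
```

Here the pair of items peaking at opposite ability points wins, because any other pair leaves one of the two target points underinformed relative to the flat shape.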

Some Practical Constraints
For algorithmic test design to be practical, it is necessary to provide control of
features of the test other than just the information function and the number of items. It
should be noted that the objective function in (4) is a dummy variable introduced to cast
the maximin criterion into a linear model. It does not contain any item or test
parameters, and therefore does not explicitly control the values of these parameters. For this
purpose, however, additional constraints can be included in the model. In this section
a review of constraints to be met in the practice of test construction is given, and it is
shown how these can be modeled into a linear form. Throughout this section it is
assumed that (4) through (8) is the basic model.
Test Composition
As already noted, for a sufficiently large bank of test items, the constraint in (6)
controls the length of the test. The same principle can be applied at the level of possible
subtests providing the test constructor with the ability to control the composition of the
test. Let V_j (j = 1, …, J) be a subset of items in the bank from which the test
constructor wants n_j ≤ n items in the test. This is attained if the following equality is added
to the model:

Σ_{i∈V_j} x_i = n_j,  j = 1, …, J.  (9)
It is important to note that using a series of such constraints provides the
opportunity for controlling the composition of the test simultaneously with respect to several
dimensions. For example, an item bank for English could be partitioned not only with
respect to its content (e.g., vocabulary, grammar, or reading comprehension), but also
to a behavioral dimension (e.g., knowledge of facts, application of rules, or evaluation)
or the format of its items (e.g., multiple choice, completion, or matching). For each set
in these partitions the constraint in (9) is incorporated within the model, with the
restriction that the nj's are specified such that the sum over all sets in the same partition
is equal to n. If this option is used, the constraint in (6) is redundant and may be
dropped. Finally, observe how this example shows that (9) can be used with respect to
both disjoint and non-disjoint subsets of items.
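As a sketch of how the composition constraints in (9) restrict the feasible set, the following Python fragment enumerates all three-item tests from a hypothetical six-item bank partitioned into two content sets (the partition and the n_j values are illustrative only):

```python
from itertools import combinations

def satisfies_composition(subset, partitions):
    """Check the composition constraints (9): each item subset V_j must
    contribute exactly n_j items. `partitions` is a list of (V_j, n_j) pairs."""
    return all(len(set(subset) & V_j) == n_j for V_j, n_j in partitions)

# Hypothetical bank of 6 items: items 0-2 "vocabulary", items 3-5 "grammar";
# the test constructor wants n_1 = 2 vocabulary and n_2 = 1 grammar item,
# which makes the total-length constraint (6) with n = 3 redundant.
partitions = [({0, 1, 2}, 2), ({3, 4, 5}, 1)]
feasible = [s for s in combinations(range(6), 3)
            if satisfies_composition(s, partitions)]
```

Of the 20 possible three-item subsets, only the 3 × 3 = 9 combinations of two vocabulary items with one grammar item survive the constraints, illustrating how (9) shapes the search space before any information is maximized.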
Administration Time
In a computerized testing environment, the time needed to solve the items in the
bank by the population of examinees of interest can easily be monitored. Let t_i be, for
example, the 95th percentile of the distribution of time for item i in the population.
Instead of fixing the length of the test, the selection of the items could also be based on
the time limit, T, in force for the examinees. In that case (6) can be replaced by
Σ_{i=1}^{I} t_i x_i ≤ T.  (10)
However, if there is a reason to restrict the number of items in the test as well, (10) can
also be used in combination with (6) replacing the equality in the latter by an inequality.
Analogous to (9), the composition of the test can be controlled by introducing time
limits at subtest level.
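A minimal sketch of the time constraint (10): rather than fixing the number of items, it admits every item set whose total time stays within the limit T. The per-item times below are hypothetical:

```python
from itertools import chain, combinations

def feasible_tests(times, T):
    """Enumerate all item sets satisfying the time constraint (10):
    the sum of the t_i over selected items must not exceed T.
    times[i] is, e.g., the 95th percentile of item i's response-time
    distribution in the population of interest."""
    I = range(len(times))
    all_subsets = chain.from_iterable(
        combinations(I, n) for n in range(len(times) + 1))
    return [s for s in all_subsets if sum(times[i] for i in s) <= T]

# Hypothetical per-item times in minutes and a 10-minute limit:
sets_within_limit = feasible_tests([4, 5, 6], 10)
```

Combining (10) with an inequality version of (6), as the text suggests, would simply add a second filter on the subset size.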

Selection on Item Features
Including the constraints below in the model, it is possible to give all items in the
test the same feature.
Let c_i be a positively valued numerical parameter representing a feature of the
items in the bank. Then it is possible to restrict the selection of the items to those with
c_i ∈ [c_l, c_u] by including the following set of inequalities in the model:

c_i x_i ≤ c_u,  i = 1, …, I,  (11)

c_i⁻¹ x_i ≤ c_l⁻¹,  i = 1, …, I,  (12)

where c_u > c_l.
Unlike (9), these constraints do not fix the length of subtests. They are used to give
all items in the test the same properties. At the same time, (9) can be used to compose
the test with different item properties.
If the frequency of administration of the items in the bank is monitored, the
constraints in (11) through (12) can be used to restrict the selection of the items to certain
frequencies. For example, if the intention is to obtain uniform usage of items in the
bank, (11) can be used to set an upper bound for item use thus restricting the selection
of items to those with lower usage.
Another example of the use of (11) and (12) is to restrict the administration time,
t_i, for each individual item in the test to certain limits.
It is also possible to substitute one of the parameters in the item response model for
c_i. In this way, the constraints can be used, for example, to select items with values for
the difficulty parameters in a certain interval. For the Rasch (1960) model, this allows
for the selection of items based on their probabilities of success: Let θ_0 be the a priori
known average ability of the group of examinees, and let [p_l, p_u] be the interval to which
the probabilities of success for the "average" examinees are restricted. It follows that
the items must have values of the difficulty parameter, b_i, in the interval [b_l, b_u]
determined by p(θ_0; b_l) = p_u and p(θ_0; b_u) = p_l, where p(·) is the logistic function
specified in the Rasch model. Selecting items based on their probabilities of success for
given examinees may be desirable for instructional reasons.
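The difficulty bounds follow by inverting the Rasch response function: since p(θ_0; b) = {1 + exp[−(θ_0 − b)]}⁻¹, solving for b gives b = θ_0 − logit(p). A small Python sketch (the value of θ_0 and the probability interval are illustrative):

```python
import math

def rasch_difficulty_bounds(theta0, p_l, p_u):
    """Invert the Rasch model p(theta0; b) = 1 / (1 + exp(-(theta0 - b)))
    to get the difficulty interval [b_l, b_u] corresponding to success
    probabilities in [p_l, p_u] for an examinee of ability theta0.
    Higher success probability means lower difficulty, so b_l comes from
    p_u and b_u from p_l, matching p(theta0; b_l) = p_u, p(theta0; b_u) = p_l."""
    logit = lambda p: math.log(p / (1.0 - p))
    return theta0 - logit(p_u), theta0 - logit(p_l)
```

The returned interval can then be fed into constraints (11) and (12) with c_i taken as the item difficulty b_i (shifted to be positive if necessary).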
Constraints like (11) and (12) need not enter the optimization phase of the
procedure. They imply that certain items, and hence their decision variables, are excluded
from the model. Normally, a reduction phase precedes the actual optimization in which
such constraints are used to give the model its most economical form.
Group-Dependent Item Parameters
If the item bank has to serve distinct groups of examinees, items may have different
properties for different groups. In such cases it is obvious to consider the parameter c_i
in (11) and (12) as group dependent. In school settings, for instance, the recording of the
date of the final administration of item i to group g = 1, …, G may be useful. The
constraint in (12), with c_gi instead of c_i, then allows the selection of items for one group
that have not been used after a given date for other groups. Such strategies may be
instrumental in solving the problem of test security.
If c_gi is allowed to take only the values zero and one, it can be used to adapt tests
to curriculum differences between groups. Let c_gi indicate whether (c_gi = 1) or not
(c_gi = 0) item i covers a part of the curriculum of group g. Then the following constraint
automatically suppresses the administration of items to group g on topics for which
instruction is absent:

References

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores.
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Rao, S. S. (1985). Optimization: Theory and applications.