
PSYCHOMETRIKA--VOL. 54, NO. 2, 237-247
JUNE 1989
A MAXIMIN MODEL FOR TEST DESIGN
WITH PRACTICAL CONSTRAINTS
WIM J. VAN DER LINDEN
ELLEN BOEKKOOI-TIMMINGA
UNIVERSITY OF TWENTE
A maximin model for IRT-based test design is proposed. In the model only the relative shape of
the target test information function is specified. It serves as a constraint subject to which a linear
programming algorithm maximizes the information in the test. In the practice of test construction,
several demands with respect to the properties of the test may exist. The paper shows how these can
be formulated as linear constraints in the model. A worked example of a test construction problem
with practical constraints is presented. The paper concludes with a discussion of some alternative
models of test construction.
Key words: item response theory, test construction, linear programming.
In item response theory (IRT), test design is usually based on the concepts of item
and test information functions. In this paper, as an example, information functions
under the three-parameter logistic model for dichotomous responses are considered.
The results, however, hold for any IRT model.
The three-parameter logistic model is as follows:

p_i(θ) = c_i + (1 − c_i){1 + exp[−a_i(θ − b_i)]}⁻¹,  (1)

where θ ∈ (−∞, +∞) is the ability measured by the test items, a_i ∈ [0, +∞) and
b_i ∈ (−∞, +∞) are parameters for the discriminating power and difficulty of item i, and
c_i ∈ [0, 1] is the probability of solving item i for θ → −∞. The model gives the
probability of a correct response as a function of the ability parameter θ. For known
item parameters, it holds that Fisher's information about the unknown θ in a single
response, U_i (u_i = 0, 1), to item i is equal to

I(U_i; θ) = [p_i′(θ)]² / {p_i(θ)[1 − p_i(θ)]}.  (2)
For a sample of locally independent responses, U_1, …, U_I, to the items i = 1, …, I,
Fisher's information is equal to

I(U_1, …, U_I; θ) = Σ_{i=1}^{I} [p_i′(θ)]² / {p_i(θ)[1 − p_i(θ)]}.  (3)
For item parameters estimated from response data with sufficient precision, Birnbaum
(1968) introduced (2) and (3) as the item and test information functions.
The authors are indebted to Jos J. Adema for suggesting Equation 17 as a simplification of an earlier
version of this constraint. This research was supported in part by a grant from the Dutch Organization for
Research (NWO) through the Foundation for Psychological and Psychonomic Research in the Netherlands
(Psychon).
Requests for reprints should be sent to W. J. van der Linden, University of Twente, Department of
Education, P.O. Box 217, 7500 AE Enschede, THE NETHERLANDS.
0033-3123/89/0600-9521 $00.75/0
© 1989 The Psychometric Society

The additivity in (3) suggests the following procedure for test construction: A
target information function for the test is specified. Items are selected to fill the area
under the target function. The procedure is stopped as soon as the sum of the item
information functions exceeds the target. In Birnbaum's (1968) and Lord's (1980)
description of the procedure it is assumed that the selection is done by hand. In general,
however, finding an optimal solution (e.g., a test of minimal length exceeding the target)
by hand is practically impossible. Even an approximate solution may involve several
cycles of back-tracking. Also, specifying a target information function is not an easy
task. Although a test constructor may be able to provide the desired shape of the curve
(e.g., a flat curve for a diagnostic test or a peaked one for decision making), the
necessity to decide on its exact height is likely to create a problem, the reason being that
the metric of the information measure has no meaning to the average test constructor.
Recently, a series of papers has been published in which the Birnbaum-Lord
procedure is replaced by an algorithm from zero-one programming (Rao, 1985; Wagner,
1975). The idea to apply zero-one programming to test construction was already
suggested in Yen (1983). Theunissen (1985) was the first to present a zero-one programming
model for test construction with a target information function. In the model, the number
of items in the test is the objective function to be minimized subject to the (linear)
conditions that, at a number of 0-values, the information in the test is above the target.
The same idea has been explored in Boekkooi-Timminga (1987), Boekkooi-Timminga
and van der Linden (1987), Theunissen (1986), Theunissen and Verstralen (1986), and
van der Linden and Boekkooi-Timminga (1988). In all these papers, it is still assumed
that the test constructor is able to specify the exact height of the target information
function at a number of points. For a procedure enabling the test constructor to do so,
see Kelderman (1987).
It is the purpose of this paper to present a maximin model for test construction. In
this model only the relative shape of the target test information function has to be
provided. A simple experiment to elicit this shape from a test constructor is described
in the next section. The data from the experiment are then used to specify a linear
constraint in a model that maximizes the information in the test. Since the objective
function in the model does not contain any item or test parameter, all properties of the
test can be controlled by including additional constraints in the model. In the practice
of test construction, several demands with respect to the properties of the test may
exist, for example, with respect to the composition of the test, the administration time,
the curricular fit, and possible links between the contents of the items. It is another
purpose of this paper to show how such demands can be formulated as linear
constraints in the decision variables. A worked example of the model including several of
these constraints is given. The paper concludes with a discussion of some alternative
zero-one programming models of test construction.
A Maximin Model
Instead of considering target information functions over the whole range of
θ-values, zero-one programming models only assume target values at certain points. One
reason for this is that item information functions are continuous, well-behaved
functions for which the value of the sum at a certain point does not differ drastically from
those at neighboring points. Another, more practical motivation is that interest often
exists only at certain critical ability levels, ignoring the properties of the test at other
levels, for example, when the test is to be used for decision making. Hence, in this
paper it is also assumed that a discrete approach is appropriate, provided the number
of points and their positions are free.

The following experiment is proposed to elicit the relative shape of the target
information function from the test constructor. First, the test constructor is faced with
the ability scale underlying the item bank. This can be done by offering him or her a line
displaying the contents of items with locations at some well-chosen points. The same
practice is used in scale-score reporting of assessment data (e.g., Pandey, 1986). Then,
the constructor is asked to select a number of scale points he or she wants to consider.
The number of points and their spacing are free. Let θ_k, k = 1, …, K, denote these
points. Next, he or she is given a fixed number of chips (100, say) and requested to
distribute them over the scale points such that they reflect the relative distribution of
information wanted from the test. The final step then is to ask the test constructor for
the desired number of items in the test. The answer to this question can be facilitated
by providing some statistics about the time typically needed by the group of examinees
to complete items in the bank.
The Model
Now the idea is to select the items such that they maximize the information in the
test, while the resulting test information function still has the desired shape. Let r_k be
the number of chips the test constructor puts at point θ_k (k = 1, …, K). The relative
target information function is characterized by a series of lower bounds
(r_1 y, …, r_K y)
in which y is a dummy variable to be maximized subject to the constraint that test length
is equal to the value n specified by the test constructor. Finally, x_i (i = 1, …, I) is the
decision variable as to whether (x_i = 1) or not (x_i = 0) to include item i in the test. This
leads to the following model:
maximize y  (4)

subject to

Σ_{i=1}^{I} I_i(θ_k) x_i − r_k y ≥ 0,  k = 1, …, K,  (5)

Σ_{i=1}^{I} x_i = n,  (6)

x_i ∈ {0, 1},  i = 1, …, I,  (7)

y ≥ 0.  (8)

The constraints in (5) set a series of lower bounds, r_k y, to the test information
I(θ_k) = Σ_{i=1}^{I} I_i(θ_k) x_i at each of the points θ_k. The common factor y in these bounds is maximized
in (4). The constraint in (6) sets the test length equal to n.
If the left-hand side of the restrictions in (5) were divided by r_k, the model would
have new coefficients I_i(θ_k) r_k⁻¹ for the decision variables x_i and a coefficient equal to one
for variable y. In this representation it is clear that y can be considered a lower bound
to the weighted sums of decision variables Σ_{i=1}^{I} I_i(θ_k) r_k⁻¹ x_i, and that the values of x_i are
selected such that this lower bound is maximal. Hence, mathematically the model is of
the maximin type. To solve the model for the values of x_i, i = 1, …, I, and y, a
branch-and-bound algorithm from integer programming can be used (e.g., Wagner,
chap. 13). Such algorithms are readily available in computer code nowadays.
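For a very small item bank, the structure of the maximin model (4)-(8) can be seen by solving it with exhaustive enumeration instead of branch-and-bound: for a fixed selection x, the largest feasible y is min_k Σ_i I_i(θ_k) x_i / r_k, so one simply keeps the n-item subset maximizing that bound. A Python sketch (the four-item bank and target shape below are hypothetical):

```python
from itertools import combinations

def solve_maximin(info, r, n):
    """Exhaustively solve the maximin model (4)-(8) for a small item bank.

    info[i][k] holds the item information I_i(theta_k); r[k] are the chip
    counts; n is the required test length. For fixed x the best y is
    min_k sum_i info[i][k] * x_i / r[k], so we enumerate all n-item subsets.
    (For realistic bank sizes a branch-and-bound 0-1 solver is used instead.)
    """
    I, K = len(info), len(r)
    best_y, best_set = -1.0, None
    for subset in combinations(range(I), n):
        y = min(sum(info[i][k] for i in subset) / r[k] for k in range(K))
        if y > best_y:
            best_y, best_set = y, subset
    return best_set, best_y

# Hypothetical bank of 4 items evaluated at K = 2 ability points,
# with a flat relative target r = (1, 1) and test length n = 2:
bank = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5], [0.2, 0.2]]
items, y = solve_maximin(bank, [1.0, 1.0], n=2)
```

Here the pair of items peaking at opposite ability points wins, because any other pair leaves one of the two target points underinformed relative to the flat shape.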

Some Practical Constraints
For algorithmic test design to be practical, it is necessary to provide control of
features of the test other than just the information function and the number of items. It
should be noted that the objective function in (4) is a dummy variable introduced to cast
the maximin criterion into a linear model. It does not contain any item or test
parameters, and therefore does not explicitly control the values of these parameters. For this
purpose, however, additional constraints can be included in the model. In this section
a review of constraints to be met in the practice of test construction is given, and it is
shown how these can be modeled into a linear form. Throughout this section it is
assumed that (4) through (8) is the basic model.
Test Composition
As already noted, for a sufficiently large bank of test items, the constraint in (6)
controls the length of the test. The same principle can be applied at the level of possible
subtests providing the test constructor with the ability to control the composition of the
test. Let V_j (j = 1, …, J) be a subset of items in the bank from which the test
constructor wants n_j ≤ n items in the test. This is attained if the following equality is added
to the model:

Σ_{i∈V_j} x_i = n_j,  j = 1, …, J.  (9)
It is important to note that using a series of such constraints provides the
opportunity for controlling the composition of the test simultaneously with respect to several
dimensions. For example, an item bank for English could be partitioned not only with
respect to its content (e.g., vocabulary, grammar, or reading comprehension), but also
to a behavioral dimension (e.g., knowledge of facts, application of rules, or evaluation)
or the format of its items (e.g., multiple choice, completion, or matching). For each set
in these partitions the constraint in (9) is incorporated within the model, with the
restriction that the nj's are specified such that the sum over all sets in the same partition
is equal to n. If this option is used, the constraint in (6) is redundant and may be
dropped. Finally, observe how this example shows that (9) can be used with respect to
both disjoint and non-disjoint subsets of items.
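As a sketch of how the composition constraints in (9) restrict the feasible set, the following Python fragment enumerates all three-item tests from a hypothetical six-item bank partitioned into two content sets (the partition and the n_j values are illustrative only):

```python
from itertools import combinations

def satisfies_composition(subset, partitions):
    """Check the composition constraints (9): each item subset V_j must
    contribute exactly n_j items. `partitions` is a list of (V_j, n_j) pairs."""
    return all(len(set(subset) & V_j) == n_j for V_j, n_j in partitions)

# Hypothetical bank of 6 items: items 0-2 "vocabulary", items 3-5 "grammar";
# the test constructor wants n_1 = 2 vocabulary and n_2 = 1 grammar item,
# which makes the total-length constraint (6) with n = 3 redundant.
partitions = [({0, 1, 2}, 2), ({3, 4, 5}, 1)]
feasible = [s for s in combinations(range(6), 3)
            if satisfies_composition(s, partitions)]
```

Of the 20 possible three-item subsets, only the 3 × 3 = 9 combinations of two vocabulary items with one grammar item survive the constraints, illustrating how (9) shapes the search space before any information is maximized.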
Administration Time
In a computerized testing environment, the time needed to solve the items in the
bank by the population of examinees of interest can easily be monitored. Let t_i be, for
example, the 95th percentile of the distribution of time for item i in the population.
Instead of fixing the length of the test, the selection of the items could also be based on
the time limit, T, in force for the examinees. In that case (6) can be replaced by
Σ_{i=1}^{I} t_i x_i ≤ T.  (10)
However, if there is a reason to restrict the number of items in the test as well, (10) can
also be used in combination with (6) replacing the equality in the latter by an inequality.
Analogous to (9), the composition of the test can be controlled by introducing time
limits at subtest level.
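A minimal sketch of the time constraint (10): rather than fixing the number of items, it admits every item set whose total time stays within the limit T. The per-item times below are hypothetical:

```python
from itertools import chain, combinations

def feasible_tests(times, T):
    """Enumerate all item sets satisfying the time constraint (10):
    the sum of the t_i over selected items must not exceed T.
    times[i] is, e.g., the 95th percentile of item i's response-time
    distribution in the population of interest."""
    I = range(len(times))
    all_subsets = chain.from_iterable(
        combinations(I, n) for n in range(len(times) + 1))
    return [s for s in all_subsets if sum(times[i] for i in s) <= T]

# Hypothetical per-item times in minutes and a 10-minute limit:
sets_within_limit = feasible_tests([4, 5, 6], 10)
```

Combining (10) with an inequality version of (6), as the text suggests, would simply add a second filter on the subset size.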

Selection on Item Features
Including the constraints below in the model, it is possible to give all items in the
test the same feature.
Let c_i be a positively valued numerical parameter representing a feature of the
items in the bank. Then it is possible to restrict the selection of the items to those with
c_i ∈ [c_l, c_u] by including the following set of inequalities in the model:

c_i x_i ≤ c_u,  i = 1, …, I,  (11)

c_i⁻¹ x_i ≤ c_l⁻¹,  i = 1, …, I,  (12)

where c_u > c_l.
Unlike (9), these constraints do not fix the length of subtests. They are used to give
all items in the test the same properties. At the same time, (9) can be used to compose
the test with different item properties.
If the frequency of administration of the items in the bank is monitored, the
constraints in (11) through (12) can be used to restrict the selection of the items to certain
frequencies. For example, if the intention is to obtain uniform usage of items in the
bank, (11) can be used to set an upper bound for item use thus restricting the selection
of items to those with lower usage.
Another example of the use of (11) and (12) is to restrict the administration time,
t_i, for each individual item in the test to certain limits.
It is also possible to substitute one of the parameters in the item response model for
c_i. In this way, the constraints can be used, for example, to select items with values for
the difficulty parameters in a certain interval. For the Rasch (1960) model, this allows
for the selection of items based on their probabilities of success: Let θ_0 be the a priori
known average ability of the group of examinees, and let [p_l, p_u] be the interval to which
the probabilities of success for the "average" examinees are restricted. It follows that
the items must have values of the difficulty parameter, b_i, in the interval [b_l, b_u]
determined by p(θ_0; b_l) = p_u and p(θ_0; b_u) = p_l, where p(·) is the logistic function
specified in the Rasch model. Selecting items based on their probabilities of success for
given examinees may be desirable for instructional reasons.
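The difficulty bounds follow by inverting the Rasch response function: since p(θ_0; b) = {1 + exp[−(θ_0 − b)]}⁻¹, solving for b gives b = θ_0 − logit(p). A small Python sketch (the value of θ_0 and the probability interval are illustrative):

```python
import math

def rasch_difficulty_bounds(theta0, p_l, p_u):
    """Invert the Rasch model p(theta0; b) = 1 / (1 + exp(-(theta0 - b)))
    to get the difficulty interval [b_l, b_u] corresponding to success
    probabilities in [p_l, p_u] for an examinee of ability theta0.
    Higher success probability means lower difficulty, so b_l comes from
    p_u and b_u from p_l, matching p(theta0; b_l) = p_u, p(theta0; b_u) = p_l."""
    logit = lambda p: math.log(p / (1.0 - p))
    return theta0 - logit(p_u), theta0 - logit(p_l)
```

The returned interval can then be fed into constraints (11) and (12) with c_i taken as the item difficulty b_i (shifted to be positive if necessary).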
Constraints like (11) and (12) need not enter the optimization phase of the
procedure. They imply that certain items, and hence their decision variables, are excluded
from the model. Normally, a reduction phase precedes the actual optimization in which
such constraints are used to give the model its most economical form.
Group-Dependent Item Parameters
If the item bank has to serve distinct groups of examinees, items may have different
properties for different groups. In such cases it is obvious to consider the parameter c_i
in (11) and (12) as group dependent. In school settings, for instance, the recording of the
date of the final administration of item i to group g = 1, …, G may be useful. The
constraint in (12), with c_gi instead of c_i, then allows the selection of items for one group
that have not been used after a given date for other groups. Such strategies may be
instrumental in solving the problem of test security.
If c_gi is allowed to take only the values zero and one, it can be used to adapt tests
to curriculum differences between groups. Let c_gi indicate whether (c_gi = 1) or not
(c_gi = 0) item i covers a part of the curriculum of group g. Then the following constraint
automatically suppresses the administration of items to group g on topics for which
instruction is absent:

References

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores.
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Rao, S. S. (1985). Optimization: Theory and applications.