A Nonparametric Multidimensional Latent Class IRT Model in a Bayesian Framework.

Abstract
We propose a nonparametric item response theory model for dichotomously-scored items in a Bayesian framework. The model is based on a latent class (LC) formulation, and it is multidimensional, with dimensions corresponding to a partition of the items in homogenous groups that are specified on the basis of inequality constraints among the conditional success probabilities given the latent class. Moreover, an innovative system of prior distributions is proposed following the encompassing approach, in which the largest model is the unconstrained LC model. A reversible-jump type algorithm is described for sampling from the joint posterior distribution of the model parameters of the encompassing model. By suitably post-processing its output, we then make inference on the number of dimensions (i.e., number of groups of items measuring the same latent trait) and we cluster items according to the dimensions when unidimensionality is violated. The approach is illustrated by two examples on simulated data and two applications based on educational and quality-of-life data.


A nonparametric multidimensional latent class
IRT model in a Bayesian framework
Francesco Bartolucci, Alessio Farcomeni and Luisa Scaccia
Abstract We propose a nonparametric Item Response Theory model for dichotomously scored items in a Bayesian framework. Partitions of the items are defined on the basis of inequality constraints among the latent class success probabilities. A Reversible Jump type algorithm is described for sampling from the posterior distribution. A consequence is the possibility to make inference on the number of dimensions (i.e., the number of groups of items measuring the same latent trait) and to cluster items when unidimensionality is violated.
Key words: Item response theory, unidimensionality, stochastic partition.
1 Introduction
Educational and psychological tests are often based on a set of items which measure
a unidimensional latent trait, that is, a single personal aspect which is not directly
observable (e.g., ability in a certain subject, tendency toward a certain behavior).
When the test is unidimensional, the responses to the items may be validly sum-
marized by a single indicator (e.g., the sum of the correct responses at individual
level) and respondents may be globally ranked according to such an indicator and
the distance between any two respondents in terms of the single latent trait may be
Francesco Bartolucci
Dipartimento di Economia, Finanza e Statistica, Università di Perugia, Via A. Pascoli 20, 06123 Perugia, Italy, e-mail: bart@stat.unipg.it
Alessio Farcomeni
Dipartimento di Sanità Pubblica e Malattie Infettive, Sapienza - Università di Roma, Piazzale Aldo Moro, 5, 00186 Roma, Italy, e-mail: alessio.farcomeni@uniroma1.it
Luisa Scaccia
Dipartimento di Economia e Diritto, Università di Macerata, Via Crescimbeni 20, 62100 Macerata, Italy, e-mail: scaccia@unimc.it

simply measured. An important consequent issue is how to test the unidimensionality assumption and, in case it is violated, how to group items in a sensible way
so that items in the same group measure the same latent trait. Bartolucci (2007)
introduced a multidimensional parametric Item Response Theory (IRT) model for
dichotomously-scored items, which is based on the assumption that respondents are
grouped into k latent classes of ability, and found the number of dimensions, s, and
clusters of items through a hierarchical agglomerative clustering algorithm based
on the model likelihood. However, this approach is based on certain parametric assumptions which may affect the selected number of dimensions.
In this work, we propose to select s relying on a completely nonparametric model
formulated along the lines of Forcina and Bartolucci (2004). This formulation is
based on a set of inequalities on the conditional probabilities of success in each
item given the level of the ability. The distribution of the ability is still assumed to
be discrete, therefore having k latent classes. Consequently, two items measure the
same dimension if their success probabilities have the same ordering with respect to
the latent classes. Any specific model depends on the number of latent classes and
the set of inequalities on success probabilities, which, in turn, determines a certain
partition of the items into s groups. Inference on the proposed nonparametric IRT models is based on the Bayesian paradigm, allowing us to work with unknown k and s. Relying on the encompassing approach of Klugkist et al (2005), we formulate the priors on the parameters of a model that includes any other model of interest; see also Bartolucci et al (2012). Such an encompassing model is the latent class model
(Lazarsfeld and Henry, 1968) with k classes. This automatically defines the priors
on any nested model. For estimation purposes, we use the Reversible Jump (RJ)
algorithm (Green, 1995; Green and Richardson, 2001) applied to the latent class
model. The output is then suitably post-processed to estimate the posterior probability of any nonparametric IRT model. An alternative algorithm, expected to be
more efficient, is also outlined.
The paper is organized as follows. Section 2 formalizes the nonparametric
IRT model and deals with Bayesian estimation. Section 3 illustrates the approach
through an application on the Mathematics test data used in Bartolucci (2007).
2 Model Formulation and Bayesian Inference
Let $Y_{ij}$, $i = 1,\dots,n$, $j = 1,\dots,r$, denote the binary outcome measured on the $i$-th subject for the $j$-th item. We assume that the sample of respondents is drawn from a population divided into $k$ latent classes, with individuals in the same class sharing the same ability level. Thus the ability is represented by a discrete latent variable $C$ having $k$ support points denoted, without loss of generality, by $1,\dots,k$. Let $\pi_1,\dots,\pi_k$ be the class weights and $\lambda_{cj} = p(Y_{ij} = 1 \mid C = c)$ denote the probability of success at the $j$-th item for any subject $i$ in class $c$. Given two items, $j_1$ and $j_2$ say, these are said to measure the same dimension if there exists a permutation of $1,\dots,k$, denoted by $c_1,\dots,c_k$, such that

$$\lambda_{c_1 j} \le \cdots \le \lambda_{c_k j}, \qquad j = j_1, j_2. \qquad (1)$$
In other words, the success probabilities of the two items are ordered in the same way. Such a characterization of items measuring the same dimension is completely nonparametric, in contrast with the one in Bartolucci (2007), which is based on a parametric formulation of $\lambda_{cj}$. For the full set of items, the nonparametric IRT model is specified by fixing $k$ and a permutation $c^{(j)}_1,\dots,c^{(j)}_k$ of the type (1) for every item $j = 1,\dots,r$. If there are $s$ different permutations, there are $s$ groups of items measuring distinct dimensions, which are denoted by $\mathcal{J}_1,\dots,\mathcal{J}_s$ and collected in $\mathcal{J}$.
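To make the grouping rule concrete, the partition induced by (1) can be computed from a candidate matrix of success probabilities by grouping items whose columns share the same sorting permutation. This is a minimal sketch under our own naming (`item_partition` is not from the paper):

```python
import numpy as np

def item_partition(Lam):
    """Group items into dimensions: by (1), two items measure the same
    dimension when their success probabilities share the same ordering
    across the k latent classes. Lam is the k x r matrix of lambda_cj."""
    groups = {}
    for j in range(Lam.shape[1]):
        perm = tuple(np.argsort(Lam[:, j], kind="stable"))
        groups.setdefault(perm, []).append(j)
    # Order groups as in the paper: the first group contains the first item,
    # the next group the smallest-index item not yet assigned, and so on.
    return sorted(groups.values(), key=lambda g: g[0])

# Items 0 and 1 are ordered the same way over the classes; item 2 is reversed.
Lam = np.array([[0.2, 0.1, 0.9],
                [0.5, 0.4, 0.6],
                [0.8, 0.7, 0.3]])
print(item_partition(Lam))  # [[0, 1], [2]]
```

Here $s = 2$: the first two items form one dimension and the third item forms another.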
The observed log-likelihood of the model defined above may be easily computed as
$$\ell(\Lambda,\pi) = \sum_i \log\Big[\sum_c \pi_c \prod_j \lambda_{cj}^{y_{ij}} (1-\lambda_{cj})^{1-y_{ij}}\Big], \qquad (2)$$
where $\Lambda$ is the $k \times r$ matrix of probabilities $\lambda_{cj}$, $\pi$ is the vector of class weights $\pi_c$, and $y_{ij}$ is the observed value of $Y_{ij}$. To make estimation easier, it is convenient to introduce the latent class indicators $z_{ic}$, $i = 1,\dots,n$, $c = 1,\dots,k$, where $z_{ic} = 1$ if the $i$-th subject is in latent class $c$; see for instance Diebolt and Robert (1994). The complete, or augmented, data log-likelihood, after augmenting the data with the $z_{ic}$, is then
$$\ell_c(\Lambda,\pi) = \sum_c \sum_i z_{ic} \log(\pi_c) + \sum_c \sum_i \sum_j z_{ic}\big[y_{ij}\log(\lambda_{cj}) + (1-y_{ij})\log(1-\lambda_{cj})\big]. \qquad (3)$$
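As an illustration, the observed log-likelihood (2) can be evaluated with a numerically stable log-sum-exp over the classes. The helper below is a sketch under our own naming (`loglik`), not the authors' code:

```python
import numpy as np

def loglik(Lam, pi, Y):
    """Observed log-likelihood (2) of the latent class model.
    Lam: (k, r) success probabilities; pi: (k,) class weights; Y: (n, r) 0/1."""
    # log p(y_i | C = c) = sum_j [ y_ij log lam_cj + (1 - y_ij) log(1 - lam_cj) ]
    logf = Y @ np.log(Lam.T) + (1 - Y) @ np.log(1 - Lam.T)   # (n, k)
    a = logf + np.log(pi)                                    # add log pi_c
    m = a.max(axis=1, keepdims=True)                         # log-sum-exp over c
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))

# Two classes, one item: p(y = 1) = 0.5 * 0.9 + 0.5 * 0.1 = 0.5
print(loglik(np.array([[0.9], [0.1]]), np.array([0.5, 0.5]), np.array([[1]])))  # log(0.5)
```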
2.1 Prior Distributions
Any model of the type above is nested in a latent class model in which the probabilities $\lambda_{cj}$ are left unconstrained (Lazarsfeld and Henry, 1968). Then, once the priors have been specified for this model, we can automatically specify those of any nested model by the encompassing approach (Klugkist et al, 2005): prior distributions for nested models are automatically derived by truncating the parameter space according to the constraints of interest.
For the encompassing model we adopt Bayes-Laplace priors for the success probabilities and class weights (Tuyl et al, 2009). This choice reduces to an (unconditional) uniform prior for $\lambda_{cj}$, $c = 1,\dots,k$. For the class weights, it corresponds to a Dirichlet distribution whose parameter vector has all elements equal to 1. Finally, we use a uniform prior for $k$ on the discrete set $1,\dots,k_{\max}$.
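A draw from this system of priors for the encompassing model can be sketched as follows (the function name is ours; `k_max` stands for whatever maximum number of classes is chosen):

```python
import numpy as np

def sample_encompassing_prior(k_max, r, rng):
    """One draw from the priors of the encompassing LC model:
    k uniform on {1, ..., k_max}, pi ~ Dirichlet(1, ..., 1),
    and each lambda_cj ~ U(0, 1) (the Bayes-Laplace choice)."""
    k = int(rng.integers(1, k_max + 1))      # uniform over {1, ..., k_max}
    pi = rng.dirichlet(np.ones(k))           # flat Dirichlet class weights
    Lam = rng.uniform(size=(k, r))           # unconstrained success probabilities
    return k, pi, Lam

rng = np.random.default_rng(0)
k, pi, Lam = sample_encompassing_prior(k_max=5, r=4, rng=rng)
```

Priors for any nested model follow by truncation: a draw is retained only if $\Lambda$ satisfies the inequality constraints of that model.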

2.2 Estimation strategy based on the Reversible Jump algorithm
Our estimation strategy makes use of the RJ algorithm, which samples from the posterior distribution of all the parameters of the latent class model, including $k$. The RJ output is then post-processed for identifiability (Frühwirth-Schnatter, 2001) and to deliver all the different partitions of items visited by the algorithm.
The algorithm performs the following steps:
1. Sample the latent class indicators $z_{ic}$ from their full conditional distribution:
$$\Pr(z_{ic} = 1 \mid Y, \lambda, \pi) = \frac{\pi_c \prod_j \lambda_{cj}^{y_{ij}} (1-\lambda_{cj})^{1-y_{ij}}}{\sum_h \pi_h \prod_j \lambda_{hj}^{y_{ij}} (1-\lambda_{hj})^{1-y_{ij}}}.$$
2. Update $\lambda_{cj}$. For each $j = 1,\dots,r$, we propose simultaneous independent zero-centered normal increments of the current $\mathrm{logit}(\lambda_j)$, where $\lambda_j = (\lambda_{1j},\dots,\lambda_{kj})$. The candidate $\lambda^\star_j$ is accepted with probability $\min(1, p_{\lambda^\star_j})$, where
$$\log(p_{\lambda^\star_j}) = \sum_c \sum_i z_{ic}\big\{y_{ij}\log(\lambda^\star_{cj}/\lambda_{cj}) + (1-y_{ij})\log[(1-\lambda^\star_{cj})/(1-\lambda_{cj})]\big\} + \sum_c \big[\log(\lambda^\star_{cj}) + \log(1-\lambda^\star_{cj}) - \log(\lambda_{cj}) - \log(1-\lambda_{cj})\big]. \qquad (4)$$
The first line on the right side is the log-likelihood ratio. The ratio between the prior densities cancels out when using uniform priors for $\lambda_{cj}$, as suggested. The ratio between the proposal densities also cancels out, apart from the logarithm of the Jacobian of the logit transformation, given in the second line of (4).
3. Sample the weights $\pi_1,\dots,\pi_k$ from their full conditional distribution, which is a Dirichlet with parameters $(1 + \sum_i z_{i1}, \dots, 1 + \sum_i z_{ik})$.
4. Update $k$. We follow the approach consisting of a random choice between splitting an existing latent class into two and merging two existing classes into one. The probabilities of these alternatives are $b_k$ and $1 - b_k$, respectively. Of course $b_1 = 1$ and $b_{k_{\max}} = 0$, and otherwise we choose $b_k = 0.5$ for $k = 2,\dots,k_{\max}-1$.
For the combine proposal we randomly choose a pair of classes $(c_1, c_2)$, with $\pi_{c_1} < \pi_{c_2}$, not necessarily adjacent in terms of the current value of their weights. These two classes are merged into a new one, labeled $c^\star = c_2 - 1$, reducing $k$ by 1. We then reallocate all those observations $y_{ij}$, $j = 1,\dots,r$, with $z_{ic_1} = 1$ or $z_{ic_2} = 1$ to the new class $c^\star$ and create values for $\lambda_{c^\star j}$ and $\pi_{c^\star}$ in such a way that $\lambda_{c^\star j} = \lambda_{c_2 j}$ and $\pi_{c^\star} = \pi_{c_1} + \pi_{c_2}$.
In the split proposal, a class $c^\star$ is chosen at random and split into two new ones, labeled $c_1$ and $c_2$, augmenting $k$ by 1. The place assigned to the class $c_1$ is randomly chosen between 1 and $c^\star$, while the class $c_2$ takes the place $c^\star + 1$. Values for $\pi_{c_1}, \pi_{c_2}, \lambda_{c_1 j}, \lambda_{c_2 j}$, for $j = 1,\dots,r$, are created by generating a scalar $u_1$ and a vector $u_2 = (u_{2j})_{j=1}^r$, with $u_1 \sim U[0, 0.5]$ and $u_{2j} \sim U[0, 1]$, and setting
$$\pi_{c_1} = u_1 \pi_{c^\star}, \quad \pi_{c_2} = (1 - u_1)\pi_{c^\star}, \quad \lambda_{c_1 j} = u_{2j}, \quad \lambda_{c_2 j} = \lambda_{c^\star j}, \qquad j = 1,\dots,r. \qquad (5)$$
Finally we reallocate all those observations $y_{ij}$, $j = 1,\dots,r$, with $z_{ic^\star} = 1$ between the two new classes, in a way analogous to the standard Gibbs allocation move used in step 1. We accept the split move with probability $\min(1, p_k)$, where
$$p_k = \text{(likelihood ratio)} \times \frac{\Pr(k+1)}{\Pr(k)} \times \frac{D(\pi_1,\dots,\pi_{k+1})}{D(\pi^\star_1,\dots,\pi^\star_k)} \times \frac{(\pi_{c_1})^{\sum_i z_{ic_1}} (\pi_{c_2})^{\sum_i z_{ic_2}}}{(\pi^\star_{c^\star})^{\sum_i z^\star_{ic^\star}}} \times \frac{2(1-b_{k+1})}{b_k P_{\mathrm{alloc}}} \times \pi^\star_{c^\star}, \qquad (6)$$
where $P_{\mathrm{alloc}}$ is the probability of this particular allocation and $D$ is the Dirichlet density with all parameters equal to 1. The first four terms in the product are the ratio of the likelihood and the priors for the new parameter set to those for the old one. The fifth term is the proposal ratio. The last term is the Jacobian of the transformation from $(\pi_{c^\star}, \lambda_{c^\star 1},\dots,\lambda_{c^\star r}, u_1, u_{21},\dots,u_{2r})$ to $(\pi_{c_1}, \lambda_{c_1 1},\dots,\lambda_{c_1 r}, \pi_{c_2}, \lambda_{c_2 1},\dots,\lambda_{c_2 r})$. The combine move is accepted with probability $\min(1, p_k^{-1})$, with some obvious substitutions in the expression for $p_k$.
From the RJ output, we estimate the posterior probability of any nonparametric IRT model visited at least once and the posterior distribution of its parameters. Let $k^{(t)}$ be the number of classes of the model visited at sweep $t$ of the algorithm and $\Lambda^{(t)}$ and $\pi^{(t)}$ be the parameters of this model, with $t = 1,\dots,T$. Then, we examine every matrix $\Lambda^{(t)}$ and, for $j = 1,\dots,r$, we obtain the permutations $c^{(j)}_1,\dots,c^{(j)}_{k^{(t)}}$ such that the probabilities in the $j$-th column of this matrix satisfy inequality (1). As clarified before, these permutations define a partition of the items into groups corresponding to different dimensions. In particular, the partition at sweep $t$ is denoted by $\mathcal{J}^{(t)}_1,\dots,\mathcal{J}^{(t)}_{s^{(t)}}$, where $s^{(t)}$ is the number of dimensions that is found. To avoid a sort of label-switching problem, the groups are ordered so that $\mathcal{J}^{(t)}_1$ includes the first item, $\mathcal{J}^{(t)}_2$ includes the item with the smallest index among those excluded from $\mathcal{J}^{(t)}_1$, and so on. Finally, the posterior probability of the model with a certain $k$ and a certain partition of items $\mathcal{J}_1,\dots,\mathcal{J}_s$ based on $s$ dimensions is estimated as
$$\widehat{\Pr}(k, \mathcal{J}_1,\dots,\mathcal{J}_s) = \frac{1}{T} \sum_{t:\, s^{(t)} = s} I\big\{\mathcal{J}^{(t)}_1 = \mathcal{J}_1,\dots,\mathcal{J}^{(t)}_s = \mathcal{J}_s\big\}, \qquad (7)$$
where the sum is over all sweeps for which $s^{(t)} = s$ and $I\{\cdot\}$ is the indicator function.
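Operationally, the estimate in (7) amounts to tallying, over the saved sweeps, how often each pair (number of classes, item partition) occurs. A sketch under our own naming (`partition_of`, `posterior_model_probs`):

```python
from collections import Counter

import numpy as np

def partition_of(Lam):
    """Item partition induced by a (k, r) matrix: items whose columns share
    the same ordering fall in the same group, labelled as in the text."""
    groups = {}
    for j in range(Lam.shape[1]):
        perm = tuple(np.argsort(Lam[:, j], kind="stable"))
        groups.setdefault(perm, []).append(j)
    return tuple(tuple(g) for g in sorted(groups.values(), key=lambda g: g[0]))

def posterior_model_probs(Lam_draws):
    """Estimate (7): relative frequency of each (k, partition) over T sweeps."""
    T = len(Lam_draws)
    counts = Counter((L.shape[0], partition_of(L)) for L in Lam_draws)
    return {model: c / T for model, c in counts.items()}

draws = [np.array([[0.2, 0.1], [0.8, 0.9]]),   # both items ordered the same way
         np.array([[0.2, 0.1], [0.8, 0.9]]),
         np.array([[0.2, 0.9], [0.8, 0.1]])]   # items ordered oppositely
print(posterior_model_probs(draws))
```

With these three draws, the model with $k = 2$ and both items in one dimension gets estimated probability 2/3, and the two-dimension model gets 1/3.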
On the basis of the posterior probabilities in (7), different strategies may be adopted for model selection. We suggest selecting first the value of $k$ on the basis of the largest number of visits. Then, conditionally on the value of $k$, we take the partition with the highest value of the probability in (7). This strategy is similar to that in Bartolucci (2007). Alternatively, $k$ and the partition $\mathcal{J}_1,\dots,\mathcal{J}_s$ can be chosen jointly as those with the highest posterior probability in (7). This method may lead

References
Akaike, H. Information theory and an extension of the maximum likelihood principle.
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.
Hurvich, C. M. and Tsai, C.-L. Regression and time series model selection in small samples.
Schwarz, G. Estimating the dimension of a model.
Zigmond, A. S. and Snaith, R. P. The Hospital Anxiety and Depression Scale.