Simulating Computer Adaptive Testing With the Mood and Anxiety
Symptom Questionnaire
Gerard Flens
Foundation for Benchmarking Mental Health Care, Bilthoven,
the Netherlands
Niels Smits
University of Amsterdam
Ingrid Carlier and Albert M. van Hemert
Leiden University Medical Centre, Leiden, the Netherlands
Edwin de Beurs
Foundation for Benchmarking Mental Health Care, Bilthoven,
the Netherlands
In a post hoc simulation study (N = 3,597 psychiatric outpatients), we investigated whether the efficiency
of the 90-item Mood and Anxiety Symptom Questionnaire (MASQ) could be improved for assessing
clinical subjects with computerized adaptive testing (CAT). A CAT simulation was performed on each
of the 3 MASQ subscales (Positive Affect, Negative Affect, and Somatic Anxiety). With the CAT
simulation’s stopping rule set at a high level of measurement precision, the results showed that patients’
test administration can be shortened substantially; the mean decrease in items used for the subscales
ranged from 56% up to 74%. Furthermore, the predictive utility of the CAT simulations was sufficient
for all MASQ scales. The findings reveal that developing a MASQ CAT for clinical subjects is useful
as it leads to more efficient measurement without compromising the reliability of the test outcomes.
Keywords: computer adaptive test, clinical assessment, Mood and Anxiety Symptom Questionnaire, item
response theory
In the Netherlands, routine outcome monitoring (ROM) has
been implemented for mental health care patients nationwide (Car-
lier et al., 2012; de Beurs et al., 2011). ROM is the repeated
administration of questionnaires to monitor patients’ progress over
time and use the information to adjust treatment, if indicated. In
the clinical setting, care providers and patients have limited time, and to keep costs to a minimum, assessments should preferably be short and their outcomes reliable for all patients. A successful
methodology that addresses these needs is computerized adaptive
testing (CAT). CAT uses information from questions that have
been answered so far by an individual in order to select the most
appropriate next question. By administering questions tailored to
each patient, CAT can reduce respondent burden while maintain-
ing or even improving the reliability of the test outcomes for all
patients (Fliege et al., 2005). Ideally, these CAT benefits would
decrease respondent burden, increase response rates and reduce
possible bias due to selective loss of respondents (Dillman, Sin-
clair, & Clark, 1993).
Building a fully functioning CAT takes considerable effort
(Cook, O’Malley, & Roddey, 2005). One of the reasons is that in
most countries, large item banks are generally unavailable for
mental health constructs and have to be developed (Gibbons et al.,
2014). A solution to this problem could be the use of existing
mental health questionnaires as item banks. Although CAT ver-
sions of existing clinical scales have already been shown to be useful in
undergraduate students (Forbey & Ben-Porath, 2007; Gardner et
al., 2004; Smits, Cuijpers, & van Straten, 2011), Smits and col-
leagues specifically assessed in a post hoc simulation study
whether a CAT would be useful for measuring clinical subjects
(Smits, Zitman, Cuijpers, den Hollander-Gijsman, & Carlier,
2012). As a first proof of principle for using an existing question-
naire to develop a CAT for clinical subjects, they simulated a CAT
on one of the Mood and Anxiety Symptom Questionnaire (MASQ;
Watson & Clark, 1991) subscales (i.e., the 22-item Anhedonic
Depression subscale) by treating patients’ responses as if they had
been collected adaptively. With the outcomes of the CAT simula-
tion set to a high level of measurement precision, their analysis
showed that patients’ burden was reduced substantially; the ad-
ministration of the MASQ Anhedonic Depression scale was short-
ened for most of the patients with a mean decline of 59% (from 22
to 9 items). Moreover, the outcomes of the CAT remained diag-
nostically accurate.
The full 90-item MASQ is an extensive questionnaire that has a
unique way of assessing symptoms of the two most prevalent
psychiatric syndromes, depression and anxiety disorders (accord-
ing to the tripartite model), and takes into account the high co-
morbidity between both syndromes and high level of symptom
overlap (Watson & Clark, 1991).
This article was published Online First December 21, 2015.
Gerard Flens, Foundation for Benchmarking Mental Health Care,
Bilthoven, the Netherlands; Niels Smits, Research Institute of Child De-
velopment and Education, University of Amsterdam; Ingrid Carlier and
Albert M. van Hemert, Department of Psychiatry, Leiden University Med-
ical Centre, Leiden, the Netherlands; Edwin de Beurs, Foundation for
Benchmarking Mental Health Care, Bilthoven, the Netherlands.
Correspondence concerning this article should be addressed to Gerard
Flens, Stichting Benchmark GGZ (SBG), Rembrandtlaan 48, 3723 BK
Bilthoven, the Netherlands. E-mail: gerard.flens@sbggz.nl
Psychological Assessment, 2016, Vol. 28, No. 8, 953–962. © 2015 American Psychological Association. 1040-3590/16/$12.00 http://dx.doi.org/10.1037/pas0000240

It is used as a research and clinical assessment instrument and has been validated in multiple coun-
tries, for multiple age groups, and for multiple disorders (e.g., de
Beurs, den Hollander-Gijsman, Helmich, & Zitman, 2007; Deng,
Jiang, & Li, 2012; Lee, Kim, & Cho, 2015). Ideally, for efficient
measurement of clinical subjects, all subscales of the MASQ would be
transformed into a CAT. Previous studies have generally con-
firmed three subscales of the 90-item MASQ: a positive affect
(PA) scale, a negative affect (NA) scale, and a somatic anxiety
(SA) scale (Bedford, 1997; Clark & Watson, 1991; de Beurs et al.,
2007; Keogh & Reidy, 2000; Watson et al., 1995). Other studies
that developed shorter versions of the MASQ also applied this
three factor structure in their item design (Osman et al., 2011;
Wardenaar et al., 2010). In these studies, the number of items for
each MASQ scale was fixed, but by doing so, the measurement
precision for test outcomes could vary among respondents with
different trait levels. By contrast, CAT is more dynamic: it fixes
the test outcomes’ measurement precision for all trait levels and
allows for the number of administered items to vary among re-
spondents (Embretson & Reise, 2000). In other words, CAT is
essentially more efficient than fixed questionnaires because CAT
administers only the most informative items to each individual
respondent.
In this article, we assessed in a post hoc CAT simulation study
whether the administration of three MASQ subscales could be
made more efficient for measuring patients receiving mental health
care. We present a comprehensive account of the psychometric
evaluation of the MASQ scales, which is a prerequisite for apply-
ing CAT. As a point of departure for the CAT simulations, we
used data from a large Dutch clinical sample (Smits, Zitman,
Cuijpers, den Hollander-Gijsman, & Carlier, 2012), applying a
three-factor structure based on clinically derived MASQ
subscales (de Beurs et al., 2007). We assessed to what extent the
administration of each MASQ scale can be shortened for clinical
subjects and whether the CAT estimates are diagnostically accu-
rate compared with the full-scale estimates.
Method
Participants
The sample for this study consisted of 3,597 patients (63%
female) from three Dutch outpatient Mental Healthcare Centres of
the Regional Mental Health Care Provider Rivierduinen. The mean
age of the patients was 38.8 years for the entire sample (SD = 13.2), 38.2 years for females (SD = 13.3), and 39.9 years for males (SD = 13.1). Patients were referred to Rivierduinen by their
general practitioner for treatment of mood, anxiety, and/or soma-
toform disorders. The patient’s diagnosis was assessed with the
Dutch translation of the Mini International Neuropsychiatric In-
terview (MINI-plus; Sheehan et al., 1998), administered by an extensively trained psychiatric nurse. The MINI-plus is a
standardized interview for clinical diagnosis of mental disorders
following the Diagnostic and Statistical Manual of Mental Disor-
ders (4th ed.; DSM–IV; American Psychiatric Association, 1994).
According to the MINI-plus, the sample for this study was clas-
sified as follows: 23% of the patients had a singular mood disorder,
20% had a singular anxiety disorder, 8% had a singular somato-
form disorder, and 23% did not meet the criteria of these disorders.
Furthermore, 18% of the patients had a comorbid mood and
anxiety disorder, 4% had a comorbid mood and somatoform dis-
order, 3% had a comorbid anxiety and somatoform disorder, and
2% suffered from all three disorders.
Rivierduinen collaborated with the Department of Psychiatry of
the Leiden University Medical Centre (LUMC) in developing
ROM (de Beurs et al., 2011). At intake, patients were informed
that ROM is a part of the general policy of Rivierduinen and
LUMC, designed to monitor treatment outcome, that their data
could be used for research purposes in anonymous form, and that
their personal outcome data would be made available only to their
therapist. If patients did not consent to the procedure, their data
were removed from the database. Anonymity of the patients and
proper handling of the data were assured by a comprehensive policy
protocol (Psychiatric Academic Registration Leiden). This policy
protocol was made available for patients upon request. The procedure
was approved by The Medical Ethical Committee of the LUMC (for
more details, see de Beurs et al., 2011).
The MASQ
The MASQ is a 90-item self-report questionnaire that contains
feelings, sensations, problems and experiences that people can
have associated with mood and anxiety disorders (Watson &
Clark, 1991). The full 90-item MASQ was designed to measure
symptoms of mood and anxiety disorders according to the tripartite
model (Clark & Watson, 1991). The tripartite model aims to
account for the high concordance among symptom measures for
affective disorders, by assigning symptoms to one of three
groups: a group unique to mood disorders (anhedonia or lack of
positive affect [PA]), a group unique to anxiety disorders (somatic
anxiety [SA]), and a group common to both mood and anxiety
disorders (negative affect [NA]). Of the 90 MASQ items, 27 are
stated positively (e.g., Item 1, “felt cheerful”) and 63 are stated
negatively (e.g., Item 2, “felt afraid”). For this study, the Dutch
adaptation of the MASQ was used (de Beurs et al., 2007). Patients
were asked by computer to indicate on a Likert scale (1 = not at all, 2 = a bit, 3 = moderately, 4 = much, and 5 = very much) how frequently they experienced the stated feelings, sensations, problems, and experiences in the past 7 days, including today. For scoring, the positively stated items were reversed (1 → 5, 2 → 4, 3 → 3, 4 → 2, 5 → 1). Thus, all MASQ scale scores had the same
meaning: the higher the score, the more severe the mood or anxiety
problems.
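For illustration, the reversal step is a one-line transformation in R. This is a minimal sketch; the object names masq (a data frame of the 90 item responses scored 1-5) and positive_items (the indices of the 27 positively stated items) are hypothetical, not taken from the original study code.

```r
# Reverse-score the positively stated items of a 1-5 Likert scale.
masq[, positive_items] <- 6 - masq[, positive_items]  # 1->5, 2->4, 3->3, 4->2, 5->1
```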
As input for the CAT simulations, multiple MASQ factor solu-
tions were available (e.g., Bedford, 1997; Clark & Watson, 1991;
Keogh & Reidy, 2000; Watson et al., 1995). In this study, the
MASQ items from the Dutch factor solution were used (de Beurs
et al., 2007). First, because this factor solution was based on a
large Dutch clinical sample. Second, because the Dutch subscales
showed satisfactory psychometric properties and results that were
similar to factor solutions from United States and British datasets
(Keogh & Reidy, 2000). The Dutch factor solution grouped 22 of
the 90 MASQ items into the lack of PA scale, 20 items into the NA scale, and 18 items into the SA scale. Table 1 displays the items from the three Dutch
MASQ subscales.
Psychometric Evaluation of the MASQ Scales
We undertook a psychometric evaluation of the three MASQ
scales (Reeve et al., 2007), which is a prerequisite for applying

CAT. It was evaluated whether each of the scales met the three
main item response theory (IRT) assumptions of unidimensional-
ity, local independence, and monotonicity. Violation of these as-
sumptions may cause bias in the scaling of persons and items on a
common latent trait, which could result in over- or underestimated
trait scores. In addition, we evaluated differential item functioning
(DIF; Embretson & Reise, 2000) among key demographic groups.
Items containing DIF cause bias in latent trait scores because
persons from different groups with the same latent trait score have
different probabilities of selecting item response categories.
The IRT assumption of unidimensionality states that a person’s
item response results from the person’s level on the trait that the item measures and not from other factors. Because mental health con-
structs are generally complex, item response results are rarely
strictly unidimensional (Reise, Morizot, & Hays, 2007). For IRT
applications, it is therefore assessed whether the degree of unidi-
mensionality in item response assessments is sufficient. The de-
gree of unidimensionality in each MASQ scale was explored with
both confirmatory factor analyses (CFA) and exploratory factor
analyses (EFA) conducted on the polychoric correlation matrix of
the items (Bollen, 1989). CFA was evaluated by the fit indices comparative fit index (CFI; > 0.95 for good fit), Tucker–Lewis index (TLI; > 0.95 for good fit), root-mean-square error of approximation (RMSEA; < 0.06 for good fit), and the average absolute residual correlations (< 0.10 for good fit; Reeve et al., 2007), using
the R package lavaan (Version 0.5-17; Rosseel, 2012). EFA (va-
rimax rotated) was evaluated with the proportion of variance
explained by the resulting factors using the R package psych
(Version 1.3.2; Revelle, 2013). Proportion of variance explained in
the first factor should be above the Reckase criterion of 20% (Reckase, 1979, cited in Hambleton, 1988), and the ratio of vari-
ance explained in the first and second factor should be higher than
the minimal requirement of 4 (Reeve et al., 2007).
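A minimal sketch of these dimensionality checks in R, using the packages named in the text; pa_items is a hypothetical data frame holding the 22 PA item responses, and the fit thresholds follow Reeve et al. (2007):

```r
library(lavaan)  # CFA on polychoric correlations
library(psych)   # polychoric matrix and EFA

# One-factor CFA per MASQ scale.
model <- paste("PA =~", paste(names(pa_items), collapse = " + "))
fit <- cfa(model, data = pa_items, ordered = names(pa_items))  # ordinal items
fitMeasures(fit, c("cfi", "tli", "rmsea"))   # good fit: CFI/TLI > .95, RMSEA < .06
res <- residuals(fit, type = "cor")$cov      # residual correlation matrix
mean(abs(res[lower.tri(res)]))               # average absolute residual (< .10)

# Varimax-rotated EFA on the polychoric matrix: the first factor should explain
# more than 20% of the variance, with a first-to-second factor ratio above 4.
rho <- polychoric(pa_items)$rho
fa(rho, nfactors = 2, rotate = "varimax", n.obs = nrow(pa_items))$Vaccounted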
The assumption of Local Independency (LI) states that no as-
sociation should exist among item responses when controlling for
the trait level. LI was evaluated among the polytomous response
items by inspecting the residual correlation matrix resulting from
CFA using the R package lavaan (Version 0.5-17; Rosseel, 2012).
Items with residual correlations above 0.20 are considered to be
possibly locally dependent (Reeve et al., 2007). Further investiga-
tion of LI was done with Yen’s Q3 statistic (Yen, 1993). This
statistic calculates the residual item scores under the graded re-
sponse model (GRM; Samejima, 1969) and correlates these among
items. For this purpose, we fitted the GRM to each of the MASQ
scales using the R package ltm (Version 1.0; Rizopoulos, 2006).
As suggested by Smits et al. (2012), the lack of model fit was
assessed by Cohen’s rules of thumb to interpret effect size; Q3
values between 0.24 and 0.36 imply a moderate deviation, Q3
values above 0.37 imply a large deviation (Cohen, 1988). Item
pairs with large deviations were evaluated according to their effect
on the item parameter estimates (Reeve et al., 2007). First, we
estimated the item parameters of the corresponding MASQ scale.
Second, we removed one of the items with a large deviation
from the scale and estimated the item parameters for the re-
maining items. Last, we compared the item parameters from the
full scale with the restricted scale (minus one item) to assess
whether substantial differences occurred between the remaining
parameters. This process was repeated for each item with a
large deviation.
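Yen's Q3 is not built into ltm, so the residual correlations have to be computed by hand, roughly as in the following sketch (items is a hypothetical data frame of responses scored 1-5; the GRM parameterization is the one described in the CAT Simulation section):

```r
library(ltm)

# Yen's Q3: correlate item residuals (observed minus GRM-expected scores).
fit <- grm(items)
theta <- factor.scores(fit, resp.patterns = items)$score.dat$z1  # trait estimates
cf <- coef(fit)                                # Extrmt1..Extrmt4 and Dscrmn per item
res <- sapply(seq_len(ncol(items)), function(j) {
  a <- cf[j, "Dscrmn"]
  b <- cf[j, grep("Extrmt", colnames(cf))]
  pstar <- sapply(b, function(bk) plogis(a * (theta - bk)))  # P(X >= k), k = 2..5
  items[[j]] - (1 + rowSums(pstar))            # observed minus expected item score
})
Q3 <- cor(res)    # |Q3| between .24 and .36: moderate deviation; above .37: large
```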
The IRT assumption of monotonicity states that the probability
of selecting an item response that suggests a better health status on
a scale should increase as the underlying level of health status on
that scale is higher. We evaluated monotonicity by examining
graphs of item mean scores conditional on rest scores (total raw
score minus the item score). Furthermore, we performed nonparametric IRT Mokken (1971) scale analysis with the R package mokken (van der Ark, 2007).
In this analysis, persons are ranked on a unidimensional scale
according to their trait level and items with regard to their location.
According to the rule of thumb of Mokken (1971), a scale has low
quality when the scalability coefficient is between 0.3 and 0.4,
moderate quality when the scalability coefficient is between 0.4
and 0.5, and high quality when the scalability coefficient is above
0.5.
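In R, these checks reduce to two calls from the mokken package (a sketch; items is the hypothetical response data frame used above):

```r
library(mokken)

# Monotonicity and Mokken scalability checks for one MASQ scale.
summary(check.monotonicity(items))  # flags items violating monotonicity
coefH(items)                        # Hi per item and H for the scale:
                                    # .3-.4 low, .4-.5 moderate, > .5 high quality
```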
Finally, DIF (Embretson & Reise, 2000) was evaluated for the
demographic variables age and gender, using the R package lordif
(Version 0.2-2; Choi, Gibbons, & Crane, 2011). An item contains
DIF if the probability of responding in different response catego-
ries differs across groups, while the trait level influencing a per-
son’s response to an item is controlled for. As a consequence, each
group should have their own item parameter estimations for items
containing DIF. For example, when men with a high level of PA
have a higher probability of being more cheerful than women with
an identical level of PA, then the MASQ item 1 “Felt cheerful”
probably contains DIF and should have separate item parameter
estimations for men and women. DIF comes in two kinds: uniform
and nonuniform (Embretson & Reise, 2000; Reeve et al., 2007).
Uniform DIF has the same magnitude of DIF across the entire
range of the trait. Nonuniform DIF has a different magnitude or
direction of DIF across the trait. We explored both kinds of DIF
using ordinal logistic regression (OLR; Crane, Gibbons, Jolley, &
van Belle, 2006). OLR has the advantage of being a flexible and
robust framework for DIF detection, especially with trait level
scores from IRT. Effect size was evaluated by means of change in
McFadden’s R² between groups, following the suggestion of a
critical value of 0.02 (Choi et al., 2011) for rejecting the hypothesis
of no (uniform or nonuniform) DIF. For each scale, differences
were evaluated for gender (men and women) and age (split at the median).
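A sketch of the corresponding lordif call for the gender comparison; an analogous call with a median-split age grouping covers the second comparison. The object names items and gender are hypothetical; the 0.02 McFadden R² criterion is the one cited above:

```r
library(lordif)

# OLR-based detection of uniform and nonuniform DIF.
dif <- lordif(items, gender, criterion = "R2",
              pseudo.R2 = "McFadden", R2.change = 0.02)
dif$flag                            # TRUE for items flagged as containing DIF
```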
CAT Simulation
We simulated a separate CAT on each of the three MASQ scales
(PA, NA, and SA) from the item responses that were obtained
Table 1
Items From the Three Dutch MASQ Scales (PA, NA, SA)
Scale Item
PA 1, 11, 14, 18, 23, 27, 30, 35, 36, 38, 40, 41, 43, 46, 49, 54,
58, 62, 68, 72, 78, and 86
NA 4, 6, 8, 13, 16, 17, 20, 22, 24, 26, 28, 29, 42, 47, 53, 64,
74, 77, 84, and 89
SA 9, 25, 45, 48, 52, 55, 57, 61, 63, 65, 67, 69, 73, 75, 79, 81,
87, and 88
Note. MASQ = Mood and Anxiety Symptom Questionnaire; PA = Positive Affect; NA = Negative Affect; SA = Somatic Anxiety.

from the patients. The item responses were selected for each
patient from all the item responses in the corresponding scale and
were evaluated as if they were collected adaptively. Basically, the
CAT simulation started with the same item for every individual
and then estimated the latent scale score and measurement preci-
sion using both item response and item properties. From here,
either a new item was selected according to the item properties and
the estimated latent trait level, or the simulation stopped when the
prespecified value of measurement precision was obtained. The
selection of new items, and the estimation of latent trait score and
measurement precision using all collected item scores so far,
continued until this prespecified measurement precision was
reached, or when all items were used; items were used only once.
To apply this procedure, we made several decisions regarding (a)
the IRT model that estimates the item parameters, (b) the methods
for selecting new items and (c) estimating patients’ latent scale
scores (θ), and (d) the starting level and (e) stopping rule for the
CAT. A program (Smits et al., 2011; Smits et al., 2012) was
written in the statistical environment R (R Core Team, 2014) to
implement these decisions into three separate CAT simulations.
Below, we present the details concerning these decision rules.
First, as an appropriate IRT model for estimating item param-
eters, we used Samejima’s (1969) GRM for polytomous items. The
GRM is often the preferred IRT model, because it is easier to
illustrate to test users than other models, and the item parameters
are easy to interpret with regard to responder behavior (Ostini,
Finkelman, & Nering, 2015; Smits et al., 2011). These advantages
are especially desirable when CAT is implemented on a large
scale, as is mostly the case in clinical measures, because clinicians
should generally understand how CAT works. The GRM
uses two types of parameters. The discrimination parameter a
specifies to what extent persons with similar scores on the latent
trait can be differentiated by the item. Furthermore, the GRM uses
the location parameters b (the number of location parameters for
an item is equal to the number of response categories minus one),
which specify the locations at which a patient is expected to switch from a lower to a higher item response. We fitted the GRM
to the data separately for each scale using the R package ltm
(Version 1.0; Rizopoulos, 2006). The GRM was evaluated for each
scale by examining model fit and evaluating item properties.
Model fit was evaluated by correlating the estimated latent trait
scores under the GRM with the traditional MASQ scale scores.
Item properties were evaluated by examining the a and b param-
eters estimated from the GRM models.
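The calibration step for one scale amounts to a grm call in ltm; the sketch below also shows the model fit check described above (pa_items is again a hypothetical data frame of the 22 PA item responses):

```r
library(ltm)

# Fit the graded response model to one MASQ scale.
grm_pa <- grm(pa_items)
coef(grm_pa)                        # a (Dscrmn) and b1-b4 (Extrmt1-Extrmt4) per item

# Model fit check: correlate GRM trait estimates with the traditional
# (sum-score based) MASQ scale scores.
theta <- factor.scores(grm_pa, resp.patterns = pa_items)$score.dat$z1
cor(theta, rowSums(pa_items))
```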
Next, we chose a method for selecting new items and estimating
patients’ latent scale scores (θ). New items were selected using item information, the method most commonly used in other CATs
(Embretson & Reise, 2000; Wainer, 2000). Item information spec-
ifies how precisely an item can measure the latent trait given the
location of the person’s θ estimate. Each time, the CAT selected the new item that had the highest information at the provisional estimate of θ. In addition, θ was estimated with the maximum a
posteriori method (MAP; Embretson & Reise, 2000). MAP is a
Bayesian method, which estimates θ as the value with the highest
likelihood of bringing forth the observed item responses using a
prior standard normal distribution of θ. This Bayesian method was
chosen over the maximum likelihood method (Thissen, 1991) for
being able to provide a θ estimate for item response patterns
consisting exclusively of either extreme low or extreme high
response categories.
Finally, we chose a starting level and stopping rule for the CAT.
The starting level was set to the average value of the latent trait
(θ = 0). As a first item for all respondents, we therefore chose the
MASQ item that had the highest information at this starting level:
Item 86 for the PA scale (“Felt really good about myself”), Item 22
for the NA scale (“Felt hopeless”) and Item 79 for the SA scale
(“Was trembling or shaking”). In addition, there are generally two
types of stopping rules for a CAT: (a) a fixed number of admin-
istered items or (b) a prespecified value of measurement precision
(SE). Because this study set out to find both reliable and
shorter measures, we specified that the CAT simulation stopped
applying new items when the latent trait estimate of a patient
reached an SE(θ) ≤ 0.3, comparable to a marginal reliability of .90
(Green, Bock, Humphreys, Linn, & Reckase, 1984). This value of
measurement precision corresponds to the minimal reliability generally required for individual assessments (Bernstein & Nunnally, 1994, p. 265).
When an SE(θ) ≤ 0.3 was not obtained after administering all items,
the CAT simulation stopped.
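Putting decisions (a) through (e) together, the per-patient simulation can be sketched in base R roughly as follows. This is a minimal illustration, not the authors' actual program: a (a vector of discrimination parameters) and b (a matrix with one row of four thresholds per item) are assumed to come from the GRM calibration, and the SE is approximated by adding the unit information of the standard normal prior to the test information.

```r
# Category probabilities P(X = k), k = 1..5, for one GRM item with
# discrimination a and thresholds b (length 4).
grm_probs <- function(theta, a, b) {
  pstar <- plogis(a * (theta - b))           # boundary curves P(X >= k), k = 2..5
  -diff(c(1, pstar, 0))                      # adjacent differences give P(X = k)
}

# Fisher information of one item at theta under the GRM.
item_info <- function(theta, a, b) {
  pstar <- plogis(a * (theta - b))
  dstar <- c(0, a * pstar * (1 - pstar), 0)  # derivatives of the boundary curves
  p <- -diff(c(1, pstar, 0))
  sum((dstar[-length(dstar)] - dstar[-1])^2 / p)
}

# MAP estimate of theta from the responses to the items administered so far.
map_theta <- function(resp, a, b, items) {
  neg_post <- function(th) {
    ll <- sum(mapply(function(j, x) log(grm_probs(th, a[j], b[j, ])[x]),
                     items, resp))
    -(ll + dnorm(th, log = TRUE))            # standard normal prior on theta
  }
  optimize(neg_post, interval = c(-4, 4))$minimum
}

# Post hoc CAT for one patient: start at theta = 0, administer the most
# informative unused item, re-estimate theta, stop at SE <= 0.3 or when all
# items have been used.
simulate_cat <- function(resp_row, a, b, se_stop = 0.3) {
  used <- integer(0); theta <- 0; se <- Inf
  while (length(used) < length(a) && se > se_stop) {
    left <- setdiff(seq_along(a), used)
    info <- vapply(left, function(j) item_info(theta, a[j], b[j, ]), numeric(1))
    used <- c(used, left[which.max(info)])
    theta <- map_theta(resp_row[used], a, b, used)
    se <- 1 / sqrt(1 + sum(vapply(used, function(j) item_info(theta, a[j], b[j, ]),
                                  numeric(1))))  # +1 for the N(0, 1) prior
  }
  list(theta = theta, se = se, n_items = length(used))
}
```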
We split the data randomly into two equally sized datasets for
the simulations: one for estimating the item parameters and one for
simulating the CAT. After all, when one uses the same sample to
estimate the item parameters and to simulate the CAT, the proce-
dure might lead to overfitting (Hastie, Tibshirani, & Friedman,
2011), resulting in outcomes that are too optimistic. Several sta-
tistics were recorded separately for each scale: (a) the mean and
standard deviation of the number of administered items, (b) the
percentage of patients for whom all items had to be administered,
and (c) the mean SE of the final θ estimate for all patients.
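The split and the recorded statistics could then be computed along the following lines (a sketch reusing simulate_cat from above; items is a hypothetical response matrix for one scale, and the seed is arbitrary):

```r
set.seed(123)                                 # arbitrary seed for reproducibility
half  <- sample(nrow(items), nrow(items) %/% 2)
calib <- items[half, ]                        # half for item-parameter estimation
simul <- items[-half, ]                       # half for the CAT simulation

# With a and b estimated from 'calib' (see the GRM calibration sketch):
results <- lapply(seq_len(nrow(simul)), function(i)
  simulate_cat(as.numeric(simul[i, ]), a, b))
n_items <- vapply(results, `[[`, numeric(1), "n_items")
se      <- vapply(results, `[[`, numeric(1), "se")
c(mean_items = mean(n_items), sd_items = sd(n_items),
  pct_all_items = 100 * mean(n_items == ncol(items)), mean_se = mean(se))
```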
Comparing Full-Scale Data With CAT Data
A CAT may be considered efficient when it shows a substantial
decline in administered items compared with the full item bank
administration, and outcomes with sufficient reliability. Further-
more, the good psychometric properties of the scale have to be
retained, such as sufficient criterion validity for diagnostic status
of the patient. This was investigated by comparing CAT outcomes
to the full-scale outcomes of the questionnaire.
We performed two analyses to assess whether the CAT scores
show sufficient similarity with the full MASQ scale scores. In the
first analysis, we assessed whether the CAT outcomes are similar
to the full MASQ scales. The CAT estimates were compared for
each MASQ scale with the full-scale estimates (PA, 22 item
scores; NA, 20 item scores; SA, 17 item scores), using Pearson
correlations and scatterplots. Furthermore, we assessed the size of
difference between the outcomes expressed as Cohen’s d (using
pooled SDs of the CAT and the full MASQ scale estimates). Cohen’s d was evaluated using the guidelines proposed by Cohen (1988): 0.2 = small effect, 0.5 = medium effect, 0.8 = large effect.
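These agreement checks are a few lines of R (a minimal sketch; theta_cat and theta_full are hypothetical vectors holding the CAT and full-scale θ estimates for the simulation sample):

```r
# Agreement between CAT and full-scale trait estimates.
cor(theta_cat, theta_full)                    # Pearson correlation
plot(theta_full, theta_cat,                   # scatterplot of the two estimates
     xlab = "Full-scale theta", ylab = "CAT theta")

# Cohen's d for the mean difference, using the pooled SD of both sets of scores.
pooled_sd <- sqrt((var(theta_cat) + var(theta_full)) / 2)
(mean(theta_cat) - mean(theta_full)) / pooled_sd  # 0.2 small, 0.5 medium, 0.8 large
```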
In the second analysis, we assessed whether the predictive utility
(i.e., criterion validity; McDonald, 1999) of the CATs was similar
to that of the full MASQ scales. We formed three patient classi-
fications based on the MINI-plus diagnosis (Sheehan et al., 1998):
(a) a mood disorder or no disorder, (b) an anxiety disorder or no
disorder, and (c) a comorbid mood and anxiety disorder or no
disorder. We then assessed whether the CAT simulation scores and
the full MASQ scale scores could predict the patients’ classifications.

References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
R Core Team. (2014). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing.