
Reevaluating the Efficacy and Predictability of Antidepressant Treatments: A Symptom Clustering Approach

01 Apr 2017-JAMA Psychiatry (American Medical Association)-Vol. 74, Iss: 4, pp 370-378
TL;DR: Antidepressants in general were more effective for core emotional symptoms than for sleep or atypical symptoms and two common checklists used to measure depressive severity can produce statistically reliable clusters of symptoms.
Abstract: Importance Depressive severity is typically measured according to total scores on questionnaires that include a diverse range of symptoms despite convincing evidence that depression is not a unitary construct. When evaluated according to aggregate measurements, treatment efficacy is generally modest and differences in efficacy between antidepressant therapies are small. Objectives To determine the efficacy of antidepressant treatments on empirically defined groups of symptoms and examine the replicability of these groups. Design, Setting, and Participants Patient-reported data on patients with depression from the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial (n = 4039) were used to identify clusters of symptoms in a depressive symptom checklist. The findings were then replicated using the Combining Medications to Enhance Depression Outcomes (CO-MED) trial (n = 640). Mixed-effects regression analysis was then performed to determine whether observed symptom clusters have differential response trajectories using intent-to-treat data from both trials (n = 4706) along with 7 additional placebo and active-comparator phase 3 trials of duloxetine (n = 2515). Finally, outcomes for each cluster were estimated separately using machine-learning approaches. The study was conducted from October 28, 2014, to May 19, 2016. Main Outcomes and Measures Twelve items from the self-reported Quick Inventory of Depressive Symptomatology (QIDS-SR) scale and 14 items from the clinician-rated Hamilton Depression (HAM-D) rating scale. Higher scores on the measures indicate greater severity of the symptoms. Results Of the 4706 patients included in the first analysis, 1722 (36.6%) were male; mean (SD) age was 41.2 (13.3) years. Of the 2515 patients included in the second analysis, 855 (34.0%) were male; mean age was 42.65 (12.17) years. Three symptom clusters in the QIDS-SR scale were identified at baseline in STAR*D. 
This 3-cluster solution was replicated in CO-MED and was similar for the HAM-D scale. Antidepressants in general (8 of 9 treatments) were more effective for core emotional symptoms than for sleep or atypical symptoms. Differences in efficacy between drugs were often greater than the difference in efficacy between treatments and placebo. For example, high-dose duloxetine outperformed escitalopram in treating core emotional symptoms (effect size, 2.3 HAM-D points during 8 weeks, 95% CI, 1.6 to 3.1; P < .001), but escitalopram was not significantly different from placebo (effect size, 0.03 HAM-D points; 95% CI, −0.7 to 0.8; P = .94). Conclusions and Relevance Two common checklists used to measure depressive severity can produce statistically reliable clusters of symptoms. These clusters differ in their responsiveness to treatment both within and across different antidepressant medications. Selecting the best drug for a given cluster may have a bigger benefit than that gained by use of an active compound vs a placebo.


Copyright 2017 American Medical Association. All rights reserved.
Reevaluating the Efficacy and Predictability
of Antidepressant Treatments
A Symptom Clustering Approach
Adam M. Chekroud, MSc; Ralitza Gueorguieva, PhD; Harlan M. Krumholz, MD, SM; Madhukar H. Trivedi, MD;
John H. Krystal, MD; Gregory McCarthy, PhD
IMPORTANCE
Depressive severity is typically measured according to total scores on
questionnaires that include a diverse range of symptoms despite convincing evidence that
depression is not a unitary construct. When evaluated according to aggregate measurements,
treatment efficacy is generally modest and differences in efficacy between antidepressant
therapies are small.
OBJECTIVES To determine the efficacy of antidepressant treatments on empirically defined
groups of symptoms and examine the replicability of these groups.
DESIGN, SETTING, AND PARTICIPANTS Patient-reported data on patients with depression from
the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial (n = 4039) were
used to identify clusters of symptoms in a depressive symptom checklist. The findings were
then replicated using the Combining Medications to Enhance Depression Outcomes
(CO-MED) trial (n = 640). Mixed-effects regression analysis was then performed to
determine whether observed symptom clusters have differential response trajectories using
intent-to-treat data from both trials (n = 4706) along with 7 additional placebo and
active-comparator phase 3 trials of duloxetine (n = 2515). Finally, outcomes for each cluster
were estimated separately using machine-learning approaches. The study was conducted
from October 28, 2014, to May 19, 2016.
MAIN OUTCOMES AND MEASURES Twelve items from the self-reported Quick Inventory of
Depressive Symptomatology (QIDS-SR) scale and 14 items from the clinician-rated Hamilton
Depression (HAM-D) rating scale. Higher scores on the measures indicate greater severity of
the symptoms.
RESULTS Of the 4706 patients included in the first analysis, 1722 (36.6%) were male; mean
(SD) age was 41.2 (13.3) years. Of the 2515 patients included in the second analysis, 855
(34.0%) were male; mean age was 42.65 (12.17) years. Three symptom clusters in the
QIDS-SR scale were identified at baseline in STAR*D. This 3-cluster solution was replicated in
CO-MED and was similar for the HAM-D scale. Antidepressants in general (8 of 9 treatments)
were more effective for core emotional symptoms than for sleep or atypical symptoms.
Differences in efficacy between drugs were often greater than the difference in efficacy
between treatments and placebo. For example, high-dose duloxetine outperformed
escitalopram in treating core emotional symptoms (effect size, 2.3 HAM-D points during 8
weeks, 95% CI, 1.6 to 3.1; P < .001), but escitalopram was not significantly different from
placebo (effect size, 0.03 HAM-D points; 95% CI, −0.7 to 0.8; P = .94).
CONCLUSIONS AND RELEVANCE Two common checklists used to measure depressive severity
can produce statistically reliable clusters of symptoms. These clusters differ in their
responsiveness to treatment both within and across different antidepressant medications.
Selecting the best drug for a given cluster may have a bigger benefit than that gained by use
of an active compound vs a placebo.
JAMA Psychiatry. 2017;74(4):370-378. doi:10.1001/jamapsychiatry.2017.0025
Published online February 22, 2017.
Author Affiliations: Author
affiliations are listed at the end of this
article.
Corresponding Author: Adam M.
Chekroud, MSc, Department of
Psychology, Yale University, 2
Hillhouse Ave, New Haven, CT 06511
(adam.chekroud@yale.edu).
Research
JAMA Psychiatry | Original Investigation
Downloaded From: https://jamanetwork.com/ on 08/26/2022

Meta-analyses[1] and factor analytic studies of large populations with depression[2,3] indicate that the symptoms of major depressive disorder are organized into 2 to 5 clusters depending on the checklist used. Nevertheless, clinical trials of patients with depression nearly always report total symptom severity scores as their primary outcome measures. These studies also frequently report the proportion of patients whose total symptom severity falls below a certain threshold and thus achieve clinical response or remission.[4] Few patients reach remission with their initial treatment, although depression eventually remits in most patients after a largely trial-and-error treatment selection process.[5] Statistical models might improve clinical outcomes by accelerating the treatment matching process. Despite concerted efforts using genomic data,[6] structural and functional magnetic resonance imaging,[7] and machine learning of clinical data,[8] performance in predicting outcomes remains modest.[9,10]
Heterogeneity among depressive symptoms may impede the evaluation of treatments for depression.[11,12] For example, treatment efficacy for one group of symptoms may be masked by a lack of efficacy for other symptoms, potentially explaining mixed results from large comparative efficacy meta-analyses.[4,13] For example, selective serotonin reuptake inhibitors are generally effective in reducing low mood[14] relative to other symptoms. However, evaluating outcomes on an individual symptom level may be cumbersome since clinicians would need to remember treatment guidelines specific to each symptom. Although symptoms might be grouped based on clinical experience (eg, “melancholic depression”)[15] or the use of rating subscales (eg, Hamilton Rating Scale for Depression–7), novel associations might be overlooked by this process.
Statistical methods enable one to categorize depressive symptoms into subcomponents. For example, one study showed that nortriptyline hydrochloride was more effective than escitalopram in treating a neurovegetative symptom dimension, but escitalopram was more effective in treating mood and cognitive symptom dimensions.[16] However, traditional statistical approaches have some shortcomings. Factor analyses, for example, may generate complicated combinations of symptoms within particular dimensions.[16] These analyses also may be susceptible to experimenter bias since one often has to choose the desired number of clusters or components in the data, as in k-means clustering.[17] By contrast, hierarchical clustering is an easy-to-visualize, deterministic method in which each symptom is assigned to a single cluster (ie, not loading across multiple clusters) without prespecifying the desired number of clusters.
In this study, we explored the efficacy and predictability of antidepressant therapies in treating specific groups of symptoms (eMethods [which includes eTables 1-10 of various analyses] and eFigure 1 in the Supplement). We used an unsupervised machine-learning approach (hierarchical clustering) to establish a data-driven grouping of baseline symptoms. The clustering method was applied to patients from a large multisite trial of depression and a replication sample from an independent clinical trial with similar inclusion criteria. Next, we reanalyzed treatment outcomes for 9 archival clinical trials (Table 1) according to the severity of each symptom cluster (rather than total severity) to determine whether symptom clusters are equally responsive to antidepressant treatments and whether certain drugs and doses are more effective than others. Finally, we used supervised machine learning to predict outcomes specific to each cluster of symptoms since there may be good clinical or biological indicators of changes in some symptoms that do not correlate strongly with changes in other features of depression.
Methods
Clinical Trial Data
The Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial is the largest prospective, randomized clinical trial of outpatients with major depressive disorder.[18-21] Eligible participants were treatment-seeking outpatients who had a primary clinical (DSM-IV) diagnosis of nonpsychotic major depressive disorder, scored 14 or higher on the 17-item Hamilton Depression (HAM-D) rating scale, were aged 18 to 75 years, and were recruited from primary and psychiatric care settings in the United States from June 2001 to April 2004.[19] We focused on the first treatment stage consisting of a 12-week course of citalopram hydrobromide. The present study was conducted from October 28, 2014, to May 19, 2016. It was approved by the Yale University Human Subjects Committee, with a waiver of informed consent.
The Combining Medications to Enhance Depression Outcomes (CO-MED) trial was a multisite, single-blind, randomized clinical trial comparing the efficacy of medication combinations in the treatment of unipolar major depressive disorder.[22,23] Eligible patients were aged 18 to 75 years, had a primary DSM-IV–based diagnosis of nonpsychotic major depressive disorder, had recurrent or chronic depression (current episode ≥2 years), and scored 16 or higher on the 17-item HAM-D rating scale. The trial enrolled participants between March 2008 and February 2009. Patients were randomly allocated (1:1:1) to escitalopram plus placebo (monotherapy), escitalopram plus bupropion hydrochloride, or venlafaxine hydrochloride plus mirtazapine.
We also analyzed all arms from 7 randomized, multicenter, double-blind, placebo-controlled, and active comparator-controlled clinical trials of duloxetine for major depressive disorder (Table 1). Four different protocols were used for these studies; parts A and B reflect trials run in parallel following the same protocol. All studies incorporated double-blind, variable-duration placebo lead-in periods. Safety and efficacy results from these studies have been published previously[24-27] and summarized as pooled analyses of safety[28] and efficacy.[29] Study HMCR is registered at clinicaltrials.gov.[30] The other studies were conducted before clinical trial registration was necessary.
Key Points
Question Are antidepressants equally good at treating different kinds of symptoms in depression?
Findings Individual patient data from 9 clinical trials of major depression in 7221 patients were analyzed, with a focus on specific clusters of symptoms rather than total depressive severity. For each cluster, significant differences in efficacy between antidepressants were identified.
Meaning Antidepressant medications can be selected to benefit specific clusters of symptoms in depression.
Outcomes for STAR*D and CO-MED are based on the 16-item self-report Quick Inventory of Depressive Symptomatology (QIDS-SR) checklist during 12 weeks of treatment. Outcomes for all other trials are based on the 17-item HAM-D rating scale[31] during 8 weeks. We excluded the HAM-D “loss of insight” item because there is no equivalent in the QIDS-SR and excluded weight/appetite items because they were not collected in the same way across trials and are often excluded from item-level analyses[32] (eFigure 2 in the Supplement). Study selection was driven primarily by access to individual patient-level data. Patients provided informed consent to treatment when they participated in the original clinical trials. Consent was not needed for the present analyses since the data were deidentified. Of the 4706 patients included in the first analysis, 1722 (36.6%) were male; mean (SD) age was 41.2 (13.3) years. Of the 2515 patients included in the second analysis, 855 (34.0%) were male; mean age was 42.65 (12.17) years.
Symptom Clustering
Rating scales in depression include a diverse range of symptoms. We applied a data-driven approach to identify groups of symptoms within depression rating scales. Higher scores on the rating scales indicate more severe symptoms. Hierarchical clustering shows structure in data without making assumptions about the number of clusters that are present in the data and gives a deterministic solution. We applied agglomerative (bottom-up) hierarchical clustering to the QIDS-SR checklist completed at baseline in STAR*D by 4017 patients and replicated the analysis using baseline QIDS-SR data from CO-MED (n = 640) and the baseline HAM-D scale that was also collected on 4039 patients in STAR*D. We conducted multiple sensitivity analyses using alternative approaches (eFigures 3-9 and eTables 1-3 in the Supplement).
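As a rough illustration of the agglomerative clustering described above, the following Python sketch clusters checklist items rather than patients. It is not the authors' R code: the simulated data, the correlation-based distance, the average-linkage rule, and the item layout are all assumptions made for the example, and the tree is cut at 3 clusters only to display the result.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def cluster_items(scores: np.ndarray, n_clusters: int) -> np.ndarray:
    """Agglomerative clustering of checklist items (columns of `scores`).

    Assumptions for this sketch: distance = 1 - Pearson correlation between
    items, merged with average linkage. The source states neither choice.
    """
    corr = np.corrcoef(scores, rowvar=False)      # item-by-item correlations
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)                   # remove floating-point residue
    tree = linkage(squareform(dist, checks=False), method="average")
    # Cutting the tree is a display choice; building the dendrogram needs no k.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Simulated baseline data: 500 patients, 9 hypothetical items driven by
# 3 independent latent factors (3 items per factor).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
items = np.repeat(latent, 3, axis=1) + 0.5 * rng.normal(size=(500, 9))

labels = cluster_items(items, n_clusters=3)
print(labels)  # items 0-2, 3-5, and 6-8 should each share a label
```

Because every merge is determined by the distance matrix, rerunning the procedure on the same data returns the same tree, which is the deterministic property noted in the text.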
Table 1. Individual Patient-Level Data Aggregated From 9 Trials of Antidepressant Efficacy for Unipolar Major Depression

| Protocol | Sample Size (N = 7221) | Treatment | Dose |
| --- | --- | --- | --- |
| STAR*D phase 1 | 4041 | Citalopram | 20-60 mg once daily |
| CO-MED | 224 | Escitalopram plus placebo | 10-20 mg once daily |
| | 221 | Escitalopram plus bupropion extended release | Escitalopram, 10-20 mg once daily; bupropion extended release, 150-200 mg twice daily |
| | 220 | Venlafaxine extended release plus mirtazapine | Venlafaxine extended release, 37.5-300 mg once daily; mirtazapine, 15-45 mg once daily |
| HMAQ part A | 70 | Duloxetine | 20-60 mg twice daily |
| | 33 | Fluoxetine | 20 mg once daily |
| | 70 | Placebo | NA |
| HMAQ part B | 82 | Duloxetine | 20-60 mg twice daily |
| | 37 | Fluoxetine | 20 mg once daily |
| | 74 | Placebo | NA |
| HMAT part A | 91 | Duloxetine | 20 mg twice daily |
| | 84 | Duloxetine | 40 mg twice daily |
| | 89 | Paroxetine | 20 mg once daily |
| | 90 | Placebo | NA |
| HMAT part B | 86 | Duloxetine | 20 mg twice daily |
| | 91 | Duloxetine | 40 mg twice daily |
| | 87 | Paroxetine | 20 mg once daily |
| | 89 | Placebo | NA |
| HMAY part A | 95 | Duloxetine | 40 mg twice daily |
| | 93 | Duloxetine | 60 mg twice daily |
| | 86 | Paroxetine | 20 mg once daily |
| | 93 | Placebo | NA |
| HMAY part B | 93 | Duloxetine | 40 mg twice daily |
| | 103 | Duloxetine | 60 mg twice daily |
| | 97 | Paroxetine | 20 mg once daily |
| | 99 | Placebo | NA |
| HMCR | 273 | Duloxetine | 60 mg once daily |
| | 273 | Escitalopram | 10 mg once daily |
| | 137 | Placebo | NA |

Abbreviations: CO-MED, Combining Medications to Enhance Depression Outcomes; NA, not applicable; STAR*D, Sequenced Treatment Alternatives to Relieve Depression.
Evaluation of Treatment Outcomes
Treatment Efficacy
We analyzed the full intent-to-treat samples in all trials using linear mixed-effects regression models (STAR*D, 4041; CO-MED, 665; and other trials, 2515). The dependent measure was mean within-cluster severity: for each patient at each time point, we calculated the mean symptom severity within each cluster. Fixed effects included symptom cluster, time (log-transformed weeks), treatment regimen, and all 2- and 3-way interaction effects. We included a separate random intercept and slope for each symptom cluster with unstructured variance-covariance of the random effects within subject based on improvements in the Schwarz-Bayesian information criterion.[33] False-discovery rate-adjusted[34] P values were used to determine statistical significance for post hoc comparisons by cluster and drug within each mixed-model analysis.
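The dependent measure can be illustrated with a short Python sketch. The item names, the item-to-cluster mapping, and the scores below are hypothetical placeholders, not the cluster assignments or data from the study.

```python
from statistics import mean

# Hypothetical item-to-cluster mapping, for illustration only.
CLUSTERS = {
    "sleep": ["sleep_onset_insomnia", "midnocturnal_insomnia"],
    "core_emotional": ["mood_sad", "loss_of_interest", "worthlessness"],
}


def within_cluster_severity(item_scores: dict[str, float]) -> dict[str, float]:
    """Mean item severity per cluster for one patient at one time point."""
    return {
        cluster: mean(item_scores[item] for item in items)
        for cluster, items in CLUSTERS.items()
    }


# One made-up patient visit (higher scores = more severe).
visit = {
    "sleep_onset_insomnia": 3,
    "midnocturnal_insomnia": 1,
    "mood_sad": 2,
    "loss_of_interest": 2,
    "worthlessness": 1,
}
severity = within_cluster_severity(visit)
print(severity)
```

Averaging rather than summing keeps clusters that contain different numbers of items on the same per-item scale.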
One model was used to analyze QIDS-SR–based clusters across STAR*D and CO-MED, and another model was used to analyze HAM-D–based clusters for the 7 other placebo-controlled trials. In the HAM-D model, we also included the main effect of the trial to control for potential systematic differences among trials. Preliminary analyses of the 4 duloxetine doses in each cluster indicated that 120-mg/d and 80-mg/d dosages were not significantly different from each other but differed from the lower doses and placebo (eResults and eFigure 10 in the Supplement). The 60-mg/d and 40-mg/d duloxetine dosages were similar to each other and nearly indistinguishable from placebo. We therefore grouped cohorts into high-dose duloxetine (80-120 mg/d) and low-dose duloxetine (40-60 mg/d).
Outcome Predictability
We used a recently developed statistical modeling pipeline[8] to predict treatment outcomes specific to each symptom cluster using information available at baseline. We extracted 164 items, including demographics, medical and psychiatric histories, and specific symptom items that were used as predictor variables (eTable 10 in the Supplement). Penalized logistic regression (elastic net[35,36]) was then used to identify the 25 variables that best predicted each cluster separately. These variables were then used to train machine-learning algorithms (gradient boosting machines[37,38]), resulting in a separate model for each symptom cluster, with each using 25 predictor variables. Predictability was measured as the percentage of variance explained in final cluster scores (ie, R²) using 5 repeats of 10-fold cross-validation. The statistical significance of each model was assessed using a permutation test (eMethods in the Supplement). We trained models on patients with complete baseline data for whom a severity score was recorded after 12 or more weeks of treatment (n = 1962) to ensure adequate treatment duration. To externally validate our predictive models, we applied them without modification to predict final cluster scores in CO-MED treatment completers. Here, statistical significance was measured by a P value calculated for Pearson correlations between predicted outcomes and observed outcomes in each treatment group of CO-MED. We did not have comparable predictor data in the duloxetine trials; thus, predictive analyses were conducted only for STAR*D and CO-MED. For significance, permutation-based tests used an α level of .01, mixed-effects regressions used a false-discovery rate correction and then an α level of .05, and Pearson correlations used an α level of .05.
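The evaluation protocol (R² from 5 repeats of 10-fold cross-validation) can be sketched in Python as follows. To stay self-contained, the sketch substitutes ordinary least squares for the elastic net and gradient boosting steps and uses simulated data; the function names and parameters are illustrative assumptions rather than the published pipeline.

```python
import numpy as np


def ols_fit_predict(x_train, y_train, x_test):
    """Stand-in model: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef


def repeated_kfold_r2(x, y, fit_predict, n_splits=10, n_repeats=5, seed=0):
    """Return one out-of-fold R^2 per fold (n_splits * n_repeats values)."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        order = rng.permutation(len(y))            # reshuffle for each repeat
        for test_idx in np.array_split(order, n_splits):
            train_idx = np.setdiff1d(order, test_idx)
            pred = fit_predict(x[train_idx], y[train_idx], x[test_idx])
            ss_res = np.sum((y[test_idx] - pred) ** 2)
            ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
            scores.append(1.0 - ss_res / ss_tot)
    return np.array(scores)


# Simulated data: 200 patients, 3 baseline predictors, noisy linear outcome.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)

scores = repeated_kfold_r2(x, y, ols_fit_predict)
print(f"mean R^2 = {scores.mean():.3f} (SD {scores.std(ddof=1):.3f}, n = {len(scores)})")
```

Any model that exposes the same fit-and-predict signature could be passed in place of `ols_fit_predict`; the mean and SD across the 50 folds correspond to the kind of summary reported for each cluster.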
Predictive and clustering analyses were implemented in R, version 3.2.3 (R Foundation). Efficacy analyses were conducted using SAS, version 9.4 (proc mixed) (SAS Institute).
Results
In 2 independent trials, we identified the same clustering of symptoms in the QIDS-SR checklist, consisting of core emotional, sleep (insomnia), and atypical symptoms (Figure 1A and B). A similar clustering solution was also found for the HAM-D scale checklist (Figure 1C). The clustering solution was robust across a number of sensitivity analyses using different parameters, time points, and approaches (eFigures 3-9 and eTables 1-3 in the Supplement).
Efficacy Analyses
Treatment efficacy was measured according to the rate of symptom improvement over time (ie, steeper symptom trajectories are better, as shown in Figure 2). No antidepressant treatment worked equally well across all 3 symptom clusters. As shown in Figure 2A, when measured according to the QIDS-SR, trajectories were significantly better for core emotional symptoms than for either sleep symptoms or atypical symptoms for citalopram, escitalopram with placebo, and escitalopram with bupropion (all β > 0.079; all false-discovery rate corrected P < .001). Sleep trajectories were also better than atypical trajectories for these 3 treatments (all β > 0.099; all P ≤ .001). As shown in Figure 2B, when measured according to the HAM-D rating scale, a similar pattern was observed. Core emotional trajectories were better than sleep and atypical trajectories for all treatments (all β > 0.12; all P ≤ .001). Sleep trajectories were also better than atypical trajectories for low-dose duloxetine and escitalopram (all β > 0.080; all P ≤ .001). All slope contrast estimates, SEs, 95% CIs, and P values are included in eTables 4 and 5 in the Supplement.
To interpret the magnitude of differences between drugs, we calculated an effect size (ES), measured in raw rating scale points, that reflects the difference between treatments in reducing the overall severity of a symptom cluster (ie, we multiplied slope contrasts by the natural log of treatment duration and then by the number of symptoms in each cluster). For example, in this study, high-dose duloxetine was significantly better than escitalopram in treating atypical symptoms, such that a patient’s total improvement in atypical severity was a mean of 1.9 HAM-D points greater with high-dose duloxetine than escitalopram (ES, 1.9; 95% CI, 1.4-2.3; false-discovery rate corrected P < .001).
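The effect size arithmetic described above can be written out directly. In this Python sketch the slope contrast and the item count are made-up inputs chosen only to show the calculation; they are not values reported in the study.

```python
import math


def effect_size(slope_contrast: float, weeks: float, n_items: int) -> float:
    """Slope contrast x ln(treatment duration in weeks) x items in the cluster."""
    return slope_contrast * math.log(weeks) * n_items


# Hypothetical inputs: contrast of 0.10 per log-week, 8 weeks, 5 items.
es = effect_size(slope_contrast=0.10, weeks=8, n_items=5)
print(round(es, 2))  # 1.04
```

Multiplying by the number of items converts the per-item mean used in the mixed models back into total rating scale points for the cluster.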
For each symptom cluster, there were significant differences in efficacy between treatments (Figure 2). Combined escitalopram and bupropion treatment was significantly more effective in treating core emotional symptoms than citalopram (ES, 0.7 QIDS-SR points; 95% CI, 0.2 to 1.3; P = .03). For sleep/insomnia symptoms, venlafaxine with mirtazapine outperformed citalopram (ES, 1.4; 95% CI, 1.0 to 1.8; P < .001). For core emotional symptoms in HAM-D scale trials (Figure 2B), high-dose duloxetine outperformed escitalopram (ES, 2.3 HAM-D points; 95% CI, 1.6 to 3.1; P < .001). Escitalopram was not significantly different from placebo for core emotional symptoms (ES, 0.03 HAM-D points; 95% CI, −0.7 to 0.8; P = .94). For sleep symptoms, high-dose duloxetine outperformed fluoxetine (ES, 0.9; 95% CI, 0.1 to 1.7; P = .046). For atypical symptoms, high-dose duloxetine outperformed all others (ES, 0.5-1.9) and escitalopram was worse than placebo (ES, 0.7; 95% CI, 0.3 to 1.1; P = .002). Among our HAM-D studies, only 2 antidepressant treatments (high-dose duloxetine and paroxetine) outperformed placebo for all 3 symptom clusters. All other comparisons are presented in eTables 6 and 7 in the Supplement.
Predictive Analyses
Within STAR*D, although all models performed significantly above chance (all P < .01), we observed substantial variability in the predictability of outcomes for each cluster (Table 2 and eTable 8 in the Supplement). The sleep symptom cluster was the most predictable (R² = 19.6%; SD, 5.0%; P < .01) and substantially more predictable than core symptoms (R² = 14.5%; SD, 4.6%; P < .01) and atypical symptoms (R² = 15.1%; SD, 5.3%; P < .01). The observed range in cluster predictability (R² difference, 5.1%) was also significantly larger than any range observed during permutation testing (mean [SD] range, 0.56% [0.50%]; P < .01). We inspected the best predictive baseline variables for each model separately, highlighting those identified as predictive for 1 cluster but not others (ie, specific predictors) (Table 2). Baseline HAM-D scale severity was a top predictor of core emotional outcomes but not of outcomes in the other 2 clusters. Baseline atypical symptom severity and hypersomnia predicted atypical outcomes; baseline sleep cluster severity and early-morning insomnia predicted sleep outcomes.
We then applied the best-performing models, without modification, to predict outcomes for each cluster in the 3 treatment groups of CO-MED (Figure 3). Performance was statistically above chance, although clinically modest, for predicting core emotional outcomes in the escitalopram monotherapy arm (r₁₄₉ = 0.18; P = .03) and the venlafaxine-mirtazapine arm (r₁₃₈ = 0.17; P = .04). Performance was above chance for predicting sleep outcomes in the escitalopram-bupropion arm (r₁₃₂ = 0.36; P < .001).
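As a consistency check on the correlations above, a Pearson r can be converted to a two-sided P value through the t distribution. The Python sketch below assumes that the subscript on each r is its degrees of freedom (n − 2), which the text does not state explicitly.

```python
import math

from scipy import stats


def pearson_p_value(r: float, df: int) -> float:
    """Two-sided P value for a Pearson correlation with `df` degrees of freedom."""
    t_stat = r * math.sqrt(df / (1.0 - r * r))
    return 2.0 * stats.t.sf(abs(t_stat), df)


# Reported correlations, with subscripts read as degrees of freedom (assumption).
for arm, r, df in [
    ("escitalopram monotherapy, core emotional", 0.18, 149),
    ("venlafaxine-mirtazapine, core emotional", 0.17, 138),
    ("escitalopram-bupropion, sleep", 0.36, 132),
]:
    print(f"{arm}: P = {pearson_p_value(r, df):.3f}")
```

Under that assumption, the computed values should agree with the reported P values after rounding, although small differences are possible because the correlations themselves are rounded to 2 decimal places.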
Clinical Decision Support Tool
To help translate these findings into clinical practice, we built a clinical decision support tool based on them. It is implemented as a brief questionnaire that can be accessed from any web browser and returns results in real time (https://www.spring.care/spring-assessment).
Discussion
Using a data-driven approach, we identified 3 symptom clusters within the QIDS-SR checklist. We replicated our clustering solution in an independent trial cohort (CO-MED) and found it to be robust across different parameters and time points and consistent with other statistical approaches. No antidepressant was equally effective for all 3 symptom clusters, and, for each symptom cluster, there were significant differences in treatment efficacy between drugs. Antidepressants in general worked best in treating core emotional and sleep symptoms and were less effective in treating atypical symptoms. The magnitude of these differences suggests that selecting the best drug for a given cluster may have a bigger benefit than that
Figure 1. Data-Driven Decomposition of Depressive Checklists Using Hierarchical Clustering
A, QIDS-SR in STAR*D. B, QIDS-SR in CO-MED. C, HAM-D in STAR*D. Each panel labels 3 clusters (sleep [insomnia], core emotional, and atypical) and also carries the label Total QIDS-SR Severity (A and B) or Total HAM-D Severity (C).
QIDS-SR items (A and B): midnocturnal insomnia, sleep-onset insomnia, early morning insomnia, energy/fatigability, concentration/decision making, loss of interest, mood (sad), feelings of worthlessness, psychomotor agitation, psychomotor slowing, suicidal ideation, and hypersomnia.
HAM-D items (C): reduced libido, psychomotor slowing, suicide, psychomotor agitation, hypochondriasis, energy/fatigability, midnocturnal insomnia, sleep-onset insomnia, early morning insomnia, somatic anxiety, psychological anxiety, guilt and delusions, loss of interest, and mood (sad).
This procedure sequentially groups symptoms according to the similarity of their responses across a patient cohort. With this procedure, groups of symptoms that merge at high values relative to the merge points of their subgroups are considered candidates for natural clusters. A and B, In the Quick Inventory of Depressive Symptomatology–Self Report (QIDS-SR) checklist, we identified an identical 3-cluster solution in both the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) (n = 4017) and Combining Medications to Enhance Depression Outcomes (CO-MED) trials (n = 640). C, A comparable symptom structure was also observed at baseline for STAR*D patients when measured according to the Hamilton Depression (HAM-D) rating scale. The names of the individual checklist items are colored according to their cluster assignment. Line lengths in the dendrogram reflect how similar items or clusters are to one another (shorter line length indicates greater similarity).

Citations
TL;DR: The Personalized Advantage Index (PAI) and related approaches combine information obtained prior to the initiation of treatment into multivariable prediction models that can generate individualized predictions to help clinicians and patients select the right treatment.
Abstract: Mental health researchers and clinicians have long sought answers to the question "What works for whom?" The goal of precision medicine is to provide evidence-based answers to this question. Treatment selection in depression aims to help each individual receive the treatment, among the available options, that is most likely to lead to a positive outcome for them. Although patient variables that are predictive of response to treatment have been identified, this knowledge has not yet translated into real-world treatment recommendations. The Personalized Advantage Index (PAI) and related approaches combine information obtained prior to the initiation of treatment into multivariable prediction models that can generate individualized predictions to help clinicians and patients select the right treatment. With increasing availability of advanced statistical modeling approaches, as well as novel predictive variables and big data, treatment selection models promise to contribute to improved outcomes in depression.

261 citations


Cites background from "Reevaluating the Efficacy and Predi..."

  • ...Recent multivariable modeling efforts (Chekroud et al. 2017) highlight the potential for these advanced approaches to improve prognostic prediction in mental health (see Gillan & Whelan 2017 for an extensive review)....


Journal ArticleDOI
TL;DR: An overview of AI and current applications in healthcare, a review of recent original research on AI specific to mental health, and a discussion of how AI can supplement clinical practice while considering its current limitations, areas needing additional research, and ethical implications regarding AI technology are provided.
Abstract: Artificial intelligence (AI) technology holds both great promise to transform mental healthcare and potential pitfalls. This article provides an overview of AI and current applications in healthcare, a review of recent original research on AI specific to mental health, and a discussion of how AI can supplement clinical practice while considering its current limitations, areas needing additional research, and ethical implications regarding AI technology. We reviewed 28 studies of AI and mental health that used electronic health records (EHRs), mood rating scales, brain imaging data, novel monitoring systems (e.g., smartphone, video), and social media platforms to predict, classify, or subgroup mental health illnesses including depression, schizophrenia or other psychiatric illnesses, and suicide ideation and attempts. Collectively, these studies revealed high accuracies and provided excellent examples of AI’s potential in mental healthcare, but most should be considered early proof-of-concept works demonstrating the potential of using machine learning (ML) algorithms to address mental health questions, and which types of algorithms yield the best performance. As AI techniques continue to be refined and improved, it will be possible to help mental health practitioners re-define mental illnesses more objectively than currently done in the DSM-5, identify these illnesses at an earlier or prodromal stage when interventions may be more effective, and personalize treatments based on an individual’s unique characteristics. However, caution is necessary in order to avoid over-interpreting preliminary results, and more work is required to bridge the gap between AI in mental health research and clinical care.

218 citations

Journal ArticleDOI
TL;DR: It is argued that the evidence for the existence of the distinct resting state connectivity-based subtypes of depression should be interpreted with caution.

213 citations


Additional excerpts

  • ...Many studies have used data-driven clustering methods in order to find new subgroups of clinical populations, based on either clinical or biological data, with some degree of success (Marquand et al., 2016; Chekroud et al., 2017)....


Journal ArticleDOI
TL;DR: It is illustrated that such biological dysregulations map more consistently to atypical behavioral symptoms reflecting altered energy intake/expenditure balance and may moderate the antidepressant effects of standard or novel therapeutic approaches.

174 citations

Journal ArticleDOI
TL;DR: Machine learning algorithms provide a powerful conceptual and analytic framework capable of integrating multiple data types and sources and may more effectively model neurobiological components as functional modules of pathophysiology embedded within the complex, social dynamics that influence the phenomenology of mental disorders.

159 citations

References
Journal ArticleDOI
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses (the false discovery rate), which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Abstract: SUMMARY The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

83,420 citations
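
The sequential procedure mentioned in this abstract is commonly implemented as a step-up rule: sort the m p-values, find the largest rank k with p_(k) <= (k / m) * q, and reject the k smallest. The following is a minimal sketch of that rule; the function name and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Step-up rule: find the largest rank k such that the k-th smallest
    p-value is <= (k / m) * q, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
fdr_rejections = sum(benjamini_hochberg(example_p, q=0.05))
# Bonferroni (FWER control) for comparison: reject only if p <= q / m.
bonferroni_rejections = sum(p <= 0.05 / len(example_p) for p in example_p)
print(fdr_rejections, bonferroni_rejections)
```

On these made-up values the step-up rule rejects more hypotheses than the Bonferroni threshold does, which is the kind of power gain the abstract refers to.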

Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms, and the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.

50,607 citations

Journal ArticleDOI
TL;DR: The present scale has been devised for use only on patients already diagnosed as suffering from affective disorder of depressive type, used for quantifying the results of an interview, and its value depends entirely on the skill of the interviewer in eliciting the necessary information.
Abstract: Types of Rating Scale The value of this one, and its limitations, can best be considered against its background, so it is useful to consider the limitations of the various rating scales extant. They can be classified into four groups, the first of which has been devised for use on normal subjects. Patients suffering from mental disorders score very highly on some of the variables and these high scores serve as a measure of their illness. Such scales can be very useful, but have two defects: many symptoms are not found in normal persons; and less obviously, but more important, there is a qualitative difference between symptoms of mental illness and normal variations of behaviour. The difference between the two is not a philosophical problem but a biological one. There is always a loss of function in illness, with impaired efficiency. Self-rating scales are popular because they are easy to administer. Aside from the notorious unreliability of self-assessment, such scales are of little use for semiliterate patients and are no use for seriously ill patients who are unable to deal with them. Many rating scales for behaviour have been devised for assessing the social adjustment of patients and their behaviour in the hospital ward. They are very useful for their purpose but give little or no information about symptoms. Finally, a number of scales have been devised specifically for rating symptoms of mental illness. They cover the whole range of symptoms, but such all-inclusiveness has its disadvantages. In the first place, it is extremely difficult to differentiate some symptoms, e.g., apathy, retardation, stupor. These three look alike, but they are quite different and appear in different settings. Other symptoms are difficult to define, except in terms of their settings, e.g., mild agitation and derealization. A more serious difficulty lies in the fallacy of naming. 
For example, the term "delusions" covers schizophrenic, depressive, hypochondriacal, and paranoid delusions. They are all quite different and should be clearly distinguished. Another difficulty may be summarized by saying that the weights given to symptoms should not be linear. Thus, in schizophrenia, the amount of anxiety is of no importance, whereas in anxiety states it is fundamental. Again, a schizophrenic patient who has delusions is not necessarily worse than one who has not, but a depressive patient who has, is much worse. Finally, although rating scales are not used for making a diagnosis, they should have some relation to it. Thus the schizophrenic patients should have a high score on schizophrenia and comparatively small scores on other syndromes. In practice, this does not occur. The present scale has been devised for use only on patients already diagnosed as suffering from affective disorder of depressive type. It is used for quantifying the results of an interview, and its value depends entirely on the skill of the interviewer in eliciting the necessary information. The interviewer may, and should, use all information available to help him with his interview and in making the final assessment. The scale has undergone a number of changes since it was first tried out, and although there is room for further improvement, it will be found efficient and simple in use. It has been found to be of great practical value in assessing results of treatment.

29,488 citations

Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS‐EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the

16,538 citations
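
The grouping effect described in this abstract can be seen in a small example. The sketch below fits the penalized least-squares problem by plain coordinate descent, not by the LARS-EN algorithm the paper proposes, and it assumes a glmnet-style penalty lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2) with no intercept; the function names and the toy data are invented for the illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, sweeps=500):
    """Coordinate descent for
    (1 / 2n) * ||y - X b||^2 + lam * (alpha * |b|_1 + (1 - alpha) / 2 * |b|_2^2).

    alpha = 1 gives the lasso; alpha < 1 adds the ridge term.
    Assumes centered data (no intercept is fitted).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta for beta = 0
    for _ in range(sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            # Correlation of column j with the partial residual (beta_j removed).
            rho = sum(v * (r + v * beta[j]) for v, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            residual = [r - v * (new - beta[j]) for v, r in zip(col, residual)]
            beta[j] = new
    return beta


# Two identical (perfectly correlated) predictors; y depends on their shared signal.
x = [-1.5, -0.5, 0.5, 1.5]
X = [[v, v] for v in x]
y = [2 * v for v in x]

lasso_fit = elastic_net(X, y, lam=0.5, alpha=1.0)
enet_fit = elastic_net(X, y, lam=0.5, alpha=0.5)
print("lasso:", lasso_fit)
print("elastic net:", enet_fit)
```

With duplicated predictors the lasso solution is not unique, and this coordinate descent run puts all the weight on the first copy. The ridge term makes the elastic net problem strictly convex, so the two copies receive equal coefficients.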

Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.

14,144 citations
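
The tightness and separation comparison behind a silhouette is usually written, for each point i, as s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance to the other members of the point's own cluster and b(i) is the smallest mean distance to any other cluster. A minimal sketch follows, assuming Euclidean distance and made-up points; the function name is hypothetical.

```python
from math import dist
from statistics import mean


def silhouette_values(points, labels):
    """Silhouette value for each point; values near 1 mean well clustered."""
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other points in the same cluster (tightness).
        same = [
            dist(p, q)
            for j, (q, lab) in enumerate(zip(points, labels))
            if lab == own and j != i
        ]
        if not same:
            values.append(0.0)  # convention for singleton clusters
            continue
        a = mean(same)
        # b: mean distance to the nearest other cluster (separation).
        b = min(
            mean(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in set(labels)
            if other != own
        )
        values.append((b - a) / max(a, b))
    return values


# Two well-separated pairs of points (hypothetical data).
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_values(toy_points, [0, 0, 1, 1])
bad = silhouette_values(toy_points, [0, 1, 0, 1])
print(mean(good), mean(bad))
```

Averaging the values over all points gives a single score for comparing candidate clusterings; a labeling that mixes the two pairs produces negative values.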
