(Open Access) Convergent and incremental predictive validity of clinician, self-report, and structured interview diagnoses for personality disorders over 5 years. (2013) | Douglas B. Samuel

Q: What contributions have the authors mentioned in the paper "Convergent and incremental predictive validity of clinician, self-report, and structured interview diagnoses for personality disorders over 5 years"?

In this paper, the authors showed that PD diagnoses made by treating clinicians agree poorly with semistructured interviews and self-report questionnaires, and that the latter two methods have greater utility than clinicians' PD diagnoses for predicting psychosocial functioning.

Q: What future work have the authors mentioned in the paper "Convergent and incremental predictive validity of clinician, self-report, and structured interview diagnoses for personality disorders over 5 years"?

Future research that examines the incremental predictive validity of clinicians’ diagnoses derived from more structured assessments, such as therapists completing the SWAP (Westen & Shedler, 1999), the Personality Disorder Schedule (Nestadt et al., 2012), or even an informant version of an existing PD questionnaire, would help to address this possibility. The validity of other psychiatric diagnoses provided by clinicians in routine practice also warrants attention in future research. More broadly, the authors’ results suggest that clinicians should use standardized assessment instruments to inform PD diagnoses.


Wesleyan University

From the SelectedWorks of Charles A. Sanislow, Ph.D.

Convergent and Incremental Predictive Validity of Clinician, Self-Report,

and Structured Interview Diagnoses for Personality Disorders Over 5 Years

Douglas B. Samuel

Yale School of Medicine

Charles A. Sanislow

Wesleyan University

Christopher J. Hopwood

Michigan State University

M. Tracie Shea

Veterans Affairs Medical Center, Providence, Rhode Island, and

Alpert Medical School of Brown University

Andrew E. Skodol

The Sunbelt Collaborative, Tucson, Arizona, and University of

Arizona College of Medicine

Leslie C. Morey

Texas A&M University

Emily B. Ansell

Yale School of Medicine

John C. Markowitz

New York State Psychiatric Institute, New York, New York,

and Columbia University College of Physicians and Surgeons

Mary C. Zanarini

Harvard Medical School

Carlos M. Grilo

Yale School of Medicine

Objective: Research has demonstrated poor agreement between clinician-assigned personality disorder (PD) diagnoses and those generated by self-report questionnaires and semistructured diagnostic interviews. No research has compared prospectively the predictive validity of these methods. We investigated the convergence of these 3 diagnostic methods and tested their relative and incremental validity in predicting independent, multimethod assessments of psychosocial functioning performed prospectively over 5 years. Method: Participants were 320 patients in the Collaborative Longitudinal Personality Disorders Study diagnosed with PDs by therapist, self-report, and semistructured interview at baseline. We examined the relative incremental validity of therapists’ naturalistic ratings relative to these other diagnostic methods for predicting psychosocial functioning at 5-year follow-up. Results: Hierarchical linear regression analyses revealed that both the self-report questionnaire and semistructured interview PD diagnoses had significant incremental predictive validity over the PD diagnoses assigned by a treating clinician. Although, in some cases, the clinicians’ ratings for individual PDs did have validity for predicting subsequent functioning, they did not generally provide incremental prediction beyond the other methods. These findings remained robust in a series of analyses restricted to a subsample of therapist ratings based on clinical contact of 1 year or greater. Conclusions: These results from a large clinical sample echo previous research documenting limited agreement between clinicians’ naturalistic PD diagnoses and those from self-report and semistructured interview methods. They extend prior work by providing the first evidence about the relative predictive validity of these different methods. Our findings challenge the validity of naturalistic PD diagnoses and suggest the use of structured diagnostic instruments.

Keywords: personality disorder, semistructured interview, self-report, diagnostic agreement, clinician

This article was published Online First May 6, 2013.

Douglas B. Samuel, Department of Psychiatry, Yale School of Medicine; Charles A. Sanislow, Department of Psychology, Wesleyan University; Christopher J. Hopwood, Department of Psychology, Michigan State University; M. Tracie Shea, Veterans Affairs Medical Center, Providence, Rhode Island, and Department of Psychiatry, Alpert Medical School of Brown University; Andrew E. Skodol, The Sunbelt Collaborative, Tucson, Arizona, and Department of Psychiatry, University of Arizona College of Medicine; Leslie C. Morey, Department of Psychology, Texas A&M University; Emily B. Ansell, Department of Psychiatry, Yale School of Medicine; John C. Markowitz, New York State Psychiatric Institute, New York, New York, and Department of Psychiatry, Columbia University College of Physicians and Surgeons; Mary C. Zanarini, Department of Psychiatry, Harvard Medical School; Carlos M. Grilo, Department of Psychiatry, Yale School of Medicine.

Writing of this article was supported by the Office of Academic Affiliations, Advanced Fellowship Program in Mental Illness Research and Treatment, Department of Veterans Affairs. Research was supported by National Institute of Mental Health Grants MH 50837, 50838, 50839, 50840, 50850, and MH073708, awarded to Charles A. Sanislow. This publication has been reviewed and approved by the Publications Committee of the Collaborative Longitudinal Personality Disorders Study.

Correspondence concerning this article should be addressed to Douglas B. Samuel, who is now at Department of Psychological Sciences, Purdue University, West Lafayette, IN 47907. E-mail: dbsamuel@purdue.edu

This document is copyrighted by the American Psychological Association or one of its allied publishers.

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

2013, Vol. 81, No. 4, 650–659 0022-006X/13/$12.00 DOI: 10.1037/a0032813

The complexity of diagnosing personality disorders (PDs) has

been a long-standing issue in psychiatry (Westen, 1997; Zimmer-

man, 1994). Several methods exist for diagnosing PDs, including

semistructured diagnostic interviews, self-report questionnaires,

clinician-rated Q-sort instruments, as well as unstructured diagno-

ses made by treating clinicians (McDermut & Zimmerman, 2005).

Although research has relied primarily on semistructured diagnos-

tic interviews and self-report questionnaires, therapists typically

base PD diagnoses on their unstructured interviews and clinical

contacts with patients (Perry, 1992; Westen, 1997; Zimmerman,

2011). Despite debate regarding the relative merits of different

diagnostic methods (Westen & Muderrisoglu, 2003; Zimmerman

& Mattia, 1999), no study has yet compared the predictive validity

of clinicians’ naturalistic PD diagnoses with those from self-report

questionnaires or semistructured interviews (Zimmerman, 2011).

Existing research has repeatedly indicated that clinician-

generated PD diagnoses do not agree well with those from self-

report measures (Davidson, Obonsawin, Seils, & Patience, 2003;

Hyler, Rieder, Williams, & Spitzer, 1989; Morey, Blashfield,

Webb, & Jewell, 1988; Rossi, Van den Brande, Tobac, Sloore, &

Hauben, 2003) or semistructured interviews (Dreessen & Arntz,

1999; Fridell & Hesse, 2006; Samuel & Widiger, 2010). This poor

agreement is not unique to PDs, and has been noted for various

psychiatric diagnoses (Rettew, Lynch, Achenbach, Dumenci, &

Ivanova, 2009). More importantly, fundamental questions regard-

ing the incremental predictive validity of diagnoses assigned by

clinicians relative to different methods have not been answered.

Research has compared the validity of self- and informant reports

of PD (Klein, 2003; Oltmanns & Turkheimer, 2009), but there is

a critical need for analogous work comparing clinical diagnoses

with other methods. Such work is crucial for determining whether

and how different sources of information might be usefully com-

bined. Currently, the optimal approach for how researchers and

clinicians should most validly identify PDs remains unclear.

Although research on clinical judgment offers reasons for skep-

ticism about the validity of clinician ratings in general (Grove,

Zald, Lebow, Snitz, & Nelson, 2000; Meehl, 1954), there are

compelling reasons to believe that their PD diagnoses might be

useful and valid. Therapists’ diagnostic impressions rely on exten-

sive training and take into consideration information about the

client’s life gleaned across extended periods of clinical interac-

tions. Pilkonis, Heape, Ruddy, and Serrao (1991) noted “clinical

judgment, of course, has its own limitations, but it would seem

unwise to develop assessment tools that are unrelated to thoughtful

clinical experience” (p. 46). In addition, Westen (1997) suggested

that clinicians take a holistic approach to diagnosis, situating them

well to describe complex personality pathology. Others contend

that clinicians’ PD ratings are superior to self-report because

patients’ ability to accurately assess their own personality might be

limited by mood states, lack of insight, or presentation biases

(Ganellen, 2007; Huprich, Bornstein, & Schmitt, 2011). Finally,

Morey et al. (1988) suggested that semistructured diagnostic in-

terviews also might have limitations because “a relatively brief

interview situation does not seem particularly well suited to the

task of assessing long-term personological characteristics” (p. 47).

Despite these concerns, there are reasons to believe that patient-

reported information from semistructured interviews and/or self-

report questionnaires can usefully contribute to PD diagnoses

(Zimmerman & Mattia, 1999). Thus, although clinicians might not

routinely ask direct questions about PD symptoms or use semi-

structured interviews and self-report questionnaires, they incorpo-

rate such information to inform their diagnoses if it is available.

Importantly, because semistructured interviews explicitly assess

the longitudinal presence of PD symptoms, they might have

greater ability to disentangle episodic state artifacts from more

durable trait-based PD syndromes (Loranger et al., 1991; Morey et

al., 2010).

As treating therapists almost always play the primary role in

diagnosing PDs in clinical settings, understanding the relative

validity of their impressions carries particular importance. Com-

paring clinicians’ diagnoses with those from self-report question-

naires or semistructured diagnostic interviews would be useful for

prospectively predicting clinically relevant outcomes that extend

beyond specific diagnostic features, such as psychosocial function-

ing. We conducted such a comparison using data from the Col-

laborative Longitudinal Personality Disorders Study (CLPS;

Gunderson et al., 2000). The CLPS is well suited for this investi-

gation as the baseline assessment included diagnoses from treating

clinicians collected using a modified version of the Personality

Assessment Form (PAF; Shea, Glass, Pilkonis, Watkins, & Docherty,

1987; Shea et al., 1990). This form allowed clinicians to record the

degree to which patients evinced the prototypical characteristics of

each of four study PDs (viz., schizotypal, borderline, avoidant,

obsessive-compulsive).

The PAF provides a relevant, externally valid method for con-

ducting such an analysis as it closely approximates the way clini-

cians make PD diagnoses in clinical practice. The PAF’s format is

also timely, as it uses a prototype-matching approach that mirrors

the original proposal for diagnosing PDs in the Diagnostic and

Statistical Manual of Mental Disorders, fifth edition (DSM–5). In

fact, the PAF and research that had used it were cited as primary

support for the Work Group’s proposal (Skodol, Bender, et al.,

2011; Skodol, Clark, et al., 2011). This proposal subsequently was

criticized by a number of PD scholars (Pilkonis, Hallquist, Morse,

& Stepp, 2011; Widiger, 2011; Zimmerman, 2011) and aban-

doned. Nonetheless, other prominent researchers and clinicians

have strongly argued that the prototype-matching approach should

become the standard method of PD diagnosis (Shedler et al.,

2010).

The benefit and goal of using the PAF for collecting clinicians’

impressions is to maximize external validity (i.e., most closely

match the type of PD diagnoses typically made in clinical prac-

tice), not to provide equivalence with other methods (Westen &

Weinberger, 2004). Westen and colleagues have demonstrated that

when clinicians administer a systematic clinical interview (i.e., the

Clinical Diagnostic Interview, CDI; Westen, 2004) and record

their impressions using the Shedler-Westen Assessment Procedure

(SWAP; Westen & Shedler, 1999), their PD diagnoses become

more reliable across independent raters (Westen & Muderrisoglu,

2006; Westen, Shedler, Bradley, & DeFife, 2012). Although in-

formative, such a diagnostic strategy (i.e., a 2-hr administration of

the CDI followed by the sorting of 200 SWAP items) is not

standard practice in naturalistic settings. Perhaps recognizing this,

Westen and his colleagues have also been the primary proponents

of the prototype-matching approach (Shedler & Westen, 2004;

Westen, DeFife, Bradley, & Hilsenroth, 2010; Westen, Shedler, &

Bradley, 2006) that helped inform the original DSM–5 proposal

(Skodol, Bender, et al., 2011; Skodol, Clark, et al., 2011). The

PAF’s prototype-matching format makes it a reasonable choice for

collecting treating clinicians’ PD diagnoses in this study.

We compared the incremental validity of clinicians’ diagnoses

of these four PDs assigned via the PAF with those generated by a

semistructured interview and self-report questionnaire for predict-

ing psychosocial functioning assessed prospectively over 5 years.

Given the published support for the validity of the prototype-

matching approach (Westen et al., 2012), we hypothesized that

clinicians’ PAF ratings would account for variance in functioning

beyond that captured by self-report questionnaires or semistruc-

tured interviews. Nonetheless, we also recognized that all previous

findings concerning the relative validity of alternative diagnostic

methods have suggested that the methods are mutually informative

(Hopwood et al., 2008; Klein, 2003). Thus, we also hypothesized

that the self-report and semistructured interview methods would

have unique strengths and demonstrate incremental predictive va-

lidity beyond the clinician-assigned diagnoses. Finally, to account

for inadequate familiarity with patients that might disadvantage the

clinicians’ PAF ratings, we conducted additional analyses using

only the subset of cases whom clinicians had treated for at least 1

year prior to providing the diagnoses. This choice of a 1-year

interval of treatment ensured adequate familiarity with a patient’s

personality pathology.
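
Incremental validity of this kind is commonly quantified with hierarchical regression, as the change in R² when a second block of predictors is added to a first. The following is a minimal sketch of that calculation, not the authors' analysis code. The function names, the synthetic data, and the choice of ordinary least squares via NumPy are all assumptions made for illustration.

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an ordinary least squares fit of y on X (intercept added here)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def incremental_r2(step1, step2, y):
    """Change in R^2 when the step2 predictors are added to the step1 predictors."""
    r2_step1 = r_squared(step1, y)
    r2_both = r_squared(np.column_stack([step1, step2]), y)
    return r2_step1, r2_both, r2_both - r2_step1

# Synthetic illustration only: these are NOT CLPS data.
rng = np.random.default_rng(0)
n = 320
clinician = rng.normal(size=(n, 4))   # hypothetical ratings for four PDs
interview = rng.normal(size=(n, 4))   # hypothetical interview scores for four PDs
functioning = interview @ np.array([0.5, 0.3, 0.2, 0.1]) + rng.normal(size=n)

r2_1, r2_2, delta = incremental_r2(clinician, interview, functioning)
print(f"step 1 R^2 = {r2_1:.3f}, step 2 R^2 = {r2_2:.3f}, delta R^2 = {delta:.3f}")
```

Swapping the order of the two blocks gives the increment for the other method, which is how each method's unique contribution can be compared.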

Method

Study participants were drawn from the 668 participants re-

cruited from the multiple CLPS clinical sites. Appropriate Institu-

tional Review Boards approved the study. Participants who pro-

vided written, informed consent underwent diagnostic interviews

and completed self-report questionnaires as part of a standardized

battery. Detailed recruitment and diagnostic procedures have been

published elsewhere (Gunderson et al., 2000). Briefly, participants

were assigned to one of four PD groups (borderline, avoidant,

schizotypal, and obsessive-compulsive [OC]), or to major depres-

sive disorder (MDD) without any PD. These PD diagnostic as-

signments were based on the Diagnostic Interview for DSM–IV

Personality Disorders (DIPD-IV; Zanarini, Frankenburg, Sickel, &

Yong, 1996), reliably administered by trained research personnel.

For inclusion, these diagnoses required confirmation by a self-

report questionnaire (e.g., Schedule for Nonadaptive and Adaptive

Personality–2; SNAP-2; Clark, Simms, Wu, & Casillas, in press)

and/or the treating clinician’s PAF ratings. Furthermore, because

inclusion demanded either a self-report or clinician-assigned diag-

nosis, in a subset of participants the semistructured interview-

assigned diagnosis disagreed with the clinicians’ ratings and was

instead confirmed by the self-report questionnaire.

Participants used for the current analyses were 320 individuals

from the CLPS with available PAF ratings completed by a treating

clinician at baseline. Independent samples t tests and chi-square

tests demonstrated no significant differences between participants

with PAF scores and the larger CLPS sample in gender, age, or

ethnicity. Independent samples t tests revealed that this subsample

differed in diagnosis and functioning, perhaps reflecting that par-

ticipants with PAF ratings were in ongoing psychiatric or psycho-

logical treatment. Participants with PAF ratings met more criteria

for borderline PD according to the DIPD-IV at baseline (M = 4.4,

SD = 2.7) than did those without available PAF ratings (M = 2.6,

SD = 2.5), t(729) = 9.3, p < .01. Differences for the other three

studied PDs on the DIPD-IV were nonsignificant. Baseline

SNAP-2 PD scores were significantly greater for the studied group

for all four PDs. Participants with available PAF ratings did not

differ from those without in terms of psychosocial functioning

measured by the Social Adjustment Scale, Self-Report (SAS-SR),

t(700) = 1.1, p = .28, but did differ significantly according to the

Longitudinal Interval Follow-Up Evaluation (LIFE), t(727) = 5.3,

p < .01.

Average age of the participants at baseline was 32.9 years

(SD = 7.9, range = 18–45); 199 (62%) were women; the ethnic

breakdown was 237 (74%) Caucasian, 35 (11%) African Ameri-

can, 39 (12%) Hispanic, six (2%) Asian American, and three (1%)

“other.” Of the participants, 73 (23%) were assigned to the

avoidant, 128 (40%) to the borderline, 54 (17%) to the obsessive-

compulsive, 37 (12%) to the schizotypal, and 28 (9%) to the MDD

without PD groups. Clinicians reported clinical contact with the

patients ranging from 0 to 884 weeks, with a mean of 53.7 (SD =

89.7) at the time of providing the PAF ratings. Their confidence in

their diagnostic ratings evinced a mean of 2.26 (on a 1–5 metric,

where 1 = high and 5 = low; SD = 1.12).

Personality Disorder Measures

DIPD-IV (Zanarini et al., 1996). The DIPD-IV is a semi-

structured diagnostic interview for assessing the Diagnostic and

Statistical Manual of Mental Disorders, fourth edition (DSM–IV;

American Psychiatric Association, 1994) PDs. Each criterion is

assessed with one or more questions rated on a 3-point scale (0 =

not present; 1 = present but of uncertain clinical significance; 2 =

present and clinically significant). The DIPD-IV requires that

criteria be pervasive, present for at least 2 years, and characteristic

of the person for most of his or her adult life. In the CLPS sample,

interrater reliability (based on 84 pairs of raters) kappa coefficients

ranged from .58 to 1.00 (Zanarini et al., 2000). The current report

considered only the DIPD-IV scores for the four PDs studied in

CLPS.
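
For readers unfamiliar with the kappa statistic reported above, the sketch below computes chance-corrected agreement between two raters. It assumes Cohen's kappa for two raters and categorical labels; the source says only "kappa coefficients", so the exact variant is an assumption, and the function name and example ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention chosen here: both raters used one identical category
    return (observed - expected) / (1.0 - expected)

# Hypothetical diagnoses (1 = PD present, 0 = absent) from two interviewers.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
rater_2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 8 of 10 cases and chance agreement is .50, so kappa is (.80 - .50) / (1 - .50) = .60.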

SNAP-2 (Clark et al., in press). Comprising 390 true/false

statements, the SNAP-2 provides a self-report assessment of 12

pathological personality traits derived from an iterative factor

analytic process. The SNAP-2 includes scales assessing the

DSM–IV PDs, ranging in length from 19 (avoidant) to 34 (antiso-

cial) items. Although most DSM–IV PD scale items are also scored

for one of the trait scales, a number of items were added to

explicitly tap additional content. The PD scales can be scored

dimensionally or by individual diagnostic criteria to yield categor-

ical diagnoses. In the full CLPS sample, the SNAP-2 PD scale

internal consistencies ranged from .69 (OCPD) to .88 (avoidant),

with an overall median of .83. The SNAP-2 PD scores correlate

consistently with those from other self-report PD inventories (Wi-

diger & Boyd, 2009) and structured PD diagnostic interviews

(Samuel et al., 2011). The current report only included the SNAP-2

scores for the four CLPS PDs.

PAF (Shea et al., 1987, 1990). The PAF was adapted for the

DSM–IV PDs from a measure developed for the National Institute

of Mental Health Treatment of Depression Collaborative Research

Program (Elkin, Parloff, Hadley, & Autry, 1985). Its purpose was

to provide a standardized method to quantify clinicians’ routine

clinical diagnoses. Thus, it was designed to maximize external

validity and mirror the type of PD ratings and diagnoses made in

clinical practice. The PAF used in CLPS contained three to four

sentence prototypical descriptions for each of the four PDs studied

(schizotypal, borderline, avoidant, and obsessive-compulsive) as

well as several “cues” to aid clinicians in rating a patient’s match

to the prototypes. The instrument is available by request to the first

author. Clinicians rated all four of the studied PDs on a 1–6 scale,

where 1 indicated not at all and 6 indicated that the patient

matched the prototype to an extreme degree. Consistent with

previous research (Shea et al., 1990), a score ≥ 4 indicated a

categorical diagnosis. Clinicians could also indicate no informa-

tion or insufficient data for a particular PD, although they used this

only rarely (24 times across the four PDs in the sample of 320).

Those values were recoded as missing for the current analyses. The

mean PAF ratings were 1.95 (SD = 1.20) for schizotypal PD, 2.94

(SD = 1.55) for borderline PD, 2.49 (SD = 1.34) for avoidant PD,

and 2.08 (SD = 1.33) for OCPD.
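
The scoring rules described above can be illustrated with a minimal Python sketch. It assumes a hypothetical sentinel value (NO_INFO) for the "no information or insufficient data" response and uses made-up ratings; it is not the CLPS scoring code, only one way the recoding and the cutoff of 4 might be expressed.

```python
from typing import Optional

# Hypothetical sentinel for a "no information / insufficient data" response.
NO_INFO = "no_info"
# A rating of 4 or higher counts as a categorical diagnosis.
DIAGNOSIS_CUTOFF = 4


def recode_paf(raw) -> Optional[int]:
    """Return the 1-6 prototype rating, or None when no rating was given."""
    if raw is None or raw == NO_INFO:
        return None
    rating = int(raw)
    if not 1 <= rating <= 6:
        raise ValueError(f"PAF rating out of range: {raw!r}")
    return rating


def paf_diagnosis(rating: Optional[int]) -> Optional[bool]:
    """Dichotomize a recoded rating; missing ratings stay missing."""
    if rating is None:
        return None
    return rating >= DIAGNOSIS_CUTOFF


# Made-up ratings for one PD across five patients (illustration only).
raw_ratings = [1, 4, NO_INFO, 6, 3]
ratings = [recode_paf(r) for r in raw_ratings]
diagnoses = [paf_diagnosis(r) for r in ratings]
```

Keeping missing values as None, rather than coding them as 0 or 1, prevents a "no information" response from being counted as either the presence or the absence of a diagnosis.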

Psychosocial Functioning Measures Serving as

Independent External Criteria

Multiple measures of psychosocial functioning served as exter-

nal outcome criteria. These were independent of specific PD

symptoms and used two independent assessment methods. Both

aspects are crucial for the current purposes, as independent, exter-

nal criteria provide the only opportunity to discriminate validity

among different methods of PD diagnosis. To assess psychosocial

functioning, CLPS research team interviewers administered the

LIFE (Keller et al., 1987), a structured interview assessing func-

tioning in interpersonal relationships and occupational and recre-

ational domains. Most areas of functioning are rated on 5-point

severity scales (1 = no impairment, high level of functioning or

very good functioning and 5 = severe impairment or very poor

functioning). Participants also completed the SAS-SR (Weissman

& Bothwell, 1976), a self-report instrument yielding estimates of

interpersonal, occupational, and recreational functioning. The

LIFE and SAS-SR were administered at baseline and repeated at

predetermined intervals, including the 5-year follow-up. The same

interviewers administered both interviews (i.e., the LIFE and

DIPD-IV) at a given assessment interval; however, it was unlikely

that the interviewer who administered the DIPD-IV at baseline

also administered the LIFE at 5-year follow-up.

Data Analytic Procedures

We first examined the convergent validity of clinicians’ PAF

diagnoses with those from a semistructured diagnostic interview

(DIPD-IV) and self-report questionnaire (SNAP-2). PAF dimen-

sional ratings were compared with those from the DIPD-IV and

SNAP-2 (all at baseline) for their ability to predict functional

outcomes at the 60-month follow-up (via the LIFE and SAS-SR)

using hierarchical regression analyses. For example, the clinicians’

baseline PAF ratings for the four PDs were entered simultaneously

in one step, followed by the baseline PD ratings from the

SNAP-2. This was then repeated with the order of entry reversed.

To account for possible contamination due to shared method

variance, we conducted these analyses separately using the self-

report criterion and again with the interview-based criterion vari-

able.
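
The two-step logic described above (enter one block of baseline ratings, add the second block, then reverse the order) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' analysis code: the data are made up, each block holds a single predictor for brevity (the study entered four PD ratings per block), and the significance test of the R² increment is omitted.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(predictors, y):
    """R^2 of an OLS regression of y on the predictor columns plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(p) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def delta_r2(first_block, second_block, y):
    """Increment in R^2 when second_block is entered after first_block."""
    return r_squared(first_block + second_block, y) - r_squared(first_block, y)


# Made-up baseline scores and a made-up 5-year outcome (illustration only).
paf_block = [[1, 2, 3, 4, 5, 6, 2, 3]]
snap_block = [[2, 1, 4, 3, 6, 5, 1, 4]]
outcome = [2.0, 1.5, 3.5, 3.0, 5.5, 5.0, 1.0, 3.5]

inc_snap = delta_r2(paf_block, snap_block, outcome)  # SNAP-2 entered after PAF
inc_paf = delta_r2(snap_block, paf_block, outcome)   # PAF entered after SNAP-2
```

Both orders of entry arrive at the same full-model R². What differs is how much of it is credited to the block entered second, which is the quantity of interest when asking whether one method adds to another.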

PAF diagnoses had been used to confirm the DIPD-IV diagnosis

for a subset of participants, creating a potential confound. Al-

though our use of functional outcomes rather than diagnostic

information as criteria attenuates this possibility, we nonetheless

examined it by performing a parallel set of analyses restricted to a

subsample of 110 participants for whom the PAF disagreed with

the DIPD-IV at baseline and thus was not required for study

inclusion. In this subsample, PAF ratings would potentially have

greater ability to increment the DIPD-IV scores.

Results

Categorical and Dimensional Agreement

Table 1 provides the agreement between PAF ratings and those

from the DIPD-IV and SNAP-2. Categorical agreement (kappas)

between treating clinicians’ diagnoses and the semistructured diagnostic

interview ranged from .21 (avoidant) to .42 (schizotypal),

whereas dimensional agreement (Pearson correlations)

ranged from .30 (avoidant) to .44 (borderline). Agreement between

clinicians’ ratings and self-report questionnaire was lower than

between clinicians’ ratings and semistructured diagnostic inter-

views, with kappas ranging from .00 (OCPD) to .20 (borderline)

and Pearson correlations ranging from .18 (schizotypal) to .28

(borderline). For context, we note that agreement between

DIPD-IV and SNAP-2 in the current sample ranged from .25

(OCPD) to .51 (avoidant) for categorical diagnoses and from .57

(schizotypal) to .72 (avoidant) for dimensional ratings.
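
For readers less familiar with the two agreement indices reported here, the following minimal Python sketch computes Cohen's kappa for categorical diagnoses and a Pearson correlation for dimensional scores. All values are made up for illustration, and the pairwise handling of missing ratings is an assumption rather than a documented detail of the study.

```python
from math import sqrt


def complete_pairs(a, b):
    """Keep only cases where both methods produced a value (None = missing)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of True/False diagnoses."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # proportion diagnosed by method A
    p_b = sum(b) / n  # proportion diagnosed by method B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # chance agreement
    return (observed - expected) / (1 - expected)


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


# Made-up data for one PD: clinician ratings (1-6, None = missing),
# interview criteria counts, and interview diagnoses.
paf_ratings = [5, 2, 4, 1, 3, 6, None]
interview_counts = [6, 1, 3, 2, 4, 7, 5]
interview_dx = [True, False, False, False, False, True, True]

paf_dx = [None if r is None else r >= 4 for r in paf_ratings]
dx_a, dx_b = complete_pairs(paf_dx, interview_dx)
kappa = cohen_kappa(dx_a, dx_b)

dim_a, dim_b = complete_pairs(paf_ratings, interview_counts)
r = pearson_r(dim_a, dim_b)
```

Kappa corrects raw agreement for the agreement expected by chance given each method's base rate, which is why two methods can agree on most patients and still yield a modest kappa when a diagnosis is rare.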

Incremental Predictive Validity

Tables 2–6 summarize the hierarchical regression analyses. Ta-

ble 2 shows that the DIPD-IV provided significant increment

beyond the PAF for predicting functioning assessed by both the

SAS-SR and LIFE. In contrast, clinicians’ ratings did not signifi-

cantly increment the DIPD-IV interview results for either criterion.

The nonsignificant ΔR² when the PAF block was added does not

indicate that all PAF diagnoses lacked validity, as the individual

schizotypal rating from the PAF was a significant predictor (β =

.15; p < .05). Table 3 summarizes the parallel series of analyses on

Table 1

Dimensional and Categorical Agreement of Clinician PD Diagnostic Ratings With Interview Generated and Self-Report PD Scores

                  DIPD-IV criteria counts    SNAP-2 PD scores
PAF ratings          κ        r                 κ        r
Schizotypal         .42      .40               .01      .18
Borderline          .38      .44               .20      .28
Avoidant            .21      .30               .14      .23
OCPD                .24      .30               .00      .20

Note. n = 320. Kappa between diagnoses provided by PAF (≥ 4) and from DIPD-IV and SNAP-2 (meeting diagnostic criteria threshold). Dimensional agreements represent Pearson correlations of PAF ratings (1–6) with scores from DIPD-IV and SNAP-2. PD = personality disorder; PAF = Personality Assessment Form; DIPD-IV = Diagnostic Interview for DSM–IV Personality Disorders; SNAP-2 = Schedule for Nonadaptive and Adaptive Personality–2; OCPD = obsessive-compulsive personality disorder.



References

Diagnostic and Statistical Manual of Mental Disorders

Diagnostic and Statistical Manual of Mental Disorders, 4th Ed.

Guidelines, Criteria, and Rules of Thumb for Evaluating Normed and Standardized Assessment Instruments in Psychology.

Assessment of Social Adjustment by Patient Self-Report

The Longitudinal Interval Follow-up Evaluation. A comprehensive method for assessing outcome in prospective longitudinal studies
