
Dissecting racial bias in an algorithm used to manage the health of populations

TLDR
It is suggested that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.
Abstract
Health systems rely on commercial prediction algorithms to identify and help patients with complex health needs. We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias: At a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of Black patients receiving additional help from 17.7 to 46.5%. The bias arises because the algorithm predicts health care costs rather than illness, but unequal access to care means that we spend less money caring for Black patients than for White patients. Thus, despite health care cost appearing to be an effective proxy for health by some measures of predictive accuracy, large racial biases arise. We suggest that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.


UC Berkeley
UC Berkeley Previously Published Works
Title
Dissecting racial bias in an algorithm used to manage the health of populations.
Permalink
https://escholarship.org/uc/item/6h92v832
Journal
Science (New York, N.Y.), 366(6464)
ISSN
0036-8075
Authors
Obermeyer, Ziad
Powers, Brian
Vogeli, Christine
et al.
Publication Date
2019-10-01
DOI
10.1126/science.aax2342
Peer reviewed
eScholarship.org Powered by the California Digital Library
University of California

RESEARCH ARTICLE
ECONOMICS
Dissecting racial bias in an algorithm used to manage
the health of populations
Ziad Obermeyer^1,2*, Brian Powers^3, Christine Vogeli^4, Sendhil Mullainathan^5*
Health systems rely on commercial prediction algorithms to identify and help patients with complex health needs. We show that a widely used algorithm, typical of this industry-wide approach and affecting millions of patients, exhibits significant racial bias: At a given risk score, Black patients are considerably sicker than White patients, as evidenced by signs of uncontrolled illnesses. Remedying this disparity would increase the percentage of Black patients receiving additional help from 17.7 to 46.5%. The bias arises because the algorithm predicts health care costs rather than illness, but unequal access to care means that we spend less money caring for Black patients than for White patients. Thus, despite health care cost appearing to be an effective proxy for health by some measures of predictive accuracy, large racial biases arise. We suggest that the choice of convenient, seemingly effective proxies for ground truth can be an important source of algorithmic bias in many contexts.
There is growing concern that algorithms
may reproduce racial and gender disparities
via the people building them or
through the data used to train them (1–3).
Empirical work is increasingly lending
support to these concerns. For example, job
search ads for highly paid positions are less
likely to be presented to women (4), searches
for distinctively Black-sounding names are
more likely to trigger ads for arrest records
(5), and image searches for professions such
as CEO produce fewer images of women (6).
Facial recognition systems increasingly used
in law enforcement perform worse on recognizing
faces of women and Black individuals
(7, 8), and natural language processing algo-
rithms encode language in gendered ways (9).
Empirical investigations of algorithmic bias, though, have been hindered by a key constraint: Algorithms deployed on large scales are typically proprietary, making it difficult for independent researchers to dissect them. Instead, researchers must work from the outside, often with great ingenuity, and resort to clever workarounds such as audit studies. Such efforts can document disparities, but understanding how and why they arise, much less figuring out what to do about them, is difficult without greater access to the algorithms themselves. Our understanding of a mechanism therefore typically relies on theory or exercises with researcher-created algorithms (10–13). Without an algorithm's training data, objective function, and prediction methodology, we can only guess as to the actual mechanisms for the important algorithmic disparities that arise.
In this study, we exploit a rich dataset that
provides insight into a live, scaled algorithm
deployed nationwide today. It is one of the
largest and most typical examples of a class
of commercial risk-prediction tools that, by
industry estimates, are applied to roughly
200 million people in the United States each
year. Large health systems and payers rely on
this algorithm to target patients for high-risk
care management programs. These programs
seek to improve the care of patients with
complex health needs by providing additional
resources, including greater attention from
trained providers, to help ensure that care is
well coordinated. Most health systems use
these programs as the cornerstone of pop-
ulation health management efforts, and they
are widely considered effective at improving
outcomes and satisfaction while reducing costs
(14–17). Because the programs are themselves
expensive, with costs going toward teams of
dedicated nurses, extra primary care appointment
slots, and other scarce resources, health
systems rely extensively on algorithms to identify
patients who will benefit the most (18, 19).
Identifying patients who will derive the
greatest benefit from these programs is a
challenging causal inference problem that
requires estimation of individual treatment effects.
To solve this problem, health systems
make a key assumption: Those with the great-
est care needs will benefit the most from the
program. Under this assumption, the targeting
problem becomes a pure prediction policy prob-
lem (20). Developers then build algorithms
that rely on past data to build a predictor of
future health care needs.
Our dataset describes one such typical algorithm. It contains both the algorithm's predictions as well as the data needed to understand its inner workings: that is, the underlying ingredients used to form the algorithm (data, objective function, etc.) and links to a rich set of outcome data. Because we have the inputs, outputs, and eventual outcomes, our data allow us a rare opportunity to quantify racial disparities in algorithms and isolate the mechanisms by which they arise. It should be emphasized that this algorithm is not unique. Rather, it is emblematic of a generalized approach to risk prediction in the health sector, widely adopted by a range of for- and non-profit medical centers and governmental agencies (21).
Our analysis has implications beyond what
we learn about this particular algorithm. First,
the specific problem solved by this algorithm
has analogies in many other sectors: The pre-
dicted risk of some future outcome (in our
case, health care needs) is widely used to tar-
get policy interventions under the assumption
that the treatment effect is monotonic in that
risk, and the methods used to build the algo-
rithm are standard. Mechanisms of bias un-
covered in this study likely operate elsewhere.
Second, even beyond our particular finding,
we hope that this exercise illustrates the im-
portance, and the large opportunity, of study-
ing algorithmic bias in health care, not just
as a model system but also in its own right. By
any standard (e.g., number of lives affected,
life-and-death consequences of the decision),
health is one of the most important and widespread
social sectors in which algorithms are
already used at scale today, unbeknownst
to many.
Data and analytic strategy
Working with a large academic hospital, we
identified all primary care patients enrolled
in risk-based contracts from 2013 to 2015. Our
primary interest was in studying differences
between White and Black patients. We formed
race categories by using hospital records, which
are based on patient self-reporting. Any patient
who identified as Black was considered to be
Black for the purpose of this analysis. Of the
remaining patients, those who self-identified
as races other than White (e.g., Hispanic) were
so considered (data on these patients are pre-
sented in table S1 and fig. S1 in the supplemen-
tary materials). We considered all remaining
patients to be White. This approach allowed
us to study one particular racial difference of
social and historical interest between patients
who self-identified as Black and patients who
self-identified as White without another race
or ethnicity; it has the disadvantage of not allowing for the study of intersectional racial and ethnic identities. Our main sample thus consisted of (i) 6079 patients who self-identified as Black and (ii) 43,539 patients who self-identified as White without another race or ethnicity, whom we observed over 11,929 and 88,080 patient-years, respectively (1 patient-year represents data collected for an individual patient in a calendar year). The sample was 71.2% enrolled in commercial insurance and 28.8% in Medicare; on average, 50.9 years old; and 63% female (Table 1).

Obermeyer et al., Science 366, 447–453 (2019) 25 October 2019
1 School of Public Health, University of California, Berkeley, Berkeley, CA, USA.
2 Department of Emergency Medicine, Brigham and Women's Hospital, Boston, MA, USA.
3 Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
4 Mongan Institute Health Policy Center, Massachusetts General Hospital, Boston, MA, USA.
5 Booth School of Business, University of Chicago, Chicago, IL, USA.
*These authors contributed equally to this work.
Corresponding author. Email: sendhil.mullainathan@chicagobooth.edu

For these patients, we obtained algorithmic risk scores generated for each patient-year. In the health system we studied, risk scores are generated for each patient during the enrollment period for the system's care management program. Patients above the 97th percentile are automatically identified for enrollment in the program. Those above the 55th percentile are referred to their primary care physician, who is provided with contextual data about the patients and asked to consider whether they would benefit from program enrollment.
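The two-threshold rule described above can be sketched in a few lines. This is an illustrative sketch only, not the health system's implementation: the function name, the status labels, and the use of `numpy.percentile` (with its default linear interpolation) to locate the cutoffs are assumptions made here for illustration.

```python
import numpy as np

def assign_program_status(risk_scores, auto_pct=97.0, screen_pct=55.0):
    """Label each patient by where the risk score falls relative to two percentile cutoffs.

    Hypothetical helper: the cutoffs default to the 97th and 55th percentiles
    mentioned in the text; everything else is an assumption.
    """
    scores = np.asarray(risk_scores, dtype=float)
    auto_cut = np.percentile(scores, auto_pct)      # auto-identification cutoff
    screen_cut = np.percentile(scores, screen_pct)  # referral-for-screening cutoff

    status = np.full(scores.shape, "none", dtype=object)
    status[scores > screen_cut] = "referred_for_screen"
    status[scores > auto_cut] = "auto_identified"   # overrides the referral label
    return status
```

For example, with 100 patients scored 1 through 100, this labels the top three as auto-identified and the next 42 as referred for screening.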
Many existing metrics of algorithmic bias may apply to this scenario. Some definitions focus on calibration [i.e., whether the realized value of some variable of interest Y matches the risk score R (2, 22, 23)]; others on statistical parity of some decision D influenced by the algorithm (10); and still others on balance of average predictions, conditional on the realized outcome (22). Given this multiplicity and the growing recognition that not all conditions can be simultaneously satisfied (3, 10, 22), we focus on metrics most relevant to the real-world use of the algorithm, which are related to calibration bias [formally, comparing Blacks B and Whites W, $E[Y \mid R, W] = E[Y \mid R, B]$ indicates the absence of bias (here, E is the expectation operator)]. The algorithm's stated goal is to predict complex health needs for the purpose of targeting an intervention that manages those needs. Thus, we compare the algorithmic risk score for patient i in year t ($R_{i,t}$), formed on the basis of claims data $X_{i,(t-1)}$ from the prior year, to data on patients' realized health $H_{i,t}$, assessing how well the algorithmic risk score is calibrated across race for health outcomes $H_{i,t}$. We also ask how well the algorithm is calibrated for costs $C_{i,t}$.
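One way to examine the calibration condition empirically is to compare the mean realized outcome across groups within bins of the risk score. The following is a minimal sketch under stated assumptions, not the authors' analysis code: the function name, the column names passed in, and the choice of pooled quantile bins via `pandas.qcut` are all hypothetical.

```python
import numpy as np
import pandas as pd

def calibration_by_group(df, score_col, outcome_col, group_col, n_bins=10):
    """Mean realized outcome per risk-score quantile bin, one column per group.

    Hypothetical helper. Under calibration across groups, the columns of the
    returned table should be approximately equal within each row (risk bin).
    """
    out = df.copy()
    # Bin on the pooled score distribution so groups are compared at the same risk level.
    out["risk_bin"] = pd.qcut(out[score_col], q=n_bins, labels=False, duplicates="drop")
    # Rows: risk bins; columns: groups; cells: mean of the realized outcome.
    return out.groupby(["risk_bin", group_col])[outcome_col].mean().unstack(group_col)
```

Passing a health measure as `outcome_col` probes calibration for H, and passing realized cost probes calibration for C; the per-bin difference between group columns is the quantity of interest.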
To measure H, we link predictions to a wide
range of outcomes in electronic health record
data, including all diagnoses (in the form of
International Classification of Diseases codes)
as well as key quantitative laboratory studies
and vital signs capturing the severity of chro-
nic illnesses. To measure C, we link predictions
to insurance claims data on utilization, includ-
ing outpatient and emergency visits, hospital-
izations, and health care costs. These data, and
the rationale for the specific measures of H
used in this study, are described in more detail
in the supplementary materials.
Health disparities conditional on risk score
We begin by calculating an overall measure of health status, the number of active chronic conditions [or comorbidity score, a metric used extensively in medical research (24) to provide a comprehensive view of a patient's health (25)] by race, conditional on algorithmic risk score. Fig. 1A shows that, at the same level of algorithm-predicted risk, Blacks have significantly more illness burden than Whites. We can quantify these differences by choosing one point on the x axis that corresponds to a very-high-risk group (e.g., patients at the 97th percentile of risk score, at which patients are auto-identified for program enrollment), where Blacks have 26.3% more chronic illnesses than Whites (4.8 versus 3.8 distinct conditions; P < 0.001).
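As a quick consistency check, the relative difference quoted above follows from the two reported means. Both means are rounded in the text, so the agreement is approximate:

```latex
\[
\frac{4.8 - 3.8}{3.8} \;=\; \frac{1.0}{3.8} \;\approx\; 0.263
\quad\text{(about } 26.3\% \text{ more chronic conditions)}
\]
```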
Table 1. Descriptive statistics on our sample, by race. BP, blood pressure; LDL, low-density lipoprotein.

                                            White      Black
n (patient-years)                           88,080     11,929
n (patients)                                43,539     6079
Demographics
  Age                                       51.3       48.6
  Female (%)                                62         69
Care management program
  Algorithm score (percentile)              50         52
  Race composition of program (%)           81.8       18.2
Care utilization
  Actual cost                               $7540      $8442
  Hospitalizations                          0.09       0.13
  Hospital days                             0.50       0.78
  Emergency visits                          0.19       0.35
  Outpatient visits                         4.94       4.31
Mean biomarker values
  HbA1c (%)                                 5.9        6.4
  Systolic BP (mmHg)                        126.6      130.3
  Diastolic BP (mmHg)                       75.5       75.7
  Creatinine (mg/dl)                        0.89       0.98
  Hematocrit (%)                            40.7       37.8
  LDL (mg/dl)                               103.4      103.0
Active chronic illnesses (comorbidities)
  Total number of active illnesses          1.20       1.90
  Hypertension                              0.29       0.44
  Diabetes, uncomplicated                   0.08       0.22
  Arrhythmia                                0.09       0.08
  Hypothyroid                               0.09       0.05
  Obesity                                   0.07       0.18
  Pulmonary disease                         0.07       0.11
  Cancer                                    0.07       0.06
  Depression                                0.06       0.08
  Anemia                                    0.05       0.10
  Arthritis                                 0.04       0.04
  Renal failure                             0.03       0.07
  Electrolyte disorder                      0.03       0.05
  Heart failure                             0.03       0.05
  Psychosis                                 0.03       0.05
  Valvular disease                          0.03       0.02
  Stroke                                    0.02       0.03
  Peripheral vascular disease               0.02       0.02
  Diabetes, complicated                     0.02       0.07
  Heart attack                              0.01       0.02
  Liver disease                             0.01       0.02

What do these prediction differences mean for patients? Algorithm scores are a key input to decisions about future enrollment in a care coordination program. So as we might expect, with less-healthy Blacks scored at similar risk scores to more-healthy Whites, we find evidence of substantial disparities in program screening. We quantify this by simulating a counterfactual world with no gap in health conditional on risk. Specifically, at some risk threshold a, we identify the supramarginal White patient (i) with $R_i > a$ and compare this patient's health to that of the inframarginal Black patient (j) with $R_j < a$. If $H_i > H_j$, as measured by number of chronic medical conditions, we replace the (healthier, but supramarginal) White patient with the (sicker, but inframarginal) Black patient. We repeat this procedure until $H_i = H_j$, to simulate an algorithm with no predictive gap between Blacks and Whites. Fig. 1B shows the results: At all risk thresholds a above the 50th percentile, this procedure would increase the fraction of Black patients. For example, at a = 97th percentile, among those auto-identified for the program, the fraction of Black patients would rise from 17.7 to 46.5%.
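The replacement procedure can be illustrated with a small greedy simulation. This is one possible reading, sketched under explicit assumptions rather than the authors' code: here the White patient above the threshold with the fewest chronic conditions is paired with the Black patient below the threshold with the most, and swaps continue only while the Black patient has strictly more conditions. The function name and this pairing rule are hypothetical.

```python
import numpy as np

def simulate_fraction_black(risk, conditions, is_black, pct=97.0):
    """Return (original, simulated) fraction of Black patients above a risk percentile.

    Assumption: 'healthier' means fewer chronic conditions, and swaps pair the
    healthiest White patient above the cutoff with the sickest Black patient below it.
    """
    risk = np.asarray(risk, dtype=float)
    conditions = np.asarray(conditions, dtype=float)
    is_black = np.asarray(is_black, dtype=bool)

    selected = risk > np.percentile(risk, pct)
    original = is_black[selected].mean()

    # White patients above the cutoff, fewest conditions first.
    whites_in = sorted(np.flatnonzero(selected & ~is_black), key=lambda i: conditions[i])
    # Black patients below the cutoff, most conditions first.
    blacks_out = sorted(np.flatnonzero(~selected & is_black), key=lambda j: -conditions[j])

    swaps = 0
    for i, j in zip(whites_in, blacks_out):
        if conditions[j] <= conditions[i]:
            break  # the marginal patients are now equally healthy (or the gap reversed)
        swaps += 1

    simulated = (is_black[selected].sum() + swaps) / selected.sum()
    return original, simulated
```

Each swap keeps the number of selected patients fixed, so the simulated fraction changes only through the count of Black patients admitted.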
We then turn to a more multidimensional picture of the complexity and severity of patients' health status, as measured by biomarkers that index the severity of the most common chronic illnesses in our sample (as shown in Table 1). This allows us to identify patients who might derive a great deal of benefit from care management programs, e.g., patients with severe diabetes who are at risk of catastrophic complications if they do not lower their blood sugar (18, 26). (The materials and methods section describes several experiments to rule out a large effect of the program on these health measures in year t; had there been such an effect, we could not easily use the measures to assess the accuracy of the algorithm's predictions on health, because the program is allocated as a function of algorithm score.) Across all of these important markers of health needs (severity of diabetes, high blood pressure, renal failure, cholesterol, and anemia), we find that Blacks are substantially less healthy than Whites at any level of algorithm predictions, as shown in Fig. 2. Blacks have more-severe hypertension, diabetes, renal failure, and anemia, and higher cholesterol. The magnitudes of these differences are large: For example, differences in severity of hypertension (systolic pressure: 5.7 mmHg) and diabetes [glycated hemoglobin (HbA1c): 0.6%] imply differences in all-cause mortality of 7.6% (27) and 30% (28), respectively, calculated using data from clinical trials and longitudinal studies.
Mechanism of bias
An unusual aspect of our dataset is that we observe the algorithm's inputs and outputs as well as its objective function, providing us a unique window into the mechanisms by which bias arises. In our setting, the algorithm takes in a large set of raw insurance claims data $X_{i,t-1}$ (features) over the year t − 1: demographics (e.g., age, sex), insurance type, diagnosis and procedure codes, medications, and detailed costs. Notably, the algorithm specifically excludes race.
The algorithm uses these data to predict $Y_{i,t}$ (i.e., the label). In this instance, the algorithm takes total medical expenditures (for simplicity, we denote costs $C_t$) in year t as the label. Thus, the algorithm's prediction on health needs is, in fact, a prediction on health costs.
As a first check on this potential mechanism of bias, we calculate the distribution of realized costs C versus predicted costs R. By this metric, one could call the algorithm unbiased. Fig. 3A shows that, at every level of algorithm-predicted risk, Blacks and Whites have (roughly) the same costs the following year. In other words, the algorithm's predictions are well calibrated across races. For example, at the median risk score, Black patients had costs of $5147 versus $4995 for Whites (U.S. dollars); in the top 5% of algorithm-predicted risk, costs were $35,541 for Blacks versus $34,059 for Whites.
[Fig. 1 image not reproduced. Both panels share the x axis "Percentile of Algorithm Risk Score"; the y axes are "Number of active chronic conditions" (A) and "Fraction Black" (B). Legend entries: Race (Black, White); Original; Simulated. Threshold labels: "Referred for screen" and "Defaulted into program".]
Fig. 1. Number of chronic illnesses versus algorithm-predicted risk, by race. (A) Mean number of chronic conditions by race, plotted against algorithm risk score. (B) Fraction of Black patients at or above a given risk score for the original algorithm (original) and for a simulated scenario that removes algorithmic bias (simulated: at each threshold of risk, defined at a given percentile on the x axis, healthier Whites above the threshold are replaced with less healthy Blacks below the threshold, until the marginal patient is equally healthy). The × symbols show risk percentiles by race; circles show risk deciles with 95% confidence intervals clustered by patient. The dashed vertical lines show the auto-identification threshold (the black line, which denotes the 97th percentile) and the screening threshold (the gray line, which denotes the 55th percentile).
Because these programs are used to target patients with high costs, these results are largely inconsistent with algorithmic bias, as measured by calibration: Conditional on risk score, predictions do not favor Whites or Blacks anywhere in the risk distribution.
To summarize, we find substantial disparities in health conditional on risk but little disparity in costs. On the one hand, this is surprising: Health care costs and health needs are highly correlated, as sicker patients need and receive more care, on average. On the other hand, there are many opportunities for a wedge to creep in between needing health care and receiving health care, and crucially, we find that wedge to be correlated with race, as shown in Fig. 3B. At a given level of health (again measured by number of chronic illnesses), Blacks generate lower costs than Whites: on average, $1801 less per year, holding constant the number of chronic illnesses (or $1144 less, if we instead hold constant the specific individual illnesses that contribute to the sum). Table S2 also shows that Black patients generate very different kinds of costs: for example, fewer inpatient surgical and outpatient specialist costs, and more costs related to emergency visits and dialysis. These results suggest that the driving force behind the bias we detect is that Black patients generate lesser medical expenses, conditional on health, even when we account for specific comorbidities. As a result, accurate prediction of costs necessarily means being racially biased on health.
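The logic of this last step can be illustrated with a toy simulation on synthetic data. Everything here is an assumption made for illustration except the $1801 cost gap taken from the text: the group share, the Poisson illness distribution (identical in both groups, unlike the real sample), the cost per condition, the noise level, and the use of realized cost itself as a stand-in for a perfectly accurate cost predictor.

```python
import numpy as np

def health_gap_at_equal_cost(n=20000, cost_per_condition=3000.0, gap=1801.0, seed=0):
    """Compare illness burden by group among patients with the highest cost 'scores'.

    Synthetic sketch: cost rises with the number of conditions, but one group
    generates `gap` dollars less at any given level of health.
    """
    rng = np.random.default_rng(seed)
    is_black = rng.random(n) < 0.12              # hypothetical group share
    conditions = rng.poisson(2.0, size=n)        # hypothetical illness burden
    noise = rng.normal(0.0, 500.0, size=n)
    cost = cost_per_condition * conditions - gap * is_black + noise

    score = cost                                 # oracle cost predictor (assumption)
    top = score >= np.quantile(score, 0.90)      # highest-risk decile by cost
    return {
        "mean_conditions_black": conditions[top & is_black].mean(),
        "mean_conditions_white": conditions[top & ~is_black].mean(),
    }
```

In this setup the score is perfectly calibrated for cost by construction, yet patients from the lower-spending group must be sicker to reach the same score, which is the pattern the text describes.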
How might these disparities in cost arise?
Theliteraturebroadlysuggeststwomainpo-
tential channels. First, poor patients face sub-
stantial barriers to accessing health care, even
when enrolled in insurance plans. Although
the population we study is entirely insured,
there are many other mechanisms by which
poverty can lead to disparities in use of health
care: geography and differential access to trans-
portation, competing demands from jobs or
child care, or knowledge of reasons to seek care
(2931). To the extent that race and socioeco-
nomic status are correlated, these factors will
differentially affect Black patients. Second, race
could affect costs directly via several channels:
direct (taste-based) discrimination, changes
to the doctor–patient relationship, or others. A
recent trial randomly assigned Black patients
to a Black or White primary care provider and
found significantly higher uptake of recom-
mended preventive care when the provider was
Black (32). This is perhaps the most rigorous
demonstration of this effect, and it fits with a
larger literature on potential mechanisms by
which race can affect health care directly. For
example, it has long been documented that
Black patients have reduced trust in the health
care system (33), a fact that some studies trace
to the revelations of the Tuskegee study and
other adverse experiences (34). A substantial
literature in psychology has documented physicians'
differential perceptions of Black patients,
in terms of intelligence, affiliation (35), or pain
tolerance (36). Thus, whether it is communi-
cation, trust, or bias, something about the inter-
actions of Black patients with the health care
system itself leads to reduced use of health care.
The collective effect of these many channels is
to lower health spending substantially for Black
patients, conditional on need, a finding that has
been appreciated for at least two decades (37).
Problem formulation
Our findings highlight the importance of the
choice of the label on which the algorithm is
trained. On the one hand, the algorithm manufacturer's
choice to predict future costs is reasonable:
The program's goal, at least in part, is
Obermeyer et al., Science 366, 447–453 (2019) 25 October 2019 4 of 7
[Figure 2, panels A to E. x axis: percentile of algorithm risk score; series: Black, White; marked thresholds: referred for screen, defaulted into program.
A: Hypertension, fraction of clinic visits with SBP >139 mmHg.
B: Diabetes severity, mean HbA1c (%).
C: Bad cholesterol, mean LDL (mg/dL).
D: Renal failure, mean creatinine (log mg/dL).
E: Anemia severity, mean hematocrit (%).]
Fig. 2. Biomarkers of health versus
algorithm-predicted risk, by race. (A to
E) Racial differences in a range of biological
measures of disease severity, conditional
on algorithm risk score, for the most common
diseases in the population studied. The ×
symbols show risk percentiles by race, except
in (C) where they show risk ventiles; circles
show risk quintiles with 95% confidence
intervals clustered by patient. The y axis in
(D) has been trimmed for readability, so the
highest percentiles of values for Black patients
are not shown. The dashed vertical lines
show the auto-identification threshold (black
line: 97th percentile) and the screening
threshold (gray line: 55th percentile).
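The construction described in this caption (bin patients by risk score percentile, then compare a mean biomarker by race within each bin, with 95% confidence intervals) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code. It assumes one observation per patient, so the patient-level clustering of the confidence intervals is omitted; it assumes percentiles are computed on the pooled population; and it uses a normal approximation for the intervals. The function names, the HbA1c-like outcome, and the 0.3 offset between groups are hypothetical.

```python
import math
import random
from collections import defaultdict
from statistics import mean, stdev

def percentile_ranks(scores):
    """Map each score to a percentile in (0, 100] by rank in the pooled sample."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    for pos, i in enumerate(order):
        ranks[i] = 100.0 * (pos + 1) / len(scores)
    return ranks

def binned_means(records, n_bins=5):
    """records: list of (race, risk_score, biomarker).
    Returns {(bin_index, race): (mean, ci_low, ci_high)} with normal-approximation
    95% intervals; cells with fewer than two observations are skipped."""
    pct = percentile_ranks([r[1] for r in records])
    width = 100.0 / n_bins
    cells = defaultdict(list)
    for (race, _, value), p in zip(records, pct):
        # Subtract a tiny epsilon so a percentile on a bin edge falls in the lower bin.
        b = min(int((p - 1e-9) // width), n_bins - 1)
        cells[(b, race)].append(value)
    out = {}
    for key, vals in cells.items():
        if len(vals) < 2:
            continue
        m = mean(vals)
        half = 1.96 * stdev(vals) / math.sqrt(len(vals))
        out[key] = (m, m - half, m + half)
    return out

# Synthetic example: the biomarker rises with risk score and is shifted
# upward for one group at every score (an assumption built into the toy data).
rng = random.Random(1)
records = []
for _ in range(5000):
    race = "Black" if rng.random() < 0.3 else "White"
    score = rng.random()
    hba1c = 5.8 + 1.0 * score + (0.3 if race == "Black" else 0.0) + rng.gauss(0.0, 0.5)
    records.append((race, score, hba1c))

quintiles = binned_means(records, n_bins=5)
for b in range(5):
    print(b, {race: round(quintiles[(b, race)][0], 2) for race in ("Black", "White")})
```

Setting `n_bins=100` or `n_bins=20` gives the percentile or ventile version of the same summary; finer bins leave fewer patients per cell and so produce wider intervals.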
References

Anemia of Chronic Disease
TL;DR: Advances in knowledge of the causes and management of the anemia of chronic disease are discussed.

Socioeconomic Disparities In Health: Pathways And Policies
TL;DR: Reducing SES disparities in health will require policy initiatives addressing the components of socioeconomic status (income, education, and occupation) as well as the pathways by which these affect health.

Semantics derived automatically from language corpora contain human-like biases
TL;DR: This article showed that applying machine learning to ordinary human language results in human-like semantic biases and replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web.

How to measure comorbidity: a critical review of available methods
TL;DR: The Charlson Index, the CIRS, the ICED and the Kaplan Index are valid and reliable methods to measure comorbidity that can be used in clinical research.