Open Access Journal Article DOI

European Union Regulations on Algorithmic Decision-Making and a “Right to Explanation”

Bryce Goodman, Seth Flaxman
- 02 Oct 2017
- Vol. 38, Iss. 3, pp. 50–57
TL;DR
It is argued that while this law will pose large challenges for industry, it highlights opportunities for computer scientists to take the lead in designing algorithms and evaluation frameworks which avoid discrimination and enable explanation.
Abstract
We summarize the potential impact that the European Union’s new General Data Protection Regulation will have on the routine use of machine learning algorithms. Slated to take effect as law across the EU in 2018, it will restrict automated individual decision-making (that is, algorithms that make decisions based on user-level predictors) which “significantly affect” users. The law will also effectively create a “right to explanation,” whereby a user can ask for an explanation of an algorithmic decision that was made about them. We argue that while this law will pose large challenges for industry, it highlights opportunities for computer scientists to take the lead in designing algorithms and evaluation frameworks which avoid discrimination and enable explanation.



In April 2016, for the first time in more than two decades, the European Parliament adopted a set of comprehensive regulations for the collection, storage, and use of personal information, the General Data Protection Regulation (GDPR) (European Union, Parliament and Council 2016). The new regulation has been described as a “Copernican Revolution” in data-protection law, “seeking to shift its focus away from paper-based, bureaucratic requirements and towards compliance in practice, harmonization of the law, and individual empowerment” (Kuner 2012). Much in the regulations is clearly aimed at perceived gaps and inconsistencies in the European Union’s (EU) current approach to data protection. This includes, for example, the codification of the “right to be forgotten” (Article 17), and
regulations for foreign companies collecting data
from European citizens (Article 44).
However, while the bulk of the language deals with how data is collected and stored, the regulation contains Article 22: Automated individual decision making, including profiling (see figure 1), which potentially prohibits a wide swath of algorithms currently in use in recommendation systems, credit and insurance risk assessments, computational advertising, and social networks, for example. This prohibition raises important issues that are of particular concern to the machine-learning community. In its current form, the GDPR’s requirements could require a complete overhaul of standard and widely used algorithmic techniques. The GDPR’s policy on the right of citizens to receive an explanation for algorithmic decisions highlights the pressing importance of human interpretability in algorithm design. If, as expected, the GDPR takes effect in its current form in mid-2018, there will be a pressing need for effective algorithms that can operate within this new legal framework.
Background
The General Data Protection Regulation is slated to go into effect in May 2018 and will replace the EU’s 1995 Data Protection Directive (DPD). On the surface, the GDPR merely reaffirms the DPD’s right to explanation and restrictions on automated decision making. However, this reading ignores a number of critical differences between the two pieces of legislation (Goodman 2016a, 2016b).
First, it is important to note the difference between a directive and a regulation. While a directive “set[s] out general rules to be transferred into national law by each country as they deem appropriate,” a regulation is “similar to a national law with the difference that it is applicable in all EU countries” (European Documentation Centre 2016). In other words, the 1995 directive was subject to national interpretation and was only ever indirectly implemented through subsequent laws passed within individual member states (Fromholz 2000). The GDPR, however, requires no enabling legislation to take effect. It does not direct the law of EU member states; it simply is the law for member states (or will be, when it takes effect).
Second, the DPD and GDPR are worlds apart in terms of the fines that can be imposed on violators. Under the DPD, there are no explicit maximum fines. Instead, fines are determined on a country-by-country basis. By contrast, the GDPR introduces EU-wide maximum fines of 20 million euros or 4 percent of global revenue, whichever is greater (Article 83, Paragraph 5). For companies like Google and Facebook, this could mean fines in the billions.
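The “whichever is greater” rule is simple but easy to misread, so a brief worked example may help. The sketch below merely encodes the Article 83(5) cap as summarized above; the revenue figure is purely illustrative and not taken from the article.

```python
def gdpr_max_fine(global_revenue_eur: float) -> float:
    """Maximum administrative fine under Article 83(5), as summarized above:
    20 million euros or 4 percent of global revenue, whichever is greater."""
    return max(20_000_000.0, 0.04 * global_revenue_eur)

# Illustrative only: a firm with 90 billion euros of global revenue faces a
# cap of 3.6 billion euros, far above the 20 million euro floor.
print(gdpr_max_fine(90e9))  # 3600000000.0
```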
Third, the scope of the GDPR is explicitly global
(Article 3, Paragraph 1). Its requirements do not just
apply to companies that are headquartered in the EU
but, rather, to any companies processing EU residents’ personal data. For the purposes of determining
jurisdiction, it is irrelevant whether that data is
processed within the EU territory or abroad.
In addition, the GDPR introduces a number of
explicit rights that increase the ability of individuals
to lodge complaints and receive compensation for
violations. These rights include the following:
The “right to lodge a complaint with a supervisory authority” (Article 77), which an individual can exercise in his or her place of residence, place of work, or place of the alleged infringement.
The “right to an effective judicial remedy against a supervisory authority” (Article 78), which may be enforced against any supervisory authority that “does not handle a complaint or does not inform the data subject within three months on the progress or outcome of the complaint lodged.”
The “right to an effective judicial remedy against a controller or processor” (Article 79) which, in the case of more than one processor and/or controller, specifies that each violating party has liability (see Paragraph 4).
The “right to compensation and liability” (Article 82),
which creates an obligation for both data controllers
and processors to compensate “any person who has
suffered material or nonmaterial damages as a result of
[their] infringement of this Regulation.”
The “right of representation by a body, organization or association” (Article 80), which allows an individual to designate a qualified not-for-profit body (such as privacy advocacy groups) to exercise data protection rights on his or her behalf, including lodging complaints and pursuing compensation.
Taken together, these rights greatly strengthen individuals’ actual (as opposed to nominal) ability to pursue action against companies that fail to comply with the GDPR (Pastor and Lawrence 2016).
Before proceeding with analysis, we summarize some of the key terms employed in the GDPR as defined in Article 4: Definitions:
Personal data is “any information relating to an identified or identifiable natural person.”
Data subject is the natural person to whom data relates.
Processing is “any operation or set of operations which is performed on personal data or on sets of personal data, whether or not by automated means.”
Profiling is “any form of automated processing of personal data consisting of the use of personal data to evaluate certain personal aspects relating to a natural person.”
Thus profiling should be construed as a subset of processing, under two conditions: the processing is automated, and the processing is for the purposes of evaluation.
The GDPR calls particular attention to profiling aimed at “analys[ing] or predict[ing] aspects concerning that natural person’s performance at work, economic situation, health, personal preferences, interests, reliability, behavior, location or movements” (Article 4, Paragraph 4). Given the breadth of categories, it stands to reason that the GDPR’s desideratum for profiling errs on the side of inclusion, to say the least.
Figure 1. Excerpt from the General Data Protection Regulation (European Union, Parliament and Council 2016).
Article 22. Automated individual decision making, including profiling
1. The data subject shall have the right not to be subject to a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her.
2. Paragraph 1 shall not apply if the decision:
(a) is necessary for entering into, or performance
of, a contract between the data subject and a
data controller;
(b) is authorised by Union or Member State law to
which the controller is subject and which also
lays down suitable measures to safeguard the
data subject’s rights and freedoms and legiti-
mate interests; or
(c) is based on the data subject’s explicit consent.
3. In the cases referred to in points (a) and (c) of
paragraph 2, the data controller shall implement
suitable measures to safeguard the data subject’s
rights and freedoms and legitimate interests, at
least the right to obtain human intervention on
the part of the controller, to express his or her
point of view and to contest the decision.
4. Decisions referred to in paragraph 2 shall not be
based on special categories of personal data
referred to in Article 9(1), unless point (a) or (g)
of Article 9(2) apply and suitable measures to safe-
guard the data subject’s rights and freedoms and
legitimate interests are in place.

Article 22: Automated individual decision making, including profiling, Paragraph 1 (see figure 1) prohibits any “decision based solely on automated processing, including profiling” which “produces legal effects...or similarly significantly affects” a data subject. Paragraph 2 specifies that exceptions can be made “if necessary for entering into, or performance of, a contract,” authorized by “Union or Member State law,” or “based on the data subject’s explicit consent.” However, Paragraph 3 states that, even in the case of exceptions, data controllers must “provide appropriate safeguards” including “the right to obtain human intervention...to express his or her point of view and to contest the decision.” Paragraph 4 specifically prohibits automated processing “based on special categories of personal data” unless “suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests are in place.”
Note that this section does not address the conditions under which it is ethically permissible to access sensitive data; this is dealt with elsewhere (see Article 9). Rather, it is implicitly assumed in this section that the data is legitimately obtained. Thus the provisions for algorithmic profiling are an additional constraint that applies even if the data processor has informed consent from data subjects.
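To make the structure of these provisions concrete, the following is a schematic sketch of Article 22 as excerpted in figure 1, written as a decision rule. The field names and status strings are our own simplification of the text shown above; this is an illustration of the excerpt's logic, not legal advice or code from the article.

```python
from dataclasses import dataclass

@dataclass
class AutomatedDecision:
    solely_automated: bool
    significant_effect: bool        # produces legal or similarly significant effects
    necessary_for_contract: bool    # exception 2(a)
    authorized_by_law: bool         # exception 2(b)
    explicit_consent: bool          # exception 2(c)
    uses_special_categories: bool   # data referred to in Article 9(1)
    article_9_exception: bool       # Article 9(2)(a) or (g) applies
    safeguards_in_place: bool       # human intervention, right to contest, etc.

def article_22_status(d: AutomatedDecision) -> str:
    # Paragraph 1: scope and default prohibition.
    if not (d.solely_automated and d.significant_effect):
        return "outside the scope of Article 22(1)"
    if not (d.necessary_for_contract or d.authorized_by_law or d.explicit_consent):
        return "prohibited by Article 22(1)"
    # Paragraph 4: special categories need an Article 9(2)(a)/(g) basis plus safeguards.
    if d.uses_special_categories and not (d.article_9_exception and d.safeguards_in_place):
        return "prohibited by Article 22(4)"
    # Paragraph 3: the contract and consent exceptions require controller safeguards.
    if (d.necessary_for_contract or d.explicit_consent) and not d.safeguards_in_place:
        return "Article 22(3) safeguards still required"
    return "permitted under an Article 22(2) exception"

# Example: a consent-based automated decision without safeguards in place.
print(article_22_status(AutomatedDecision(
    solely_automated=True, significant_effect=True,
    necessary_for_contract=False, authorized_by_law=False, explicit_consent=True,
    uses_special_categories=False, article_9_exception=False,
    safeguards_in_place=False)))  # -> "Article 22(3) safeguards still required"
```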
These provisions present a number of practical
challenges for the design and deployment of
machine-learning algorithms. This article focuses on
two: issues raised by the GDPR’s stance on discrimination and the GDPR’s “right to explanation.”
Throughout, we highlight opportunities for
researchers.
Nondiscrimination
In general, discrimination might be defined as the
unfair treatment of an individual because of his or
her membership in a particular group, race, or gender
(Altman 2015). The right to nondiscrimination is
deeply embedded in the normative framework that
underlies the EU, and can be found in Article 21 of
the Charter of Fundamental Rights of the European
Union, Article 14 of the European Convention on
Human Rights, and in Articles 18–25 of the Treaty on
the Functioning of the European Union.
The use of algorithmic profiling for the allocation of resources is, in a certain sense, inherently discriminatory: profiling takes place when data subjects are grouped in categories according to various variables, and decisions are made on the basis of subjects falling within so-defined groups. It is thus not surprising that concerns over discrimination have begun to take root in discussions over the ethics of big data. Barocas and Selbst (2016) sum the problem up succinctly: “Big data claims to be neutral. It isn’t.” As the authors point out, machine learning depends upon data that has been collected from society, and to the extent that society contains inequality, exclusion, or other traces of discrimination, so too will the data. Consequently, “unthinking reliance on data mining can deny members of vulnerable groups full participation in society” (Barocas and Selbst 2016). Indeed, machine learning can reify existing patterns of discrimination: if they are found in the training data set, then by design an accurate classifier will reproduce them. In this way, biased decisions are presented as the outcome of an objective algorithm.
Paragraph 71 of the recitals (the preamble to the GDPR, which explains the rationale behind it but is not itself law) explicitly requires data controllers to “implement appropriate technical and organizational measures” that “prevents, inter alia, discriminatory effects” on the basis of processing sensitive data. According to Article 9: Processing of special categories of personal data, sensitive data includes:
personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person’s sex life or sexual orientation...
It is important to note that Paragraph 71 and Article 22 Paragraph 4 specifically address discrimination from profiling that makes use of sensitive data. In unpacking this mandate, we must distinguish between two potential interpretations. The first, minimal interpretation is that this requirement only pertains to cases where an algorithm is making direct use of data that is explicitly sensitive. This would include, for example, variables that code for race, finances, or any of the other categories of sensitive information referred to in Article 9. However, it is widely acknowledged that simply removing certain variables from a model does not ensure predictions that are, in effect, uncorrelated with those variables (Leese 2014; Hardt 2014). For example, if a certain geographic region has a high number of low-income or minority residents, an algorithm that employs geographic data to determine loan eligibility is likely to produce results that are, in effect, informed by race and income.
Thus a second, maximal interpretation takes a broader view of sensitive data to include not only those variables that are explicitly named, but also any variables with which they are correlated. This would put the onus on a data processor to ensure that algorithms are not provided with data sets containing variables that are correlated with the “special categories of personal data” in Article 9.
This interpretation also suffers from a number of
complications in practice. With relatively small data
sets it may be possible to both identify and account
for correlations between sensitive and “nonsensitive”
variables. However, removing all data correlated with
sensitive variables may make the resulting predictor
virtually useless. As Calders and Verwer (2010) note,
“postal code can reveal racial information and yet at
the same time, still give useful, nondiscriminatory
information on loan defaulting.”
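One rough way to see why the maximal interpretation is demanding in practice is to check how well the nominally “nonsensitive” features predict a sensitive attribute: high out-of-sample performance means the data set still carries proxy information, as in the postal-code example above. The sketch below is our illustration, not a procedure prescribed by the GDPR or by the article, and it assumes a binary sensitive attribute and scikit-learn.

```python
# Heuristic proxy check: if the "nonsensitive" features predict a binary
# sensitive attribute well, they still encode correlated (proxy) information.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def proxy_auc(X_nonsensitive: np.ndarray, sensitive: np.ndarray) -> float:
    """Cross-validated AUC for predicting the sensitive attribute from the
    remaining features; values near 0.5 suggest little proxy signal, values
    near 1.0 suggest the attribute is effectively still present."""
    model = LogisticRegression(max_iter=1000)
    scores = cross_val_score(model, X_nonsensitive, sensitive,
                             cv=5, scoring="roc_auc")
    return float(scores.mean())
```

Even a check of this kind only detects correlations that the chosen model can express, which is part of the difficulty discussed next.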
Furthermore, as data sets become increasingly large, correlations can become increasingly complex and difficult to detect. The link between geography and income may be obvious, but less obvious correlations (say, between IP address and race) are likely to exist within large enough data sets and could lead to discriminatory effects. For example, at an annual conference of actuaries, consultants from Deloitte explained that they can now “use thousands of nontraditional third party data sources, such as consumer buying history, to predict a life insurance applicant’s health status with an accuracy comparable to a medical exam” (Robinson, Yu, and Rieke 2014). With sufficiently large data sets, the task of exhaustively identifying and excluding data features correlated with “sensitive categories” a priori may be impossible. Companies may also be reluctant to exclude certain covariates: web-browsing patterns are a very good predictor for various recommendation systems, but they are also correlated with sensitive categories.
A nal challenge, which purging variables from the
data set does not address, is posed by what we term
uncertainty bias (Goodman 2016b). This bias arises
when two conditions are met: (1) one group is under-
represented in the sample,
5
so there is more uncer-
tainty associated with predictions about that group;
and (2) the algorithm is risk averse, so it will, all
things being equal, prefer to make decisions based on
predictions about which it is more condent (that is,
those with smaller condence intervals [Aigner and
Cain 1977]).
In practice, uncertainty bias could mean that predictive algorithms (such as for a loan approval) favor groups that are better represented in the training data, since there will be less uncertainty associated with those predictions. We illustrate uncertainty bias in figure 2. The population consists of two groups, whites and nonwhites. An algorithm is used to decide whether to extend a loan, based on the predicted probability that the individual will repay the loan. We repeatedly generated synthetic data sets of size 500, varying the true proportion of nonwhites in the population. In every case, we set the true probability of repayment to be independent of group membership: all individuals have a 95 percent probability of repayment regardless of race. Using a logistic regression ...
Figure 2. An Illustration of Uncertainty Bias.
A hypothetical algorithm is used to predict the probability of loan repayment in a setting in which the ground truth is that nonwhites and whites are equally likely to repay. The algorithm is risk averse, so it makes an offer when the lower end of the 95 percent confidence interval for its predictions lies above a fixed approval threshold of 90 percent (dashed line). When nonwhites are less than 30 percent of the population, and assuming a simple random sample, the algorithm exhibits what we term “uncertainty bias”: the underrepresentation of nonwhites means that predictions for nonwhites have less certainty, so they are not offered loans. As the nonwhite percentage approaches 50 percent, the uncertainty approaches that of whites and everyone is offered loans.
(Plot: predicted probability of repayment versus nonwhite percentage of the population, shown for nonwhite and white groups against the dashed approval threshold.)
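The setup behind figure 2 can be reproduced in outline. The sketch below is ours rather than the authors' code: it replaces their logistic regression with the equivalent per-group repayment-rate estimate, uses a normal-approximation 95 percent confidence interval, and offers a loan only when the interval's lower end clears the 90 percent threshold. The exact crossover fraction therefore depends on the interval construction and on sampling noise, so it will not match the figure precisely.

```python
import numpy as np

def loan_offers(nonwhite_frac, n=500, p_repay=0.95, threshold=0.90, seed=0):
    """Simulate uncertainty bias: both groups repay with the same probability,
    but the smaller group gets a wider confidence interval, so a risk-averse
    rule may decline to offer it loans."""
    rng = np.random.default_rng(seed)
    nonwhite = rng.random(n) < nonwhite_frac    # group membership
    repaid = rng.random(n) < p_repay            # ground truth, group-independent

    offers = {}
    for name, mask in (("nonwhite", nonwhite), ("white", ~nonwhite)):
        m = int(mask.sum())
        p_hat = repaid[mask].mean()
        # normal-approximation 95% confidence interval for the repayment rate
        lower = p_hat - 1.96 * np.sqrt(p_hat * (1.0 - p_hat) / m)
        offers[name] = {"n": m, "lower_95": round(float(lower), 3),
                        "offer_loan": bool(lower > threshold)}
    return offers

for frac in (0.1, 0.3, 0.5):
    print(frac, loan_offers(frac))
```

With these defaults the underrepresented group is declined whenever its sample is small enough that the lower confidence bound falls below the threshold, even though its true repayment rate is identical to the majority's.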

Citations
Journal Article

Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead

TL;DR: This Perspective clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare and computer vision.
Posted Content

Towards A Rigorous Science of Interpretable Machine Learning

TL;DR: This position paper defines interpretability and describes when interpretability is needed (and when it is not), and suggests a taxonomy for rigorous evaluation and exposes open questions towards a more rigorous science of interpretable machine learning.
Posted Content

Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI.

TL;DR: Previous efforts to define explainability in Machine Learning are summarized, establishing a novel definition that covers prior conceptual propositions with a major focus on the audience for which explainability is sought, and a taxonomy of recent contributions related to the explainability of different Machine Learning models is proposed.
Posted Content

Understanding Black-box Predictions via Influence Functions

TL;DR: This paper uses influence functions — a classic technique from robust statistics — to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction.
References
Book

Judgment Under Uncertainty: Heuristics and Biases

TL;DR: The authors described three heuristics that are employed in making judgements under uncertainty: representativeness, availability of instances or scenarios, and adjustment from an anchor, which is usually employed in numerical prediction when a relevant value is available.

Stanford Encyclopedia of Philosophy

TL;DR: To understand the central claims of evolutionary psychology the authors require an understanding of some key concepts in evolutionary biology, cognitive psychology, philosophy of science and philosophy of mind.
Book

The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions

TL;DR: The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions by William G. Bowen and Derek Bok as discussed by the authors is a seminal work on affirmative action in higher education.
Journal Article

Big Data's Disparate Impact

TL;DR: In the absence of a demonstrable intent to discriminate, the best doctrinal hope for data mining's victims would seem to lie in disparate impact doctrine as discussed by the authors, which holds that a practice can be justified as a business necessity when its outcomes are predictive of future employment outcomes, and data mining is specifically designed to find such statistical correlations.
Proceedings Article

Certifying and Removing Disparate Impact

TL;DR: This work links disparate impact to a measure of classification accuracy that while known, has received relatively little attention and proposes a test for disparate impact based on how well the protected class can be predicted from the other attributes.