Journal ArticleDOI

Human Decision Making with Machine Assistance: An Experiment on Bailing and Jailing

07 Nov 2019 - Vol. 3, pp. 1-25
TL;DR: This article explores how receiving machine advice influences people's bail decisions and finds that the advice has only a small effect, biased in the direction of predicting no recidivism, even though the machine is (on average) slightly more accurate than real judges.
Abstract: Much of political debate focuses on the concern that machines might take over. Yet in many domains it is much more plausible that the ultimate choice and responsibility remain with a human decision-maker, but that she is provided with machine advice. A quintessential illustration is the decision of a judge to bail or jail a defendant. In multiple jurisdictions in the US, judges have access to a machine prediction about a defendant's recidivism risk. In our study, we explore how receiving machine advice influences people's bail decisions. We run a vignette experiment with laypersons whom we test on a subsample of cases from the database of this prediction tool. In study 1, we ask them to predict whether defendants will recidivate before being tried, and manipulate whether they have access to machine advice. We find that receiving machine advice has a small effect, which is biased in the direction of predicting no recidivism. In the field, human decision makers sometimes have a chance, after the fact, to learn whether the machine has given good advice. In study 2, after each trial we inform participants of ground truth. This does not make it more likely that they follow the advice, despite the fact that the machine is (on average) slightly more accurate than real judges. This also holds if the advice is initially mostly correct, or if it initially tends to predict (no) recidivism. Real judges know that their decisions affect defendants' lives. They may also be concerned about reelection or promotion. Hence a lot is at stake. In study 3 we emulate high stakes by giving participants a financial incentive. An incentive to find the ground truth, or to avoid false positives or false negatives, does not make participants more sensitive to machine advice. But an incentive to follow the advice is effective.
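A minimal sketch of how the outcome measures described above (advice following, accuracy, false positives) could be computed from trial-level data; this is not the authors' analysis code, and the column names and toy values are assumptions for illustration.

```python
# Hypothetical sketch, not the authors' analysis code. Column names and
# values are assumed: predictions, machine advice, and ground truth are each
# coded 1 = "will recidivate", 0 = "will not recidivate".
import pandas as pd

trials = pd.DataFrame({
    "prediction":   [1, 0, 0, 1, 0],
    "advice":       [1, 1, 0, 1, 0],
    "ground_truth": [1, 0, 0, 0, 1],
})

# How often participants' predictions match the machine advice.
follow_rate = (trials["prediction"] == trials["advice"]).mean()

# Accuracy of participants' predictions against ground truth.
accuracy = (trials["prediction"] == trials["ground_truth"]).mean()

# False positives: predicting recidivism for defendants who did not recidivate.
false_positive_rate = (
    ((trials["prediction"] == 1) & (trials["ground_truth"] == 0)).sum()
    / (trials["ground_truth"] == 0).sum()
)

print(follow_rate, accuracy, false_positive_rate)
```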
Citations
11 Feb 2010
TL;DR: The American Community Survey (ACS), as discussed by the authors, has replaced the long form of the decennial census and has been conducted on an ongoing basis for the entire country since 2005.
Abstract: Historically, most demographic data for states and substate areas were collected from the long version of the decennial census questionnaire. A “snapshot” of the characteristics of the population on the April 1 census date was available once every 10 years. The long form of the decennial census has been replaced by the American Community Survey (ACS) that has been conducted on an ongoing basis for the entire country since 2005. Instead of a snapshot in which all of the data are gathered at one time, the ACS aggregates data collected over time, making the results more difficult to interpret. However, the ACS data are updated annually.

691 citations

Posted Content
TL;DR: A sequence of pre-registered experiments showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box).
Abstract: With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed, there have been relatively few experimental studies investigating whether these models achieve their intended effects, such as making people more closely follow a model's predictions when it is beneficial for them to do so or enabling them to detect when a model has made a mistake. We present a sequence of pre-registered experiments (N=3,800) in which we showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box). Predictably, participants who saw a clear model with few features could better simulate the model's predictions. However, we did not find that participants more closely followed its predictions. Furthermore, showing participants a clear model meant that they were less able to detect and correct for the model's sizable mistakes, seemingly due to information overload. These counterintuitive findings emphasize the importance of testing over intuition when developing interpretable models.
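The two experimental factors described above (number of features and model transparency) can be made concrete with a small sketch. This is not the paper's experimental materials; the linear-regression setup, feature counts, and synthetic data are assumptions chosen only to show what "simulating a model's predictions" means when a clear model's coefficients are visible.

```python
# Illustrative sketch only; not the paper's experimental materials.
# The linear models, feature counts, and synthetic data are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # 8 candidate features
true_coefs = np.array([2.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coefs + rng.normal(scale=0.1, size=200)

# "Clear, few features": participants see the 2 coefficients and can compute
# the prediction by hand. "Black box, many features": only the output is shown.
clear_model = LinearRegression().fit(X[:, :2], y)
black_box = LinearRegression().fit(X, y)

x_new = X[:1]
by_hand = clear_model.intercept_ + x_new[0, :2] @ clear_model.coef_
print(by_hand, clear_model.predict(x_new[:, :2])[0], black_box.predict(x_new)[0])
```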

419 citations

Proceedings ArticleDOI
06 May 2021
TL;DR: The authors showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box).
Abstract: With machine learning models being increasingly used to aid decision making even in high-stakes domains, there has been a growing interest in developing interpretable models. Although many supposedly interpretable models have been proposed, there have been relatively few experimental studies investigating whether these models achieve their intended effects, such as making people more closely follow a model’s predictions when it is beneficial for them to do so or enabling them to detect when a model has made a mistake. We present a sequence of pre-registered experiments (N = 3,800) in which we showed participants functionally identical models that varied only in two factors commonly thought to make machine learning models more or less interpretable: the number of features and the transparency of the model (i.e., whether the model internals are clear or black box). Predictably, participants who saw a clear model with few features could better simulate the model’s predictions. However, we did not find that participants more closely followed its predictions. Furthermore, showing participants a clear model meant that they were less able to detect and correct for the model’s sizable mistakes, seemingly due to information overload. These counterintuitive findings emphasize the importance of testing over intuition when developing interpretable models.

145 citations

Proceedings ArticleDOI
Xinru Wang, Ming Yin
14 Apr 2021
TL;DR: In this article, a comparison of explainable AI methods in AI-assisted decision-making tasks is presented, where the authors highlight three desirable properties that ideal AI explanations should satisfy: improve people's understanding of the AI model, help people recognize the model uncertainty, and support people's calibrated trust in the model.
Abstract: This paper contributes to the growing literature in empirical evaluation of explainable AI (XAI) methods by presenting a comparison of the effects of a set of established XAI methods in AI-assisted decision making. Specifically, based on our review of previous literature, we highlight three desirable properties that ideal AI explanations should satisfy: improve people’s understanding of the AI model, help people recognize the model uncertainty, and support people’s calibrated trust in the model. Through randomized controlled experiments, we evaluate whether four types of common model-agnostic explainable AI methods satisfy these properties on two types of decision making tasks in which people perceive themselves as having different levels of domain expertise (i.e., recidivism prediction and forest cover prediction). Our results show that the effects of AI explanations differ largely across decision making tasks in which people have varying levels of domain expertise, and many AI explanations do not satisfy any of the desirable properties for tasks in which people have little domain expertise. Further, for decision making tasks in which people are more knowledgeable, feature contribution explanation is shown to satisfy more desiderata of AI explanations, while the explanation that is considered to resemble how humans explain decisions (i.e., counterfactual explanation) does not seem to improve calibrated trust. We conclude by discussing the implications of our study for improving the design of XAI methods to better support human decision making.
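As one concrete example of a model-agnostic, feature-contribution style explanation of the kind compared above, the sketch below uses scikit-learn's permutation importance on synthetic data; the paper does not necessarily use this particular method, and the model and data here are assumptions for illustration.

```python
# Illustrative sketch of one feature-contribution style, model-agnostic
# explanation (permutation importance); not necessarily the method used in
# the paper. Model and synthetic data are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature in turn and measure the drop in accuracy: larger
# drops indicate features the model relies on more heavily.
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: mean importance {score:.3f}")
```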

111 citations

Journal ArticleDOI
TL;DR: In this paper, the authors point out that as firms move toward data-driven decision making, algorithmic systems can yield socially-biased outcomes, raising the problem of algorithmic bias.
Abstract: As firms are moving towards data-driven decision making, they are facing an emerging problem, namely, algorithmic bias. Accordingly, algorithmic systems can yield socially-biased outcomes, thereby ...

61 citations

References
Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.
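A minimal usage sketch of the fit/predict estimator interface the abstract describes, run on a bundled toy dataset; the particular estimator and dataset are illustrative choices, not taken from the paper.

```python
# Minimal usage sketch of the scikit-learn estimator interface described above;
# the estimator and dataset are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```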

47,974 citations

Book ChapterDOI
TL;DR: In this paper, the authors present a critique of expected utility theory as a descriptive model of decision making under risk, and develop an alternative model, called prospect theory, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights.
Abstract: This paper presents a critique of expected utility theory as a descriptive model of decision making under risk, and develops an alternative model, called prospect theory. Choices among risky prospects exhibit several pervasive effects that are inconsistent with the basic tenets of utility theory. In particular, people underweight outcomes that are merely probable in comparison with outcomes that are obtained with certainty. This tendency, called the certainty effect, contributes to risk aversion in choices involving sure gains and to risk seeking in choices involving sure losses. In addition, people generally discard components that are shared by all prospects under consideration. This tendency, called the isolation effect, leads to inconsistent preferences when the same choice is presented in different forms. An alternative theory of choice is developed, in which value is assigned to gains and losses rather than to final assets and in which probabilities are replaced by decision weights. The value function is normally concave for gains, commonly convex for losses, and is generally steeper for losses than for gains. Decision weights are generally lower than the corresponding probabilities, except in the range of low probabilities. Overweighting of low probabilities may contribute to the attractiveness of both insurance and gambling. EXPECTED UTILITY THEORY has dominated the analysis of decision making under risk. It has been generally accepted as a normative model of rational choice (24), and widely applied as a descriptive model of economic behavior, e.g. (15, 4). Thus, it is assumed that all reasonable people would wish to obey the axioms of the theory (47, 36), and that most people actually do, most of the time. The present paper describes several classes of choice problems in which preferences systematically violate the axioms of expected utility theory. In the light of these observations we argue that utility theory, as it is commonly interpreted and applied, is not an adequate descriptive model and we propose an alternative account of choice under risk.
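The shape of the value function and decision weights described above can be made concrete with a small sketch. The functional forms and parameter values below follow a later, commonly used parameterization (cumulative prospect theory, Tversky & Kahneman, 1992) rather than anything specified in this 1979 paper; they are assumptions for illustration only.

```python
# Illustrative sketch only. The functional forms and parameters follow a
# later, commonly used parameterization (Tversky & Kahneman, 1992) and are
# not taken from this paper.
import numpy as np

def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Concave for gains, convex for losses, and steeper for losses."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.abs(x) ** alpha, -lam * np.abs(x) ** beta)

def decision_weight(p, gamma=0.61):
    """Overweights small probabilities, underweights moderate and large ones."""
    p = np.asarray(p, dtype=float)
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

# Losses loom larger than gains of the same size:
print(value(50.0), value(-50.0))
# A 1% chance is overweighted relative to its objective probability:
print(decision_weight(0.01))
```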

35,067 citations

Journal ArticleDOI
30 Jan 1981 - Science
TL;DR: The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways.
Abstract: The psychological principles that govern the perception of decision problems and the evaluation of probabilities and outcomes produce predictable shifts of preference when the same problem is framed in different ways. Reversals of preference are demonstrated in choices regarding monetary outcomes, both hypothetical and real, and in questions pertaining to the loss of human lives. The effects of frames on preferences are compared to the effects of perspectives on perceptual appearance. The dependence of preferences on the formulation of decision problems is a significant concern for the theory of rational choice.

15,513 citations

Journal ArticleDOI
Ziva Kunda
TL;DR: It is proposed that motivation may affect reasoning through reliance on a biased set of cognitive processes--that is, strategies for accessing, constructing, and evaluating beliefs--that are considered most likely to yield the desired conclusion.
Abstract: It is proposed that motivation may affect reasoning through reliance on a biased set of cognitive processes—that is, strategies for accessing, constructing, and evaluating beliefs. The motivation to be accurate enhances use of those beliefs and strategies that are considered most appropriate, whereas the motivation to arrive at particular conclusions enhances use of those that are considered most likely to yield the desired conclusion. There is considerable evidence that people are more likely to arrive at conclusions that they want to arrive at, but their ability to do so is constrained by their ability to construct seemingly reasonable justifications for these conclusions. These ideas can account for a wide variety of research concerned with motivated reasoning. The notion that goals or motives affect reasoning has a long and controversial history in social psychology. The propositions that motives may affect perceptions (Erdelyi, 1974), attitudes (Festinger, 1957), and attributions (Heider, 1958) have been put forth by some psychologists and challenged by others. Although early researchers and theorists took it for granted that motivation may cause people to make self-serving attributions and permit them to believe what they want to believe because they want to believe it, this view, and the research used to uphold it, came under concentrated criticism in the 1970s. The major and most damaging criticism of the motivational view was that all research purported to demonstrate motivated reasoning could be reinterpreted in entirely cognitive, nonmotivational terms (Miller & Ross, 1975; Nisbett & Ross, 1980). Thus people could draw self-serving conclusions not because they wanted to but because these conclusions seemed more plausible, given their prior beliefs and expectancies. Because both cognitive and motivational accounts could be generated for any empirical study, some theorists argued that the hot versus cold cognition controversy could not be solved, at least in the attribution paradigm (Ross & Fletcher, 1985; Tetlock & Levi, 1982). One reason for the persistence of this controversy lies in the failure of researchers to explore the mechanisms underlying motivated reasoning. Recently, several authors have attempted to rectify this neglect (Kruglanski & Freund, 1983; Kunda, 1987; Pyszczynski & Greenberg, 1987; Sorrentino & Higgins, 1986). All these authors share a view of motivation as having its effects through cognitive processes: People rely on cognitive processes and representations to arrive at their desired conclusions, but motivation plays a role in determining which of these will be used on a given occasion.

6,643 citations

Journal ArticleDOI
TL;DR: In this article, the authors present the correct way to estimate the magnitude and standard errors of the interaction effect in nonlinear models.
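Assuming the nonlinear model in question is something like a logit (an assumption, since the reference's abstract is not shown here), the key point is that the interaction effect is the cross-partial derivative of the predicted outcome with respect to the two interacted variables, which generally differs from the coefficient on the interaction term. The sketch below computes it numerically for an assumed specification.

```python
# Hedged sketch, not from the referenced paper: an assumed logit specification
# illustrating that the interaction effect is the cross-partial derivative of
# the predicted outcome, not simply the coefficient on the interaction term.
import numpy as np

def predicted_prob(x1, x2, b0=-1.0, b1=0.5, b2=0.8, b12=0.3):
    """Logit model with an interaction term (coefficients are assumptions)."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)))

def interaction_effect(x1, x2, h=1e-4):
    """Numerical cross-partial d^2 E[y] / (dx1 dx2) via central differences."""
    return (
        predicted_prob(x1 + h, x2 + h) - predicted_prob(x1 + h, x2 - h)
        - predicted_prob(x1 - h, x2 + h) + predicted_prob(x1 - h, x2 - h)
    ) / (4.0 * h * h)

# The effect varies with the evaluation point and need not equal b12 = 0.3:
print(interaction_effect(0.0, 0.0), interaction_effect(1.0, 1.0))
```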

5,500 citations