Author

Kristin Bechtel

Other affiliations: Community Resources for Justice
Bio: Kristin Bechtel is an academic researcher from the University of Cincinnati. The author has contributed to research on topics including risk assessment and predictive validity, has an h-index of 4, and has co-authored 6 publications receiving 530 citations. Previous affiliations of Kristin Bechtel include Community Resources for Justice.

Papers
Journal Article
TL;DR: The authors pointed out that ProPublica's report was based on faulty statistics and data analysis, and that the report failed to show that the COMPAS itself is racially biased, let alone that other risk instruments are biased.
Abstract: The validity and intellectual honesty of conducting and reporting analysis are critical, since the ramifications of published data, accurate or misleading, may have consequences for years to come. - Marco and Larkin, 2000, p. 692

PROPUBLICA RECENTLY RELEASED a much-heralded investigative report claiming that a risk assessment tool (known as the COMPAS) used in criminal justice is biased against black defendants. The report heavily implied that such bias is inherent in all actuarial risk assessment instruments (ARAIs). We think ProPublica's report was based on faulty statistics and data analysis, and that the report failed to show that the COMPAS itself is racially biased, let alone that other risk instruments are biased. Not only do ProPublica's results contradict several comprehensive existing studies concluding that actuarial risk can be predicted free of racial and/or gender bias, a correct analysis of the underlying data (which we provide below) sharply undermines ProPublica's approach.

Our reasons for writing are simple. It might be that the existing justice system is biased against poor minorities for a wide variety of reasons (including economic factors, policing patterns, prosecutorial behavior, and judicial biases), and therefore, regardless of the degree of bias, risk assessment tools informed by objective data can help reduce racial bias from its current level. It would be a shame if policymakers mistakenly thought that risk assessment tools were somehow worse than the status quo. Because we are at a time in history when there appears to be bipartisan political support for criminal justice reform, one poorly executed study that makes such absolute claims of bias should not go unchallenged. The gravity of this study's erroneous conclusions is exacerbated by the large-market outlet in which it was published (ProPublica). Before we expand further into our criticisms of the ProPublica piece, we describe some context and characteristics of the American criminal justice system and risk assessments.

Mass Incarceration and ARAIs

The United States is clearly the worldwide leader in imprisonment. The prison population in the United States has declined by small percentages in recent years, and at year-end 2014 the prison population was the smallest it had been since 2004. Yet we still incarcerated 1,561,500 individuals in federal and state correctional facilities (Carson, 2015). By sheer numbers, or by rates per 100,000 inhabitants, the United States incarcerates more people than just about any country in the world that reports reliable incarceration statistics (Wagner & Walsh, 2016). Further, it appears that there is a fair amount of racial disproportion when comparing the composition of the general population with the composition of the prison population. The 2014 United States Census population projection estimates that, across the U.S., the racial breakdown of the 318 million residents comprised 62.1 percent white, 13.2 percent black or African American, and 17.4 percent Hispanic. In comparison, 37 percent of the prison population was categorized as black, 32 percent was categorized as white, and 22 percent as Hispanic (Carson, 2015).
Carson (2015:15) states that, "As a percentage of residents of all ages at yearend 2014, 2.7 percent of black males (or 2,724 per 100,000 black male residents) and 1.1 percent of Hispanic males (1,090 per 100,000 Hispanic males) were serving sentences of at least 1 year in prison, compared to less than 0.5 percent of white males (465 per 100,000 white male residents)." Aside from the negative effects caused by imprisonment, there is a massive financial cost that extends beyond official correctional budgets. A recent report by The Vera Institute of Justice (Henrichson & Delaney, 2012) indicated that the cost of prison operations (including such things as pension and insurance contributions, capital costs, legal fees, and administrative fees) in 40 states participating in their study was 39. …

679 citations
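The disagreement in this piece ultimately comes down to which error statistics are computed from score, outcome, and group data, and how they are compared across groups. The sketch below is not the authors' analysis and uses invented records rather than the COMPAS data; it only illustrates the mechanics of the group-wise false positive rate comparison that sits at the center of the debate.

```python
# Hypothetical sketch of a group-wise false positive rate check.
# Scores, outcomes, groups, and the threshold are all invented.

from collections import defaultdict

# (risk_score on a 1-10 scale, reoffended?, group) -- illustrative records only
records = [
    (8, True, "A"), (7, False, "A"), (3, False, "A"), (9, True, "A"),
    (4, False, "B"), (2, False, "B"), (8, True, "B"), (5, True, "B"),
]

THRESHOLD = 5  # scores above this are labeled "high risk"

stats = defaultdict(lambda: {"fp": 0, "tn": 0})
for score, reoffended, group in records:
    if not reoffended:                 # false positive rate is computed over non-recidivists
        if score > THRESHOLD:
            stats[group]["fp"] += 1
        else:
            stats[group]["tn"] += 1

for group, s in sorted(stats.items()):
    fpr = s["fp"] / (s["fp"] + s["tn"])
    print(f"group {group}: false positive rate = {fpr:.2f}")
```

ProPublica's report emphasized gaps in rates like these, while the rebuttal above emphasizes calibration and overall predictive accuracy; the sketch only shows how the former is tallied, not which comparison is the right one.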

Journal Article
TL;DR: This paper examined the predictive validity of the Youth Level of Service/Case Management Inventory (YLS/CMI), an instrument used to assess, classify, and assist agencies with developing treatment and service plans according to the offender's criminogenic risk factors.
Abstract: The purpose of the Youth Level of Service/Case Management Inventory (YLS/CMI) is to assess, classify, and assist agencies with developing treatment and service plans according to the offender's criminogenic risk factors. Given the limited research on the predictive validity of this instrument, the current study examines this issue with a sample of 4,482 juveniles from Ohio who received either community sentences or commitments to juvenile institutions. Results demonstrated the validity of the YLS/CMI in predicting recidivism in both settings.

83 citations
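Predictive validity in this literature is usually summarized with an AUC-style statistic: the probability that a randomly selected recidivist receives a higher risk score than a randomly selected non-recidivist. A minimal sketch, using invented scores and outcomes rather than the Ohio sample:

```python
# Minimal predictive-validity check: do higher YLS/CMI-style totals go with recidivism?
# The totals and outcomes below are made up for illustration only.

from sklearn.metrics import roc_auc_score

yls_total_scores = [4, 12, 25, 9, 31, 17, 2, 22, 14, 28]   # hypothetical instrument totals
recidivated      = [0,  0,  1, 1,  1,  0, 0,  1,  0,  1]   # 1 = new offense during follow-up

# AUC: probability that a randomly chosen recidivist scores higher
# than a randomly chosen non-recidivist.
auc = roc_auc_score(recidivated, yls_total_scores)
print(f"AUC = {auc:.2f}")
```

In a validation study like the one above, this statistic would be reported separately for the community and institutional samples to support the "both settings" conclusion.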

Journal Article
TL;DR: In 2009, the Administrative Office of the U.S. Courts (AOUSC) sought to develop a dynamic risk assessment instrument comprising both risk and needs factors using existing data from the federal supervision data systems as discussed by the authors.
Abstract: BEYOND THE GENERATIONAL improvements observed with risk assessments, agencies have devoted a substantial amount of focused effort to develop, implement, and revise their own instruments. The preference to develop rather than adopt is often attributed to several factors, including the agency's target population, existing data, agency research capacity, staff needs, and costs. It is certainly a benefit to have a tool created specifically for an agency's population, but one potential limitation is that the instrument is developed using existing data, which may not include risk factors that research would suggest also be examined for possible inclusion in the assessment. To address this limitation, additional risk factor items can be collected but not scored; when sufficient data are available, these factors can be analyzed and, if substantial improvements in prediction are found, a revised risk assessment can be introduced.

In 2009, the Administrative Office of the U.S. Courts (AOUSC) sought to develop a dynamic risk assessment instrument comprising both risk and needs factors using existing data from the federal supervision data systems. There were several historical reasons for this shift. First, the initial risk assessments used by federal probation officers in the 1980s, the Risk Prediction Scale-80 (RPS-80) and the United States Parole Commission's Salient Factor Score (SFS), were found to have limited predictive validity. In response to this issue, the Federal Judicial Center created and deployed the Risk Prediction Index (RPI) in the late 1990s. Although the RPI outperformed the RPS-80 and the SFS, this tool had two primary limitations: the RPI was static, which limited the federal probation officer's ability to reassess risk, and the instrument could not be used for case planning, since it lacked dynamic risk factors to target for change (AOUSC, 2011; Johnson, Lowenkamp, VanBenschoten, & Robinson, 2011; VanBenschoten, 2008). As a result, multiple commercially available instruments were considered and vetted, including the Level of Service Inventory-Revised and Northpointe's COMPAS. Ultimately, however, the decision was made to develop the Post Conviction Risk Assessment (PCRA) using readily available federal probation data. A primary benefit of this decision was the AOUSC's ability to continuously evaluate the performance of the PCRA and, when appropriate, use the data to improve the assessment tool's predictive validity.

The PCRA risk score is calculated by scoring 15 items (located in the Officer Section of the PCRA) that have been empirically shown to be correlated with recidivism (AOUSC, 2011). The Officer Section of the PCRA also contains 15 non-scored items that prior research suggests should predict recidivism but that, at the time of instrument development, were unavailable for analytical purposes in the AOUSC's case management systems (AOUSC, 2011). The current study examines whether these 15 non-scored items improve the predictive accuracy of the instrument or whether they can be removed without affecting its predictive accuracy.

Literature review

Risk prediction has undergone extensive improvements within the criminal justice field. Starting in 1954, Meehl's meta-analysis of multiple studies comparing actuarial and professionally derived instruments found that the actuarial assessments had stronger predictive accuracy than instruments derived from professional judgment alone. Multiple subsequent studies produced similar results, leaving a lasting conclusion that risk prediction is most accurately done with actuarial risk assessment instruments rather than relying solely on professional judgment (Ægisdóttir, White, Spengler, Maugherman, Anderson, & Cook, 2006; Andrews, Bonta, & Wormith, 2006; Grove, Zald, Lebow, Snitz, & Nelson, 2000; Latessa & Lovins, 2010; Meehl, 1954). Four generations of risk assessment have emerged over the past 60 years. …

9 citations
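The core comparison described above, whether the 15 non-scored items add predictive accuracy beyond the 15 scored items, is essentially a nested-model check. The sketch below uses randomly generated stand-in data and ordinary logistic regression, not the PCRA items or the federal supervision records, so the numbers it prints are illustrative only.

```python
# Hedged sketch: compare out-of-sample AUC for "scored items only" vs
# "scored + candidate items". All data are random stand-ins.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
scored = rng.normal(size=(n, 15))       # stand-in for the 15 scored items
candidate = rng.normal(size=(n, 15))    # stand-in for the 15 non-scored items

# Simulated recidivism outcome driven only by the "scored" block.
logit = scored @ rng.normal(size=15) * 0.4 - 1.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

for name, X in [("scored items only", scored),
                ("scored + candidate items", np.hstack([scored, candidate]))]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

In this simulation the candidate items are pure noise, so the two AUCs come out nearly identical; the study's question is whether the real non-scored items behave the same way or add measurable accuracy.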

Journal Article
TL;DR: In the federal supervision system, officers have discretion to depart from the risk designations provided by the Post Conviction Risk Assessment (PCRA) instrument, as mentioned in this paper, and this component of the risk classi...
Abstract: In the federal supervision system, officers have discretion to depart from the risk designations provided by the Post Conviction Risk Assessment (PCRA) instrument. This component of the risk classi...

7 citations

Journal Article
TL;DR: The authors revisited some recent research that was used to develop policy and support movements to change pretrial release processes and suggested that additional research be conducted prior to making sweeping changes in policy and practice.
Abstract: Rigorous research has taken a back seat to trademarks, press releases, and the policy that follows. This article revisits some recent research that was used to develop policy and support movements to change pretrial release processes. The original research was delivered with cautions and numerous limitations that appear to have been ignored. This paper follows up on those limitations and suggests that additional research be conducted prior to making sweeping changes in policy and practice.

2 citations


Cited by
Journal Article
Cynthia Rudin
TL;DR: This Perspective clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare and computer vision.
Abstract: Black box machine learning models are currently being used for high-stakes decision making throughout society, causing problems in healthcare, criminal justice and other domains. Some people hope that creating methods for explaining these black box models will alleviate some of the problems, but trying to explain black box models, rather than creating models that are interpretable in the first place, is likely to perpetuate bad practice and can potentially cause great harm to society. The way forward is to design models that are inherently interpretable. This Perspective clarifies the chasm between explaining black boxes and using inherently interpretable models, outlines several key reasons why explainable black boxes should be avoided in high-stakes decisions, identifies challenges to interpretable machine learning, and provides several example applications where interpretable models could potentially replace black box models in criminal justice, healthcare and computer vision. There has been a recent rise of interest in developing methods for ‘explainable AI’, where models are created to explain how a first ‘black box’ machine learning model arrives at a specific decision. It can be argued that instead efforts should be directed at building inherently interpretable models in the first place, in particular where they are applied in applications that directly affect human lives, such as in healthcare and criminal justice.

3,609 citations
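A toy contrast in the spirit of the argument above, not taken from the paper: fit an opaque ensemble and a very small, directly readable model on the same public dataset and compare their accuracy. The dataset and model choices below are arbitrary illustrations, not the paper's experiments.

```python
# Illustrative comparison of a black-box model and a small interpretable model.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

black_box = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
glass_box = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_tr, y_tr)

print("black-box accuracy:    ", black_box.score(X_te, y_te))
print("interpretable accuracy:", glass_box.score(X_te, y_te))

# The small tree can be printed and audited line by line, which is the point.
print(export_text(glass_box, feature_names=list(X.columns)))
```

The point is not that the small model always matches the ensemble, but that its decision logic can be inspected in full, which is what the Perspective argues high-stakes settings such as criminal justice and healthcare require.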

Journal Article
14 Apr 2017 - Science
TL;DR: This article showed that applying machine learning to ordinary human language results in human-like semantic biases and replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web.
Abstract: Machine learning is a means to derive artificial intelligence by discovering patterns in existing data. Here, we show that applying machine learning to ordinary human language results in human-like semantic biases. We replicated a spectrum of known biases, as measured by the Implicit Association Test, using a widely used, purely statistical machine-learning model trained on a standard corpus of text from the World Wide Web. Our results indicate that text corpora contain recoverable and accurate imprints of our historic biases, whether morally neutral as toward insects or flowers, problematic as toward race or gender, or even simply veridical, reflecting the status quo distribution of gender with respect to careers or first names. Our methods hold promise for identifying and addressing sources of bias in culture, including technology.

1,874 citations
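The test the paper adapts from the Implicit Association Test (commonly called WEAT) compares how strongly two sets of target words associate with two sets of attribute words via cosine similarity. A rough sketch with tiny made-up vectors standing in for real embeddings; the word sets and dimensions are placeholders, not the paper's stimuli:

```python
# WEAT-style association statistic on toy 3-d "embeddings".

import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def assoc(w, A, B):
    # How much closer is word vector w to attribute set A than to attribute set B?
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

# Toy vectors (in practice: 300-d GloVe or word2vec vectors for real word lists)
flowers    = [np.array([0.9, 0.1, 0.0]), np.array([0.8, 0.2, 0.1])]
insects    = [np.array([0.1, 0.9, 0.0]), np.array([0.2, 0.8, 0.1])]
pleasant   = [np.array([1.0, 0.0, 0.0])]
unpleasant = [np.array([0.0, 1.0, 0.0])]

effect = sum(assoc(w, pleasant, unpleasant) for w in flowers) \
       - sum(assoc(w, pleasant, unpleasant) for w in insects)
print(f"WEAT-style test statistic: {effect:.2f}")  # positive = flowers lean 'pleasant'
```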

Posted Content
TL;DR: This survey investigated different real-world applications that have shown biases in various ways, and created a taxonomy for fairness definitions that machine learning researchers have defined to avoid the existing bias in AI systems.
Abstract: With the widespread use of AI systems and applications in our everyday lives, it is important to take fairness issues into consideration while designing and engineering these types of systems. Such systems can be used in many sensitive environments to make important and life-changing decisions; thus, it is crucial to ensure that the decisions do not reflect discriminatory behavior toward certain groups or populations. We have recently seen work in machine learning, natural language processing, and deep learning that addresses such challenges in different subdomains. With the commercialization of these systems, researchers are becoming aware of the biases that these applications can contain and have attempted to address them. In this survey we investigated different real-world applications that have shown biases in various ways, and we listed different sources of biases that can affect AI applications. We then created a taxonomy for fairness definitions that machine learning researchers have defined in order to avoid the existing bias in AI systems. In addition to that, we examined different domains and subdomains in AI showing what researchers have observed with regard to unfair outcomes in the state-of-the-art methods and how they have tried to address them. There are still many future directions and solutions that can be taken to mitigate the problem of bias in AI systems. We are hoping that this survey will motivate researchers to tackle these issues in the near future by observing existing work in their respective fields.

1,571 citations
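Two of the group-fairness definitions such surveys catalogue, demographic parity and equal opportunity, reduce to simple rate comparisons across groups. An illustrative sketch on made-up labels, predictions, and group memberships:

```python
# Illustrative group-fairness gap computation on invented data.

import numpy as np

group  = np.array(["A", "A", "A", "B", "B", "B", "A", "B"])
y_true = np.array([1, 0, 1, 1, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 1])

def positive_rate(mask):
    return y_pred[mask].mean()

# Demographic parity: compare positive prediction rates across groups.
dp_gap = abs(positive_rate(group == "A") - positive_rate(group == "B"))

# Equal opportunity: compare true positive rates across groups.
def tpr(g):
    mask = (group == g) & (y_true == 1)
    return y_pred[mask].mean()

eo_gap = abs(tpr("A") - tpr("B"))

print(f"demographic parity gap: {dp_gap:.2f}")
print(f"true positive rate gap: {eo_gap:.2f}")
```

Which of these gaps (if any) a system should be required to close is exactly the kind of definitional choice the survey's taxonomy organizes.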

Proceedings Article
05 Dec 2016
TL;DR: The authors showed that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent, which raises concerns because their widespread use often tends to amplify these biases.
Abstract: The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving the its useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

1,379 citations
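The geometric step described above, identifying a gender direction and removing its component from gender-neutral words, can be sketched in a few lines. The vectors here are toy 3-dimensional stand-ins rather than trained embeddings, and the single-pair gender direction is a simplification of the paper's PCA-based construction.

```python
# Sketch of hard debiasing: project a word vector off an estimated gender direction.

import numpy as np

she = np.array([0.8, 0.1, 0.1])
he  = np.array([0.1, 0.8, 0.1])
receptionist = np.array([0.7, 0.2, 0.5])   # hypothetical gender-neutral occupation word

g = she - he
g = g / np.linalg.norm(g)                  # unit-length gender direction

# Subtract the component of the word along the gender direction.
debiased = receptionist - np.dot(receptionist, g) * g

print("gender component before:", round(float(np.dot(receptionist, g)), 3))
print("gender component after: ", round(float(np.dot(debiased, g)), 3))  # ~0 by construction
```

Definitional words such as queen or he keep their gender component in the paper's method; only words judged gender-neutral are projected, which is what preserves associations like queen-female.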

Proceedings Article
01 Jan 2017
TL;DR: This paper formalizes three fairness conditions at the heart of recent debates about algorithmic classification and proves that, except in highly constrained special cases, no method can satisfy all three simultaneously, suggesting some of the ways in which key notions of fairness are incompatible with each other and providing a framework for thinking about the trade-offs between them.
Abstract: Recent discussion in the public sphere about algorithmic classification has involved tension between competing notions of what it means for a probabilistic classification to be fair to different groups. We formalize three fairness conditions that lie at the heart of these debates, and we prove that except in highly constrained special cases, there is no method that can satisfy these three conditions simultaneously. Moreover, even satisfying all three conditions approximately requires that the data lie in an approximate version of one of the constrained special cases identified by our theorem. These results suggest some of the ways in which key notions of fairness are incompatible with each other, and hence provide a framework for thinking about the trade-offs between them.

1,190 citations
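A tiny numeric illustration of the tension the theorem formalizes, not its proof: scores that are calibrated within each group still produce different false positive rates when the groups' base rates differ. The counts below are invented.

```python
# Two groups, two score bins. Within every bin the positive fraction equals the
# score, so the scores are calibrated in both groups; the base rates differ.

# (score_bin, n_people, n_actually_positive) per group
group_a = [(0.8, 50, 40), (0.2, 50, 10)]   # base rate 0.50
group_b = [(0.8, 10,  8), (0.2, 90, 18)]   # base rate 0.26

def false_positive_rate(bins, threshold=0.5):
    # Fraction of actual negatives who land in a bin at or above the threshold.
    fp = sum(n - pos for score, n, pos in bins if score >= threshold)
    negatives = sum(n - pos for _, n, pos in bins)
    return fp / negatives

print(f"group A FPR: {false_positive_rate(group_a):.2f}")   # 0.20
print(f"group B FPR: {false_positive_rate(group_b):.2f}")   # about 0.03
```

Equalizing the error rates here would require giving up calibration (or having equal base rates), which is the trade-off the paper proves is unavoidable outside degenerate cases, and it is the same trade-off underlying the COMPAS dispute in the first entry above.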