
Showing papers by "Dan Jurafsky published in 2023"


Journal Article DOI
TL;DR: In this article, an analysis of police body-worn camera footage reveals that stops ultimately resulting in escalation differ in their conversational structure in the earliest moments of the encounter: in as little as the first 45 words the officer speaks.
Abstract: Significance: Amid calls for police officers to de-escalate encounters with Black citizens, this work sheds light on when and how car stops escalate, as well as their psychological impact on Black men. Our analysis of police body-worn camera footage reveals that stops ultimately resulting in escalation differ in their conversational structure in the earliest moments of the encounter: in as little as the first 45 words the officer speaks. Listening to these escalated encounters evoked anxiety, suspicion, and worry about officer use of force for Black men, who are disproportionately subjected to escalated outcomes. The findings reported here not only inform approaches to de-escalation but also demonstrate the power and promise of systematic footage review more broadly. To improve police–community interactions, we could start by examining them.

2 citations
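
The abstract does not spell out the modeling details, but the core measurement is concrete enough to illustrate: take the officer's first 45 words from a speaker-labeled transcript and ask whether those opening words alone distinguish stops that escalate from those that do not. The sketch below is a hypothetical illustration of that idea only; the transcript format, field names, and bag-of-words classifier are assumptions, not the authors' pipeline.

```python
# Minimal sketch (not the authors' pipeline): extract the officer's first 45
# words from each speaker-labeled transcript and fit a simple bag-of-words
# classifier to probe whether the opening words alone predict escalation.
# The data structure and labels below are illustrative placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def officer_opening(transcript: list[dict], n_words: int = 45) -> str:
    """Concatenate officer turns in order and return the first n_words words."""
    words = []
    for turn in transcript:  # each turn: {"speaker": ..., "text": ...} (assumed format)
        if turn["speaker"] == "officer":
            words.extend(turn["text"].split())
        if len(words) >= n_words:
            break
    return " ".join(words[:n_words])


def fit_escalation_probe(stops):
    """stops: list of (transcript, escalated_bool) pairs (placeholder structure)."""
    texts = [officer_opening(transcript) for transcript, _ in stops]
    labels = [escalated for _, escalated in stops]
    model = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),   # unigrams and bigrams of opening words
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model
```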



DOI
16 Jul 2023 - medRxiv
TL;DR: This paper found that GPT-4 does not appropriately model the demographic diversity of medical conditions, consistently producing clinical vignettes that stereotype demographic presentations, which can have a direct, harmful impact on medical care.
Abstract: Background. Large language models (LLMs) such as GPT-4 hold great promise as transformative tools in healthcare, ranging from automating administrative tasks to augmenting clinical decision-making. However, these models also pose a serious danger of perpetuating biases and delivering incorrect medical diagnoses, which can have a direct, harmful impact on medical care.
Methods. Using the Azure OpenAI API, we tested whether GPT-4 encodes racial and gender biases and examined the impact of such biases on four potential applications of LLMs in the clinical domain, namely medical education, diagnostic reasoning, plan generation, and patient assessment. We conducted experiments with prompts designed to resemble typical use of GPT-4 within clinical and medical education applications. We used clinical vignettes from NEJM Healer and from published research on implicit bias in healthcare. GPT-4 estimates of the demographic distribution of medical conditions were compared to true U.S. prevalence estimates. Differential diagnosis and treatment planning were evaluated across demographic groups using standard statistical tests for between-group significance.
Findings. We find that GPT-4 does not appropriately model the demographic diversity of medical conditions, consistently producing clinical vignettes that stereotype demographic presentations. The differential diagnoses created by GPT-4 for standardized clinical vignettes were more likely to include diagnoses that stereotype certain races, ethnicities, and gender identities. Assessments and plans created by the model showed significant associations between demographic attributes and recommendations for more expensive procedures, as well as differences in patient perception.
Interpretation. Our findings highlight the urgent need for comprehensive and transparent bias assessments of LLM tools like GPT-4 for every intended use case before they are integrated into clinical care. We discuss the potential sources of these biases and potential mitigation strategies prior to clinical implementation.
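
The Methods section describes prompting GPT-4 through the Azure OpenAI API and comparing the demographic distributions it produces against reference prevalence estimates. The sketch below illustrates that kind of audit under stated assumptions: the deployment name, prompt wording, demographic groups, and reference proportions are placeholders, not values or prompts from the paper.

```python
# Minimal sketch (not the authors' code): prompt GPT-4 via the Azure OpenAI API
# to generate clinical vignettes for a condition, tally the demographics it
# mentions, and compare the counts to reference prevalence proportions with a
# chi-square goodness-of-fit test. All names and numbers are placeholders.
from collections import Counter

from openai import AzureOpenAI
from scipy.stats import chisquare

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder
    api_key="YOUR_KEY",                                        # placeholder
    api_version="2024-02-01",
)


def generate_vignette(condition: str) -> str:
    """Ask the model for a one-paragraph teaching vignette for `condition`."""
    resp = client.chat.completions.create(
        model="gpt-4",  # your Azure deployment name (assumption)
        messages=[{
            "role": "user",
            "content": (
                f"Write a one-paragraph clinical vignette of a patient "
                f"presenting with {condition}. Include age, gender, and race."
            ),
        }],
    )
    return resp.choices[0].message.content


def demographic_skew(vignettes: list[str], groups: list[str],
                     reference_props: list[float]):
    """Count mentions of each demographic group across the generated vignettes
    and test the observed counts against reference prevalence proportions."""
    counts = Counter()
    for text in vignettes:
        for group in groups:
            if group.lower() in text.lower():
                counts[group] += 1
    observed = [counts[g] for g in groups]
    total = sum(observed)
    expected = [p * total for p in reference_props]  # scale to the observed total
    return chisquare(f_obs=observed, f_exp=expected)


# Hypothetical usage (groups and proportions are illustrative, not real prevalence):
# vignettes = [generate_vignette("sarcoidosis") for _ in range(50)]
# print(demographic_skew(vignettes, ["Black", "White"], [0.5, 0.5]))
```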