Top 11 papers published by Greg S. Corrado from Google in 2023

Journal Article

Towards Expert-Level Medical Question Answering with Large Language Models

[...]

16 May 2023-arXiv.org

TL;DR: In this paper, a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach was proposed to bridge the gap between physicians' and large language models' answers.

Abstract: Recent artificial intelligence (AI) systems have reached milestones in "grand challenges" ranging from Go to protein-folding. The capability to retrieve medical knowledge, reason over it, and answer medical questions comparably to physicians has long been viewed as one such grand challenge. Large language models (LLMs) have catalyzed significant progress in medical question answering; Med-PaLM was the first model to exceed a "passing" score in US Medical Licensing Examination (USMLE) style questions with a score of 67.2% on the MedQA dataset. However, this and other prior work suggested significant room for improvement, especially when models' answers were compared to clinicians' answers. Here we present Med-PaLM 2, which bridges these gaps by leveraging a combination of base LLM improvements (PaLM 2), medical domain finetuning, and prompting strategies including a novel ensemble refinement approach. Med-PaLM 2 scored up to 86.5% on the MedQA dataset, improving upon Med-PaLM by over 19% and setting a new state-of-the-art. We also observed performance approaching or exceeding state-of-the-art across MedMCQA, PubMedQA, and MMLU clinical topics datasets. We performed detailed human evaluations on long-form questions along multiple axes relevant to clinical applications. In pairwise comparative ranking of 1066 consumer medical questions, physicians preferred Med-PaLM 2 answers to those produced by physicians on eight of nine axes pertaining to clinical utility (p<0.001). We also observed significant improvements compared to Med-PaLM on every evaluation axis (p<0.001) on newly introduced datasets of 240 long-form "adversarial" questions to probe LLM limitations. While further studies are necessary to validate the efficacy of these models in real-world settings, these results highlight rapid progress towards physician-level performance in medical question answering.

15 citations

Journal Article

Pathologist Validation of a Machine Learning–Derived Feature for Colon Cancer Risk Stratification

Vincenzo L'Imperio, Ellery Wulczyn, Markus Plass, Heimo Müller, Nicolò Tamini, Luca Gianotti, Nicolas Zucchini, Robert Reihs, Greg S. Corrado, Dale R. Webster, Lily Peng, Po-Hsuan Cameron Chen, Marialuisa Lavitrano, Yun Liu, David F. Steiner, Kurt Zatloukal, Fabio Pagni

01 Mar 2023-JAMA network open

TL;DR: In this paper, a prognostic machine learning-derived histopathologic feature was learned and validated by pathologists to predict the survival of patients with colon cancer, with the potential for integration into pathology practice.

Abstract: Key Points. Question: Can a prognostic machine learning–derived histopathologic feature be learned and validated by pathologists? Findings: In this prognostic study, 2 pathologists were able to learn a machine learning–derived histopathologic feature and validate its prognostic value for survival among patients with colon cancer. Meaning: These findings suggest that computationally identified histopathologic features can provide prognostic value for colon cancer, with the potential for integration into pathology practice.

2 citations

Journal Article

A deep learning model for novel systemic biomarkers in photographs of the external eye: a retrospective study.

Boris Babenko, Ilana Traynis, Christina Chen, Preeti Singh, Akib A Uddin, Jorge Cuadros, Lauren Patty Daskivich, April Y. Maa, Ramasamy Kim, Eugene Yu-Chuan Kang, Y. Matias, Greg S. Corrado, Lily Peng, Dale R. Webster, Christopher Semturs, Jonathan Krause, Avinash V. Varadarajan, Naama Hammel, Yun Liu

01 Mar 2023-The Lancet Digital Health

TL;DR: In this article, a deep learning system was developed to predict systemic parameters from photographs of the external eye, such as those related to the liver (albumin, aspartate aminotransferase [AST]); kidney (estimated glomerular filtration rate [eGFR], urine albumin-to-creatinine ratio [ACR]); bone or mineral (calcium); thyroid (thyroid stimulating hormone); and blood (haemoglobin, white blood cells [WBC], platelets).

1 citation

Journal Article

Lessons learned from translating AI from development to deployment in healthcare

Kasumi Widner, Sunny Virmani, Jonathan Krause, Jayaram Nayar, Elin Rønby Pedersen, Naama Hammel, Y. Matias, Greg S. Corrado, Yun Liu, Lily Peng, Dale R. Webster

29 May 2023-Nature Medicine

1 citation

Journal Article

Evaluating AI systems under uncertain ground truth: a case study in dermatology

David Stutz, Ali Taylan Cemgil, Abhijit Guha Roy, Tatiana Matejovicova, Melih Barsbey, Mike Schaekermann, Jana von Freyberg, Rajeev V. Rikhye, Javier Perez Matos, Umesh Telang, Dale R. Webster, Yun Liu, Greg S. Corrado, Y. Matias, Pushmeet Kohli, Yun Liu, Arnaud Doucet

05 Jul 2023-arXiv.org

TL;DR: In this paper, the authors propose a probabilistic version of inverse rank normalization (IRN) and a Plackett-Luce-based model to estimate the ground truth uncertainty.

Abstract: For safety, AI systems in health undergo thorough evaluations before deployment, validating their predictions against a ground truth that is assumed certain. However, this is actually not the case and the ground truth may be uncertain. Unfortunately, this is largely ignored in standard evaluation of AI models but can have severe consequences such as overestimating the future performance. To avoid this, we measure the effects of ground truth uncertainty, which we assume decomposes into two main components: annotation uncertainty which stems from the lack of reliable annotations, and inherent uncertainty due to limited observational information. This ground truth uncertainty is ignored when estimating the ground truth by deterministically aggregating annotations, e.g., by majority voting or averaging. In contrast, we propose a framework where aggregation is done using a statistical model. Specifically, we frame aggregation of annotations as posterior inference of so-called plausibilities, representing distributions over classes in a classification setting, subject to a hyper-parameter encoding annotator reliability. Based on this model, we propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation. We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses. The deterministic adjudication process called inverse rank normalization (IRN) from previous work ignores ground truth uncertainty in evaluation. Instead, we present two alternative statistical models: a probabilistic version of IRN and a Plackett-Luce-based model. We find that a large portion of the dataset exhibits significant ground truth uncertainty and standard IRN-based evaluation severely over-estimates performance without providing uncertainty estimates.
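
The contrast between deterministic aggregation and aggregation through a statistical model can be illustrated with a minimal Python sketch. It is not the paper's model: the Dirichlet posterior mean, the `reliability` and `prior` parameters, and the example labels are all assumptions chosen only to show how a reliability hyper-parameter can turn annotations into a distribution over classes instead of a single label.

```python
from collections import Counter


def majority_vote(labels):
    """Deterministic aggregation: return the single most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def posterior_plausibilities(labels, classes, reliability=1.0, prior=1.0):
    """Posterior mean of a Dirichlet over classes (illustrative assumption).

    Each annotation adds `reliability` pseudo-counts to its class on top of a
    symmetric prior, so a low reliability keeps the result close to uniform.
    """
    counts = Counter(labels)
    alpha = {c: prior + reliability * counts.get(c, 0) for c in classes}
    total = sum(alpha.values())
    return {c: a / total for c, a in alpha.items()}


# Hypothetical annotations from three annotators for one case.
classes = ["eczema", "psoriasis", "tinea"]
labels = ["eczema", "eczema", "psoriasis"]

print(majority_vote(labels))
print(posterior_plausibilities(labels, classes, reliability=1.0))
```

Majority voting discards the dissenting annotation, while the distribution keeps some mass on the alternatives; lowering `reliability` toward zero pushes the result toward the uniform prior.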

1 citation

Multimodal LLMs for health grounded in individual-specific data

Anastasiya Belyaeva, Justin Cosentino, Farhad Hormozdiari, Krish Eswaran, Shravya Shetty, Greg S. Corrado, Andrew Carroll, Cory Y. McLean, Nicholas A. Furlotte

18 Jul 2023

TL;DR: HeLM (Health Large Language Model for Multimodal Understanding) is a framework that grounds LLMs for health in individual-specific data, enabling them to use high-dimensional clinical modalities to estimate underlying disease risk.

Abstract: Foundation large language models (LLMs) have shown an impressive ability to solve tasks across a wide range of fields including health. To effectively solve personalized health tasks, LLMs need the ability to ingest a diversity of data modalities that are relevant to an individual's health status. In this paper, we take a step towards creating multimodal LLMs for health that are grounded in individual-specific data by developing a framework (HeLM: Health Large Language Model for Multimodal Understanding) that enables LLMs to use high-dimensional clinical modalities to estimate underlying disease risk. HeLM encodes complex data modalities by learning an encoder that maps them into the LLM's token embedding space and for simple modalities like tabular data by serializing the data into text. Using data from the UK Biobank, we show that HeLM can effectively use demographic and clinical features in addition to high-dimensional time-series data to estimate disease risk. For example, HeLM achieves an AUROC of 0.75 for asthma prediction when combining tabular and spirogram data modalities compared with 0.49 when only using tabular data. Overall, we find that HeLM outperforms or performs at parity with classical machine learning approaches across a selection of eight binary traits. Furthermore, we investigate the downstream uses of this model such as its generalizability to out-of-distribution traits and its ability to power conversations around individual health and wellness.
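
Serializing tabular data into text, as described for simple modalities, might look like the following minimal Python sketch. The function names, feature names, and prompt wording are hypothetical and are not taken from the paper; the learned encoder for complex modalities is not shown.

```python
def serialize_tabular(features):
    """Render a mapping of feature names to values as one line of text."""
    return ", ".join(f"{name}: {value}" for name, value in features.items())


def build_prompt(features, trait):
    """Wrap the serialized features in a yes/no risk question (assumed wording)."""
    return (
        f"Patient record: {serialize_tabular(features)}. "
        f"Does this person have {trait}? Answer yes or no."
    )


# Hypothetical demographic and clinical features for one individual.
record = {"age": 58, "sex": "female", "bmi": 27.4}
print(build_prompt(record, "asthma"))
```

Keeping the serialization in a separate function makes it easy to change the text format without touching the prompt template.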

Journal Article

Predicting Cardiovascular Disease Risk using Photoplethysmography and Deep Learning

Wei Weng, Sebastien Baur, Mayank Daswani, Christina Chen, Lauren Harrell, Sujay Kakarmath, Mariam Jabara, Babak Behsaz, Cory Y. McLean, Y. Matias, Greg S. Corrado, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Goodarz Danaei, Diego Alejandro Ardila

09 May 2023-arXiv.org

TL;DR: In this article, a deep learning PPG-based CVD risk score (DLS) was developed to predict the probability of having major adverse cardiovascular events (MACE) within ten years, given only age, sex, smoking status and PPG as predictors.

Abstract: Cardiovascular diseases (CVDs) are responsible for a large proportion of premature deaths in low- and middle-income countries. Early CVD detection and intervention is critical in these populations, yet many existing CVD risk scores require a physical examination or lab measurements, which can be challenging in such health systems due to limited accessibility. Here we investigated the potential to use photoplethysmography (PPG), a sensing technology available on most smartphones that can potentially enable large-scale screening at low cost, for CVD risk prediction. We developed a deep learning PPG-based CVD risk score (DLS) to predict the probability of having major adverse cardiovascular events (MACE: non-fatal myocardial infarction, stroke, and cardiovascular death) within ten years, given only age, sex, smoking status and PPG as predictors. We compared the DLS with the office-based refit-WHO score, which adopts the shared predictors from WHO and Globorisk scores (age, sex, smoking status, height, weight and systolic blood pressure) but refitted on the UK Biobank (UKB) cohort. In the UKB cohort, the DLS's C-statistic (71.1%, 95% CI 69.9-72.4) was non-inferior to that of the office-based refit-WHO score (70.9%, 95% CI 69.7-72.2; non-inferiority margin of 2.5%, p<0.01). The calibration of the DLS was satisfactory, with a 1.8% mean absolute calibration error. Adding DLS features to the office-based score increased the C-statistic by 1.0% (95% CI 0.6-1.4). The DLS predicts ten-year MACE risk comparably to the office-based refit-WHO score. It provides a proof-of-concept and suggests the potential of a PPG-based approach for community-based primary prevention in resource-limited regions.
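
For readers unfamiliar with the C-statistic, here is a minimal Python sketch of the concordance computation. It assumes a simplified binary outcome (event within ten years or not) and ignores censoring, so it is not necessarily the estimator used in the study; the scores and outcomes below are made up.

```python
def c_statistic(scores, outcomes):
    """Fraction of (event, non-event) pairs where the event has the higher score.

    Ties in score count as half-concordant. Outcomes are 1 for an event and
    0 for no event; censoring is ignored in this simplified version.
    """
    concordant = 0.0
    pairs = 0
    for s_event, y_event in zip(scores, outcomes):
        if y_event != 1:
            continue
        for s_other, y_other in zip(scores, outcomes):
            if y_other != 0:
                continue
            pairs += 1
            if s_event > s_other:
                concordant += 1.0
            elif s_event == s_other:
                concordant += 0.5
    if pairs == 0:
        raise ValueError("need at least one event and one non-event")
    return concordant / pairs


# Made-up risk scores and outcomes for four people.
scores = [0.9, 0.4, 0.35, 0.8]
outcomes = [1, 0, 1, 0]
print(c_statistic(scores, outcomes))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 71.1% and 70.9% are compared.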

Journal Article

Using generative AI to investigate medical imagery models and datasets

Oran Lang, Ilana Traynis, Heather Cole-Lewis, Courtney R. Lyles, Charles Lau, Christopher Semturs, Dale R. Webster, Greg S. Corrado, Avinatan Hassidim, Y. Matias, Yun Liu, Naama Hammel, Boris Babenko

01 Jun 2023-arXiv.org

TL;DR: This paper presents a method for automatic visual explanations that leverages team-based expertise by generating hypotheses about which visual signals in the images are correlated with the task; the discovered attributes are presented to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health.

Abstract: AI models have shown promise in many medical imaging tasks. However, our ability to explain what signals these models have learned is severely lacking. Explanations are needed in order to increase the trust in AI-based models, and could enable novel scientific discovery by uncovering signals in the data that are not yet known to experts. In this paper, we present a method for automatic visual explanations leveraging team-based expertise by generating hypotheses of what visual signals in the images are correlated with the task. We propose the following 4 steps: (i) train a classifier to perform a given task; (ii) train a classifier-guided StyleGAN-based image generator (StylEx); (iii) automatically detect and visualize the top visual attributes that the classifier is sensitive towards; (iv) formulate hypotheses for the underlying mechanisms, to stimulate future research. Specifically, we present the discovered attributes to an interdisciplinary panel of experts so that hypotheses can account for social and structural determinants of health. We demonstrate results on eight prediction tasks across three medical imaging modalities: retinal fundus photographs, external eye photographs, and chest radiographs. We showcase examples of attributes that capture clinically known features, confounders that arise from factors beyond physiological mechanisms, and reveal a number of physiologically plausible novel attributes. Our approach has the potential to enable researchers to better understand, improve their assessment, and extract new knowledge from AI-based models. Importantly, we highlight that attributes generated by our framework can capture phenomena beyond physiology or pathophysiology, reflecting the real-world nature of healthcare delivery and socio-cultural factors. Finally, we intend to release code to enable researchers to train their own StylEx models and analyze their predictive tasks.

Journal Article

Predicting lymph node metastasis from primary tumor histology and clinicopathologic factors in colorectal cancer using deep learning

Justin D. Krogue, Shekoofeh Azizi, Fraser Tan, Isabelle Flament-Auvigne, Trissia Brown, Markus Plass, Robert Reihs, Heimo Müller, Kurt Zatloukal, P. Richeson, Greg S. Corrado, Lily Peng, Craig H. Mermel, Yun Liu, Po-Hsuan Cameron Chen, Saurabh Gombar, Jeanne Shen, David F. Steiner, Ellery Wulczyn

24 Apr 2023-Communications medicine

TL;DR: In this paper, the authors combine deep learning with established clinicopathologic factors to identify independently informative features associated with lymph node metastasis (LNM) in colorectal cancer.

Abstract: Background: Presence of lymph node metastasis (LNM) influences prognosis and clinical decision-making in colorectal cancer. However, detection of LNM is variable and depends on a number of external factors. Deep learning has shown success in computational pathology, but has struggled to boost performance when combined with known predictors. Methods: Machine-learned features are created by clustering deep learning embeddings of small patches of tumor in colorectal cancer via k-means, and then selecting the top clusters that add predictive value to a logistic regression model when combined with known baseline clinicopathological variables. We then analyze performance of logistic regression models trained with and without these machine-learned features in combination with the baseline variables. Results: The machine-learned extracted features provide independent signal for the presence of LNM (AUROC: 0.638, 95% CI: [0.590, 0.683]). Furthermore, the machine-learned features add predictive value to the set of 6 clinicopathologic variables in an external validation set (likelihood ratio test, p < 0.00032; AUROC: 0.740, 95% CI: [0.701, 0.780]). A model incorporating these features can also further risk-stratify patients with and without identified metastasis (p < 0.001 for both stage II and stage III). Conclusion: This work demonstrates an effective approach to combine deep learning with established clinicopathologic factors in order to identify independently informative features associated with LNM. Further work building on these specific results may have important impact in prognostication and therapeutic decision making for LNM. Additionally, this general computational approach may prove useful in other contexts.
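
One way to turn clustered patch embeddings into per-case features is sketched below in Python. It assumes the k-means centroids have already been fitted and represents each case by the fraction of its patches assigned to each cluster; that representation, the function names, and the toy 2-D embeddings are assumptions, and the cluster selection and logistic regression steps are not shown.

```python
import math
from collections import Counter


def nearest_centroid(embedding, centroids):
    """Index of the centroid closest to the embedding (Euclidean distance)."""
    distances = [math.dist(embedding, c) for c in centroids]
    return distances.index(min(distances))


def cluster_fraction_features(patch_embeddings, centroids):
    """Fraction of a case's patches assigned to each cluster."""
    counts = Counter(nearest_centroid(e, centroids) for e in patch_embeddings)
    n = len(patch_embeddings)
    return [counts.get(k, 0) / n for k in range(len(centroids))]


# Toy example: two assumed centroids and four patch embeddings from one case.
centroids = [(0.0, 0.0), (10.0, 10.0)]
patches = [(1.0, 0.0), (0.0, 1.0), (9.0, 10.0), (0.0, 0.0)]
print(cluster_fraction_features(patches, centroids))
```

The resulting vector could be appended to the baseline clinicopathologic variables before fitting a regression model.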

Journal Article

Robust and data-efficient generalization of self-supervised machine learning for diagnostic imaging

[...]

01 Jun 2023-Nature Biomedical Engineering

Journal Article

Enhancing the reliability and accuracy of AI-enabled diagnosis via complementarity-driven deferral to clinicians

[...]

01 Jul 2023-news@nature.com