Showing papers by "Greg S. Corrado published in 2020"

PDF

Open Access

Journal Article•DOI•

A deep learning system for differential diagnosis of skin diseases

[...]

Yuan Liu¹, Ayush Jain¹, Clara H. Eng¹, David H. Way¹, Kang Lee¹, Peggy Bui¹, Peggy Bui², Kimberly Kanada, Guilherme de Oliveira Marinho, Jessica Gallegos¹, Sara Gabriele¹, Vishakha Gupta¹, Nalini Singh¹, Nalini Singh³, Vivek T. Natarajan¹, Rainer Hofmann-Wellenhof⁴, Greg S. Corrado¹, Lily Peng¹, Dale R. Webster¹, Dennis Ai¹, Susan Huang, Yun Liu¹, R. Carter Dunn¹, David Coz¹ - Show less +20 more•Institutions (4)

Google¹, University of California, San Francisco², Massachusetts Institute of Technology³, Medical University of Graz⁴

18 May 2020-Nature Medicine

TL;DR: A deep learning system able to identify the most common skin conditions may help clinicians in making more accurate diagnoses in routine clinical practice.

...read moreread less

Abstract: Skin conditions affect 1.9 billion people. Because of a shortage of dermatologists, most cases are seen instead by general practitioners with lower diagnostic accuracy. We present a deep learning system (DLS) to provide a differential diagnosis of skin conditions using 16,114 de-identified cases (photographs and clinical data) from a teledermatology practice serving 17 sites. The DLS distinguishes between 26 common skin conditions, representing 80% of cases seen in primary care, while also providing a secondary prediction covering 419 skin conditions. On 963 validation cases, where a rotating panel of three board-certified dermatologists defined the reference standard, the DLS was non-inferior to six other dermatologists and superior to six primary care physicians (PCPs) and six nurse practitioners (NPs) (top-1 accuracy: 0.66 DLS, 0.63 dermatologists, 0.44 PCPs and 0.40 NPs). These results highlight the potential of the DLS to assist general practitioners in diagnosing skin conditions.

...read moreread less

288 citations

Journal Article•DOI•

Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation

[...]

Anna Majkowska, Sid Mittal, David F. Steiner, Joshua J. Reicher, Scott Mayer McKinney, Gavin E. Duggan, Krish Eswaran, Po-Hsuan Cameron Chen, Yun Liu, Sreenivasa Raju Kalidindi, Alexander Ding, Greg S. Corrado, Daniel Tse, Shravya Shetty - Show less +10 more

01 Feb 2020-Radiology

TL;DR: Expert-level models for detecting clinically relevant chest radiograph findings were developed for this study by using adjudicated reference standards and with population-level performance estimation.

...read moreread less

Abstract: Four deep learning models identified pneumothorax, fractures, opacity, and nodule or mass on frontal chest radiographs with similar performance to radiologists.

...read moreread less

198 citations

Journal Article•DOI•

Development and Validation of a Deep Learning Algorithm for Gleason Grading of Prostate Cancer From Biopsy Specimens.

[...]

Kunal Nagpal¹, Davis Foote¹, Fraser Tan¹, Yun Liu¹, Po-Hsuan Cameron Chen¹, David F. Steiner¹, Naren Manoj¹, Naren Manoj², Niels Olson³, Jenny L. Smith³, Arash Mohtashamian³, Brandon R. Peterson³, Mahul B. Amin⁴, Andrew Evans⁵, Joan W Sweet⁵, Carol C. Cheung⁵, Theodorus van der Kwast⁵, Ankur R. Sangoi⁶, Ming Zhou⁷, Robert W. Allan, Peter A. Humphrey⁸, Jason Hipp¹, Jason Hipp⁹, Krishna Gadepalli¹, Greg S. Corrado¹, Lily Peng¹, Martin C. Stumpe¹, Craig H. Mermel¹ - Show less +24 more•Institutions (9)

Google¹, Toyota Technological Institute², Naval Medical Center San Diego³, University of Tennessee Health Science Center⁴, University Health Network⁵, El Camino Hospital⁶, Tufts Medical Center⁷, Yale University⁸, AstraZeneca⁹

01 Sep 2020-JAMA Oncology

TL;DR: The deep learning system warrants evaluation as an assistive tool for improving prostate cancer diagnosis and treatment decisions, especially where subspecialist expertise is unavailable.

...read moreread less

Abstract: Importance For prostate cancer, Gleason grading of the biopsy specimen plays a pivotal role in determining case management. However, Gleason grading is associated with substantial interobserver variability, resulting in a need for decision support tools to improve the reproducibility of Gleason grading in routine clinical practice. Objective To evaluate the ability of a deep learning system (DLS) to grade diagnostic prostate biopsy specimens. Design, Setting, and Participants The DLS was evaluated using 752 deidentified digitized images of formalin-fixed paraffin-embedded prostate needle core biopsy specimens obtained from 3 institutions in the United States, including 1 institution not used for DLS development. To obtain the Gleason grade group (GG), each specimen was first reviewed by 2 expert urologic subspecialists from a multi-institutional panel of 6 individuals (years of experience: mean, 25 years; range, 18-34 years). A third subspecialist reviewed discordant cases to arrive at a majority opinion. To reduce diagnostic uncertainty, all subspecialists had access to an immunohistochemical-stained section and 3 histologic sections for every biopsied specimen. Their review was conducted from December 2018 to June 2019. Main Outcomes and Measures The frequency of the exact agreement of the DLS with the majority opinion of the subspecialists in categorizing each tumor-containing specimen as 1 of 5 categories: nontumor, GG1, GG2, GG3, or GG4-5. For comparison, the rate of agreement of 19 general pathologists’ opinions with the subspecialists’ majority opinions was also evaluated. Results For grading tumor-containing biopsy specimens in the validation set (n = 498), the rate of agreement with subspecialists was significantly higher for the DLS (71.7%; 95% CI, 67.9%-75.3%) than for general pathologists (58.0%; 95% CI, 54.5%-61.4%) (P Conclusions and Relevance In this study, the DLS showed higher proficiency than general pathologists at Gleason grading prostate needle core biopsy specimens and generalized to an independent institution. Future research is necessary to evaluate the potential utility of using the DLS as a decision support tool in clinical workflows and to improve the quality of prostate cancer grading for therapy decisions.

...read moreread less

96 citations

Journal Article•DOI•

Detection of anaemia from retinal fundus images via deep learning

[...]

Akinori Mitani¹, Abigail E. Huang¹, Subhashini Venugopalan¹, Greg S. Corrado¹, Lily Peng¹, Dale R. Webster¹, Naama Hammel¹, Yun Liu¹, Avinash V. Varadarajan¹ - Show less +5 more•Institutions (1)

Google¹

01 Jan 2020-Nature Biomedical Engineering

TL;DR: Machine-learning algorithms trained with retinal fundus images, with subject metadata or with both data types, predict haemoglobin concentration with mean absolute errors lower than 0.75 and anaemia with areas under the curve in the range of 0.74–0.89.

...read moreread less

Abstract: Owing to the invasiveness of diagnostic tests for anaemia and the costs associated with screening for it, the condition is often undetected. Here, we show that anaemia can be detected via machine-learning algorithms trained using retinal fundus images, study participant metadata (including race or ethnicity, age, sex and blood pressure) or the combination of both data types (images and study participant metadata). In a validation dataset of 11,388 study participants from the UK Biobank, the metadata-only, fundus-image-only and combined models predicted haemoglobin concentration (in g dl–1) with mean absolute error values of 0.73 (95% confidence interval: 0.72–0.74), 0.67 (0.66–0.68) and 0.63 (0.62–0.64), respectively, and with areas under the receiver operating characteristic curve (AUC) values of 0.74 (0.71–0.76), 0.87 (0.85–0.89) and 0.88 (0.86–0.89), respectively. For 539 study participants with self-reported diabetes, the combined model predicted haemoglobin concentration with a mean absolute error of 0.73 (0.68–0.78) and anaemia an AUC of 0.89 (0.85–0.93). Automated anaemia screening on the basis of fundus images could particularly aid patients with diabetes undergoing regular retinal imaging and for whom anaemia can increase morbidity and mortality risks. Machine-learning algorithms trained with retinal fundus images, with subject metadata or with both data types, predict haemoglobin concentration with mean absolute errors lower than 0.75 g dl–1 and anaemia with areas under the curve in the range of 0.74–0.89.

...read moreread less

89 citations

Journal Article•DOI•

Predicting optical coherence tomography-derived diabetic macular edema grades from fundus photographs using deep learning.

[...]

Avinash V. Varadarajan¹, Pinal Bavishi¹, Paisan Ruamviboonsuk², Peranut Chotcomwongse², Subhashini Venugopalan¹, Arunachalam Narayanaswamy¹, Jorge Cuadros, Kuniyoshi Kanai³, George H. Bresnick, Mongkol Tadarati², Sukhum Silpa-archa², Jirawut Limwattanayingyong², Variya Nganthavee², Joseph R. Ledsam, Pearse A. Keane⁴, Greg S. Corrado¹, Lily Peng¹, Dale R. Webster¹ - Show less +14 more•Institutions (4)

Google¹, Rangsit University², University of California, Berkeley³, UCL Institute of Ophthalmology⁴

08 Jan 2020-Nature Communications

TL;DR: A deep learning model is presented that can predict the presence of diabetic macular edema from color fundus photographs with superior specificity and positive predictive value compared to retinal specialists.

...read moreread less

Abstract: Center-involved diabetic macular edema (ci-DME) is a major cause of vision loss. Although the gold standard for diagnosis involves 3D imaging, 2D imaging by fundus photography is usually used in screening settings, resulting in high false-positive and false-negative calls. To address this, we train a deep learning model to predict ci-DME from fundus photographs, with an ROC–AUC of 0.89 (95% CI: 0.87–0.91), corresponding to 85% sensitivity at 80% specificity. In comparison, retinal specialists have similar sensitivities (82–85%), but only half the specificity (45–50%, p < 0.001). Our model can also detect the presence of intraretinal fluid (AUC: 0.81; 95% CI: 0.81–0.86) and subretinal fluid (AUC 0.88; 95% CI: 0.85–0.91). Using deep learning to make predictions via simple 2D images without sophisticated 3D-imaging equipment and with better than specialist performance, has broad relevance to many other applications in medical imaging. Diabetic eye disease is a cause of preventable blindness and accurate and timely referral of patients with diabetic macular edema is important to start treatment. Here the authors present a deep learning model that can predict the presence of diabetic macular edema from color fundus photographs with superior specificity and positive predictive value compared to retinal specialists.

...read moreread less

72 citations

Journal Article•DOI•

Interpretable Survival Prediction for Colorectal Cancer using Deep Learning

[...]

Ellery Wulczyn¹, David F. Steiner¹, Melissa Moran¹, Markus Plass², Robert Reihs², Fraser Tan¹, Isabelle Flament-Auvigne¹, Trissia Brown¹, Peter Regitnig², Po-Hsuan Cameron Chen¹, Narayan Hegde¹, Apaar Sadhwani¹, Robert C. MacDonald¹, Benny Ayalew¹, Greg S. Corrado¹, Lily Peng¹, Daniel Tse¹, Heimo Müller², Zhaoyang Xu¹, Yun Liu¹, Martin C. Stumpe¹, Kurt Zatloukal², Craig H. Mermel¹ - Show less +19 more•Institutions (2)

Google¹, Medical University of Graz²

17 Nov 2020-arXiv: Image and Video Processing

TL;DR: A deep learning system for predicting disease-specific survival for stage II and III colorectal cancer using 3652 cases and clustering embeddings from a deep-learning-based image-similarity model showed that the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation.

...read moreread less

Abstract: Deriving interpretable prognostic features from deep-learning-based prognostic histopathology models remains a challenge. In this study, we developed a deep learning system (DLS) for predicting disease specific survival for stage II and III colorectal cancer using 3,652 cases (27,300 slides). When evaluated on two validation datasets containing 1,239 cases (9,340 slides) and 738 cases (7,140 slides) respectively, the DLS achieved a 5-year disease-specific survival AUC of 0.70 (95%CI 0.66-0.73) and 0.69 (95%CI 0.64-0.72), and added significant predictive value to a set of 9 clinicopathologic features. To interpret the DLS, we explored the ability of different human-interpretable features to explain the variance in DLS scores. We observed that clinicopathologic features such as T-category, N-category, and grade explained a small fraction of the variance in DLS scores (R2=18% in both validation sets). Next, we generated human-interpretable histologic features by clustering embeddings from a deep-learning based image-similarity model and showed that they explain the majority of the variance (R2 of 73% to 80%). Furthermore, the clustering-derived feature most strongly associated with high DLS scores was also highly prognostic in isolation. With a distinct visual appearance (poorly differentiated tumor cell clusters adjacent to adipose tissue), this feature was identified by annotators with 87.0-95.5% accuracy. Our approach can be used to explain predictions from a prognostic deep learning model and uncover potentially-novel prognostic features that can be reliably identified by people for future validation studies.

...read moreread less

59 citations

Journal Article•DOI•

Predicting Risk of Developing Diabetic Retinopathy using Deep Learning.

[...]

Ashish Bora, Siva Balasubramanian, Boris Babenko, Sunny Virmani, Subhashini Venugopalan, Akinori Mitani, Guilherme de Oliveira Marinho, Jorge Cuadros, Paisan Ruamviboonsuk, Greg S. Corrado, Lily Peng, Dale R. Webster, Avinash V. Varadarajan, Naama Hammel¹, Yun Liu, Pinal Bavishi - Show less +12 more•Institutions (1)

Google¹

10 Aug 2020-arXiv: Image and Video Processing

TL;DR: The deep-learning systems predicted diabetic retinopathy development using colour fundus photographs, and the systems were independent of and more informative than available risk factors.

...read moreread less

Abstract: Diabetic retinopathy (DR) screening is instrumental in preventing blindness, but faces a scaling challenge as the number of diabetic patients rises. Risk stratification for the development of DR may help optimize screening intervals to reduce costs while improving vision-related outcomes. We created and validated two versions of a deep learning system (DLS) to predict the development of mild-or-worse ("Mild+") DR in diabetic patients undergoing DR screening. The two versions used either three-fields or a single field of color fundus photographs (CFPs) as input. The training set was derived from 575,431 eyes, of which 28,899 had known 2-year outcome, and the remaining were used to augment the training process via multi-task learning. Validation was performed on both an internal validation set (set A; 7,976 eyes; 3,678 with known outcome) and an external validation set (set B; 4,762 eyes; 2,345 with known outcome). For predicting 2-year development of DR, the 3-field DLS had an area under the receiver operating characteristic curve (AUC) of 0.79 (95%CI, 0.78-0.81) on validation set A. On validation set B (which contained only a single field), the 1-field DLS's AUC was 0.70 (95%CI, 0.67-0.74). The DLS was prognostic even after adjusting for available risk factors (p<0.001). When added to the risk factors, the 3-field DLS improved the AUC from 0.72 (95%CI, 0.68-0.76) to 0.81 (95%CI, 0.77-0.84) in validation set A, and the 1-field DLS improved the AUC from 0.62 (95%CI, 0.58-0.66) to 0.71 (95%CI, 0.68-0.75) in validation set B. The DLSs in this study identified prognostic information for DR development from CFPs. This information is independent of and more informative than the available risk factors.

...read moreread less

57 citations

Journal Article•DOI•

Evaluation of the Use of Combined Artificial Intelligence and Pathologist Assessment to Review and Grade Prostate Biopsies

[...]

David F. Steiner¹, Kunal Nagpal¹, Rory Sayres¹, Davis Foote¹, Benjamin D Wedin¹, Adam Pearce¹, Carrie J. Cai¹, Samantha Winter¹, Matthew Symonds¹, Liron Yatziv¹, Andrei Kapishnikov¹, Trissia Brown¹, Isabelle Flament-Auvigne¹, Fraser Tan¹, Martin C. Stumpe¹, Pan-Pan Jiang¹, Yun Liu¹, Po-Hsuan Cameron Chen¹, Greg S. Corrado¹, Michael Terry¹, Craig H. Mermel¹ - Show less +17 more•Institutions (1)

Google¹

02 Nov 2020

TL;DR: The study’s findings indicated that the use of an artificial intelligence tool may help pathologists grade prostate biopsies more consistently with the opinions of subspecialists.

...read moreread less

Abstract: Importance Expert-level artificial intelligence (AI) algorithms for prostate biopsy grading have recently been developed. However, the potential impact of integrating such algorithms into pathologist workflows remains largely unexplored. Objective To evaluate an expert-level AI-based assistive tool when used by pathologists for the grading of prostate biopsies. Design, Setting, and Participants This diagnostic study used a fully crossed multiple-reader, multiple-case design to evaluate an AI-based assistive tool for prostate biopsy grading. Retrospective grading of prostate core needle biopsies from 2 independent medical laboratories in the US was performed between October 2019 and January 2020. A total of 20 general pathologists reviewed 240 prostate core needle biopsies from 240 patients. Each pathologist was randomized to 1 of 2 study cohorts. The 2 cohorts reviewed every case in the opposite modality (with AI assistance vs without AI assistance) to each other, with the modality switching after every 10 cases. After a minimum 4-week washout period for each batch, the pathologists reviewed the cases for a second time using the opposite modality. The pathologist-provided grade group for each biopsy was compared with the majority opinion of urologic pathology subspecialists. Exposure An AI-based assistive tool for Gleason grading of prostate biopsies. Main Outcomes and Measures Agreement between pathologists and subspecialists with and without the use of an AI-based assistive tool for the grading of all prostate biopsies and Gleason grade group 1 biopsies. Results Biopsies from 240 patients (median age, 67 years; range, 39-91 years) with a median prostate-specific antigen level of 6.5 ng/mL (range, 0.6-97.0 ng/mL) were included in the analyses. Artificial intelligence–assisted review by pathologists was associated with a 5.6% increase (95% CI, 3.2%-7.9%;P Conclusions and Relevance In this study, the use of an AI-based assistive tool for the review of prostate biopsies was associated with improvements in the quality, efficiency, and consistency of cancer detection and grading.

...read moreread less

47 citations

Journal Article•DOI•

Detecting Deficient Coverage in Colonoscopies

[...]

Daniel Freedman¹, Yochai Blau¹, Liran Katzir¹, Amit Aides¹, Ilan Shimshoni¹, Danny Veikherman¹, Tomer Golany¹, Ariel Gordon¹, Greg S. Corrado¹, Yossi Matias¹, Ehud Rivlin¹ - Show less +7 more•Institutions (1)

Google¹

21 May 2020-IEEE Transactions on Medical Imaging

TL;DR: The C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm as discussed by the authors detects deficient coverage, and can thereby alert the endoscopist to revisit a given area.

...read moreread less

Abstract: Colonoscopy is tool of choice for preventing Colorectal Cancer, by detecting and removing polyps before they become cancerous. However, colonoscopy is hampered by the fact that endoscopists routinely miss 22-28% of polyps. While some of these missed polyps appear in the endoscopist’s field of view, others are missed simply because of substandard coverage of the procedure, i.e. not all of the colon is seen. This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm which detects deficient coverage, and can thereby alert the endoscopist to revisit a given area. More specifically, C2D2 consists of two separate algorithms: the first performs depth estimation of the colon given an ordinary RGB video stream; while the second computes coverage given these depth estimates. Rather than compute coverage for the entire colon, our algorithm computes coverage locally, on a segment-by-segment basis; C2D2 can then indicate in real-time whether a particular area of the colon has suffered from deficient coverage, and if so the endoscopist can return to that area. Our coverage algorithm is the first such algorithm to be evaluated in a large-scale way; while our depth estimation technique is the first calibration-free unsupervised method applied to colonoscopies. The C2D2 algorithm achieves state of the art results in the detection of deficient coverage. On synthetic sequences with ground truth, it is 2.4 times more accurate than human experts; while on real sequences, C2D2 achieves a 93.0% agreement with experts.

...read moreread less

32 citations

Journal Article•DOI•

Customization Scenarios for De-identification of Clinical Notes

[...]

Tzvika Hartman¹, Michael D. Howell¹, Jeffrey Dean¹, Shlomo Hoory¹, Ronit Slyper¹, Itay Laish¹, Oren Gilon¹, Danny Vainstein¹, Greg S. Corrado¹, Katherine Chou¹, Ming Jack Po¹, Jutta Williams, Scott Ellis¹, Gavin Edward Bee¹, Avinatan Hassidim¹, Rony Amira¹, Genady Beryozkin¹, Idan Szpektor¹, Yossi Matias¹ - Show less +15 more•Institutions (1)

Google¹

30 Jan 2020-BMC Medical Informatics and Decision Making

TL;DR: Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.

...read moreread less

Abstract: Automated machine-learning systems are able to de-identify electronic medical records, including free-text clinical notes. Use of such systems would greatly boost the amount of data available to researchers, yet their deployment has been limited due to uncertainty about their performance when applied to new datasets. We present practical options for clinical note de-identification, assessing performance of machine learning systems ranging from off-the-shelf to fully customized. We implement a state-of-the-art machine learning de-identification system, training and testing on pairs of datasets that match the deployment scenarios. We use clinical notes from two i2b2 competition corpora, the Physionet Gold Standard corpus, and parts of the MIMIC-III dataset. Fully customized systems remove 97–99% of personally identifying information. Performance of off-the-shelf systems varies by dataset, with performance mostly above 90%. Providing a small labeled dataset or large unlabeled dataset allows for fine-tuning that improves performance over off-the-shelf systems. Health organizations should be aware of the levels of customization available when selecting a de-identification deployment solution, in order to choose the one that best matches their resources and target performance level.

...read moreread less

22 citations

Posted Content•

Detecting Deficient Coverage in Colonoscopies

[...]

Google¹

23 Jan 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm which detects deficient coverage, and can thereby alert the endoscopist to revisit a given area.

...read moreread less

Abstract: Colonoscopy is the tool of choice for preventing Colorectal Cancer, by detecting and removing polyps before they become cancerous. However, colonoscopy is hampered by the fact that endoscopists routinely miss 22-28% of polyps. While some of these missed polyps appear in the endoscopist's field of view, others are missed simply because of substandard coverage of the procedure, i.e. not all of the colon is seen. This paper attempts to rectify the problem of substandard coverage in colonoscopy through the introduction of the C2D2 (Colonoscopy Coverage Deficiency via Depth) algorithm which detects deficient coverage, and can thereby alert the endoscopist to revisit a given area. More specifically, C2D2 consists of two separate algorithms: the first performs depth estimation of the colon given an ordinary RGB video stream; while the second computes coverage given these depth estimates. Rather than compute coverage for the entire colon, our algorithm computes coverage locally, on a segment-by-segment basis; C2D2 can then indicate in real-time whether a particular area of the colon has suffered from deficient coverage, and if so the endoscopist can return to that area. Our coverage algorithm is the first such algorithm to be evaluated in a large-scale way; while our depth estimation technique is the first calibration-free unsupervised method applied to colonoscopies. The C2D2 algorithm achieves state of the art results in the detection of deficient coverage. On synthetic sequences with ground truth, it is 2.4 times more accurate than human experts; while on real sequences, C2D2 achieves a 93.0% agreement with experts.

...read moreread less

Posted Content•

Detecting hidden signs of diabetes in external eye photographs.

[...]

Boris Babenko, Akinori Mitani, Ilana Traynis, Naho Kitade, Preeti Singh, April Y. Maa, Jorge Cuadros, Greg S. Corrado, Lily Peng, Dale R. Webster, Avinash V. Varadarajan, Naama Hammel, Yun Liu - Show less +9 more

23 Nov 2020-arXiv: Image and Video Processing

TL;DR: The results indicate that external eye photographs contain information useful for healthcare providers managing patients with diabetes, and may help prioritize patients for in-person screening, as well as its utility for remote diagnosis and management.

...read moreread less

Abstract: Diabetes-related retinal conditions can be detected by examining the posterior of the eye. By contrast, examining the anterior of the eye can reveal conditions affecting the front of the eye, such as changes to the eyelids, cornea, or crystalline lens. In this work, we studied whether external photographs of the front of the eye can reveal insights into both diabetic retinal diseases and blood glucose control. We developed a deep learning system (DLS) using external eye photographs of 145,832 patients with diabetes from 301 diabetic retinopathy (DR) screening sites in one US state, and evaluated the DLS on three validation sets containing images from 198 sites in 18 other US states. In validation set A (n=27,415 patients, all undilated), the DLS detected poor blood glucose control (HbA1c > 9%) with an area under receiver operating characteristic curve (AUC) of 70.2; moderate-or-worse DR with an AUC of 75.3; diabetic macular edema with an AUC of 78.0; and vision-threatening DR with an AUC of 79.4. For all 4 prediction tasks, the DLS's AUC was higher (p 9%, and a 20% chance of having vision threatening diabetic retinopathy. The results generalized to dilated pupils (validation set B, 5,058 patients) and to a different screening service (validation set C, 10,402 patients). Our results indicate that external eye photographs contain information useful for healthcare providers managing patients with diabetes, and may help prioritize patients for in-person screening. Further work is needed to validate these findings on different devices and patient populations (those without diabetes) to evaluate its utility for remote diagnosis and management.

...read moreread less

Posted Content•

Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Unseen Diseases.

[...]

22 Oct 2020-arXiv: Image and Video Processing

TL;DR: An AI system to classify CXRs as normal or abnormal and the results suggest that the AI system trained using a large dataset containing a diverse array of CXR abnormalities generalizes to new patient populations and unseen diseases.

...read moreread less

Abstract: Chest radiography (CXR) is the most widely-used thoracic clinical imaging modality and is crucial for guiding the management of cardiothoracic conditions. The detection of specific CXR findings has been the main focus of several artificial intelligence (AI) systems. However, the wide range of possible CXR abnormalities makes it impractical to build specific systems to detect every possible condition. In this work, we developed and evaluated an AI system to classify CXRs as normal or abnormal. For development, we used a de-identified dataset of 248,445 patients from a multi-city hospital network in India. To assess generalizability, we evaluated our system using 6 international datasets from India, China, and the United States. Of these datasets, 4 focused on diseases that the AI was not trained to detect: 2 datasets with tuberculosis and 2 datasets with coronavirus disease 2019. Our results suggest that the AI system generalizes to new patient populations and abnormalities. In a simulated workflow where the AI system prioritized abnormal cases, the turnaround time for abnormal cases reduced by 7-28%. These results represent an important step towards evaluating whether AI can be safely used to flag cases in a general setting where previously unseen abnormalities exist.

...read moreread less

Journal Article•DOI•

Reply to: Transparency and reproducibility in artificial intelligence

[...]

Scott Mayer McKinney¹, Alan Karthikesalingam¹, Daniel Tse¹, Christopher Kelly¹, Yun Liu¹, Greg S. Corrado¹, Shravya Shetty¹ - Show less +3 more•Institutions (1)

Google¹

01 Oct 2020-Nature

Journal Article•DOI•

Predicting Prostate Cancer-Specific Mortality with A.I.-based Gleason Grading.

[...]

25 Nov 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: The results suggest that A.I..

...read moreread less

Abstract: Gleason grading of prostate cancer is an important prognostic factor but suffers from poor reproducibility, particularly among non-subspecialist pathologists. Although artificial intelligence (A.I.) tools have demonstrated Gleason grading on-par with expert pathologists, it remains an open question whether A.I. grading translates to better prognostication. In this study, we developed a system to predict prostate-cancer specific mortality via A.I.-based Gleason grading and subsequently evaluated its ability to risk-stratify patients on an independent retrospective cohort of 2,807 prostatectomy cases from a single European center with 5-25 years of follow-up (median: 13, interquartile range 9-17). The A.I.'s risk scores produced a C-index of 0.84 (95%CI 0.80-0.87) for prostate cancer-specific mortality. Upon discretizing these risk scores into risk groups analogous to pathologist Grade Groups (GG), the A.I. had a C-index of 0.82 (95%CI 0.78-0.85). On the subset of cases with a GG in the original pathology report (n=1,517), the A.I.'s C-indices were 0.87 and 0.85 for continuous and discrete grading, respectively, compared to 0.79 (95%CI 0.71-0.86) for GG obtained from the reports. These represent improvements of 0.08 (95%CI 0.01-0.15) and 0.07 (95%CI 0.00-0.14) respectively. Our results suggest that A.I.-based Gleason grading can lead to effective risk-stratification and warrants further evaluation for improving disease management.

...read moreread less

Proceedings Article•DOI•

Explaining an increase in predicted risk for clinical alerts

[...]

Michaela Hardt¹, Alvin Rajkomar², Gerardo Flores², Andrew M. Dai², Michael D. Howell², Greg S. Corrado², Claire Cui², Moritz Hardt³ - Show less +4 more•Institutions (3)

Amazon.com¹, Google², University of California, Berkeley³

02 Apr 2020

TL;DR: In this paper, the authors consider explanations in a temporal setting where a stateful dynamical model produces a sequence of risk estimates given an input at each time step, and the goal of the explanation is to attribute the increase to a few relevant inputs from the past.

...read moreread less

Abstract: Much work aims to explain a model's prediction on a static input. We consider explanations in a temporal setting where a stateful dynamical model produces a sequence of risk estimates given an input at each time step. When the estimated risk increases, the goal of the explanation is to attribute the increase to a few relevant inputs from the past. While our formal setup and techniques are general, we carry out an in-depth case study in a clinical setting. The goal here is to alert a clinician when a patient's risk of deterioration rises. The clinician then has to decide whether to intervene and adjust the treatment. Given a potentially long sequence of new events since she last saw the patient, a concise explanation helps her to quickly triage the alert. We develop methods to lift static attribution techniques to the dynamical setting, where we identify and address challenges specific to dynamics. We then experimentally assess the utility of different explanations of clinical alerts through expert evaluation.

...read moreread less

Book Chapter•DOI•

Scientific Discovery by Generating Counterfactuals Using Image Translation

[...]

Arunachalam Narayanaswamy¹, Subhashini Venugopalan¹, Dale R. Webster¹, Lily Peng¹, Greg S. Corrado¹, Paisan Ruamviboonsuk, Pinal Bavishi¹, Michael Brenner¹, Philip C. Nelson¹, Avinash V. Varadarajan¹ - Show less +6 more•Institutions (1)

Google¹

04 Oct 2020

TL;DR: In this paper, the authors propose a framework to convert predictions from explanation techniques to a mechanism of discovery, which is able to explain the underlying scientific mechanism, thus bridging the gap between the model's performance and human understanding.

...read moreread less

Abstract: Model explanation techniques play a critical role in understanding the source of a model’s performance and making its decisions transparent. Here we investigate if explanation techniques can also be used as a mechanism for scientific discovery. We make three contributions: first, we propose a framework to convert predictions from explanation techniques to a mechanism of discovery. Second, we show how generative models in combination with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. Third, with these techniques we study classification models for retinal images predicting Diabetic Macular Edema (DME), where recent work [30] showed that a CNN trained on these images is likely learning novel features in the image. We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model’s performance and human understanding.

...read moreread less

Journal Article•DOI•

Addendum: International evaluation of an AI system for breast cancer screening.

[...]

Scott Mayer McKinney¹, Marcin Sieniek¹, Varun Godbole¹, Jonathan Godwin, Natasha Antropova, Hutan Ashrafian², Trevor Back, Mary Chesus, Greg S. Corrado¹, Ara Darzi², Mozziyar Etemadi, Florencia Garcia-Vicente, Fiona J. Gilbert³, Mark D. Halling-Brown⁴, Demis Hassabis, Sunny Jansen, Alan Karthikesalingam¹, Christopher Kelly¹, Dominic King¹, Joseph R. Ledsam, David S. Melnick, Hormuz Mostofi¹, Lily Peng¹, Joshua J. Reicher⁵, Bernardino Romera-Paredes, Richard Sidebottom⁶, Mustafa Suleyman, Daniel Tse¹, Kenneth C. Young⁴, Jeffrey De Fauw, Shravya Shetty¹ - Show less +27 more•Institutions (6)

Google¹, Imperial College London², University of Cambridge³, Royal Surrey County Hospital⁴, Veterans Health Administration⁵, The Royal Marsden NHS Foundation Trust⁶

01 Oct 2020-Nature

Posted Content•

Scientific Discovery by Generating Counterfactuals using Image Translation

[...]

Arunachalam Narayanaswamy, Subhashini Venugopalan, Dale R. Webster, Lily Peng, Greg S. Corrado, Paisan Ruamviboonsuk, Pinal Bavishi, Rory Sayres, Abigail E. Huang, Siva Balasubramanian, Michael Brenner, Philip C. Nelson, Avinash V. Varadarajan - Show less +9 more

10 Jul 2020-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work proposes a framework to convert predictions from explanation techniques to a mechanism of discovery, and shows how generative models in combination with black-box predictors can be used to generate hypotheses that can be critically examined.

...read moreread less

Abstract: Model explanation techniques play a critical role in understanding the source of a model's performance and making its decisions transparent. Here we investigate if explanation techniques can also be used as a mechanism for scientific discovery. We make three contributions: first, we propose a framework to convert predictions from explanation techniques to a mechanism of discovery. Second, we show how generative models in combination with black-box predictors can be used to generate hypotheses (without human priors) that can be critically examined. Third, with these techniques we study classification models for retinal images predicting Diabetic Macular Edema (DME), where recent work showed that a CNN trained on these images is likely learning novel features in the image. We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model's performance and human understanding.

...read moreread less

Journal Article•DOI•

Author Correction: Detection of anaemia from retinal fundus images via deep learning.

[...]

Google¹

01 Feb 2020-Nature Biomedical Engineering

TL;DR: An amendment to this paper has been published and can be accessed via a link at the top of the paper.

...read moreread less

Abstract: An amendment to this paper has been published and can be accessed via a link at the top of the paper.

...read moreread less