
Showing papers by "Greg S. Corrado published in 2018"


Journal ArticleDOI
08 May 2018
TL;DR: A representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format is proposed, and it is demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.
Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient’s chart.
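
The sequential representation described above can be pictured with a short sketch. The sketch below is purely illustrative (the field names, vocabulary, and event types are assumptions, not the paper's actual FHIR schema): raw timestamped resources are sorted into a single per-patient timeline and mapped to token ids, which is the kind of input a sequence model can consume without site-specific harmonization.

```python
# Hypothetical sketch: turning raw FHIR-style resources into a per-patient
# event sequence for a sequence model (names and fields are illustrative only).
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FhirEvent:
    time: float          # seconds since admission
    resource_type: str   # e.g. "Observation", "MedicationOrder" (kept only to mirror FHIR)
    token: str           # a code or word drawn from the resource

def to_sequence(events: List[FhirEvent], vocab: dict) -> List[Tuple[float, int]]:
    """Order every event by time and map tokens to integer ids,
    sending unknown tokens to a shared out-of-vocabulary id."""
    oov = vocab.get("<OOV>", 0)
    ordered = sorted(events, key=lambda e: e.time)
    return [(e.time, vocab.get(e.token, oov)) for e in ordered]

# Example: three raw events become one chronologically ordered token sequence.
vocab = {"<OOV>": 0, "loinc:718-7": 1, "rx:metformin": 2}
events = [
    FhirEvent(3600.0, "MedicationOrder", "rx:metformin"),
    FhirEvent(600.0, "Observation", "loinc:718-7"),
    FhirEvent(7200.0, "Note", "dyspnea"),
]
print(to_sequence(events, vocab))  # [(600.0, 1), (3600.0, 2), (7200.0, 0)]
```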

1,388 citations


Journal ArticleDOI
TL;DR: In this article, the authors used deep learning models trained on retinal fundus images to predict cardiovascular risk factors not previously thought to be present or quantifiable in retinal images.
Abstract: Traditionally, medical discoveries are made by observing associations, making hypotheses from them and then designing and running experiments to test the hypotheses. However, with medical images, observing and quantifying associations can often be difficult because of the wide variety of features, patterns, colours, values and shapes that are present in real data. Here, we show that deep learning can extract new knowledge from retinal fundus images. Using deep-learning models trained on data from 284,335 patients and validated on two independent datasets of 12,026 and 999 patients, we predicted cardiovascular risk factors not previously thought to be present or quantifiable in retinal images, such as age (mean absolute error within 3.26 years), gender (area under the receiver operating characteristic curve (AUC) = 0.97), smoking status (AUC = 0.71), systolic blood pressure (mean absolute error within 11.23 mmHg) and major adverse cardiac events (AUC = 0.70). We also show that the trained deep-learning models used anatomical features, such as the optic disc or blood vessels, to generate each prediction.
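
One plausible way to read the multi-target setup described above is a single shared image backbone with separate prediction heads per risk factor. The sketch below is not the paper's architecture (the original work used an Inception-style network at much higher resolution); the layer sizes and head groupings are illustrative assumptions.

```python
# Illustrative sketch only: a shared fundus-image backbone with separate heads
# for continuous (age, systolic blood pressure) and binary risk factors.
import torch
import torch.nn as nn

class MultiHeadFundusModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny stand-in backbone; a real system would use a much deeper CNN.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.age_head = nn.Linear(16, 1)      # regression target in years
        self.sbp_head = nn.Linear(16, 1)      # regression target in mmHg
        self.binary_head = nn.Linear(16, 3)   # logits: gender, smoking, MACE

    def forward(self, x):
        h = self.backbone(x)
        return self.age_head(h), self.sbp_head(h), self.binary_head(h)

model = MultiHeadFundusModel()
age, sbp, logits = model(torch.randn(2, 3, 64, 64))
print(age.shape, sbp.shape, logits.shape)  # [2, 1], [2, 1], [2, 3]
```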

1,038 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format and demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.
Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case-study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart.
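
The attribution case study mentioned above can be approximated with a generic gradient-saliency scheme; the paper's actual attribution mechanism may differ. In the hedged sketch below, each token of a toy patient timeline is scored by the gradient magnitude of the predicted risk with respect to its embedding, so higher scores mark chart elements that influenced the prediction more.

```python
# Minimal gradient-saliency sketch (one common attribution approach, not
# necessarily the paper's): score each token of a toy sequence model by the
# gradient norm of the predicted risk with respect to its embedding.
import torch
import torch.nn as nn

torch.manual_seed(0)
embed = nn.Embedding(100, 8)
rnn = nn.GRU(8, 16, batch_first=True)
head = nn.Linear(16, 1)

tokens = torch.tensor([[5, 17, 42, 7]])          # a toy patient timeline
emb = embed(tokens).detach().requires_grad_(True)
_, h = rnn(emb)
risk = torch.sigmoid(head(h[-1])).sum()          # scalar predicted risk
risk.backward()

saliency = emb.grad.norm(dim=-1).squeeze(0)      # one score per token
print(saliency.tolist())                          # higher = more influential
```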

958 citations


Journal ArticleDOI
TL;DR: The mechanisms by which a model's design, data, and deployment may lead to disparities are described; how different approaches to distributive justice in machine learning can advance health equity are explained; and guidance is offered on which contexts are most appropriate for each equity approach in machine learning.
Abstract: Machine learning is used increasingly in clinical care to improve diagnosis, treatment selection, and health system efficiency. Because machine-learning models learn from historically collected data, populations that have experienced human and structural biases in the past-called protected groups-are vulnerable to harm by incorrect predictions or withholding of resources. This article describes how model design, biases in data, and the interactions of model predictions with clinicians and patients may exacerbate health care disparities. Rather than simply guarding against these harms passively, machine-learning systems should be used proactively to advance health equity. For that goal to be achieved, principles of distributive justice must be incorporated into model design, deployment, and evaluation. The article describes several technical implementations of distributive justice-specifically those that ensure equality in patient outcomes, performance, and resource allocation-and guides clinicians as to when they should prioritize each principle. Machine learning is providing increasingly sophisticated decision support and population-level monitoring, and it should encode principles of justice to ensure that models benefit all patients.
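
As one concrete illustration of the "equal performance" principle mentioned above, the sketch below compares sensitivity and specificity across protected groups; the group labels and data are assumptions for illustration, not taken from the article.

```python
# Hedged sketch of an "equal performance" audit: compare sensitivity and
# specificity across protected groups (toy data, illustrative group labels).
import numpy as np

def group_rates(y_true, y_pred, groups):
    out = {}
    for g in np.unique(groups):
        m = groups == g
        tp = np.sum((y_pred[m] == 1) & (y_true[m] == 1))
        fn = np.sum((y_pred[m] == 0) & (y_true[m] == 1))
        tn = np.sum((y_pred[m] == 0) & (y_true[m] == 0))
        fp = np.sum((y_pred[m] == 1) & (y_true[m] == 0))
        out[g] = {"sensitivity": tp / max(tp + fn, 1),
                  "specificity": tn / max(tn + fp, 1)}
    return out

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(group_rates(y_true, y_pred, groups))  # large gaps flag a disparity
```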

438 citations


Journal ArticleDOI
TL;DR: Adjudication by retinal specialists reduces errors in diabetic retinopathy (DR) grading, and using a small number of adjudicated consensus grades as a tuning dataset, together with higher-resolution images as input, trains an improved automated algorithm for DR grading.

328 citations


Journal ArticleDOI
TL;DR: In this article, a deep learning system was developed using 112 million pathologist-annotated image patches from 1,226 slides, and evaluated on an independent validation dataset of 331 slides.
Abstract: For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1,226 slides, and evaluated on an independent validation dataset of 331 slides, where the reference standard was established by genitourinary specialist pathologists. On the validation dataset, the mean accuracy among 29 general pathologists was 0.61. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p=0.002) and trended towards better patient risk stratification in correlations to clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.
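
For readers unfamiliar with Gleason scoring, the sketch below shows the general idea of turning per-patch pattern calls into a slide-level score from the primary and secondary tumor patterns. This is a deliberate simplification for illustration; the DLS described above uses a considerably more sophisticated aggregation.

```python
# Illustrative sketch only: aggregate patch-level Gleason-pattern calls into a
# slide-level score from the most and second-most common tumor patterns.
from collections import Counter

def slide_gleason_score(patch_patterns):
    """patch_patterns: per-patch calls in {0 (benign), 3, 4, 5}."""
    tumor = [p for p in patch_patterns if p in (3, 4, 5)]
    if not tumor:
        return None  # no tumor detected on the slide
    counts = Counter(tumor).most_common()
    primary = counts[0][0]
    secondary = counts[1][0] if len(counts) > 1 else primary
    return primary + secondary  # e.g. Gleason 3 + 4 = 7

print(slide_gleason_score([0, 3, 3, 4, 3, 0]))  # 7 (primary 3, secondary 4)
```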

183 citations


Journal ArticleDOI
TL;DR: In this paper, a deep learning algorithm was used to predict refractive error from retinal fundus images and validated it on 24,007 UK Biobank and 15,750 AREDS images.
Abstract: PURPOSE. We evaluate how deep learning can be applied to extract novel information such as refractive error from retinal fundus imaging. METHODS. Retinal fundus images used in this study were 45- and 30-degree field of view images from the UK Biobank and Age-Related Eye Disease Study (AREDS) clinical trials, respectively. Refractive error was measured by autorefraction in UK Biobank and subjective refraction in AREDS. We trained a deep learning algorithm to predict refractive error from a total of 226,870 images and validated it on 24,007 UK Biobank and 15,750 AREDS images. Our model used the "attention" method to identify features that are correlated with refractive error. RESULTS. The resulting algorithm had a mean absolute error (MAE) of 0.56 diopters (95% confidence interval [CI]: 0.55–0.56) for estimating spherical equivalent on the UK Biobank data set and 0.91 diopters (95% CI: 0.89–0.93) for the AREDS data set. The baseline expected MAE (obtained by simply predicting the mean of this population) was 1.81 diopters (95% CI: 1.79–1.84) for UK Biobank and 1.63 (95% CI: 1.60–1.67) for AREDS. Attention maps suggested that the foveal region was one of the most important areas used by the algorithm to make this prediction, though other regions also contribute to the prediction. CONCLUSIONS. To our knowledge, the ability to estimate refractive error with high accuracy from retinal fundus photos has not been previously known and demonstrates that deep learning can be applied to make novel predictions from medical images.
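
The "attention" method is not specified in detail here, but soft-attention pooling is one common way to obtain both a prediction and a spatial attention map. The sketch below is an assumed, minimal version with illustrative layer sizes, not the paper's implementation.

```python
# Minimal soft-attention sketch: spatial attention weights pool image features
# into a single refractive-error estimate and double as an attention map.
import torch
import torch.nn as nn

class AttentionPoolRegressor(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.features = nn.Conv2d(3, channels, 3, padding=1)
        self.attn = nn.Conv2d(channels, 1, 1)   # spatial attention logits
        self.out = nn.Linear(channels, 1)       # spherical equivalent (diopters)

    def forward(self, x):
        f = torch.relu(self.features(x))                      # B x C x H x W
        a = torch.softmax(self.attn(f).flatten(2), dim=-1)    # B x 1 x HW
        pooled = (f.flatten(2) * a).sum(-1)                   # B x C
        return self.out(pooled), a.reshape(x.size(0), *x.shape[2:])

model = AttentionPoolRegressor()
pred, attn_map = model(torch.randn(2, 3, 32, 32))
print(pred.shape, attn_map.shape)  # [2, 1], [2, 32, 32]
```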

108 citations


Journal ArticleDOI
TL;DR: The Augmented Reality Microscope (ARM) is proposed as a cost-effective way to integrate AI: it overlays AI-based information onto the current view of the sample through the optical pathway in real time, fitting seamlessly into the regular microscopy workflow.
Abstract: The brightfield microscope is instrumental in the visual examination of both biological and physical samples at sub-millimeter scales. One key clinical application has been in cancer histopathology, where the microscopic assessment of the tissue samples is used for the diagnosis and staging of cancer and thus guides clinical therapy. However, the interpretation of these samples is inherently subjective, resulting in significant diagnostic variability. Moreover, in many regions of the world, access to pathologists is severely limited due to lack of trained personnel. In this regard, Artificial Intelligence (AI) based tools promise to improve the access and quality of healthcare. However, despite significant advances in AI research, integration of these tools into real-world cancer diagnosis workflows remains challenging because of the costs of image digitization and difficulties in deploying AI solutions. Here we propose a cost-effective solution to the integration of AI: the Augmented Reality Microscope (ARM). The ARM overlays AI-based information onto the current view of the sample through the optical pathway in real-time, enabling seamless integration of AI into the regular microscopy workflow. We demonstrate the utility of ARM in the detection of lymph node metastases in breast cancer and the identification of prostate cancer with a latency that supports real-time workflows. We anticipate that ARM will remove barriers towards the use of AI in microscopic analysis and thus improve the accuracy and efficiency of cancer diagnosis. This approach is applicable to other microscopy tasks and AI algorithms in the life sciences and beyond.
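
Conceptually, the ARM runs a capture-infer-overlay loop under a real-time latency budget. The sketch below is a schematic only; grab_frame, run_model, and project_overlay are hypothetical placeholders for the camera feed, the classifier, and the optical display, not functions from the actual system.

```python
# Conceptual sketch of an ARM-style loop: grab the field of view, run the
# model, and project the result back into the eyepiece within a latency budget.
import numpy as np
import time

def grab_frame():                      # placeholder for the camera feed
    return np.random.rand(512, 512, 3)

def run_model(frame):                  # placeholder for the tumor classifier
    return (frame.mean(axis=-1) > 0.6).astype(np.uint8)  # fake heatmap

def project_overlay(mask):             # placeholder for the optical display
    print(f"overlaying {int(mask.sum())} flagged pixels")

def arm_loop(n_frames=3, target_fps=10):
    budget = 1.0 / target_fps          # per-frame latency budget in seconds
    for _ in range(n_frames):
        start = time.time()
        mask = run_model(grab_frame())
        project_overlay(mask)
        # Stay within the real-time budget so the overlay tracks the slide.
        time.sleep(max(0.0, budget - (time.time() - start)))

arm_loop()
```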

69 citations


Journal ArticleDOI
TL;DR: In this article, a deep learning model was used to predict center-involved diabetic macular edema (ci-DME) using color fundus photographs and achieved an ROC-AUC of 0.89 (95% CI: 0.87-0.91).
Abstract: Diabetic eye disease is one of the fastest growing causes of preventable blindness. With the advent of anti-VEGF (vascular endothelial growth factor) therapies, it has become increasingly important to detect center-involved diabetic macular edema (ci-DME). However, center-involved diabetic macular edema is diagnosed using optical coherence tomography (OCT), which is not generally available at screening sites because of cost and workflow constraints. Instead, screening programs rely on the detection of hard exudates in color fundus photographs as a proxy for DME, often resulting in high false positive or false negative calls. To improve the accuracy of DME screening, we trained a deep learning model to use color fundus photographs to predict ci-DME. Our model had an ROC-AUC of 0.89 (95% CI: 0.87-0.91), which corresponds to a sensitivity of 85% at a specificity of 80%. In comparison, three retinal specialists had similar sensitivities (82-85%), but only half the specificity (45-50%, p<0.001 for each comparison with model). The positive predictive value (PPV) of the model was 61% (95% CI: 56-66%), approximately double the 36-38% by the retinal specialists. In addition to predicting ci-DME, our model was able to detect the presence of intraretinal fluid with an AUC of 0.81 (95% CI: 0.81-0.86) and subretinal fluid with an AUC of 0.88 (95% CI: 0.85-0.91). The ability of deep learning algorithms to make clinically relevant predictions that generally require sophisticated 3D-imaging equipment from simple 2D images has broad relevance to many other applications in medical imaging.
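
The relationship between the reported sensitivities, specificities, and positive predictive values follows from Bayes' rule once a prevalence is fixed. The worked sketch below uses an assumed prevalence, not a figure from the paper, purely to show how a large specificity gap at similar sensitivity translates into roughly double the PPV.

```python
# Worked sketch: PPV as a function of sensitivity, specificity and prevalence.
# The prevalence below is an assumption for illustration, not from the paper.
def ppv(sensitivity, specificity, prevalence):
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp)

prev = 0.20  # assumed ci-DME prevalence in the screened population
print(round(ppv(0.85, 0.80, prev), 2))  # model-like operating point  -> 0.52
print(round(ppv(0.83, 0.47, prev), 2))  # specialist-like operating point -> 0.28
```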

23 citations


Posted Content
TL;DR: Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate at the cost of slightly higher false positive rates, suggesting that deep learning algorithms may serve as a valuable tool for DR screening.
Abstract: Deep learning algorithms have been used to detect diabetic retinopathy (DR) with specialist-level accuracy. This study aims to validate one such algorithm on a large-scale clinical population, and compare the algorithm performance with that of human graders. 25,326 gradable retinal images of patients with diabetes from the community-based, nation-wide screening program of DR in Thailand were analyzed for DR severity and referable diabetic macular edema (DME). Grades adjudicated by a panel of international retinal specialists served as the reference standard. Across different severity levels of DR for determining referable disease, deep learning significantly reduced the false negative rate (by 23%) at the cost of slightly higher false positive rates (2%). Deep learning algorithms may serve as a valuable tool for DR screening.

13 citations


Posted Content
21 Dec 2018
TL;DR: A deep learning algorithm trained on fundus images alone can detect referable glaucoma risk with higher sensitivity than, and comparable specificity to, eye care providers.
Abstract: Glaucoma is the leading cause of preventable, irreversible blindness world-wide. The disease can remain asymptomatic until severe, and an estimated 50%-90% of people with glaucoma remain undiagnosed. Thus, glaucoma screening is recommended for early detection and treatment. A cost-effective tool to detect glaucoma could expand healthcare access to a much larger patient population, but such a tool is currently unavailable. We trained a deep learning (DL) algorithm using a retrospective dataset of 58,033 images, assessed for gradability, glaucomatous optic nerve head (ONH) features, and referable glaucoma risk. The resultant algorithm was validated using 2 separate datasets. For referable glaucoma risk, the algorithm had an AUC of 0.940 (95%CI, 0.922-0.955) in validation dataset "A" (1,205 images, 1 image/patient; 19% referable where images were adjudicated by panels of fellowship-trained glaucoma specialists) and 0.858 (95% CI, 0.836-0.878) in validation dataset "B" (17,593 images from 9,643 patients; 9.2% referable where images were from the Atlanta Veterans Affairs Eye Clinic diabetic teleretinal screening program using clinical referral decisions as the reference standard). Additionally, we found that the presence of vertical cup-to-disc ratio >= 0.7, neuroretinal rim notching, retinal nerve fiber layer defect, and bared circumlinear vessels contributed most to referable glaucoma risk assessment by both glaucoma specialists and the algorithm. Algorithm AUCs ranged between 0.608-0.977 for glaucomatous ONH features. The DL algorithm was significantly more sensitive than 6 of 10 graders, including 2 of 3 glaucoma specialists, with comparable or higher specificity relative to all graders. A DL algorithm trained on fundus images alone can detect referable glaucoma risk with higher sensitivity and comparable specificity to eye care providers.
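
One of the ONH features mentioned above, the vertical cup-to-disc ratio, is straightforward to compute once cup and disc segmentations are available. The sketch below is an assumed illustration on synthetic masks, not the study's grading pipeline.

```python
# Illustrative sketch: vertical cup-to-disc ratio (CDR) from binary cup and
# disc masks, the kind of feature behind the >= 0.7 referable-glaucoma criterion.
import numpy as np

def vertical_extent(mask):
    rows = np.where(mask.any(axis=1))[0]
    return 0 if rows.size == 0 else rows[-1] - rows[0] + 1

def vertical_cdr(cup_mask, disc_mask):
    disc_h = vertical_extent(disc_mask)
    return vertical_extent(cup_mask) / disc_h if disc_h else 0.0

disc = np.zeros((100, 100), dtype=bool); disc[20:80, 20:80] = True  # 60 px tall
cup = np.zeros((100, 100), dtype=bool);  cup[35:80, 35:65] = True   # 45 px tall
print(round(vertical_cdr(cup, disc), 2))  # 0.75 -> meets the >= 0.7 criterion
```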

Patent
07 Feb 2018
TL;DR: Methods, systems, and apparatus are disclosed for predicting the likelihood that predetermined conditions will be satisfied, using a recurrent neural network whose internal state at each time step is processed by one logistic regression node per condition to produce future condition scores.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, are disclosed for predicting the likelihood of conditions being satisfied using a recurrent neural network. One of the systems is configured to process a temporal sequence comprising a respective input at each of a plurality of time steps and includes one or more recurrent neural network layers and one or more logistic regression nodes. Each of the logistic regression nodes corresponds to a respective condition from a predetermined set of conditions and, for each of the plurality of time steps, receives the network internal state for the time step and processes that internal state in accordance with current values of a set of parameters of the logistic regression node to generate a future condition score for the corresponding condition for the time step.
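
A minimal sketch of the claimed structure, with illustrative sizes: a recurrent network processes the temporal sequence, and a single sigmoid-activated linear layer acts as one logistic-regression node per predetermined condition, turning each time step's internal state into future condition scores.

```python
# Hedged sketch of the claimed structure (sizes are illustrative, not from the
# patent): an RNN produces an internal state per time step, and one logistic-
# regression node per condition maps that state to a future condition score.
import torch
import torch.nn as nn

class FutureConditionScorer(nn.Module):
    def __init__(self, input_dim=8, hidden_dim=16, num_conditions=3):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        # One logistic-regression node per predetermined condition.
        self.condition_nodes = nn.Linear(hidden_dim, num_conditions)

    def forward(self, sequence):
        states, _ = self.rnn(sequence)                       # B x T x H internal states
        return torch.sigmoid(self.condition_nodes(states))   # B x T x num_conditions

model = FutureConditionScorer()
scores = model(torch.randn(2, 5, 8))  # 2 sequences, 5 time steps, 8 features each
print(scores.shape)                   # torch.Size([2, 5, 3])
```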

Patent
05 Nov 2018
TL;DR: A combined machine learning model is described in which a deep model and a wide (linear) model process the same input features and a combining layer merges their outputs, gaining the benefits of both memorization and generalization.
Abstract: A system includes one or more computers and one or more storage devices storing instructions that, when executed by the computers, cause them to implement a combined machine learning model that processes an input to generate a predicted output. The combined model includes: a deep machine learning model that processes features to generate a deep model output; a wide machine learning model that processes the same features to generate a wide model output; and a combining layer that processes the deep model output generated by the deep machine learning model and the wide model output generated by the wide machine learning model to generate the predicted output. By including both the deep machine learning model and the wide machine learning model, the combined machine learning model gains the benefits of both memorization and generalization and can therefore perform better when predicting outputs from input features.
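
A minimal sketch of the combined model described in the claim, with illustrative dimensions: a deep component and a wide (linear) component process the same features, and a combining layer merges their outputs into a single prediction.

```python
# Hedged sketch of the wide-and-deep combination described (dimensions are
# illustrative): deep path for generalization, wide path for memorization,
# and a combining layer that merges both outputs into one prediction.
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    def __init__(self, num_features=32):
        super().__init__()
        self.deep = nn.Sequential(               # deep model output
            nn.Linear(num_features, 64), nn.ReLU(),
            nn.Linear(64, 16), nn.ReLU(),
        )
        self.wide = nn.Linear(num_features, 1)   # wide model output
        self.combine = nn.Linear(16 + 1, 1)      # combining layer

    def forward(self, x):
        merged = torch.cat([self.deep(x), self.wide(x)], dim=-1)
        return torch.sigmoid(self.combine(merged))

model = WideAndDeep()
print(model(torch.randn(4, 32)).shape)  # torch.Size([4, 1])
```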

26 Nov 2018
TL;DR: In this work, a deep learning model was trained on 3D CT volumes (400x512x512) as input and evaluated on the test set, achieving a statistically significant absolute 9.2% (95% CI 8.4, 10.1) higher specificity compared to Lung-RADS.
Abstract: PURPOSE Evaluate the utility of deep learning to improve the specificity and sensitivity of lung cancer screening with low-dose helical computed tomography (LDCT), relative to the Lung-RADS guidelines. METHOD AND MATERIALS We analyzed 42,943 CT studies from 14,863 patients, 620 of which developed biopsy-confirmed cancer. All cases were from the National Lung Screening Trial (NLST) study. We randomly split patients into a training (70%), tuning (15%) and test (15%) sets. A study was marked “true” if the patient was diagnosed with biopsy confirmed lung cancer in the same screening year as the study. A deep learning model was trained over 3D CT volumes (400x512x512) as input. We used the 95% specificity operating point based on the tuning set, and evaluated our approach on the test set. To estimate radiologist performance, we retrospectively applied Lung-RADS criteria to each study in the test set. Lung-RADS categories 1 to 2 constitute negative screening results, and categories 3 to 4 constitute positive results. Neither the model nor the Lung-RADS results took into account prior studies, but all screening years were utilized in evaluation. RESULTS The area under the receiver operator curve of the deep learning model was 94.2% (95% CI 91.0, 96.9). Compared to Lung-RADS on the test set, the trained model achieved a statistically significant absolute 9.2% (95% CI 8.4, 10.1) higher specificity and trended a 3.4% (95% CI -5.2, 12.6) higher sensitivity (not statistically significant).Radiologists qualitatively reviewed disagreements between the model and Lung-RADS. Preliminary analysis suggests that the model may be superior in distinguishing scarring from early malignancy. CONCLUSION A deep learning based model improved the specificity of lung cancer screening over Lung-RADS on the NLST dataset and could potentially help reduce unnecessary procedures. This research could supplement future versions of Lung-RADS; or support assisted read or second read workflows. CLINICAL RELEVANCE/APPLICATION While Lung-RADS criteria is recommended for lung cancer screening with LDCT, there is still an opportunity to reduce false-positive rates which lead to unnecessary invasive procedures.