
Showing papers by "Jeffrey Dean published in 2018"


Journal ArticleDOI
08 May 2018
TL;DR: A representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format is proposed, and it is demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.
Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient’s chart.
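The "sequential format" the abstract mentions can be pictured with a minimal sketch (names and record structure invented here, not taken from the paper): each entry of a patient's raw FHIR-style record becomes one timestamped event, and the whole record is unrolled into a single time-ordered sequence with no per-site feature curation.

```python
# Hypothetical sketch of unrolling a raw EHR record into one chronological
# sequence of events; field names are illustrative, not the paper's schema.

def unroll_record(resources):
    # resources: raw FHIR-like entries, each with a timestamp and a payload.
    events = []
    for r in resources:
        events.append((r["time"], f'{r["type"]}:{r["value"]}'))
    events.sort()  # a single chronological sequence per patient
    return [token for _, token in events]

record = [
    {"time": 2, "type": "lab_lactate", "value": "4.1"},
    {"time": 1, "type": "note", "value": "admitted via ED"},
    {"time": 3, "type": "med", "value": "vancomycin"},
]
assert unroll_record(record) == [
    "note:admitted via ED", "lab_lactate:4.1", "med:vancomycin"
]
```

Summed over every lab, note, and medication event across 216,221 hospitalizations, a representation like this is how the record volume "unrolls" into tens of billions of data points.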

1,388 citations


Proceedings Article
03 Jul 2018

1,094 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format and demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.
Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case-study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart.

958 citations


Proceedings ArticleDOI
27 May 2018
TL;DR: In this paper, the authors propose to replace traditional index structures with learned models, which can have significant advantages over traditional indexes, and theoretically analyze under which conditions learned indexes outperform traditional index structure and describe the main challenges in designing learned index structures.
Abstract: Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show that our learned indexes can have significant advantages over traditional indexes. More importantly, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work provides just a glimpse of what might be possible.
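The "index as model" premise can be sketched in a few lines (a toy under stated assumptions, not the paper's implementation): fit a model that predicts a key's position in the sorted array, record the model's worst-case error during training, and at lookup time scan only the small window that error bound guarantees.

```python
# Toy "learned index": a linear model predicts the position of a key in a
# sorted array; a bounded local search corrects the prediction.

def fit_linear(keys):
    # Least-squares fit of position ~ a*key + b over the sorted keys.
    n = len(keys)
    xs, ys = keys, range(n)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    a = cov / var
    b = my - a * mx
    # Worst-case prediction error, so lookups only search a bounded window.
    err = max(abs(round(a * x + b) - y) for x, y in zip(xs, ys))
    return a, b, err

def lookup(keys, model, key):
    a, b, err = model
    guess = round(a * key + b)
    lo = max(0, guess - err)
    hi = min(len(keys) - 1, guess + err)
    for i in range(lo, hi + 1):  # a real system would binary-search this window
        if keys[i] == key:
            return i
    return -1

keys = [2 * i for i in range(1000)]  # sorted, roughly linear key distribution
model = fit_linear(keys)
assert lookup(keys, model, 500) == 250
```

When the key distribution is learnable, the model plus error bound plays the role a B-Tree's inner nodes play: both map a key to a narrow range of positions to search.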

742 citations


Posted Content
TL;DR: Efficient Neural Architecture Search is a fast and inexpensive approach for automatic model design that establishes a new state-of-the-art among all methods without post-training processing and delivers strong empirical performances using much fewer GPU-hours.
Abstract: We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational graph. The controller is trained with policy gradient to select a subgraph that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Thanks to parameter sharing between child models, ENAS is fast: it delivers strong empirical performances using much fewer GPU-hours than all existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On the Penn Treebank dataset, ENAS discovers a novel architecture that achieves a test perplexity of 55.8, establishing a new state-of-the-art among all methods without post-training processing. On the CIFAR-10 dataset, ENAS designs novel architectures that achieve a test error of 2.89%, which is on par with NASNet (Zoph et al., 2018), whose test error is 2.65%.
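The parameter sharing that makes ENAS cheap can be illustrated with a deliberately tiny sketch (all names invented; this is not the ENAS codebase): every candidate architecture is a subgraph of one large graph, so sampled child models index into a single shared weight pool instead of training their own weights from scratch.

```python
# Hypothetical sketch of ENAS-style weight sharing: one shared parameter per
# (layer, op) edge of the large graph; each sampled child model is a path
# through that graph and reuses the shared parameters on its path.
import random

OPS = ("tanh", "relu", "identity")
NUM_LAYERS = 3

# One shared parameter per (layer, op) edge of the large computational graph.
shared_weights = {(layer, op): 0.1 * layer
                  for layer in range(NUM_LAYERS) for op in OPS}

def sample_architecture(rng):
    # Stand-in for the controller: choose one op per layer (one subgraph).
    return [rng.choice(OPS) for _ in range(NUM_LAYERS)]

def child_parameters(arch):
    # A sampled child model reads its weights out of the shared pool, which
    # is why evaluating many architectures costs few GPU-hours.
    return [shared_weights[(layer, op)] for layer, op in enumerate(arch)]

rng = random.Random(0)
arch = sample_architecture(rng)
assert len(shared_weights) == NUM_LAYERS * len(OPS)  # pool size is fixed
assert len(child_parameters(arch)) == NUM_LAYERS     # a child just indexes it
```

Two architectures that pick the same op at the same layer literally share that parameter, so training one child's weights partially trains every overlapping child.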

579 citations


Journal ArticleDOI
TL;DR: Motivation, suggestions, and warnings to computer architects on how to best contribute to the ML revolution are offered.
Abstract: The end of Moore's law and Dennard scaling has led to the end of rapid improvement in general-purpose program performance. Machine learning (ML), and in particular deep learning, is an attractive alternative for architects to explore. It has recently revolutionized vision, speech, language understanding, and many other fields, and it promises to help with the grand challenges facing our society. The computation at its core is low-precision linear algebra. Thus, ML is both broad enough to apply to many domains and narrow enough to benefit from domain-specific architectures, such as Google's Tensor Processing Unit (TPU). Moreover, the growth in demand for ML computing exceeds Moore's law at its peak, just as it is fading. Hence, ML experts and computer architects must work together to design the computing systems required to deliver on the potential of ML. This article offers motivation, suggestions, and warnings to computer architects on how to best contribute to the ML revolution.

139 citations


Proceedings Article
15 Feb 2018
TL;DR: In this article, a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices is introduced.
Abstract: We introduce a hierarchical model for efficient placement of computational graphs onto hardware devices, especially in heterogeneous environments with a mixture of CPUs, GPUs, and other computational devices. The algorithm learns to assign graph operations to groups and to allocate those groups to available devices. The grouping and device allocations are learned jointly. The proposed algorithm is trained by a policy gradient method and requires no human intervention. Experiments with widely-used computer vision and natural language models show that our algorithm can find optimized, non-trivial placements for TensorFlow (TF) computational graphs with over 80,000 operations. In addition, our approach outperforms placements by human experts as well as a previous state-of-the-art placement method based on deep reinforcement learning. Our method achieves reductions in runtime of up to 60.6% per training step when applied to models such as Neural Machine Translation.

124 citations


Proceedings ArticleDOI
23 Apr 2018
TL;DR: This paper presents a programming model for distributed machine learning that supports dynamic control flow, describes its implementation in TensorFlow, a distributed machine learning system, and shows how dataflow graphs can be extended to represent machine learning models, offering several distinctive features.
Abstract: Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments. This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations. We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability.
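The dataflow primitives underlying this design, Switch and Merge, can be sketched with a toy interpreter (illustrative only, not the TensorFlow implementation): Switch routes a value to one of two branches based on a predicate, and Merge forwards whichever branch actually produced a value, which is how a conditional becomes pure dataflow.

```python
# Toy sketch of the Switch/Merge dataflow primitives that dynamic
# control flow in a dataflow graph builds on.

def switch(value, pred):
    # Route `value` to the true or false branch; the untaken branch gets nothing.
    return (value, None) if pred else (None, value)

def merge(a, b):
    # Forward whichever input is available; exactly one is expected.
    return a if a is not None else b

def cond(pred, x, true_fn, false_fn):
    # A conditional expressed as dataflow: only the taken branch executes.
    t, f = switch(x, pred)
    out_t = true_fn(t) if t is not None else None
    out_f = false_fn(f) if f is not None else None
    return merge(out_t, out_f)

assert cond(True, 3, lambda v: v * 2, lambda v: v - 1) == 6
assert cond(False, 3, lambda v: v * 2, lambda v: v - 1) == 2
```

Because the branches are ordinary subgraphs connected by Switch and Merge, they can be partitioned across devices like any other part of the graph, which is the property the abstract's first distinctive feature relies on.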

76 citations


Journal ArticleDOI
TL;DR: The Augmented Reality Microscope (ARM) as mentioned in this paper is a cost-effective solution to the integration of AI, which overlays AI-based information onto the current view of the sample through the optical pathway in real-time, enabling seamless integration of the AI into the regular microscopy workflow.
Abstract: The brightfield microscope is instrumental in the visual examination of both biological and physical samples at sub-millimeter scales. One key clinical application has been in cancer histopathology, where the microscopic assessment of the tissue samples is used for the diagnosis and staging of cancer and thus guides clinical therapy. However, the interpretation of these samples is inherently subjective, resulting in significant diagnostic variability. Moreover, in many regions of the world, access to pathologists is severely limited due to lack of trained personnel. In this regard, Artificial Intelligence (AI) based tools promise to improve the access and quality of healthcare. However, despite significant advances in AI research, integration of these tools into real-world cancer diagnosis workflows remains challenging because of the costs of image digitization and difficulties in deploying AI solutions. Here we propose a cost-effective solution to the integration of AI: the Augmented Reality Microscope (ARM). The ARM overlays AI-based information onto the current view of the sample through the optical pathway in real-time, enabling seamless integration of AI into the regular microscopy workflow. We demonstrate the utility of ARM in the detection of lymph node metastases in breast cancer and the identification of prostate cancer with a latency that supports real-time workflows. We anticipate that ARM will remove barriers towards the use of AI in microscopic analysis and thus improve the accuracy and efficiency of cancer diagnosis. This approach is applicable to other microscopy tasks and AI algorithms in the life sciences and beyond.

69 citations


Proceedings ArticleDOI
TL;DR: The TensorFlow programming model as discussed by the authors extends the use of dataflow graphs to represent machine learning models, offering several distinctive features, such as the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices.
Abstract: Many recent machine learning models rely on fine-grained dynamic control flow for training and inference. In particular, models based on recurrent neural networks and on reinforcement learning depend on recurrence relations, data-dependent conditional execution, and other features that call for dynamic control flow. These applications benefit from the ability to make rapid control-flow decisions across a set of computing devices in a distributed system. For performance, scalability, and expressiveness, a machine learning system must support dynamic control flow in distributed and heterogeneous environments. This paper presents a programming model for distributed machine learning that supports dynamic control flow. We describe the design of the programming model, and its implementation in TensorFlow, a distributed machine learning system. Our approach extends the use of dataflow graphs to represent machine learning models, offering several distinctive features. First, the branches of conditionals and bodies of loops can be partitioned across many machines to run on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs. Second, programs written in our model support automatic differentiation and distributed gradient computations, which are necessary for training machine learning models that use control flow. Third, our choice of non-strict semantics enables multiple loop iterations to execute in parallel across machines, and to overlap compute and I/O operations. We have done our work in the context of TensorFlow, and it has been used extensively in research and production. We evaluate it using several real-world applications, and demonstrate its performance and scalability.

48 citations



Proceedings Article
15 Feb 2018
TL;DR: Efficient Neural Architecture Search is a faster and less expensive approach to automated model design than previous methods, and is more than 10x faster and 100x less resource-demanding than NAS.
Abstract: We propose Efficient Neural Architecture Search (ENAS), a faster and less expensive approach to automated model design than previous methods. In ENAS, a controller learns to discover neural network architectures by searching for an optimal path within a larger model. The controller is trained with policy gradient to select a path that maximizes the expected reward on the validation set. Meanwhile the model corresponding to the selected path is trained to minimize the cross entropy loss. On the Penn Treebank dataset, ENAS can discover a novel architecture that achieves a test perplexity of 55.8, which is state-of-the-art among automatic model design methods on Penn Treebank. On the CIFAR-10 dataset, ENAS can design novel architectures that achieve a test error of 2.89%, close to the 2.65% achieved by standard NAS (Zoph et al., 2017). Most importantly, our experiments show that ENAS is more than 10x faster and 100x less resource-demanding than NAS.

Journal ArticleDOI
10 Oct 2018
TL;DR: The central finding of this work is that a machine learning pipeline operating on an open-source data-format for electronic health records can render accurate predictions across multiple tasks in a way that works for multiple health systems.
Abstract: We thank Prof. Pinker for bringing up important points on how to assess the performance of machine learning models. The central finding of our work is that a machine learning pipeline operating on an open-source data format for electronic health records can render accurate predictions across multiple tasks in a way that works for multiple health systems. To demonstrate this, we selected three commonly used binary prediction tasks, inpatient mortality, 30-day unplanned readmission, and length of stay, as well as the task of predicting every discharge diagnosis. The main metric we used for the binary predictions was the area under the receiver operator curve (AUROC). We would first like to clarify a few issues. We would highlight that in our results section we did report the number-needed-to-evaluate, or work-up to detection ratio, for the inpatient mortality model and baseline model, which is (1/PPV) and commonly accepted as a clinically relevant metric. Also, as described in the “Study Cohort” section, we only included hospitalizations of 24 h or longer, and Table 1 reports the inpatient mortality rates of the hospitals to be approximately 2% in that cohort. This should not be confused with 2.3% of patients dying within 24 h. Prof. Pinker states that the public could be misled by the way the mainstream media had reported the results of our paper. We observed that many reports incorrectly conflated accuracy with AUROC. We take seriously our responsibility to clearly explain our results to a more general audience, and had simultaneously released a public blog post. In that post, we talked explicitly about the AUROC: “The most common way to assess accuracy is by a measure called the area-under-the-receiver-operator curve, which measures how well a model distinguishes between a patient who will have a particular future outcome compared to one who will not.
In this metric, 1.00 is perfect, and 0.50 is no better than random chance, so higher numbers mean the model is more accurate.” We agree that the AUROC has its limitations, although we would note that no single metric conveys a complete picture of the performance of a model. The AUROC has the advantage of being a commonly reported metric in both clinical and recent machine learning papers. We did caution in our manuscript that direct comparison of AUROCs from studies using different cohorts is problematic. However, we do agree that the area under the precision-recall curve (AUPRC) is relevant for prediction tasks and can be particularly helpful for clinical tasks with high class imbalance. Therefore, we report the AUPRC for each of the binary prediction tasks for the primary models reported in the manuscript, the clinical baselines, and the enhanced baselines that we described in the supplemental materials (Table 1). The confidence intervals are calculated by stratified bootstrapping of the positive and negative classes, as is common for this metric. It is worth noting that the models evaluated here were tuned to optimize the AUROC, and it is well known that a model tuned to optimize AUROC does not necessarily optimize AUPRC (and vice versa). The size of the test set (9624 for Hospital A and 12,127 for Hospital B) limits the power to make comparisons between models, although the point estimates are higher for the deep learning models in each case.
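The two pieces of methodology named above, the rank-based AUROC and a confidence interval from stratified bootstrapping of the positive and negative classes, can be sketched as follows (a minimal illustration, not the authors' evaluation code; the data are made up):

```python
# AUROC via the rank (Mann-Whitney) statistic, plus a bootstrap CI that
# resamples the positive and negative classes separately ("stratified").
import random

def auroc(pos_scores, neg_scores):
    # Probability a random positive outranks a random negative (ties count half).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def stratified_bootstrap_ci(pos, neg, n_boot=200, alpha=0.05, seed=0):
    # Resample each class with replacement, preserving the class balance.
    rng = random.Random(seed)
    stats = sorted(
        auroc([rng.choice(pos) for _ in pos], [rng.choice(neg) for _ in neg])
        for _ in range(n_boot)
    )
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

pos = [0.9, 0.8, 0.75, 0.6]   # scores of patients with the outcome
neg = [0.7, 0.4, 0.3, 0.2, 0.1]
assert auroc(pos, neg) == 0.95  # 19 of 20 pos/neg pairs correctly ordered
```

Stratified resampling matters for exactly the reason the letter gives: with high class imbalance, a plain bootstrap can draw samples with almost no positives, making the resampled metric unstable.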

Patent
27 Apr 2018
TL;DR: In this article, the authors propose to insert send and receive nodes into each subgraph to enable pairs of unique devices to conduct communication with each other in a self-sufficient manner.
Abstract: Methods and systems for modifying a computational graph to include send and receive nodes. Communication between unique devices performing operations of different subgraphs of the computational graph is handled efficiently by inserting send and receive nodes into each subgraph. When executed, the operations that these send and receive nodes represent enable pairs of unique devices to communicate with each other in a self-sufficient manner. This shifts the burden of coordinating communication away from the backend, which gives the system processing this computational-graph representation an opportunity to perform one or more other processes while the devices are executing the subgraphs.
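The rewrite the patent describes can be sketched in a few lines (a hedged illustration with invented names and data structures, not the patented implementation): every edge whose endpoints were placed on different devices is replaced by a send node on the producer's device and a matching receive node on the consumer's device.

```python
# Hypothetical sketch of inserting send/receive nodes at device boundaries
# of a partitioned computational graph.

def insert_send_recv(edges, placement):
    # edges: list of (src_op, dst_op); placement: op -> device name.
    rewritten = []
    for src, dst in edges:
        if placement[src] == placement[dst]:
            rewritten.append((src, dst))  # same device: edge stays local
        else:
            send = f"send_{src}_to_{placement[dst]}"
            recv = f"recv_{src}_on_{placement[dst]}"
            # The send/recv pair performs the transfer itself, so no central
            # coordinator is needed while the subgraphs execute.
            rewritten += [(src, send), (send, recv), (recv, dst)]
    return rewritten

edges = [("matmul", "add"), ("add", "softmax")]
placement = {"matmul": "gpu:0", "add": "gpu:0", "softmax": "cpu:0"}
out = insert_send_recv(edges, placement)
assert ("matmul", "add") in out                          # local edge untouched
assert ("send_add_to_cpu:0", "recv_add_on_cpu:0") in out  # boundary rewritten
```

Because each cross-device transfer is now an ordinary pair of graph operations, each device can run its own subgraph to completion without asking a backend when to communicate.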

Patent
07 Feb 2018
TL;DR: A recurrent neural network processes a temporal sequence, and per-condition logistic regression nodes turn the network's internal state at each time step into future condition scores for a predetermined set of conditions.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for predicting the likelihood of conditions being satisfied using a recurrent neural network, are disclosed. One of the systems is configured to process a temporal sequence comprising a respective input at each of a plurality of time steps, and includes one or more recurrent neural network layers and one or more logistic regression nodes, where each logistic regression node corresponds to a respective condition from a predetermined set of conditions and, for each of the plurality of time steps, receives the network internal state for the time step and processes that internal state in accordance with current values of a set of parameters of the logistic regression node to generate a future condition score for the corresponding condition for the time step.

Patent
06 Jul 2018
TL;DR: A computational graph received from a client is partitioned into subgraphs, and the operations represented by the nodes of each subgraph are assigned to a respective available device for execution.
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, that include: receiving a request from a client to process a computational graph; obtaining data representing the computational graph, the computational graph comprising a plurality of nodes and directed edges, where each node represents a respective operation and each directed edge connects a respective first node to a respective second node representing an operation that receives, as input, the output of the operation represented by the respective first node; identifying a plurality of available devices for performing the requested operation; partitioning the computational graph into a plurality of subgraphs, each subgraph comprising one or more nodes of the computational graph; and, for each subgraph, assigning the operations represented by the one or more nodes in the subgraph to a respective available device from the plurality of available devices for operation.

Patent
25 Jun 2018
TL;DR: Send and receive nodes inserted into each subgraph of a computational graph let pairs of devices communicate with each other directly, shifting the burden of coordinating communication away from the backend.
Abstract: Methods, systems, and apparatus include computer programs, encoded on computer storage media, for modifying a computational graph to include send and receive nodes. Communication between unique devices performing operations of different subgraphs of the computational graph can be handled efficiently by inserting send and receive nodes into each subgraph. When executed, the operations that these send and receive nodes represent enable pairs of unique devices to communicate with each other in a self-sufficient manner. This shifts the burden of coordinating communication away from the backend, giving the system that processes this computational-graph representation an opportunity to perform one or more other processes while the devices are executing the subgraphs.