Author

Lu Shen

Bio: Lu Shen is an academic researcher at Beth Israel Deaconess Medical Center. The author has contributed to research on the topics of electronic health records and computer science. The author has an h-index of 2 and has co-authored 2 publications receiving 4,760 citations.

Papers

Journal Article•DOI•

MIMIC-III, a freely accessible critical care database

Alistair E. W. Johnson¹, Tom J. Pollard¹, Lu Shen², Li-wei H. Lehman¹, Mengling Feng¹,³, Mohammad M. Ghassemi¹, Benjamin Moody¹, Peter Szolovits¹, Leo Anthony Celi¹,², Roger G. Mark¹,²

Massachusetts Institute of Technology¹, Beth Israel Deaconess Medical Center², Institute for Infocomm Research Singapore³

24 May 2016-Scientific Data

TL;DR: MIMIC-III (Medical Information Mart for Intensive Care) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital.

Abstract: MIMIC-III ('Medical Information Mart for Intensive Care') is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

4,056 citations

Journal Article•

MIMIC-III, a freely accessible critical care database

Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad M. Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, Roger G. Mark

01 May 2016-Scientific Reports

TL;DR: MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital.

3,543 citations

Journal Article•DOI•

MIMIC-IV, a freely accessible electronic health record dataset

Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J. Pollard, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo Anthony Celi, Roger G. Mark

03 Jan 2023-Scientific Data

TL;DR: MIMIC-IV is a publicly available database sourced from the electronic health record of the Beth Israel Deaconess Medical Center, intended to support a wide array of research studies and educational material.

Abstract: Digital data collection during routine clinical practice is now ubiquitous within hospitals. The data contains valuable information on the care of patients and their response to treatments, offering exciting opportunities for research. Typically, data are stored within archival systems that are not intended to support research. These systems are often inaccessible to researchers and structured for optimal storage, rather than interpretability and analysis. Here we present MIMIC-IV, a publicly available database sourced from the electronic health record of the Beth Israel Deaconess Medical Center. Information available includes patient measurements, orders, diagnoses, procedures, treatments, and deidentified free-text clinical notes. MIMIC-IV is intended to support a wide array of research studies and educational material, helping to reduce barriers to conducting clinical research.

57 citations

Journal Article•DOI•

Author Correction: MIMIC-IV, a freely accessible electronic health record dataset

16 Jan 2023-Scientific Data

2 citations

Journal Article•DOI•

Author Correction: MIMIC-IV, a freely accessible electronic health record dataset.

Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J. Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo Anthony Celi, Roger G. Mark

18 Apr 2023-Scientific Data

1 citation

Cited by

Journal Article•DOI•

A guide to deep learning in healthcare.

Andre Esteva¹, Alexandre Robicquet¹, Bharath Ramsundar¹, Volodymyr Kuleshov¹, Mark A. DePristo², Katherine Chou², Claire Cui², Greg S. Corrado², Sebastian Thrun¹, Jeffrey Dean²

Stanford University¹, Google²

01 Jan 2019-Nature Medicine

TL;DR: Deep-learning techniques for healthcare are presented, with a description of how they can impact a few key areas of medicine and how to build end-to-end systems.

Abstract: Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.

1,843 citations

Journal Article•DOI•

Scalable and accurate deep learning with electronic health records

Alvin Rajkomar¹,², Eyal Oren¹, Kai Chen¹, Andrew M. Dai¹, Nissan Hajaj¹, Michaela Hardt¹, Peter J. Liu¹, Xiaobing Liu¹, Jake Marcus¹, Mimi Sun¹, Patrik Sundberg¹, Hector Yee¹, Kun Zhang¹, Yi Zhang¹, Gerardo Flores¹, Gavin E. Duggan¹, Jamie Irvine¹, Quoc V. Le¹, Kurt Litsch¹, Alexander Mossin¹, Justin Tansuwan¹, De Wang¹, James Wexler¹, Jimbo Wilson¹, Dana Ludwig², Samuel L. Volchenboum³, Katherine Chou¹, Michael Pearson¹, Srinivasan Madabushi¹, Nigam H. Shah⁴, Atul J. Butte², Michael D. Howell¹, Claire Cui¹, Greg S. Corrado¹, Jeffrey Dean¹

Google¹, University of California, San Francisco², University of Chicago³, Stanford University⁴

08 May 2018

TL;DR: A representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format is proposed, and it is demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.

Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record. We propose a representation of patients’ entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 h. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operator curve [AUROC] across sites 0.93–0.94), 30-day unplanned readmission (AUROC 0.75–0.76), prolonged length of stay (AUROC 0.85–0.86), and all of a patient’s final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically-used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient’s chart.

1,388 citations

Journal Article•DOI•

Recurrent Neural Networks for Multivariate Time Series with Missing Values.

Zhengping Che¹, Sanjay Purushotham¹, Kyunghyun Cho², David Sontag³, Yan Liu¹

University of Southern California¹, New York University², Massachusetts Institute of Technology³

17 Apr 2018-Scientific Reports

TL;DR: In this article, a deep learning model based on the Gated Recurrent Unit (GRU), named GRU-D, is proposed to exploit missing values and their missing patterns for effective imputation and improved prediction performance.

Abstract: Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and other related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a.k.a., informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improving prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and effectively incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments of time series classification tasks on real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.

1,085 citations

Book Chapter•DOI•

C-store: a column-oriented DBMS

Michael Stonebraker¹, Daniel J. Abadi¹, Adam Batkin², Xuedong Chen³, Mitch Cherniack², Miguel Ferreira¹, Edmond Lau¹, Amerson Lin¹, Samuel Madden¹, Elizabeth O'Neil³, Patrick O'Neil³, Alexander Rasin⁴, Nga Tran², Stan Zdonik⁴

Massachusetts Institute of Technology¹, Brandeis University², University of Massachusetts Boston³, Brown University⁴

01 Dec 2018

TL;DR: Preliminary performance data on a subset of TPC-H is presented and it is shown that the system the team is building, C-Store, is substantially faster than popular commercial products.

Abstract: This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.

1,063 citations

Journal Article•DOI•

Scalable and accurate deep learning for electronic health records

Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Peter J. Liu, Xiaobing Liu, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Gavin E. Duggan, Gerardo Flores, Michaela Hardt, Jamie Irvine, Quoc V. Le, Kurt Litsch, Jake Marcus, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L. Volchenboum, Katherine Chou, Michael Pearson, Srinivasan Madabushi, Nigam H. Shah, Atul J. Butte, Michael D. Howell, Claire Cui, Greg S. Corrado, Jeffrey Dean

24 Jan 2018-arXiv: Computers and Society

TL;DR: In this paper, the authors proposed a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format and demonstrated that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization.

Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire, raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two U.S. academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting in-hospital mortality (AUROC across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed state-of-the-art traditional predictive models in all cases. We also present a case-study of a neural-network attribution system, which illustrates how clinicians can gain some transparency into the predictions. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios, complete with explanations that directly highlight evidence in the patient's chart.

958 citations
