Author

Li-wei H. Lehman

Other affiliations: IBM
Bio: Li-wei H. Lehman is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research on topics including intensive care and computer science, has an h-index of 18, and has co-authored 67 publications receiving 6,799 citations. Previous affiliations of Li-wei H. Lehman include IBM.


Papers
Journal ArticleDOI
TL;DR: The Medical Information Mart for Intensive Care (MIMIC-III) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital.
Abstract: MIMIC-III ('Medical Information Mart for Intensive Care') is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

4,056 citations

Journal Article
TL;DR: MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital.
Abstract: MIMIC-III ('Medical Information Mart for Intensive Care') is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hospital. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

3,543 citations

Proceedings ArticleDOI
01 Sep 2017
TL;DR: Because of high inter-expert disagreement, a mid-competition bootstrap approach to expert relabeling was implemented, leveraging the best-performing Challenge entrants' algorithms to identify contentious labels; a LASSO-selected combination of 45 algorithms achieved an F1 of 0.87, indicating that a voting approach can boost performance.
Abstract: The PhysioNet/Computing in Cardiology (CinC) Challenge 2017 focused on differentiating AF from noise, normal or other rhythms in short-term (from 9–61 s) ECG recordings performed by patients. A total of 12,186 ECGs were used: 8,528 in the public training set and 3,658 in the private hidden test set. Due to the high degree of inter-expert disagreement between a significant fraction of the expert labels, we implemented a mid-competition bootstrap approach to expert relabeling of the data, leveraging the best-performing Challenge entrants' algorithms to identify contentious labels. A total of 75 independent teams entered the Challenge using a variety of traditional and novel methods, ranging from random forests to a deep learning approach applied to the raw data in the spectral domain. Four teams won the Challenge with an equal high F1 score (averaged across all classes) of 0.83, although the top 11 algorithms scored within 2% of this. A combination of 45 algorithms identified using LASSO achieved an F1 of 0.87, indicating that a voting approach can boost performance.
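The class-averaged F1 score and the voting idea mentioned in this abstract can be illustrated with a minimal Python sketch. This is not the Challenge's scoring code or the LASSO-selected ensemble: the function names, the label strings, and the unweighted majority vote (standing in for the weighted combination) are assumptions made for illustration.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, label):
    """F1 for one class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


def majority_vote(predictions):
    """Combine several algorithms' label lists by simple majority.

    `predictions` is a list of equal-length label lists, one per algorithm.
    Ties go to the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical labels: N = normal, A = AF, O = other rhythm.
y_true = ["N", "A", "O", "N"]
preds = [
    ["N", "A", "N", "N"],
    ["N", "O", "O", "N"],
    ["A", "A", "O", "N"],
]
voted = majority_vote(preds)
single_score = macro_f1(y_true, preds[0], ["N", "A", "O"])
voted_score = macro_f1(y_true, voted, ["N", "A", "O"])
```

In this toy example each individual algorithm makes one mistake, while the vote recovers every label, which is the effect the abstract attributes to combining entrants' algorithms.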

569 citations

Journal ArticleDOI
TL;DR: In this article, an automated Perl-based de-identification software package is described that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc.
Abstract: Text-based patient medical records are a vital resource in medical research. In order to preserve patient confidentiality, however, the U.S. Health Insurance Portability and Accountability Act (HIPAA) requires that protected health information (PHI) be removed from medical records before they can be disseminated. Manual de-identification of large medical record databases is prohibitively expensive, time-consuming and prone to error, necessitating automatic methods for large-scale, automated de-identification. We describe an automated Perl-based de-identification software package that is generally usable on most free-text medical records, e.g., nursing notes, discharge summaries, X-ray reports, etc. The software uses lexical look-up tables, regular expressions, and simple heuristics to locate both HIPAA PHI, and an extended PHI set that includes doctors' names and years of dates. To develop the de-identification approach, we assembled a gold standard corpus of re-identified nursing notes with real PHI replaced by realistic surrogate information. This corpus consists of 2,434 nursing notes containing 334,000 words and a total of 1,779 instances of PHI taken from 163 randomly selected patient records. This gold standard corpus was used to refine the algorithm and measure its sensitivity. To test the algorithm on data not used in its development, we constructed a second test corpus of 1,836 nursing notes containing 296,400 words. The algorithm's false negative rate was evaluated using this test corpus. Performance evaluation of the de-identification software on the development corpus yielded an overall recall of 0.967, precision value of 0.749, and fallout value of approximately 0.002. On the test corpus, a total of 90 instances of false negatives were found, or 27 per 100,000 word count, with an estimated recall of 0.943. Only one full date and one age over 89 were missed. No patient names were missed in either corpus. 
We have developed a pattern-matching de-identification system based on dictionary look-ups, regular expressions, and heuristics. Evaluation based on two different sets of nursing notes collected from a U.S. hospital suggests that, in terms of recall, the software out-performs a single human de-identifier (0.81) and performs at least as well as a consensus of two human de-identifiers (0.94). The system is currently tuned to de-identify PHI in nursing notes and discharge summaries but is sufficiently generalized and can be customized to handle text files of any format. Although the accuracy of the algorithm is high, it is probably insufficient to be used to publicly disseminate medical data. The open-source de-identification software and the gold standard re-identified corpus of medical records have therefore been made available to researchers via the PhysioNet website to encourage improvements in the algorithm.
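As a rough illustration of the pattern-matching approach this abstract describes (dictionary look-ups, regular expressions, and simple heuristics), the following Python sketch masks a few kinds of PHI. It is not the authors' Perl package: the name list, the patterns, the placeholder strings, and the function name `deidentify` are all hypothetical and far simpler than a real system would need.

```python
import re

# Hypothetical look-up table; a real system would use much larger name lists.
KNOWN_NAMES = {"smith", "johnson"}

# Regular expressions for two easily patterned PHI types (assumed formats).
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Heuristic: a capitalized word right after a title is treated as a name.
TITLE_NAME_RE = re.compile(r"\b((?:Dr|Mr|Mrs|Ms)\.?\s+)[A-Z][a-z]+")

WORD_RE = re.compile(r"\b[A-Za-z]+\b")


def deidentify(text):
    """Replace dates, phone numbers, and likely names with placeholders."""
    text = DATE_RE.sub("[**DATE**]", text)
    text = PHONE_RE.sub("[**PHONE**]", text)
    text = TITLE_NAME_RE.sub(r"\1[**NAME**]", text)

    def mask_known_name(match):
        word = match.group(0)
        return "[**NAME**]" if word.lower() in KNOWN_NAMES else word

    # Dictionary look-up pass over the remaining words.
    return WORD_RE.sub(mask_known_name, text)


note = "Pt seen by Dr. Jones on 3/14/2008; call Smith at 555-123-4567."
clean = deidentify(note)
```

Here "Jones" is caught only by the title heuristic and "Smith" only by the look-up table, which shows why the two techniques are combined.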

371 citations

Proceedings ArticleDOI
29 Mar 1998
TL;DR: Analysis and simulation results show that ARM yields significant benefits even when less than half the routers within the multicast tree can perform ARM processing.
Abstract: This paper presents a novel loss recovery scheme, active reliable multicast (ARM), for large scale reliable multicast. ARM is "active" in that routers in the multicast tree play an active role in loss recovery. Additionally, ARM utilizes soft-state storage within the network to improve performance and scalability. In the upstream direction, routers suppress duplicate NACKs from multiple receivers to control the implosion problem. By suppressing duplicate NACKs, ARM also lessens the traffic that propagates back through the network. In the downstream direction, routers limit the delivery of repair packets to receivers experiencing loss, thereby reducing network bandwidth consumption. Finally, to reduce wide-area recovery latency and to distribute the retransmission load, routers cache multicast data on a "best effort" basis. ARM is flexible and robust in that it does not require all nodes to be active, nor does it require any specific router or receiver to perform loss recovery. Analysis and simulation results show that ARM yields significant benefits even when less than half the routers within the multicast tree can perform ARM processing.
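The three router behaviors named in this abstract (duplicate-NACK suppression, scoped repair delivery, and best-effort caching) can be sketched in Python as follows. This is a toy model under stated assumptions, not the ARM protocol itself: the class name, method names, return values, and cache policy are hypothetical, and the timers that would expire soft state are omitted.

```python
from collections import OrderedDict


class ActiveRouter:
    """Toy model of one active router in a multicast tree."""

    def __init__(self, cache_size=64):
        self.cache_size = cache_size
        self.cache = OrderedDict()  # seq -> payload, best-effort data cache
        self.pending = {}           # seq -> downstream links that sent a NACK

    def on_data(self, seq, payload):
        """Cache passing data, evicting the oldest entry when full."""
        self.cache[seq] = payload
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def on_nack(self, seq, link):
        """Handle a NACK for packet `seq` arriving from downstream `link`."""
        if seq in self.cache:
            # Local repair: retransmit from cache, only toward the requester.
            return ("repair", seq, [link])
        first = seq not in self.pending
        self.pending.setdefault(seq, set()).add(link)
        # Only the first NACK for a given packet travels upstream.
        return ("forward_nack" if first else "suppressed", seq, [])

    def on_repair(self, seq, payload):
        """Deliver a repair only on links that reported the loss."""
        links = sorted(self.pending.pop(seq, set()))
        self.on_data(seq, payload)
        return links


router = ActiveRouter(cache_size=2)
first = router.on_nack(5, "a")        # forwarded upstream
second = router.on_nack(5, "b")       # duplicate, suppressed
targets = router.on_repair(5, b"x")   # repair goes to links a and b only
later = router.on_nack(5, "c")        # now served from the local cache
```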

228 citations


Cited by
01 Mar 2007
TL;DR: An initiative to develop uniform standards for defining and classifying AKI and to establish a forum for multidisciplinary interaction to improve care for patients with or at risk for AKI is described.
Abstract: Acute kidney injury (AKI) is a complex disorder for which currently there is no accepted definition. Having a uniform standard for diagnosing and classifying AKI would enhance our ability to manage these patients. Future clinical and translational research in AKI will require collaborative networks of investigators drawn from various disciplines, dissemination of information via multidisciplinary joint conferences and publications, and improved translation of knowledge from pre-clinical research. We describe an initiative to develop uniform standards for defining and classifying AKI and to establish a forum for multidisciplinary interaction to improve care for patients with or at risk for AKI. Members representing key societies in critical care and nephrology along with additional experts in adult and pediatric AKI participated in a two day conference in Amsterdam, The Netherlands, in September 2005 and were assigned to one of three workgroups. Each group's discussions formed the basis for draft recommendations that were later refined and improved during discussion with the larger group. Dissenting opinions were also noted. The final draft recommendations were circulated to all participants and subsequently agreed upon as the consensus recommendations for this report. Participating societies endorsed the recommendations and agreed to help disseminate the results. The term AKI is proposed to represent the entire spectrum of acute renal failure. Diagnostic criteria for AKI are proposed based on acute alterations in serum creatinine or urine output. A staging system for AKI which reflects quantitative changes in serum creatinine and urine output has been developed. We describe the formation of a multidisciplinary collaborative network focused on AKI. We have proposed uniform standards for diagnosing and classifying AKI which will need to be validated in future studies. 
The Acute Kidney Injury Network offers a mechanism for proceeding with efforts to improve patient outcomes.

5,467 citations

Journal ArticleDOI
TL;DR: This review describes how deep-learning techniques can impact a few key areas of medicine and explores how to build end-to-end systems.
Abstract: Here we present deep-learning techniques for healthcare, centering our discussion on deep learning in computer vision, natural language processing, reinforcement learning, and generalized methods. We describe how these computational techniques can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging, and we describe the application of natural language processing to domains such as electronic health record data. Similarly, reinforcement learning is discussed in the context of robotic-assisted surgery, and generalized deep-learning methods for genomics are reviewed.

1,843 citations

Journal ArticleDOI
TL;DR: It is demonstrated that an end-to-end deep learning approach can classify a broad range of distinct arrhythmias from single-lead ECGs with high diagnostic performance similar to that of cardiologists.
Abstract: Computerized electrocardiogram (ECG) interpretation plays a critical role in the clinical ECG workflow [1]. Widely available digital ECG data and the algorithmic paradigm of deep learning [2] present an opportunity to substantially improve the accuracy and scalability of automated ECG analysis. However, a comprehensive evaluation of an end-to-end deep learning approach for ECG analysis across a wide variety of diagnostic classes has not been previously reported. Here, we develop a deep neural network (DNN) to classify 12 rhythm classes using 91,232 single-lead ECGs from 53,549 patients who used a single-lead ambulatory ECG monitoring device. When validated against an independent test dataset annotated by a consensus committee of board-certified practicing cardiologists, the DNN achieved an average area under the receiver operating characteristic curve (ROC) of 0.97. The average F1 score, which is the harmonic mean of the positive predictive value and sensitivity, for the DNN (0.837) exceeded that of average cardiologists (0.780). With specificity fixed at the average specificity achieved by cardiologists, the sensitivity of the DNN exceeded the average cardiologist sensitivity for all rhythm classes. These findings demonstrate that an end-to-end deep learning approach can classify a broad range of distinct arrhythmias from single-lead ECGs with high diagnostic performance similar to that of cardiologists. If confirmed in clinical settings, this approach could reduce the rate of misdiagnosed computerized ECG interpretations and improve the efficiency of expert human ECG interpretation by accurately triaging or prioritizing the most urgent conditions. Analysis of electrocardiograms using an end-to-end deep learning approach can detect and classify cardiac arrhythmia with high accuracy, similar to that of cardiologists.

1,632 citations