Open Access · Posted Content

RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism

TLDR
The REverse Time AttentIoN model (RETAIN) is developed for application to Electronic Health Records (EHR) data; it achieves high accuracy while remaining clinically interpretable, based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits.
Abstract
Accuracy and interpretability are two dominant features of successful predictive models. Typically, a choice must be made in favor of complex black-box models such as recurrent neural networks (RNN) for accuracy versus less accurate but more interpretable traditional models such as logistic regression. This tradeoff poses challenges in medicine, where both accuracy and interpretability are important. We addressed this challenge by developing the REverse Time AttentIoN model (RETAIN) for application to Electronic Health Records (EHR) data. RETAIN achieves high accuracy while remaining clinically interpretable and is based on a two-level neural attention model that detects influential past visits and significant clinical variables within those visits (e.g., key diagnoses). RETAIN mimics physician practice by attending to the EHR data in reverse time order, so that recent clinical visits are likely to receive higher attention. RETAIN was tested on a large health system EHR dataset with 14 million visits completed by 263K patients over an 8-year period, and demonstrated predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNNs, and ease of interpretability comparable to traditional models.
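For intuition, here is a minimal numpy sketch of the two-level attention computation described above. It is a sketch under loud assumptions, not the paper's implementation: the two reverse-time RNNs are replaced by identity stand-ins, all weights are random, and names such as W_alpha and W_beta are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Illustrative sizes: T visits, each embedded as an m-dimensional vector v_i.
T, m = 5, 8
V = rng.normal(size=(T, m))       # visit embeddings v_1..v_T

# RETAIN runs two RNNs over the visits in *reverse* time order; here the
# RNN outputs are replaced by the reversed embeddings themselves to keep
# the sketch self-contained.
g = V[::-1]                       # stand-in for RNN_alpha outputs
h = V[::-1]                       # stand-in for RNN_beta outputs

W_alpha = rng.normal(size=(m,))   # scores each visit -> one scalar
W_beta = rng.normal(size=(m, m))  # gates each variable within a visit

alpha = softmax(g @ W_alpha)[::-1]   # visit-level attention, shape (T,)
beta = np.tanh(h @ W_beta)[::-1]     # variable-level attention, shape (T, m)

# Context vector: visits weighted by alpha, variables gated by beta.
context = (alpha[:, None] * beta * V).sum(axis=0)

# A prediction would then be, e.g., sigmoid(w_out @ context). Because alpha
# and beta enter the sum directly, each visit's and each variable's
# contribution can be read off, which is the source of interpretability.
print("visit attention:", np.round(alpha, 3))
```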


Citations
Journal ArticleDOI

Opportunities and obstacles for deep learning in biology and medicine.

TL;DR: It is found that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made over the prior state of the art.
Journal ArticleDOI

Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review.

TL;DR: A systematic review of deep learning models for electronic health record (EHR) data is conducted, illustrating the deep learning architectures used for analyzing different data sources and their target applications.
Posted Content

Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting

TL;DR: The Temporal Fusion Transformer (TFT) is introduced: a novel attention-based architecture that combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. Three practical interpretability use cases of TFT are showcased.
Journal ArticleDOI

Time Series Forecasting With Deep Learning: A Survey

TL;DR: This article surveys common encoder and decoder designs used in both one-step-ahead and multi-horizon time-series forecasting, describing how each model incorporates temporal information into its predictions.
References
Journal ArticleDOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete time steps by enforcing constant error flow through constant error carousels within special units.
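To make the "constant error carousel" concrete, the following numpy sketch performs one step of an LSTM cell (the common modern variant with a forget gate, which the original 1997 formulation lacked). The sizes and random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

n_in, n_hid = 4, 3
Wx = rng.normal(scale=0.1, size=(4 * n_hid, n_in))   # input weights (i, f, o, g)
Wh = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))  # recurrent weights
b = np.zeros(4 * n_hid)

def lstm_step(x, h, c):
    """One LSTM step. The cell state c is the 'constant error carousel':
    it is updated only additively, through elementwise gates, so error
    signals can flow along it across many time steps without vanishing."""
    z = Wx @ x + Wh @ h + b
    i = sigmoid(z[0 * n_hid:1 * n_hid])   # input gate
    f = sigmoid(z[1 * n_hid:2 * n_hid])   # forget gate
    o = sigmoid(z[2 * n_hid:3 * n_hid])   # output gate
    g = np.tanh(z[3 * n_hid:4 * n_hid])   # candidate values
    c = f * c + i * g                     # additive cell update (the CEC)
    h = o * np.tanh(c)                    # hidden output
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for t in range(1000):                     # bridging a long lag
    h, c = lstm_step(rng.normal(size=n_in), h, c)
print("h after 1000 steps:", np.round(h, 3))
```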
Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and it is proposed to extend it by allowing the model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
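A minimal numpy sketch of the soft-search step this summary describes, i.e., additive attention over the source annotations; the weights are random stand-ins and the dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# T_src source annotations h_1..h_T of width d_h, one decoder state s.
T_src, d_h, d_s, d_a = 6, 4, 4, 5
H = rng.normal(size=(T_src, d_h))   # encoder annotations
s = rng.normal(size=d_s)            # current decoder state

Wa = rng.normal(size=(d_a, d_s))    # projects the decoder state
Ua = rng.normal(size=(d_a, d_h))    # projects each annotation
va = rng.normal(size=d_a)           # scoring vector

# Additive scoring: e_j = va^T tanh(Wa s + Ua h_j), then soft alignment.
e = np.tanh(s @ Wa.T + H @ Ua.T) @ va   # shape (T_src,)
a = softmax(e)                          # soft alignment weights
context = a @ H                         # expected annotation, shape (d_h,)

# The decoder conditions each target word on this per-step context vector
# rather than on a single fixed-length encoding of the whole sentence.
print("alignment weights:", np.round(a, 3))
```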
Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
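Written out, the joint training objective named in this summary is standard conditional maximum likelihood over N source/target pairs (a textbook formulation, stated here for clarity rather than quoted from the paper):

```latex
\max_{\theta} \; \frac{1}{N} \sum_{n=1}^{N} \log p_{\theta}\left(\mathbf{y}_{n} \mid \mathbf{x}_{n}\right)
```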
Journal ArticleDOI

Learning long-term dependencies with gradient descent is difficult

TL;DR: This work shows why gradient-based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases, and exposes a trade-off between efficient learning by gradient descent and latching onto information for long periods.
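The difficulty has a compact numeric illustration: backpropagation through a recurrence multiplies the error signal by the recurrent Jacobian once per time step, so the signal shrinks (or blows up) exponentially with the lag. The sketch below uses a linear recurrence as a stand-in; a real RNN's nonlinearity derivatives only accelerate the decay.

```python
import numpy as np

rng = np.random.default_rng(3)

# Recurrent weights with largest singular value below 1 (contractive),
# so the backpropagated signal decays exponentially with the lag.
n = 8
W = rng.normal(size=(n, n)) * 0.3

for lag in [1, 10, 50, 100]:
    g = np.ones(n)                 # error signal at the final step
    for _ in range(lag):
        g = W.T @ g                # one backward step (tanh derivative
                                   # omitted; it only shrinks g further)
    print(f"lag {lag:3d}: gradient norm = {np.linalg.norm(g):.2e}")
```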