Journal Article (Open Access)

Motif Discovery in Physiological Datasets: A Methodology for Inferring Predictive Elements

TL;DR
The principle of conservation is used to identify activity that consistently precedes an outcome in patients, and a two-stage process is described that efficiently searches for such patterns in large datasets.
Abstract
In this article, we propose a methodology for identifying predictive physiological patterns in the absence of prior knowledge. We use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process that allows us to efficiently search for such patterns in large datasets. This involves first transforming continuous physiological signals from patients into symbolic sequences, and then searching for patterns in these reduced representations that are strongly associated with an outcome. Our strategy of identifying conserved activity that is unlikely to have occurred purely by chance in symbolic data is analogous to the discovery of regulatory motifs in genomic datasets. We build upon existing work in this area, generalizing the notion of a regulatory motif and enhancing current techniques to operate robustly on non-genomic data. We also address two significant considerations associated with motif discovery in general: computational efficiency and robustness in the presence of degeneracy and noise. To deal with these issues, we introduce the concept of active regions and new subset-based techniques such as a two-layer Gibbs sampling algorithm. These extensions provide a framework for information inference, in which precursors are identified as approximately conserved activity of arbitrary complexity preceding multiple occurrences of an event. We evaluated our solution on a population of patients who experienced sudden cardiac death and attempted to discover electrocardiographic activity that may be associated with the endpoint of death. To assess the predictive patterns discovered, we compared likelihood scores for motifs in the sudden death population against control populations of normal individuals and those with non-fatal supraventricular arrhythmias.
Our results suggest that predictive motif discovery may be able to identify clinically relevant information even in the absence of significant prior knowledge.
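The two-stage process described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every function name (symbolize, profile, log_odds, gibbs_motif, population_score) is hypothetical. Stage 1 uses a simple equal-width quantization of windowed signal means as a stand-in for the paper's symbolization; stage 2 is a classic single-layer Gibbs site sampler (assuming one motif occurrence per sequence, a uniform background, and a pseudocount of 0.5), not the two-layer algorithm or the active regions introduced in the article. The last function mirrors the evaluation idea of comparing motif likelihood scores between a case population and a control population.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Profile = List[Dict[str, float]]  # one symbol->probability column per motif position


def symbolize(signal: Sequence[float], alphabet: str = "abcd", window: int = 4) -> str:
    """Stage 1 (assumed scheme): average each window of samples, then map each
    mean to a symbol by equal-width binning between the min and max means."""
    means = [sum(signal[i:i + window]) / len(signal[i:i + window])
             for i in range(0, len(signal), window)]
    lo, hi = min(means), max(means)
    span = (hi - lo) or 1.0  # flat signal: avoid division by zero
    k = len(alphabet)
    return "".join(alphabet[min(int((m - lo) / span * k), k - 1)] for m in means)


def profile(seqs: List[str], starts: List[int], w: int, alphabet: str,
            skip: int = -1, pseudo: float = 0.5) -> Profile:
    """Position weight matrix from the current motif occurrences, optionally
    leaving out sequence `skip` (as the Gibbs update requires)."""
    counts = [{a: pseudo for a in alphabet} for _ in range(w)]
    for i, (s, p) in enumerate(zip(seqs, starts)):
        if i != skip:
            for j in range(w):
                counts[j][s[p + j]] += 1
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({a: c / total for a, c in col.items()})
    return pwm


def log_odds(window: str, pwm: Profile, alphabet: str) -> float:
    """Log-likelihood ratio of a window under the motif vs. a uniform background."""
    bg = 1.0 / len(alphabet)
    return sum(math.log(pwm[j][c] / bg) for j, c in enumerate(window))


def gibbs_motif(seqs: List[str], w: int, alphabet: str,
                iters: int = 500, seed: int = 0) -> Tuple[List[int], Profile]:
    """Stage 2: single-layer Gibbs site sampler, one occurrence per sequence."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - w + 1) for s in seqs]
    for _ in range(iters):
        i = rng.randrange(len(seqs))  # pick one sequence to resample
        pwm = profile(seqs, starts, w, alphabet, skip=i)
        s = seqs[i]
        weights = [math.exp(log_odds(s[p:p + w], pwm, alphabet))
                   for p in range(len(s) - w + 1)]
        # Draw a new start position in proportion to its motif likelihood ratio.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts, profile(seqs, starts, w, alphabet)


def population_score(seqs: List[str], pwm: Profile, alphabet: str) -> float:
    """Mean over sequences of the best-matching window's log-odds score."""
    w = len(pwm)
    best = [max(log_odds(s[p:p + w], pwm, alphabet) for p in range(len(s) - w + 1))
            for s in seqs]
    return sum(best) / len(best)


if __name__ == "__main__":
    rng = random.Random(1)

    def rand_seq(n: int) -> str:
        return "".join(rng.choice("abcd") for _ in range(n))

    print("symbolized:", symbolize([math.sin(t / 3) for t in range(40)]))
    cases = [rand_seq(10) + "dcba" + rand_seq(10) for _ in range(8)]  # planted motif
    controls = [rand_seq(24) for _ in range(8)]
    starts, pwm = gibbs_motif(cases, w=4, alphabet="abcd")
    print("motif starts:", starts)
    print("case score:", round(population_score(cases, pwm, "abcd"), 2))
    print("control score:", round(population_score(controls, pwm, "abcd"), 2))
```

Because the motif model in this sketch is learned from the case sequences themselves, a higher case score than control score is expected even for a weak motif; a fairer comparison would score held-out case records against the controls.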


Citations
Journal Article

Septic shock prediction for ICU patients via coupled HMM walking on sequential contrast patterns

TL;DR: The novelty of the approach stems from the integration of sequence-based physiological pattern markers with the sequential CHMM model to learn dynamic physiological behavior, as well as from the coupling of such patterns to build powerful risk stratification models for septic shock patients.
Proceedings Article

Matrix Profile X: VALMOD - Scalable Discovery of Variable-Length Motifs in Data Series

TL;DR: VALMOD is introduced, an exact and scalable motif discovery algorithm that efficiently finds all motifs in a given range of lengths and shows that removing the unrealistic assumption that the user knows the correct length can often produce more intuitive and actionable results, which could have been missed otherwise.
Journal Article

Learning Multiple Diagnosis Codes for ICU Patients with Local Disease Correlation Mining

TL;DR: This study suggests that problems in the automated diagnosis code annotation can be reliably addressed by using a multi-label learning model that exploits disease correlation.
Journal Article

Naïve Bayes Classifier for ECG Abnormalities Using Multivariate Maximal Time Series Motif

TL;DR: The proposed time series motif prediction model is evaluated on a dataset of ECG signals recorded from patients with a Holter monitor, and its efficiency is demonstrated by comparing its precision against existing approaches that use various feature extraction techniques.
Proceedings Article

Matrix Profile VII: Time Series Chains: A New Primitive for Time Series Data Mining (Best Student Paper Award)

TL;DR: This work introduces Time Series Chains, which are related to, but distinct from, time series motifs, and a scalable algorithm that allows them to be discovered in massive datasets.
References
Journal Article

WebLogo: A Sequence Logo Generator

TL;DR: WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment that provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive.
Journal Article

A Bivalent Chromatin Structure Marks Key Developmental Genes in Embryonic Stem Cells

TL;DR: It is proposed that bivalent domains silence developmental genes in ES cells while keeping them poised for activation, highlighting the importance of DNA sequence in defining the initial epigenetic landscape and suggesting a novel chromatin-based mechanism for maintaining pluripotency.
Proceedings Article

Fitting a mixture model by expectation maximization to discover motifs in biopolymers.

TL;DR: The algorithm described in this paper discovers one or more motifs in a collection of DNA or protein sequences by using the technique of expectation maximization to fit a two-component finite mixture model to the set of sequences.

Estimating the Support of a High-Dimensional Distribution

TL;DR: The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data; it is trained by sequential optimization over pairs of input patterns, and a theoretical analysis of its statistical performance is provided.