Motif Discovery in Physiological Datasets: A Methodology for Inferring Predictive Elements

Open Access, Journal Article

Zeeshan Syed, +4 more

01 Jan 2010

PubMed Central

TLDR

The authors use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process for efficiently searching large datasets for such patterns.

Abstract:

In this article, we propose a methodology for identifying predictive physiological patterns in the absence of prior knowledge. We use the principle of conservation to identify activity that consistently precedes an outcome in patients, and describe a two-stage process that allows us to efficiently search for such patterns in large datasets. This involves first transforming continuous physiological signals from patients into symbolic sequences, and then searching for patterns in these reduced representations that are strongly associated with an outcome. Our strategy of identifying conserved activity that is unlikely to have occurred purely by chance in symbolic data is analogous to the discovery of regulatory motifs in genomic datasets. We build upon existing work in this area, generalizing the notion of a regulatory motif and enhancing current techniques to operate robustly on non-genomic data. We also address two significant considerations associated with motif discovery in general: computational efficiency, and robustness in the presence of degeneracy and noise. To deal with these issues, we introduce the concept of active regions and new subset-based techniques such as a two-layer Gibbs sampling algorithm. These extensions provide a framework for information inference in which precursors are identified as approximately conserved activity of arbitrary complexity preceding multiple occurrences of an event. We evaluated our solution on a population of patients who experienced sudden cardiac death and attempted to discover electrocardiographic activity that may be associated with the endpoint of death. To assess the predictive patterns discovered, we compared likelihood scores for motifs in the sudden death population against control populations of normal individuals and those with non-fatal supraventricular arrhythmias. Our results suggest that predictive motif discovery may be able to identify clinically relevant information even in the absence of significant prior knowledge.
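The two-stage process described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the quantile-binning symbolization, the exact-match pattern counting, the smoothed log-ratio score, and all function names (`symbolize`, `pattern_support`, `rank_patterns`) are assumptions chosen for brevity. In particular, the sketch only finds exactly conserved fixed-width patterns, whereas the article targets approximately conserved activity and uses techniques such as Gibbs sampling.

```python
from collections import Counter
import math


def symbolize(signal, n_symbols=4):
    """Stage 1 (assumed scheme): map each sample to a letter by quantile binning."""
    ranked = sorted(signal)
    # Interior cut points that split the sorted samples into n_symbols bins.
    cuts = [ranked[len(ranked) * k // n_symbols] for k in range(1, n_symbols)]
    # A sample's symbol index is the number of cut points it reaches or exceeds.
    return "".join(chr(ord("a") + sum(x >= c for c in cuts)) for x in signal)


def pattern_support(sequences, width):
    """Count, for each length-`width` pattern, how many sequences contain it."""
    support = Counter()
    for seq in sequences:
        # A set ensures each pattern is counted at most once per sequence.
        support.update({seq[i:i + width] for i in range(len(seq) - width + 1)})
    return support


def rank_patterns(cases, controls, width=3):
    """Stage 2 (assumed score): smoothed log-ratio of case vs. control support."""
    case_sup = pattern_support(cases, width)
    ctrl_sup = pattern_support(controls, width)
    scores = {}
    for pat, n_case in case_sup.items():
        # Add-one smoothing keeps the ratio finite when a pattern is absent.
        p_case = (n_case + 1) / (len(cases) + 2)
        p_ctrl = (ctrl_sup[pat] + 1) / (len(controls) + 2)
        scores[pat] = math.log(p_case / p_ctrl)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy symbolic sequences (hypothetical data, not from the article).
cases = ["abcd", "xabc", "abcy"]
controls = ["dcba", "ddcc"]
best_pattern, best_score = rank_patterns(cases, controls)[0]
print(best_pattern, round(best_score, 3))  # best_pattern is "abc"
```

Here a pattern counts as conserved when it appears in many case sequences and as predictive when it is comparatively rare in controls, which loosely mirrors the comparison of motif scores between the sudden death and control populations.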
