Journal Article · DOI

Hidden Markov Models in Bioinformatics

TL;DR: This survey considers the major bioinformatics applications of Hidden Markov Models, such as alignment, labeling, and profiling of sequences, protein structure prediction, and pattern recognition, and provides a critical appraisal of the use and perspectives of HMMs.
Abstract: Hidden Markov Models (HMMs) have recently become important and popular among bioinformatics researchers, and many software tools are based on them. In this survey, we first consider in some detail the mathematical foundations of HMMs, describe the most important algorithms, and provide useful comparisons, pointing out advantages and drawbacks. We then consider the major bioinformatics applications, such as alignment, labeling, and profiling of sequences, protein structure prediction, and pattern recognition. We finally provide a critical appraisal of the use and perspectives of HMMs in bioinformatics.


Citations
Journal Article · DOI
TL;DR: How the availability of high-throughput sequencing technologies has transformed microbiology and bioinformatics and how to tackle the inherent computational challenges that arise from the DNA sequencing revolution is reviewed.
Abstract: The study of microorganisms that pervade each and every part of this planet has encountered many challenges through time such as the discovery of unknown organisms and the understanding of how they interact with their environment. The aim of this review is to take the reader along the timeline and major milestones that led us to modern metagenomics. This new and thriving area is likely to be an important contributor to solve different problems. The transition from classical microbiology to modern metagenomics studies has required the development of new branches of knowledge and specialization. Here, we will review how the availability of high-throughput sequencing technologies has transformed microbiology and bioinformatics and how to tackle the inherent computational challenges that arise from the DNA sequencing revolution. New computational methods are constantly developed to collect, process, and extract useful biological information from a variety of samples and complex datasets, but metagenomics needs the integration of several of these computational methods. Despite the level of specialization needed in bioinformatics, it is important that life-scientists have a good understanding of it for a correct experimental design, which allows them to reveal the information in a metagenome.

255 citations


Cites methods from "Hidden Markov Models in Bioinformat..."

  • ...The program collects ribosomal sequences from short reads by using a Hidden Markov Models (HMM)-based reconstruction algorithm (De Fonzo et al., 2007)....


Journal Article · DOI
TL;DR: The proposed cybersecurity framework uses a Markov model, an Intrusion Detection System (IDS), and a Virtual Honeypot Device (VHD) to identify malicious edge devices in a fog computing environment. Results indicate that the framework succeeds in identifying the malicious device and in reducing the false IDS alarm rate.

179 citations


Cites methods from "Hidden Markov Models in Bioinformat..."

  • ...3 Two-stage Markov Model Training Mendel HMM toolbox of Matlab is used for the training of two-stage Markov model [50]....


  • ...These values are predicted with the help of the Viterbi algorithm of the proposed cybersecurity framework [50]....


Journal Article
TL;DR: This paper carefully analyzes a previously proposed family of algorithmically defined decoders that aim to hybridize the two standard ones, identifies several problems with it and with other earlier approaches, and proposes practical resolutions.
Abstract: Motivated by the unceasing interest in hidden Markov models (HMMs), this paper reexamines hidden path inference in these models, using primarily a risk-based framework. While the most common maximum a posteriori (MAP), or Viterbi, path estimator and the minimum error, or Posterior Decoder (PD) have long been around, other path estimators, or decoders, have been either only hinted at or applied more recently and in dedicated applications generally unfamiliar to the statistical learning community. Over a decade ago, however, a family of algorithmically defined decoders aiming to hybridize the two standard ones was proposed elsewhere. The present paper gives a careful analysis of this hybridization approach, identifies several problems and issues with it and other previously proposed approaches, and proposes practical resolutions of those. Furthermore, simple modifications of the classical criteria for hidden path recognition are shown to lead to a new class of decoders. Dynamic programming algorithms to compute these decoders in the usual forward-backward manner are presented. A particularly interesting subclass of such estimators can be also viewed as hybrids of the MAP and PD estimators. Similar to previously proposed MAP-PD hybrids, the new class is parameterized by a small number of tunable parameters. Unlike their algorithmic predecessors, the new risk-based decoders are more clearly interpretable, and, most importantly, work "out of the box" in practice, which is demonstrated on some real bioinformatics tasks and data. Some further generalizations and applications are discussed in the conclusion.

153 citations
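
The two standard decoders named in the abstract above can be contrasted in a few lines. The following is a minimal sketch, not the cited paper's method: it implements plain Viterbi (MAP path) decoding and plain posterior decoding for a discrete HMM, and it does not attempt the hybrid or risk-based decoders. The function names, the dictionary-based model layout, and the toy two-state model are all assumptions made for illustration. Probabilities are left unscaled, so the code is only suitable for short sequences.

```python
from typing import Dict, List, Sequence

Dist = Dict[str, float]      # probability table keyed by state or symbol
Table = Dict[str, Dist]      # row-stochastic table: trans[r][s], emit[s][o]


def forward(obs: Sequence[str], states: Sequence[str],
            start: Dist, trans: Table, emit: Table) -> List[Dist]:
    """alpha[t][s] = P(o_1..o_t, x_t = s), unscaled."""
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
                      for s in states})
    return alpha


def backward(obs: Sequence[str], states: Sequence[str],
             trans: Table, emit: Table) -> List[Dist]:
    """beta[t][s] = P(o_{t+1}..o_T | x_t = s), unscaled."""
    beta = [{s: 1.0 for s in states}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in states)
                        for s in states})
    return beta


def viterbi(obs: Sequence[str], states: Sequence[str],
            start: Dist, trans: Table, emit: Table) -> List[str]:
    """Single most probable state path (MAP path)."""
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    backpointers = []
    for o in obs[1:]:
        ptr, new_delta = {}, {}
        for s in states:
            best = max(states, key=lambda r: delta[r] * trans[r][s])
            ptr[s] = best
            new_delta[s] = delta[best] * trans[best][s] * emit[s][o]
        backpointers.append(ptr)
        delta = new_delta
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


def posterior_decode(obs: Sequence[str], states: Sequence[str],
                     start: Dist, trans: Table, emit: Table) -> List[str]:
    """Per-position most probable state (minimizes expected position errors)."""
    alpha = forward(obs, states, start, trans, emit)
    beta = backward(obs, states, trans, emit)
    return [max(states, key=lambda s: a[s] * b[s]) for a, b in zip(alpha, beta)]


# Hypothetical toy model: GC-rich ("H") versus GC-poor ("L") regions.
STATES = ["H", "L"]
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.5, "L": 0.5}, "L": {"H": 0.4, "L": 0.6}}
EMIT = {"H": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
        "L": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}

if __name__ == "__main__":
    seq = "GGCACTGAA"
    print("Viterbi  :", "".join(viterbi(seq, STATES, START, TRANS, EMIT)))
    print("Posterior:", "".join(posterior_decode(seq, STATES, START, TRANS, EMIT)))
```

The two decoders optimize different criteria, so their outputs may disagree on some inputs: Viterbi returns one globally consistent path, whereas posterior decoding picks the best state at each position independently and can, in general, return a path containing transitions of probability zero.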

Proceedings Article · DOI
09 Mar 2009
TL;DR: This paper validates the existence of a Markov chain for sub-band utilization by PUs over time using real-time measurements collected in the paging band (928–948 MHz) and formulates a spectrum sensing paradigm as a Hidden Markov model that predicts the true states of a sub-band.
Abstract: The primary function of a cognitive radio is to detect idle frequencies or sub-bands, not used by the primary users (PUs), and allocate these frequencies to secondary users. The state of the sub-band at any time point is either free (unoccupied by a PU) or busy (occupied by a PU). The states of a sub-band are monitored over L consecutive time periods, where each period is of a given time interval. Existing research assumes the presence of a Markov chain for sub-band utilization by PUs over time, but this assumption has not been validated. Therefore, in this paper we validate the existence of a Markov chain for sub-band utilization using real-time measurements collected in the paging band (928–948 MHz). Furthermore, since the detection of idle sub-bands by a cognitive radio is prone to errors, we probabilistically model the errors and then formulate a spectrum sensing paradigm as a Hidden Markov model that predicts the true states of a sub-band. The accuracy of our proposed method in predicting the true states of the sub-band is substantiated using extensive simulations.

147 citations


Cites background from "Hidden Markov Models in Bioinformat..."

  • ...Hence, the Markov chain, constituting the true sequence Y , is hidden and the name for this type of model is hidden Markov model (HMM) [8], [9]....
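
The free/busy setting described in this entry can be illustrated with a two-state filter. The sketch below is an assumption-laden illustration, not the cited paper's model: the true sub-band state follows a two-state Markov chain, the sensor reports it with a hypothetical missed-detection probability `p_miss` and false-alarm probability `p_false_alarm`, and the belief over the hidden state is updated recursively with Bayes' rule. All names and numbers are invented for the example.

```python
from typing import List, Sequence, Tuple

FREE, BUSY = 0, 1
Belief = Tuple[float, float]  # (P(free), P(busy))


def filter_step(belief: Belief, trans: Sequence[Sequence[float]],
                p_miss: float, p_false_alarm: float, sensed: int) -> Belief:
    """One predict-and-update step for the hidden sub-band state.

    belief : distribution over the previous period's true state
    trans  : trans[i][j] = P(next state = j | current state = i)
    sensed : the (possibly wrong) sensor reading for the new period
    """
    # Predict: push the belief through the Markov chain.
    pred_free = belief[FREE] * trans[FREE][FREE] + belief[BUSY] * trans[BUSY][FREE]
    pred_busy = belief[FREE] * trans[FREE][BUSY] + belief[BUSY] * trans[BUSY][BUSY]

    # Update: weight each state by the likelihood of the sensor reading.
    if sensed == BUSY:
        like_free, like_busy = p_false_alarm, 1.0 - p_miss
    else:
        like_free, like_busy = 1.0 - p_false_alarm, p_miss
    post_free = pred_free * like_free
    post_busy = pred_busy * like_busy

    # Normalize; norm is zero only if the reading is impossible under the model.
    norm = post_free + post_busy
    return (post_free / norm, post_busy / norm)


def run_filter(readings: Sequence[int], prior: Belief,
               trans: Sequence[Sequence[float]],
               p_miss: float, p_false_alarm: float) -> List[Belief]:
    """Filtered beliefs after each sensor reading."""
    beliefs = []
    belief = prior
    for r in readings:
        belief = filter_step(belief, trans, p_miss, p_false_alarm, r)
        beliefs.append(belief)
    return beliefs


if __name__ == "__main__":
    # Hypothetical parameters: a sticky channel and a mildly unreliable sensor.
    chain = [[0.9, 0.1], [0.2, 0.8]]
    for b in run_filter([FREE, FREE, BUSY, FREE, BUSY, BUSY], (0.5, 0.5),
                        chain, p_miss=0.1, p_false_alarm=0.05):
        print(f"P(free)={b[FREE]:.3f}  P(busy)={b[BUSY]:.3f}")
```

A single contradictory reading shifts the belief less than a run of consistent readings does, which is the practical benefit of modeling the chain as hidden instead of trusting each reading in isolation.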


Journal Article · DOI
TL;DR: It is shown that the intelligent forwarders can provide the remote sensors with context-awareness and transmit only important information to the big data server for analytics when certain behaviours happen and avoid overwhelming communication and data storage.
Abstract: An increasing number of the elderly population wish to live an independent lifestyle, rather than rely on intrusive care programmes. A big data solution is presented using wearable sensors capable of carrying out continuous monitoring of the elderly, alerting the relevant caregivers when necessary and forwarding pertinent information to a big data system for analysis. A challenge for such a solution is the development of context-awareness through the multidimensional, dynamic and nonlinear sensor readings that have a weak correlation with observable human behaviours and health conditions. To address this challenge, a wearable sensor system with an intelligent data forwarder is discussed in this paper. The forwarder adopts a Hidden Markov Model for human behaviour recognition. Locality sensitive hashing is proposed as an efficient mechanism to learn sensor patterns. A prototype solution is implemented to monitor health conditions of dispersed users. It is shown that the intelligent forwarders can provide the remote sensors with context-awareness. They transmit only important information to the big data server for analytics when certain behaviours happen and avoid overwhelming communication and data storage. The system functions unobtrusively, whilst giving the users peace of mind in the knowledge that their safety is being monitored and analysed.

119 citations


Cites methods from "Hidden Markov Models in Bioinformat..."

  • ...Within the bioscience field, for example, the model is ideal for gene prediction—where each state emits random DNA strings of random length, which are observable as a means to determine the gene producing them [10]—and in protein structure prediction and genetic mapping [11]....


References
Journal Article · DOI
Lawrence R. Rabiner1
01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.

21,819 citations
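
The idea that "the observation is a probabilistic function of the state" is easiest to see by generating data from a small model. The sketch below samples from a hypothetical two-coin HMM in the spirit of the coin-tossing illustration mentioned in the abstract; the coin names, probabilities, and function names are assumptions, not taken from the tutorial. An observer sees only the heads/tails sequence, while the sequence of coins stays hidden.

```python
import random
from typing import Dict, List, Tuple

Dist = Dict[str, float]


def draw(dist: Dist, rng: random.Random) -> str:
    """Draw one outcome from a discrete distribution given as {outcome: prob}."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def sample_hmm(n: int, start: Dist, trans: Dict[str, Dist],
               emit: Dict[str, Dist], rng: random.Random) -> Tuple[List[str], List[str]]:
    """Generate n steps; returns (hidden states, observations)."""
    states: List[str] = []
    observations: List[str] = []
    state = draw(start, rng)
    for _ in range(n):
        states.append(state)
        observations.append(draw(emit[state], rng))  # emission depends only on the state
        state = draw(trans[state], rng)              # next state depends only on the current one
    return states, observations


# Hypothetical two-coin model: a fair coin and a coin biased towards heads.
START = {"fair": 0.5, "biased": 0.5}
TRANS = {"fair": {"fair": 0.9, "biased": 0.1},
         "biased": {"fair": 0.1, "biased": 0.9}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "biased": {"H": 0.9, "T": 0.1}}

if __name__ == "__main__":
    hidden, seen = sample_hmm(20, START, TRANS, EMIT, random.Random(0))
    print("observed:", "".join(seen))
    print("hidden  :", " ".join(hidden))
```

Recovering the hidden line from the observed line alone is the decoding problem, and scoring how well a model explains the observed line is the evaluation problem, two of the three fundamental problems the abstract refers to.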

Proceedings Article
28 Jun 2001
TL;DR: This work presents iterative parameter estimation algorithms for conditional random fields and compares the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.
Abstract: We present conditional random fields, a framework for building probabilistic models to segment and label sequence data. Conditional random fields offer several advantages over hidden Markov models and stochastic grammars for such tasks, including the ability to relax strong independence assumptions made in those models. Conditional random fields also avoid a fundamental limitation of maximum entropy Markov models (MEMMs) and other discriminative Markov models based on directed graphical models, which can be biased towards states with few successor states. We present iterative parameter estimation algorithms for conditional random fields and compare the performance of the resulting models to HMMs and MEMMs on synthetic and natural-language data.

13,190 citations


"Hidden Markov Models in Bioinformat..." refers methods in this paper

  • ...A special kind of enhancement of MEMMs, are the so-called Conditional Random Fields (CRFs) [61], introduced by Lafferty et al. [62]....


Journal Article · DOI
TL;DR: A new membrane protein topology prediction method, TMHMM, based on a hidden Markov model is described and validated, and it is discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies.

11,453 citations


"Hidden Markov Models in Bioinformat..." refers background in this paper

  • ...Some authors [35, 36, 37, 38] specialised the HMM architecture to predict the topology of helical transmembrane proteins....


Proceedings Article · DOI
01 Jul 1992
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

11,211 citations


"Hidden Markov Models in Bioinformat..." refers methods in this paper

  • ...For example, HMMs are also used for bioinformatic predictions together with the so-called Support Vector Machine (SVM) [64], a technique based on the Vapnik-Chervonenkis theory [65] that produces decision surfaces in multidimensional spaces, in order to perform various kinds...

