Journal ArticleDOI

Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes

TL;DR: A new membrane protein topology prediction method, TMHMM, based on a hidden Markov model is described and validated, and it is discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies.
About: This article is published in Journal of Molecular Biology. The article was published on 2001-01-19. It has received 11,453 citations to date. The article focuses on the topics: Integral membrane protein & Membrane protein.
Citations
Journal ArticleDOI
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations


Cites methods from "Predicting transmembrane protein to..."

  • ...These predictions are pre-computed over the sequence database by the following third party programs: TMHMM (10) (transmembrane regions), SignalP (11) (signal peptide regions), ncoils (12) (coiled-coil regions) and SEG (9) (low complexity regions)....

    [...]

Journal ArticleDOI
TL;DR: Improvements of the currently most popular method for prediction of classically secreted proteins, SignalP, which consists of two different predictors based on neural network and hidden Markov model algorithms, where both components have been updated.

6,492 citations

Journal ArticleDOI
TL;DR: A new Java-based architecture for the widely used protein function prediction software package InterProScan is described, resulting in a flexible and stable system that is able to use both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis.
Abstract: Motivation: Robust, large-scale sequence analysis is a major challenge in modern genomic science, where biologists are frequently trying to characterise many millions of sequences. Here we describe a new Java-based architecture for the widely-used protein function prediction software package InterProScan. Developments include improvements and additions to the outputs of the software and the complete re-implementation of the software framework, resulting in a flexible and stable system that is able to utilise both multiprocessor machines and/or conventional clusters to achieve scalable distributed data analysis. InterProScan is freely available for download from the EMBL-EBI FTP site and the (open) source code is hosted at Google Code. Availability: InterProScan is distributed via FTP at ftp://ftp.ebi.ac.uk/pub/software/unix/iprscan/5/ and the source code is available from http://code.google.com/p/interproscan/. Contact: http://www.ebi.ac.uk/support or interhelp@ebi.ac.uk

5,434 citations


Cites background from "Predicting transmembrane protein to..."

  • ...…(Haft et al., 2012), SMART (Letunic et al., 2012), PIRSF (Wu et al., 2004), Panther (Mi et al., 2012), HAMAP (Pedruzzi et al., 2012), Prosite (Sigrist et al., 2012), ProDom (Bru et al., 2005), PRINTS (Attwood et al., 2012), CATH-Gene3D (Lees et al., 2012) and SUPERFAMILY (De Lima…...

    [...]

Book
16 Dec 2008
TL;DR: The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Abstract: The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms — among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations — can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.

4,335 citations


Cites methods from "Predicting transmembrane protein to..."

  • ...These and other biological facts are used to design the states and state transition matrix of the transmembrane hidden Markov model, an HMM for modeling membrane proteins [138]....

    [...]

Journal ArticleDOI
03 Oct 2002-Nature
TL;DR: The genome sequence of P. falciparum clone 3D7 is reported, which is the most (A + T)-rich genome sequenced to date and is being exploited in the search for new drugs and vaccines to fight malaria.
Abstract: The parasite Plasmodium falciparum is responsible for hundreds of millions of cases of malaria, and kills more than one million African children annually. Here we report an analysis of the genome sequence of P. falciparum clone 3D7. The 23-megabase nuclear genome consists of 14 chromosomes, encodes about 5,300 genes, and is the most (A + T)-rich genome sequenced to date. Genes involved in antigenic variation are concentrated in the subtelomeric regions of the chromosomes. Compared to the genomes of free-living eukaryotic microbes, the genome of this intracellular parasite encodes fewer enzymes and transporters, but a large proportion of genes are devoted to immune evasion and host-parasite interactions. Many nuclear-encoded proteins are targeted to the apicoplast, an organelle involved in fatty-acid and isoprenoid metabolism. The genome sequence provides the foundation for future studies of this organism, and is being exploited in the search for new drugs and vaccines to fight malaria.

4,312 citations

References
Journal ArticleDOI
Lawrence R. Rabiner1
01 Feb 1989
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Abstract: This tutorial provides an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and gives practical details on methods of implementation of the theory along with a description of selected applications of the theory to distinct problems in speech recognition. Results from a number of original sources are combined to provide a single source of acquiring the background required to pursue further this area of research. The author first reviews the theory of discrete Markov chains and shows how the concept of hidden states, where the observation is a probabilistic function of the state, can be used effectively. The theory is illustrated with two simple examples, namely coin-tossing, and the classic balls-in-urns system. Three fundamental problems of HMMs are noted and several practical techniques for solving these problems are given. The various types of HMMs that have been studied, including ergodic as well as left-right models, are described.
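
The first of the fundamental HMM problems, computing the probability of an observation sequence under a given model, can be illustrated with the coin-tossing example mentioned in the abstract. The following is a minimal sketch of the forward algorithm, not code from the tutorial. The two-coin model, its state names, and every probability in it are assumptions chosen only for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """Return P(obs) under a discrete HMM using the forward algorithm."""
    states = list(start)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    # Summing over the final state gives the likelihood of the whole sequence.
    return sum(alpha.values())


# Hypothetical two-coin model: one fair coin, one coin biased toward heads.
START = {"fair": 0.5, "biased": 0.5}
TRANS = {
    "fair": {"fair": 0.9, "biased": 0.1},
    "biased": {"fair": 0.1, "biased": 0.9},
}
EMIT = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.8, "T": 0.2},
}

likelihood = forward_likelihood("HHTH", START, TRANS, EMIT)
```

The loop costs time proportional to the sequence length times the square of the number of states, which is what makes the computation practical compared with enumerating every hidden state path.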

21,819 citations

Journal ArticleDOI
TL;DR: A new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence that performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets.
Abstract: We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.

5,480 citations


"Predicting transmembrane protein to..." refers methods in this paper

  • ...A set of signal peptides used for training of SignalP (Nielsen et al., 1997) was used to test the discrimination between signal peptides and membrane helices....

    [...]

  • ...These proteins were analyzed with SignalP-HMM (Nielsen & Krogh, 1998), and if a signal peptide was predicted, it was removed from the protein....

    [...]

  • ...Such proteins were sent to SignalP-HMM (http://www.cbs.dtu.dk/services/SignalP-2.0/), and if a cleavage site was predicted with a probability of more than 0.5, the predicted signal peptide was cleaved off....

    [...]

  • ...This was done only for the eukaryotes and the Gram-positive and Gram-negative bacteria because SignalP is only developed for these groups of organisms (see Materials and Methods for details)....

    [...]

  • ...A preliminary test of the accuracy of SignalP-HMM reveals that about 80% of the true signal peptides are found, and 20% of transmembrane helices are mistaken for signal peptides in eukaryotes....

    [...]
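
The preprocessing rule quoted in the excerpts above (remove a predicted signal peptide only when the cleavage site probability exceeds 0.5) can be sketched as a small helper. This is an illustration under stated assumptions, not the authors' pipeline: the function name, its parameters, and the convention that `cleavage_site` counts the residues in the predicted signal peptide are all hypothetical, and no real SignalP output format is parsed here.

```python
def strip_signal_peptide(sequence, cleavage_site, probability, threshold=0.5):
    """Return the sequence without its predicted signal peptide.

    cleavage_site: assumed to be the number of residues in the predicted
        signal peptide, or None when no cleavage site was predicted.
    probability: predicted cleavage site probability (hypothetical input).
    """
    if cleavage_site is not None and probability > threshold:
        # Keep only the residues after the predicted cleavage site.
        return sequence[cleavage_site:]
    # At or below the threshold the protein is left untouched.
    return sequence


mature = strip_signal_peptide("MKKLLAAAASGTE", cleavage_site=9, probability=0.93)
```

Using a strict comparison mirrors the wording "more than 0.5", so a probability of exactly 0.5 leaves the sequence unchanged.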

01 Jan 1997
TL;DR: A new method for the identification of signal peptides and their cleavage sites, based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence, is described.
Abstract: We have developed a new method for the identification of signal peptides and their cleavage sites based on neural networks trained on separate sets of prokaryotic and eukaryotic sequence. The method performs significantly better than previous prediction schemes and can easily be applied on genome-wide data sets. Discrimination between cleaved signal peptides and uncleaved N-terminal signal-anchor sequences is also possible, though with lower precision. Predictions can be made on a publicly available WWW server.

5,191 citations

Proceedings Article
01 Jul 1998
TL;DR: The transmembrane HMM, TMHMM, correctly predicts the entire topology for 77% of the sequences in a standard dataset of 83 proteins with known topology, and the same accuracy was achieved on a larger dataset of 160 proteins.
Abstract: A novel method to model and predict the location and orientation of alpha helices in membrane-spanning proteins is presented. It is based on a hidden Markov model (HMM) with an architecture that corresponds closely to the biological system. The model is cyclic with 7 types of states for helix core, helix caps on either side, loop on the cytoplasmic side, two loops for the non-cytoplasmic side, and a globular domain state in the middle of each loop. The two loop paths on the non-cytoplasmic side are used to model short and long loops separately, which corresponds biologically to the two known different membrane insertion mechanisms. The close mapping between the biological and computational states allows us to infer which parts of the model architecture are important to capture the information that encodes the membrane topology, and to gain a better understanding of the mechanisms and constraints involved. Models were estimated both by maximum likelihood and a discriminative method, and a method for reassignment of the membrane helix boundaries was developed. In a cross validated test on single sequences, our transmembrane HMM, TMHMM, correctly predicts the entire topology for 77% of the sequences in a standard dataset of 83 proteins with known topology. The same accuracy was achieved on a larger dataset of 160 proteins. These results compare favourably with existing methods.
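
The idea of a cyclic HMM whose states map onto membrane compartments can be illustrated with a much smaller toy model. The sketch below is not the TMHMM architecture, which uses 7 types of states. It assumes only four hypothetical states (inside loop, helix running inside to outside, outside loop, helix running outside to inside), a three-class residue alphabet, and invented probabilities that loosely favour positively charged residues on the inside. Decoding uses the standard Viterbi algorithm in log space.

```python
import math

# Hypothetical residue classes: hydrophobic, positively charged, other.
HYDROPHOBIC = set("AILMFVWC")
POSITIVE = set("KR")


def residue_class(aa):
    if aa in HYDROPHOBIC:
        return "h"
    if aa in POSITIVE:
        return "p"
    return "x"


# Cyclic toy topology: i -> Mio -> o -> Moi -> i (all numbers are invented).
START = {"i": 0.5, "o": 0.5}
TRANS = {
    "i": {"i": 0.9, "Mio": 0.1},
    "Mio": {"Mio": 0.9, "o": 0.1},
    "o": {"o": 0.9, "Moi": 0.1},
    "Moi": {"Moi": 0.9, "i": 0.1},
}
HELIX = {"h": 0.85, "p": 0.01, "x": 0.14}
EMIT = {
    "i": {"h": 0.2, "p": 0.3, "x": 0.5},
    "o": {"h": 0.2, "p": 0.05, "x": 0.75},
    "Mio": HELIX,
    "Moi": HELIX,
}


def viterbi_topology(seq):
    """Label each residue as i (inside), M (membrane) or o (outside)."""
    obs = [residue_class(aa) for aa in seq]
    # score[s] = best log-probability of any state path ending in state s
    score = {s: math.log(p) + math.log(EMIT[s][obs[0]]) for s, p in START.items()}
    back = []
    for c in obs[1:]:
        new_score, pointers = {}, {}
        for prev, prev_score in score.items():
            for nxt, p in TRANS[prev].items():
                cand = prev_score + math.log(p) + math.log(EMIT[nxt][c])
                if nxt not in new_score or cand > new_score[nxt]:
                    new_score[nxt], pointers[nxt] = cand, prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(score, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join("M" if s.startswith("M") else s for s in path)


labels = viterbi_topology("KRKRK" + "L" * 20 + "SSSSS")
```

Splitting the helix into two direction-specific states is one simple way to force the labels to alternate between inside and outside loops; a single shared helix state would allow a helix to start and end on the same side of the membrane.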

2,518 citations


"Predicting transmembrane protein to..." refers methods or result in this paper

  • ...In the third and final stage of estimation, the model from stage two was further optimized using a discriminative method of estimation as described by Sonnhammer et al. (1998)....

    [...]

  • ...This compares well to other methods, the best of which use multiple alignments to achieve the same level of accuracy (Rost et al., 1996; Tusnady & Simon, 1998), see Sonnhammer et al. (1998) for comparisons....

    [...]

  • ...Here we describe a new method, TMHMM, based on a hidden Markov model (HMM) approach (a preliminary description of TMHMM has been published by Sonnhammer et al., 1998)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a new strategy for predicting the topology of bacterial inner membrane proteins is proposed on the basis of hydrophobicity analysis, automatic generation of a set of possible topologies and ranking of these according to the positive inside rule.

1,661 citations
