Journal Article

Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models.

01 Jan 2013 - Human Mutation (Wiley-Blackwell) - Vol. 34, Iss. 1, pp. 57-65
TL;DR: The Functional Analysis Through Hidden Markov Models (FATHMM) software and server is described: a species‐independent method with optional species‐specific weightings for the prediction of the functional effects of protein missense variants, demonstrating that FATHMM can be efficiently applied to high‐throughput/large‐scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations.
Abstract: The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that outperformed traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieve performance accuracies that outperform current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
Citations
Journal Article
TL;DR: The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.
Abstract: The Ensembl Variant Effect Predictor is a powerful toolset for the analysis, annotation, and prioritization of genomic variants in coding and non-coding regions. It provides access to an extensive collection of genomic annotation, with a variety of interfaces to suit different requirements, and simple options for configuring and extending analysis. It is open source, free to use, and supports full reproducibility of results. The Ensembl Variant Effect Predictor can simplify and accelerate variant interpretation in a wide range of study designs.

4,658 citations


Cites methods from "Predicting the functional, molecula..."

  • ...GWAVA [51], CADD [52], and FATHMM-MKL [53] plugins are also available, which integrate genomic and epigenomic factors to grade and prioritize non-coding variants....

  • ...Other pathogenicity predictor scores such as Condel [42], FATHMM [43], and MutationTaster [44] are available for human data via VEP plugins (Table 4)....

Journal Article
05 Apr 2018 - Cell
TL;DR: This study reports a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations, identifying 299 driver genes with implications regarding their anatomical sites and cancer/cell types.

1,623 citations


Cites background or methods from "Predicting the functional, molecula..."

  • ...…designed to distinguish between driver and passenger somatic mutations (CHASM [Wong et al., 2011], CanDrA [Carter et al., 2013], fathmm [Shihab et al., 2013] and transFIC [Gonzalez-Perez et al., 2012]) and four tools that leverage information from protein structures (HotSpot3D [Niu et…...

  • ...…Ng and Henikoff, 2002 http://sift.jcvi.org PolyPhen2 Adzhubei et al., 2013 http://genetics.bwh.harvard.edu/pph2/ fathmm Shihab et al., 2013 http://fathmm.biocompute.org.uk transFIC Gonzalez-Perez et al., 2012 http://bbglab.irbbarcelona.org/transfic/home CTAT-score This…...

  • ...…version was run under default parameters using the ‘‘general’’ cancer type database. fathmm Functional Analysis Through Hidden Markov Models (fathmm) (Shihab et al., 2013) uses Hidden Markov modeling to represent the protein domain shared across human proteins and to estimate the functional…...

  • ...…[Ng and Henikoff, 2002], PolyPhen2 [Adzhubei et al., 2013], MutationAssessor [Reva et al., 2011], transFIC [Gonzalez-Perez et al., 2012], fathmm [Shihab et al., 2013], CHASM [Carter et al., 2009], CanDrA [Mao et al., 2013] and VEST [Carter et al., 2013]), 4 structure-based (HotSpot3D [Niu et…...

Journal Article
TL;DR: This work developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, LRT, GERP, SiPhy, phyloP, and phastCons.
Abstract: The vast majority of coding variants are rare, and assessment of the contribution of rare variants to complex traits is hampered by low statistical power and limited functional data. Improved methods for predicting the pathogenicity of rare coding variants are needed to facilitate the discovery of disease variants from exome sequencing studies. We developed REVEL (rare exome variant ensemble learner), an ensemble method for predicting the pathogenicity of missense variants on the basis of individual tools: MutPred, FATHMM, VEST, PolyPhen, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP, SiPhy, phyloP, and phastCons. REVEL was trained with recently discovered pathogenic and rare neutral missense variants, excluding those previously used to train its constituent tools. When applied to two independent test sets, REVEL had the best overall performance (p < 10^-12) as compared to any individual tool and seven ensemble methods: MetaSVM, MetaLR, KGGSeq, Condel, CADD, DANN, and Eigen. Importantly, REVEL also had the best performance for distinguishing pathogenic from rare neutral variants with allele frequencies

1,295 citations

Journal Article
TL;DR: The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease.
Abstract: The Human Gene Mutation Database (HGMD®) is a comprehensive collection of germline mutations in nuclear genes that underlie, or are associated with, human inherited disease. By June 2013, the database contained over 141,000 different lesions detected in over 5,700 different genes, with new mutation entries currently accumulating at a rate exceeding 10,000 per annum. HGMD was originally established in 1996 for the scientific study of mutational mechanisms in human genes. However, it has since acquired a much broader utility as a central unified disease-oriented mutation repository utilized by human molecular geneticists, genome scientists, molecular biologists, clinicians and genetic counsellors as well as by those specializing in biopharmaceuticals, bioinformatics and personalized genomics. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions/non-profit organizations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via BIOBASE GmbH.

1,204 citations


Cites methods from "Predicting the functional, molecula..."

  • ...HGMD has also been used by a number of different groups to aid the development of post-NGS variant interpretation algorithms including MutPred (Li et al. 2009), PROVEAN (Choi et al. 2012), CAROL (Lopes et al. 2012), CRAVAT (Douville et al. 2013), NEST (Carter et al. 2013) and FATHMM (Shihab et al. 2013)....

Journal Article
TL;DR: The Human Gene Mutation Database constitutes de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data.
Abstract: The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that underlie, or are closely associated with human inherited disease. At the time of writing (March 2017), the database contained in excess of 203,000 different gene lesions identified in over 8000 genes manually curated from over 2600 journals. With new mutation entries currently accumulating at a rate exceeding 17,000 per annum, HGMD represents de facto the central unified gene/disease-oriented repository of heritable mutations causing human genetic disease used worldwide by researchers, clinicians, diagnostic laboratories and genetic counsellors, and is an essential tool for the annotation of next-generation sequencing data. The public version of HGMD (http://www.hgmd.org) is freely available to registered users from academic institutions and non-profit organisations whilst the subscription version (HGMD Professional) is available to academic, clinical and commercial users under license via QIAGEN Inc.

1,053 citations


Additional excerpts

  • ...HGMD has also been used by a number of different groups to aid the development of a wide variety of post-NGS variant interpretation and exome prioritisation algorithms including MutPred (Li et al. 2009), MutPred Splice (Mort et al. 2014), PROVEAN (Choi et al. 2012), CAROL (Lopes et al. 2012), regSNPs (Teng et al. 2012), CRAVAT (Douville et al. 2013), NEST (Carter et al. 2013), FATHMM (Shihab et al. 2013), FATHMM-MKL (Shihab et al. 2015), PinPor (Zhang et al. 2014), MutationTaster2 (Schwarz et al. 2014), Phen-Gen (Javed et al. 2014), VEST-indel (Douville et al. 2016), Gene Damage Index (Itan et al. 2015), DDIGin (Folkman et al. 2015), RSVP (Peterson et al. 2016), ExonImpact (Li et al. 2017), IntSplice (Shibata et al. 2016), snvForest (Wu et al. 2015), IMHOTEP (Knecht et al. 2017) and M-CAP (Jagadeesh et al. 2016)....

References
Journal Article
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations


"Predicting the functional, molecula..." refers background or methods in this paper

  • ...Traditionally, the BLAST range of pairwise alignment [Altschul et al., 1990] and sequence profile algorithms [Altschul et al., 1997] have been used to search large sequence databases for homologous proteins falling within a predefined similarity threshold....

Journal Article
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Abstract: The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original. In addition, a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approximately the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biologically relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily.

70,111 citations


"Predicting the functional, molecula..." refers methods in this paper

  • ...Traditionally, the BLAST range of pairwise alignment [Altschul et al., 1990] and sequence profile algorithms [Altschul et al., 1997] have been used to search large sequence databases for homologous proteins falling within a predefined similarity threshold....

Journal Article
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Abstract: Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.

35,225 citations


"Predicting the functional, molecula..." refers background in this paper

  • ...For example, the molecular consequences of AASs are statistically inferred by mapping SUPERFAMILY [Gough et al., 2001] HMMs onto the Gene Ontology [Ashburner et al., 2000]....

Journal Article
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations