Predicting the functional, molecular, and phenotypic consequences of amino acid substitutions using hidden Markov models.
Hashem A. Shihab, Julian Gough, David Neil Cooper, Peter D. Stenson, Gary L A Barker, Keith J. Edwards, Ian N. M. Day, Tom R. Gaunt
TLDR
The Functional Analysis Through Hidden Markov Models (FATHMM) software and server are described: a species-independent method with optional species-specific weightings for predicting the functional effects of protein missense variants. FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects, with the added benefit of phenotypic outcome associations.
Abstract:
The rate at which nonsynonymous single nucleotide polymorphisms (nsSNPs) are being identified in the human genome is increasing dramatically owing to advances in whole-genome/whole-exome sequencing technologies. Automated methods capable of accurately and reliably distinguishing between pathogenic and functionally neutral nsSNPs are therefore assuming ever-increasing importance. Here, we describe the Functional Analysis Through Hidden Markov Models (FATHMM) software and server: a species-independent method with optional species-specific weightings for the prediction of the functional effects of protein missense variants. Using a model weighted for human mutations, we obtained performance accuracies that exceeded those of traditional prediction methods (i.e., SIFT, PolyPhen, and PANTHER) on two separate benchmarks. Furthermore, in one benchmark, we achieved performance accuracies that exceeded those of current state-of-the-art prediction methods (i.e., SNPs&GO and MutPred). We demonstrate that FATHMM can be efficiently applied to high-throughput/large-scale human and nonhuman genome sequencing projects with the added benefit of phenotypic outcome associations. To illustrate this, we evaluated nsSNPs in wheat (Triticum spp.) to identify some of the important genetic variants responsible for the phenotypic differences introduced by intense selection during domestication. A Web-based implementation of FATHMM, including a high-throughput batch facility and a downloadable standalone package, is available at http://fathmm.biocompute.org.uk.
Citations
Journal Article
Characterization of intellectual disability and autism comorbidity through gene panel sequencing
Maria Cristina Aspromonte, Mariagrazia Bellini, Alessandra Gasparini, Marco Carraro, Elisa Bettella, Roberta Polli, Federica Cesca, Stefania Bigoni, Stefania Boni, Ombretta Carlet, Susanna Negrin, Isabella Mammi, Donatella Milani, Angela Peron, Stefano Sartori, Irene Toldo, Fiorenza Soli, Licia Turolla, Franco Stanzial, Francesco Benedicenti, Cristina Marino-Buslje, Silvio C. E. Tosatto, Alessandra Murgia, Emanuela Leonardi
TL;DR: A low-cost next-generation sequencing gene panel is described that has been transferred into clinical practice, replacing single disease-gene analyses for the early diagnosis of individuals with ID/ASD; the results support the pathogenic role of genes recently proposed to be involved in ASD.
Journal Article
Targeted capture massively parallel sequencing analysis of LCIS and invasive lobular cancer: Repertoire of somatic genetic alterations and clonal relationships.
Rita A. Sakr, Michail Schizas, Jose V. Scarpa Carniello, Charlotte K.Y. Ng, Salvatore Piscuoglio, Dilip Giri, Victor P. Andrade, Marina De Brot, Raymond S. Lim, Russell Towers, Britta Weigelt, Jorge S. Reis-Filho, Tari A. King
TL;DR: This work sought to define the repertoire of somatic genetic alterations in pure LCIS and in synchronous LCIS and ILC using targeted massively parallel sequencing.
Journal Article
HIPred: an integrative approach to predicting haploinsufficient genes.
TL;DR: A machine learning approach is described that integrates genomic and evolutionary information from ENSEMBL with functional annotations from the Encyclopaedia of DNA Elements consortium and the NIH Roadmap Epigenomics Project to predict haploinsufficiency without the study bias described earlier.
Journal Article
Computational approaches to interpreting genomic sequence variation
TL;DR: The main current bioinformatics approaches to identifying functional variation are discussed, including widely used algorithms for coding variation such as SIFT and PolyPhen and also novel techniques for interpreting variation across the genome.
Journal Article
An integrative approach to predicting the functional effects of small indels in non-coding regions of the human genome
Michael Ferlaino, Mark F. Rogers, Hashem A. Shihab, Matthew Mort, David Neil Cooper, Tom R. Gaunt, Colin Campbell
TL;DR: FATHMM-indel can accurately predict the functional impact of, and prioritise, small indels throughout the whole non-coding genome, and it significantly outperforms CADD and GAVIN, state-of-the-art models for assessing the pathogenic impact of non-coding variants.
References
Journal Article
Basic Local Alignment Search Tool
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
Journal Article
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.
Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, David J. Lipman
TL;DR: A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approximately three times the speed of the original.
Journal Article
Gene Ontology: tool for the unification of biology
M Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, J. M. Cherry, Allan Peter Davis, Kara Dolinski, Selina S. Dwight, J.T. Eppig, Midori A. Harris, David P. Hill, Laurie Issel-Tarver, Andrew Kasarskis, Suzanna E. Lewis, John C. Matese, Joel E. Richardson, M. Ringwald, Gerald M. Rubin, Gavin Sherlock
TL;DR: The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.
Journal Article
The Pfam protein families database
Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate, Chris Boursnell, Ningze Pang, Kristoffer Forslund, Goran Ceric, Jody Clements, Andreas Heger, Liisa Holm, Erik L. L. Sonnhammer, Sean R. Eddy, Alex Bateman, Robert D. Finn
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.