Predicting the Functional Effect of Amino Acid Substitutions and Indels
TLDR
A new algorithm, PROVEAN (Protein Variation Effect Analyzer), is developed, which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions.
Abstract:
As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow down the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and nonsense mutations. Frameshifts and nonsense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations not limited to single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
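The alignment-based score described in the abstract (the change in similarity between a query and a homolog before and after a variation is introduced) can be illustrated with a minimal Python sketch. This is not the PROVEAN implementation: the scoring scheme (match, mismatch, and a linear gap penalty), the function names, the toy sequences, and the choice to average over homologs are all assumptions made for illustration only.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The match/mismatch/gap values are illustrative assumptions,
    not the parameters used by PROVEAN.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Change in similarity to one homolog after introducing the variation."""
    return global_alignment_score(variant, homolog) - global_alignment_score(query, homolog)


def mean_delta_score(query, variant, homologs):
    """Average the delta score over several homologs (an assumed aggregation)."""
    return sum(delta_score(query, variant, h) for h in homologs) / len(homologs)


# Hypothetical toy sequences, chosen only to exercise the functions.
query = "MKTAYIAKQR"
homologs = ["MKTAYIAKQR", "MKTAYLAKQR"]
substitution = "MKTAYIAKQW"  # single amino acid substitution (R10W)
deletion = "MKTAIAKQR"       # in-frame deletion of Y5

print("substitution:", mean_delta_score(query, substitution, homologs))
print("deletion:", mean_delta_score(query, deletion, homologs))
```

Because the score is computed from whole-sequence alignments rather than from conservation at a single position, the same function handles substitutions, insertions, and deletions without special cases. A more negative value means the variant is less similar to the homologs than the original query was.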