Predicting Functional Effect of Human Missense Mutations Using PolyPhen-2

doi:10.1002/0471142905.HG0720S76

Journal Article•DOI•

Ivan Adzhubei¹, Daniel M. Jordan¹,², Shamil R. Sunyaev¹

Brigham and Women's Hospital¹, Harvard University²

01 Jan 2013-Current protocols in human genetics (NIH Public Access)-Vol. 76, Iss: 1

Abstract: PolyPhen-2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single-nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen-2 features include a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification. The software also integrates the UCSC Genome Browser's human genome annotations and MultiZ multiple alignments of vertebrate genomes with the human genome. PolyPhen-2 is capable of analyzing large volumes of data produced by next-generation sequencing projects, thanks to built-in support for high-performance computing environments like Grid Engine and Platform LSF.

Citations

Journal Article•DOI•

Comprehensive Characterization of Cancer Driver Genes and Mutations.

Matthew H. Bailey¹, Collin Tokheim², Eduard Porta-Pardo³, Sohini Sengupta¹, Denis Bertrand⁴, Amila Weerasinghe¹, Antonio Colaprico⁵, Michael C. Wendl¹, Jaegil Kim⁶, Brendan Reardon⁷, Patrick Kwok Shing Ng⁸, Kang Jin Jeong⁸, Song Cao¹, Zixing Wang⁸, Jianjiong Gao⁹, Qingsong Gao¹, Fang Wang⁸, Eric Minwei Liu¹⁰, Loris Mularoni, Carlota Rubio-Perez, Niranjan Nagarajan⁴, Isidro Cortes-Ciriano¹¹, Daniel Cui Zhou¹, Wen-Wei Liang¹, Julian M. Hess⁶, Venkata Yellapantula¹, David Tamborero, Abel Gonzalez-Perez, Chayaporn Suphavilai⁴, Jia Yu Ko⁴, Ekta Khurana¹⁰, Peter J. Park⁷, Eliezer M. Van Allen⁶,⁷, Han Liang⁸, Michael S. Lawrence⁷, Adam Godzik³, Nuria Lopez-Bigas¹², Josh Stuart¹³, David A. Wheeler¹⁴, Gad Getz⁶, Ken Chen⁸, Alexander J. Lazar⁸, Gordon B. Mills⁸, Rachel Karchin², Li Ding¹

Washington University in St. Louis¹, Johns Hopkins University², Discovery Institute³, Genome Institute of Singapore⁴, University of Miami⁵, Broad Institute⁶, Harvard University⁷, University of Texas MD Anderson Cancer Center⁸, Memorial Sloan Kettering Cancer Center⁹, Cornell University¹⁰, University of Cambridge¹¹, Catalan Institution for Research and Advanced Studies¹², University of California, Santa Cruz¹³, Baylor College of Medicine¹⁴

05 Apr 2018-Cell

TL;DR: This study reports a PanCancer and PanSoftware analysis spanning 9,423 tumor exomes (comprising all 33 of The Cancer Genome Atlas projects) and using 26 computational tools to catalog driver genes and mutations, identifying 299 driver genes with implications regarding their anatomical sites and cancer/cell types.

1,623 citations

Cites background or methods from "Predicting Functional Effect of Hum..."

...PolyPhen2 Polymorphism Phenotyping v2 (PolyPhen2) (Adzhubei et al., 2013) is a machine learning approach that computes the functional impact of missense mutations....
[...]
...The collection was comprised of 8 mutation-level algorithms (SIFT [Ng and Henikoff, 2002], PolyPhen2 [Adzhubei et al., 2013], MutationAssessor [Reva et al., 2011], transFIC [Gonzalez-Perez et al., 2012], fathmm [Shihab et al., 2013], CHASM [Carter et al., 2009], CanDrA [Mao et al., 2013] and VEST [Carter et al., 2013]), 4 structure-based (HotSpot3D [Niu et al., 2016], HotMAPS [Tokheim et al., 2016a], 3DHotSpots.org [Gao et al., 2017] and e-Driver3D [Porta-Pardo et al., 2015]), 2 network and –omic integration tools (OncoIMPACT [Bertrand et al., 2015], DriverNet [Bashashati et al., 2012]), and 2 algorithms to identify clinically-actionable events (PHIAL [Van Allen et al., 2014] and DEPO [S.Q. Sun, R.J. Mashl, S. Sengupta, A.D. Scott, W. Wang, P. Batra, L.-B. Wang, M.A. Wyczalkowski, L. Ding, unpublished data])....
[...]
...We utilized four tools that distinguish pathogenic mutations from benign polymorphisms on a population level (SIFT [Ng and Henikoff, 2002], PolyPhen2 [Adzhubei et al., 2013], VEST (version 3 scores) [Carter et al., 2013] and MutationAssessor [Reva et al., 2011]), four tools specifically designed to distinguish between driver and passenger somatic mutations (CHASM [Wong et al., 2011], CanDrA [Carter et al., 2013], fathmm [Shihab et al., 2013] and transFIC [Gonzalez-Perez et al., 2012]) and four tools that leverage information from protein structures (HotSpot3D [Niu et al., 2016], HotMAPS [Tokheim et al., 2016a], 3DHotSpot.org [Gao et al., 2017] and e-Driver3D [Porta-Pardo et al., 2015])....
[...]
...…Reva et al., 2011 http://mutationassessor.org/r3/ SIFT Ng and Henikoff, 2002 http://sift.jcvi.org PolyPhen2 Adzhubei et al., 2013 http://genetics.bwh.harvard.edu/pph2/ fathmm Shihab et al., 2013 http://fathmm.biocompute.org.uk transFIC Gonzalez-Perez et…...
[...]
...REAGENT or RESOURCE | SOURCE | IDENTIFIER
Deposited Data
Public MC3 MAF | Ellrott et al., 2018 | https://gdc.cancer.gov/about-data/publications/mc3-2017
Clinical Data | Liu et al., 2018 | https://gdc.cancer.gov/about-data/publications/pancanatlas
Target Drug Database - Phial | Van Allen et al., 2014 | https://github.com/vanallenlab/2017-tcga-mc3_phial
DEPO | S.S., L.D., S.Q. Sun, R.J. Mashl, A.D. Scott, W. Wang, P. Batra, L.-B. Wang, and M.A. Wyczalkowski, unpublished data | http://depo-dinglab.ddns.net
OncoKB | Chakravarty et al., 2017 | http://oncokb.org
Mutation Validation | Ng et al., 2018 | N/A
Software and Algorithms
20/20+ | Tokheim et al., 2016b | https://github.com/KarchinLab/2020plus
MutSig2CV | Lawrence et al., 2014 | http://archive.broadinstitute.org/cancer/cga/mutsig_run
MuSiC2 | Dees et al., 2012 | https://github.com/ding-lab/MuSiC2
OncodriveCLUST | Tamborero et al., 2013a | http://bg.upf.edu/group/projects/oncodrive-clust.php
OncodriveFML | Mularoni et al., 2016 | http://bbglab.irbbarcelona.org/oncodrivefml/home
ActiveDriver | Reimand and Bader, 2013 | http://individual.utoronto.ca/reimand/ActiveDriver/
CompositeDriver | This paper | https://github.com/khuranalab/CompositeDriver
HotMAPS | Tokheim et al., 2016a | https://github.com/KarchinLab/HotMAPS
CHASM | Carter et al., 2009 | http://www.cravat.us/CRAVAT/
VEST | Carter et al., 2013 | http://www.cravat.us/CRAVAT/
e-Driver | Porta-Pardo and Godzik, 2014 | https://github.com/eduardporta/e-Driver
CanDrA | Mao et al., 2013 | http://bioinformatics.mdanderson.org/main/CanDrA
HotSpot3D | Niu et al., 2016 | https://github.com/ding-lab/hotspot3d
3DHotSpots.org | Gao et al., 2017 | http://3dhotspots.org/3d/
e-Driver3D | Porta-Pardo et al., 2015 | https://github.com/eduardporta/e-Driver
DriverNET | Bashashati et al., 2012 | http://www.shahlab.ca
OncoIMPACT | Bertrand et al., 2015 | https://github.com/CSB5/OncoIMPACT
MutationAssessor | Reva et al., 2011 | http://mutationassessor.org/r3/
SIFT | Ng and Henikoff, 2002 | http://sift.jcvi.org
PolyPhen2 | Adzhubei et al., 2013 | http://genetics.bwh.harvard.edu/pph2/
fathmm | Shihab et al., 2013 | http://fathmm.biocompute.org.uk
transFIC | Gonzalez-Perez et al., 2012 | http://bbglab.irbbarcelona.org/transfic/home
CTAT-score | This Paper | https://gdc.cancer.gov
MSIsensor | Niu et al., 2014 | https://github.com/ding-lab/msisensor...
[...]

Journal Article•DOI•

Standards and Guidelines for the Interpretation and Reporting of Sequence Variants in Cancer: A Joint Consensus Recommendation of the Association for Molecular Pathology, American Society of Clinical Oncology, and College of American Pathologists

Marilyn M. Li¹, Michael B. Datto², Eric J. Duncavage³, Shashikant Kulkarni⁴, Neal I. Lindeman⁵, Somak Roy⁶, Apostolia Maria Tsimberidou⁷, Cindy L. Vnencak-Jones⁸, Daynna J. Wolff⁹, Anas Younes¹⁰, Marina N. Nikiforova⁶

Children's Hospital of Philadelphia¹, Duke University², Washington University in St. Louis³, Baylor University⁴, Brigham and Women's Hospital⁵, University of Pittsburgh⁶, University of Texas MD Anderson Cancer Center⁷, Vanderbilt University Medical Center⁸, Medical University of South Carolina⁹, Memorial Sloan Kettering Cancer Center¹⁰

01 Jan 2017-The Journal of Molecular Diagnostics

TL;DR: A four-tiered system to categorize somatic sequence variations based on their clinical significance is proposed: tier I, variants with strong clinical significance; tier II, variants with potential clinical significance; tier III, variants of unknown clinical significance; and tier IV, variants deemed benign or likely benign.

1,113 citations

Posted Content•DOI•

Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences

Alexander Rives¹, Siddharth Goyal², Joshua Meier², Demi Guo², Myle Ott², C. Lawrence Zitnick², Jerry Ma², Rob Fergus¹,²

New York University¹, Facebook²

29 Apr 2019-bioRxiv

TL;DR: This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.

Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

748 citations

Cites background from "Predicting Functional Effect of Hum..."

...Computational variant effect predictors are useful for assessing the effect of point mutations (Gray et al., 2018; Adzhubei et al., 2013; Kumar et al., 2009; Hecht et al., 2015; Rentzsch et al., 2018)....
[...]

Journal Article•DOI•

Dilated cardiomyopathy: the complexity of a diverse genetic architecture

Ray E. Hershberger¹, Dale J. Hedges¹, Ana Morales¹

Ohio State University¹

01 Sep 2013-Nature Reviews Cardiology

TL;DR: Reassessment of assumptions about the complexity of the genomic and phenomic architecture of DCM is warranted, which will require comprehensive genomic studies in much larger cohorts of rigorously phenotyped probands and family members than previously examined.

Abstract: Remarkable progress has been made in understanding the genetic basis of dilated cardiomyopathy (DCM). Rare variants in >30 genes, some also involved in other cardiomyopathies, muscular dystrophy, or syndromic disease, perturb a diverse set of important myocardial proteins to produce a final DCM phenotype. Large, publicly available datasets have provided the opportunity to evaluate previously identified DCM-causing mutations, and to examine the population frequency of sequence variants similar to those that have been observed to cause DCM. The frequency of these variants, whether associated with dilated or hypertrophic cardiomyopathy, is greater than estimates of disease prevalence. This mismatch might be explained by one or more of the following possibilities: that the penetrance of DCM-causing mutations is lower than previously thought, that some variants are noncausal, that DCM prevalence is higher than previously estimated, or that other more-complex genomics underlie DCM. Reassessment of our assumptions about the complexity of the genomic and phenomic architecture of DCM is warranted. Much about the genomic basis of DCM remains to be investigated, which will require comprehensive genomic studies in much larger cohorts of rigorously phenotyped probands and family members than previously examined.

728 citations

Journal Article•DOI•

Genomic and Molecular Landscape of DNA Damage Repair Deficiency across The Cancer Genome Atlas

Theo A. Knijnenburg¹, Linghua Wang², Michael T. Zimmermann³, Nyasha Chambwe¹, Galen F. Gao⁴, Andrew D. Cherniack⁴, Huihui Fan⁵, Hui Shen⁵, Gregory P. Way⁶, Casey S. Greene⁶, Yuexin Liu², Rehan Akbani², Bin Feng, Lawrence A. Donehower⁷, Chase Miller⁷, Yang Shen⁸, Mostafa Karimi⁸, Haoran Chen⁸, Pora Kim⁹, Peilin Jia⁹, Eve Shinbrot⁷, Shaojun Zhang², Jianfang Liu, Hai Hu, Matthew H. Bailey¹⁰, Christina Yau¹¹, Denise M. Wolf¹², Zhongming Zhao⁹, John N. Weinstein², Lei Li¹³, Li Ding¹⁰, Gordon B. Mills², Peter W. Laird⁵, David A. Wheeler⁷, Ilya Shmulevich¹, Raymond J. Monnat¹⁴, Yonghong Xiao, Chen Wang³

Institute for Systems Biology¹, University of Texas MD Anderson Cancer Center², Mayo Clinic³, Massachusetts Institute of Technology⁴, Van Andel Institute⁵, University of Pennsylvania⁶, Baylor College of Medicine⁷, Texas A&M University⁸, University of Texas Health Science Center at Houston⁹, Washington University in St. Louis¹⁰, Buck Institute for Research on Aging¹¹, University of California, San Francisco¹², University of Texas at Austin¹³, University of Washington¹⁴

03 Apr 2018-Cell Reports

TL;DR: Frequent DDR gene alterations in many human cancers have functional consequences that may determine cancer progression and guide therapy, and a new machine-learning-based classifier developed from gene expression data made it possible to identify alterations that phenocopy deleterious TP53 mutations.

706 citations

Cites methods from "Predicting Functional Effect of Hum..."

...To estimate the probability of missense mutations being damaging, we further annotated these missense mutations using six commonly used functional prediction algorithms (Figure S1D): PolyPhen-2 (Adzhubei et al., 2013), SIFT (Kumar et al....
[...]
...…being damaging, we further annotated these missense mutations using six commonly used functional prediction algorithms (Figure S1D): PolyPhen-2 (Adzhubei et al., 2013), SIFT (Kumar et al., 2009), Mutation Taster (Schwarz et al., 2014), Mutation Assessor (Reva et al., 2011), LR and LRT (Chun…...
[...]

References

Journal Article•DOI•

The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.

Julie D. Thompson¹, Toby J. Gibson, Frederica Plewniak¹, Francois Jeanmougin¹, Desmond G. Higgins²

French Institute of Health and Medical Research¹, University College Cork²

01 Dec 1997-Nucleic Acids Research

TL;DR: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W, providing an integrated system for performing multiple sequence and profile alignments and analysing the results.

Abstract: CLUSTAL X is a new windows interface for the widely-used progressive multiple sequence alignment program CLUSTAL W. The new system is easy to use, providing an integrated system for performing multiple sequence and profile alignments and analysing the results. CLUSTAL X displays the sequence alignment in a window on the screen. A versatile sequence colouring scheme allows the user to highlight conserved features in the alignment. Pull-down menus provide all the options required for traditional multiple sequence and profile alignment. New features include: the ability to cut-and-paste sequences to change the order of the alignment, selection of a subset of the sequences to be realigned, and selection of a sub-range of the alignment to be realigned and inserted back into the original alignment. Alignment quality analysis can be performed and low-scoring segments or exceptional residues can be highlighted. Quality analysis and realignment of selected residue ranges provide the user with a powerful tool to improve and refine difficult alignments and to trap errors in input sequences. CLUSTAL X has been compiled on SUN Solaris, IRIX5.3 on Silicon Graphics, Digital UNIX on DECstations, Microsoft Windows (32 bit) for PCs, Linux ELF for x86 PCs, and Macintosh PowerMac.

38,522 citations

"Predicting Functional Effect of Hum..." refers methods in this paper

...Figure 7.20.3 Detailed results of the PolyPhen-2 analysis for a single variant query with the multiple sequence alignment and 3-D-structure protein viewer panels expanded. The multiple sequence alignment panel displays a fixed 75-residue wide window surrounding the variant’s position (the column indicated by black frame), with the alignment colored using the ClustalX (Thompson et al., 1997) scheme for all columns above 50% conservation threshold....
[...]
...…panels expanded the multiple sequence alignment panel displays a fixed 75-residue wide window surrounding the variant’s position (the column indicated by black frame), with the alignment colored using the ClustalX (Thompson et al., 1997) scheme for all columns above 50% conservation threshold....
[...]

Journal Article•DOI•

A method and server for predicting damaging missense mutations.

Ivan Adzhubei¹, Steffen Schmidt², Leonid Peshkin³, Vasily Ramensky⁴, Anna Gerasimova⁵, Peer Bork, Alexey S. Kondrashov⁵, Shamil R. Sunyaev¹

Brigham and Women's Hospital¹, Max Planck Society², Harvard University³, Engelhardt Institute of Molecular Biology⁴, University of Michigan⁵

01 Apr 2010-Nature Methods

TL;DR: Presents a new method and the corresponding software tool, PolyPhen-2, which differs from the earlier tool PolyPhen in its set of predictive features, its alignment pipeline, and its method of classification; its performance, as measured by receiver operating characteristic curves, was consistently superior.

Abstract: To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which is different from the early tool PolyPhen [1] in the set of predictive features, alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1) which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site [2]. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naive Bayes classifier (Supplementary Methods).
[Figure 1: PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar [3] (light green). UniRef100 (solid ...]
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar [3], consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior compared to PolyPhen (Fig. 1b) and it also compared favorably with the three other popular prediction tools [4–6] (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves the rate of true positive predictions of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for a lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of the amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
For a mutation, PolyPhen-2 calculates the Naive Bayes posterior probability that this mutation is damaging and reports estimates of false positive (the chance that the mutation is classified as damaging when it is in fact non-damaging) and true positive (the chance that the mutation is classified as damaging when it is indeed damaging) rates. A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
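
As a rough illustration of the Naive Bayes posterior and the three-way qualitative call described in the abstract above, the following minimal Python sketch combines per-feature likelihoods under the usual conditional-independence assumption. The function names, the example likelihood values, the uniform prior, and the 0.5 and 0.9 cutoffs are all assumptions made for illustration; they are not PolyPhen-2's actual features, priors, or thresholds, which the abstract defers to the Supplementary Methods.

```python
import math


def naive_bayes_posterior(likelihoods_damaging, likelihoods_benign, prior_damaging=0.5):
    """Return P(damaging | features) for conditionally independent features.

    likelihoods_damaging[i] is P(feature_i | damaging) and
    likelihoods_benign[i] is P(feature_i | benign). All values are
    hypothetical inputs, not PolyPhen-2 quantities.
    """
    # Work in log space so that many small likelihoods do not underflow.
    log_d = math.log(prior_damaging) + sum(math.log(p) for p in likelihoods_damaging)
    log_b = math.log(1.0 - prior_damaging) + sum(math.log(p) for p in likelihoods_benign)
    # Subtract the larger log score before exponentiating (log-sum-exp trick).
    m = max(log_d, log_b)
    weight_d = math.exp(log_d - m)
    weight_b = math.exp(log_b - m)
    return weight_d / (weight_d + weight_b)


def qualitative_call(posterior, possibly_cutoff=0.5, probably_cutoff=0.9):
    """Map a posterior to a three-way label; both cutoffs are placeholders."""
    if posterior >= probably_cutoff:
        return "probably damaging"
    if posterior >= possibly_cutoff:
        return "possibly damaging"
    return "benign"


# Two made-up features: each is more likely under "damaging" than "benign".
posterior = naive_bayes_posterior([0.8, 0.6], [0.2, 0.3])
print(round(posterior, 3), qualitative_call(posterior))  # 0.889 possibly damaging
```

With a uniform prior the posterior reduces to the ratio 0.48 / (0.48 + 0.06), so the prior only matters when it is moved away from 0.5.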

11,571 citations

"Predicting Functional Effect of Hum..." refers background or methods in this paper

...PolyPhen-2 (Adzhubei et al., 2010) is an automatic tool for prediction of the possible impact of an amino acid substitution on the structure and function of a human protein....
[...]
...…published estimate (for version 2.0.0) is that, for a false positive rate of 20%, PolyPhen-2 achieves true positive prediction rates of 92% on the HumDiv dataset and 73% on the HumVar dataset (Adzhubei et al. 2010), and our unpublished estimates for newer versions show slightly better performance....
[...]
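
The operating point quoted in the excerpt above (true positive rate at a 20% false positive rate) is a single point on a ROC curve. A minimal sketch of how such a point could be read off two lists of classifier scores follows; the function name `tpr_at_fpr` and the toy scores are hypothetical and are not taken from PolyPhen-2 or from the HumDiv and HumVar datasets.

```python
def tpr_at_fpr(scores_damaging, scores_benign, target_fpr=0.20):
    """Return (threshold, tpr, fpr) at the most permissive threshold whose
    false positive rate does not exceed target_fpr.

    A variant is called damaging when its score is >= threshold.
    """
    # Candidate thresholds are the observed scores, from strictest to loosest.
    candidates = sorted(set(scores_damaging) | set(scores_benign), reverse=True)
    best = (float("inf"), 0.0, 0.0)  # calling nothing damaging: TPR = FPR = 0
    for threshold in candidates:
        fpr = sum(s >= threshold for s in scores_benign) / len(scores_benign)
        if fpr > target_fpr:
            break  # FPR only grows as the threshold is lowered further
        tpr = sum(s >= threshold for s in scores_damaging) / len(scores_damaging)
        best = (threshold, tpr, fpr)
    return best


# Toy scores for known-damaging and known-benign variants (made up).
damaging = [0.95, 0.90, 0.80, 0.60, 0.30]
benign = [0.85, 0.40, 0.20, 0.10, 0.05]
print(tpr_at_fpr(damaging, benign))  # (0.6, 0.8, 0.2)
```

On real data the same scan is usually done with a library ROC routine and interpolation between thresholds; the explicit loop is used here only to keep the definition of the two rates visible.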

Journal Article•DOI•

Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Andrew M. Waterhouse¹, James B. Procter¹, David M. A. Martin¹, Michele Clamp¹, Geoffrey J. Barton¹

University of Dundee¹

01 May 2009-Bioinformatics

TL;DR: Jalview 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server.

Abstract: Summary: Jalview Version 2 is a system for interactive WYSIWYG editing, analysis and annotation of multiple sequence alignments. Core features include keyboard and mouse-based editing, multiple views and alignment overviews, and linked structure display with Jmol. Jalview 2 is available in two forms: a lightweight Java applet for use in web applications, and a powerful desktop application that employs web services for sequence alignment, secondary structure prediction and the retrieval of alignments, sequences, annotation and structures from public databases and any DAS 1.53 compliant sequence or annotation server. Availability: The Jalview 2 Desktop application and JalviewLite applet are made freely available under the GPL, and can be downloaded from www.jalview.org Contact: g.j.barton@dundee.ac.uk

7,926 citations

"Predicting Functional Effect of Hum..." refers background in this paper

...Jalview Version 2-a multiple sequence alignment editor and analysis workbench....
[...]
...Clicking on the link at the bottom of the alignment panel opens the Jalview (Waterhouse et al., 2009) alignment viewer applet with the complete multiple alignment loaded....
[...]
...Click the link at the bottom of the panel to open an interactive alignment viewer (Jalview, http://www.jalview.org/) (Waterhouse et al., 2009) to scroll through the complete alignment....
[...]

Book•

Molecular Evolution and Phylogenetics

Masatoshi Nei, Sudhir Kumar

15 Aug 2000

TL;DR: This chapter discusses the molecular basis of evolution, the evolution of organisms based on the fossil record, and the implications of these events for phylogenetic inference.

Abstract: 1. Molecular basis of evolution 2. Evolutionary changes of amino acid sequences 3. Evolutionary changes of DNA sequences 4. Synonymous and nonsynonymous nucleotide substitutions 5. Phylogenetic trees 6. Phylogenetic inference: Distance methods 7. Phylogenetic inference: Maximum parsimony methods 8. Phylogenetic inference: Maximum likelihood methods 9. Accuracies and statistical tests of phylogenetic trees 10. Molecular clocks and linearized trees 11. Ancestral nucleotide and amino acid sequences 12. Genetic polymorphism and evolution 13. Population trees from genetic markers 14. Perspectives Appendices A. Mathematical symbols and notations B. Geological timescale C. Geological events in the Cenozoic and Mesozoic eras D. Evolution of organisms based on the fossil record

5,629 citations

Journal Article•DOI•

Evolution and functional impact of rare coding variation from deep sequencing of human exomes

Jacob A. Tennessen¹, Abigail W. Bigham¹, Timothy D. O’Connor¹, Wenqing Fu¹, Eimear E. Kenny², Simon Gravel², Sean McGee¹, Ron Do³,⁴, Xiaoming Liu⁵, Goo Jun⁶, Hyun Min Kang⁶, Daniel M. Jordan³, Suzanne M. Leal⁷, Stacey Gabriel⁴, Mark J. Rieder¹, Gonçalo R. Abecasis⁶, David Altshuler⁴, Deborah A. Nickerson¹, Eric Boerwinkle⁵,⁷, Shamil R. Sunyaev³,⁴, Carlos Bustamante², Michael J. Bamshad¹, Joshua M. Akey¹

University of Washington¹, Stanford University², Harvard University³, Broad Institute⁴, University of Texas Health Science Center at Houston⁵, University of Michigan⁶, Baylor College of Medicine⁷

06 Jul 2012-Science

TL;DR: The findings suggest that most human variation is rare, not shared between populations, and that rare variants are likely to play a role in human health, and show that large sample sizes will be required to associate rare variants with complex traits.

Abstract: As a first step toward understanding how rare variants contribute to risk for complex diseases, we sequenced 15,585 human protein-coding genes to an average median depth of 111× in 2440 individuals of European (n = 1351) and African (n = 1088) ancestry. We identified over 500,000 single-nucleotide variants (SNVs), the majority of which were rare (86% with a minor allele frequency less than 0.5%), previously unknown (82%), and population-specific (82%). On average, 2.3% of the 13,595 SNVs each person carried were predicted to affect protein function of ~313 genes per genome, and ~95.7% of SNVs predicted to be functionally important were rare. This excess of rare functional variants is due to the combined effects of explosive, recent accelerated population growth and weak purifying selection. Furthermore, we show that large sample sizes will be required to associate rare variants with complex traits.

1,680 citations

"Predicting Functional Effect of Hum..." refers background in this paper

...…rare alleles that cause Mendelian disease (Bamshad et al., 2011), scanning for potentially medically actionable alleles in an individual’s genome (Ashley et al., 2010), and profiling the spectrum of rare variation uncovered by deep sequencing of large populations (Tennessen et al., 2012)....
[...]