Author

Pauline C. Ng

Other affiliations: Fred Hutchinson Cancer Research Center, University of Washington, J. Craig Venter Institute

Bio: Pauline C. Ng is an academic researcher at the Genome Institute of Singapore. The author has contributed to research on topics including nonsynonymous substitution and indels, has an h-index of 14, and has co-authored 19 publications that have received 16,175 citations. Previous affiliations of Pauline C. Ng include Fred Hutchinson Cancer Research Center and University of Washington.

Topics: Nonsynonymous substitution, Indel, Cilium, Protein sequencing, Exome

Papers

Journal Article

Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm.

Priyank Kumar¹, Steven Henikoff²,³, Pauline C. Ng¹,³

J. Craig Venter Institute¹, Howard Hughes Medical Institute², Fred Hutchinson Cancer Research Center³

25 Jun 2009-Nature Protocols

TL;DR: This protocol describes the use of the 'Sorting Tolerant From Intolerant' (SIFT) algorithm in predicting whether an AAS affects protein function.

Abstract: The effect of genetic mutation on phenotype is of significant interest in genetics. The type of genetic mutation that causes a single amino acid substitution (AAS) in a protein sequence is called a non-synonymous single nucleotide polymorphism (nsSNP). An nsSNP could potentially affect the function of the protein, subsequently altering the carrier's phenotype. This protocol describes the use of the 'Sorting Tolerant From Intolerant' (SIFT) algorithm in predicting whether an AAS affects protein function. To assess the effect of a substitution, SIFT assumes that important positions in a protein sequence have been conserved throughout evolution and therefore substitutions at these positions may affect protein function. Thus, by using sequence homology, SIFT predicts the effects of all possible substitutions at each position in the protein sequence. The protocol typically takes 5–20 min, depending on the input. SIFT is available as an online tool (http://sift-dna.org).

6,154 citations

Journal Article

SIFT: predicting amino acid changes that affect protein function

Pauline C. Ng¹, Steven Henikoff

Fred Hutchinson Cancer Research Center¹

01 Jul 2003-Nucleic Acids Research

TL;DR: SIFT is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study and can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms.

Abstract: Single nucleotide polymorphism (SNP) studies and random mutagenesis projects identify amino acid substitutions in protein-coding regions. Each substitution has the potential to affect protein function. SIFT (Sorting Intolerant From Tolerant) is a program that predicts whether an amino acid substitution affects protein function so that users can prioritize substitutions for further study. We have shown that SIFT can distinguish between functionally neutral and deleterious amino acid changes in mutagenesis studies and on human polymorphisms. SIFT is available at http://blocks.fhcrc.org/sift/SIFT.html.

5,318 citations

Journal Article

Predicting Deleterious Amino Acid Substitutions

Pauline C. Ng¹, Steven Henikoff

Fred Hutchinson Cancer Research Center¹

01 May 2001-Genome Research

TL;DR: A tool that uses sequence homology to predict whether a substitution affects protein function is constructed, which may be used to identify plausible disease candidates among the SNPs that cause missense substitutions.

Abstract: Many missense substitutions are identified in single nucleotide polymorphism (SNP) data and large-scale random mutagenesis projects. Each amino acid substitution potentially affects protein function. We have constructed a tool that uses sequence homology to predict whether a substitution affects protein function. SIFT, which sorts intolerant from tolerant substitutions, classifies substitutions as tolerated or deleterious. A higher proportion of substitutions predicted to be deleterious by SIFT gives an affected phenotype than substitutions predicted to be deleterious by substitution scoring matrices in three test cases. Using SIFT before mutagenesis studies could reduce the number of functional assays required and yield a higher proportion of affected phenotypes. SIFT may be used to identify plausible disease candidates among the SNPs that cause missense substitutions.

2,374 citations

Journal Article

SIFT web server: predicting effects of amino acid substitutions on proteins

Ngak-Leng Sim¹, Priyank Kumar, Jing Hu, Steven Henikoff, Georg Schneider, Pauline C. Ng

Genome Institute of Singapore¹

01 Jul 2012-Nucleic Acids Research

TL;DR: This work has updated SIFT’s genome-wide prediction tool since the last publication in 2009, and added new features to the insertion/deletion (indel) tool.

Abstract: The Sorting Intolerant from Tolerant (SIFT) algorithm predicts the effect of coding variants on protein function. It was first introduced in 2001, with a corresponding website that provides users with predictions on their variants. Since its release, SIFT has become one of the standard tools for characterizing missense variation. We have updated SIFT’s genome-wide prediction tool since our last publication in 2009, and added new features to the insertion/deletion (indel) tool. We also show accuracy metrics on independent data sets. The original developers have hosted the SIFT web server at FHCRC, JCVI and the web server is currently located at BII. The URL is http://sift-dna.org (24 May 2012, date last accessed).

1,748 citations

Journal Article

Predicting the Effects of Amino Acid Substitutions on Protein Function

Pauline C. Ng¹, Steven Henikoff

Fred Hutchinson Cancer Research Center¹

01 Sep 2006-Annual Review of Genomics and Human Genetics

TL;DR: An overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function, and the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.

Abstract: Nonsynonymous single nucleotide polymorphisms (nsSNPs) are coding variants that introduce amino acid changes in their corresponding proteins. Because nsSNPs can affect protein function, they are believed to have the largest impact on human health compared with SNPs in other regions of the genome. Therefore, it is important to distinguish those nsSNPs that affect protein function from those that are functionally neutral. Here we provide an overview of amino acid substitution (AAS) prediction methods, which use sequence and/or structure to predict the effect of an AAS on protein function. Most methods predict approximately 25–30% of human nsSNPs to negatively affect protein function, and such nsSNPs tend to be rare in the population. We discuss the utility of AAS prediction methods for Mendelian and complex diseases as well as their broader applications for understanding protein function.

957 citations

Cited by

Changes in the switching rate of Plasmodium var genes lead to antigenic variation [in English] / Paul H, Robert P, Christodoulou Z, et al. // Proc Natl Acad Sci U S A

宁北芳, 朱淮民

28 Jul 2005

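TL;DR: PfPMP1 interacts with single or multiple receptors on infected erythrocytes, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion.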

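Abstract: Antigenic variation allows many pathogenic microorganisms to evade the host immune response. Plasmodium falciparum erythrocyte surface protein 1 (PfPMP1), which is expressed on the surface of infected erythrocytes, interacts with single or multiple receptors on infected erythrocytes, endothelial cells, dendritic cells, and the placenta, and plays a key role in adhesion and immune evasion. The var gene family encodes about 60 members per haploid genome, and initiating transcription of different var gene variants provides the molecular basis for antigenic variation.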

18,940 citations

Journal Article

Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.

Sue Richards¹, Nazneen Aziz²,³, Sherri J. Bale⁴, David P. Bick⁵, Soma Das⁶, Julie M. Gastier-Foster, Wayne W. Grody⁷, Madhuri Hegde⁸, Elaine Lyon⁹, Elaine B. Spector¹⁰, Karl V. Voelkerding⁹, Heidi L. Rehm¹¹

Oregon Health & Science University¹, College of American Pathologists², Boston Children's Hospital³, GeneDx⁴, Medical College of Wisconsin⁵, University of Chicago⁶, University of California, Los Angeles⁷, Emory University⁸, University of Utah⁹, University of Colorado Denver¹⁰, Harvard University¹¹

05 Mar 2015-Genetics in Medicine

TL;DR: Because of the increased complexity of analysis and interpretation of clinical genetic testing described in this report, the ACMG strongly recommends that clinical molecular genetic testing should be performed in a Clinical Laboratory Improvement Amendments–approved laboratory, with results interpreted by a board-certified clinical molecular geneticist or molecular genetic pathologist or the equivalent.

17,834 citations

Journal Article

A method and server for predicting damaging missense mutations.

Ivan Adzhubei¹, Steffen Schmidt², Leonid Peshkin³, Vasily Ramensky⁴, Anna Gerasimova⁵, Peer Bork, Alexey S. Kondrashov⁵, Shamil R. Sunyaev¹

Brigham and Women's Hospital¹, Max Planck Society², Harvard University³, Engelhardt Institute of Molecular Biology⁴, University of Michigan⁵

01 Apr 2010-Nature Methods

TL;DR: A new method and the corresponding software tool, PolyPhen-2, is presented. It differs from the earlier tool PolyPhen in its set of predictive features, alignment pipeline, and method of classification, and its performance, as measured by receiver operating characteristic curves, was consistently superior.

Abstract: To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which is different from the early tool PolyPhen¹ in the set of predictive features, alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1) which were selected automatically by an iterative greedy algorithm (Supplementary Methods). Majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. Most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site². The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by Naive Bayes classifier (Supplementary Methods).
[Figure 1: PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar³ (light green). UniRef100 (solid ...]
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar³, consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior compared to PolyPhen (Fig. 1b) and it also compared favorably with the three other popular prediction tools⁴⁻⁶ (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves the rate of true positive predictions of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for a lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
For a mutation, PolyPhen-2 calculates Naive Bayes posterior probability that this mutation is damaging and reports estimates of false positive (the chance that the mutation is classified as damaging when it is in fact non-damaging) and true positive (the chance that the mutation is classified as damaging when it is indeed damaging) rates. A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.

11,571 citations

Journal Article

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data

Kai Wang¹, Mingyao Li¹, Hakon Hakonarson¹

Children's Hospital of Philadelphia¹

01 Sep 2010-Nucleic Acids Research

TL;DR: The ANNOVAR tool to annotate single nucleotide variants and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP is developed.

Abstract: High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.

10,461 citations

Journal Article

Data Mining Practical Machine Learning Tools and Techniques

อนิรุธ สืบสิงห์

01 Jan 2014-Journal of management science

9,185 citations
