Author

Ivan Adzhubei

Other affiliations: Moscow State University, Institute of Cancer Research, Harvard University

Bio: Ivan Adzhubei is an academic researcher at Brigham and Women's Hospital. The author has contributed to research in the topics Medicine and Biology, has an h-index of 14, and has co-authored 24 publications receiving 18,367 citations. Previous affiliations of Ivan Adzhubei include Moscow State University and the Institute of Cancer Research.

Topics: Medicine, Biology, Circular dichroism, Genome, Exome sequencing ...

Papers

Journal Article•DOI•

A method and server for predicting damaging missense mutations.

Ivan Adzhubei¹, Steffen Schmidt², Leonid Peshkin³, Vasily Ramensky⁴, Anna Gerasimova⁵, Peer Bork, Alexey S. Kondrashov⁵, Shamil R. Sunyaev¹•Institutions (5)

Brigham and Women's Hospital¹, Max Planck Society², Harvard University³, Engelhardt Institute of Molecular Biology⁴, University of Michigan⁵

01 Apr 2010-Nature Methods

TL;DR: A new method and the corresponding software tool, PolyPhen-2, is presented; it differs from the earlier tool PolyPhen in its set of predictive features, alignment pipeline, and method of classification, and its performance, as measured by receiver operating characteristic curves, was consistently superior to that of PolyPhen.

Abstract: To the Editor: Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow. Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen [1] in the set of predictive features, alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site [2]. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naive Bayes classifier (Supplementary Methods).
Figure 1: PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar [3] (light green). UniRef100 (solid ...

We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar [3], consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging. We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b), and it also compared favorably with the three other popular prediction tools [4–6] (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves true positive rates of 92% and 73% on HumDiv and HumVar, respectively (Supplementary Table 2). One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of the amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to the opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
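The operating point quoted in this paragraph (the true positive rate reached at a 20% false positive rate) can be read off an ROC curve. The following is a minimal sketch of that calculation, assuming one score and one binary label per variant. The function name `tpr_at_fpr` and the toy scores are hypothetical and are not taken from PolyPhen-2 or its datasets.

```python
def tpr_at_fpr(scores, labels, max_fpr):
    """Return the highest true positive rate reachable with FPR <= max_fpr.

    scores: higher values mean "more likely damaging".
    labels: 1 for damaging, 0 for non-damaging.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one example of each class")

    # Sweep the decision threshold from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # FPR only grows from here on
        best_tpr = tp / positives
    return best_tpr


# Made-up scores for ten variants, five damaging and five non-damaging.
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.20))
```

Fixing the false positive rate and comparing true positive rates, as the abstract does for HumDiv and HumVar, makes two classifiers comparable at the same cost in false alarms.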
For a mutation, PolyPhen-2 calculates the Naive Bayes posterior probability that this mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods). The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
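As an illustration of how a Naive Bayes posterior of this kind is formed, the sketch below combines per-feature log-likelihood ratios with a prior and then maps the posterior to the three qualitative labels. It is a generic example, not PolyPhen-2's implementation: the function names, the prior, and the cutoffs of 0.5 and 0.9 are assumptions chosen for illustration, since the actual features and thresholds are defined in the Supplementary Methods and are not reproduced here.

```python
import math


def naive_bayes_posterior(log_likelihood_ratios, prior_damaging=0.5):
    """Posterior P(damaging | features), treating features as independent.

    log_likelihood_ratios: one value per feature, each equal to
        log(P(feature | damaging) / P(feature | non-damaging)).
    prior_damaging: assumed prior probability of the damaging class.
    """
    if not 0.0 < prior_damaging < 1.0:
        raise ValueError("prior_damaging must be strictly between 0 and 1")
    # Independence lets the per-feature evidence add up in log-odds space.
    log_odds = math.log(prior_damaging / (1.0 - prior_damaging))
    log_odds += sum(log_likelihood_ratios)
    # Numerically stable logistic function.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def qualitative_label(posterior, possibly_cutoff=0.5, probably_cutoff=0.9):
    """Map a posterior to a label; both cutoffs are illustrative placeholders."""
    if posterior >= probably_cutoff:
        return "probably damaging"
    if posterior >= possibly_cutoff:
        return "possibly damaging"
    return "benign"


# Hypothetical evidence from three features (values are made up).
p = naive_bayes_posterior([math.log(3.0), math.log(2.0), math.log(0.5)])
print(round(p, 3), qualitative_label(p))
```

Working in log-odds keeps the product of many likelihood ratios from underflowing and makes each feature's contribution additive and easy to inspect.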

11,571 citations

Journal Article•DOI•

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

Ewan Birney, John A. Stamatoyannopoulos¹, Anindya Dutta², Roderic Guigó³ +317 more•Institutions (44)

14 Jun 2007-Nature

TL;DR: Functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project are reported, providing convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts.

Abstract: We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function.

5,091 citations

Journal Article•DOI•

Predicting Functional Effect of Human Missense Mutations Using PolyPhen-2

Ivan Adzhubei¹, Daniel M. Jordan¹,², Shamil R. Sunyaev¹•Institutions (2)

Brigham and Women's Hospital¹, Harvard University²

01 Jan 2013-Current Protocols in Human Genetics

TL;DR: PolyPhen‐2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations.

Abstract: PolyPhen-2 (Polymorphism Phenotyping v2), available as software and via a Web server, predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations. It performs functional annotation of single-nucleotide polymorphisms (SNPs), maps coding SNPs to gene transcripts, extracts protein sequence annotations and structural attributes, and builds conservation profiles. It then estimates the probability of the missense mutation being damaging based on a combination of all these properties. PolyPhen-2 features include a high-quality multiple protein sequence alignment pipeline and a prediction method employing machine-learning classification. The software also integrates the UCSC Genome Browser's human genome annotations and MultiZ multiple alignments of vertebrate genomes with the human genome. PolyPhen-2 is capable of analyzing large volumes of data produced by next-generation sequencing projects, thanks to built-in support for high-performance computing environments like Grid Engine and Platform LSF.

2,681 citations

Journal Article•DOI•

Human mutation rate associated with DNA replication timing

John A. Stamatoyannopoulos¹, Ivan Adzhubei², Robert E. Thurman¹, Gregory V. Kryukov², Sergei M. Mirkin³, Shamil R. Sunyaev²•Institutions (3)

University of Washington¹, Brigham and Women's Hospital², Tufts University³

15 Mar 2009-Nature Genetics

TL;DR: It is observed that mutation rate, as reflected in recent evolutionary divergence and human nucleotide diversity, is markedly increased in later-replicating regions of the human genome, suggesting a generalized mechanism involving replication time-dependent DNA damage.

Abstract: Eukaryotic DNA replication is highly stratified, with different genomic regions shown to replicate at characteristic times during S phase. Here we observe that mutation rate, as reflected in recent evolutionary divergence and human nucleotide diversity, is markedly increased in later-replicating regions of the human genome. All classes of substitutions are affected, suggesting a generalized mechanism involving replication time-dependent DNA damage. This correlation between mutation rate and regionally stratified replication timing may have substantial evolutionary implications.

424 citations

Journal Article•DOI•

A universal trend of amino acid gain and loss in protein evolution

I. King Jordan, Fyodor A. Kondrashov¹, Ivan Adzhubei², Yuri I. Wolf, Eugene V. Koonin, Alexey S. Kondrashov, Shamil Sunyaev²•Institutions (2)

University of California, Davis¹, Brigham and Women's Hospital²

10 Feb 2005-Nature

TL;DR: A comparison of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life, with phylogenies used to polarize amino acid substitutions, shows that the expansion of initially under-represented amino acids apparently continues to this day.

Abstract: A comparison of corresponding sets of proteins encoded by closely related genes from organisms representing all three domains of life (Bacteria, Archaea and Eukaryota) suggests that the order in which the genetic code was assembled over 3.5 billion years ago continues to influence the evolution of proteins today. Across these diverse genomes, evolving proteins have accumulated Cys, Met, His, Ser and Phe, and lost many of their Pro, Ala, Glu and Gly residues. The same nine amino acids are currently accrued or lost in human proteins as shown by analysis of nucleotide polymorphisms. The amino acids with declining frequencies were probably among the first incorporated into the genetic code, and most of those with increasing frequencies were probably recruited late.

Amino acid composition of proteins varies substantially between taxa and, thus, can evolve. For example, proteins from organisms with (G + C)-rich (or (A + T)-rich) genomes contain more (or fewer) amino acids encoded by (G + C)-rich codons [1–4]. However, no universal trends in ongoing changes of amino acid frequencies have been reported. We compared sets of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life (Bacteria, Archaea and Eukaryota), and used phylogenies to polarize amino acid substitutions. Cys, Met, His, Ser and Phe accrue in at least 14 taxa, whereas Pro, Ala, Glu and Gly are consistently lost. The same nine amino acids are currently accrued or lost in human proteins, as shown by analysis of non-synonymous single-nucleotide polymorphisms. All amino acids with declining frequencies are thought to be among the first incorporated into the genetic code; conversely, all amino acids with increasing frequencies, except Ser, were probably recruited late [5–7]. Thus, expansion of initially under-represented amino acids, which began over 3,400 million years ago [8,9], apparently continues to this day.
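Polarizing a substitution means deciding which residue is ancestral and which is derived. One common way to do this with a triplet of genomes is outgroup parsimony: where two sister sequences differ and the outgroup matches one of them, the outgroup's residue is taken as ancestral. The sketch below illustrates that idea under stated assumptions. The abstract does not spell out the exact procedure, so the parsimony rule, the function name `polarize_substitutions`, and the toy sequences are all hypothetical.

```python
from collections import Counter


def polarize_substitutions(sister_a, sister_b, outgroup):
    """Count amino acid gains and losses between two aligned sister sequences.

    A site is counted only when the sisters differ and the outgroup agrees
    with exactly one of them; that shared residue is treated as ancestral.
    Returns (gains, losses) as Counters keyed by one-letter amino acid code.
    """
    if not (len(sister_a) == len(sister_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    gains, losses = Counter(), Counter()
    for a, b, o in zip(sister_a, sister_b, outgroup):
        if "-" in (a, b, o) or a == b:
            continue  # skip alignment gaps and unchanged sites
        if o == a:
            ancestral, derived = a, b
        elif o == b:
            ancestral, derived = b, a
        else:
            continue  # all three residues differ: direction is ambiguous
        losses[ancestral] += 1
        gains[derived] += 1
    return gains, losses


# Made-up aligned fragments, for illustration only.
gains, losses = polarize_substitutions("MPAGK", "MSACK", "MPAGR")
net_change = {aa: gains[aa] - losses[aa] for aa in sorted(set(gains) | set(losses))}
print(net_change)
```

Summing the net change per amino acid over many orthologous proteins gives the kind of gain and loss tally the abstract reports, with positive values for accrued residues and negative values for lost ones.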

258 citations

Cited by

Journal Article•DOI•

Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.

Sue Richards¹, Nazneen Aziz²,³, Sherri J. Bale⁴, David P. Bick⁵, Soma Das⁶, Julie M. Gastier-Foster, Wayne W. Grody⁷, Madhuri Hegde⁸, Elaine Lyon⁹, Elaine B. Spector¹⁰, Karl V. Voelkerding⁹, Heidi L. Rehm¹¹•Institutions (11)

Oregon Health & Science University¹, Boston Children's Hospital², College of American Pathologists³, GeneDx⁴, Medical College of Wisconsin⁵, University of Chicago⁶, University of California, Los Angeles⁷, Emory University⁸, University of Utah⁹, University of Colorado Denver¹⁰, Harvard University¹¹

05 Mar 2015-Genetics in Medicine

TL;DR: Because of the increased complexity of analysis and interpretation of clinical genetic testing described in this report, the ACMG strongly recommends that clinical molecular genetic testing should be performed in a Clinical Laboratory Improvement Amendments–approved laboratory, with results interpreted by a board-certified clinical molecular geneticist or molecular genetic pathologist or the equivalent.

17,834 citations

Journal Article•DOI•

An integrated encyclopedia of DNA elements in the human genome

Principal investigators¹, NHGRI groups², Data production leads³, Lead analysts³•Institutions (3)

Wellcome Trust¹, University of Washington², Pennsylvania State University³

06 Sep 2012-Nature

TL;DR: The Encyclopedia of DNA Elements project provides new insights into the organization and regulation of the authors' genes and genome, and is an expansive resource of functional annotations for biomedical research.

Abstract: The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

13,548 citations

Journal Article•DOI•

A global reference for human genetic variation.

Adam Auton¹, Gonçalo R. Abecasis², David Altshuler³, Richard Durbin⁴ +514 more•Institutions (90)

01 Oct 2015-Nature

TL;DR: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations, and has reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping.

Abstract: The 1000 Genomes Project set out to provide a comprehensive description of common human genetic variation by applying whole-genome sequencing to a diverse set of individuals from multiple populations. Here we report completion of the project, having reconstructed the genomes of 2,504 individuals from 26 populations using a combination of low-coverage whole-genome sequencing, deep exome sequencing, and dense microarray genotyping. We characterized a broad spectrum of genetic variation, in total over 88 million variants (84.7 million single nucleotide polymorphisms (SNPs), 3.6 million short insertions/deletions (indels), and 60,000 structural variants), all phased onto high-quality haplotypes. This resource includes >99% of SNP variants with a frequency of >1% for a variety of ancestries. We describe the distribution of genetic variation across the global sample, and discuss the implications for common disease studies.

12,661 citations

Journal Article•DOI•

Induced Pluripotent Stem Cell Lines Derived from Human Somatic Cells

Junying Yu¹, Maxim A. Vodyanik, Kim Smuga-Otto, Jessica Antosiewicz-Bourget, Jennifer L. Frane, Shulan Tian, Jeff Nie, Gudrun A. Jonsdottir, Victor Ruotti, Ron Stewart, Igor I. Slukvin, James A. Thomson•Institutions (1)

University of Wisconsin-Madison¹

21 Dec 2007-Science

TL;DR: This article showed that OCT4, SOX2, NANOG, and LIN28 factors are sufficient to reprogram human somatic cells to pluripotent stem cells that exhibit the essential characteristics of embryonic stem (ES) cells.

Abstract: Somatic cell nuclear transfer allows trans-acting factors present in the mammalian oocyte to reprogram somatic cell nuclei to an undifferentiated state. We show that four factors (OCT4, SOX2, NANOG, and LIN28) are sufficient to reprogram human somatic cells to pluripotent stem cells that exhibit the essential characteristics of embryonic stem (ES) cells. These induced pluripotent human stem cells have normal karyotypes, express telomerase activity, express cell surface markers and genes that characterize human ES cells, and maintain the developmental potential to differentiate into advanced derivatives of all three primary germ layers. Such induced pluripotent human cell lines should be useful in the production of new disease models and in drug development, as well as for applications in transplantation medicine, once technical limitations (for example, mutation through viral integration) are eliminated.

9,836 citations

Journal Article•DOI•

Tissue-based map of the human proteome

Mathias Uhlén¹,², Linn Fagerberg¹, Björn M. Hallström¹, Cecilia Lindskog³, Per Oksvold¹, Adil Mardinoglu⁴, Åsa Sivertsson¹, Caroline Kampf³, Evelina Sjöstedt¹,³, Anna Asplund³, IngMarie Olsson³, Karolina Edlund, Emma Lundberg¹, Sanjay Navani, Cristina Al-Khalili Szigyarto¹, Jacob Odeberg¹, Dijana Djureinovic³, Jenny Ottosson Takanen¹, Sophia Hober¹, Tove Alm¹, Per-Henrik Edqvist³, Holger Berling¹, Hanna Tegel¹, Jan Mulder³, Johan Rockberg¹, Peter Nilsson¹, Jochen M. Schwenk¹, Marica Hamsten¹, Kalle von Feilitzen¹, Mattias Forsberg¹, Lukas Persson¹, Fredric Johansson¹, Martin Zwahlen¹, Gunnar von Heijne⁵, Jens Nielsen²,⁴, Fredrik Pontén³•Institutions (5)

Royal Institute of Technology¹, Technical University of Denmark², Science for Life Laboratory³, Chalmers University of Technology⁴, Stockholm University⁵

23 Jan 2015-Science

TL;DR: This paper presents a map of the human tissue proteome based on an integrated omics approach that involves quantitative transcriptomics at the tissue and organ level, combined with tissue microarray-based immunohistochemistry, to achieve spatial localization of proteins down to the single-cell level.

Abstract: Resolving the molecular details of proteome variation in the different tissues and organs of the human body will greatly increase our knowledge of human biology and disease. Here, we present a map of the human tissue proteome based on an integrated omics approach that involves quantitative transcriptomics at the tissue and organ level, combined with tissue microarray-based immunohistochemistry, to achieve spatial localization of proteins down to the single-cell level. Our tissue-based analysis detected more than 90% of the putative protein-coding genes. We used this approach to explore the human secretome, the membrane proteome, the druggable proteome, the cancer proteome, and the metabolic functions in 32 different tissues and organs. All the data are integrated in an interactive Web-based database that allows exploration of individual proteins, as well as navigation of global expression patterns, in all major tissues and organs in the human body.

9,745 citations
