Institution
Engelhardt Institute of Molecular Biology
Facility • Moscow, Russia
About: Engelhardt Institute of Molecular Biology is a research facility based in Moscow, Russia. It is known for research contributions on the topics Gene and DNA. The organization has 2346 authors who have published 3549 publications receiving 93195 citations. The organization is also known as: Federal State Institution of Science Institute of Molecular Biology named after V.A. Engelhardt of the Russian Academy of Sciences.
Topics: Gene, DNA, RNA, Oligonucleotide, Genome
Papers
TL;DR: A new method and the corresponding software tool, PolyPhen-2, are presented. It differs from the earlier tool PolyPhen in the set of predictive features, the alignment pipeline, and the method of classification, and its performance, as measured by receiver operating characteristic curves, was consistently superior.
Abstract: To the Editor:
Applications of rapidly advancing sequencing technologies exacerbate the need to interpret individual sequence variants. Sequencing of phenotyped clinical subjects will soon become a method of choice in studies of the genetic causes of Mendelian and complex diseases. New exon capture techniques will direct sequencing efforts towards the most informative and easily interpretable protein-coding fraction of the genome. Thus, the demand for computational predictions of the impact of protein sequence variants will continue to grow.
Here we present a new method and the corresponding software tool, PolyPhen-2 (http://genetics.bwh.harvard.edu/pph2/), which differs from the earlier tool PolyPhen [1] in the set of predictive features, the alignment pipeline, and the method of classification (Fig. 1a). PolyPhen-2 uses eight sequence-based and three structure-based predictive features (Supplementary Table 1), which were selected automatically by an iterative greedy algorithm (Supplementary Methods). The majority of these features involve comparison of a property of the wild-type (ancestral, normal) allele and the corresponding property of the mutant (derived, disease-causing) allele, which together define an amino acid replacement. The most informative features characterize how well the two human alleles fit into the pattern of amino acid replacements within the multiple sequence alignment of homologous proteins, how distant the protein harboring the first deviation from the human wild-type allele is from the human protein, and whether the mutant allele originated at a hypermutable site [2]. The alignment pipeline selects the set of homologous sequences for the analysis using a clustering algorithm and then constructs and refines their multiple alignment (Supplementary Fig. 1). The functional significance of an allele replacement is predicted from its individual features (Supplementary Figs. 2–4) by a Naive Bayes classifier (Supplementary Methods).
Figure 1
PolyPhen-2 pipeline and prediction accuracy. (a) Overview of the algorithm. (b) Receiver operating characteristic (ROC) curves for predictions made by PolyPhen-2 using five-fold cross-validation on HumDiv (red) and HumVar [3] (light green). UniRef100 (solid ...
We used two pairs of datasets to train and test PolyPhen-2. We compiled the first pair, HumDiv, from all 3,155 damaging alleles with known effects on the molecular function causing human Mendelian diseases, present in the UniProt database, together with 6,321 differences between human proteins and their closely related mammalian homologs, assumed to be non-damaging (Supplementary Methods). The second pair, HumVar [3], consists of all the 13,032 human disease-causing mutations from UniProt, together with 8,946 human nsSNPs without annotated involvement in disease, which were treated as non-damaging.
We found that PolyPhen-2 performance, as presented by its receiver operating characteristic curves, was consistently superior to that of PolyPhen (Fig. 1b), and it also compared favorably with three other popular prediction tools [4–6] (Fig. 1c). For a false positive rate of 20%, PolyPhen-2 achieves a true positive rate of 92% on HumDiv and 73% on HumVar (Supplementary Table 2).
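To make the operating point quoted above concrete (the true positive rate reached at a 20% false positive rate), the following Python sketch scans score thresholds on a labelled set and reports the best true positive rate whose false positive rate stays within the limit. The function name `tpr_at_fpr` and the toy scores are hypothetical and chosen only for illustration; this is not PolyPhen-2's evaluation code, and the numbers are unrelated to HumDiv or HumVar.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.20):
    """Best true positive rate achievable while keeping FPR <= max_fpr.

    scores: predicted probability of "damaging" for each variant.
    labels: 1 for truly damaging, 0 for truly non-damaging.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one damaging and one non-damaging example")

    best_tpr = 0.0
    # Each distinct score is a candidate threshold: call "damaging" if score >= threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives <= max_fpr:
            best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Toy data (hypothetical): five damaging and five non-damaging variants.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01]
toy_labels = [1, 1, 0, 1, 1, 0, 0, 0, 0, 1]
print(tpr_at_fpr(toy_scores, toy_labels, max_fpr=0.20))
```

Reading one point off the ROC curve this way is what allows two datasets, or two tools, to be compared at the same false positive budget.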
One reason for the lower accuracy of predictions on HumVar is that nsSNPs assumed to be non-damaging in HumVar contain a sizable fraction of mildly deleterious alleles. In contrast, most of the amino acid replacements assumed non-damaging in HumDiv must be close to selective neutrality. Because alleles that are even mildly but unconditionally deleterious cannot be fixed in the evolving lineage, no method based on comparative sequence analysis is ideal for discriminating between drastically and mildly deleterious mutations, which are assigned to opposite categories in HumVar. Another reason is that HumDiv uses an extra criterion to avoid possible erroneous annotations of damaging mutations.
For each mutation, PolyPhen-2 calculates the Naive Bayes posterior probability that the mutation is damaging and reports estimates of the false positive rate (the chance that the mutation is classified as damaging when it is in fact non-damaging) and the true positive rate (the chance that the mutation is classified as damaging when it is indeed damaging). A mutation is also appraised qualitatively, as benign, possibly damaging, or probably damaging (Supplementary Methods).
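The calculation described above can be sketched generically: a Naive Bayes classifier multiplies per-feature likelihoods under each class, weights them by a prior, and normalizes. The minimal Python example below does this in log space and then maps the posterior to a three-way label. The function names, the prior of 0.5, the example likelihoods, and the cut-offs 0.5 and 0.9 are all assumptions made for illustration; the actual features and thresholds are given in the Supplementary Methods and are not reproduced here.

```python
import math


def naive_bayes_posterior(lik_damaging, lik_benign, prior_damaging=0.5):
    """Posterior P(damaging | features) under the Naive Bayes independence assumption.

    lik_damaging[i] = P(feature_i value | damaging)
    lik_benign[i]   = P(feature_i value | non-damaging)
    """
    # Work in log space so that many small likelihoods do not underflow.
    log_d = math.log(prior_damaging) + sum(math.log(p) for p in lik_damaging)
    log_b = math.log(1.0 - prior_damaging) + sum(math.log(p) for p in lik_benign)
    # Normalizing the two class scores is a logistic function of the log-odds.
    return 1.0 / (1.0 + math.exp(log_b - log_d))


def qualitative_call(posterior, possibly=0.5, probably=0.9):
    """Map a posterior to a label; both cut-offs are hypothetical."""
    if posterior >= probably:
        return "probably damaging"
    if posterior >= possibly:
        return "possibly damaging"
    return "benign"


# Two hypothetical features, each more likely under the "damaging" class.
posterior = naive_bayes_posterior([0.8, 0.6], [0.2, 0.4])
print(round(posterior, 3), qualitative_call(posterior))
```

With an equal prior, the posterior here is simply 0.48 / (0.48 + 0.08), so the effect of each feature's likelihood ratio on the final call is easy to trace by hand.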
The user can choose between HumDiv- and HumVar-trained PolyPhen-2. Diagnostics of Mendelian diseases requires distinguishing mutations with drastic effects from all the remaining human variation, including abundant mildly deleterious alleles. Thus, HumVar-trained PolyPhen-2 should be used for this task. In contrast, HumDiv-trained PolyPhen-2 should be used for evaluating rare alleles at loci potentially involved in complex phenotypes, dense mapping of regions identified by genome-wide association studies, and analysis of natural selection from sequence data, where even mildly deleterious alleles must be treated as damaging.
11,571 citations
Alistair R. R. Forrest, Hideya Kawaji, Michael Rehli, J Kenneth Baillie, +277 more • Institutions (63)
TL;DR: The authors mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body.
Abstract: Regulated transcription controls the diversity, developmental pathways and spatial organization of the hundreds of cell types that make up a mammal. Using single-molecule cDNA sequencing, we mapped transcription start sites (TSSs) and their usage in human and mouse primary cells, cell lines and tissues to produce a comprehensive overview of mammalian gene expression across the human body. We find that few genes are truly 'housekeeping', whereas many mammalian promoters are composite entities composed of several closely separated TSSs, with independent cell-type-specific expression profiles. TSSs specific to different cell types evolve at different rates, whereas promoters of broadly expressed genes are the most conserved. Promoter-based expression analysis reveals key transcription factors defining cell states and links them to binding-site motifs. The functions of identified novel transcripts can be predicted by coexpression and sample ontology enrichment analyses. The functional annotation of the mammalian genome 5 (FANTOM5) project provides comprehensive expression profiles and functional annotation of mammalian cell-type-specific transcriptomes with wide applications in biomedical research.
1,715 citations
Institutions: University of Washington, University of Queensland, Harvard University, Katholieke Universiteit Leuven, University of California, San Diego, Engelhardt Institute of Molecular Biology, Boston University, University of California, Santa Cruz, Moscow State University, University of Milan, Université libre de Bruxelles, Rockefeller University
TL;DR: The purpose of the current assessment is to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
Abstract: The prediction of regulatory elements is a problem where computational methods offer great hope. Over the past few years, numerous tools have become available for this task. The purpose of the current assessment is twofold: to provide some guidance to users regarding the accuracy of currently available tools in various settings, and to provide a benchmark of data sets for assessing future tools.
1,324 citations
TL;DR: This work has developed a straightforward and reliable method based on physical and comparative considerations that estimates the impact of an amino acid replacement on the three-dimensional structure and function of the protein.
Abstract: Single nucleotide polymorphisms (SNPs) constitute the bulk of human genetic variation, occurring with an average density of approximately 1/1000 nucleotides of a genotype. SNPs are either neutral allelic variants or are under selection of various strengths, and the impact of SNPs on fitness remains unknown. Identification of SNPs affecting human phenotype, especially leading to risks of complex disorders, is one of the key problems of medical genetics. SNPs in protein-coding regions that cause amino acid variants (non-synonymous cSNPs) are most likely to affect phenotypes. We have developed a straightforward and reliable method based on physical and comparative considerations that estimates the impact of an amino acid replacement on the three-dimensional structure and function of the protein. We estimate that approximately 20% of common human non-synonymous SNPs damage the protein. The average minor allele frequency of such SNPs in our data set was two times lower than that of benign non-synonymous SNPs. The average human genotype carries approximately 10^3 damaging non-synonymous SNPs that together cause a substantial reduction in fitness.
1,051 citations
TL;DR: It is shown that p53 protects the genome from oxidation by reactive oxygen species (ROS), a major cause of DNA damage and genetic instability. Relatively low levels of p53 are sufficient to upregulate several genes with antioxidant products, which is associated with a decrease in intracellular ROS.
Abstract: It is widely accepted that the p53 tumor suppressor restricts abnormal cells by induction of growth arrest or by triggering apoptosis. Here we show that, in addition, p53 protects the genome from oxidation by reactive oxygen species (ROS), a major cause of DNA damage and genetic instability. In the absence of severe stresses, relatively low levels of p53 are sufficient for upregulation of several genes with antioxidant products, which is associated with a decrease in intracellular ROS. Downregulation of p53 results in excessive oxidation of DNA, increased mutation rate and karyotype instability, which are prevented by incubation with the antioxidant N-acetylcysteine (NAC). Dietary supplementation with NAC prevented frequent lymphomas characteristic of Trp53-knockout mice, and slowed the growth of lung cancer xenografts deficient in p53. Our results provide a new paradigm for a nonrestrictive tumor suppressor function of p53 and highlight the potential importance of antioxidants in the prophylaxis and treatment of cancer.
1,044 citations
Authors
Showing all 2384 results
Name | H-index | Papers | Citations |
---|---|---|---|
Ruben Abagyan | 85 | 377 | 31620 |
Paolo Arosio | 84 | 460 | 25188 |
Natalia Ivanova | 81 | 543 | 35008 |
Shamil R. Sunyaev | 77 | 207 | 57138 |
William C. Merrick | 74 | 194 | 17610 |
Sankar Adhya | 68 | 228 | 15974 |
Ulrich Hübscher | 65 | 252 | 15578 |
Yan Li | 65 | 938 | 20370 |
Giuseppe Gerna | 63 | 249 | 12452 |
Emmanuel Barillot | 63 | 248 | 15847 |
Per Linse | 62 | 220 | 11038 |
Sergei A. Nedospasov | 62 | 265 | 13738 |
Fausto Baldanti | 61 | 427 | 15017 |
Andres Merits | 56 | 204 | 7807 |
Mirjam Czjzek | 53 | 152 | 9140 |