Author

Philipp Rentzsch

Bio: Philipp Rentzsch is an academic researcher at Charité. The author has contributed to research on the topics of splice and computer science, has an h-index of 2, and has co-authored 2 publications receiving 1235 citations.

Papers

Journal Article•DOI•

CADD: predicting the deleteriousness of variants throughout the human genome.

Philipp Rentzsch¹, Daniela Witten², Gregory M. Cooper, Jay Shendure², Martin Kircher¹,²•Institutions (2)

Charité¹, University of Washington²

08 Jan 2019-Nucleic Acids Research

TL;DR: This paper reviews the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38, and presents updates to the website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications.

Abstract: Combined Annotation-Dependent Depletion (CADD) is a widely used measure of variant deleteriousness that can effectively prioritize causal variants in genetic analyses, particularly highly penetrant contributors to severe Mendelian disorders. CADD is an integrative annotation built from more than 60 genomic features, and can score human single nucleotide variants and short insertions and deletions anywhere in the reference assembly. CADD uses a machine learning model trained on a binary distinction between simulated de novo variants and variants that have arisen and become fixed in human populations since the split between humans and chimpanzees; the former are free of selective pressure and may thus include both neutral and deleterious alleles, while the latter are overwhelmingly neutral (or, at most, weakly deleterious) by virtue of having survived millions of years of purifying selection. Here we review the latest updates to CADD, including the most recent version, 1.4, which supports the human genome build GRCh38. We also present updates to our website that include simplified variant lookup, extended documentation, an Application Program Interface and improved mechanisms for integrating CADD scores into other tools or applications. CADD scores, software and documentation are available at https://cadd.gs.washington.edu.

2,091 citations

Journal Article•DOI•

CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores

Philipp Rentzsch¹, Max Schubach¹, Jay Shendure², Martin Kircher¹•Institutions (2)

Charité¹, University of Washington²

22 Feb 2021-Genome Medicine

TL;DR: The authors integrated two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that was previously developed to weight and integrate diverse collections of genomic annotations.

Abstract: Splicing of genomic exons into mRNAs is a critical prerequisite for the accurate synthesis of human proteins. Genetic variants impacting splicing underlie a substantial proportion of genetic disease, but are challenging to identify beyond those occurring at donor and acceptor dinucleotides. To address this, various methods aim to predict variant effects on splicing. Recently, deep neural networks (DNNs) have been shown to achieve better results in predicting splice variants than other strategies. It has been unclear how best to integrate such process-specific scores into genome-wide variant effect predictors. Here, we use a recently published experimental data set to compare several machine learning methods that score variant effects on splicing. We integrate the best of those approaches into general variant effect prediction models and observe the effect on classification of known pathogenic variants. We integrate two specialized splicing scores into CADD (Combined Annotation Dependent Depletion; cadd.gs.washington.edu), a widely used tool for genome-wide variant effect prediction that we previously developed to weight and integrate diverse collections of genomic annotations. With this new model, CADD-Splice, we show that inclusion of splicing DNN effect scores substantially improves predictions across multiple variant categories, without compromising overall performance. While splice effect scores show superior performance on splice variants, specialized predictors cannot compete with other variant scores in general variant interpretation, as the latter account for nonsense and missense effects that do not alter splicing. Although only shown here for splice scores, we believe that the applied approach will generalize to other specific molecular processes, providing a path for the further improvement of genome-wide variant effect prediction.

252 citations

Using machine learning to predict pathogenicity of genomic variants throughout the human genome

Philipp Rentzsch

Cited by

Posted Content•DOI•

Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences

Alexander Rives¹, Siddharth Goyal², Joshua Meier², Demi Guo², Myle Ott², C. Lawrence Zitnick², Jerry Ma², Rob Fergus¹,²•Institutions (2)

New York University¹, Facebook²

29 Apr 2019-bioRxiv

TL;DR: This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.

Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

748 citations

Chromatin-state discovery and genome annotation with ChromHMM

Jason Ernst, Manolis Kellis¹•Institutions (1)

Broad Institute¹

01 Nov 2017

TL;DR: ChromHMM combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type, and provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state.

Abstract: Noncoding DNA regions have central roles in human biology, evolution, and disease. ChromHMM helps to annotate the noncoding genome using epigenomic information across one or multiple cell types. It combines multiple genome-wide epigenomic maps, and uses combinatorial and spatial mark patterns to infer a complete annotation for each cell type. ChromHMM learns chromatin-state signatures using a multivariate hidden Markov model (HMM) that explicitly models the combinatorial presence or absence of each mark. ChromHMM uses these signatures to generate a genome-wide annotation for each cell type by calculating the most probable state for each genomic segment. ChromHMM provides an automated enrichment analysis of the resulting annotations to facilitate the functional interpretations of each chromatin state. ChromHMM is distinguished by its modeling emphasis on combinations of marks, its tight integration with downstream functional enrichment analyses, its speed, and its ease of use. Chromatin states are learned, annotations are produced, and enrichments are computed within 1 d.

364 citations

Journal Article•DOI•

Structural variations in human ACE2 may influence its binding with SARS-CoV-2 spike protein.

Mushtaq Hussain¹, Nusrat Jabeen², Fozia Raza¹, Sanya Shabbir¹,², Ayesha Ashraf Baig¹, Anusha Amanullah¹, Basma Aziz¹•Institutions (2)

Dow University of Health Sciences¹, University of Karachi²

06 Apr 2020-Journal of Medical Virology

TL;DR: The data provide a structural basis of potential resistance against SARS‐CoV‐2 infection driven by ACE2 allelic variants.

Abstract: The recent pandemic of COVID-19, caused by SARS-CoV-2, is unarguably the most fearsome compared with the earlier outbreaks caused by other coronaviruses, SARS-CoV and MERS-CoV. Human ACE2 is now established as a receptor for the SARS-CoV-2 spike protein. Where variations in the viral spike protein, in turn, lead to the cross-species transmission of the virus, genetic variations in the host receptor ACE2 may also contribute to the susceptibility and/or resistance against the viral infection. This study aims to explore the binding of the proteins encoded by different human ACE2 allelic variants with SARS-CoV-2 spike protein. Briefly, coding variants of ACE2 corresponding to the reported binding sites for its attachment with coronavirus spike protein were selected and molecular models of these variants were constructed by homology modeling. The models were then superimposed over the native ACE2 and ACE2-spike protein complex, to observe structural changes in the ACE2 variants and their intermolecular interactions with SARS-CoV-2 spike protein, respectively. Despite strong overall structural similarities, the spatial orientation of the key interacting residues varies in the ACE2 variants compared with the wild-type molecule. Most ACE2 variants showed a similar binding affinity for SARS-CoV-2 spike protein as observed in the complex structure of wild-type ACE2 and SARS-CoV-2 spike protein. However, ACE2 alleles, rs73635825 (S19P) and rs143936283 (E329G) showed noticeable variations in their intermolecular interactions with the viral spike protein. In summary, our data provide a structural basis of potential resistance against SARS-CoV-2 infection driven by ACE2 allelic variants.

288 citations

Journal Article•DOI•

The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting

Peter D. Stenson¹, Matthew Mort¹, Edward V. Ball¹, Molly Chapman¹, Katy Evans¹, Luísa Azevedo¹,², Matthew J. Hayden¹, Sally Heywood¹, David Stuart Millar¹, Andrew David Phillips¹, David Neil Cooper¹•Institutions (2)

Cardiff University¹, University of Porto²

28 Jun 2020-Human Genetics

TL;DR: A review of the Human Gene Mutation Database aims to highlight how to make the most out of HGMD data in each setting.

Abstract: The Human Gene Mutation Database (HGMD®) constitutes a comprehensive collection of published germline mutations in nuclear genes that are thought to underlie, or are closely associated with, human inherited disease. At the time of writing (June 2020), the database contains in excess of 289,000 different gene lesions identified in over 11,100 genes manually curated from 72,987 articles published in over 3100 peer-reviewed journals. There are primarily two main groups of users who utilise HGMD on a regular basis: research scientists and clinical diagnosticians. This review aims to highlight how to make the most out of HGMD data in each setting.

268 citations