Author

Louis Bergelson

Bio: Louis Bergelson is an academic researcher at the Broad Institute. The author has contributed to research on the topics Gene and Genome. The author has an h-index of 4 and has co-authored 5 publications receiving 3186 citations.

Papers
Journal Article
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Posted Content
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
30 Jan 2019-bioRxiv
TL;DR: Using an improved human mutation rate model, the authors classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations

Journal Article
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleina M. England, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
03 Feb 2021-Nature

56 citations

Journal Article
Sanna Gudmundsson, Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleina M. England, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
01 Sep 2021-Nature

30 citations

Posted Content
21 Mar 2022-bioRxiv
TL;DR: It is demonstrated that this genome-wide constraint map provides an effective approach for characterizing the non-coding genome and improving the identification and interpretation of functional human genetic variation.
Abstract: The depletion of disruptive variation caused by purifying natural selection (constraint) has been widely used to investigate protein-coding genes underlying human disorders, but attempts to assess constraint for non-protein-coding regions have proven more difficult. Here we aggregate, process, and release a dataset of 76,156 human genomes from the Genome Aggregation Database (gnomAD), the largest public open-access human genome reference dataset, and use this dataset to build a mutational constraint map for the whole genome. We present a refined mutational model that incorporates local sequence context and regional genomic features to detect depletions of variation across the genome. As expected, protein-coding sequences overall are under stronger constraint than non-coding regions. Within the non-coding genome, constrained regions are enriched for regulatory elements and variants implicated in complex human diseases and traits, facilitating the triangulation of biological annotation, disease association, and natural selection to non-coding DNA analysis. More constrained regulatory elements tend to regulate more constrained genes, while non-coding constraint captures additional functional information underrecognized by gene constraint metrics. We demonstrate that this genome-wide constraint map provides an effective approach for characterizing the non-coding genome and improving the identification and interpretation of functional human genetic variation.

25 citations


Cited by
Journal Article
TL;DR: The DisGeNET platform, a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.
Abstract: One of the most pressing challenges in genomic medicine is to understand the role played by genetic variation in health and disease. Thanks to the exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. However, the identification of variants of clinical relevance is a significant challenge that requires comprehensive interrogation of previous knowledge and linkage to new experimental results. To assist in this complex task, we created DisGeNET (http://www.disgenet.org/), a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, including the scientific literature. DisGeNET covers the full spectrum of human diseases as well as normal and abnormal traits. The current release covers more than 24 000 diseases and traits, 17 000 genes and 117 000 genomic variants. The latest developments of DisGeNET include new sources of data, novel data attributes and prioritization metrics, a redesigned web interface and recently launched APIs. Thanks to the data standardization, the combination of expert curated information with data automatically mined from the scientific literature, and a suite of tools for accessing its publicly available data, DisGeNET is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.

1,183 citations

Journal Article
06 Feb 2020-Cell
TL;DR: The largest exome sequencing study of autism spectrum disorder (ASD) to date, using an enhanced analytical framework to integrate de novo and case-control rare variation, identifies 102 risk genes at a false discovery rate of 0.1 or less, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.

1,169 citations

Journal Article
04 Mar 2021-Nature
TL;DR: The GenOMICC (Genetics Of Mortality In Critical Care) genome-wide association study in 2,244 critically ill patients with COVID-19 from 208 UK intensive care units is reported, finding evidence in support of a causal link from low expression of IFNAR2, and high expression of TYK2, to life-threatening disease.
Abstract: Host-mediated lung inflammation is present, and drives mortality, in the critical illness caused by coronavirus disease 2019 (COVID-19). Host genetic variants associated with critical illness may identify mechanistic targets for therapeutic development. Here we report the results of the GenOMICC (Genetics Of Mortality In Critical Care) genome-wide association study in 2,244 critically ill patients with COVID-19 from 208 UK intensive care units. We have identified and replicated the following new genome-wide significant associations: on chromosome 12q24.13 (rs10735079, P = 1.65 × 10⁻⁸) in a gene cluster that encodes antiviral restriction enzyme activators (OAS1, OAS2 and OAS3); on chromosome 19p13.2 (rs74956615, P = 2.3 × 10⁻⁸) near the gene that encodes tyrosine kinase 2 (TYK2); on chromosome 19p13.3 (rs2109069, P = 3.98 × 10⁻¹²) within the gene that encodes dipeptidyl peptidase 9 (DPP9); and on chromosome 21q22.1 (rs2236757, P = 4.99 × 10⁻⁸) in the interferon receptor gene IFNAR2. We identified potential targets for repurposing of licensed medications: using Mendelian randomization, we found evidence that low expression of IFNAR2, or high expression of TYK2, are associated with life-threatening disease; and transcriptome-wide association in lung tissue revealed that high expression of the monocyte–macrophage chemotactic receptor CCR2 is associated with severe COVID-19. Our results identify robust genetic signals relating to key host antiviral defence mechanisms and mediators of inflammatory organ damage in COVID-19. Both mechanisms may be amenable to targeted treatment with existing drugs. However, large-scale randomized clinical trials will be essential before any change to clinical practice. A genome-wide association study of critically ill patients with COVID-19 identifies genetic signals that relate to important host antiviral defence mechanisms and mediators of inflammatory organ damage that may be targeted by repurposing drug treatments.

941 citations

Journal Article
Daniel Taliun, Daniel N. Harris, Michael D. Kessler, Jedidiah Carlson, and 202 more authors (61 institutions)
10 Feb 2021-Nature
TL;DR: The Trans-Omics for Precision Medicine (TOPMed) programme aims to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases.
Abstract: The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%. The goals, resources and design of the NHLBI Trans-Omics for Precision Medicine (TOPMed) programme are described, and analyses of rare variants detected in the first 53,831 samples provide insights into mutational processes and recent human evolutionary history.

801 citations

Posted Content
29 Apr 2019-bioRxiv
TL;DR: This work uses unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million protein sequences spanning evolutionary diversity, enabling state-of-the-art supervised prediction of mutational effect and secondary structure, and improving state-of-the-art features for long-range contact prediction.
Abstract: In the field of artificial intelligence, a combination of scale in data and model capacity enabled by unsupervised learning has led to major advances in representation learning and statistical generation. In biology, the anticipated growth of sequencing promises unprecedented data on natural sequence diversity. Learning the natural distribution of evolutionary protein sequence variation is a logical step toward predictive and generative modeling for biology. To this end we use unsupervised learning to train a deep contextual language model on 86 billion amino acids across 250 million sequences spanning evolutionary diversity. The resulting model maps raw sequences to representations of biological properties without labels or prior domain knowledge. The learned representation space organizes sequences at multiple levels of biological granularity from the biochemical to proteomic levels. Learning recovers information about protein structure: secondary structure and residue-residue contacts can be extracted by linear projections from learned representations. With small amounts of labeled data, the ability to identify tertiary contacts is further improved. Learning on full sequence diversity rather than individual protein families increases recoverable information about secondary structure. We show the networks generalize by adapting them to variant activity prediction from sequences only, with results that are comparable to a state-of-the-art variant predictor that uses evolutionary and structurally derived features.

748 citations