Author

Nicholas A. Watts

Bio: Nicholas A. Watts is an academic researcher from the Broad Institute. The author has contributed to research in topics: Exome sequencing & Population. The author has an h-index of 8 and has co-authored 18 publications receiving 3,578 citations. Previous affiliations of Nicholas A. Watts include Massachusetts Institute of Technology & Harvard University.

Papers
Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Posted ContentDOI
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
30 Jan 2019-bioRxiv
TL;DR: Using an improved human mutation rate model, the authors classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations

Journal ArticleDOI
28 May 2020-Nature
TL;DR: A large empirical assessment of sequence-resolved structural variants from 14,891 genomes across diverse global populations in the Genome Aggregation Database (gnomAD) provides a reference map for disease-association studies, population genetics, and diagnostic screening.
Abstract: Structural variants (SVs) rearrange large segments of DNA [1] and can have profound consequences in evolution and human disease [2,3]. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD) [4] have become integral in the interpretation of single-nucleotide variants (SNVs) [5]. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage [6]. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings [7]. This SV resource is freely distributed via the gnomAD browser [8] and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

494 citations

Journal ArticleDOI
Tarjinder Singh, Timothy Poterba, David Curtis, Huda Akil, Mariam M. Al Eissa, Jack D. Barchas, Nicholas Bass, Tim B. Bigdeli, Gerome Breen, Evelyn J. Bromet, Peter F Buckley, William E. Bunney, Jonas Bybjerg-Grauholm, William Byerley, Sinéad B. Chapman, Wei J. Chen, Claire Churchhouse, Nicholas John Craddock, Caroline N. Cusick, Lynn E. DeLisi, Sheila Dodge, Michael Escamilla, Saana Eskelinen, Ayman H. Fanous, Stephen V. Faraone, Alessia Fiorentino, Laurent C. Francioli, Stacey Gabriel, Diane Gage, Sarah A Gagliano Taliun, Andrea Ganna, Giulio Genovese, David C. Glahn, Jakob Grove, Mei-Hua Hall, Eija Hämäläinen, Henrike O. Heyne, Matti Holi, David M. Hougaard, Daniel P. Howrigan, Hailiang Huang, Hai-Gwo Hwu, René S. Kahn, Hyun Min Kang, Konrad J. Karczewski, George Kirov, James A. Knowles, Francis S. Lee, Douglas S. Lehrer, Francesco Lescai, Dolores Malaspina, Stephen R. Marder, Steven A. McCarroll, Andrew M. McIntosh, Helena Medeiros, Lili Milani, Christopher P. Morley, Derek W. Morris, Preben Bo Mortensen, Richard M. Myers, Merete Nordentoft, Niamh L. O'Brien, Ana Maria Olivares, Dost Ongur, Willem H. Ouwehand, Duncan S. Palmer, Tiina Paunio, Digby Quested, Mark Hyman Rapaport, Elliott Rees, Brandi Rollins, F. Kyle Satterstrom, A. Schatzberg, Edward M. Scolnick, Laura J. Scott, Sally I. Sharp, Pamela Sklar, Jordan W. Smoller, Janet L. Sobell, Matthew Solomonson, Eli A. Stahl, Christine Stevens, Jaana Suvisaari, Grace Tiao, Stanley J. Watson, Nicholas A. Watts, Douglas Blackwood, Anders D. Børglum, Bruce M. Cohen, Aiden Corvin, Tõnu Esko, Nelson B. Freimer, Stephen J. Glatt, Christina M. Hultman, Andrew McQuillin, Aarno Palotie, Carlos N. Pato, Michele T. Pato, Ann E. Pulver, David St Clair, Ming T. Tsuang, Marquis P. Vawter, James T.R. Walters, Thomas Werge, Roel A. Ophoff, Patrick F. Sullivan, Michael John Owen, Michael Boehnke, Michael Conlon O'Donovan, Benjamin M. Neale, Mark J. Daly 
TL;DR: A meta-analysis of the whole exomes of 24,248 schizophrenia cases and 97,322 controls implicates ultra-rare coding variants in 10 genes as conferring substantial risk for schizophrenia (odds ratios of 3-50, P < 2.14 × 10⁻⁶) and 32 genes at a false discovery rate of <5%.
Abstract: Rare coding variation has historically provided the most direct connections between gene function and disease pathogenesis. By meta-analysing the whole exomes of 24,248 schizophrenia cases and 97,322 controls, we implicate ultra-rare coding variants (URVs) in 10 genes as conferring substantial risk for schizophrenia (odds ratios of 3-50, P < 2.14 × 10⁻⁶) and 32 genes at a false discovery rate of <5%. These genes have the greatest expression in central nervous system neurons and have diverse molecular functions that include the formation, structure and function of the synapse. The associations of the NMDA (N-methyl-D-aspartate) receptor subunit GRIN2A and AMPA (α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid) receptor subunit GRIA3 provide support for dysfunction of the glutamatergic system as a mechanistic hypothesis in the pathogenesis of schizophrenia. We observe an overlap of rare variant risk among schizophrenia, autism spectrum disorders [1], epilepsy and severe neurodevelopmental disorders [2], although different mutation types are implicated in some shared genes. Most genes described here, however, are not implicated in neurodevelopment. We demonstrate that genes prioritized from common variant analyses of schizophrenia are enriched in rare variant risk [3], suggesting that common and rare genetic risk factors converge at least partially on the same underlying pathogenic biological processes. Even after excluding significantly associated genes, schizophrenia cases still carry a substantial excess of URVs, which indicates that more risk genes await discovery using this approach.

218 citations

Journal ArticleDOI
28 May 2020-Nature
TL;DR: A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.
Abstract: The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) [1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the ‘proportion expressed across transcripts’, which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project [2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.

130 citations


Cited by
Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal ArticleDOI
TL;DR: The DisGeNET platform, a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.
Abstract: One of the most pressing challenges in genomic medicine is to understand the role played by genetic variation in health and disease. Thanks to the exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. However, the identification of variants of clinical relevance is a significant challenge that requires comprehensive interrogation of previous knowledge and linkage to new experimental results. To assist in this complex task, we created DisGeNET (http://www.disgenet.org/), a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, including the scientific literature. DisGeNET covers the full spectrum of human diseases as well as normal and abnormal traits. The current release covers more than 24 000 diseases and traits, 17 000 genes and 117 000 genomic variants. The latest developments of DisGeNET include new sources of data, novel data attributes and prioritization metrics, a redesigned web interface and recently launched APIs. Thanks to the data standardization, the combination of expert curated information with data automatically mined from the scientific literature, and a suite of tools for accessing its publicly available data, DisGeNET is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.

1,183 citations

Journal ArticleDOI
06 Feb 2020-Cell
TL;DR: The largest exome sequencing study of autism spectrum disorder (ASD) to date, using an enhanced analytical framework to integrate de novo and case-control rare variation, identifies 102 risk genes at a false discovery rate of 0.1 or less, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.

1,169 citations

Posted ContentDOI
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
30 Jan 2019-bioRxiv
TL;DR: Using an improved human mutation rate model, the authors classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations