SciSpace (formerly Typeset)
Author

Arcturus Wang

Other affiliations: Harvard University
Bio: Arcturus Wang is an academic researcher from the Broad Institute. The author has contributed to research on the topics Constraint (information theory) and Gene. The author has an h-index of 4 and has co-authored 7 publications receiving 3,199 citations. Previous affiliations of Arcturus Wang include Harvard University.

Papers
Journal Article
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Posted Content
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
30 Jan 2019-bioRxiv
TL;DR: Using an improved human mutation rate model, the authors classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations

Journal Article
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleina M. England, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
03 Feb 2021-Nature

56 citations

Journal Article
Sanna Gudmundsson, Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleina M. England, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
01 Sep 2021-Nature

30 citations

Posted Content
Nicola Whiffin, Konrad J. Karczewski, +190 more (22 institutions)
07 Feb 2019-bioRxiv
TL;DR: In this article, the authors describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project.
Abstract: Upstream open reading frames (uORFs) are important tissue-specific cis-regulators of protein translation. Although isolated case reports have shown that variants that create or disrupt uORFs can cause disease, genetic sequencing approaches typically focus on protein-coding regions and ignore these variants. Here, we describe a systematic genome-wide study of variants that create and disrupt human uORFs, and explore their role in human disease using 15,708 whole genome sequences collected by the Genome Aggregation Database (gnomAD) project. We show that 14,897 variants that create new start codons upstream of the canonical coding sequence (CDS), and 2,406 variants disrupting the stop site of existing uORFs, are under strong negative selection. Furthermore, variants creating uORFs that overlap the CDS show signals of selection equivalent to coding loss-of-function variants, and uORF-perturbing variants are under strong selection when arising upstream of known disease genes and genes intolerant to loss-of-function variants. Finally, we identify specific genes where perturbation of uORFs is likely to represent an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in families with neurofibromatosis. Our results highlight uORF-perturbing variants as an important and under-recognised functional class that can contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data to study the deleteriousness of specific classes of non-coding variants.

12 citations


Cited by
Journal Article
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes [1]. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations

Journal Article
TL;DR: The DisGeNET platform, a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.
Abstract: One of the most pressing challenges in genomic medicine is to understand the role played by genetic variation in health and disease. Thanks to the exploration of genomic variants at large scale, hundreds of thousands of disease-associated loci have been uncovered. However, the identification of variants of clinical relevance is a significant challenge that requires comprehensive interrogation of previous knowledge and linkage to new experimental results. To assist in this complex task, we created DisGeNET (http://www.disgenet.org/), a knowledge management platform integrating and standardizing data about disease associated genes and variants from multiple sources, including the scientific literature. DisGeNET covers the full spectrum of human diseases as well as normal and abnormal traits. The current release covers more than 24 000 diseases and traits, 17 000 genes and 117 000 genomic variants. The latest developments of DisGeNET include new sources of data, novel data attributes and prioritization metrics, a redesigned web interface and recently launched APIs. Thanks to the data standardization, the combination of expert curated information with data automatically mined from the scientific literature, and a suite of tools for accessing its publicly available data, DisGeNET is an interoperable resource supporting a variety of applications in genomic medicine and drug R&D.

1,183 citations

Journal Article
06 Feb 2020-Cell
TL;DR: The largest exome sequencing study of autism spectrum disorder (ASD) to date, using an enhanced analytical framework to integrate de novo and case-control rare variation, identifies 102 risk genes at a false discovery rate of 0.1 or less, consistent with multiple paths to an excitatory-inhibitory imbalance underlying ASD.

1,169 citations

Posted Content
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
30 Jan 2019-bioRxiv
TL;DR: Using an improved human mutation rate model, the authors classify human protein-coding genes along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations

Journal Article
04 Mar 2021-Nature
TL;DR: The GenOMICC (Genetics Of Mortality In Critical Care) genome-wide association study in 2,244 critically ill patients with COVID-19 from 208 UK intensive care units is reported, finding evidence in support of a causal link from low expression of IFNAR2, and high expression of TYK2, to life-threatening disease.
Abstract: Host-mediated lung inflammation is present [1], and drives mortality [2], in the critical illness caused by coronavirus disease 2019 (COVID-19). Host genetic variants associated with critical illness may identify mechanistic targets for therapeutic development [3]. Here we report the results of the GenOMICC (Genetics Of Mortality In Critical Care) genome-wide association study in 2,244 critically ill patients with COVID-19 from 208 UK intensive care units. We have identified and replicated the following new genome-wide significant associations: on chromosome 12q24.13 (rs10735079, P = 1.65 × 10^−8) in a gene cluster that encodes antiviral restriction enzyme activators (OAS1, OAS2 and OAS3); on chromosome 19p13.2 (rs74956615, P = 2.3 × 10^−8) near the gene that encodes tyrosine kinase 2 (TYK2); on chromosome 19p13.3 (rs2109069, P = 3.98 × 10^−12) within the gene that encodes dipeptidyl peptidase 9 (DPP9); and on chromosome 21q22.1 (rs2236757, P = 4.99 × 10^−8) in the interferon receptor gene IFNAR2. We identified potential targets for repurposing of licensed medications: using Mendelian randomization, we found evidence that low expression of IFNAR2, or high expression of TYK2, are associated with life-threatening disease; and transcriptome-wide association in lung tissue revealed that high expression of the monocyte–macrophage chemotactic receptor CCR2 is associated with severe COVID-19. Our results identify robust genetic signals relating to key host antiviral defence mechanisms and mediators of inflammatory organ damage in COVID-19. Both mechanisms may be amenable to targeted treatment with existing drugs. However, large-scale randomized clinical trials will be essential before any change to clinical practice. A genome-wide association study of critically ill patients with COVID-19 identifies genetic signals that relate to important host antiviral defence mechanisms and mediators of inflammatory organ damage that may be targeted by repurposing drug treatments.

941 citations