scispace - formally typeset
Search or ask a question
Author

Cotton Seed

Other affiliations: Harvard University, Intel
Bio: Cotton Seed is an academic researcher from Broad Institute. The author has contributed to research in topics: Genome & Khovanov homology. The author has an hindex of 19, co-authored 40 publications receiving 4013 citations. Previous affiliations of Cotton Seed include Harvard University & Intel.

Papers
More filters
Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

4,913 citations

Posted ContentDOI
Konrad J. Karczewski1, Konrad J. Karczewski2, Laurent C. Francioli2, Laurent C. Francioli1, Grace Tiao1, Grace Tiao2, Beryl B. Cummings1, Beryl B. Cummings2, Jessica Alföldi2, Jessica Alföldi1, Qingbo Wang1, Qingbo Wang2, Ryan L. Collins1, Ryan L. Collins2, Kristen M. Laricchia1, Kristen M. Laricchia2, Andrea Ganna2, Andrea Ganna1, Andrea Ganna3, Daniel P. Birnbaum1, Laura D. Gauthier1, Harrison Brand1, Harrison Brand2, Matthew Solomonson1, Matthew Solomonson2, Nicholas A. Watts1, Nicholas A. Watts2, Daniel R. Rhodes4, Moriel Singer-Berk1, Eleanor G. Seaby2, Eleanor G. Seaby1, Jack A. Kosmicki1, Jack A. Kosmicki2, Raymond K. Walters2, Raymond K. Walters1, Katherine Tashman1, Katherine Tashman2, Yossi Farjoun1, Eric Banks1, Timothy Poterba1, Timothy Poterba2, Arcturus Wang1, Arcturus Wang2, Cotton Seed2, Cotton Seed1, Nicola Whiffin5, Nicola Whiffin1, Jessica X. Chong6, Kaitlin E. Samocha7, Emma Pierce-Hoffman1, Zachary Zappala1, Zachary Zappala8, Anne H. O’Donnell-Luria9, Anne H. O’Donnell-Luria2, Anne H. O’Donnell-Luria1, Eric Vallabh Minikel1, Ben Weisburd1, Monkol Lek1, Monkol Lek10, James S. Ware5, James S. Ware1, Christopher Vittal2, Christopher Vittal1, Irina M. Armean2, Irina M. Armean1, Irina M. Armean11, Louis Bergelson1, Kristian Cibulskis1, Kristen M. Connolly1, Miguel Covarrubias1, Stacey Donnelly1, Steven Ferriera1, Stacey Gabriel1, Jeff Gentry1, Namrata Gupta1, Thibault Jeandet1, Diane Kaplan1, Christopher Llanwarne1, Ruchi Munshi1, Sam Novod1, Nikelle Petrillo1, David Roazen1, Valentin Ruano-Rubio1, Andrea Saltzman1, Molly Schleicher1, Jose Soto1, Kathleen Tibbetts1, Charlotte Tolonen1, Gordon Wade1, Michael E. Talkowski2, Michael E. Talkowski1, Benjamin M. Neale1, Benjamin M. Neale2, Mark J. Daly1, Daniel G. MacArthur1, Daniel G. MacArthur2 
30 Jan 2019-bioRxiv
TL;DR: Using an improved human mutation rate model, human protein-coding genes are classified along a spectrum representing tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.
Abstract: Summary Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted for such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes. Here, we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence pLoF variants in this cohort after filtering for sequencing and annotation artifacts. Using an improved model of human mutation, we classify human protein-coding genes along a spectrum representing intolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve gene discovery power for both common and rare diseases.

1,128 citations

Journal ArticleDOI
TL;DR: Large-scale deep-coverage whole-genome sequencing is now feasible and offers potential advantages for locus discovery and the incremental value of WGS for discovery is limited but WGS permits simultaneous assessment of monogenic and polygenic models to severe hypercholesterolemia.
Abstract: Large-scale deep-coverage whole-genome sequencing (WGS) is now feasible and offers potential advantages for locus discovery. We perform WGS in 16,324 participants from four ancestries at mean depth >29X and analyze genotypes with four quantitative traits—plasma total cholesterol, low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol, and triglycerides. Common variant association yields known loci except for few variants previously poorly imputed. Rare coding variant association yields known Mendelian dyslipidemia genes but rare non-coding variant association detects no signals. A high 2M-SNP LDL-C polygenic score (top 5th percentile) confers similar effect size to a monogenic mutation (~30 mg/dl higher for each); however, among those with severe hypercholesterolemia, 23% have a high polygenic score and only 2% carry a monogenic mutation. At these sample sizes and for these phenotypes, the incremental value of WGS for discovery is limited but WGS permits simultaneous assessment of monogenic and polygenic models to severe hypercholesterolemia.

147 citations

Journal ArticleDOI
28 May 2020-Nature
TL;DR: A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.
Abstract: The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)1, we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the ‘proportion expressed across transcripts’, which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project2 and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies. A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.

130 citations

Journal ArticleDOI
14 Nov 2019-PLOS ONE
TL;DR: It is demonstrated that this newly discovered mode of AAV binding and transduction can occur independently of other known AAV receptors, and inform ongoing efforts to develop next-generation AAV vehicles for human CNS gene therapy.
Abstract: The engineered AAV-PHP.B family of adeno-associated virus efficiently delivers genes throughout the mouse central nervous system. To guide their application across disease models, and to inspire the development of translational gene therapy vectors for targeting neurological diseases in humans, we sought to elucidate the host factors responsible for the CNS tropism of the AAV-PHP.B vectors. Leveraging CNS tropism differences across 13 mouse strains, we systematically determined a set of genetic variants that segregate with the permissivity phenotype, and rapidly identified LY6A as an essential receptor for the AAV-PHP.B vectors. Interfering with LY6A by CRISPR/Cas9-mediated Ly6a disruption or with blocking antibodies reduced transduction of mouse brain endothelial cells by AAV-PHP.eB, while ectopic expression of Ly6a increased AAV-PHP.eB transduction of HEK293T and CHO cells by 30-fold or more. Importantly, we demonstrate that this newly discovered mode of AAV binding and transduction can occur independently of other known AAV receptors. These findings illuminate the previously reported species- and strain-specific tropism characteristics of the AAV-PHP.B vectors and inform ongoing efforts to develop next-generation AAV vehicles for human CNS gene therapy.

129 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: This year's edition of the Statistical Update includes data on the monitoring and benefits of cardiovascular health in the population, metrics to assess and monitor healthy diets, an enhanced focus on social determinants of health, a focus on the global burden of cardiovascular disease, and further evidence-based approaches to changing behaviors, implementation strategies, and implications of the American Heart Association’s 2020 Impact Goals.
Abstract: Background: The American Heart Association, in conjunction with the National Institutes of Health, annually reports on the most up-to-date statistics related to heart disease, stroke, and cardiovas...

5,078 citations

Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases. A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.

4,913 citations

01 Feb 2015
TL;DR: In this article, the authors describe the integrative analysis of 111 reference human epigenomes generated as part of the NIH Roadmap Epigenomics Consortium, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression.
Abstract: The reference human genome sequence set the stage for studies of genetic variation and its association with human disease, but epigenomic studies lack a similar reference. To address this need, the NIH Roadmap Epigenomics Consortium generated the largest collection so far of human epigenomes for primary cells and tissues. Here we describe the integrative analysis of 111 reference human epigenomes generated as part of the programme, profiled for histone modification patterns, DNA accessibility, DNA methylation and RNA expression. We establish global maps of regulatory elements, define regulatory modules of coordinated activity, and their likely activators and repressors. We show that disease- and trait-associated genetic variants are enriched in tissue-specific epigenomic marks, revealing biologically relevant cell types for diverse human traits, and providing a resource for interpreting the molecular basis of human disease. Our results demonstrate the central role of epigenomic information for understanding gene regulation, cellular differentiation and human disease.

4,409 citations

Journal ArticleDOI
TL;DR: The American Heart Association, in conjunction with the National Institutes of Health, annually reports the most up-to-date statistics related to heart disease, stroke, and cardiovascul...
Abstract: Background: The American Heart Association, in conjunction with the National Institutes of Health, annually reports the most up-to-date statistics related to heart disease, stroke, and cardiovascul...

3,034 citations

Journal ArticleDOI
TL;DR: Genome-wide polygenic risk scores derived from GWAS data for five common diseases can identify subgroups of the population with risk approaching or exceeding that of a monogenic mutation.
Abstract: A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation1. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature2-5, it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk6. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

1,962 citations