
Showing papers by "Daniel G. MacArthur" published in 2021


Journal Article
TL;DR: The Genome Aggregation Database (gnomAD) is the largest and most widely used publicly available collection of population variation from harmonized sequencing data, and is accessible through the online gnomAD browser (https://gnomad.broadinstitute.org/), which enables rapid and intuitive variant analysis.
Abstract: Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.

92 citations
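A common use of population allele frequency in rare disease work is to set aside variants that are too common in a reference population to cause a rare, highly penetrant disorder. The sketch below computes allele frequency per population as AC/AN (alternate allele count over total called alleles) and flags a variant whose highest population frequency exceeds a cutoff. The population labels, counts, and the 1e-4 cutoff are hypothetical illustrations, not values or settings taken from the gnomAD browser.

```python
from dataclasses import dataclass


@dataclass
class PopulationCounts:
    """Allele count (AC) and allele number (AN) for one population."""
    ac: int  # observed alternate alleles
    an: int  # total called alleles (2 per called diploid genotype)


def allele_frequency(counts: PopulationCounts) -> float:
    """Allele frequency = AC / AN, defined as 0.0 when no alleles were called."""
    return counts.ac / counts.an if counts.an > 0 else 0.0


def max_population_af(pops: dict[str, PopulationCounts]) -> tuple[str, float]:
    """Return the population with the highest allele frequency and that frequency."""
    name = max(pops, key=lambda p: allele_frequency(pops[p]))
    return name, allele_frequency(pops[name])


def too_common(pops: dict[str, PopulationCounts], cutoff: float) -> bool:
    """Flag a variant whose highest population AF exceeds a (hypothetical) cutoff."""
    _, af = max_population_af(pops)
    return af > cutoff


if __name__ == "__main__":
    # Hypothetical counts for a single variant across three populations.
    variant = {
        "pop_A": PopulationCounts(ac=2, an=100_000),
        "pop_B": PopulationCounts(ac=9, an=20_000),
        "pop_C": PopulationCounts(ac=0, an=30_000),
    }
    print(max_population_af(variant))        # ('pop_B', 0.00045)
    print(too_common(variant, cutoff=1e-4))  # True
```

Taking the maximum over populations, rather than a pooled frequency, keeps a variant that is rare overall but common in one population from slipping through the filter.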


Journal Article
24 Sep 2021-Science
TL;DR: Over the next decade, the primary challenge in human genetics will be to understand the biological mechanisms by which genetic variants influence phenotypes, including disease risk.
Abstract: Over the next decade, the primary challenge in human genetics will be to understand the biological mechanisms by which genetic variants influence phenotypes, including disease risk. Although the sc...

61 citations


Journal Article
Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleina M. England, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
03 Feb 2021-Nature

56 citations


Journal Article
Sanna Gudmundsson, Konrad J. Karczewski, Laurent C. Francioli, Grace Tiao, Beryl B. Cummings, Jessica Alföldi, Qingbo Wang, Ryan L. Collins, Kristen M. Laricchia, Andrea Ganna, Daniel P. Birnbaum, Laura D. Gauthier, Harrison Brand, Matthew Solomonson, Nicholas A. Watts, Daniel R. Rhodes, Moriel Singer-Berk, Eleina M. England, Eleanor G. Seaby, Jack A. Kosmicki, Raymond K. Walters, Katherine Tashman, Yossi Farjoun, Eric Banks, Timothy Poterba, Arcturus Wang, Cotton Seed, Nicola Whiffin, Jessica X. Chong, Kaitlin E. Samocha, Emma Pierce-Hoffman, Zachary Zappala, Anne H. O’Donnell-Luria, Eric Vallabh Minikel, Ben Weisburd, Monkol Lek, James S. Ware, Christopher Vittal, Irina M. Armean, Louis Bergelson, Kristian Cibulskis, Kristen M. Connolly, Miguel Covarrubias, Stacey Donnelly, Steven Ferriera, Stacey Gabriel, Jeff Gentry, Namrata Gupta, Thibault Jeandet, Diane Kaplan, Christopher Llanwarne, Ruchi Munshi, Sam Novod, Nikelle Petrillo, David Roazen, Valentin Ruano-Rubio, Andrea Saltzman, Molly Schleicher, Jose Soto, Kathleen Tibbetts, Charlotte Tolonen, Gordon Wade, Michael E. Talkowski, Benjamin M. Neale, Mark J. Daly, Daniel G. MacArthur
01 Sep 2021-Nature

30 citations


Journal Article
TL;DR: The expression modifier score (EMS) is used as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and is incorporated into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
Abstract: The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effect on gene expression in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors have thus lacked large gold-standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.

29 citations
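One way a functional score such as EMS can enter fine-mapping is as a non-uniform prior on which variant is causal. The sketch below assumes a single causal variant per locus, a simplification the authors' fine-mapping setup need not share, so that each variant's posterior inclusion probability (PIP) is its prior weight times its Bayes factor, renormalized over the locus. The score-to-prior mapping (normalizing scores to sum to one) and all numbers are hypothetical.

```python
import math


def posterior_inclusion_probs(log_bfs, prior_scores):
    """Single-causal-variant PIPs: PIP_j = pi_j * BF_j / sum_k(pi_k * BF_k).

    log_bfs: natural-log Bayes factors for association, one per variant.
    prior_scores: non-negative functional scores (e.g. an EMS-like score),
        normalized here into prior probabilities pi_j that sum to 1.
    """
    if len(log_bfs) != len(prior_scores):
        raise ValueError("need one prior score per variant")
    total = sum(prior_scores)
    if total <= 0:
        raise ValueError("prior scores must have a positive sum")
    priors = [s / total for s in prior_scores]
    # Combine in log space; variants with zero prior get weight zero.
    log_w = [math.log(p) + lbf if p > 0 else -math.inf
             for p, lbf in zip(priors, log_bfs)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]


if __name__ == "__main__":
    # Hypothetical locus: two variants in tight LD with identical association
    # evidence, plus a weakly associated third variant.
    log_bfs = [5.0, 5.0, 1.0]
    uniform = posterior_inclusion_probs(log_bfs, [1.0, 1.0, 1.0])
    informed = posterior_inclusion_probs(log_bfs, [0.9, 0.1, 0.5])
    print([round(p, 3) for p in uniform])
    print([round(p, 3) for p in informed])
```

With a uniform prior the two tied variants split the posterior evenly; the functional prior breaks the tie, which is the mechanism by which a prior can raise some variants above a PIP threshold that the genetic data alone cannot.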


Posted Content
23 Jun 2021-medRxiv
TL;DR: The full manuscript has been temporarily withdrawn by the authors at the request of UK Biobank; the results supporting it remain valid and can be found at https://genebass.org.
Abstract: This manuscript has been temporarily withdrawn by the authors upon request from UK Biobank. Results supporting this manuscript remain valid and can be found at https://genebass.org. The full manuscript will be re-uploaded pending decision by UK Biobank. If you have any questions, please contact the corresponding author.

28 citations


Posted Content
31 Aug 2021-medRxiv
TL;DR: Mendelian disease genomic research has undergone a massive transformation over the last decade, and the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration.
Abstract: Mendelian disease genomic research has undergone a massive transformation over the last decade. With increasing availability of exome and genome sequencing, the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration. Over the last 10 years, the NIH-supported Centers for Mendelian Genomics (CMGs) have played a major role in this research and clinical evolution. We highlight the cumulative gene discoveries facilitated by the program, biomedical research leveraged by the approach, and the larger impact on the research community. Mendelian genomic research extends beyond generating lists of gene-phenotype relationships; it includes developing tools, training the larger community to use these tools and approaches, and facilitating collaboration through data sharing. Thus, the CMGs have also focused on creating resources, tools, and training for the larger community to foster the understanding of genes and genome variation. The CMGs have participated in a wide range of data sharing activities, including deposition of all eligible CMG data into AnVIL (NHGRI’s Genomic Data Science Analysis, Visualization, and Informatics Lab-Space), sharing candidate genes through Matchmaker Exchange (MME) and the CMG website, and sharing variants in Geno2MP and VariantMatcher. The research genomics output remains exploratory, with evidence that thousands of disease genes, in which variant alleles contribute to disease, remain undiscovered, and many patients with rare disease remain molecularly undiagnosed.
Strengthening communication between research and clinical labs, continued development and sharing of knowledge and tools required for solving previously unsolved cases, and improving access to data sets, including high-quality metadata, are all required to continue to advance Mendelian genomics research and continue to leverage the Human Genome Project for basic biomedical science research and clinical utility.

22 citations


Journal Article
TL;DR: The authors describe the diagnostic utility of whole-genome sequencing and RNA studies in boys with suspected dystrophinopathy, for whom multiplex ligation-dependent probe amplification and exomic parallel sequencing failed to yield a genetic diagnosis, and use remnant normal DMD splicing in 3 families to define critical levels of wild-type dystrophin bridging the clinical spectrum from Duchenne to myalgia.
Abstract: Objective: To describe the diagnostic utility of whole-genome sequencing and RNA studies in boys with suspected dystrophinopathy, for whom multiplex ligation-dependent probe amplification and exomic parallel sequencing failed to yield a genetic diagnosis, and to use remnant normal DMD splicing in 3 families to define critical levels of wild-type dystrophin bridging the clinical spectrum from Duchenne to myalgia. Methods: Exome, genome, and/or muscle RNA sequencing was performed for 7 males with elevated creatine kinase. PCR of muscle-derived complementary DNA (cDNA) was used to study the consequences for DMD pre-messenger RNA (pre-mRNA) splicing. Quantitative Western blot was used to determine levels of dystrophin relative to control muscle. Results: Splice-altering intronic single nucleotide variants or structural rearrangements in DMD were identified in all 7 families. Four individuals, with abnormal splicing causing a premature stop codon and nonsense-mediated decay, expressed remnant levels of normally spliced DMD mRNA. Quantitative Western blot enabled correlation of wild-type dystrophin levels with clinical severity: 0%–5% dystrophin conferred a Duchenne phenotype, 10% ± 2% a Becker phenotype, and 15% ± 2% was associated with myalgia without manifesting weakness. Conclusions: Whole-genome sequencing relied heavily on RNA studies to identify DMD splice-altering variants. Short-read RNA sequencing was regularly confounded by the effectiveness of nonsense-mediated mRNA decay and the low read depth of the giant DMD mRNA. PCR of muscle cDNA provided a simple yet informative approach. Highly relevant to genetic therapies for dystrophinopathies, our data align strongly with previous studies of mutant dystrophin in Becker muscular dystrophy, with the collective conclusion that a fractional increase in levels of normal dystrophin between 5% and 20% is clinically significant.

22 citations
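The dystrophin percentages above are expressed relative to control muscle. A generic way to arrive at such a figure from a quantitative Western blot is to divide each dystrophin band by a loading-control band from the same lane, then express the sample ratio as a percentage of the control ratio. The sketch below assumes background-subtracted densitometry intensities; the loading control and all intensity values are hypothetical, and the study's actual quantification procedure (for example, a dilution series of control lysate) may differ.

```python
def percent_of_control(dys_sample: float, load_sample: float,
                       dys_control: float, load_control: float) -> float:
    """Dystrophin in a sample as a percentage of control muscle.

    Each dystrophin band intensity is divided by its lane's loading-control
    intensity to correct for unequal protein loading; the sample ratio is
    then expressed relative to the control ratio.
    """
    if min(load_sample, load_control, dys_control) <= 0:
        raise ValueError("loading-control and control dystrophin intensities must be positive")
    sample_ratio = dys_sample / load_sample
    control_ratio = dys_control / load_control
    return 100.0 * sample_ratio / control_ratio


if __name__ == "__main__":
    # Hypothetical background-subtracted band intensities (arbitrary units).
    pct = percent_of_control(dys_sample=120, load_sample=1000,
                             dys_control=1000, load_control=800)
    print(round(pct, 1))  # 9.6
```

Normalizing to a loading control before comparing to control muscle matters here because the clinically relevant differences reported (a few percentage points) are of the same order as typical lane-to-lane loading variation.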


Journal Article
14 Jul 2021
TL;DR: Common mutations are identified in the 5-methylcytosine reader ZBTB33, as well as in YLPM1, SRCAP, and ZNF318; ZBTB33 mutations potentially link DNA methylation and RNA splicing, the two most commonly mutated pathways in clonal hematopoiesis and myelodysplastic syndrome (MDS).
Abstract: Clonal hematopoiesis results from somatic mutations in cancer driver genes in hematopoietic stem cells. We sought to identify novel drivers of clonal expansion using an unbiased analysis of sequencing data from 84,683 persons and identified common mutations in the 5-methylcytosine reader ZBTB33, as well as in YLPM1, SRCAP, and ZNF318. We also identified these mutations at low frequency in patients with myelodysplastic syndrome (MDS). Zbtb33-edited mouse hematopoietic stem and progenitor cells exhibited a competitive advantage in vivo and increased genome-wide intron retention. ZBTB33 mutations potentially link DNA methylation and RNA splicing, the two most commonly mutated pathways in clonal hematopoiesis and MDS.

15 citations


Posted Content
28 Oct 2021-medRxiv
TL;DR: seqr is an open-source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets.
Abstract: Exome and genome sequencing have become the tools of choice for rare disease diagnosis, leading to large amounts of data available for analysis. To identify causal variants in these datasets, powerful filtering and decision support tools that can be efficiently used by clinicians and researchers are required. To address this need, we developed seqr, an open-source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets. To date, seqr is being used in several research pipelines and one clinical diagnostic lab. In our own experience through the Broad Institute Center for Mendelian Genomics, seqr has enabled analyses of over 10,000 families, supporting the diagnosis of more than 3,800 individuals with rare disease and the discovery of over 300 novel disease genes. Here we describe a framework for genomic analysis in rare disease that leverages seqr's capabilities for variant filtration, annotation, and causal variant identification, as well as support for research collaboration and data sharing. The seqr platform is available as open-source software, allowing low-cost participation in rare disease research and in a community effort to support diagnosis and gene discovery in rare disease.

14 citations
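Family-based analysis of the kind described for seqr rests on checking each variant's genotypes against an inheritance model. The sketch below covers an autosomal proband-mother-father trio with unaffected parents, encoding genotypes as alternate-allele counts (0, 1, 2). The function names, the `Mode` enum, and the genotype encoding are hypothetical and are not seqr's API; compound heterozygosity (which needs pairs of variants in one gene) and X-linked models are deliberately left out.

```python
from enum import Enum


class Mode(Enum):
    DE_NOVO = "de novo"
    HOMOZYGOUS_RECESSIVE = "homozygous recessive"


def matches(mode: Mode, proband: int, mother: int, father: int) -> bool:
    """Check autosomal trio genotypes (alt-allele counts 0/1/2) against an inheritance mode."""
    if mode is Mode.DE_NOVO:
        # Heterozygous in the affected child, absent from both unaffected parents.
        return proband == 1 and mother == 0 and father == 0
    if mode is Mode.HOMOZYGOUS_RECESSIVE:
        # Homozygous alternate in the child; each unaffected parent is a heterozygous carrier.
        return proband == 2 and mother == 1 and father == 1
    raise ValueError(f"unsupported mode: {mode}")


def filter_variants(calls: dict[str, tuple[int, int, int]], mode: Mode) -> list[str]:
    """Keep variant IDs whose (proband, mother, father) genotypes fit the mode."""
    return [vid for vid, (p, m, f) in calls.items() if matches(mode, p, m, f)]


if __name__ == "__main__":
    # Hypothetical variant IDs and trio genotypes.
    calls = {"var1": (1, 0, 0), "var2": (2, 1, 1), "var3": (1, 1, 0)}
    print(filter_variants(calls, Mode.DE_NOVO))               # ['var1']
    print(filter_variants(calls, Mode.HOMOZYGOUS_RECESSIVE))  # ['var2']
```

In practice such genotype checks are combined with the population-frequency, consequence, and quality filters that a tool like seqr exposes; the segregation test alone only narrows the candidate list.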


Posted Content
23 Jan 2021-bioRxiv
TL;DR: The authors present a large human long-read RNA-seq dataset generated with the Oxford Nanopore Technologies platform from 88 samples of GTEx tissues and cell lines, complementing the GTEx resource.
Abstract: Regulation of transcript structure generates transcript diversity and plays an important role in human disease. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure. In this paper, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from GTEx tissues and cell lines, complementing the GTEx resource. We identified just under 100,000 new transcripts for annotated genes, and validated the protein expression of a similar proportion of novel and annotated transcripts. We developed a new computational package, LORALS, to analyze genetic effects of rare and common variants on the transcriptome via allele-specific analysis of long reads. We called allele-specific expression and transcript structure events, providing novel insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we use this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
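Allele-specific expression at a heterozygous site is often assessed by asking whether reads carrying the two alleles depart from an even split. The sketch below applies an exact two-sided binomial test against a 50:50 expectation. It is a generic illustration, not the LORALS implementation, and it ignores reference-mapping bias and overdispersion, which practical pipelines often model (for example, with a beta-binomial).

```python
from math import comb


def binomial_ase_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Two-sided exact binomial test of ref vs alt read counts against 50:50.

    Under balanced expression each read at a heterozygous site is equally
    likely to carry either allele, so the count of one allele follows
    Binomial(n, 0.5). Because that distribution is symmetric, the two-sided
    p-value is twice the smaller tail probability, capped at 1.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, no evidence of imbalance
    k = min(ref_reads, alt_reads)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    print(binomial_ase_pvalue(5, 5))    # 1.0
    print(binomial_ase_pvalue(10, 0))   # 0.001953125
    print(binomial_ase_pvalue(60, 40))  # moderate imbalance at higher depth
```

Because long reads span multiple heterozygous sites, counts can in principle be aggregated per haplotype across a transcript rather than per site, which raises n and therefore power for the same test.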

Journal Article
TL;DR: The authors investigated the effect of PLXNA1 variants on the phenotype of patients with autosomal dominant and recessive inheritance patterns, and functionally characterized the zebrafish homologs plxna1a and plxna1b during development.

Journal Article
TL;DR: Using exome sequencing (ES) and genome sequencing (GS), the authors identified three unrelated probands with CFEOM who harbored novel heterozygous TUBA1A missense variants c.1216C>G, p.(His406Asp); c.467G>A, p.(Arg156His); and c.1193T>G, p.(Met398Arg).
Abstract: Variants in multiple tubulin genes have been implicated in neurodevelopmental disorders, including malformations of cortical development (MCD) and congenital fibrosis of the extraocular muscles (CFEOM). Distinct missense variants in the beta-tubulin encoding genes TUBB3 and TUBB2B cause MCD, CFEOM, or both, suggesting substitution-specific mechanisms. Variants in the alpha tubulin-encoding gene TUBA1A have been associated with MCD, but not with CFEOM. Using exome sequencing (ES) and genome sequencing (GS), we identified 3 unrelated probands with CFEOM who harbored novel heterozygous TUBA1A missense variants c.1216C>G, p.(His406Asp); c.467G>A, p.(Arg156His); and c.1193T>G, p.(Met398Arg). MRI revealed small oculomotor-innervated muscles and asymmetrical caudate heads and lateral ventricles with or without corpus callosal thinning. Two of the three probands had MCD. Mutated amino acid residues localize either to the longitudinal interface at which α and β tubulins heterodimerize (Met398, His406) or to the lateral interface at which tubulin protofilaments interact (Arg156), and His406 interacts with the motor domain of kinesin-1. This series of individuals supports TUBA1A variants as a cause of CFEOM and expands our knowledge of tubulinopathies.

Journal Article
TL;DR: The genotypic spectrum of XLMTM is expanded, and the benefits of screening non-coding regions of MTM1 in male probands with phenotypically concordant XLMTM who remain undiagnosed following exome sequencing are highlighted.
Abstract: X-linked myotubular myopathy (XLMTM) is a severe congenital myopathy characterised by generalised weakness and respiratory insufficiency. XLMTM is associated with pathogenic variants in MTM1, a gene encoding the lipid phosphatase myotubularin. Whole genome sequencing (WGS) of an exome-negative male proband with severe hypotonia, respiratory insufficiency and centralised nuclei on muscle biopsy identified a deep intronic MTM1 variant NG_008199.1(NM_000252.2):c.1468-577A>G, which strengthened a cryptic 5' splice site (A>G substitution at the +5 position). Muscle RNA sequencing was non-diagnostic due to low read depth. Reverse transcription PCR (RT-PCR) of muscle RNA confirmed that the c.1468-577A>G variant activates inclusion of a pseudo-exon encoding a premature stop codon into all detected MTM1 transcripts. Western blot analysis established deficiency of myotubularin protein, consistent with the severe XLMTM phenotype. We expand the genotypic spectrum of XLMTM and highlight benefits of screening non-coding regions of MTM1 in male probands with phenotypically concordant XLMTM who remain undiagnosed following exome sequencing.
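In HGVS coding-DNA notation, a variant such as c.1468-577A>G is anchored to the nearest exonic nucleotide: c.1468 is the first nucleotide of an exon, and the offset -577 places the substitution 577 nucleotides upstream of it, in the preceding intron, which is why it is invisible to exome capture. The sketch below is a hypothetical helper that parses only simple substitutions of the forms c.N, c.N+K, and c.N-K; a full HGVS parser also handles deletions, duplications, insertions, and UTR positions written with `*` or a leading `-`.

```python
import re
from dataclasses import dataclass

# c.<position>[(+|-)<offset>]<ref>><alt>, e.g. c.1216C>G or c.1468-577A>G
_SUBST = re.compile(
    r"^c\.(?P<pos>\d+)(?P<offset>[+-]\d+)?(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


@dataclass(frozen=True)
class CodingSubstitution:
    position: int       # anchoring coding-DNA nucleotide
    intron_offset: int  # 0 if exonic; <0 upstream of an exon start, >0 downstream of an exon end
    ref: str
    alt: str

    @property
    def is_intronic(self) -> bool:
        return self.intron_offset != 0


def parse_coding_substitution(hgvs: str) -> CodingSubstitution:
    """Parse a simple c. substitution, ignoring any reference-sequence prefix before ':'."""
    desc = hgvs.split(":")[-1]
    m = _SUBST.match(desc)
    if m is None:
        raise ValueError(f"not a simple c. substitution: {hgvs!r}")
    offset = int(m["offset"]) if m["offset"] else 0
    return CodingSubstitution(int(m["pos"]), offset, m["ref"], m["alt"])


if __name__ == "__main__":
    v = parse_coding_substitution("NG_008199.1(NM_000252.2):c.1468-577A>G")
    print(v.position, v.intron_offset, v.is_intronic)  # 1468 -577 True
```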

Journal Article
TL;DR: Large-cohort trio-based exome sequencing and international data sharing were used to identify 24 unrelated individuals with neurodevelopmental disorder (NDD) phenotypes and a variant in GNAI1, which encodes the inhibitory Gαi1 subunit of heterotrimeric G-proteins.

Journal Article
01 Feb 2021-Nature
TL;DR: A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-03176-6.

Journal Article
TL;DR: The Undiagnosed Diseases Program-Victoria, an Australian programme embedded within a clinical genetics service in the state of Victoria with a focus on paediatric rare diseases, used family-based exome sequencing (family ES), family-based genome sequencing (family GS), RNA sequencing (RNA-seq) and high-resolution chromosomal microarray (CMA) with research-based analysis.
Abstract: Background: Clinical exome sequencing typically achieves diagnostic yields of 30%–57.5% in individuals with monogenic rare diseases. Undiagnosed diseases programmes implement strategies to improve diagnostic outcomes for these individuals. Aim: We share the lessons learnt from the first 3 years of the Undiagnosed Diseases Program-Victoria, an Australian programme embedded within a clinical genetics service in the state of Victoria with a focus on paediatric rare diseases. Methods: We enrolled families who remained without a diagnosis after clinical genomic (panel, exome or genome) sequencing between 2016 and 2018. We used family-based exome sequencing (family ES), family-based genome sequencing (family GS), RNA sequencing (RNA-seq) and high-resolution chromosomal microarray (CMA) with research-based analysis. Results: In 150 families, we achieved a diagnosis or strong candidate in 64 (42.7%): 37 in known genes with a consistent phenotype, 3 in known genes with a novel phenotype and 24 in novel disease genes. Fifty-four diagnoses or strong candidates were made by family ES, six by family GS with RNA-seq, two by high-resolution CMA and two by data reanalysis. Conclusion: We share our lessons learnt from the programme. Flexible implementation of multiple strategies allowed for scalability and response to the availability of new technologies. Broad implementation of family ES with research-based analysis showed promising yields after a negative clinical singleton ES. RNA-seq offered multiple benefits in family ES-negative populations. International data sharing strategies were critical in facilitating collaborations to establish novel disease-gene associations. Finally, the integrated approach of a multiskilled, multidisciplinary team was fundamental to having diverse perspectives and strategic decision-making.
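The headline yield and its two breakdowns (by gene category and by method) can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the Results section of the abstract.

```python
# Counts stated in the Results section of the abstract above.
families = 150
by_category = {
    "known gene, consistent phenotype": 37,
    "known gene, novel phenotype": 3,
    "novel disease gene": 24,
}
by_method = {
    "family ES": 54,
    "family GS with RNA-seq": 6,
    "high-resolution CMA": 2,
    "data reanalysis": 2,
}

solved = sum(by_category.values())
# Both breakdowns should account for the same solved families.
assert solved == sum(by_method.values()) == 64

yield_fraction = solved / families
print(f"diagnostic yield: {solved}/{families} = {yield_fraction:.1%}")
# prints "diagnostic yield: 64/150 = 42.7%"
```

Both breakdowns sum to 64, and 64/150 rounds to the reported 42.7%.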

Posted Content
23 Jul 2021-bioRxiv
TL;DR: The authors present a pipeline to call mtDNA variants that addresses three technical challenges: (i) detecting homoplasmic and heteroplasmic variants, present respectively in all or a fraction of mtDNA molecules, (ii) the circular mtDNA genome, and (iii) misalignment of nuclear sequences of mitochondrial origin (NUMTs).
Abstract: Databases of allele frequency are extremely helpful for evaluating clinical variants of unknown significance; however, until now, genetic databases such as the Genome Aggregation Database (gnomAD) have ignored the mitochondrial genome (mtDNA). Here we present a pipeline to call mtDNA variants that addresses three technical challenges: (i) detecting homoplasmic and heteroplasmic variants, present respectively in all or a fraction of mtDNA molecules, (ii) the circular mtDNA genome, and (iii) misalignment of nuclear sequences of mitochondrial origin (NUMTs). We observed that mtDNA copy number per cell varied across gnomAD cohorts and influenced the fraction of NUMT-derived false-positive variant calls, which can account for the majority of putative heteroplasmies. To avoid false positives, we excluded samples prone to NUMT misalignment (few mtDNA copies per cell), samples prone to cell line artifacts (many mtDNA copies per cell), and samples with contamination, and we reported variants with heteroplasmy greater than 10%. We applied this pipeline to 56,434 whole genome sequences in the gnomAD v3.1 database that includes individuals of European (58%), African (25%), Latino (10%), and Asian (5%) ancestry. Our gnomAD v3.1 release contains population frequencies for 10,850 unique mtDNA variants at more than half of all mtDNA bases. Importantly, we report frequencies within each nuclear ancestral population and mitochondrial haplogroup. Homoplasmic variants account for most variant calls (98%) and unique variants (85%). We observed that 1/250 individuals carry a pathogenic mtDNA variant with heteroplasmy above 10%. These mitochondrial population allele frequencies are publicly available at gnomad.broadinstitute.org and will aid in diagnostic interpretation and research studies.
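The reporting rule in the abstract can be sketched as follows: heteroplasmy is the fraction of reads at a position that carry the alternate allele, calls at or below 10% are not reported, and the remaining calls are split into heteroplasmic and homoplasmic. The 95% homoplasmy cutoff and the copy-number estimate (twice the mean mtDNA coverage divided by the mean nuclear coverage, the factor 2 reflecting the diploid nuclear genome) are assumptions labeled in the code, not parameters quoted from the abstract; the sample-level exclusions (contamination, copy-number extremes) are omitted.

```python
from typing import Optional

REPORT_MIN = 0.10      # from the abstract: report variants with heteroplasmy > 10%
HOMOPLASMY_MIN = 0.95  # assumption: level at or above which a call is treated as homoplasmic


def heteroplasmy(alt_reads: int, depth: int) -> float:
    """Fraction of mtDNA reads at a position that carry the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def classify_call(alt_reads: int, depth: int) -> Optional[str]:
    """Return 'homoplasmic', 'heteroplasmic', or None if the call is not reported."""
    level = heteroplasmy(alt_reads, depth)
    if level <= REPORT_MIN:
        return None
    return "homoplasmic" if level >= HOMOPLASMY_MIN else "heteroplasmic"


def mtdna_copy_number(mt_mean_coverage: float, nuclear_mean_coverage: float) -> float:
    """Approximate mtDNA copies per cell (assumed formula: 2 * mt coverage / nuclear coverage)."""
    if nuclear_mean_coverage <= 0:
        raise ValueError("nuclear coverage must be positive")
    return 2.0 * mt_mean_coverage / nuclear_mean_coverage


if __name__ == "__main__":
    # Hypothetical alt-read counts at a depth of 1,000 reads.
    for alt in (50, 300, 990):
        print(alt, classify_call(alt, 1000))
    print(mtdna_copy_number(3000.0, 30.0))  # 200.0
```

A low copy-number estimate means nuclear reads make up a larger share of the reads aligned to mtDNA, which is why, per the abstract, such samples were prone to NUMT-derived false-positive heteroplasmies.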




Posted Content
TL;DR: The Genome Aggregation Database (gnomAD) is the largest and most widely used publicly available collection of population variation from harmonized sequencing data, and is accessible through an online browser that enables rapid and intuitive variant analysis.
Abstract: Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser and its usage for variant and gene interpretation. We will introduce key features including allele frequency, per-base expression levels, and constraint scores, and provide guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.