Showing papers by "Daniel G. MacArthur" published in 2021
••
TL;DR: The Genome Aggregation Database (gnomAD) is the largest and most widely used publicly available collection of population variation from harmonized sequencing data; it is available through the online gnomAD browser (https://gnomad.broadinstitute.org/), which enables rapid and intuitive variant analysis.
Abstract: Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We introduce key features including allele frequency, per-base expression levels, constraint scores, and variant co-occurrence, alongside guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease. This article is protected by copyright. All rights reserved.
92 citations
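As an illustration of the allele frequency feature mentioned in the abstract above, the following minimal sketch computes allele frequency as allele count divided by allele number and flags a variant as rare when its highest per-population frequency stays under a cutoff. The class and function names, the example counts, and the 1e-4 cutoff are all hypothetical choices made for this sketch; they are not taken from the review or from the gnomAD browser.

```python
from dataclasses import dataclass


@dataclass
class PopulationCount:
    """Allele count (AC) and allele number (AN) for one population (hypothetical container)."""
    name: str
    ac: int  # number of observed alternate alleles
    an: int  # number of genotyped alleles


def allele_frequency(ac: int, an: int) -> float:
    """Return AC / AN, or 0.0 when no alleles were genotyped."""
    return ac / an if an > 0 else 0.0


def max_population_af(counts: list[PopulationCount]) -> float:
    """Highest allele frequency observed in any single population."""
    return max((allele_frequency(c.ac, c.an) for c in counts), default=0.0)


def is_rare(counts: list[PopulationCount], threshold: float = 1e-4) -> bool:
    """True when the variant is at or below the (assumed) frequency cutoff in every population."""
    return max_population_af(counts) <= threshold


# Made-up example counts, for illustration only.
example = [
    PopulationCount("pop_a", ac=2, an=100_000),
    PopulationCount("pop_b", ac=0, an=20_000),
]
example_is_rare = is_rare(example)
```

Taking the maximum across populations, rather than the pooled frequency, is a deliberate design choice in this sketch: a variant that is common in any one population is unlikely to cause a rare, highly penetrant disease, even if it looks rare overall.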
••
TL;DR: Over the next decade, the primary challenge in human genetics will be to understand the biological mechanisms by which genetic variants influence phenotypes, including disease risk.
Abstract: Over the next decade, the primary challenge in human genetics will be to understand the biological mechanisms by which genetic variants influence phenotypes, including disease risk. Although the sc...
61 citations
••
56 citations
••
30 citations
••
TL;DR: The expression modifier score (EMS) is used as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and is incorporated into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
Abstract: The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants' effect on gene expressions in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors thus have lacked large gold standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.
29 citations
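To make the idea of using a score as a fine-mapping prior concrete, here is a minimal sketch under a simplified single-causal-variant model, where each variant's posterior inclusion probability is proportional to its prior times its Bayes factor. This model, the function name, and the example numbers are assumptions chosen for illustration; the abstract does not specify the fine-mapping model used, and the sketch is not the authors' implementation.

```python
def posterior_inclusion_probabilities(
    bayes_factors: list[float], priors: list[float]
) -> list[float]:
    """Posterior probability that each variant is the single causal one.

    Assumes exactly one causal variant in the region, so that
    PIP_i = prior_i * BF_i / sum_j(prior_j * BF_j).
    """
    if len(bayes_factors) != len(priors):
        raise ValueError("bayes_factors and priors must have the same length")
    weights = [bf * p for bf, p in zip(bayes_factors, priors)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one variant needs positive prior and Bayes factor")
    return [w / total for w in weights]


# Made-up Bayes factors: the first two variants are statistically indistinguishable.
bfs = [10.0, 10.0, 1.0]
uniform = posterior_inclusion_probabilities(bfs, [1 / 3, 1 / 3, 1 / 3])
# A hypothetical functional prior that favours the first variant.
informed = posterior_inclusion_probabilities(bfs, [0.6, 0.2, 0.2])
```

With the uniform prior the first two variants share the posterior equally; with the informed prior the first variant receives most of the posterior mass, which is how a functional score can break ties between variants in tight linkage disequilibrium.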
••
TL;DR: The full manuscript has been temporarily withdrawn by the authors upon request from UK Biobank; results supporting it remain valid and can be found at https://genebass.org.
Abstract: This manuscript has been temporarily withdrawn by the authors upon request from UK Biobank. Results supporting this manuscript remain valid and can be found at https://genebass.org. The full manuscript will be re-uploaded pending decision by UK Biobank. If you have any questions, please contact the corresponding author.
28 citations
••
Broad Institute1, Baylor College of Medicine2, Yale University3, Johns Hopkins University School of Medicine4, Seattle Children's5, Rutgers University6, University of Washington7, National Institutes of Health8, University of Texas Health Science Center at Houston9, Human Genome Sequencing Center10, Rockefeller University11, Garvan Institute of Medical Research12, Harvard University13, Boston Children's Hospital14
TL;DR: Mendelian disease genomic research has undergone a massive transformation over the last decade, and the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration.
Abstract: Mendelian disease genomic research has undergone a massive transformation over the last decade. With increasing availability of exome and genome sequencing, the role of Mendelian research has expanded beyond data collection, sequencing, and analysis to worldwide data sharing and collaboration. Over the last 10 years, the NIH-supported Centers for Mendelian Genomics (CMGs) have played a major role in this research and clinical evolution. We highlight the cumulative gene discoveries facilitated by the program, biomedical research leveraged by the approach, and the larger impact on the research community. Mendelian genomic research extends beyond generating lists of gene-phenotype relationships, it includes developing tools, training the larger community to use these tools and approaches, and facilitating collaboration through data sharing. Thus, the CMGs have also focused on creating resources, tools, and training for the larger community to foster the understanding of genes and genome variation. The CMGs have participated in a wide range of data sharing activities, including deposition of all eligible CMG data into AnVIL (NHGRI’s Genomic Data Science Analysis, Visualization, and Informatics Lab-Space), sharing candidate genes through Matchmaker Exchange (MME) and the CMG website, and sharing variants in Geno2MP and VariantMatcher. The research genomics output remains exploratory with evidence that thousands of disease genes, in which variant alleles contribute to disease, remain undiscovered, and many patients with rare disease remain molecularly undiagnosed. 
Strengthening communication between research and clinical labs, continued development and sharing of knowledge and tools required for solving previously unsolved cases, and improving access to data sets, including high-quality metadata, are all required to continue to advance Mendelian genomics research and continue to leverage the Human Genome Project for basic biomedical science research and clinical utility.
22 citations
••
Children's Hospital at Westmead1, University of Sydney2, Harvard University3, Broad Institute4, Children's Medical Research Institute5, Westmead Hospital6, Boston Children's Hospital7, University of New South Wales8, Nepean Hospital9, Wellington Management Company10, Royal Prince Alfred Hospital11, University of Otago12
TL;DR: In this article, the authors describe the diagnostic utility of whole-genome sequencing and RNA studies in boys with suspected dystrophinopathy, for whom multiplex ligation-dependent probe amplification and exomic parallel sequencing failed to yield a genetic diagnosis, and use remnant normal DMD splicing in 3 families to define critical levels of wild-type dystrophin bridging clinical spectrums of Duchenne to myalgia.
Abstract: Objective To describe the diagnostic utility of whole-genome sequencing and RNA studies in boys with suspected dystrophinopathy, for whom multiplex ligation-dependent probe amplification and exomic parallel sequencing failed to yield a genetic diagnosis, and to use remnant normal DMD splicing in 3 families to define critical levels of wild-type dystrophin bridging clinical spectrums of Duchenne to myalgia. Methods Exome, genome, and/or muscle RNA sequencing was performed for 7 males with elevated creatine kinase. PCR of muscle-derived complementary DNA (cDNA) studied consequences for DMD premessenger RNA (pre-mRNA) splicing. Quantitative Western blot was used to determine levels of dystrophin, relative to control muscle. Results Splice-altering intronic single nucleotide variants or structural rearrangements in DMD were identified in all 7 families. Four individuals, with abnormal splicing causing a premature stop codon and nonsense-mediated decay, expressed remnant levels of normally spliced DMD mRNA. Quantitative Western blot enabled correlation of wild-type dystrophin and clinical severity, with 0%–5% dystrophin conferring a Duchenne phenotype, 10% ± 2% a Becker phenotype, and 15% ± 2% dystrophin associated with myalgia without manifesting weakness. Conclusions Whole-genome sequencing relied heavily on RNA studies to identify DMD splice-altering variants. Short-read RNA sequencing was regularly confounded by the effectiveness of nonsense-mediated mRNA decay and low read depth of the giant DMD mRNA. PCR of muscle cDNA provided a simple, yet informative approach. Highly relevant to genetic therapies for dystrophinopathies, our data align strongly with previous studies of mutant dystrophin in Becker muscular dystrophy, with the collective conclusion that a fractional increase in levels of normal dystrophin between 5% and 20% is clinically significant.
22 citations
••
Harvard University1, Massachusetts Institute of Technology2, Memorial Sloan Kettering Cancer Center3, University of Washington4, Fred Hutchinson Cancer Research Center5, University of Pavia6, Karolinska University Hospital7, Lund University8, University of Michigan9, Kaiser Permanente10, Cornell University11, Brigham and Women's Hospital12, Icahn School of Medicine at Mount Sinai13, Yale University14, Stanford University15, Howard Hughes Medical Institute16
TL;DR: Common mutations in the 5-methylcytosine reader, ZBTB33, as well as in YLPM1, SRCAP, and ZNF318 are identified, potentially linking DNA methylation and RNA splicing, the two most commonly mutated pathways in clonal hematopoiesis and MDS.
Abstract: Clonal hematopoiesis results from somatic mutations in cancer driver genes in hematopoietic stem cells. We sought to identify novel drivers of clonal expansion using an unbiased analysis of sequencing data from 84,683 persons and identified common mutations in the 5-methylcytosine reader, ZBTB33, as well as in YLPM1, SRCAP, and ZNF318. We also identified these mutations at low frequency in myelodysplastic syndrome patients. Zbtb33 edited mouse hematopoietic stem and progenitor cells exhibited a competitive advantage in vivo and increased genome-wide intron retention. ZBTB33 mutations potentially link DNA methylation and RNA splicing, the two most commonly mutated pathways in clonal hematopoiesis and MDS.
15 citations
••
TL;DR: seqr is an open source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets.
Abstract: Exome and genome sequencing have become the tools of choice for rare disease diagnosis, leading to large amounts of data available for analyses. To identify causal variants in these datasets, powerful filtering and decision support tools that can be efficiently used by clinicians and researchers are required. To address this need, we developed seqr - an open source, web-based tool for family-based monogenic disease analysis that allows researchers to work collaboratively to search and annotate genomic callsets. To date, seqr is being used in several research pipelines and one clinical diagnostic lab. In our own experience through the Broad Institute Center for Mendelian Genomics, seqr has enabled analyses of over 10,000 families, supporting the diagnosis of more than 3,800 individuals with rare disease and discovery of over 300 novel disease genes. Here we describe a framework for genomic analysis in rare disease that leverages seqr's capabilities for variant filtration, annotation, and causal variant identification, as well as support for research collaboration and data sharing. The seqr platform is available as open source software, allowing low-cost participation in rare disease research, and a community effort to support diagnosis and gene discovery in rare disease.
14 citations
••
TL;DR: In this paper, the authors presented a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from GTEx tissues and cell lines, complementing the GTEx resource.
Abstract: Regulation of transcript structure generates transcript diversity and plays an important role in human disease. The advent of long-read sequencing technologies offers the opportunity to study the role of genetic variation in transcript structure. In this paper, we present a large human long-read RNA-seq dataset using the Oxford Nanopore Technologies platform from 88 samples from GTEx tissues and cell lines, complementing the GTEx resource. We identified just under 100,000 new transcripts for annotated genes, and validated the protein expression of a similar proportion of novel and annotated transcripts. We developed a new computational package, LORALS, to analyze genetic effects of rare and common variants on the transcriptome via allele-specific analysis of long reads. We called allele-specific expression and transcript structure events, providing novel insights into the specific transcript alterations caused by common and rare genetic variants and highlighting the resolution gained from long-read data. We were able to perturb transcript structure upon knockdown of PTBP1, an RNA binding protein that mediates splicing, thereby finding genetic regulatory effects that are modified by the cellular environment. Finally, we use this dataset to enhance variant interpretation and study rare variants leading to aberrant splicing patterns.
••
University of Bonn1, University Hospital Bonn2, Icahn School of Medicine at Mount Sinai3, Baylor College of Medicine4, University of Health Sciences Antigua5, University of Alabama at Birmingham6, Mohanlal Sukhadia University7, Alfaisal University8, UCL Institute of Neurology9, Boston Children's Hospital10, Pir Mehr Ali Shah Arid Agriculture University11, University of Cologne12, Technische Universität München13, University of British Columbia14, University of Paris15, Massachusetts Institute of Technology16, University of Sydney17, Children's Hospital at Westmead18, University of New South Wales19, Children's Medical Research Institute20, Harvard University21, University Hospital Heidelberg22, Leipzig University23, Columbia University24, University of Florence25, GeneDx26, University of Illinois at Chicago27, St George's, University of London28
TL;DR: This paper investigates the effect of PLXNA1 variants on the phenotype of patients with autosomal dominant and recessive inheritance patterns and functionally characterizes the zebrafish homologs plxna1a and plxna1b during development.
••
TL;DR: In this article, the authors used exome sequencing (ES) and genome sequencing (GS) to identify three unrelated probands with CFEOM who harbored novel heterozygous TUBA1A missense variants c.1216C>G, p.(His406Asp); c.467G>A, p.(Arg156His); and c.1193T>G, p.(Met398Arg).
Abstract: Variants in multiple tubulin genes have been implicated in neurodevelopmental disorders, including malformations of cortical development (MCD) and congenital fibrosis of the extraocular muscles (CFEOM). Distinct missense variants in the beta-tubulin encoding genes TUBB3 and TUBB2B cause MCD, CFEOM, or both, suggesting substitution-specific mechanisms. Variants in the alpha tubulin-encoding gene TUBA1A have been associated with MCD, but not with CFEOM. Using exome sequencing (ES) and genome sequencing (GS), we identified 3 unrelated probands with CFEOM who harbored novel heterozygous TUBA1A missense variants c.1216C>G, p.(His406Asp); c.467G>A, p.(Arg156His); and c.1193T>G, p.(Met398Arg). MRI revealed small oculomotor-innervated muscles and asymmetrical caudate heads and lateral ventricles with or without corpus callosal thinning. Two of the three probands had MCD. Mutated amino acid residues localize either to the longitudinal interface at which α and β tubulins heterodimerize (Met398, His406) or to the lateral interface at which tubulin protofilaments interact (Arg156), and His406 interacts with the motor domain of kinesin-1. This series of individuals supports TUBA1A variants as a cause of CFEOM and expands our knowledge of tubulinopathies.
••
TL;DR: The genotypic spectrum of XLMTM is expanded and benefits of screening non-coding regions of MTM1 in male probands with phenotypically concordant XLMTM who remain undiagnosed following exome sequencing are highlighted.
Abstract: X-linked myotubular myopathy (XLMTM) is a severe congenital myopathy characterised by generalised weakness and respiratory insufficiency. XLMTM is associated with pathogenic variants in MTM1; a gene encoding the lipid phosphatase myotubularin. Whole genome sequencing (WGS) of an exome-negative male proband with severe hypotonia, respiratory insufficiency and centralised nuclei on muscle biopsy identified a deep intronic MTM1 variant NG_008199.1(NM_000252.2):c.1468-577A>G, which strengthened a cryptic 5' splice site (A>G substitution at the +5 position). Muscle RNA sequencing was non-diagnostic due to low read depth. Reverse transcription PCR (RT-PCR) of muscle RNA confirmed the c.1468-577A>G variant activates inclusion of a pseudo-exon encoding a premature stop codon into all detected MTM1 transcripts. Western blot analysis establishes deficiency of myotubularin protein, consistent with the severe XLMTM phenotype. We expand the genotypic spectrum of XLMTM and highlight benefits of screening non-coding regions of MTM1 in male probands with phenotypically concordant XLMTM who remain undiagnosed following exome sequencing.
••
University of Washington1, Cardiff University2, Utrecht University3, Texas Tech University4, Children's Hospital of Philadelphia5, Spectrum Health6, University of South Alabama7, McGill University Health Centre8, University of Exeter9, Newcastle upon Tyne Hospitals NHS Foundation Trust10, University Hospitals of Leicester NHS Trust11, NHS Greater Glasgow and Clyde12, Kaiser Permanente13, McMaster Children's Hospital14, Duke University15, Boston Children's Hospital16, Indiana University17, University of Oklahoma Health Sciences Center18, GeneDx19, Seattle Children's20, French Institute of Health and Medical Research21, Bethel University22, Leipzig University23, Garvan Institute of Medical Research24, Broad Institute25
TL;DR: In this paper, large-cohort trio-based exome sequencing and international data-sharing were used to identify 24 unrelated individuals with NDD phenotypes and a variant in GNAI1, which encodes the inhibitory Gαi1 subunit of heterotrimeric G-proteins.
••
TL;DR: A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-03176-6.
Abstract: A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-03176-6.
••
TL;DR: The Undiagnosed Diseases Program-Victoria, an Australian program embedded within a clinical genetics service in the state of Victoria with a focus on paediatric rare diseases, used family-based exome sequencing (family ES), family-based genome sequencing (family GS), RNA sequencing (RNA-seq) and high-resolution chromosomal microarray (CMA) with research-based analysis.
Abstract: Background Clinical exome sequencing typically achieves diagnostic yields of 30%-57.5% in individuals with monogenic rare diseases. Undiagnosed diseases programmes implement strategies to improve diagnostic outcomes for these individuals. Aim We share the lessons learnt from the first 3 years of the Undiagnosed Diseases Program-Victoria, an Australian programme embedded within a clinical genetics service in the state of Victoria with a focus on paediatric rare diseases. Methods We enrolled families who remained without a diagnosis after clinical genomic (panel, exome or genome) sequencing between 2016 and 2018. We used family-based exome sequencing (family ES), family-based genome sequencing (family GS), RNA sequencing (RNA-seq) and high-resolution chromosomal microarray (CMA) with research-based analysis. Results In 150 families, we achieved a diagnosis or strong candidate in 64 (42.7%) (37 in known genes with a consistent phenotype, 3 in known genes with a novel phenotype and 24 in novel disease genes). Fifty-four diagnoses or strong candidates were made by family ES, six by family GS with RNA-seq, two by high-resolution CMA and two by data reanalysis. Conclusion We share our lessons learnt from the programme. Flexible implementation of multiple strategies allowed for scalability and response to the availability of new technologies. Broad implementation of family ES with research-based analysis showed promising yields post a negative clinical singleton ES. RNA-seq offered multiple benefits in family ES-negative populations. International data sharing strategies were critical in facilitating collaborations to establish novel disease-gene associations. Finally, the integrated approach of a multiskilled, multidisciplinary team was fundamental to having diverse perspectives and strategic decision-making.
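The counts reported in the Results above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the abstract; the variable names and dictionary labels are chosen here for illustration.

```python
# Numbers taken from the abstract's Results section.
families = 150

by_category = {
    "known gene, consistent phenotype": 37,
    "known gene, novel phenotype": 3,
    "novel disease gene": 24,
}
by_method = {
    "family ES": 54,
    "family GS with RNA-seq": 6,
    "high-resolution CMA": 2,
    "data reanalysis": 2,
}

# Both breakdowns should describe the same set of solved families.
solved = sum(by_category.values())
breakdowns_agree = solved == sum(by_method.values())

# Diagnostic yield as a percentage of enrolled families.
yield_pct = 100 * solved / families
print(f"{solved}/{families} = {yield_pct:.1f}%")  # prints "64/150 = 42.7%"
```

Both breakdowns sum to 64 families, and 64 of 150 rounds to the 42.7% yield quoted in the abstract.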
••
TL;DR: In this paper, the authors present a pipeline to call mtDNA variants that addresses three technical challenges: (i) detecting homoplasmic and heteroplasmic variants, present respectively in all or a fraction of mtDNA molecules, (ii) circular mtDNA genome, and (iii) misalignment of nuclear sequences of mitochondrial origin (NUMTs).
Abstract: Databases of allele frequency are extremely helpful for evaluating clinical variants of unknown significance; however, until now, genetic databases such as the Genome Aggregation Database (gnomAD) have ignored the mitochondrial genome (mtDNA). Here we present a pipeline to call mtDNA variants that addresses three technical challenges: (i) detecting homoplasmic and heteroplasmic variants, present respectively in all or a fraction of mtDNA molecules, (ii) circular mtDNA genome, and (iii) misalignment of nuclear sequences of mitochondrial origin (NUMTs). We observed that mtDNA copy number per cell varied across gnomAD cohorts and influenced the fraction of NUMT-derived false-positive variant calls, which can account for the majority of putative heteroplasmies. To avoid false positives, we excluded samples prone to NUMT misalignment (few mtDNA copies per cell), cell line artifacts (many mtDNA copies per cell), or with contamination and we reported variants with heteroplasmy greater than 10%. We applied this pipeline to 56,434 whole genome sequences in the gnomAD v3.1 database that includes individuals of European (58%), African (25%), Latino (10%), and Asian (5%) ancestry. Our gnomAD v3.1 release contains population frequencies for 10,850 unique mtDNA variants at more than half of all mtDNA bases. Importantly, we report frequencies within each nuclear ancestral population and mitochondrial haplogroup. Homoplasmic variants account for most variant calls (98%) and unique variants (85%). We observed that 1/250 individuals carry a pathogenic mtDNA variant with heteroplasmy above 10%. These mitochondrial population allele frequencies are publicly available at gnomad.broadinstitute.org and will aid in diagnostic interpretation and research studies.
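The distinction between homoplasmic and heteroplasmic calls described above can be sketched as a simple read-fraction calculation. In the sketch below, the 10% reporting threshold comes from the abstract, while the 95% cutoff for calling a variant homoplasmic, the function names, and the read counts are assumptions made for illustration. This is not the pipeline's actual code.

```python
def heteroplasmy_level(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def classify_mtdna_variant(
    level: float,
    min_reported: float = 0.10,    # abstract: variants reported above 10% heteroplasmy
    min_homoplasmic: float = 0.95, # assumed cutoff, not stated in the abstract
) -> str:
    """Label a variant call by its heteroplasmy level."""
    if level <= min_reported:
        return "not reported"
    if level >= min_homoplasmic:
        return "homoplasmic"
    return "heteroplasmic"


# Made-up read counts, for illustration only.
label = classify_mtdna_variant(heteroplasmy_level(alt_reads=30, total_reads=100))
```

Discarding low-level calls trades sensitivity for specificity, which matches the abstract's observation that NUMT-derived false positives can account for the majority of putative heteroplasmies.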
••
TL;DR: Functional and proteomic studies link TUBGCP2 and the γ-tubulin complex to the development of the central nervous system in humans and observe dysregulation of multiple proteins involved in the assembly and organization of the cytoskeleton and the extracellular matrix.
••
TL;DR: The Genome Aggregation Database (gnomAD) is the largest and most widely used publicly available collection of population variation from harmonized sequencing data, and its online browser enables rapid and intuitive variant analysis.
Abstract: Reference population databases are an essential tool in variant and gene interpretation. Their use guides the identification of pathogenic variants amidst the sea of benign variation present in every human genome, and supports the discovery of new disease-gene relationships. The Genome Aggregation Database (gnomAD) is currently the largest and most widely-used publicly available collection of population variation from harmonized sequencing data. The data is available through the online gnomAD browser (https://gnomad.broadinstitute.org/) that enables rapid and intuitive variant analysis. This review provides guidance on the content of the gnomAD browser, and its usage for variant and gene interpretation. We will introduce key features including allele frequency, per-base expression levels, and constraint scores, and provide guidance on how to use these in analysis, with a focus on the interpretation of candidate variants and novel genes in rare disease.