
Showing papers by "Daniel G. MacArthur" published in 2020


Journal ArticleDOI
27 May 2020-Nature
TL;DR: A catalogue of predicted loss-of-function variants in 125,748 whole-exome and 15,708 whole-genome sequencing datasets from the Genome Aggregation Database (gnomAD) reveals the spectrum of mutational constraints that affect these human protein-coding genes.
Abstract: Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes1. Here we describe the aggregation of 125,748 exomes and 15,708 genomes from human sequencing studies into the Genome Aggregation Database (gnomAD). We identify 443,769 high-confidence predicted loss-of-function variants in this cohort after filtering for artefacts caused by sequencing and annotation errors. Using an improved model of human mutation rates, we classify human protein-coding genes along a spectrum that represents tolerance to inactivation, validate this classification using data from model organisms and engineered human cells, and show that it can be used to improve the power of gene discovery for both common and rare diseases.

4,913 citations



Journal ArticleDOI
28 May 2020-Nature
TL;DR: A large empirical assessment of sequence-resolved structural variants from 14,891 genomes across diverse global populations in the Genome Aggregation Database (gnomAD) provides a reference map for disease-association studies, population genetics, and diagnostic screening.
Abstract: Structural variants (SVs) rearrange large segments of DNA1 and can have profound consequences in evolution and human disease2,3. As national biobanks, disease-association studies, and clinical genetic testing have grown increasingly reliant on genome sequencing, population references such as the Genome Aggregation Database (gnomAD)4 have become integral in the interpretation of single-nucleotide variants (SNVs)5. However, there are no reference maps of SVs from high-coverage genome sequencing comparable to those for SNVs. Here we present a reference of sequence-resolved SVs constructed from 14,891 genomes across diverse global populations (54% non-European) in gnomAD. We discovered a rich and complex landscape of 433,371 SVs, from which we estimate that SVs are responsible for 25-29% of all rare protein-truncating events per genome. We found strong correlations between natural selection against damaging SNVs and rare SVs that disrupt or duplicate protein-coding sequence, which suggests that genes that are highly intolerant to loss-of-function are also sensitive to increased dosage6. We also uncovered modest selection against noncoding SVs in cis-regulatory elements, although selection against protein-truncating SVs was stronger than all noncoding effects. Finally, we identified very large (over one megabase), rare SVs in 3.9% of samples, and estimate that 0.13% of individuals may carry an SV that meets the existing criteria for clinically important incidental findings7. This SV resource is freely distributed via the gnomAD browser8 and will have broad utility in population genetics, disease-association studies, and diagnostic screening.

494 citations


Journal ArticleDOI
08 Jan 2020-Nature
TL;DR: Progress is described in the study of human genetics, in which rapid advances in technology, foundational genomic resources and analytical tools have contributed to the understanding of the mechanisms responsible for many rare and common diseases and to preventative and therapeutic strategies for many of these conditions.
Abstract: A primary goal of human genetics is to identify DNA sequence variants that influence biomedical traits, particularly those related to the onset and progression of human disease. Over the past 25 years, progress in realizing this objective has been transformed by advances in technology, foundational genomic resources and analytical tools, and by access to vast amounts of genotype and phenotype data. Genetic discoveries have substantially improved our understanding of the mechanisms responsible for many rare and common diseases and driven development of novel preventative and therapeutic strategies. Medical innovation will increasingly focus on delivering care tailored to individual patterns of genetic predisposition.

356 citations



Journal ArticleDOI
28 May 2020-Nature
TL;DR: A novel variant annotation metric that quantifies the level of expression of genetic variants across tissues is validated in the Genome Aggregation Database (gnomAD) and is shown to improve rare variant interpretation.
Abstract: The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)1, we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the ‘proportion expressed across transcripts’, which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype Tissue Expression (GTEx) project2 and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases. 
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.

130 citations
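
As a rough illustration of the idea behind the "proportion expressed across transcripts" metric in the entry above, the sketch below sums the expression of the transcripts on which a variant carries a given annotation and divides by the total expression of the gene. The function name, the input layout and the toy expression values are assumptions made for illustration; they are not taken from the paper or from any gnomAD tool.

```python
def proportion_expressed(transcript_expression, annotated_transcripts):
    """Fraction of a gene's expression carried by the transcripts on which
    the variant has the annotation of interest (hypothetical helper).

    transcript_expression: dict mapping transcript ID -> expression (e.g. TPM)
    annotated_transcripts: set of transcript IDs on which the variant is pLoF
    """
    total = sum(transcript_expression.values())
    if total == 0:
        return None  # gene not expressed in this tissue, so the ratio is undefined
    annotated = sum(
        expr
        for tx, expr in transcript_expression.items()
        if tx in annotated_transcripts
    )
    return annotated / total


# Toy values (made up): the variant sits in an exon used only by tx_minor.
expression = {"tx_major": 90.0, "tx_minor": 10.0}
score = proportion_expressed(expression, {"tx_minor"})
```

A low score for a pLoF variant would suggest that it falls in a weakly expressed region, which is the kind of variant the abstract describes filtering out.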


Journal ArticleDOI
TL;DR: It is shown that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection, which is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants.
Abstract: Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating uORFs that overlap the coding sequence show signals of selection equivalent to coding missense variants. Finally, we identify specific genes where modification of uORFs likely represents an important disease mechanism, and report a novel uORF frameshift variant upstream of NF2 in neurofibromatosis. Our results highlight uORF-perturbing variants as an under-recognised functional class that contribute to penetrant human disease, and demonstrate the power of large-scale population sequencing data in studying non-coding variant classes. Upstream open reading frames (uORFs), located in 5’ untranslated regions, are regulators of downstream protein translation. Here, Whiffin et al. use the genomes of 15,708 individuals in the Genome Aggregation Database (gnomAD) to systematically assess the deleteriousness of variants creating or disrupting uORFs.

90 citations
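
One of the variant classes studied in the uORF entry above, a substitution that creates a new upstream start codon, can be sketched as a simple sequence check. This is a minimal sketch under stated assumptions: the 5' UTR is given on the transcript strand, positions are 0-based, and the function name is hypothetical. It does not reproduce the authors' pipeline, which also considers reading frame and uORF stop sites.

```python
def creates_upstream_start(utr_seq, pos, alt):
    """Return True if substituting `alt` at 0-based `pos` in a 5' UTR
    sequence (transcript strand) introduces an ATG that was not there."""
    utr_seq = utr_seq.upper()
    mutated = utr_seq[:pos] + alt.upper() + utr_seq[pos + 1:]
    # Only the (at most three) 3-base windows overlapping `pos` can change.
    first = max(0, pos - 2)
    last = min(pos, len(utr_seq) - 3)
    for start in range(first, last + 1):
        window_ref = utr_seq[start:start + 3]
        window_alt = mutated[start:start + 3]
        if window_alt == "ATG" and window_ref != "ATG":
            return True
    return False


# Toy sequence (made up): C>T at index 3 turns "ACG" into "ATG".
gained = creates_upstream_start("GGACGTT", 3, "T")
unchanged = creates_upstream_start("GGACGTT", 0, "A")
```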


Journal ArticleDOI
TL;DR: The gnomAD dataset is used to assemble a catalogue of MNVs, and the relative impact of known mutational mechanisms (CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions) is estimated.
Abstract: Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance of one another, including 18,756 variants with a novel combined effect on protein sequence. Finally, we estimate the relative impact of known mutational mechanisms - CpG deamination, replication error by polymerase zeta, and polymerase slippage at repeat junctions - on the generation of MNVs. Our results demonstrate the value of haplotype-aware variant annotation, and refine our understanding of genome-wide mutational mechanisms of MNVs. Multi-nucleotide variants (MNV) are genetic variants in close proximity of each other on the same haplotype whose functional impact is difficult to predict if they reside in the same codon. Here, Wang et al. use the gnomAD dataset to assemble a catalogue of MNVs and estimate their global mutation rate.

84 citations
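
The MNV entry above notes that two variants in the same codon can have a combined effect that differs from either variant alone. The sketch below illustrates this with the standard genetic code. The reference codon and the two substitutions are made-up examples, and `apply_snvs` is a hypothetical helper rather than part of any gnomAD tool.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def apply_snvs(codon, snvs):
    """Apply (offset, alt_base) substitutions to a three-base codon."""
    bases = list(codon)
    for offset, alt in snvs:
        bases[offset] = alt
    return "".join(bases)


ref = "GGA"          # glycine
snv_a = (0, "T")     # alone: TGA, a stop codon
snv_b = (2, "C")     # alone: GGC, still glycine

alone_a = CODON_TABLE[apply_snvs(ref, [snv_a])]
alone_b = CODON_TABLE[apply_snvs(ref, [snv_b])]
together = CODON_TABLE[apply_snvs(ref, [snv_a, snv_b])]  # TGC, cysteine
```

Annotated separately, the first substitution looks like a stop-gain and the second looks synonymous. On the same haplotype they yield a missense change, which is why haplotype-aware annotation matters.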


Journal ArticleDOI
TL;DR: In this article, the authors systematically analyzed pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD), 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset.
Abstract: Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes1,2. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson’s disease3,4, suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some on-target toxicity concerns5–8, the biological consequences of LRRK2 inhibition have not been well characterized in humans. Here, we systematically analyze pLoF variants in LRRK2 observed across 141,456 individuals sequenced in the Genome Aggregation Database (gnomAD)9, 49,960 exome-sequenced individuals from the UK Biobank and over 4 million participants in the 23andMe genotyped dataset. After stringent variant curation, we identify 1,455 individuals with high-confidence pLoF variants in LRRK2. Experimental validation of three variants, combined with previous work10, confirmed reduced protein levels in 82.5% of our cohort. We show that heterozygous pLoF variants in LRRK2 reduce LRRK2 protein levels but that these are not strongly associated with any specific phenotype or disease state. Our results demonstrate the value of large-scale genomic databases and phenotyping of human loss-of-function carriers for target validation in drug discovery.

76 citations


Journal ArticleDOI
28 May 2020-Nature
TL;DR: In this paper, the authors report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants and provide a roadmap for human ‘knockout’ studies and a guide for future research into disease biology and drug target selection.
Abstract: Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using human loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound heterozygous ‘knockout’ humans will await sample sizes that are approximately 1,000 times those presently available, unless recruitment focuses on consanguineous individuals. Third, automated variant annotation and filtering are powerful, but manual curation remains crucial for removing artefacts, and is a prerequisite for recall-by-genotype efforts. Our results provide a roadmap for human knockout studies and should guide the interpretation of loss-of-function variants in drug development. Analysis of predicted loss-of-function variants from 125,748 human exomes and 15,708 whole genomes in the Genome Aggregation Database (gnomAD) provides a roadmap for human ‘knockout’ studies and a guide for future research into disease biology and drug-target selection.

57 citations
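
The second finding in the entry above, that 'knockout' humans are rare unless recruitment focuses on consanguineous individuals, follows from standard population-genetic arithmetic. The sketch below uses the textbook expectation q² + F·q·(1 − q) for carrying two inactivated copies, where q is the cumulative pLoF allele frequency of a gene and F is the inbreeding coefficient. Pooling all pLoF alleles into a single q, and the value of q itself, are simplifying assumptions for illustration and are not figures from the paper.

```python
def expected_knockout_frequency(q, inbreeding_f=0.0):
    """Expected fraction of people with both copies of a gene inactivated,
    given cumulative pLoF allele frequency q and inbreeding coefficient F."""
    return q * q + inbreeding_f * q * (1.0 - q)


q = 1e-4  # hypothetical cumulative pLoF allele frequency for one gene

outbred = expected_knockout_frequency(q)
# F = 1/16 is the inbreeding coefficient for offspring of first cousins.
first_cousin_offspring = expected_knockout_frequency(q, 1.0 / 16.0)

fold_enrichment = first_cousin_offspring / outbred
```

With a rare q the F·q term dominates, so the expected knockout frequency is far higher among offspring of related parents than in an outbred population.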


Journal ArticleDOI
TL;DR: A vast AE resource generated from the GTEx v8 release is presented and its utility demonstrated, and an extension of the tool phASER is developed that allows effect sizes of cis-regulatory variants to be estimated using haplotype-level AE data.
Abstract: Allele expression (AE) analysis robustly measures cis-regulatory effects. Here, we present and demonstrate the utility of a vast AE resource generated from the GTEx v8 release, containing 15,253 samples spanning 54 human tissues for a total of 431 million measurements of AE at the SNP level and 153 million measurements at the haplotype level. In addition, we develop an extension of our tool phASER that allows effect sizes of cis-regulatory variants to be estimated using haplotype-level AE data. This AE resource is the largest to date, and we are able to make haplotype-level data publicly available. We anticipate that the availability of this resource will enable future studies of regulatory variation across human tissues.

Journal ArticleDOI
TL;DR: Cohort Profile: East London Genes & Health (ELGH), a community-based population genomics and health study in British Bangladeshi and British Pakistani people.
Abstract: Cohort profile in a nutshell:
- East London Genes & Health (ELGH) is a large scale, community genomics and health study (to date >30,000 volunteers; target 100,000 volunteers).
- ELGH was set up in 2015 to gain deeper understanding of health and disease, and underlying genetic influences, in people of British-Bangladeshi and -Pakistani heritage living in east London.
- ELGH prioritises studies in areas important to, and identified by, the community it represents. Current priorities include cardiometabolic diseases and mental illness, these being of notably high prevalence and severity. However studies in any scientific area are possible, subject to community advisory group and ethical approval.
- ELGH combines health data science (using linked UK National Health Service (NHS) electronic health record data) with exome sequencing and SNP array genotyping to elucidate the genetic influence on health and disease, including the contribution from high rates of parental relatedness on rare genetic variation and homozygosity (autozygosity), in two understudied ethnic groups. Linkage to longitudinal health record data enables both retrospective and prospective analyses.
- Through stage 2 studies, ELGH offers researchers the opportunity to undertake recall-by-genotype and/or recall-by-phenotype studies on volunteers. Sub-cohort, trial-within-cohort, and other study designs are possible.
- ELGH is a fully collaborative, open access resource, open to academic and life sciences industry scientific research partners.

Journal ArticleDOI
TL;DR: The data suggest that exome sequencing should be used for pathogenic variant detection in patients with suspected genetic muscle diseases, focusing first on the most common disease genes described here, and subsequently in rarer and newly characterized disease genes.

Journal ArticleDOI
24 Mar 2020-eLife
TL;DR: In this woman, lifelong HAO1 knockout is safe and without clinical phenotype, de-risking a therapeutic approach and informing therapeutic mechanisms. Unlocking evidence from the diversity of human genetic variation can facilitate drug development.
Abstract: By sequencing autozygous human populations, we identified a healthy adult woman with lifelong complete knockout of HAO1 (expected ~1 in 30 million outbred people). HAO1 (glycolate oxidase) silencing is the mechanism of lumasiran, an investigational RNA interference therapeutic for primary hyperoxaluria type 1. Her plasma glycolate levels were 12 times, and urinary glycolate 6 times, the upper limit of normal observed in healthy reference individuals (n = 67). Plasma metabolomics and lipidomics (1871 biochemicals) revealed 18 markedly elevated biochemicals (>5 sd outliers versus n = 25 controls) suggesting additional HAO1 effects. Comparison with lumasiran preclinical and clinical trial data suggested she has <2% residual glycolate oxidase activity. Cell line p.Leu333SerfsTer4 expression showed markedly reduced HAO1 protein levels and cellular protein mis-localisation. In this woman, lifelong HAO1 knockout is safe and without clinical phenotype, de-risking a therapeutic approach and informing therapeutic mechanisms. Unlocking evidence from the diversity of human genetic variation can facilitate drug development.

Posted ContentDOI
Julia K. Goodrich, Moriel Singer-Berk, Rachel Son, Abigail Sveden and 155 more authors (66 institutions)
24 Sep 2020-medRxiv
TL;DR: Additional epidemiologic and genetic factors contributing to risk prediction are assessed, demonstrating that inclusion of common polygenic variation significantly improved biomarker estimation for two monogenic dyslipidemias.
Abstract: Hundreds of thousands of genetic variants have been reported to cause severe monogenic diseases, but the probability that a variant carrier will develop the disease (termed penetrance) is unknown for virtually all of them. Additionally, the clinical utility of common polygenetic variation remains uncertain. Using exome sequencing from 77,184 adult individuals (38,618 multi-ancestral individuals from a type 2 diabetes case-control study and 38,566 participants from the UK Biobank, for whom genotype array data were also available), we applied clinical standard-of-care gene variant curation for eight monogenic metabolic conditions. Rare variants causing monogenic diabetes and dyslipidemias displayed effect sizes significantly larger than the top 1% of the corresponding polygenic scores. Nevertheless, penetrance estimates for monogenic variant carriers averaged below 60% in both studies for all conditions except monogenic diabetes. We assessed additional epidemiologic and genetic factors contributing to risk prediction, demonstrating that inclusion of common polygenic variation significantly improved biomarker estimation for two monogenic dyslipidemias.

Journal ArticleDOI
TL;DR: Emerging evidence supporting a vital developmental role for TTN isoforms containing metatranscript-only exons is extended: RNA-sequencing of 365 human gastrocnemius samples revealed that 56% of specimens predominantly include exons 213-217 in TTN transcripts (inclusion rate ≥66%).
Abstract: We present eight families with arthrogryposis multiplex congenita and myopathy bearing a TTN intron 213 extended splice-site variant (NM_001267550.1:c.39974-11T>G), inherited in trans with a second pathogenic TTN variant. Muscle-derived RNA studies of three individuals confirmed mis-splicing induced by the c.39974-11T>G variant; in-frame exon 214 skipping or use of a cryptic 3' splice-site effecting a frameshift. Confounding interpretation of pathogenicity is the absence of exons 213-217 within the described skeletal muscle TTN N2A isoform. However, RNA-sequencing from 365 adult human gastrocnemius samples revealed that 56% specimens predominantly include exons 213-217 in TTN transcripts (inclusion rate ≥66%). Further, RNA-sequencing of five fetal muscle samples confirmed that 4/5 specimens predominantly include exons 213-217 (fifth sample inclusion rate 57%). Contractures improved significantly with age for four individuals, which may be linked to decreased expression of pathogenic fetal transcripts. Our study extends emerging evidence supporting a vital developmental role for TTN isoforms containing metatranscript-only exons.

Journal ArticleDOI
Dervla M. Connaughton, Rufeng Dai, Danielle J. Owen, Jonathan Marquez, Nina Mann, Adda L. Graham-Paquin, Makiko Nakayama, Etienne Coyaud, Estelle M.N. Laurent, Jonathan St-Germain, Lot Snijders Blok, Arianna Vino, Verena Klämbt, Konstantin Deutsch, Chen Han Wilfred Wu, Caroline M. Kolvenbach, Franziska Kause, Isabel Ottlewski, Ronen Schneider, Thomas M. Kitzler, Amar J. Majmundar, Florian Buerger, Ana C. Onuchic-Whitford, Mao Youying, Amy Kolb, Daanya Salmanullah, Evan Chen, Amelie T. van der Ven, Jia Rao, Hadas Ityel, Steve Seltzsam, Johanna M. Rieke, Jing Chen, Asaf Vivante, Daw Yang Hwang, Stefan Kohl, Gabriel C. Dworschak, Tobias Hermle, Marielle Alders, Tobias Bartolomaeus, Stuart B. Bauer, Michelle A. Baum, Eva H. Brilstra, Thomas D. Challman, Jacob Zyskind, Carrie Costin, Katrina M. Dipple, Floor A. M. Duijkers, Marcia Ferguson, David FitzPatrick, Roger Fick, Ian A. Glass, Peter J. Hulick, Antonie D. Kline, Ilona Krey, Selvin Kumar, Weining Lu, Elysa J. Marco, Ingrid M. Wentzensen, Heather C Mefford, Konrad Platzer, Inna S. Povolotskaya, Juliann M. Savatt, N. V. Shcherbakova, Prabha Senguttuvan, Audrey Squire, Deborah R. Stein, Isabelle Thiffault, Victoria Y. Voinova, Michael J. Somers, Michael A. J. Ferguson, Avram Z. Traum, Ghaleb Daouk, Ankana Daga, Nancy Rodig, Paulien A Terhal, Ellen van Binsbergen, Loai A. Eid, Velibor Tasic, Hila Milo Rasouly, Tze Y Lim, Dina Ahram, Ali G. Gharavi, Heiko Reutter, Heidi L. Rehm, Daniel G. MacArthur, Monkol Lek, Kristen M. Laricchia, Richard P. Lifton, Hong Xu, Shrikant Mane, Simone Sanna-Cherchi, Andrew D. Sharrocks, Brian Raught, Simon E. Fisher, Maxime Bouchard, Mustafa K. Khokha, Shirlee Shril, Friedhelm Hildebrandt
TL;DR: Findings establish loss-of-function mutations of ZMYM2, and potentially of other proteins in its interactome, as causes of human CAKUT, offering new routes for studying the pathogenesis of the disorder.
Abstract: Congenital anomalies of the kidney and urinary tract (CAKUT) constitute one of the most frequent birth defects and represent the most common cause of chronic kidney disease in the first three decades of life. Despite the discovery of dozens of monogenic causes of CAKUT, most pathogenic pathways remain elusive. We performed whole-exome sequencing (WES) in 551 individuals with CAKUT and identified a heterozygous de novo stop-gain variant in ZMYM2 in two different families with CAKUT. Through collaboration, we identified in total 14 different heterozygous loss-of-function mutations in ZMYM2 in 15 unrelated families. Most mutations occurred de novo, indicating possible interference with reproductive function. Human disease features are replicated in X. tropicalis larvae with morpholino knockdowns, in which expression of truncated ZMYM2 proteins, based on individual mutations, failed to rescue renal and craniofacial defects. Moreover, heterozygous Zmym2-deficient mice recapitulated features of CAKUT with high penetrance. The ZMYM2 protein is a component of a transcriptional corepressor complex recently linked to the silencing of developmentally regulated endogenous retrovirus elements. Using protein-protein interaction assays, we show that ZMYM2 interacts with additional epigenetic silencing complexes, as well as confirming that it binds to FOXP1, a transcription factor that has also been linked to CAKUT. In summary, our findings establish loss-of-function mutations of ZMYM2, and potentially of other proteins in its interactome, as causes of human CAKUT, offering new routes for studying the pathogenesis of the disorder.

Posted ContentDOI
21 Oct 2020-bioRxiv
TL;DR: The expression modifier score (EMS) is presented, a predicted probability that a variant has a cis-regulatory effect on gene expression, trained on fine-mapped eQTLs and leveraging 6,121 features including epigenetic marks and sequence-based neural network predictions.
Abstract: The large majority of variants identified by GWAS are non-coding, motivating detailed characterization of the function of non-coding variants. Experimental methods to assess variants’ effect on gene expressions in native chromatin context via direct perturbation are low-throughput. Existing high-throughput computational predictors thus have lacked large gold standard sets of regulatory variants for training and validation. Here, we leverage a set of 14,807 putative causal eQTLs in humans obtained through statistical fine-mapping, and we use 6,121 features to directly train a predictor of whether a variant modifies nearby gene expression. We call the resulting prediction the expression modifier score (EMS). We validate EMS by comparing its ability to prioritize functional variants with other major scores. We then use EMS as a prior for statistical fine-mapping of eQTLs to identify an additional 20,913 putatively causal eQTLs, and we incorporate EMS into co-localization analysis to identify 310 additional candidate genes across UK Biobank phenotypes.

Journal ArticleDOI
TL;DR: Individuals with nanophthalmos or posterior microphthalmos who received a genetic diagnosis had shorter mean axial lengths and higher hyperopia than those without, with recessive forms associated with the most extreme phenotypes.
Abstract: Nanophthalmos and posterior microphthalmos are ocular abnormalities in which both eyes are abnormally small, and typically associated with extreme hyperopia. We recruited 40 individuals from 13 kindreds with nanophthalmos or posterior microphthalmos, with 12 probands subjected to exome sequencing. Nine probands (69.2%) were assigned a genetic diagnosis, with variants in MYRF, TMEM98, MFRP, and PRSS56. Two of four PRSS56 families harbored the previously described c.1066dupC variant implicated in over half of all reported PRSS56 kindreds, with different surrounding haplotypes in each family suggesting a mutational hotspot. Individuals with a genetic diagnosis had shorter mean axial lengths and higher hyperopia than those without, with recessive forms associated with the most extreme phenotypes. These findings detail the genetic architecture of nanophthalmos and posterior microphthalmos in a cohort of predominantly European ancestry, their relative clinical phenotypes, and highlight the shared genetic architecture of rare and common disorders of refractive error.

Posted ContentDOI
13 Aug 2020-bioRxiv
TL;DR: It is observed that whole genome sequencing was sensitive to the detection of all classes of pathogenic variation captured by three conventional tests, and diagnostic yields from WGS were superior to any individual genetic test, warranting further evaluation as a first-tier diagnostic approach.
Abstract: Current prenatal and pediatric genetic evaluation requires three tests to capture balanced chromosomal abnormalities (karyotype), copy number variants (microarray), and coding variants (whole exome sequencing [WES] or targeted gene panels). Here, we explored the sensitivity, specificity, and added value of whole genome sequencing (WGS) to displace all three conventional approaches. We analyzed single nucleotide variants, small insertions and deletions, and structural variants from WGS in 1,612 autism spectrum disorder (ASD) quartet families (n=6,448 individuals) to benchmark the diagnostic performance of WGS against microarray and WES. We then applied these WGS variant discovery and interpretation pipelines to 175 trios (n=525 individuals) with a fetal structural anomaly (FSA) detected on ultrasound and pre-screened by karyotype and microarray. Analyses of WGS in ASD quartets identified a diagnostic variant in 7.5% of ASD probands compared to 1.1% of unaffected siblings (odds ratio=7.5; 95% confidence interval=4.5-13.6; P=2.8×10^-21). We found that WGS captured all diagnostic variants detected by microarray and WES as well as five additional diagnoses, reflecting a 0.3% added yield over WES and microarray when combined. The WGS diagnostic yield was also inversely correlated with ASD proband IQ. Implementation in FSA trios identified a diagnostic variant not captured by karyotype or microarray in 12.0% of fetuses. Based on these data and prior studies, we estimate that WGS could provide an overall diagnostic yield of 47.6% in unscreened FSA referrals. We observed that WGS was sensitive to the detection of all classes of pathogenic variation captured by three conventional tests. Moreover, diagnostic yields from WGS were superior to any individual genetic test, warranting further evaluation as a first-tier diagnostic approach.

Journal ArticleDOI
TL;DR: A novel ADSSL1 mutation is reported and two sporadic cases of Turkish and Indian origin are described, one of which presented with progressive, proximally pronounced weakness, severe muscle atrophy and early contractures, suggesting that mutations in ADSSL1 have to be considered in patients with both distal and proximal muscle weakness and across various ethnicities.

Posted ContentDOI
26 Oct 2020-bioRxiv
TL;DR: Ghosh et al. performed whole genome sequencing (WGS) of 143 human embryonic stem cell (hESC) lines and annotated their single nucleotide and structural genetic variants.
Abstract: There has not yet been a systematic analysis of hESC whole genomes at a single nucleotide resolution. We therefore performed whole genome sequencing (WGS) of 143 hESC lines and annotated their single nucleotide and structural genetic variants. We found that while a substantial fraction of hESC lines contained large deleterious structural variants, finer scale structural and single nucleotide variants (SNVs) that are ascertainable only through WGS analyses were present in hESCs genomes and human blood-derived genomes at similar frequencies. However, WGS did identify SNVs associated with cancer or other diseases that will likely alter cellular phenotypes and may compromise the safety of hESC-derived cellular products transplanted into humans. As a resource to enable reproducible hESC research and safer translation, we provide a user-friendly WGS data portal and a data-driven scheme for cell line maintenance and selection.
In brief: Merkle and Ghosh et al. describe insights from the whole genome sequences of commonly used human embryonic stem cell (hESC) lines. Analyses of these sequences show that while hESC genomes had more large structural variants than humans do from genetic inheritance, hESCs did not have an observable excess of finer-scale variants. However, many hESC lines contained rare loss-of-function variants and combinations of common variants that may profoundly shape their biological phenotypes. Thus, genome sequencing data can be valuable to those selecting cell lines for a given biological or clinical application, and the sequences and analysis reported here should facilitate such choices.
Highlights: One third of the hESCs we analysed are siblings, and almost all are of European ancestry. Large structural variants are common in hESCs, but finer-scale variation is similar to that in human populations. Many strong-effect loss-of-function mutations and cancer-associated mutations are present in specific hESC lines. We provide user-friendly resources for rational hESC line selection based on genome sequence.

Journal ArticleDOI
TL;DR: Through whole exome sequencing, a homozygous missense variant in TLK2 is identified in a patient showing more severe symptoms than those previously described, including cerebellar vermis hypoplasia and West syndrome, highlighting that recessive variants in TLK2 can also be disease causing and may act through a different pathomechanism.
Abstract: A distinct neurodevelopmental phenotype, characterised mainly by mild motor and language delay and facial dysmorphism and caused by heterozygous de novo or dominant variants in the TLK2 gene, has recently been described. All cases reported carried either truncating variants located throughout the gene, or missense changes principally located at the C-terminal end of the protein, mostly resulting in haploinsufficiency of TLK2. Through whole exome sequencing, we identified a homozygous missense variant in TLK2 in a patient showing more severe symptoms than those previously described, including cerebellar vermis hypoplasia and West syndrome. Both parents are heterozygous for the variant and clinically unaffected, highlighting that recessive variants in TLK2 can also be disease causing and may act through a different pathomechanism.

Posted ContentDOI
04 Aug 2020-bioRxiv
TL;DR: A new way to find and characterize genome structural variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number is described, showing that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns.
Abstract: Two intriguing forms of genome structural variation (SV) – dispersed duplications, and de novo rearrangements of complex, multi-allelic loci – have long escaped genomic analysis. We describe a new way to find and characterize such variation by utilizing identity-by-descent (IBD) relationships between siblings together with high-precision measurements of segmental copy number. Analyzing whole-genome sequence data from 706 families, we find hundreds of “IBD-discordant” (IBDD) CNVs: loci at which siblings’ CNV measurements and IBD states are mathematically inconsistent. We found that commonly-IBDD CNVs identify dispersed duplications; we mapped 95 of these common dispersed duplications to their true genomic locations through family-based linkage and population linkage disequilibrium (LD), and found several to be in strong LD with genome-wide association (GWAS) signals for common diseases or gene expression variation at their revealed genomic locations. Other CNVs that were IBDD in a single family appear to involve de novo mutations in complex and multi-allelic loci; we identified 26 de novo structural mutations that had not been previously detected in earlier analyses of the same families by diverse SV analysis methods. These included a de novo mutation of the amylase gene locus and multiple de novo mutations at chromosome 15q14. Combining these complex mutations with more-conventional CNVs, we estimate that segmental mutations larger than 1 kb arise in about one per 22 human meioses. These methods are complementary to previous techniques in that they interrogate genomic regions that are home to segmental duplication, high CNV allele frequencies, and multi-allelic CNVs.
Author Summary: Copy number variation is an important form of genetic variation in which individuals differ in the number of copies of segments of their genomes. Certain aspects of copy number variation have traditionally been difficult to study using short-read sequencing data. For example, standard analyses often cannot tell whether the duplicated copies of a segment are located near the original copy or are dispersed to other regions of the genome. Another aspect of copy number variation that has been difficult to study is the detection of mutations in the number of DNA segments passed down from parents to their children, particularly when the mutations affect genome segments which already undergo common copy number variation in the population. We develop an analytical approach to solving these problems when sequencing data is available for all members of families with at least two children. This method is based on determining the number of parental haplotypes the two siblings share at each location in their genome, and using that information to determine the possible inheritance patterns that might explain the copy numbers we observe in each family member. We show that dispersed duplications and mutations can be identified by looking for copy number variants that do not follow these expected inheritance patterns. We use this approach to determine the location of 95 common duplications which are dispersed to distant regions of the genome, and demonstrate that these duplications are linked to genetic variants that affect disease risk or gene expression levels. We also identify a set of copy number mutations not detected by previous analyses of sequencing data from a large cohort of families, and show that repetitive and complex regions of the genome undergo frequent mutations in copy number.
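The IBD-consistency idea described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes exact integer copy numbers with no measurement noise, a single quartet (two parents, two siblings), and no de novo events. The function names (`haplotype_splits`, `transmissions`, `is_ibd_consistent`) are hypothetical. The check asks whether any split of each parent's copy number across two haplotypes, combined with any transmission pattern matching the siblings' IBD state (0, 1 or 2 shared parental haplotypes), reproduces the observed sibling copy numbers. A locus where no such explanation exists would be "IBD-discordant" in the sense used above.

```python
from itertools import product


def haplotype_splits(total):
    """All ordered ways to split a diploid copy number across two haplotypes."""
    return [(a, total - a) for a in range(total + 1)]


def transmissions(ibd_state):
    """Yield haplotype-index pairs ((pat1, mat1), (pat2, mat2)) for two
    siblings that share exactly `ibd_state` parental haplotypes."""
    for p1, m1, p2, m2 in product((0, 1), repeat=4):
        shared = int(p1 == p2) + int(m1 == m2)
        if shared == ibd_state:
            yield (p1, m1), (p2, m2)


def is_ibd_consistent(father_cn, mother_cn, sib1_cn, sib2_cn, ibd_state):
    """Return True if some haplotype split and transmission pattern explains
    both siblings' copy numbers under the given IBD state (toy model:
    integer copy numbers, no noise, no de novo mutation)."""
    for f_haps, m_haps in product(haplotype_splits(father_cn),
                                  haplotype_splits(mother_cn)):
        for (p1, m1), (p2, m2) in transmissions(ibd_state):
            if (f_haps[p1] + m_haps[m1] == sib1_cn
                    and f_haps[p2] + m_haps[m2] == sib2_cn):
                return True
    return False


if __name__ == "__main__":
    # Siblings sharing both parental haplotypes (IBD2) must carry the same
    # copy number, so differing values are flagged as discordant.
    print(is_ibd_consistent(2, 2, 2, 3, ibd_state=2))
    # Siblings sharing no haplotypes (IBD0) together receive all four
    # parental haplotypes, so their copy numbers must sum to the parents' total.
    print(is_ibd_consistent(3, 2, 3, 2, ibd_state=0))
```

In practice a real analysis would work with noisy, continuous copy-number estimates and IBD states inferred from flanking variants, so a likelihood-based comparison would likely replace this exact-match test.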