Showing papers in "American Journal of Human Genetics in 2021"


Journal ArticleDOI
TL;DR: Beagle 5.2 uses marker windowing and composite reference haplotypes to reduce memory usage and computation time for haplotype phasing. On TOPMed sequence data it is more than 20 times faster than SHAPEIT 4.2.1, achieves similar accuracy, and scales to larger sample sizes.
Abstract: Haplotype phasing is the estimation of haplotypes from genotype data. We present a fast, accurate, and memory-efficient haplotype phasing method that scales to large-scale SNP array and sequence data. The method uses marker windowing and composite reference haplotypes to reduce memory usage and computation time. It incorporates a progressive phasing algorithm that identifies confidently phased heterozygotes in each iteration and fixes the phase of these heterozygotes in subsequent iterations. For data with many low-frequency variants, such as whole-genome sequence data, the method employs a two-stage phasing algorithm that phases high-frequency markers via progressive phasing in the first stage and phases low-frequency markers via genotype imputation in the second stage. This haplotype phasing method is implemented in the open-source Beagle 5.2 software package. We compare Beagle 5.2 and SHAPEIT 4.2.1 by using expanding subsets of 485,301 UK Biobank samples and 38,387 TOPMed samples. Both methods have very similar accuracy and computation time for UK Biobank SNP array data. However, for TOPMed sequence data, Beagle is more than 20 times faster than SHAPEIT, achieves similar accuracy, and scales to larger sample sizes.

131 citations


Journal ArticleDOI
TL;DR: The current and most recent knowledge about the genetics of repeat expansion disorders and the diversity of their pathophysiological mechanisms are summarized and the perspectives of developing personalized treatments in the future are outlined.
Abstract: Tandem repeats represent one of the most abundant classes of variation in human genomes; they are polymorphic by nature and become highly unstable in a length-dependent manner. The expansion of repeat length across generations is a well-established process that results in human disorders mainly affecting the central nervous system. At least 50 disorders associated with expansion loci have been described to date, with half recognized only in the last ten years, as prior methodological difficulties limited their identification. These limitations still apply to the current widely used molecular diagnostic methods (exome or gene panels) and thus result in missed diagnoses detrimental to affected individuals and their families, especially for disorders that are very rare and/or clinically not recognizable. Most of these disorders have been identified through family-driven approaches and many others likely remain to be identified. The recent development of long-read technologies provides a unique opportunity to systematically investigate the contribution of tandem repeats and repeat expansions to the genetic architecture of human disorders. In this review, we summarize the current and most recent knowledge about the genetics of repeat expansion disorders and the diversity of their pathophysiological mechanisms and outline the perspectives of developing personalized treatments in the future.

115 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the clinical and economic impact of using Rapid Whole-Genome Sequencing (rWGS)-based Rapid Precision Medicine (RPM) as a diagnostic test in the California Medicaid (Medi-Cal) program.
Abstract: Genetic disorders are a leading contributor to mortality in neonatal and pediatric intensive care units (ICUs). Rapid whole-genome sequencing (rWGS)-based rapid precision medicine (RPM) is an intervention that has demonstrated improved clinical outcomes and reduced costs of care. However, the feasibility of broad clinical deployment has not been established. The objective of this study was to implement RPM based on rWGS and evaluate the clinical and economic impact of this implementation as a first-line diagnostic test in the California Medicaid (Medi-Cal) program. Project Baby Bear was a payor-funded, prospective, real-world quality improvement project in the regional ICUs of five tertiary care children’s hospitals. Participation was limited to acutely ill Medi-Cal beneficiaries who were admitted November 2018 to May 2020, were

105 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate an attribute of 3D genome architecture, the stability of TAD boundaries across cell types, and demonstrate its relevance to understanding how genetic variation in TADs contributes to complex disease by synthesizing TAD maps across 37 diverse cell types with 41 genome-wide association studies (GWASs).
Abstract: Topologically associating domains (TADs) are fundamental units of three-dimensional (3D) nuclear organization. The regions bordering TADs (TAD boundaries) contribute to the regulation of gene expression by restricting interactions of cis-regulatory sequences to their target genes. TAD and TAD-boundary disruption have been implicated in rare-disease pathogenesis; however, we have a limited framework for integrating TADs and their variation across cell types into the interpretation of common-trait-associated variants. Here, we investigate an attribute of 3D genome architecture, the stability of TAD boundaries across cell types, and demonstrate its relevance to understanding how genetic variation in TADs contributes to complex disease. By synthesizing TAD maps across 37 diverse cell types with 41 genome-wide association studies (GWASs), we investigate the differences in disease association and evolutionary pressure on variation in TADs versus TAD boundaries. We demonstrate that genetic variation in TAD boundaries contributes more to complex-trait heritability, especially for immunologic, hematologic, and metabolic traits. We also show that TAD boundaries are more evolutionarily constrained than TADs. Next, stratifying boundaries by their stability across cell types, we find substantial variation. Compared to boundaries unique to a specific cell type, boundaries stable across cell types are further enriched for complex-trait heritability, evolutionary constraint, CTCF binding, and housekeeping genes. Thus, considering TAD boundary stability across cell types provides valuable context for understanding the genome's functional landscape and enabling variant interpretation that takes 3D structure into account.

86 citations


Journal ArticleDOI
TL;DR: In this article, the authors followed the distribution of aneuploid chromosomes across 73 unselected preimplantation embryos and 365 biopsies, sampled from four multifocal trophectoderm (TE) samples and the inner cell mass (ICM).
Abstract: Chromosome imbalance (aneuploidy) is the major cause of pregnancy loss and congenital disorders in humans. Analyses of small biopsies from human embryos suggest that aneuploidy commonly originates during early divisions, resulting in mosaicism. However, the developmental potential of mosaic embryos remains unclear. We followed the distribution of aneuploid chromosomes across 73 unselected preimplantation embryos and 365 biopsies, sampled from four multifocal trophectoderm (TE) samples and the inner cell mass (ICM). When mosaicism impacted fewer than 50% of cells in one TE biopsy (low-medium mosaicism), only 1% of aneuploidies affected other portions of the embryo. A double-blinded prospective non-selection trial (NCT03673592) showed equivalent live-birth rates and miscarriage rates across 484 euploid, 282 low-grade mosaic, and 131 medium-grade mosaic embryos. No instances of mosaicism or uniparental disomy were detected in the ensuing pregnancies or newborns, and obstetrical and neonatal outcomes were similar between the study groups. Thus, low-medium mosaicism in the trophectoderm mostly arises after TE and ICM differentiation, and such embryos have developmental potential equivalent to that of fully euploid ones.

69 citations


Journal ArticleDOI
TL;DR: Targeted Long-Read Sequencing (T-LRS) as discussed by the authors uses adaptive sampling on the Oxford Nanopore platform to search for pathogenic substitutions, structural variants, and methylation differences using a single data source.
Abstract: Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations (including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences) identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.

68 citations


Journal ArticleDOI
TL;DR: In this paper, the authors formalized an approach to the delineation of Mendelian genetic disorders that encompasses two distinct but inter-related concepts: (1) the gene that is mutated and (2) the phenotypic descriptor, preferably a recognizably distinct phenotype.
Abstract: The delineation of disease entities is complex, yet recent advances in the molecular characterization of diseases provide opportunities to designate diseases in a biologically valid manner. Here, we have formalized an approach to the delineation of Mendelian genetic disorders that encompasses two distinct but inter-related concepts: (1) the gene that is mutated and (2) the phenotypic descriptor, preferably a recognizably distinct phenotype. We assert that only by a combinatorial or dyadic approach taking both of these attributes into account can a unitary, distinct genetic disorder be designated. We propose that all Mendelian disorders should be designated as "GENE-related phenotype descriptor" (e.g., "CFTR-related cystic fibrosis"). This approach to delineating and naming disorders reconciles the complexity of gene-to-phenotype relationships in a simple and clear manner yet communicates the complexity and nuance of these relationships.

62 citations


Journal ArticleDOI
TL;DR: In this article, the ability of optical genome mapping (OGM) to detect known constitutional chromosomal aberrations was investigated, including seven aneuploidies, 19 deletions, 20 duplications, 34 translocations, six inversions, two insertions, six isochromosomes, one ring chromosome, and four complex rearrangements.
Abstract: Chromosomal aberrations including structural variations (SVs) are a major cause of human genetic diseases. Their detection in clinical routine still relies on standard cytogenetics. Drawbacks of these tests are a very low resolution (karyotyping) and the inability to detect balanced SVs or indicate the genomic localization and orientation of duplicated segments or insertions (copy number variant [CNV] microarrays). Here, we investigated the ability of optical genome mapping (OGM) to detect known constitutional chromosomal aberrations. Ultra-high-molecular-weight DNA was isolated from 85 blood or cultured cells and processed via OGM. A de novo genome assembly was performed followed by structural variant and CNV calling and annotation, and results were compared to known aberrations from standard-of-care tests (karyotype, FISH, and/or CNV microarray). In total, we analyzed 99 chromosomal aberrations, including seven aneuploidies, 19 deletions, 20 duplications, 34 translocations, six inversions, two insertions, six isochromosomes, one ring chromosome, and four complex rearrangements. Several of these variants encompass complex regions of the human genome involved in repeat-mediated microdeletion/microduplication syndromes. High-resolution OGM reached 100% concordance compared to standard assays for all aberrations with non-centromeric breakpoints. This proof-of-principle study demonstrates the ability of OGM to detect nearly all types of chromosomal aberrations. We also suggest suited filtering strategies to prioritize clinically relevant aberrations and discuss future improvements. These results highlight the potential for OGM to provide a cost-effective and easy-to-use alternative that would allow comprehensive detection of chromosomal aberrations and structural variants, which could give rise to an era of "next-generation cytogenetics."

61 citations


Journal ArticleDOI
TL;DR: In this article, the authors used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19, and did not identify any clear associations after accounting for multiple testing.
Abstract: Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) causes coronavirus disease 2019 (COVID-19), a respiratory illness that can result in hospitalization or death. We used exome sequence data to investigate associations between rare genetic variants and seven COVID-19 outcomes in 586,157 individuals, including 20,952 with COVID-19. After accounting for multiple testing, we did not identify any clear associations with rare variants either exome wide or when specifically focusing on (1) 13 interferon pathway genes in which rare deleterious variants have been reported in individuals with severe COVID-19, (2) 281 genes located in susceptibility loci identified by the COVID-19 Host Genetics Initiative, or (3) 32 additional genes of immunologic relevance and/or therapeutic potential. Our analyses indicate there are no significant associations with rare protein-coding variants with detectable effect sizes at our current sample sizes. Analyses will be updated as additional data become available, and results are publicly available through the Regeneron Genetics Center COVID-19 Results Browser.

59 citations


Journal ArticleDOI
TL;DR: These analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
Abstract: Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.

54 citations


Journal ArticleDOI
TL;DR: In this paper, a cross-population analysis framework for polygenic risk score (PRS) construction with both individual-level (XPA) and summary-level GWAS data is proposed.
Abstract: The development of polygenic risk scores (PRSs) has proved useful to stratify the general European population into different risk groups. However, PRSs are less accurate in non-European populations due to genetic differences across different populations. To improve the prediction accuracy in non-European populations, we propose a cross-population analysis framework for PRS construction with both individual-level (XPA) and summary-level (XPASS) GWAS data. By leveraging trans-ancestry genetic correlation, our methods can borrow information from the Biobank-scale European population data to improve risk prediction in the non-European populations. Our framework can also incorporate population-specific effects to further improve construction of PRS. With innovations in data structure and algorithm design, our methods provide a substantial saving in computational time and memory usage. Through comprehensive simulation studies, we show that our framework provides accurate, efficient, and robust PRS construction across a range of genetic architectures. In a Chinese cohort, our methods achieved 7.3%-198.0% accuracy gain for height and 19.5%-313.3% accuracy gain for body mass index (BMI) in terms of predictive R2 compared to existing PRS approaches. We also show that XPA and XPASS can achieve substantial improvement for construction of height PRSs in the African population, suggesting the generality of our framework across global populations.
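
As a minimal illustration of the quantities this abstract reports, a polygenic risk score is conventionally a weighted sum of allele counts, and predictive R² is the squared correlation between score and phenotype. The sketch below shows only that generic calculation, not the XPA or XPASS methods themselves; the genotype matrix, effect sizes, and function names are hypothetical.

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes):
    """Weighted sum of allele counts: one score per individual."""
    return genotypes @ effect_sizes

def predictive_r2(phenotype, score):
    """Squared Pearson correlation between phenotype and score."""
    r = np.corrcoef(phenotype, score)[0, 1]
    return r ** 2

# Toy data (illustrative values only): 4 individuals x 3 SNPs,
# entries are allele counts in {0, 1, 2}.
G = np.array([[0, 1, 2],
              [1, 1, 0],
              [2, 0, 1],
              [0, 2, 2]], dtype=float)
beta = np.array([0.5, -0.2, 0.1])  # hypothetical per-allele effect sizes

prs = polygenic_score(G, beta)
print(prs)
```

An "accuracy gain" of the kind quoted in the abstract would then be a relative change in this R² between two scoring methods evaluated on the same individuals.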

Journal ArticleDOI
TL;DR: In this paper, the authors jointly estimate the proportion of variance explained by additive ($h^2_{\mathrm{SNP}}$), dominance ($\delta^2_{\mathrm{SNP}}$) and additive-by-additive ($\eta^2_{\mathrm{SNP}}$) genetic variance in a single analysis model.
Abstract: Non-additive genetic variance for complex traits is traditionally estimated from data on relatives. It is notoriously difficult to estimate without bias in non-laboratory species, including humans, because of possible confounding with environmental covariance among relatives. In principle, non-additive variance attributable to common DNA variants can be estimated from a random sample of unrelated individuals with genome-wide SNP data. Here, we jointly estimate the proportion of variance explained by additive ($h^2_{\mathrm{SNP}}$), dominance ($\delta^2_{\mathrm{SNP}}$) and additive-by-additive ($\eta^2_{\mathrm{SNP}}$) genetic variance in a single analysis model. We first show by simulations that our model leads to unbiased estimates and provide a new theory to predict standard errors estimated using either least-squares or maximum likelihood. We then apply the model to 70 complex traits using 254,679 unrelated individuals from the UK Biobank and 1.1 M genotyped and imputed SNPs. We found strong evidence for additive variance (average across traits $\bar{h}^2_{\mathrm{SNP}} = 0.208$). In contrast, the average estimate of $\bar{\delta}^2_{\mathrm{SNP}}$ across traits was 0.001, implying negligible dominance variance at causal variants tagged by common SNPs. The average epistatic variance $\bar{\eta}^2_{\mathrm{SNP}}$ across the traits was 0.055, not significantly different from zero because of the large sampling variance. Our results provide new evidence that genetic variance for complex traits is predominantly additive and that sample sizes of many millions of unrelated individuals are needed to estimate epistatic variance with sufficient precision.
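
The three proportions in this abstract can be read as shares of one phenotypic variance. The decomposition below is a sketch under the assumption that additive, dominance, additive-by-additive, and residual components are the only terms and are uncorrelated; the symbols $\sigma^2_A$, $\sigma^2_D$, $\sigma^2_{AA}$, $\sigma^2_e$, and $\sigma^2_P$ are introduced here for illustration and do not appear in the abstract.

```latex
\begin{align*}
\sigma^2_P &= \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_e, \\
h^2_{\mathrm{SNP}} &= \frac{\sigma^2_A}{\sigma^2_P}, \qquad
\delta^2_{\mathrm{SNP}} = \frac{\sigma^2_D}{\sigma^2_P}, \qquad
\eta^2_{\mathrm{SNP}} = \frac{\sigma^2_{AA}}{\sigma^2_P}.
\end{align*}
```

With the reported trait averages, the three genetic shares sum to $0.208 + 0.001 + 0.055 = 0.264$, of which the additive term accounts for about 79%.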

Journal ArticleDOI
TL;DR: In this article, the authors test whether the combination of karyotyping, FISH, and CNV microarrays could be replaced by optical genome mapping (OGM) for the diagnosis of hematological malignancies.
Abstract: Somatic structural variants (SVs) are important drivers of cancer development and progression. In a diagnostic set-up, especially for hematological malignancies, the comprehensive analysis of all SVs in a given sample still requires a combination of cytogenetic techniques, including karyotyping, FISH, and CNV microarrays. We hypothesize that the combination of these classical approaches could be replaced by optical genome mapping (OGM). Samples from 52 individuals with a clinical diagnosis of a hematological malignancy, divided into simple ( 100 kb. In addition, on average, 73 CNVs were called per sample, of which six were >5 Mb. For the 36 simple cases, all clinically reported aberrations were detected, including deletions, insertions, inversions, aneuploidies, and translocations. For the 16 complex cases, results were largely concordant between standard-of-care and OGM, but OGM often revealed higher complexity than previously recognized. Detailed technical comparison with standard-of-care tests showed high analytical validity of OGM, resulting in a sensitivity of 100% and a positive predictive value of >80%. Importantly, OGM resulted in a more complete assessment than any previous single test and most likely reported the most accurate underlying genomic architecture (e.g., for complex translocations, chromoanagenesis, and marker chromosomes). In conclusion, the excellent concordance of OGM with diagnostic standard assays demonstrates its potential to replace classical cytogenetic tests as well as to rapidly map novel leukemia drivers.

Journal ArticleDOI
TL;DR: In this paper, the authors conducted whole-exome sequencing (WES) and identified hemizygous missense variants in the X-linked CFAP47 in three unrelated Chinese individuals with MMAF.
Abstract: Asthenoteratozoospermia characterized by multiple morphological abnormalities of the flagella (MMAF) has been identified as a sub-type of male infertility. Recent progress has identified several MMAF-associated genes with an autosomal recessive inheritance in human affected individuals, but the etiology in approximately 40% of affected individuals remains unknown. Here, we conducted whole-exome sequencing (WES) and identified hemizygous missense variants in the X-linked CFAP47 in three unrelated Chinese individuals with MMAF. These three CFAP47 variants were absent in human control population genome databases and were predicted to be deleterious by multiple bioinformatic tools. CFAP47 encodes a cilia- and flagella-associated protein that is highly expressed in testis. Immunoblotting and immunofluorescence assays revealed obviously reduced levels of CFAP47 in spermatozoa from all three men harboring deleterious missense variants of CFAP47. Furthermore, WES data from an additional cohort of severe asthenoteratozoospermic men originating from Australia permitted the identification of a hemizygous Xp21.1 deletion removing the entire CFAP47 gene. All men harboring hemizygous CFAP47 variants displayed typical MMAF phenotypes. We also generated a Cfap47-mutated mouse model, the adult males of which were sterile and presented with reduced sperm motility and abnormal flagellar morphology and movement. However, fertility could be rescued by the use of intra-cytoplasmic sperm injections (ICSIs). Altogether, our experimental observations in humans and mice demonstrate that hemizygous mutations in CFAP47 can induce X-linked MMAF and asthenoteratozoospermia, for which good ICSI prognosis is suggested. These findings will provide important guidance for genetic counseling and assisted reproduction treatments.

Journal ArticleDOI
TL;DR: An in-depth investigation of the promise and limitations of the available colocalization analysis approaches is conducted, and a single analytical factor, the specification of prior enrichment levels, is identified, which can lead to severe inflation of false-positive colocalized findings.
Abstract: Colocalization analysis has emerged as a powerful tool to uncover the overlapping of causal variants responsible for both molecular and complex disease phenotypes. The findings from colocalization analysis yield insights into the molecular pathways of complex diseases. In this paper, we conduct an in-depth investigation of the promise and limitations of the available colocalization analysis approaches. Focusing on variant-level colocalization approaches, we first establish the connections between various existing methods. We proceed to discuss the impacts of various controllable analytical factors and uncontrollable practical factors on outcomes of colocalization analysis through realistic simulations and real data examples. We identify a single analytical factor, the specification of prior enrichment levels, which can lead to severe inflation of false-positive colocalization findings. Meanwhile, the combination of many other analytical and practical factors all lead to diminished power. Consequently, we recommend the following strategies for the best practice of colocalization analysis: (1) estimating prior enrichment level from the observed data and (2) separating fine-mapping and colocalization analysis. Our analysis of 4,091 complex traits and the multi-tissue expression quantitative trait loci (eQTL) data from the GTEx (v.8) suggests that colocalizations of molecular QTLs and causal complex trait associations are widespread. However, only a small proportion can be confidently identified from currently available data due to a lack of power. Our findings set a benchmark for current and future integrative genetic association analysis applications.

Journal ArticleDOI
TL;DR: In this paper, a constrained maximum likelihood and model averaging (cML-MA) approach was proposed to investigate causal relationships between pairs of traits using SNPs as instrumental variables (IVs) based on GWAS summary data.
Abstract: With the increasing availability of large-scale GWAS summary data on various complex traits and diseases, there has been tremendous interest in applications of Mendelian randomization (MR) to investigate causal relationships between pairs of traits using SNPs as instrumental variables (IVs) based on observational data. In spite of the potential significance of such applications, the validity of their causal conclusions critically depends on some strong modeling assumptions required by MR, which may be violated due to the widespread (horizontal) pleiotropy. Although many MR methods have been proposed recently to relax the assumptions by mainly dealing with uncorrelated pleiotropy, only a few can handle correlated pleiotropy, in which some SNPs/IVs may be associated with hidden confounders, such as some heritable factors shared by both traits. Here we propose a simple and effective approach based on constrained maximum likelihood and model averaging, called cML-MA, applicable to GWAS summary data. To deal with more challenging situations with many invalid IVs with only weak pleiotropic effects, we modify and improve it with data perturbation. Extensive simulations demonstrated that the proposed methods could control the type I error rate better while achieving higher power than other competitors. Applications to 48 risk factor-disease pairs based on large-scale GWAS summary data of 3 cardio-metabolic diseases (coronary artery disease, stroke, and type 2 diabetes), asthma, and 12 risk factors confirmed its superior performance.
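
For orientation, the simplest summary-data MR estimator is the fixed-effect inverse-variance weighted (IVW) estimate, which assumes every SNP is a valid instrument with no pleiotropy. The sketch below implements that baseline only; it is not cML-MA, whose purpose per the abstract is to relax exactly this assumption. The input arrays and the function name are hypothetical.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect IVW: weighted regression of SNP-outcome effects (beta_y)
    on SNP-exposure effects (beta_x) through the origin, weights 1/se_y^2."""
    w = 1.0 / se_y ** 2
    denom = np.sum(w * beta_x ** 2)
    estimate = np.sum(w * beta_x * beta_y) / denom
    std_error = np.sqrt(1.0 / denom)
    return estimate, std_error

# Toy summary statistics (illustrative values only): three SNPs whose
# outcome effects are exactly half their exposure effects.
beta_x = np.array([0.1, 0.2, 0.3])
beta_y = 0.5 * beta_x
se_y = np.array([0.01, 0.01, 0.01])

est, se = ivw_estimate(beta_x, beta_y, se_y)
print(est, se)
```

When some SNPs act on the outcome through other pathways, this estimate is biased, which is the situation the pleiotropy-robust methods discussed in the abstract are designed for.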

Journal ArticleDOI
TL;DR: In this paper, the omnigenic model was proposed as a framework to understand the highly polygenic architecture of complex traits revealed by genome-wide association studies (GWASs), which explains recent observations about cross-population genetic effects, specifically the low transferability of polygenic scores and the lack of clear evidence for polygenic selection.
Abstract: The omnigenic model was proposed as a framework to understand the highly polygenic architecture of complex traits revealed by genome-wide association studies (GWASs). I argue that this model also explains recent observations about cross-population genetic effects, specifically the low transferability of polygenic scores and the lack of clear evidence for polygenic selection. In particular, the omnigenic model explains why the effects of most GWAS variants vary between populations. This interpretation has several consequences for the evolutionary interpretation and practical use of GWAS summary statistics and polygenic scores. First, some polygenic scores may be applicable only in populations of the same ancestry and environment as the discovery population. Second, most GWAS associations will have differing effects between populations and are unlikely to be robust clinical targets. Finally, it may not always be possible to detect polygenic selection from population genetic data. These considerations make it difficult to interpret the clinical and evolutionary meanings of polygenic scores without an explicit model of genetic architecture.

Journal ArticleDOI
TL;DR: This paper found that de novo structural mutations occur at an overall rate of at least 0.160 events per genome in unaffected individuals and at a significantly higher rate (0.206 per genome) in ASD-affected individuals.
Abstract: Each human genome includes de novo mutations that arose during gametogenesis. While these germline mutations represent a fundamental source of new genetic diversity, they can also create deleterious alleles that impact fitness. Whereas the rate and patterns of point mutations in the human germline are now well understood, far less is known about the frequency and features that impact de novo structural variants (dnSVs). We report a family-based study of germline mutations among 9,599 human genomes from 33 multigenerational CEPH-Utah families and 2,384 families from the Simons Foundation Autism Research Initiative. We find that de novo structural mutations detected by alignment-based, short-read WGS occur at an overall rate of at least 0.160 events per genome in unaffected individuals, and we observe a significantly higher rate (0.206 per genome) in ASD-affected individuals. In both probands and unaffected samples, nearly 73% of de novo structural mutations arose in paternal gametes, and we predict most de novo structural mutations to be caused by mutational mechanisms that do not require sequence homology. After multiple testing correction, we did not observe a statistically significant correlation between parental age and the rate of de novo structural variation in offspring. These results highlight that a spectrum of mutational mechanisms contribute to germline structural mutations and that these mechanisms most likely have markedly different rates and selective pressures than those leading to point mutations.

Journal ArticleDOI
TL;DR: In this paper, a massively parallel screen in human cells was performed to identify loss-of-function missense variants in the key DNA mismatch repair factor MSH2 and the resulting functional effect map is substantially complete, covering 94% of the 17,746 possible variants, and is highly concordant (96%) with existing functional data and expert clinicians' interpretations.
Abstract: The lack of functional evidence for the majority of missense variants limits their clinical interpretability and poses a key barrier to the broad utility of carrier screening. In Lynch syndrome (LS), one of the most highly prevalent cancer syndromes, nearly 90% of clinically observed missense variants are deemed "variants of uncertain significance" (VUS). To systematically resolve their functional status, we performed a massively parallel screen in human cells to identify loss-of-function missense variants in the key DNA mismatch repair factor MSH2. The resulting functional effect map is substantially complete, covering 94% of the 17,746 possible variants, and is highly concordant (96%) with existing functional data and expert clinicians' interpretations. The large majority (89%) of missense variants were functionally neutral, perhaps unexpectedly in light of its evolutionary conservation. These data provide ready-to-use functional evidence to resolve the ∼1,300 extant missense VUSs in MSH2 and may facilitate the prospective classification of newly discovered variants in the clinic.

Journal ArticleDOI
TL;DR: In this article, three homozygous mutations in the SC coding gene C14orf39/SIX6OS1 were identified in infertile individuals from different ethnic populations by whole-exome sequencing (WES).
Abstract: Human infertility is a multifactorial disease that affects 8%-12% of reproductive-aged couples worldwide. However, the genetic causes of human infertility are still poorly understood. Synaptonemal complex (SC) is a conserved tripartite structure that holds homologous chromosomes together and plays an indispensable role in the meiotic progression. Here, we identified three homozygous mutations in the SC coding gene C14orf39/SIX6OS1 in infertile individuals from different ethnic populations by whole-exome sequencing (WES). These mutations include a frameshift mutation (c.204_205del [p.His68Glnfs∗2]) from a consanguineous Pakistani family with two males suffering from non-obstructive azoospermia (NOA) and one female diagnosed with premature ovarian insufficiency (POI) as well as a nonsense mutation (c.958G>T [p.Glu320∗]) and a splicing mutation (c.1180-3C>G) in two unrelated Chinese men (individual P3907 and individual P6032, respectively) with meiotic arrest. Mutations in C14orf39 resulted in truncated proteins that retained SYCE1 binding but exhibited impaired polycomplex formation between C14ORF39 and SYCE1. Further cytological analyses of meiosis in germ cells revealed that the affected familial males with the C14orf39 frameshift mutation displayed complete asynapsis between homologous chromosomes, while the affected Chinese men carrying the nonsense or splicing mutation showed incomplete synapsis. The phenotypes of NOA and POI in affected individuals were well recapitulated by Six6os1 mutant mice carrying an analogous mutation. Collectively, our findings in humans and mice highlight the conserved role of C14ORF39/SIX6OS1 in SC assembly and indicate that the homozygous mutations in C14orf39/SIX6OS1 described here are responsible for infertility of these affected individuals, thus expanding our understanding of the genetic basis of human infertility.

Journal ArticleDOI
Francesca Clementina Radio, Kaifang Pang, Andrea Ciolfi, Michael A. Levy, Andres Hernandez-Garcia, Lucia Pedace, Francesca Pantaleoni, Zhandong Liu, Elke de Boer, Adam Jackson, Alessandro Bruselles, Haley McConkey, Emilia Stellacci, Stefania Lo Cicero, Marialetizia Motta, Rosalba Carrozzo, Maria Lisa Dentici, Kirsty McWalter, Megha Desai, Kristin G. Monaghan, Aida Telegrafi, Christophe Philippe, Antonio Vitobello, Margaret Au, Katheryn Grand, Pedro A. Sanchez-Lara, Joanne Baez, Kristin Lindstrom, Peggy Kulch, Jessica Sebastian, Suneeta Madan-Khetarpal, Chelsea Roadhouse, Jennifer MacKenzie, Berrin Monteleone, Carol J Saunders, July K. Jean Cuevas, Laura A Cross, Dihong Zhou, Taila Hartley, Sarah L. Sawyer, Fabíola Paoli Monteiro, Tania Vertemati Secches, Fernando Kok, Laura Schultz-Rogers, Erica L. Macke, Eva Morava, Eric W. Klee, Jennifer L. Kemppainen, Maria Iascone, Angelo Selicorni, Romano Tenconi, David J. Amor, Lynn Pais, Lyndon Gallacher, Peter D. Turnpenny, Karen Stals, Sian Ellard, Sara Cabet, Gaetan Lesca, Joset Pascal, Katharina Steindl, Sarit Ravid, Karin Weiss, Alison M R Castle, Melissa T. Carter, Louisa Kalsner, Bert B.A. de Vries, Bregje W.M. van Bon, Marijke R. Wevers, Rolph Pfundt, Alexander P.A. Stegmann, Bronwyn Kerr, Helen Kingston, Kate Chandler, Willow Sheehan, Abdallah F. Elias, Deepali N. Shinde, Meghan C. Towne, Nathaniel H. Robin, Dana H. Goodloe, Adeline Vanderver, Omar Sherbini, Krista Bluske, R. Tanner Hagelstrom, Caterina Zanus, Flavio Faletra, Luciana Musante, Evangeline Kurtz-Nelson, Rachel K. Earl, Britt-Marie Anderlid, Gilles Morin, Marjon van Slegtenhorst, Karin E. M. Diderich, Alice S. Brooks, Joost Gribnau, Ruben Boers, Teresa Robert Finestra, Lauren Carter, Anita Rauch, Paolo Gasparini, Kym M. Boycott, Tahsin Stefan Barakat, John M. Graham, Laurence Faivre, Siddharth Banka, Tianyun Wang, Evan E. Eichler, Manuela Priolo, Bruno Dallapiccola, Lisenka E.L.M. Vissers, Bekim Sadikovic, Daryl A. Scott, Jimmy Holder, Marco Tartaglia
TL;DR: In this article, the authors used clinical data from 34 individuals with truncating variants in SPEN to define a neurodevelopmental disorder presenting with features that overlap considerably with those of proximal del1p36 syndrome.
Abstract: Deletion 1p36 (del1p36) syndrome is the most common human disorder resulting from a terminal autosomal deletion. This condition is molecularly and clinically heterogeneous. Deletions involving two non-overlapping regions, known as the distal (telomeric) and proximal (centromeric) critical regions, are sufficient to cause the majority of the recurrent clinical features, although with different facial features and dysmorphisms. SPEN encodes a transcriptional repressor commonly deleted in proximal del1p36 syndrome and is located centromeric to the proximal 1p36 critical region. Here, we used clinical data from 34 individuals with truncating variants in SPEN to define a neurodevelopmental disorder presenting with features that overlap considerably with those of proximal del1p36 syndrome. The clinical profile of this disease includes developmental delay/intellectual disability, autism spectrum disorder, anxiety, aggressive behavior, attention deficit disorder, hypotonia, brain and spine anomalies, congenital heart defects, high/narrow palate, facial dysmorphisms, and obesity/increased BMI, especially in females. SPEN also emerges as a relevant gene for del1p36 syndrome by co-expression analyses. Finally, we show that haploinsufficiency of SPEN is associated with a distinctive DNA methylation episignature of the X chromosome in affected females, providing further evidence of a specific contribution of the protein to the epigenetic control of this chromosome, and a paradigm of an X chromosome-specific episignature that classifies syndromic traits. We conclude that SPEN is required for multiple developmental processes and SPEN haploinsufficiency is a major contributor to a disorder associated with deletions centromeric to the previously established 1p36 critical regions.

Journal ArticleDOI
TL;DR: In this article, a full-likelihood method was proposed to infer polygenic adaptation from DNA sequence variation and GWAS summary statistics to quantify recent transient directional selection acting on a complex trait.
Abstract: We present a full-likelihood method to infer polygenic adaptation from DNA sequence variation and GWAS summary statistics to quantify recent transient directional selection acting on a complex trait. Through simulations of polygenic trait architecture evolution and GWASs, we show the method substantially improves power over current methods. We examine the robustness of the method under stratification, uncertainty and bias in marginal effects, uncertainty in the causal SNPs, allelic heterogeneity, negative selection, and low GWAS sample size. The method can quantify selection acting on correlated traits, controlling for pleiotropy even among traits with strong genetic correlation (|r_g| = 80%) while retaining high power to attribute selection to the causal trait. When the causal trait is excluded from analysis, selection is attributed to its closest proxy. We discuss limitations of the method, cautioning against strongly causal interpretations of the results, and the possibility of undetectable gene-by-environment (GxE) interactions. We apply the method to 56 human polygenic traits, revealing signals of directional selection on pigmentation, life history, glycated hemoglobin (HbA1c), and other traits. We also conduct joint testing of 137 pairs of genetically correlated traits, revealing widespread correlated response acting on these traits (2.6-fold enrichment, p = 1.5 × 10⁻⁷). Signs of selection on some traits previously reported as adaptive (e.g., educational attainment and hair color) are largely attributable to correlated response (p = 2.9 × 10⁻⁶ and 1.7 × 10⁻⁴, respectively). Lastly, our joint test shows antagonistic selection has increased type 2 diabetes risk and decreased HbA1c (p = 1.5 × 10⁻⁵).

Journal ArticleDOI
TL;DR: This paper investigated the history of human exposure to TB by determining the evolutionary trajectory of the TYK2 P1104A variant in Europe, where TB is considered to be the deadliest documented infectious disease.
Abstract: Tuberculosis (TB), usually caused by Mycobacterium tuberculosis bacteria, is the first cause of death from an infectious disease at the worldwide scale, yet the mode and tempo of TB pressure on humans remain unknown. The recent discovery that homozygotes for the P1104A polymorphism of TYK2 are at higher risk to develop clinical forms of TB provided the first evidence of a common, monogenic predisposition to TB, offering a unique opportunity to inform on human co-evolution with a deadly pathogen. Here, we investigate the history of human exposure to TB by determining the evolutionary trajectory of the TYK2 P1104A variant in Europe, where TB is considered to be the deadliest documented infectious disease. Leveraging a large dataset of 1,013 ancient human genomes and using an approximate Bayesian computation approach, we find that the P1104A variant originated in the common ancestors of West Eurasians ∼30,000 years ago. Furthermore, we show that, following large-scale population movements of Anatolian Neolithic farmers and Eurasian steppe herders into Europe, P1104A has markedly fluctuated in frequency over the last 10,000 years of European history, with a dramatic decrease in frequency after the Bronze Age. Our analyses indicate that such a frequency drop is attributable to strong negative selection starting ∼2,000 years ago, with a relative fitness reduction on homozygotes of 20%, among the highest in the human genome. Together, our results provide genetic evidence that TB has imposed a heavy burden on European health over the last two millennia.

Journal ArticleDOI
TL;DR: In this paper, the authors reported somatic mutations of MAP3K3, PIK3CA, MAP2K7, and CCM genes in cerebral cavernous malformations (CCMs) lesions.
Abstract: Cerebral cavernous malformations (CCMs) are vascular disorders that affect up to 0.5% of the total population. About 20% of CCMs are inherited because of familial mutations in CCM genes, including CCM1/KRIT1, CCM2/MGC4607, and CCM3/PDCD10, whereas the etiology of a majority of simplex CCM-affected individuals remains unclear. Here, we report somatic mutations of MAP3K3, PIK3CA, MAP2K7, and CCM genes in CCM lesions. In particular, somatic hotspot mutations of PIK3CA are found in 11 of 38 individuals with CCMs, and a MAP3K3 somatic mutation (c.1323C>G [p.Ile441Met]) is detected in 37.0% (34 of 92) of the simplex CCM-affected individuals. Strikingly, the MAP3K3 c.1323C>G mutation presents in 95.7% (22 of 23) of the popcorn-like lesions but only 2.5% (1 of 40) of the subacute-bleeding or multifocal lesions that are predominantly attributed to mutations in the CCM1/2/3 signaling complex. Leveraging mini-bulk sequencing, we demonstrate the enrichment of MAP3K3 c.1323C>G mutation in CCM endothelium. Mechanistically, beyond the activation of CCM1/2/3-inhibited ERK5 signaling, MEKK3 p.Ile441Met (MAP3K3 encodes MEKK3) also activates ERK1/2, JNK, and p38 pathways because of mutation-induced MEKK3 kinase activity enhancement. Collectively, we identified several somatic activating mutations in CCM endothelium, and the MAP3K3 c.1323C>G mutation defines a primary CCM subtype with distinct characteristics in signaling activation and magnetic resonance imaging appearance.

Journal ArticleDOI
TL;DR: In this paper, the authors performed exome sequencing on a Chinese discovery cohort (442 affected subjects and 941 female control subjects) and a replication MRKHS cohort (150 affected subjects of mixed ethnicity from North America, South America, and Europe).
Abstract: Mayer-Rokitansky-Küster-Hauser syndrome (MRKHS) is associated with congenital absence of the uterus, cervix, and the upper part of the vagina; it is a sex-limited trait. Disrupted development of the Müllerian ducts (MD)/Wolffian ducts (WD) through multifactorial mechanisms has been proposed to underlie MRKHS. In this study, exome sequencing (ES) was performed on a Chinese discovery cohort (442 affected subjects and 941 female control subjects) and a replication MRKHS cohort (150 affected subjects of mixed ethnicity from North America, South America, and Europe). Phenotypic follow-up of the female reproductive system was performed on an additional cohort of PAX8-associated congenital hypothyroidism (CH) (n = 5, Chinese). By analyzing 19 candidate genes essential for MD/WD development, we identified 12 likely gene-disrupting (LGD) variants in 7 genes: PAX8 (n = 4), BMP4 (n = 2), BMP7 (n = 2), TBX6 (n = 1), HOXA10 (n = 1), EMX2 (n = 1), and WNT9B (n = 1), while LGD variants in these genes were not detected in control samples (p = 1.27E-06). Interestingly, a sex-limited penetrance with paternal inheritance was observed in multiple families. One additional PAX8 LGD variant from the replication cohort and two missense variants from both cohorts were revealed to cause loss-of-function of the protein. From the PAX8-associated CH cohort, we identified one individual presenting a syndromic condition characterized by CH and MRKHS (CH-MRKHS). Our study demonstrates the comprehensive utilization of knowledge from developmental biology toward elucidating genetic perturbations, i.e., rare pathogenic alleles involving the same loci, contributing to human birth defects.

Journal ArticleDOI
TL;DR: In this paper, the authors developed click-seq, a pooled yeast-based activity assay, to test thousands of variants and found that almost two-thirds of the variants showed decreased activity and protein abundance accounted for half of the variation in CYP2C9 function.
Abstract: CYP2C9 encodes a cytochrome P450 enzyme responsible for metabolizing up to 15% of small molecule drugs, and CYP2C9 variants can alter the safety and efficacy of these therapeutics. In particular, the anti-coagulant warfarin is prescribed to over 15 million people annually and polymorphisms in CYP2C9 can affect individual drug response and lead to an increased risk of hemorrhage. We developed click-seq, a pooled yeast-based activity assay, to test thousands of variants. Using click-seq, we measured the activity of 6,142 missense variants in yeast. We also measured the steady-state cellular abundance of 6,370 missense variants in a human cell line by using variant abundance by massively parallel sequencing (VAMP-seq). These data revealed that almost two-thirds of CYP2C9 variants showed decreased activity and that protein abundance accounted for half of the variation in CYP2C9 function. We also measured activity scores for 319 previously unannotated human variants, many of which may have clinical relevance.

Journal ArticleDOI
TL;DR: In this article, the authors generated an online brain pQTL resource for 7,376 proteins through the analysis of genetic and proteomic data derived from post-mortem samples of the dorsolateral prefrontal cortex of 330 older adults.
Abstract: We generated an online brain pQTL resource for 7,376 proteins through the analysis of genetic and proteomic data derived from post-mortem samples of the dorsolateral prefrontal cortex of 330 older adults. The identified pQTLs tend to be non-synonymous variation, are over-represented among variants associated with brain diseases, and replicate well (77%) in an independent brain dataset. Comparison to a large study of brain eQTLs revealed that about 75% of pQTLs are also eQTLs. In contrast, about 40% of eQTLs were identified as pQTLs. These results are consistent with lower pQTL mapping power and greater evolutionary constraint on protein abundance. The latter is additionally supported by observations of pQTLs with large effects' tending to be rare, deleterious, and associated with proteins that have evidence for fewer protein-protein interactions. Mediation analyses using matched transcriptomic and proteomic data provided additional evidence that pQTL effects are often, but not always, mediated by mRNA. Specifically, we identified roughly 1.6 times more mRNA-mediated pQTLs than mRNA-independent pQTLs (550 versus 341). Our pQTL resource provides insight into the functional consequences of genetic variation in the human brain and a basis for novel investigations of genetics and disease.

Journal ArticleDOI
TL;DR: In this article, the authors evaluated the prevalence, clinical classifications, and associations with diseases, inheritance, and functional characteristics of splicing variants in a 689,321-person clinical cohort and two large public datasets to better understand their contribution to hereditary disease.
Abstract: The complexities of gene expression pose challenges for the clinical interpretation of splicing variants. To better understand splicing variants and their contribution to hereditary disease, we evaluated their prevalence, clinical classifications, and associations with diseases, inheritance, and functional characteristics in a 689,321-person clinical cohort and two large public datasets. In the clinical cohort, splicing variants represented 13% of all variants classified as pathogenic (P), likely pathogenic (LP), or variants of uncertain significance (VUSs). Most splicing variants were outside essential splice sites and were classified as VUSs. Among all individuals tested, 5.4% had a splicing VUS. If RNA analysis were to contribute supporting evidence to variant interpretation, we estimated that splicing VUSs would be reclassified in 1.7% of individuals in our cohort. This would result in a clinically significant result (i.e., P/LP) in 0.1% of individuals overall because most reclassifications would change VUSs to likely benign. In ClinVar, splicing VUSs were 4.8% of reported variants and could benefit from RNA analysis. In the Genome Aggregation Database (gnomAD), splicing variants comprised 9.4% of variants in protein-coding genes; most were rare, precluding unambiguous classification as benign. Splicing variants were depleted in genes associated with dominant inheritance and haploinsufficiency, although some genes had rare variants at essential splice sites or had common splicing variants that were most likely compatible with normal gene function. Overall, we describe the contribution of splicing variants to hereditary disease, the potential utility of RNA analysis for reclassifying splicing VUSs, and how natural variation may confound clinical interpretation of splicing variants.

Journal ArticleDOI
TL;DR: The analyses show that non-coding variants upstream of known disease-causing genes are an important cause of severe disease and demonstrate that analysing 5' UTRs can increase diagnostic yield, even using existing exome sequencing datasets.
Abstract: Clinical genetic testing of protein-coding regions identifies a likely causative variant in only around half of developmental disorder (DD) cases. The contribution of regulatory variation in non-coding regions to rare disease, including DD, remains very poorly understood. We screened 9,858 probands from the Deciphering Developmental Disorders (DDD) study for de novo mutations in the 5' untranslated regions (5' UTRs) of genes within which variants have previously been shown to cause DD through a dominant haploinsufficient mechanism. We identified four single-nucleotide variants and two copy-number variants upstream of MEF2C in a total of ten individual probands. We developed multiple bespoke and orthogonal experimental approaches to demonstrate that these variants cause DD through three distinct loss-of-function mechanisms, disrupting transcription, translation, and/or protein function. These non-coding region variants represent 23% of likely diagnoses identified in MEF2C in the DDD cohort, but these would all be missed in standard clinical genetics approaches. Nonetheless, these variants are readily detectable in exome sequence data, with 30.7% of 5' UTR bases across all genes well covered in the DDD dataset. Our analyses show that non-coding variants upstream of genes within which coding variants are known to cause DD are an important cause of severe disease and demonstrate that analyzing 5' UTRs can increase diagnostic yield. We also show how non-coding variants can help inform both the disease-causing mechanism underlying protein-coding variants and dosage tolerance of the gene.

Journal ArticleDOI
TL;DR: In this paper, the authors used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data.
Abstract: Genetic studies in underrepresented populations identify disproportionate numbers of novel associations. However, most genetic studies use genotyping arrays and sequenced reference panels that best capture variation most common in European ancestry populations. To compare data generation strategies best suited for underrepresented populations, we sequenced the whole genomes of 91 individuals to high coverage as part of the Neuropsychiatric Genetics of African Population-Psychosis (NeuroGAP-Psychosis) study with participants from Ethiopia, Kenya, South Africa, and Uganda. We used a downsampling approach to evaluate the quality of two cost-effective data generation strategies, GWAS arrays versus low-coverage sequencing, by calculating the concordance of imputed variants from these technologies with those from deep whole-genome sequencing data. We show that low-coverage sequencing at a depth of ≥4× captures variants of all frequencies more accurately than all commonly used GWAS arrays investigated and at a comparable cost. Lower depths of sequencing (0.5-1×) performed comparably to commonly used low-density GWAS arrays. Low-coverage sequencing is also sensitive to novel variation; 4× sequencing detects 45% of singletons and 95% of common variants identified in high-coverage African whole genomes. Low-coverage sequencing approaches surmount the problems induced by the ascertainment of common genotyping arrays, effectively identify novel variation particularly in underrepresented populations, and present opportunities to enhance variant discovery at a cost similar to traditional approaches.