Showing papers in "American Journal of Human Genetics in 2018"


Journal ArticleDOI
TL;DR: Beagle 5.0, a new genotype imputation method, greatly reduces the computational cost of imputation from large reference panels. It is compared with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data.
Abstract: Genotype imputation is commonly performed in genome-wide association studies because it greatly increases the number of markers that can be tested for association with a trait. In general, one should perform genotype imputation using the largest reference panel that is available because the number of accurately imputed variants increases with reference panel size. However, one impediment to using larger reference panels is the increased computational cost of imputation. We present a new genotype imputation method, Beagle 5.0, which greatly reduces the computational cost of imputation from large reference panels. We compare Beagle 5.0 with Beagle 4.1, Impute4, Minimac3, and Minimac4 using 1000 Genomes Project data, Haplotype Reference Consortium data, and simulated data for 10k, 100k, 1M, and 10M reference samples. All methods produce nearly identical accuracy, but Beagle 5.0 has the lowest computation time and the best scaling of computation time with increasing reference panel size. For 10k, 100k, 1M, and 10M reference samples and 1,000 phased target samples, Beagle 5.0's computation time is 3× (10k), 12× (100k), 43× (1M), and 533× (10M) faster than the fastest alternative method. Cost data from the Amazon Elastic Compute Cloud show that Beagle 5.0 can perform genome-wide imputation from 10M reference samples into 1,000 phased target samples at a cost of less than one US cent per sample.

894 citations


Journal ArticleDOI
TL;DR: This review highlights key advances in functional genomics that may facilitate the derivation of biological meaning post-GWAS, along with evidence that causal variants underlying disease risk often function through regulatory effects on the expression of target genes and that these expression effects might be modest and cell-type specific.
Abstract: During the past 12 years, genome-wide association studies (GWASs) have uncovered thousands of genetic variants that influence risk for complex human traits and diseases. Yet functional studies aimed at delineating the causal genetic variants and biological mechanisms underlying the observed statistical associations with disease risk have lagged. In this review, we highlight key advances in the field of functional genomics that may facilitate the derivation of biological meaning post-GWAS. We highlight the evidence suggesting that causal variants underlying disease risk often function through regulatory effects on the expression of target genes and that these expression effects might be modest and cell-type specific. We moreover discuss specific studies as proof-of-principle examples for current statistical, bioinformatic, and empirical bench-based approaches to downstream elucidation of GWAS-identified disease risk loci.

554 citations


Journal ArticleDOI
Symen Ligthart1, Ahmad Vaez2, Urmo Võsa3, Maria G. Stathopoulou4  +283 moreInstitutions (97)
TL;DR: In this article, the authors performed two genome-wide association studies (GWASs), on HapMap and 1000 Genomes imputed data, of circulating amounts of CRP by using data from 88 studies comprising 204,402 European individuals.
Abstract: C-reactive protein (CRP) is a sensitive biomarker of chronic low-grade inflammation and is associated with multiple complex diseases. The genetic determinants of chronic inflammation remain largely unknown, and the causal role of CRP in several clinical outcomes is debated. We performed two genome-wide association studies (GWASs), on HapMap and 1000 Genomes imputed data, of circulating amounts of CRP by using data from 88 studies comprising 204,402 European individuals. Additionally, we performed in silico functional analyses and Mendelian randomization analyses with several clinical outcomes. The GWAS meta-analyses of CRP revealed 58 distinct genetic loci (p < 5 × 10⁻⁸). After adjustment for body mass index in the regression analysis, the associations at all except three loci remained. The lead variants at the distinct loci explained up to 7.0% of the variance in circulating amounts of CRP. We identified 66 gene sets that were organized in two substantially correlated clusters, one mainly composed of immune pathways and the other characterized by metabolic pathways in the liver. Mendelian randomization analyses revealed a causal protective effect of CRP on schizophrenia and a risk-increasing effect on bipolar disorder. Our findings provide further insights into the biology of inflammation and could lead to interventions for treating inflammation and its clinical consequences.

244 citations


Journal ArticleDOI
TL;DR: Technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing are described.
Abstract: Alternative splicing is a tightly regulated biological process by which the number of gene products for any given gene can be greatly expanded. Genomic variants in splicing regulatory sequences can disrupt splicing and cause disease. Recent developments in sequencing technologies and computational biology have allowed researchers to investigate alternative splicing at an unprecedented scale and resolution. Population-scale transcriptome studies have revealed many naturally occurring genetic variants that modulate alternative splicing and consequently influence phenotypic variability and disease susceptibility in human populations. Innovations in experimental and computational tools such as massively parallel reporter assays and deep learning have enabled the rapid screening of genomic variants for their causal impacts on splicing. In this review, we describe technological advances that have greatly increased the speed and scale at which discoveries are made about the genetic variation of alternative splicing. We summarize major findings from population transcriptomic studies of alternative splicing and discuss the implications of these findings for human genetics and medicine.

240 citations


Journal ArticleDOI
Carolina Medina-Gomez1, John P. Kemp2, John P. Kemp3, Katerina Trajanoska1, Jian'an Luan4, Alessandra Chesi5, Tarunveer S. Ahluwalia6, Tarunveer S. Ahluwalia7, Dennis O. Mook-Kanamori8, Annelies C. Ham1, Fernando Pires Hartwig9, Daniel S. Evans10, Raimo Joro11, Ivana Nedeljkovic1, Hou-Feng Zheng12, Hou-Feng Zheng13, Hou-Feng Zheng14, Kun Zhu15, Kun Zhu16, Mustafa Atalay11, Ching-Ti Liu17, Maria Nethander18, Linda Broer1, Gudmar Porleifsson19, Benjamin H. Mullin15, Benjamin H. Mullin16, Samuel K. Handelman20, Mike A. Nalls21, Leon Eyrich Jessen6, Denise H. M. Heppe1, J. Brent Richards14, Carol A. Wang16, Bo L. Chawes6, Katharina E. Schraut22, Najaf Amin1, Nicholas J. Wareham4, David Karasik23, Nathalie van der Velde1, Nathalie van der Velde24, M. Arfan Ikram1, Babette S. Zemel5, Yanhua Zhou17, Christian J. Carlsson6, Yongmei Liu25, Fiona E. McGuigan26, Cindy G. Boer1, Klaus Bønnelykke6, Stuart H. Ralston22, John A Robbins27, John P. Walsh15, John P. Walsh16, M. Carola Zillikens1, Claudia Langenberg4, Ruifang Li-Gao8, Frances M K Williams28, Tamara B. Harris21, Kristina Åkesson26, Rebecca D. Jackson29, Gunnar Sigurdsson30, Martin den Heijer31, Martin den Heijer8, Bram C. J. van der Eerden1, Jeroen van de Peppel1, Tim D. Spector28, Craig E. Pennell16, Bernardo L. Horta9, Janine F. Felix1, Jing Hua Zhao4, Scott Wilson15, Scott Wilson28, Scott Wilson16, Renée de Mutsert8, Hans Bisgaard6, Unnur Styrkarsdottir19, Vincent W. V. Jaddoe1, Eric S. Orwoll32, Timo A. Lakka11, Robert A. Scott4, Struan F.A. Grant33, Mattias Lorentzon18, Cornelia M. van Duijn1, James F. Wilson22, Kari Stefansson19, Bruce M. Psaty34, Bruce M. Psaty35, Douglas P. Kiel, Claes Ohlsson18, Evangelia E. Ntzani36, Andre J. van Wijnen37, Vincenzo Forgetta14, Mohsen Ghanbari38, Mohsen Ghanbari1, John G. Logan39, Graham R. Williams39, J. H. Duncan Bassett39, Peter I. Croucher40, Evangelos Evangelou39, Evangelos Evangelou36, André G. Uitterlinden1, Cheryl L. Ackert-Bicknell41, Jonathan H Tobias3, David M. Evans3, David M. Evans2, Fernando Rivadeneira1
TL;DR: TB-BMD is revealed as a relevant trait for genetic studies of osteoporosis, enabling the identification of variants and pathways influencing different bone compartments. Most of the identified variants likely influence BMD early in life, and their effect can be captured throughout the life course.
Abstract: Bone mineral density (BMD) assessed by DXA is used to evaluate bone health. In children, total body (TB) measurements are commonly used; in older individuals, BMD at the lumbar spine (LS) and femoral neck (FN) is used to diagnose osteoporosis. To date, genetic variants in more than 60 loci have been identified as associated with BMD. To investigate the genetic determinants of TB-BMD variation along the life course and test for age-specific effects, we performed a meta-analysis of 30 genome-wide association studies (GWASs) of TB-BMD including 66,628 individuals overall and divided across five age strata, each spanning 15 years. We identified variants associated with TB-BMD at 80 loci, of which 36 have not been previously identified; overall, they explain approximately 10% of the TB-BMD variance when combining all age groups and influence the risk of fracture. Pathway and enrichment analysis of the association signals showed clustering within gene sets implicated in the regulation of cell growth and SMAD proteins, overexpressed in the musculoskeletal system, and enriched in enhancer and promoter regions. These findings reveal TB-BMD as a relevant trait for genetic studies of osteoporosis, enabling the identification of variants and pathways influencing different bone compartments. Only variants in ESR1 and close proximity to RANKL showed a clear effect dependency on age. This most likely indicates that the majority of genetic variants identified influence BMD early in life and that their effect can be captured throughout the life course.

195 citations


Journal ArticleDOI
TL;DR: Rare mutations in seven previously unreported RP genes that may cause Diamond-Blackfan anemia are identified, as well as several distinct disorders that appear to phenocopy DBA, including nine individuals with biallelic CECR1 mutations that result in deficiency of ADA2.
Abstract: Diamond-Blackfan anemia (DBA) is a rare bone marrow failure disorder that affects 7 out of 1,000,000 live births and has been associated with mutations in components of the ribosome. In order to characterize the genetic landscape of this heterogeneous disorder, we recruited a cohort of 472 individuals with a clinical diagnosis of DBA and performed whole-exome sequencing (WES). We identified relevant rare and predicted damaging mutations for 78% of individuals. The majority of mutations were singletons, absent from population databases, predicted to cause loss of function, and located in 1 of 19 previously reported ribosomal protein (RP)-encoding genes. Using exon coverage estimates, we identified and validated 31 deletions in RP genes. We also observed an enrichment for extended splice site mutations and validated their diverse effects using RNA sequencing in cell lines obtained from individuals with DBA. Leveraging the size of our cohort, we observed robust genotype-phenotype associations with congenital abnormalities and treatment outcomes. We further identified rare mutations in seven previously unreported RP genes that may cause DBA, as well as several distinct disorders that appear to phenocopy DBA, including nine individuals with biallelic CECR1 mutations that result in deficiency of ADA2. However, no new genes were identified at exome-wide significance, suggesting that there are no unidentified genes containing mutations readily identified by WES that explain >5% of DBA-affected case subjects. Overall, this report should inform not only clinical practice for DBA-affected individuals, but also the design and analysis of rare variant studies for heterogeneous Mendelian disorders.

181 citations


Journal ArticleDOI
TL;DR: The Deafness Variation Database is developed, a comprehensive, open-access resource that integrates all available genetic, genomic, and clinical data together with expert curation to generate a single classification for each variant in 152 genes implicated in syndromic and non-syndromic deafness.
Abstract: The classification of genetic variants represents a major challenge in the post-genome era by virtue of their extraordinary number and the complexities associated with ascribing a clinical impact, especially for disorders exhibiting exceptional phenotypic, genetic, and allelic heterogeneity. To address this challenge for hearing loss, we have developed the Deafness Variation Database (DVD), a comprehensive, open-access resource that integrates all available genetic, genomic, and clinical data together with expert curation to generate a single classification for each variant in 152 genes implicated in syndromic and non-syndromic deafness. We evaluate 876,139 variants and classify them as pathogenic or likely pathogenic (more than 8,100 variants), benign or likely benign (more than 172,000 variants), or of uncertain significance (more than 695,000 variants); 1,270 variants are re-categorized based on expert curation and in 300 instances, the change is of medical significance and impacts clinical care. We show that more than 96% of coding variants are rare and novel and that pathogenicity is driven by minor allele frequency thresholds, variant effect, and protein domain. The mutational landscape we define shows complex gene-specific variability, making an understanding of these nuances foundational for improved accuracy in variant interpretation in order to enhance clinical decision making and improve our understanding of deafness biology.

181 citations


Journal ArticleDOI
TL;DR: Characterization of DNAJB11-null cells and kidney samples from affected individuals revealed a pathogenesis associated with maturation and trafficking defects involving the ADPKD protein, PC1, and ADTKD proteins, such as UMOD.
Abstract: Autosomal-dominant polycystic kidney disease (ADPKD) is characterized by the progressive development of kidney cysts, often resulting in end-stage renal disease (ESRD). This disorder is genetically heterogeneous with ∼7% of families genetically unresolved. We performed whole-exome sequencing (WES) in two multiplex ADPKD-like pedigrees, and we analyzed a further 591 genetically unresolved, phenotypically similar families by targeted next-generation sequencing of 65 candidate genes. WES identified a DNAJB11 missense variant (p.Pro54Arg) in two family members presenting with non-enlarged polycystic kidneys and a frameshifting change (c.166_167insTT) in a second family with small renal and liver cysts. DNAJB11 is a co-factor of BiP, a key chaperone in the endoplasmic reticulum controlling folding, trafficking, and degradation of secreted and membrane proteins. Five additional multigenerational families carrying DNAJB11 mutations were identified by the targeted analysis. The clinical phenotype was consistent in the 23 affected members, with non-enlarged cystic kidneys that often evolved to kidney atrophy; 7 subjects reached ESRD from 59 to 89 years. The lack of kidney enlargement, histologically evident interstitial fibrosis in non-cystic parenchyma, and recurring episodes of gout (one family) suggested partial phenotypic overlap with autosomal-dominant tubulointerstitial diseases (ADTKD). Characterization of DNAJB11-null cells and kidney samples from affected individuals revealed a pathogenesis associated with maturation and trafficking defects involving the ADPKD protein, PC1, and ADTKD proteins, such as UMOD. DNAJB11-associated disease is a phenotypic hybrid of ADPKD and ADTKD, characterized by normal-sized cystic kidneys and progressive interstitial fibrosis resulting in late-onset ESRD.

169 citations


Journal ArticleDOI
TL;DR: The results demonstrate that systematic clinically oriented pathway-based analysis of genomic data can accelerate the discovery of rare genetic disorders.
Abstract: Histone lysine methyltransferases (KMTs) and demethylases (KDMs) underpin gene regulation. Here we demonstrate that variants causing haploinsufficiency of KMTs and KDMs are frequently encountered in individuals with developmental disorders. Using a combination of human variation databases and existing animal models, we determine 22 KMTs and KDMs as additional candidates for dominantly inherited developmental disorders. We show that KMTs and KDMs that are associated with, or are candidates for, dominant developmental disorders tend to have a higher level of transcription, longer canonical transcripts, more interactors, and a higher number and more types of post-translational modifications than other KMT and KDMs. We provide evidence to firmly associate KMT2C, ASH1L, and KMT5B haploinsufficiency with dominant developmental disorders. Whereas KMT2C or ASH1L haploinsufficiency results in a predominantly neurodevelopmental phenotype with occasional physical anomalies, KMT5B mutations cause an overgrowth syndrome with intellectual disability. We further expand the phenotypic spectrum of KMT2B-related disorders and show that some individuals can have severe developmental delay without dystonia at least until mid-childhood. Additionally, we describe a recessive histone lysine-methylation defect caused by homozygous or compound heterozygous KDM5B variants and resulting in a recognizable syndrome with developmental delay, facial dysmorphism, and camptodactyly. Collectively, these results emphasize the significance of histone lysine methylation in normal human development and the importance of this process in human developmental disorders. Our results demonstrate that systematic clinically oriented pathway-based analysis of genomic data can accelerate the discovery of rare genetic disorders.

162 citations


Journal ArticleDOI
TL;DR: A detailed workflow for identifying germline CNVs >1 kb from short-read WGS data using read depth-based algorithms is empirically developed, positioning WGS as a single assay for genetic variation detection.
Abstract: A remaining hurdle to whole-genome sequencing (WGS) becoming a first-tier genetic test has been accurate detection of copy-number variations (CNVs). Here, we used several datasets to empirically develop a detailed workflow for identifying germline CNVs >1 kb from short-read WGS data using read depth-based algorithms. Our workflow is comprehensive in that it addresses all stages of the CNV-detection process, including DNA library preparation, sequencing, quality control, reference mapping, and computational CNV identification. We used our workflow to detect rare, genic CNVs in individuals with autism spectrum disorder (ASD), and 120/120 such CNVs tested using orthogonal methods were successfully confirmed. We also identified 71 putative genic de novo CNVs in this cohort, which had a confirmation rate of 70%; the remainder were incorrectly identified as de novo due to false positives in the proband (7%) or parental false negatives (23%). In individuals with an ASD diagnosis in which both microarray and WGS experiments were performed, our workflow detected all clinically relevant CNVs identified by microarrays, as well as additional potentially pathogenic CNVs

156 citations


Journal ArticleDOI
TL;DR: This study represents a "proof of concept" for using proband-derived iPSCs to model renal disease and illustrates dysfunctional cellular pathways beyond the primary cilium in the setting of IFT140 mutations, which are established for other NPHP genotypes.
Abstract: Despite the increasing diagnostic rate of genomic sequencing, the genetic basis of more than 50% of heritable kidney disease remains unresolved. Kidney organoids differentiated from induced pluripotent stem cells (iPSCs) of individuals affected by inherited renal disease represent a potential, but unvalidated, platform for the functional validation of novel gene variants and investigation of underlying pathogenetic mechanisms. In this study, trio whole-exome sequencing of a prospectively identified nephronophthisis (NPHP) proband and her parents identified compound-heterozygous variants in IFT140, a gene previously associated with NPHP-related ciliopathies. IFT140 plays a key role in retrograde intraflagellar transport, but the precise downstream cellular mechanisms responsible for disease presentation remain unknown. A one-step reprogramming and gene-editing protocol was used to derive both uncorrected proband iPSCs and isogenic gene-corrected iPSCs, which were differentiated to kidney organoids. Proband organoid tubules demonstrated shortened, club-shaped primary cilia, whereas gene correction rescued this phenotype. Differential expression analysis of epithelial cells isolated from organoids suggested downregulation of genes associated with apicobasal polarity, cell-cell junctions, and dynein motor assembly in proband epithelial cells. Matrigel cyst cultures confirmed a polarization defect in proband versus gene-corrected renal epithelium. As such, this study represents a "proof of concept" for using proband-derived iPSCs to model renal disease and illustrates dysfunctional cellular pathways beyond the primary cilium in the setting of IFT140 mutations, which are established for other NPHP genotypes.

Journal ArticleDOI
TL;DR: An NMD escape intolerance score is developed to rank genes based on the depletion of PTVs that would render them able to escape NMD using the Atherosclerosis Risk in Communities Study and the Exome Aggregation Consortium (ExAC) control databases, which was further used to screen the Baylor-Center for Mendelian Genomics disease database.
Abstract: Premature termination codon (PTC)-bearing transcripts are often degraded by nonsense-mediated decay (NMD) resulting in loss-of-function (LoF) alleles. However, not all PTCs result in LoF mutations, i.e., some such transcripts escape NMD and are translated to truncated peptide products that result in disease due to gain-of-function (GoF) effects. Since the location of the PTC is a major factor determining transcript fate, we hypothesized that depletion of protein-truncating variants (PTVs) within the gene region predicted to escape NMD in control databases could provide a rank for genic susceptibility for disease through GoF versus LoF. We developed an NMD escape intolerance score to rank genes based on the depletion of PTVs that would render them able to escape NMD using the Atherosclerosis Risk in Communities Study (ARIC) and the Exome Aggregation Consortium (ExAC) control databases, which was further used to screen the Baylor-Center for Mendelian Genomics disease database. This analysis revealed 1,996 genes significantly depleted for PTVs that are predicted to escape from NMD, i.e., PTVesc; further studies provided evidence that revealed a subset as candidate genes underlying Mendelian phenotypes. Importantly, these genes have characteristically low pLI scores, which can cause them to be overlooked as candidates for dominant diseases. Collectively, we demonstrate that this NMD escape intolerance score is an effective and efficient tool for gene discovery in Mendelian diseases due to production of truncated or altered proteins. More importantly, we provide a complementary analytical tool to aid identification of genes associated with dominant traits through a mechanism distinct from LoF.

Journal ArticleDOI
TL;DR: Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses, and the idea of "exclusion PRS PheWAS" was introduced to differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles.
Abstract: Health systems are stewards of patient electronic health record (EHR) data with extraordinarily rich depth and breadth, reflecting thousands of diagnoses and exposures. Measures of genomic variation integrated with EHRs offer a potential strategy to accurately stratify patients for risk profiling and discover new relationships between diagnoses and genomes. The objective of this study was to evaluate whether polygenic risk scores (PRS) for common cancers are associated with multiple phenotypes in a phenome-wide association study (PheWAS) conducted in 28,260 unrelated, genotyped patients of recent European ancestry who consented to participate in the Michigan Genomics Initiative, a longitudinal biorepository effort within Michigan Medicine. PRS for 12 cancer traits were calculated using summary statistics from the NHGRI-EBI catalog. A total of 1,711 synthetic case-control studies was used for PheWAS analyses. There were 13,490 (47.7%) patients with at least one cancer diagnosis in this study sample. PRS exhibited strong association for several cancer traits they were designed for, including female breast cancer, prostate cancer, melanoma, basal cell carcinoma, squamous cell carcinoma, and thyroid cancer. Phenome-wide significant associations were observed between PRS and many non-cancer diagnoses. To differentiate PRS associations driven by the primary trait from associations arising through shared genetic risk profiles, the idea of "exclusion PRS PheWAS" was introduced. Further analysis of temporal order of the diagnoses improved our understanding of these secondary associations. This comprehensive PheWAS used PRS instead of a single variant.
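As a rough illustration of the general idea behind a polygenic risk score, the sketch below computes a PRS as a weighted sum of risk-allele dosages. This is the standard textbook form, not necessarily the exact procedure used in the study above; the function name, the effect sizes, and the example genotypes are all hypothetical.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of risk-allele dosages.

    dosages: per-variant risk-allele counts for one person (0, 1, or 2,
             or a fractional value for imputed genotypes).
    weights: per-variant effect sizes (e.g. log odds ratios) taken from
             GWAS summary statistics.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical effect sizes for three variants (illustrative values only).
example_weights = [0.10, 0.25, -0.05]
# Hypothetical genotype: two, one, and zero copies of each risk allele.
example_dosages = [2, 1, 0]

example_score = polygenic_risk_score(example_dosages, example_weights)
```

In practice such scores are usually standardized across the cohort before being tested against each phenotype, but that step is left out here to keep the sketch short.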

Journal ArticleDOI
TL;DR: This work systematically evaluated the effect of PTEN mutations on lipid phosphatase activity in vivo and created a comprehensive functional map by leveraging correlations between amino acid substitutions to impute functional scores for all variants, including those not present in the assay.
Abstract: Phosphatase and tensin homolog (PTEN) is a tumor suppressor frequently mutated in diverse cancers. Germline PTEN mutations are also associated with a range of clinical outcomes, including PTEN hamartoma tumor syndrome (PHTS) and autism spectrum disorder (ASD). To empower new insights into PTEN function and clinically relevant genotype-phenotype relationships, we systematically evaluated the effect of PTEN mutations on lipid phosphatase activity in vivo. Using a massively parallel approach that leverages an artificial humanized yeast model, we derived high-confidence estimates of functional impact for 7,244 single amino acid PTEN variants (86% of possible). We identified 2,273 mutations with reduced cellular lipid phosphatase activity, which includes 1,789 missense mutations. These data recapitulated known functional findings but also uncovered new insights into PTEN protein structure, biochemistry, and mutation tolerance. Several residues in the catalytic pocket showed surprising mutational tolerance. We identified that the solvent exposure of wild-type residues is a critical determinant of mutational tolerance. Further, we created a comprehensive functional map by leveraging correlations between amino acid substitutions to impute functional scores for all variants, including those not present in the assay. Variant functional scores can reliably discriminate likely pathogenic from benign alleles. Further, 32% of ClinVar unclassified missense variants are phosphatase deficient in our assay, supporting their reclassification. ASD-associated mutations generally had less severe fitness scores relative to PHTS-associated mutations (p = 7.16 × 10⁻⁵) and a higher fraction of hypomorphic mutations, arguing for continued genotype-phenotype studies in larger clinical datasets that can further leverage these rich functional data.

Journal ArticleDOI
Christopher E. Gillies1, Rosemary K B Putler1, Rajasree Menon1, Edgar A. Otto1, Kalyn Yasutake1, Viji Nair1, Paul Hoover2, Paul Hoover3, David J. Lieb2, Shuqiang Li2, Sean Eddy1, Damian Fermin1, Michelle McNulty1, John R. Sedor, Katherine MacRae Dell2, Marleen Schachere4, Kevin Lemley1, Lauren Whitted1, Tarak Srivastava1, Connie Haney, Christine B. Sethna, Kalliopi Grammatikopoulos, Gerald B. Appel, Michael Toledo, Laurence Greenbaum, Chia-shi Wang, Brian Lee, Sharon G. Adler, Cynthia C. Nast, Janine LaPage, Ambarish M. Athavale, Alicia M. Neu, Sara Boynton, Fernando C. Fervenza, Marie C. Hogan, John C. Lieske, Vladimir Chernitskiy, Frederick J. Kaskel, Neelja Kumar, Patricia Flynn, Jeffrey B. Kopp, Eveleyn Castro-Rubio, Jodi Blake, Howard Trachtman, Olga Zhdanova, Frank Modersitzki, Suzanne Vento, Richard A. Lafayette, Kshama R. Mehta, Crystal A. Gadegbeku, Duncan B. Johnstone, Daniel C. Cattran, Michelle Hladunewich, Heather N. Reich, Paul Ling, Martin Romano, Alessia Fornoni, Laura Barisoni, Carlos Bidot, Matthias Kretzler1, Debbie S. Gipson, Amanda Williams, Renee Pitter, Patrick H. Nachman, Keisha L. Gibson, Sandra Grubbs, Anne Froment, Lawrence B. Holzman, Kevin E.C. Meyers, Krishna Kallem, Fumei Cerecino, Kamal Sambandam, Elizabeth J. Brown, Natalie Johnson, Ashley Jefferson, Sangeeta Hingorani, Kathleen Tuttle, Laura Curtin, S. Dismuke, Ann Cooper, Barry I. Freedman, Jen Jar Lin, Stefanie Gray, Larua Barisoni, Brenda W. Gillespie, Laura H. Mariani, Matthew G. Sampson1, Peter X.-K. Song, Johnathan Troost, Jarcy Zee, Emily Herreshoff, Colleen Kincaid, Chrysta Lienczewski, Tina Mainieri, Kevin Abbott, Cindy Roy, Tiina K. Urv, John Brooks, Nir Hacohen3, Nir Hacohen2, Krzysztof Kiryluk4, Xiaoquan Wen1 
TL;DR: This study discovered GLOM and TI eQTLs, identified those that were tissue specific, deconvoluted them into cell-specific signals, and used them to characterize known GWAS alleles.
Abstract: Expression quantitative trait loci (eQTL) studies illuminate the genetics of gene expression and, in disease research, can be particularly illuminating when using the tissues directly impacted by the condition. In nephrology, there is a paucity of eQTL studies of human kidney. Here, we used whole-genome sequencing (WGS) and microdissected glomerular (GLOM) and tubulointerstitial (TI) transcriptomes from 187 individuals with nephrotic syndrome (NS) to describe the eQTL landscape in these functionally distinct kidney structures. Using MatrixEQTL, we performed cis-eQTL analysis on GLOM (n = 136) and TI (n = 166). We used the Bayesian “Deterministic Approximation of Posteriors” (DAP) to fine-map these signals, eQTLBMA to discover GLOM- or TI-specific eQTLs, and single-cell RNA-seq data of control kidney tissue to identify the cell type specificity of significant eQTLs. We integrated eQTL data with an IgA Nephropathy (IgAN) GWAS to perform a transcriptome-wide association study (TWAS). We discovered 894 GLOM eQTLs and 1,767 TI eQTLs at FDR 1 independent signal associated with its expression. 12% and 26% of eQTLs were GLOM specific and TI specific, respectively. GLOM eQTLs were most significantly enriched in podocyte transcripts and TI eQTLs in proximal tubules. The IgAN TWAS identified significant GLOM and TI genes, primarily at the HLA region. In this study, we discovered GLOM and TI eQTLs, identified those that were tissue specific, deconvoluted them into cell-specific signals, and used them to characterize known GWAS alleles. These data are available for browsing and download via our eQTL browser, “nephQTL.”

Journal ArticleDOI
TL;DR: ClinPred is introduced, an efficient tool for identifying disease-relevant nonsynonymous variants. Adding allele frequency as a predictive feature, as opposed to setting fixed allele frequency cutoffs, is observed to boost the performance of prediction.
Abstract: Advances in high-throughput DNA sequencing have revolutionized the discovery of variants in the human genome; however, interpreting the phenotypic effects of those variants is still a challenge. While several computational approaches to predict variant impact are available, their accuracy is limited and further improvement is needed. Here, we introduce ClinPred, an efficient tool for identifying disease-relevant nonsynonymous variants. Our predictor incorporates two machine learning algorithms that use existing pathogenicity scores and, notably, benefits from inclusion of normal population allele frequency from the gnomAD database as an input feature. Another major strength of our approach is the use of ClinVar—a rapidly growing database that allows selection of confidently annotated disease-causing variants—as a training set. Compared to other methods, ClinPred showed superior accuracy for predicting pathogenicity, achieving the highest area under the curve (AUC) score and increasing both the specificity and sensitivity in different test datasets. It also obtained the best performance according to various other metrics. Moreover, ClinPred performance remained robust with respect to disease type (cancer or rare disease) and mechanism (gain or loss of function). Importantly, we observed that adding allele frequency as a predictive feature—as opposed to setting fixed allele frequency cutoffs—boosts the performance of prediction. We provide pre-computed ClinPred scores for all possible human missense variants in the exome to facilitate its use by the community.

Journal ArticleDOI
TL;DR: This work identifies homozygous mutations in WEE2 that are responsible for fertilization failure in humans and presents a novel gene responsible for human fertilization failures, which has implications for future therapeutic treatments for infertility cases.
Abstract: Fertilization is a fundamental process of development and is a prerequisite for successful human reproduction. In mice, although several receptor proteins have been shown to play important roles in the process of fertilization, only three genes have been shown to cause fertilization failure and infertility when deleted in vivo. In clinical practice, some infertility case subjects suffer from recurrent failure of in vitro fertilization and intracytoplasmic sperm injection attempts due to fertilization failure, but the genetic basis of fertilization failure in humans remains largely unknown. Wee2 is a key oocyte-specific kinase involved in the control of meiotic arrest in mice, but WEE2 has not been associated with any diseases in humans. In this study, we identified homozygous mutations in WEE2 that are responsible for fertilization failure in humans. All four independent affected individuals had homozygous loss-of-function missense mutations or homozygous frameshift protein-truncating mutations, and the phenotype of fertilization failure was shown to follow a Mendelian recessive inheritance pattern. All four mutations significantly decreased the amount of WEE2 protein in vitro and in affected individuals' oocytes in vivo, and they all led to abnormal serine phosphorylation of WEE2 and reduced tyrosine 15 phosphorylation of Cdc2 in vitro. In addition, injection of WEE2 cRNA into affected individuals' oocytes rescued the fertilization failure phenotype and led to the formation of blastocysts in vitro. This work presents a novel gene responsible for human fertilization failure and has implications for future therapeutic treatments for infertility cases.

Journal ArticleDOI
TL;DR: The results demonstrate that a genotype-phenotype correlation at the NF1 region 844–848 exists and will be valuable in the management and genetic counseling of a significant number of individuals.
Abstract: Neurofibromatosis type 1 (NF1), a common genetic disorder with a birth incidence of 1:2,000-3,000, is characterized by a highly variable clinical presentation. To date, only two clinically relevant intragenic genotype-phenotype correlations have been reported for NF1 missense mutations affecting p.Arg1809 and a single amino acid deletion p.Met922del. Both variants predispose to a distinct mild NF1 phenotype with neither externally visible cutaneous/plexiform neurofibromas nor other tumors. Here, we report 162 individuals (129 unrelated probands and 33 affected relatives) heterozygous for a constitutional missense mutation affecting one of five neighboring NF1 codons (Leu844, Cys845, Ala846, Leu847, and Gly848) located in the cysteine-serine-rich domain (CSRD). Collectively, these recurrent missense mutations affect ∼0.8% of unrelated NF1 mutation-positive probands in the University of Alabama at Birmingham (UAB) cohort. Major superficial plexiform neurofibromas and symptomatic spinal neurofibromas were more prevalent in these individuals compared with classic NF1-affected cohorts (both p < 0.0001). Nearly half of the individuals had symptomatic or asymptomatic optic pathway gliomas and/or skeletal abnormalities. Additionally, variants in this region seem to confer a high predisposition to develop malignancies compared with the general NF1-affected population (p = 0.0061). Our results demonstrate that these NF1 missense mutations, although located outside the GAP-related domain, may be an important risk factor for a severe presentation. A genotype-phenotype correlation at the NF1 region 844-848 exists and will be valuable in the management and genetic counseling of a significant number of individuals.

Journal ArticleDOI
TL;DR: An exportable model for delivery of clinical care through secondary use of research data is described and subject and provider participation data from the initial phase of these efforts can inform other institutions planning similar programs.
Abstract: There is growing interest in communicating clinically relevant DNA sequence findings to research participants who join projects with a primary research goal other than the clinical return of such results. Since Geisinger’s MyCode Community Health Initiative (MyCode) was launched in 2007, more than 200,000 participants have been broadly consented for discovery research. In 2013 the MyCode consent was amended to include a secondary analysis of research genomic sequences that allows for delivery of clinical results. Since May 2015, pathogenic and likely pathogenic variants from a set list of genes associated with monogenic conditions have prompted “genome-first” clinical encounters. The encounters are described as genome-first because they are identified independent of any clinical parameters. This article (1) details our process for generating clinical results from research data, delivering results to participants and providers, facilitating condition-specific clinical evaluations, and promoting cascade testing of relatives, and (2) summarizes early results and participant uptake. We report on 542 participants who had results uploaded to the electronic health record as of February 1, 2018 and 291 unique clinical providers notified with one or more participant results. Of these 542 participants, 515 (95.0%) were reached to disclose their results and 27 (5.0%) were lost to follow-up. We describe an exportable model for delivery of clinical care through secondary use of research data. In addition, subject and provider participation data from the initial phase of these efforts can inform other institutions planning similar programs.

Journal ArticleDOI
TL;DR: It is demonstrated that specific but partially overlapping DNA methylation signatures are associated with many of these conditions, and a machine learning tool can be built to concurrently screen for multiple syndromes with high sensitivity and specificity.
Abstract: Pediatric developmental syndromes present with systemic, complex, and often overlapping clinical features that are not infrequently a consequence of Mendelian inheritance of mutations in genes involved in DNA methylation, establishment of histone modifications, and chromatin remodeling (the "epigenetic machinery"). The mechanistic cross-talk between histone modification and DNA methylation suggests that these syndromes might be expected to display specific DNA methylation signatures that are a reflection of those primary errors associated with chromatin dysregulation. Given the interrelated functions of these chromatin regulatory proteins, we sought to identify DNA methylation epi-signatures that could provide syndrome-specific biomarkers to complement standard clinical diagnostics. In the present study, we examined peripheral blood samples from a large cohort of individuals encompassing 14 Mendelian disorders displaying mutations in the genes encoding proteins of the epigenetic machinery. We demonstrated that specific but partially overlapping DNA methylation signatures are associated with many of these conditions. The degree of overlap among these epi-signatures is minimal, further suggesting that, consistent with the initial event, the downstream changes are unique to every syndrome. In addition, by combining these epi-signatures, we have demonstrated that a machine learning tool can be built to concurrently screen for multiple syndromes with high sensitivity and specificity, and we highlight the utility of this tool in solving ambiguous case subjects presenting with variants of unknown significance, along with its ability to generate accurate predictions for subjects presenting with the overlapping clinical and molecular features associated with the disruption of the epigenetic machinery.

Journal ArticleDOI
TL;DR: The approach "re-discovered" genes previously implicated in IHH and introduced an approach for highly adaptable variant quality filtering that leads to well-calibrated results, and developed a user-friendly software package for performing gene-based burden testing against public databases.
Abstract: The genetic causes of many Mendelian disorders remain undefined. Factors such as lack of large multiplex families, locus heterogeneity, and incomplete penetrance hamper these efforts for many disorders. Previous work suggests that gene-based burden testing, where the aggregate burden of rare, protein-altering variants in each gene is compared between case and control subjects, might overcome some of these limitations. The increasing availability of large-scale public sequencing databases such as the Genome Aggregation Database (gnomAD) can enable burden testing using these databases as controls, obviating the need for additional control sequencing for each study. However, there exist various challenges with using public databases as controls, including lack of individual-level data, differences in ancestry, and differences in sequencing platforms and data processing. To illustrate the approach of using public data as controls, we analyzed whole-exome sequencing data from 393 individuals with idiopathic hypogonadotropic hypogonadism (IHH), a rare disorder with significant locus heterogeneity and incomplete penetrance, against control subjects from gnomAD (n = 123,136). We leveraged presumably benign synonymous variants to calibrate our approach. Through iterative analyses, we systematically addressed and overcame various sources of artifact that can arise when using public control data. In particular, we introduce an approach for highly adaptable variant quality filtering that leads to well-calibrated results. Our approach “re-discovered” genes previously implicated in IHH (FGFR1, TACR3, GNRHR). Furthermore, we identified a significant burden in TYRO3, a gene implicated in hypogonadotropic hypogonadism in mice. Finally, we developed a user-friendly software package, TRAPD (Test Rare vAriants with Public Data), for performing gene-based burden testing against public databases.
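As a rough illustration of the gene-based burden comparison described in this abstract, the sketch below computes a one-sided Fisher's exact test on carrier counts for a single gene. It is a minimal sketch under stated assumptions, not the TRAPD implementation: the function name `fisher_one_sided` is hypothetical, the abstract does not say which test statistic TRAPD uses, and the carrier counts in the demo are invented placeholders (only the cohort sizes 393 and 123,136 come from the abstract). With a public database the control carrier count usually has to be approximated from allele counts, since individual-level data are not available.

```python
from math import comb


def fisher_one_sided(case_carriers, case_total, control_carriers, control_total):
    """One-sided Fisher's exact test for excess carriers among cases.

    Returns P(X >= case_carriers), where X is hypergeometric: the number of
    carriers that land in the case group if carrier status is unrelated to
    case/control status.
    """
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    # Sum the hypergeometric probabilities of every table at least as extreme.
    tail = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_carriers, upper + 1)
    )
    return tail / denom


# Hypothetical counts for one gene: 5 carriers among 393 cases versus
# 100 carriers among 123,136 public controls (carrier counts are made up).
p_value = fisher_one_sided(5, 393, 100, 123_136)
print(f"one-sided p = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids any dependency outside the standard library; for genome-wide use a vectorized statistical library would be the more practical choice.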

Journal ArticleDOI
TL;DR: A comprehensive analysis of common genetic variation on DNA methylation (DNAm) by using the Illumina EPIC array to profile samples from the UK Household Longitudinal study is undertaken and the utility of these data for interpreting the functional consequences of common genetic variation associated with > 60 human traits is demonstrated.
Abstract: Characterizing the complex relationship between genetic, epigenetic, and transcriptomic variation has the potential to increase understanding about the mechanisms underpinning health and disease phenotypes. We undertook a comprehensive analysis of common genetic variation on DNA methylation (DNAm) by using the Illumina EPIC array to profile samples from the UK Household Longitudinal study. We identified 12,689,548 significant DNA methylation quantitative trait loci (mQTL) associations (p < […]) […] We demonstrate the utility of these data for interpreting the functional consequences of common genetic variation associated with > 60 human traits by using summary-data-based Mendelian randomization (SMR) to identify 1,662 pleiotropic associations between 36 complex traits and 1,246 DNAm sites. We also use SMR to characterize the relationship between DNAm and gene expression and thereby identify 6,798 pleiotropic associations between 5,420 DNAm sites and the transcription of 1,702 genes. Our mQTL database and SMR results are available via a searchable online database as a resource to the research community.

Journal ArticleDOI
TL;DR: It is reported that missense variants in CDC42, a gene encoding a small GTPase functioning as an intracellular signaling node, underlie a clinically heterogeneous group of phenotypes characterized by variable growth dysregulation, facial dysmorphism, and neurodevelopmental, immunological, and hematological anomalies, including a phenotype resembling Noonan syndrome.
Abstract: Exome sequencing has markedly enhanced the discovery of genes implicated in Mendelian disorders, particularly for individuals in whom a known clinical entity could not be assigned. This has led to the recognition that phenotypic heterogeneity resulting from allelic mutations occurs more commonly than previously appreciated. Here, we report that missense variants in CDC42, a gene encoding a small GTPase functioning as an intracellular signaling node, underlie a clinically heterogeneous group of phenotypes characterized by variable growth dysregulation, facial dysmorphism, and neurodevelopmental, immunological, and hematological anomalies, including a phenotype resembling Noonan syndrome, a developmental disorder caused by dysregulated RAS signaling. In silico, in vitro, and in vivo analyses demonstrate that mutations variably perturb CDC42 function by altering the switch between the active and inactive states of the GTPase and/or affecting CDC42 interaction with effectors, and differentially disturb cellular and developmental processes. These findings reveal the remarkably variable impact that dominantly acting CDC42 mutations have on cell function and development, creating challenges in syndrome definition, and exemplify the importance of functional profiling for syndrome recognition and delineation.

Journal ArticleDOI
TL;DR: It is suggested that any interesting findings from massive LDSC analysis for a large number of complex traits should be followed up, where possible, with more detailed analyses with GREML methods, even if sample sizes are smaller.
Abstract: Genetic correlation is a key population parameter that describes the shared genetic architecture of complex traits and diseases. It can be estimated by current state-of-the-art methods, i.e., linkage disequilibrium score regression (LDSC) and genomic restricted maximum likelihood (GREML). The massively reduced computing burden of LDSC compared to GREML makes it an attractive tool, although the accuracy (i.e., magnitude of standard errors) of LDSC estimates has not been thoroughly studied. In simulation, we show that the accuracy of GREML is generally higher than that of LDSC. When there is genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the accuracy of LDSC decreases further. In real data analyses estimating the genetic correlation between schizophrenia (SCZ) and body mass index, we show that GREML estimates based on ∼150,000 individuals give a higher accuracy than LDSC estimates based on ∼400,000 individuals (from combined meta-data). A GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which a whole-genome or LDSC approach has less power to detect. We conclude that LDSC estimates should be carefully interpreted as there can be uncertainty about homogeneity among combined meta-datasets. We suggest that any interesting findings from massive LDSC analysis for a large number of complex traits should be followed up, where possible, with more detailed analyses with GREML methods, even if sample sizes are smaller.

Journal ArticleDOI
TL;DR: A unique hypo- and hyper-immune phenotype is defined and an immune dysregulation syndrome caused by frameshift mutations that escape NMD is reported, which is not reported in any primary immunodeficiencies, autoinflammatory syndromes, or autoimmune diseases.
Abstract: The proteasome processes proteins to facilitate immune recognition and host defense. When inherently defective, it can lead to aberrant immunity resulting in a dysregulated response that can cause autoimmunity and/or autoinflammation. Biallelic or digenic loss-of-function variants in some of the proteasome subunits have been described as causing a primary immunodeficiency disease that manifests as a severe dysregulatory syndrome: chronic atypical neutrophilic dermatosis with lipodystrophy and elevated temperature (CANDLE). Proteasome maturation protein (POMP) is a chaperone for proteasome assembly and is critical for the incorporation of catalytic subunits into the proteasome. Here, we characterize and describe POMP-related autoinflammation and immune dysregulation disease (PRAID) discovered in two unrelated individuals with a unique constellation of early-onset combined immunodeficiency, inflammatory neutrophilic dermatosis, and autoimmunity. We also begin to delineate a complex genetic mechanism whereby de novo heterozygous frameshift variants in the penultimate exon of POMP escape nonsense-mediated mRNA decay (NMD) and result in a truncated protein that perturbs proteasome assembly by a dominant-negative mechanism. To our knowledge, this mechanism has not been reported in any primary immunodeficiencies, autoinflammatory syndromes, or autoimmune diseases. Here, we define a unique hypo- and hyper-immune phenotype and report an immune dysregulation syndrome caused by frameshift mutations that escape NMD.

Journal ArticleDOI
TL;DR: It is shown that analyzing conditional eQTL signatures, which could be important under specific cellular or temporal contexts, leads to improved fine mapping of GWAS associations; the co-localization analyses support previously reported genes, identify novel genes associated with schizophrenia risk, and provide specific hypotheses for their functional follow-up.
Abstract: Causal genes and variants within genome-wide association study (GWAS) loci can be identified by integrating GWAS statistics with expression quantitative trait loci (eQTL) and determining which variants underlie both GWAS and eQTL signals. Most analyses, however, consider only the marginal eQTL signal, rather than dissect this signal into multiple conditionally independent signals for each gene. Here we show that analyzing conditional eQTL signatures, which could be important under specific cellular or temporal contexts, leads to improved fine mapping of GWAS associations. Using genotypes and gene expression levels from post-mortem human brain samples (n = 467) reported by the CommonMind Consortium (CMC), we find that conditional eQTL are widespread; 63% of genes with primary eQTL also have conditional eQTL. In addition, genomic features associated with conditional eQTL are consistent with context-specific (e.g., tissue-, cell type-, or developmental time point-specific) regulation of gene expression. Integrating the 2014 Psychiatric Genomics Consortium schizophrenia (SCZ) GWAS and CMC primary and conditional eQTL data reveals 40 loci with strong evidence for co-localization (posterior probability > 0.8), including six loci with co-localization of conditional eQTL. Our co-localization analyses support previously reported genes, identify novel genes associated with schizophrenia risk, and provide specific hypotheses for their functional follow-up.

Journal ArticleDOI
TL;DR: The findings from the CSER consortium will offer patients, healthcare systems, and policymakers a clearer understanding of the opportunities and challenges of providing genomic medicine in diverse populations and settings, and contribute evidence toward developing best practices for the delivery of clinically useful and cost-effective genomic sequencing in diverse healthcare settings.
Abstract: The Clinical Sequencing Evidence-Generating Research (CSER) consortium, now in its second funding cycle, is investigating the effectiveness of integrating genomic (exome or genome) sequencing into the clinical care of diverse and medically underserved individuals in a variety of healthcare settings and disease states. The consortium comprises a coordinating center, six funded extramural clinical projects, and an ongoing National Human Genome Research Institute (NHGRI) intramural project. Collectively, these projects aim to enroll and sequence over 6,100 participants in four years. At least 60% of participants will be of non-European ancestry or from underserved settings, with the goal of diversifying the populations that are providing an evidence base for genomic medicine. Five of the six clinical projects are enrolling pediatric patients with various phenotypes. One of these five projects is also enrolling couples whose fetus has a structural anomaly, and the sixth project is enrolling adults at risk for hereditary cancer. The ongoing NHGRI intramural project has enrolled primarily healthy adults. Goals of the consortium include assessing the clinical utility of genomic sequencing, exploring medical follow-up and cascade testing of relatives, and evaluating patient-provider-laboratory level interactions that influence the use of this technology. The findings from the CSER consortium will offer patients, healthcare systems, and policymakers a clearer understanding of the opportunities and challenges of providing genomic medicine in diverse populations and settings, and contribute evidence toward developing best practices for the delivery of clinically useful and cost-effective genomic sequencing in diverse healthcare settings.

Journal ArticleDOI
TL;DR: The study shows that databases include a significant proportion of wrongly ascertained variants; however, it underscores the critical role of ClinVar to contrast claims and foster validation across submitters.
Abstract: There is a significant interest in the standardized classification of human genetic variants. We used whole-genome sequence data from 10,495 unrelated individuals to contrast population frequency of pathogenic variants to the expected population prevalence of the disease. Analyses included the ACMG-recommended 59 gene-condition sets for incidental findings and 463 genes associated with 265 OrphaNet conditions. A total of 25,505 variants were used to identify patterns of inflation (i.e., excess genetic risk and misclassification). Inflation increases as the level of evidence supporting the pathogenic nature of the variant decreases. We observed up to 11.5% of genetic disorders with inflation in pathogenic variant sets and up to 92.3% for the variant set with conflicting interpretations. This improved to 7.7% and 57.7%, respectively, after filtering for disease-specific allele frequency. The patterns of inflation were replicated using public data from more than 138,000 genomes. The burden of rare variants was a main contributing factor of the observed inflation, indicating collective misclassified rare variants. We also analyzed the dynamics of re-classification of variant pathogenicity in ClinVar over time, which indicates progressive improvement in variant classification. The study shows that databases include a significant proportion of wrongly ascertained variants; however, it underscores the critical role of ClinVar to contrast claims and foster validation across submitters.

Journal ArticleDOI
Yun J. Sung, Thomas W. Winkler, Lisa de las Fuentes, Amy R. Bentley, +326 more authors (104 institutions)
TL;DR: The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits and highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function.
Abstract: Genome-wide association analysis advanced understanding of blood pressure (BP), a major risk factor for vascular conditions such as coronary heart disease and stroke. Accounting for smoking behavior may help identify BP loci and extend our knowledge of its genetic architecture. We performed genome-wide association meta-analyses of systolic and diastolic BP incorporating gene-smoking interactions in 610,091 individuals. Stage 1 analysis examined ∼18.8 million SNPs and small insertion/deletion variants in 129,913 individuals from four ancestries (European, African, Asian, and Hispanic) with follow-up analysis of promising variants in 480,178 additional individuals from five ancestries. We identified 15 loci that were genome-wide significant (p < 5 × 10⁻⁸) in stage 1 and formally replicated in stage 2. A combined stage 1 and 2 meta-analysis identified 66 additional genome-wide significant loci (13, 35, and 18 loci in European, African, and trans-ancestry, respectively). A total of 56 known BP loci were also identified by our results (p < 5 × 10⁻⁸). Of the newly identified loci, ten showed significant interaction with smoking status, but none of them were replicated in stage 2. Several loci were identified in African ancestry, highlighting the importance of genetic studies in diverse populations. The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits. They also highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function (CDKN1B, BCAR1-CFDP1, PXDN, EEA1), ciliopathies (SDCCAG8, RPGRIP1L), telomere maintenance (TNKS, PINX1, AKTIP), and central dopaminergic signaling (MSRA, EBF2).

Journal ArticleDOI
TL;DR: The identification of homozygous truncating mutations (one stop-gain and one splicing variant) in CFAP69 of two unrelated individuals by whole-exome sequencing of a cohort of 78 infertile men with MMAF indicates that CFAP69 is necessary for flagellum assembly/stability and that in both humans and mice, biallelic truncating mutations in CFAP69 cause autosomal-recessive MMAF and primary male infertility.
Abstract: The multiple morphological abnormalities of the flagella (MMAF) phenotype is among the most severe forms of sperm defects responsible for male infertility. The phenotype is characterized by the presence in the ejaculate of immotile spermatozoa with severe flagellar abnormalities including flagella being short, coiled, absent, and of irregular caliber. Recent studies have demonstrated that MMAF is genetically heterogeneous, and genes thus far associated with MMAF account for only one-third of cases. Here we report the identification of homozygous truncating mutations (one stop-gain and one splicing variant) in CFAP69 of two unrelated individuals by whole-exome sequencing of a cohort of 78 infertile men with MMAF. CFAP69 encodes an evolutionarily conserved protein found at high levels in the testis. Immunostaining experiments in sperm from fertile control individuals showed that CFAP69 localized to the midpiece of the flagellum, and the absence of CFAP69 was confirmed in both individuals carrying CFAP69 mutations. Additionally, we found that sperm from a Cfap69 knockout mouse model recapitulated the MMAF phenotype. Ultrastructural analysis of testicular sperm from the knockout mice showed severe disruption of flagellum structure, but histological analysis of testes from these mice revealed the presence of all stages of the seminiferous epithelium, indicating that the overall progression of spermatogenesis is preserved and that the sperm defects likely arise during spermiogenesis. Together, our data indicate that CFAP69 is necessary for flagellum assembly/stability and that in both humans and mice, biallelic truncating mutations in CFAP69 cause autosomal-recessive MMAF and primary male infertility.