Showing papers on "Locus (genetics)" published in 2019


Journal ArticleDOI
TL;DR: Although the results are not conclusive, the accurate and independent assembly of R and S haplotypes of ‘Lito’ is a valuable resource to predict and test alternative transcription and regulation mechanisms underpinning PPV resistance.
Abstract: Sharka, a common disease among most stone fruit crops, is caused by the Plum Pox Virus (PPV). Resistant genotypes have been found in apricot (Prunus armeniaca L.), one of which, the cultivar 'Lito' (heterozygous for the resistance), has been used to map a major quantitative trait locus (QTL) on linkage group 1, following a pseudo-test-cross mating design with 231 individuals. In addition, 19 SNP markers were selected from among the hundreds previously developed, which allowed the region to be limited to 236 kb on chromosome 1. A 'Lito' bacterial artificial chromosome (BAC) library was produced and screened with markers of the region, and positive BAC clones were sequenced. Resistant (R) and susceptible (S) haplotypes were assembled independently. To refine the assembly, the whole genome of 'Lito' was sequenced to high coverage (98×) using PacBio technology, enabling the development of a detailed assembly of the region from which the genes in the QTL region could be predicted and annotated. The selected cultivar 'Lito' made it possible not only to discriminate structural variants between the two haplotypic regions but also to distinguish specific allele expression, contributing towards mining the PPVres locus. In light of these findings, genes previously indicated (i.e., MATHd genes) to have a possible role in PPV resistance were further analyzed, and new candidates were discussed. Although the results are not conclusive, the accurate and independent assembly of R and S haplotypes of 'Lito' is a valuable resource to predict and test alternative transcription and regulation mechanisms underpinning PPV resistance.

381 citations


Journal ArticleDOI
01 Nov 2019-Science
TL;DR: Tests to distinguish incomplete lineage sorting from introgression indicate that gene flow has obscured several ancient phylogenetic relationships in this group over large swathes of the genome, and a hitherto unknown inversion that traps a color pattern switch locus is identified.
Abstract: We used 20 de novo genome assemblies to probe the speciation history and architecture of gene flow in rapidly radiating Heliconius butterflies. Our tests to distinguish incomplete lineage sorting from introgression indicate that gene flow has obscured several ancient phylogenetic relationships in this group over large swathes of the genome. Introgressed loci are underrepresented in low-recombination and gene-rich regions, consistent with the purging of foreign alleles more tightly linked to incompatibility loci. Here, we identify a hitherto unknown inversion that traps a color pattern switch locus. We infer that this inversion was transferred between lineages by introgression and is convergent with a similar rearrangement in another part of the genus. These multiple de novo genome sequences enable improved understanding of the importance of introgression and selective processes in adaptive radiation.

295 citations


Journal ArticleDOI
TL;DR: The results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
Abstract: Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.

251 citations
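
The k-medoids selection step mentioned in the Angiosperms353 abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the Hamming distance on pre-aligned sequences, the fixed `k`, the deterministic starting medoids, and the function names `hamming` and `k_medoids` are all assumptions made for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatched columns between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def k_medoids(seqs, k, max_iter=100):
    """Alternating k-medoids; returns the sorted indices of the chosen medoids.

    Illustrative assumption: the first k sequences seed the medoids, so the
    result is deterministic but may be a local optimum.
    """
    n = len(seqs)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of sequences")
    dist = [[hamming(seqs[i], seqs[j]) for j in range(n)] for i in range(n)]
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assignment step: every sequence joins its nearest medoid
        # (a medoid always stays in its own cluster, so none is empty).
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid minimises total distance inside its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


# Toy alignment (hypothetical data): two obvious sequence groups.
toy = ["AAAA", "AAAT", "AAAA", "GGGG", "GGGC", "GGGG"]
representatives = [toy[i] for i in k_medoids(toy, 2)]
```

On the toy alignment, one representative is drawn from each group, which is the behaviour a probe designer would want: a few sequences standing in for the diversity of many.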


Journal ArticleDOI
TL;DR: A reference-grade wild soybean genome is reported and it is shown that it can be used to identify structural variation and refine quantitative trait loci and illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections.
Abstract: Efficient crop improvement depends on the application of accurate genetic information contained in diverse germplasm resources. Here we report a reference-grade genome of wild soybean accession W05, with a final assembled genome size of 1013.2 Mb and a contig N50 of 3.3 Mb. The analytical power of the W05 genome is demonstrated by several examples. First, we identify an inversion at the locus determining seed coat color during domestication. Second, a translocation event between chromosomes 11 and 13 of some genotypes is shown to interfere with the assignment of QTLs. Third, we find a region containing copy number variations of the Kunitz trypsin inhibitor (KTI) genes. Such findings illustrate the power of this assembly in the analysis of large structural variations in soybean germplasm collections. The wild soybean genome assembly has wide applications in comparative genomic and evolutionary studies, as well as in crop breeding and improvement programs.

164 citations
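
The contig N50 quoted in the wild soybean abstract above is a summary statistic that is easy to compute by hand. The sketch below uses the common definition (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size); the function name `n50` and the toy lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


# Hypothetical contig lengths in arbitrary units.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)
```

A larger N50 means that half of the assembled bases sit in longer contigs, which is why it is widely reported as a measure of assembly contiguity.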


Posted ContentDOI
Cassandra N. Spracklen1, Momoko Horikoshi, Young Jin Kim, Kuang Lin2, Fiona Bragg2, Sanghoon Moon, Ken Suzuki, Claudia H. T. Tam3, Yasuharu Tabara4, Soo Heon Kwak5, Fumihiko Takeuchi, Jirong Long6, Victor Jun Yu Lim7, Jin-Fang Chai7, Chien-Hsiun Chen8, Masahiro Nakatochi9, Jie Yao10, Hyeok Sun Choi11, Apoorva K Iyengar1, Hannah J Perrin1, Sarah M Brotman1, Martijn van de Bunt2, Anna L. Gloyn2, Jennifer E. Below6, Michael Boehnke12, Donald W. Bowden13, John C. Chambers14, Anubha Mahajan2, Mark I. McCarthy2, Maggie C.Y. Ng13, Lauren E. Petty6, Weihua Zhang15, Andrew P. Morris16, Linda S. Adair1, Zheng Bian17, Juliana C.N. Chan3, Li-Ching Chang8, Miao-Li Chee, Yii-Der Ida Chen10, Yuan-Tsong Chen8, Zhengming Chen2, Lee-Ming Chuang18, Shufa Du1, Penny Gordon-Larsen1, Myron D. Gross19, Xiuqing Guo10, Yu Guo17, Sohee Han, Annie-Green Howard1, Wei Huang20, Yi-Jen Hung21, Mi Yeong Hwang, Chii-Min Hwu22, Sahoko Ichihara23, Masato Isono23, Hye-Mi Jang, Guozhi Jiang3, Jost B. Jonas24, Yoichiro Kamatani25, Tomohiro Katsuya26, Takahisa Kawaguchi4, Chiea Chuen Khor27, Katsuhiko Kohara28, Myung-Shik Lee29, Nanette R. Lee30, Liming Li31, Jianjun Liu27, Andrea O.Y. Luk3, Jun Lv31, Yukinori Okada26, Mark A Pereira19, Charumathi Sabanayagam7, Shi Jinxiu20, Dong Mun Shin, Wing-Yee So3, Atsushi Takahashi, Brian Tomlinson3, Fuu Jen Tsai32, Rob M. van Dam7, Yong-Bing Xiang33, Ken Yamamoto34, Toshimasa Yamauchi25, Kyungheon Yoon, Canqing Yu31, Jian-Min Yuan35, Liang Zhang, Wei Zheng6, Michiya Igase28, Yoon Shin Cho11, Jerome I. Rotter10, Ya Xing Wang36, Wayne Huey-Herng Sheu37, Wayne Huey-Herng Sheu38, Mitsuhiro Yokota34, Jer-Yuarn Wu8, Ching-Yu Cheng7, Tien Yin Wong7, Xiao-Ou Shu6, Norihiro Kato, Kyong-Soo Park5, E-Shyong Tai7, Fumihiko Matsuda4, Woon-Puay Koh7, Ronald Cw Ma3, Shiro Maeda39, Iona Y Millwood2, Ju Young Lee, Takashi Kadowaki25, Robin G. Walters2, Bong-Jo Kim, Karen L. Mohlke1, Xueling Sim7 
28 Jun 2019-bioRxiv
TL;DR: The largest meta-analysis of East Asian individuals to identify new genetic associations and provide insight into T2D pathogenesis is performed.
Abstract: Meta-analyses of genome-wide association studies (GWAS) have identified >240 loci associated with type 2 diabetes (T2D); however, most loci have been identified in analyses of European-ancestry individuals. To examine T2D risk in East Asian individuals, we meta-analyzed GWAS data in 77,418 cases and 356,122 controls. In the main analysis, we identified 298 distinct association signals at 178 loci, and across T2D association models with and without consideration of body mass index and sex, we identified 56 loci newly implicated in T2D predisposition. Common variants associated with T2D in both East Asian and European populations exhibited strongly correlated effect sizes. New associations include signals in/near GDAP1, PTF1A, SIX3, ALDH2, a microRNA cluster, and genes that affect muscle and adipose differentiation. At another locus, eQTLs at two overlapping T2D signals act through two genes, NKX6-3 and ANK1, in different tissues. Association studies in diverse populations identify additional loci and elucidate disease genes, biology, and pathways.

137 citations


Journal ArticleDOI
TL;DR: Since the establishment of the mixed linear model (MLM) method for genome-wide association studies (GWAS) by Zhang et al. (2005), a series of new MLM-based methods have been proposed, i.e., mrMLM (Wang et al., 2016), ISIS EMBLASSO (Tamba et al., 2017), pLARmEB (Zhang et al., 2017), FASTmrEMMA (Wen and Tamba, 2018
Abstract: Since the establishment of the mixed linear model (MLM) method for genome-wide association studies (GWAS) by Zhang et al. (2005) and Yu et al. (2006), a series of new MLM-based methods have been proposed (Feng et al., 2016). These methods have been widely used in genetic dissection of complex and omics-related traits (Figure 1), especially in conjunction with the development of advanced genomic sequencing technologies. However, most existing methods are based on single marker association in genome-wide scans with population structure and polygenic background controls. To control false positive rate, Bonferroni correction for multiple tests is frequently adopted. This stringent correction results in the exclusion of important loci, especially for large experimental error inherent in field experiments of crop genetics. To address this issue, multilocus GWAS methodologies have been recommended, i.e., mrMLM (Wang et al., 2016), ISIS EMBLASSO (Tamba et al., 2017), pLARmEB (Zhang et al., 2017), FASTmrEMMA (Wen et al., 2018a), pKWmEB (Ren et al., 2018), and FASTmrMLM (Zhang and Tamba, 2018). Here we summarize their advantages and potential limitations for using these methods (Table 1).

104 citations
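
The stringency of the Bonferroni correction discussed in the multi-locus GWAS abstract above follows from simple arithmetic: the per-test threshold is the family-wise error rate divided by the number of tests. The sketch below is illustrative only; the function names, the default error rate of 0.05, and the p-values are assumptions and do not come from the paper.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold for a family-wise error rate alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def significant_markers(p_values, alpha=0.05):
    """Indices of markers whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical single-marker scan with four markers.
toy_p_values = [1e-8, 0.004, 0.03, 0.2]
passing = significant_markers(toy_p_values)

# With a hypothetical 500,000 markers the threshold shrinks to about 1e-7,
# which is how loci with modest effects come to be excluded.
genome_wide_threshold = bonferroni_threshold(0.05, 500_000)
```

In the toy scan, the marker with p = 0.03 would pass an uncorrected 0.05 cut-off but fails the corrected one, which is the loss of power the abstract describes.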


Journal ArticleDOI
TL;DR: An unexpectedly low level of differentiation is revealed between a pair of sex chromosomes harboring an old MSD gene in a wild teleost fish population, and this study highlights both the pivotal role of genes from the amh pathway in sex determination, as well as the importance of gene duplication as a mechanism driving the turnover of sex chromosomes in this clade.
Abstract: Teleost fishes, thanks to their rapid evolution of sex determination mechanisms, provide remarkable opportunities to study the formation of sex chromosomes and the mechanisms driving the birth of new master sex determining (MSD) genes. However, the evolutionary interplay between the sex chromosomes and the MSD genes they harbor is rather unexplored. We characterized a male-specific duplicate of the anti-Mullerian hormone (amh) as the MSD gene in Northern Pike (Esox lucius), using genomic and expression evidence as well as by loss-of-function and gain-of-function experiments. Using RAD-Sequencing from a family panel, we identified Linkage Group (LG) 24 as the sex chromosome and positioned the sex locus in its sub-telomeric region. Furthermore, we demonstrated that this MSD originated from an ancient duplication of the autosomal amh gene, which was subsequently translocated to LG24. Using sex-specific pooled genome sequencing and a new male genome sequence assembled using Nanopore long reads, we also characterized the differentiation of the X and Y chromosomes, revealing a small male-specific insertion containing the MSD gene and a limited region with reduced recombination. Our study reveals an unexpectedly low level of differentiation between a pair of sex chromosomes harboring an old MSD gene in a wild teleost fish population, and highlights both the pivotal role of genes from the amh pathway in sex determination, as well as the importance of gene duplication as a mechanism driving the turnover of sex chromosomes in this clade.

97 citations


Journal ArticleDOI
TL;DR: The high-resolution genetic map, the association between polymorphisms in the different mapping populations with differences in SNS, and the known role of orthologous genes in other grass species suggest that WAPO-A1 is the most likely candidate gene for the 7AL SNS QTL among the four genes identified in the candidate gene region.
Abstract: A high-resolution genetic map combined with haplotype analyses identified a wheat ortholog of rice gene APO1 as the best candidate gene for a 7AL locus affecting spikelet number per spike. A better understanding of the genes controlling differences in wheat grain yield components can accelerate the improvements required to satisfy future food demands. In this study, we identified a promising candidate gene underlying a quantitative trait locus (QTL) on wheat chromosome arm 7AL regulating spikelet number per spike (SNS). We used large heterogeneous inbred families (>10,000 plants) from two crosses to map the 7AL QTL to an 87-kb region (674,019,191–674,106,327 bp, RefSeq v1.0) containing two complete and two partial genes. In this region, we found three major haplotypes that were designated as H1, H2 and H3. The H2 haplotype contributed the high-SNS allele in both H1 × H2 and H2 × H3 segregating populations. The ancestral H3 haplotype is frequent in wild emmer (48%) but rare (~1%) in cultivated wheats. By contrast, the H1 and H2 haplotypes became predominant in modern cultivated durum and common wheat, respectively. Among the four candidate genes, only TraesCS7A02G481600 showed a non-synonymous polymorphism that differentiated H2 from the other two haplotypes. This gene, designated here as WHEAT ORTHOLOG OF APO1 (WAPO1), is an ortholog of the rice gene ABERRANT PANICLE ORGANIZATION 1 (APO1), which affects spikelet number. Taken together, the high-resolution genetic map, the association between polymorphisms in the different mapping populations with differences in SNS, and the known role of orthologous genes in other grass species suggest that WAPO-A1 is the most likely candidate gene for the 7AL SNS QTL among the four genes identified in the candidate gene region.

97 citations


Journal ArticleDOI
TL;DR: The study reports a genome-wide significant locus for cannabis use disorder, replicating in an independent cohort, and implicates CHRNA2, which encodes an acetylcholine receptor subunit, in the disorder by analyses of genetically regulated gene expression.
Abstract: Cannabis is the most frequently used illicit psychoactive substance worldwide; around one in ten users become dependent. The risk for cannabis use disorder (CUD) has a strong genetic component, with twin heritability estimates ranging from 51 to 70%. Here we performed a genome-wide association study of CUD in 2,387 cases and 48,985 controls, followed by replication in 5,501 cases and 301,041 controls. We report a genome-wide significant risk locus for CUD (P = 9.31 × 10⁻¹²) that replicates in an independent population (P_replication = 3.27 × 10⁻³, P_meta-analysis = 9.09 × 10⁻¹²). The index variant (rs56372821) is a strong expression quantitative trait locus for cholinergic receptor nicotinic α2 subunit (CHRNA2); analyses of the genetically regulated gene expression identified a significant association of CHRNA2 expression with CUD in brain tissue. At the polygenic level, analyses revealed a significant decrease in the risk of CUD with increased load of variants associated with cognitive performance. The results provide biological insights and inform on the genetic architecture of CUD.

96 citations


Journal ArticleDOI
TL;DR: The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for soybean's domestication and improvement, serving as a basis for future research and crop improvement efforts for this important crop species.
Abstract: We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and for one accession of Glycine soja, the closest wild relative of G. max. The G. max assemblies provided are for widely used US cultivars: the northern line Williams 82 (Wm82) and the southern line Lee. The Wm82 assembly improves the prior published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 single-nucleotide polymorphisms (SNPs) per kb between Wm82 and Lee, and 4.7 SNPs per kb between these lines and G. soja. SNP distributions and comparisons with genotypes of the Lee and Wm82 parents highlight patterns of introgression and haplotype structure. Comparisons against the US germplasm collection show placement of the sequenced accessions relative to global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found approximately 40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and approximately 32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci. For each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. The genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie traits for the domestication and improvement of soybean, serving as a basis for future research and crop improvement efforts for this important crop species.

93 citations


Journal ArticleDOI
TL;DR: The identification of AD causal variants in PVRL2 and APOC1 regions in proximity to APOE and common risk haplotypes independent of APOE-ε4 coding change are reported, providing compelling evidence for additional risk factors in the APOE locus that contribute to AD pathogenesis.
Abstract: Alzheimer's disease (AD) is a leading cause of mortality in the elderly. While the coding change of APOE-ε4 is a key risk factor for late-onset AD and has been believed to be the only risk factor in the APOE locus, it does not fully explain the risk effect conferred by the locus. Here, we report the identification of AD causal variants in PVRL2 and APOC1 regions in proximity to APOE and define common risk haplotypes independent of APOE-ε4 coding change. These risk haplotypes are associated with changes of AD-related endophenotypes including cognitive performance, and altered expression of APOE and its nearby genes in the human brain and blood. High-throughput genome-wide chromosome conformation capture analysis further supports the roles of these risk haplotypes in modulating chromatin states and gene expression in the brain. Our findings provide compelling evidence for additional risk factors in the APOE locus that contribute to AD pathogenesis.

Journal ArticleDOI
TL;DR: Comparative analyses suggest that sex-specific isoform expression through alternative splicing may underlie sex determination processes in the channel catfish, and BCAR1 is identified as a potential sex determination gene.
Abstract: Sex determination mechanisms in teleost fish broadly differ from mammals and birds, with sex chromosomes that are far less differentiated and recombination often occurring along the length of the X and Y chromosomes, posing major challenges for the identification of specific sex determination genes. Here, we take an innovative approach of comparative genome analysis of the genomic sequences of the X chromosome and newly sequenced Y chromosome in the channel catfish. Using a YY channel catfish as the sequencing template, we generated, assembled, and annotated the Y genome sequence of channel catfish. The genome sequence assembly had a contig N50 size of 2.7 Mb and a scaffold N50 size of 26.7 Mb. Genetic linkage and GWAS analyses placed the sex determination locus within a genetic distance less than 0.5 cM and physical distance of 8.9 Mb. However, comparison of the channel catfish X and Y chromosome sequences showed no sex-specific genes. Instead, comparative RNA-Seq analysis between females and males revealed exclusive sex-specific expression of an isoform of the breast cancer anti-resistance 1 (BCAR1) gene in the male during early sex differentiation. Experimental knockout of BCAR1 gene converted genetic males (XY) to phenotypic females, suggesting BCAR1 as a putative sex determination gene. We present the first Y chromosome sequence among teleost fish, and one of the few whole Y chromosome sequences among vertebrate species. Comparative analyses suggest that sex-specific isoform expression through alternative splicing may underlie sex determination processes in the channel catfish, and we identify BCAR1 as a potential sex determination gene.

Journal ArticleDOI
TL;DR: It is found that genetic recombination events are strongly restricted to chromosome tips in males, but not females, and that this recombination difference between the sexes may have evolved recently in the guppy lineage; the findings do not support the hypothesis that suppressed recombination evolved in response to the presence of SA polymorphisms.
Abstract: It is often stated that polymorphisms for mutations affecting fitness of males and females in opposite directions [sexually antagonistic (SA) polymorphisms] are the main selective force for the evolution of recombination suppression between sex chromosomes. However, empirical evidence to discriminate between different hypotheses is difficult to obtain. We report genetic mapping results in laboratory-raised families of the guppy (Poecilia reticulata), a sexually dimorphic fish with SA polymorphisms for male coloration genes, mostly on the sex chromosomes. Comparison of the genetic and physical maps shows that crossovers are distributed very differently in the two sexes (heterochiasmy); in male meiosis, they are restricted to the termini of all four chromosomes studied, including chromosome 12, which carries the sex-determining locus. Genome resequencing of male and female guppies from a population also indicates sex linkage of variants across almost the entire chromosome 12. More than 90% of the chromosome carrying the male-determining locus is therefore transmitted largely through the male lineage. A lack of heterochiasmy in a related fish species suggests that it originated recently in the lineage leading to the guppy. Our findings do not support the hypothesis that suppressed recombination evolved in response to the presence of SA polymorphisms. Instead, a low frequency of recombination on a chromosome that carries a male-determining locus and has not undergone genetic degeneration has probably facilitated the establishment of male-beneficial coloration polymorphisms.

Journal ArticleDOI
TL;DR: Hexaploid wheat lines with loss of function of homeoalleles of Qsd1, which controls seed dormancy in barley, are produced by Agrobacterium-mediated CRISPR/Cas9 for trait improvement in wheat, particularly for genetically recessive traits, based on locus information from diploid barley.

Journal ArticleDOI
Mark A. Corbett, Thessa Kroes, Liana Veneziano, Mark F. Bennett, Rahel T. Florian, Amy L Schneider, Antonietta Coppola, Laura Licchetta, Silvana Franceschetti, Antonio Suppa, Aaron M. Wenger, Davide Mei, Manuela Pendziwiat, Sabine Kaya, Massimo Delledonne, Rachel Straussberg, Luciano Xumerle, Brigid M. Regan, Douglas E. Crompton, Anne Fleur van Rootselaar, Anthony Correll, Rachael Catford, Francesca Bisulli, Shreyasee Chakraborty, Sara Baldassari, Paolo Tinuper, Kirston Barton, Shaun Carswell, Martin A. Smith, Alfredo Berardelli, Renee Carroll, Alison Gardner, Kathryn Friend, Ilan Blatt, Michele Iacomino, Carlo Di Bonaventura, Salvatore Striano, Julien Buratti, Boris Keren, Caroline Nava, Sylvie Forlani, Gabrielle Rudolf, Edouard Hirsch, Eric LeGuern, Pierre Labauge, Simona Balestrini, Josemir W. Sander, Zaid Afawi, Ingo Helbig, Hiroyuki Ishiura, Shoji Tsuji, Sanjay M. Sisodiya, Giorgio Casari, Lynette G. Sadleir, Riaan van Coller, Marina A. J. Tijssen, Karl Martin Klein, Arn M. J. M. van den Maagdenberg, Federico Zara, Renzo Guerrini, Samuel F. Berkovic, Tommaso Pippucci, Laura Canafoglia, Melanie Bahlo, Pasquale Striano, Ingrid E. Scheffer, Francesco Brancati, Christel Depienne, Jozef Gecz
TL;DR: Evidence is provided that chr2-linked FAME (FAME2) is caused by an expansion of an ATTTC pentamer within the first intron of STARD7, suggesting ATTTC expansions may cause this disorder, irrespective of the genomic locus involved.
Abstract: Familial Adult Myoclonic Epilepsy (FAME) is characterised by cortical myoclonic tremor usually from the second decade of life and overt myoclonic or generalised tonic-clonic seizures. Four independent loci have been implicated in FAME on chromosomes (chr) 2, 3, 5 and 8. Using whole genome sequencing and repeat primed PCR, we provide evidence that chr2-linked FAME (FAME2) is caused by an expansion of an ATTTC pentamer within the first intron of STARD7. The ATTTC expansions segregate in 158/158 individuals typically affected by FAME from 22 pedigrees including 16 previously reported families recruited worldwide. RNA sequencing from patient derived fibroblasts shows no accumulation of the AUUUU or AUUUC repeat sequences and STARD7 gene expression is not affected. These data, in combination with other genes bearing similar mutations that have been implicated in FAME, suggest ATTTC expansions may cause this disorder, irrespective of the genomic locus involved.

Journal ArticleDOI
TL;DR: An updated genome annotation of the F. vesca V4 genome is provided as well as a comprehensive gene expression atlas with the new gene ID nomenclature, which will greatly facilitate gene functional studies in strawberry and other evolutionarily related plant species.
Abstract: The diploid strawberry Fragaria vesca serves as an ideal model plant for cultivated strawberry (Fragaria × ananassa, 8x) and the Rosaceae family. The F. vesca genome was initially published in 2011 using older technologies. Recently, a new and greatly improved F. vesca genome, designated V4, was published. However, the number of annotated genes is remarkably reduced in V4 (28,588 genes) compared to the prior annotations (32,831 to 33,673 genes). Additionally, the annotation of V4 (v4.0.a1) implements a new nomenclature for gene IDs (FvH4_XgXXXXX), rather than the previous nomenclature (geneXXXXX). Hence, further improvement of the V4 genome annotation and assigning gene expression levels under the new gene IDs with existing transcriptome data are necessary to facilitate the utility of this high-quality F. vesca genome V4. Here, we built a new and improved annotation, v4.0.a2, for F. vesca genome V4. The new annotation has a total of 34,007 gene models with 98.1% complete Benchmarking Universal Single-Copy Orthologs (BUSCOs). In this v4.0.a2 annotation, gene models of 8,342 existing genes are modified, 9,029 new genes are added, and 10,176 genes possess alternatively spliced isoforms with an average of 1.90 transcripts per locus. Transcription factors/regulators and protein kinases are globally identified. Interestingly, the transcription factor family FAr-red-impaired Response 1 (FAR1) contains 82 genes in v4.0.a2 but only two members in v4.0.a1. Additionally, the expression levels of all genes in the new annotation across a total of 46 different tissues and stages are provided. Finally, miRNAs and their targets are reanalyzed and presented. Altogether, this work provides an updated genome annotation of the F. vesca V4 genome as well as a comprehensive gene expression atlas with the new gene ID nomenclature, which will greatly facilitate gene functional studies in strawberry and other evolutionarily related plant species. 
An updated annotation of the wild strawberry genome includes over nine thousand new genes. Since the genome sequence of the wild strawberry was first published in 2011, technological improvements have led to various refinements and updates. Chunying Kang and colleagues at Huazhong Agricultural University in Wuhan, China, found that a large number of genes were either absent or inaccurately described in the annotation of the latest wild strawberry genome. They annotated 5,419 more protein-coding genes, including 139 transcription factor and 92 protein kinase encoding genes, and carried out a comprehensive analysis of the expression patterns of all genes in the new annotation. They also identified microRNAs that contribute to regulate gene expression. These data will aid future comparative and functional studies in widely grown hybrid strawberry species.

Journal ArticleDOI
TL;DR: This work proposes a robust novel method for determining VDJ haplotypes by adapting a Bayesian framework, and attests to its broad application with a large, multi-individual dataset.
Abstract: Analysis of antibody repertoires by high-throughput sequencing is of major importance in understanding adaptive immune responses. Our knowledge of variations in the genomic loci encoding immunoglobulin genes is incomplete, resulting in conflicting VDJ gene assignments and biased genotype and haplotype inference. Haplotypes can be inferred using IGHJ6 heterozygosity, observed in one third of the people. Here, we propose a robust novel method for determining VDJ haplotypes by adapting a Bayesian framework. Our method extends haplotype inference to IGHD- and IGHV-based analysis, enabling inference of deletions and copy number variations in the entire population. To test this method, we generated a multi-individual data set of naive B-cell repertoires, and found allele usage bias, as well as a mosaic, tiled pattern of deleted IGHD and IGHV genes. The inferred haplotypes may have clinical implications for genetic disease predispositions. Our findings expand the knowledge that can be extracted from antibody repertoire sequencing data. High-throughput sequencing and analyses of antibody repertoires provide important information on immune responses, but current methodologies are limited in sequence assembly precision and haplotype inference validity. Here the authors propose a new Bayesian haplotyping method, and attest to its broad application with a large, multi-individual dataset.

Journal ArticleDOI
TL;DR: Human LY6 genes represent novel biomarkers for poor cancer prognosis and are required for cancer progression in addition to playing an important role in immune escape, although the mechanism associated with these phenotypes is not yet clear.
Abstract: Stem Cell Antigen-1 (Sca-1/Ly6A) was the first identified member of the Lymphocyte antigen-6 (Ly6) gene family. Sca-1 serves as a marker of cancer stem cells and tissue resident stem cells in mice. The Sca-1 gene is located on mouse chromosome 15. While a direct homolog of Sca-1 in humans is missing, human chromosome 8, the region syntenic to mouse chromosome 15, harbors several genes containing the characteristic domain known as the LU domain. The function of the LU domain in the human LY6 gene family is not yet defined. Genes of the LY6 family are present on human chromosomes 6, 8, 11, and 19. The most interesting of these genes are located on chromosome 8q24.3, a frequently amplified locus in human cancer. Human LY6 genes represent novel biomarkers for poor cancer prognosis and are required for cancer progression in addition to playing an important role in immune escape. Although the mechanism associated with these phenotypes is not yet clear, it is timely to review the current literature in order to address the critical need for future advancements in this field. This review will summarize recent findings that describe the role of the human LY6 genes (LY6D, LY6E, LY6H, LY6K, PSCA, LYPD2, SLURP1, GML, GPIHBP1, and LYNX1) and their orthologs on mouse chromosome 15.

Journal ArticleDOI
TL;DR: This study developed a Bayesian hierarchical approach that uses two-stage least squares and applied it to an ATAC-seq data set from 100 individuals, to identify over 15,000 high-confidence causal interactions in the human genome.
Abstract: Physical interaction of regulatory elements in three-dimensional space poses a challenge for studies of disease because non-coding risk variants may be great distances from the genes they regulate. Experimental methods to capture these interactions, such as chromosome conformation capture, usually cannot assign causal direction of effect between regulatory elements, an important component of fine-mapping studies. We developed a Bayesian hierarchical approach that uses two-stage least squares and applied it to an ATAC-seq (assay for transposase-accessible chromatin using sequencing) data set from 100 individuals, to identify over 15,000 high-confidence causal interactions. Most (60%) interactions occurred over <20 kb, where chromosome conformation capture-based methods perform poorly. For a fraction of loci, we identified a single variant that alters accessibility across multiple regions, and experimentally validated the BLK locus, which is associated with multiple autoimmune diseases, using CRISPR genome editing. Our study highlights how association genetics of chromatin state is a powerful approach for identifying interactions between regulatory elements. A Bayesian hierarchical approach identifies over 15,000 causal regulatory interactions in the human genome using ATAC-seq data from 100 individuals. The majority of detected interactions were over distances of <20 kb, a range where 3C methods perform poorly.
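As a rough illustration of the two-stage least squares idea mentioned in this abstract, the sketch below estimates the effect of one trait on another using a genotype as an instrument. It is a plain single-instrument 2SLS on simulated data, not the authors' Bayesian hierarchical model; the function names (`simulate`, `two_stage_least_squares`), the effect sizes, and the allele frequency are all assumptions made for the example.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    return cov(x, y) / cov(x, x)


def two_stage_least_squares(z, x, y):
    """Single-instrument 2SLS: z is the instrument, x the exposure, y the outcome."""
    # Stage 1: regress the exposure on the instrument and keep the fitted values.
    slope1 = ols_slope(z, x)
    intercept1 = mean(x) - slope1 * mean(z)
    x_hat = [intercept1 + slope1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted exposure.
    return ols_slope(x_hat, y)


def simulate(n=5000, effect=0.5, seed=0):
    """Hypothetical data: genotype z shifts x; an unobserved confounder u shifts x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        genotype = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count 0, 1 or 2
        u = rng.gauss(0, 1)                                      # unobserved confounder
        xi = 1.0 * genotype + u + rng.gauss(0, 1)
        yi = effect * xi + u + rng.gauss(0, 1)
        z.append(genotype)
        x.append(xi)
        y.append(yi)
    return z, x, y


z, x, y = simulate()
print("naive OLS slope:", ols_slope(x, y))
print("2SLS estimate:  ", two_stage_least_squares(z, x, y))
```

In this simulation the naive regression of `y` on `x` is biased by the confounder, while the 2SLS estimate should land near the simulated effect because the genotype influences `y` only through `x`.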

Journal ArticleDOI
TL;DR: CRISPR/Cas9-mediated mutagenesis is used for the first time in an octoploid species, the cultivated strawberry, to functionally characterize the role of the B-class MADS box gene FaTM6 in flower development.
Abstract: The B-class of MADS-box transcription factors has been studied in many plant species, but remains functionally uncharacterized in Rosaceae. APETALA3 (AP3), a member of this class, controls petal and stamen identities in Arabidopsis. In this study, we identified two members of the AP3 lineage in cultivated strawberry, Fragaria × ananassa, namely FaAP3 and FaTM6. FaTM6, and not FaAP3, showed an expression pattern equivalent to that of AP3 in Arabidopsis. We used the CRISPR/Cas9 genome editing system for the first time in an octoploid species to characterize the function of TM6 in strawberry flower development. An analysis by high-throughput sequencing of the FaTM6 locus spanning the target sites showed highly efficient genome editing already present in the T0 generation. Phenotypic characterization of the mutant lines indicated that FaTM6 plays a key role in anther development in strawberry. Our results validate the use of the CRISPR/Cas9 system for gene functional analysis in F. × ananassa as an octoploid species, and offer new opportunities for engineering strawberry to improve traits of interest in breeding programs.

Journal ArticleDOI
TL;DR: This study demonstrates an efficient approach to achieve stable, consistent self-compatibility through S-RNase KO for use in diploid potato breeding approaches.
Abstract: Potato breeding can be redirected to a diploid inbred/F1 hybrid variety breeding strategy if self-compatibility can be introduced into diploid germplasm. However, the majority of diploid potato clones (Solanum spp.) possess gametophytic self-incompatibility (SI) that is primarily controlled by a single multiallelic locus called the S-locus, which is composed of tightly linked genes, S-RNase (S-locus RNase) and multiple SLFs (S-locus F-box proteins), which are expressed in the style and pollen, respectively. Using S-RNase genes known to function in the Solanaceae gametophytic SI mechanism, we identified S-RNase alleles with flower-specific expression in two diploid self-incompatible potato lines using genome resequencing data. Consistent with the location of the S-locus in potato, we genetically mapped the S-RNase gene using a segregating population to a region of low recombination within the pericentromere of chromosome 1. To generate self-compatible diploid potato lines, a dual single-guide RNA (sgRNA) strategy was used to target conserved exonic regions of the S-RNase gene and generate targeted knockouts (KOs) using a Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/CRISPR-associated protein 9 (Cas9) approach. Self-compatibility was achieved in nine S-RNase KO T0 lines which contained bi-allelic and homozygous deletions/insertions in both genotypes, transmitting self-compatibility to T1 progeny. This study demonstrates an efficient approach to achieve stable, consistent self-compatibility through S-RNase KO for use in diploid potato breeding approaches.

Journal ArticleDOI
TL;DR: Four genetic loci associated with FFA are identified by GWAS followed by Bayesian fine-mapping, co-localisation and HLA imputation which highlights HLA-B*07:02 as a risk factor.
Abstract: Frontal fibrosing alopecia (FFA) is a recently described inflammatory and scarring type of hair loss affecting almost exclusively women. Despite a dramatic recent increase in incidence, the aetiopathogenesis of FFA remains unknown. We undertake genome-wide association studies in females from a UK cohort, comprising 844 cases and 3,760 controls, and a Spanish cohort of 172 cases and 385 controls, and perform statistical meta-analysis. We observe genome-wide significant association with FFA at four genomic loci: 2p22.2, 6p21.1, 8q24.22 and 15q2.1. Within the 6p21.1 locus, fine-mapping indicates that the association is driven by the HLA-B*07:02 allele. At 2p22.2, we implicate a putative causal missense variant in CYP1B1, encoding the homonymous xenobiotic- and hormone-processing enzyme. Transcriptomic analysis of affected scalp tissue highlights overrepresentation of transcripts encoding components of innate and adaptive immune response pathways. These findings provide insight into disease pathogenesis and characterise FFA as a genetically predisposed immuno-inflammatory disorder driven by HLA-B*07:02.

Journal ArticleDOI
Cyril Pottier1, Yingxue Ren1, Ralph B. Perkerson1, Matt Baker1, Gregory D. Jenkins1, Marka van Blitterswijk1, Mariely DeJesus-Hernandez1, Jeroen van Rooij2, Melissa E. Murray1, Elizabeth Christopher1, Shannon K. McDonnell1, Zachary C. Fogarty1, Anthony Batzler1, Shulan Tian1, Cristina T. Vicente1, Billie J. Matchett1, Anna Karydas3, Ging-Yuek Robin Hsiung4, Harro Seelaar2, Merel O. Mol2, Elizabeth Finger5, Caroline Graff6, Linn Öijerstedt6, Manuela Neumann7, Manuela Neumann8, Peter Heutink8, Peter Heutink7, Matthis Synofzik7, Matthis Synofzik8, Carlo Wilke7, Carlo Wilke8, Johannes Prudlo8, Johannes Prudlo9, Patrizia Rizzu8, Javier Simón-Sánchez7, Javier Simón-Sánchez8, Dieter Edbauer8, Sigrun Roeber10, Janine Diehl-Schmid11, Bret M. Evers12, Andy King13, Andy King14, M.-Marsel Mesulam15, Sandra Weintraub15, Changiz Geula15, Kevin F. Bieniek1, Kevin F. Bieniek16, Leonard Petrucelli1, Geoffrey L. Ahern17, Eric M. Reiman, Bryan K. Woodruff1, Richard J. Caselli1, Edward D. Huey18, Martin R. Farlow19, Jordan Grafman15, Simon Mead20, Lea T. Grinberg3, Salvatore Spina3, Murray Grossman21, David J. Irwin21, Edward B. Lee21, EunRan Suh21, Julie S. Snowden, David G. Mann22, Nilufer Ertekin-Taner1, Ryan J. Uitti1, Zbigniew K. Wszolek1, Keith A. Josephs1, Joseph E. Parisi1, David S. Knopman1, Ronald C. Petersen1, John R. Hodges23, Olivier Piguet23, Ethan G. Geier3, Jennifer S. Yokoyama3, Robert A. Rissman24, Ekaterina Rogaeva25, Julia Keith25, Lorne Zinman25, Maria Carmela Tartaglia25, Maria Carmela Tartaglia26, Nigel J. Cairns27, Carlos Cruchaga27, Bernardino Ghetti19, Julia Kofler28, Oscar L. Lopez17, Oscar L. Lopez28, Thomas G. Beach, Thomas Arzberger10, Thomas Arzberger8, Jochen Herms8, Jochen Herms10, Lawrence S. Honig18, Jean Paul G. Vonsattel18, Glenda M. Halliday23, Glenda M. Halliday29, John B.J. Kwok23, John B.J. Kwok29, Charles L. White12, Marla Gearing30, Jonathan D. Glass30, Sara Rollinson22, Stuart Pickering-Brown22, Jonathan D. Rohrer31, John Q. Trojanowski21, Vivianna M. Van Deerlin21, Eileen H. Bigio15, Claire Troakes13, Safa Al-Sarraj14, Safa Al-Sarraj13, Yan W. Asmann1, Bruce L. Miller3, Neill R. Graff-Radford1, Bradley F. Boeve1, William W. Seeley3, Ian R. A. Mackenzie4, John C. van Swieten2, Dennis W. Dickson1, Joanna M. Biernacka1, Rosa Rademakers1
TL;DR: This study discovers a possible role for genes functioning within the TBK1-related immune pathway (e.g., DHX58, TRIM21, IRF7) in the genetic etiology of FTLD-TDP and strongly implicates the immune pathway in FTLD-TDP pathogenesis.
Abstract: Frontotemporal lobar degeneration with neuronal inclusions of the TAR DNA-binding protein 43 (FTLD-TDP) represents the most common pathological subtype of FTLD. We established the international FTLD-TDP whole-genome sequencing consortium to thoroughly characterize the known genetic causes of FTLD-TDP and identify novel genetic risk factors. Through the study of 1131 unrelated Caucasian patients, we estimated that C9orf72 repeat expansions and GRN loss-of-function mutations account for 25.5% and 13.9% of FTLD-TDP patients, respectively. Mutations in TBK1 (1.5%) and other known FTLD genes (1.4%) were rare, and the disease in 57.7% of FTLD-TDP patients was unexplained by the known FTLD genes. To unravel the contribution of common genetic factors to the FTLD-TDP etiology in these patients, we conducted a two-stage association study comprising the analysis of whole-genome sequencing data from 517 FTLD-TDP patients and 838 controls, followed by targeted genotyping of the most associated genomic loci in 119 additional FTLD-TDP patients and 1653 controls. We identified three genome-wide significant FTLD-TDP risk loci: one new locus at chromosome 7q36 within the DPP6 gene led by rs118113626 (p value = 4.82e-08, OR = 2.12), and two known loci: UNC13A, led by rs1297319 (p value = 1.27e-08, OR = 1.50) and HLA-DQA2 led by rs17219281 (p value = 3.22e-08, OR = 1.98). While HLA represents a locus previously implicated in clinical FTLD and related neurodegenerative disorders, the association signal in our study is independent from previously reported associations. Through inspection of our whole-genome sequence data for genes with an excess of rare loss-of-function variants in FTLD-TDP patients (n ≥ 3) as compared to controls (n = 0), we further discovered a possible role for genes functioning within the TBK1-related immune pathway (e.g., DHX58, TRIM21, IRF7) in the genetic etiology of FTLD-TDP.
Together, our study based on the largest cohort of unrelated FTLD-TDP patients assembled to date provides a comprehensive view of the genetic landscape of FTLD-TDP, nominates novel FTLD-TDP risk loci, and strongly implicates the immune pathway in FTLD-TDP pathogenesis.

Journal ArticleDOI
TL;DR: This work reviews several questions about the evolution of sex-determining systems and sex chromosomes that require studies of young systems, including: the kinds of mutations involved in the transition to unisexual reproduction from hermaphroditism or monoecy; the times when they arose; and the extent to which the properties of sex-linked regions of genomes reflect responses to new selective situations created by the presence of a sex-determining locus.
Abstract: A major reason for studying plant sex chromosomes is that they may often be 'young' systems. There is considerable evidence for the independent evolution of separate sexes within plant families or genera, in some cases showing that the maximum possible time during which their sex-determining genes have existed must be much shorter than those of several animal taxa. Consequently, their sex-linked regions could either have evolved soon after genetic sex determination arose or considerably later. Plants, therefore, include species with both young and old systems. I review several questions about the evolution of sex-determining systems and sex chromosomes that require studies of young systems, including: the kinds of mutations involved in the transition to unisexual reproduction from hermaphroditism or monoecy (a form of functional hermaphroditism); the times when they arose; and the extent to which the properties of sex-linked regions of genomes reflect responses to new selective situations created by the presence of a sex-determining locus. I also evaluate which questions are best studied in plants, vs other suitable candidate organisms. Studies of young plant systems can help understand general evolutionary processes that are shared with the sex chromosomes of other organisms.

Journal ArticleDOI
TL;DR: Global gene expression analyses of tissues across the panel implicated adipose tissue "beiging" and mitochondrial functions in the sex differences and reduced adipose mitochondrial function in males as compared to females was associated with increased susceptibility to obesity and insulin resistance.

Journal ArticleDOI
TL;DR: A genome-wide analysis of type 2 diabetes (T2D) in sub-Saharan Africans, an understudied ancestral group, is reported, and a locus, ZRANB3, that is specific for this population is identified.
Abstract: Genome analysis of diverse human populations has contributed to the identification of novel genomic loci for diseases of major clinical and public health impact. Here, we report a genome-wide analysis of type 2 diabetes (T2D) in sub-Saharan Africans, an understudied ancestral group. We analyze ~18 million autosomal SNPs in 5,231 individuals from Nigeria, Ghana and Kenya. We identify a previously unreported genome-wide significant locus: ZRANB3 (Zinc Finger RANBP2-Type Containing 3, lead SNP p = 2.831 × 10^-9). Knockdown or genomic knockout of the zebrafish ortholog results in a reduction in pancreatic β-cell number, which we demonstrate to be due to increased apoptosis in islets. siRNA transfection of murine Zranb3 in MIN6 β-cells results in impaired insulin secretion in response to high glucose, implicating Zranb3 in β-cell functional response to high glucose conditions. We also show transferability in our study of 32 established T2D loci. Our findings advance understanding of the genetics of T2D in non-European ancestry populations. Type 2 diabetes (T2D) is prevalent in populations worldwide but has mostly been studied in European and mixed-ancestry populations. Here, the authors perform a genome-wide association study for T2D in over 5,000 sub-Saharan Africans and identify a locus, ZRANB3, that is specific for this population.

Journal ArticleDOI
TL;DR: The StGBSSI gene was successfully and precisely edited in the tetraploid potato using gene and base-editing strategies, leading to plants with impaired amylose biosynthesis, opening up new avenues for genome engineering in this species.
Abstract: The StGBSSI gene was successfully and precisely edited in the tetraploid potato using gene and base-editing strategies, leading to plants with impaired amylose biosynthesis. Genome editing has recently become a method of choice for basic research and functional genomics, and holds great potential for molecular plant-breeding applications. The powerful CRISPR-Cas9 system that typically produces double-strand DNA breaks is mainly used to generate knockout mutants. Recently, the development of base editors has broadened the scope of genome editing, allowing precise and efficient nucleotide substitutions. In this study, we produced mutants in two cultivated elite cultivars of the tetraploid potato (Solanum tuberosum) using stable or transient expression of the CRISPR-Cas9 components to knock out the amylose-producing StGBSSI gene. We set up a rapid, highly sensitive and cost-effective screening strategy based on high-resolution melting analysis followed by direct Sanger sequencing and trace chromatogram analysis. Most mutations consisted of small indels, but unwanted insertions of plasmid DNA were also observed. We successfully created tetra-allelic mutants with impaired amylose biosynthesis, confirming the loss of function of the StGBSSI protein. The second main objective of this work was to demonstrate the proof of concept of CRISPR-Cas9 base editing in the tetraploid potato by targeting two loci encoding catalytic motifs of the StGBSSI enzyme. Using a cytidine base editor (CBE), we efficiently and precisely induced DNA substitutions in the KTGGL-encoding locus, leading to discrete variation in the amino acid sequence and generating a loss-of-function allele. The successful application of base editing in the tetraploid potato opens up new avenues for genome engineering in this species.

Journal ArticleDOI
TL;DR: This work uses locally adapted and phenotypically differentiated Arabidopsis lyrata populations from two altitudinal gradients in Norway to detect signatures of selection for local adaptation, and estimates patterns of lineage specific differentiation among these populations.
Abstract: Short-scale local adaptation is a complex process involving selection, migration and drift. The expected effects on the genome are well grounded in theory but examining these on an empirical level has proven difficult, as it requires information about local selection, demographic history and recombination rate variation. Here, we use locally adapted and phenotypically differentiated Arabidopsis lyrata populations from two altitudinal gradients in Norway to test these expectations at the whole-genome level. Demography modelling indicates that populations within the gradients diverged less than 2 kya and that the sites are connected by gene flow. The gene flow estimates are, however, highly asymmetric with migration from high to low altitudes being several times more frequent than vice versa. To detect signatures of selection for local adaptation, we estimate patterns of lineage specific differentiation among these populations. Theory predicts that gene flow leads to concentration of adaptive loci in areas of low recombination; a pattern we observe in both lowland-alpine comparisons. Although most selected loci display patterns of conditional neutrality, we found indications of genetic trade-offs, with one locus particularly showing high differentiation and signs of selection in both populations. Our results further suggest that resistance to solar radiation is an important adaptation to alpine environments, while vegetative growth and bacterial defense are indicated as selected traits in the lowland habitats. These results provide insights into genetic architectures and evolutionary processes driving local adaptation under gene flow. We also contribute to understanding of traits and biological processes underlying alpine adaptation in northern latitudes.

Journal ArticleDOI
TL;DR: This study highlights a novel genome-wide approach to the analysis of a genetic cross in non-model organisms with extreme genetic diversity, and the importance of a high-quality reference genome in interpreting the signals of selection so identified.
Abstract: Infections with helminths cause an enormous disease burden in billions of animals and plants worldwide. Large scale use of anthelmintics has driven the evolution of resistance in a number of species that infect livestock and companion animals, and there are growing concerns regarding the reduced efficacy in some human-infective helminths. Understanding the mechanisms by which resistance evolves is the focus of increasing interest; robust genetic analysis of helminths is challenging, and although many candidate genes have been proposed, the genetic basis of resistance remains poorly resolved. Here, we present a genome-wide analysis of two genetic crosses between ivermectin resistant and sensitive isolates of the parasitic nematode Haemonchus contortus, an economically important gastrointestinal parasite of small ruminants and a model for anthelmintic research. Whole genome sequencing of parental populations, and key stages throughout the crosses, identified extensive genomic diversity that differentiates populations, but after backcrossing and selection, a single genomic quantitative trait locus (QTL) localised on chromosome V was revealed to be associated with ivermectin resistance. This QTL was common between the two geographically and genetically divergent resistant populations and did not include any leading candidate genes, suggestive of a previously uncharacterised mechanism and/or driver of resistance. Despite limited resolution due to low recombination in this region, population genetic analyses and novel evolutionary models supported strong selection at this QTL, driven by at least partial dominance of the resistant allele, and that large resistance-associated haplotype blocks were enriched in response to selection. We have described the genetic architecture and mode of ivermectin selection, revealing a major genomic locus associated with ivermectin resistance, the most conclusive evidence to date in any parasitic nematode. 
This study highlights a novel genome-wide approach to the analysis of a genetic cross in non-model organisms with extreme genetic diversity, and the importance of a high-quality reference genome in interpreting the signals of selection so identified.

Posted ContentDOI
10 Dec 2019-bioRxiv
TL;DR: A curated reference database of the 92 publicly available gene clusters at the locus encoding proteins responsible for biosynthesis and export of CPS (K locus), and a second database for the 12 gene clusters for outer core biosynthesis (OC locus) is created to take better advantage of the untapped information available in whole genome sequences.
Abstract: Multiply antibiotic resistant Acinetobacter baumannii infections are a global public health concern and accurate tracking of the spread of specific lineages is needed. Variation in the composition and structure of capsular polysaccharide (CPS), a critical determinant of virulence and phage susceptibility, makes it an attractive epidemiological marker. The outer core (OC) of lipooligosaccharide also exhibits variation. To take better advantage of the untapped information available in whole genome sequences, we have created a curated reference database of the 92 publicly available gene clusters at the locus encoding proteins responsible for biosynthesis and export of CPS (K locus), and a second database for the 12 gene clusters at the locus for outer core biosynthesis (OC locus). Each entry has been assigned a unique KL or OCL number, and is fully annotated using a simple, transparent and standardised nomenclature. These databases are compatible with Kaptive, a tool for in silico typing of bacterial surface polysaccharide loci, and their utility was validated using a) >630 assembled A. baumannii draft genomes for which the KL and OCL regions had been previously typed manually, and b) 3386 A. baumannii genome assemblies downloaded from NCBI. Among the previously typed genomes, Kaptive was able to confidently assign KL and OCL types with 100% accuracy. Among the genomes retrieved from NCBI, Kaptive detected known KL and OCL in 87% and 90% of genomes, respectively, indicating that the majority of common KL and OCL types are captured within the databases; 19 KL were not detected in any public genome assembly. The failure to assign a KL or OCL type may indicate incomplete or poor-quality genomes. However, further novel variants may remain to be documented.
Combining outputs with multi-locus sequence typing (Institut Pasteur scheme) revealed multiple KL and OCL types in collections of a single sequence type (ST) representing each of the two predominant globally-distributed clones, ST1 of GC1 and ST2 of GC2, and in collections of other clones comprising >20 isolates each (ST10, ST25, and ST140), indicating extensive within-clone replacement of these loci. The databases are available at https://github.com/katholt/Kaptive and will be updated as further locus types become available.