Showing papers in "G3: Genes, Genomes, Genetics in 2020"


Journal ArticleDOI
TL;DR: BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies, is presented, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the browser-based Viewer.
Abstract: Reconstruction of target genomes from sequence data produced by instruments that are agnostic as to the species-of-origin may be confounded by contaminant DNA. Whether introduced during sample processing or through co-extraction alongside the target DNA, if insufficient care is taken during the assembly process, the final assembled genome may be a mixture of data from several species. Such assemblies can confound sequence-based biological inference and, when deposited in public databases, may be included in downstream analyses by users unaware of underlying problems. We present BlobToolKit, a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies. BlobToolKit can be used to process assembly, read and analysis files for fully reproducible interactive exploration in the browser-based Viewer. BlobToolKit can be used during assembly to filter non-target DNA, helping researchers produce assemblies with high biological credibility. We have been running an automated BlobToolKit pipeline on eukaryotic assemblies publicly available in the International Nucleotide Sequence Data Collaboration and are making the results available through a public instance of the Viewer at https://blobtoolkit.genomehubs.org/view. We aim to complete analysis of all publicly available genomes and then maintain currency with the flow of new genomes. We have worked to embed these views into the presentation of genome assemblies at the European Nucleotide Archive, providing an indication of assembly quality alongside the public record with links out to allow full exploration in the Viewer.

477 citations


Journal ArticleDOI
TL;DR: LongQC is proposed as an easy and automated quality control tool for genomic datasets generated by third generation sequencing technologies such as Oxford Nanopore Technologies and SMRT sequencing from Pacific Biosciences.
Abstract: We propose LongQC as an easy and automated quality control tool for genomic datasets generated by third generation sequencing (TGS) technologies such as Oxford Nanopore Technologies (ONT) and SMRT sequencing from Pacific Biosciences (PacBio). Key statistics were optimized for long read data, and LongQC covers all major TGS platforms. LongQC processes and visualizes those statistics automatically and quickly.

67 citations


Journal ArticleDOI
TL;DR: This work investigates the performance of PRS for height in cohorts with admixed African and European ancestry, and shows that the predictive accuracy of height PRS increases linearly with European ancestry and is partially explained by European ancestry segments of the admixed genomes.
Abstract: Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level, and provide a potential route to the use of genetic data in personalized medical care. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. To address this question, we investigate the performance of PRS for height in cohorts with admixed African and European ancestry, allowing us to evaluate ancestry-related differences in PRS predictive accuracy while controlling for environment and cohort differences. We first show that the predictive accuracy of height PRS increases linearly with European ancestry and is partially explained by European ancestry segments of the admixed genomes. We show that recombination rate, differences in allele frequencies, and differences in marginal effect sizes across ancestries all contribute to the decrease in predictive power, but none of these effects explain the decrease on its own. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts.

65 citations
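
The linear-combination idea described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function names, the toy dosages and effect sizes, and the single mixing weight `alpha` are all hypothetical, and the paper's actual weighting scheme may differ.

```python
def polygenic_score(dosages, effect_sizes):
    """Sum of allele dosages (0, 1 or 2) weighted by per-variant effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


def combined_score(dosages, effects_a, effects_b, alpha):
    """Linear combination of two scores built from ancestry-specific effect sizes.

    alpha is a hypothetical mixing weight in [0, 1]; how it should be chosen
    (for example, from validation data) is left open here.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    score_a = polygenic_score(dosages, effects_a)
    score_b = polygenic_score(dosages, effects_b)
    return alpha * score_a + (1.0 - alpha) * score_b


# Toy example: three variants, made-up effect sizes for two discovery cohorts.
toy_dosages = [0, 1, 2]
toy_effects_a = [0.5, -1.0, 0.25]
toy_effects_b = [1.0, 1.0, 1.0]
toy_combined = combined_score(toy_dosages, toy_effects_a, toy_effects_b, alpha=0.5)
```

With `alpha` set to 1 the combined score reduces to the first cohort's score, which makes the weight easy to interpret as the degree of reliance on one set of effect sizes.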


Journal ArticleDOI
TL;DR: It is demonstrated that inclusion of a GME can result in high efficiency of disruption of both genes during super-Mendelian propagation of split-HGD, indicating the robust somatic expression of Cas9 driven by Drosophila germline-limited promoters.
Abstract: Homing based gene drives (HGD) possess the potential to spread linked cargo genes into natural populations and are poised to revolutionize population control of animals. Given that host encoded genes have been identified that are important for pathogen transmission, targeting these genes using guide RNAs as cargo genes linked to drives may provide a robust method to prevent disease transmission. However, effectiveness of the inclusion of additional guide RNAs that target separate genes has not been thoroughly explored. To test this approach, we generated a split-HGD in Drosophila melanogaster that encoded a drive linked effector consisting of a second gRNA engineered to target a separate host-encoded gene, which we term a gRNA-mediated effector (GME). This design enabled us to assess homing and knockout efficiencies of two target genes simultaneously, and also explore the timing and tissue specificity of Cas9 expression on cleavage/homing rates. We demonstrate that inclusion of a GME can result in high efficiency of disruption of both genes during super-Mendelian propagation of split-HGD. Furthermore, both genes were knocked out one generation earlier than expected indicating the robust somatic expression of Cas9 driven by Drosophila germline-limited promoters. We also assess the efficiency of 'shadow drive' generated by maternally deposited Cas9 protein and accumulation of drive-induced resistance alleles along multiple generations, and discuss design principles of HGD that could mitigate the accumulation of resistance alleles while incorporating a GME.

64 citations


Journal ArticleDOI
TL;DR: CeMbio is introduced, a simplified natural Caenorhabditis elegans microbiota derived from the previous meta-analysis of the natural microbiome of this nematode, providing a versatile resource and toolbox for the in-depth dissection of naturally relevant host-microbiome interactions in C. elegans.
Abstract: The study of microbiomes by sequencing has revealed a plethora of correlations between microbial community composition and various life-history characteristics of the corresponding host species. However, inferring causation from correlation is often hampered by the sheer compositional complexity of microbiomes, even in simple organisms. Synthetic communities offer an effective approach to infer cause-effect relationships in host-microbiome systems. Yet the available communities suffer from several drawbacks, such as artificial (thus non-natural) choice of microbes, microbe-host mismatch (e.g., human microbes in gnotobiotic mice), or hosts lacking genetic tractability. Here we introduce CeMbio, a simplified natural Caenorhabditis elegans microbiota derived from our previous meta-analysis of the natural microbiome of this nematode. The CeMbio resource is amenable to all strengths of the C. elegans model system: the included strains are readily culturable, they all colonize the worm gut individually, and they comprise a robust community that distinctly affects nematode life-history. Several tools have additionally been developed for the CeMbio strains, including diagnostic PCR primers, completely sequenced genomes, and metabolic network models. With CeMbio, we provide a versatile resource and toolbox for the in-depth dissection of naturally relevant host-microbiome interactions in C. elegans.

64 citations


Journal ArticleDOI
TL;DR: The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models that did not include this component and provided higher prediction accuracy than models M1 and M2 for the different allocation scenarios.
Abstract: "Sparse testing" refers to reduced multi-environment breeding trials in which not all genotypes of interest are grown in each environment. Using genomic-enabled prediction and a model embracing genotype × environment interaction (GE), the non-observed genotype-in-environment combinations can be predicted. Consequently, the overall costs can be reduced and the testing capacities can be increased. The accuracy of predicting the unobserved data depends on different factors including (1) how many genotypes overlap between environments, (2) in how many environments each genotype is grown, and (3) which prediction method is used. In this research, we studied the predictive ability obtained when using a fixed number of plots and different sparse testing designs. The considered designs included the extreme cases of (1) no overlap of genotypes between environments, and (2) complete overlap of the genotypes between environments. In the latter case, the prediction set fully consists of genotypes that have not been tested at all. Moreover, we gradually go from one extreme to the other considering (3) intermediates between the two previous cases with varying numbers of different or non-overlapping (NO)/overlapping (O) genotypes. The empirical study is built upon two different maize hybrid data sets consisting of different genotypes crossed to two different testers (T1 and T2) and each data set was analyzed separately. For each set, phenotypic records on yield from three different environments are available. Three different prediction models were implemented, two main effects models (M1 and M2), and a model (M3) including GE. The results showed that the genome-based model including GE (M3) captured more phenotypic variation than the models that did not include this component. Also, M3 provided higher prediction accuracy than models M1 and M2 for the different allocation scenarios. 
Reducing the size of the calibration sets decreased the prediction accuracy under all allocation designs, with M3 being the least affected model; however, using the genome-enabled models (i.e., M2 and M3), the predictive ability is recovered when more genotypes are tested across environments. Our results indicate that a substantial part of the testing resources can be saved when using genome-based models including GE for optimizing sparse testing designs.

58 citations


Journal ArticleDOI
TL;DR: This study showed the potential of improving the genomic prediction of complex traits by incorporating the information from multiple traits collected throughout breeding programs which could assist in speeding up breeding cycles.
Abstract: Plant breeders regularly evaluate multiple traits across multiple environments, which opens an avenue for using multiple traits in genomic prediction models. We assessed the potential of multi-trait (MT) genomic prediction models by evaluating several strategies of incorporating multiple traits (eight agronomic and malting quality traits) into the prediction models with two cross-validation schemes (CV1, predicting new lines with genotypic information only, and CV2, predicting partially phenotyped lines using both genotypic and phenotypic information from correlated traits) in barley. The predictive ability was similar for single (ST-CV1) and multi-trait (MT-CV1) models to predict new lines. However, the predictive ability for agronomic traits was considerably increased when partially phenotyped lines (MT-CV2) were used. The predictive ability for grain yield using the MT-CV2 model with other agronomic traits was 57% and 61% higher than that of the ST-CV1 and MT-CV1 models, respectively. Therefore, complex traits such as grain yield are better predicted when correlated traits are used. Similarly, a considerable increase in the predictive ability of malting quality traits was observed when correlated traits were used. The predictive ability for grain protein content using the MT-CV2 model with both agronomic and malting traits was 76% higher than that of the ST-CV1 and MT-CV1 models. Additionally, higher predictive ability for new environments was obtained for all traits using the MT-CV2 model compared to the MT-CV1 model. This study showed the potential of improving the genomic prediction of complex traits by incorporating the information from multiple traits (cost-friendly and easy to measure traits) collected throughout breeding programs, which could assist in speeding up breeding cycles.

56 citations


Journal ArticleDOI
TL;DR: This project compares estimation methods using next-generation sequencing to measurements from flow cytometry, a standard method for genome size measures, using ground beetles and other members of the beetle suborder Adephaga as the test system, and presents a new protocol for using read-depth of single-copy genes to estimate genome size.
Abstract: Measuring genome size across different species can yield important insights into evolution of the genome and allow for more informed decisions when designing next-generation genomic sequencing projects. New techniques for estimating genome size using shallow genomic sequence data have emerged which have the potential to augment our knowledge of genome sizes, yet these methods have only been used in a limited number of empirical studies. In this project, we compare estimation methods using next-generation sequencing (k-mer methods and average read depth of single-copy genes) to measurements from flow cytometry, a standard method for genome size measures, using ground beetles (Carabidae) and other members of the beetle suborder Adephaga as our test system. We also present a new protocol for using read-depth of single-copy genes to estimate genome size. Additionally, we report flow cytometry measurements for five previously unmeasured carabid species, as well as 21 new draft genomes and six new draft transcriptomes across eight species of adephagan beetles. No single sequence-based method performed well on all species, and all tended to underestimate the genome sizes, although only slightly in most samples. For one species, Bembidion sp. nr. transversale, most sequence-based methods yielded estimates half the size suggested by flow cytometry.

56 citations
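
As a rough illustration of the read-depth approach mentioned in the abstract above, genome size can be approximated as the total number of sequenced bases divided by the average depth observed over single-copy genes. The sketch below assumes that simple relationship only; it is not the protocol the authors present, and the function name and the toy numbers are hypothetical.

```python
from statistics import mean


def estimate_genome_size(total_sequenced_bases, single_copy_depths):
    """Estimate genome size (in bases) as total sequenced bases / mean depth.

    single_copy_depths holds the average read depth measured over each
    single-copy gene; the values used below are made up for illustration.
    """
    if not single_copy_depths:
        raise ValueError("at least one depth value is required")
    average_depth = mean(single_copy_depths)
    if average_depth <= 0:
        raise ValueError("average depth must be positive")
    return total_sequenced_bases / average_depth


# Toy example: 12 Gb of reads and three single-copy genes covered ~30x.
toy_estimate = estimate_genome_size(12_000_000_000, [28, 30, 32])
```

In practice reads would usually be filtered first, since contaminant or organellar reads inflate the numerator without raising the nuclear single-copy depth.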


Journal ArticleDOI
TL;DR: The results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.
Abstract: Genomic selection enables cumulative genetic gains in key production traits such as disease resistance, playing an important role in the economic and environmental sustainability of aquaculture production. However, it requires genome-wide genetic marker data on large populations, which can be prohibitively expensive. Genotype imputation is a cost-effective method for obtaining high-density genotypes, but its value in aquaculture breeding programs which are characterised by large full-sibling families has yet to be fully assessed. The aim of this study was to optimise the use of low-density genotypes and evaluate genotype imputation strategies for cost-effective genomic prediction. Phenotypes and genotypes (78,362 SNPs) were obtained for 610 individuals from a Scottish Atlantic salmon breeding program population (Landcatch, UK) challenged with sea lice, Lepeophtheirus salmonis. The genomic prediction accuracy of genomic selection was calculated using GBLUP approaches and compared across SNP panels of varying densities and composition, with and without imputation. Imputation was tested when parents were genotyped for the optimal SNP panel, and offspring were genotyped for a range of lower density imputation panels. Reducing SNP density had little impact on prediction accuracy until 5,000 SNPs, below which the accuracy dropped. Imputation accuracy increased with increasing imputation panel density. Genomic prediction accuracy when offspring were genotyped for just 200 SNPs, and parents for 5,000 SNPs, was 0.53. This accuracy was similar to the full high density and optimal density dataset, and markedly higher than using 200 SNPs without imputation. These results suggest that imputation from very low to medium density can be a cost-effective tool for genomic selection in Atlantic salmon breeding programs.

50 citations


Journal ArticleDOI
TL;DR: This work improves the use and understanding of the AID system for dissecting gene function at the single-cell level during C. elegans development by utilizing 1-naphthaleneacetic acid (NAA), an indole-free synthetic analog of the natural auxin indole-3-acetic acid (IAA).
Abstract: As developmental biologists in the age of genome editing, we now have access to an ever-increasing array of tools to manipulate endogenous gene expression. The auxin-inducible degradation system allows for spatial and temporal control of protein degradation via a hormone-inducible Arabidopsis F-box protein, transport inhibitor response 1 (TIR1). In the presence of auxin, TIR1 serves as a substrate-recognition component of the E3 ubiquitin ligase complex SKP1-CUL1-F-box (SCF), ubiquitinating auxin-inducible degron (AID)-tagged proteins for proteasomal degradation. Here, we optimize the Caenorhabditis elegans AID system by utilizing 1-naphthaleneacetic acid (NAA), an indole-free synthetic analog of the natural auxin indole-3-acetic acid (IAA). We take advantage of the photostability of NAA to demonstrate via quantitative high-resolution microscopy that rapid degradation of target proteins can be detected in single cells within 30 min of exposure. Additionally, we show that NAA works robustly in both standard growth media and physiological buffer. We also demonstrate that K-NAA, the water-soluble, potassium salt of NAA, can be combined with microfluidics for targeted protein degradation in C. elegans larvae. We provide insight into how the AID system functions in C. elegans by determining that TIR1 depends on C. elegans SKR-1/2, CUL-1, and RBX-1 to degrade target proteins. Finally, we present highly penetrant defects from NAA-mediated degradation of the FTZ-F1 nuclear hormone receptor, NHR-25, during C. elegans uterine-vulval development. Together, this work improves our use and understanding of the AID system for dissecting gene function at the single-cell level during C. elegans development.

47 citations


Journal ArticleDOI
TL;DR: A high-quality Golden Promise reference assembly will be useful and utilized by the whole barley research community but will prove particularly useful for CRISPR-Cas9 experiments.
Abstract: Barley (Hordeum vulgare) is one of the most important crops worldwide and is also considered a research model for the large-genome small grain temperate cereals. Despite genomic resources improving all the time, they are limited for the cv Golden Promise, the most efficient genotype for genetic transformation. We have developed a barley cv Golden Promise reference assembly integrating Illumina paired-end reads, long mate-pair reads, Dovetail Chicago in vitro proximity ligation libraries and chromosome conformation capture sequencing (Hi-C) libraries into a contiguous reference assembly. The assembled genome of 7 chromosomes and 4.13 Gb in size has a super-scaffold N50 after Chicago libraries of 4.14 Mb and contains only 2.2% gaps. Using BUSCO (benchmarking universal single copy orthologous genes) as evaluation, the genome assembly contains 95.2% of complete and single copy genes from the plant database. A high-quality Golden Promise reference assembly will be useful and utilized by the whole barley research community but will prove particularly useful for CRISPR-Cas9 experiments.
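
For readers unfamiliar with the N50 statistic quoted in the abstract above, the following minimal sketch shows the usual definition: the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the assembly size. The function name and the toy lengths are hypothetical and unrelated to the Golden Promise data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy example with made-up scaffold lengths.
toy_n50 = n50([2, 3, 4, 5, 6])
```

Here the lengths sum to 20, and the two longest sequences (6 and 5) together pass the halfway mark, so the toy N50 is 5.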

Journal ArticleDOI
TL;DR: This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia (Sequoiadendron giganteum) conservation and management.
Abstract: The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. Genomic data are limited in giant sequoia and producing a reference genome sequence has been an important goal to allow marker development for restoration and management. Using deep-coverage Illumina and Oxford Nanopore sequencing, combined with Dovetail chromosome conformation capture libraries, the genome was assembled into eleven chromosome-scale scaffolds containing 8.125 Gbp of sequence. Iso-Seq transcripts, assembled from three distinct tissues, were used as evidence to annotate a total of 41,632 protein-coding genes. The genome was found to contain, distributed unevenly across all 11 chromosomes and in 63 orthogroups, over 900 complete or partial predicted NLR genes, of which 375 are supported by annotation derived from protein evidence and gene modeling. This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management.

Journal ArticleDOI
TL;DR: It is proposed that the hexasomic-bivalent inheritance promotes stability to the allelic transmission in sweetpotato.
Abstract: The hexaploid sweetpotato (Ipomoea batatas (L.) Lam., 2n = 6x = 90) is an important staple food crop worldwide and plays a vital role in alleviating famine in developing countries. Due to its high ploidy level, genetic studies in sweetpotato lag behind major diploid crops significantly. We built an ultra-dense multilocus integrated genetic map and characterized the inheritance system in a sweetpotato full-sib family using our newly developed software, MAPpoly. The resulting genetic map revealed 96.5% collinearity between I. batatas and its diploid relative I. trifida. We computed the genotypic probabilities across the whole genome for all individuals in the mapping population and inferred their complete hexaploid haplotypes. We provide evidence that most of the meiotic configurations (73.3%) were resolved in bivalents, although a small portion of multivalent signatures (15.7%), among other inconclusive configurations (11.0%), were also observed. Except for low levels of preferential pairing in linkage group 2, we observed a hexasomic inheritance mechanism in all linkage groups. We propose that the hexasomic-bivalent inheritance promotes stability to the allelic transmission in sweetpotato.

Journal ArticleDOI
TL;DR: The ReMOT Control technique is adapted to deliver Cas9 ribonucleoprotein complex to adult mosquito ovaries, generating targeted and heritable mutations in the malaria vector Anopheles stephensi without injecting embryos, opening the power of CRISPR/Cas9 methods to malaria laboratories that lack the equipment or expertise.
Abstract: Innovative tools are essential for advancing malaria control and depend on an understanding of molecular mechanisms governing transmission of malaria parasites by Anopheles mosquitoes. CRISPR/Cas9-based gene disruption is a powerful method to uncover underlying biology of vector-pathogen interactions and can itself form the basis of mosquito control strategies. However, embryo injection methods used to genetically manipulate mosquitoes (especially Anopheles) are difficult and inefficient, particularly for non-specialist laboratories. Here, we adapted the ReMOT Control (Receptor-mediated Ovary Transduction of Cargo) technique to deliver Cas9 ribonucleoprotein complex to adult mosquito ovaries, generating targeted and heritable mutations in the malaria vector Anopheles stephensi without injecting embryos. In Anopheles, ReMOT Control gene editing was as efficient as standard embryo injections. The application of ReMOT Control to Anopheles opens the power of CRISPR/Cas9 methods to malaria laboratories that lack the equipment or expertise to perform embryo injections and establishes the flexibility of ReMOT Control for diverse mosquito species.

Journal ArticleDOI
TL;DR: Chromonomer takes a user-defined reference genome, a map of genetic markers, and, optionally, conserved synteny information to construct an improved reference genome of chromosome models: a “chromonome”, which is demonstrated on genome assemblies and genetic maps that have disparate characteristics and levels of quality.
Abstract: The pace of the sequencing and computational assembly of novel reference genomes is accelerating. Though DNA sequencing technologies and assembly software tools continue to improve, biological features of genomes such as repetitive sequence as well as molecular artifacts that often accompany sequencing library preparation can lead to fragmented or chimeric assemblies. If left uncorrected, defects like these trammel progress on understanding genome structure and function, or worse, positively mislead this research. Fortunately, integration of additional, independent streams of information, such as a marker-dense genetic map and conserved orthologous gene order from related taxa, can be used to scaffold together unlinked, disordered fragments and to restructure a reference genome where it is incorrectly joined. We present a tool set for automating these processes, one that additionally tracks any changes to the assembly and to the genetic map, and which allows the user to scrutinize these changes with the help of web-based, graphical visualizations. Chromonomer takes a user-defined reference genome, a map of genetic markers, and, optionally, conserved synteny information to construct an improved reference genome of chromosome models: a "chromonome". We demonstrate Chromonomer's performance on genome assemblies and genetic maps that have disparate characteristics and levels of quality.

Journal ArticleDOI
TL;DR: A collection of codon-optimized coding sequences for SARS-CoV-2 cloned into Gateway-compatible entry vectors, which enable rapid transfer into a variety of expression and tagging vectors are described.
Abstract: The world is facing a global pandemic of COVID-19 caused by the SARS-CoV-2 coronavirus. Here we describe a collection of codon-optimized coding sequences for SARS-CoV-2 cloned into Gateway-compatible entry vectors, which enable rapid transfer into a variety of expression and tagging vectors. The collection is freely available. We hope that widespread availability of this SARS-CoV-2 resource will enable many subsequent molecular studies to better understand the viral life cycle and how to block it.

Journal ArticleDOI
TL;DR: A meta-analysis of diets used in fly microbiome research is performed and a web-based tool for researchers to determine the nutritional content of diets of interest is provided to aid the broader community in contextualizing past and future studies across the scope of D. melanogaster research.
Abstract: Nutrition is a major factor influencing many aspects of Drosophila melanogaster physiology. However, a wide range of diets, many of which are termed "standard" in the literature, are utilized for D. melanogaster research, leading to inconsistencies in reporting of nutrition-dependent phenotypes across the field. This is especially evident in microbiome studies, as diet has a pivotal role in microbiome composition and resulting host-microbe interactions. Here, we performed a meta-analysis of diets used in fly microbiome research and provide a web-based tool for researchers to determine the nutritional content of diets of interest. While our meta-analysis primarily focuses on microbiome studies, our goal in developing these resources is to aid the broader community in contextualizing past and future studies across the scope of D. melanogaster research to better understand how individual lab diets can contribute to observed phenotypes.

Journal ArticleDOI
TL;DR: It is shown that scb-1, which was previously implicated in response to bleomycin, also underlies responses to other double-strand DNA break-inducing chemotherapeutics, providing new evidence for the role of scb-1 in the nematode drug response and highlighting the power of mediation analysis to identify causal genes.
Abstract: Pleiotropy, the concept that a single gene controls multiple distinct traits, is prevalent in most organisms and has broad implications for medicine and agriculture. The identification of the molecular mechanisms underlying pleiotropy has the power to reveal previously unknown biological connections between seemingly unrelated traits. Additionally, the discovery of pleiotropic genes increases our understanding of both genetic and phenotypic complexity by characterizing novel gene functions. Quantitative trait locus (QTL) mapping has been used to identify several pleiotropic regions in many organisms. However, gene knockout studies are needed to eliminate the possibility of tightly linked, non-pleiotropic loci. Here, we use a panel of 296 recombinant inbred advanced intercross lines of Caenorhabditis elegans and a high-throughput fitness assay to identify a single large-effect QTL on the center of chromosome V associated with variation in responses to eight chemotherapeutics. We validate this QTL with near-isogenic lines and pair genome-wide gene expression data with drug response traits to perform mediation analysis, leading to the identification of a pleiotropic candidate gene, scb-1, for some of the eight chemotherapeutics. Using deletion strains created by genome editing, we show that scb-1, which was previously implicated in response to bleomycin, also underlies responses to other double-strand DNA break-inducing chemotherapeutics. This finding provides new evidence for the role of scb-1 in the nematode drug response and highlights the power of mediation analysis to identify causal genes.

Journal ArticleDOI
TL;DR: The de novo sequenced genome of Solyntus is introduced as the next standard reference in potato genome studies and provides a more direct and contiguous reference than ever before available.
Abstract: With the rapid expansion of the application of genomics and sequencing in plant breeding, there is a constant drive for better reference genomes. In potato (Solanum tuberosum), the third largest food crop in the world, the related species S. phureja, designated “DM”, has been used as the most popular reference genome for the last 10 years. Here, we introduce the de novo sequenced genome of Solyntus as the next standard reference in potato genome studies. A true Solanum tuberosum made up of 116 contigs that is also highly homozygous, diploid, vigorous and self-compatible, Solyntus provides a more direct and contiguous reference than ever before available. It was constructed by sequencing with state-of-the-art long and short read technology and assembled with Canu. The 116 contigs were assembled into scaffolds to form each pseudochromosome, with three contigs to 17 contigs per chromosome. This assembly contains 93.7% of the single-copy gene orthologs from the Solanaceae set and has an N50 of 63.7 Mbp. The genome and related files can be found at https://www.plantbreeding.wur.nl/Solyntus/. With the release of this research line and its draft genome we anticipate many exciting developments in (diploid) potato research.

Journal ArticleDOI
TL;DR: It is proposed that differentially expressed genes tied to ant neurobiology, odor response, circadian rhythms, and foraging behavior may result from the activity of putative fungal effectors such as enterotoxins, aflatrem, and mechanisms disrupting feeding behaviors in the ant.
Abstract: Ant-infecting Ophiocordyceps fungi are globally distributed, host manipulating, specialist parasites that drive aberrant behaviors in infected ants, at a lethal cost to the host. An apparent increase in activity and wandering behaviors precedes a final summiting and biting behavior onto vegetation, which positions the manipulated ant in a site beneficial for fungal growth and transmission. We investigated the genetic underpinnings of host manipulation by: (i) producing a high-quality hybrid assembly and annotation of the Ophiocordyceps camponoti-floridani genome, (ii) conducting laboratory infections coupled with RNAseq of O. camponoti-floridani and its host, Camponotus floridanus, and (iii) comparing these data to RNAseq data of Ophiocordyceps kimflemingiae and Camponotus castaneus as a powerful method to identify gene expression patterns that suggest shared behavioral manipulation mechanisms across Ophiocordyceps-ant species interactions. We propose that differentially expressed genes tied to ant neurobiology, odor response, circadian rhythms, and foraging behavior may result from the activity of putative fungal effectors such as enterotoxins, aflatrem, and mechanisms disrupting feeding behaviors in the ant.

Journal ArticleDOI
TL;DR: For all versions of BEAGLE, the parameter ne (effective population size) had a major effect on the error rate for imputation of ungenotyped markers, reducing error rates by up to 98.5%.
Abstract: Imputation is one of the key steps in the preprocessing and quality control protocol of any genetic study. Most imputation algorithms were originally developed for use in human genetics and thus are optimized for a high level of genetic diversity. Different versions of BEAGLE were evaluated on genetic datasets of doubled haploids of two European maize landraces, a commercial breeding line and a diversity panel in chicken, respectively, with different levels of genetic diversity and structure which can be taken into account in BEAGLE by parameter tuning. Especially for phasing, BEAGLE 5.0 outperformed the newest version (5.1), which in turn also led to improved imputation. Earlier versions were far more dependent on the adaptation of parameters in all our tests. For all versions, the parameter ne (effective population size) had a major effect on the error rate for imputation of ungenotyped markers, reducing error rates by up to 98.5%. Further improvement was obtained by tuning of the parameters affecting the structure of the haplotype cluster that is used to initialize the underlying Hidden Markov Model of BEAGLE. The number of markers with extremely high error rates for the maize datasets was more than halved by the use of a flint reference genome (F7, PE0075 etc.) instead of the commonly used B73. On average, error rates for imputation of ungenotyped markers were reduced by 8.5% by excluding genetically distant individuals from the reference panel for the chicken diversity panel. To optimize imputation accuracy one has to find a balance between representing as much of the genetic diversity as possible while avoiding the introduction of noise by including genetically distant individuals.

Journal ArticleDOI
TL;DR: SNP-CRISPR has a wide range of potential research applications in model systems and for design of sgRNAs for disease-associated variant correction, and the tool computes efficiency and specificity scores for sgRNA designs targeting both the variant and the reference.
Abstract: CRISPR-Cas9 is a powerful genome editing technology in which a single guide RNA (sgRNA) confers target site specificity to achieve Cas9-mediated genome editing. Numerous sgRNA design tools have been developed based on reference genomes for humans and model organisms. However, existing resources are not optimal as genetic mutations or single nucleotide polymorphisms (SNPs) within the targeting region affect the efficiency of CRISPR-based approaches by interfering with guide-target complementarity. To facilitate identification of sgRNAs (1) in non-reference genomes, (2) across varying genetic backgrounds, or (3) for specific targeting of SNP-containing alleles, for example, disease relevant mutations, we developed a web tool, SNP-CRISPR (https://www.flyrnai.org/tools/snp_crispr/). SNP-CRISPR can be used to design sgRNAs based on public variant data sets or user-identified variants. In addition, the tool computes efficiency and specificity scores for sgRNA designs targeting both the variant and the reference. Moreover, SNP-CRISPR provides the option to upload multiple SNPs and target single or multiple nearby base changes simultaneously with a single sgRNA design. Given these capabilities, SNP-CRISPR has a wide range of potential research applications in model systems and for design of sgRNAs for disease-associated variant correction.

Journal ArticleDOI
TL;DR: A newly generated de novo genome assembly of G. longicalyx is reported, which leveraged a combination of PacBio long-read technology, Hi-C chromatin conformation capture, and BioNano optical mapping to achieve a chromosome level assembly.
Abstract: Cotton is an important crop that has made significant gains in production over the last century. Emerging pests such as the reniform nematode have threatened cotton production. The rare African diploid species Gossypium longicalyx is a wild species that has been used as an important source of reniform nematode immunity. While mapping and breeding efforts have made some strides in transferring this immunity to the cultivated polyploid species, the complexities of interploidal transfer combined with substantial linkage drag have inhibited progress in this area. Moreover, this species shares its most recent common ancestor with the cultivated A-genome diploid cottons, thereby providing insight into the evolution of long, spinnable fiber. Here we report a newly generated de novo genome assembly of G. longicalyx. This high-quality genome leveraged a combination of PacBio long-read technology, Hi-C chromatin conformation capture, and BioNano optical mapping to achieve a chromosome level assembly. The utility of the G. longicalyx genome for understanding reniform immunity and fiber evolution is discussed.

Journal ArticleDOI
TL;DR: STAR globally produces more genes and higher gene-expression values, compared to Kallisto, as well as Bowtie2, another popular alignment method for bulk RNA-Seq, and STAR also yields higher correlations of the Gini index for the genes with RNA-FISH validation results.
Abstract: Alignment of scRNA-Seq data is the first and one of the most critical steps of the scRNA-Seq analysis workflow, and thus the choice of proper aligners is of paramount importance. Recently, STAR, an alignment method, and Kallisto, a pseudoalignment method, have both gained a vast amount of popularity in the single cell sequencing field. However, an unbiased third-party comparison of these two methods in scRNA-Seq is lacking. Here we conduct a systematic comparison of them on a variety of Drop-seq, Fluidigm and 10x Genomics data, in terms of gene abundance, alignment accuracy, as well as computational speed and memory use. We observe that STAR globally produces more genes and higher gene-expression values, compared to Kallisto, as well as Bowtie2, another popular alignment method for bulk RNA-Seq. STAR also yields higher correlations of the Gini index for the genes with RNA-FISH validation results. Using 10x Genomics PBMC 3K scRNA-Seq and mouse cortex single nuclei RNA-Seq data, STAR shows similar or better cell-type annotation results, by detecting a larger subset of known gene markers. However, the gain in accuracy and gene abundance of STAR alignment comes at the price of significantly slower computation time (4-fold) and more memory (7.7-fold), compared to Kallisto.

Journal ArticleDOI
TL;DR: The R-package MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP but also allows the user to replace certain steps with personalized and/or self-written solutions.
Abstract: The R-package MoBPS provides a computationally efficient and flexible framework to simulate complex breeding programs and compare their economic and genetic impact. Simulations are performed on the basis of individuals. MoBPS utilizes a highly efficient implementation with bit-wise data storage and matrix multiplications from the associated R-package miraculix, allowing it to handle large scale populations. Individual haplotypes are not stored but instead automatically derived based on points of recombination and mutations. The modular structure of MoBPS allows combining rather coarse simulations, as needed to generate founder populations, with a very detailed modeling of today's complex breeding programs, making use of all available biotechnologies. MoBPS provides pre-implemented functions for common breeding practices such as optimum genetic contributions and single-step GBLUP but also allows the user to replace certain steps with personalized and/or self-written solutions.

Journal ArticleDOI
TL;DR: This study confirmed by diagnostic resistance gene enrichment sequencing (dRenSeq) that the resistance of 03112-233 toward 90128 is most likely based on a distinct new R gene(s), and revealed that the compatible interaction caused higher induction of susceptibility genes such as SWEET compared with the incompatible interaction.
Abstract: Late blight, caused by Phytophthora infestans (P. infestans), is a devastating disease in potato worldwide. Our previous study revealed that the Solanum andigena genotype 03112-233 is resistant to P. infestans isolate 90128, but susceptible to the super race isolate, CN152. In this study, we confirmed by diagnostic resistance gene enrichment sequencing (dRenSeq) that the resistance of 03112-233 toward 90128 is most likely based on a distinct new R gene(s). To gain an insight into the mechanism that governs resistance or susceptibility in 03112-233, comparative transcriptomic profiling analysis based on RNAseq was initiated. Changes in transcription at two time points (24 h and 72 h) after inoculation with isolates 90128 or CN152 were analyzed. A total of 8,881 and 7,209 genes were differentially expressed in response to 90128 and CN152, respectively, and 1,083 differentially expressed genes (DEGs) were common to both time points and isolates. A substantial number of genes were differentially expressed in an isolate-specific manner, with 3,837 genes showing induction or suppression following infection with 90128 and 2,165 genes induced or suppressed after colonization by CN152. Hierarchical clustering analysis suggested that isolates with different virulence profiles can induce different defense responses at different time points. Further analysis revealed that the compatible interaction caused higher induction of susceptibility genes such as SWEET compared with the incompatible interaction. The salicylic acid-, jasmonic acid-, and abscisic acid-mediated signaling pathways were involved in the response against both isolates, while ethylene- and brassinosteroid-mediated defense pathways were suppressed. Our results provide a valuable resource for understanding the interactions between P. infestans and potato.

Journal ArticleDOI
TL;DR: An analytical model of performances in mixture was set up and selection in pure stands appeared efficient within a limited range of genetic correlations between pure stand performance and mixture model effects.
Abstract: In a context of increasing environmental challenges, there is an emerging demand for plant cultivars that are adapted to cultivation in species mixture. It is thus pressing to look for the optimization of selection schemes to grow species mixtures, and especially recurrent selection schemes, which are at the core of the improvement of many plant species. We considered the case of two populations from different species to be improved by recurrent selection for their performances in mixture. We set up an analytical model of performances in mixture. We expressed the expected responses of the performances in mixture to one cycle of selection in the case of a Reciprocal Mixture Ability selection scheme and of two parallel selection schemes aiming to improve General Mixture Abilities or performances in pure stands. We numerically compared these selection schemes when half-sib or topcross progeny families of selection candidates are tested in mixture. Selection in pure stands appeared efficient within a limited range of genetic correlations between pure stand performance and mixture model effects. The Reciprocal Mixture Ability selection scheme was expected to be less efficient than parallel selections for General Mixture Ability in some situations. The last option enables control of the ratio of expected responses of species contributions to the mixture performance without bias when using selection indices. When more than two species are to be improved for their performances in mixture, the advantage of parallel selections for General Mixture Ability is even more marked, provided that compensation trends between species are not too prevalent.

Journal ArticleDOI
TL;DR: The high continuity of the ME034V genome assembly validates the utility of ultra-long DNA sequencing to improve genetic resources for emerging model organisms and illustrates the importance of obtaining the proper genome reference for genetic experiments.
Abstract: Setaria viridis (green foxtail) is an important model system for improving cereal crops due to its diploid genome, ease of cultivation, and use of C4 photosynthesis. The S. viridis accession ME034V is exceptionally transformable, but the lack of a sequenced genome for this accession has limited its utility. We present a 397 Mb highly contiguous de novo assembly of ME034V using ultra-long nanopore sequencing technology (read N50 = 41 kb). We estimate that this genome is largely complete based on our updated k-mer based genome size estimate of 401 Mb for S. viridis. Genome annotation identified 37,908 protein-coding genes and >300k repetitive elements comprising 46% of the genome. We compared the ME034V assembly with two other previously sequenced Setaria genomes as well as to a diversity panel of 235 S. viridis accessions. We found the genome assemblies to be largely syntenic, but numerous unique polymorphic structural variants were discovered. Several ME034V deletions may be associated with recent retrotransposition of copia and gypsy LTR repeat families, as evidenced by their low genotype frequencies in the sampled population. Lastly, we performed a phylogenomic analysis to identify gene families that have expanded in Setaria, including those involved in specialized metabolism and plant defense response. The high continuity of the ME034V genome assembly validates the utility of ultra-long DNA sequencing to improve genetic resources for emerging model organisms. Structural variation present in Setaria illustrates the importance of obtaining the proper genome reference for genetic experiments. Thus, we anticipate that the ME034V genome will be of significant utility for the Setaria research community.

Journal ArticleDOI
TL;DR: It is concluded that Vgll3, Akap11 and Six6 may influence Atlantic salmon maturation timing via affecting adipogenesis and gametogenesis by regulating cell fate commitment and the HPG axis.
Abstract: Despite recent taxonomic diversification in studies linking genotype with phenotype, follow-up studies aimed at understanding the molecular processes of such genotype-phenotype associations remain rare. The age at which an individual reaches sexual maturity is an important fitness trait in many wild species. However, the molecular mechanisms regulating maturation timing processes remain obscure. A recent genome-wide association study in Atlantic salmon (Salmo salar) identified large-effect age-at-maturity-associated chromosomal regions including genes vgll3, akap11 and six6, which have roles in adipogenesis, spermatogenesis and the hypothalamic-pituitary-gonadal (HPG) axis, respectively. Here, we determine expression patterns of these genes during salmon development and their potential molecular partners and pathways. Using Nanostring transcription profiling technology, we show development- and tissue-specific mRNA expression patterns for vgll3, akap11 and six6. Correlated expression levels of vgll3 and akap11, which have adjacent chromosomal locations, suggest they may have shared regulation. Further, vgll3 correlating with arhgap6 and yap1, and akap11 with lats1 and yap1 suggests that Vgll3 and Akap11 take part in actin cytoskeleton regulation. Tissue-specific expression results indicate that vgll3 and akap11 paralogs have sex-dependent expression patterns in gonads. Moreover, six6 correlating with slc38a6 and rtn1, and Hippo signaling genes suggests that Six6 could have a broader role in the HPG neuroendocrine and cell fate commitment regulation, respectively. We conclude that Vgll3, Akap11 and Six6 may influence Atlantic salmon maturation timing via affecting adipogenesis and gametogenesis by regulating cell fate commitment and the HPG axis. These results may help to unravel general molecular mechanisms behind maturation.

Journal ArticleDOI
TL;DR: TM3′seq is a 3′-enriched library preparation protocol that uses Tn5 transposase and preserves sample identity at each step, and is designed for high-throughput processing of individual samples at a fraction of the cost of commercial kits.
Abstract: RNA-seq has become the standard tool for collecting genome-wide expression data in diverse fields, from quantitative genetics and medical genomics to ecology and developmental biology. However, RNA-seq library preparation is still prohibitive for many laboratories. Recently, the field of single-cell transcriptomics has reduced costs and increased throughput by adopting early barcoding and pooling of individual samples, producing a single final library containing all samples. In contrast, RNA-seq protocols where each sample is processed individually are significantly more expensive and lower throughput than single-cell approaches. Yet, many projects depend on individual library generation to preserve important samples or for follow-up re-sequencing experiments. Improving on currently available RNA-seq methods, we have developed TM3'seq, a 3'-enriched library preparation protocol that uses Tn5 transposase and preserves sample identity at each step. TM3'seq is designed for high-throughput processing of individual samples (96 samples in 6 h, with only 3 h hands-on time) at a fraction of the cost of commercial kits ($1.5 per sample). The protocol was tested in a range of human and Drosophila melanogaster RNA samples, recovering transcriptomes of the same quality and reliability as the commercial NEBNext kit. We expect that the cost- and time-efficient features of TM3'seq make large-scale RNA-seq experiments more accessible to the entire scientific community.