Showing papers in "G3: Genes, Genomes, Genetics in 2021"


Journal ArticleDOI
TL;DR: AlphaSimR is an R package for stochastic simulation of a wide range of plant and animal breeding programs for diploid and autopolyploid species, suited to testing both the overall strategy and the detailed design of breeding programs.
Abstract: This paper introduces AlphaSimR, an R package for stochastic simulations of plant and animal breeding programs. AlphaSimR is a highly flexible software package able to simulate a wide range of plant and animal breeding programs for diploid and autopolyploid species. AlphaSimR is ideal for testing the overall strategy and detailed design of breeding programs. AlphaSimR utilizes a scripting approach to building simulations that is particularly well suited for modeling highly complex breeding programs, such as commercial breeding programs. The primary benefit of this scripting approach is that it frees users from preset breeding program designs and allows them to model nearly any breeding program design. This paper lists the main features of AlphaSimR and provides a brief example simulation to show how to use the software.

77 citations


Journal ArticleDOI
TL;DR: The EnvRtype R package is a toolkit for integrating large-scale envirotyping data (enviromics) into quantitative genomics.
Abstract: Envirotyping is an essential technique used to unfold the nongenetic drivers associated with the phenotypic adaptation of living organisms. Here, we introduce the EnvRtype R package, a novel toolkit developed to interplay large-scale envirotyping data (enviromics) into quantitative genomics. To start a user-friendly envirotyping pipeline, this package offers: (1) remote sensing tools for collecting (get_weather and extract_GIS functions) and processing ecophysiological variables (processWTH function) from raw environmental data at single locations or worldwide; (2) environmental characterization by typing environments and profiling descriptors of environmental quality (env_typing function), in addition to gathering environmental covariables as quantitative descriptors for predictive purposes (W_matrix function); and (3) identification of environmental similarity that can be used as an enviromic-based kernel (env_typing function) in whole-genome prediction (GP), aimed at increasing ecophysiological knowledge in genomic best-unbiased predictions (GBLUP) and emulating reaction norm effects (get_kernel and kernel_model functions). We highlight literature mining concepts in fine-tuning envirotyping parameters for each plant species and target growing environments. We show that envirotyping for predictive breeding collects raw data and processes it in an eco-physiologically smart way. Examples of its use for creating global-scale envirotyping networks and integrating reaction-norm modeling in GP are also outlined. We conclude that EnvRtype provides a cost-effective envirotyping pipeline capable of providing high quality enviromic data for a diverse set of genomic-based studies, especially for increasing accuracy in GP across untested growing environments.

39 citations


Journal ArticleDOI
TL;DR: popvae uses variational autoencoders (VAEs) to produce latent embeddings that capture subtle aspects of population structure in humans and Anopheles mosquitoes, and can generate artificial genotypes characteristic of a given sample or population.
Abstract: Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs), generative machine learning models in which a pair of neural networks seek to first compress and then recreate the input data, for visualizing population genetic variation. VAEs incorporate nonlinear relationships, allow users to define the dimensionality of the latent space, and in our tests preserve global geometry better than t-SNE and UMAP. Our implementation, which we call popvae, is available as a command-line Python program at github.com/kr-colab/popvae. The approach yields latent embeddings that capture subtle aspects of population structure in humans and Anopheles mosquitoes, and can generate artificial genotypes characteristic of a given sample or population.

39 citations


Journal ArticleDOI
TL;DR: In this paper, the authors curated and analyzed genotypic and phenotypic data on 1918 maize (Zea mays L.) hybrids and environmental data from 65 testing environments, and the resulting models can be used for genomic prediction of mean hybrid performance across populations of environments tested or for environment-specific predictions.
Abstract: High-dimensional and high-throughput genomic, field performance, and environmental data are becoming increasingly available to crop breeding programs, and their integration can facilitate genomic prediction within and across environments and provide insights into the genetic architecture of complex traits and the nature of genotype-by-environment interactions. To partition trait variation into additive and dominance (main effect) genetic and corresponding genetic-by-environment variances, and to identify specific environmental factors that influence genotype-by-environment interactions, we curated and analyzed genotypic and phenotypic data on 1918 maize (Zea mays L.) hybrids and environmental data from 65 testing environments. For grain yield, dominance variance was similar in magnitude to additive variance, and genetic-by-environment variances were more important than genetic main effect variances. Models involving both additive and dominance relationships best fit the data and modeling unique genetic covariances among all environments provided the best characterization of the genotype-by-environment interaction patterns. Similarity of relative hybrid performance among environments was modeled as a function of underlying weather variables, permitting identification of weather covariates driving correlations of genetic effects across environments. The resulting models can be used for genomic prediction of mean hybrid performance across populations of environments tested or for environment-specific predictions. These results can also guide efforts to incorporate high-throughput environmental data into genomic prediction models and predict values in new environments characterized with the same environmental characteristics.

36 citations


Journal ArticleDOI
TL;DR: In this paper, the Canu pipeline was used to generate a de novo genome assembly of the Arlee rainbow trout line from high-coverage PacBio long-read sequence data, which was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line.
Abstract: Currently, there is still a need to improve the contiguity of the rainbow trout reference genome and to use multiple genetic backgrounds that will represent the genetic diversity of this species. The Arlee doubled haploid line originated from a domesticated hatchery strain that was originally collected from the northern California coast. The Canu pipeline was used to generate the Arlee line genome de-novo assembly from high coverage PacBio long-reads sequence data. The assembly was further improved with Bionano optical maps and Hi-C proximity ligation sequence data to generate 32 major scaffolds corresponding to the karyotype of the Arlee line (2N = 64). It is composed of 938 scaffolds with N50 of 39.16 Mb and a total length of 2.33 Gb, of which ∼95% was in 32 chromosome sequences with only 438 gaps between contigs and scaffolds. In rainbow trout the haploid chromosome number can vary from 29 to 32. In the Arlee karyotype the haploid chromosome number is 32 because chromosomes Omy04, 14 and 25 are divided into six acrocentric chromosomes. Additional structural variations that were identified in the Arlee genome included the major inversions on chromosomes Omy05 and Omy20 and 15 additional smaller inversions that will require further validation. This is also the first rainbow trout genome assembly that includes a scaffold with the sex-determination gene (sdY) in the chromosome Y sequence. The utility of this genome assembly is shown through the improved annotation of the duplicated genome loci that harbor the IGH genes on chromosomes Omy12 and Omy13.

31 citations
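
The N50 reported in the abstract above (39.16 Mb) is a contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal Python sketch of that standard definition follows. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper or from any assembly tool.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty collection")


if __name__ == "__main__":
    # Toy scaffold lengths (made-up values, not real assembly data).
    toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
    print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, an assembly with many short scaffolds can still have a large N50 when most bases sit in a few chromosome-scale sequences, which is the situation the abstract describes.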


Journal ArticleDOI
TL;DR: In this article, the authors explore the phylogenetic stability of nematode chromosomes using a new telomere-to-telomere assembly of the rhabditine Oscheius tipulae generated from nanopore long reads.
Abstract: Eukaryotic chromosomes have phylogenetic persistence. In many taxa, each chromosome has a single functional centromere with essential roles in spindle attachment and segregation. Fusion and fission can generate chromosomes with no or multiple centromeres, leading to genome instability. Groups with holocentric chromosomes (where centromeric function is distributed along each chromosome) might be expected to show karyotypic instability. This is generally not the case, and in Caenorhabditis elegans, it has been proposed that the role of maintenance of a stable karyotype has been transferred to the meiotic pairing centers, which are found at one end of each chromosome. Here, we explore the phylogenetic stability of nematode chromosomes using a new telomere-to-telomere assembly of the rhabditine nematode Oscheius tipulae generated from nanopore long reads. The 60-Mb O. tipulae genome is resolved into six chromosomal molecules. We find the evidence of specific chromatin diminution at all telomeres. Comparing this chromosomal O. tipulae assembly with chromosomal assemblies of diverse rhabditid nematodes, we identify seven ancestral chromosomal elements (Nigon elements) and present a model for the evolution of nematode chromosomes through rearrangement and fusion of these elements. We identify frequent fusion events involving NigonX, the element associated with the rhabditid X chromosome, and thus sex chromosome-associated gene sets differ markedly between species. Despite the karyotypic stability, gene order within chromosomes defined by Nigon elements is not conserved. Our model for nematode chromosome evolution provides a platform for investigation of the tensions between local genome rearrangement and karyotypic evolution in generating extant genome architectures.

28 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of GWAS meta-analyses using a strict P-value threshold of 5 × 10⁻⁸ to other multiple testing strategies: (1) less stringent P-value thresholds, (2) controlling the FDR with the Benjamini-Hochberg and Benjamini-Yekutieli procedures, and (3) controlling the Bayesian FDR with posterior probabilities.
Abstract: Over the last decade, GWAS meta-analyses have used a strict P-value threshold of 5 × 10⁻⁸ to classify associations as significant. Here, we use our current understanding of frequently studied traits including lipid levels, height, and BMI to revisit this genome-wide significance threshold. We compare the performance of studies using the P = 5 × 10⁻⁸ threshold in terms of true and false positive rate to other multiple testing strategies: (1) less stringent P-value thresholds, (2) controlling the FDR with the Benjamini-Hochberg and Benjamini-Yekutieli procedure, and (3) controlling the Bayesian FDR with posterior probabilities. We applied these procedures to re-analyze results from the Global Lipids and GIANT GWAS meta-analysis consortia and supported them with extensive simulation that mimics the empirical data. We observe in simulated studies with sample sizes ∼20,000 and >120,000 that relaxing the P-value threshold to 5 × 10⁻⁷ increased discovery at the cost of 18% and 8% of additional loci being false positive results, respectively. FDR and Bayesian FDR are well controlled for both sample sizes with a few exceptions that disappear under a less stringent definition of true positives and the two approaches yield similar results. Our work quantifies the value of using a relaxed P-value threshold in large studies to increase their true positive discovery but also shows the excess false positive rates due to such actions in modest-sized studies. These results may guide investigators considering different thresholds in replication studies and downstream work such as gene-set enrichment or pathway analysis. Finally, we demonstrate the viability of FDR-controlling procedures in GWAS.

27 citations
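
The abstract above contrasts a fixed P-value threshold with the Benjamini-Hochberg (BH) and Benjamini-Yekutieli (BY) step-up procedures. As a rough illustration of how those two procedures differ from a fixed cutoff, here is a minimal Python sketch of their textbook definitions. It is not the authors' analysis code; the function name `fdr_step_up`, the `alpha` default, and the toy P-values are assumptions made for this example.

```python
def fdr_step_up(pvalues, alpha=0.05, yekutieli=False):
    """Return a list of booleans: True where the hypothesis is rejected.

    Benjamini-Hochberg: reject the k smallest P-values, where k is the
    largest rank with p_(k) <= k * alpha / m.
    Benjamini-Yekutieli: same rule with alpha divided by
    c(m) = 1 + 1/2 + ... + 1/m, which is valid under arbitrary dependence.
    """
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1)) if yekutieli else 1.0
    # Indices of the P-values in ascending order of P-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * c_m):
            k_max = rank  # step-up: keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Toy P-values (made up for illustration, not GWAS results).
    toy = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print("fixed 5e-8:", sum(p <= 5e-8 for p in toy))
    print("BH:", sum(fdr_step_up(toy)))
    print("BY:", sum(fdr_step_up(toy, yekutieli=True)))
```

Unlike a fixed threshold, the BH and BY cutoffs adapt to the observed distribution of P-values, so the number of discoveries grows with the strength of signal in the data; BY is the more conservative of the two because of the extra c(m) factor.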


Journal ArticleDOI
TL;DR: In this article, a high-quality chromosome-scale genome assembly of the Black Soldier fly (BSF) using Pacific Bioscience, 10X Genomics linked read and high-throughput chromosome conformation capture sequencing technology is presented.
Abstract: Hermetia illucens L. (Diptera: Stratiomyidae), the Black Soldier Fly (BSF) is an increasingly important species for bioconversion of organic material into animal feed. We generated a high-quality chromosome-scale genome assembly of the BSF using Pacific Bioscience, 10X Genomics linked read and high-throughput chromosome conformation capture sequencing technology. Scaffolding the final assembly with Hi-C data produced a highly contiguous 1.01 Gb genome with 99.75% of scaffolds assembled into pseudochromosomes representing seven chromosomes with 16.01 Mb contig and 180.46 Mb scaffold N50 values. The highly complete genome obtained a Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness of 98.6%. We masked 67.32% of the genome as repetitive sequences and annotated a total of 16,478 protein-coding genes using the BRAKER2 pipeline. We analyzed an established lab population to investigate the genomic variation and architecture of the BSF revealing six autosomes and an X chromosome. Additionally, we estimated the inbreeding coefficient (1.9%) of the lab population by assessing runs of homozygosity. This provided evidence for inbreeding events including long runs of homozygosity on chromosome 5. The release of this novel chromosome-scale BSF genome assembly will provide an improved resource for further genomic studies, functional characterization of genes of interest and genetic modification of this economically important species.

25 citations
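
The abstract above reports an inbreeding coefficient estimated from runs of homozygosity (ROH) but does not state the estimator. One widely used choice is F_ROH, the fraction of the analyzed genome that lies inside ROH segments. The Python sketch below illustrates that estimator only; whether the authors used exactly this form is an assumption, and the function name `f_roh`, the `min_length` parameter, and the toy coordinates are hypothetical.

```python
def f_roh(roh_segments, genome_length, min_length=0):
    """Estimate inbreeding as the genome fraction covered by ROH.

    roh_segments  : iterable of (start, end) positions in base pairs,
                    assumed non-overlapping
    genome_length : length in base pairs of the genome region analyzed
    min_length    : ignore ROH shorter than this many base pairs
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length
    )
    return covered / genome_length


if __name__ == "__main__":
    # Toy segments on a 100 Mb genome (made-up values, not BSF data).
    toy_segments = [(0, 1_500_000), (40_000_000, 41_000_000)]
    print(f_roh(toy_segments, genome_length=100_000_000))
```

Raising `min_length` restricts the estimate to long ROH, which tend to reflect recent inbreeding, while short ROH mostly reflect older shared ancestry.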


Journal ArticleDOI
TL;DR: In this paper, the authors combined single-molecule real-time reads with optical maps to reconstruct the two haplotypes of each of the 20 M. rotundifolia cv. Trayshed chromosomes.
Abstract: Muscadinia rotundifolia, the muscadine grape, has been cultivated for centuries in the southeastern United States. M. rotundifolia is resistant to many of the pathogens that detrimentally affect Vitis vinifera, the grape species commonly used for winemaking. For this reason, M. rotundifolia is a valuable genetic resource for breeding. Single-molecule real-time reads were combined with optical maps to reconstruct the two haplotypes of each of the 20 M. rotundifolia cv. Trayshed chromosomes. The completeness and accuracy of the assembly were confirmed using a high-density linkage map. Protein-coding genes were annotated using an integrated and comprehensive approach. This included using full-length cDNA sequencing (Iso-Seq) to improve gene structure and hypothetical spliced variant predictions. Our data strongly support that Muscadinia chromosomes 7 and 20 are fused in Vitis and pinpoint the location of the fusion in Cabernet Sauvignon and PN40024 chromosome 7. Disease-related gene numbers in Trayshed and Cabernet Sauvignon were similar, but their clustering locations were different. A dramatic expansion of the Toll/Interleukin-1 Receptor-like Nucleotide-Binding Site Leucine-Rich Repeat (TIR-NBS-LRR) class was detected on Trayshed chromosome 12 at the Resistance to Uncinula necator 1 (RUN1)/Resistance to Plasmopara viticola 1 (RPV1) locus, which confers strong dominant resistance to powdery and downy mildews. A genome browser, annotation, and Blast tool for Trayshed are available at www.grapegenomics.com.

24 citations


Journal ArticleDOI
TL;DR: In this paper, the onion line DHCU066619 was assembled into 14.9 Gb with an N50 of 464 Kb, of which 2.4 Gb was ordered into eight pseudomolecules using four genetic linkage maps and the remainder of the genome is available in 89.6 K scaffolds.
Abstract: Onion is an important vegetable crop with an estimated genome size of 16 Gb. We describe the de novo assembly and ab initio annotation of the genome of a doubled haploid onion line DHCU066619, which resulted in a final assembly of 14.9 Gb with an N50 of 464 Kb. Of this, 2.4 Gb was ordered into eight pseudomolecules using four genetic linkage maps. The remainder of the genome is available in 89.6 K scaffolds. Only 72.4% of the genome could be identified as repetitive sequences and consist, to a large extent, of (retro) transposons. In addition, an estimated 20% of the putative (retro) transposons had accumulated a large number of mutations, hampering their identification, but facilitating their assembly. These elements are probably already quite old. The ab initio gene prediction indicated 540,925 putative gene models, which is far more than expected, possibly due to the presence of pseudogenes. Of these models, 47,066 showed RNASeq support. No gene rich regions were found, genes are uniformly distributed over the genome. Analysis of synteny with Allium sativum (garlic) showed collinearity but also major rearrangements between both species. This assembly is the first high-quality genome sequence available for the study of onion and will be a valuable resource for further research.

22 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared de novo TE annotations and repeat-induced point mutation signatures in 26 genomes from the Zymoseptoria species-complex and assessed the relative insertion ages of TEs using a comparative genomics approach.
Abstract: Transposable elements (TEs) impact genome plasticity, architecture, and evolution in fungal plant pathogens. The wide range of TE content observed in fungal genomes reflects diverse efficacy of host-genome defense mechanisms that can counter-balance TE expansion and spread. Closely related species can harbor drastically different TE repertoires. The evolution of fungal effectors, which are crucial determinants of pathogenicity, has been linked to the activity of TEs in pathogen genomes. Here, we describe how TEs have shaped genome evolution of the fungal wheat pathogen Zymoseptoria tritici and four closely related species. We compared de novo TE annotations and repeat-induced point mutation signatures in 26 genomes from the Zymoseptoria species-complex. Then, we assessed the relative insertion ages of TEs using a comparative genomics approach. Finally, we explored the impact of TE insertions on genome architecture and plasticity. The 26 genomes of Zymoseptoria species reflect different TE dynamics with a majority of recent insertions. TEs associate with accessory genome compartments, with chromosomal rearrangements, with gene presence/absence variation, and with effectors in all Zymoseptoria species. We find that the extent of RIP-like signatures varies among Z. tritici genomes compared to genomes of the sister species. The detection of a reduction of RIP-like signatures and TE recent insertions in Z. tritici reflects ongoing but still moderate TE mobility.

Journal ArticleDOI
TL;DR: In this article, chromosome behavior during meiosis was compared between resynthesized and natural Brassica napus using unambiguous chromosome identification, and it was shown that resynthesized lines have high rates of nonhomologous centromere association, homoeologous recombination leading to translocation, homoeologous chromosome replacement, and association and breakage of 45S rDNA loci.
Abstract: Homoeologous recombination, aneuploidy, and other genetic changes are common in resynthesized allopolyploid Brassica napus. In contrast, the chromosomes of cultivars have long been considered to be meiotically stable. To gain a better understanding of the underlying mechanisms leading to stabilization in the allopolyploid, the behavior of chromosomes during meiosis can be compared by unambiguous chromosome identification between resynthesized and natural B. napus. Compared with natural B. napus, resynthesized lines show high rates of nonhomologous centromere association, homoeologous recombination leading to translocation, homoeologous chromosome replacement, and association and breakage of 45S rDNA loci. In both natural and resynthesized B. napus, we observed low rates of univalents, A-C bivalents, and early sister chromatid separations. Reciprocal homoeologous chromosome exchanges and double reductions were photographed for the first time in meiotic telophase I. Meiotic errors were non-uniformly distributed across the genome in resynthesized B. napus, and in particular homoeologs sharing synteny along their entire length exhibited multivalents at diakinesis and polysomic inheritance at telophase I. Natural B. napus appeared to resolve meiotic errors mainly by suppressing homoeologous pairing, resolving nonhomologous centromere associations and 45S rDNA associations before diakinesis, and reducing homoeologous cross-overs.

Journal ArticleDOI
TL;DR: In this article, a comprehensive, tissue-specific investigation of Drosophila melanogaster female reproductive tract (FRT) gene expression before and after mating was performed, identifying expression profiles that distinguished each tissue, including major differences between tissues with glandular or primarily nonglandular epithelium.
Abstract: Sexual reproduction in internally fertilizing species requires complex coordination between female and male reproductive systems and among the diverse tissues of the female reproductive tract (FRT). Here, we report a comprehensive, tissue-specific investigation of Drosophila melanogaster FRT gene expression before and after mating. We identified expression profiles that distinguished each tissue, including major differences between tissues with glandular or primarily nonglandular epithelium. All tissues were enriched for distinct sets of genes possessing secretion signals that exhibited accelerated evolution, as might be expected for genes participating in molecular interactions between the sexes within the FRT extracellular environment. Despite robust transcriptional differences between tissues, postmating responses were dominated by coordinated transient changes indicative of an integrated systems-level functional response. This comprehensive characterization of gene expression throughout the FRT identifies putative female contributions to postcopulatory events critical to reproduction and potentially reproductive isolation, as well as the putative targets of sexual selection and conflict.

Journal ArticleDOI
TL;DR: In this article, the authors present a sorghum carbon-partitioning nested association mapping population generated by crossing 11 diverse founder lines with Grassl as the single recurrent female, and contrast it with an existing grain population generated using Tx430 as the recurrent female.
Abstract: Sorghum bicolor, a photosynthetically efficient C4 grass, represents an important source of grain, forage, fermentable sugars, and cellulosic fibers that can be utilized in myriad applications ranging from bioenergy to bioindustrial feedstocks. Sorghum's efficient fixation of carbon per unit time per unit area per unit input has led to its classification as a preferred biomass crop highlighted by its designation as an advanced biofuel by the U.S. Department of Energy. Due to its extensive genetic diversity and worldwide colonization, sorghum has considerable diversity for a range of phenotypes influencing productivity, composition, and sink/source dynamics. To dissect the genetic basis of these key traits, we present a sorghum carbon-partitioning nested association mapping population generated by crossing 11 diverse founder lines with Grassl as the single recurrent female. By exploiting existing variation among cellulosic, forage, sweet and grain sorghum carbon partitioning regimes, the sorghum carbon-partitioning nested association mapping population will allow the identification of important biomass-associated traits, elucidate the genetic architecture underlying carbon partitioning and improve our understanding of the genetic determinants affecting unique phenotypes within Poaceae. We contrast this nested association mapping population with an existing grain population generated using Tx430 as the recurrent female. Genotypic data are assessed for quality by examining variant density, nucleotide diversity, linkage decay, and is validated using pericarp and testa phenotypes to map known genes affecting these phenotypes. We release the 11-family nested association mapping population along with corresponding genomic data for use in genetic, genomic, and agronomic studies with a focus on carbon-partitioning regimes.

Journal ArticleDOI
TL;DR: These insights shed light on the evolution of the developmental toolkit in arachnopulmonates, highlight the importance of the comparative approach within lineages, and provide substantial new transcriptomic data for future study.
Abstract: Whole-genome duplications (WGDs) have occurred multiple times during animal evolution, including in lineages leading to vertebrates, teleosts, horseshoe crabs, and arachnopulmonates. These dramatic events initially produce a wealth of new genetic material, generally followed by extensive gene loss. It appears, however, that developmental genes such as homeobox genes, signaling pathway components and microRNAs are frequently retained as duplicates (so-called ohnologs) following WGD. These not only provide the best evidence for WGD, but an opportunity to study its evolutionary consequences. Although these genes are well studied in the context of vertebrate WGD, similar comparisons across the extant arachnopulmonate orders are patchy. We sequenced embryonic transcriptomes from two spider species and two amblypygid species and surveyed three important gene families, Hox, Wnt, and frizzled, across these and 12 existing transcriptomic and genomic resources for chelicerates. We report extensive retention of putative ohnologs, further supporting the ancestral arachnopulmonate WGD. We also found evidence of consistent evolutionary trajectories in Hox and Wnt gene repertoires across three of the six arachnopulmonate orders, with interorder variation in the retention of specific paralogs. We identified variation between major clades in spiders and are better able to reconstruct the chronology of gene duplications and losses in spiders, amblypygids, and scorpions. These insights shed light on the evolution of the developmental toolkit in arachnopulmonates, highlight the importance of the comparative approach within lineages, and provide substantial new transcriptomic data for future study.

Journal ArticleDOI
TL;DR: In this article, the authors apply TurboID-mediated biotinylation in Drosophila across a wide range of developmental stages and tissues, and demonstrate the feasibility of the TurboID-mediated labeling system in desired cell types.
Abstract: The protein-protein interaction (PPI) is a basic strategy for life to operate. The analysis of PPIs in multicellular organisms is very important but extremely challenging because PPIs are particularly dynamic and variable among different development stages, tissues, cells, and even organelles. Therefore, understanding PPI needs a good resolution of time and space. More importantly, understanding in vivo PPI needs to be realized in situ. Proximity-based biotinylation combined with mass spectrometry (MS) has emerged as a powerful approach to study PPI networks and protein subcellular compartmentation. TurboID, the newly engineered promiscuous ligase, has been reported to label proximate proteins effectively in various species. In Drosophila, we systematically apply TurboID-mediated biotinylation in a wide range of developmental stages and tissues, and demonstrate the feasibility of TurboID-mediated labeling system in desired cell types. For a proof-of-principle, we use the TurboID-mediated biotinylation coupled with MS to distinguish CTP synthase with or without the ability to form filamentous cytoophidia, retrieving two distinct sets of proximate proteomes. Therefore, this makes it possible to map PPIs in vivo and in situ at a defined spatiotemporal resolution, and demonstrates a referable resource for cytoophidium proteome in Drosophila.

Journal ArticleDOI
TL;DR: Access to the genome of M. speciosa will facilitate an improved understanding of alkaloid biosynthesis and accelerate production of bioactive alkaloids in heterologous hosts.
Abstract: Mitragyna speciosa (kratom) produces numerous compounds with pharmaceutical properties including the production of bioactive monoterpene indole and oxindole alkaloids. Using a linked-read approach, a 1,122,519,462 bp draft assembly of M. speciosa "Rifat" was generated with an N50 scaffold size of 1,020,971 bp and an N50 contig size of 70,448 bp that encodes 55,746 genes. Chromosome counting revealed that "Rifat" is a tetraploid with a base chromosome number of 11, which was further corroborated by orthology and syntenic analysis of the genome. Analysis of genes and clusters involved in specialized metabolism revealed genes putatively involved in alkaloid biosynthesis. Access to the genome of M. speciosa will facilitate an improved understanding of alkaloid biosynthesis and accelerate the production of bioactive alkaloids in heterologous hosts.

Journal ArticleDOI
TL;DR: In this paper, an experimental pipeline that combines high-efficiency CRISPR/Cas9 mutagenesis with functional phenotypic screening to identify genes required for spinal cord repair in adult zebrafish is described.
Abstract: Adult zebrafish are increasingly used to interrogate mechanisms of disease development and tissue regeneration. Yet, the prospect of large-scale genetics in adult zebrafish has traditionally faced a host of biological and technical challenges, including inaccessibility of adult tissues to high-throughput phenotyping and the spatial and technical demands of adult husbandry. Here, we describe an experimental pipeline that combines high-efficiency CRISPR/Cas9 mutagenesis with functional phenotypic screening to identify genes required for spinal cord repair in adult zebrafish. Using CRISPR/Cas9 dual-guide ribonucleic proteins, we show selective and combinatorial mutagenesis of 17 genes at 28 target sites with efficiencies exceeding 85% in adult F0 'crispants'. We find that capillary electrophoresis is a reliable method to measure indel frequencies. Using a quantifiable behavioral assay, we identify seven single- or duplicate-gene crispants with reduced functional recovery after spinal cord injury. To rule out off-target effects, we generate germline mutations that recapitulate the crispant regeneration phenotypes. This study provides a platform that combines high-efficiency somatic mutagenesis with a functional phenotypic readout to perform medium- to large-scale genetic studies in adult zebrafish.

Journal ArticleDOI
TL;DR: In this article, the authors used long-read sequencing to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species.
Abstract: While the cost and time for assembling a genome has drastically decreased, it still remains a challenge to assemble a highly contiguous genome. These challenges are rapidly being overcome by the integration of long-read sequencing technologies. Here, we use long-read sequencing to improve the contiguity of the threespine stickleback fish (Gasterosteus aculeatus) genome, a prominent genetic model species. Using Pacific Biosciences sequencing, we assembled a highly contiguous genome of a freshwater fish from Paxton Lake. Using contigs from this genome, we were able to fill over 76.7% of the gaps in the existing reference genome assembly, improving contiguity over fivefold. Our gap filling approach was highly accurate, validated by 10X Genomics long-distance linked-reads. In addition to closing a majority of gaps, we were able to assemble segments of telomeres and centromeres throughout the genome. This highlights the power of using long sequencing reads to assemble highly repetitive and difficult to assemble regions of genomes. This latest genome build has been released through a newly designed community genome browser that aims to consolidate the growing number of genomics datasets available for the threespine stickleback fish.

Journal ArticleDOI
TL;DR: In this article, a genome-wide association study in the maize Ames panel of nearly 2,000 inbred lines that was imputed with ∼7.7 million SNP markers was conducted to investigate the genetic basis of natural variation for the concentration of 11 elements in grain.
Abstract: Despite its importance to plant function and human health, the genetics underpinning element levels in maize grain remains largely unknown. Through a genome-wide association study in the maize Ames panel of nearly 2,000 inbred lines that was imputed with ∼7.7 million SNP markers, we investigated the genetic basis of natural variation for the concentration of 11 elements in grain. Novel associations were detected for the metal transporter genes rte2 (rotten ear2) and irt1 (iron-regulated transporter1) with boron and nickel, respectively. We also further resolved loci that were previously found to be associated with one or more of five elements (copper, iron, manganese, molybdenum, and/or zinc), with two metal chelator and five metal transporter candidate causal genes identified. The nas5 (nicotianamine synthase5) gene involved in the synthesis of nicotianamine, a metal chelator, was found associated with both zinc and iron and suggests a common genetic basis controlling the accumulation of these two metals in the grain. Furthermore, moderate predictive abilities were obtained for the 11 elemental grain phenotypes with two whole-genome prediction models: Bayesian Ridge Regression (0.33-0.51) and BayesB (0.33-0.53). Of the two models, BayesB, with its greater emphasis on large-effect loci, showed ∼4-10% higher predictive abilities for nickel, molybdenum, and copper. Altogether, our findings contribute to an improved genotype-phenotype map for grain element accumulation in maize.

Journal ArticleDOI
TL;DR: The authors presented a new reference genome for M. sexta, JHU_Msex_v1.0, applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness.
Abstract: The tobacco hornworm, Manduca sexta, is a lepidopteran insect that is used extensively as a model system for studying insect biology, development, neuroscience, and immunity. However, current studies rely on the highly fragmented reference genome Msex_1.0, which was created using now-outdated technologies and is hindered by a variety of deficiencies and inaccuracies. We present a new reference genome for M. sexta, JHU_Msex_v1.0, applying a combination of modern technologies in a de novo assembly to increase continuity, accuracy, and completeness. The assembly is 470 Mb and is ∼20× more continuous than the original assembly, with scaffold N50 > 14 Mb. We annotated the assembly by lifting over existing annotations and supplementing with additional supporting RNA-based data for a total of 25,256 genes. The new reference assembly is accessible in annotated form for public use. We demonstrate that improved continuity of the M. sexta genome improves resequencing studies and benefits future research on M. sexta as a model organism.

Journal ArticleDOI
TL;DR: An updated, near complete, telomere-to-telomere assembly and re-annotation of the eight chromosomes of A. flavus NRRL 3357 genome is described, accomplished via long-read PacBio and Oxford Nanopore technologies combined with Illumina short-read sequencing.
Abstract: Aspergillus flavus is an opportunistic pathogen of crops, including peanuts and maize, and is the second leading cause of aspergillosis in immunocompromised patients. A. flavus is also a major producer of the mycotoxin, aflatoxin, a potent carcinogen, which results in significant crop losses annually. The A. flavus isolate NRRL 3357 was originally isolated from peanut and has been used as a model organism for understanding the regulation and production of secondary metabolites, such as aflatoxin. A draft genome of NRRL 3357 was previously constructed, enabling the development of molecular tools and the study of the population biology of this species. Here, we describe an updated, near complete, telomere-to-telomere assembly and re-annotation of the eight chromosomes of the A. flavus NRRL 3357 genome, accomplished via long-read PacBio and Oxford Nanopore technologies combined with Illumina short-read sequencing. A total of 13,715 protein-coding genes were predicted. Using RNA-seq data, a significant improvement was achieved in predicted 5' and 3' untranslated regions, which were incorporated into the new gene models.

Journal ArticleDOI
TL;DR: This article presents the use of CRISPR-mediated homology-directed repair (HDR) to visualize the initial stages of Xanthomonas axonopodis pv. manihotis (Xam) infection in cassava.
Abstract: Research on a few model plant-pathogen systems has benefitted from years of tool and resource development. This is not the case for the vast majority of economically and nutritionally important plants, creating a crop improvement bottleneck. Cassava bacterial blight (CBB), caused by Xanthomonas axonopodis pv. manihotis (Xam), is an important disease in all regions where cassava (Manihot esculenta Crantz) is grown. Here, we describe the development of cassava that can be used to visualize one of the initial steps of CBB infection in vivo. Using CRISPR-mediated homology-directed repair (HDR), we generated plants containing scarless insertion of GFP at the 3' end of CBB susceptibility (S) gene MeSWEET10a. Activation of MeSWEET10a-GFP by the transcription activator-like (TAL) effector TAL20 was subsequently visualized at transcriptional and translational levels. To our knowledge, this is the first such demonstration of HDR via gene editing in cassava.

Journal ArticleDOI
TL;DR: In this paper, the authors used PacBio long reads and Illumina sequencing to assemble and polish a more integrated genome for the oyster mushroom Pleurotus ostreatus.
Abstract: The oyster mushroom Pleurotus ostreatus is a basidiomycete commonly found in rotten wood and is one of the most cultivated edible mushrooms globally. Pleurotus ostreatus is also a carnivorous fungus, which can paralyze and kill nematodes within minutes. However, the molecular mechanisms of the predator-prey interactions between P. ostreatus and nematodes remain unclear. PC9 and PC15 are two model strains of P. ostreatus, and the genomes of both strains have been sequenced and deposited at the Joint Genome Institute (JGI). These two monokaryotic strains exhibit dramatic differences in growth, but because PC9 grows more robustly in laboratory conditions, it has become the strain of choice for many studies. Although PC9 is the strain commonly used for investigation, its genome is fragmentary and incomplete relative to that of PC15. To overcome this problem, we used PacBio long reads and Illumina sequencing to assemble and polish a more integrated genome for PC9. Our PC9 genome assembly, distributed across 17 scaffolds, is highly contiguous and includes five telomere-to-telomere scaffolds, dramatically improving the genome quality. We believe that our PC9 genome resource will be useful to the fungal research community investigating various aspects of P. ostreatus biology.

Journal ArticleDOI
TL;DR: In this paper, the authors used the natural anthocyanin diversity present in a purple corn landrace, Apache Red, to generate a population with variable flavonoid profiles.
Abstract: While maize with anthocyanin-rich pericarp (purple corn) is rising in popularity as a source of natural colorant for foods and beverages, information on color range and stability (factors associated with anthocyanin decorations and compositional profiles) is currently limited. Furthermore, to maximize the scalability and meet growing demands, both anthocyanin concentrations and agronomic performance must improve in purple corn varieties. Using the natural anthocyanin diversity present in a purple corn landrace, Apache Red, we generated a population with variable flavonoid profiles: flavanol-anthocyanin condensed forms (0-83%), acylated anthocyanins (2-72%), pelargonidin-derived anthocyanins (5-99%), C-glycosyl flavone co-pigments up to 1904 µg/g, and anthocyanin content up to 1598 µg/g. Each aspect of the flavonoid profiles was found to play a role in either the resulting extract hue or intensity. With genotyping-by-sequencing of this population, we mapped aspects of the flavonoid profile. Major quantitative trait loci (QTLs) for anthocyanin type were found near loci previously identified only in aleurone-pigmented maize varieties [Purple aleurone1 (Pr1) and Anthocyanin acyltransferase1 (Aat1)]. A QTL near P1 (Pericarp color1) was found for both flavone content and flavanol-anthocyanin condensed forms. A significant QTL associated with peonidin-derived anthocyanins near a candidate S-adenosylmethionine-dependent methyltransferase was also identified, warranting further investigation. Mapping total anthocyanin content produced signals near Aat1, the aleurone-associated bHLH R1 (Colored1), the plant color-associated MYB, Pl1 (Purple plant1), the aleurone-associated recessive intensifier, In1 (Intensifier1), and several previously unidentified candidates. This population represents one of the most anthocyanin-diverse pericarp-pigmented maize varieties characterized to date. Moreover, the candidates identified here will serve as branching points for future research studying the genetic and molecular processes determining anthocyanin profile in pericarp.

Journal ArticleDOI
TL;DR: This paper reports a meta-genome-wide association study involving 73 published studies in soybean (Glycine max L. [Merr.]) covering 17,556 unique accessions, with improved statistical power for robust detection of loci associated with a broad range of traits.
Abstract: We report a meta-Genome Wide Association Study involving 73 published studies in soybean (Glycine max L. [Merr.]) covering 17,556 unique accessions, with improved statistical power for robust detection of loci associated with a broad range of traits. De novo GWAS and meta-analysis were conducted for composition traits (including fatty acid and amino acid composition), disease resistance traits, and agronomic traits including seed yield, plant height, stem lodging, seed weight, seed mottling, seed quality, flowering timing, and pod shattering. To examine differences in detectability and test statistical power between single- and multi-environment GWAS, comparisons of meta-GWAS results with those from the constituent experiments were performed. Using meta-GWAS analysis and the analysis of individual studies, we report 483 peaks at 393 unique loci. Using stringent criteria to detect significant marker-trait associations, 59 candidate genes were identified, including 17 agronomic trait loci, 19 for seed-related traits, and 33 for disease reaction traits. This study identified potentially valuable candidate genes that affect multiple traits. The success in narrowing down the genomic region for some loci through overlapping mapping results of multiple studies is a promising avenue for community-based studies and plant breeding applications.

Journal ArticleDOI
TL;DR: In this paper, Wang et al. isolated a semi-dwarf mutant (sdw-e), which exhibits a 30% reduction in plant height compared to Zhongshuang 11-HP (ZS11-HP). Quantitative trait locus sequencing (QTL-seq) was conducted using two extreme DNA bulks in F2 populations in Wuchang-2017 derived from ZS11-HP × sdw-e to identify QTLs associated with plant height.
Abstract: Plant height is a crucial element related to plant architecture that influences the seed yield of oilseed rape (Brassica napus L.). In this study, we isolated a natural B. napus mutant, namely a semi-dwarf mutant (sdw-e), which exhibits a 30% reduction in plant height compared to Zhongshuang 11-HP (ZS11-HP). Quantitative trait locus sequencing (QTL-seq) was conducted using two extreme DNA bulks in F2 populations in Wuchang-2017 derived from ZS11-HP × sdw-e to identify QTLs associated with plant height. The result suggested that two QTL intervals were located on chromosome A10. The F2 population consisting of 200 individuals in Yangluo-2018 derived from ZS11-HP × sdw-e was used to construct a high-density linkage map using whole-genome resequencing. The high-density linkage map harbored 4323 bin markers and covered a total distance of 2026.52 cM with an average marker interval of 0.47 cM. The major QTL for plant height, named qPHA10, was identified on linkage group A10 by interval mapping (IM) and composite interval mapping (CIM) methods. The major QTL qPHA10 was highly consistent with the QTL-seq results. We then integrated the variation sites and expression levels of genes in the major QTL interval to predict the candidate genes. Thus, the identified QTL and candidate genes could be used in marker-assisted selection for B. napus breeding in the future.
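
The linkage-map figures in this abstract can be sanity-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it assumes the map length reads 2026.52 cM and the reported average interval reads 0.47 cM (the decimal points are inferred), and the function name `average_marker_interval` is hypothetical. It also assumes, for the second call, that the map has 19 linkage groups, matching the 19 chromosome pairs of B. napus; the abstract does not state the number of groups.

```python
def average_marker_interval(map_length_cm, n_markers, n_linkage_groups=1):
    """Mean spacing between adjacent markers on a linkage map, in cM.

    A linkage group with k markers has k - 1 intervals, so a map with
    n markers spread over g groups has n - g intervals in total.
    """
    n_intervals = n_markers - n_linkage_groups
    if n_intervals <= 0:
        raise ValueError("need more markers than linkage groups")
    return map_length_cm / n_intervals


# Values as read from the abstract (decimal placement is an assumption).
MAP_LENGTH_CM = 2026.52
N_BIN_MARKERS = 4323

# Treating the whole map as one group, and as 19 groups (assumed).
single_group = average_marker_interval(MAP_LENGTH_CM, N_BIN_MARKERS)
nineteen_groups = average_marker_interval(
    MAP_LENGTH_CM, N_BIN_MARKERS, n_linkage_groups=19
)

print(f"{single_group:.2f} cM, {nineteen_groups:.2f} cM")
```

Under either assumption the result rounds to 0.47 cM, so the three figures are mutually consistent when read with those decimal points.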

Journal ArticleDOI
TL;DR: The pigeon louse Columbicola columbae is a longstanding and important model for studies of ectoparasitism and host-parasite coevolution as mentioned in this paper, however, a deeper understanding of its evolution and capacity for rapid adaptation is limited by a lack of genomic resources.
Abstract: The pigeon louse Columbicola columbae is a longstanding and important model for studies of ectoparasitism and host-parasite coevolution. However, a deeper understanding of its evolution and capacity for rapid adaptation is limited by a lack of genomic resources. Here, we present a high-quality draft assembly of the C. columbae genome, produced using a combination of Oxford Nanopore, Illumina, and Hi-C technologies. The final assembly is 208 Mb in length, with 12 chromosome-size scaffolds representing 98.1% of the assembly. For gene model prediction, we used a novel clustering method (wavy_choose) for Oxford Nanopore RNA-seq reads to feed into the MAKER annotation pipeline. High recovery of conserved single-copy orthologs (BUSCOs) suggests that our assembly and annotation are both highly complete and highly accurate. Consistent with the results of the only other assembled louse genome, Pediculus humanus, we find that C. columbae has a relatively low density of repetitive elements, the majority of which are DNA transposons. Also similar to P. humanus, we find a reduced number of genes encoding opsins, G protein-coupled receptors, odorant receptors, insulin signaling pathway components, and detoxification proteins in the C. columbae genome, relative to other insects. We propose that such losses might characterize the genomes of obligate, permanent ectoparasites with predictable habitats, limited foraging complexity, and simple dietary regimes. The sequencing and analysis for this genome were relatively low cost, and took advantage of a new clustering technique for Oxford Nanopore RNAseq reads that will be useful to future genome projects.

Journal ArticleDOI
TL;DR: A. psidii has a broad host range with more than 480 myrtaceous species and is a globally invasive fungal plant pathogen that causes rust disease on Myrtaceae as mentioned in this paper.
Abstract: Austropuccinia psidii, originating in South America, is a globally invasive fungal plant pathogen that causes rust disease on Myrtaceae. Several biotypes are recognized, with the most widely distributed pandemic biotype spreading throughout the Asia-Pacific and Oceania regions over the last decade. Austropuccinia psidii has a broad host range with more than 480 myrtaceous species. Since first detected in Australia in 2010, the pathogen has caused the near extinction of at least three species and negatively affected commercial production of several Myrtaceae. To enable molecular and evolutionary studies into A. psidii pathogenicity, we assembled a highly contiguous genome for the pandemic biotype. With an estimated haploid genome size of just over 1 Gb (gigabases), it is the largest assembled fungal genome to date. The genome has undergone massive expansion via distinct transposable element (TE) bursts. Over 90% of the genome is covered by TEs predominantly belonging to the Gypsy superfamily. These TE bursts have likely been followed by deamination events of methylated cytosines to silence the repetitive elements. This in turn led to the depletion of CpG sites in TEs and a very low overall GC content of 33.8%. Compared to other Pucciniales, the intergenic distances are increased by an order of magnitude indicating a general insertion of TEs between genes. Overall, we show how TEs shaped the genome evolution of A. psidii and provide a greatly needed resource for strategic approaches to combat disease spread.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper applied PacBio single-molecule sequencing technique (SMRT) and the high-throughput chromosome conformation capture (Hi-C) technologies to assemble the M. albus genome.
Abstract: The swamp eel (Monopterus albus) is an economically important fish in China and Southeast Asia and a good model species for studying sex inversion. There are different genetic lineages and multiple local strains of swamp eel in China, and one local strain of M. albus with deep yellow and big spots has been selected for consecutive selective breeding due to its superiority in growth rate and fecundity. A high-quality reference genome of the swamp eel would be a very useful resource for future selective breeding programs. In the present study, we applied PacBio single-molecule real-time (SMRT) sequencing and high-throughput chromosome conformation capture (Hi-C) technologies to assemble the M. albus genome. A 799 Mb genome was obtained with a contig N50 length of 2.4 Mb and a scaffold N50 length of 67.24 Mb, indicating 110-fold and ∼31.87-fold improvement compared to the earlier released assembly (∼22.24 Kb and 2.11 Mb, respectively). Aided by Hi-C data, a total of 750 contigs were reliably assembled into 12 chromosomes. Using 22,373 protein-coding genes annotated here, the phylogenetic relationships of the swamp eel with other teleosts showed that the swamp eel separated from the common ancestor of Zig-zag eel ∼49.9 million years ago, and 769 gene families were found to be expanded, which are mainly enriched in the immune system, sensory system, and transport and catabolism. This highly accurate, chromosome-level reference genome of M. albus obtained in this work will be used for the development of genome-scale selective breeding.
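
For readers unfamiliar with the N50 statistic quoted in this abstract, the sketch below shows one common way to compute it and then reproduces the fold-improvement ratios from the reported values. It is an illustration, not the authors' pipeline: the function names `n50` and `fold_improvement` and the toy contig lengths are hypothetical, and 1 Kb is taken as 1,000 bp.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def fold_improvement(new_value, old_value):
    """Ratio of a new assembly statistic to the old one."""
    return new_value / old_value


# Toy contig lengths (made up) to show the N50 definition at work.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))

# N50 values as reported in the abstract, converted to base pairs.
scaffold_fold = fold_improvement(67.24e6, 2.11e6)
contig_fold = fold_improvement(2.4e6, 22.24e3)
print(f"scaffold: {scaffold_fold:.2f}-fold, contig: {contig_fold:.0f}-fold")
```

With these inputs the scaffold ratio comes out at about 31.87, matching the abstract, and the contig ratio at about 108, which is close to the rounded 110-fold figure quoted.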