Showing papers in "G3: Genes, Genomes, Genetics in 2013"


Journal Article
TL;DR: This study performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile, providing the first detailed account of genomic variants in the HeLa genome.
Abstract: HeLa is the most widely used model cell line for studying human cellular and molecular biology. To date, no genomic reference for this cell line has been released, and experiments have relied on the human reference genome. Effective design and interpretation of molecular genetic studies performed using HeLa cells require accurate genomic information. Here we present a detailed genomic and transcriptomic characterization of a HeLa cell line. We performed DNA and RNA sequencing of a HeLa Kyoto cell line and analyzed its mutational portfolio and gene expression profile. Segmentation of the genome according to copy number revealed a remarkably high level of aneuploidy and numerous large structural variants at unprecedented resolution. Some of the extensive genomic rearrangements are indicative of catastrophic chromosome shattering, known as chromothripsis. Our analysis of the HeLa gene expression profile revealed that several pathways, including cell cycle and DNA repair, exhibit significantly different expression patterns from those in normal human tissues. Our results provide the first detailed account of genomic variants in the HeLa genome, yielding insight into their impact on gene expression and cellular function as well as their origins. This study underscores the importance of accounting for the strikingly aberrant characteristics of HeLa cells when designing and interpreting experiments, and has implications for the use of HeLa as a model of human biology.

403 citations


Journal Article
TL;DR: The application of CRISPR-Cas–mediated genome editing to wheat, the most important food crop plant with a very large and complex genome, is reported, suggesting that off-target effects can be abolished in vivo by selecting target sites with unique sequences at the 3′ end.
Abstract: The clustered, regularly interspaced, short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system has been used as an efficient tool for genome editing. We report the application of CRISPR-Cas–mediated genome editing to wheat (Triticum aestivum), the most important food crop plant with a very large and complex genome. The mutations were targeted in the inositol oxygenase (inox) and phytoene desaturase (pds) genes using cell suspension culture of wheat and in the pds gene in leaves of Nicotiana benthamiana. The expression of chimeric guide RNAs (cgRNA) targeting single and multiple sites resulted in indel mutations in all the tested samples. The expression of Cas9 or sgRNA alone did not cause any mutation. The expression of duplex cgRNA with Cas9 targeting two sites in the same gene resulted in deletion of the DNA fragment between the targeted sequences. Multiplexing the cgRNA could target two genes at one time. Target specificity analysis of cgRNA showed that mismatches at the 3′ end of the target site abolished the cleavage activity completely. The mismatches at the 5′ end reduced cleavage, suggesting that off-target effects can be abolished in vivo by selecting target sites with unique sequences at the 3′ end. This approach provides a powerful method for genome engineering in plants.

365 citations


Journal Article
TL;DR: The assembly strategy implemented by PRICE is described, with examples of its application to the sequence of particular genes, transcripts, and virus genomes from complex multicomponent datasets, including an assembly of the BCBL-1 strain of Kaposi’s sarcoma-associated herpesvirus.
Abstract: Low-cost DNA sequencing technologies have expanded the role for direct nucleic acid sequencing in the analysis of genomes, transcriptomes, and the metagenomes of whole ecosystems. Human and machine comprehension of such large datasets can be simplified via synthesis of sequence fragments into long, contiguous blocks of sequence (contigs), but most of the progress in the field of assembly has focused on genomes in isolation rather than metagenomes. Here, we present software for paired-read iterative contig extension (PRICE), a strategy for focused assembly of particular nucleic acid species using complex metagenomic data as input. We describe the assembly strategy implemented by PRICE and provide examples of its application to the sequence of particular genes, transcripts, and virus genomes from complex multicomponent datasets, including an assembly of the BCBL-1 strain of Kaposi’s sarcoma-associated herpesvirus. PRICE is open-source and available for free download (derisilab.ucsf.edu/software/price/ or sourceforge.net/projects/pricedenovo/).

271 citations


Journal Article
TL;DR: The work presented here has led to a greatly improved ordering of the potato reference genome superscaffolds into chromosomal “pseudomolecules”.
Abstract: The genome of potato, a major global food crop, was recently sequenced. The work presented here details the integration of the potato reference genome (DM) with a new sequence-tagged site marker−based linkage map and other physical and genetic maps of potato and the closely related species tomato. Primary anchoring of the DM genome assembly was accomplished by the use of a diploid segregating population, which was genotyped with several types of molecular genetic markers to construct a new ~936 cM linkage map comprising 2469 marker loci. In silico anchoring approaches used genetic and physical maps from the diploid potato genotype RH89-039-16 (RH) and tomato. This combined approach has allowed 951 superscaffolds to be ordered into pseudomolecules corresponding to the 12 potato chromosomes. These pseudomolecules represent 674 Mb (~93%) of the 723 Mb genome assembly and 37,482 (~96%) of the 39,031 predicted genes. The superscaffold order and orientation within the pseudomolecules are closely collinear with independently constructed high density linkage maps. Comparisons between marker distribution and physical location reveal regions of greater and lesser recombination, as well as regions exhibiting significant segregation distortion. The work presented here has led to a greatly improved ordering of the potato reference genome superscaffolds into chromosomal “pseudomolecules”.

236 citations
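
The coverage percentages quoted in the potato abstract above can be checked directly from the counts it gives. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract (Mb of assembly and numbers of predicted genes).
anchored_mb = 674
assembly_mb = 723
anchored_genes = 37_482
predicted_genes = 39_031


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


assembly_pct = percent(anchored_mb, assembly_mb)
gene_pct = percent(anchored_genes, predicted_genes)

print(f"Assembly anchored in pseudomolecules: {assembly_pct:.1f}%")
print(f"Predicted genes anchored: {gene_pct:.1f}%")
```

Rounded to whole numbers, the two values agree with the ~93% and ~96% stated in the abstract.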


Journal Article
TL;DR: This study evaluated methods for incorporating GBS information and compared them with pedigree models for predicting genetic values of lines from two maize populations evaluated for different traits in different environments, finding consistent gains in prediction accuracy from combining pedigree and GBS data.
Abstract: Genotyping-by-sequencing (GBS) technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. Therefore, GBS has become an attractive alternative technology for genomic selection. However, the use of GBS data poses important challenges, and the accuracy of genomic prediction using GBS is currently undergoing investigation in several crops, including maize, wheat, and cassava. The main objective of this study was to evaluate various methods for incorporating GBS information and compare them with pedigree models for predicting genetic values of lines from two maize populations evaluated for different traits measured in different environments (experiments 1 and 2). Given that GBS data come with a large percentage of uncalled genotypes, we evaluated methods using nonimputed, imputed, and GBS-inferred haplotypes of different lengths (short or long). GBS and pedigree data were incorporated into statistical models using either the genomic best linear unbiased predictors (GBLUP) or the reproducing kernel Hilbert spaces (RKHS) regressions, and prediction accuracy was quantified using cross-validation methods. 
The following results were found: (1) relative to pedigree or marker-only models, there were consistent gains in prediction accuracy by combining pedigree and GBS data; (2) there was increased predictive ability when using imputed or nonimputed GBS data over inferred haplotype in experiment 1, or nonimputed GBS and information-based imputed short and long haplotypes, as compared to the other methods in experiment 2; (3) the level of prediction accuracy achieved using GBS data in experiment 2 is comparable to that reported by previous authors who analyzed this data set using SNP arrays; and (4) GBLUP and RKHS models with pedigree with nonimputed and imputed GBS data provided the best prediction correlations for the three traits in experiment 1, whereas for experiment 2 RKHS provided slightly better prediction than GBLUP for drought-stressed environments, and both models provided similar predictions in well-watered environments.

209 citations


Journal Article
TL;DR: This work used RNA-seq to identify novel genes and provide the first high-resolution view of the transcriptome throughout development and in response to blood feeding in a mosquito vector of human disease, Aedes aegypti, the primary vector for dengue and yellow fever.
Abstract: Mosquitoes are vectors of a number of important human and animal diseases. The development of novel vector control strategies requires a thorough understanding of mosquito biology. To facilitate this, we used RNA-seq to identify novel genes and provide the first high-resolution view of the transcriptome throughout development and in response to blood feeding in a mosquito vector of human disease, Aedes aegypti, the primary vector for dengue and yellow fever. We characterized mRNA expression at 34 distinct time points throughout Aedes development, including adult somatic and germline tissues, by using polyA+ RNA-seq. We identify a total of 14,238 new transcribed regions corresponding to 12,597 new loci, as well as many novel transcript isoforms of previously annotated genes. Altogether these results increase the annotated fraction of the transcribed genome into long polyA+ RNAs by more than twofold. We also identified a number of patterns of shared gene expression, as well as genes and/or exons expressed sex-specifically or sex-differentially. Expression profiles of small RNAs in ovaries, early embryos, testes, and adult male and female somatic tissues also were determined, resulting in the identification of 38 new Aedes-specific miRNAs, and ~291,000 small RNA new transcribed regions, many of which are likely to be endogenous small-interfering RNAs and Piwi-interacting RNAs. Genes of potential interest for transgene-based vector control strategies also are highlighted. Our data have been incorporated into a user-friendly genome browser located at www.Aedes.caltech.edu, with relevant links to Vectorbase (www.vectorbase.org).

194 citations


Journal Article
TL;DR: Although improvement and diversification for distinct market classes was observed through whole-genome analysis of historic and current potato lines, an increased rate of gain from selection will be required to meet growing global food demands and challenges due to climate change.
Abstract: Cultivated potato (Solanum tuberosum L.), a vegetatively propagated autotetraploid, has been bred for distinct market classes, including fresh market, pigmented, and processing varieties. Breeding efforts have relied on phenotypic selection of populations developed from intra- and intermarket class crosses and introgressions of wild and cultivated Solanum relatives. To retrospectively explore the effects of potato breeding at the genome level, we used 8303 single-nucleotide polymorphism markers to genotype a 250-line diversity panel composed of wild species, genetic stocks, and cultivated potato lines with release dates ranging from 1857 to 2011. Population structure analysis revealed four subpopulations within the panel, with cultivated potato lines grouping together and separate from wild species and genetic stocks. With pairwise kinship estimates clear separation between potato market classes was observed. Modern breeding efforts have scarcely changed the percentage of heterozygous loci or the frequency of homozygous, single-dose, and duplex loci on a genome level, despite concerted efforts by breeders. In contrast, clear selection in less than 50 years of breeding was observed for alleles in biosynthetic pathways important for market class-specific traits such as pigmentation and carbohydrate composition. Although improvement and diversification for distinct market classes was observed through whole-genome analysis of historic and current potato lines, an increased rate of gain from selection will be required to meet growing global food demands and challenges due to climate change. Understanding the genetic basis of diversification and trait improvement will allow for more rapid genome-guided improvement of potato in future breeding efforts.

176 citations


Journal Article
TL;DR: It is concluded that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known, and all four imputation methods evaluated led to greater genomic selection accuracies when the level of missing data was high.
Abstract: Genomic selection, a breeding method that promises to accelerate rates of genetic gain, requires dense, genome-wide marker data. Genotyping-by-sequencing can generate a large number of de novo markers. However, without a reference genome, these markers are unordered and typically have a large proportion of missing data. Because marker imputation algorithms were developed for species with a reference genome, algorithms suited for unordered markers have not been rigorously evaluated. Using four empirical datasets, we evaluate and characterize four such imputation methods, referred to as k-nearest neighbors, singular value decomposition, random forest regression, and expectation maximization imputation, in terms of their imputation accuracies and the factors affecting accuracy. The effect of imputation method on the genomic selection accuracy is assessed in comparison with mean imputation. The effect of excluding markers with a large proportion of missing data on the genomic selection accuracy is also examined. Our results show that imputation of unordered markers can be accurate, especially when linkage disequilibrium between markers is high and genotyped individuals are related. Of the methods evaluated, random forest regression imputation produced superior accuracy. In comparison with mean imputation, all four imputation methods we evaluated led to greater genomic selection accuracies when the level of missing data was high. Including rather than excluding markers with a large proportion of missing data nearly always led to greater GS accuracies. We conclude that high levels of missing data in dense marker sets are not a major obstacle for genomic selection, even when marker order is not known.

169 citations
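
The imputation abstract above compares mean imputation with methods such as k-nearest neighbors that need no marker order. The Python sketch below contrasts the two ideas on a toy genotype matrix. It is an illustration only, not the authors' implementation. The 0/1/2 allele coding, the use of None for missing calls, and the choice to take the most similar individuals as neighbors are all assumptions made here.

```python
def column_mean(G, m):
    """Mean of the observed (non-missing) genotypes at marker m."""
    observed = [row[m] for row in G if row[m] is not None]
    return sum(observed) / len(observed)


def mean_impute(G):
    """Replace every missing call with the marker's observed mean."""
    return [[column_mean(G, m) if v is None else v for m, v in enumerate(row)]
            for row in G]


def distance(a, b):
    """Mean squared difference over markers observed in both individuals."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum((x - y) ** 2 for x, y in shared) / len(shared)


def knn_impute(G, k=2):
    """Fill each missing call with the average of the k most similar donors.

    Marker order is never used, so the markers may be unordered.
    Assumption: neighbors are individuals (rows), ranked by `distance`.
    """
    out = [row[:] for row in G]
    for i, row in enumerate(G):
        ranked = []
        for j, other in enumerate(G):
            if j != i:
                d = distance(row, other)
                if d is not None:
                    ranked.append((d, j))
        ranked.sort()
        for m, v in enumerate(row):
            if v is None:
                # Keep only neighbors that were actually genotyped at marker m.
                donors = [G[j][m] for _, j in ranked if G[j][m] is not None][:k]
                out[i][m] = (sum(donors) / len(donors)) if donors else column_mean(G, m)
    return out


# Toy data: individual 1 closely resembles individual 0 but lacks the last marker.
toy = [
    [0, 0, 2, 2],
    [0, 0, 2, None],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]
mean_filled = mean_impute(toy)
knn_filled = knn_impute(toy, k=1)
```

On this toy matrix the mean fill uses the population average at the missing marker, while the nearest-neighbor fill copies the value of the closely related individual. That difference is the intuition behind the abstract's remark that imputation is more accurate when genotyped individuals are related.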


Journal Article
TL;DR: Comparison of the genomes of three P. tritici-repentis isolates provides evidence that pathogenicity in this species arose through an influx of transposable elements, which created a genetically flexible landscape that can easily respond to environmental changes.
Abstract: Pyrenophora tritici-repentis is a necrotrophic fungus that causes tan spot of wheat, a disease whose contribution to crop loss has increased significantly during the last few decades. Pathogenicity by this fungus is attributed to the production of host-selective toxins (HST), which are recognized by their host in a genotype-specific manner. To better understand the mechanisms that have led to the increase in disease incidence related to this pathogen, we sequenced the genomes of three P. tritici-repentis isolates. A pathogenic isolate that produces two known HSTs was used to assemble a reference nuclear genome of approximately 40 Mb composed of 11 chromosomes that encode 12,141 predicted genes. Comparison of the reference genome with those of a pathogenic isolate that produces a third HST, and a nonpathogenic isolate, showed the nonpathogen genome to be more diverged than those of the two pathogens. Examination of gene-coding regions has provided candidate pathogen-specific proteins and revealed gene families that may play a role in a necrotrophic lifestyle. Analysis of transposable elements suggests that their presence in the genome of pathogenic isolates contributes to the creation of novel genes, effector diversification, possible horizontal gene transfer events, identified copy number variation, and the first example of transduplication by DNA transposable elements in fungi. Overall, comparative analysis of these genomes provides evidence that pathogenicity in this species arose through an influx of transposable elements, which created a genetically flexible landscape that can easily respond to environmental changes.

162 citations


Journal Article
TL;DR: The results provide the most favorable ZmVTE4 haplotype and suggest three new gene targets for increasing vitamin E and antioxidant levels through marker-assisted selection.
Abstract: Tocopherols and tocotrienols, collectively known as tocochromanols, are the major lipid-soluble antioxidants in maize (Zea mays L.) grain. Given that individual tocochromanols differ in their degree of vitamin E activity, variation for tocochromanol composition and content in grain among diverse maize inbred lines has important nutritional and health implications for enhancing the vitamin E and antioxidant contents of maize-derived foods through plant breeding. Toward this end, we conducted a genome-wide association study of six tocochromanol compounds and 14 of their sums, ratios, and proportions with an association panel of 281 maize inbred lines that was genotyped for 591,822 SNP markers. In addition to providing further insight into the association between ZmVTE4 (γ-tocopherol methyltransferase) haplotypes and α-tocopherol content, we also detected a novel association between ZmVTE1 (tocopherol cyclase) and tocotrienol composition. In a pathway-level analysis, we assessed the genetic contribution of 60 a priori candidate genes encoding the core tocochromanol pathway (VTE genes) and reactions for pathways supplying the isoprenoid tail and aromatic head group of tocochromanols. This analysis identified two additional genes, ZmHGGT1 (homogentisate geranylgeranyltransferase) and one prephenate dehydratase paralog (of four in the genome) that also modestly contribute to tocotrienol variation in the panel. Collectively, our results provide the most favorable ZmVTE4 haplotype and suggest three new gene targets for increasing vitamin E and antioxidant levels through marker-assisted selection.

150 citations


Journal Article
TL;DR: Eight human population-genetic data sets are combined at the 645 microsatellite loci they share to assemble a single data set containing 5795 individuals from 267 worldwide populations, the largest of its kind reported to date.
Abstract: Over the past two decades, microsatellite genotypes have provided the data for landmark studies of human population-genetic variation. However, the various microsatellite data sets have been prepared with different procedures and sets of markers, so that it has been difficult to synthesize available data for a comprehensive analysis. Here, we combine eight human population-genetic data sets at the 645 microsatellite loci they share in common, accounting for procedural differences in the production of the different data sets, to assemble a single data set containing 5795 individuals from 267 worldwide populations. We perform a systematic analysis of genetic relatedness, detecting 240 intra-population and 92 inter-population pairs of previously unidentified close relatives and proposing standardized subsets of unrelated individuals for use in future studies. We then augment the human data with a data set of 84 chimpanzees at the 246 loci they share in common with the human samples. Multidimensional scaling and neighbor-joining analyses of these data sets offer new insights into the structure of human populations and enable a comparison of genetic variation patterns in chimpanzees with those in humans. Our combined data sets are the largest of their kind reported to date and provide a resource for use in human population-genetic studies.

Journal Article
TL;DR: It is demonstrated that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome.
Abstract: The emergence of new sequencing technologies has provided fast and cost-efficient strategies for high-resolution mapping of complex genomes. Although these approaches hold great promise to accelerate genome analysis, their application in studying genetic variation in wheat has been hindered by the complexity of its polyploid genome. Here, we applied next-generation sequencing of a wheat doubled-haploid mapping population for high-resolution gene mapping and tested its utility for ordering shotgun sequence contigs of a flow-sorted wheat chromosome. A bioinformatics pipeline was developed for reliable variant analysis of sequence data generated for polyploid wheat mapping populations. The results of variant mapping were consistent with the results obtained using the wheat 9000 SNP iSelect assay. A reference map of the wheat genome integrating 2740 gene-associated single-nucleotide polymorphisms from the wheat iSelect assay, 1351 diversity array technology, 118 simple sequence repeat/sequence-tagged sites, and 416,856 genotyping-by-sequencing markers was developed. By analyzing the sequenced megabase-size regions of the wheat genome, we showed that mapped markers are located within 40−100 kb of genes, providing a possibility for high-resolution mapping at the level of a single gene. In our population, gene loci controlling a seed color phenotype cosegregated with 2459 markers including one that was located within the red seed color gene. We demonstrate that the high-density reference map presented here is a useful resource for gene mapping and linking physical and genetic maps of the wheat genome.

Journal Article
TL;DR: The basic properties of the genome and transcriptome are discussed and patterns of genome evolution in D. suzukii and its close relatives are described and presented in a web portal, SpottedWingFlyBase, to facilitate public access.
Abstract: Drosophila suzukii Matsumura (spotted wing drosophila) has recently become a serious pest of a wide variety of fruit crops in the United States as well as in Europe, leading to substantial yearly crop losses. To enable basic and applied research of this important pest, we sequenced the D. suzukii genome to obtain a high-quality reference sequence. Here, we discuss the basic properties of the genome and transcriptome and describe patterns of genome evolution in D. suzukii and its close relatives. Our analyses and genome annotations are presented in a web portal, SpottedWingFlyBase, to facilitate public access.

Journal Article
TL;DR: It is confirmed that HR outcomes are enhanced relative to the alternative nonhomologous end joining (NHEJ) repair pathway in flies lacking DNA ligase IV and these observations should enable better experimental design for gene targeting in Drosophila and help guide similar efforts in other systems.
Abstract: Gene targeting is the term commonly applied to experimental gene replacement by homologous recombination (HR). This process is substantially stimulated by a double-strand break (DSB) in the genomic target. Zinc-finger nucleases (ZFNs) are targetable cleavage reagents that provide an effective means of introducing such a break in conjunction with delivery of a homologous donor DNA. In this study we explored several parameters of donor DNA structure during ZFN-mediated gene targeting in Drosophila melanogaster embryos, as follows. 1) We confirmed that HR outcomes are enhanced relative to the alternative nonhomologous end joining (NHEJ) repair pathway in flies lacking DNA ligase IV. 2) The minimum amount of homology needed to support efficient HR in fly embryos is between 200 and 500 bp. 3) Conversion tracts are very broad in this system: donor sequences more than 3 kb from the ZFN-induced break are found in the HR products at approximately 50% of the frequency of a marker at the break. 4) Deletions carried by the donor DNA are readily incorporated at the target. 5) While linear double-stranded DNAs are not effective as donors, single-stranded oligonucleotides are. These observations should enable better experimental design for gene targeting in Drosophila and help guide similar efforts in other systems.

Journal Article
TL;DR: This work adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene, and included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents in vivo.
Abstract: The evaluation of specific endogenous transcript levels is important for understanding transcriptional regulation. More specifically, it is useful for independent confirmation of results obtained by the use of microarray analysis or RNA-seq and for evaluating RNA interference (RNAi)-mediated gene knockdown. Designing specific and effective primers for high-quality, moderate-throughput evaluation of transcript levels, i.e., quantitative, real-time PCR (qPCR), is nontrivial. To meet community needs, predefined qPCR primer pairs for mammalian genes have been designed and sequences made available, e.g., via PrimerBank. In this work, we adapted and refined the algorithms used for the mammalian PrimerBank to design 45,417 primer pairs for 13,860 Drosophila melanogaster genes, with three or more primer pairs per gene. We experimentally validated primer pairs for ~300 randomly selected genes expressed in early Drosophila embryos, using SYBR Green-based qPCR and sequence analysis of products derived from conventional PCR. All relevant information, including primer sequences, isoform specificity, spatial transcript targeting, and any available validation results and/or user feedback, is available from an online database (www.flyrnai.org/flyprimerbank). At FlyPrimerBank, researchers can retrieve primer information for fly genes either one gene at a time or in batch mode. Importantly, we included the overlap of each predicted amplified sequence with RNAi reagents from several public resources, making it possible for researchers to choose primers suitable for knockdown evaluation of RNAi reagents (i.e., to avoid amplification of the RNAi reagent itself). We demonstrate the utility of this resource for validation of RNAi reagents in vivo.

Journal Article
TL;DR: This work demonstrates that TALENs are useful reagents for achieving targeted mutagenesis in this important plant model, Arabidopsis thaliana.
Abstract: Custom TAL effector nucleases (TALENs) are increasingly used as reagents to manipulate genomes in vivo. Here, we used TALENs to modify the genome of the model plant, Arabidopsis thaliana. We engineered seven TALENs targeting five Arabidopsis genes, namely ADH1, TT4, MAPKKK1, DSK2B, and NATA2. In pooled seedlings expressing the TALENs, we observed somatic mutagenesis frequencies ranging from 2–15% at the intended targets for all seven TALENs. Somatic mutagenesis frequencies as high as 41–73% were observed in individual transgenic plant lines expressing the TALENs. Additionally, a TALEN pair targeting a tandemly duplicated gene induced a 4.4-kb deletion in somatic cells. For the most active TALEN pairs, namely those targeting ADH1 and NATA2, we found that TALEN-induced mutations were transmitted to the next generation at frequencies of 1.5–12%. Our work demonstrates that TALENs are useful reagents for achieving targeted mutagenesis in this important plant model.

Journal Article
TL;DR: The effect of resource allocation on response to MAS and genomic selection was assessed by computer simulation in a single biparental population of doubled haploid lines; observed prediction accuracies compared well with those calculated from theoretical formulas, indicating that the formulas are useful for making resource allocation decisions.
Abstract: Allocating resources between population size and replication affects both genetic gain through phenotypic selection and quantitative trait loci detection power and effect estimation accuracy for marker-assisted selection (MAS). It is well known that because alleles are replicated across individuals in quantitative trait loci mapping and MAS, more resources should be allocated to increasing population size compared with phenotypic selection. Genomic selection is a form of MAS using all marker information simultaneously to predict individual genetic values for complex traits and has widely been found superior to MAS. No studies have explicitly investigated how resource allocation decisions affect success of genomic selection. My objective was to study the effect of resource allocation on response to MAS and genomic selection in a single biparental population of doubled haploid lines by using computer simulation. Simulation results were compared with previously derived formulas for the calculation of prediction accuracy under different levels of heritability and population size. Response of prediction accuracy to resource allocation strategies differed between genomic selection models (ridge regression best linear unbiased prediction [RR-BLUP], BayesCπ) and multiple linear regression using ordinary least-squares estimation (OLS), leading to different optimal resource allocation choices between OLS and RR-BLUP. For OLS, it was always advantageous to maximize population size at the expense of replication, but a high degree of flexibility was observed for RR-BLUP. Prediction accuracy of doubled haploid lines included in the training set was much greater than of those excluded from the training set, so there was little benefit to phenotyping only a subset of the lines genotyped. 
Finally, observed prediction accuracies in the simulation compared well to calculated prediction accuracies, indicating these theoretical formulas are useful for making resource allocation decisions.
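
The contrast the abstract above draws between ordinary least squares and RR-BLUP can be illustrated with a small Python sketch. RR-BLUP marker effects correspond to ridge regression estimates, which add a penalty to the diagonal of X'X before solving. The code below is a toy under stated assumptions, not the simulation used in the study. The -1/+1 coding for doubled haploid genotypes, the fixed penalty `lam`, the centered phenotypes, and the helper names are all chosen here for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def marker_effects(X, y, lam=0.0):
    """Estimate marker effects from genotypes X and centered phenotypes y.

    lam = 0 gives ordinary least squares; lam > 0 gives ridge estimates,
    which shrink every effect toward zero.
    """
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return solve(XtX, Xty)


# Four doubled haploid lines, two markers, genotypes coded -1/+1 (assumed coding).
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [3.0, 1.0, -1.0, -3.0]

ols_effects = marker_effects(X, y, lam=0.0)
ridge_effects = marker_effects(X, y, lam=4.0)
print("OLS effects:  ", ols_effects)
print("Ridge effects:", ridge_effects)
```

In this balanced toy example the ridge estimates are exactly half the OLS estimates, because the penalty equals the diagonal of X'X. In RR-BLUP the penalty is usually derived from estimated variance components rather than fixed by hand.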

Journal Article
TL;DR: The results encourage the application of genomic prediction in NCLB-resistance breeding programs and the use of combined training sets, which led to significantly greater prediction accuracies for both heterotic groups.
Abstract: Northern corn leaf blight (NCLB), a severe fungal disease causing yield losses worldwide, is most effectively controlled by resistant varieties. Genomic prediction could greatly aid resistance breeding efforts. However, the development of accurate prediction models requires large training sets of genotyped and phenotyped individuals. Maize hybrid breeding is based on distinct heterotic groups that maximize heterosis (the dent and flint groups in Central Europe). The resulting allocation of resources to parallel breeding programs challenges the establishment of sufficiently sized training sets within groups. Therefore, using training sets that combine both heterotic groups might be a way to increase training set sizes and thereby prediction accuracies. The objectives of our study were to assess the prospect of genomic prediction of NCLB resistance in maize and the benefit of a training set that combines two heterotic groups. Our data comprised 100 dent and 97 flint lines, phenotyped for NCLB resistance per se and genotyped with high-density single-nucleotide polymorphism marker data. A genomic BLUP model was used to predict genotypic values. Prediction accuracies reached a maximum of 0.706 (dent) and 0.690 (flint), and there was a strong positive response to increases in training set size. The use of combined training sets led to significantly greater prediction accuracies for both heterotic groups. Our results encourage the application of genomic prediction in NCLB-resistance breeding programs and the use of combined training sets.

Journal ArticleDOI
TL;DR: This paper applied a multiplexed, reduced genome sequencing strategy (restriction site−associated sequencing or RAD-seq) to genotype a large collection of S. cerevisiae strains isolated from a wide range of geographical locations and environmental niches, finding diversity among these strains is principally organized by geography, with European, North American, Asian, and African populations defining the major axes of genetic variation.
Abstract: The budding yeast Saccharomyces cerevisiae is important for human food production and as a model organism for biological research. The genetic diversity contained in the global population of yeast strains represents a valuable resource for a number of fields, including genetics, bioengineering, and studies of evolution and population structure. Here, we apply a multiplexed, reduced genome sequencing strategy (restriction site−associated sequencing or RAD-seq) to genotype a large collection of S. cerevisiae strains isolated from a wide range of geographical locations and environmental niches. The method permits the sequencing of the same 1% of all genomes, producing a multiple sequence alignment of 116,880 bases across 262 strains. We find diversity among these strains is principally organized by geography, with European, North American, Asian, and African/S. E. Asian populations defining the major axes of genetic variation. At a finer scale, small groups of strains from cacao, olives, and sake are defined by unique variants not present in other strains. One population, containing strains from a variety of fermentations, exhibits high levels of heterozygosity and a mixture of alleles from European and Asian populations, indicating an admixed origin for this group. We propose a model of geographic differentiation followed by human-associated admixture, primarily between European and Asian populations and more recently between European and North American populations. The large collection of genotyped yeast strains characterized here will provide a useful resource for the broad community of yeast researchers.

Journal ArticleDOI
TL;DR: The data show that the biological roles of A. nidulans LaeA and T. reesei LAE1 are much less conserved than hitherto thought, and that in T. reesei LAE1 appears predominantly to regulate genes increasing relative fitness in its environment.
Abstract: The putative methyltransferase LaeA is a global regulator that affects the expression of multiple secondary metabolite gene clusters in several fungi, and it can modify heterochromatin structure in Aspergillus nidulans. We have recently shown that the LaeA ortholog of Trichoderma reesei (LAE1), a fungus that is an industrial producer of cellulase and hemicellulase enzymes, regulates the expression of cellulases and polysaccharide hydrolases. To learn more about the function of LAE1 in T. reesei, we assessed the effect of deletion and overexpression of lae1 on genome-wide gene expression. We found that in addition to positively regulating 7 of 17 polyketide or nonribosomal peptide synthases, genes encoding ankyrin-proteins, iron uptake, heterokaryon incompatibility proteins, PTH11-receptors, and oxidases/monooxygenases are major gene categories also regulated by LAE1. Chromatin immunoprecipitation sequencing with antibodies against histone modifications known to be associated with transcriptionally active (H3K4me2 and -me3) or silent (H3K9me3) chromatin detected 4089 genes bearing one or more of these methylation marks, of which 75 exhibited a correlation between either H3K4me2 or H3K4me3 and regulation by LAE1. Transformation of a laeA-null mutant of A. nidulans with the T. reesei lae1 gene did not rescue sterigmatocystin formation and further impaired sexual development. LAE1 did not interact with A. nidulans VeA in yeast two-hybrid assays, whereas it interacted with the T. reesei VeA ortholog, VEL1. LAE1 was shown to be required for the expression of vel1, whereas the orthologs of velB and VosA are unaffected by lae1 deletion. Our data show that the biological roles of A. nidulans LaeA and T. reesei LAE1 are much less conserved than hitherto thought. In T. reesei, LAE1 appears predominantly to regulate genes increasing relative fitness in its environment.

Journal ArticleDOI
TL;DR: A closer scrutiny of tumor suppressors with homopolymeric runs with proximal repeats as the potential drivers of oncogenesis in mismatch repair defective cells is suggested.
Abstract: DNA mismatch repair is a highly conserved DNA repair pathway. In humans, germline mutations in hMSH2 or hMLH1, key components of mismatch repair, have been associated with Lynch syndrome, a leading cause of inherited cancer mortality. Current estimates of the mutation rate and the mutational spectra in mismatch repair defective cells are primarily limited to a small number of individual reporter loci. Here we use the yeast Saccharomyces cerevisiae to generate a genome-wide view of the rates, spectra, and distribution of mutation in the absence of mismatch repair. We performed mutation accumulation assays and next generation sequencing on 19 strains, including 16 msh2 missense variants implicated in Lynch cancer syndrome. The mutation rate for DNA mismatch repair null strains was approximately 1 mutation per genome per generation, 225-fold greater than the wild-type rate. The mutations were distributed randomly throughout the genome, independent of replication timing. The mutation spectra included insertions/deletions at homopolymeric runs (87.7%) and at larger microsatellites (5.9%), as well as transitions (4.5%) and transversions (1.9%). Additionally, repeat regions with proximal repeats are more likely to be mutated. A bias toward deletions at homopolymers and insertions at (AT)n microsatellites suggests a different mechanism for mismatch generation at these sites. Interestingly, 5% of the single base pair substitutions might represent double-slippage events that occurred at the junction of immediately adjacent repeats, resulting in a shift in the repeat boundary. These data suggest a closer scrutiny of tumor suppressors with homopolymeric runs with proximal repeats as the potential drivers of oncogenesis in mismatch repair defective cells.
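
As a quick arithmetic check on the figures quoted in this abstract (a reader's back-of-envelope calculation, not a value reported by the authors), the stated 225-fold increase implies a wild-type rate of roughly 0.004 mutations per genome per generation, and the four mutation classes account for the whole spectrum. The symbols \mu_{null} and \mu_{wt} are labels introduced here for illustration.

```latex
\[
\mu_{\text{wt}} \approx \frac{\mu_{\text{null}}}{225}
  = \frac{1}{225}
  \approx 4.4 \times 10^{-3}\ \text{mutations per genome per generation}
\]
\[
87.7\% + 5.9\% + 4.5\% + 1.9\% = 100.0\%
\]
```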

Journal ArticleDOI
TL;DR: An ultra-high-density genetic map for lettuce, an economically important member of the Compositae, consisting of 12,842 unigenes (13,943 markers) mapped in 3696 genetic bins distributed over nine chromosomal linkage groups is generated.
Abstract: We have generated an ultra-high-density genetic map for lettuce, an economically important member of the Compositae, consisting of 12,842 unigenes (13,943 markers) mapped in 3696 genetic bins distributed over nine chromosomal linkage groups. Genomic DNA was hybridized to a custom Affymetrix oligonucleotide array containing 6.4 million features representing 35,628 unigenes of Lactuca spp. Segregation of single-position polymorphisms was analyzed using 213 F7:8 recombinant inbred lines that had been generated by crossing cultivated Lactuca sativa cv. Salinas and L. serriola acc. US96UC23, the wild progenitor species of L. sativa. The high level of replication of each allele in the recombinant inbred lines was exploited to identify single-position polymorphisms that were assigned to parental haplotypes. Marker information has been made available using GBrowse to facilitate access to the map. This map has been anchored to the previously published integrated map of lettuce providing candidate genes for multiple phenotypes. The high density of markers achieved in this ultradense map allowed syntenic studies between lettuce and Vitis vinifera as well as other plant species.

Journal ArticleDOI
TL;DR: The functionality of PolyCat is demonstrated on allotetraploid cotton, Gossypium hirsutum, and a functional SNP index for efficiently mapping sequence reads to the D-genome sequence of G. raimondii is created.
Abstract: Read mapping is a fundamental part of next-generation genomic research but is complicated by genome duplication in many plants. Categorizing DNA sequence reads into their respective genomes enables current methods to analyze polyploid genomes as if they were diploid. We present PolyCat—a pipeline for mapping and categorizing all types of next-generation sequence data produced from allopolyploid organisms. PolyCat uses GSNAP’s single-nucleotide polymorphism (SNP)-tolerant mapping to minimize the mapping efficiency bias caused by SNPs between genomes. PolyCat then uses SNPs between genomes to categorize reads according to their respective genomes. Bisulfite-treated reads have a significant reduction in nucleotide complexity because nucleotide conversion events are confounded with transition substitutions. PolyCat includes special provisions to properly handle bisulfite-treated data. We demonstrate the functionality of PolyCat on allotetraploid cotton, Gossypium hirsutum, and create a functional SNP index for efficiently mapping sequence reads to the D-genome sequence of G. raimondii. PolyCat is appropriate for all allopolyploids and all types of next-generation genome analysis, including differential expression (RNA sequencing), differential methylation (bisulfite sequencing), differential DNA-protein binding (chromatin immunoprecipitation sequencing), and population diversity.
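
The categorization step described in this abstract can be illustrated with a minimal sketch. This is not PolyCat's actual code or interface: the function name, the dictionary layout, and the labels "A", "D", "conflict", and "unassigned" are all hypothetical, and real reads would first have to be aligned (for example with GSNAP) so that each base has a reference position.

```python
def categorize_read(read_bases, homoeo_snps):
    """Assign a mapped read to a subgenome using known inter-genome SNPs.

    read_bases:  dict mapping reference position -> base observed in the read
    homoeo_snps: dict mapping reference position -> (A-genome allele, D-genome allele)
    Both structures are hypothetical and used only for illustration.
    """
    votes_a = 0
    votes_d = 0
    for pos, base in read_bases.items():
        alleles = homoeo_snps.get(pos)
        if alleles is None:
            continue  # position is not a known SNP between the genomes
        a_allele, d_allele = alleles
        if base == a_allele:
            votes_a += 1
        elif base == d_allele:
            votes_d += 1
    if votes_a and not votes_d:
        return "A"
    if votes_d and not votes_a:
        return "D"
    if votes_a and votes_d:
        return "conflict"  # evidence for both genomes in one read
    return "unassigned"  # read overlaps no informative SNP


snps = {100: ("A", "G"), 150: ("C", "T")}
print(categorize_read({100: "A", 120: "T", 150: "C"}, snps))
```

A read that overlaps no informative position cannot be assigned, which is why the sketch keeps a separate "unassigned" outcome rather than forcing a choice.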

Journal ArticleDOI
TL;DR: The results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models.
Abstract: In crop breeding, interest in predicting the performance of candidate cultivars in the field has increased due to recent advances in molecular breeding technologies. However, the complexity of the wheat genome presents some challenges for applying new technologies in molecular marker identification with next-generation sequencing. We applied genotyping-by-sequencing, a recently developed method to identify single-nucleotide polymorphisms, in the genomes of 384 wheat (Triticum aestivum) genotypes that were field tested under three different water regimes in Mediterranean climatic conditions: rain-fed only, mild water stress, and fully irrigated. We identified 102,324 single-nucleotide polymorphisms in these genotypes, and the phenotypic data were used to train and test genomic selection models intended to predict yield, thousand-kernel weight, number of kernels per spike, and heading date. Phenotypic data showed marked spatial variation. Therefore, different models were tested to correct the trends observed in the field. A mixed-model using moving-means as a covariate was found to best fit the data. When we applied the genomic selection models, the accuracy of predicted traits increased with spatial adjustment. Multiple genomic selection models were tested, and a Gaussian kernel model was determined to give the highest accuracy. The best predictions between environments were obtained when data from different years were used to train the model. Our results confirm that genotyping-by-sequencing is an effective tool to obtain genome-wide information for crops with complex genomes, that these data are efficient for predicting traits, and that correction of spatial variation is a crucial ingredient to increase prediction accuracy in genomic selection models.
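
To make the moving-means idea concrete, here is a minimal sketch of how such a covariate might be computed. It assumes plots laid out along a single row and a symmetric window that excludes the focal plot. The abstract does not state the field layout or window size, so the function name and the `half_window` parameter are illustrative assumptions. The sketch covers only the covariate, not the mixed model it would enter.

```python
def moving_mean_covariate(values, half_window=2):
    """Mean phenotype of neighbouring plots, excluding the plot itself.

    values:      phenotypes in field order along one row (assumed layout)
    half_window: number of neighbours considered on each side (assumed)
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two plots to compute a neighbour mean")
    if half_window < 1:
        raise ValueError("half_window must be at least 1")
    covariate = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        # Plots at the field edge simply have fewer neighbours.
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        covariate.append(sum(neighbours) / len(neighbours))
    return covariate


print(moving_mean_covariate([1, 2, 3, 4, 5], half_window=1))
```

Excluding the focal plot keeps its own genotypic value out of the covariate, so the covariate reflects the local field trend rather than the plot being adjusted.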

Journal ArticleDOI
TL;DR: A large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm enabled the authors to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes.
Abstract: The establishment of a complete genomic sequence of silkworm, the model species of Lepidoptera, laid a foundation for its functional genomics. A more complete annotation of the genome will benefit functional and comparative studies and accelerate extensive industrial applications for this insect. To realize these goals, we embarked upon a large-scale full-length cDNA collection from 21 full-length cDNA libraries derived from 14 tissues of the domesticated silkworm and performed full sequencing by primer walking for 11,104 full-length cDNAs. The large average intron size was 1904 bp, resulting from a high accumulation of transposons. Using gene models predicted by GLEAN and published mRNAs, we identified 16,823 gene loci on the silkworm genome assembly. Orthology analysis of 153 species, including 11 insects, revealed that among three Lepidoptera including Monarch and Heliconius butterflies, the 403 largest silkworm-specific genes were composed mainly of protective immunity, hormone-related, and characteristic structural proteins. Analysis of testis-/ovary-specific genes revealed distinctive features of sexual dimorphism, including depletion of ovary-specific genes on the Z chromosome in contrast to an enrichment of testis-specific genes. More than 40% of genes expressed in specific tissues mapped in tissue-specific chromosomal clusters. The newly obtained FL-cDNA sequences enabled us to annotate the genome of this lepidopteran model insect more accurately, enhancing genomic and functional studies of Lepidoptera and comparative analyses with other insect orders, and yielding new insights into the evolution and organization of lepidopteran-specific genes.

Journal ArticleDOI
TL;DR: A training package for basic Drosophila genetics, designed to ensure that basic knowledge on all key areas is covered while reducing the time invested by trainers, is provided.
Abstract: Drosophila melanogaster is a powerful model organism for biological research. The essential and common instrument of fly research is genetics, the art of applying Mendelian rules in the specific context of Drosophila with its unique classical genetic tools and the breadth of modern genetic tools and strategies brought in by molecular biology, transgenic technologies and the use of recombinases. Training newcomers to fly genetics is a complex and time-consuming task but too important to be left to chance. Surprisingly, suitable training resources for beginners currently are not available. Here we provide a training package for basic Drosophila genetics, designed to ensure that basic knowledge on all key areas is covered while reducing the time invested by trainers. First, a manual introduces fly history, the rationale for mating schemes, fly handling, Mendelian rules in flies, markers and balancers, mating scheme design, and transgenic technologies. Its self-study is followed by a practical training session on gender and marker selection, introducing real flies under the dissecting microscope. Next, through self-study of a PowerPoint presentation, trainees are guided step-by-step through a mating scheme. Finally, to consolidate knowledge, trainees are asked to design similar mating schemes reflecting routine tasks in a fly laboratory. This exercise requires individual feedback but also provides unique opportunities for trainers to spot weaknesses and strengths of each trainee and take remedial action. This training package is being successfully applied at the Manchester fly facility and may serve as a model for further training resources covering other aspects of fly research.

Journal ArticleDOI
TL;DR: This study uses population genomics in a nonmodel fish species, rainbow trout (Oncorhynchus mykiss), to better understand adaptive divergence between migratory and nonmigratory ecotypes and to further the understanding of the genetic basis of migration.
Abstract: Next-generation sequencing and the application of population genomic and association approaches have made it possible to detect selection and unravel the genetic basis to variable phenotypic traits. The use of these two approaches in parallel is especially attractive in nonmodel organisms that lack a sequenced and annotated genome, but only works well when population structure is not confounded with the phenotype of interest. Herein, we use population genomics in a nonmodel fish species, rainbow trout (Oncorhynchus mykiss), to better understand adaptive divergence between migratory and nonmigratory ecotypes and to further our understanding about the genetic basis of migration. Restriction site-associated DNA (RAD) tag sequencing was used to identify single-nucleotide polymorphisms (SNPs) in migrant and resident O. mykiss from two systems, one in Alaska and the other in Oregon. A total of 7920 and 6755 SNPs met filtering criteria in the Alaska and Oregon data sets, respectively. Population genetic tests determined that 1423 SNPs were candidates for selection when loci were compared between resident and migrant samples. Previous linkage mapping studies that used RAD DNA tag SNPs were available to determine the position of 1990 markers. Several significant SNPs are located in genome regions that contain quantitative trait loci for migratory-related traits, reinforcing the importance of these regions in the genetic basis of migration/residency. Annotation of genome regions linked to significant SNPs revealed genes involved in processes known to be important in migration (such as osmoregulatory function). This study adds to our growing knowledge on adaptive divergence between migratory and nonmigratory ecotypes of this species; across studies, this complex trait appears to be controlled by many loci of small effect, with some in common, but many loci not shared between populations studied.

Journal ArticleDOI
TL;DR: A genome-wide association study to detect allele variants associated with increased resistance to Fusarium ear rot in a maize core diversity panel of 267 inbred lines evaluated in two sets of environments identified three marker loci significantly associated with disease resistance in at least one subset of environments.
Abstract: Fusarium ear rot is a common disease of maize that affects food and feed quality globally. Resistance to the disease is highly quantitative, and maize breeders have difficulty incorporating polygenic resistance alleles from unadapted donor sources into elite breeding populations without having a negative impact on agronomic performance. Identification of specific allele variants contributing to improved resistance may be useful to breeders by allowing selection of resistance alleles in coupling phase linkage with favorable agronomic characteristics. We report the results of a genome-wide association study to detect allele variants associated with increased resistance to Fusarium ear rot in a maize core diversity panel of 267 inbred lines evaluated in two sets of environments. We performed association tests with 47,445 single-nucleotide polymorphisms (SNPs) while controlling for background genomic relationships with a mixed model and identified three marker loci significantly associated with disease resistance in at least one subset of environments. Each associated SNP locus had relatively small additive effects on disease resistance (±1.1% on a 0–100% scale), but nevertheless was associated with 3 to 12% of the genotypic variation within or across environment subsets. Two of three identified SNPs colocalized with genes that have been implicated in programmed cell death. An analysis of associated allele frequencies within the major maize subpopulations revealed enrichment for resistance alleles in the tropical/subtropical and popcorn subpopulations compared with other temperate breeding pools.

Journal ArticleDOI
TL;DR: Estimation of genomic relationships could be a powerful tool in forest tree breeding because it accurately accounts both for genetic relationships among individuals and for nuisance effects such as location and replicate effects, and makes more accurate selection possible within full-sib crosses.
Abstract: Replacement of the average numerator relationship matrix derived from the pedigree with the realized genomic relationship matrix based on DNA markers might be an attractive strategy in forest tree breeding for predictions of genetic merit. We used genotypes from 3461 single-nucleotide polymorphism loci to estimate genomic relationships for a population of 165 loblolly pine (Pinus taeda L.) individuals. Phenotypes of the 165 individuals were obtained from clonally replicated field trials and were used to estimate breeding values for growth (stem volume). Two alternative methods, based on allele frequencies or regression, were used to generate the genomic relationship matrices. The accuracies of genomic estimated breeding values based on the genomic relationship matrices and breeding values estimated based on the average numerator relationship matrix were compared. On average, the accuracy of predictions based on genomic relationships ranged between 0.37 and 0.74 depending on the validation method. We did not detect differences in the accuracy of predictions based on genomic relationship matrices estimated by two different methods. Using genomic relationship matrices allowed modeling of Mendelian segregation within full-sib families, an important advantage over a traditional genetic evaluation system based on pedigree. We conclude that estimation of genomic relationships could be a powerful tool in forest tree breeding because it accurately accounts both for genetic relationships among individuals and for nuisance effects such as location and replicate effects, and makes more accurate selection possible within full-sib crosses.
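
For readers unfamiliar with allele-frequency-based genomic relationships, the following is a minimal sketch of one widely used construction, G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centered by twice the allele frequency. The abstract does not specify which allele-frequency estimator the authors used, so this form, the 0/1/2 genotype coding, and the function name are assumptions made for illustration.

```python
def genomic_relationship_matrix(genotypes):
    """Allele-frequency-based genomic relationship matrix (illustrative).

    genotypes: list of individuals, each a list of allele counts (0, 1, or 2)
               per marker; all individuals must share the same markers.
    Returns an n x n nested list.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(ind[j] for ind in genotypes) / (2.0 * n) for j in range(m)]
    # Center each genotype by its expected value, 2 * p_j.
    z = [[ind[j] - 2.0 * freqs[j] for j in range(m)] for ind in genotypes]
    # Scaling puts G on a scale comparable to pedigree relationships.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")
    return [
        [sum(z[a][k] * z[b][k] for k in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


G = genomic_relationship_matrix([[0, 2], [2, 0], [1, 1]])
for row in G:
    print(row)
```

Unlike pedigree-based expectations, off-diagonal entries computed this way differ between pairs of full sibs, which is what allows Mendelian segregation within a family to be modeled.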

Journal ArticleDOI
TL;DR: Analysis of genomic and genetic information available for 12 Drosophila species showed a rapid early diversification of the DEG/ENaC family in Diptera followed by physiological and/or cellular specialization.
Abstract: Degenerin/epithelial sodium channels (DEG/ENaC) represent a large family of animal-specific membrane proteins. Although the physiological functions of most family members are not known, some have been shown to act as nonvoltage gated, amiloride-sensitive sodium channels. The DEG/ENaC family is exceptionally large in genomes of Drosophila species relative to vertebrates and other insects. To elucidate the evolutionary history of the DEG/ENaC family in Drosophila, we took advantage of the genomic and genetic information available for 12 Drosophila species that represent all the major species groups in the Drosophila clade. We have identified 31 family members (termed pickpocket genes) in Drosophila melanogaster, which can be divided into six subfamilies, which are represented in all 12 species. Structure prediction analyses suggested that some subunits evolved unique structural features in the large extracellular domain, possibly supporting mechanosensory functions. This finding is further supported by experimental data that show that both ppk1 and ppk26 are expressed in multidendritic neurons, which can sense mechanical nociceptive stimuli in larvae. We also identified representative genes from five of the six DEG/ENaC subfamilies in a mosquito genome, suggesting that the core DEG/ENaC subfamilies were already present early in the dipteran radiation. Spatial and temporal analyses of expression patterns of the various pickpocket genes indicated that paralogous genes often show very different expression patterns, possibly indicating that gene duplication events have led to new physiological or cellular functions rather than redundancy. In summary, our analyses support a rapid early diversification of the DEG/ENaC family in Diptera followed by physiological and/or cellular specialization. Some members of the family may have diversified to support the physiological functions of a yet unknown class of ligands.