scispace - formally typeset

Showing papers in "Genetics in 2014"


Journal Article
01 Jun 2014-Genetics
TL;DR: Efficient algorithms are developed for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework, together with useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data.
Abstract: Tools for estimating population structure from genetic data are now used in a wide variety of applications in population genetics. However, inferring population structure in large modern data sets imposes severe computational challenges. Here, we develop efficient algorithms for approximate inference of the model underlying the STRUCTURE program using a variational Bayesian framework. Variational methods pose the problem of computing relevant posterior distributions as an optimization problem, allowing us to build on recent advances in optimization theory to develop fast inference tools. In addition, we propose useful heuristic scores to identify the number of populations represented in a data set and a new hierarchical prior to detect weak population structure in the data. We test the variational algorithms on simulated data and illustrate using genotype data from the CEPH-Human Genome Diversity Panel. The variational algorithms are almost two orders of magnitude faster than STRUCTURE and achieve accuracies comparable to those of ADMIXTURE. Furthermore, our results show that the heuristic scores for choosing model complexity provide a reasonable range of values for the number of populations represented in the data, with minimal bias toward detecting structure when it is very weak. Our algorithm, fastSTRUCTURE, is freely available online at http://pritchardlab.stanford.edu/structure.html.

1,266 citations
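The admixture model that STRUCTURE, fastSTRUCTURE and ADMIXTURE share can be made concrete through its likelihood: at a biallelic locus, an individual's allele count is treated as two draws whose allele probability is the individual's ancestry proportions (Q) weighted over the population allele frequencies (P). The Python sketch below only evaluates that log-likelihood for given Q and P. It does not perform the variational inference described above, it assumes all mixed frequencies lie strictly between 0 and 1, and the function name is hypothetical.

```python
import math

def admixture_loglik(genotypes, Q, P):
    """Log-likelihood of 0/1/2 allele counts under the admixture model.

    genotypes[i][l]: copies of the counted allele in individual i at locus l
    Q[i][k]: ancestry proportion of individual i from population k (rows sum to 1)
    P[k][l]: frequency of the counted allele in population k at locus l
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for l, g in enumerate(row):
            # Individual-specific allele frequency: a mixture over source populations
            f = sum(Q[i][k] * P[k][l] for k in range(len(P)))
            # Binomial probability of observing g copies out of 2
            total += math.log(math.comb(2, g)) + g * math.log(f) + (2 - g) * math.log(1 - f)
    return total
```

For example, an unadmixed individual homozygous for an allele at frequency 0.9 in its source population contributes 2 ln 0.9 at that locus.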


Journal Article
01 Oct 2014-Genetics
TL;DR: The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures, making it possible to integrate various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner.
Abstract: Many modern genomic data analyses require implementing regressions where the number of parameters (p, e.g., the number of marker effects) exceeds sample size (n). Implementing these large-p-with-small-n regressions poses several statistical and computational challenges, some of which can be confronted using Bayesian methods. This approach allows integrating various parametric and nonparametric shrinkage and variable selection procedures in a unified and consistent manner. The BGLR R-package implements a large collection of Bayesian regression models, including parametric variable selection and shrinkage methods and semiparametric procedures (Bayesian reproducing kernel Hilbert spaces regressions, RKHS). The software was originally developed for genomic applications; however, the methods implemented are useful for many nongenomic applications as well. The response can be continuous (censored or not) or categorical (either binary or ordinal). The algorithm is based on a Gibbs sampler with scalar updates and the implementation takes advantage of efficient compiled C and Fortran routines. In this article we describe the methods implemented in BGLR, present examples of the use of the package, and discuss practical issues emerging in real-data analysis.

987 citations
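BGLR's own samplers are compiled C and Fortran routines driven from R, and they also sample variance components and support many priors. As a hedged illustration of what "a Gibbs sampler with scalar updates" means, the Python sketch below treats only the simplest case: a Gaussian prior on every marker effect (Bayesian ridge regression), both variances held fixed, and X and y assumed centered. It draws one coefficient at a time from its full conditional while keeping the residual vector current. Function and parameter names are hypothetical; this is not BGLR code.

```python
import random

def gibbs_ridge(X, y, var_e, var_b, n_iter=2000, burn_in=500, seed=1):
    """Scalar-update Gibbs sampler for y = X b + e,
    with b_j ~ N(0, var_b) and e_i ~ N(0, var_e) (variances fixed).
    Returns the posterior means of b after burn-in."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    xx = [sum(v * v for v in col) for col in cols]  # x_j' x_j, precomputed
    lam = var_e / var_b
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    sums = [0.0] * p
    for it in range(n_iter):
        for j in range(p):
            col = cols[j]
            # x_j' (y - X_{-j} b_{-j}): add coefficient j's contribution back to the residual
            rhs = sum(col[i] * (resid[i] + col[i] * b[j]) for i in range(n))
            c = xx[j] + lam
            new_bj = rng.gauss(rhs / c, (var_e / c) ** 0.5)  # full conditional draw
            # Update residuals incrementally instead of recomputing X b
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= col[i] * delta
            b[j] = new_bj
        if it >= burn_in:
            for j in range(p):
                sums[j] += b[j]
    kept = n_iter - burn_in
    return [s / kept for s in sums]
```

In this sketch each scalar update costs O(n) operations because only the residual vector is touched, which is the usual reason for organizing a sampler this way when p exceeds n.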


Journal Article
01 Jan 2014-Genetics
TL;DR: This paper uses homology-directed repair (HDR) of CRISPR/Cas9-induced breaks with double-stranded DNA (dsDNA) donor templates to enable complex genome engineering in Drosophila through the precise incorporation of large DNA sequences, including screenable markers.
Abstract: We and others recently demonstrated that the readily programmable CRISPR/Cas9 system can be used to edit the Drosophila genome. However, most applications to date have relied on aberrant DNA repair to stochastically generate frameshifting indels and adoption has been limited by a lack of tools for efficient identification of targeted events. Here we report optimized tools and techniques for expanded application of the CRISPR/Cas9 system in Drosophila through homology-directed repair (HDR) with double-stranded DNA (dsDNA) donor templates that facilitate complex genome engineering through the precise incorporation of large DNA sequences, including screenable markers. Using these donors, we demonstrate the replacement of a gene with exogenous sequences and the generation of a conditional allele. To optimize efficiency and specificity, we generated transgenic flies that express Cas9 in the germline and directly compared HDR and off-target cleavage rates of different approaches for delivering CRISPR components. We also investigated HDR efficiency in a mutant background previously demonstrated to bias DNA repair toward HDR. Finally, we developed a web-based tool that identifies CRISPR target sites and evaluates their potential for off-target cleavage using empirically rooted rules. Overall, we have found that injection of a dsDNA donor and guide RNA-encoding plasmids into vasa-Cas9 flies yields the highest efficiency HDR and that target sites can be selected to avoid off-target mutations. Efficient and specific CRISPR/Cas9-mediated HDR opens the door to a broad array of complex genome modifications and greatly expands the utility of CRISPR technology for Drosophila research.

838 citations
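The abstract mentions a web-based tool that identifies CRISPR target sites. Assuming the commonly used Streptococcus pyogenes Cas9, whose targets are a 20-nt protospacer immediately followed by an NGG PAM, the first step of such a search can be sketched as a scan of both strands. The authors' empirically rooted off-target scoring is not reproduced here, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_cas9_sites(seq, protospacer_len=20):
    """Return (strand, start, protospacer, pam) for every N20-NGG site.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    of the protospacer+PAM footprint.
    """
    seq = seq.upper()
    L = len(seq)
    w = protospacer_len + 3  # protospacer plus 3-nt PAM
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for j in range(L - w + 1):
            window = s[j:j + w]
            if window[-2:] == "GG":  # PAM is N-G-G directly 3' of the protospacer
                # Map reverse-strand windows back to forward-strand coordinates
                start = j if strand == "+" else L - j - w
                sites.append((strand, start, window[:protospacer_len], window[protospacer_len:]))
    return sites
```

Candidate sites found this way would still need the off-target evaluation described in the abstract before use.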


Journal Article
01 Nov 2014-Genetics
TL;DR: The authors describe a CRISPR/Cas9 coconversion strategy in which screening for a dominant phenotypic, oligonucleotide-templated conversion event at one locus enriches for custom modifications at another, unlinked locus, and show that custom modification events can be carried out recursively, enabling multiple mutant animals to be made.
Abstract: Facilitated by recent advances using CRISPR/Cas9, genome editing technologies now permit custom genetic modifications in a wide variety of organisms. Ideally, modified animals could be both efficiently made and easily identified with minimal initial screening and without introducing exogenous sequence at the locus of interest or marker mutations elsewhere. To this end, we describe a coconversion strategy, using CRISPR/Cas9 in which screening for a dominant phenotypic oligonucleotide-templated conversion event at one locus can be used to enrich for custom modifications at another unlinked locus. After the desired mutation is identified among the F1 progeny heterozygous for the dominant marker mutation, F2 animals that have lost the marker mutation are picked to obtain the desired mutation in an unmarked genetic background. We have developed such a coconversion strategy for Caenorhabditis elegans, using a number of dominant phenotypic markers. Examining the coconversion at a second (unselected) locus of interest in the marked F1 animals, we observed that 14–84% of screened animals showed homologous recombination. By reconstituting the unmarked background through segregation of the dominant marker mutation at each step, we show that custom modification events can be carried out recursively, enabling multiple mutant animals to be made. While our initial choice of a coconversion marker [rol-6(su1006)] was readily applicable in a single round of coconversion, the genetic properties of this locus were not optimal in that CRISPR-mediated deletion mutations at the unselected rol-6 locus can render a fraction of coconverted strains recalcitrant to further rounds of similar mutagenesis. An optimal marker in this sense would provide phenotypic distinctions between the desired mutant/+ class and alternative +/+, mutant/null, null/null, and null/+ genotypes. Reviewing dominant alleles from classical C. elegans genetics, we identified one mutation in dpy-10 and one mutation in sqt-1 that meet these criteria and demonstrate that these too can be used as effective conversion markers. Coconversion was observed using a variety of donor molecules at the second (unselected) locus, including oligonucleotides, PCR products, and plasmids. We note that the coconversion approach described here could be applied in any of the variety of systems where suitable coconversion markers can be identified from previous intensive genetic analyses of gain-of-function alleles.

690 citations
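The marker-removal step relies on ordinary Mendelian segregation in self-fertilizing C. elegans hermaphrodites. Assuming an F1 animal heterozygous for both the dominant marker (labeled D here) and the desired edit (labeled x), with the two loci unlinked as in the abstract, the Python sketch below enumerates F2 genotype frequencies exactly. The allele labels and function name are hypothetical.

```python
from fractions import Fraction
from itertools import product

def selfing_classes(parent):
    """Offspring genotype frequencies from self-fertilizing a diploid.

    `parent` is a tuple of per-locus allele pairs, e.g. (("D", "+"), ("x", "+")).
    Loci are assumed unlinked, so every allele combination in a gamete is equally likely.
    """
    gametes = list(product(*parent))  # one allele per locus
    p_gamete = Fraction(1, len(gametes))
    freqs = {}
    for egg, sperm in product(gametes, repeat=2):
        # Sort alleles within each locus so D/+ and +/D count as one genotype
        child = tuple(tuple(sorted(pair)) for pair in zip(egg, sperm))
        freqs[child] = freqs.get(child, 0) + p_gamete * p_gamete
    return freqs

f2 = selfing_classes((("D", "+"), ("x", "+")))
lost_marker = sum(p for g, p in f2.items() if g[0] == ("+", "+"))
lost_marker_kept_edit = sum(p for g, p in f2.items() if g[0] == ("+", "+") and "x" in g[1])
print(lost_marker, lost_marker_kept_edit, f2[(("+", "+"), ("x", "x"))])  # 1/4 3/16 1/16
```

Under these assumptions, 1/4 of F2 progeny have lost the marker, 3/16 have lost it while still carrying the edit, and 1/16 are unmarked homozygotes for the edit.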


Journal Article
01 Apr 2014-Genetics
TL;DR: The authors present a fast and efficient method for estimating individual ancestry coefficients based on sparse nonnegative matrix factorization algorithms, implement it in the computer program sNMF, and apply it to human and plant data sets.
Abstract: Inference of individual ancestry coefficients, which is important for population genetic and association studies, is commonly performed using computer-intensive likelihood algorithms. With the availability of large population genomic data sets, fast versions of likelihood algorithms have attracted considerable attention. Reducing the computational burden of estimation algorithms remains, however, a major challenge. Here, we present a fast and efficient method for estimating individual ancestry coefficients based on sparse nonnegative matrix factorization algorithms. We implemented our method in the computer program sNMF and applied it to human and plant data sets. The performances of sNMF were then compared to the likelihood algorithm implemented in the computer program ADMIXTURE. Without loss of accuracy, sNMF computed estimates of ancestry coefficients with runtimes ∼10-30 times shorter than those of ADMIXTURE.

552 citations
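The abstract describes sNMF only as based on sparse nonnegative matrix factorization; its sparsity regularization and genotype encoding are not reproduced here. To illustrate the underlying idea, the Python sketch below swaps in classic, unregularized NMF with Lee-Seung multiplicative updates, factoring a nonnegative matrix V into W H, and then rescales each row of W to sum to 1 so it reads like a vector of ancestry coefficients. All names are hypothetical, and this is not the sNMF algorithm.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frobenius_error(V, W, H):
    """Squared Frobenius norm of V - W H."""
    WH = matmul(W, H)
    return sum((v - w) ** 2 for rv, rw in zip(V, WH) for v, w in zip(rv, rw))

def nmf(V, k, n_iter=200, seed=0, eps=1e-12):
    """Factor a nonnegative n x m matrix V ~ W H (W: n x k, H: k x m)
    with Lee-Seung multiplicative updates for the squared Frobenius loss."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H

def ancestry_proportions(W):
    """Rescale each row of W to sum to 1, as ancestry coefficients do."""
    return [[w / sum(row) for w in row] for row in W]
```

Multiplicative updates keep every entry nonnegative automatically, which is why they are a common first illustration of NMF even though faster solvers exist.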


Journal Article
01 Jul 2014-Genetics
TL;DR: The authors propose that the domestication syndrome results predominantly from mild neural crest cell deficits during embryonic development; most of the modified traits can be readily explained as direct consequences of such deficits, while other traits are explicable as indirect consequences.
Abstract: Charles Darwin, while trying to devise a general theory of heredity from the observations of animal and plant breeders, discovered that domesticated mammals possess a distinctive and unusual suite of heritable traits not seen in their wild progenitors. Some of these traits also appear in domesticated birds and fish. The origin of Darwin's "domestication syndrome" has remained a conundrum for more than 140 years. Most explanations focus on particular traits, while neglecting others, or on the possible selective factors involved in domestication rather than the underlying developmental and genetic causes of these traits. Here, we propose that the domestication syndrome results predominantly from mild neural crest cell deficits during embryonic development. Most of the modified traits, both morphological and physiological, can be readily explained as direct consequences of such deficiencies, while other traits are explicable as indirect consequences. We first show how the hypothesis can account for the multiple, apparently unrelated traits of the syndrome and then explore its genetic dimensions and predictions, reviewing the available genetic evidence. The article concludes with a brief discussion of some genetic and developmental questions raised by the idea, along with specific predictions and experimental tests.

478 citations


Journal Article
01 Oct 2014-Genetics
TL;DR: This work proposes a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal, and validates the approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus.
Abstract: Although genome-wide association studies have successfully identified thousands of risk loci for complex traits, only a handful of the biologically causal variants, responsible for association at these loci, have been successfully identified. Current statistical methods for identifying causal variants at risk loci either use the strength of the association signal in an iterative conditioning framework or estimate probabilities for variants to be causal. A main drawback of existing methods is that they rely on the simplifying assumption of a single causal variant at each risk locus, which is typically invalid at many risk loci. In this work, we propose a new statistical framework that allows for the possibility of an arbitrary number of causal variants when estimating the posterior probability of a variant being causal. A direct benefit of our approach is that we predict a set of variants for each locus that under reasonable assumptions will contain all of the true causal variants with a high confidence level (e.g., 95%) even when the locus contains multiple causal variants. We use simulations to show that our approach provides 20–50% improvement in our ability to identify the causal variants compared to the existing methods at loci harboring multiple causal variants. We validate our approach using empirical data from an expression QTL study of CHI3L2 to identify new causal variants that affect gene expression at this locus. CAVIAR is publicly available online at http://genetics.cs.ucla.edu/caviar/.

396 citations
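The abstract describes predicting, for each locus, a set of variants that contains all true causal variants with a chosen confidence level. CAVIAR's likelihood for causal configurations (computed from association statistics and LD) is not reproduced here. Assuming configuration posteriors are already available (the numbers below are hypothetical), the Python sketch shows one way to assemble such a set greedily: the probability that a set contains every causal variant is the total posterior mass of configurations that are subsets of it. The abstract does not say that CAVIAR uses this exact search, and the function names are hypothetical.

```python
from itertools import chain

def set_coverage(candidate, config_posteriors):
    """Posterior probability that `candidate` contains every causal variant:
    the mass of causal configurations that are subsets of it."""
    return sum(p for config, p in config_posteriors.items() if config <= candidate)

def greedy_causal_set(config_posteriors, rho=0.95):
    """Grow a variant set one variant at a time until coverage >= rho."""
    variants = set(chain.from_iterable(config_posteriors))
    chosen = frozenset()
    while set_coverage(chosen, config_posteriors) < rho and variants - chosen:
        # Add the variant whose inclusion raises coverage the most (ties broken by name)
        best = max(sorted(variants - chosen),
                   key=lambda v: set_coverage(chosen | {v}, config_posteriors))
        chosen = chosen | {best}
    return chosen

# Hypothetical posteriors over causal configurations at one locus
posteriors = {
    frozenset({"rs1"}): 0.50,
    frozenset({"rs1", "rs2"}): 0.30,
    frozenset({"rs3"}): 0.15,
    frozenset({"rs4"}): 0.05,
}
print(sorted(greedy_causal_set(posteriors, rho=0.9)))  # ['rs1', 'rs2', 'rs3']
```

Note that rs2 enters the set even though it is never causal on its own, because it appears in a two-variant configuration; this is the multiple-causal-variant case the abstract emphasizes.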


Journal Article
01 Jun 2014-Genetics
TL;DR: The yeast deletion collection, or yeast knockout (YKO) set, represents the first and only complete, systematically constructed deletion collection available for any organism.
Abstract: The yeast deletion collections comprise >21,000 mutant strains that carry precise start-to-stop deletions of ∼6000 open reading frames. This collection includes heterozygous and homozygous diploids, and haploids of both MATa and MATα mating types. The yeast deletion collection, or yeast knockout (YKO) set, represents the first and only complete, systematically constructed deletion collection available for any organism. Conceived during the Saccharomyces cerevisiae sequencing project, work on the project began in 1998 and was completed in 2002. The YKO strains have been used in numerous laboratories in >1000 genome-wide screens. This landmark genome project has inspired development of numerous genome-wide technologies in organisms from yeast to man. Notable spinoff technologies include synthetic genetic array and HIPHOP chemogenomics. In this retrospective, we briefly describe the yeast deletion project and some of its most noteworthy biological contributions and the impact that these collections have had on the yeast research community and on genomics in general.

395 citations


Journal Article
01 Apr 2014-Genetics
TL;DR: The authors measured the plant height, ear height, flowering time, and node counts of plants grown in >64,500 plots across 13 environments and found that maize height was under strong genetic control and had a highly polygenic genetic architecture.
Abstract: Height is one of the most heritable and easily measured traits in maize (Zea mays L.). Given a pedigree or estimates of the genomic identity-by-state among related plants, height is also accurately predictable. However, mapping alleles explaining natural variation in maize height remains a formidable challenge. To address this challenge, we measured the plant height, ear height, flowering time, and node counts of plants grown in >64,500 plots across 13 environments. These plots contained >7300 inbreds representing most publicly available maize inbreds in the United States and families of the maize Nested Association Mapping (NAM) panel. Joint-linkage mapping of quantitative trait loci (QTL), fine mapping in near isogenic lines (NILs), genome-wide association studies (GWAS), and genomic best linear unbiased prediction (GBLUP) were performed. The heritability of maize height was estimated to be >90%. Mapping NAM family-nested QTL revealed that the largest explained 2.1 ± 0.9% of height variation. The effects of two tropical alleles at this QTL were independently validated by fine mapping in NIL families. Several significant associations found by GWAS colocalized with established height loci, including brassinosteroid-deficient dwarf1, dwarf plant1, and semi-dwarf2. GBLUP explained >80% of height variation in the panels and outperformed bootstrap aggregation of family-nested QTL models in evaluations of prediction accuracy. These results revealed maize height was under strong genetic control and had a highly polygenic genetic architecture. They also showed that multiple models of genetic architecture differing in polygenicity and effect sizes can plausibly explain a population's variation in maize height, but they may vary in predictive efficacy.

312 citations
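GBLUP replaces pedigree relationships with a genomic relationship matrix estimated from markers. The abstract does not say how that matrix was built; assuming VanRaden's first method, a common choice, G = M Mᵀ / (2 Σ p(1 − p)), where M is the 0/1/2 allele-count matrix with each marker centered by twice its allele frequency p. The Python sketch assumes at least one polymorphic marker (otherwise the scale is zero), and the function name is hypothetical.

```python
def genomic_relationship(Z):
    """VanRaden-style genomic relationship matrix from 0/1/2 allele counts.

    Z[i][l] is the count of the counted allele for individual i at marker l.
    """
    n, m = len(Z), len(Z[0])
    # Allele frequency of each marker, estimated from the sample itself
    p = [sum(Z[i][l] for i in range(n)) / (2 * n) for l in range(m)]
    # Center each marker by its expected count 2p
    M = [[Z[i][l] - 2 * p[l] for l in range(m)] for i in range(n)]
    scale = 2 * sum(pl * (1 - pl) for pl in p)
    return [[sum(a * b for a, b in zip(M[i], M[j])) / scale for j in range(n)]
            for i in range(n)]
```

The resulting G would then serve as the covariance structure of the random genetic effects in a mixed model; that model fit is not shown here.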


Journal Article
01 Jan 2014-Genetics
TL;DR: The authors develop a systematic method to mutate, tag, or delete any gene in the C. elegans genome without the use of co-integrated markers or long homology arms; the method is scalable for multi-gene editing projects and could be applied to other animals with an accessible germline.
Abstract: Homology-directed repair (HDR) of double-strand DNA breaks is a promising method for genome editing, but is thought to be less efficient than error-prone nonhomologous end joining in most cell types. We have investigated HDR of double-strand breaks induced by CRISPR-associated protein 9 (Cas9) in Caenorhabditis elegans. We find that HDR is very robust in the C. elegans germline. Linear repair templates with short (∼30–60 bases) homology arms support the integration of base and gene-sized edits with high efficiency, bypassing the need for selection. Based on these findings, we developed a systematic method to mutate, tag, or delete any gene in the C. elegans genome without the use of co-integrated markers or long homology arms. We generated 23 unique edits at 11 genes, including premature stops, whole-gene deletions, and protein fusions to antigenic peptides and GFP. Whole-genome sequencing of five edited strains revealed the presence of passenger variants, but no mutations at predicted off-target sites. The method is scalable for multi-gene editing projects and could be applied to other animals with an accessible germline.

304 citations


Journal Article
01 Nov 2014-Genetics
TL;DR: This chapter discusses several models that have been proposed to explain the mechanism of mitotic recombination, the genes and proteins involved in various pathways, the genetic and physical assays used to discover and study these genes, and the roles of many of these proteins inside the cell.
Abstract: Homology-dependent exchange of genetic information between DNA molecules has a profound impact on the maintenance of genome integrity by facilitating error-free DNA repair, replication, and chromosome segregation during cell division as well as programmed cell developmental events. This chapter will focus on homologous mitotic recombination in budding yeast Saccharomyces cerevisiae. However, there is an important link between mitotic and meiotic recombination (covered in the forthcoming chapter by Hunter et al. 2015) and many of the functions are evolutionarily conserved. Here we will discuss several models that have been proposed to explain the mechanism of mitotic recombination, the genes and proteins involved in various pathways, the genetic and physical assays used to discover and study these genes, and the roles of many of these proteins inside the cell.

Journal Article
01 Mar 2014-Genetics
TL;DR: The authors employed novel strategies to determine the loblolly pine reference genome sequence, the largest genome assembled to date, and implemented additional scaffolding methods using independent genome and transcriptome assemblies to improve the contiguity and biological utility of the genome sequence.
Abstract: Conifers are the predominant gymnosperm. The size and complexity of their genomes have presented formidable technical challenges for whole-genome shotgun sequencing and assembly. We employed novel strategies that allowed us to determine the loblolly pine (Pinus taeda) reference genome sequence, the largest genome assembled to date. Most of the sequence data were derived from whole-genome shotgun sequencing of a single megagametophyte, the haploid tissue of a single pine seed. Although that constrained the quantity of available DNA, the resulting haploid sequence data were well-suited for assembly. The haploid sequence was augmented with multiple linking long-fragment mate pair libraries from the parental diploid DNA. For the longest fragments, we used novel fosmid DiTag libraries. Sequences from the linking libraries that did not match the megagametophyte were identified and removed. Assembly of the sequence data was aided by condensing the enormous number of paired-end reads into a much smaller set of longer “super-reads,” rendering subsequent assembly with an overlap-based assembly algorithm computationally feasible. To further improve the contiguity and biological utility of the genome sequence, additional scaffolding methods utilizing independent genome and transcriptome assemblies were implemented. The combination of these strategies resulted in a draft genome sequence of 20.15 billion bases, with an N50 scaffold size of 66.9 kbp.
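The N50 scaffold size quoted in the abstract is the length L such that scaffolds of length L or longer together contain at least half of all assembled bases. A minimal Python sketch of that definition (the function name is hypothetical):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0

print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example the total is 54 bases, and the three longest sequences (10 + 9 + 8 = 27) reach exactly half, so N50 is 8.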

Journal Article
01 Jun 2014-Genetics
TL;DR: This study simulated genetic data for 21 iteroparous animal and plant species to evaluate two untested hypotheses regarding performance of the single-sample method based on linkage disequilibrium (LD), and shows that single-cohort samples should be equally influenced by Nb and Ne.
Abstract: Use of single-sample genetic methods to estimate effective population size has skyrocketed in recent years. Although the underlying models assume discrete generations, they are widely applied to age-structured species. We simulated genetic data for 21 iteroparous animal and plant species to evaluate two untested hypotheses regarding performance of the single-sample method based on linkage disequilibrium (LD): (1) estimates based on single-cohort samples reflect the effective number of breeders in one reproductive cycle (Nb), and (2) mixed-age samples reflect the effective size per generation (Ne). We calculated true Ne and Nb, using the model species' vital rates, and verified these with individual-based simulations. We show that single-cohort samples should be equally influenced by Nb and Ne and confirm this with simulated results: the LD estimate $\hat{N}_b$ was a linear ($r^2 = 0.98$) function of the harmonic mean of Ne and Nb. We provide a quantitative bias correction for raw $\hat{N}_b$ based on the ratio Nb/Ne, which can be estimated from two or three simple life history traits. Bias-adjusted estimates were within 5% of true Nb for all 21 study species and proved robust when challenged with new data. Mixed-age adult samples produced downwardly biased estimates in all species, which we attribute to a two-locus Wahlund effect (mixture LD) caused by combining parents from different cohorts in a single sample. Results from this study will facilitate interpretation of rapidly accumulating genetic estimates in terms of both Ne (which influences long-term evolutionary processes) and Nb (which is more important for understanding eco-evolutionary dynamics and mating systems).
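The abstract summarizes its key simulation result as a linear relationship (with an r² value) between the raw estimate and the harmonic mean of Ne and Nb. The Python sketch below shows those two computations; the paper's bias-correction coefficients are not reproduced, any Ne, Nb or estimate values supplied would be hypothetical, and the function names are hypothetical.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two positive sizes, e.g. Ne and Nb."""
    return 2.0 / (1.0 / a + 1.0 / b)

def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and r^2 for y ~ intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy * sxy / (sxx * syy)
    return slope, intercept, r2

print(harmonic_mean(100, 50))  # about 66.67: pulled toward the smaller size
```

Because the harmonic mean is dominated by the smaller value, a species whose Nb is much smaller than its Ne will have single-cohort estimates closer to Nb, which is why the ratio Nb/Ne drives the bias correction.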

Journal Article
01 Aug 2014-Genetics
TL;DR: These findings reveal a surprisingly high frequency of HR-mediated gene conversion, making it possible to rapidly and precisely edit the C. elegans genome both with and without the use of co-inserted marker genes.
Abstract: Genome editing based on CRISPR (clustered regularly interspaced short palindromic repeats)-associated nuclease (Cas9) has been successfully applied in dozens of diverse plant and animal species, including the nematode Caenorhabditis elegans. The rapid life cycle and easy access to the ovary by micro-injection make C. elegans an ideal organism both for applying CRISPR-Cas9 genome editing technology and for optimizing genome-editing protocols. Here we report efficient and straightforward CRISPR-Cas9 genome-editing methods for C. elegans, including a Co-CRISPR strategy that facilitates detection of genome-editing events. We describe methods for detecting homologous recombination (HR) events, including direct screening methods as well as new selection/counterselection strategies. Our findings reveal a surprisingly high frequency of HR-mediated gene conversion, making it possible to rapidly and precisely edit the C. elegans genome both with and without the use of co-inserted marker genes.

Journal Article
01 Jan 2014-Genetics
TL;DR: Results suggest that zebrafish in nature possess a WZ/ZZ sex-determination mechanism with a major determinant lying near the right telomere of chromosome 4 that was modified during domestication.
Abstract: Sex determination can be robustly genetic, strongly environmental, or genetic subject to environmental perturbation. The genetic basis of sex determination is unknown for zebrafish (Danio rerio), a model for development and human health. We used RAD-tag population genomics to identify sex-linked polymorphisms. After verifying this “RAD-sex” method on medaka (Oryzias latipes), we studied two domesticated zebrafish strains (AB and TU), two natural laboratory strains (WIK and EKW), and two recent isolates from nature (NA and CB). All four natural strains had a single sex-linked region at the right tip of chromosome 4, enabling sex genotyping by PCR. Genotypes for the single nucleotide polymorphism (SNP) with the strongest statistical association to sex suggested that wild zebrafish have WZ/ZZ sex chromosomes. In natural strains, “male genotypes” became males and some “female genotypes” also became males, suggesting that the environment or genetic background can cause female-to-male sex reversal. Surprisingly, TU and AB lacked detectable sex-linked loci. Phylogenomics rooted on D. nigrofasciatus verified that all strains are monophyletic. Because AB and TU branched as a monophyletic clade, we could not rule out shared loss of the wild sex locus in a common ancestor despite their independent domestication. Mitochondrial DNA sequences showed that investigated strains represent only one of the three identified zebrafish haplogroups. Results suggest that zebrafish in nature possess a WZ/ZZ sex-determination mechanism with a major determinant lying near the right telomere of chromosome 4 that was modified during domestication. Strains providing the zebrafish reference genome lack key components of the natural sex-determination system but may have evolved variant sex-determining mechanisms during two decades in laboratory culture.
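The abstract infers WZ/ZZ sex chromosomes from the genotypes of the SNP most strongly associated with sex; the statistical test itself is not described there. As a hypothetical illustration of the expected pattern, under WZ/ZZ a fully sex-linked SNP is heterozygous in females (WZ) and homozygous in males (ZZ), so each SNP can be scored by the fraction of individuals matching that pattern. A real scan would use a formal association test across many RAD loci, and names here are hypothetical.

```python
def wz_consistency(genotypes, sexes):
    """Fraction of individuals matching a WZ/ZZ pattern at one SNP.

    genotypes: list of (allele1, allele2) tuples, one per individual
    sexes: list of "F" or "M" (phenotypic sex)
    """
    consistent = 0
    for (a1, a2), sex in zip(genotypes, sexes):
        het = a1 != a2
        # Females expected heterozygous (WZ), males homozygous (ZZ)
        if (sex == "F" and het) or (sex == "M" and not het):
            consistent += 1
    return consistent / len(sexes)

def rank_snps(snp_genotypes, sexes):
    """Sort (SNP id, score) pairs by fit to the WZ/ZZ pattern, best first."""
    scores = {snp: wz_consistency(g, sexes) for snp, g in snp_genotypes.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Sex reversal of the kind the abstract describes (some "female genotypes" developing as males) would show up as heterozygous males and lower the score below 1 even for a truly sex-linked SNP.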

Journal Article
01 Jan 2014-Genetics
TL;DR: Deep genome sequencing of two parents and 12 of their offspring from a full-sib family of Drosophila melanogaster, recently sampled from a natural population, was used to estimate the mutation rate per site per generation; the estimate suggests an effective population size for the species of ∼1.4 × 10⁶.
Abstract: We employed deep genome sequencing of two parents and 12 of their offspring to estimate the mutation rate per site per generation in a full-sib family of Drosophila melanogaster recently sampled from a natural population. Sites that were homozygous for the same allele in the parents and heterozygous in one or more offspring were categorized as candidate mutations and subjected to detailed analysis. In 1.23 × 10⁹ callable sites from 12 individuals, we confirmed six single nucleotide mutations. We estimated the false negative rate in the experiment by generating synthetic mutations using the empirical distributions of numbers of nonreference bases at heterozygous sites in the offspring. The proportion of synthetic mutations at callable sites that we failed to detect was <1%, implying that the false negative rate was extremely low. Our estimate of the point mutation rate is 2.8 × 10⁻⁹ (95% confidence interval = 1.0 × 10⁻⁹ to 6.1 × 10⁻⁹) per site per generation, which is at the low end of the range of previous estimates, and suggests an effective population size for the species of ∼1.4 × 10⁶. At one site, point mutations were present in two individuals, indicating that there had been a premeiotic mutation cluster, although surprisingly one individual had a G→A transition and the other a G→T transversion, possibly associated with error-prone mismatch repair. We also detected three short deletion mutations and no insertions, giving a deletion mutation rate of 1.2 × 10⁻⁹ (95% confidence interval = 0.7 × 10⁻⁹ to 11 × 10⁻⁹).
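The point-mutation estimate rests on only six confirmed events, so its confidence interval is wide. The abstract does not state how the interval was computed; as a hedged comparison, the Python sketch below computes a standard exact (Garwood) Poisson interval for k observed events by bisection on the Poisson CDF and scales the rate estimate by the resulting bounds. For k = 6 this gives values close to the quoted interval. Function names are hypothetical.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    term, total = math.exp(-lam), 0.0
    for i in range(k + 1):
        if i > 0:
            term *= lam / i  # P(X = i) from P(X = i - 1)
        total += term
    return total

def exact_poisson_ci(k, alpha=0.05):
    """Two-sided exact (Garwood) confidence interval for a Poisson mean given k events."""
    def solve(f, lo, hi):
        # Bisection for the single sign change of f on [lo, hi]
        for _ in range(200):
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2
    search_max = k + 10 * math.sqrt(k + 1) + 10
    # Lower bound solves P(X >= k | lam) = alpha / 2
    lower = 0.0 if k == 0 else solve(
        lambda lam: (1 - poisson_cdf(k - 1, lam)) - alpha / 2, 0.0, search_max)
    # Upper bound solves P(X <= k | lam) = alpha / 2
    upper = solve(lambda lam: poisson_cdf(k, lam) - alpha / 2, 0.0, search_max)
    return lower, upper

lo, hi = exact_poisson_ci(6)
rate = 2.8e-9
print(rate * lo / 6, rate * hi / 6)  # roughly 1.03e-09 and 6.09e-09
```

Scaling the point estimate by the ratio of each Poisson bound to the observed count assumes the number of callable sites is known without error, which the abstract's false-negative analysis supports but does not make exact.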

Journal Article
01 Jun 2014-Genetics
TL;DR: This review describes the major apomixis mechanisms, updates knowledge concerning the loci that control them, and presents candidate genes that may be used as tools for switching the sexual pathway to an apomictic mode of reproduction in crops.
Abstract: Apomixis (asexual seed formation) is the result of a plant gaining the ability to bypass the most fundamental aspects of sexual reproduction: meiosis and fertilization. Without the need for male fertilization, the resulting seed germinates a plant that develops as a maternal clone. This dramatic shift in reproductive process has been documented in many flowering plant species, although no major seed crops have been shown to be capable of apomixis. The ability to generate maternal clones and therefore rapidly fix desirable genotypes in crop species could accelerate agricultural breeding strategies. The potential of apomixis as a next-generation breeding technology has contributed to increasing interest in the mechanisms controlling apomixis. In this review, we discuss the progress made toward understanding the genetic and molecular control of apomixis. Research is currently focused on two fronts. One aims to identify and characterize genes causing apomixis in apomictic species that have been developed as model species. The other aims to engineer or switch the sexual seed formation pathway in non-apomictic species, to one that mimics apomixis. Here we describe the major apomictic mechanisms and update knowledge concerning the loci that control them, in addition to presenting candidate genes that may be used as tools for switching the sexual pathway to an apomictic mode of reproduction in crops.

Journal Article
01 Mar 2014-Genetics
TL;DR: The authors present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence; in-depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.
Abstract: The largest genus in the conifer family Pinaceae is Pinus, with over 100 species. The size and complexity of their genomes (∼20–40 Gb, 2n = 24) have delayed the arrival of a well-annotated reference sequence. In this study, we present the annotation of the first whole-genome shotgun assembly of loblolly pine (Pinus taeda L.), which comprises 20.1 Gb of sequence. The MAKER-P annotation pipeline combined evidence-based alignments and ab initio predictions to generate 50,172 gene models, of which 15,653 are classified as high confidence. Clustering these gene models with 13 other plant species resulted in 20,646 gene families, of which 1554 are predicted to be unique to conifers. Among the conifer gene families, 159 are composed exclusively of loblolly pine members. The gene models for loblolly pine have the highest median and mean intron lengths of 24 fully sequenced plant genomes. Conifer genomes are full of repetitive DNA, with the most significant contributions from long-terminal-repeat retrotransposons. In-depth analysis of the tandem and interspersed repetitive content yielded a combined estimate of 82%.

Journal Article
01 Jun 2014-Genetics
TL;DR: The data demonstrate that targeted, heritable gene editing can be achieved in tilapia, providing a convenient and effective approach for generating loss-of-function mutants, and show the utility of the CRISPR/Cas9 system for genetic engineering in non-model species like tilapia and potentially in many other teleost species.
Abstract: Studies of gene function in non-model animals have been limited by the approaches available for eliminating gene function. The CRISPR/Cas9 (clustered regularly interspaced short palindromic repeats/CRISPR associated) system has recently become a powerful tool for targeted genome editing. Here, we report the use of the CRISPR/Cas9 system to disrupt selected genes, including nanos2, nanos3, dmrt1, and foxl2, with efficiencies as high as 95%. In addition, mutations in dmrt1 and foxl2 induced by CRISPR/Cas9 were efficiently transmitted through the germline to F1. Obvious phenotypes were observed in the G0 generation after mutation of germ cell or somatic cell-specific genes. For example, loss of Nanos2 and Nanos3 in XY and XX fish resulted in germ cell-deficient gonads as demonstrated by GFP labeling and Vasa staining, respectively, while masculinization of somatic cells in both XY and XX gonads was demonstrated by Dmrt1 and Cyp11b2 immunohistochemistry and by up-regulation of serum androgen levels. Our data demonstrate that targeted, heritable gene editing can be achieved in tilapia, providing a convenient and effective approach for generating loss-of-function mutants. Furthermore, our study shows the utility of the CRISPR/Cas9 system for genetic engineering in non-model species like tilapia and potentially in many other teleost species.

Journal ArticleDOI
01 Aug 2014-Genetics
TL;DR: High consistency of linkage phases and large differences in allele frequencies between the Dent and Flint heterotic groups in pericentromeric regions are found and support the hypothesis of differential fixation of alleles due to pseudo-overdominance in these regions.
Abstract: Maize (Zea mays L.) serves as a model plant for heterosis research and is the crop where hybrid breeding was pioneered. We analyzed genomic and phenotypic data of 1254 hybrids of a typical maize hybrid breeding program based on the important Dent × Flint heterotic pattern. Our main objectives were to investigate genome properties of the parental lines (e.g., allele frequencies, linkage disequilibrium, and phases) and examine the prospects of genomic prediction of hybrid performance. We found high consistency of linkage phases and large differences in allele frequencies between the Dent and Flint heterotic groups in pericentromeric regions. These results can be explained by the Hill–Robertson effect and support the hypothesis of differential fixation of alleles due to pseudo-overdominance in these regions. In pericentromeric regions we also found indications for consistent marker–QTL linkage between heterotic groups. With the prediction methods GBLUP and BayesB, the cross-validation prediction accuracy ranged from 0.75 to 0.92 for grain yield and from 0.59 to 0.95 for grain moisture. The prediction accuracy of untested hybrids was highest if both parents were parents of other hybrids in the training set, and lowest if none of them were involved in any training set hybrid. Optimizing the composition of the training set in terms of number of lines and hybrids per line could further increase prediction accuracy. We conclude that genomic prediction facilitates a paradigm shift in hybrid breeding by focusing on the performance of experimental hybrids rather than the performance of parental lines in testcrosses.
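
To make the GBLUP idea concrete, here is a minimal sketch of how a genomic relationship matrix G can be used to predict untested individuals from tested relatives. It is not the study's pipeline: it assumes an individual-level model with only an overall mean as fixed effect and a known variance ratio lam = sigma_e^2 / sigma_g^2, it ignores the Dent × Flint hybrid structure, and the function names (`solve`, `gblup_predict`) are hypothetical. BayesB is not shown.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gblup_predict(G, y, train, test, lam):
    """Predicted genetic values (deviations from the mean) for `test` individuals.

    g_test = G[test, train] (G[train, train] + lam I)^-1 (y_train - mean(y_train))
    Only y[i] for i in `train` is used.
    """
    mu = sum(y[i] for i in train) / len(train)  # fixed effect: overall mean only
    resid = [y[i] - mu for i in train]
    lhs = [[G[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(lhs, resid)
    return [sum(G[t][j] * a for j, a in zip(train, alpha)) for t in test]


# Individual 2 is untested and related (0.5) only to individual 0.
G = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.0], [0.5, 0.0, 1.0]]
print(gblup_predict(G, [12.0, 8.0, 0.0], train=[0, 1], test=[2], lam=1.0))  # [0.5]
```

The untested individual inherits part of its relative's above-average deviation, shrunk by lam; this is why, as the abstract notes, accuracy depends on how closely untested candidates are related to the training set.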

Journal ArticleDOI
01 May 2014-Genetics
TL;DR: A discussion of several major discoveries derived from yeast studies highlights the far-reaching impact that the yeast system has had and will continue to have on the understanding of a variety of cellular processes relevant to all eukaryotes, including humans.
Abstract: The budding yeast Saccharomyces cerevisiae is a powerful model organism for studying fundamental aspects of eukaryotic cell biology. This Primer article presents a brief historical perspective on the emergence of this organism as a premier experimental system over the course of the past century. An overview of the central features of the S. cerevisiae genome, including the nature of its genetic elements and general organization, is also provided. Some of the most common experimental tools and resources available to yeast geneticists are presented in a way designed to engage and challenge undergraduate and graduate students eager to learn more about the experimental amenability of budding yeast. Finally, a discussion of several major discoveries derived from yeast studies highlights the far-reaching impact that the yeast system has had and will continue to have on our understanding of a variety of cellular processes relevant to all eukaryotes, including humans.

Journal ArticleDOI
01 Apr 2014-Genetics
TL;DR: The authors used mixed linear models to identify candidate loci responsible for adaptation to three climatic gradients, annual mean temperature (AMT), precipitation in the wettest month (PWM), and isothermality (ITH), representing the major axes of climate variation across the species' range.
Abstract: Local adaptation and adaptive clines are pervasive in natural plant populations, yet the effects of these types of adaptation on genomic diversity are not well understood. With a data set of 202 accessions of Medicago truncatula genotyped at almost 2 million single nucleotide polymorphisms, we used mixed linear models to identify candidate loci responsible for adaptation to three climatic gradients, annual mean temperature (AMT), precipitation in the wettest month (PWM), and isothermality (ITH), which represent the major axes of climate variation across the species’ range. Loci with the strongest association to these climate gradients tagged genome regions with high sequence similarity to genes with functional roles in thermal tolerance, drought tolerance, or resistance to herbivores or pathogens. Genotypes at these candidate loci also predicted the performance of an independent sample of plant accessions grown in climate-controlled conditions. Compared to a genome-wide sample of randomly drawn reference SNPs, candidates for two climate gradients, AMT and PWM, were significantly enriched for genic regions, and genome segments flanking genic AMT and PWM candidates harbored less nucleotide diversity, elevated differentiation between haplotypes carrying alternate alleles, and an overrepresentation of the most common haplotypes. These patterns of diversity are consistent with a history of soft selective sweeps acting on loci underlying adaptation to climate, but not with a history of long-term balancing selection.

Journal ArticleDOI
01 Jan 2014-Genetics
TL;DR: A flexible strategy involving use of a small number of genes that can be selected for rapid conversion of elite white grain germplasm, with minimal amounts of carotenoids, to orange grain versions containing high levels of provitamin A is outlined.
Abstract: Efforts are underway for development of crops with improved levels of provitamin A carotenoids to help combat dietary vitamin A deficiency. As a global staple crop with considerable variation in kernel carotenoid composition, maize (Zea mays L.) could have a widespread impact. We performed a genome-wide association study (GWAS) of quantified seed carotenoids across a panel of maize inbreds ranging from light yellow to dark orange in grain color to identify some of the key genes controlling maize grain carotenoid composition. Significant associations at the genome-wide level were detected within the coding regions of zep1 and lut1, carotenoid biosynthetic genes not previously shown to impact grain carotenoid composition in association studies, as well as within previously associated lcyE and crtRB1 genes. We leveraged existing biochemical and genomic information to identify 58 a priori candidate genes relevant to the biosynthesis and retention of carotenoids in maize to test in a pathway-level analysis. This revealed dxs2 and lut5, genes not previously associated with kernel carotenoids. In genomic prediction models, use of markers that targeted a small set of quantitative trait loci associated with carotenoid levels in prior linkage studies was as effective as use of genome-wide markers for predicting carotenoid traits. Based on GWAS, pathway-level analysis, and genomic prediction studies, we outline a flexible strategy involving use of a small number of genes that can be selected for rapid conversion of elite white grain germplasm, with minimal amounts of carotenoids, to orange grain versions containing high levels of provitamin A.

Journal ArticleDOI
01 Dec 2014-Genetics
TL;DR: This work evaluates height data from a multifamily population of the tree species Pinus taeda with a systematic series of models accounting for additive, dominance, and first-order epistatic interactions; the results suggest that the additive and nonadditive components of genetic variance are similar in magnitude.
Abstract: The application of quantitative genetics in plant and animal breeding has largely focused on additive models, which may also capture dominance and epistatic effects. Partitioning genetic variance into its additive and nonadditive components using pedigree-based best linear unbiased prediction (P-BLUP) is difficult with most commonly available family structures. However, the availability of dense panels of molecular markers makes possible the use of additive- and dominance-realized genomic relationships for the estimation of variance components and the prediction of genetic values (genomic BLUP, G-BLUP). We evaluated height data from a multifamily population of the tree species Pinus taeda with a systematic series of models accounting for additive, dominance, and first-order epistatic interactions (additive by additive, dominance by dominance, and additive by dominance), using either pedigree- or marker-based information. We show that, compared with the pedigree, use of realized genomic relationships in marker-based models yields a substantially more precise separation of additive and nonadditive components of genetic variance. We conclude that the marker-based relationship matrices in a model including additive and nonadditive effects performed better, improving breeding value prediction. Moreover, our results suggest that, for tree height in this population, the additive and nonadditive components of genetic variance are similar in magnitude. This novel result improves our current understanding of the genetic control and architecture of a quantitative trait and should be considered when developing breeding strategies.
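
A minimal sketch of how additive and dominance realized genomic relationship matrices are commonly built from 0/1/2 marker genotypes follows. The specific constructions are assumptions, not taken from the study: the additive matrix uses VanRaden's (2008) first method, the dominance matrix uses the Vitezica et al. (2013) genotype coding, and first-order epistatic matrices (A×A, D×D, A×D) are formed as element-wise (Hadamard) products, a frequent choice that may differ from the authors' exact implementation. Function names are hypothetical, and monomorphic markers are assumed to have been removed beforehand.

```python
def allele_freqs(M):
    """Frequency p of the counted allele per marker; M is individuals x markers, coded 0/1/2."""
    n = len(M)
    return [sum(row[j] for row in M) / (2 * n) for j in range(len(M[0]))]


def crossprod(X, scale):
    """X X' / scale for a list-of-rows matrix X."""
    n = len(X)
    return [[sum(a * b for a, b in zip(X[i], X[k])) / scale for k in range(n)] for i in range(n)]


def additive_grm(M):
    """Additive relationships: centred allele counts Z = M - 2p, G = Z Z' / (2 sum p q)."""
    p = allele_freqs(M)
    Z = [[x - 2 * pj for x, pj in zip(row, p)] for row in M]
    return crossprod(Z, 2 * sum(pj * (1 - pj) for pj in p))


def dominance_grm(M):
    """Dominance relationships from the coding -2q^2 (2 copies), 2pq (1 copy), -2p^2 (0 copies)."""
    p = allele_freqs(M)

    def code(x, pj):
        qj = 1 - pj
        return {2: -2 * qj * qj, 1: 2 * pj * qj, 0: -2 * pj * pj}[x]

    W = [[code(x, pj) for x, pj in zip(row, p)] for row in M]
    return crossprod(W, sum((2 * pj * (1 - pj)) ** 2 for pj in p))


def hadamard(A, B):
    """Element-wise product, e.g. hadamard(G, G) for an additive-by-additive matrix."""
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


M = [[0, 1, 2], [1, 1, 0], [2, 1, 1], [1, 0, 1]]  # 4 individuals, 3 markers (toy data)
G, D = additive_grm(M), dominance_grm(M)
GxD = hadamard(G, D)  # additive-by-dominance relationships
```

Because each marker is centred on its sample mean, the entries of G sum to zero; this is a quick sanity check when building such matrices.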

Journal ArticleDOI
01 Jan 2014-Genetics
TL;DR: An analysis for quantitative traits utilizing a range of multilocus quantitative genetic models and gene frequency distributions concludes that theoretical predictions and experimental observations of low amounts of epistatic variance in outbred populations are concordant.
Abstract: Although research effort is being expended on determining the importance of epistasis and epistatic variance for complex traits, there is considerable controversy about their importance. Here we undertake an analysis for quantitative traits utilizing a range of multilocus quantitative genetic models and gene frequency distributions, focusing on the potential magnitude of the epistatic variance. All the epistatic terms involving a particular locus appear in its average effect, with the number of two-locus interaction terms increasing in proportion to the square of the number of loci and that of third order as the cube and so on. Hence multilocus epistasis makes substantial contributions to the additive variance and does not, per se, lead to large increases in the nonadditive part of the genotypic variance. Even though this nonadditive proportion can be high where epistasis is antagonistic to direct effects, it decreases as the number of loci increases. As the magnitude of the epistatic variance depends critically on the heterozygosity, for models where frequencies are widely dispersed, such as for selectively neutral mutations, contributions of epistatic variance are always small. Epistasis may be important in understanding the genetic architecture of, for example, function or human disease, but that does not imply that loci exhibiting it will contribute much genetic variance. Overall we conclude that theoretical predictions and experimental observations of low amounts of epistatic variance in outbred populations are concordant. Epistatic variance is therefore unlikely to be, for example, a major source of missing heritability or a major influence on predictions of rates of evolution.
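
The claim that epistasis feeds mostly into additive variance can be checked with a textbook two-locus calculation. The sketch below assumes Hardy–Weinberg and linkage equilibrium, so the total genotypic variance splits exactly into additive (regression on allele counts), dominance (single-locus residual), and epistatic (remainder) parts. It is an illustration, not the authors' multilocus models, and the function names are hypothetical.

```python
def genotype_freqs(p):
    """Hardy-Weinberg frequencies of 0, 1, 2 copies of an allele at frequency p (0 < p < 1)."""
    q = 1.0 - p
    return {0: q * q, 1: 2 * p * q, 2: p * p}


def marginal_effect(value, mu, f_other, locus, x):
    """g(x) = E[G | genotype x at `locus`] - mu, averaging over the other locus."""
    if locus == 0:
        return sum(f_other[y] * value(x, y) for y in range(3)) - mu
    return sum(f_other[y] * value(y, x) for y in range(3)) - mu


def decompose(value, p1, p2):
    """Return (V_G, V_A, V_D, V_I) for genotypic values value(x1, x2), x in {0, 1, 2}."""
    f1, f2 = genotype_freqs(p1), genotype_freqs(p2)
    cells = [(x1, x2, f1[x1] * f2[x2]) for x1 in range(3) for x2 in range(3)]
    mu = sum(w * value(x1, x2) for x1, x2, w in cells)
    v_g = sum(w * (value(x1, x2) - mu) ** 2 for x1, x2, w in cells)
    v_a = v_d = 0.0
    for locus, f_this, f_other in ((0, f1, f2), (1, f2, f1)):
        g = {x: marginal_effect(value, mu, f_other, locus, x) for x in range(3)}
        mean_x = sum(f_this[x] * x for x in range(3))
        var_x = sum(f_this[x] * (x - mean_x) ** 2 for x in range(3))
        cov_xg = sum(f_this[x] * (x - mean_x) * g[x] for x in range(3))
        var_g = sum(f_this[x] * g[x] ** 2 for x in range(3))  # E[g] = 0
        v_a_locus = cov_xg ** 2 / var_x  # variance explained by regression on allele count
        v_a += v_a_locus
        v_d += var_g - v_a_locus  # single-locus (dominance) residual
    return v_g, v_a, v_d, v_g - v_a - v_d  # remainder is epistatic variance


# Purely multiplicative (additive-by-additive) gene action at p = 0.5:
print([round(v, 4) for v in decompose(lambda x1, x2: x1 * x2, 0.5, 0.5)])  # [1.25, 1.0, 0.0, 0.25]
```

Even though the gene action here is entirely interactive, 80% of the genotypic variance (1.0 of 1.25) appears as additive variance, in line with the abstract's argument.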

Journal ArticleDOI
01 Jan 2014-Genetics
TL;DR: An adaptive SPU (aSPU) test is proposed to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios.
Abstract: This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although its application can be much broader to other types of traits, common variants (CVs), and gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can dramatically drop as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with that of some existing tests, demonstrating their potential usefulness.
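
A rough sketch of the SPU/aSPU idea: compute the score vector U, form SPU(γ) = Σ_j U_j^γ for several powers γ (γ = ∞ taken as max_j |U_j|), get permutation p-values for each, and use the minimum p-value as the aSPU statistic, calibrated with the same permutations. Assumptions, not taken from the abstract: no covariates (score from an intercept-only logistic null model), absolute values so odd powers are two-sided, the illustrative γ set {1, 2, 4, 8, ∞}, and a simple "+1" permutation p-value convention. Function names are hypothetical and this is not the authors' software.

```python
import bisect
import random


def score_vector(y, G):
    """Null score U_j = sum_i (y_i - ybar) * G_ij (intercept-only logistic model)."""
    ybar = sum(y) / len(y)
    return [sum((yi - ybar) * row[j] for yi, row in zip(y, G)) for j in range(len(G[0]))]


def spu(U, gamma):
    """|sum_j U_j**gamma|; gamma = 'inf' gives max_j |U_j|."""
    if gamma == "inf":
        return max(abs(u) for u in U)
    return abs(sum(u ** gamma for u in U))


def aspu(y, G, gammas=(1, 2, 4, 8, "inf"), n_perm=1000, seed=0):
    """Return ({gamma: SPU p-value}, aSPU p-value) from n_perm permutations of y."""
    rng = random.Random(seed)
    U_obs = score_vector(y, G)
    observed = [spu(U_obs, g) for g in gammas]
    y_perm, perm = list(y), []
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # permuting the trait breaks any genotype-trait association
        U = score_vector(y_perm, G)
        perm.append([spu(U, g) for g in gammas])
    cols = [sorted(col) for col in zip(*perm)]

    def n_at_least(value, m):
        return n_perm - bisect.bisect_left(cols[m], value)

    # Observed SPU p-values (+1 so none is exactly zero)
    p_spu = [(1 + n_at_least(t, m)) / (n_perm + 1) for m, t in enumerate(observed)]
    # Minimum p-value of each permuted data set (its own statistic counts itself)
    min_perm = [min(n_at_least(row[m], m) / n_perm for m in range(len(gammas))) for row in perm]
    p_aspu = (1 + sum(mp <= min(p_spu) for mp in min_perm)) / (n_perm + 1)
    return dict(zip(gammas, p_spu)), p_aspu
```

For example, with one strongly associated variant among five noise variants, the max-type statistic (γ = ∞) should detect it, and aSPU inherits that power without the user having to pick γ in advance.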

Journal ArticleDOI
01 Mar 2014-Genetics
TL;DR: A two-layer hidden Markov model is presented to detect the structure of haplotypes for unrelated individuals, taking advantage of rich haplotype information to infer local ancestry of admixed individuals and outperforms competing state-of-the-art methods.
Abstract: We present a two-layer hidden Markov model to detect the structure of haplotypes for unrelated individuals. This allows us to model two scales of linkage disequilibrium (one within a group of haplotypes and one between groups), thereby taking advantage of rich haplotype information to infer local ancestry of admixed individuals. Our method outperforms competing state-of-the-art methods, particularly for regions of short ancestral tract lengths. Applying our method to Mexican samples in HapMap3, we found two regions on chromosomes 6 and 8 that show significant departure of local ancestry from the genome-wide average. A software package implementing the methods described in this article is freely available at http://bcm.edu/cnrc/mcmcmc.
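
For readers new to HMM-based local ancestry inference, a much simpler single-layer model is sketched below. It is a swapped-in textbook technique, not the two-layer model described above: it ignores linkage disequilibrium between SNPs, and it assumes one haploid admixed chromosome, known allele frequencies in each source population, an ancestry switch probability of 1 - exp(-T d) between SNPs d Morgans apart (T = generations since admixture), and new ancestries drawn from the admixture proportions. Posterior ancestry probabilities come from the forward-backward algorithm; all names are hypothetical.

```python
import math


def local_ancestry(hap, freqs, dists, props, T):
    """Posterior P(ancestry k at SNP s) for one haploid chromosome.

    hap[s] in {0, 1}; freqs[k][s] = frequency of allele 1 in source k;
    dists[s] = Morgans between SNP s-1 and s (dists[0] unused); props[k] = admixture proportion.
    """
    K, S = len(props), len(hap)

    def emit(k, s):
        f = freqs[k][s]
        return f if hap[s] == 1 else 1.0 - f

    def trans(s, j, k):
        r = 1.0 - math.exp(-T * dists[s])  # chance of at least one breakpoint before SNP s
        return (1.0 - r) * (1.0 if j == k else 0.0) + r * props[k]

    def normalise(v):
        z = sum(v)
        return [x / z for x in v]

    # Forward pass, rescaled at each SNP to avoid underflow
    fwd = [normalise([props[k] * emit(k, 0) for k in range(K)])]
    for s in range(1, S):
        fwd.append(normalise([emit(k, s) * sum(fwd[-1][j] * trans(s, j, k) for j in range(K))
                              for k in range(K)]))
    # Backward pass
    bwd = [[1.0] * K for _ in range(S)]
    for s in range(S - 2, -1, -1):
        bwd[s] = normalise([sum(trans(s + 1, j, k) * emit(k, s + 1) * bwd[s + 1][k]
                                for k in range(K)) for j in range(K)])
    return [normalise([fwd[s][k] * bwd[s][k] for k in range(K)]) for s in range(S)]
```

Per-step rescaling changes each SNP's forward and backward vectors by a constant common to all states, so the final normalisation still yields correct posteriors.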

Journal ArticleDOI
01 Mar 2014-Genetics
TL;DR: Overall, it seems that sex determination in fish does not resort to a single genetic cascade but is rather regulated along a continuum of environmental and heritable factors.
Abstract: Teleost fishes are the most species-rich clade of vertebrates and feature an overwhelming diversity of sex-determining mechanisms, classically grouped into environmental and genetic systems. Here, we review the recent findings in the field of sex determination in fish. In the past few years, several new master regulators of sex determination and other factors involved in sexual development have been discovered in teleosts. These data point toward a greater genetic plasticity in generating the male and female sex than previously appreciated and implicate novel gene pathways in the initial regulation of the sexual fate. Overall, it seems that sex determination in fish does not resort to a single genetic cascade but is rather regulated along a continuum of environmental and heritable factors.

Journal ArticleDOI
01 Jul 2014-Genetics
TL;DR: This work compares the expected distribution of admixture tract lengths under a number of population-genetic models to the distribution predicted by the Wright–Fisher model with recombination and develops a dyadic interval-based stochastic process for generating admixture tracts.
Abstract: The distribution of admixture tract lengths has received considerable attention, in part because it can be used to infer the timing of past gene flow events between populations. It is commonly assumed that these lengths can be modeled as independently and identically distributed (iid) exponential random variables. This assumption is fundamental for many popular methods that analyze admixture using hidden Markov models. We compare the expected distribution of admixture tract lengths under a number of population-genetic models to the distribution predicted by the Wright–Fisher model with recombination. We show that under the latter model, the assumption of iid exponential tract lengths does not hold for recent or for ancient admixture events and that relying on this assumption can lead to false positives when inferring the number of admixture events. To further investigate the tract-length distribution, we develop a dyadic interval-based stochastic process for generating admixture tracts. This representation is useful for analyzing admixture tract-length distributions for populations with recent admixture, a scenario in which existing models perform poorly.
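
The iid-exponential assumption discussed above follows from a simple Markovian picture, sketched here for illustration; it is the approximation the authors test, not their dyadic process and not the Wright–Fisher model. Assumptions: a single admixture pulse T generations ago with proportion m from population A; breakpoints along a chromosome of L Morgans form a Poisson process of rate T; each new segment independently takes ancestry A with probability m. An A tract then ends only at breakpoints that switch to B, a thinned Poisson process of rate T(1 - m), so A tract lengths are exponential with mean 1/(T(1 - m)). The function name is hypothetical.

```python
import random


def simulate_tracts(T, m, L, rng):
    """Return (ancestry, start, end) tracts along one chromosome of L Morgans."""
    anc = "A" if rng.random() < m else "B"
    pos = start = 0.0
    tracts = []
    while True:
        pos += rng.expovariate(T)  # distance to the next ancestry breakpoint
        if pos >= L:
            tracts.append((anc, start, L))  # last tract is truncated by the chromosome end
            return tracts
        new = "A" if rng.random() < m else "B"
        if new != anc:  # a breakpoint ends a tract only if the ancestry changes
            tracts.append((anc, start, pos))
            anc, start = new, pos


rng = random.Random(7)
T, m, L = 10, 0.3, 50.0
tracts = [t for _ in range(40) for t in simulate_tracts(T, m, L, rng)]
a_lengths = [e - s for anc, s, e in tracts if anc == "A" and s > 0.0 and e < L]
print(sum(a_lengths) / len(a_lengths), 1 / (T * (1 - m)))  # simulated vs. predicted mean
```

Only interior tracts are averaged, because tracts touching a chromosome end are truncated. The abstract's point is that real Wright–Fisher genealogies depart from this memoryless model for both recent and ancient admixture.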

Journal ArticleDOI
01 May 2014-Genetics
TL;DR: In this article, a large collection of quantitative trait loci (QTL) was used to address long-standing questions about the anatomical specificity, genetic dominance, and genomic clustering of loci controlling skeletal differences in evolving populations.
Abstract: Understanding the genetic architecture of evolutionary change remains a long-standing goal in biology. In vertebrates, skeletal evolution has contributed greatly to adaptation in body form and function in response to changing ecological variables like diet and predation. Here we use genome-wide linkage mapping in threespine stickleback fish to investigate the genetic architecture of evolved changes in many armor and trophic traits. We identify >100 quantitative trait loci (QTL) controlling the pattern of serially repeating skeletal elements, including gill rakers, teeth, branchial bones, jaws, median fin spines, and vertebrae. We use this large collection of QTL to address long-standing questions about the anatomical specificity, genetic dominance, and genomic clustering of loci controlling skeletal differences in evolving populations. We find that most QTL (76%) that influence serially repeating skeletal elements have anatomically regional effects. In addition, most QTL (71%) have at least partially additive effects, regardless of whether the QTL controls evolved loss or gain of skeletal elements. Finally, many QTL with high LOD scores cluster on chromosomes 4, 20, and 21. These results identify a modular system that can control highly specific aspects of skeletal form. Because of the general additivity and genomic clustering of major QTL, concerted changes in both protective armor and trophic traits may occur when sticklebacks inherit either marine or freshwater alleles at linked or possible "supergene" regions of the stickleback genome. Further study of these regions will help identify the molecular basis of both modular and coordinated changes in the vertebrate skeleton.