scispace - formally typeset

Showing papers in "Genetics in 2016"


Journal Article
01 Feb 2016-Genetics
TL;DR: The results demonstrate that Rapture is a rapid and flexible technology capable of analyzing a very large number of individuals with minimal sequencing and library preparation cost and should improve the efficiency of genetic analysis for many aspects of agricultural, environmental, and biomedical science.
Abstract: Massively parallel sequencing has revolutionized many areas of biology, but sequencing large amounts of DNA in many individuals is cost-prohibitive and unnecessary for many studies. Genomic complexity reduction techniques such as sequence capture and restriction enzyme-based methods enable the analysis of many more individuals per unit cost. Despite their utility, current complexity reduction methods have limitations, especially when large numbers of individuals are analyzed. Here we develop a much improved restriction site-associated DNA (RAD) sequencing protocol and a new method called Rapture (RAD capture). The new RAD protocol improves versatility by separating RAD tag isolation and sequencing library preparation into two distinct steps. This protocol also recovers more unique (nonclonal) RAD fragments, which improves both standard RAD and Rapture analysis. Rapture then uses an in-solution capture of chosen RAD tags to target sequencing reads to desired loci. Rapture combines the benefits of both RAD and sequence capture, i.e., very inexpensive and rapid library preparation for many individuals as well as high specificity in the number and location of genomic loci analyzed. Our results demonstrate that Rapture is a rapid and flexible technology capable of analyzing a very large number of individuals with minimal sequencing and library preparation cost. The methods presented here should improve the efficiency of genetic analysis for many aspects of agricultural, environmental, and biomedical science.

340 citations


Journal Article
01 Jun 2016-Genetics
TL;DR: It is inferred that Neanderthals had at least 40% lower fitness than humans on average; this increased load predicts the reduction in Neanderthal introgression around genes without the need to invoke epistasis, and the Neanderthal admixture fraction could increase over time due to the protective effect of Neanderthal haplotypes against deleterious alleles that arose recently in the human population.
Abstract: Approximately 2-4% of genetic material in human populations outside Africa is derived from Neanderthals who interbred with anatomically modern humans. Recent studies have shown that this Neanderthal DNA is depleted around functional genomic regions; this has been suggested to be a consequence of harmful epistatic interactions between human and Neanderthal alleles. However, using published estimates of Neanderthal inbreeding and the distribution of mutational fitness effects, we infer that Neanderthals had at least 40% lower fitness than humans on average; this increased load predicts the reduction in Neanderthal introgression around genes without the need to invoke epistasis. We also predict a residual Neanderthal mutational load in non-Africans, leading to a fitness reduction of at least 0.5%. This effect of Neanderthal admixture has been left out of previous debate on mutation load differences between Africans and non-Africans. We also show that if many deleterious mutations are recessive, the Neanderthal admixture fraction could increase over time due to the protective effect of Neanderthal haplotypes against deleterious alleles that arose recently in the human population. This might partially explain why so many organisms retain gene flow from other species and appear to derive adaptive benefits from introgression.

330 citations


Journal Article
01 Jul 2016-Genetics
TL;DR: In eukaryotic cells, the assembly of the multi-enzyme replisomes that perform replication is divided into stages that occur at distinct phases of the cell cycle; the authors review the increasingly molecular understanding of these events and their regulation.
Abstract: The accurate and complete replication of genomic DNA is essential for all life. In eukaryotic cells, the assembly of the multi-enzyme replisomes that perform replication is divided into stages that occur at distinct phases of the cell cycle. Replicative DNA helicases are loaded around origins of DNA replication exclusively during G1 phase. The loaded helicases are then activated during S phase and associate with the replicative DNA polymerases and other accessory proteins. The function of the resulting replisomes is monitored by checkpoint proteins that protect arrested replisomes and inhibit new initiation when replication is inhibited. The replisome also coordinates nucleosome disassembly, assembly, and the establishment of sister chromatid cohesion. Finally, when two replisomes converge they are disassembled. Studies in Saccharomyces cerevisiae have led the way in our understanding of these processes. Here, we review our increasingly molecular understanding of these events and their regulation.

305 citations


Journal Article
01 Mar 2016-Genetics
TL;DR: An overview of CRISPR-based strategies for genome editing in C. elegans is provided, including a discussion of which strategies are best suited to producing different kinds of targeted genome modifications.
Abstract: The advent of genome editing techniques based on the clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9 system has revolutionized research in the biological sciences. CRISPR is quickly becoming an indispensable experimental tool for researchers using genetic model organisms, including the nematode Caenorhabditis elegans. Here, we provide an overview of CRISPR-based strategies for genome editing in C. elegans. We focus on practical considerations for successful genome editing, including a discussion of which strategies are best suited to producing different kinds of targeted genome modifications.

249 citations
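Most editing strategies start by choosing a Cas9 target site. As a hedged illustration, assuming the widely used Streptococcus pyogenes Cas9 with its 5'-NGG-3' PAM and a 20-nt protospacer (the abstract does not specify either), candidate sites on both strands can be listed as below. `find_guides` is a hypothetical helper, not part of any published toolkit.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_guides(seq: str, spacer_len: int = 20):
    """Return (strand, start, protospacer, PAM) for every NGG PAM.

    `start` is the 0-based protospacer position on the strand searched
    (reverse-strand positions are in reverse-complement coordinates).
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        # Zero-width lookahead so overlapping PAMs (e.g. "GGG") are all found.
        for m in re.finditer(r"(?=([ACGT]GG))", s):
            pam_start = m.start()
            if pam_start >= spacer_len:
                hits.append((strand, pam_start - spacer_len,
                             s[pam_start - spacer_len:pam_start], m.group(1)))
    return hits

print(find_guides("A" * 20 + "TGG"))
```

Real guide design also screens candidates for off-target matches elsewhere in the genome, which this sketch does not attempt.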


Journal Article
01 Apr 2016-Genetics
TL;DR: It is shown that the admixture tests can be interpreted as testing general properties of phylogenies, allowing some ideas and applications to be extended to arbitrary phylogenetic trees, and that population substructure complicates inference.
Abstract: Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is F-statistics, which measure shared genetic drift between sets of two, three, and four populations and can be used to test simple and complex hypotheses about admixture between populations. This article provides context from phylogenetic and population genetic theory. I review how F-statistics can be interpreted as branch lengths or paths and derive new interpretations, using coalescent theory. I further show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing extension of some ideas and applications to arbitrary phylogenetic trees. The new results are used to investigate the behavior of the statistics under different models of population structure and show how population substructure complicates inference. The results lead to simplified estimators in many cases, and I recommend replacing F3 with the average number of pairwise differences for estimating population divergence.

179 citations
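To make the branch-length reading concrete, here is a minimal sketch of the basic (not bias-corrected) f-statistics computed from population allele frequencies at a set of SNPs, assuming the frequencies are known exactly rather than estimated from finite samples. It checks the standard identities that express f3 and f4 as sums and differences of f2 "branch lengths"; the function names are illustrative.

```python
import numpy as np

def f2(a, b):
    """Mean squared allele-frequency difference between two populations."""
    return np.mean((a - b) ** 2)

def f3(c, a, b):
    """f3(C; A, B); a significantly negative value suggests C is admixed."""
    return np.mean((c - a) * (c - b))

def f4(a, b, c, d):
    """f4(A, B; C, D): shared drift on the A-B and C-D paths."""
    return np.mean((a - b) * (c - d))

# Random allele frequencies for four populations at 1,000 SNPs.
rng = np.random.default_rng(0)
a, b, c, d = rng.random((4, 1000))

# f3 and f4 decompose into f2 terms, i.e. path lengths on a tree.
print(np.isclose(f3(c, a, b), (f2(c, a) + f2(c, b) - f2(a, b)) / 2))
print(np.isclose(f4(a, b, c, d),
                 (f2(a, d) + f2(b, c) - f2(a, c) - f2(b, d)) / 2))
```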


Journal Article
01 Dec 2016-Genetics
TL;DR: New markers and gene annotation are described that are both tightly linked to Cr1 in a mapping population, and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species’ range, creating a solid foundation for future mapping.
Abstract: Until very recently, complete characterization of the megagenomes of conifers has remained elusive. The diploid genome of sugar pine (Pinus lambertiana Dougl.) is highly repetitive and 31 billion bp in size. It is the largest genome sequenced and assembled to date, and the first from the subgenus Strobus, or white pines, a group that is notable for having the largest genomes among the pines. The genome represents a unique opportunity to investigate genome “obesity” in conifers and white pines. Comparative analysis of P. lambertiana and P. taeda L. reveals new insights into the conservation, age, and diversity of the highly abundant transposable elements, the primary factor determining genome size. As for most North American white pines, the principal pathogen of P. lambertiana is white pine blister rust (Cronartium ribicola J.C. Fischer ex Raben.). Identification of candidate genes for resistance to this pathogen is of great ecological importance. The genome sequence afforded us the opportunity to make substantial progress on locating the major dominant gene for simple resistance hypersensitive response, Cr1. We describe new markers and gene annotation that are both tightly linked to Cr1 in a mapping population, and associated with Cr1 in unrelated sugar pine individuals sampled throughout the species’ range, creating a solid foundation for future mapping. The genomic variation and annotated candidate genes characterized in our study of the Cr1 region are resources for future marker-assisted breeding efforts as well as for investigations of fundamental mechanisms of invasive disease and evolutionary response.

156 citations


Journal Article
01 Apr 2016-Genetics
TL;DR: SapTrap vectors introduce the possibility for high-throughput generation of CRISPR/Cas9 genome modifications in animals by direct insertion of 3- to 4-kb tags at six different loci in 10–37% of injected animals.
Abstract: In principle, clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 allows genetic tags to be inserted at any locus. However, throughput is limited by the laborious construction of repair templates and guide RNA constructs and by the identification of modified strains. We have developed a reagent toolkit and plasmid assembly pipeline, called "SapTrap," that streamlines the production of targeting vectors for tag insertion, as well as the selection of modified Caenorhabditis elegans strains. SapTrap is a high-efficiency modular plasmid assembly pipeline that produces single plasmid targeting vectors, each of which encodes both a guide RNA transcript and a repair template for a particular tagging event. The plasmid is generated in a single tube by cutting modular components with the restriction enzyme SapI, which are then "trapped" in a fixed order by ligation to generate the targeting vector. A library of donor plasmids supplies a variety of protein tags, a selectable marker, and regulatory sequences that allow cell-specific tagging at either the N or the C termini. All site-specific sequences, such as guide RNA targeting sequences and homology arms, are supplied as annealed synthetic oligonucleotides, eliminating the need for PCR or molecular cloning during plasmid assembly. Each tag includes an embedded Cbr-unc-119 selectable marker that is positioned to allow concurrent expression of both the tag and the marker. We demonstrate that SapTrap targeting vectors direct insertion of 3- to 4-kb tags at six different loci in 10-37% of injected animals. Thus SapTrap vectors introduce the possibility for high-throughput generation of CRISPR/Cas9 genome modifications.

155 citations
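The "trapping" step relies on SapI, a Type IIS enzyme that cuts outside its GCTCTTC recognition site and leaves 3-nt overhangs, so each module's ends can be designed to pair with only one neighbor. The sketch below models only that ordering logic: modules carry hypothetical left and right overhangs, and assembly walks from the vector's opening overhang to whichever module's left end matches. The overhang sequences and module names are invented for illustration and are not the actual SapTrap connectors.

```python
from typing import List, NamedTuple

class Module(NamedTuple):
    name: str
    left: str   # 3-nt overhang at the module's 5' junction (hypothetical)
    right: str  # 3-nt overhang at the module's 3' junction (hypothetical)

def trap_order(modules: List[Module], start: str, end: str) -> List[str]:
    """Chain modules by matching overhangs from the vector's `start`
    overhang to its `end` overhang; raise ValueError if the chain breaks."""
    by_left = {m.left: m for m in modules}
    if len(by_left) != len(modules):
        raise ValueError("two modules share a left overhang")
    order, current = [], start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no module accepts overhang {current}")
        module = by_left.pop(current)   # each module can ligate only once
        order.append(module.name)
        current = module.right
    if by_left:
        raise ValueError("unused modules left over")
    return order

# Modules supplied in arbitrary order, as in a single-tube reaction.
parts = [Module("arm_3p", "GTA", "CCT"), Module("guide", "TTG", "ACA"),
         Module("arm_5p", "ACA", "AAC"), Module("tag_marker", "AAC", "GTA")]
print(trap_order(parts, start="TTG", end="CCT"))
# ['guide', 'arm_5p', 'tag_marker', 'arm_3p']
```

Because every junction has a unique overhang pair, the order is fixed by design rather than by the order in which parts are added.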


Journal Article
01 Nov 2016-Genetics
TL;DR: The accuracy of the resulting method for evolutionary prediction by simulation is demonstrated, and known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of these expressions.
Abstract: Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of these quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population.

148 citations
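For a Poisson GLMM with a log link, the latent-to-observed integrals have closed forms. With latent intercept mu, latent additive variance V_A, and total latent variance V_P (all random effects), the observed mean is exp(mu + V_P/2), the observed additive variance is that mean squared times V_A, and observed-scale heritability reduces to V_A / (exp(V_P) - 1 + 1/mean). The sketch below computes this and cross-checks the mean with Gauss-Hermite quadrature, the kind of numerical integration other link functions require. It is an independent illustration with hypothetical function names, not the QGglmm API.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def poisson_log_obs_h2(mu, va, vp):
    """Observed-scale mean and heritability for a Poisson GLMM, log link."""
    mean = np.exp(mu + vp / 2.0)
    psi = mean                     # E[d exp(l)/dl] equals the mean for the log link
    va_obs = psi ** 2 * va         # additive variance on the observed scale
    vp_obs = mean ** 2 * (np.exp(vp) - 1.0) + mean  # latent spread + Poisson noise
    return mean, va_obs / vp_obs

def latent_expectation(f, mu, vp, n_nodes=40):
    """E[f(l)] for l ~ N(mu, vp) by probabilists' Gauss-Hermite quadrature."""
    x, w = hermegauss(n_nodes)     # weight function exp(-x^2 / 2)
    return np.sum(w * f(mu + np.sqrt(vp) * x)) / np.sqrt(2.0 * np.pi)

mean, h2_obs = poisson_log_obs_h2(mu=1.0, va=0.3, vp=0.5)
print(mean, latent_expectation(np.exp, 1.0, 0.5))  # closed form vs. quadrature
print(h2_obs, 0.3 / 0.5)                          # observed vs. latent h2
```

Note that the observed-scale heritability depends on the intercept mu through the mean, which is one reason fixed effects matter for these parameters.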


Journal Article
01 May 2016-Genetics
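TL;DR: Analysis of pedigree and high-density SNP information in a wild population of Soay sheep identified two genomic regions underlying heritable variation in recombination rate; comparison with 40 other sheep breeds showed that the associated haplotypes are both old and globally distributed, and because both regions have been implicated in rate variation in mice, cattle, and humans, the results suggest a common genetic architecture of recombination rate variation in mammals.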
Abstract: Meiotic recombination breaks down linkage disequilibrium (LD) and forms new haplotypes, meaning that it is an important driver of diversity in eukaryotic genomes. Understanding the causes of variation in recombination rate is important in interpreting and predicting evolutionary phenomena and in understanding the potential of a population to respond to selection. However, despite attention in model systems, there remains little data on how recombination rate varies at the individual level in natural populations. Here we used extensive pedigree and high-density SNP information in a wild population of Soay sheep (Ovis aries) to investigate the genetic architecture of individual autosomal recombination rates. Individual rates were high relative to other mammal systems and were higher in males than in females (autosomal map lengths of 3748 and 2860 cM, respectively). The heritability of autosomal recombination rate was low but significant in both sexes (h2 = 0.16 and 0.12 in females and males, respectively). In females, 46.7% of the heritable variation was explained by a subtelomeric region on chromosome 6; a genome-wide association study showed the strongest associations at locus RNF212, with further associations observed at a nearby ∼374-kb region of complete LD containing three additional candidate loci, CPLX1, GAK, and PCGF3. A second region on chromosome 7 containing REC8 and RNF212B explained 26.2% of the heritable variation in recombination rate in both sexes. Comparative analyses with 40 other sheep breeds showed that haplotypes associated with recombination rates are both old and globally distributed. Both regions have been implicated in rate variation in mice, cattle, and humans, suggesting a common genetic architecture of recombination rate variation in mammals.

146 citations
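Two of the reported numbers can be unpacked with simple arithmetic. A genetic map length in Morgans equals the expected number of crossovers per gamete, so the autosomal map lengths imply roughly 37.5 (male) and 28.6 (female) crossovers per gamete. Multiplying the share of heritable variation explained by a region by h2 gives the share of total phenotypic variance it explains. The sketch only restates values from the abstract; the variable names are illustrative.

```python
# Values reported in the abstract
map_length_cM = {"male": 3748, "female": 2860}
h2 = {"female": 0.16, "male": 0.12}
share_chr6_female = 0.467  # share of female heritable variation, chr 6 region

# 1 Morgan = 100 cM = one expected crossover per gamete
crossovers_per_gamete = {sex: cm / 100.0 for sex, cm in map_length_cM.items()}

# Fraction of total phenotypic variance attributable to the chr 6 region
pve_chr6_female = share_chr6_female * h2["female"]

print(crossovers_per_gamete)       # {'male': 37.48, 'female': 28.6}
print(round(pve_chr6_female, 3))   # 0.075
```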


Journal Article
01 Nov 2016-Genetics
TL;DR: It is argued that natural populations may experience the amount of recent positive selection required to skew inferences, and results suggest that demographic studies conducted in many species to date may have exaggerated the extent and frequency of population size changes.
Abstract: The availability of large-scale population genomic sequence data has resulted in an explosion in efforts to infer the demographic histories of natural populations across a broad range of organisms. As demographic events alter coalescent genealogies, they leave detectable signatures in patterns of genetic variation within and between populations. Accordingly, a variety of approaches have been designed to leverage population genetic data to uncover the footprints of demographic change in the genome. The vast majority of these methods make the simplifying assumption that the measures of genetic variation used as their input are unaffected by natural selection. However, natural selection can dramatically skew patterns of variation not only at selected sites, but at linked, neutral loci as well. Here we assess the impact of recent positive selection on demographic inference by characterizing the performance of three popular methods through extensive simulation of data sets with varying numbers of linked selective sweeps. In particular, we examined three different demographic models relevant to a number of species, finding that positive selection can bias parameter estimates of each of these models—often severely. We find that selection can lead to incorrect inferences of population size changes when none have occurred. Moreover, we show that linked selection can lead to incorrect demographic model selection, when multiple demographic scenarios are compared. We argue that natural populations may experience the amount of recent positive selection required to skew inferences. These results suggest that demographic studies conducted in many species to date may have exaggerated the extent and frequency of population size changes.

142 citations
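One way to see why linked selection can mimic demographic change: a selective sweep, like a population expansion, leaves an excess of rare variants, which appears as a negative Tajima's D. The sketch below computes the unfolded site frequency spectrum (SFS) from a haplotype matrix and Tajima's D from that spectrum. It is a generic summary-statistic illustration, not one of the three inference methods evaluated in the study; the 0/1 matrix layout (rows = haplotypes, columns = sites, 1 = derived allele) is an assumption.

```python
import numpy as np

def unfolded_sfs(haps: np.ndarray) -> np.ndarray:
    """sfs[i-1] = number of sites whose derived allele appears i times,
    for i = 1..n-1, where n is the number of haplotypes (rows)."""
    n = haps.shape[0]
    counts = haps.sum(axis=0)
    return np.bincount(counts, minlength=n + 1)[1:n]

def tajimas_d(sfs: np.ndarray) -> float:
    """Tajima's D from an unfolded SFS of n - 1 entries."""
    n = len(sfs) + 1
    i = np.arange(1, n)
    S = sfs.sum()                                         # segregating sites
    pi = np.sum(i * (n - i) * sfs) / (n * (n - 1) / 2)    # mean pairwise diffs
    a1, a2 = np.sum(1.0 / i), np.sum(1.0 / i ** 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return float((pi - S / a1) / np.sqrt(e1 * S + e2 * S * (S - 1)))

# 10 haplotypes, 20 sites, every derived allele a singleton (sweep-like)
haps = np.zeros((10, 20), dtype=int)
for j in range(20):
    haps[j % 10, j] = 1
print(unfolded_sfs(haps), tajimas_d(unfolded_sfs(haps)))
```

Under a sweep or expansion D tends to be negative; under a bottleneck or balancing selection it tends to be positive, which is why a method that ignores selection can misattribute the signal to population size change.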


Journal Article
01 Jan 2016-Genetics
TL;DR: It is shown that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing, and finds that individuals have many more structural variants than those published, including some with the potential of disrupting gene function or regulation.
Abstract: Comprehensive whole-genome structural variation detection is challenging with current approaches. Because diploid cells are used as the DNA source and the genome contains numerous repetitive elements, short-read DNA sequencing cannot detect structural variation efficiently. In this report, we show that genome mapping with long, fluorescently labeled DNA molecules imaged on nanochannel arrays can be used for whole-genome structural variation detection without sequencing. While whole-genome haplotyping is not achieved, local phasing (across >150-kb regions) is routine, as molecules from the parental chromosomes are examined separately. In one experiment, we generated genome maps from a trio from the 1000 Genomes Project, compared the maps against the map derived from the reference human genome, and identified structural variations that are >5 kb in size. We find that these individuals have many more structural variants than those published, including some with the potential to disrupt gene function or regulation.
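The map comparison can be pictured as follows: label positions along a map define a series of intervals, and an insertion or deletion shows up as an aligned interval that is longer or shorter than the matching reference interval. The sketch assumes sample and reference labels are already aligned one-to-one (real genome-map alignment must also handle missing or extra labels and sizing error), uses the 5-kb threshold quoted above, and treats GCTCTTC only as a hypothetical labeling motif. The helper names are illustrative, not the study's software.

```python
import re
from typing import List, Tuple

def label_positions(seq: str, motif: str = "GCTCTTC") -> List[int]:
    """In silico labels: start of every motif occurrence on either strand."""
    rc = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]
    pattern = f"(?=({motif}|{rc}))"
    return [m.start() for m in re.finditer(pattern, seq.upper())]

def call_indels(ref_labels: List[int], map_labels: List[int],
                min_size: int = 5000) -> List[Tuple[int, int, int]]:
    """Return (ref_start, ref_end, size_diff) for aligned label intervals
    whose lengths differ by at least `min_size` bp."""
    calls = []
    ref_iv = zip(ref_labels, ref_labels[1:])
    map_iv = zip(map_labels, map_labels[1:])
    for (r0, r1), (q0, q1) in zip(ref_iv, map_iv):
        diff = (q1 - q0) - (r1 - r0)   # > 0: insertion, < 0: deletion in sample
        if abs(diff) >= min_size:
            calls.append((r0, r1, diff))
    return calls

ref = [0, 10_000, 30_000, 50_000]
sample = [0, 10_000, 38_000, 58_000]   # second interval is 8 kb longer
print(call_indels(ref, sample))         # [(10000, 30000, 8000)]
```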

Journal Article
01 Dec 2016-Genetics
TL;DR: The challenges and strategies of species tree inference for distantly related species when the molecular clock is violated are discussed, and the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods is highlighted.
Abstract: The multispecies coalescent (MSC) model has emerged as a powerful framework for inferring species phylogenies while accounting for ancestral polymorphism and gene tree-species tree conflict. A number of methods have been developed in the past few years to estimate the species tree under the MSC. The full likelihood methods (including maximum likelihood and Bayesian inference) average over the unknown gene trees and accommodate their uncertainties properly but involve intensive computation. The approximate or summary coalescent methods are computationally fast and are applicable to genomic datasets with thousands of loci, but do not make efficient use of information in the multilocus data. Most of them take the two-step approach of reconstructing the gene trees for multiple loci by phylogenetic methods and then treating the estimated gene trees as observed data, without accounting for their uncertainties appropriately. In this article we review the statistical nature of the species tree estimation problem under the MSC, and explore the conceptual issues and challenges of species tree estimation by focusing mainly on simple cases of three or four closely related species. We use mathematical analysis and computer simulation to demonstrate that large differences in statistical performance may exist between the two classes of methods. We illustrate that several counterintuitive behaviors may occur with the summary methods, but these are due to inefficient use of information in the data by summary methods and vanish when the data are analyzed using full-likelihood methods. These include (i) unidentifiability of parameters in the model, (ii) inconsistency in the so-called anomaly zone, (iii) singularity on the likelihood surface, and (iv) deterioration of performance upon addition of more data.
We discuss the challenges and strategies of species tree inference for distantly related species when the molecular clock is violated, and highlight the need for improving the computational efficiency and model realism of the likelihood methods as well as the statistical efficiency of the summary methods.
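For the simplest case, three species ((A,B),C) with an internal branch of length T in coalescent units (generations divided by 2N for diploids), a gene tree matches the species tree with probability 1 - (2/3)exp(-T), and each discordant topology has probability (1/3)exp(-T). A summary method that uses only topology counts can invert this to estimate T, as sketched below under the assumption that gene trees are known without error; this is the assumption summary methods make when they treat estimated gene trees as observed data. Function names are illustrative.

```python
import numpy as np

def p_match(T: float) -> float:
    """P(gene tree topology matches ((A,B),C)); T in coalescent units."""
    return 1.0 - (2.0 / 3.0) * np.exp(-T)

def simulate_topologies(T: float, n_loci: int, rng) -> np.ndarray:
    """0 = concordant topology; 1 and 2 = the two discordant topologies."""
    p = p_match(T)
    q = (1.0 - p) / 2.0
    return rng.choice(3, size=n_loci, p=[p, q, q])

def estimate_T(topologies: np.ndarray) -> float:
    """Moment estimator of T from the fraction of concordant gene trees."""
    f = np.mean(topologies == 0)
    if f >= 1.0:
        return float("inf")        # no discordance observed
    if f <= 1.0 / 3.0:
        return 0.0                 # boundary: star-like species tree
    return float(-np.log(1.5 * (1.0 - f)))

rng = np.random.default_rng(0)
topo = simulate_topologies(T=1.0, n_loci=200_000, rng=rng)
print(p_match(0.0), estimate_T(topo))
```

With error-free gene trees this estimator behaves well; the counterintuitive behaviors discussed above arise when estimated gene trees are used in place of true ones.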

Journal Article
01 Aug 2016-Genetics
TL;DR: This comprehensive review distills a wide array of biochemical, molecular genetic, cell biological, and genomics studies down to the “nuts and bolts” of silent chromatin and the processes that yield transcriptional silencing.
Abstract: Transcriptional silencing in Saccharomyces cerevisiae occurs at several genomic sites including the silent mating-type loci, telomeres, and the ribosomal DNA (rDNA) tandem array. Epigenetic silencing at each of these domains is characterized by the absence of nearly all histone modifications, including most prominently the lack of histone H4 lysine 16 acetylation. In all cases, silencing requires Sir2, a highly conserved NAD(+)-dependent histone deacetylase. At locations other than the rDNA, silencing also requires the additional Sir proteins Sir1, Sir3, and Sir4, which together form a repressive heterochromatin-like structure termed silent chromatin. The mechanisms of silent chromatin establishment, maintenance, and inheritance have been investigated extensively over the last 25 years, and these studies have revealed numerous paradigms for transcriptional repression, chromatin organization, and epigenetic gene regulation. Studies of Sir2-dependent silencing at the rDNA have also contributed to understanding the mechanisms for maintaining the stability of repetitive DNA and regulating replicative cell aging. The goal of this comprehensive review is to distill a wide array of biochemical, molecular genetic, cell biological, and genomics studies down to the "nuts and bolts" of silent chromatin and the processes that yield transcriptional silencing.

Journal Article
01 Sep 2016-Genetics
TL;DR: It is found that telomere-length variation does not correlate with offspring production or longevity in C. elegans wild isolates, suggesting that naturally long telomeres play a limited role in modifying fitness phenotypes in C. elegans.
Abstract: Telomeres are involved in the maintenance of chromosomes and the prevention of genome instability. Despite this central importance, significant variation in telomere length has been observed in a variety of organisms. The genetic determinants of telomere-length variation and their effects on organismal fitness are largely unexplored. Here, we describe natural variation in telomere length across the Caenorhabditis elegans species. We identify a large-effect variant that contributes to differences in telomere length. The variant alters the conserved oligonucleotide/oligosaccharide-binding fold of protection of telomeres 2 (POT-2), a homolog of a human telomere-capping shelterin complex subunit. Mutations within this domain likely reduce the ability of POT-2 to bind telomeric DNA, thereby increasing telomere length. We find that telomere-length variation does not correlate with offspring production or longevity in C. elegans wild isolates, suggesting that naturally long telomeres play a limited role in modifying fitness phenotypes in C. elegans.

Journal Article
01 May 2016-Genetics
TL;DR: An overview of protein synthesis in the yeast Saccharomyces cerevisiae is provided with descriptions of the roles of translation initiation and elongation factors that assist the ribosome in binding the messenger RNA (mRNA), selecting the start codon, and synthesizing the polypeptide.
Abstract: In this review, we provide an overview of protein synthesis in the yeast Saccharomyces cerevisiae. The mechanism of protein synthesis is well conserved between yeast and other eukaryotes, and molecular genetic studies in budding yeast have provided critical insights into the fundamental process of translation as well as its regulation. The review focuses on the initiation and elongation phases of protein synthesis with descriptions of the roles of translation initiation and elongation factors that assist the ribosome in binding the messenger RNA (mRNA), selecting the start codon, and synthesizing the polypeptide. We also examine mechanisms of translational control highlighting the mRNA cap-binding proteins and the regulation of GCN4 and CPA1 mRNAs.

Journal Article
01 May 2016-Genetics
TL;DR: A Bayesian method is developed to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations, and it is found that ignoring a relevant demographic history can significantly bias the results of inference.
Abstract: The advent of accessible ancient DNA technology now allows the direct ascertainment of allele frequencies in ancestral populations, thereby enabling the use of allele frequency time series to detect and estimate natural selection. Such direct observations of allele frequency dynamics are expected to be more powerful than inferences made using patterns of linked neutral variation obtained from modern individuals. We developed a Bayesian method to make use of allele frequency time series data and infer the parameters of general diploid selection, along with allele age, in nonequilibrium populations. We introduce a novel path augmentation approach, in which we use Markov chain Monte Carlo to integrate over the space of allele frequency trajectories consistent with the observed data. Using simulations, we show that this approach has good power to estimate selection coefficients and allele age. Moreover, when applying our approach to data on horse coat color, we find that ignoring a relevant demographic history can significantly bias the results of inference. Our approach is made available in a C++ software package.
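The sketch below does not implement the path-augmentation MCMC described here. It shows only two ingredients of this kind of model: a forward Wright-Fisher trajectory under diploid selection, and the binomial likelihood of ancient-DNA allele counts given one trajectory. The fitness parameterization (AA = 1 + s, Aa = 1 + hs, aa = 1, with A the derived allele) and all parameter names are assumptions for illustration.

```python
import math
import numpy as np

def wf_trajectory(p0, s, h, N, generations, rng):
    """Frequency of allele A under diploid selection plus binomial drift."""
    p = p0
    traj = [p]
    for _ in range(generations):
        q = 1.0 - p
        w_bar = p * p * (1 + s) + 2 * p * q * (1 + h * s) + q * q
        p_sel = (p * p * (1 + s) + p * q * (1 + h * s)) / w_bar  # after selection
        p = rng.binomial(2 * N, p_sel) / (2 * N)                 # drift
        traj.append(p)
    return np.array(traj)

def sample_log_likelihood(traj, times, n_sampled, n_derived):
    """Binomial log-likelihood of derived-allele counts observed in samples
    of n_sampled chromosomes taken at the given generations."""
    ll = 0.0
    for t, n, k in zip(times, n_sampled, n_derived):
        p = traj[t]
        if (p == 0.0 and k > 0) or (p == 1.0 and k < n):
            return -math.inf                 # observation impossible
        ll += math.log(math.comb(n, k))
        if k > 0:
            ll += k * math.log(p)
        if n - k > 0:
            ll += (n - k) * math.log(1.0 - p)
    return ll

rng = np.random.default_rng(3)
traj = wf_trajectory(p0=0.1, s=0.05, h=0.5, N=1000, generations=200, rng=rng)
print(sample_log_likelihood(traj, times=[0, 100, 200],
                            n_sampled=[20, 20, 20], n_derived=[2, 8, 15]))
```

A full method would integrate this likelihood over many trajectories consistent with the data (and over demography), rather than scoring a single simulated path.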

Journal Article
26 Mar 2016-Genetics
TL;DR: The results indicate previously unidentified functions of the transcription factor CreA in amino acid transport and nitrogen assimilation, and serve as a basis for additional research in fungal carbon metabolism with the potential aim to improve fungal industrial applications.
Abstract: Carbon catabolite repression (CCR) is a process that selects the energetically most favorable carbon source in an environment. CCR represses the use of less favorable carbon sources when a better source is available. Glucose is the preferential carbon source for most microorganisms because it is rapidly metabolized, generating quick energy for growth. In the filamentous fungus Aspergillus nidulans, CCR is mediated by the transcription factor CreA, a C2H2 finger domain DNA-binding protein. The aim of this work was to investigate the regulation of CreA and characterize its functionally distinct protein domains. CreA depends in part on de novo protein synthesis and is regulated in part by ubiquitination. CreC, the scaffold protein in the CreB-CreC deubiquitination (DUB) complex, is essential for CreA function and stability. Deletion of select protein domains in CreA resulted in persistent nuclear localization and target gene repression. A region in CreA conserved between Aspergillus spp. and Trichoderma reesei was identified as essential for growth on various carbon, nitrogen, and lipid sources. In addition, a role of CreA in amino acid transport and nitrogen assimilation was observed. Taken together, these results indicate previously unidentified functions of this important transcription factor. These novel functions serve as a basis for additional research in fungal carbon metabolism with the potential aim to improve fungal industrial applications.

Journal Article
01 Feb 2016-Genetics
TL;DR: A theory is presented of why the inverse of the genomic relationship matrix can be computed by recursion, and of its implications for other populations with small effective population size.
Abstract: Many computations with SNP data including genomic evaluation, parameter estimation, and genome-wide association studies use an inverse of the genomic relationship matrix. The cost of a regular inversion is cubic and is prohibitively expensive for large matrices. Recent studies in cattle demonstrated that the inverse can be computed in almost linear time by recursion on any subset of ∼10,000 individuals. The purpose of this study is to present a theory of why such a recursion works and its implication for other populations. Assume that, because of a small effective population size, the additive information in a genotyped population has a small dimensionality, even with a very large number of SNP markers. That dimensionality is visible as a limited number of effective SNP effects, independent chromosome segments, or the rank of the genomic relationship matrix. Decompose a population arbitrarily into core and noncore individuals, with the number of core individuals equal to that dimensionality. Then, breeding values of noncore individuals can be derived by recursions on breeding values of core individuals, with coefficients of the recursion computed from the genomic relationship matrix. A resulting algorithm for the inversion called “algorithm for proven and young” (APY) has a linear computing and memory cost for noncore animals. Noninfinitesimal genetic architecture can be accommodated through a trait-specific genomic relationship matrix, possibly derived from Bayesian regressions. For populations with small effective population size, the inverse of the genomic relationship matrix can be computed inexpensively for a very large number of genotyped individuals.
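The recursion can be written as a sparse-structured inverse: with core block G_cc, recursion coefficients P = G_cc^{-1} G_cn, and a diagonal M holding each noncore individual's variance not explained by the core (m_i = g_ii - g_ic G_cc^{-1} g_ci), the approximate inverse is built from G_cc^{-1}, P, and M^{-1} without inverting the full matrix. The sketch below is a minimal dense NumPy version under stated assumptions: G follows VanRaden's first method with a small hypothetical ridge `eps` to keep it invertible, and the genotypes are random. A production implementation would exploit the diagonal M rather than forming dense blocks.

```python
import numpy as np

def vanraden_G(M, eps=0.01):
    """Genomic relationship matrix from 0/1/2 genotypes (individuals x SNPs).
    `eps` is an illustrative ridge that keeps G invertible."""
    p = M.mean(axis=0) / 2.0                 # allele frequencies
    Z = M - 2.0 * p                          # centered genotypes
    G = Z @ Z.T / (2.0 * np.sum(p * (1 - p)))
    return G + eps * np.eye(M.shape[0])

def apy_inverse(G, core):
    """APY approximation of G^{-1} given the indices of core individuals."""
    n = G.shape[0]
    core = np.asarray(core)
    non = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])
    Gcn = G[np.ix_(core, non)]
    P = Gcc_inv @ Gcn                        # recursion coefficients (c x n)
    m = G[non, non] - np.sum(Gcn * P, axis=0)  # unexplained noncore variance
    Minv = np.diag(1.0 / m)
    Ginv = np.zeros_like(G)
    Ginv[np.ix_(core, core)] = Gcc_inv + P @ Minv @ P.T
    Ginv[np.ix_(core, non)] = -P @ Minv
    Ginv[np.ix_(non, core)] = -Minv @ P.T
    Ginv[np.ix_(non, non)] = Minv
    return Ginv

rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(50, 2000)).astype(float)
G = vanraden_G(M)
core = rng.choice(50, size=20, replace=False)
G_inv_apy = apy_inverse(G, core)
```

Only G_cc is inverted, so the cost grows linearly in the number of noncore individuals. With a single noncore individual the formula reproduces the exact inverse; with many, its accuracy depends on how well the core spans the population's effective dimensionality, which these random genotypes are not designed to mimic.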

Journal Article
01 Sep 2016-Genetics
TL;DR: The value of sorghum as a functional model for candidate gene discovery for bioenergy Andropogoneae is demonstrated by performing genome-wide association analysis for two contrasting phenotypes representing key components of structural and non-structural carbohydrates.
Abstract: With high productivity and stress tolerance, numerous grass genera of the Andropogoneae have emerged as candidates for bioenergy production. To optimize these candidates, research examining the genetic architecture of yield, carbon partitioning, and composition is required to advance breeding objectives. Significant progress has been made developing genetic and genomic resources for Andropogoneae, and advances in comparative and computational genomics have enabled research examining the genetic basis of photosynthesis, carbon partitioning, composition, and sink strength. To provide a pivotal resource aimed at developing a comparative understanding of key bioenergy traits in the Andropogoneae, we have established and characterized an association panel of 390 racially, geographically, and phenotypically diverse Sorghum bicolor accessions with 232,303 genetic markers. Sorghum bicolor was selected because of its genomic simplicity, phenotypic diversity, significant genomic tools, and its agricultural productivity and resilience. We have demonstrated the value of sorghum as a functional model for candidate gene discovery for bioenergy Andropogoneae by performing genome-wide association analysis for two contrasting phenotypes representing key components of structural and non-structural carbohydrates. We identified potential genes, including a cellulase enzyme and a vacuolar transporter, associated with increased non-structural carbohydrates that could lead to bioenergy sorghum improvement. Although our analysis identified genes with potentially clear functions, other candidates did not have assigned functions, suggesting novel molecular mechanisms for carbon partitioning traits. These results, combined with our characterization of phenotypic and genetic diversity and the public accessibility of each accession and genomic data, demonstrate the value of this resource and provide a foundation for future improvement of sorghum and related grasses for bioenergy production.

Journal ArticleDOI
01 May 2016-Genetics
TL;DR: This study suggests that positive selection is less pervasive in these butterflies as compared to fruit flies, a fact that curiously results in very similar levels of neutral diversity in these very different insects.
Abstract: A combination of selective and neutral evolutionary forces shape patterns of genetic diversity in nature. Among the insects, most previous analyses of the roles of drift and selection in shaping variation across the genome have focused on the genus Drosophila. A more complete understanding of these forces will come from analyzing other taxa that differ in population demography and other aspects of biology. We have analyzed diversity and signatures of selection in the neotropical Heliconius butterflies using resequenced genomes from 58 wild-caught individuals of Heliconius melpomene and another 21 resequenced genomes representing 11 related species. By comparing intraspecific diversity and interspecific divergence, we estimate that 31% of amino acid substitutions between Heliconius species are adaptive. Diversity at putatively neutral sites is negatively correlated with the local density of coding sites as well as nonsynonymous substitutions and positively correlated with recombination rate, indicating widespread linked selection. This process also manifests in significantly reduced diversity on longer chromosomes, consistent with lower recombination rates. Although hitchhiking around beneficial nonsynonymous mutations has significantly shaped genetic variation in H. melpomene, evidence for strong selective sweeps is limited overall. We did, however, identify two regions where distinct haplotypes have swept in different populations, leading to increased population differentiation. On the whole, our study suggests that positive selection is less pervasive in these butterflies as compared to fruit flies, a fact that curiously results in very similar levels of neutral diversity in these very different insects.
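The abstract does not name the estimator behind the 31% figure. As a minimal sketch of how diversity and divergence counts can be turned into a fraction of adaptive substitutions, here is the classic McDonald-Kreitman-style ratio alpha = 1 - (Ds * Pn) / (Dn * Ps), where D counts fixed differences between species, P counts polymorphisms within a species, and n/s denote nonsynonymous/synonymous sites. The function name and all counts are hypothetical, not the Heliconius data, and the published estimate may use a more elaborate method that corrects for slightly deleterious polymorphisms.

```python
def adaptive_fraction(dn: int, ds: int, pn: int, ps: int) -> float:
    """McDonald-Kreitman-style estimate of the proportion of adaptive
    nonsynonymous substitutions: alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: fixed nonsynonymous / synonymous differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts for illustration only.
alpha = adaptive_fraction(dn=120, ds=300, pn=60, ps=250)
print(f"alpha = {alpha:.2f}")
```

A positive alpha means nonsynonymous changes are over-represented among fixed differences relative to their share of polymorphisms, which is the excess attributed to adaptive substitution.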

Journal ArticleDOI
01 Aug 2016-Genetics
TL;DR: The GFBLUP model using prior information on genomic features enriched for causal variants can increase the accuracy of genomic predictions in populations of unrelated individuals and provides a formal statistical framework for leveraging and evaluating information across multiple experimental studies to provide novel insights into the genetic architecture of complex traits.
Abstract: Predicting individual quantitative trait phenotypes from high-resolution genomic polymorphism data is important for personalized medicine in humans, plant and animal breeding, and adaptive evolution. However, this is difficult for populations of unrelated individuals when the number of causal variants is low relative to the total number of polymorphisms and causal variants individually have small effects on the traits. We hypothesized that mapping molecular polymorphisms to genomic features such as genes and their gene ontology categories could increase the accuracy of genomic prediction models. We developed a genomic feature best linear unbiased prediction (GFBLUP) model that implements this strategy and applied it to three quantitative traits (startle response, starvation resistance, and chill coma recovery) in the unrelated, sequenced inbred lines of the Drosophila melanogaster Genetic Reference Panel. Our results indicate that subsetting markers based on genomic features increases the predictive ability relative to the standard genomic best linear unbiased prediction (GBLUP) model. Both models use all markers, but GFBLUP allows differential weighting of the individual genetic marker relationships, whereas GBLUP weighs the genetic marker relationships equally. Simulation studies show that it is possible to further increase the accuracy of genomic prediction for complex traits using this model, provided the genomic features are enriched for causal variants. Our GFBLUP model using prior information on genomic features enriched for causal variants can increase the accuracy of genomic predictions in populations of unrelated individuals and provides a formal statistical framework for leveraging and evaluating information across multiple experimental studies to provide novel insights into the genetic architecture of complex traits.
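To make the GBLUP/GFBLUP contrast concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a VanRaden-style genomic relationship matrix, 0/1/2 genotype coding (inbred lines are 0 or 2), random placeholder phenotypes, and variance components that are treated as known (a real analysis would estimate them, for example by REML). All function names, the feature marker set, and the variance values are hypothetical. GBLUP builds one relationship matrix from all markers; the GFBLUP-style variant builds one matrix for markers in a genomic feature and one for the rest, each with its own variance, so the two sets can be weighted differently.

```python
import numpy as np


def marker_scale(x: np.ndarray) -> float:
    """Scaling term 2 * sum p(1 - p) for an n x m matrix of 0/1/2 genotypes."""
    p = x.mean(axis=0) / 2.0
    return float(2.0 * np.sum(p * (1.0 - p)))


def grm(x: np.ndarray) -> np.ndarray:
    """VanRaden-style genomic relationship matrix from 0/1/2 genotypes."""
    p = x.mean(axis=0) / 2.0
    z = x - 2.0 * p  # center each marker on its expected genotype
    return z @ z.T / marker_scale(x)


def blup_predict(kernels, variances, resid_var, y_train, train, test):
    """Predict genetic values of `test` lines from phenotyped `train` lines.

    The genetic covariance is sum_k variances[k] * kernels[k]; variance
    components are treated as known, and the mean is a simple average.
    """
    cov = sum(v * k for v, k in zip(variances, kernels))
    c_train = cov[np.ix_(train, train)]
    c_test = cov[np.ix_(test, train)]
    v = c_train + resid_var * np.eye(len(train))
    mu = y_train.mean()
    return mu + c_test @ np.linalg.solve(v, y_train - mu)


rng = np.random.default_rng(1)
geno = 2.0 * rng.integers(0, 2, size=(40, 500))  # inbred lines: 0 or 2
feature = np.arange(50)        # hypothetical markers in a genomic feature
rest = np.arange(50, 500)      # all remaining markers
y = rng.normal(size=40)        # placeholder phenotypes
train, test = np.arange(30), np.arange(30, 40)

# GBLUP: one relationship matrix, all markers weighted equally.
g_all = grm(geno)
pred_gblup = blup_predict([g_all], [1.0], 1.0, y[train], train, test)

# GFBLUP-style: separate matrices and variances for feature vs. other markers.
g_f, g_r = grm(geno[:, feature]), grm(geno[:, rest])
pred_gfblup = blup_predict([g_f, g_r], [0.6, 0.4], 1.0, y[train], train, test)
```

Setting each marker set's variance proportional to its share of the total scaling term reproduces the GBLUP predictions exactly, which is one way to see GBLUP as the equal-weighting special case of the two-matrix model.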

Journal ArticleDOI
01 Aug 2016-Genetics
TL;DR: The HACK technique is robust and readily adaptable for targeting and replacement of other genomic sequences, and could be a useful approach to repurpose existing transgenes as new genetic reagents become available.
Abstract: Gene conversions occur when genomic double-strand DNA breaks (DSBs) trigger unidirectional transfer of genetic material from a homologous template sequence. Exogenous or mutated sequence can be introduced through this homology-directed repair (HDR). We leveraged gene conversion to develop a method for genomic editing of existing transgenic insertions in Drosophila melanogaster. The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system is used in the homology assisted CRISPR knock-in (HACK) method to induce DSBs in a GAL4 transgene, which is repaired by a single-genomic transgenic construct containing GAL4 homologous sequences flanking a T2A-QF2 cassette. With two crosses, this technique converts existing GAL4 lines, including enhancer traps, into functional QF2-expressing lines. We used HACK to convert the most commonly used GAL4 lines (labeling tissues such as neurons, fat, glia, muscle, and hemocytes) to QF2 lines. We also identified regions of the genome that exhibited differential efficiencies of HDR. The HACK technique is robust and readily adaptable for targeting and replacement of other genomic sequences, and could be a useful approach to repurpose existing transgenes as new genetic reagents become available.

Journal ArticleDOI
01 Jun 2016-Genetics
TL;DR: This review focuses on nonsynonymous variant prediction with two aims in mind: to review the prioritization methods that have been developed to date and the principles on which they are based and to discuss the challenges to further improving these methods.
Abstract: As personal genome sequencing becomes a reality, understanding the effects of genetic variants on phenotype (particularly the impact of germline variants on disease risk and the impact of somatic variants on cancer development and treatment) continues to increase in importance. Because of their clear potential for affecting phenotype, nonsynonymous genetic variants (variants that cause a change in the amino acid sequence of a protein encoded by a gene) have long been the target of efforts to predict the effects of genetic variation. Whole-genome sequencing is identifying large numbers of nonsynonymous variants in each genome, intensifying the need for computational methods that accurately predict which of these are likely to impact disease phenotypes. This review focuses on nonsynonymous variant prediction with two aims in mind: (1) to review the prioritization methods that have been developed to date and the principles on which they are based and (2) to discuss the challenges to further improving these methods.

Journal ArticleDOI
01 May 2016-Genetics
TL;DR: It is found that unlike COs, NCOs are insensitive to the centromere effect and do not demonstrate interference, which has multiple implications for the understanding of how meiotic recombination is regulated to ensure proper chromosome segregation and maintain genome stability.
Abstract: A century of genetic analysis has revealed that multiple mechanisms control the distribution of meiotic crossover events. In Drosophila melanogaster, two significant positional controls are interference and the strongly polar centromere effect. Here, we assess the factors controlling the distribution of crossovers (COs) and noncrossover gene conversions (NCOs) along all five major chromosome arms in 196 single meiotic divisions to generate a more detailed understanding of these controls on a genome-wide scale. Analyzing the outcomes of single meiotic events allows us to distinguish among different classes of meiotic recombination. In so doing, we identified 291 NCOs spread uniformly among the five major chromosome arms and 541 COs (including 52 double crossovers and one triple crossover). We find that unlike COs, NCOs are insensitive to the centromere effect and do not demonstrate interference. Although the positions of COs appear to be determined predominately by the long-range influences of interference and the centromere effect, each chromosome may display a different pattern of sensitivity to interference, suggesting that interference may not be a uniform global property. In addition, unbiased sequencing of a large number of individuals allows us to describe the formation of de novo copy number variants, the majority of which appear to be mediated by unequal crossing over between transposable elements. This work has multiple implications for our understanding of how meiotic recombination is regulated to ensure proper chromosome segregation and maintain genome stability.
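Crossover interference is classically quantified with the coefficient of coincidence (CoC): the observed number of meioses with events in both of two adjacent intervals, divided by the number expected if events in the two intervals were independent, with interference I = 1 - CoC. A minimal sketch with hypothetical counts follows; the abstract does not say which interference statistic the authors used (with whole-genome data they may instead have examined distances between events), and the function names are illustrative. Under this measure, the finding that NCOs "do not demonstrate interference" would correspond to a CoC near 1.

```python
def coefficient_of_coincidence(n_meioses: int, n_i1: int, n_i2: int, n_double: int) -> float:
    """Observed / expected double events across two adjacent intervals.

    n_i1, n_i2: meioses with an event in interval 1 / interval 2 (doubles included)
    n_double:   meioses with an event in both intervals
    """
    expected = (n_i1 / n_meioses) * (n_i2 / n_meioses) * n_meioses
    return n_double / expected


def interference(coc: float) -> float:
    """I = 1 - CoC: 1 means complete interference, 0 means none, < 0 means negative interference."""
    return 1.0 - coc


# Hypothetical counts, not taken from the study.
coc = coefficient_of_coincidence(n_meioses=1000, n_i1=100, n_i2=200, n_double=5)
print(f"CoC = {coc:.2f}, I = {interference(coc):.2f}")
```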

Journal ArticleDOI
01 Aug 2016-Genetics
TL;DR: It is found that TI can be used to obtain estimates of the model evidence that are more accurate and precise than those based on heuristics, and estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics.
Abstract: A key quantity in the analysis of structured populations is the parameter K, which describes the number of subpopulations that make up the total population. Inference of K ideally proceeds via the model evidence, which is equivalent to the likelihood of the model. However, the evidence in favor of a particular value of K cannot usually be computed exactly, and instead programs such as Structure make use of heuristic estimators to approximate this quantity. We show, using simulated data sets small enough that the true evidence can be computed exactly, that these heuristics often fail to estimate the true evidence and that this can lead to incorrect conclusions about K. Our proposed solution is to use thermodynamic integration (TI) to estimate the model evidence. After outlining the TI methodology we demonstrate the effectiveness of this approach, using a range of simulated data sets. We find that TI can be used to obtain estimates of the model evidence that are more accurate and precise than those based on heuristics. Furthermore, estimates of K based on these values are found to be more reliable than those based on a suite of model comparison statistics. Finally, we test our solution in a reanalysis of a white-footed mouse data set. The TI methodology is implemented for models both with and without admixture in the software MavericK1.0.
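Thermodynamic integration rests on the identity log p(D) = integral from 0 to 1 of E_beta[log L(theta)] d(beta), where the expectation is taken under the "power posterior" proportional to L(theta)^beta * prior(theta). The sketch below checks this identity on a toy model small enough to have an exact answer, echoing the abstract's validation strategy: a Bernoulli likelihood with a uniform Beta(1, 1) prior, whose power posterior is Beta(1 + beta*k, 1 + beta*(n - k)), so each expectation has a closed form in the digamma function. It is an illustration only, not the MavericK implementation, which must estimate each expectation by MCMC under a Structure-style model; the beta schedule, step count, and data are arbitrary assumptions.

```python
import math


def digamma(x: float) -> float:
    """Digamma function for x > 0: shift x upward by recurrence, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return result


def expected_loglik(beta: float, n: int, k: int) -> float:
    """E[log L(theta)] under the power posterior Beta(1 + beta*k, 1 + beta*(n - k))."""
    a = 1.0 + beta * k
    b = 1.0 + beta * (n - k)
    psi_ab = digamma(a + b)
    return k * (digamma(a) - psi_ab) + (n - k) * (digamma(b) - psi_ab)


def log_evidence_ti(n: int, k: int, steps: int = 200, power: float = 5.0) -> float:
    """Trapezoid-rule TI over beta_i = (i / steps) ** power, which packs rungs near beta = 0."""
    betas = [(i / steps) ** power for i in range(steps + 1)]
    vals = [expected_loglik(b, n, k) for b in betas]
    return sum(
        0.5 * (vals[i] + vals[i + 1]) * (betas[i + 1] - betas[i]) for i in range(steps)
    )


def log_evidence_exact(n: int, k: int) -> float:
    """log of the integral of theta^k (1 - theta)^(n - k) over [0, 1], i.e. log Beta(k + 1, n - k + 1)."""
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


n, k = 20, 7  # hypothetical data: 7 successes in 20 Bernoulli trials
print(log_evidence_ti(n, k), log_evidence_exact(n, k))
```

Packing the temperature rungs near beta = 0 matters because E_beta[log L] changes fastest there; a uniform schedule with the same number of rungs would give a noticeably less accurate integral.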

Journal ArticleDOI
01 Sep 2016-Genetics
TL;DR: It is estimated that the southern Kalahari populations were among the last to experience gene flow from Bantu speakers, ∼14 generations ago, and it is concluded that local adoption of pastoralism appears to have been primarily a cultural process with limited genetic impact from eastern Africa.
Abstract: Recent genetic studies have established that the KhoeSan populations of southern Africa are distinct from all other African populations and have remained largely isolated during human prehistory until ∼2000 years ago. Dozens of different KhoeSan groups exist, belonging to three different language families, but very little is known about their population history. We examine new genome-wide polymorphism data and whole mitochondrial genomes for >100 South Africans from the ≠Khomani San and Nama populations of the Northern Cape, analyzed in conjunction with 19 additional southern African populations. Our analyses reveal fine-scale population structure in and around the Kalahari Desert. Surprisingly, this structure does not always correspond to linguistic or subsistence categories as previously suggested, but rather reflects the role of geographic barriers and the ecology of the greater Kalahari Basin. Regardless of subsistence strategy, the indigenous Khoe-speaking Nama pastoralists and the N|u-speaking ≠Khomani (formerly hunter-gatherers) share ancestry with other Khoe-speaking forager populations that form a rim around the Kalahari Desert. We reconstruct earlier migration patterns and estimate that the southern Kalahari populations were among the last to experience gene flow from Bantu speakers, ∼14 generations ago. We conclude that local adoption of pastoralism, at least by the Nama, appears to have been primarily a cultural process with limited genetic impact from eastern Africa.

Journal ArticleDOI
01 Feb 2016-Genetics
TL;DR: The 3-population composite likelihood ratio (3P-CLR) outperforms XP-CLR when testing for selection that occurred before two populations split from each other and can distinguish between those events and events that occurred specifically in each of the populations after the split.
Abstract: A powerful way to detect selection in a population is by modeling local allele frequency changes in a particular region of the genome under scenarios of selection and neutrality and finding which model is most compatible with the data. A previous method based on a cross-population composite likelihood ratio (XP-CLR) uses an outgroup population to detect departures from neutrality that could be compatible with hard or soft sweeps, at linked sites near a beneficial allele. However, this method is most sensitive to recent selection and may miss selective events that happened a long time ago. To overcome this, we developed an extension of XP-CLR that jointly models the behavior of a selected allele in a three-population tree. Our method, called "3-population composite likelihood ratio" (3P-CLR), outperforms XP-CLR when testing for selection that occurred before two populations split from each other and can distinguish between those events and events that occurred specifically in each of the populations after the split. We applied our new test to population genomic data from the 1000 Genomes Project, to search for selective sweeps that occurred before the split of Yoruba and Eurasians, but after their split from Neanderthals, and that could have led to the spread of modern-human-specific phenotypes. We also searched for sweep events that occurred in East Asians, Europeans, and the ancestors of both populations, after their split from Yoruba. In both cases, we are able to confirm a number of regions identified by previous methods and find several new candidates for selection in recent and ancient times. For some of these, we also find suggestive functional mutations that may have driven the selective events.

Journal ArticleDOI
Michael Lynch
01 Mar 2016-Genetics
TL;DR: Although the human germline mutation rate is higher than that in any other well-studied species, the rate is not exceptional once the effective genome size and effective population size are taken into consideration.
Abstract: Although the human germline mutation rate is higher than that in any other well-studied species, the rate is not exceptional once the effective genome size and effective population size are taken into consideration. Human somatic mutation rates are substantially elevated above those in the germline, but this is also seen in other species. What is exceptional about humans is the recent detachment from the challenges of the natural environment and the ability to modify phenotypic traits in ways that mitigate the fitness effects of mutations, e.g., precision and personalized medicine. This results in a relaxation of selection against mildly deleterious mutations, including those magnifying the mutation rate itself. The long-term consequence of such effects is an expected genetic deterioration in the baseline human condition, potentially measurable on the timescale of a few generations in westernized societies, and because the brain is a particularly large mutational target, this is of particular concern. Ultimately, the price will have to be covered by further investment in various forms of medical intervention. Resolving the uncertainties of the magnitude and timescale of these effects will require the establishment of stable, standardized, multigenerational measurement procedures for various human traits.

Journal ArticleDOI
01 Jan 2016-Genetics
TL;DR: The results demonstrate that mitonuclear epistases are context dependent, suggesting the selective pressure acting on mitonuclear genotypes may vary with food environment in a genotype-specific manner.
Abstract: Mitochondrial (mtDNA) and nuclear genes have to operate in a coordinated manner to maintain organismal function, and the regulation of this homeostasis presents a substantial source of potential epistatic (G × G) interactions. How these interactions shape the fitness landscape is poorly understood. Here we developed a novel mitonuclear epistasis model, using selected strains of the Drosophila Genetic Reference Panel (DGRP) and mitochondrial genomes from within Drosophila melanogaster and D. simulans to test the hypothesis that mtDNA × nDNA interactions influence fitness. In total we built 72 genotypes (12 nuclear backgrounds × 6 mtDNA haplotypes, with 3 from each species) to dissect the relationship between genotype and phenotype. Each genotype was assayed on four food environments. We found considerable variation in several phenotypes, including development time and egg-to-adult viability, and this variation was partitioned into genetic (G), environmental (E), and higher-order (G × G, G × E, and G × G × E) components. Food type had a significant impact on development time and also modified mitonuclear epistases, evidencing a broad spectrum of G × G × E across these genotypes. Nuclear background effects were substantial, followed by mtDNA effects and their G × G interaction. The species of mtDNA haplotype had negligible effects on phenotypic variation, and there was no evidence that mtDNA variation has different effects on male and female fitness traits. Our results demonstrate that mitonuclear epistases are context dependent, suggesting the selective pressure acting on mitonuclear genotypes may vary with food environment in a genotype-specific manner.

Journal ArticleDOI
01 May 2016-Genetics
TL;DR: Significant associations between a single nucleotide polymorphism (SNP) in CIRBP, transcript levels in embryonic gonads during specification of gonad fate, and sex in hatchlings from a thermal regime that produces mixed sex ratios strongly suggest that CIRBP is involved in determining the fate of the bipotential gonad.
Abstract: Temperature-dependent sex determination (TSD) was described nearly 50 years ago. Researchers have since identified many genes that display differential expression at male- vs. female-producing temperatures. Yet, it is unclear whether these genes (1) are involved in sex determination per se, (2) are downstream effectors involved in differentiation of ovaries and testes, or (3) are thermo-sensitive but unrelated to gonad development. Here we present multiple lines of evidence linking CIRBP to sex determination in the snapping turtle, Chelydra serpentina. We demonstrate significant associations between a single nucleotide polymorphism (SNP) (c63A > C) in CIRBP, transcript levels in embryonic gonads during specification of gonad fate, and sex in hatchlings from a thermal regime that produces mixed sex ratios. The A allele was induced in embryos exposed to a female-producing temperature, while expression of the C allele did not differ between female- and male-producing temperatures. In accord with this pattern of temperature-dependent, allele-specific expression, AA homozygotes were more likely to develop ovaries than AC heterozygotes, which, in turn, were more likely to develop ovaries than CC homozygotes. Multiple regression using SNPs in CIRBP and adjacent loci suggests that c63A > C may be the causal variant or closely linked to it. Differences in CIRBP allele frequencies among turtles from northern Minnesota, southern Minnesota, and Texas reflect small- and large-scale latitudinal differences in TSD pattern. Finally, analysis of CIRBP protein localization reveals that CIRBP is in a position to mediate temperature effects on the developing gonads. Together, these studies strongly suggest that CIRBP is involved in determining the fate of the bipotential gonad.