scispace - formally typeset

Showing papers in "Genetics in 2013"


Journal Article
01 Mar 2013-Genetics
TL;DR: The current state of knowledge of the lncRNA field is reviewed, discussing what is known about the genomic contexts, biological functions, and mechanisms of action of lncRNAs and how this interest is deeply rooted in biology's longstanding concern with the evolution and function of genomes.
Abstract: Long noncoding RNAs (lncRNAs) have gained widespread attention in recent years as a potentially new and crucial layer of biological regulation. lncRNAs of all kinds have been implicated in a range of developmental processes and diseases, but knowledge of the mechanisms by which they act is still surprisingly limited, and claims that almost the entirety of the mammalian genome is transcribed into functional noncoding transcripts remain controversial. At the same time, a small number of well-studied lncRNAs have given us important clues about the biology of these molecules, and a few key functional and mechanistic themes have begun to emerge, although the robustness of these models and classification schemes remains to be seen. Here, we review the current state of knowledge of the lncRNA field, discussing what is known about the genomic contexts, biological functions, and mechanisms of action of lncRNAs. We also reflect on how the recent interest in lncRNAs is deeply rooted in biology’s longstanding concern with the evolution and function of genomes.

1,582 citations


Journal Article
01 Jan 2013-Genetics
TL;DR: A bacterial CRISPR RNA/Cas9 system is adapted to precisely engineer the Drosophila genome and it is reported that Cas9-mediated genomic modifications are efficiently transmitted through the germline.
Abstract: We have adapted a bacterial CRISPR RNA/Cas9 system to precisely engineer the Drosophila genome and report that Cas9-mediated genomic modifications are efficiently transmitted through the germline. This RNA-guided Cas9 system can be rapidly programmed to generate targeted alleles for probing gene function in Drosophila.

1,067 citations


Journal Article
01 Feb 2013-Genetics
TL;DR: An overview of available methods for implementing parametric WGR models is provided, selected topics that emerge in applications are discussed, and a general discussion of lessons learned from simulation and empirical data analysis in the last decade are presented.
Abstract: Genomic-enabled prediction is becoming increasingly important in animal and plant breeding and is also receiving attention in human genetics. Deriving accurate predictions of complex traits requires implementing whole-genome regression (WGR) models where phenotypes are regressed on thousands of markers concurrently. Methods exist that allow implementing these large-p with small-n regressions, and genome-enabled selection (GS) is being implemented in several plant and animal breeding programs. The list of available methods is long, and the relationships between them have not been fully addressed. In this article we provide an overview of available methods for implementing parametric WGR models, discuss selected topics that emerge in applications, and present a general discussion of lessons learned from simulation and empirical data analysis in the last decade.

741 citations
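To make the large-p, small-n setting concrete, here is a minimal sketch of one parametric WGR model: ridge regression on markers, which corresponds to a Gaussian prior on marker effects. It assumes markers are coded 0/1/2 and that the shrinkage parameter `lam` has already been chosen (in practice it would be tuned by cross-validation or derived from variance components). It is not code from the review, and the names `ridge_wgr`, `solve`, and `predict` are hypothetical. Because p can greatly exceed n, it solves an n × n system instead of the p × p one.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ridge_wgr(X: Matrix, y: List[float], lam: float) -> Tuple[float, List[float], List[float]]:
    """Ridge WGR: y = mu + sum_k (x_k - mean_k) * beta_k + e.

    Solves (Xc Xc' + lam I) alpha = y - mu (n x n) and sets beta = Xc' alpha,
    which equals the p x p ridge solution (Xc'Xc + lam I)^-1 Xc'(y - mu).
    """
    n, p = len(X), len(X[0])
    mu = sum(y) / n
    means = [sum(row[k] for row in X) / n for k in range(p)]
    Xc = [[row[k] - means[k] for k in range(p)] for row in X]  # centred markers
    K = [[sum(Xc[i][k] * Xc[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [v - mu for v in y])
    beta = [sum(Xc[i][k] * alpha[i] for i in range(n)) for k in range(p)]
    return mu, means, beta


def predict(model: Tuple[float, List[float], List[float]], x: List[float]) -> float:
    """Predicted phenotype for one marker vector x."""
    mu, means, beta = model
    return mu + sum((xk - mk) * bk for xk, mk, bk in zip(x, means, beta))
```

Working in the n × n "dual" space is the design choice that ties the cost to the number of records rather than the number of markers; a very large `lam` shrinks every marker effect toward zero.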


Journal Article
01 Nov 2013-Genetics
TL;DR: Studies of yeast ribosome biogenesis provide useful models for ribosomopathies, human diseases that result from failure to properly assemble ribosomes.
Abstract: Ribosomes are highly conserved ribonucleoprotein nanomachines that translate information in the genome to create the proteome in all cells. In yeast these complex particles contain four RNAs (>5400 nucleotides) and 79 different proteins. During the past 25 years, studies in yeast have led the way to understanding how these molecules are assembled into ribosomes in vivo. Assembly begins with transcription of ribosomal RNA in the nucleolus, where the RNA then undergoes complex pathways of folding, coupled with nucleotide modification, removal of spacer sequences, and binding to ribosomal proteins. More than 200 assembly factors and 76 small nucleolar RNAs transiently associate with assembling ribosomes, to enable their accurate and efficient construction. Following export of preribosomes from the nucleus to the cytoplasm, they undergo final stages of maturation before entering the pool of functioning ribosomes. Elaborate mechanisms exist to monitor the formation of correct structural and functional neighborhoods within ribosomes and to destroy preribosomes that fail to assemble properly. Studies of yeast ribosome biogenesis provide useful models for ribosomopathies, diseases in humans that result from failure to properly assemble ribosomes.

646 citations


Journal Article
01 Nov 2013-Genetics
TL;DR: A simple yet extremely efficient platform for systematic gene targeting by the RNA-guided endonuclease Cas9 in Drosophila, which demonstrates rapid generation of mutants in seven neuropeptide and two microRNA genes in which no mutants have been described.
Abstract: We report a simple yet extremely efficient platform for systematic gene targeting by the RNA-guided endonuclease Cas9 in Drosophila. The system comprises two transgenic strains: one expressing Cas9 protein from the germline-specific nanos promoter and the other ubiquitously expressing a custom guide RNA (gRNA) that targets a unique site in the genome. The two strains are crossed to form an active Cas9-gRNA complex specifically in germ cells, which cleaves and mutates the target site. We demonstrate rapid generation of mutants in seven neuropeptide and two microRNA genes in which no mutants have been described. Founder animals stably expressing Cas9-gRNA transmitted germline mutations to an average of 60% of their progeny, a dramatic improvement in efficiency over the previous methods based on transient Cas9 expression. Simultaneous cleavage of two sites by co-expression of two gRNAs efficiently induced internal deletion with frequencies of 4.3-23%. Our method is readily scalable to high-throughput gene targeting, thereby accelerating comprehensive functional annotation of the Drosophila genome.

567 citations
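Both Cas9 platforms listed here depend on choosing a target site. For the commonly used Streptococcus pyogenes Cas9, the guide RNA matches a 20-nt protospacer that must lie immediately 5′ of an NGG protospacer-adjacent motif (PAM). The sketch below is a hypothetical helper, `find_cas9_sites`, not either paper's design procedure. It lists candidate sites on the forward strand only; it does not scan the reverse complement or check that a site is unique in the genome, which the gRNA strains described above require.

```python
import re
from typing import List, Tuple


def find_cas9_sites(seq: str, protospacer_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, protospacer, PAM) for each forward-strand site where a
    protospacer of protospacer_len nt is immediately followed by an NGG PAM."""
    seq = seq.upper()
    # Zero-width lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]
```

For example, `find_cas9_sites("ACGTAGG", 4)` reports a single site at position 0 with protospacer `ACGT` and PAM `AGG`.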


Journal Article
01 Jun 2013-Genetics
TL;DR: Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses and is implemented in Beagle version 4.
Abstract: Segments of identity-by-descent (IBD) detected from high-density genetic data are useful for many applications, including long-range phase determination, phasing family data, imputation, IBD mapping, and heritability analysis in founder populations. We present Refined IBD, a new method for IBD segment detection. Refined IBD achieves both computational efficiency and highly accurate IBD segment reporting by searching for IBD in two steps. The first step (identification) uses the GERMLINE algorithm to find shared haplotypes exceeding a length threshold. The second step (refinement) evaluates candidate segments with a probabilistic approach to assess the evidence for IBD. Like GERMLINE, Refined IBD allows for IBD reporting on a haplotype level, which facilitates determination of multi-individual IBD and allows for haplotype-based downstream analyses. To investigate the properties of Refined IBD, we simulate SNP data from a model with recent superexponential population growth that is designed to match United Kingdom data. The simulation results show that Refined IBD achieves a better power/accuracy profile than fastIBD or GERMLINE. We find that a single run of Refined IBD achieves greater power than 10 runs of fastIBD. We also apply Refined IBD to SNP data for samples from the United Kingdom and from Northern Finland and describe the IBD sharing in these data sets. Refined IBD is powerful, highly accurate, and easy to use and is implemented in Beagle version 4.

524 citations
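The identification step can be illustrated with a brute-force stand-in: scan two phased haplotypes for maximal runs of identical alleles and keep those whose genetic length meets a threshold. This is only a sketch under simplifying assumptions. As we understand it, GERMLINE hashes short haplotype windows and extends matches rather than scanning marker by marker, and Refined IBD's probabilistic refinement step is not reproduced at all. The name `shared_segments` and the centimorgan map input are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def shared_segments(h1: Sequence[int], h2: Sequence[int],
                    pos_cm: Sequence[float], min_cm: float) -> List[Tuple[int, int]]:
    """Return (first, last) marker indices of maximal runs where two phased
    haplotypes carry identical alleles and the run spans at least min_cm."""
    if not (len(h1) == len(h2) == len(pos_cm)):
        raise ValueError("haplotypes and map must have equal length")
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current identical run began
    # Iterate one step past the end so a run reaching the last marker is closed.
    for i in range(len(h1) + 1):
        same = i < len(h1) and h1[i] == h2[i]
        if same and start is None:
            start = i
        elif not same and start is not None:
            # The run ended at i - 1; keep it only if it is long enough.
            if pos_cm[i - 1] - pos_cm[start] >= min_cm:
                segments.append((start, i - 1))
            start = None
    return segments
```

A real detector must also tolerate genotyping and phasing errors, which is one reason candidate segments are re-evaluated in a separate refinement step.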


Journal Article
01 Jan 2013-Genetics
TL;DR: The authors develop a set of methods, based on a Bayesian model previously implemented in the software Bayenv, to robustly test for unusual allele frequency patterns and for correlations between environmental variables and allele frequencies while accounting for imperfect knowledge of population allele frequencies and for correlations among populations due to shared population history and gene flow.
Abstract: Comparing allele frequencies among populations that differ in environment has long been a tool for detecting loci involved in local adaptation. However, such analyses are complicated by an imperfect knowledge of population allele frequencies and neutral correlations of allele frequencies among populations due to shared population history and gene flow. Here we develop a set of methods to robustly test for unusual allele frequency patterns and correlations between environmental variables and allele frequencies while accounting for these complications based on a Bayesian model previously implemented in the software Bayenv. Using this model, we calculate a set of “standardized allele frequencies” that allows investigators to apply tests of their choice to multiple populations while accounting for sampling and covariance due to population history. We illustrate this first by showing that these standardized frequencies can be used to detect nonparametric correlations with environmental variables; these correlations are also less prone to spurious results due to outlier populations. We then demonstrate how these standardized allele frequencies can be used to construct a test to detect SNPs that deviate strongly from neutral population structure. This test is conceptually related to FST and is shown to be more powerful, as we account for population history. We also extend the model to next-generation sequencing of population pools—a cost-efficient way to estimate population allele frequencies, but one that introduces an additional level of sampling noise. The utility of these methods is demonstrated in simulations and by reanalyzing human SNP data from the Human Genome Diversity Panel populations and pooled next-generation sequencing data from Atlantic herring. An implementation of our method is available from http://gcbias.org.

523 citations
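The idea behind "standardized allele frequencies" can be sketched as a decorrelation step. Assume, as a simplification, that one SNP's frequencies across populations are approximately multivariate normal with mean ε and covariance ε(1 - ε)Ω, where Ω is the population covariance matrix. Then the Cholesky factor L of Ω (Ω = LL′) turns the centred, scaled frequencies into approximately independent values. Here Ω and ε are supplied as inputs, whereas Bayenv estimates them; the function names are hypothetical, and Spearman's ρ is included only as one example of a nonparametric test an investigator might then apply.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L L' = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def standardized_freqs(x: List[float], omega: Matrix, eps: float) -> List[float]:
    """Decorrelate one SNP's population frequencies: solve L z = (x - eps) / sd."""
    L = cholesky(omega)
    sd = math.sqrt(eps * (1.0 - eps))
    b = [(xi - eps) / sd for xi in x]
    z: List[float] = []
    for i in range(len(b)):  # forward substitution
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def _ranks(v: List[float]) -> List[float]:
    """1-based ranks, averaging over ties."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman's rank correlation (Pearson correlation of the ranks)."""
    ra, rb = _ranks(a), _ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    va = sum((p - ma) ** 2 for p in ra)
    vb = sum((q - mb) ** 2 for q in rb)
    return cov / math.sqrt(va * vb)
```

With Ω equal to the identity matrix the transform reduces to centring and scaling; off-diagonal entries of Ω are what remove the shared-history correlation.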


Journal Article
01 Nov 2013-Genetics
TL;DR: A new method for inferring an individual’s ancestry is presented that takes the uncertainty of next generation sequencing data into account by working directly with genotype likelihoods, which contain all relevant information on the unobserved genotypes.
Abstract: Inference of population structure and individual ancestry is important both for population genetics and for association studies. With next generation sequencing technologies it is possible to obtain genetic data for all accessible genetic variations in the genome. Existing methods for admixture analysis rely on known genotypes. However, individual genotypes cannot be inferred from low-depth sequencing data without introducing errors. This article presents a new method for inferring an individual’s ancestry that takes the uncertainty introduced in next generation sequencing data into account. This is achieved by working directly with genotype likelihoods that contain all relevant information of the unobserved genotypes. Using simulations as well as publicly available sequencing data, we demonstrate that the presented method has great accuracy even for very low-depth data. At the same time, we demonstrate that applying existing methods to genotypes called from the same data can introduce severe biases. The presented method is implemented in the NGSadmix software available at http://www.popgen.dk/software.

471 citations


Journal Article
01 Apr 2013-Genetics
TL;DR: Building on an emerging approach that harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance, a new weighted LD statistic is presented that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods.
Abstract: Long-range migrations and the resulting admixtures between populations have been important forces shaping human genetic diversity. Most existing methods for detecting and reconstructing historical admixture events are based on allele frequency divergences or patterns of ancestry segments in chromosomes of admixed individuals. An emerging new approach harnesses the exponential decay of admixture-induced linkage disequilibrium (LD) as a function of genetic distance. Here, we comprehensively develop LD-based inference into a versatile tool for investigating admixture. We present a new weighted LD statistic that can be used to infer mixture proportions as well as dates with fewer constraints on reference populations than previous methods. We define an LD-based three-population test for admixture and identify scenarios in which it can detect admixture events that previous formal tests cannot. We further show that we can uncover phylogenetic relationships among populations by comparing weighted LD curves obtained using a suite of references. Finally, we describe several improvements to the computation and fitting of weighted LD curves that greatly increase the robustness and speed of the calculations. We implement all of these advances in a software package, ALDER, which we validate in simulations and apply to test for admixture among all populations from the Human Genome Diversity Project (HGDP), highlighting insights into the admixture history of Central African Pygmies, Sardinians, and Japanese.

414 citations
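Date inference of this kind rests on the fact that admixture-induced LD between markers d Morgans apart decays roughly as e^(-nd) after n generations. A minimal sketch, assuming a weighted LD curve has already been computed, fits A·e^(-nd) + c by scanning integer values of n and solving the linear least-squares problem for A and c at each step. The function `fit_ld_decay` is hypothetical and is not ALDER's fitting procedure, which is more involved.

```python
import math
from typing import Iterable, List, Tuple


def fit_ld_decay(d: List[float], y: List[float],
                 n_grid: Iterable[int]) -> Tuple[int, float, float]:
    """Fit y ~ A * exp(-n * d) + c; return the (n, A, c) with least squared error.

    d is genetic distance in Morgans, so n reads as generations since admixture.
    For a fixed n the model is linear in A and c, so both are solved by
    ordinary least squares and the best-scoring n is kept.
    """
    best = None
    m = len(y)
    ym = sum(y) / m
    for n in n_grid:
        z = [math.exp(-n * di) for di in d]
        zm = sum(z) / m
        szz = sum((zi - zm) ** 2 for zi in z)
        if szz == 0.0:
            continue  # the exponential term is constant, so A is not identifiable
        A = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y)) / szz
        c = ym - A * zm
        sse = sum((yi - A * zi - c) ** 2 for zi, yi in zip(z, y))
        if best is None or sse < best[0]:
            best = (sse, n, A, c)
    if best is None:
        raise ValueError("no usable n in grid")
    return best[1], best[2], best[3]
```

The affine term c absorbs background LD that does not decay with distance, which would otherwise bias the estimated rate.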


Journal Article
01 Sep 2013-Genetics
TL;DR: Targeting seven loci with Cas9/gRNA achieved germline efficiencies of up to 100%, making the system an attractive tool for rapid disruption of essentially any gene in Drosophila.
Abstract: We report that Cas9/gRNA mediates efficient genetic modifications in Drosophila. Through targeting seven loci, we achieved a germline efficiency of up to 100%. Genes in both heterochromatin and euchromatin can be modified efficiently. Thus the Cas9/gRNA system is an attractive tool for rapid disruption of essentially any gene in Drosophila.

373 citations


Journal Article
01 Jul 2013-Genetics
TL;DR: Members of the Bayesian alphabet have a place in whole-genome prediction of phenotypes but somewhat doubtful inferential value, at least when sample size is such that n ≪ p.
Abstract: Whole-genome enabled prediction of complex traits has received enormous attention in animal and plant breeding and is making inroads into human and even Drosophila genetics. The term “Bayesian alphabet” denotes a growing number of letters of the alphabet used to denote various Bayesian linear regressions that differ in the priors adopted, while sharing the same sampling model. We explore the role of the prior distribution in whole-genome regression models for dissecting complex traits in what is now a standard situation with genomic data where the number of unknown parameters (p) typically exceeds sample size (n). Members of the alphabet aim to confront this overparameterization in various manners, but it is shown here that the prior is always influential, unless n ≫ p. This happens because parameters are not likelihood identified, so Bayesian learning is imperfect. Since inferences are not devoid of the influence of the prior, claims about genetic architecture from these methods should be taken with caution. However, all such procedures may deliver reasonable predictions of complex traits, provided that some parameters (“tuning knobs”) are assessed via a properly conducted cross-validation. It is concluded that members of the alphabet have a place in whole-genome prediction of phenotypes, but have somewhat doubtful inferential value, at least when sample size is such that n ≪ p.

Journal Article
01 Feb 2013-Genetics
TL;DR: Simulation procedures and the validation and reporting of results are reviewed, benchmark procedures are applied to a variety of genomic prediction methods in simulated and real example data, and the authors conclude that no single method can serve as a benchmark for genomic prediction.
Abstract: The genomic prediction of phenotypes and breeding values in animals and plants has developed rapidly into its own research field. Results of genomic prediction studies are often difficult to compare because data simulation varies, real or simulated data are not fully described, and not all relevant results are reported. In addition, some new methods have been compared only in limited genetic architectures, leading to potentially misleading conclusions. In this article we review simulation procedures, discuss validation and reporting of results, and apply benchmark procedures for a variety of genomic prediction methods in simulated and real example data. Plant and animal breeding programs are being transformed by the use of genomic data, which are becoming widely available and cost-effective to predict genetic merit. A large number of genomic prediction studies have been published using both simulated and real data. The relative novelty of this area of research has made the development of scientific conventions difficult with regard to description of the real data, simulation of genomes, validation and reporting of results, and forward in time methods. In this review article we discuss the generation of simulated genotype and phenotype data, using approaches such as the coalescent and forward in time simulation. We outline ways to validate simulated data and genomic prediction results, including cross-validation. The accuracy and bias of genomic prediction are highlighted as performance indicators that should be reported. We suggest that a measure of relatedness between the reference and validation individuals be reported, as its impact on the accuracy of genomic prediction is substantial. A large number of methods were compared in example simulated and real (pine and wheat) data sets, all of which are publicly available. 
In our limited simulations, most methods performed similarly in traits with a large number of quantitative trait loci (QTL), whereas in traits with fewer QTL variable selection did have some advantages. In the real data sets examined here all methods had very similar accuracies. We conclude that no single method can serve as a benchmark for genomic prediction. We recommend comparing accuracy and bias of new methods to results from genomic best linear prediction and a variable selection approach (e.g., BayesB), because, together, these methods are appropriate for a range of genetic architectures. An accompanying article in this issue provides a comprehensive review of genomic prediction methods and discusses a selection of topics related to application of genomic prediction in plants and animals.
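The two performance indicators the article asks authors to report can be computed in a few lines. This sketch assumes the common definitions: accuracy as the Pearson correlation between predicted and observed (or, in simulations, true) values in the validation set, and bias as the slope of the regression of observed on predicted values, where 1 indicates no inflation or deflation. It also shows a seeded k-fold split for cross-validation. The function names are hypothetical, and the article's exact definitions may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def accuracy_and_bias(observed: Sequence[float],
                      predicted: Sequence[float]) -> Tuple[float, float]:
    """Accuracy = Pearson r(observed, predicted); bias = slope of observed on predicted."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    spp = sum((p - mp) ** 2 for p in predicted)
    soo = sum((o - mo) ** 2 for o in observed)
    return sop / math.sqrt(soo * spp), sop / spp


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 with a fixed seed and deal the indices into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]
```

For instance, predictions [0.5, 1, 1.5, 2] for observed values [1, 2, 3, 4] give an accuracy of 1 but a slope of 2: the ranking is perfect, yet the predictions are too compressed. This is why the two indicators are worth reporting together.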

Journal Article
01 Jun 2013-Genetics
TL;DR: Many aspects of autophagy are conserved from yeast to human; in particular, this applies to the gene products mediating these pathways as well as some of the signaling cascades regulating it, so that the information the authors relate is relevant to higher eukaryotes.
Abstract: Autophagy refers to a group of processes that involve degradation of cytoplasmic components including cytosol, macromolecular complexes, and organelles, within the vacuole or the lysosome of higher eukaryotes. The various types of autophagy have attracted increasing attention for at least two reasons. First, autophagy provides a compelling example of dynamic rearrangements of subcellular membranes involving issues of protein trafficking and organelle identity, and thus it is fascinating for researchers interested in questions pertinent to basic cell biology. Second, autophagy plays a central role in normal development and cell homeostasis, and, as a result, autophagic dysfunctions are associated with a range of illnesses including cancer, diabetes, myopathies, some types of neurodegeneration, and liver and heart diseases. That said, this review focuses on autophagy in yeast. Many aspects of autophagy are conserved from yeast to human; in particular, this applies to the gene products mediating these pathways as well as some of the signaling cascades regulating it, so that the information we relate is relevant to higher eukaryotes. Indeed, as with many cellular pathways, the initial molecular insights were made possible due to genetic studies in Saccharomyces cerevisiae and other fungi.

Journal Article
01 Jul 2013-Genetics
TL;DR: The accuracy of GEBVs based on additive-genetic relationships can decline with increasing training data size, and the authors speculate that jointly modeling polygenic effects via pedigree relationships and genomic breeding values with Bayesian methods may prevent that decline. Because genomic BLUP cannot capture short-range LD information well, Bayesian methods with t-distributed priors are recommended.
Abstract: Genomic best linear unbiased prediction (BLUP) is a statistical method that uses relationships between individuals calculated from single-nucleotide polymorphisms (SNPs) to capture relationships at quantitative trait loci (QTL). We show that genomic BLUP exploits not only linkage disequilibrium (LD) and additive-genetic relationships, but also cosegregation to capture relationships at QTL. Simulations were used to study the contributions of those types of information to accuracy of genomic estimated breeding values (GEBVs), their persistence over generations without retraining, and their effect on the correlation of GEBVs within families. We show that accuracy of GEBVs based on additive-genetic relationships can decline with increasing training data size and speculate that modeling polygenic effects via pedigree relationships jointly with genomic breeding values using Bayesian methods may prevent that decline. Cosegregation information from half sibs contributes little to accuracy of GEBVs in current dairy cattle breeding schemes, but from full sibs it contributes considerably to accuracy within family in corn breeding. Cosegregation information also declines with increasing training data size, and its persistence over generations is lower than that of LD, suggesting the need to model LD and cosegregation explicitly. The correlation between GEBVs within families depends largely on additive-genetic relationship information, which is determined by the effective number of SNPs and training data size. As genomic BLUP cannot capture short-range LD information well, we recommend Bayesian methods with t-distributed priors.
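Genomic BLUP replaces the pedigree relationship matrix with one computed from SNPs. A widely used construction, VanRaden's first method, centres the 0/1/2 genotype matrix by twice each allele frequency and scales ZZ′ by 2Σp(1 - p). The abstract does not say which construction the article uses, so treat this choice as an assumption. Allele frequencies are also estimated from the sample itself here, and the function name is hypothetical.

```python
from typing import List


def genomic_relationship(M: List[List[int]]) -> List[List[float]]:
    """VanRaden-style G = Z Z' / (2 * sum_j p_j (1 - p_j)), with M coded 0/1/2."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]  # allele frequencies
    Z = [[row[j] - 2 * p[j] for j in range(m)] for row in M]    # centred genotypes
    denom = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(Z[a][k] * Z[b][k] for k in range(m)) / denom for b in range(n)]
            for a in range(n)]
```

G then takes the place of the pedigree relationship matrix in the usual mixed-model equations.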

Journal Article
01 Jun 2013-Genetics
TL;DR: This review presents IBD as the framework connecting evolutionary and coalescent theory with the analysis of genetic data observed on individuals, and focuses on the high variance of the processes that determine IBD, its changes across the genome, and its impact on observable data.
Abstract: Gene identity by descent (IBD) is a fundamental concept that underlies genetically mediated similarities among relatives. Gene IBD is traced through ancestral meioses and is defined relative to founders of a pedigree, or to some time point or mutational origin in the coalescent of a set of extant genes in a population. The random process underlying changes in the patterns of IBD across the genome is recombination, so the natural context for defining IBD is the ancestral recombination graph (ARG), which specifies the complete ancestry of a collection of chromosomes. The ARG determines both the sequence of coalescent ancestries across the chromosome and the extant segments of DNA descending unbroken by recombination from their most recent common ancestor (MRCA). DNA segments IBD from a recent common ancestor have high probability of being of the same allelic type. Non-IBD DNA is modeled as of independent allelic type, but the population frame of reference for defining allelic independence can vary. Whether of IBD, allelic similarity, or phenotypic covariance, comparisons may be made to other genomic regions of the same gametes, or to the same genomic regions in other sets of gametes or diploid individuals. In this review, I present IBD as the framework connecting evolutionary and coalescent theory with the analysis of genetic data observed on individuals. I focus on the high variance of the processes that determine IBD, its changes across the genome, and its impact on observable data.

Journal Article
01 Nov 2013-Genetics
TL;DR: A novel method of targeted gene disruption that involves direct injection of recombinant Cas9 protein complexed with guide RNA into the gonad of the nematode Caenorhabditis elegans is presented.
Abstract: We present a novel method of targeted gene disruption that involves direct injection of recombinant Cas9 protein complexed with guide RNA into the gonad of the nematode Caenorhabditis elegans. Biallelic mutants were recovered among the F1 progeny, demonstrating the high efficiency of this method.

Journal Article
01 Jan 2013-Genetics
TL;DR: A matrix of dominant genomic relationships across individuals, D, is described, similar to the G matrix used in genomic best linear unbiased prediction, which can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population.
Abstract: Genomic evaluation models can fit additive and dominant SNP effects. Under quantitative genetics theory, additive or "breeding" values of individuals are generated by substitution effects, which involve both "biological" additive and dominant effects of the markers. Dominance deviations include only a portion of the biological dominant effects of the markers. Additive variance includes variation due to the additive and dominant effects of the markers. We describe a matrix of dominant genomic relationships across individuals, D, which is similar to the G matrix used in genomic best linear unbiased prediction. This matrix can be used in a mixed-model context for genomic evaluations or to estimate dominant and additive variances in the population. From the "genotypic" value of individuals, an alternative parameterization defines additive and dominance as the parts attributable to the additive and dominant effect of the markers. This approach underestimates the additive genetic variance and overestimates the dominance variance. Transforming the variances from one model into the other is trivial if the distribution of allelic frequencies is known. We illustrate these results with mouse data (four traits, 1884 mice, and 10,946 markers) and simulated data (2100 individuals and 10,000 markers). Variance components were estimated correctly in the model considering breeding values and dominance deviations. For the model considering genotypic values, the inclusion of dominant effects biased the estimate of additive variance. Genomic models were more accurate for the estimation of variance components than their pedigree-based counterparts.
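The D matrix can be sketched by analogy with G. Assume genotypes are coded as copies of allele A (2 = AA, 1 = Aa, 0 = aa), with p_j the frequency of A at marker j and q_j = 1 - p_j. Each genotype is mapped to a dominance-deviation coding (-2q_j² for AA, 2p_jq_j for Aa, -2p_j² for aa), collected in a matrix H, and D = HH′ / Σ_j(2p_jq_j)². This is our reading of the breeding-value and dominance-deviation parameterization, not code from the article; frequencies are estimated from the sample, and the function name is hypothetical.

```python
from typing import List


def dominance_relationship(M: List[List[int]]) -> List[List[float]]:
    """D = H H' / sum_j (2 p_j q_j)^2, with M coded as copies of allele A (0/1/2)."""
    n, m = len(M), len(M[0])
    p = [sum(row[j] for row in M) / (2 * n) for j in range(m)]  # frequency of A
    H = []
    for row in M:
        h = []
        for j, g in enumerate(row):
            pj, qj = p[j], 1.0 - p[j]
            # Dominance-deviation coding for AA, Aa and aa respectively.
            h.append({2: -2 * qj * qj, 1: 2 * pj * qj, 0: -2 * pj * pj}[g])
        H.append(h)
    denom = sum((2 * pj * (1 - pj)) ** 2 for pj in p)
    return [[sum(H[a][k] * H[b][k] for k in range(m)) / denom for b in range(n)]
            for a in range(n)]
```

D can then enter a mixed model alongside G, with separate additive and dominance variance components.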

Journal Article
01 Jan 2013-Genetics
TL;DR: A new statistic, denoted hapFLK, is introduced that focuses on differences of haplotype frequencies between populations rather than allele frequencies; it is robust with respect to bottlenecks and migration and improves over existing approaches in many situations.
Abstract: The detection of molecular signatures of selection is one of the major concerns of modern population genetics. A widely used strategy in this context is to compare samples from several populations and to look for genomic regions with outstanding genetic differentiation between these populations. Genetic differentiation is generally based on allele frequency differences between populations, which are measured by FST or related statistics. Here we introduce a new statistic, denoted hapFLK, which focuses instead on the differences of haplotype frequencies between populations. In contrast to most existing statistics, hapFLK accounts for the hierarchical structure of the sampled populations. Using computer simulations, we show that each of these two features (the use of haplotype information and of the hierarchical structure of populations) significantly improves the detection power of selected loci and that combining them in the hapFLK statistic provides even greater power. We also show that hapFLK is robust with respect to bottlenecks and migration and improves over existing approaches in many situations. Finally, we apply hapFLK to a set of six sheep breeds from Northern Europe and identify seven regions under selection, which include already reported regions but also several new ones. We propose a method to help identify the population(s) under selection in a detected region, which reveals that in many of these regions selection most likely occurred in more than one population. Furthermore, several of the detected regions correspond to incomplete sweeps, where the favorable haplotype is only at intermediate frequency in the population(s) under selection.

Journal Article
01 May 2013-Genetics
TL;DR: Modifications are introduced to the rjMCMC algorithms that remove the constraint on the new species divergence time when splitting and alter the gene trees to remove incompatibilities, and are found to improve mixing of the Markov chain for both simulated and empirical data sets.
Abstract: Several computational methods have recently been proposed for delimiting species using multilocus sequence data. Among them, the Bayesian method of Yang and Rannala uses the multispecies coalescent model in the likelihood framework to calculate the posterior probabilities for the different species-delimitation models. It has a sound statistical basis and is found to have nice statistical properties in simulation studies, such as low error rates of undersplitting and oversplitting. However, the method suffers from poor mixing of the reversible-jump Markov chain Monte Carlo (rjMCMC) algorithms. Here, we describe several modifications to the algorithms. We propose a flexible prior that allows the user to specify the probability that each node on the guide tree represents a true speciation event. We also introduce modifications to the rjMCMC algorithms that remove the constraint on the new species divergence time when splitting and alter the gene trees to remove incompatibilities. The new algorithms are found to improve mixing of the Markov chain for both simulated and empirical data sets.

Journal Article
01 Feb 2013-Genetics
TL;DR: The key players that mediate secretory protein biogenesis and trafficking are discussed, highlighting recent advances that have deepened the understanding of the complexity of this conserved and essential process.
Abstract: The secretory pathway is responsible for the synthesis, folding, and delivery of a diverse array of cellular proteins. Secretory protein synthesis begins in the endoplasmic reticulum (ER), which is charged with the tasks of correctly integrating nascent proteins and ensuring correct post-translational modification and folding. Once ready for forward traffic, proteins are captured into ER-derived transport vesicles that form through the action of the COPII coat. COPII-coated vesicles are delivered to the early Golgi via distinct tethering and fusion machineries. Escaped ER residents and other cycling transport machinery components are returned to the ER via COPI-coated vesicles, which undergo similar tethering and fusion reactions. Ultimately, organelle structure, function, and cell homeostasis are maintained by modulating protein and lipid flux through the early secretory pathway. In the last decade, structural and mechanistic studies have added greatly to the strong foundation of yeast genetics on which this field was built. Here we discuss the key players that mediate secretory protein biogenesis and trafficking, highlighting recent advances that have deepened our understanding of the complexity of this conserved and essential process.

Journal Article
01 May 2013-Genetics
TL;DR: The results combined with those previously published show that a more complex model of admixture between Neanderthals and modern humans is necessary to account for the different levels of Neanderthal ancestry among human populations.
Abstract: Neanderthals were a group of archaic hominins that occupied most of Europe and parts of Western Asia from ∼30,000 to 300,000 years ago (KYA). They coexisted with modern humans during part of this time. Previous genetic analyses that compared a draft sequence of the Neanderthal genome with genomes of several modern humans concluded that Neanderthals made a small (1-4%) contribution to the gene pools of all non-African populations. This observation was consistent with a single episode of admixture from Neanderthals into the ancestors of all non-Africans when the two groups coexisted in the Middle East 50-80 KYA. We examined the relationship between Neanderthals and modern humans in greater detail by applying two complementary methods to the published draft Neanderthal genome and an expanded set of high-coverage modern human genome sequences. We find that, consistent with the recent finding of Meyer et al. (2012), Neanderthals contributed more DNA to modern East Asians than to modern Europeans. Furthermore, we find that the Maasai of East Africa have a small but significant fraction of Neanderthal DNA. Because our analysis includes several genomic samples from each modern human population considered, we are able to document the extent of variation in Neanderthal ancestry within and among populations. Our results combined with those previously published show that a more complex model of admixture between Neanderthals and modern humans is necessary to account for the different levels of Neanderthal ancestry among human populations. In particular, at least some Neanderthal-modern human admixture must postdate the separation of the ancestors of modern European and modern East Asian populations.

Journal ArticleDOI
01 Aug 2013-Genetics
TL;DR: This Review seeks to provide an historical perspective on the impact of molecular biology on the authors' understanding of resistance and to begin to look forward to the likely impact of rapid advances in both sequencing and genome-wide association analysis.
Abstract: The past 60 years have seen a revolution in our understanding of the molecular genetics of insecticide resistance. While at first the field was split by arguments about the relative importance of mono- vs. polygenic resistance and field- vs. laboratory-based selection, the application of molecular cloning to insecticide targets and to the metabolic enzymes that degrade insecticides before they reach those targets has brought about an exponential growth in our understanding of the mutations involved. Molecular analysis has confirmed the relative importance of single major genes in target-site resistance and has also revealed some interesting surprises about the multi-gene families, such as cytochrome P450s, involved in metabolic resistance. Identification of the mutations involved in resistance has also led to parallel advances in our understanding of the enzymes and receptors involved, often with implications for the role of these receptors in humans. This Review seeks to provide an historical perspective on the impact of molecular biology on our understanding of resistance and to begin to look forward to the likely impact of rapid advances in both sequencing and genome-wide association analysis.

Journal ArticleDOI
01 Aug 2013-Genetics
TL;DR: This work sequenced eight genomes produced by a mutation-accumulation experiment in Drosophila melanogaster to estimate genome-wide rates of large deletions and tandem duplications and reveals that point mutation and small indel rates vary significantly between the two different genetic backgrounds examined.
Abstract: Because spontaneous mutation is the source of all genetic diversity, measuring mutation rates can reveal how natural selection drives patterns of variation within and between species. We sequenced eight genomes produced by a mutation-accumulation experiment in Drosophila melanogaster. Our analysis reveals that point mutation and small indel rates vary significantly between the two different genetic backgrounds examined. We also find evidence that ∼2% of mutational events affect multiple closely spaced nucleotides. Unlike previous similar experiments, we were able to estimate genome-wide rates of large deletions and tandem duplications. These results suggest that, at least in inbred lines like those examined here, mutational pressures may result in net growth rather than contraction of the Drosophila genome. By comparing our mutation rate estimates to polymorphism data, we are able to estimate the fraction of new mutations that are eliminated by purifying selection. These results suggest that ∼99% of duplications and deletions are deleterious, making them 10 times more likely to be removed by selection than nonsynonymous mutations. Our results illuminate not only the rates of new small- and large-scale mutations, but also the selective forces that they encounter once they arise.
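The abstract combines two quantities: a per-site mutation rate from mutation-accumulation lines and the share of new mutations removed by purifying selection. Both can be sketched as simple ratios. This Python sketch is a simplified illustration, not the authors' pipeline: it assumes diploid lines in which each callable site is counted twice, and it assumes that, absent selection, polymorphism in a mutation class relative to a neutral reference class would scale with that class's mutation rate. The function names and all numbers are hypothetical.

```python
def per_site_rate(n_mutations, callable_sites, generations, n_lines, ploidy=2):
    """Mutations per site per generation, pooled across mutation-accumulation lines."""
    return n_mutations / (callable_sites * ploidy * generations * n_lines)


def fraction_removed(observed_poly, neutral_poly, class_rate, neutral_rate):
    """Fraction of new mutations in a class removed by selection before becoming polymorphisms.

    Without selection, the class's polymorphism is assumed to equal the neutral
    polymorphism scaled by the class's relative mutation rate.
    """
    expected_poly = neutral_poly * (class_rate / neutral_rate)
    return 1.0 - observed_poly / expected_poly


# Hypothetical inputs, chosen only to illustrate the arithmetic.
mu = per_site_rate(n_mutations=60, callable_sites=100_000_000, generations=150, n_lines=1)
removed = fraction_removed(observed_poly=0.0002, neutral_poly=0.01,
                           class_rate=2e-9, neutral_rate=1e-9)
print(f"mu = {mu:.2e} per site per generation; fraction removed = {removed:.2f}")
```

Real analyses must also account for sites that cannot be called, false-negative rates, and differences between genetic backgrounds, all of which this sketch ignores.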

Journal ArticleDOI
01 Mar 2013-Genetics
TL;DR: This review summarizes the current understanding of how cation balance is achieved and modulated in baker’s yeast and discusses the mechanisms that allow cells to maintain appropriate intracellular cation concentrations when challenged by extreme conditions, i.e., either limited availability or toxic levels in the environment.
Abstract: All living organisms require nutrient minerals for growth and have developed mechanisms to acquire, utilize, and store nutrient minerals effectively. In the aqueous cellular environment, these elements exist as charged ions that, together with protons and hydroxide ions, facilitate biochemical reactions and establish the electrochemical gradients across membranes that drive cellular processes such as transport and ATP synthesis. Metal ions serve as essential enzyme cofactors and perform both structural and signaling roles within cells. However, because these ions can also be toxic, cells have developed sophisticated homeostatic mechanisms to regulate their levels and avoid toxicity. Studies in Saccharomyces cerevisiae have characterized many of the gene products and processes responsible for acquiring, utilizing, storing, and regulating levels of these ions. Findings in this model organism have often allowed the corresponding machinery in humans to be identified and have provided insights into diseases that result from defects in ion homeostasis. This review summarizes our current understanding of how cation balance is achieved and modulated in baker's yeast. Control of intracellular pH is discussed, as well as uptake, storage, and efflux mechanisms for the alkali metal cations, Na⁺ and K⁺; the divalent cations, Ca²⁺ and Mg²⁺; and the trace metal ions, Fe²⁺, Zn²⁺, Cu²⁺, and Mn²⁺. Signal transduction pathways that are regulated by pH and Ca²⁺ are reviewed, as well as the mechanisms that allow cells to maintain appropriate intracellular cation concentrations when challenged by extreme conditions, i.e., either limited availability or toxic levels in the environment.

Journal ArticleDOI
01 Feb 2013-Genetics
TL;DR: The relationship between FST and the frequency of the most frequent allele is examined, demonstrating that the range of values that FST can take is restricted considerably by the allele-frequency distribution and providing a conceptual basis for understanding the dependence of FST on allele frequencies and genetic diversity.
Abstract: F(ST) is frequently used as a summary of genetic differentiation among groups. It has been suggested that F(ST) depends on the allele frequencies at a locus, as it exhibits a variety of peculiar properties related to genetic diversity: higher values for biallelic single-nucleotide polymorphisms (SNPs) than for multiallelic microsatellites, low values among high-diversity populations viewed as substantially distinct, and low values for populations that differ primarily in their profiles of rare alleles. A full mathematical understanding of the dependence of F(ST) on allele frequencies, however, has been elusive. Here, we examine the relationship between F(ST) and the frequency of the most frequent allele, demonstrating that the range of values that F(ST) can take is restricted considerably by the allele-frequency distribution. For a two-population model, we derive strict bounds on F(ST) as a function of the frequency M of the allele with highest mean frequency between the pair of populations. Using these bounds, we show that for a value of M chosen uniformly between 0 and 1 at a multiallelic locus whose number of alleles is left unspecified, the mean maximum F(ST) is ∼0.3585. Further, F(ST) is restricted to values much less than 1 when M is low or high, and the contribution to the maximum F(ST) made by the most frequent allele is on average ∼0.4485. Using bounds on homozygosity that we have previously derived as functions of M, we describe strict bounds on F(ST) in terms of the homozygosity of the total population, finding that the mean maximum F(ST) given this homozygosity is 1 - ln 2 ≈ 0.3069. Our results provide a conceptual basis for understanding the dependence of F(ST) on allele frequencies and genetic diversity and for interpreting the roles of these quantities in computations of F(ST) from population-genetic data. 
Further, our analysis suggests that many unusual observations of F(ST), including the relatively low F(ST) values in high-diversity human populations from Africa and the relatively low estimates of F(ST) for microsatellites compared to SNPs, can be understood not as biological phenomena associated with different groups of populations or classes of markers but rather as consequences of the intrinsic mathematical dependence of F(ST) on the properties of allele-frequency distributions.
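The dependence of F(ST) on allele frequencies shows up in a small calculation. The Python sketch below assumes the common two-population definition F(ST) = (H_T − H_S)/H_T, where H_S is the mean within-population expected heterozygosity and H_T is the heterozygosity of the pooled (mean) allele frequencies; the paper's exact definition should be checked before relying on it. The function names are illustrative.

```python
def heterozygosity(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) for one vector of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pooled_frequencies(pop1, pop2):
    """Mean allele frequencies across the two populations."""
    if len(pop1) != len(pop2):
        raise ValueError("populations must list frequencies for the same alleles")
    return [(a + b) / 2 for a, b in zip(pop1, pop2)]


def fst(pop1, pop2):
    """Two-population F_ST = (H_T - H_S) / H_T."""
    h_s = (heterozygosity(pop1) + heterozygosity(pop2)) / 2
    h_t = heterozygosity(pooled_frequencies(pop1, pop2))
    if h_t == 0:
        raise ValueError("F_ST is undefined at a monomorphic locus")
    return (h_t - h_s) / h_t


def most_frequent_mean(pop1, pop2):
    """M: the highest mean frequency of any allele across the two populations."""
    return max(pooled_frequencies(pop1, pop2))


# Both pairs share no alleles, i.e., they are completely differentiated.
low_diversity = ([1.0, 0.0], [0.0, 1.0])
high_diversity = ([0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5])
for pops in (low_diversity, high_diversity):
    print(f"M = {most_frequent_mean(*pops):.2f}, F_ST = {fst(*pops):.4f}")
```

The biallelic pair reaches F(ST) = 1 (M = 0.5), while the four-allele pair, equally distinct, gives only F(ST) = 1/3 (M = 0.25). This matches the abstract's point that low F(ST) among high-diversity populations can arise from the mathematics of allele-frequency distributions rather than from biology.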

Journal ArticleDOI
L. Ryan Baugh
01 Jul 2013-Genetics
TL;DR: The first larval stage of Caenorhabditis elegans can also reversibly arrest development in response to starvation; this state, known as "L1 diapause", is accompanied by increased stress resistance.
Abstract: It is widely appreciated that larvae of the nematode Caenorhabditis elegans arrest development by forming dauer larvae in response to multiple unfavorable environmental conditions. C. elegans larvae can also reversibly arrest development earlier, during the first larval stage (L1), in response to starvation. "L1 arrest" (also known as "L1 diapause") occurs without morphological modification but is accompanied by increased stress resistance. Caloric restriction and periodic fasting can extend adult lifespan, and developmental models are critical to understanding how the animal is buffered from fluctuations in nutrient availability, impacting lifespan. L1 arrest provides an opportunity to study nutritional control of development. Given its relevance to aging, diabetes, obesity and cancer, interest in L1 arrest is increasing, and signaling pathways and gene regulatory mechanisms controlling arrest and recovery have been characterized. Insulin-like signaling is a critical regulator, and it is modified by and acts through microRNAs. DAF-18/PTEN, AMP-activated kinase and fatty acid biosynthesis are also involved. The nervous system, epidermis, and intestine contribute systemically to regulation of arrest, but cell-autonomous signaling likely contributes to regulation in the germline. A relatively small number of genes affecting starvation survival during L1 arrest are known, and many of them also affect adult lifespan, reflecting a common genetic basis ripe for exploration. mRNA expression is well characterized during arrest, recovery, and normal L1 development, providing a metazoan model for nutritional control of gene expression. In particular, post-recruitment regulation of RNA polymerase II is under nutritional control, potentially contributing to a rapid and coordinated response to feeding. 
The phenomenology of L1 arrest will be reviewed, as well as regulation of developmental arrest and starvation survival by various signaling pathways and gene regulatory mechanisms.

Journal ArticleDOI
01 Apr 2013-Genetics
TL;DR: This review will focus on repair mechanisms that involve excision of a single strand from duplex DNA with the intact, complementary strand serving as a template to fill the resulting gap.
Abstract: DNA repair mechanisms are critical for maintaining the integrity of genomic DNA, and their loss is associated with cancer predisposition syndromes. Studies in Saccharomyces cerevisiae have played a central role in elucidating the highly conserved mechanisms that promote eukaryotic genome stability. This review will focus on repair mechanisms that involve excision of a single strand from duplex DNA with the intact, complementary strand serving as a template to fill the resulting gap. These mechanisms are of two general types: those that remove damage from DNA and those that repair errors made during DNA synthesis. The major DNA-damage repair pathways are base excision repair and nucleotide excision repair, which, in the simplest terms, are distinguished by the extent of single-strand DNA removed together with the lesion. Mistakes made by DNA polymerases are corrected by the mismatch repair pathway, which also corrects mismatches generated when single strands of non-identical duplexes are exchanged during homologous recombination. In addition to the true repair pathways, the postreplication repair pathway allows lesions or structural aberrations that block replicative DNA polymerases to be tolerated. There are two bypass mechanisms: an error-free mechanism that involves a switch to an undamaged template for synthesis past the lesion and an error-prone mechanism that utilizes specialized translesion synthesis DNA polymerases to directly synthesize DNA across the lesion. A high level of functional redundancy exists among the pathways that deal with lesions, which minimizes the detrimental effects of endogenous and exogenous DNA damage.

Journal ArticleDOI
01 Aug 2013-Genetics
TL;DR: SLiM is an efficient forward population genetic simulation designed to study the effects of linkage and selection on a chromosome-wide scale, and it can incorporate complex scenarios of demography and population substructure.
Abstract: SLiM is an efficient forward population genetic simulation designed for studying the effects of linkage and selection on a chromosome-wide scale. The program can incorporate complex scenarios of demography and population substructure, various models for selection and dominance of new mutations, arbitrary gene structure, and user-defined recombination maps.
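SLiM is a standalone program, and the Python sketch below is not SLiM code or its algorithm. It shows only the basic generation loop that forward simulators build on, for a single diploid locus with no linkage, mutation, or demography. Selection with genotype fitnesses 1 + s, 1 + hs, and 1 shifts the expected allele frequency, and binomial sampling of 2N gene copies adds genetic drift. The function `wright_fisher` and its parameters are hypothetical.

```python
import random


def wright_fisher(p0, n, s, h, generations, seed=None):
    """Forward-simulate the frequency of allele A in a diploid Wright-Fisher population of size n.

    Genotype fitnesses: AA = 1 + s, Aa = 1 + h*s, aa = 1. Returns the frequency trajectory,
    stopping early once the allele is lost or fixed.
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        q = 1.0 - p
        w_aa_hom, w_het, w_ref = 1.0 + s, 1.0 + h * s, 1.0
        w_bar = p * p * w_aa_hom + 2 * p * q * w_het + q * q * w_ref  # mean fitness
        p_sel = (p * p * w_aa_hom + p * q * w_het) / w_bar  # frequency after selection
        # Drift: draw 2n gene copies for the next generation (binomial sampling).
        count = sum(1 for _ in range(2 * n) if rng.random() < p_sel)
        p = count / (2 * n)
        trajectory.append(p)
        if p in (0.0, 1.0):  # lost or fixed: absorbing states
            break
    return trajectory


traj = wright_fisher(p0=0.05, n=500, s=0.02, h=0.5, generations=1000, seed=1)
print(len(traj) - 1, "generations simulated; final frequency", traj[-1])
```

Simulating linkage, as SLiM does, requires tracking whole haplotypes and recombination breakpoints rather than a single frequency, which is why dedicated tools are needed for chromosome-scale questions.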

Journal ArticleDOI
01 Jul 2013-Genetics
TL;DR: This work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes.
Abstract: Throughout history, the population size of modern humans has varied considerably due to changes in environment, culture, and technology. More accurate estimates of population size changes, and when they occurred, should provide a clearer picture of human colonization history and help remove confounding effects from natural selection inference. Demography influences the pattern of genetic variation in a population, and thus genomic data of multiple individuals sampled from one or more present-day populations contain valuable information about the past demographic history. Recently, Li and Durbin developed a coalescent-based hidden Markov model, called the pairwise sequentially Markovian coalescent (PSMC), for a pair of chromosomes (or one diploid individual) to estimate past population sizes. This is an efficient, useful approach, but its accuracy in the very recent past is hampered by the fact that, because of the small sample size, only a few coalescence events occur in that period. Multiple genomes from the same population contain more information about the recent past, but are also more computationally challenging to study jointly in a coalescent framework. Here, we present a new coalescent-based method that can efficiently infer population size changes from multiple genomes, providing access to a new store of information about the recent past. Our work generalizes the recently developed sequentially Markov conditional sampling distribution framework, which provides an accurate approximation of the probability of observing a newly sampled haplotype given a set of previously sampled haplotypes. Simulation results demonstrate that we can accurately reconstruct the true population histories, with a significant improvement over the PSMC in the recent past. We apply our method, called diCal, to the genomes of multiple human individuals of European and African ancestry to obtain a detailed population size change history during recent times.
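The abstract's point that a pair of chromosomes carries little information about the recent past follows from standard coalescent arithmetic. Assuming Kingman's coalescent with constant population size, time measured in units of 2N generations, and k lineages merging at rate k(k − 1)/2, the Python sketch below computes the expected wait until the first coalescence and the expected time to the most recent common ancestor (TMRCA). It illustrates only the sample-size argument; it is not diCal or the PSMC.

```python
from math import comb


def expected_first_coalescence(n):
    """Expected waiting time (units of 2N generations) until any two of n lineages coalesce."""
    return 1.0 / comb(n, 2)


def expected_tmrca(n):
    """Expected TMRCA of n lineages: the sum of 1 / C(k, 2) over k = 2..n."""
    return sum(1.0 / comb(k, 2) for k in range(2, n + 1))


for n in (2, 10, 50):
    print(f"n={n:>2}: first coalescence {expected_first_coalescence(n):.4f}, "
          f"TMRCA {expected_tmrca(n):.4f}")
```

With two haplotypes the only coalescence is expected after one time unit. With ten, the first coalescence is expected after 1/45 ≈ 0.022 units, while the TMRCA grows only to 1.8 (its limit is 2). Adding genomes therefore places extra coalescence events mainly in the recent past, which is the information a multi-genome method can exploit.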

Journal ArticleDOI
01 Jan 2013-Genetics
TL;DR: Failure of LD and peroxisome biogenesis and function is likely to lead to deregulated lipid fluxes and disrupted energy homeostasis with detrimental consequences for the cell.
Abstract: Lipid droplets (LDs) and peroxisomes are central players in cellular lipid homeostasis: some of their main functions are to control the metabolic flux and availability of fatty acids (LDs and peroxisomes) as well as of sterols (LDs). Both fatty acids and sterols serve multiple functions in the cell: as membrane stabilizers affecting membrane fluidity, as crucial structural elements of membrane-forming phospholipids and sphingolipids, as protein modifiers and signaling molecules, and, last but not least, as a rich carbon and energy source. In addition, peroxisomes harbor enzymes of the malic acid shunt, which is indispensable to regenerate oxaloacetate for gluconeogenesis, thus allowing yeast cells to generate sugars from fatty acids or nonfermentable carbon sources. Therefore, failure of LD and peroxisome biogenesis and function is likely to lead to deregulated lipid fluxes and disrupted energy homeostasis with detrimental consequences for the cell. These pathological consequences of LD and peroxisome failure have indeed sparked great biomedical interest in understanding the biogenesis of these organelles, their functional roles in lipid homeostasis, their interaction with cellular metabolism and other organelles, and their regulation, turnover, and inheritance. These questions are particularly pressing in view of the pandemic spread of lipid-associated disorders worldwide.