Showing papers in "arXiv: Populations and Evolution in 2013"


Posted Content
TL;DR: It is shown that for many diseases, rare alleles are unlikely to contribute a large fraction of the heritable variation, and therefore the impact of recent growth is likely to be modest. However, for diseases that have a direct impact on fitness, strongly deleterious rare mutations probably do have an important role, and recent growth will have increased their impact.
Abstract: Human populations have undergone dramatic changes in population size in the past 100,000 years, including a severe bottleneck of non-African populations and recent explosive population growth. There is currently great interest in how these demographic events may have affected the burden of deleterious mutations in individuals and the allele frequency spectrum of disease mutations in populations. Here we use population genetic models to show that--contrary to previous conjectures--recent human demography has likely had very little impact on the average burden of deleterious mutations carried by individuals. This prediction is supported by exome sequence data showing that African American and European American individuals carry very similar burdens of damaging mutations. We next consider whether recent population growth has increased the importance of very rare mutations in complex traits. Our analysis predicts that for most classes of disease variants, rare alleles are unlikely to contribute a large fraction of the total genetic variance, and that the impact of recent growth is likely to be modest. However, for diseases that have a direct impact on fitness, strongly deleterious rare mutations likely do play important roles, and the impact of very rare mutations will be far greater as a result of recent growth. In summary, demographic history has dramatically impacted patterns of variation in different human populations, but these changes have likely had little impact on either genetic load or on the importance of rare variants for most complex traits.

246 citations


Posted Content
TL;DR: It is found that deleterious mutations accumulate steadily on the wave front during range expansions, thus creating an expansion load that can persist and represent a major fraction of the total mutation load for thousands of generations after the expansion.
Abstract: We investigate the effect of spatial range expansions on the evolution of fitness when beneficial and deleterious mutations co-segregate. We perform individual-based simulations of a uniform linear habitat and complement them with analytical approximations for the evolution of mean fitness at the edge of the expansion. We find that deleterious mutations accumulate steadily on the wave front during range expansions, thus creating an expansion load. Reduced fitness due to the expansion load is not restricted to the wave front but occurs over a large proportion of newly colonized habitats. The expansion load can persist and represent a major fraction of the total mutation load thousands of generations after the expansion. Our results extend qualitatively and quantitatively to two-dimensional expansions. The phenomenon of expansion load may explain growing evidence that populations that have recently expanded, including humans, show an excess of deleterious mutations. To test the predictions of our model, we analyze patterns of neutral and non-neutral genetic diversity in humans and find an excellent fit between theory and data.

231 citations


Posted Content
TL;DR: Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE) is a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals.
Abstract: Populations can be genetically isolated both by geographic distance and by differences in their ecology or environment that decrease the rate of successful migration. Empirical studies often seek to investigate the relationship between genetic differentiation and some ecological variable(s) while accounting for geographic distance, but common approaches to this problem (such as the partial Mantel test) have a number of drawbacks. In this article, we present a Bayesian method that enables users to quantify the relative contributions of geographic distance and ecological distance to genetic differentiation between sampled populations or individuals. We model the allele frequencies in a set of populations at a set of unlinked loci as spatially correlated Gaussian processes, in which the covariance structure is a decreasing function of both geographic and ecological distance. Parameters of the model are estimated using a Markov chain Monte Carlo algorithm. We call this method Bayesian Estimation of Differentiation in Alleles by Spatial Structure and Local Ecology (BEDASSLE), and have implemented it in a user-friendly format in the statistical platform R. We demonstrate its utility with a simulation study and empirical applications to human and teosinte datasets.

202 citations
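
As a rough illustration of the covariance idea described in the BEDASSLE abstract above, the sketch below builds a covariance matrix that decays with both geographic and ecological distance. The abstract does not give the functional form, so the powered-exponential form used here, the function name `decay_covariance`, and the parameter names `a0`, `aD`, `aE`, `a2` are all assumptions made for illustration, not the package's API.

```python
from math import exp

def decay_covariance(geo, eco, a0=1.0, aD=1.0, aE=1.0, a2=1.0):
    """Covariance that decreases with geographic and ecological distance.

    ASSUMED form (not taken from the abstract):
        cov[i][j] = (1 / a0) * exp(-(aD * geo[i][j] + aE * eco[i][j]) ** a2)
    geo and eco are square, symmetric pairwise-distance matrices (lists of lists).
    """
    n = len(geo)
    return [
        [exp(-((aD * geo[i][j] + aE * eco[i][j]) ** a2)) / a0 for j in range(n)]
        for i in range(n)
    ]

# Hypothetical toy data: three populations on a line, the third in a different habitat.
geo = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
eco = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
cov = decay_covariance(geo, eco, a0=2.0, aD=0.5, aE=1.0, a2=1.0)
```

In this sketch the ratio `aE / aD` expresses how much one unit of ecological distance counts relative to one unit of geographic distance, which is one way to read "relative contributions" in the abstract.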


Posted Content
TL;DR: Simulations as well as empirical studies on genomic data show that combining gene tree–species tree models with models of sequence evolution improves gene tree reconstruction, and these better gene trees provide a more reliable basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences.
Abstract: Molecular phylogeny has focused mainly on improving models for the reconstruction of gene trees based on sequence alignments. Yet, most phylogeneticists seek to reveal the history of species. Although the histories of genes and species are tightly linked, they are seldom identical, because genes duplicate, are lost or horizontally transferred, and because alleles can co-exist in populations for periods that may span several speciation events. Building models describing the relationship between gene and species trees can thus improve the reconstruction of gene trees when a species tree is known, and vice-versa. Several approaches have been proposed to solve the problem in one direction or the other, but in general neither gene trees nor species trees are known. Only a few studies have attempted to jointly infer gene trees and species trees. In this article we review the various models that have been used to describe the relationship between gene trees and species trees. These models account for gene duplication and loss, transfer or incomplete lineage sorting. Some of them consider several types of events together, but none exists currently that considers the full repertoire of processes that generate gene trees along the species tree. Simulations as well as empirical studies on genomic data show that combining gene tree-species tree models with models of sequence evolution improves gene tree reconstruction. In turn, these better gene trees provide a better basis for studying genome evolution or reconstructing ancestral chromosomes and ancestral gene sequences. We predict that gene tree-species tree methods that can deal with genomic data sets will be instrumental to advancing our understanding of genomic evolution.

197 citations


Journal ArticleDOI
TL;DR: divMigrate-online is a web application that estimates directional components of genetic divergence between pairs of populations at low computational effort, using any of the classical or modern measures of genetic differentiation; these directional measures can further be used to calculate directional relative migration and to detect asymmetries in gene flow patterns.
Abstract: Understanding the population structure and patterns of gene flow within species is of fundamental importance to the study of evolution. In the fields of population and evolutionary genetics, measures of genetic differentiation are commonly used to gather this information. One potential caveat is that these measures assume gene flow to be symmetric. However, asymmetric gene flow is common in nature, especially in systems driven by physical processes such as wind or water currents. Since information about levels of asymmetric gene flow among populations is essential for the correct interpretation of the distribution of contemporary genetic diversity within species, this should not be overlooked. To obtain information on asymmetric migration patterns from genetic data, complex models based on maximum likelihood or Bayesian approaches generally need to be employed, often at great computational cost. Here, a new simpler and more efficient approach for understanding gene flow patterns is presented. This approach allows the estimation of directional components of genetic divergence between pairs of populations at low computational effort, using any of the classical or modern measures of genetic differentiation. These directional measures of genetic differentiation can further be used to calculate directional relative migration and to detect asymmetries in gene flow patterns. This can be done in a user-friendly web application called divMigrate-online introduced in this paper. Using simulated data sets with known gene flow regimes, we demonstrate that the method is capable of resolving complex migration patterns under a range of study designs.

186 citations


Posted Content
TL;DR: Overall extinction risk for chondrichthyans (sharks, rays, and chimaeras) is substantially higher than for most other vertebrates, and only one-third of species are considered safe.
Abstract: The rapid expansion of human activities threatens ocean-wide biodiversity loss. Numerous marine animal populations have declined, yet it remains unclear whether these trends are symptomatic of a chronic accumulation of global marine extinction risk. We present the first systematic analysis of threat for a globally-distributed lineage of 1,041 chondrichthyan fishes - sharks, rays, and chimaeras. We estimate that one-quarter are threatened according to IUCN Red List criteria due to overfishing (targeted and incidental). Large-bodied, shallow-water species are at greatest risk and five out of the seven most threatened families are rays. Overall chondrichthyan extinction risk is substantially higher than for most other vertebrates, and only one-third of species are considered safe. Population depletion has occurred throughout the world's ice-free waters, but is particularly prevalent in the Indo-Pacific Biodiversity Triangle and Mediterranean Sea. Improved management of fisheries and trade is urgently needed to avoid extinctions and promote population recovery.

183 citations


Posted Content
TL;DR: The genomic contributions of African, European, and especially Native American ancestry to these populations are explored, and identity-by-descent (IBD) and ancestry tract length are compared to find evidence for relatedness among European founders of the three populations.
Abstract: There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern America ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL ancestors split 12.2 kya, with a subsequent split of the ancestors to CLM and PUR 11.7 kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling identity-by-descent and ancestry tract length, we show that post-contact populations differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe.

155 citations


Posted Content
TL;DR: This paper showed that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to that in a population that did not expand.
Abstract: Population genetic studies have found evidence for dramatic population growth in recent human history. It is unclear how this recent population growth, combined with the effects of negative natural selection, has affected patterns of deleterious variation, as well as the number, frequencies, and effect sizes of mutations that contribute risk to complex traits. Here I use simulations under population genetic models where a proportion of the heritability of the trait is accounted for by mutations in a subset of the exome. I show that recent population growth increases the proportion of nonsynonymous variants segregating in the population, but does not affect the genetic load relative to that in a population that did not expand. Under a model where a mutation's effect on a trait is correlated with its effect on fitness, rare variants explain a greater portion of the additive genetic variance of the trait in a population that has recently expanded than in a population that did not recently expand. Further, when using a single-marker test, for a given false-positive rate and sample size, recent population growth decreases the expected number of significant associations with the trait relative to the number detected in a population that did not expand. However, in a model where there is no correlation between a mutation's effect on fitness and the effect on the trait, common variants account for much of the additive genetic variance, regardless of demography. Moreover, here demography does not affect the number of significant associations detected. These findings suggest recent population history may be an important factor influencing the power of association tests in accounting for the missing heritability of certain complex traits.

137 citations


Posted Content
TL;DR: Diversity among these strains is principally organized by geography, with European, North American, Asian, and African/S. E. Asian populations defining the major axes of genetic variation.
Abstract: The budding yeast Saccharomyces cerevisiae is important for human food production and as a model organism for biological research. The genetic diversity contained in the global population of yeast strains represents a valuable resource for a number of fields, including genetics, bioengineering, and studies of evolution and population structure. Here, we apply a multiplexed, reduced genome sequencing strategy (known as RAD-seq) to genotype a large collection of S. cerevisiae strains, isolated from a wide range of geographical locations and environmental niches. The method permits the sequencing of the same 1% of all genomes, producing a multiple sequence alignment of 116,880 bases across 262 strains. We find diversity among these strains is principally organized by geography, with European, North American, Asian and African/S. E. Asian populations defining the major axes of genetic variation. At a finer scale, small groups of strains from cacao, olives and sake are defined by unique variants not present in other strains. One population, containing strains from a variety of fermentations, exhibits high levels of heterozygosity and mixtures of alleles from European and Asian populations, indicating an admixed origin for this group. In the context of this global diversity, we demonstrate that a collection of seven strains commonly used in the laboratory encompasses only one quarter of the genetic diversity present in the full collection of strains, underscoring the relatively limited genetic diversity captured by the current set of lab strains. We propose a model of geographic differentiation followed by human-associated admixture, primarily between European and Asian populations and more recently between European and North American populations. The large collection of genotyped yeast strains characterized here will provide a useful resource for the broad community of yeast researchers.

133 citations


Posted Content
TL;DR: In this paper, a probabilistic approach is proposed to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of trees.
Abstract: Gene trees record the combination of gene level events, such as duplication, transfer and loss, and species level events, such as speciation and extinction. Gene tree-species tree reconciliation methods model these processes by drawing gene trees into the species tree using a series of gene and species level events. The reconstruction of gene trees based on sequence alone almost always involves choosing between statistically equivalent or weakly distinguishable relationships that could be much better resolved based on a putative species tree. To exploit this potential for accurate reconstruction of gene trees the space of reconciled gene trees must be explored according to a joint model of sequence evolution and gene tree-species tree reconciliation. Here we present amalgamated likelihood estimation (ALE), a probabilistic approach to exhaustively explore all reconciled gene trees that can be amalgamated as a combination of clades observed in a sample of trees. We implement ALE in the context of a reconciliation model, which allows for the duplication, transfer and loss of genes. We use ALE to efficiently approximate the sum of the joint likelihood over amalgamations and to find the reconciled gene tree that maximizes the joint likelihood. We demonstrate using simulations that gene trees reconstructed using the joint likelihood are substantially more accurate than those reconstructed using sequence alone. Using realistic topologies, branch lengths and alignment sizes, we demonstrate that ALE produces more accurate gene trees even if the model of sequence evolution is greatly simplified. Finally, examining 1099 gene families from 36 cyanobacterial genomes we find that joint likelihood-based inference results in a striking reduction in apparent phylogenetic discord, with 24%, 59% and 46% reductions in the mean numbers of duplications, transfers and losses, respectively.

106 citations


Journal ArticleDOI
TL;DR: A phylodynamic method is developed that enables the joint estimation of epidemiological parameters and phylogenetic history based on a compartmental susceptible–infected–removed (SIR) model, which provides separate information on incidence and prevalence of infections.
Abstract: The evolution of RNA viruses such as HIV, Hepatitis C and Influenza virus occurs so rapidly that the viruses' genomes contain information on past ecological dynamics. Hence, we develop a phylodynamic method that enables the joint estimation of epidemiological parameters and phylogenetic history. Based on a compartmental susceptible-infected-removed (SIR) model, this method provides separate information on incidence and prevalence of infections. Detailed information on the interaction of host population dynamics and evolutionary history can inform decisions on how to contain or entirely avoid disease outbreaks. We apply our Birth-Death SIR method (BDSIR) to two viral data sets. First, five human immunodeficiency virus type 1 clusters sampled in the United Kingdom between 1999 and 2003 are analyzed. The estimated basic reproduction ratios range from 1.9 to 3.2 among the clusters. All clusters show a decline in the growth rate of the local epidemic in the mid or late 1990s. The analysis of a hepatitis C virus (HCV) genotype 2c data set shows that the local epidemic in the Cordoban city Cruz del Eje originated around 1906 (median), coinciding with an immigration wave from Europe to central Argentina that dates from 1880 to 1920. The estimated time of epidemic peak is around 1970.
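
To make the distinction between incidence and prevalence in the abstract above concrete, here is a minimal deterministic SIR sketch integrated with a simple Euler scheme. It is not the BDSIR method, which is a stochastic birth-death model fitted to phylogenies. The function name `simulate_sir` and the parameter values are hypothetical; the only link to the abstract is that the chosen ratio `beta / gamma = 2.5` lies within the reported range of basic reproduction ratios (1.9 to 3.2).

```python
def simulate_sir(beta, gamma, s0, i0, r_init=0.0, dt=0.01, t_max=200.0):
    """Euler integration of the deterministic SIR model.

    Returns a list of rows (time, S, I, R, incidence), where
    incidence = beta * S * I / N is the rate of new infections and
    I is the prevalence (number currently infected).
    """
    s, i, r = float(s0), float(i0), float(r_init)
    n = s + i + r
    rows = []
    for step in range(int(round(t_max / dt)) + 1):
        incidence = beta * s * i / n   # new infections per unit time
        removal = gamma * i            # removals (recovery or death) per unit time
        rows.append((step * dt, s, i, r, incidence))
        s -= incidence * dt
        i += (incidence - removal) * dt
        r += removal * dt
    return rows

# Hypothetical parameters: basic reproduction ratio beta / gamma = 2.5.
traj = simulate_sir(beta=0.5, gamma=0.2, s0=999, i0=1)
peak_time, peak_s, peak_i, _, _ = max(traj, key=lambda row: row[2])
```

In this model prevalence peaks when the susceptible fraction falls to `gamma / beta`, whereas incidence peaks earlier, which is why the two quantities carry separate information.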

Posted Content
TL;DR: In this article, the origin of the self-fertilizing species Capsella rubella is analyzed, and it is shown that C. rubella was founded by multiple individuals drawn from a diverse ancestral population closely related to extant C. grandiflora.
Abstract: The shift from outcrossing to self-fertilization is among the most common transitions in plants. Until recently, however, a genome-wide view of this transition has been obscured by a dearth of appropriate data and the lack of appropriate population genomic methods to interpret such data. Here, we present novel analyses detailing the origin of the selfing species, Capsella rubella, which recently split from its outcrossing sister, Capsella grandiflora. Due to the recency of the split, most variation within C. rubella is found within C. grandiflora. We can therefore identify genomic regions where two C. rubella individuals have inherited the same or different segments of ancestral diversity (i.e. founding haplotypes) present in C. rubella's founder(s). Based on this analysis, we show that C. rubella was founded by multiple individuals drawn from a diverse ancestral population closely related to extant C. grandiflora, that drift and selection have rapidly homogenized most of this ancestral variation since C. rubella's founding, and that little novel variation has accumulated within this time. Despite the extensive loss of ancestral variation, the approximately 25% of the genome for which two C. rubella individuals have inherited different founding haplotypes makes up roughly 90% of the genetic variation between them. To extend these findings, we develop a coalescent model that utilizes the inferred frequency of founding haplotypes and variation within founding haplotypes to estimate that C. rubella was founded by a potentially large number of individuals 50-100 kya, and has subsequently experienced a 20-fold reduction in its effective population size. As population genomic data from an increasing number of outcrossing/selfing pairs are generated, analyses like the one presented here will facilitate a fine-scaled view of the evolutionary and demographic impact of the transition to self-fertilization.

Posted Content
TL;DR: This article showed that the fruit fly, Drosophila melanogaster, obeys the temperature size rule (TSR) using a novel mechanism: reduction of critical size at higher temperatures.
Abstract: Most ectotherms show an inverse relationship between developmental temperature and body size, a phenomenon known as the temperature size rule (TSR). Several competing hypotheses have been proposed to explain its occurrence. According to one set of views, the TSR results from inevitable biophysical effects of temperature on the rates of growth and differentiation, whereas other views suggest the TSR is an adaptation that can be achieved by a diversity of mechanisms in different taxa. Our data reveal that the fruit fly, Drosophila melanogaster, obeys the TSR using a novel mechanism: reduction of critical size at higher temperatures. In holometabolous insects, attainment of critical size initiates the hormonal cascade that terminates growth, and hence, Drosophila larvae appear to instigate the signal to stop growth at a smaller size at higher temperatures. This is in contrast to findings from another holometabolous insect, Manduca sexta, in which the TSR results from the effect of temperature on the rate and duration of growth. This contrast suggests that there is no single mechanism that accounts for the TSR. Instead, the TSR appears to be an adaptation that is achieved at a proximate level through different mechanisms in different taxa.

Posted Content
TL;DR: The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool‐Seq data from diverse D. melanogaster populations.
Abstract: Sequencing of pools of individuals (Pool-Seq) represents a reliable and cost-effective approach for estimating genome-wide SNP and transposable element insertion frequencies. However, Pool-Seq does not provide direct information on haplotypes so that, for example, obtaining inversion frequencies has not been possible until now. Here, we have developed a new set of diagnostic marker SNPs for 7 cosmopolitan inversions in Drosophila melanogaster that can be used to infer inversion frequencies from Pool-Seq data. We applied our novel marker set to Pool-Seq data from an experimental evolution study and from North American and Australian latitudinal clines. In the experimental evolution data, we find evidence that positive selection has driven the frequencies of In(3R)C and In(3R)Mo to increase over time. In the clinal data, we confirm the existence of frequency clines for In(2L)t, In(3L)P and In(3R)Payne in both North America and Australia and detect a previously unknown latitudinal cline for In(3R)Mo in North America. The inversion markers developed here provide a versatile and robust tool for characterizing inversion frequencies and their dynamics in Pool-Seq data from diverse D. melanogaster populations.

Book ChapterDOI
TL;DR: This chapter aims to inform a practitioner about current methods for predicting potential distributions of invasive species, covering the conceptual bases, touching on mechanistic models, and then focusing on methods using species distribution records and environmental data to predict distributions.
Abstract: This chapter aims to inform a practitioner about current methods for predicting potential distributions of invasive species. It mostly addresses single species models, covering the conceptual bases, touching on mechanistic models, and then focusing on methods using species distribution records and environmental data to predict distributions. The commentary in this last section is oriented towards key issues that arise in fitting, and predicting with, these models (which include CLIMEX, MaxEnt and other regression methods). In other words, it is more about the process of thinking about the data and the modelling problem (which is a challenging one) than it is about one technique versus another. The discussion helps clarify the necessary steps and expertise for predicting distributions. Some researchers are optimistic that correlative models will predict with high precision; while that may be true for some species at some scales of evaluation, I believe that the issues discussed in this chapter show that substantial errors are reasonably likely. I am hopeful that ongoing developments will produce models better suited to the task and tools to help practitioners to better understand predictions and their uncertainties.

Posted Content
TL;DR: It is now clear that studying RKN variation via individual marker loci may fail due to the species’ convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex.
Abstract: Meloidogyne root knot nematodes (RKN) can infect most of the world's agricultural crop species and are among the most important of all plant pathogens. As yet however we have little understanding of their origins or the genomic basis of their extreme polyphagy. The most damaging pathogens reproduce by mitotic parthenogenesis and are suggested to originate by interspecific hybridizations between unknown parental taxa. We sequenced the genome of the diploid meiotic parthenogen Meloidogyne floridensis, and use a comparative genomic approach to test the hypothesis that it was involved in the hybrid origin of the tropical mitotic parthenogen M. incognita. Phylogenomic analysis of gene families from M. floridensis, M. incognita and an outgroup species M. hapla was used to trace the evolutionary history of these species' genomes, demonstrating that M. floridensis was one of the parental species in the hybrid origins of M. incognita. Analysis of the M. floridensis genome revealed many gene loci present in divergent copies, as they are in M. incognita, indicating that it too had a hybrid origin. The triploid M. incognita is shown to be a complex double-hybrid between M. floridensis and a third, unidentified parent. The agriculturally important RKN have very complex origins involving the mixing of several parental genomes by hybridization and their extreme polyphagy and agricultural success may be related to this hybridization, producing transgressive variation on which natural selection acts. Studying RKN variation via individual marker loci may fail due to the species' convoluted origins, and multi-species population genomics is essential to understand the hybrid diversity and adaptive variation of this important species complex. This comparative genomic analysis provides a compelling example of the importance and complexity of hybridization in generating animal species diversity more generally.

Posted Content
TL;DR: The authors report on a genome-wide scan for introgression in the house mouse (Mus musculus domesticus) involving the Algerian mouse (Mus spretus), using samples from the ranges of sympatry and allopatry in Africa and Europe.
Abstract: We report on a genome-wide scan for introgression in the house mouse (Mus musculus domesticus) involving the Algerian mouse (Mus spretus), using samples from the ranges of sympatry and allopatry in Africa and Europe. Our analysis reveals wide variability in introgression signatures along the genomes, as well as across the samples. We find that fewer than half of the autosomes in each genome harbor all detectable introgression, while the X chromosome has none. Further, European mice carry more M. spretus alleles than the sympatric African ones. Using the length distribution and sharing patterns of introgressed genomic tracts across the samples, we infer, first, that at least three distinct hybridization events involving M. spretus have occurred, one of which is ancient, and the other two are recent (one presumably due to warfarin rodenticide selection). Second, several of the inferred introgressed tracts contain genes that are likely to confer adaptive advantage. Third, introgressed tracts might contain driver genes that determine the evolutionary fate of those tracts. Further, functional analysis revealed introgressed genes that are essential to fitness, including the Vkorc1 gene, which is implicated in rodenticide resistance, and olfactory receptor genes. Our findings highlight the extent and role of introgression in nature, and call for careful analysis and interpretation of house mouse data in evolutionary and genetic studies.

Posted Content
TL;DR: This paper unifies, simplifies, and extends previous work on the evolutionary dynamics of symmetric N-player matrix games with two pure strategies by making use of the theory of polynomials in Bernstein form.
Abstract: In this paper we unify, simplify, and extend previous work on the evolutionary dynamics of symmetric $N$-player matrix games with two pure strategies. In such games, gains from switching strategies depend, in general, on how many other individuals in the group play a given strategy. As a consequence, the gain function determining the gradient of selection can be a polynomial of degree $N-1$. In order to deal with the intricacy of the resulting evolutionary dynamics, we make use of the theory of polynomials in Bernstein form. This theory implies a tight link between the sign pattern of the gains from switching on the one hand and the number and stability properties of the rest points of the replicator dynamics on the other hand. While this relationship is a general one, it is most informative if gains from switching have at most two sign changes, as is the case for most multi-player matrix games considered in the literature. We demonstrate that previous results for public goods games are easily recovered and extended using this observation. Further examples illustrate how focusing on the sign pattern of the gains from switching obviates the need for a more involved analysis.
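
The link between sign patterns and rest points described in the abstract above can be sketched numerically. The sketch assumes the gain function is written in Bernstein form as $g(x) = \sum_{k=0}^{N-1} \binom{N-1}{k} x^k (1-x)^{N-1-k} d_k$, where $d_k$ is the gain from switching when $k$ co-players play the focal strategy, and that the replicator dynamics take the usual form $\dot{x} = x(1-x)\,g(x)$. It relies on the variation-diminishing property of Bernstein polynomials: the number of roots in $(0,1)$ is at most the number of sign changes in the coefficient sequence. The function names and the gain sequence `d` are hypothetical and not taken from the paper.

```python
from math import comb

def gain_function(d, x):
    """Evaluate g(x) = sum_k C(N-1, k) x^k (1-x)^(N-1-k) d[k]."""
    n = len(d) - 1  # polynomial degree N-1
    return sum(comb(n, k) * x**k * (1 - x)**(n - k) * d[k] for k in range(n + 1))

def sign_changes(d):
    """Count sign changes in the gain sequence, ignoring zeros."""
    signs = [1 if v > 0 else -1 for v in d if v != 0]
    return sum(1 for a, b in zip(signs, signs[1:]) if a != b)

def interior_rest_points(d, grid=1000, tol=1e-12):
    """Find interior roots of g by scanning a grid and bisecting each sign flip."""
    roots = []
    xs = [i / grid for i in range(1, grid)]
    for lo, hi in zip(xs, xs[1:]):
        if gain_function(d, lo) * gain_function(d, hi) < 0:
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if gain_function(d, lo) * gain_function(d, mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append((lo + hi) / 2)
    return roots

# Hypothetical 5-player game: switching pays only when exactly 2 co-players cooperate.
d = [-1, -1, 2, -1, -1]
changes = sign_changes(d)        # upper bound on the number of interior rest points
roots = interior_rest_points(d)  # interior rest points of x' = x(1-x) g(x)
```

For this gain sequence there are two sign changes and two interior rest points; because g is negative near 0 and 1 and positive in between, the lower rest point is unstable and the upper one is stable under the assumed dynamics.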

Journal Article
TL;DR: In this paper, the authors outline general project design considerations for phylogenetic analyses of gene expression and suggest solutions to three categories of challenges: measuring expression so that it is comparable across species, developing comparative methods for datasets with far more variables than samples, and investigating genes whose phylogenies are not congruent with species phylogenies due to gene loss, gene duplication, and incomplete lineage sorting.
Abstract: Phylogenetic analyses of gene expression have great potential for addressing a wide range of questions. These analyses will, for example, identify genes that have evolutionary shifts in expression that are correlated with evolutionary changes in morphological, physiological, and developmental characters of interest. This will provide entirely new opportunities to identify genes related to particular phenotypes. There are, however, three key challenges that must be addressed for such studies to realize their potential. First, gene expression data must be measured from multiple species, some of which may be field collected, and parameterized in such a way that they can be compared across species. Second, it will be necessary to develop phylogenetic comparative methods suitable for large multidimensional datasets. In most phylogenetic comparative studies to date, the number n of independent observations (independent contrasts) has been greater than the number p of variables (characters). The behavior of comparative methods for these classic n>p problems is now well understood under a wide variety of conditions. In gene expression studies, and studies based on other high-throughput tools, the number n of samples is dwarfed by the number p of variables. The estimated covariance matrices will be singular, complicating their analysis and interpretation, and prone to spurious results. Third, new approaches are needed to investigate the expression of the many genes whose phylogenies are not congruent with species phylogenies due to gene loss, gene duplication, and incomplete lineage sorting. Here we outline general project design considerations for phylogenetic analyses of gene expression, and suggest solutions to these three categories of challenges. These topics are relevant to high-throughput phenotypic data well beyond gene expression.

Journal Article
TL;DR: An extensive population-level study conducted through one-time censuses in urban India to understand the foraging associations of free-ranging dogs concludes that to be or not to be social is a matter of choice for the free-ranging dogs, and not a matter of chance.
Abstract: Canids display a wide diversity of social systems, from solitary to pairs to packs, and hence they have been extensively used as model systems to understand social dynamics in natural systems. Among canids, the dog can show various levels of social organization due to the influence of humans on their lives. Though the dog is known as man's best friend and has been studied extensively as a pet, studies on the natural history, ecology and behaviour of dogs in a natural habitat are rare. Here we report results of an extensive population-level study conducted through one-time censuses in urban India to understand the ecoethology of free-ranging dogs. We built a model to test if the observed groups could have been formed through random associations while foraging. Our modeling results suggest that the dogs, like all efficient scavengers, tend to forage singly but also form random uncorrelated groups. A closer inspection of the group compositions however reveals that the foraging associations are non-random events. The tendency of adults to associate with the opposite sex in the mating season and of juveniles to stay close to adults in the non-mating season drives the population towards aggregation, in spite of the apparently random nature of the group size distribution. Hence we conclude that to be or not to be social is a matter of choice for the free-ranging dogs, and not a matter of chance.

Journal Article
TL;DR: A sensitivity analysis of the epidemiological model is performed in order to determine the relative importance of the model parameters to the disease transmission.
Abstract: Epidemiological models may give some basic guidelines for public health practitioners, allowing them to analyze issues that can influence the strategies to prevent and fight a disease. To be used in decision-making, however, a mathematical model must be carefully parameterized and validated with epidemiological and entomological data. Here a SIR (S for susceptible, I for infectious, R for recovered individuals) and ASI (A for the aquatic phase of the mosquito, S for susceptible and I for infectious mosquitoes) epidemiological model describing dengue disease is presented, as well as the associated basic reproduction number. A sensitivity analysis of the epidemiological model is performed in order to determine the relative importance of the model parameters to the disease transmission.

Posted Content
TL;DR: An evolutionary approach is proposed that explicitly relaxes the time-homogeneity assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history.
Abstract: Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we extend and generalize an evolutionary approach that relaxes the time-homogeneous process assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the computations across branches that accommodate epoch transitions, making extensive use of graphics processing units. Through synthetic examples, we assess model performance in recovering evolutionary parameters from data generated according to different evolutionary scenarios that comprise different numbers of epochs for both nucleotide and codon substitution processes. We illustrate the usefulness of our inference framework in two different applications to empirical data sets: the selection dynamics on within-host HIV populations throughout infection and the seasonality of global influenza circulation. In both cases, our epoch model captures key features of temporal heterogeneity that remained difficult to test using ad hoc procedures.
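
The core idea of an epoch model can be sketched briefly: when a branch crosses an epoch boundary, its transition probability matrix is the product of the per-epoch matrix exponentials, each computed with that epoch's rate matrix. The example below is a toy under stated assumptions, not the authors' implementation. It uses a two-state chain with rate matrix Q = [[-a, a], [b, -b]], for which exp(Qt) has a closed form, and the function names and rate values are hypothetical.

```python
from math import exp

def two_state_P(a, b, t):
    """Closed-form exp(Qt) for the two-state rate matrix Q = [[-a, a], [b, -b]]."""
    r = a + b
    e = exp(-r * t)
    return [[(b + a * e) / r, (a - a * e) / r],
            [(b - b * e) / r, (a + b * e) / r]]

def matmul(A, B):
    """Plain matrix product for small list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def branch_P(epochs):
    """Transition matrix along a branch.

    epochs is a list of (a, b, duration) tuples, given in the order in
    which the branch traverses the epochs.
    """
    P = [[1.0, 0.0], [0.0, 1.0]]
    for a, b, t in epochs:
        P = matmul(P, two_state_P(a, b, t))
    return P

# A branch of total length 1.5 that spends 1.0 in a slow epoch and 0.5 in a fast one.
hetero = branch_P([(0.2, 0.1, 1.0), (2.0, 0.5, 0.5)])
# Same branch with identical rates in both epochs: reduces to the homogeneous case.
homo = branch_P([(0.2, 0.1, 1.0), (0.2, 0.1, 0.5)])
print(hetero)
print(homo)
```

When every epoch shares the same rates, the product collapses to the single matrix exponential of the time-homogeneous model, which is a convenient sanity check. For larger state spaces such as nucleotides or codons, a general matrix exponential routine would replace the closed form.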

Journal Article
TL;DR: In this article, the authors developed a spectral algorithm to integrate over all possible frequency trajectories between consecutive time points to compute the likelihood of nonneutral models for the population allele frequency dynamics, given the observed temporal DNA data.
Abstract: The increased availability of time series genetic variation data from experimental evolution studies and ancient DNA samples has created new opportunities to identify genomic regions under selective pressure and to estimate their associated fitness parameters. However, it is a challenging problem to compute the likelihood of nonneutral models for the population allele frequency dynamics, given the observed temporal DNA data. Here, we develop a novel spectral algorithm to analytically and efficiently integrate over all possible frequency trajectories between consecutive time points. This advance circumvents the limitations of existing methods which require fine-tuning the discretization of the population allele frequency space when numerically approximating requisite integrals. Furthermore, our method is flexible enough to handle general diploid models of selection where the heterozygote and homozygote fitness parameters can take any values, while previous methods focused on only a few restricted models of selection. We demonstrate the utility of our method on simulated data and also apply it to analyze ancient DNA data from genetic loci associated with coat coloration in horses. In contrast to previous studies, our exploration of the full fitness parameter space reveals that a heterozygote advantage form of balancing selection may have been acting on these loci.
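
To make "general diploid models of selection" concrete, the sketch below iterates the standard deterministic recursion for the frequency p of allele A under genotype fitnesses w_AA, w_Aa, and w_aa. It illustrates the selection model only: it ignores genetic drift and is not the spectral likelihood algorithm described in the abstract. The fitness values are hypothetical and chosen to show heterozygote advantage.

```python
def next_freq(p, w_AA, w_Aa, w_aa):
    """One generation of selection on allele A at frequency p (no drift)."""
    q = 1.0 - p
    mean_w = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    return p * (p * w_AA + q * w_Aa) / mean_w

def iterate(p0, w, generations):
    """Apply the recursion repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, *w)
    return p

# Hypothetical fitnesses (w_AA, w_Aa, w_aa) with the heterozygote fittest.
w = (0.9, 1.0, 0.8)
# Interior equilibrium predicted for heterozygote advantage.
p_star = (w[1] - w[2]) / (2 * w[1] - w[0] - w[2])
p_low = iterate(0.05, w, 2000)
p_high = iterate(0.95, w, 2000)
print(p_star, p_low, p_high)
```

Starting from either a low or a high frequency, the trajectory approaches the same interior equilibrium, which is the signature of balancing selection by heterozygote advantage.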

Journal Article
TL;DR: It is proved that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large, and a general bound on the sample size sufficient for identifiability is obtained.
Abstract: The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
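
For readers unfamiliar with the statistic, the following sketch computes an unfolded and a folded SFS from a small haplotype matrix. The 0/1 coding (0 = ancestral, 1 = derived), the function names, and the toy data are assumptions made for illustration only.

```python
def sfs(haplotypes):
    """Unfolded SFS: entry i-1 counts sites where the derived allele appears i times.

    haplotypes is a list of n equal-length 0/1 sequences. Only the
    segregating classes i = 1, ..., n-1 are returned.
    """
    n = len(haplotypes)
    counts = [0] * (n + 1)
    for site in zip(*haplotypes):   # iterate over columns (sites)
        counts[sum(site)] += 1
    return counts[1:n]

def fold(spectrum):
    """Folded SFS: merge classes i and n-i, as when the ancestral state is unknown."""
    n = len(spectrum) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        j = n - i
        if i == j:
            folded.append(spectrum[i - 1])
        else:
            folded.append(spectrum[i - 1] + spectrum[j - 1])
    return folded

# Toy sample: 4 haplotypes, 5 segregating sites.
haps = [
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
unfolded = sfs(haps)
folded = fold(unfolded)
print(unfolded, folded)
```

Here three sites are singletons, one is a doubleton, and one has three derived copies, so the unfolded spectrum has three entries while the folded spectrum, which cannot distinguish one copy from three in a sample of four, has two.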

Posted Content
TL;DR: It is concluded that if mortality continues to stagnate at young ages yet declines steadily at old ages, increases in lifespan inequality will become a common feature of future demographic change.
Abstract: In the past six decades, lifespan inequality has varied greatly within and among countries even while life expectancy has continued to increase. How and why does mortality change generate this diversity? We derive a precise link between changes in age-specific mortality and lifespan inequality, measured as the variance of age at death. Key to this relationship is a young-old threshold age, below and above which mortality decline respectively decreases and increases lifespan inequality. First, we show that shifts in the threshold's location modified the correlation between changes in life expectancy and lifespan inequality over the last two centuries. Second, we analyze the post Second World War trajectories of lifespan inequality in a set of developed countries, Japan, Canada and the United States (US), where thresholds centered on retirement age. Our method reveals how divergence in the age-pattern of mortality change drives international divergence in lifespan inequality. Most strikingly, early in the 1980s, mortality increases in young US males led lifespan inequality to remain high in the US, while in Canada the decline of inequality continued. In general, our wider international comparisons show that mortality change varied most at young working ages after the Second World War, particularly for males. We conclude that if mortality continues to stagnate at young ages, yet declines steadily at old ages, increases in lifespan inequality will become a common feature of future demographic change.

Journal Article
TL;DR: This study evaluates four major predictions of METE simultaneously at an unprecedented scale and identifies a mismatch between abundance and body size in METE, demonstrating the importance of conducting strong tests of ecological theories.
Abstract: The Maximum Entropy Theory of Ecology (METE) is a unified theory of biodiversity that predicts a large number of macroecological patterns using only information on the species richness, total abundance, and total metabolic rate of the community. We evaluated four major predictions of METE simultaneously at an unprecedented scale using data from 60 globally distributed forest communities including over 300,000 individuals and nearly 2000 species. METE successfully captured 96% and 89% of the variation in the species abundance distribution and the individual size distribution, but performed poorly when characterizing the size-density relationship and intraspecific distribution of individual size. Specifically, METE predicted a negative correlation between size and species abundance, which is weak in natural communities. By evaluating multiple predictions with large quantities of data, our study not only identifies a mismatch between abundance and body size in METE, but also demonstrates the importance of conducting strong tests of ecological theories.

Journal Article
TL;DR: This letter introduces a model of SIR (susceptible-infected-removed) type which explicitly incorporates the effect of cooperative coinfection. All results are obtained in a mean-field model using rate equations, and the authors argue that they should also hold in more general frameworks.
Abstract: Modeling epidemic dynamics plays an important role in studying how diseases spread, predicting their future course, and designing strategies to control them. In this letter, we introduce a model of SIR (susceptible-infected-removed) type which explicitly incorporates the effect of cooperative coinfection. More precisely, each individual can get infected by two different diseases, and an individual already infected with one disease has an increased probability to get infected by the other. Depending on the amount of this increase, we observe different threshold scenarios. Apart from the standard continuous phase transition for single disease outbreaks, we observe continuous transitions where both diseases must coexist, as well as discontinuous transitions where a finite fraction of the population is already affected by both diseases at the threshold. All our results are obtained in a mean field model using rate equations, but we argue that they should also hold in more general frameworks.

Posted Content
TL;DR: In this paper, the diffusion of antigenic phenotype over a shared virus phylogeny is modeled using hemagglutination inhibition (HI) assay data, and it is shown that A/H3N2 evolves faster and in a more punctuated fashion than other influenza lineages.
Abstract: Influenza viruses undergo continual antigenic evolution allowing mutant viruses to evade host immunity acquired to previous virus strains. Antigenic phenotype is often assessed through pairwise measurement of cross-reactivity between influenza strains using the hemagglutination inhibition (HI) assay. Here, we extend previous approaches to antigenic cartography, and simultaneously characterize antigenic and genetic evolution by modeling the diffusion of antigenic phenotype over a shared virus phylogeny. Using HI data from influenza lineages A/H3N2, A/H1N1, B/Victoria and B/Yamagata, we determine patterns of antigenic drift across viral lineages, showing that A/H3N2 evolves faster and in a more punctuated fashion than other influenza lineages. We also show that year-to-year antigenic drift appears to drive incidence patterns within each influenza lineage. This work makes possible substantial future advances in investigating the dynamics of influenza and other antigenically-variable pathogens by providing a model that intimately combines molecular and antigenic evolution.

Posted Content
TL;DR: A new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes and which converges rapidly to the posterior distribution over ARGs and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations.
Abstract: The complex correlation structure of a collection of orthologous DNA sequences is uniquely captured by the "ancestral recombination graph" (ARG), a complete record of coalescence and recombination events in the history of the sample. However, existing methods for ARG inference are computationally intensive, highly approximate, or limited to small numbers of sequences, and, as a consequence, explicit ARG inference is rarely used in applied population genomics. Here, we introduce a new algorithm for ARG inference that is efficient enough to apply to dozens of complete mammalian genomes. The key idea of our approach is to sample an ARG of n chromosomes conditional on an ARG of n-1 chromosomes, an operation we call "threading." Using techniques based on hidden Markov models, we can perform this threading operation exactly, up to the assumptions of the sequentially Markov coalescent and a discretization of time. An extension allows for threading of subtrees instead of individual sequences. Repeated application of these threading operations results in highly efficient Markov chain Monte Carlo samplers for ARGs. We have implemented these methods in a computer program called ARGweaver. Experiments with simulated data indicate that ARGweaver converges rapidly to the true posterior distribution and is effective in recovering various features of the ARG for dozens of sequences generated under realistic parameters for human populations. In applications of ARGweaver to 54 human genome sequences from Complete Genomics, we find clear signatures of natural selection, including regions of unusually ancient ancestry associated with balancing selection and reductions in allele age in sites under directional selection. Preliminary results also indicate that our methods can be used to gain insight into complex features of human population structure, even with a noninformative prior distribution.

Posted Content
TL;DR: Using a method originally developed in Bambach (2006), the authors identify 19 intervals of marked extinction intensity, including mass extinctions, spanning the last 470 million years (with another six present in the Cambrian) and find that 10 of the 19 lie within 3 Myr of the maxima in the spacing of the 27 Myr periodicity.
Abstract: Analysis of two independent data sets with increased taxonomic resolution (genera rather than families) using the revised 2012 time scale reveals that an extinction periodicity first detected by Raup and Sepkoski (1984) for only the post-Paleozoic actually runs through the entire Phanerozoic. Although there is not a local peak of extinction every 27 million years, an excess of the fraction of genus extinction by interval follows a 27 million year timing interval and differs from a random distribution at the p ~ 0.02 level. A 27 million year periodicity in the spectrum of interval lengths no longer appears, removing the question of a possible artifact arising from it. Using a method originally developed in Bambach (2006) we identify 19 intervals of marked extinction intensity, including mass extinctions, spanning the last 470 million years (and with another six present in the Cambrian) and find that 10 of the 19 lie within 3 Myr of the maxima in the spacing of the 27 Myr periodicity, which differs from a random distribution at the p = 0.004 level. These 19 intervals of marked extinction intensity also preferentially occur during decreasing diversity phases of a well-known 62 Myr periodicity in diversity (16 of 19, p = 0.002). Both periodicities appear to enhance the likelihood of increased severity of extinction, but the cause of neither periodicity is known. Variations in the strength of the many suggested causes of extinction coupled to the variation in combined effect of the two different periodicities as they shift in and out of phase is surely one of the reasons that definitive comparative study of the causes of major extinction events is so elusive.
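
The two count-based significance levels quoted above can be compared with a simple one-sided binomial tail. This is a sketch under assumed null models, not necessarily the test the authors used. It assumes that a randomly placed interval falls within 3 Myr of a 27 Myr maximum with probability 6/27, and that it falls in a decreasing phase of the 62 Myr cycle with probability 1/2.

```python
from math import comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# 10 of 19 intervals within +/- 3 Myr of a maximum (assumed null: 6/27).
p_27 = binom_tail(19, 10, 6 / 27)
# 16 of 19 intervals in a decreasing-diversity phase (assumed null: 1/2).
p_62 = binom_tail(19, 16, 0.5)
print(p_27, p_62)
```

Under these assumptions both tail probabilities round to the values reported in the abstract (0.004 and 0.002), which suggests the quoted p-values are consistent with a simple binomial null of this kind.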