Journal Article

Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci

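TL;DR: This article generalizes the Gaussian Markov random field (GMRF) model of effective population size from a single gene locus to multilocus sequence data, and shows on simulated data that the generalization better recovers true population trajectories and the time to the most recent common ancestor (TMRCA).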
Abstract: Effective population size is fundamental in population genetics and characterizes genetic diversity. To infer past population dynamics from molecular sequence data, coalescent-based models have been developed for Bayesian nonparametric estimation of effective population size over time. Among the most successful is a Gaussian Markov random field (GMRF) model for a single gene locus. Here, we present a generalization of the GMRF model that allows for the analysis of multilocus sequence data. Using simulated data, we demonstrate the improved performance of our method to recover true population trajectories and the time to the most recent common ancestor (TMRCA). We analyze a multilocus alignment of HIV-1 CRF02_AG gene sequences sampled from Cameroon. Our results are consistent with HIV prevalence data and uncover some aspects of the population history that go undetected in Bayesian parametric estimation. Finally, we recover an older and more reconcilable TMRCA for a classic ancient DNA data set.
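The multilocus idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that all loci share one piecewise-constant log effective population size on a common grid, that each locus contributes an independent coalescent log-likelihood, and that a first-order GMRF (random-walk) prior with precision `tau` smooths neighboring grid values. The function names, the interval tuple layout, and the shared-grid assumption are all hypothetical choices made for illustration.

```python
import math

def locus_loglik(intervals, log_ne):
    """Coalescent log-likelihood of one locus (up to an additive constant).

    intervals: list of (k, duration, grid_index, ends_in_coalescence), where
    k is the number of lineages during the interval and grid_index selects
    the piecewise-constant population size that applies to it (assumed layout).
    """
    total = 0.0
    for k, dt, g, coal in intervals:
        rate = k * (k - 1) / 2.0 / math.exp(log_ne[g])  # total coalescent rate
        total -= rate * dt                               # no event during dt
        if coal:
            total += math.log(rate)                      # event at interval end
    return total

def gmrf_logprior(log_ne, tau):
    """First-order GMRF smoothing prior on log Ne (up to an additive constant)."""
    sq = sum((b - a) ** 2 for a, b in zip(log_ne, log_ne[1:]))
    return 0.5 * (len(log_ne) - 1) * math.log(tau) - 0.5 * tau * sq

def log_posterior(loci, log_ne, tau):
    """Unnormalized log-posterior: shared prior plus one likelihood term per locus."""
    return gmrf_logprior(log_ne, tau) + sum(locus_loglik(iv, log_ne) for iv in loci)

# Two toy loci whose genealogies share the same two-cell population-size grid.
locus_a = [(3, 0.4, 0, True), (2, 0.6, 0, False), (2, 0.5, 1, True)]
locus_b = [(3, 0.2, 0, True), (2, 1.1, 1, True)]
print(log_posterior([locus_a, locus_b], [0.0, 0.0], tau=1.0))
```

Because the loci enter only through a sum of log-likelihoods, adding a locus adds coalescent events that inform the same trajectory, which is one way to read the improved recovery reported for simulated data.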
Citations
Journal Article
TL;DR: The software package Tracer is presented, for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference, which provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more.
Abstract: Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) plays a central role in understanding evolutionary history from molecular sequence data. Visualizing and analyzing the MCMC-generated samples from the posterior distribution is a key step in any non-trivial Bayesian inference. We present the software package Tracer (version 1.7) for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more. Tracer is open-source and available at http://beast.community/tracer.

5,492 citations

Journal Article
Joshua Quick, Nicholas J. Loman, Sophie Duraffour, Jared T. Simpson, Ettore Severi, Lauren A. Cowley, Joseph Akoi Bore, Raymond Koundouno, Gytis Dudas, Amy Mikhail, Nobila Ouedraogo, Babak Afrough, Amadou Bah, Jonathan H.J. Baum, Beate Becker-Ziaja, Jan Peter Boettcher, Mar Cabeza-Cabrerizo, Álvaro Camino-Sánchez, Lisa L. Carter, Juliane Doerrbecker, Theresa Enkirch, Isabel García-Dorival, Nicole Hetzelt, Julia Hinzmann, Tobias Holm, Liana E. Kafetzopoulou, Michel Koropogui, Abigael Kosgey, Eeva Kuisma, Christopher H. Logue, Antonio Mazzarelli, Sarah Meisel, Marc Mertens, Janine Michel, Didier Ngabo, Katja Nitzsche, Elisa Pallasch, Livia Victoria Patrono, Jasmine Portmann, Johanna Repits, Natasha Y. Rickett, Andreas Sachse, Katrin Singethan, Inês Vitoriano, Rahel L. Yemanaberhan, Elsa Gayle Zekeng, Trina Racine, Alexander Bello, Amadou A. Sall, Ousmane Faye, Oumar Faye, N’Faly Magassouba, Cecelia V. Williams, Victoria Amburgey, Linda Winona, Emily Davis, Jon Gerlach, Frank Washington, Vanessa Monteil, Marine Jourdain, Marion Bererd, Alimou Camara, Hermann Somlare, Abdoulaye Camara, Marianne Gerard, Guillaume Bado, Bernard Baillet, Déborah Delaune, Koumpingnin Yacouba Nebie, Abdoulaye Diarra, Yacouba Savane, Raymond Pallawo, Giovanna Jaramillo Gutierrez, Natacha Milhano, Isabelle Roger, Christopher Williams, Facinet Yattara, Kuiama Lewandowski, James E. Taylor, Phillip A. Rachwal, Daniel J. Turner, Georgios Pollakis, Julian A. Hiscox, David A. Matthews, Matthew K. O'Shea, Andrew Johnston, Duncan W. Wilson, Emma Hutley, Erasmus Smit, Antonino Di Caro, Roman Wölfel, Kilian Stoecker, Erna Fleischmann, Martin Gabriel, Simon A. Weller, Lamine Koivogui, Boubacar Diallo, Sakoba Keita, Andrew Rambaut, Pierre Formenty, Stephan Günther, Miles W. Carroll
11 Feb 2016-Nature
TL;DR: This paper presents sequence data and analysis of 142 EBOV samples collected during the period March to October 2015 and shows that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
Abstract: This paper reports the use of nanopore DNA sequencers (known as MinIONs) for real-time genomic surveillance of the Ebola virus epidemic in the field in Guinea. The authors demonstrate that it is possible to pack a genomic surveillance laboratory in a suitcase and transport it to the field for on-site virus sequencing, generating results within 24 hours of sample collection. The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths [1]. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16–27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic [2–7]. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions [8]. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities [9].
To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15–60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.
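The per-genome figure quoted in this abstract follows from multiplying the per-site substitution rate by the genome length. The short check below is a sketch under one assumption that the abstract does not state: an EBOV genome length of 18,959 nucleotides (roughly 19 kb). The resulting counts are read here as substitutions per genome per year.

```python
GENOME_LENGTH_NT = 18_959  # assumed EBOV genome length; not given in the abstract

def substitutions_per_genome_per_year(rate_per_site_per_year,
                                      genome_length=GENOME_LENGTH_NT):
    """Scale a per-site, per-year substitution rate to the whole genome."""
    return rate_per_site_per_year * genome_length

low = substitutions_per_genome_per_year(0.87e-3)
high = substitutions_per_genome_per_year(1.42e-3)
print(round(low), round(high))  # 16 27
```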

1,187 citations

Journal Article
12 Sep 2014-Science
TL;DR: This West African variant likely diverged from central African lineages around 2004, crossed from Guinea to Sierra Leone in May 2014, and has exhibited sustained human-to-human transmission subsequently, with no evidence of additional zoonotic sources.
Abstract: In its largest outbreak, Ebola virus disease is spreading through Guinea, Liberia, Sierra Leone, and Nigeria. We sequenced 99 Ebola virus genomes from 78 patients in Sierra Leone to ~2000× coverage. We observed a rapid accumulation of interhost and intrahost genetic variation, allowing us to characterize patterns of viral transmission over the initial weeks of the epidemic. This West African variant likely diverged from central African lineages around 2004, crossed from Guinea to Sierra Leone in May 2014, and has exhibited sustained human-to-human transmission subsequently, with no evidence of additional zoonotic sources. Because many of the mutations alter protein sequences and other biologically meaningful targets, they should be monitored for impact on diagnostics, vaccines, and therapies critical to outbreak response.

1,164 citations

Journal Article
Nuno R. Faria, Thomas A. Mellan, Charles Whittaker, Ingra Morales Claro, Darlan da Silva Candido, Swapnil Mishra, Myuki A E Crispim, Flavia C. S. Sales, Iwona Hawryluk, John T. McCrone, Ruben J.G. Hulswit, Lucas A M Franco, Mariana S. Ramundo, Jaqueline Goes de Jesus, Pamela S Andrade, Thais M. Coletti, Giulia M. Ferreira, Camila A. M. Silva, Erika R. Manuli, Rafael Henrique Moraes Pereira, Pedro S. Peixoto, Moritz U. G. Kraemer, Nelson Gaburo, Cecilia da C. Camilo, Henrique Hoeltgebaum, William Marciel de Souza, Esmenia C. Rocha, Leandro Marques de Souza, Mariana C. Pinho, Leonardo José Tadeu de Araújo, Frederico S V Malta, Aline B. de Lima, Joice do P. Silva, Danielle A G Zauli, Alessandro C. S. Ferreira, Ricardo P Schnekenberg, Daniel J Laydon, Patrick G T Walker, Hannah M. Schlüter, Ana L. P. dos Santos, Maria S. Vidal, Valentina S. Del Caro, Rosinaldo M. F. Filho, Helem M. dos Santos, Renato Santana Aguiar, José Luiz Proença-Módena, Bruce Walker Nelson, James A. Hay, Melodie Monod, Xenia Miscouridou, Helen Coupland, Raphael Sonabend, Michaela A. C. Vollmer, Axel Gandy, Carlos A. Prete, Vitor H. Nascimento, Marc A. Suchard, Thomas A. Bowden, Sergei L Kosakovsky Pond, Chieh-Hsi Wu, Oliver Ratmann, Neil M. Ferguson, Christopher Dye, Nicholas J. Loman, Philippe Lemey, Andrew Rambaut, Nelson Abrahim Fraiji, Maria Perpétuo Socorro Sampaio Carvalho, Oliver G. Pybus, Seth Flaxman, Samir Bhatt, Ester Cerdeira Sabino
21 May 2021-Science
TL;DR: Using a two-category dynamical model that integrates genomic and mortality data, the authors estimate that P.1 may be 1.7- to 2.4-fold more transmissible and that previous (non-P.1) infection provides 54 to 79% of the protection against infection with P.1 that it provides against non-P.1 lineages.
Abstract: Cases of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection in Manaus, Brazil, resurged in late 2020 despite previously high levels of infection. Genome sequencing of viruses sampled in Manaus between November 2020 and January 2021 revealed the emergence and circulation of a novel SARS-CoV-2 variant of concern. Lineage P.1 acquired 17 mutations, including a trio in the spike protein (K417T, E484K, and N501Y) associated with increased binding to the human ACE2 (angiotensin-converting enzyme 2) receptor. Molecular clock analysis shows that P.1 emergence occurred around mid-November 2020 and was preceded by a period of faster molecular evolution. Using a two-category dynamical model that integrates genomic and mortality data, we estimate that P.1 may be 1.7- to 2.4-fold more transmissible and that previous (non-P.1) infection provides 54 to 79% of the protection against infection with P.1 that it provides against non-P.1 lineages. Enhanced global genomic surveillance of variants of concern, which may exhibit increased transmissibility and/or immune evasion, is critical to accelerate pandemic responsiveness.

985 citations

Journal Article
03 Oct 2014-Science
TL;DR: Using statistical approaches applied to HIV-1 sequence data from central Africa, it is shown that from the 1920s Kinshasa was the focus of early transmission and the source of pre-1960 pandemic viruses elsewhere.
Abstract: Thirty years after the discovery of HIV-1, the early transmission, dissemination, and establishment of the virus in human populations remain unclear. Using statistical approaches applied to HIV-1 sequence data from central Africa, we show that from the 1920s Kinshasa (in what is now the Democratic Republic of Congo) was the focus of early transmission and the source of pre-1960 pandemic viruses elsewhere. Location and dating estimates were validated using the earliest HIV-1 archival sample, also from Kinshasa. The epidemic histories of HIV-1 group M and nonpandemic group O were similar until ~1960, after which group M underwent an epidemiological transition and outpaced regional population growth. Our results reconstruct the early dynamics of HIV-1 and emphasize the role of social changes and transport networks in the establishment of this virus in human populations.

580 citations

References
Journal Article
TL;DR: The Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7 is presented, which implements a family of Markov chain Monte Carlo algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses.
Abstract: Computational evolutionary biology, statistical phylogenetics and coalescent-based population genetics are becoming increasingly central to the analysis and understanding of molecular sequence data. We present the Bayesian Evolutionary Analysis by Sampling Trees (BEAST) software package version 1.7, which implements a family of Markov chain Monte Carlo (MCMC) algorithms for Bayesian phylogenetic inference, divergence time dating, coalescent analysis, phylogeography and related molecular evolutionary analyses. This package includes an enhanced graphical user interface program called Bayesian Evolutionary Analysis Utility (BEAUti) that enables access to advanced models for molecular sequence and phenotypic trait evolution that were previously available to developers only. The package also provides new tools for visualizing and summarizing multispecies coalescent and phylogeographic analyses. BEAUti and BEAST 1.7 are open source under the GNU lesser general public license and available at http://beast-mcmc.googlecode.com and http://beast.bio.ed.ac.uk

9,055 citations

Journal Article
TL;DR: A new statistical method for estimating divergence dates of species from DNA sequence data by a molecular clock approach is developed, and this dating may pose a problem for the widely believed hypothesis that the bipedal creature Australopithecus afarensis, which lived some 3.7 million years ago, was ancestral to man and evolved after the human-ape splitting.
Abstract: A new statistical method for estimating divergence dates of species from DNA sequence data by a molecular clock approach is developed. This method takes into account effectively the information contained in a set of DNA sequence data. The molecular clock of mitochondrial DNA (mtDNA) was calibrated by setting the date of divergence between primates and ungulates at the Cretaceous-Tertiary boundary (65 million years ago), when the extinction of dinosaurs occurred. A generalized least-squares method was applied in fitting a model to mtDNA sequence data, and the clock gave dates of 92.3 +/- 11.7, 13.3 +/- 1.5, 10.9 +/- 1.2, 3.7 +/- 0.6, and 2.7 +/- 0.6 million years ago (where the second of each pair of numbers is the standard deviation) for the separation of mouse, gibbon, orangutan, gorilla, and chimpanzee, respectively, from the line leading to humans. Although there is some uncertainty in the clock, this dating may pose a problem for the widely believed hypothesis that the bipedal creature Australopithecus afarensis, which lived some 3.7 million years ago at Laetoli in Tanzania and at Hadar in Ethiopia, was ancestral to man and evolved after the human-ape splitting. Another likelier possibility is that mtDNA was transferred through hybridization between a proto-human and a proto-chimpanzee after the former had developed bipedalism.

8,124 citations

Journal Article
TL;DR: The Bayesian skyline plot is introduced, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history, and a Markov chain Monte Carlo sampling procedure is described that efficiently samples a variant of the generalized skyline plot, given sequence data.
Abstract: We introduce the Bayesian skyline plot, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history. We describe a Markov chain Monte Carlo sampling procedure that efficiently samples a variant of the generalized skyline plot, given sequence data, and combines these plots to generate a posterior distribution of effective population size through time. We apply the Bayesian skyline plot to simulated data sets and show that it correctly reconstructs demographic history under canonical scenarios. Finally, we compare the Bayesian skyline plot model to previous coalescent approaches by analyzing two real data sets (hepatitis C virus in Egypt and mitochondrial DNA of Beringian bison) that have been previously investigated using alternative coalescent methods. In the bison analysis, we detect a severe but previously unrecognized bottleneck, estimated to have occurred 10,000 radiocarbon years ago, which coincides with both the earliest undisputed record of large numbers of humans in Alaska and the megafaunal extinctions in North America at the beginning of the Holocene.
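For orientation, the skyline family of estimators mentioned in this abstract can be illustrated with the classic (non-Bayesian) skyline plot, which is a simpler relative of the method described and is swapped in here only as an example. This sketch assumes a fixed genealogy with contemporaneous samples and treats each inter-coalescent interval separately: an interval of duration w with k lineages yields the estimate w · k(k − 1)/2, in the same time units as the genealogy. The function name and input layout are hypothetical.

```python
def classic_skyline(coalescent_times):
    """Classic skyline estimates from one genealogy.

    coalescent_times: increasing times (measured back from the present) of
    the n - 1 coalescent events among n contemporaneous samples.
    Returns a list of (interval_start, interval_end, estimate) tuples.
    """
    n = len(coalescent_times) + 1
    estimates = []
    start = 0.0
    for i, end in enumerate(coalescent_times):
        k = n - i                                  # lineages during this interval
        estimates.append((start, end, (end - start) * k * (k - 1) / 2.0))
        start = end
    return estimates

for start, end, est in classic_skyline([0.1, 0.3, 1.0]):
    print(f"{start:.1f}-{end:.1f}: {est:.2f}")
```

The Bayesian skyline plot differs in that it groups intervals and averages over genealogies sampled by MCMC rather than conditioning on a single tree.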

2,850 citations

Journal Article
TL;DR: In this article, a new Markov chain is introduced which can be used to describe the family relationships among n individuals drawn from a particular generation of a large haploid population, and the properties of this process can be studied, simultaneously for all n, by coupling techniques.
Abstract: A new Markov chain is introduced which can be used to describe the family relationships among n individuals drawn from a particular generation of a large haploid population. The properties of this process can be studied, simultaneously for all n, by coupling techniques. Recent results in neutral mutation theory are seen as consequences of the genealogy described by the chain.
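The genealogical process described in this abstract can be simulated in a few lines. The sketch below assumes the standard time scaling in which each pair of lineages coalesces at rate 1, so that k lineages wait an exponential time with rate k(k − 1)/2 before the next merger. Under that scaling the expected time to the most recent common ancestor of n samples is 2(1 − 1/n). The function names are hypothetical.

```python
import random

def simulate_coalescent_times(n, rng):
    """Cumulative coalescent event times for n lineages (standard time scaling)."""
    times = []
    t = 0.0
    for k in range(n, 1, -1):
        t += rng.expovariate(k * (k - 1) / 2.0)  # waiting time with k lineages
        times.append(t)
    return times  # the last entry is the TMRCA

def expected_tmrca(n):
    """Sum of the expected waiting times, which telescopes to 2 * (1 - 1/n)."""
    return 2.0 * (1.0 - 1.0 / n)

rng = random.Random(1)
draws = [simulate_coalescent_times(10, rng)[-1] for _ in range(20000)]
print(sum(draws) / len(draws), expected_tmrca(10))
```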

1,495 citations

Journal Article
01 Jul 2002-Genetics
TL;DR: A Bayesian statistical inference approach is presented for the joint estimation of mutation rate and population size; it incorporates the uncertainty in the genealogy of temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration.
Abstract: Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.

1,000 citations