SciSpace (formerly Typeset)
Author

Geoff K. Nicholls

Other affiliations: University of Auckland
Bio: Geoff K. Nicholls is an academic researcher at the University of Oxford. He has contributed to research on Bayesian inference and Markov chain Monte Carlo, has an h-index of 20, and has co-authored 58 publications receiving 1,981 citations. His previous affiliations include the University of Auckland.


Papers
Journal ArticleDOI
01 Jul 2002-Genetics
TL;DR: A Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration.
Abstract: Molecular sequences obtained at different sampling times from populations of rapidly evolving pathogens and from ancient subfossil and fossil sources are increasingly available with modern sequencing technology. Here, we present a Bayesian statistical inference approach to the joint estimation of mutation rate and population size that incorporates the uncertainty in the genealogy of such temporally spaced sequences by using Markov chain Monte Carlo (MCMC) integration. The Kingman coalescent model is used to describe the time structure of the ancestral tree. We recover information about the unknown true ancestral coalescent tree, population size, and the overall mutation rate from temporally spaced data, that is, from nucleotide sequences gathered at different times, from different individuals, in an evolving haploid population. We briefly discuss the methodological implications and show what can be inferred, in various practically relevant states of prior knowledge. We develop extensions for exponentially growing population size and joint estimation of substitution model parameters. We illustrate some of the important features of this approach on a genealogy of HIV-1 envelope (env) partial sequences.
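The sampling scheme described here can be illustrated in miniature. The sketch below (my illustration, not the authors' code) fixes the genealogy, assumes a constant population size N with a flat prior, and runs a random-walk Metropolis-Hastings sampler on N given the intercoalescent waiting times, which under the Kingman coalescent are exponential with rate k(k-1)/(2N) while k lineages remain; the actual method additionally samples the tree, mutation rate, and substitution-model parameters.

```python
import math, random

random.seed(1)

# Kingman coalescent: while k lineages remain, the waiting time to the
# next coalescence is Exponential with rate k*(k-1)/(2N).
def log_lik(N, waits):
    if N <= 0:
        return float("-inf")
    ll, k = 0.0, len(waits) + 1     # n lineages give n-1 coalescences
    for t in waits:
        rate = k * (k - 1) / (2.0 * N)
        ll += math.log(rate) - rate * t
        k -= 1
    return ll

# Simulate waiting times under a known N, then recover it by
# random-walk Metropolis-Hastings with a flat prior on N > 0.
true_N, n = 50.0, 20
waits, k = [], n
while k > 1:
    waits.append(random.expovariate(k * (k - 1) / (2.0 * true_N)))
    k -= 1

N, samples = 10.0, []
ll = log_lik(N, waits)
for i in range(20000):
    prop = N + random.gauss(0.0, 5.0)       # symmetric random-walk proposal
    ll_prop = log_lik(prop, waits)
    if math.log(random.random()) < ll_prop - ll:
        N, ll = prop, ll_prop               # accept
    if i >= 5000:                           # discard burn-in
        samples.append(N)

post_mean = sum(samples) / len(samples)
print(round(post_mean, 1))                  # should land near true_N
```

Because the genealogy is held fixed here, the posterior on N is essentially inverse-gamma; the paper's contribution is precisely to integrate over the unknown tree as well.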

1,000 citations

Journal ArticleDOI
TL;DR: It is revealed that labour-intensive practices such as manuring/middening and water management formed an integral part of the agricultural strategy from the seventh millennium bc, providing the possibility for greater bureaucratic control and contributing to the wider societal changes that accompanied urbanization.
Abstract: This study sheds light on the agricultural economy that underpinned the emergence of the first urban centres in northern Mesopotamia. Using δ13C and δ15N values of crop remains from the sites of Tell Sabi Abyad, Tell Zeidan, Hamoukar, Tell Brak and Tell Leilan (6500-2000 cal bc), we reveal that labour-intensive practices such as manuring/middening and water management formed an integral part of the agricultural strategy from the seventh millennium bc. Increased agricultural production to support growing urban populations was achieved by cultivation of larger areas of land, entailing lower manure/midden inputs per unit area (extensification). Our findings paint a nuanced picture of the role of agricultural production in new forms of political centralization. The shift towards lower-input farming most plausibly developed gradually at a household level, but the increased importance of land-based wealth constituted a key potential source of political power, providing the possibility for greater bureaucratic control and contributing to the wider societal changes that accompanied urbanization.

130 citations

Journal ArticleDOI
TL;DR: An alternative data set of ancient Indo-European languages is used and two very different stochastic models of lexical evolution are employed – Gray & Atkinson’s (2003) finite-sites model and a stochastic-Dollo model of word evolution introduced by Nicholls & Gray (in press).
Abstract: Gray & Atkinson’s (2003) application of quantitative phylogenetic methods to Dyen, Kruskal & Black’s (1992) Indo-European database produced controversial divergence time estimates. Here we test the robustness of these results using an alternative data set of ancient Indo-European languages. We employ two very different stochastic models of lexical evolution – Gray & Atkinson’s (2003) finite-sites model and a stochastic-Dollo model of word evolution introduced by Nicholls & Gray (in press). Results of this analysis support the findings of Gray & Atkinson (2003). We also tested the ability of both methods to reconstruct phylogeny and divergence times accurately from synthetic data. The methods performed well under a range of scenarios, including widespread and localized borrowing.
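The stochastic-Dollo assumption can be shown with a tiny simulation (my illustration, with arbitrary rates, not the paper's fitted model): cognates are born along a lineage as a Poisson process with rate lam, each is then lost independently at rate mu, and a cognate can never be born twice; at stationarity the vocabulary size is Poisson with mean lam/mu.

```python
import random

random.seed(42)

# Stochastic-Dollo sketch: words enter the lexicon as a Poisson process
# (rate lam), each survives an Exponential(mu) lifetime, and no word is
# ever created twice. Rates are illustrative values.
lam, mu, T = 5.0, 0.25, 100.0

def vocab_size_at(T):
    alive, t = 0, 0.0
    while True:
        t += random.expovariate(lam)        # time of the next word birth
        if t > T:
            break
        if t + random.expovariate(mu) > T:  # word still alive at time T
            alive += 1
    return alive

sizes = [vocab_size_at(T) for _ in range(1000)]
mean_size = sum(sizes) / len(sizes)
print(round(mean_size, 2))                  # should be near lam/mu = 20
```

Inference in the paper runs this kind of birth-death process along every branch of a candidate tree, so shared cognates carry information about shared ancestry.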

85 citations

01 Jan 2007
TL;DR: In this paper, the authors demonstrate Bayesian inference from electrical impedance tomography (EIT) data by using a Markov chain with Metropolis-Hastings dynamics, and treating the conductivity update as a small perturbation.
Abstract: Electrical impedance tomography (EIT) is a technique for imaging the conductivity of material inside an object, using current/voltage measurements at its surface. We demonstrate Bayesian inference from EIT data. A prior probability distribution modeling the unknown conductivity distribution is given. An MCMC algorithm is specified that samples the posterior probability for the conductivity given the prior and the EIT data. In order to compute the likelihood of a conductivity distribution it is necessary to solve a second-order linear partial differential equation (PDE). This would appear to make the sampling problem computationally intractable. However, by using a Markov chain with Metropolis-Hastings dynamics, and treating the conductivity update as a small perturbation, we are able to avoid solving the PDE for those updates which are rejected by the MCMC sampling process. For real applications the likelihood will need to be sensitive to very small changes in the state, so that the posterior distribution may be sharply peaked as well as multi-modal. The details of the Metropolis-Hastings dynamics are chosen so that ergodic behavior is displayed on useful time scales. We show that the sampling problem is tractable, and illustrate inference from a simple synthetic data set.
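The general trick of dodging the expensive forward solve for rejected updates can be sketched with delayed-acceptance Metropolis-Hastings, a closely related idea: a cheap surrogate likelihood screens proposals first, and the expensive model (here a stand-in function, not a real PDE solve) is evaluated only for proposals that survive the first stage; the second-stage correction keeps the exact posterior as the target. This is my illustration of the general pattern, not the paper's specific perturbation scheme.

```python
import math, random

random.seed(0)

expensive_calls = 0

def log_post_exact(x):
    """Stand-in for the expensive log-posterior (in EIT: one PDE solve)."""
    global expensive_calls
    expensive_calls += 1
    return -0.5 * (x - 2.0) ** 2 - 0.1 * math.cos(5 * x)

def log_post_cheap(x):
    """Cheap surrogate used only to pre-screen proposals."""
    return -0.5 * (x - 2.0) ** 2

x = 0.0
lp_exact = log_post_exact(x)
n_iter, samples = 5000, []
for _ in range(n_iter):
    y = x + random.gauss(0.0, 2.0)          # symmetric random-walk proposal
    # Stage 1: accept/reject using the cheap surrogate only.
    if math.log(random.random()) < log_post_cheap(y) - log_post_cheap(x):
        # Stage 2: pay for the expensive solve, correct toward the exact
        # posterior; rejected stage-1 moves never trigger this call.
        lp_y = log_post_exact(y)
        a2 = (lp_y - lp_exact) - (log_post_cheap(y) - log_post_cheap(x))
        if math.log(random.random()) < a2:
            x, lp_exact = y, lp_y
    samples.append(x)

print(expensive_calls, "expensive solves for", n_iter, "iterations")
```

The two-stage acceptance is a valid Metropolis-Hastings kernel for the exact posterior, so the surrogate only affects efficiency, never correctness.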

77 citations

Journal ArticleDOI
TL;DR: In this paper, radiocarbon determinations obtained from the excavation of heat-retainer hearths in Sturt National Park in western New South Wales indicate Aboriginal occupation of the arid margin of Australia during the last 1700 years.

70 citations


Cited by
Journal ArticleDOI
TL;DR: BEAST is a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree that provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions.
Abstract: The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented. BEAST version 1.4.6 consists of 81,000 lines of Java source code, 779 classes, and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at http://beast-mcmc.googlecode.com/ under the GNU LGPL license. BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.

11,916 citations

Journal ArticleDOI
TL;DR: An overview of the main model components used in chronological analysis, their mathematical formulation, and examples of how such analyses can be performed using the latest version of the OxCal software (v4) are given.
Abstract: If radiocarbon measurements are to be used at all for chronological purposes, we have to use statistical methods for calibration. The most widely used method of calibration can be seen as a simple application of Bayesian statistics, which uses both the information from the new measurement and information from the 14C calibration curve. In most dating applications, however, we have larger numbers of 14C measurements and we wish to relate those to events in the past. Bayesian statistics provides a coherent framework in which such analysis can be performed and is becoming a core element in many 14C dating projects. This article gives an overview of the main model components used in chronological analysis, their mathematical formulation, and examples of how such analyses can be performed using the latest version of the OxCal software (v4). Many such models can be put together, in a modular fashion, from simple elements, with defined constraints and groupings. In other cases, the commonly used "uniform phase" models might not be appropriate, and ramped, exponential, or normal distributions of events might be more useful. When considering analyses of these kinds, it is useful to be able to run simulations on synthetic data. Methods for performing such tests are discussed here along with other methods of diagnosing possible problems with statistical models of this kind.
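The core calibration step is a one-line application of Bayes' rule, sketched below with a made-up smooth calibration curve standing in for the real IntCal curve (whose own uncertainty is ignored here): the posterior over calendar age is the normalized product of a prior (flat, in this sketch) and a Gaussian measurement likelihood evaluated through the curve.

```python
import math

# Toy Bayesian calibration of a single radiocarbon determination.
# mu(theta) is an invented monotone curve with a wiggle, not IntCal.
def mu(theta):                      # 14C age implied by calendar age theta
    return 0.95 * theta + 40 * math.sin(theta / 50.0)

m, s = 2450.0, 30.0                 # measured 14C age BP and its 1-sigma error
grid = range(2000, 3001)            # candidate calendar ages (years BP)

# Flat prior, Gaussian likelihood through the curve, then normalize.
post = [math.exp(-0.5 * ((m - mu(t)) / s) ** 2) for t in grid]
Z = sum(post)
post = [p / Z for p in post]

mean_age = sum(t * p for t, p in zip(grid, post))
print(round(mean_age))              # posterior mean calendar age
```

With the real calibration curve the wiggles can make this posterior multimodal, which is exactly why OxCal reports full distributions and highest-density ranges rather than a single calibrated date.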

6,323 citations

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a new approach to performing relaxed phylogenetic analysis, which can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times.
Abstract: In phylogenetics, the unrooted model of phylogeny and the strict molecular clock model are two extremes of a continuum. Despite their dominance in phylogenetic inference, it is evident that both are biologically unrealistic and that the real evolutionary process lies between these two extremes. Fortunately, intermediate models employing relaxed molecular clocks have been described. These models open the gate to a new field of “relaxed phylogenetics.” Here we introduce a new approach to performing relaxed phylogenetic analysis. We describe how it can be used to estimate phylogenies and divergence times in the face of uncertainty in evolutionary rates and calibration times. Our approach also provides a means for measuring the clocklikeness of datasets and comparing this measure between different genes and phylogenies. We find no significant rate autocorrelation among branches in three large datasets, suggesting that autocorrelated models are not necessarily suitable for these data. In addition, we place these datasets on the continuum of clocklikeness between a strict molecular clock and the alternative unrooted extreme. Finally, we present analyses of 102 bacterial, 106 yeast, 61 plant, 99 metazoan, and 500 primate alignments. From these we conclude that our method is phylogenetically more accurate and precise than the traditional unrooted model while adding the ability to infer a timescale to evolution.
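The uncorrelated relaxed clock can be simulated in a few lines (my sketch, with illustrative parameter values, not the paper's model fit): each branch draws an independent rate from a lognormal distribution, expected substitutions on a branch are duration times rate, and the coefficient of variation of the rates measures distance from a strict clock (cv approaching 0 recovers the strict clock).

```python
import math, random

random.seed(7)

# Parameterize the lognormal so that E[rate] = mean_rate and the rates
# have coefficient of variation cv (both values illustrative).
mean_rate, cv = 1e-3, 0.3
sigma = math.sqrt(math.log(1 + cv ** 2))
mu = math.log(mean_rate) - 0.5 * sigma ** 2

durations = [10.0, 25.0, 40.0, 5.0, 60.0]    # branch durations (arbitrary units)
rates = [random.lognormvariate(mu, sigma) for _ in durations]
branch_lengths = [d * r for d, r in zip(durations, rates)]  # expected substitutions

m = sum(rates) / len(rates)
sd = math.sqrt(sum((r - m) ** 2 for r in rates) / len(rates))
print(round(sd / m, 2))                      # empirical cv of the branch rates
```

Because each branch's rate is drawn independently, this is the "uncorrelated" variant; autocorrelated clocks would instead make a branch's rate depend on its parent's.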

5,812 citations

Journal ArticleDOI
TL;DR: The software package Tracer is presented, for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference, which provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more.
Abstract: Bayesian inference of phylogeny using Markov chain Monte Carlo (MCMC) plays a central role in understanding evolutionary history from molecular sequence data. Visualizing and analyzing the MCMC-generated samples from the posterior distribution is a key step in any non-trivial Bayesian inference. We present the software package Tracer (version 1.7) for visualizing and analyzing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualization, demographic trajectory reconstruction, conditional posterior distribution summary, and more. Tracer is open-source and available at http://beast.community/tracer.

5,492 citations

Journal ArticleDOI
TL;DR: The Bayesian skyline plot is introduced, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history, and a Markov chain Monte Carlo sampling procedure is described that efficiently samples a variant of the generalized skyline plot, given sequence data.
Abstract: We introduce the Bayesian skyline plot, a new method for estimating past population dynamics through time from a sample of molecular sequences without dependence on a prespecified parametric model of demographic history. We describe a Markov chain Monte Carlo sampling procedure that efficiently samples a variant of the generalized skyline plot, given sequence data, and combines these plots to generate a posterior distribution of effective population size through time. We apply the Bayesian skyline plot to simulated data sets and show that it correctly reconstructs demographic history under canonical scenarios. Finally, we compare the Bayesian skyline plot model to previous coalescent approaches by analyzing two real data sets (hepatitis C virus in Egypt and mitochondrial DNA of Beringian bison) that have been previously investigated using alternative coalescent methods. In the bison analysis, we detect a severe but previously unrecognized bottleneck, estimated to have occurred 10,000 radiocarbon years ago, which coincides with both the earliest undisputed record of large numbers of humans in Alaska and the megafaunal extinctions in North America at the beginning of the Holocene.
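The Bayesian skyline plot generalizes the classic skyline estimator, which is simple enough to show directly (my sketch, on hand-made interval lengths, not real data): each intercoalescent interval yields the point estimate N_hat = t_k * k(k-1)/2, where t_k is the waiting time while k lineages remain; the Bayesian version smooths these noisy per-interval estimates and averages over genealogies via MCMC.

```python
# Classic (non-Bayesian) skyline: one population-size estimate per
# coalescent interval. Keys are the number of extant lineages k, values
# are illustrative waiting times t_k until the next coalescence.
waits = {10: 0.8, 9: 1.1, 8: 1.5, 7: 2.0, 6: 2.9,
         5: 4.1, 4: 6.0, 3: 10.5, 2: 21.0}

# E[t_k] = 2N / (k*(k-1)) under the coalescent, so invert per interval.
skyline = {k: t * k * (k - 1) / 2.0 for k, t in waits.items()}
for k in sorted(skyline, reverse=True):
    print(k, round(skyline[k], 1))           # k lineages -> estimated N
```

Each estimate is based on a single exponential waiting time, hence very noisy; that noise is what motivates the smoothing and posterior averaging in the Bayesian skyline plot.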

2,850 citations