Showing papers by "Walter R. Gilks" published in 2002


Journal Article
TL;DR: A dynamical probabilistic model is developed for chains of misannotation in protein databases; it shows that iteratively inferring functional annotation by sequence similarity leads to a systematic deterioration of database quality.
Abstract: Public sequence databases contain information on the sequence, structure and function of proteins. Genome sequencing projects have led to a rapid increase in protein sequence information, but reliable, experimentally verified, information on protein function lags a long way behind. To address this deficit, functional annotation in protein databases is often inferred by sequence similarity to homologous, annotated proteins, with the attendant possibility of error. Now, the functional annotation in these homologous proteins may itself have been acquired through sequence similarity to yet other proteins, and it is generally not possible to determine how the functional annotation of any given protein has been acquired. Thus the possibility of chains of misannotation arises, a process we term 'error percolation'. With some simple assumptions, we develop a dynamical probabilistic model for these misannotation chains. By exploring the consequences of the model for annotation quality it is evident that this iterative approach leads to a systematic deterioration of database quality.

173 citations
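
The 'error percolation' idea in the abstract above can be illustrated with a toy recursion. This is a minimal sketch under assumptions that are not taken from the paper: the database starts with `n_initial` correctly annotated entries, each new entry copies its annotation from a uniformly chosen existing entry, and each copy introduces a fresh error with probability `eps`. The function name and all parameters are hypothetical.

```python
def expected_correct_fraction(n_initial, n_added, eps):
    """Expected fraction of correct annotations as entries are added.

    Toy assumptions (not the published model):
      - all n_initial starting entries are correct;
      - each new entry copies from a uniformly random existing entry;
      - a copy is correct only if its source is correct AND no new
        error (probability eps) is introduced by the transfer.
    """
    correct = float(n_initial)  # expected number of correct entries
    total = n_initial           # total number of entries
    fractions = [1.0]
    for _ in range(n_added):
        p_source_correct = correct / total
        correct += (1.0 - eps) * p_source_correct
        total += 1
        fractions.append(correct / total)
    return fractions


if __name__ == "__main__":
    trajectory = expected_correct_fraction(n_initial=100, n_added=1000, eps=0.05)
    print(f"start: {trajectory[0]:.4f}, end: {trajectory[-1]:.4f}")
```

Under these assumptions the expected correct fraction is multiplied by 1 - eps/(total + 1) at each step, so it declines steadily whenever `eps` is positive, which is the qualitative behaviour the abstract describes.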


Journal Article
TL;DR: A Bayesian formulation of the back-calculation method is proposed, allowing a formal treatment of uncertainty and the inclusion of extra information within a single coherent composite model.
Abstract: Short-term projections of the acquired immune deficiency syndrome (AIDS) epidemic in England and Wales have been regularly updated since the publication of the Cox report in 1988. The key approach for those updates has been the back-calculation method, which has been informally adapted to acknowledge various sources of uncertainty as well as to incorporate increasingly available information on the spread of the human immunodeficiency virus (HIV) in the population. We propose a Bayesian formulation of the back-calculation method which allows a formal treatment of uncertainty and the inclusion of extra information, within a single coherent composite model. Estimation of the variably dimensioned model is carried out by using reversible-jump Markov chain Monte Carlo methods. Application of the model to data for homosexual and bisexual males in England and Wales is presented, and the role of the various sources of information and model assumptions is appraised. Our results show a massive peak in HIV infections around 1983 and suggest that the incidence of AIDS has now reached a plateau, although there is still substantial uncertainty about the future.

50 citations
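
The back-calculation method named in the abstract above rests on a convolution: expected AIDS diagnoses in a period are the sum of earlier HIV infections weighted by the probability that incubation lasts the intervening time. The sketch below shows only that forward relationship, which back-calculation then inverts. It does not implement the Bayesian model or the reversible-jump sampler from the paper, and the function name and the numbers in the usage example are hypothetical.

```python
def expected_aids_incidence(hiv_infections, incubation_pmf):
    """Forward convolution underlying back-calculation.

    hiv_infections[s]  : new HIV infections in period s (illustrative input)
    incubation_pmf[d]  : probability that AIDS is diagnosed d periods
                         after infection (illustrative input)
    Returns the expected number of AIDS diagnoses in each period.
    """
    n_periods = len(hiv_infections)
    expected = [0.0] * n_periods
    for t in range(n_periods):
        for s in range(t + 1):
            delay = t - s
            if delay < len(incubation_pmf):
                expected[t] += hiv_infections[s] * incubation_pmf[delay]
    return expected


if __name__ == "__main__":
    # Made-up numbers, chosen only to make the arithmetic easy to follow.
    infections = [10, 20, 0, 0]
    pmf = [0.0, 0.5, 0.5]
    print(expected_aids_incidence(infections, pmf))
```

Back-calculation works in the opposite direction: given observed AIDS counts and an assumed incubation distribution, it estimates the infection curve that would have produced them.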