
Showing papers on "Bayesian probability published in 2000"


Journal ArticleDOI
TL;DR: A simple method is derived for replacing the costly on-line computation of nonlinear Bayesian similarity measures with inexpensive linear subspace projections and simple Euclidean norms, resulting in a significant computational speed-up for implementation with very large databases.

660 citations


Journal ArticleDOI
TL;DR: A unified framework for a Bayesian analysis of incidence or mortality data in space and time is proposed and an epidemiological hypothesis about the temporal development of the association between urbanization and risk factors for cancer is confirmed.
Abstract: This paper proposes a unified framework for a Bayesian analysis of incidence or mortality data in space and time. We introduce four different types of prior distributions for space x time interaction in extension of a model with only main effects. Each type implies a certain degree of prior dependence for the interaction parameters, and corresponds to the product of one of the two spatial with one of the two temporal main effects. The methodology is illustrated by an analysis of Ohio lung cancer data 1968-1988 via Markov chain Monte Carlo simulation. We compare the fit and the complexity of several models with different types of interaction by means of quantities related to the posterior deviance. Our results confirm an epidemiological hypothesis about the temporal development of the association between urbanization and risk factors for cancer.

530 citations


Proceedings Article
29 Jun 2000
TL;DR: It is proposed that the learning process estimate online the full posterior distribution over models; to determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming.
Abstract: The reinforcement learning problem can be decomposed into two parallel types of inference: (i) estimating the parameters of a model for the underlying process; (ii) determining behavior which maximizes return under the estimated model. Following Dearden, Friedman and Andre (1999), it is proposed that the learning process estimates online the full posterior distribution over models. To determine behavior, a hypothesis is sampled from this distribution and the greedy policy with respect to the hypothesis is obtained by dynamic programming. By using a different hypothesis for each trial, appropriate exploratory and exploitative behavior is obtained. This Bayesian method always converges to the optimal policy for a stationary process with discrete states.
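As a rough illustration of the sampling scheme described in this abstract (often called posterior sampling or Thompson sampling for MDPs), the sketch below maintains Dirichlet posteriors over the transition probabilities of a tiny finite MDP, samples one hypothesis per trial, and acts greedily with respect to it via value iteration. The two-state MDP, the known reward matrix, and all constants are invented for illustration; this is not the authors' implementation.

```python
import numpy as np

# Toy illustration of posterior-sampling RL (hypothetical 2-state, 2-action MDP).
# Counts of observed transitions define Dirichlet posteriors over transition rows.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 2, 2, 0.95
counts = np.ones((n_states, n_actions, n_states))       # Dirichlet(1,...,1) prior
rewards = np.array([[0.0, 1.0], [0.5, 0.0]])             # assumed known for simplicity

def sample_model(counts):
    """Draw one transition model from the current Dirichlet posterior."""
    return np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                     for s in range(n_states)])

def greedy_policy(P, R, n_iter=200):
    """Value iteration on the sampled model; returns the greedy policy."""
    V = np.zeros(n_states)
    for _ in range(n_iter):
        Q = R + gamma * P @ V            # Q[s, a] = R[s, a] + gamma * sum_s' P[s,a,s'] V[s']
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

# One "trial": sample a hypothesis, act greedily w.r.t. it, update the counts.
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]], [[0.6, 0.4], [0.3, 0.7]]])
state = 0
for step in range(1000):
    if step % 50 == 0:                   # resample the hypothesis once per trial
        policy = greedy_policy(sample_model(counts), rewards)
    action = policy[state]
    next_state = rng.choice(n_states, p=true_P[state, action])
    counts[state, action, next_state] += 1
    state = next_state

print("greedy policy under the last sampled model:", policy)
```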

489 citations


Journal ArticleDOI
TL;DR: In this article, the authors discuss the development of dynamic factor models for multivariate financial time series, and the incorporation of stochastic volatility components for latent factor processes, and explore the dynamic factor structure of daily spot exchange rates for a selection of international currencies.
Abstract: We discuss the development of dynamic factor models for multivariate financial time series, and the incorporation of stochastic volatility components for latent factor processes. Bayesian inference and computation is developed and explored in a study of the dynamic factor structure of daily spot exchange rates for a selection of international currencies. The models are direct generalizations of univariate stochastic volatility models and represent specific varieties of models recently discussed in the growing multivariate stochastic volatility literature. We discuss model fitting based on retrospective data and sequential analysis for forward filtering and short-term forecasting. Analyses are compared with results from the much simpler method of dynamic variance-matrix discounting that, for over a decade, has been a standard approach in applied financial econometrics. We study these models in analysis, forecasting, and sequential portfolio allocation for a selected set of international exchange-rate-retur...

477 citations


Book ChapterDOI
26 Jun 2000
TL;DR: A novel and tractable probabilistic approach to modelling manifolds which can handle complex non-linearities and is illustrated using two classical problems: modelling the manifold of face images and modelling the manifolds of hand-written digits.
Abstract: In recent years several techniques have been proposed for modelling the low-dimensional manifolds, or 'subspaces', of natural images. Examples include principal component analysis (as used for instance in 'eigen-faces'), independent component analysis, and auto-encoder neural networks. Such methods suffer from a number of restrictions such as the limitation to linear manifolds or the absence of a probabilistic representation. In this paper we exploit recent developments in the fields of variational inference and latent variable models to develop a novel and tractable probabilistic approach to modelling manifolds which can handle complex non-linearities. Our framework comprises a mixture of sub-space components in which both the number of components and the effective dimensionality of the subspaces are determined automatically as part of the Bayesian inference procedure. We illustrate our approach using two classical problems: modelling the manifold of face images and modelling the manifolds of hand-written digits.

425 citations


Journal ArticleDOI
TL;DR: This paper proposes two alternatives for computing a p value, the conditional predictive p value and the partial posterior predictive p value, and indicates their advantages from both Bayesian and frequentist perspectives.
Abstract: We investigate the compatibility of an assumed model with the data in the situation where the assumed model has unknown parameters. The most frequently used measures of compatibility are p values, based on statistics T for which large values are deemed to indicate incompatibility of the data and the model. When the null model has unknown parameters, p values are not uniquely defined. The proposals for computing a p value in such a situation include the plug-in and similar p values on the frequentist side, and the predictive and posterior predictive p values on the Bayesian side. We propose two alternatives, the conditional predictive p value and the partial posterior predictive p value, and indicate their advantages from both Bayesian and frequentist perspectives.
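For orientation, the sketch below computes the ordinary posterior predictive p value that serves as a baseline in this literature, not the conditional or partial posterior predictive versions proposed in the paper. It checks a normal model with unknown mean against the observed maximum, using a conjugate normal prior; the data and prior settings are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=40)         # observed data (simulated here)
n, sigma = len(y), 1.0                     # sampling sd assumed known
mu0, tau0 = 0.0, 10.0                      # vague conjugate prior  mu ~ N(mu0, tau0^2)

# Conjugate posterior for the mean: mu | y ~ N(mu_n, tau_n^2)
tau_n2 = 1.0 / (1.0 / tau0**2 + n / sigma**2)
mu_n = tau_n2 * (mu0 / tau0**2 + y.sum() / sigma**2)

def T(data):
    return data.max()                      # discrepancy statistic: large => suspect model

# Posterior predictive p value: P(T(y_rep) >= T(y) | y), averaging over the posterior.
draws, exceed = 5000, 0
for _ in range(draws):
    mu = rng.normal(mu_n, np.sqrt(tau_n2))        # draw parameter from the posterior
    y_rep = rng.normal(mu, sigma, size=n)         # replicate data under the drawn parameter
    exceed += T(y_rep) >= T(y)
print("posterior predictive p value:", exceed / draws)
```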

419 citations


Journal ArticleDOI
TL;DR: A full structured review of applications of Bayesian methods to randomised controlled trials, observational studies, and the synthesis of evidence, in a form which should be reasonably straightforward to update is provided.
Abstract: Background Bayesian methods may be defined as the explicit quantitative use of external evidence in the design, monitoring, analysis, interpretation and reporting of a health technology assessment. In outline, the methods involve formal combination through the use of Bayes's theorem of: 1. a prior distribution or belief about the value of a quantity of interest (for example, a treatment effect) based on evidence not derived from the study under analysis, with 2. a summary of the information concerning the same quantity available from the data collected in the study (known as the likelihood), to yield 3. an updated or posterior distribution of the quantity of interest. These methods thus directly address the question of how new evidence should change what we currently believe. They extend naturally into making predictions, synthesising evidence from multiple sources, and designing studies: in addition, if we are willing to quantify the value of different consequences as a 'loss function', Bayesian methods extend into a full decision-theoretic approach to study design, monitoring and eventual policy decision-making. Nonetheless, Bayesian methods are a controversial topic in that they may involve the explicit use of subjective judgements in what is conventionally supposed to be a rigorous scientific exercise. Objectives This report is intended to provide: 1. a brief review of the essential ideas of Bayesian analysis 2. a full structured review of applications of Bayesian methods to randomised controlled trials, observational studies, and the synthesis of evidence, in a form which should be reasonably straightforward to update 3. a critical commentary on similarities and differences between Bayesian and conventional approaches 4. criteria for assessing the reporting of a Bayesian analysis 5. a comprehensive list of published 'three-star' examples, in which a proper prior distribution has been used for the quantity of primary interest 6. tutorial case studies of a variety of types 7. recommendations on how Bayesian methods and approaches may be assimilated into health technology assessments in a variety of contexts and by a variety of participants in the research process. Methods The BIDS ISI database was searched using the terms 'Bayes' or 'Bayesian'. This yielded almost 4000 papers published in the period 1990-98. All resultant abstracts were reviewed for relevance to health technology assessment; about 250 were so identified, and used as the basis for forward and backward searches. In addition EMBASE and MEDLINE databases were searched, along with websites of prominent authors, and available personal collections of references, finally yielding nearly 500 relevant references. A comprehensive review of all references describing use of 'proper' Bayesian methods in health technology assessment (those which update an informative prior distribution through the use of Bayes's theorem) has been attempted, and around 30 such papers are reported in structured form. There has been very limited use of proper Bayesian methods in practice, and relevant studies appear to be relatively easily identified. Results Bayesian methods in the health technology assessment context 1. Different contexts may demand different statistical approaches. Prior opinions are most valuable when the assessment forms part of a series of similar studies. A decision-theoretic approach may be appropriate where the consequences of a study are reasonably predictable. 2. 
The prior distribution is important and not unique, and so a range of options should be examined in a sensitivity analysis. Bayesian methods are best seen as a transformation from initial to final opinion, rather than providing a single 'correct' inference. 3. The use of a prior is based on judgement, and hence a degree of subjectivity cannot be avoided. However, subjective priors tend to show predictable biases, and archetypal priors may be useful for identifying a reasonable range of prior opinion.
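The prior-likelihood-posterior update described in steps 1-3 of the Background has a simple closed form in the normal conjugate case, which is often used for treatment effects on a log scale. A minimal sketch with invented numbers:

```python
import numpy as np

# Hypothetical numbers: prior belief about a treatment effect (e.g., a log hazard ratio)
# from external evidence, combined with the likelihood from the current trial.
prior_mean, prior_sd = 0.0, 0.4           # sceptical prior centred on "no effect"
data_mean, data_se = -0.35, 0.15          # estimate and standard error from the trial

# Normal prior x normal likelihood => normal posterior (precision-weighted average)
post_prec = 1.0 / prior_sd**2 + 1.0 / data_se**2
post_mean = (prior_mean / prior_sd**2 + data_mean / data_se**2) / post_prec
post_sd = np.sqrt(1.0 / post_prec)

lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"posterior: {post_mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

The posterior mean is a precision-weighted average of the prior mean and the trial estimate, which is the sense in which the prior acts as external evidence in the update.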

364 citations


Journal ArticleDOI
TL;DR: A modified approach is proposed, called Bayesian melding, which takes into full account information and uncertainty about both inputs and outputs to the model, while avoiding the Borel paradox and is implemented here by posterior simulation using the sampling-importance-resampling (SIR) algorithm.
Abstract: Deterministic simulation models are used in many areas of science, engineering, and policy making. Typically, these are complex models that attempt to capture underlying mechanisms in considerable detail, and they have many user-specified inputs. The inputs are often specified by some form of trial-and-error approach in which plausible values are postulated, the corresponding outputs inspected, and the inputs modified until plausible outputs are obtained. Here we address the issue of more formal inference for such models. A probabilistic approach, called Bayesian synthesis, was shown to suffer from the Borel paradox, according to which the results can depend on the parameterization of the model. We propose a modified approach, called Bayesian melding which takes into full account information and uncertainty about both inputs and outputs to the model, while avoiding the Borel paradox. This is done by recognizing the existence of two priors, one implicit and one explicit, on each input and output; ...

347 citations


Journal ArticleDOI
TL;DR: A Bayesian probabilistic methodology for structural health monitoring is presented in this paper, where a high likelihood of reduction in model stiffness at a location is taken as a proxy for damage at the corresponding structural location.
Abstract: A Bayesian probabilistic methodology for structural health monitoring is presented. The method uses a sequence of identified modal parameter data sets to compute the probability that continually updated model stiffness parameters are less than a specified fraction of the corresponding initial model stiffness parameters. In this approach, a high likelihood of reduction in model stiffness at a location is taken as a proxy for damage at the corresponding structural location. The concept extends the idea of using as indicators of damage the changes in structural model parameters that are identified from modal parameter data sets when the structure is initially in an undamaged state and then later in a possibly damaged state. The extension is needed, since effects such as variation in the identified modal parameters in the absence of damage, as well as unavoidable model error, lead to uncertainties in the updated model parameters that in practice obscure health assessment. The method is illustrated by simulating on-line monitoring, wherein specified modal parameters are identified on a regular basis and the probability of damage for each substructure is continually updated.
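The damage metric described here reduces, for a single substructure, to a probability computed over posterior samples of the stiffness parameter before and after the monitoring data arrive. The sketch below illustrates only that final step, with hypothetical posterior samples standing in for the output of the model-updating procedure.

```python
import numpy as np

# Minimal sketch of the damage-probability idea: given posterior samples of a
# substructure stiffness parameter in the initial and current states, estimate
# P(theta_current < d * theta_initial) for a damage threshold fraction d.
rng = np.random.default_rng(2)
theta_initial = rng.normal(1.00, 0.03, size=20000)   # hypothetical posterior samples
theta_current = rng.normal(0.90, 0.04, size=20000)   # hypothetical posterior samples
d = 0.95                                              # "damage" = more than 5% stiffness loss

p_damage = np.mean(theta_current < d * theta_initial)
print(f"estimated probability of damage: {p_damage:.3f}")
```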

346 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the portfolio choices of mean-variance-optimizing investors who use sample evidence to update prior beliefs centered on either risk-based or characteristic-based pricing models.

341 citations


Proceedings Article
01 Jan 2000
TL;DR: It is demonstrated how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set.
Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set.

Journal ArticleDOI
TL;DR: In this article, the authors present a new prior and corresponding algorithm for Bayesian analysis of the multinomial probit model, which places a prior directly on the identified parameter space.

Journal ArticleDOI
TL;DR: This work proposes a nonparametric Bayesian approach for the detection of clusters of elevated (or lowered) risk based on Green's (1995, Biometrika 82, 711-732) reversible jump MCMC methodology.
Abstract: An interesting epidemiological problem is the analysis of geographical variation in rates of disease incidence or mortality. One goal of such an analysis is to detect clusters of elevated (or lowered) risk in order to identify unknown risk factors regarding the disease. We propose a nonparametric Bayesian approach for the detection of such clusters based on Green's (1995, Biometrika 82, 711–732) reversible jump MCMC methodology. The prior model assumes that geographical regions can be combined in clusters with constant relative risk within a cluster. The number of clusters, the location of the clusters, and the risk within each cluster are unknown. This specification can be seen as a change-point problem of variable dimension in irregular, discrete space. We illustrate our method through an analysis of oral cavity cancer mortality rates in Germany and compare the results with those obtained by the commonly used Bayesian disease mapping method of Besag, York, and Mollie (1991, Annals of the Institute of Statistical Mathematics, 43, 1–59).

Proceedings Article
30 Jun 2000
TL;DR: This paper shows how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables, and uses this result as the basis for an algorithm that approximates the Bayesian posterior of a feature.
Abstract: In many domains, we are interested in analyzing the structure of the underlying distribution, e.g., whether one variable is a direct parent of the other. Bayesian model-selection attempts to find the MAP model and use its structure to answer these questions. However, when the amount of available data is modest, there might be many models that have non-negligible posterior. Thus, we want to compute the Bayesian posterior of a feature, i.e., the total posterior probability of all models that contain it. In this paper, we propose a new approach for this task. We first show how to efficiently compute a sum over the exponential number of networks that are consistent with a fixed ordering over network variables. This allows us to compute, for a given ordering, both the marginal probability of the data and the posterior of a feature. We then use this result as the basis for an algorithm that approximates the Bayesian posterior of a feature. Our approach uses a Markov chain Monte Carlo (MCMC) method, but over orderings rather than over network structures. The space of orderings is much smaller and more regular than the space of structures, and has a smoother posterior "landscape". We present empirical results on synthetic and real-life datasets that compare our approach to full model averaging (when possible), to MCMC over network structures, and to a non-Bayesian bootstrap approach.
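A toy version of the order-based MCMC idea is sketched below: for a fixed ordering, each variable's contribution sums over all parent sets (here of size at most two) drawn from its predecessors, and Metropolis-Hastings proposes swaps of two positions in the ordering. The data generator, the K2-style Dirichlet(1) local scores, and all constants are invented, and no feature posteriors are computed; this is only meant to make the order-score factorization concrete.

```python
import itertools, math
import numpy as np

# Toy sketch of MCMC over variable orderings (in the spirit of the approach above,
# not the authors' code). Binary data; for a fixed ordering, each variable's score
# sums over all parent sets of size <= 2 drawn from its predecessors.
rng = np.random.default_rng(3)

# Hypothetical data from a chain X0 -> X1 -> X2, plus an independent X3.
n = 300
X0 = rng.integers(0, 2, n)
X1 = X0 ^ (rng.random(n) < 0.1).astype(int)
X2 = X1 ^ (rng.random(n) < 0.1).astype(int)
X3 = rng.integers(0, 2, n)
data = np.column_stack([X0, X1, X2, X3])
n_vars, max_parents = data.shape[1], 2

def local_log_score(child, parents):
    """log marginal likelihood of `child` given `parents` (Dirichlet(1) priors)."""
    keys = [tuple(row) for row in data[:, parents]] if parents else [()] * n
    counts = {}
    for key, value in zip(keys, data[:, child]):
        counts.setdefault(key, [0, 0])[value] += 1
    score = 0.0
    for c0, c1 in counts.values():
        score += (math.lgamma(2) - math.lgamma(c0 + c1 + 2)
                  + math.lgamma(c0 + 1) + math.lgamma(c1 + 1))
    return score

def order_log_score(order):
    """For each variable, sum over parent sets consistent with the ordering."""
    total = 0.0
    for pos, child in enumerate(order):
        preceding = order[:pos]
        parent_sets = [ps for k in range(min(len(preceding), max_parents) + 1)
                       for ps in itertools.combinations(preceding, k)]
        logs = [local_log_score(child, list(ps)) for ps in parent_sets]
        m = max(logs)
        total += m + math.log(sum(math.exp(l - m) for l in logs))  # log-sum-exp
    return total

# Metropolis-Hastings over orderings with symmetric swap proposals.
order = list(range(n_vars))
score = order_log_score(order)
visits = {}
for _ in range(500):
    i, j = rng.choice(n_vars, size=2, replace=False)
    proposal = order.copy()
    proposal[i], proposal[j] = proposal[j], proposal[i]
    new_score = order_log_score(proposal)
    if np.log(rng.random()) < new_score - score:
        order, score = proposal, new_score
    visits[tuple(order)] = visits.get(tuple(order), 0) + 1

print("most visited orderings:", sorted(visits.items(), key=lambda kv: -kv[1])[:3])
```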

Reference BookDOI
25 May 2000
TL;DR: In this paper, the authors describe how to conceptualize, perform, and critique traditional generalized linear models from a Bayesian perspective and how to use modern computational methods to summarize inferences using simulation.
Abstract: This volume describes how to conceptualize, perform, and critique traditional generalized linear models (GLMs) from a Bayesian perspective and how to use modern computational methods to summarize inferences using simulation. Introducing dynamic modeling for GLMs and containing over 1000 references and equations, Generalized Linear Models considers

Journal ArticleDOI
TL;DR: The authors proposed and implemented a coherent statistical framework for combining theoretical and empirical models of macroeconomic activity, which enables the formal yet probabilistic incorporation of uncertainty regarding the parameterization of theoretical models.

Journal ArticleDOI
TL;DR: This chapter introduces statistical concepts, prior structures, posterior smoothing, and Bayes-Stein estimation, and discusses models with several unknown parameters.
Abstract: 1. Introductory statistical concepts; 2. The discrete version of Bayes' theorem; 3. Models with a single unknown parameter; 4. The expected utility hypothesis and its alternatives; 5. Models with several unknown parameters; 6. Prior structures, posterior smoothing, and Bayes-Stein estimation. Guide to worked examples. Guide to self-study exercises.

Journal ArticleDOI
TL;DR: In this article, a Bayesian procedure is proposed to quantify the modeling uncertainty, including the uncertainty in mechanical and statistical model selection and distribution parameters, for a fatigue reliability problem with the combination of two competing crack growth models.

Journal ArticleDOI
TL;DR: It was found that frequency formats were not generally associated with better performance than probability formats unless they were presented in a manner which facilitated construction of a set inclusion mental model, and it was demonstrated that the use of frequency information may promote biases in the weighting of information.

Journal ArticleDOI
TL;DR: It is found that the Bayesian approach with a particular choice of diffuse inverse Wishart prior distribution for the (co)variance parameters performs at least as well—in terms of bias of estimates and actual coverage of nominal 95% intervals—as maximum likelihood methods in RSR models with medium sample sizes.
Abstract: We use simulation studies (a) to compare Bayesian and likelihood fitting methods, in terms of validity of conclusions, in two-level random-slopes regression (RSR) models, and (b) to compare several Bayesian estimation methods based on Markov chain Monte Carlo, in terms of computational efficiency, in random-effects logistic regression (RELR) models. We find (a) that the Bayesian approach with a particular choice of diffuse inverse Wishart prior distribution for the (co)variance parameters performs at least as well—in terms of bias of estimates and actual coverage of nominal 95% intervals—as maximum likelihood methods in RSR models with medium sample sizes (expressed in terms of the number J of level-2 units), but neither approach performs as well as might be hoped with small J; and (b) that an adaptive hybrid Metropolis-Gibbs sampling method we have developed for use in the multilevel modeling package MLwiN outperforms adaptive rejection Gibbs sampling in the RELR models we have considered, sometimes by a wide margin.

Journal ArticleDOI
TL;DR: The main purpose is to illustrate the ease with which the Bayesian stochastic volatility model can now be studied routinely via BUGS (Bayesian Inference Using Gibbs Sampling), a recently developed, user-friendly, and freely available software package.
Abstract: This paper reviews the general Bayesian approach to parameter estimation in stochastic volatility models with posterior computations performed by Gibbs sampling. The main purpose is to illustrate the ease with which the Bayesian stochastic volatility model can now be studied routinely via BUGS (Bayesian Inference Using Gibbs Sampling), a recently developed, user-friendly, and freely available software package. It is an ideal software tool for the exploratory phase of model building as any modifications of a model including changes of priors and sampling error distributions are readily realized with only minor changes of the code. BUGS automates the calculation of the full conditional posterior distributions using a model representation by directed acyclic graphs. It contains an expert system for choosing an efficient sampling method for each full conditional. Furthermore, software for convergence diagnostics and statistical summaries is available for the BUGS output. The BUGS implementation of a stochastic volatility model is illustrated using a time series of daily Pound/Dollar exchange rates.
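The model in question is the standard univariate stochastic volatility model, with a latent AR(1) log-volatility process driving the observation variance. Since the paper's estimation code is written in the BUGS language, the snippet below only simulates data from the model in Python to make its structure concrete; the parameter values are invented.

```python
import numpy as np

# Simulate the canonical stochastic volatility model discussed above:
#   h_t = mu + phi * (h_{t-1} - mu) + sigma_eta * eta_t,   eta_t ~ N(0, 1)
#   y_t = exp(h_t / 2) * eps_t,                            eps_t ~ N(0, 1)
# (hypothetical parameter values; the paper estimates mu, phi, sigma_eta with BUGS)
rng = np.random.default_rng(4)
T, mu, phi, sigma_eta = 1000, -1.0, 0.97, 0.15

h = np.empty(T)
h[0] = rng.normal(mu, sigma_eta / np.sqrt(1 - phi**2))    # stationary start
for t in range(1, T):
    h[t] = mu + phi * (h[t - 1] - mu) + sigma_eta * rng.normal()
y = np.exp(h / 2) * rng.normal(size=T)                     # "returns" series

print("sample kurtosis of y:", ((y - y.mean())**4).mean() / y.var()**2)
```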

Journal ArticleDOI
TL;DR: In this article, the authors show that the standard invariant prior leads to an improper posterior distribution for generalized linear mixed models, and they propose alternative reference priors: an approximate uniform shrinkage prior and an approximate Jeffreys's prior.
Abstract: Bayesian methods furnish an attractive approach to inference in generalized linear mixed models. In the absence of subjective prior information for the random-effect variance components, these analyses are typically conducted using either the standard invariant prior for normal responses or diffuse conjugate priors. Previous work has pointed out serious difficulties with both strategies, and we show here that as in normal mixed models, the standard invariant prior leads to an improper posterior distribution for generalized linear mixed models. This article proposes and investigates two alternative reference (i.e., “objective” or “noninformative”) priors: an approximate uniform shrinkage prior and an approximate Jeffreys's prior. We give conditions for the existence of the posterior distribution under any prior for the variance components in conjunction with a uniform prior for the fixed effects. The approximate uniform shrinkage prior is shown to satisfy these conditions for several families of d...

Journal ArticleDOI
TL;DR: A strategy for a statistically rigorous Bayesian approach to the problem of determining cosmological parameters from the results of observations of anisotropies in the cosmic microwave background by relying on Markov chain Monte Carlo methods, specifically the Metropolis-Hastings algorithm.
Abstract: We present a strategy for a statistically rigorous Bayesian approach to the problem of determining cosmological parameters from the results of observations of anisotropies in the cosmic microwave background. Our strategy relies on Markov chain Monte Carlo methods, specifically the Metropolis-Hastings algorithm, to perform the necessary high-dimensional integrals. We describe the Metropolis-Hastings algorithm in detail and discuss the results of our test on simulated data.
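The Metropolis-Hastings step itself is short to state. The sketch below runs a Gaussian random-walk sampler on a stand-in two-parameter log-posterior (a correlated Gaussian) rather than an actual CMB likelihood; the proposal scale and all other constants are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

def log_post(theta):
    """Stand-in log-posterior over two 'cosmological' parameters (a correlated Gaussian)."""
    x, y = theta
    return -0.5 * (x**2 + (y - 0.5 * x)**2 / 0.3**2)

theta = np.zeros(2)
lp = log_post(theta)
step, samples, accepted = 0.3, [], 0
for _ in range(20000):
    proposal = theta + step * rng.normal(size=2)           # symmetric random-walk proposal
    lp_prop = log_post(proposal)
    if np.log(rng.random()) < lp_prop - lp:                 # Metropolis acceptance rule
        theta, lp = proposal, lp_prop
        accepted += 1
    samples.append(theta.copy())

samples = np.array(samples)
print("acceptance rate:", accepted / len(samples))
print("posterior means:", samples.mean(axis=0))
```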

Journal ArticleDOI
TL;DR: In this article, a Bayesian network model of the consumer complaint process is presented, and the outputs of the Bayesian model provide much insight into the deterministic nature of consumer complaints.
Abstract: The purpose of this article is to present a Bayesian network model of the consumer complaint process. The outputs of the Bayesian model—conditional probabilities—provide much insight into the deter...


Proceedings Article
01 Jan 2000
TL;DR: In this article, three alternative approaches to prediction confidence estimation are presented and compared: the maximum likelihood, approximate Bayesian, and the bootstrap technique, which are tested on a number of controlled artificial problems and a real, industrial regression application, the prediction of paper "curl".
Abstract: Feedforward neural networks, particularly multilayer perceptrons, are widely used in regression and classification tasks. A reliable and practical measure of prediction confidence is essential. In this work, three alternative approaches to prediction confidence estimation are presented and compared. The three methods are the maximum likelihood, approximate Bayesian, and the bootstrap technique. We consider prediction uncertainty owing to both data noise and model parameter misspecification. The methods are tested on a number of controlled artificial problems and a real, industrial regression application, the prediction of paper "curl". Confidence estimation performance is assessed by calculating the mean and standard deviation of the prediction interval coverage probability. We show that treating data noise variance as a function of the inputs is appropriate for the curl prediction task. Moreover, we show that the mean coverage probability can only gauge confidence estimation performance as an average over the input space (i.e., global performance), and that the standard deviation of the coverage is unreliable as a measure of local performance. The approximate Bayesian approach is found to perform better in terms of global performance.
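As a much simpler stand-in for the neural-network setting, the sketch below applies the bootstrap idea to a small polynomial regression: refit on resampled training sets, combine the spread of the ensemble predictions with an estimated (here constant) noise variance to form 95% prediction intervals, and measure their coverage on fresh data. The data generator, model, and constants are invented and are not those of the paper-curl application.

```python
import numpy as np

rng = np.random.default_rng(6)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + 0.2 * rng.normal(size=n)           # noisy target
    return x, y

x_train, y_train = make_data(200)
x_test, y_test = make_data(1000)

def fit_predict(x, y, x_new, degree=5):
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x_new)

# Bootstrap ensemble: refit the model on resampled training sets.
B = 200
preds = np.empty((B, len(x_test)))
for b in range(B):
    idx = rng.integers(0, len(x_train), len(x_train))
    preds[b] = fit_predict(x_train[idx], y_train[idx], x_test)

mean_pred = preds.mean(axis=0)
model_var = preds.var(axis=0)                               # parameter-uncertainty part
resid = y_train - fit_predict(x_train, y_train, x_train)
noise_var = resid.var()                                     # treated as constant over inputs here

# 95% prediction interval combining model and noise variance.
half_width = 1.96 * np.sqrt(model_var + noise_var)
covered = (y_test >= mean_pred - half_width) & (y_test <= mean_pred + half_width)
print("prediction interval coverage probability:", covered.mean())
```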

Journal ArticleDOI
TL;DR: The relationship among identifiability, Bayesian learning and MCMC convergence rates for a common class of spatial models, in order to provide guidance for prior selection and algorithm tuning is investigated.
Abstract: The marked increase in popularity of Bayesian methods in statistical practice over the last decade owes much to the simultaneous development of Markov chain Monte Carlo (MCMC) methods for the evaluation of requisite posterior distributions. However, along with this increase in computing power has come the temptation to fit models larger than the data can readily support, meaning that often the propriety of the posterior distributions for certain parameters depends on the propriety of the associated prior distributions. An important example arises in spatial modelling, wherein separate random effects for capturing unstructured heterogeneity and spatial clustering are of substantive interest, even though only their sum is well identified by the data. Increasing the informative content of the associated prior distributions offers an obvious remedy, but one that hampers parameter interpretability and may also significantly slow the convergence of the MCMC algorithm. In this paper we investigate the relationship among identifiability, Bayesian learning and MCMC convergence rates for a common class of spatial models, in order to provide guidance for prior selection and algorithm tuning. We are able to elucidate the key issues with relatively simple examples, and also illustrate the varying impacts of covariates, outliers and algorithm starting values on the resulting algorithms and posterior distributions.
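The identifiability issue can be seen in a stripped-down caricature of the spatial setting: two effects whose sum is all the data inform. In the toy Gibbs sampler below (not the paper's spatial model), the individual effects have wide posteriors and mix slowly under vague priors, while their sum is tightly determined; the priors and data are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Data only identify the SUM of two effects: y_i ~ N(theta1 + theta2, sigma^2).
n, sigma, tau = 100, 1.0, 10.0                    # tau: sd of vague N(0, tau^2) priors
y = rng.normal(1.5, sigma, size=n)                # true theta1 + theta2 = 1.5

def gibbs_update(other):
    """Conjugate normal full conditional for one effect given the other."""
    prec = n / sigma**2 + 1.0 / tau**2
    mean = ((y - other).sum() / sigma**2) / prec
    return rng.normal(mean, np.sqrt(1.0 / prec))

theta1, theta2 = 0.0, 0.0
draws = []
for _ in range(50000):
    theta1 = gibbs_update(theta2)
    theta2 = gibbs_update(theta1)
    draws.append((theta1, theta2))
draws = np.array(draws)[10000:]                   # discard burn-in

print("posterior sd of theta1:         ", draws[:, 0].std())       # wide: weakly identified, slow mixing
print("posterior sd of theta1 + theta2:", draws.sum(axis=1).std()) # narrow: well identified
```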

Proceedings Article
29 Jun 2000
TL;DR: This paper studies Bayesian model averaging’s application to combining rule sets, and compares it with bagging and partitioning, two popular but more ad hoc alternatives, showing its error rates are consistently higher than the other methods.
Abstract: Although Bayesian model averaging is theoretically the optimal method for combining learned models, it has seen very little use in machine learning. In this paper we study its application to combining rule sets, and compare it with bagging and partitioning, two popular but more ad hoc alternatives. Our experiments show that, surprisingly, Bayesian model averaging’s error rates are consistently higher than the other methods’. Further investigation shows this to be due to a marked tendency to overfit on the part of Bayesian model averaging, contradicting previous beliefs that it solves (or avoids) the overfitting problem.
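The mechanical difference between the two combination schemes can be illustrated on a toy regression problem rather than rule sets: Bayesian model averaging weights candidate models by an approximate posterior probability (here via the BIC approximation to the marginal likelihood), while bagging averages one flexible model refit on bootstrap resamples. Everything below is invented for illustration and does not reproduce the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(8)

x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + 0.3 * rng.normal(size=60)
x_test = rng.uniform(-1, 1, 500)
y_test = np.sin(3 * x_test) + 0.3 * rng.normal(size=500)

def fit(deg, xs, ys):
    return np.polyfit(xs, ys, deg)

degrees = range(1, 9)

# Bayesian model averaging: weight each candidate polynomial degree by an
# approximate posterior probability via the BIC approximation.
log_ml = []
for d in degrees:
    resid = y - np.polyval(fit(d, x, y), x)
    sigma2 = resid.var()
    loglik = -0.5 * len(y) * (np.log(2 * np.pi * sigma2) + 1)
    log_ml.append(loglik - 0.5 * (d + 2) * np.log(len(y)))   # BIC penalty
log_ml = np.array(log_ml)
w = np.exp(log_ml - log_ml.max())
w /= w.sum()
bma_pred = sum(wd * np.polyval(fit(d, x, y), x_test) for wd, d in zip(w, degrees))

# Bagging: average predictions of one flexible model refit on bootstrap resamples.
B, deg = 100, 8
bag_pred = np.zeros_like(x_test)
for _ in range(B):
    idx = rng.integers(0, len(x), len(x))
    bag_pred += np.polyval(fit(deg, x[idx], y[idx]), x_test)
bag_pred /= B

print("BMA weights:", np.round(w, 3))
print("BMA test MSE:    ", np.mean((bma_pred - y_test) ** 2))
print("Bagging test MSE:", np.mean((bag_pred - y_test) ** 2))
```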

Journal ArticleDOI
TL;DR: A probabilistic model of protein sequence/structure relationships in terms of structural segments is developed, and secondary structure prediction is formulated as a general Bayesian inference problem.
Abstract: We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for α-helices, β-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is...
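The "Markovian in the segments" property supports a semi-Markov forward recursion that sums over every possible segmentation. The sketch below runs that recursion on an invented two-letter alphabet with two segment types, each with its own length distribution and emission probabilities; segment types are drawn independently here, so it omits the capping signals and side-chain correlations of the actual model.

```python
import numpy as np

# Toy semi-Markov forward recursion: sum over all segmentations of a sequence, where
# each segment has a type ('H' or 'C'), a length distribution, and per-symbol emission
# probabilities. This mirrors the segment-level DP described above, on invented data.
obs = "aababbbbaa"                        # hypothetical observation sequence
types = {
    "H": {"emit": {"a": 0.8, "b": 0.2}, "len": {2: 0.5, 3: 0.5}},
    "C": {"emit": {"a": 0.3, "b": 0.7}, "len": {2: 0.3, 3: 0.3, 4: 0.4}},
}
type_prior = {"H": 0.5, "C": 0.5}

def segment_prob(t, i, j):
    """P(segment of type t covering obs[i:j]) = length prob x emission probs."""
    spec = types[t]
    p = spec["len"].get(j - i, 0.0)
    for symbol in obs[i:j]:
        p *= spec["emit"][symbol]
    return p

n = len(obs)
F = np.zeros(n + 1)
F[0] = 1.0                                # empty prefix
for j in range(1, n + 1):
    for i in range(j):
        for t in types:
            F[j] += F[i] * type_prior[t] * segment_prob(t, i, j)

print("total probability summed over all segmentations:", F[n])
```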

Journal ArticleDOI
TL;DR: A Bayesian method is presented for the analysis of two types of sudden change at an unknown time-point in a sequence of energy inflows modeled by independent normal random variables.