
Showing papers on "Bayesian inference published in 1992"


Journal ArticleDOI
TL;DR: The focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, and the results are derived as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations.
Abstract: The Gibbs sampler, the algorithm of Metropolis and similar iterative simulation methods are potentially very helpful for summarizing multivariate distributions. Used naively, however, iterative simulation can give misleading answers. Our methods are simple and generally applicable to the output of any iterative simulation; they are designed for researchers primarily interested in the science underlying the data and models they are analyzing, rather than for researchers interested in the probability theory underlying the iterative simulations themselves. Our recommended strategy is to use several independent sequences, with starting points sampled from an overdispersed distribution. At each step of the iterative simulation, we obtain, for each univariate estimand of interest, a distributional estimate and an estimate of how much sharper the distributional estimate might become if the simulations were continued indefinitely. Because our focus is on applied inference for Bayesian posterior distributions in real problems, which often tend toward normality after transformations and marginalization, we derive our results as normal-theory approximations to exact Bayesian inference, conditional on the observed simulations. The methods are illustrated on a random-effects mixture model applied to experimental measurements of reaction times of normal and schizophrenic patients.

13,884 citations
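
A minimal sketch (one assumed way to implement the idea, not the authors' code) of the multiple-sequence check described above: compare between-sequence and within-sequence variance for a scalar estimand and report the potential scale reduction. The toy normal "chains" are illustrative only.

```python
import numpy as np

def potential_scale_reduction(chains):
    """Rough potential-scale-reduction factor for one scalar estimand.

    chains: array of shape (m, n) -- m independent sequences of length n,
    started from an overdispersed distribution.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    chain_vars = chains.var(axis=1, ddof=1)
    B = n * chain_means.var(ddof=1)        # between-sequence variance
    W = chain_vars.mean()                  # within-sequence variance
    var_plus = (n - 1) / n * W + B / n     # pooled estimate of the posterior variance
    return np.sqrt(var_plus / W)           # values near 1 suggest convergence

# Toy illustration: four "chains" sampled directly from the same normal target,
# so the factor should be close to 1.
rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))
print(potential_scale_reduction(chains))
```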


Journal ArticleDOI
TL;DR: The GLUE procedure works with multiple sets of parameter values and allows that, within the limitations of a given model structure and errors in boundary conditions and field observations, different sets of values may be equally likely as simulators of a catchment.
Abstract: This paper describes a methodology for calibration and uncertainty estimation of distributed models based on generalized likelihood measures. The GLUE procedure works with multiple sets of parameter values and allows that, within the limitations of a given model structure and errors in boundary conditions and field observations, different sets of values may be equally likely as simulators of a catchment. Procedures for incorporating different types of observations into the calibration, Bayesian updating of likelihood values, and evaluating the value of additional observations to the calibration process are described. The procedure is computationally intensive but has been implemented on a local parallel processing computer.

4,146 citations
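
A schematic of the GLUE recipe in the abstract, with a made-up two-parameter recession model standing in for the distributed catchment model and an efficiency-style score standing in for the generalized likelihood; the prior ranges and threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder "catchment" model: a two-parameter exponential recession. A real
# GLUE application would substitute the distributed hydrological model.
def toy_model(params, t):
    a, b = params
    return a * np.exp(-b * t)

t = np.linspace(0.0, 10.0, 50)
observed = toy_model((2.0, 0.3), t) + rng.normal(scale=0.1, size=t.size)

# 1. Sample many parameter sets from broad prior ranges.
samples = np.column_stack([rng.uniform(0.5, 5.0, 5000),    # parameter a
                           rng.uniform(0.05, 1.0, 5000)])  # parameter b

# 2. Generalized likelihood measure: an efficiency-style score, floored at zero.
def likelihood(params):
    sim = toy_model(params, t)
    eff = 1.0 - np.sum((observed - sim) ** 2) / np.sum((observed - observed.mean()) ** 2)
    return max(eff, 0.0)

weights = np.array([likelihood(p) for p in samples])

# 3. Retain the "behavioural" sets above a threshold; several quite different
#    parameter sets typically survive (equifinality).
behavioural = weights > 0.5
w = weights[behavioural] / weights[behavioural].sum()

# 4. Likelihood-weighted predictions express the parameter uncertainty.
sims = np.array([toy_model(p, t) for p in samples[behavioural]])
weighted_mean = w @ sims
print("behavioural parameter sets:", int(behavioural.sum()))
print("weighted prediction at t=0:", round(float(weighted_mean[0]), 2))
```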


Journal ArticleDOI
TL;DR: Within a Bayesian learning framework, objective functions are discussed that measure the expected informativeness of candidate measurements; all the resulting criteria depend on the assumption that the hypothesis space is correct.
Abstract: Learning can be made more efficient if we can actively select particularly salient data points. Within a Bayesian learning framework, objective functions are discussed that measure the expected informativeness of candidate measurements. Three alternative specifications of what we want to gain information about lead to three different criteria for data selection. All these criteria depend on the assumption that the hypothesis space is correct, which may prove to be their main weakness.

1,316 citations
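
A sketch of the kind of objective function the abstract describes, for the special case of Bayesian linear-in-the-parameters regression with known noise: the expected information gained about the weights from a candidate measurement is largest where the predictive variance is largest. The basis functions, noise level and candidate grid are illustrative assumptions.

```python
import numpy as np

# Model: y = w . phi(x) + noise, Gaussian prior on w. The expected information
# gained about w from measuring at candidate x is 0.5*log(1 + phi^T Sigma phi / sigma2),
# so we pick the candidate that maximizes it.
sigma2 = 0.1 ** 2                          # assumed known noise variance

def phi(x):                                # simple polynomial basis (an assumption)
    return np.array([1.0, x, x ** 2])

alpha = 1.0                                # prior precision on the weights
Sigma = np.eye(3) / alpha                  # current posterior covariance (prior here)

def expected_info_gain(x, Sigma):
    v = phi(x)
    return 0.5 * np.log(1.0 + v @ Sigma @ v / sigma2)

candidates = np.linspace(-2.0, 2.0, 41)
gains = [expected_info_gain(x, Sigma) for x in candidates]
best = candidates[int(np.argmax(gains))]
print("most informative next measurement:", best)   # the extremes of the range win here
```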


Journal ArticleDOI
TL;DR: The literature of regression analysis with missing values of the independent variables is reviewed in this article, where six classes of procedures are distinguished: complete case analysis, available case methods, least squares on imputed data, maximum likelihood, Bayesian methods, and multiple imputation.
Abstract: The literature of regression analysis with missing values of the independent variables is reviewed. Six classes of procedures are distinguished: complete case analysis, available case methods, least squares on imputed data, maximum likelihood, Bayesian methods, and multiple imputation. Methods are compared and illustrated when missing data are confined to one independent variable, and extensions to more general patterns are indicated. Attention is paid to the performance of methods when the missing data are not missing completely at random. Least squares methods that fill in missing X's using only data on the X's are contrasted with likelihood-based methods that use data on the X's and Y. The latter approach is preferred and provides methods for elaboration of the basic normal linear regression model. It is suggested that more widely distributed software is needed that advances beyond complete-case analysis, available-case analysis, and naive imputation methods. Bayesian simulation methods and mu...

1,074 citations
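
An illustrative comparison (not taken from the review) of three of the procedure classes it distinguishes: complete-case analysis, naive mean imputation of the missing covariate, and a likelihood-flavoured regression imputation that uses both the X's and Y. The data-generating process and missingness mechanism are made up for the demo.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.0 * x2 + rng.normal(size=n)

# Make x2 missing more often when x1 is large (missing at random, not MCAR).
missing = rng.random(n) < 1 / (1 + np.exp(-2 * x1))
x2_obs = np.where(missing, np.nan, x2)

def ols(X, y):
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Complete-case analysis: drop every row with a missing x2.
cc = ~missing
beta_cc = ols(np.column_stack([x1[cc], x2_obs[cc]]), y[cc])

# Naive imputation: fill missing x2 with the observed mean of x2.
x2_mean = np.where(missing, np.nanmean(x2_obs), x2_obs)
beta_mean = ols(np.column_stack([x1, x2_mean]), y)

# Regression imputation using the X's and Y: fit x2 ~ x1 + y on complete rows.
coef = ols(np.column_stack([x1[cc], y[cc]]), x2_obs[cc])
x2_reg = np.where(missing, coef[0] + coef[1] * x1 + coef[2] * y, x2_obs)
beta_reg = ols(np.column_stack([x1, x2_reg]), y)

print("true coefficients:        [1.0, 2.0, 1.0]")
print("complete-case estimate:  ", np.round(beta_cc, 2))
print("mean-imputation estimate:", np.round(beta_mean, 2))
print("regression-imputation:   ", np.round(beta_reg, 2))
```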


Journal Article
TL;DR: In this article, a sampling-resampling perspective on Bayesian inference is presented, which has both pedagogic appeal and suggests easily implemented calculation strategies, such as sampling-based methods.
Abstract: Even to the initiated, statistical calculations based on Bayes's Theorem can be daunting because of the numerical integrations required in all but the simplest applications. Moreover, from a teaching perspective, introductions to Bayesian statistics—if they are given at all—are circumscribed by these apparent calculational difficulties. Here we offer a straightforward sampling-resampling perspective on Bayesian inference, which has both pedagogic appeal and suggests easily implemented calculation strategies.

861 citations
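
A minimal sampling-resampling sketch of the calculation strategy the abstract alludes to: draw parameter values from the prior, weight them by the likelihood, and resample in proportion to the weights. The beta-binomial example is an illustrative choice, not the authors' own.

```python
import numpy as np

rng = np.random.default_rng(3)

# Data: 7 successes in 20 Bernoulli trials; parameter theta has a Uniform(0,1) prior.
successes, trials = 7, 20

# 1. Sample candidate theta values from the prior.
theta = rng.uniform(1e-6, 1.0 - 1e-6, size=100_000)

# 2. Weight each draw by its likelihood (binomial; constants cancel on normalising).
log_w = successes * np.log(theta) + (trials - successes) * np.log(1.0 - theta)
w = np.exp(log_w - log_w.max())
w /= w.sum()

# 3. Resample in proportion to the weights: the result is approximately a sample
#    from the posterior, here Beta(8, 14), whose mean is (7+1)/(20+2) ~ 0.364.
posterior = rng.choice(theta, size=10_000, replace=True, p=w)
print(posterior.mean(), (successes + 1) / (trials + 2))
```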


DissertationDOI
01 Jan 1992
TL;DR: The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models, and it is shown that the careful incorporation of error bar information into a classifier's predictions yields improved performance.
Abstract: The Bayesian framework for model comparison and regularisation is demonstrated by studying interpolation and classification problems modelled with both linear and non-linear models. This framework quantitatively embodies 'Occam's razor'. Over-complex and under-regularised models are automatically inferred to be less probable, even though their flexibility allows them to fit the data better. When applied to 'neural networks', the Bayesian framework makes possible (1) objective comparison of solutions using alternative network architectures; (2) objective stopping rules for network pruning or growing procedures; (3) objective choice of type of weight decay terms (or regularisers); (4) on-line techniques for optimising weight decay (or regularisation constant) magnitude; (5) a measure of the effective number of well-determined parameters in a model; (6) quantified estimates of the error bars on network parameters and on network output. In the case of classification models, it is shown that the careful incorporation of error bar information into a classifier's predictions yields improved performance. Comparisons of the inferences of the Bayesian framework with more traditional cross-validation methods help detect poor underlying assumptions in learning models. The relationship of the Bayesian learning framework to 'active learning' is examined. Objective functions are discussed which measure the expected informativeness of candidate data measurements, in the context of both interpolation and classification problems. The concepts and methods described in this thesis are quite general and will be applicable to other data modelling problems whether they involve regression, classification or density estimation.

605 citations
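
A small sketch of the evidence-based model comparison the thesis describes, here for polynomial regression with a Gaussian prior on the weights and a known noise level (both assumptions): the log marginal likelihood penalises over-complex models automatically, which is the quantitative Occam's razor referred to above.

```python
import numpy as np

rng = np.random.default_rng(4)

# Data from a quadratic with noise; we compare polynomial models of degree 1..6
# by their log evidence (marginal likelihood).
x = np.linspace(-1.0, 1.0, 30)
y = 0.5 - x + 2.0 * x ** 2 + rng.normal(scale=0.2, size=x.size)

alpha, sigma2 = 1.0, 0.2 ** 2              # prior precision and noise variance (assumed known)

def log_evidence(degree):
    Phi = np.vander(x, degree + 1, increasing=True)       # design matrix
    C = sigma2 * np.eye(len(y)) + Phi @ Phi.T / alpha      # marginal covariance of y
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))

for d in range(1, 7):
    print(d, round(log_evidence(d), 2))
# The evidence typically peaks near the true degree: more flexible models can fit
# the data better but pay an Occam penalty through the spread of their prior predictions.
```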



Journal Article
TL;DR: 'Ockham's razor', the ad hoc principle enjoining the greatest possible simplicity in theoretical explanations, is presently shown to be justifiable as a consequence of Bayesian inference.
Abstract: 'Ockham's razor', the ad hoc principle enjoining the greatest possible simplicity in theoretical explanations, is presently shown to be justifiable as a consequence of Bayesian inference; Bayesian analysis can, moreover, clarify the nature of the 'simplest' hypothesis consistent with the given data. By choosing the prior probabilities of hypotheses, it becomes possible to quantify the scientific judgment that simpler hypotheses are more likely to be correct. Bayesian analysis also shows that a hypothesis with fewer adjustable parameters intrinsically possesses an enhanced posterior probability, due to the clarity of its predictions.

518 citations


Journal ArticleDOI
TL;DR: This paper illustrates how the Gibbs sampler approach to Bayesian calculation avoids these difficulties and leads to straightforwardly implemented procedures, even for apparently very complicated model forms.
Abstract: Constrained parameter problems arise in a wide variety of applications, including bioassay, actuarial graduation, ordinal categorical data, response surfaces, reliability development testing, and variance component models. Truncated data problems arise naturally in survival and failure time studies, ordinal data models, and categorical data studies aimed at uncovering underlying continuous distributions. In many applications both parameter constraints and data truncation are present. The statistical literature on such problems is very extensive, reflecting both the problems’ widespread occurrence in applications and the methodological challenges that they pose. However, it is striking that so little of this applied and theoretical literature involves a parametric Bayesian perspective. From a technical viewpoint, this perhaps is not difficult to understand. The fundamental tool for Bayesian calculations in typical realistic models is (multidimensional) numerical integration, which often is problem...

468 citations
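
A toy version of the approach described above: Gibbs sampling for two normal means constrained to be ordered, where each full conditional is a truncated normal (sampled here by simple redraw-until-valid). The priors, data and sample sizes are placeholders.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two normal samples whose means are known a priori to satisfy mu1 <= mu2.
# Flat prior on the ordered region; known unit variance. The Gibbs sampler
# alternates draws from each mean's full conditional, truncated to respect
# the constraint.
y1 = rng.normal(1.0, 1.0, size=25)
y2 = rng.normal(1.4, 1.0, size=25)

def truncated_normal(mean, sd, low=-np.inf, high=np.inf):
    while True:                        # fine for a demo; use an inverse-CDF method in practice
        draw = rng.normal(mean, sd)
        if low <= draw <= high:
            return draw

mu1, mu2 = 0.0, 2.0
draws = []
for it in range(4000):
    mu1 = truncated_normal(y1.mean(), 1.0 / np.sqrt(len(y1)), high=mu2)
    mu2 = truncated_normal(y2.mean(), 1.0 / np.sqrt(len(y2)), low=mu1)
    if it >= 1000:                     # discard burn-in
        draws.append((mu1, mu2))

draws = np.array(draws)
print("posterior means:", draws.mean(axis=0))
print("constraint respected in all draws:", bool((draws[:, 0] <= draws[:, 1]).all()))
```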


Proceedings Article
22 Mar 1992
TL;DR: The Bayesian approach to decision making under uncertainty prescribes that a decision maker have a unique prior probability and a utility function such that decisions are made so as to maximize the expected utility.
Abstract: The Bayesian approach to decision making under uncertainty prescribes that a decision maker have a unique prior probability and a utility function such that decisions are made so as to maximize the expected utility. In particular, in a statistical inference problem the decision maker is assumed to have a probability distribution over all possible distributions which may govern a certain random process.
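
The prescription in a few lines of code, with made-up numbers: given a prior over states of the world and a utility for each action-state pair, choose the action with the largest expected utility.

```python
import numpy as np

prior = np.array([0.7, 0.3])                  # P(state = good), P(state = bad)
utility = np.array([[100.0, -50.0],           # action 0: payoff in each state
                    [ 20.0,  20.0]])          # action 1: a safe alternative

expected = utility @ prior                    # expected utility of each action
best = int(np.argmax(expected))
print("expected utilities:", expected, "-> choose action", best)
```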

Book ChapterDOI
01 Jan 1992
TL;DR: It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation, and the true Bayesian predictive distribution is exhibited, implicitly integrating over the entire underlying parameter space.
Abstract: It is shown that Bayesian inference from data modeled by a mixture distribution can feasibly be performed via Monte Carlo simulation. This method exhibits the true Bayesian predictive distribution, implicitly integrating over the entire underlying parameter space. An infinite number of mixture components can be accommodated without difficulty, using a prior distribution for mixing proportions that selects a reasonable subset of components to explain any finite training set. The need to decide on a “correct” number of components is thereby avoided. The feasibility of the method is shown empirically for a simple classification task.
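
A finite two-component sketch of the Monte Carlo approach described above (the paper itself accommodates an unbounded number of components): Gibbs sampling over allocations, mixing proportions and component means, with the predictive density obtained by averaging the mixture density over the posterior draws rather than plugging in a single estimate. All priors and data here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)

y = np.concatenate([rng.normal(-2.0, 1.0, 60), rng.normal(2.0, 1.0, 40)])
sigma2, tau2, a = 1.0, 25.0, 1.0          # known variance, prior variance on means, Dirichlet alpha

def normal_pdf(x, mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

mu = np.array([-1.0, 1.0])
pi = np.array([0.5, 0.5])
grid = np.linspace(-6.0, 6.0, 121)
predictive = np.zeros_like(grid)
kept = 0

for it in range(3000):
    # 1. Allocate each observation to a component given the current parameters.
    resp = np.column_stack([pi[k] * normal_pdf(y, mu[k], sigma2) for k in range(2)])
    resp /= resp.sum(axis=1, keepdims=True)
    z = (rng.random(len(y)) < resp[:, 1]).astype(int)

    # 2. Update the mixing proportions from their Dirichlet (here Beta) full conditional.
    n1 = int((z == 1).sum())
    pi1 = rng.beta(a + n1, a + len(y) - n1)
    pi = np.array([1.0 - pi1, pi1])

    # 3. Update each component mean from its Gaussian full conditional.
    for k in range(2):
        nk = (z == k).sum()
        prec = nk / sigma2 + 1.0 / tau2
        mu[k] = rng.normal((y[z == k].sum() / sigma2) / prec, np.sqrt(1.0 / prec))

    # 4. Accumulate the predictive density after burn-in (integrating over parameters).
    if it >= 500:
        predictive += pi[0] * normal_pdf(grid, mu[0], sigma2) + pi[1] * normal_pdf(grid, mu[1], sigma2)
        kept += 1

predictive /= kept
print("predictive density integrates to ~1:", round(np.trapz(predictive, grid), 3))
```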

Book ChapterDOI
01 Jan 1992
TL;DR: In this paper, the authors compare the performance of Bayesian and frequentist methods for astrophysics problems using the Poisson distribution, including the analysis of on/off measurements of a weak source in a strong background.
Abstract: The “frequentist” approach to statistics, currently dominating statistical practice in astrophysics, is compared to the historically older Bayesian approach, which is now growing in popularity in other scientific disciplines, and which provides unique, optimal solutions to well-posed problems. The two approaches address the same questions with very different calculations, but in simple cases often give the same final results, confusing the issue of whether one is superior to the other. Here frequentist and Bayesian methods are applied to problems where such a mathematical coincidence does not occur, allowing assessment of their relative merits based on their performance, rather than philosophical argument. Emphasis is placed on a key distinction between the two approaches: Bayesian methods, based on comparisons among alternative hypotheses using the single observed data set, consider averages over hypotheses; frequentist methods, in contrast, average over hypothetical alternative data samples and consider hypothesis averaging to be irrelevant. Simple problems are presented that magnify the consequences of this distinction to where common sense can confidently judge between the methods. These demonstrate the irrelevance of sample averaging, and the necessity of hypothesis averaging, revealing frequentist methods to be fundamentally flawed. Bayesian methods are then presented for astrophysically relevant problems using the Poisson distribution, including the analysis of “on/off” measurements of a weak source in a strong background. Weaknesses of the presently used frequentist methods for these problems are straightforwardly overcome using Bayesian methods. Additional existing applications of Bayesian inference to astrophysical problems are noted.
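
A grid-based sketch of the Bayesian "on/off" analysis mentioned above: flat priors on the source and background rates, Poisson likelihoods for the on-source and off-source counts, and marginalisation over the background rate. The counts and exposure times are made up.

```python
import numpy as np

N_on, T_on = 16, 1.0      # counts and exposure with the source in the field of view
N_off, T_off = 9, 2.0     # counts and exposure off-source (background only)

s = np.linspace(0.0, 40.0, 400)           # source rate grid
b = np.linspace(0.0, 20.0, 400)           # background rate grid
S, B = np.meshgrid(s, b, indexing="ij")

# Joint log posterior up to a constant (flat priors); the tiny offset avoids log(0).
log_post = (N_on * np.log((S + B) * T_on + 1e-300) - (S + B) * T_on
            + N_off * np.log(B * T_off + 1e-300) - B * T_off)
post = np.exp(log_post - log_post.max())

# Marginalise over the background rate and summarise the source rate.
marginal_s = post.sum(axis=1)
marginal_s /= np.trapz(marginal_s, s)
mean_s = np.trapz(s * marginal_s, s)
print("posterior mean source rate:", round(mean_s, 2))   # cf. naive N_on/T_on - N_off/T_off
```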

Journal ArticleDOI
TL;DR: It is concluded that little progress has been made on prediction of the secondary structure of proteins given their primary sequence, despite the application of a variety of sophisticated algorithms such as neural networks, and that further advances will require a better understanding of the relevant biophysics.

Journal ArticleDOI
TL;DR: Bayesian Monte Carlo (BMC) as mentioned in this paper has been used to quantify errors in water quality models caused by uncertain parameters and provides estimates of parameter uncertainty as a function of observed data on model state variables.

Journal ArticleDOI
TL;DR: Recently, a group of Monte Carlo integration techniques that fall under the general banner of successive substitution sampling (SSS) have proven to be powerful tools for obtaining approximate answers in a very wide variety of Bayesian modeling situations.
Abstract: The problem of finding marginal distributions of multidimensional random quantities has many applications in probability and statistics. Many of the solutions currently in use are very computationally intensive. For example, in a Bayesian inference problem with a hierarchical prior distribution, one is often driven to multidimensional numerical integration to obtain marginal posterior distributions of the model parameters of interest. Recently, however, a group of Monte Carlo integration techniques that fall under the general banner of successive substitution sampling (SSS) have proven to be powerful tools for obtaining approximate answers in a very wide variety of Bayesian modeling situations. Answers may also be obtained at low cost, both in terms of computer power and user sophistication. Important special cases of SSS include the “Gibbs sampler” described by Gelfand and Smith and the “IP algorithm” described by Tanner and Wong. The major problem plaguing users of SSS is the difficulty in asce...

Journal ArticleDOI
01 Jun 1992
TL;DR: The approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters to enhance model robustness in a CDHMM-based speech recognition system.
Abstract: An investigation into the use of Bayesian learning of the parameters of a multivariate Gaussian mixture density has been carried out. In a framework of continuous density hidden Markov model (CDHMM), Bayesian learning serves as a unified approach for parameter smoothing, speaker adaptation, speaker clustering and corrective training. The goal is to enhance model robustness in a CDHMM-based speech recognition system so as to improve performance. Our approach is to use Bayesian learning to incorporate prior knowledge into the training process in the form of prior densities of the HMM parameters. The theoretical basis for this procedure is presented and results applying it to parameter smoothing, speaker adaptation, speaker clustering and corrective training are given.
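
The core of MAP-style Bayesian adaptation in miniature, for a single Gaussian mean with a conjugate prior (a simplification of the CDHMM setting): the estimate interpolates between the prior (e.g. speaker-independent) mean and the adaptation data. The prior weight tau and the data are assumptions.

```python
import numpy as np

prior_mean = 0.0          # e.g. a speaker-independent mean
tau = 10.0                # prior "pseudo-count": how strongly the prior is trusted

rng = np.random.default_rng(7)
adaptation_data = rng.normal(1.5, 1.0, size=20)   # a little speaker-specific data

n = len(adaptation_data)
map_mean = (tau * prior_mean + adaptation_data.sum()) / (tau + n)
print("ML estimate :", round(adaptation_data.mean(), 3))
print("MAP estimate:", round(map_mean, 3))        # shrunk toward the prior mean
```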

Journal ArticleDOI
TL;DR: An incremental categorization algorithm is described which, at each step, assigns the next instance to the most probable category, and Bayesian extensions to deal with nonindependent features are described and evaluated.
Abstract: An incremental categorization algorithm is described which, at each step, assigns the next instance to the most probable category. Probabilities are estimated by a Bayesian inference scheme which assumes that instances are partitioned into categories and that within categories features are displayed independently and probabilistically. This algorithm can be shown to be an optimization of an ideal Bayesian algorithm in which predictive accuracy is traded for computational efficiency. The algorithm can deliver predictions about any dimension of a category and does not treat specially the prediction of category labels. The algorithm has successfully modeled much of the empirical literature on human categorization. This paper describes its application to a number of data sets from the machine learning literature. The algorithm performs reasonably well, having its only serious difficulty because the assumption of independent features is not always satisfied. Bayesian extensions to deal with nonindependent features are described and evaluated.
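
A stripped-down version of the incremental scheme described above: each instance is assigned to the most probable category (possibly a brand-new one), with smoothed within-category feature probabilities under the independence assumption. The coupling parameter and the binary toy data are assumptions, and real implementations differ in detail.

```python
import numpy as np

COUPLING = 0.5      # assumed prior probability that a new instance joins an existing cluster
categories = []     # each category: a list of instances (tuples of 0/1 features)

def prob_given_category(x, members, n_values=2):
    members = np.array(members)
    p = 1.0
    for j, v in enumerate(x):
        count = int((members[:, j] == v).sum())
        p *= (count + 1.0) / (len(members) + n_values)   # smoothed feature probability
    return p

def assign(x):
    n = sum(len(c) for c in categories)
    scores = []
    for c in categories:                                  # existing categories
        prior = COUPLING * len(c) / ((1 - COUPLING) + COUPLING * n)
        scores.append(prior * prob_given_category(x, c))
    new_prior = (1 - COUPLING) / ((1 - COUPLING) + COUPLING * n)
    scores.append(new_prior * 0.5 ** len(x))              # a new, empty category
    k = int(np.argmax(scores))
    if k == len(categories):
        categories.append([])
    categories[k].append(x)
    return k

data = [(1, 1, 0), (1, 1, 1), (0, 0, 1), (1, 1, 0), (0, 0, 0)]
print([assign(x) for x in data])      # category label chosen for each instance
```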

Proceedings ArticleDOI
23 Feb 1992
TL;DR: Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training.
Abstract: We discuss maximum a posteriori estimation of continuous density hidden Markov models (CDHMM). The classical MLE reestimation algorithms, namely the forward-backward algorithm and the segmental k-means algorithm, are expanded and reestimation formulas are given for HMM with Gaussian mixture observation densities. Because of its adaptive nature, Bayesian learning serves as a unified approach for the following four speech recognition applications, namely parameter smoothing, speaker adaptation, speaker group modeling and corrective training. New experimental results on all four applications are provided to show the effectiveness of the MAP estimation approach.

Journal ArticleDOI
TL;DR: It is shown that the total imprecision of a diagnostic classification can be decomposed into the sum of the imprecisions of its knowledge components.

Proceedings ArticleDOI
12 May 1992
TL;DR: A method for solving the specular reflection problem of sonar systems has been developed and implemented and permits the robot to construct a high-quality probability map of an environment composed of specular surfaces.
Abstract: A method for solving the specular reflection problem of sonar systems has been developed and implemented. This method, the specular reflection probability method, permits the robot to construct a high-quality probability map of an environment composed of specular surfaces. The method uses two parameters, the range confidence factor (RCF) and orientation probability. The RCF is the measure of confidence in the returning range from a sensor under a reflective environment, taking low values for long-range readings and high values for short-range ones. Orientation probability represents the surface orientation of an object. Bayesian reasoning is used to update the orientation probability from the range readings of the sensor.

ReportDOI
01 Apr 1992
TL;DR: Using a Bayesian framework, this work places bounds on just what features are worth computing if inferences about world properties are to be made from image data.
Abstract: Using a Bayesian framework, we place bounds on just what features are worth computing if inferences about world properties are to be made from image data. Previously others have proposed that useful features reflect "non-accidental" or "suspicious" configurations (such as parallel or collinear lines). We make these notions more precise and show them to be context sensitive.

Journal ArticleDOI
TL;DR: In this article, a unified methodology for dealing with both time and failure truncated data is presented, as well as inference for the expected number of failures and the probability of no failures in some given time interval.
Abstract: The power law process has been used to model reliability growth, software reliability and the failure times of repairable systems. This article reviews and further develops Bayesian inference for such a process. The Bayesian approach provides a unified methodology for dealing with both time and failure truncated data. As well as looking at the posterior densities of the parameters of the power law process, inference for the expected number of failures and the probability of no failures in some given time interval is discussed. Aspects of the prediction problem are examined. The results are illustrated with two data examples.
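
A grid-based sketch of posterior inference for the power law process with failure-truncated data, using flat priors over a bounded grid; the failure times, grid ranges and the prediction horizon are illustrative assumptions rather than the article's examples.

```python
import numpy as np

# Power law process intensity: lambda(t) = (beta/theta) * (t/theta)**(beta - 1).
# For failure truncation at the n-th failure, the likelihood is
# prod_i lambda(t_i) * exp(-(t_n/theta)**beta).
t = np.array([5.0, 18.0, 35.0, 61.0, 90.0, 134.0])    # ordered failure times (made up)
n, t_end = len(t), t[-1]

beta = np.linspace(0.2, 3.0, 300)
theta = np.linspace(1.0, 200.0, 300)
B, TH = np.meshgrid(beta, theta, indexing="ij")

log_lik = (n * np.log(B / TH)
           + (B - 1) * np.log(t / TH[..., None]).sum(axis=-1)
           - (t_end / TH) ** B)
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

# Posterior mean of beta and of the expected number of failures by a future time T.
T = 200.0
print("posterior mean of beta:", round(float((B * post).sum()), 2))
print("E[failures by T=200]  :", round(float(((T / TH) ** B * post).sum()), 1))
```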

Journal ArticleDOI
TL;DR: In this article, it was shown that the family of two-parameter Cauchy models is closed under M6bius transformation of the sample space, and that conditional coverage probability assessments depend on the choice of ancillary.
Abstract: Many computations associated with the two-parameter Cauchy model are shown to be greatly simplified if the parameter space is represented by the complex plane rather than the real plane. With this convention we show that the family is closed under Möbius transformation of the sample space: there is a similar induced transformation on the parameter space. The chief raison d'être of the paper, however, is that the two-parameter Cauchy model provides an example of a nonunique configuration ancillary in the sense of Fisher (1934), such that the maximum likelihood estimate together with either ancillary is minimal sufficient. Some consequences for Bayesian inference and non-Bayesian conditional inference are explored. In particular, it is shown that conditional coverage probability assessments depend on the choice of ancillary. For moderate deviations, the effect occurs in the O_p(n^-1) term; for large deviations the relative effect is O_p(n^2).


Journal ArticleDOI
TL;DR: A very general and flexible way to formulate inverse problems is as problems in Bayesian inference, which permits us to consolidate a number of different types of information about a particular inverse problem into a single calculation.
Abstract: A very general and flexible way to formulate inverse problems is as problems in Bayesian inference. This approach permits us to consolidate a number of different types of information about a particular inverse problem into a single calculation. We can combine a priori information about which model features are plausible and which are not, information about the observed data and their errors, and information about numerical and theoretical errors in the calculation that predicts the data that would be produced by a given model. We end up with a scalar‐valued function, called the posterior probability, over the space of all models. The values of this function over a set of models express the probability that those models could have led to a particular set of observed data. The goal of the inverse calculation, then, is to find all of the models whose posterior probability is greater than some threshold of acceptability set in advance by the user. (The function that optimization calculations seek to extremize...
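
The recipe in the abstract in miniature: a prior over candidate models is combined with a data-misfit likelihood to give a posterior over model space, and every model above a user-chosen threshold is kept. The one-parameter forward model and all numbers are placeholders for a real, usually high-dimensional, problem.

```python
import numpy as np

rng = np.random.default_rng(8)

def forward(m, x):
    return m * x ** 2                        # predicted data for model parameter m

x = np.linspace(0.0, 1.0, 20)
data = forward(1.3, x) + rng.normal(scale=0.05, size=x.size)
sigma = 0.05                                 # assumed data error

models = np.linspace(0.0, 3.0, 601)         # the space of candidate models
log_prior = -0.5 * (models - 1.0) ** 2       # a priori: models near 1 are plausible
log_like = np.array([-0.5 * np.sum((data - forward(m, x)) ** 2) / sigma ** 2
                     for m in models])
post = np.exp(log_prior + log_like - (log_prior + log_like).max())
post /= post.sum()

# Keep every model whose posterior probability exceeds a threshold set in advance.
threshold = 0.1 * post.max()
acceptable = models[post > threshold]
print("acceptable models span:", acceptable.min(), "to", acceptable.max())
```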


Journal ArticleDOI
Mike West1
TL;DR: In this article, Bayesian updating is developed for cases in which agents' opinions are proveded in terms of full forecast distributions (discrete or continuous), or parital summaries such as forecast means and variances.
Abstract: Problems in the analysis and interpretation of uncertain inferences obtained from individuals, or agents, are considered in the context of forecasting a scalar random quantity. Based on new theoretical results presented by West and Crosse, Bayesian updating is developed for cases in which agents' opinions are provided in terms of full forecast distributions (discrete or continuous), or partial summaries such as forecast means and variances. A particular focus is on problems in which partial agent information is communicated in terms of a few summary probabilities or, alternatively, quantiles of the agent's distribution. Example models highlight the key features of the theory and illustrate applications.

Journal ArticleDOI
TL;DR: Data indicate that selection bias significantly distorts the determination of predictive accuracies calculated by Bayes' theorem, and that these distortions can be significantly offset by a correction algorithm.
Abstract: Estimates of sensitivity and specificity can be biased by the preferential referral of patients with positive test responses or ancillary clinical abnormalities (the "concomitant information vector") for diagnostic verification. When these biased estimates are analyzed by Bayes' theorem, the resultant posterior disease probabilities (positive and negative predictive accuracies) are similarly biased. Accordingly, a series of computer simulations was performed to quantify the effects of various degrees of verification bias on the calculation of predictive accuracy using Bayes' theorem. The magnitudes of the errors in the observed true-positive rate (sensitivity) and false-positive rate (the complement of specificity) ranged from +11% and +23%, respectively (when the test response and the concomitant information vector were conditionally independent), to +16% and +48% (when they were conditionally non-independent). These errors produced absolute underestimations as high as 22% in positive predictive accuracy, and as high as 14% in negative predictive accuracy, when analyzed by Bayes' theorem at a base rate of 50%. Mathematical correction for biased verification based on the test response using a previously published algorithm significantly reduced these errors by as much as 20%. These data indicate 1) that selection bias significantly distorts the determination of predictive accuracies calculated by Bayes' theorem, and 2) that these distortions can be significantly offset by a correction algorithm.
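
A worked Bayes'-theorem calculation in the spirit of the abstract, showing how biased true-positive and false-positive rates propagate into the predictive accuracies. The bias offsets reuse the +11% and +23% figures quoted above, while the unbiased rates and the 50% base rate are illustrative.

```python
# Bayes' theorem for predictive accuracy, with and without verification bias
# in the estimated true-positive rate (sensitivity) and false-positive rate.

def predictive_accuracies(tpr, fpr, base_rate):
    """Positive and negative predictive accuracy from Bayes' theorem."""
    ppv = tpr * base_rate / (tpr * base_rate + fpr * (1 - base_rate))
    npv = ((1 - fpr) * (1 - base_rate)
           / ((1 - fpr) * (1 - base_rate) + (1 - tpr) * base_rate))
    return ppv, npv

true_tpr, true_fpr, base_rate = 0.70, 0.10, 0.50
biased_tpr, biased_fpr = true_tpr + 0.11, true_fpr + 0.23   # verification bias inflates both

print("with unbiased rates:", predictive_accuracies(true_tpr, true_fpr, base_rate))
print("with biased rates  :", predictive_accuracies(biased_tpr, biased_fpr, base_rate))
```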

Proceedings ArticleDOI
Michael Kearns1
07 Jun 1992
TL;DR: An understanding of the sample complexity of learning in several existing models is provided and a systematic investigation and comparison of two fundamental quantities in learning and information theory is undertaken.
Abstract: Summary form only given. A Bayesian or average-case model of concept learning is given. The model provides more precise characterizations of learning curve (sample complexity) behaviour, which depends on properties of both the prior distribution over concepts and the sequence of instances seen by the learner. It unites in a common framework statistical physics and VC dimension theories of learning curves. A systematic investigation and comparison of two fundamental quantities in learning and information theory is undertaken. These are the probability of an incorrect prediction for an optimal learning algorithm, and the Shannon information gain. This paper provides an understanding of the sample complexity of learning in several existing models.