
Showing papers on "Bayesian inference published in 1993"


Journal ArticleDOI
TL;DR: A general predictive density is presented which includes all proposed Bayesian approaches the authors are aware of; using Laplace approximations, the asymptotic behavior of these approaches can be conveniently assessed and compared.
Abstract: Model determination is a fundamental data analytic task. Here we consider the problem of choosing amongst a finite (without loss of generality we assume two) set of models. After briefly reviewing classical and Bayesian model choice strategies, we present a general predictive density which includes all proposed Bayesian approaches we are aware of. Using Laplace approximations we can conveniently assess and compare the asymptotic behavior of these approaches. Concern regarding the accuracy of these approximations for small to moderate sample sizes encourages the use of Monte Carlo techniques to carry out exact calculations. A data set fit with nested nonlinear models enables comparison between proposals and between exact and asymptotic values.
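To make the Laplace device concrete: the marginal likelihood of each candidate model is approximated by expanding the log posterior around its mode. The sketch below is a generic illustration; the `log_lik` and `log_prior` callables are placeholders of my own, not anything defined in the paper.

```python
import numpy as np
from scipy.optimize import minimize

# Laplace approximation to a model's marginal likelihood p(y | model):
#   log p(y) ~ log p(y | th_hat) + log p(th_hat)
#              + (d/2) log(2*pi) - (1/2) log det(H),
# where th_hat is the posterior mode and H is the Hessian of the
# negative log posterior at th_hat.

def laplace_log_marginal(log_lik, log_prior, theta0):
    neg_log_post = lambda th: -(log_lik(th) + log_prior(th))
    opt = minimize(neg_log_post, theta0)          # find the posterior mode
    th_hat, d = opt.x, opt.x.size
    eps = 1e-5
    I = np.eye(d) * eps
    H = np.zeros((d, d))                          # numerical Hessian
    for i in range(d):
        for j in range(d):
            H[i, j] = (neg_log_post(th_hat + I[i] + I[j])
                       - neg_log_post(th_hat + I[i] - I[j])
                       - neg_log_post(th_hat - I[i] + I[j])
                       + neg_log_post(th_hat - I[i] - I[j])) / (4 * eps**2)
    _, logdet = np.linalg.slogdet(H)
    return -opt.fun + 0.5 * d * np.log(2 * np.pi) - 0.5 * logdet

# A Bayes factor between two candidate models is then
#   exp(laplace_log_marginal(ll1, lp1, t1) - laplace_log_marginal(ll2, lp2, t2)).
```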

1,233 citations


Journal ArticleDOI
TL;DR: In this article, a method for Bayesian inference in a linear model in which the disturbances are independent and have identical Student-t distributions is presented. It is shown how to construct the exact posterior distribution, whereas earlier work was confined to obtaining the posterior mode of an approximate posterior.
Abstract: The possibility of leptokurtic disturbances is a common concern of econometricians and other users of the linear model. This article takes up methods for Bayesian inference in a linear model in which the disturbances are independent and have identical Student-t distributions. It exploits the equivalence of the Student-t distribution and an appropriate scale mixture of normals, and uses a Gibbs sampler to perform the computations. The main contribution is to provide a simple and stable computational method for full Bayesian inference in the independent Student-t linear model. The new method is applied to some well-known macroeconomic time series. It is found that posterior odds ratios favour the independent Student-t linear model over the normal linear model, and that the posterior odds ratio in favour of difference stationarity over trend stationarity is often substantially less in the favoured Student-t models. The work reported here builds on Bayesian treatments of heteroscedasticity, which began with hierarchical models for the analysis of variance (Lindley, 1965, 1971). With the adaptation of a prior distribution making cell means linear in cofactors (Lindley and Smith, 1972), this treatment was effectively extended to the linear regression model. Lindley (1971) took up the conjugate prior in which the inverses of the variances are χ²(ν), up to a factor of proportionality, with an improper prior. It is shown in Section 2.1 that this is equivalent to the specification of an independent Student-t linear model with known degrees of freedom. In an important variant on this model, Leonard (1975) used a prior in which the log variances are linear functions of cofactors, and constructed an approximation to the posterior density. This article extends these developments in two ways. First, it shows how to construct the exact posterior distribution, whereas the earlier work was confined to obtaining the posterior mode of an approximate posterior. (Interestingly, however, the algorithm in the appendix of
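The scale-mixture representation makes such a Gibbs sampler easy to sketch: conditioning on per-observation mixing weights turns the t model into a weighted normal regression. Below is a minimal, hedged sketch (flat prior on the coefficients, known degrees of freedom, all defaults illustrative), not Geweke's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def student_t_gibbs(y, X, nu=5.0, n_iter=2000):
    """Gibbs sampler for y = X @ beta + e, e_i ~ iid Student-t(nu), via
    the scale-mixture-of-normals representation
      e_i | lam_i ~ N(0, sigma2 / lam_i),  lam_i ~ Gamma(nu/2, rate=nu/2).
    Flat prior on beta, p(sigma2) ~ 1/sigma2.  A sketch of the idea only."""
    n, k = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2, lam = 1.0, np.ones(n)
    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        # beta | lam, sigma2: weighted least-squares posterior
        XtL = X.T * lam                     # X' Lambda, Lambda = diag(lam)
        V = np.linalg.inv(XtL @ X)
        beta = rng.multivariate_normal(V @ (XtL @ y), sigma2 * V)
        e = y - X @ beta
        # lam_i | beta, sigma2 ~ Gamma((nu+1)/2, rate=(nu + e_i^2/sigma2)/2)
        lam = rng.gamma((nu + 1) / 2, 2.0 / (nu + e**2 / sigma2))
        # sigma2 | beta, lam ~ Inv-Gamma(n/2, sum(lam * e^2)/2)
        sigma2 = 1.0 / rng.gamma(n / 2, 2.0 / np.sum(lam * e**2))
        draws[t] = beta
    return draws
```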

530 citations


Book
01 Aug 1993
TL;DR: Multiple Sensor System Applications, Benefits, and Atmospheric Attenuation Data Fusion Algorithms and Architectures Bayesian Inference Dempster-Shafer Algorithm Artificial Neural Networks Voting Fusion Fuzzy Logic and Neural Networks Passive Data Association Techniques for Unambiguous Location of Targets.
Abstract: Multiple Sensor System Applications, Benefits, and Atmospheric Attenuation Data Fusion Algorithms and Architectures Bayesian Inference Dempster-Shafer Algorithm Artificial Neural Networks Voting Fusion Fuzzy Logic and Neural Networks Passive Data Association Techniques for Unambiguous Location of Targets. Appendices: Planck Radiation Law and Radiative Transfer Voting Fusion With Nested Confidence Levels.

516 citations


Journal ArticleDOI
TL;DR: The early development of MCMC in Bayesian inference is traced, some recent computational progress in statistical physics, based on the introduction of auxiliary variables, is reviewed, and its current and future relevance in Bayesian applications is discussed.
Abstract: [Read before the Royal Statistical Society on Wednesday, May 6th, 1992, Professor B. W. Silverman in the Chair] SUMMARY Markov chain Monte Carlo (MCMC) algorithms, such as the Gibbs sampler, have provided a Bayesian inference machine in image analysis and in other areas of spatial statistics for several years, founded on the pioneering ideas of Ulf Grenander. More recently, the observation that hyperparameters can be included as part of the updating schedule and the fact that almost any multivariate distribution is equivalently a Markov random field has opened the way to the use of MCMC in general Bayesian computation. In this paper, we trace the early development of MCMC in Bayesian inference, review some recent computational progress in statistical physics, based on the introduction of auxiliary variables, and discuss its current and future relevance in Bayesian applications. We briefly describe a simple MCMC implementation for the Bayesian analysis of agricultural field experiments, with which we have some practical experience.
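The point about folding hyperparameters into the updating schedule can be shown with a toy hierarchical model: alternate between sampling the latent means and sampling their prior variance. A minimal sketch under assumed conjugate choices (all values illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def hierarchical_gibbs(y, a=2.0, b=1.0, n_iter=5000):
    """Toy Gibbs sampler in which the hyperparameter tau2 is part of the
    updating schedule: y_i ~ N(theta_i, 1), theta_i ~ N(0, tau2),
    tau2 ~ Inv-Gamma(a, b)."""
    m = len(y)
    tau2 = 1.0
    keep = []
    for _ in range(n_iter):
        # theta_i | tau2, y ~ N(y_i * tau2/(tau2+1), tau2/(tau2+1))
        w = tau2 / (tau2 + 1.0)
        theta = rng.normal(w * y, np.sqrt(w))
        # tau2 | theta ~ Inv-Gamma(a + m/2, b + sum(theta^2)/2)
        tau2 = 1.0 / rng.gamma(a + m / 2, 1.0 / (b + 0.5 * np.sum(theta**2)))
        keep.append(tau2)
    return np.array(keep)
```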

500 citations


Journal ArticleDOI
TL;DR: It is argued that the problem of plan recognition, inferring an agent's plan from observations, is largely a problem of inference under conditions of uncertainty, and an approach to the plan recognition problem based on Bayesian probability theory is presented.

483 citations


Journal ArticleDOI
TL;DR: Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable.
Abstract: Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two different approaches to machine learning in medical applications are compared: the system for inductive learning of decision trees, Assistant, and the naive Bayesian classifier. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared, and the interpretation of the knowledge and the explanation ability of the classification process of each system is discussed. Surprisingly, the naive Bayesian classifier is superior to Assistant in classification accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In addition, two extensions to the naive Bayesian classifier are briefly des...
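For reference, the classifier being compared is just attribute-wise conditional probabilities combined under an independence assumption. A minimal sketch with Laplace smoothing (function names and the smoothing constant are illustrative, not the paper's exact estimator):

```python
import numpy as np

def fit_naive_bayes(X, y, alpha=1.0):
    """Naive Bayesian classifier for discrete attributes.
    X: (n, d) integer-coded attributes; y: (n,) class labels."""
    classes = np.unique(y)
    priors = {c: np.mean(y == c) for c in classes}
    cond = {}  # cond[(c, j)][v] = P(attribute j = v | class c)
    for c in classes:
        Xc = X[y == c]
        for j in range(X.shape[1]):
            counts = {v: alpha for v in np.unique(X[:, j])}  # smoothing
            for v in Xc[:, j]:
                counts[v] += 1
            total = sum(counts.values())
            cond[(c, j)] = {v: cnt / total for v, cnt in counts.items()}
    return classes, priors, cond

def predict(x, classes, priors, cond):
    # score(c) = log P(c) + sum_j log P(x_j | c)  (attribute independence)
    def score(c):
        return np.log(priors[c]) + sum(
            np.log(cond[(c, j)].get(v, 1e-12)) for j, v in enumerate(x))
    return max(classes, key=score)
```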

332 citations


Journal ArticleDOI
TL;DR: It is shown that Gibbs sampling, making systematic use of an adaptive rejection algorithm proposed by Gilks and Wild, provides a straightforward computational procedure for Bayesian inferences in a wide class of generalized linear and proportional hazards models.
Abstract: It is shown that Gibbs sampling, making systematic use of an adaptive rejection algorithm proposed by Gilks and Wild, provides a straightforward computational procedure for Bayesian inferences in a wide class of generalized linear and proportional hazards models.
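The full conditionals in such models are log-concave, which is what makes Gilks and Wild's adaptive rejection sampling (ARS) applicable coordinate by coordinate. A faithful ARS implementation is lengthy, so the sketch below plainly substitutes a random-walk Metropolis step within Gibbs for a Poisson log-linear model; the model, prior, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def poisson_glm_mwg(y, X, n_iter=5000, step=0.05, tau2=100.0):
    """Bayesian Poisson log-linear model y_i ~ Poisson(exp(x_i' beta)),
    beta_j ~ N(0, tau2), sampled by Metropolis-within-Gibbs (a simple
    stand-in for ARS on each log-concave full conditional)."""
    k = X.shape[1]
    beta = np.zeros(k)

    def log_cond(bj, j, beta):
        b = beta.copy()
        b[j] = bj
        eta = X @ b
        return np.sum(y * eta - np.exp(eta)) - bj**2 / (2 * tau2)

    draws = np.empty((n_iter, k))
    for t in range(n_iter):
        for j in range(k):
            prop = beta[j] + step * rng.standard_normal()
            # symmetric proposal => plain Metropolis ratio
            if np.log(rng.uniform()) < log_cond(prop, j, beta) - log_cond(beta[j], j, beta):
                beta[j] = prop
        draws[t] = beta
    return draws
```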

235 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate conditions under which dilation occurs and study some of its implications in robust Bayesian inference and in the theory of upper and lower probabilities, and characterize dilation immune neighborhoods of the uniform measure.
Abstract: Suppose that a probability measure $P$ is known to lie in a set of probability measures $M$. Upper and lower bounds on the probability of any event may then be computed. Sometimes, the bounds on the probability of an event $A$ conditional on an event $B$ may strictly contain the bounds on the unconditional probability of $A$. Surprisingly, this might happen for every $B$ in a partition $\mathscr{B}$. If so, we say that dilation has occurred. In addition to being an interesting statistical curiosity, this counterintuitive phenomenon has important implications in robust Bayesian inference and in the theory of upper and lower probabilities. We investigate conditions under which dilation occurs and we study some of its implications. We characterize dilation immune neighborhoods of the uniform measure.
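A worked toy example may help fix ideas (a standard construction chosen for illustration, not taken from the paper): let A be "the second coin lands heads", where both coins have fair marginals but unknown dependence. Every law in the set gives P(A) = 1/2, yet conditioning on either outcome of the first coin dilates the bounds to [0, 1]:

```python
import numpy as np

# M is the set of joint laws for two coins with fair marginals, indexed
# by q = P(H1 and H2) in [0, 1/2].  A = "second toss is heads"; the
# partition is {H1, T1}.
qs = np.linspace(0.0, 0.5, 501)

p_A = np.full_like(qs, 0.5)        # P(A) = 1/2 under every law in M
p_A_given_H1 = 2 * qs              # P(A | H1) = q / (1/2)
p_A_given_T1 = 1 - 2 * qs          # P(A | T1) = (1/2 - q) / (1/2)

print("unconditional bounds:", p_A.min(), p_A.max())                # [0.5, 0.5]
print("bounds given H1:", p_A_given_H1.min(), p_A_given_H1.max())   # [0, 1]
print("bounds given T1:", p_A_given_T1.min(), p_A_given_T1.max())   # [0, 1]
# The conditional bounds strictly contain the unconditional ones for
# every cell of the partition -- dilation has occurred.
```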

183 citations


Journal ArticleDOI
TL;DR: It is shown that the frequentist coverage probability of a variety of (1 - alpha) posterior probability regions tends to be larger than 1 - alpha, but will be infinitely often less than any epsilon > 0 as n approaches infinity, with prior probability 1.
Abstract: The observation model $Y_i = \beta(i/n) + \epsilon_i$, $1 \le i \le n$, is considered, where the $\epsilon_i$ are i.i.d. with mean zero and variance $\sigma^2$ and $\beta$ is an unknown smooth function. A Gaussian prior distribution is specified by assuming $\beta$ is the solution of a high order stochastic differential equation. The estimation error $\delta = \beta - \hat{\beta}$ is analyzed, where $\hat{\beta}$ is the posterior expectation of $\beta$. Asymptotic posterior and sampling distributional approximations are given for $\|\delta\|^2$ when $\|\cdot\|$ is one of a family of norms natural to the problem. It is shown that the frequentist coverage probability of a variety of $(1 - \alpha)$ posterior probability regions tends to be larger than $1 - \alpha$, but will be infinitely often less than any $\epsilon > 0$ as $n \to \infty$ with prior probability 1. A related continuous time signal estimation problem is also studied. Keywords: Bayesian inference; Nonparametric regression; Confidence regions; Signal extraction; Smoothing splines.

159 citations


02 Jan 1993
TL;DR: Experimental results are presented which demonstrate that the ensemble method dramatically improves regression performance on real-world classification tasks.
Abstract: A general theoretical framework for Monte Carlo averaging methods of improving regression estimates is presented, with application to neural network classification and time series prediction. Given a population of regression estimators, it is shown how to construct a hybrid estimator which is as good as or better than, in the MSE sense, any estimator in the population. It is argued that the ensemble method presented has several properties: It efficiently uses all the regressors of a population--none need be discarded. It efficiently uses all the available data for training without over-fitting. It inherently performs regularization by smoothing in functional space, which helps to avoid over-fitting. It utilizes local minima to construct improved estimates, whereas other regression algorithms are hindered by local minima. It is ideally suited for parallel computation. It leads to a very useful and natural measure of the number of distinct estimators in a population. The optimal parameters of the ensemble estimator are given in closed form. It is shown that this result derives from the notion of convexity and can be applied to a wide variety of optimization algorithms, including: Mean Square Error, a general class of $L_p$-norm cost functions, Maximum Likelihood Estimation, Maximum Entropy, Maximum Mutual Information, the Kullback-Leibler Information (Cross Entropy), Penalized Maximum Likelihood Estimation and Smoothing Splines. The connection to Bayesian Inference is discussed. Experimental results on the NIST OCR database, the Turk and Pentland human face database and sunspot time series prediction are presented which demonstrate that the ensemble method dramatically improves regression performance on real-world classification tasks.
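The closed-form optimal ensemble parameters referred to are, in the simplest unbiased case, weights proportional to $C^{-1}\mathbf{1}$ for a misfit covariance matrix $C$. A hedged sketch of that computation (the function name and the use of np.cov are my assumptions, not the thesis' exact formulation):

```python
import numpy as np

def ensemble_weights(errors):
    """MSE-optimal convex combination of regression estimators.
    errors: (n_points, n_estimators) held-out misfits of each estimator.
    For (roughly) unbiased estimators the optimal weights are
      w = C^{-1} 1 / (1' C^{-1} 1),  C = misfit covariance."""
    C = np.cov(errors, rowvar=False)
    w = np.linalg.pinv(C) @ np.ones(errors.shape[1])
    return w / w.sum()

# Usage: if preds is (n_points, n_estimators), the ensemble prediction
# is preds @ ensemble_weights(validation_errors).
```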

156 citations


Proceedings ArticleDOI
01 Jul 1993
TL;DR: A Bayesian inference network model for automatic indexing with index terms (descriptors) from a prescribed vocabulary is presented, followed by an indexing example and some experimental results about the indexing performance of the network model.
Abstract: In this paper, a Bayesian inference network model for automatic indexing with index terms (descriptors) from a prescribed vocabulary is presented. It requires an indexing dictionary with rules mapping terms of the respective subject field onto descriptors, and inverted lists for terms occurring in a set of documents of the subject field and descriptors manually assigned to these documents. The indexing dictionary can be derived automatically from a set of manually indexed documents. An application of the network model is described, followed by an indexing example and some experimental results about the indexing performance of the network model.

Journal ArticleDOI
TL;DR: In this paper, a statistical approach called ABIC (Akaike's Bayesian Information Criterion) is presented to solve the problem of choosing the optimum smoothness constraint for a 2D linearized least-squares inversion of magnetotelluric (MT) data.
Abstract: We often apply a smoothness constraint to a two-dimensional (2-D) linearized least-squares inversion of magnetotelluric (MT) data to achieve a stable result. A substantial problem with this scheme lies in the choice of the optimum smoothness, and in this paper a statistical approach which is very versatile for this purpose is presented. It uses a statistical criterion called ABIC (Akaike's Bayesian Information Criterion), which was derived by introducing the entropy-maximization principle into Bayesian statistics. On applying the Bayesian procedure to 2-D MT inversion, we seek simultaneous minimization of data misfit and model roughness. ABIC works as a number that represents the goodness, or an entropy, of a model in the sense of this simultaneous minimization. Tests with both synthetic and real field data have revealed the effectiveness of this method for non-linear inversion problems. Regardless of the magnitude of the observation error, the method objectively adjusts the tradeoff between the misfit and the roughness, and stable convergence is attained.
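For the linear-Gaussian case the criterion has a closed form, so selecting the smoothness reduces to a one-dimensional search. A schematic version of the idea (assuming a square, full-rank roughening matrix D; this is an illustration, not the paper's exact MT formulation):

```python
import numpy as np

def abic(A, y, D, alpha2):
    """ABIC-style criterion for the linear-Gaussian smoothing problem
        y = A m + noise,  with roughness penalty alpha2 * ||D m||^2.
    Returns -2 * (profiled log marginal likelihood) up to an additive
    constant; choose alpha2 by minimizing over a grid."""
    n, p = A.shape
    K = A.T @ A + alpha2 * (D.T @ D)
    m_hat = np.linalg.solve(K, A.T @ y)          # penalized LS solution
    S = np.sum((y - A @ m_hat) ** 2) + alpha2 * np.sum((D @ m_hat) ** 2)
    _, logdetK = np.linalg.slogdet(K)
    return n * np.log(S) - p * np.log(alpha2) + logdetK

# Usage:
#   grid = np.logspace(-4, 4, 81)
#   alpha2_best = min(grid, key=lambda a2: abic(A, y, D, a2))
```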

Posted Content
TL;DR: In this paper, a Bayesian approach is used to investigate a sample's information about a portfolio's degree of inefficiency, and the data indicate that the NYSE-AMEX market portfolio is rather inefficient in the presence of a riskless asset, although this conclusion is justified only after an analysis using informative priors.
Abstract: A Bayesian approach is used to investigate a sample's information about a portfolio's degree of inefficiency. With standard diffuse priors, posterior distributions for measures of portfolio inefficiency can concentrate well away from values consistent with efficiency, even when the portfolio is exactly efficient in the sample. The data indicate that the NYSE-AMEX market portfolio is rather inefficient in the presence of a riskless asset, although this conclusion is justified only after an analysis using informative priors. Including a riskless asset significantly reduces any sample's ability to produce posterior distributions supporting small degrees of inefficiency.

Journal ArticleDOI
TL;DR: In this paper, the authors propose an alternative variable to U, denoted by T, that is available without knowledge of A and satisfies $T = U + O_p(n^{-1})$ in general.
Abstract: SUMMARY In the context of inference about a scalar parameter in the presence of nuisance parameters, some simple modifications for the signed root of the log-likelihood ratio statistic $R$ are developed that reduce the order of error in the standard normal approximation to the distribution of $R$ from $O(n^{-1/2})$ to $O(n^{-1})$. Barndorff-Nielsen has introduced a variable $U$ such that the error in the standard normal approximation to the distribution of $R + R^{-1}\log(U/R)$ is of order $O(n^{-3/2})$, but calculation of $U$ requires the specification of an exact or approximate ancillary statistic $A$. This paper proposes an alternative variable to $U$, denoted by $T$, that is available without knowledge of $A$ and satisfies $T = U + O_p(n^{-1})$ in general. Thus the standard normal approximation to the distribution of $R + R^{-1}\log(T/R)$ has error of order $O(n^{-1})$, and it can be used to construct approximate confidence limits having coverage error of order $O(n^{-1})$. In certain cases, however, $T$ and $U$ are identical. The derivation of $T$ involves the Bayesian approach to constructing confidence limits considered by Welch and Peers, and Peers. Similar modifications for the signed root of the conditional likelihood ratio statistic are also developed, and these modifications are seen to be useful when a large number of nuisance parameters are present. Several examples are presented, including inference for natural parameters in exponential models and inference about location-scale models with type II censoring. In each case, the relationship between $T$ and $U$ is discussed. Numerical examples are also given, including inference for regression models, inference about the means of log-normal distributions and inference for exponential lifetime models with type I censoring, where Barndorff-Nielsen's variable $U$ is not available.

Journal ArticleDOI
TL;DR: In this article, the authors consider a possibly nonlinear regression model under any multivariate elliptical data density, and examine Bayesian posterior and predictive results, which are shown to be robust with respect to the specific choice of a sampling density within this elliptical class.

Journal ArticleDOI
TL;DR: A tour of the structure and current applications of quantum-consistent statistical inference and decision theory can be found in this paper, where the authors present examples, outlines the theory and considers applications and open probabilistic axioms.
Abstract: The three main points of this article are: 1. Quantum mechanical data differ from conventional data: for example, joint distributions usually cannot be defined conventionally; 2. rigorous methods have been developed for analyzing such data; the methods often use quantum-consistent analogs of classical statistical procedures; 3. with these procedures, statisticians, both data-analytic and more theoretically oriented, can become active participants in many new and emerging areas of science and biotechnology. In the physical realm described by quantum mechanics, many conventional statistical and probabilistic assumptions no longer hold. Probabilistic ideas are central to quantum theory but the standard Kolmogorov axioms are not uniformly applicable. Studying such phenomena requires an altered model for sample spaces, for random variables and for inference and decision making. The appropriate decision theory has been in development since the mid-1960s. It is both mathematically and statistically rigorous and conforms to the requirements of the known physical results. This article provides a tour of the structure and current applications of quantum-consistent statistical inference and decision theory. It presents examples, outlines the theory and considers applications and open problems. Certain central concepts of quantum theory are more clearly apprehended in terms of the quantum-consistent statistical decision theory. For example, the Heisenberg uncertainty principle can be obtained as a consequence of the quantum version of the Cramér-Rao inequality. This places concepts of statistical estimation and decision theory, and thus the statistician, at the center of the quantum measurement process. Quantum statistical inference offers considerable scope for participation by the statistical community, in both applications and foundational questions.

Journal ArticleDOI
TL;DR: This article gives a brief introduction to Bayesian methods and contrasts them with classical hypothesis testing, showing that the quantification of prior beliefs is a common and necessary part of the interpretation of clinical information, whether from a laboratory test or published clinical trial.

Book ChapterDOI
01 Nov 1993
TL;DR: This work proposes a probabilistic case-space metric for the case matching and case adaptation tasks and argues that using this kind of an approach, the difficult problem of case indexing can be completely avoided.
Abstract: We propose a probabilistic case-space metric for the case matching and case adaptation tasks. Central to our approach is a probability propagation algorithm adopted from Bayesian reasoning systems, which allows our case-based reasoning system to perform theoretically sound probabilistic reasoning. The same probability propagation mechanism actually offers a uniform solution to both the case matching and case adaptation problems. We also show how the algorithm can be implemented as a connectionist network, where efficient massively parallel case retrieval is an inherent property of the system. We argue that using this kind of an approach, the difficult problem of case indexing can be completely avoided.

01 Jan 1993
TL;DR: The influence of various information sources on the ability of a statistical tagger to assign lexical categories to unknown words is investigated and methods for improving estimates based on scarce data are proposed and examined experimentally.
Abstract: The influence of various information sources on the ability of a statistical tagger to assign lexical categories to unknown words is investigated. The literal word form is found to be very much more important than other information sources such as the local syntactic context. Different ways of combining information sources are discussed. Methods for improving estimates based on scarce data are proposed and examined experimentally.

Journal ArticleDOI
TL;DR: In this paper, a method is given for constructing a Bayesian interval estimate such that the coverage probability of the interval is approximately equal to the posterior probability of the interval.
Abstract: Let $Y_1, \ldots, Y_n$ denote independent observations each distributed according to a distribution depending on a scalar parameter $\theta$; suppose that we are interested in constructing an interval estimate for $\theta$. One approach is to use Bayesian inference. For a given prior density, we can construct an interval such that the posterior probability that $\theta$ lies in the interval is some specified value. In this paper, a method is given for constructing a Bayesian interval estimate such that the coverage probability of the interval is approximately equal to the posterior probability of the interval.
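The matching idea can be checked by simulation. The binomial model under the Jeffreys prior Beta(1/2, 1/2) is the classic example of an approximately probability-matching posterior interval (chosen here purely for illustration; it is not necessarily the paper's construction):

```python
import numpy as np
from scipy.stats import beta

def jeffreys_interval(x, n, level=0.95):
    """Equal-tailed posterior interval for a binomial proportion under
    the Jeffreys prior Beta(1/2, 1/2)."""
    a = (1 - level) / 2
    post = beta(x + 0.5, n - x + 0.5)
    return post.ppf(a), post.ppf(1 - a)

# Monte Carlo check: frequentist coverage ~ posterior probability.
rng = np.random.default_rng(3)
n, theta, reps = 50, 0.3, 20000
x = rng.binomial(n, theta, size=reps)
lo, hi = jeffreys_interval(x, n)              # vectorized over draws
print("coverage ~", np.mean((lo <= theta) & (theta <= hi)))  # close to 0.95
```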

Journal ArticleDOI
TL;DR: A sequential Bayesian sampling procedure is applied to study two models of learning in repeated games; the game studied allows for a trade-off between competition and cooperation, which is of interest in many economic situations.
Abstract: We apply a sequential Bayesian sampling procedure to study two models of learning in repeated games. In the first model individuals learn only about an opponent when they play her or him repeatedly but do not update from their experience with that opponent when they move on to play the same game with other opponents. We label this the nonsequential model. In the second model individuals use Bayesian updating to learn about population parameters from each of their opponents, as well as learning about the idiosyncrasies of that particular opponent. We call this the sequential model. We sequentially sample observations on the behavior of experimental subjects in the so-called “centipede game.” This game allows for a trade-off between competition and cooperation, which is of interest in many economic situations. At each point in time, the “state” of our dynamic problem consists of our beliefs about the two models and beliefs about the nuisance parameters of the two models. Our “choice” set is to samp...

Journal ArticleDOI
TL;DR: The authors generalize the results on Bayesian learning based on the martingale convergence theorem to the sequential framework and show that the variability in the sequential learning framework is sufficient under mild conditions to circumvent the incomplete learning results that characterize the optimal learning literature.

Journal ArticleDOI
TL;DR: It is proved that, under a Jeffreys' type improper prior on the scale parameter, posterior inference on the location parameters is the same for all lq-spherical sampling models with common q.
Abstract: SUMMARY The class of multivariate lq-spherical distributions is introduced and defined through their isodensity surfaces. We prove that, under a Jeffreys' type improper prior on the scale parameter, posterior inference on the location parameters is the same for all lq-spherical sampling models with common q. This gives us perfect inference robustness with respect to any departures from the reference case of independent sampling from the exponential power distribution.

Journal ArticleDOI
TL;DR: In this article, the authors derived explicit formulae for the Bartlett adjustment factors of both statistics, and the derivations are based on the Tierney, Kass & Kadane (1989) asymptotic approximation for marginal posterior probability density functions.
Abstract: SUMMARY In wide generality, the posterior distributions of the likelihood ratio statistic and the posterior ratio statistic are chi-squared to error of order $O(n^{-1})$, where $n$ is sample size. The error in the chi-squared approximation can be reduced to order $O(n^{-2})$ by Bartlett correction. In this paper, explicit formulae are derived for the Bartlett adjustment factors of both statistics, and the derivations are based on the Tierney, Kass & Kadane (1989) asymptotic approximation for marginal posterior probability density functions. The use of numerical differentiation to facilitate calculation of the Bartlett adjustments is also described. Some applications are considered that concern inference about regression models from both complete and right-censored data.
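The effect of a Bartlett correction is easy to see by Monte Carlo. The snippet below uses a frequentist analogue chosen for brevity (testing an exponential mean), estimating E[W] by simulation and rescaling so the chi-squared approximation improves; the paper's posterior versions use analytic adjustment factors instead.

```python
import numpy as np
from scipy.stats import chi2

# LR statistic for H0: exponential mean = mu0 is
#   W = 2n(xbar/mu0 - 1 - log(xbar/mu0)),
# chi-squared(1) only to O(1/n).  Rescaling by df / E[W] (the Bartlett
# factor, here estimated empirically) removes the leading error term.
rng = np.random.default_rng(4)
n, mu0, reps = 10, 1.0, 200_000
xbar = rng.exponential(mu0, size=(reps, n)).mean(axis=1)
W = 2 * n * (xbar / mu0 - 1 - np.log(xbar / mu0))
W_corrected = W / W.mean()                    # df = 1

for name, stat in [("raw", W), ("Bartlett-corrected", W_corrected)]:
    rate = np.mean(stat > chi2.ppf(0.95, 1))
    print(f"{name}: rejection rate at nominal 5% = {rate:.4f}")
```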

Journal ArticleDOI
TL;DR: A network of probabilistic cellular automata (PCAs) is proposed for iteratively resolving ambiguities and conflicts in pattern recognition, using a different architecture for describing the model from that used in Bayesian inference networks.

Proceedings ArticleDOI
28 Mar 1993
TL;DR: In the authors' approach, the efficient indexing problem of CBR is naturally implemented by the parallel architecture, and heuristic matching is replaced by a probability metric.
Abstract: Given a problem, a case-based reasoning (CBR) system will search its case memory and use the stored cases to find the solution, possibly modifying retrieved cases to adapt to the required input specifications. A neural network architecture is introduced for efficient CBR. It is shown how a rigorous Bayesian probability propagation algorithm can be implemented as a feedforward neural network and adapted for CBR. In the authors' approach, the efficient indexing problem of CBR is naturally implemented by the parallel architecture, and heuristic matching is replaced by a probability metric. This allows their CBR to perform theoretically sound Bayesian reasoning. It is shown how the probability propagation actually offers a solution to the adaptation problem in a very natural way.

Journal ArticleDOI
TL;DR: The challenging computational problems posed by the problem of ordering and mapping genes on the basis of recombinant data and radiation hybrid data are shown to be resolvable using Markov chain Monte Carlo methods.
Abstract: Summary The problem of ordering and mapping genes on the basis of recombinant data and radiation hybrid data is formulated as a problem of Bayesian inference for an unknown permutation. The challenging computational problems posed by this approach are shown to be resolvable using Markov chain Monte Carlo methods.
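MCMC over an unknown permutation typically means Metropolis moves on gene orders, such as reversing a randomly chosen segment. A minimal sketch (the `loglik` callable standing in for the likelihood of the recombination data is a placeholder, and the move set is illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(5)

def mcmc_gene_order(loglik, n_genes, n_iter=10000):
    """Metropolis sampler over gene orders: propose reversing a random
    segment of the current permutation (a symmetric move) and accept
    with the usual Metropolis ratio."""
    perm = rng.permutation(n_genes)
    ll = loglik(perm)
    samples = []
    for _ in range(n_iter):
        i, j = sorted(rng.choice(n_genes, size=2, replace=False))
        prop = perm.copy()
        prop[i:j + 1] = prop[i:j + 1][::-1]   # reverse a segment
        ll_prop = loglik(prop)
        if np.log(rng.uniform()) < ll_prop - ll:
            perm, ll = prop, ll_prop
        samples.append(perm.copy())
    return samples
```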

Journal ArticleDOI
TL;DR: In this paper, a trait model describing the underlying effects is built into a model combining a Bayesian approach with hierarchic Markov process in order to calculate optimal replacement policies under various conditions.
Abstract: The observed level of milk yield of a dairy cow or the litter size of a sow is only partially the result of a permanent characteristic of the animal; temporary effects are also involved. Thus, we face a problem concerning the proper definition and measurement of the traits in order to give the best possible prediction of the future revenues from an animal considered for replacement. A trait model describing the underlying effects is built into a model combining a Bayesian approach with hierarchic Markov process in order to be able to calculate optimal replacement policies under various conditions. Copyright 1993 by Oxford University Press.

Journal ArticleDOI
TL;DR: In this paper, the authors show how probability theory can be used to address such questions in a simple and straightforward manner, applying it to the model selection problem that arises in many data analysis problems in science.