
Showing papers on "Bayesian probability published in 1994"


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications, and note that in principle a panacea is provided by the standard Bayesian formalism, which averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities.
Abstract: We consider the problem of model selection and accounting for model uncertainty in high-dimensional contingency tables, motivated by expert system applications. The approach most used currently is a stepwise strategy guided by tests based on approximate asymptotic P values leading to the selection of a single model; inference is then conditional on the selected model. The sampling properties of such a strategy are complex, and the failure to take account of model uncertainty leads to underestimation of uncertainty about quantities of interest. In principle, a panacea is provided by the standard Bayesian formalism that averages the posterior distributions of the quantity of interest under each of the models, weighted by their posterior model probabilities. Furthermore, this approach is optimal in the sense of maximizing predictive ability. But this has not been used in practice, because computing the posterior model probabilities is hard and the number of models is very large (often greater than 1...
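
As a rough numerical sketch of the model-averaging step described above (the marginal likelihoods, prior model probabilities and per-model estimates below are made up for illustration, not taken from the paper), posterior model probabilities and the model-averaged estimate of a quantity of interest might be computed as:

    import numpy as np

    # Hypothetical inputs for three candidate models (illustration only).
    log_marginal_lik = np.array([-120.3, -118.7, -119.5])      # log p(data | model_k)
    prior_model_prob = np.array([1 / 3, 1 / 3, 1 / 3])         # p(model_k)
    posterior_mean_under_model = np.array([0.42, 0.55, 0.49])  # E[quantity | data, model_k]

    # Posterior model probabilities: p(model_k | data) is proportional to
    # p(data | model_k) * p(model_k).
    log_post = log_marginal_lik + np.log(prior_model_prob)
    log_post -= log_post.max()                 # stabilise before exponentiating
    post_model_prob = np.exp(log_post)
    post_model_prob /= post_model_prob.sum()

    # Model-averaged estimate of the quantity of interest.
    averaged_estimate = np.dot(post_model_prob, posterior_mean_under_model)
    print(post_model_prob, averaged_estimate)

The same weights would also be applied to full posterior or predictive distributions rather than point summaries.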

1,313 citations


Book ChapterDOI
29 Jul 1994
TL;DR: This paper embeds the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space of features, hypothesizing that this approach will improve asymptotic accuracy in domains that involve correlated features without reducing the rate of learning in ones that do not.
Abstract: In this paper, we examine previous work on the naive Bayesian classifier and review its limitations, which include a sensitivity to correlated features. We respond to this problem by embedding the naive Bayesian induction scheme within an algorithm that carries out a greedy search through the space of features. We hypothesize that this approach will improve asymptotic accuracy in domains that involve correlated features without reducing the rate of learning in ones that do not. We report experimental results on six natural domains, including comparisons with decision-tree induction, that support these hypotheses. In closing, we discuss other approaches to extending naive Bayesian classifiers and outline some directions for future research.
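
A minimal sketch of the wrapper idea, greedy forward selection of features scored by a naive Bayesian classifier, is shown below; it uses scikit-learn and a synthetic data set, and scores candidate subsets by cross-validation rather than by whatever accuracy estimate the authors used, so it illustrates the scheme rather than reimplementing their system.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB

    # Synthetic data standing in for a domain with correlated (redundant) features.
    X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                               n_redundant=4, random_state=0)

    selected, remaining = [], list(range(X.shape[1]))
    best_score = 0.0
    while remaining:
        # Score each candidate feature added to the current subset.
        scores = {f: cross_val_score(GaussianNB(), X[:, selected + [f]], y, cv=5).mean()
                  for f in remaining}
        f_best, s_best = max(scores.items(), key=lambda kv: kv[1])
        if s_best <= best_score:      # stop when no candidate improves accuracy
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best_score = s_best

    print("selected features:", selected, "cv accuracy: %.3f" % best_score)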

672 citations


Journal ArticleDOI
01 Jun 1994-Test
TL;DR: An overview of the subject of robust Bayesian analysis is provided, one that is accessible to statisticians outside the field, and recent developments in the area are reviewed.
Abstract: Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis.

587 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian approach to estimation, prediction, and model comparison in composed error production models is presented, where a broad range of distributions on the inefficiency term define the contending models, which can either be treated separately or pooled.

426 citations


Journal ArticleDOI
Neil Shephard1
TL;DR: The use of simulation techniques extends the applicability of the usual Gaussian state space filtering and smoothing techniques to a class of non-Gaussian time series models, allowing a fully Bayesian or maximum likelihood analysis of some interesting models, including outlier models, discrete Markov chain components, multiplicative models and stochastic variance models.
Abstract: SUMMARY In this paper we suggest the use of simulation techniques to extend the applicability of the usual Gaussian state space filtering and smoothing techniques to a class of non-Gaussian time series models. This allows a fully Bayesian or maximum likelihood analysis of some interesting models, including outlier models, discrete Markov chain components, multiplicative models and stochastic variance models. Finally we discuss at some length the use of a non-Gaussian model to seasonally adjust the published money supply figures.

384 citations


Book ChapterDOI
07 May 1994
TL;DR: A unified approach to Markov random field (MRF) modeling in low- and high-level computer vision is presented; the unification is made possible by a recent advance in MRF modeling for high-level object recognition.
Abstract: A variety of computer vision problems can be optimally posed as Bayesian labeling in which the solution of a problem is defined as the maximum a posteriori (MAP) probability estimate of the true labeling. The posterior probability is usually derived from a prior model and a likelihood model. The latter relates to how data is observed and is problem domain dependent. The former depends on how various prior constraints are expressed. Markov Random Field Models (MRF) theory is a tool to encode contextual constraints into the prior probability. This paper presents a unified approach for MRF modeling in low and high level computer vision. The unification is made possible due to a recent advance in MRF modeling for high level object recognition. Such unification provides a systematic approach for vision modeling based on sound mathematical principles.
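
To make the MAP-with-MRF-prior formulation concrete, here is a small, self-contained sketch: a binary labeling with a Gaussian likelihood and an Ising-type MRF prior, maximized site by site with iterated conditional modes (ICM). The image, noise level and coupling parameter are invented for illustration; the paper's unified high/low-level framework is not reproduced.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical binary (+1/-1) image observed through additive Gaussian noise.
    true = np.ones((32, 32))
    true[8:24, 8:24] = -1
    obs = true + rng.normal(scale=0.8, size=true.shape)

    beta, sigma2 = 1.5, 0.8 ** 2          # Ising coupling and noise variance (illustrative)
    labels = np.where(obs > 0, 1, -1)     # start from the maximum-likelihood labeling

    def neighbour_sum(lab, i, j):
        s = 0
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < lab.shape[0] and 0 <= nj < lab.shape[1]:
                s += lab[ni, nj]
        return s

    # ICM: repeatedly set each site to the label with the lowest posterior energy.
    for sweep in range(5):
        for i in range(labels.shape[0]):
            for j in range(labels.shape[1]):
                ns = neighbour_sum(labels, i, j)
                energy = {s: (obs[i, j] - s) ** 2 / (2 * sigma2) - beta * s * ns
                          for s in (-1, 1)}
                labels[i, j] = min(energy, key=energy.get)

    print("fraction of sites disagreeing with the true labeling:",
          np.mean(labels != true))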

284 citations


ReportDOI
01 Dec 1994
TL;DR: A set of algorithms is described that handles clustering, classification, and function approximation from incomplete data in a principled and efficient manner, making two distinct appeals to the Expectation-Maximization principle.
Abstract: Real-world learning tasks often involve high-dimensional data sets with complex patterns of missing features. In this paper we review the problem of learning from incomplete data from two statistical perspectives: the likelihood-based and the Bayesian. The goal is two-fold: to place current neural network approaches to missing data within a statistical framework, and to describe a set of algorithms, derived from the likelihood-based framework, that handle clustering, classification, and function approximation from incomplete data.
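
The likelihood-based treatment of missing features rests on the EM algorithm. A minimal numerical sketch, for a single multivariate Gaussian rather than the mixture models the report covers, is given below; the data, missingness rate and iteration count are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Synthetic 2-D Gaussian data with values removed completely at random.
    true_mu = np.array([1.0, -2.0])
    true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
    X = rng.multivariate_normal(true_mu, true_cov, size=500)
    X[rng.random(X.shape) < 0.25] = np.nan

    mu, cov = np.zeros(2), np.eye(2)
    for _ in range(50):                        # EM iterations
        filled = np.empty_like(X)
        corr = np.zeros((2, 2))                # accumulated conditional covariances
        for i, x in enumerate(X):
            m = np.isnan(x)
            o = ~m
            if not o.any():                    # fully missing row: impute with the model
                filled[i] = mu
                corr += cov
            elif m.any():
                # E-step: conditional mean and covariance of the missing block.
                reg = cov[np.ix_(m, o)] @ np.linalg.inv(cov[np.ix_(o, o)])
                filled[i, o] = x[o]
                filled[i, m] = mu[m] + reg @ (x[o] - mu[o])
                corr[np.ix_(m, m)] += cov[np.ix_(m, m)] - reg @ cov[np.ix_(o, m)]
            else:
                filled[i] = x
        # M-step: re-estimate the mean and covariance from the completed data.
        mu = filled.mean(axis=0)
        diff = filled - mu
        cov = (diff.T @ diff + corr) / len(X)

    print("estimated mean:", mu)
    print("estimated covariance:\n", cov)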

243 citations


Posted Content
TL;DR: In this paper, the authors describe a framework for inducing probabilistic grammars from corpora of positive samples, where samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation.
Abstract: We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: Hidden Markov models, class-based n-grams, and stochastic context-free grammars.

242 citations


Journal ArticleDOI
TL;DR: This paper proposes to model the uncertainty due to noise, e.g. the error in an object's position, by conventional covariance matrices; the approach is independent of the sensing modality and applicable to most temporal data association problems.

202 citations


Journal ArticleDOI
TL;DR: The authors present a characterization of the minimum points of such functionals, together with a descent-type algorithm for numerical computations, and the results of Monte-Carlo simulations are reported.
Abstract: The regularizing functional approach is widely used in many estimation problems. In practice, the solution is defined as one minimum point of a suitable functional, the main part of which accounts for the underlying physical model, whereas the regularizing part represents some prior information about the unknowns. In the Bayesian interpretation, one has a maximum a posteriori (MAP) estimator in which the main and regularizing parts are represented, respectively, by likelihood and prior distributions. When either the prior or likelihood is a Laplace distribution and the other is a Gaussian distribution, one is led to consider functionals that include both absolute and square norms. The authors present a characterization of the minimum points of such functionals, together with a descent-type algorithm for numerical computations. The results of Monte-Carlo simulations are also reported.
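
When the data term is a squared norm and the prior contributes an absolute norm (Gaussian likelihood, Laplace prior), one standard descent-type scheme for minimizing the resulting functional is proximal gradient descent with soft thresholding. The sketch below illustrates that idea on a made-up linear observation model; it is not the specific characterization or algorithm of the paper.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical linear observation model y = A x + Gaussian noise, with sparse x.
    A = rng.normal(size=(40, 80))
    x_true = np.zeros(80)
    x_true[[3, 17, 42]] = (2.0, -1.5, 1.0)
    y = A @ x_true + rng.normal(scale=0.1, size=40)

    lam = 0.5                                   # weight of the absolute-norm (prior) term
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # step size for the gradient part

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    x = np.zeros(80)
    for _ in range(500):                        # proximal-gradient descent on the functional
        grad = A.T @ (A @ x - y)                # gradient of the squared-norm data term
        x = soft_threshold(x - step * grad, step * lam)

    objective = 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))
    print("objective: %.4f  nonzeros: %d" % (objective, np.count_nonzero(x)))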

190 citations


Journal ArticleDOI
TL;DR: A reformulated Bayesian approach using the sampling/importance resampling algorithm is presented that improves estimates of policy performance indices in fisheries stock assessment; it extends the number of parameters that can be treated as uncertain, does not require deterministic assumptions about population dynamics, and is illustrated on New Zealand's western stock of hoki (Macruronus novaezelandiae).
Abstract: Scientific advice to fishery managers needs to be expressed in probabilistic terms to convey uncertainty about the consequences of alternative harvesting policies (policy performance indices). In most Bayesian approaches to such advice, relatively few of the model parameters used can be treated as uncertain, and deterministic assumptions about population dynamics are required; this can bias the degree of certainty and estimates of policy performance indices. We reformulate a Bayesian approach that uses the sampling/importance resampling algorithm to improve estimates of policy performance indices; it extends the number of parameters that can be treated as uncertain, does not require deterministic assumptions about population dynamics, and can use any of the types of fishery assessment models and data. Application of the approach to New Zealand's western stock of hoki (Macruronus novaezelandiae) shows that the use of Bayesian prior information for parameters such as the constant of proportionality for acou...
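
The sampling/importance resampling (SIR) mechanics used by the approach can be sketched on a toy problem (a single survival-rate parameter with binomial data, nothing like a real stock-assessment model): draw from the prior, weight by the likelihood, then resample in proportion to the weights.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    # Toy data: 31 successes out of 50 trials; the parameter is a rate theta.
    n_trials, successes = 50, 31

    # 1. Sample from the prior (here a vague Beta(1, 1) prior).
    theta = rng.beta(1.0, 1.0, size=100_000)

    # 2. Weight each draw by its likelihood.
    weights = stats.binom.pmf(successes, n_trials, theta)
    weights /= weights.sum()

    # 3. Resample in proportion to the importance weights.
    posterior_draws = rng.choice(theta, size=10_000, replace=True, p=weights)

    print("posterior mean: %.3f" % posterior_draws.mean())
    print("90%% interval:", np.percentile(posterior_draws, [5, 95]))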

Book
01 Jan 1994
TL;DR: This work spells out through examples the underlying hypotheses that lead to the selection of an adequate model for a given problem and gives guidance on how to choose the appropriate model.
Abstract: Several mathematical models have been proposed for the modelling of someone's degrees of belief. The oldest is the Bayesian model that uses probability functions. The upper and lower probabilities (ULP) model, Dempster's model, the evidentiary value model (EVM) and the probability of modal propositions somehow generalize the Bayesian approach. The transferable belief model (TBM) is based on other premises and uses belief functions. None of these models is THE best: each has its own domain of application. We spell out through examples the underlying hypotheses that lead to the selection of an adequate model for a given problem. We give indications on how to choose the appropriate model. The major discriminating criterion is: if there exists a probability measure with known values, use the Bayesian model; if there exists a probability measure but with some unknown values, use the ULP models; if the existence of a probability measure is not known, use the TBM. Dempster's model is essentially a special case of the ULP model. The EVM and the probability of modal propositions (provability, necessity...) correspond to a special use of the Bayesian model.

Posted Content
TL;DR: A new technique for inducing the structure of Hidden Markov Models from data, based on the general 'model merging' strategy, is described, along with how the algorithm was incorporated in an operational speech understanding system, where it was combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.
Abstract: This report describes a new technique for inducing the structure of Hidden Markov Models from data which is based on the general `model merging' strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are produced by merging HMM states. A Bayesian posterior probability criterion is used to determine which states to merge and when to stop generalizing. The procedure may be considered a heuristic search for the HMM structure with the highest posterior probability. We discuss a variety of possible priors for HMMs, as well as a number of approximations which improve the computational efficiency of the algorithm. We studied three applications to evaluate the procedure. The first compares the merging algorithm with the standard Baum-Welch approach in inducing simple finite-state languages from small, positive-only training samples. We found that the merging procedure is more robust and accurate, particularly with a small amount of training data. The second application uses labelled speech data from the TIMIT database to build compact, multiple-pronunciation word models that can be used in speech recognition. Finally, we describe how the algorithm was incorporated in an operational speech understanding system, where it is combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.

Posted Content
TL;DR: In this paper, the numerical procedures needed to implement prior distributions other than the Minnesota prior in Bayesian VAR analysis are considered, and the forecasting performance of the different prior distributions considered in the paper is also reported.
Abstract: In Bayesian analysis of VAR-models, and especially in forecasting applications, the Minnesota prior of Litterman is frequently used. In many cases other prior distributions provide better forecasts and are preferable from a theoretical standpoint. This paper considers the numerical procedures needed to implement these prior distributions. In addition we also report on the forecasting performance of the different prior distributions considered in the paper.

Journal ArticleDOI
TL;DR: Compared with a traditional analysis based on best linear unbiased predictors with an animal model, a Bayesian analysis using 2 different sets of prior distributions for the variance components showed that inferences differed only when the relative amount of information contributed by the data was small.
Abstract: Summary - A method of analysing response to selection using a Bayesian perspective is presented. The following measures of response to selection were analysed: 1) total response in terms of the difference in additive genetic means between last and first generations; 2) the slope (through the origin) of the regression of mean additive genetic value on generation; 3) the linear regression slope of mean additive genetic value on generation. Inferences are based on marginal posterior distributions of the above-defined measures of genetic response, and uncertainties about fixed effects and variance components are taken into account. The marginal posterior distributions were estimated using the Gibbs sampler. Two simulated data sets with heritability levels 0.2 and 0.5 having 5 cycles of selection were used to illustrate the method. Two analyses were carried out for each data set, with partial data (generations 0-2) and with the whole data. The Bayesian analysis differed from a traditional analysis based on best linear unbiased predictors (BLUP) with an animal model, when the amount of information in the data was small. Inferences about selection response were similar with both methods at high heritability values and using all the data for the analysis. The Bayesian approach correctly assessed the degree of uncertainty associated with insufficient information in the data. A Bayesian analysis using 2 different sets of prior distributions for the variance components showed that inferences differed only when the relative amount of information contributed by the data was small.
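
Marginal posterior inference via the Gibbs sampler, the computational device used here, can be illustrated with a much simpler conjugate normal model (synthetic data, a flat prior on the mean and a 1/sigma^2 prior on the variance); the animal-model machinery of the paper is not reproduced.

    import numpy as np

    rng = np.random.default_rng(4)

    # Synthetic data; the goal is only to show how Gibbs draws yield marginal posteriors.
    y = rng.normal(loc=2.0, scale=1.5, size=40)
    n, ybar = len(y), y.mean()

    mu, sigma2 = 0.0, 1.0
    draws = []
    for it in range(6000):
        # Full conditional of mu (flat prior): Normal(ybar, sigma2 / n).
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))
        # Full conditional of sigma2 (prior proportional to 1/sigma2): SSE / chi-square(n).
        sse = np.sum((y - mu) ** 2)
        sigma2 = sse / rng.chisquare(n)
        if it >= 1000:                         # discard burn-in
            draws.append((mu, sigma2))

    draws = np.array(draws)
    print("posterior mean of mu: %.3f" % draws[:, 0].mean())
    print("95%% interval for sigma2:", np.percentile(draws[:, 1], [2.5, 97.5]))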

Journal ArticleDOI
TL;DR: A Bayesian approach to the analysis of survival data on multiple time scales is proposed using priors which specify smooth variation and the extension of the method to Bayesian forecasting of rates is discussed.
Abstract: We propose a Bayesian approach to the analysis of survival data on multiple time scales. Non-parametric modelling of variation of rates with more than one time scale is achieved using priors which specify smooth variation. Computations are conveniently carried out using Gibbs sampling. We discuss the extension of the method to Bayesian forecasting of rates. Numerical experience of two examples is described.

Journal ArticleDOI
TL;DR: Frailty models are shown to be a special case of a random effects generalization of generalized linear models, whereas marginal models for multivariate failure time data are more closely related to the generalized estimating equation approach to longitudinal generalizedlinear models.
Abstract: Methodological research in biostatistics has been dominated over the last twenty years by further development of Cox's regression model for life tables and of Nelder and Wedderburn's formulation of generalized linear models. In both of these areas the need to address the problems introduced by subject level heterogeneity has provided a major motivation, and the analysis of data concerning recurrent events has been widely discussed within both frameworks. This paper reviews this work, drawing together the parallel development of 'marginal' and 'conditional' approaches in survival analysis and in generalized linear models. Frailty models are shown to be a special case of a random effects generalization of generalized linear models, whereas marginal models for multivariate failure time data are more closely related to the generalized estimating equation approach to longitudinal generalized linear models. Computational methods for inference are discussed, including the Bayesian Markov chain Monte Carlo approach.

Journal ArticleDOI
TL;DR: The author applies the hierarchical Bayesian approach to image restoration problems and compares it with other approaches in handling the estimation of the hyperparameters.
Abstract: In an image restoration problem one usually has two different kinds of information. In the first stage, one has knowledge about the structural form of the noise and local characteristics of the restoration. These noise and image models normally depend on unknown hyperparameters. The hierarchical Bayesian approach adds a second stage by putting a hyperprior on the hyperparameters, where information about those hyperparameters is included. In this work the author applies the hierarchical Bayesian approach to image restoration problems and compares it with other approaches in handling the estimation of the hyperparameters.

Book
28 Jan 1994
TL;DR: This book traces the pathways to modern probability, covering probability in statistical physics, quantum mechanical probability and indeterminism, classical embeddings of probability and chance, von Mises' frequentist probabilities, Kolmogorov's measure-theoretic probabilities, and de Finetti's subjective probabilities.
Abstract: Preface; 1. Introduction; 2. Pathways to modern probability; 3. Probability in statistical physics; 4. Quantum mechanical probability and indeterminism; 5. Classical embeddings of probability and chance; 6. Von Mises' frequentist probabilities; 7. Kolmogorov's measure theoretic probabilities; 8. De Finetti's subjective probabilities; Supplement: Nicole Oresme and the ergodicity of rotations; Bibliography; Index of names; Index of subjects.

Book ChapterDOI
29 Jul 1994
TL;DR: The behavior of various belief network learning algorithms is studied in this article, where search heuristics based on the Bayesian measure of Cooper and Herskovits and a minimum description length (MDL) measure are compared with respect to their properties for both limiting and finite database sizes.
Abstract: In this paper the behavior of various belief network learning algorithms is studied. Selecting belief networks with certain minimality properties turns out to be NP-hard, which justifies the use of search heuristics. Search heuristics based on the Bayesian measure of Cooper and Herskovits and a minimum description length (MDL) measure are compared with respect to their properties for both limiting and finite database sizes. It is shown that the MDL measure has more desirable properties than the Bayesian measure. Experimental results suggest that for learning the probabilities of belief networks, smoothing is helpful.
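
To make the two scoring measures concrete, the sketch below computes, on synthetic binary data, the Cooper-Herskovits (Bayesian) score with uniform Dirichlet priors and a BIC-style MDL score (both written so that higher is better) for a few candidate parent sets of one node; the search heuristics and theoretical comparison of the paper are not reproduced.

    import numpy as np
    from itertools import product
    from scipy.special import gammaln

    rng = np.random.default_rng(5)

    # Synthetic binary data: B depends on A, C is independent noise.
    n = 500
    A = rng.integers(0, 2, n)
    B = np.where(rng.random(n) < 0.8, A, 1 - A)     # B copies A 80% of the time
    C = rng.integers(0, 2, n)
    data = {"A": A, "B": B, "C": C}

    def family_scores(child, parents):
        """Log Cooper-Herskovits score and BIC-style MDL score for one node."""
        r = 2                                        # all variables are binary
        ch, loglik, n_params = 0.0, 0.0, 0
        for pv in product([0, 1], repeat=len(parents)):
            rows = np.ones(n, dtype=bool)
            for p, v in zip(parents, pv):
                rows &= data[p] == v
            counts = np.array([np.sum(data[child][rows] == k) for k in range(r)])
            nij = counts.sum()
            # Cooper-Herskovits term with uniform Dirichlet(1, ..., 1) priors.
            ch += gammaln(r) - gammaln(nij + r) + gammaln(counts + 1).sum()
            if nij > 0:
                probs = counts / nij
                loglik += np.sum(counts[counts > 0] * np.log(probs[counts > 0]))
            n_params += r - 1
        mdl = loglik - 0.5 * n_params * np.log(n)
        return ch, mdl

    for parents in ([], ["A"], ["A", "C"]):
        ch, mdl = family_scores("B", parents)
        print("parents of B = %-12s log CH = %9.2f   MDL = %9.2f" % (parents, ch, mdl))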

Journal ArticleDOI
TL;DR: In this article, inference about a multidimensional parameter b in the presence of nuisance parameters is considered.
Abstract: SUMMARY For inference about a multidimensional parameter b in the presence of nuisance parameters...

Journal ArticleDOI
TL;DR: In this paper, a fully Bayesian approach is suggested, its implementation and practical properties are discussed and the procedure is applied to data from an atlas survey of Finnish herpetofauna.
Abstract: SUMMARY A common method of studying biogeographical ranges is an atlas survey, in which the research area is divided into a square grid and the data consist of the squares where observations occur. Often the observations form only an incomplete map of the true range, and a method is required to decide whether the blank squares indicate true absence or merely a lack of study there. This is essentially an image restoration problem, but it has properties that make the common empirical Bayesian procedures inadequate. Most notably, the observed image is heavily degraded, causing difficulties in the estimation of spatial interaction, and the assessment of reliability of the restoration is emphasized. A fully Bayesian approach is suggested, its implementation and practical properties are discussed and the procedure is applied to data from an atlas survey of Finnish herpetofauna.

Journal ArticleDOI
TL;DR: In this article, a Bayesian decision-theoretic design for a clinical trial comparing two treatments for a disease with binary outcomes is developed and evaluated, where the probability of successful outcome with treatment i is denoted by pi, i = 1, 2, and prior knowledge regarding each pi is assumed to follow a beta distribution.
Abstract: Bayesian decision-theoretic designs for a clinical trial comparing two treatments for a disease with binary outcomes are developed and evaluated. The probability of successful outcome with treatment i is denoted by pi, i = 1, 2, and prior knowledge regarding each pi is assumed to follow a beta distribution. The pi are assumed to be independent. To facilitate comparison with frequentist clinical trial designs, we take a hypothesis-testing approach. The null hypothesis is δ < δ0, where δ0 is the minimum treatment effect sought by the trial and δ = p2 − p1 is the true treatment difference. We use a simple terminal loss function reflecting the hypothesis-testing goal of the trial, and the total cost of the trial is the final sample size plus the terminal loss function. The stopping and decision rules that minimize the expectation of the total cost are determined by backward induction. Monte Carlo simulation is used to compare Bayesian and frequentist erro...
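
The beta-binomial updating underlying such a design is easy to sketch; the backward-induction computation of stopping and decision rules is not attempted here. With hypothetical interim counts and independent Beta(1, 1) priors, the posterior of the treatment difference can be summarized by Monte Carlo:

    import numpy as np

    rng = np.random.default_rng(6)

    # Hypothetical interim data and independent Beta(1, 1) priors on p1 and p2.
    s1, n1 = 18, 40        # successes / patients on treatment 1
    s2, n2 = 27, 40        # successes / patients on treatment 2
    delta0 = 0.10          # minimum treatment effect sought (illustrative)

    # Posterior of each success probability is again a beta distribution.
    p1 = rng.beta(1 + s1, 1 + n1 - s1, size=200_000)
    p2 = rng.beta(1 + s2, 1 + n2 - s2, size=200_000)
    delta = p2 - p1

    print("posterior mean of delta: %.3f" % delta.mean())
    print("P(delta >= delta0 | data): %.3f" % np.mean(delta >= delta0))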

Journal ArticleDOI
TL;DR: In this article, a new type of indirect inverse analysis procedure is proposed to overcome the difficulties the geotechnical inverse analyses are encountering (such as unstability and non-uniqueness of the solutions as well as multicollinearity).
Abstract: A new type of indirect inverse analysis procedure is proposed to overcome the difficulties the geotechnical inverse analyses are encountering (such as unstability and non-uniqueness of the solutions as well as multicollinearity). These difficulties are eased by combining the objective information (i.e. the observation data) and the subjective information (i.e. the prior information) in an appropriate manner by so-called extended Bayesian method. The method is based on a new view on Bayesian model proposed by Akaike. The problem of model identification in the inverse analysis is also tackled by applying well-known AIC but of the Bayesian version. A case study on an embankment on soft clay is presented to illustrate the effectiveness of the new method. A rather thorough review on the geotechnical inverse analysis is also presented to indicate the necessity of the proposed procedure. An appendix is attached to summarize the statistical background of the new method.

Book ChapterDOI
TL;DR: This chapter discusses hypothesis testing theory, whose basis is the introduction of the probability of errors, and focuses on the case where none of the hypotheses is a particular case of another one.
Abstract: Publisher Summary The comparison of different hypotheses, i.e. of competing models, is the basis of model specification. It may be performed along two main lines. The first one consists in associating with each model a loss function and in retaining the specification implying the smallest (estimated) loss. In practice, the loss function is defined either by updating some a priori knowledge on the models given the available observations (the Bayesian point of view), or by introducing some criterion taking into account the trade-off between the goodness of fit and the complexity of the model. The second approach is hypothesis testing theory. However, the determination of the decision rule is not done on the same basis as model choice. The basis of hypothesis testing theory is to introduce the probability of errors. This chapter focuses on the case where none of the hypotheses is a particular case of another one.

Journal ArticleDOI
TL;DR: A method for analyzing traits influenced by both maternal and direct genetic effects is presented in a Bayesian setting, giving the possibility of exact marginal inference on (co)variance components of interest, as opposed to the results of a REML analysis, where only joint inferences are possible.
Abstract: A method for analyzing traits influenced by both maternal and direct genetic effects is presented in a Bayesian setting. A Bayesian analysis requires full marginalization of the joint posterior density. The necessary multidimensional integrations were carried out using the Gibbs sampler. This gives the possibility of exact marginal inference on (co)variance components of interest as opposed to results of REML analysis, where only joint inferences are possible. The method is illustrated by an example on growth in sheep.


Journal ArticleDOI
TL;DR: In this article, various approaches to the development of a non-informative prior for the AR(1) model are considered and compared, with particular attention given to the reference prior approach, which seems to work well for the stationary case but encounters difficulties in the explosive case.
Abstract: Various approaches to the development of a noninformative prior for the AR(1) model are considered and compared. Particular attention is given to the reference prior approach, which seems to work well for the stationary case but encounters difficulties in the explosive case. A symmetrized (proper) version of the stationary reference prior is ultimately recommended for the problem. Bayesian testing of the unit root, stationary, and explosive hypotheses is considered also. Bounds on the Bayes factors are developed and shown to yield answers that appear to conflict with classical tests.
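
A small grid-based illustration of the stationary case follows, taking the commonly cited reference prior form proportional to (1 - rho^2)^(-1/2) on (-1, 1) and conditioning on the first observation; the explosive case, the symmetrized prior and the Bayes-factor bounds discussed in the paper are not attempted here.

    import numpy as np

    rng = np.random.default_rng(7)

    # Simulate a stationary AR(1) series with unit innovation variance.
    rho_true, T = 0.7, 200
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = rho_true * y[t - 1] + rng.normal()

    # Conditional likelihood given y[0], evaluated on a grid over (-1, 1).
    rho = np.linspace(-0.999, 0.999, 2001)
    d = rho[1] - rho[0]
    resid_ss = np.array([np.sum((y[1:] - r * y[:-1]) ** 2) for r in rho])
    log_post = -0.5 * resid_ss - 0.5 * np.log(1.0 - rho ** 2)  # likelihood times prior
    post = np.exp(log_post - log_post.max())
    post /= post.sum() * d                      # normalize the gridded posterior

    post_mean = np.sum(rho * post) * d
    print("posterior mean of rho under the stationary reference prior: %.3f" % post_mean)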

Book ChapterDOI
10 Jul 1994
TL;DR: It is demonstrated that a particular MBR system called PEBLS works comparatively well on a wide range of domains using both real and artificial data and can learn natural concept classes that the Bayesian classifier cannot learn.
Abstract: We quantify both experimentally and analytically the performance of memory-based reasoning (MBR) algorithms. To start gaining insight into the capabilities of MBR algorithms, we compare an MBR algorithm using a value difference metric to a popular Bayesian classifier. These two approaches are similar in that they both make certain independence assumptions about the data. However, whereas MBR uses specific cases to perform classification, Bayesian methods summarize the data probabilistically. We demonstrate that a particular MBR system called PEBLS works comparatively well on a wide range of domains using both real and artificial data. With respect to the artificial data, we consider distributions where the concept classes are separated by functional discriminants, as well as time-series data generated by Markov models of varying complexity. Finally, we show formally that PEBLS can learn (in the limit) natural concept classes that the Bayesian classifier cannot learn, and that it will attain perfect accuracy whenever Bayes does.
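
A toy sketch of the value difference metric that underlies this kind of memory-based reasoning is given below: two values of a symbolic feature are close when they induce similar class distributions, and classification is by the nearest stored case under that metric. The tiny data set and the squared-difference exponent are made up for illustration; this is not the PEBLS system itself.

    import numpy as np
    from collections import Counter

    # Tiny symbolic data set: each row is a list of feature values, plus a class label.
    X = [["red", "small"], ["red", "large"], ["blue", "small"],
         ["blue", "large"], ["red", "small"], ["blue", "large"]]
    y = ["pos", "pos", "neg", "neg", "pos", "neg"]
    classes = sorted(set(y))

    def value_class_probs(feature_idx):
        """Estimate P(class | value) for one symbolic feature from the data."""
        counts = {}
        for row, label in zip(X, y):
            counts.setdefault(row[feature_idx], Counter())[label] += 1
        return {v: np.array([c[cl] / sum(c.values()) for cl in classes])
                for v, c in counts.items()}

    probs = [value_class_probs(j) for j in range(len(X[0]))]

    def vdm_distance(a, b, q=2):
        """Value difference metric: values are close if they predict similar classes."""
        return sum(np.sum(np.abs(probs[j][a[j]] - probs[j][b[j]]) ** q)
                   for j in range(len(a)))

    query = ["red", "large"]
    neighbours = sorted(range(len(X)), key=lambda i: vdm_distance(query, X[i]))
    print("predicted class for", query, "->", y[neighbours[0]])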