
Showing papers on "Bayesian inference published in 2002"


Book
12 Nov 2002
TL;DR: Support Vector Machines Basic Methods of Least Squares Support Vector Machines Bayesian Inference for LS-SVM Models Robustness Large Scale Problems LS-SVM for Unsupervised Learning LS-SVM for Recurrent Networks and Control.
Abstract: Support Vector Machines Basic Methods of Least Squares Support Vector Machines Bayesian Inference for LS-SVM Models Robustness Large Scale Problems LS-SVM for Unsupervised Learning LS-SVM for Recurrent Networks and Control.

2,983 citations


Journal ArticleDOI
TL;DR: The Bayesian inference of phylogeny appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency.
Abstract: Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to the other methods; indeed, the method appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency. However, the method should be used cautiously. The results of a Bayesian analysis should be examined with respect to the sensitivity of the results to the priors used and the reliability of the Markov chain Monte Carlo approximation of the probabilities of trees. (Bayesian inference; Markov chain Monte Carlo; phylogeny; posterior probability.)

798 citations


Journal ArticleDOI
TL;DR: A series of models that exemplify the diversity of problems that can be addressed within the empirical Bayesian framework are presented, using PET data to show how priors can be derived from the between-voxel distribution of activations over the brain.

744 citations


Journal ArticleDOI
TL;DR: Two inferential approaches to this problem are discussed: an empirical Bayes method that requires very little a priori Bayesian modeling, and the frequentist method of “false discovery rates” proposed by Benjamini and Hochberg in 1995.
Abstract: In a classic two-sample problem, one might use Wilcoxon's statistic to test for a difference between treatment and control subjects. The analogous microarray experiment yields thousands of Wilcoxon statistics, one for each gene on the array, and confronts the statistician with a difficult simultaneous inference situation. We will discuss two inferential approaches to this problem: an empirical Bayes method that requires very little a priori Bayesian modeling, and the frequentist method of "false discovery rates" proposed by Benjamini and Hochberg in 1995. It turns out that the two methods are closely related and can be used together to produce sensible simultaneous inferences.

687 citations
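
The false discovery rate procedure the abstract refers to is easy to state concretely. Below is a minimal sketch of the Benjamini-Hochberg step-up rule applied to a handful of made-up per-gene p-values; it is not the empirical Bayes machinery of the paper, just the frequentist rule it connects to.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Return a boolean mask of rejections at FDR level q (BH 1995 step-up rule)."""
    p = np.asarray(pvals)
    m = p.size
    order = np.argsort(p)
    thresholds = q * (np.arange(1, m + 1) / m)   # i/m * q for the i-th smallest p-value
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()          # largest rank i with p_(i) <= i*q/m
        reject[order[:k + 1]] = True             # reject all hypotheses up to that rank
    return reject

# Toy example: 10 hypothetical per-gene p-values (not real microarray data).
pvals = [0.0002, 0.009, 0.012, 0.041, 0.049, 0.21, 0.34, 0.57, 0.74, 0.95]
print(benjamini_hochberg(pvals, q=0.10))
```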


BookDOI
29 May 2002
TL;DR: Bayesian Decision Theory Introducing Decision Theory Basic Definitions Regression-Style Models with Decision Theory James-Stein Estimation Empirical Bayes Exercises Monte Carlo and Related Iterative Methods.
Abstract: BACKGROUND AND INTRODUCTION Introduction Motivation and Justification Why Are We Uncertain about Probability? Bayes' Law Conditional Inference with Bayes' Law Historical Comments The Scientific Process in Our Social Sciences Introducing Markov Chain Monte Carlo Techniques Exercises
SPECIFYING BAYESIAN MODELS Purpose Likelihood Theory and Estimation The Basic Bayesian Framework Bayesian "Learning" Comments on Prior Distributions Bayesian versus Non-Bayesian Approaches Exercises Computational Addendum: R for Basic Analysis
THE NORMAL AND STUDENT'S-T MODELS Why Be Normal? The Normal Model with Variance Known The Normal Model with Mean Known The Normal Model with Both Mean and Variance Unknown Multivariate Normal Model, μ and Σ Both Unknown Simulated Effects of Differing Priors Some Normal Comments The Student's t Model Normal Mixture Models Exercises Computational Addendum: Normal Examples
THE BAYESIAN LINEAR MODEL The Basic Regression Model Posterior Predictive Distribution for the Data The Bayesian Linear Regression Model with Heteroscedasticity Exercises Computational Addendum
THE BAYESIAN PRIOR A Prior Discussion of Priors A Plethora of Priors Conjugate Prior Forms Uninformative Prior Distributions Informative Prior Distributions Hybrid Prior Forms Nonparametric Priors Bayesian Shrinkage Exercises
ASSESSING MODEL QUALITY Motivation Basic Sensitivity Analysis Robustness Evaluation Comparing Data to the Posterior Predictive Distribution Simple Bayesian Model Averaging Concluding Comments on Model Quality Exercises Computational Addendum
BAYESIAN HYPOTHESIS TESTING AND THE BAYES' FACTOR Motivation Bayesian Inference and Hypothesis Testing The Bayes' Factor as Evidence The Bayesian Information Criterion (BIC) The Deviance Information Criterion (DIC) Comparing Posteriors with the Kullback-Leibler Distance Laplace Approximation of Bayesian Posterior Densities Exercises
Bayesian Decision Theory Introducing Decision Theory Basic Definitions Regression-Style Models with Decision Theory James-Stein Estimation Empirical Bayes Exercises
Monte Carlo and Related Iterative Methods Background Basic Monte Carlo Integration Rejection Sampling Classical Numerical Integration Gaussian Quadrature Importance Sampling/Sampling Importance Resampling Mode Finding and the EM Algorithm Survey of Random Number Generation Concluding Remarks Exercises Computational Addendum: R Code for Importance Sampling
BASICS OF MARKOV CHAIN MONTE CARLO Who Is Markov and What Is He Doing with Chains? General Properties of Markov Chains The Gibbs Sampler The Metropolis-Hastings Algorithm The Hit-and-Run Algorithm The Data Augmentation Algorithm Historical Comments Exercises Computational Addendum: Simple R Graphing Routines for MCMC
Implementing Bayesian Models with Markov Chain Monte Carlo Introduction to Bayesian Software Solutions It's Only a Name: BUGS Model Specification with BUGS Differences between WinBUGS and JAGS Code Technical Background about the Algorithm Epilogue Exercises
BAYESIAN HIERARCHICAL MODELS Introduction to Multilevel Models Standard Multilevel Linear Models A Poisson-Gamma Hierarchical Model The General Role of Priors and Hyperpriors Exchangeability Empirical Bayes Exercises Computational Addendum: Instructions for Running JAGS, Trade Data Model
SOME MARKOV CHAIN MONTE CARLO THEORY Motivation Measure and Probability Preliminaries Specific Markov Chain Properties Defining and Reaching Convergence Rates of Convergence Implementation Concerns Exercises
UTILITARIAN MARKOV CHAIN MONTE CARLO Practical Considerations and Admonitions Assessing Convergence of Markov Chains Mixing and Acceleration Producing the Marginal Likelihood Integral from Metropolis-Hastings Output Rao-Blackwellizing for Improved Variance Estimation Exercises Computational Addendum: R Code for the Death Penalty Support Model and BUGS Code for the Military Personnel Model
Markov Chain Monte Carlo Extensions Simulated Annealing Reversible Jump Algorithms Perfect Sampling Exercises
APPENDIX A: GENERALIZED LINEAR MODEL REVIEW Terms The Generalized Linear Model Numerical Maximum Likelihood Quasi-Likelihood Exercises R for Generalized Linear Models
APPENDIX B: COMMON PROBABILITY DISTRIBUTIONS
REFERENCES
AUTHOR INDEX
SUBJECT INDEX

676 citations


Journal ArticleDOI
TL;DR: In this article, an adaptive Markov chain Monte Carlo simulation approach, based on the Metropolis-Hastings algorithm and a concept similar to simulated annealing, is proposed to evaluate the desired integral.
Abstract: In a full Bayesian probabilistic framework for "robust" system identification, structural response predictions and performance reliability are updated using structural test data D by considering the predictions of a whole set of possible structural models that are weighted by their updated probability. This involves integrating h(θ)p(θ|D) over the whole parameter space, where θ is a parameter vector defining each model within the set of possible models of the structure, h(θ) is a model prediction of a response quantity of interest, and p(θ|D) is the updated probability density for θ, which provides a measure of how plausible each model is given the data D. The evaluation of this integral is difficult because the dimension of the parameter space is usually too large for direct numerical integration and p(θ|D) is concentrated in a small region in the parameter space and only known up to a scaling constant. An adaptive Markov chain Monte Carlo simulation approach is proposed to evaluate the desired integral that is based on the Metropolis-Hastings algorithm and a concept similar to simulated annealing. By carrying out a series of Markov chain simulations with limiting stationary distributions equal to a sequence of intermediate probability densities that converge on p(θ|D), the region of concentration of p(θ|D) is gradually portrayed. The Markov chain samples are used to estimate the desired integral by statistical averaging. The method is illustrated using simulated dynamic test data to update the robust response variance and reliability of a moment-resisting frame for two cases: one where the model is only locally identifiable based on the data and the other where it is unidentifiable.

671 citations
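
A rough sketch of the general strategy described in the abstract: run Metropolis-Hastings chains on a sequence of tempered versions of the posterior and estimate E[h(θ)|D] by averaging over the final chain. The target density, proposal scale, and tempering schedule below are invented for illustration; the paper's adaptive algorithm and structural model are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Stand-in for log p(theta | D) up to a constant: a narrow 2-D Gaussian posterior.
    return -0.5 * np.sum((theta - 1.5) ** 2) / 0.05 ** 2

def h(theta):
    # Response quantity of interest whose posterior expectation we want.
    return theta[0] ** 2

def metropolis(log_target, theta0, n_steps=3000, step=0.1):
    theta, lp = theta0.copy(), log_target(theta0)
    samples = []
    for _ in range(n_steps):
        prop = theta + step * rng.standard_normal(theta.size)
        lp_prop = log_target(prop)
        if np.log(rng.random()) < lp_prop - lp:      # Metropolis-Hastings accept/reject
            theta, lp = prop, lp_prop
        samples.append(theta.copy())
    return np.array(samples)

# Chains target a sequence of tempered densities p(theta|D)^beta that converge on the
# posterior; each chain starts where the previous one ended, so the sampler gradually
# locates the small region where p(theta|D) concentrates.
theta = np.zeros(2)
for beta in [0.01, 0.1, 0.5, 1.0]:
    samples = metropolis(lambda t: beta * log_post(t), theta, step=0.3 / np.sqrt(beta))
    theta = samples[-1]

# Statistical averaging over the final (beta = 1) chain estimates E[h(theta) | D].
print(np.mean([h(t) for t in samples[500:]]))
```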


Journal ArticleDOI
TL;DR: The procedures used in conventional data analysis are formulated in terms of hierarchical linear models, and a connection between classical inference and parametric empirical Bayes (PEB) is established through covariance component estimation.

647 citations


Journal ArticleDOI
TL;DR: A new simulation smoother for state space time series analysis is presented that is both simple and computationally efficient; the treatment includes models with diffuse initial conditions and regression effects.
Abstract: A simulation smoother in state space time series analysis is a procedure for drawing samples from the conditional distribution of state or disturbance vectors given the observations. We present a new technique for this which is both simple and computationally efficient. The treatment includes models with diffuse initial conditions and regression effects. Computational comparisons are made with the previous standard method. Two applications are provided to illustrate the use of the simulation smoother for Gibbs sampling for Bayesian inference and importance sampling for classical inference. © 2002 Biometrika Trust.

595 citations
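
The core construction can be sketched compactly for a local level model: simulate a state and observation path from the model, smooth both the real and the simulated series with the Kalman filter/smoother, and correct the simulated states by the difference of the two smoothed means. The sketch below assumes a proper (non-diffuse) initial prior and invented variances; the paper's exact diffuse initialization and regression effects are not handled.

```python
import numpy as np

rng = np.random.default_rng(1)

def smooth_mean(y, sig_eps2, sig_eta2, a1=0.0, P1=10.0):
    """E[alpha | y] for the local level model y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t."""
    n = len(y)
    a, P = a1, P1
    v, F, K = np.empty(n), np.empty(n), np.empty(n)
    a_pred, P_pred = np.empty(n), np.empty(n)
    for t in range(n):                                   # Kalman filter
        a_pred[t], P_pred[t] = a, P
        v[t] = y[t] - a
        F[t] = P + sig_eps2
        K[t] = P / F[t]
        a = a + K[t] * v[t]
        P = P * (1.0 - K[t]) + sig_eta2
    alpha_hat = np.empty(n)
    r = 0.0
    for t in range(n - 1, -1, -1):                       # backward smoothing recursion
        r = v[t] / F[t] + (1.0 - K[t]) * r
        alpha_hat[t] = a_pred[t] + P_pred[t] * r
    return alpha_hat

def simulation_smoother_draw(y, sig_eps2, sig_eta2, a1=0.0, P1=10.0):
    """One draw from p(alpha | y) via the 'mean correction' construction."""
    n = len(y)
    # 1. Simulate (alpha+, y+) unconditionally from the same model, proper prior on alpha_1.
    alpha_plus = np.empty(n)
    alpha_plus[0] = a1 + np.sqrt(P1) * rng.standard_normal()
    for t in range(1, n):
        alpha_plus[t] = alpha_plus[t - 1] + np.sqrt(sig_eta2) * rng.standard_normal()
    y_plus = alpha_plus + np.sqrt(sig_eps2) * rng.standard_normal(n)
    # 2. Smooth both series.  3. Correct: alpha+ + E[alpha|y] - E[alpha+|y+] ~ p(alpha|y).
    return (alpha_plus
            + smooth_mean(y, sig_eps2, sig_eta2, a1, P1)
            - smooth_mean(y_plus, sig_eps2, sig_eta2, a1, P1))

y = np.cumsum(rng.standard_normal(100)) + rng.standard_normal(100)   # toy series
print(simulation_smoother_draw(y, sig_eps2=1.0, sig_eta2=1.0)[:5])
```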


Proceedings Article
01 Jan 2002
TL;DR: A framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, which allows for Bayesian model selection and is less complex in implementation is presented.
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.

590 citations
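
One way to picture forward selection for a sparse GP is the greedy loop below: repeatedly add to the active set the point whose prior variance is least well captured by the current active set (a Nystrom-residual, entropy-style score). This is only a loose sketch; the paper's selection criteria also use the likelihood, and its classification setting and scoring details differ. The kernel, data, and sizes here are invented.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def greedy_active_set(X, d=10):
    """Greedily pick d active points, scoring candidates by the part of their prior
    variance not yet explained by the active set."""
    n = X.shape[0]
    active = [0]                                   # arbitrary seed point
    prior_var = np.ones(n)                         # k(x, x) = 1 for this RBF kernel
    for _ in range(d - 1):
        Kmm = rbf(X[active], X[active]) + 1e-8 * np.eye(len(active))
        Knm = rbf(X, X[active])
        resid = prior_var - np.einsum('ij,jk,ik->i', Knm, np.linalg.inv(Kmm), Knm)
        resid[active] = -np.inf                    # never re-select an active point
        active.append(int(np.argmax(resid)))
    return active

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))              # invented 1-D inputs
print(greedy_active_set(X, d=10))
```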


Journal ArticleDOI
TL;DR: Several MCMC methods for estimating probabilities of models and associated 'model-averaged' posterior distributions in the presence of model uncertainty are discussed, compared, developed and illustrated, with a focus on connections between them.
Abstract: Several MCMC methods have been proposed for estimating probabilities of models and associated ‘model-averaged’ posterior distributions in the presence of model uncertainty. We discuss, compare, develop and illustrate several of these methods, focussing on connections between them.

479 citations


Journal ArticleDOI
TL;DR: In this paper, the authors use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of model uncertainty and show that the out-of-sample performance of the Bayesian approach is superior to that of model selection criteria.

Journal ArticleDOI
TL;DR: In this paper, a Markov chain Monte Carlo (MCMC) algorithm is applied to the nonlinear problem of inverting DC resistivity sounding data to infer characteristics of a 1-D earth model.
Abstract: A key element in the solution of a geophysical inverse problem is the quantification of non-uniqueness, that is, how much parameters of an inferred earth model can vary while fitting a set of measurements. A widely used approach is that of Bayesian inference, where Bayes' rule is used to determine the uncertainty of the earth model parameters a posteriori given the data. I describe here a natural extension of Bayesian parameter estimation that accounts for the posterior probability of how complex an earth model is (specifically, how many layers it contains). This approach has a built-in parsimony criterion: among all earth models that fit the data, those with fewer parameters (fewer layers) have higher posterior probabilities. To implement this approach in practice, I use a Markov chain Monte Carlo (MCMC) algorithm applied to the nonlinear problem of inverting DC resistivity sounding data to infer characteristics of a 1-D earth model. The earth model is parametrized as a layered medium, where the number of layers and their resistivities and thicknesses are poorly known a priori. The algorithm obtains a sample of layered media from the posterior distribution; this sample measures non-uniqueness in terms of how many layers are effectively resolved by the data and of the range of layer thicknesses and resistivities consistent with the data. Because the complexity of the model is effectively determined by the data, the solution does not need to be regularized. This is a desirable feature, because requiring the solution to be smooth beyond what is implied by prior information can lead to underestimating posterior uncertainty. Letting the number of layers be a free parameter, as done here, broadens the space of earth models possible a priori and makes the determination of posterior uncertainty less dependent on the parametrization.

Book
15 Feb 2002
TL;DR: The topics covered include Bayesian and information-theoretic models of perception, probabilistic theories of neural coding and spike timing, computational models of lateral and cortico-cortical feedback connections, and the development of receptive field properties from natural signals.
Abstract: Neurophysiological, neuroanatomical, and brain imaging studies have helped to shed light on how the brain transforms raw sensory information into a form that is useful for goal-directed behavior. A fundamental question that is seldom addressed by these studies, however, is why the brain uses the types of representations it does and what evolutionary advantage, if any, these representations confer. It is difficult to address such questions directly via animal experiments. A promising alternative is to use probabilistic principles such as maximum likelihood and Bayesian inference to derive models of brain function. This book surveys some of the current probabilistic approaches to modeling and understanding brain function. Although most of the examples focus on vision, many of the models and techniques are applicable to other modalities as well. The book presents top-down computational models as well as bottom-up neurally motivated models of brain function. The topics covered include Bayesian and information-theoretic models of perception, probabilistic theories of neural coding and spike timing, computational models of lateral and cortico-cortical feedback connections, and the development of receptive field properties from natural signals.

Journal ArticleDOI
TL;DR: This study conducted experiments in which human participants trace perceived contours in natural images, and employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity.
Abstract: Although numerous studies have measured the strength of visual grouping cues for controlled psychophysical stimuli, little is known about the statistical utility of these various cues for natural images. In this study, we conducted experiments in which human participants trace perceived contours in natural images. These contours are automatically mapped to sequences of discrete tangent elements detected in the image. By examining relational properties between pairs of successive tangents on these traced curves, and between randomly selected pairs of tangents, we are able to estimate the likelihood distributions required to construct an optimal Bayesian model for contour grouping. We employed this novel methodology to investigate the inferential power of three classical Gestalt cues for contour grouping: proximity, good continuation, and luminance similarity. The study yielded a number of important results: (1) these cues, when appropriately defined, are approximately uncorrelated, suggesting a simple factorial model for statistical inference; (2) moderate image-to-image variation of the statistics indicates the utility of general probabilistic models for perceptual organization; (3) these cues differ greatly in their inferential power, proximity being by far the most powerful; and (4) statistical modeling of the proximity cue indicates a scale-invariant power law in close agreement with prior psychophysics.
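
The factorial model mentioned in result (1) has a simple computational reading: if the cues are independent given the grouping hypothesis, their likelihood ratios multiply. A toy sketch with invented likelihood-ratio values for the three cues:

```python
import numpy as np

def posterior_same_contour(prior_odds, cue_likelihood_ratios):
    """Combine independent grouping cues: posterior odds = prior odds * product of
    per-cue likelihood ratios P(cue | same contour) / P(cue | different contours)."""
    post_odds = prior_odds * np.prod(cue_likelihood_ratios)
    return post_odds / (1.0 + post_odds)          # convert odds to a probability

# Hypothetical likelihood ratios for proximity, good continuation, and brightness
# similarity measured between one pair of edge tangents (numbers are made up).
print(posterior_same_contour(prior_odds=0.05,
                             cue_likelihood_ratios=[40.0, 3.0, 1.5]))
```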

Journal ArticleDOI
TL;DR: The method is demonstrated using an input-state-output model of the hemodynamic coupling between experimentally designed causes or factors in fMRI studies and the ensuing BOLD response, and extends classical inference to more plausible inferences about the parameters of the model given the data.

Journal ArticleDOI
TL;DR: In this paper, the Bayesian evidence framework is combined with the least-squares support vector machine (LS-SVM) classifier formulation, and analytic expressions are obtained in the dual space on the different levels of Bayesian inference, while posterior class probabilities are obtained by marginalizing over the model parameters.
Abstract: The Bayesian evidence framework has been successfully applied to the design of multilayer perceptrons (MLPs) in the work of MacKay. Nevertheless, the training of MLPs suffers from drawbacks like the nonconvex optimization problem and the choice of the number of hidden units. In support vector machines (SVMs) for classification, as introduced by Vapnik, a nonlinear decision boundary is obtained by mapping the input vector first in a nonlinear way to a high-dimensional kernel-induced feature space in which a linear large margin classifier is constructed. Practical expressions are formulated in the dual space in terms of the related kernel function, and the solution follows from a (convex) quadratic programming (QP) problem. In least-squares SVMs (LS-SVMs), the SVM problem formulation is modified by introducing a least-squares cost function and equality instead of inequality constraints, and the solution follows from a linear system in the dual space. Implicitly, the least-squares formulation corresponds to a regression formulation and is also related to kernel Fisher discriminant analysis. The least-squares regression formulation has advantages for deriving analytic expressions in a Bayesian evidence framework, in contrast to the classification formulations used, for example, in Gaussian processes (GPs). The LS-SVM formulation has clear primal-dual interpretations, and without the bias term, one explicitly constructs a model that yields the same expressions as have been obtained with GPs for regression. In this article, the Bayesian evidence framework is combined with the LS-SVM classifier formulation. Starting from the feature space formulation, analytic expressions are obtained in the dual space on the different levels of Bayesian inference, while posterior class probabilities are obtained by marginalizing over the model parameters. Empirical results obtained on 10 public domain data sets show that the LS-SVM classifier designed within the Bayesian evidence framework consistently yields good generalization performances.
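
The statement that the LS-SVM solution "follows from a linear system in the dual space" can be made concrete with a short sketch: solve the KKT system for (b, α) and classify with the kernel expansion. This omits the Bayesian evidence levels that are the article's contribution; the RBF kernel, γ, and data below are invented.

```python
import numpy as np

def rbf(A, B, ls=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def lssvm_train(X, y, gamma=10.0, ls=1.0):
    """Solve the LS-SVM classifier KKT system  [[0, y^T], [y, Omega + I/gamma]] [b; a] = [0; 1]."""
    n = len(y)
    Omega = np.outer(y, y) * rbf(X, X, ls)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                         # bias b and dual weights alpha

def lssvm_predict(Xtr, ytr, b, alpha, Xte, ls=1.0):
    return np.sign(rbf(Xte, Xtr, ls) @ (alpha * ytr) + b)

# Toy two-class problem with invented data.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1, 0.5, (30, 2)), rng.normal(1, 0.5, (30, 2))])
y = np.concatenate([-np.ones(30), np.ones(30)])
b, alpha = lssvm_train(X, y)
print("training accuracy:", (lssvm_predict(X, y, b, alpha, X) == y).mean())
```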

Journal ArticleDOI
Philip H. S. Torr1
TL;DR: This paper explores ways of automating the model selection process with specific emphasis on the least squares problem of fitting manifolds to data points, illustrated with respect to epipolar geometry.
Abstract: Computer vision often involves estimating models from visual input. Sometimes it is possible to fit several different models or hypotheses to a set of data, and a decision must be made as to which is most appropriate. This paper explores ways of automating the model selection process, with specific emphasis on the least squares problem of fitting manifolds (in particular algebraic varieties, e.g. lines, algebraic curves, planes, etc.) to data points, illustrated with respect to epipolar geometry. The approach is Bayesian and the contribution is threefold: first, a new Bayesian description of the problem is laid out that supersedes the author's previous maximum likelihood formulations; this formulation reveals some hidden elements of the problem. Second, an algorithm, 'MAPSAC', is provided to obtain the robust MAP estimate of an arbitrary manifold. Third, a Bayesian model selection paradigm is proposed; the Bayesian formulation of the manifold fitting problem uncovers an elegant solution to this problem, for which a new method, 'GRIC', for approximating the posterior probability of each putative model is derived. This approximation bears some similarity to the penalized likelihoods used by AIC, BIC and MDL; however, it is far more accurate in situations involving large numbers of latent variables whose number increases with the data. This will be empirically and theoretically demonstrated.

Journal ArticleDOI
Martijn Cremers1
TL;DR: The authors compare the prior views of a skeptic and a confident investor and find that the data imply posterior probabilities that are in general more supportive of stock return predictability than the priors of both types of investors.
Abstract: Attempts to characterize stock return predictability have resulted in little consensus on the important conditioning variables, giving rise to model uncertainty and data snooping fears. We introduce a new methodology that explicitly incorporates model uncertainty by comparing all possible models simultaneously and in which the priors are calibrated to reflect economically meaningful information. Our approach minimizes data snooping given the information set and the priors. We compare the prior views of a skeptic and a confident investor. The data imply posterior probabilities that are in general more supportive of stock return predictability than the priors for both types of investors. Copyright 2002, Oxford University Press.

Journal ArticleDOI
TL;DR: The DY-conjugate prior for non-decomposable models is derived and it is shown that it can be regarded as a generalization to an arbitrary graph G of the hyper inverse Wishart distribution (Dawid & Lauritzen, 1993).
Abstract: While conjugate Bayesian inference in decomposable Gaussian graphical models is largely solved, the non-decomposable case still poses difficulties concerned with the specification of suitable priors and the evaluation of normalizing constants. In this paper we derive the DY-conjugate prior (Diaconis & Ylvisaker, 1979) for non-decomposable models and show that it can be regarded as a generalization to an arbitrary graph G of the hyper inverse Wishart distribution (Dawid & Lauritzen, 1993). In particular, if G is an incomplete prime graph it constitutes a non-trivial generalization of the inverse Wishart distribution. Inference based on marginal likelihood requires the evaluation of a normalizing constant and we propose an importance sampling algorithm for its computation. Examples of structural learning involving non-decomposable models are given. In order to deal efficiently with the set of all positive definite matrices with non-decomposable zero-pattern we introduce the operation of triangular completion of an incomplete triangular matrix. Such a device turns out to be extremely useful both in the proof of theoretical results and in the implementation of the Monte Carlo procedure.

Journal ArticleDOI
TL;DR: A Bayesian approach to tracking the direction-of-arrival (DOA) of multiple moving targets using a passive sensor array using a collection of target states that can be viewed as samples from the posterior of interest.
Abstract: We present a Bayesian approach to tracking the direction-of-arrival (DOA) of multiple moving targets using a passive sensor array. The prior is a description of the dynamic behavior we expect for the targets which is modeled as constant velocity motion with a Gaussian disturbance acting on the target's heading direction. The likelihood function is arrived at by defining an uninformative prior for both the signals and noise variance and removing these parameters from the problem by marginalization. Advances in sequential Monte Carlo (SMC) techniques, specifically the particle filter algorithm, allow us to model and track the posterior distribution defined by the Bayesian model using a collection of target states that can be viewed as samples from the posterior of interest. We describe two versions of this algorithm and finally present results obtained using synthetic data.
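
A generic bootstrap particle filter (sequential importance resampling) conveys the SMC machinery the abstract builds on: propagate particles through the dynamics, weight them by the likelihood, and resample. The toy one-dimensional random-walk model below is invented and much simpler than the paper's array likelihood and constant-velocity target dynamics.

```python
import numpy as np

rng = np.random.default_rng(4)

def particle_filter(ys, n_particles=1000, q=0.1, r=0.5):
    """Bootstrap particle filter for  x_t = x_{t-1} + N(0, q^2),  y_t = x_t + N(0, r^2)."""
    x = rng.standard_normal(n_particles)                  # initial particle cloud
    means = []
    for y in ys:
        x = x + q * rng.standard_normal(n_particles)      # propagate through the prior dynamics
        logw = -0.5 * (y - x) ** 2 / r ** 2               # weight by the likelihood of y
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                       # posterior mean estimate at time t
        x = x[rng.choice(n_particles, n_particles, p=w)]  # multinomial resampling
    return np.array(means)

# Simulate a toy trajectory and track it.
true_x = np.cumsum(0.1 * rng.standard_normal(100))
ys = true_x + 0.5 * rng.standard_normal(100)
est = particle_filter(ys)
print("mean squared tracking error:", np.mean((est - true_x) ** 2))
```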

Journal ArticleDOI
TL;DR: This article presents a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions and, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.
Abstract: Bayesian inference is becoming a common statistical approach to phylogenetic estimation because, among other reasons, it allows for rapid analysis of large data sets with complex evolutionary models. Conveniently, Bayesian phylogenetic methods use currently available stochastic models of sequence evolution. However, as with other model-based approaches, the results of Bayesian inference are conditional on the assumed model of evolution: inadequate models (models that poorly fit the data) may result in erroneous inferences. In this article, I present a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions. By evaluating a model's posterior predictive performance, an adequate model can be selected for a Bayesian phylogenetic study. Although I present a single test statistic that assesses the overall (global) performance of a phylogenetic model, a variety of test statistics can be tailored to evaluate specific features (local performance) of evolutionary models to identify sources of failure. The method presented here, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.
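
The posterior predictive logic is easiest to see in a deliberately simple conjugate setting rather than a phylogenetic one: draw parameters from the posterior, simulate replicate data, recompute a test statistic, and report how often the replicates exceed the observed value. Data, prior, and test statistic below are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

# Observed counts of successes out of m trials in several groups (invented data).
successes = np.array([9, 2, 8, 1, 10])
m = 10

def test_stat(counts):
    # Spread of group proportions; large values suggest a single-rate model is inadequate.
    return np.var(counts / m)

# Posterior for a single shared success probability under a Beta(1, 1) prior.
a_post = 1 + successes.sum()
b_post = 1 + m * len(successes) - successes.sum()

t_obs = test_stat(successes)
n_rep, exceed = 5000, 0
for _ in range(n_rep):
    p = rng.beta(a_post, b_post)                      # draw a parameter from the posterior
    rep = rng.binomial(m, p, size=len(successes))     # simulate a replicate data set
    exceed += test_stat(rep) >= t_obs                 # compare replicate statistic with observed
print("posterior predictive p-value:", exceed / n_rep)
```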

Journal ArticleDOI
TL;DR: A framework for interpreting Support Vector Machines as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors is described, which allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters and how to obtain predictive class probabilities.
Abstract: I describe a framework for interpreting Support Vector Machines (SVMs) as maximum a posteriori (MAP) solutions to inference problems with Gaussian Process priors. This probabilistic interpretation can provide intuitive guidelines for choosing a 'good' SVM kernel. Beyond this, it allows Bayesian methods to be used for tackling two of the outstanding challenges in SVM classification: how to tune hyperparameters (the misclassification penalty C, and any parameters specifying the kernel) and how to obtain predictive class probabilities rather than the conventional deterministic class label predictions. Hyperparameters can be set by maximizing the evidence; I explain how the latter can be defined and properly normalized. Both analytical approximations and numerical methods (Monte Carlo chaining) for estimating the evidence are discussed. I also compare different methods of estimating class probabilities, ranging from simple evaluation at the MAP or at the posterior average to full averaging over the posterior. A simple toy application illustrates the various concepts and techniques.
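
Evidence-based hyperparameter tuning is exactly computable in the related Gaussian-process regression setting, which makes for a compact illustration of the idea (the SVM-classification evidence discussed in the paper has to be approximated). The kernel, noise level, grid, and data below are assumptions for the sketch.

```python
import numpy as np

def rbf(A, B, ls):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls ** 2)

def log_evidence(X, y, ls, noise=0.1):
    """GP regression log marginal likelihood log p(y | X, ls, noise)."""
    n = len(y)
    Ky = rbf(X, X, ls) + noise ** 2 * np.eye(n)
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -1/2 y^T Ky^{-1} y - 1/2 log|Ky| - n/2 log(2*pi)
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)

rng = np.random.default_rng(6)
X = rng.uniform(-3, 3, (80, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(80)

# Pick the kernel length-scale with the highest evidence from a small grid.
grid = [0.1, 0.3, 0.5, 1.0, 2.0]
print("evidence-selected length-scale:", max(grid, key=lambda ls: log_evidence(X, y, ls)))
```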

Journal ArticleDOI
TL;DR: This work corrects two misconceptions propagated in recent work: normalized frequencies have been mistaken for natural frequencies and, as a consequence, "nested sets" and the "subset principle" have been proposed as new explanations.
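
The distinction is easiest to see on a worked example. The screening numbers below are hypothetical and chosen only to show how a natural-frequency reading and Bayes' rule give the same posterior:

```python
# Hypothetical screening problem: base rate 1%, sensitivity 80%, false-positive rate 9.6%.
# Natural-frequency reading: out of 1000 people, 10 have the condition and 8 of them test
# positive; of the remaining 990, about 95 also test positive.
sick_pos = 1000 * 0.01 * 0.80          # 8 true positives
well_pos = 1000 * 0.99 * 0.096         # ~95 false positives
print(sick_pos / (sick_pos + well_pos))   # ~0.078

# The same posterior from Bayes' rule with normalized probabilities.
p = (0.01 * 0.80) / (0.01 * 0.80 + 0.99 * 0.096)
print(p)                                   # ~0.078
```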

Journal ArticleDOI
TL;DR: The basic ideas of MCMC and the software BUGS (Bayesian inference using Gibbs sampling) are introduced, stressing that a simple and satisfactory intuition for MCMC does not require extraordinary mathematical sophistication.
Abstract: Markov chain Monte Carlo (MCMC) is a statistical innovation that allows researchers to fit far more complex models to data than is feasible using conventional methods. Despite its widespread use in a variety of scientific fields, MCMC appears to be underutilized in wildlife applications. This may be due to a misconception that MCMC requires the adoption of a subjective Bayesian analysis, or perhaps simply to its lack of familiarity among wildlife researchers. We introduce the basic ideas of MCMC and the software BUGS (Bayesian inference using Gibbs sampling), stressing that a simple and satisfactory intuition for MCMC does not require extraordinary mathematical sophistication. We illustrate the use of MCMC with an analysis of the association between latent factors governing individual heterogeneity in breeding and survival rates of kittiwakes (Rissa tridactyla). We conclude with a discussion of the importance of individual heterogeneity for understanding population dynamics and designing management plans.
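
A bare-bones Gibbs sampler for a normal mean/variance model with conjugate priors shows the kind of machinery being introduced, without reproducing the paper's kittiwake heterogeneity model (which was fitted in BUGS). Data and prior settings below are invented.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(2.0, 1.5, size=50)        # invented data
n, ybar = len(y), y.mean()

# Priors: mu ~ N(0, 10^2), sigma^2 ~ Inverse-Gamma(2, 2).
m0, s0sq, a0, b0 = 0.0, 100.0, 2.0, 2.0

mu, sig2 = 0.0, 1.0
draws = []
for it in range(5000):
    # mu | sigma^2, y  (normal full conditional)
    prec = 1.0 / s0sq + n / sig2
    mean = (m0 / s0sq + n * ybar / sig2) / prec
    mu = rng.normal(mean, np.sqrt(1.0 / prec))
    # sigma^2 | mu, y  (inverse-gamma full conditional, sampled via 1/Gamma)
    sig2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * np.sum((y - mu) ** 2)))
    draws.append((mu, sig2))

draws = np.array(draws)[1000:]           # drop burn-in
print("posterior means of (mu, sigma^2):", draws.mean(axis=0))
```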

Journal ArticleDOI
TL;DR: This work proposes an approach using cross-validation predictive densities to obtain expected utility estimates and the Bayesian bootstrap to obtain samples from their distributions, and discusses the probabilistic assumptions made and properties of two practical cross-validation methods, importance sampling and k-fold cross-validation.
Abstract: In this work, we discuss practical methods for the assessment, comparison, and selection of complex hierarchical Bayesian models. A natural way to assess the goodness of the model is to estimate its future predictive capability by estimating expected utilities. Instead of just making a point estimate, it is important to obtain the distribution of the expected utility estimate because it describes the uncertainty in the estimate. The distributions of the expected utility estimates can also be used to compare models, for example, by computing the probability of one model having a better expected utility than some other model. We propose an approach using cross-validation predictive densities to obtain expected utility estimates and Bayesian bootstrap to obtain samples from their distributions. We also discuss the probabilistic assumptions made and properties of two practical cross-validation methods, importance sampling and k-fold cross-validation. As illustrative examples, we use multilayer perceptron neural networks and gaussian processes with Markov chain Monte Carlo sampling in one toy problem and two challenging real-world problems.
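
The Bayesian bootstrap step can be sketched in a few lines: given per-observation log predictive densities from cross-validation (made-up numbers below), draw Dirichlet(1, ..., 1) weights and reweight to get a distribution for each model's expected utility, then compare models by the probability that one exceeds the other.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical per-observation log predictive densities from cross-validation
# for two competing models on the same 60 held-out points (numbers are made up).
lpd_a = rng.normal(-1.00, 0.30, 60)
lpd_b = rng.normal(-1.10, 0.30, 60)

# Bayesian bootstrap: one Dirichlet(1, ..., 1) weight vector per replicate,
# applied to both models so the expected-utility draws stay paired.
w = rng.dirichlet(np.ones(60), size=4000)
util_a, util_b = w @ lpd_a, w @ lpd_b
print("P(model A has the higher expected utility):", np.mean(util_a > util_b))
```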

Journal ArticleDOI
TL;DR: In this paper, several state-of-the-art binary classification techniques are experimentally evaluated in the context of expert automobile insurance claim fraud detection and compared in terms of mean percentage correctly classified (PCC) and mean area under the receiver operating characteristic (AUROC) curve using a stratified, blocked, tenfold cross-validation experiment.
Abstract: Several state-of-the-art binary classification techniques are experimentally evaluated in the context of expert automobile insurance claim fraud detection. The predictive power of logistic regression, C4.5 decision tree, k-nearest neighbor, Bayesian learning multilayer perceptron neural network, least-squares support vector machine, naive Bayes, and tree-augmented naive Bayes classification is contrasted. For most of these algorithm types, we report on several operationalizations using alternative hyperparameter or design choices. We compare these in terms of mean percentage correctly classified (PCC) and mean area under the receiver operating characteristic (AUROC) curve using a stratified, blocked, ten-fold cross-validation experiment. We also contrast algorithm type performance visually by means of the convex hull of the receiver operating characteristic (ROC) curves associated with the alternative operationalizations per algorithm type. The study is based on a data set of 1,399 personal injury protection claims from 1993 accidents collected by the Automobile Insurers Bureau of Massachusetts. To stay as close to real-life operating conditions as possible, we consider only predictors that are known relatively early in the life of a claim. Furthermore, based on the qualification of each available claim by both a verbal expert assessment of suspicion of fraud and a ten-point-scale expert suspicion score, we can compare classification for different target/class encoding schemes. Finally, we also investigate the added value of systematically collecting nonflag predictors for suspicion of fraud modeling purposes. From the observed results, we may state that: (1) independent of the target encoding scheme and the algorithm type, the inclusion of nonflag predictors allows us to significantly boost predictive performance; (2) for all the evaluated scenarios, the performance difference in terms of mean PCC and mean AUROC between many algorithm type operationalizations turns out to be rather small; visual comparison of the algorithm type ROC curve convex hulls also shows limited difference in performance over the range of operating conditions; (3) relatively simple and efficient techniques such as linear logistic regression and linear kernel least-squares support vector machine classification show excellent overall predictive capabilities, and (smoothed) naive Bayes also performs well; and (4) the C4.5 decision tree operationalization results are rather disappointing; none of the tree operationalizations are capable of attaining mean AUROC performance in line with the best. Visual inspection of the evaluated scenarios reveals that the C4.5 algorithm type ROC curve convex hull is often dominated in large part by most of the other algorithm type hulls.
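
The evaluation protocol (stratified ten-fold cross-validation scored by mean AUROC) is straightforward to reproduce in outline; the sketch below uses scikit-learn on synthetic data standing in for the proprietary claims data, and omits the blocking and the ROC convex-hull comparison.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

# Synthetic imbalanced data standing in for the claims data set.
X, y = make_classification(n_samples=1400, n_features=20, weights=[0.8, 0.2], random_state=0)

aucs = []
for train, test in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    scores = clf.predict_proba(X[test])[:, 1]        # class-1 probabilities for ranking
    aucs.append(roc_auc_score(y[test], scores))

print("mean AUROC over 10 stratified folds:", np.mean(aucs))
```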

Journal ArticleDOI
TL;DR: Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
Abstract: Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.
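
A tiny numerical illustration of the rate-time non-identifiability discussed here: if the observed number of substitutions on a branch is modeled as Poisson with mean rate × time, every (rate, time) pair with the same product gives exactly the same likelihood. The count and the pairs below are invented.

```python
from scipy.stats import poisson

# Observed substitution count on a branch (invented); model: count ~ Poisson(rate * time),
# so the likelihood depends on rate and time only through their product.
k = 7
for rate, time in [(0.5, 8.0), (1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]:
    print(rate, time, poisson.logpmf(k, rate * time))   # identical log-likelihoods
```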

Journal ArticleDOI
TL;DR: A variational Bayes (VB) learning algorithm for generalized autoregressive (GAR) models that reduces to the Bayesian evidence framework for Gaussian noise and uninformative priors and weight precisions and is applied to synthetic and real data with encouraging results.
Abstract: We describe a variational Bayes (VB) learning algorithm for generalized autoregressive (GAR) models. The noise is modeled as a mixture of Gaussians rather than the usual single Gaussian. This allows different data points to be associated with different noise levels and effectively provides robust estimation of AR coefficients. The VB framework is used to prevent overfitting and provides model-order selection criteria both for AR order and noise model order. We show that for the special case of Gaussian noise and uninformative priors on the noise and weight precisions, the VB framework reduces to the Bayesian evidence framework. The algorithm is applied to synthetic and real data with encouraging results.

Journal ArticleDOI
TL;DR: In this article, a new method of developing prior distributions for the model parameters is presented, called the expected-posterior prior approach, which defines the priors for all models from a common underlying predictive distribution in such a way that the resulting priors are amenable to modern Markov chain Monte Carlo computational techniques.
Abstract: We consider the problem of comparing parametric models using a Bayesian approach. A new method of developing prior distributions for the model parameters is presented, called the expected-posterior prior approach. The idea is to define the priors for all models from a common underlying predictive distribution, in such a way that the resulting priors are amenable to modern Markov chain Monte Carlo computational techniques. The approach has subjective Bayesian and default Bayesian implementations, and overcomes the most significant impediment to Bayesian model selection, that of ensuring that prior distributions for the various models are appropriately compatible.
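
In rough notation, the expected-posterior prior for the parameters of model M_k averages a default ("baseline") posterior over imaginary data y* drawn from a predictive distribution m* shared across models; the sketch below paraphrases the construction rather than reproducing the paper's exact notation.

```latex
% Expected-posterior prior (sketch of the definition, notation approximate):
\pi_k^{*}(\theta_k) \;=\; \int \pi_k^{N}\!\left(\theta_k \mid y^{*}\right)\, m^{*}\!\left(y^{*}\right)\, dy^{*}
% \pi_k^{N}(\cdot \mid y^{*}): posterior for model M_k under a default prior, given imaginary data y^{*}
% m^{*}: common predictive distribution for y^{*}, shared by all models so the priors are compatible
```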

Journal ArticleDOI
TL;DR: In this paper, an axiomatization of subjective expected utility and Bayesian updating in a conditional decision problem is presented, which improves our understanding of the Bayesian standard from two perspectives: 1) it uses a set of axioms which are weak and intuitive; and 2) it provides a formal proof to results on the relation between dynamic consistency, expected utility, and bayesian updating.
Abstract: I present an axiomatization of subjective expected utility and Bayesian updating in a conditional decision problem. This result improves our understanding of the Bayesian standard from two perspectives: 1) it uses a set of axioms which are weak and intuitive; 2) it provides a formal proof to results on the relation between dynamic consistency, expected utility and Bayesian updating which have never been explicitly proved in a fully subjective framework.