
Showing papers on "Bayesian inference" published in 1997


BookDOI
TL;DR: A comprehensive treatment of Markov chain Monte Carlo in practice: the theory behind the Gibbs sampler and Metropolis-Hastings algorithm, strategies for implementation and convergence monitoring, model determination and checking, and applications ranging from generalized linear mixed models, medical monitoring, and disease mapping to image analysis, genetics, mixtures, and radiocarbon dating.
Abstract: Contents:
INTRODUCING MARKOV CHAIN MONTE CARLO: Introduction; The Problem; Markov Chain Monte Carlo; Implementation; Discussion
HEPATITIS B: A CASE STUDY IN MCMC METHODS: Introduction; Hepatitis B Immunization; Modelling; Fitting a Model Using Gibbs Sampling; Model Elaboration; Conclusion
MARKOV CHAIN CONCEPTS RELATED TO SAMPLING ALGORITHMS: Markov Chains; Rates of Convergence; Estimation; The Gibbs Sampler and Metropolis-Hastings Algorithm
INTRODUCTION TO GENERAL STATE-SPACE MARKOV CHAIN THEORY: Introduction; Notation and Definitions; Irreducibility, Recurrence, and Convergence; Harris Recurrence; Mixing Rates and Central Limit Theorems; Regeneration; Discussion
FULL CONDITIONAL DISTRIBUTIONS: Introduction; Deriving Full Conditional Distributions; Sampling from Full Conditional Distributions; Discussion
STRATEGIES FOR IMPROVING MCMC: Introduction; Reparameterization; Random and Adaptive Direction Sampling; Modifying the Stationary Distribution; Methods Based on Continuous-Time Processes; Discussion
IMPLEMENTING MCMC: Introduction; Determining the Number of Iterations; Software and Implementation; Output Analysis; Generic Metropolis Algorithms; Discussion
INFERENCE AND MONITORING CONVERGENCE: Difficulties in Inference from Markov Chain Simulation; The Risk of Undiagnosed Slow Convergence; Multiple Sequences and Overdispersed Starting Points; Monitoring Convergence Using Simulation Output; Output Analysis for Inference; Output Analysis for Improving Efficiency
MODEL DETERMINATION USING SAMPLING-BASED METHODS: Introduction; Classical Approaches; The Bayesian Perspective and the Bayes Factor; Alternative Predictive Distributions; How to Use Predictive Distributions; Computational Issues; An Example; Discussion
HYPOTHESIS TESTING AND MODEL SELECTION: Introduction; Uses of Bayes Factors; Marginal Likelihood Estimation by Importance Sampling; Marginal Likelihood Estimation Using Maximum Likelihood; Application: How Many Components in a Mixture?; Discussion; Appendix: S-PLUS Code for the Laplace-Metropolis Estimator
MODEL CHECKING AND MODEL IMPROVEMENT: Introduction; Model Checking Using Posterior Predictive Simulation; Model Improvement via Expansion; Example: Hierarchical Mixture Modelling of Reaction Times
STOCHASTIC SEARCH VARIABLE SELECTION: Introduction; A Hierarchical Bayesian Model for Variable Selection; Searching the Posterior by Gibbs Sampling; Extensions; Constructing Stock Portfolios With SSVS; Discussion
BAYESIAN MODEL COMPARISON VIA JUMP DIFFUSIONS: Introduction; Model Choice; Jump-Diffusion Sampling; Mixture Deconvolution; Object Recognition; Variable Selection; Change-Point Identification; Conclusions
ESTIMATION AND OPTIMIZATION OF FUNCTIONS: Non-Bayesian Applications of MCMC; Monte Carlo Optimization; Monte Carlo Likelihood Analysis; Normalizing-Constant Families; Missing Data; Decision Theory; Which Sampling Distribution?; Importance Sampling; Discussion
STOCHASTIC EM: METHOD AND APPLICATION: Introduction; The EM Algorithm; The Stochastic EM Algorithm; Examples
GENERALIZED LINEAR MIXED MODELS: Introduction; Generalized Linear Models (GLMs); Bayesian Estimation of GLMs; Gibbs Sampling for GLMs; Generalized Linear Mixed Models (GLMMs); Specification of Random-Effect Distributions; Hyperpriors and the Estimation of Hyperparameters; Some Examples; Discussion
HIERARCHICAL LONGITUDINAL MODELLING: Introduction; Clinical Background; Model Detail and MCMC Implementation; Results; Summary and Discussion
MEDICAL MONITORING: Introduction; Modelling Medical Monitoring; Computing Posterior Distributions; Forecasting; Model Criticism; Illustrative Application; Discussion
MCMC FOR NONLINEAR HIERARCHICAL MODELS: Introduction; Implementing MCMC; Comparison of Strategies; A Case Study from Pharmacokinetics-Pharmacodynamics; Extensions and Discussion
BAYESIAN MAPPING OF DISEASE: Introduction; Hypotheses and Notation; Maximum Likelihood Estimation of Relative Risks; Hierarchical Bayesian Model of Relative Risks; Empirical Bayes Estimation of Relative Risks; Fully Bayesian Estimation of Relative Risks; Discussion
MCMC IN IMAGE ANALYSIS: Introduction; The Relevance of MCMC to Image Analysis; Image Models at Different Levels; Methodological Innovations in MCMC Stimulated by Imaging; Discussion
MEASUREMENT ERROR: Introduction; Conditional-Independence Modelling; Illustrative Examples; Discussion
GIBBS SAMPLING METHODS IN GENETICS: Introduction; Standard Methods in Genetics; Gibbs Sampling Approaches; MCMC Maximum Likelihood; Application to a Family Study of Breast Cancer; Conclusions
MIXTURES OF DISTRIBUTIONS: INFERENCE AND ESTIMATION: Introduction; The Missing Data Structure; Gibbs Sampling Implementation; Convergence of the Algorithm; Testing for Mixtures; Infinite Mixtures and Other Extensions
AN ARCHAEOLOGICAL EXAMPLE: RADIOCARBON DATING: Introduction; Background to Radiocarbon Dating; Archaeological Problems and Questions; Illustrative Examples; Discussion
Index
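The Gibbs sampler and Metropolis-Hastings algorithm that anchor these chapters can be illustrated compactly. The following random-walk Metropolis sketch is mine, not code from the book (whose worked code is in S-PLUS/BUGS); all names are illustrative.

```python
import numpy as np

def metropolis(log_target, x0, n_iter=10_000, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2) and accept
    with probability min(1, target(x') / target(x))."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_target(x0)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        x_prop = x + step * rng.standard_normal()
        lp_prop = log_target(x_prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # MH acceptance test
            x, lp = x_prop, lp_prop
        samples[i] = x
    return samples

# Example target: a standard normal (log-density up to a constant).
draws = metropolis(lambda x: -0.5 * x**2, x0=0.0)
print(draws[1000:].mean(), draws[1000:].std())  # ~0 and ~1 after burn-in
```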

7,399 citations


Book
01 Oct 1997
TL;DR: A textbook on stochastic simulation for Bayesian inference, covering random variate generation, Bayesian modelling, approximate inference, Markov chain theory, Gibbs sampling, Metropolis-Hastings algorithms, and further MCMC topics including model adequacy, model choice via MCMC over model and parameter spaces, and convergence acceleration.
Abstract: Contents:
Introduction
Stochastic Simulation: Introduction; Generation of Discrete Random Quantities; Generation of Continuous Random Quantities; Generation of Random Vectors and Matrices; Resampling Methods; Exercises
Bayesian Inference: Introduction; Bayes' Theorem; Conjugate Distributions; Hierarchical Models; Dynamic Models; Spatial Models; Model Comparison; Exercises
Approximate Methods of Inference: Introduction; Asymptotic Approximations; Approximations by Gaussian Quadrature; Monte Carlo Integration; Methods Based on Stochastic Simulation; Exercises
Markov Chains: Introduction; Definition and Transition Probabilities; Decomposition of the State Space; Stationary Distributions; Limiting Theorems; Reversible Chains; Continuous State Spaces; Simulation of a Markov Chain; Data Augmentation or Substitution Sampling; Exercises
Gibbs Sampling: Introduction; Definition and Properties; Implementation and Optimization; Convergence Diagnostics; Applications; MCMC-Based Software for Bayesian Modeling; Appendix 5.A: BUGS Code for Example 5.7; Appendix 5.B: BUGS Code for Example 5.8; Exercises
Metropolis-Hastings Algorithms: Introduction; Definition and Properties; Special Cases; Hybrid Algorithms; Applications; Exercises
Further Topics in MCMC: Introduction; Model Adequacy; Model Choice: MCMC Over Model and Parameter Spaces; Convergence Acceleration; Exercises
References; Author Index; Subject Index
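As a companion sketch to the Gibbs sampling chapter (again mine, not the book's), here is the standard toy example: alternately drawing each coordinate of a correlated bivariate normal from its closed-form full conditional.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=10_000, seed=0):
    """Gibbs sampling for (X, Y) ~ N(0, [[1, rho], [rho, 1]]).
    Full conditionals: X | Y=y ~ N(rho*y, 1 - rho^2), and symmetrically."""
    rng = np.random.default_rng(seed)
    x = y = 0.0
    s = np.sqrt(1.0 - rho**2)
    out = np.empty((n_iter, 2))
    for i in range(n_iter):
        x = rho * y + s * rng.standard_normal()  # draw X | Y
        y = rho * x + s * rng.standard_normal()  # draw Y | X
        out[i] = x, y
    return out

draws = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(draws[1000:].T))  # empirical correlation ~0.8
```

The same two-step loop generalizes to any model whose full conditionals can be sampled, which is what makes Gibbs sampling attractive for the hierarchical, dynamic, and spatial models listed above.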

1,834 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of accounting for model uncertainty in linear regression models and propose two alternative approaches: the Occam's window approach and the Markov chain Monte Carlo approach.
Abstract: We consider the problem of accounting for model uncertainty in linear regression models. Conditioning on a single selected model ignores model uncertainty, and thus leads to the underestimation of uncertainty when making inferences about quantities of interest. A Bayesian solution to this problem involves averaging over all possible models (i.e., combinations of predictors) when making inferences about quantities of interest. This approach is often not practical. In this article we offer two alternative approaches. First, we describe an ad hoc procedure, “Occam's window,” which indicates a small set of models over which a model average can be computed. Second, we describe a Markov chain Monte Carlo approach that directly approximates the exact solution. In the presence of model uncertainty, both of these model averaging procedures provide better predictive performance than any single model that might reasonably have been selected. In the extreme case where there are many candidate predictors but ...
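Neither of the paper's algorithms is reproduced here, but the flavor of model averaging under an Occam's window can be sketched with the common BIC approximation to posterior model probabilities. Everything below (function name, the BIC shortcut, the cutoff value) is an assumption for illustration, not the authors' exact procedure.

```python
import itertools
import numpy as np

def bma_bic(X, y, cutoff=20.0):
    """Approximate Bayesian model averaging over all subsets of the p
    predictors (feasible only for small p). Posterior model probabilities
    are approximated by BIC weights; an Occam's-window rule drops models
    'cutoff' times less probable than the best one."""
    n, p = X.shape
    models, bics = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Xs = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            bics.append(n * np.log(rss / n) + Xs.shape[1] * np.log(n))
            models.append(subset)
    bics = np.array(bics)
    w = np.exp(-0.5 * (bics - bics.min()))
    w = np.where(w >= w.max() / cutoff, w, 0.0)  # Occam's window
    return models, w / w.sum()

# The posterior inclusion probability of predictor j is then the summed
# weight of all retained models that contain j.
```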

1,804 citations


Proceedings ArticleDOI
09 Jun 1997
TL;DR: The application of Bayesian regularization to the training of feedforward neural networks is described, using a Gauss-Newton approximation to the Hessian matrix to reduce the computational overhead.
Abstract: This paper describes the application of Bayesian regularization to the training of feedforward neural networks. A Gauss-Newton approximation to the Hessian matrix, which can be conveniently implemented within the framework of the Levenberg-Marquardt algorithm, is used to reduce the computational overhead. The resulting algorithm is demonstrated on a simple test problem and is then applied to three practical problems. The results demonstrate that the algorithm produces networks which have excellent generalization capabilities.
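The re-estimation step in the MacKay-style evidence framework that this kind of algorithm builds on can be sketched as follows; this is my reconstruction of the generic update (regularized objective F = beta*E_D + alpha*E_W, Gauss-Newton Hessian, effective number of parameters gamma), not code from the paper.

```python
import numpy as np

def update_hyperparams(J, e, w, alpha, beta):
    """One evidence-framework re-estimation step.
    J: Jacobian of the residuals w.r.t. the N network weights,
    e: residuals, w: weights. Objective F = beta*E_D + alpha*E_W,
    with E_D = sum(e^2) and E_W = sum(w^2)."""
    N = w.size
    E_D = float(e @ e)
    E_W = float(w @ w)
    # Gauss-Newton approximation to the Hessian of F.
    H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(N)
    # Effective number of well-determined parameters.
    gamma = N - 2.0 * alpha * np.trace(np.linalg.inv(H))
    alpha_new = gamma / (2.0 * E_W)
    beta_new = (e.size - gamma) / (2.0 * E_D)
    return alpha_new, beta_new, gamma
```

In practice this step is interleaved with Levenberg-Marquardt weight updates until alpha, beta, and the weights stabilize.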

1,338 citations


Book
01 Jun 1997
TL;DR: The Law of Likelihood is developed as the first principle of statistical evidence: likelihood ratios measure the strength of evidence for one hypothesis over another. This paradigm is contrasted with Neyman-Pearson theory and Fisherian significance testing, and is used to resolve long-standing puzzles about p-values, peeking at data, and testing multiple hypotheses.
Abstract: Contents:
The First Principle: Introduction; The Law of Likelihood; Three Questions; Towards Verification; Relativity of Evidence; Strength of Evidence; Counterexamples; Testing Simple Hypotheses; Composite Hypotheses; Another Counterexample; Irrelevance of the Sample Space; The Likelihood Principle; Evidence and Uncertainty; Summary; Exercises
Neyman-Pearson Theory: Introduction; Neyman-Pearson Statistical Theory; Evidential Interpretation of Results of Neyman-Pearson Decision Procedures; Neyman-Pearson Hypothesis Testing in Planning Experiments: Choosing the Sample Size; Summary; Exercises
Fisherian Theory: Introduction; A Method for Measuring Statistical Evidence: The Test of Significance; The Rationale for Significance Tests; Troubles with p-Values; Rejection Trials; A Sample of Interpretations; The Illogic of Rejection Trials; Confidence Sets from Rejection Trials; Alternative Hypothesis in Science; Summary
Paradigms for Statistics: Introduction; Three Paradigms; An Alternative Paradigm; Probabilities of Weak and Misleading Evidence: Normal Distribution Mean; Understanding the Likelihood Paradigm; Evidence about a Probability: Planning a Clinical Trial and Interpreting the Results; Summary; Exercises
Resolving the Old Paradoxes: Introduction; Why Is Power of Only 0.80 OK?; Peeking at Data; Repeated Tests; Testing More than One Hypothesis; What's Wrong with One-Sided Tests?; Must the Significance Level Be Predetermined?; Is the Strength of Evidence Limited by the Researcher's Expectations?; Summary
Looking at Likelihoods: Introduction; Evidence about Hazard Rates in Two Factories; Evidence about an Odds Ratio; A Standardized Mortality Rate; Evidence about a Finite Population Total; Determinants of Plans to Attend College; Evidence about the Probabilities in a 2x2x2x2 Table; Evidence from a Community Intervention Study of Hypertension; Effects of Sugars on Growth of Pea Sections: Analysis of Variance; Summary; Exercises
Nuisance Parameters: Introduction; Orthogonal Parameters; Marginal Likelihoods; Conditional Likelihoods; Estimated Likelihoods; Profile Likelihoods; Synthetic Conditional Likelihoods; Summary; Exercises
Bayesian Statistical Inference: Introduction; Bayesian Statistical Models; Subjectivity in Bayesian Models; The Trouble with Bayesian Statistics; Are Likelihood Methods Bayesian?; Objective Bayesian Inference; Bayesian Integrated Likelihoods; Summary
Appendix: The Paradox of the Ravens
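As a worked instance of the Law of Likelihood (my example; Royall's commonly cited benchmarks treat likelihood ratios of about 8 and 32 as "fairly strong" and "strong" evidence):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability mass function."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Observing 9 successes in 10 trials: evidence for p = 0.8 over p = 0.5?
lr = binom_pmf(9, 10, 0.8) / binom_pmf(9, 10, 0.5)
print(round(lr, 1))  # ~27.5: between the benchmarks of 8 and 32
```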

880 citations


Journal ArticleDOI
TL;DR: In a Bayesian mixture-of-normals model, standard reference priors yield improper posteriors, the posterior for the number of components is not well defined under such priors, and posterior simulation does not directly estimate that posterior; practical methods for coping with these problems are presented.
Abstract: Mixtures of normals provide a flexible model for estimating densities in a Bayesian framework. There are some difficulties with this model, however. First, standard reference priors yield improper posteriors. Second, the posterior for the number of components in the mixture is not well defined (if the reference prior is used). Third, posterior simulation does not provide a direct estimate of the posterior for the number of components. We present some practical methods for coping with these problems. Finally, we give some results on the consistency of the method when the maximum number of components is allowed to grow with the sample size.
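A generic data-augmentation Gibbs sampler for a finite mixture, sketched here under convenient conjugate assumptions (unit component variances, normal priors on the means, Dirichlet prior on the weights), gives a sense of the posterior simulation the paper discusses; it is an illustration of the standard algorithm, not the authors' proposed remedies.

```python
import numpy as np

def gibbs_mixture(y, K=2, n_iter=2000, seed=0):
    """Gibbs sampler for a K-component normal mixture with unit variances,
    N(0, 10^2) priors on the means, and a uniform Dirichlet prior on weights."""
    rng = np.random.default_rng(seed)
    mu = rng.standard_normal(K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # 1. Sample component labels z_i from their full conditional.
        logp = np.log(pi) - 0.5 * (y[:, None] - mu[None, :]) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = np.array([rng.choice(K, p=row) for row in p])
        # 2. Sample means from their conjugate normal full conditionals.
        for k in range(K):
            prec = np.sum(z == k) + 1.0 / 100.0   # data + prior precision
            mu[k] = y[z == k].sum() / prec + rng.standard_normal() / np.sqrt(prec)
        # 3. Sample weights from a Dirichlet full conditional.
        pi = rng.dirichlet(1.0 + np.bincount(z, minlength=K))
    return mu, pi
```

Note that this sketch deliberately ignores the label-switching and improper-prior issues that the paper is precisely about.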

545 citations


Journal ArticleDOI
TL;DR: In this article, Bayesian inferential methods for causal estimands in the presence of noncompliance are presented, where the binary treatment assignment is random and hence ignorable, but the treatment received is not ignorable.
Abstract: For most of this century, randomization has been a cornerstone of scientific experimentation, especially when dealing with humans as experimental units. In practice, however, noncompliance is relatively common with human subjects, complicating traditional theories of inference that require adherence to the random treatment assignment. In this paper we present Bayesian inferential methods for causal estimands in the presence of noncompliance, when the binary treatment assignment is random and hence ignorable, but the binary treatment received is not ignorable. We assume that both the treatment assigned and the treatment received are observed. We describe posterior estimation using EM and data augmentation algorithms. Also, we investigate the role of two assumptions often made in econometric instrumental variables analyses, the exclusion restriction and the monotonicity assumption, without which the likelihood functions generally have substantial regions of maxima. We apply our procedures to real and artificial data, thereby demonstrating the technology and showing that our new methods can yield valid inferences that differ in practically important ways from those based on previous methods for analysis in the presence of noncompliance, including intention-to-treat analyses and analyses based on econometric instrumental variables techniques. Finally, we perform a simulation to investigate the operating characteristics of the competing procedures in a simple setting, which indicates relatively dramatic improvements in frequency operating characteristics attainable using our Bayesian procedures.
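For orientation, the simplest frequentist analogues of the estimands discussed can be computed in a few lines (my sketch; the paper's Bayesian EM and data-augmentation machinery is far richer): the intention-to-treat contrast, and the instrumental-variables (Wald) estimate that rescales it by the compliance rate.

```python
import numpy as np

def itt_and_iv(z, d, y):
    """z: randomized assignment (0/1), d: treatment received (0/1), y: outcome.
    ITT = E[Y|Z=1] - E[Y|Z=0];
    IV (Wald) estimate = ITT / (E[D|Z=1] - E[D|Z=0])."""
    z, d, y = map(np.asarray, (z, d, y))
    itt = y[z == 1].mean() - y[z == 0].mean()
    compliance = d[z == 1].mean() - d[z == 0].mean()
    return itt, itt / compliance
```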

542 citations


Journal ArticleDOI
TL;DR: It is argued that for many common machine learning problems, although in general the authors do not know the true (objective) prior for the problem, they do have some idea of a set of possible priors to which the true prior belongs.
Abstract: A Bayesian model of learning to learn by sampling from multiple tasks is presented. The multiple tasks are themselves generated by sampling from a distribution over an environment of related tasks. Such an environment is shown to be naturally modelled within a Bayesian context by the concept of an objective prior distribution. It is argued that for many common machine learning problems, although in general we do not know the true (objective) prior for the problem, we do have some idea of a set of possible priors to which the true prior belongs. It is shown that under these circumstances a learner can use Bayesian inference to learn the true prior by learning sufficiently many tasks from the environment. In addition, bounds are given on the amount of information required to learn a task when it is simultaneously learnt with several other tasks. The bounds show that if the learner has little knowledge of the true prior, but the dimensionality of the true prior is small, then sampling multiple tasks is highly advantageous. The theory is applied to the problem of learning a common feature set or equivalently a low-dimensional-representation (LDR) for an environment of related tasks.

496 citations


Dissertation
01 Jan 1997
TL;DR: It is shown that a Bayesian approach to learning in multi-layer perceptron neural networks achieves better performance than the commonly used early stopping procedure, even for reasonably short amounts of computation time.
Abstract: This thesis develops two Bayesian learning methods relying on Gaussian processes and a rigorous statistical approach for evaluating such methods. In these experimental designs the sources of uncertainty in the estimated generalisation performances due to both variation in training and test sets are accounted for. The framework allows for estimation of generalisation performance as well as statistical tests of significance for pairwise comparisons. Two experimental designs are recommended and supported by the DELVE software environment. Two new non-parametric Bayesian learning methods relying on Gaussian process priors over functions are developed. These priors are controlled by hyperparameters which set the characteristic length scale for each input dimension. In the simplest method, these parameters are fit from the data using optimization. In the second, fully Bayesian method, a Markov chain Monte Carlo technique is used to integrate over the hyperparameters. One advantage of these Gaussian process methods is that the priors and hyperparameters of the trained models are easy to interpret. The Gaussian process methods are benchmarked against several other methods, on regression tasks using both real data and data generated from realistic simulations. The experiments show that small datasets are unsuitable for benchmarking purposes because the uncertainties in performance measurements are large. A second set of experiments provide strong evidence that the bagging procedure is advantageous for the Multivariate Adaptive Regression Splines (MARS) method. The simulated datasets have controlled characteristics which make them useful for understanding the relationship between properties of the dataset and the performance of different methods. The dependency of the performance on available computation time is also investigated. It is shown that a Bayesian approach to learning in multi-layer perceptron neural networks achieves better performance than the commonly used early stopping procedure, even for reasonably short amounts of computation time. The Gaussian process methods are shown to consistently outperform the more conventional methods.
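A minimal version of the Gaussian-process regression at the heart of the thesis, with one length scale per input dimension as described (a generic sketch; the names and defaults are mine):

```python
import numpy as np

def gp_predict(X, y, X_star, lengthscales, signal_var=1.0, noise_var=0.1):
    """GP regression with an ARD squared-exponential kernel:
    k(x, x') = signal_var * exp(-0.5 * sum_d ((x_d - x'_d) / l_d)^2).
    X: (n, d), X_star: (m, d), lengthscales: length-d array."""
    def kernel(A, B):
        D = (A[:, None, :] - B[None, :, :]) / lengthscales
        return signal_var * np.exp(-0.5 * np.sum(D**2, axis=-1))
    K = kernel(X, X) + noise_var * np.eye(len(X))
    K_s = kernel(X_star, X)
    mean = K_s @ np.linalg.solve(K, y)
    cov = kernel(X_star, X_star) - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)
```

Fitting the length scales by optimizing the marginal likelihood, or integrating over them by MCMC, corresponds to the thesis's two methods.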

467 citations


Journal ArticleDOI
TL;DR: This paper introduces a Bayesian model selection approach that formalizes Occam's razor, choosing the simplest model that describes the data well, and can be applied to the comparison of non-nested models as well as nested ones.
Abstract: In mathematical modeling of cognition, it is important to have well-justified criteria for choosing among differing explanations (i.e., models) of observed data. This paper introduces a Bayesian model selection approach that formalizes Occam’s razor, choosing the simplest model that describes the data well. The choice of a model is carried out by taking into account not only the traditional model selection criteria (i.e., a model’s fit to the data and the number of parameters) but also the extension of the parameter space, and, most importantly, the functional form of the model (i.e., the way in which the parameters are combined in the model’s equation). An advantage of the approach is that it can be applied to the comparison of non-nested models as well as nested ones. Application examples are presented and implications of the results for evaluating models of cognition are discussed.
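Such Bayesian model selection turns on the marginal likelihood of each model. One standard way to approximate it is the Laplace approximation (a generic sketch, not necessarily the authors' computation; the function name is mine):

```python
import numpy as np

def log_marginal_laplace(neg_log_post, theta_hat, hessian):
    """Laplace approximation to the log marginal likelihood:
    log p(D|M) ~= -neg_log_post(theta_hat) + (k/2) log(2*pi)
                  - 0.5 log det(H),
    where neg_log_post is -log[p(D|theta) p(theta)], theta_hat is its
    minimizer (the posterior mode), and H its Hessian at the mode."""
    k = theta_hat.size
    _, logdet = np.linalg.slogdet(hessian)
    return -neg_log_post(theta_hat) + 0.5 * k * np.log(2 * np.pi) - 0.5 * logdet
```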

387 citations


Journal ArticleDOI
TL;DR: The proposed dynamic sampling algorithms use posterior samples from previous updating stages and exploit conditional independence between groups of parameters to allow samples of parameters no longer of interest to be discarded, such as when a patient dies or is discharged.
Abstract: In dynamic statistical modeling situations, observations arise sequentially, causing the model to expand by progressive incorporation of new data items and new unknown parameters. For example, in clinical monitoring, patients and data arrive sequentially, and new patient-specific parameters are introduced with each new patient. Markov chain Monte Carlo (MCMC) might be used for continuous updating of the evolving posterior distribution, but would need to be restarted from scratch at each expansion stage. Thus MCMC methods are often too slow for real-time inference in dynamic contexts. By combining MCMC with importance resampling, we show how real-time sequential updating of posterior distributions can be effected. The proposed dynamic sampling algorithms use posterior samples from previous updating stages and exploit conditional independence between groups of parameters to allow samples of parameters no longer of interest to be discarded, such as when a patient dies or is discharged. We apply the ...
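The core move, reweighting and resampling existing posterior draws as each data item arrives, can be sketched as follows (my schematic of the general importance-resampling idea, with a hypothetical log-likelihood callback):

```python
import numpy as np

def sequential_update(samples, log_lik_new, seed=0):
    """samples: draws from the posterior given data so far (e.g., from MCMC).
    log_lik_new(theta): log-likelihood of the newly arrived data item.
    Returns an equally weighted resample approximating the updated posterior."""
    rng = np.random.default_rng(seed)
    logw = np.array([log_lik_new(s) for s in samples])
    w = np.exp(logw - logw.max())
    w /= w.sum()
    idx = rng.choice(len(samples), size=len(samples), p=w)  # importance resampling
    return [samples[i] for i in idx]
```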

Journal ArticleDOI
TL;DR: For the Cardiovascular Health Study, Bayesian model averaging predictively outperforms standard model selection and does a better job of assessing who is at high risk for a stroke.
Abstract: In the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for strokes, we apply Bayesian model averaging to the selection of variables in Cox proportional hazard models. We use an extension of the leaps-and-bounds algorithm for locating the models that are to be averaged over and make available S-PLUS software to implement the methods. Bayesian model averaging provides a posterior probability that each variable belongs in the model, a more directly interpretable measure of variable importance than a P-value. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable and do not account for model uncertainty. We introduce the partial predictive score to evaluate predictive performance. For the Cardiovascular Health Study, Bayesian model averaging predictively outperforms standard model selection and does a better job of assessing who is at high risk for a stroke.

Journal ArticleDOI
TL;DR: This work proposes a new approach to cluster analysis which consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors from the output using the Laplace–Metropolis estimator, which works well in several real and simulated examples.
Abstract: A new approach to cluster analysis has been introduced based on parsimonious geometric modelling of the within-group covariance matrices in a mixture of multivariate normal distributions, using hierarchical agglomeration and iterative relocation. It works well and is widely used via the MCLUST software available in S-PLUS and StatLib. However, it has several limitations: there is no assessment of the uncertainty about the classification, the partition can be suboptimal, parameter estimates are biased, the shape matrix has to be specified by the user, prior group probabilities are assumed to be equal, the method for choosing the number of groups is based on a crude approximation, and no formal way of choosing between the various possible models is included. Here, we propose a new approach which overcomes all these difficulties. It consists of exact Bayesian inference via Gibbs sampling, and the calculation of Bayes factors (for choosing the model and the number of groups) from the output using the Laplace–Metropolis estimator. It works well in several real and simulated examples.

Journal ArticleDOI
TL;DR: Posterior means computed using Gibbs sampling and data augmentation (GIBBS) are compared with simulated maximum likelihood (SML) and method of simulated moments (MSM) estimation, both implemented with the GHK probability simulator; the results suggest that much larger simulation sizes are needed when serial correlation in the disturbances is strong, especially for SML.

Journal ArticleDOI
TL;DR: In this article, a Bayesian analysis of the stochastic frontier model with composed error is presented, and the existence of the posterior distribution and posterior moments is examined under a commonly used class of (partly) noninformative prior distributions.

Journal ArticleDOI
TL;DR: It is proved that the bounded-variance algorithm is the first algorithm with provably fast inference approximation on all belief networks without extreme conditional probabilities, and it is shown that this algorithm approximates inference probabilities in worst-case time that is subexponential, 2^((log n)^d), for some integer d that is a linear function of the depth of the belief network.

Journal ArticleDOI
TL;DR: A framework of quasi-Bayes (QB) learning of the parameters of the continuous density hidden Markov model (CDHMM) with Gaussian mixture state observation densities is presented, with a simple forgetting mechanism to adjust the contribution of previously observed sample utterances.
Abstract: We present a framework of quasi-Bayes (QB) learning of the parameters of the continuous density hidden Markov model (CDHMM) with Gaussian mixture state observation densities. The QB formulation is based on the theory of recursive Bayesian inference. The QB algorithm is designed to incrementally update the hyperparameters of the approximate posterior distribution and the CDHMM parameters simultaneously. By further introducing a simple forgetting mechanism to adjust the contribution of previously observed sample utterances, the algorithm is adaptive in nature and capable of performing an online adaptive learning using only the current sample utterance. It can, thus, be used to cope with the time-varying nature of some acoustic and environmental variabilities, including mismatches caused by changing speakers, channels, and transducers. As an example, the QB learning framework is applied to on-line speaker adaptation and its viability is confirmed in a series of comparative experiments using a 26-letter English alphabet vocabulary.
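The recursive character of QB learning can be conveyed with a toy analogue (mine, and far simpler than the paper's CDHMM hyperparameter recursion): conjugate updating of a mean with an exponential forgetting factor.

```python
def quasi_bayes_step(mean, count, x, forget=0.98, obs_weight=1.0):
    """One recursive update of normal-mean hyperparameters (mean, count).
    'forget' < 1 discounts earlier observations so the estimate can track
    time-varying conditions, as in online speaker adaptation."""
    count = forget * count               # down-weight past evidence
    count_new = count + obs_weight
    mean_new = (count * mean + obs_weight * x) / count_new
    return mean_new, count_new
```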

Journal ArticleDOI
TL;DR: This paper documents and discusses experience with two recent network model approaches, influence diagrams and belief networks, and relates those approaches to decision trees.
Abstract: During the last two decades, much of the theoretical and practical advance in Bayesian decision analysis has been closely linked to the adaptation of emerging computational (usually Artificial Intelligence) techniques and to progress in computer software. This paper documents and discusses experience with two recent network model approaches, influence diagrams and belief networks, and relates those approaches to decision trees. Both allow probabilistic, Bayesian studies with classical decision-analytic concepts such as risk attitude analysis, value of information and control, multi-attribute analysis, and various structural analyses. The theory of influence diagrams dates back to the early 1980s, and a variety of commercial software packages are on the market. The belief network is a more recent concept that is still finding its way into applications. Illustrations from environmental and resource management are provided, with examples from freshwater and fisheries studies.

Proceedings Article
27 Jul 1997
TL;DR: This paper proposes a stochastic version of a general purpose functional programming language that contains random choices, conditional statements, structured values, defined functions, and recursion, and provides an exact algorithm for computing conditional probabilities of the form Pr(P(x) | Q(x)) where x is chosen randomly from this distribution.
Abstract: In this paper, we propose a stochastic version of a general purpose functional programming language as a method of modeling stochastic processes. The language contains random choices, conditional statements, structured values, defined functions, and recursion. By imagining an experiment in which the program is "run" and the random choices made by sampling, we can interpret a program in this language as encoding a probability distribution over a (potentially infinite) set of objects. We provide an exact algorithm for computing conditional probabilities of the form Pr(P(x) | Q(x)) where x is chosen randomly from this distribution. This algorithm terminates precisely when sampling x and computing P(x) and Q(x) terminates in all possible stochastic executions (under lazy evaluation semantics, in which only values needed to compute the output of the program are evaluated). We demonstrate the applicability of the language and the efficiency of the inference algorithm by encoding both Bayesian networks and stochastic context-free grammars in our language, and showing that our algorithm derives efficient inference algorithms for both. Our language easily supports interesting and useful extensions to these formalisms (e.g., recursive Bayesian networks), to which our inference algorithm will automatically apply.
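The semantics can be mimicked, inefficiently, by plain sampling: run the program many times, keep the runs where Q(x) holds, and average P(x) over them. The paper's contribution is an exact algorithm that avoids this; the sketch below is only that sampling baseline (names hypothetical).

```python
import random

def conditional_prob(program, P, Q, n=100_000, seed=0):
    """Estimate Pr(P(x) | Q(x)) where x = program() makes random choices.
    A sampling stand-in for the paper's exact algorithm."""
    random.seed(seed)
    hits = kept = 0
    for _ in range(n):
        x = program()            # one stochastic execution
        if Q(x):
            kept += 1
            hits += P(x)
    return hits / kept

# Example: two fair coins; Pr(both heads | at least one head) = 1/3.
prog = lambda: (random.random() < 0.5, random.random() < 0.5)
print(conditional_prob(prog, lambda x: x[0] and x[1], lambda x: x[0] or x[1]))
```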

Proceedings Article
14 Aug 1997
TL;DR: This paper empirically test two alternative explanations for why bagging works: it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior, and it effectively shifts the prior to a more appropriate region of model space.
Abstract: The error rate of decision-tree and other classification learners can often be much reduced by bagging: learning multiple models from bootstrap samples of the database, and combining them by uniform voting. In this paper we empirically test two alternative explanations for this, both based on Bayesian learning theory: (1) bagging works because it is an approximation to the optimal procedure of Bayesian model averaging, with an appropriate implicit prior; (2) bagging works because it effectively shifts the prior to a more appropriate region of model space. All the experimental evidence contradicts the first hypothesis, and confirms the second.
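Bagging itself is only a few lines; a generic sketch with assumed interfaces (not the paper's experimental code):

```python
import numpy as np

def bag(fit, X, y, n_models=50, seed=0):
    """Bagging: train `fit(X, y) -> predict_fn` on bootstrap resamples of
    the data, then combine the models by uniform voting (labels in {0, 1}).
    `fit` is any base learner, e.g. a decision-stump or tree trainer."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)       # bootstrap sample, with replacement
        models.append(fit(X[idx], y[idx]))
    def predict(X_new):
        votes = np.stack([m(X_new) for m in models])   # (n_models, n_points)
        return (votes.mean(axis=0) > 0.5).astype(int)  # majority vote
    return predict
```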

Journal ArticleDOI
01 Jun 1997-Test
TL;DR: The Fractional Bayes Factor and various forms of Intrinsic Bayes Factor are related methods that have been proposed for Bayesian model comparison when prior information about model parameters is weak; this paper identifies and contrasts various properties of these methods, with particular reference to coherence and practicality.
Abstract: The Fractional Bayes Factor and various forms of Intrinsic Bayes Factor are related methods which have been proposed for Bayesian model comparison when prior information about model parameters is weak. This paper identifies and contrasts various properties of these methods, with particular reference to coherence and practicality.

Journal ArticleDOI
TL;DR: The hierarchical Bayesian approach used incorporates expert knowledge about accident sites as a group believed a priori to be exchangeable, a Poisson likelihood, and a conjugate gamma prior; three natural strategies are proposed for ranking and selecting the most hazardous subgroup of accident locations.
Abstract: Identifying, ranking, and selecting hazardous traffic accident locations from a group under consideration is a fundamental goal for traffic safety researchers. Few methods can quantitatively, accurately, and easily discriminate between sites that commonly have small and variable observation count periods. One method that embodies all these advantages is the hierarchical Bayesian model proposed in this paper. The particular hierarchical Bayesian approach that we use incorporates expert knowledge about accident sites as a group believed a priori to be exchangeable, the Poisson assumption, and a conjugate gamma prior. We then propose three natural strategies for ranking and selecting the most hazardous subgroup of accident locations. Also presented is an especially useful procedure that gives the probability that each particular site is worst and by how much. All proposed strategies are illustrated using previously published fatality accident data from 35 sites in Auckland, New Zealand.
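Under the Poisson-gamma model described, site i's posterior accident rate is Gamma(a + y_i, b + t_i) (shape-rate parameterization), and the "probability each site is worst" can be estimated by Monte Carlo. A sketch of that generic conjugate computation (mine, with illustrative prior defaults):

```python
import numpy as np

def prob_worst(y, t, a=1.0, b=1.0, n_draws=100_000, seed=0):
    """y: accident counts per site, t: observation periods (exposure).
    Posterior rate for site i is Gamma(a + y_i, rate = b + t_i).
    Returns the posterior probability that each site has the highest rate."""
    rng = np.random.default_rng(seed)
    shape = a + np.asarray(y, dtype=float)
    rate = b + np.asarray(t, dtype=float)
    draws = rng.gamma(shape, 1.0 / rate, size=(n_draws, len(shape)))
    worst = draws.argmax(axis=1)
    return np.bincount(worst, minlength=len(shape)) / n_draws
```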

Journal ArticleDOI
TL;DR: In this paper, it is shown that (a slightly strengthened version of) the MAR condition is sufficient to yield ordinary large sample results for estimators and test statistics and thus may be used for (asymptotic) frequentist inference.
Abstract: In Rubin (1976) the missing at random (MAR) and missing completely at random (MCAR) conditions are discussed. It is concluded that the MAR condition allows one to ignore the missing data mechanism when doing likelihood or Bayesian inference but also that the stronger MCAR condition is in some sense the weakest generally sufficient condition allowing (conditional) frequentist inference while ignoring the missing data mechanism. In this paper it is shown that (a slightly strengthened version of) the MAR condition is sufficient to yield ordinary large sample results for estimators and test statistics and thus may be used for (asymptotic) frequentist inference.

Book ChapterDOI
TL;DR: The normative appeal of Bayesian econometrics is the same as that of expected utility maximization and Bayesian learning, the dominant paradigms in economic theory.
Abstract: Economics is the discipline of using data to revise beliefs about economic issues. In Bayesian econometrics, the revision is conducted in accordance with the laws of probability, conditional on what has been observed. The normative appeal of Bayesian econometrics is the same as that of expected utility maximization and Bayesian learning, the dominant paradigms in economic theory. The questions that econometrics ultimately addresses are similar to those faced by economic agents in models, as well. Given the observed data, what decisions should be made? After bringing data to bear on two alternative models, how is their relative plausibility changed? Any survey of the introductory and concluding sections of papers in the academic literature should provide more examples and illustrate the process of formally or informally updating beliefs. Until quite recently, applied Bayesian econometrics was undertaken largely by those primarily concerned with contributing to the theory, and the proportion of applied work that was formally Bayesian was rather small. There are several reasons for this. First, Bayesian econometrics demands both a likelihood function and a prior distribution, whereas non-Bayesian methods do not. Second, the subjective prior distribution has to be defended, and if the reader (or worse, the editor) does not agree, then the work may be ignored. Third, most posterior moments can't be obtained anyway because the requisite integrals can't be evaluated.

Book ChapterDOI
01 Jan 1997
TL;DR: An approach to keyhole plan recognition uses a Dynamic Belief Network to represent the domain features needed to identify users' plans and goals; experimental results on a Multi-User Dungeon adventure game indicate that the approach will work in other domains with similar features.
Abstract: We present an approach to keyhole plan recognition which uses a Dynamic Belief Network to represent features of the domain that are needed to identify users' plans and goals. The structure of this network was determined from analysis of the domain. The conditional probability distributions are learned during a training phase, which dynamically builds these probabilities from observations of user behaviour. This approach allows the use of incomplete, sparse and noisy data during both training and testing. We present experimental results of the application of our system to a Multi-User Dungeon adventure game with thousands of possible actions and positions. These results show a high degree of predictive accuracy and indicate that this approach will work in other domains with similar features.

Proceedings ArticleDOI
01 Dec 1997
TL;DR: The paper summarizes some important results at the intersection of the fields of Bayesian statistics and stochastic simulation and presents a new Bayesian formulation for the problem of output analysis for a single system.
Abstract: The paper summarizes some important results at the intersection of the fields of Bayesian statistics and stochastic simulation. Two statistical analysis issues for stochastic simulation are discussed in further detail from a Bayesian perspective. First, a review of recent work in input distribution selection is presented. Then, a new Bayesian formulation for the problem of output analysis for a single system is presented. A key feature is analyzing simulation output as a random variable whose parameters are an unknown function of the simulation's inputs. The distribution of those parameters is inferred from simulation output via Bayesian response-surface methods. A brief summary of Bayesian inference and decision making is included for reference.

Proceedings ArticleDOI
12 Oct 1997
TL;DR: A new approach to text classification based on automatic feature extraction and probabilistic reasoning is developed; a Bayesian network text classifier is constructed automatically from a set of training text documents.
Abstract: We develop a new approach to text classification based on automatic feature extraction and probabilistic reasoning. The knowledge representation used to perform this task is known as a Bayesian inference network. A Bayesian network text classifier is automatically constructed from a set of training text documents. We have conducted a series of experiments, described in the paper, on two text document corpora, namely CACM and Reuters, to analyze the performance of our approach.
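As a much simpler relative of the paper's Bayesian inference networks (my sketch, not the paper's network construction), a multinomial naive Bayes classifier shows the same combine-evidence-probabilistically pattern:

```python
import numpy as np
from collections import Counter

def train_nb(docs, labels, vocab, alpha=1.0):
    """docs: list of token lists; labels: list of class labels per doc;
    vocab: set of words; alpha: Laplace smoothing.
    Returns log priors and per-class log word probabilities."""
    classes = sorted(set(labels))
    log_prior = {c: np.log(labels.count(c) / len(labels)) for c in classes}
    log_like = {}
    for c in classes:
        counts = Counter(w for d, l in zip(docs, labels) if l == c
                         for w in d if w in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        log_like[c] = {w: np.log((counts[w] + alpha) / total) for w in vocab}
    return log_prior, log_like

def classify(doc, log_prior, log_like):
    # Out-of-vocabulary words contribute nothing, uniformly across classes.
    scores = {c: log_prior[c] + sum(ll.get(w, 0.0) for w in doc)
              for c, ll in log_like.items()}
    return max(scores, key=scores.get)
```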

Journal Article
TL;DR: In this paper, a Bayesian modeling technique was used to predict probability of occurrence for 14 species of Maine land birds using spectral data from the Landsat Thematic Mapper bands 4 and 5.
Abstract: A Bayesian modeling technique was used to predict probability of occurrence for 14 species of Maine land birds. The relationships between bird species survey data and the spectral values of Landsat Thematic Mapper bands 4 and 5 as well as a derived texture measure were used to build conditional probabilities for input into Bayes' Theorem. The conditional probabilities form decision rules for reclassifying the input spectral data into probability of occurrence estimates with associated estimates of error inherent in the model prediction. This methodology removed the costly and time-consuming step of creating a habitat map before modeling species occurrence. The output resolution of the species predictions is not degraded from the original 30-m TM pixel size to the coarse resolution of the wildlife survey data. Model results can be compared to results from other habitat modeling techniques and used by natural resource managers to predict the effects of land-use changes on available habitat.
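The per-pixel calculation is Bayes' Theorem: a prior occurrence probability combined with class-conditional probabilities of the observed spectral values. A schematic with hypothetical numbers:

```python
def posterior_occurrence(prior, p_bands_given_present, p_bands_given_absent):
    """Bayes' Theorem for one pixel: P(species present | spectral values)."""
    num = prior * p_bands_given_present
    den = num + (1.0 - prior) * p_bands_given_absent
    return num / den

# Hypothetical pixel: prior 0.2; the observed TM4/TM5/texture values are
# three times as likely under "present" as under "absent".
print(posterior_occurrence(0.2, 0.3, 0.1))  # ~0.43
```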

Journal ArticleDOI
TL;DR: In this paper, the authors apply the concept of relative surprise to the development of estimation, hypothesis testing and model checking procedures, and establish links with common Bayesian inference procedures such as highest posterior density regions, modal estimates and Bayes factors.
Abstract: We consider the problem of deriving Bayesian inference procedures via the concept of relative surprise. The mathematical concept of surprise was developed by I. J. Good in a long sequence of papers. We make a modification to this development that permits the avoidance of a serious defect, namely the change-of-variable problem. We apply relative surprise to the development of estimation, hypothesis testing, and model checking procedures. Important advantages of the relative surprise approach to inference include the lack of dependence on a particular loss function and complete freedom for the statistician in the choice of prior for hypothesis testing problems. Links are established with common Bayesian inference procedures such as highest posterior density regions, modal estimates, and Bayes factors. From a practical perspective, new inference procedures arise that possess good properties.
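On one common reading of this line of work (a hedged paraphrase in my notation; consult the paper for the precise definitions), inferences are driven by how much the data change the plausibility of a parameter value:

```latex
% Relative belief / relative surprise ratio for a parameter \theta:
%   RB(\theta) = \pi(\theta \mid x) / \pi(\theta)
% Estimation: report the maximizer of RB(\theta).
% Region: C(x) = \{ \theta : RB(\theta) \ge c \}, with c chosen so that
% C(x) attains a desired posterior probability.
```

Because the Jacobians in numerator and denominator cancel, this prior-to-posterior ratio behaves sensibly under reparameterization, which is the change-of-variable defect the abstract refers to.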

Journal ArticleDOI
TL;DR: In this paper, the question of whether, and when, the Bayesian approach produces worthwhile answers is investigated conditionally, given the information provided by the experiment. An important initial insight is that posterior estimates of a nonidentifiable parameter can actually be inferior to the prior (no-data) estimate of that parameter, even as the sample size grows to infinity.
Abstract: Although classical statistical methods are inapplicable in point estimation problems involving nonidentifiable parameters, a Bayesian analysis using proper priors can produce a closed form, interpretable point estimate in such problems. The question of whether, and when, the Bayesian approach produces worthwhile answers is investigated. In contrast to the preposterior analysis of this question offered by Kadane, we examine the question conditionally, given the information provided by the experiment. An important initial insight on the matter is that posterior estimates of a nonidentifiable parameter can actually be inferior to the prior (no-data) estimate of that parameter, even as the sample size grows to infinity. In general, our goal is to characterize, within the space of prior distributions, classes of priors that lead to posterior estimates that are superior, in some reasonable sense, to one's prior estimate. This goal is shown to be feasible through a detailed examination of a particular t...