
Showing papers on "Bayesian probability published in 1995"


Book
01 Jan 1995
TL;DR: A comprehensive textbook covering the fundamentals of Bayesian inference and data analysis, Markov chain simulation and other computational methods, regression models, and nonlinear and nonparametric models.
Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE Probability and Inference Single-Parameter Models Introduction to Multiparameter Models Asymptotics and Connections to Non-Bayesian Approaches Hierarchical Models FUNDAMENTALS OF BAYESIAN DATA ANALYSIS Model Checking Evaluating, Comparing, and Expanding Models Modeling Accounting for Data Collection Decision Analysis ADVANCED COMPUTATION Introduction to Bayesian Computation Basics of Markov Chain Simulation Computationally Efficient Markov Chain Simulation Modal and Distributional Approximations REGRESSION MODELS Introduction to Regression Models Hierarchical Linear Models Generalized Linear Models Models for Robust Inference Models for Missing Data NONLINEAR AND NONPARAMETRIC MODELS Parametric Nonlinear Models Basis Function Models Gaussian Process Models Finite Mixture Models Dirichlet Process Models APPENDICES A: Standard Probability Distributions B: Outline of Proofs of Asymptotic Theorems C: Computation in R and Stan. Bibliographic Notes and Exercises appear at the end of each chapter.

16,079 citations


Journal ArticleDOI
TL;DR: In this article, a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data is presented, which is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence.
Abstract: We describe a Bayesian approach for learning Bayesian networks from a combination of prior knowledge and statistical data. First and foremost, we develop a methodology for assessing informative priors needed for learning. Our approach is derived from a set of assumptions made previously as well as the assumption of likelihood equivalence, which says that data should not help to discriminate network structures that represent the same assertions of conditional independence. We show that likelihood equivalence when combined with previously made assumptions implies that the user's priors for network parameters can be encoded in a single Bayesian network for the next case to be seen—a prior network—and a single measure of confidence for that network. Second, using these priors, we show how to compute the relative posterior probabilities of network structures given data. Third, we describe search methods for identifying network structures with high posterior probabilities. We describe polynomial algorithms for finding the highest-scoring network structures in the special case where every node has at most k = 1 parent. For the general case (k > 1), which is NP-hard, we review heuristic search algorithms including local search, iterative local search, and simulated annealing. Finally, we describe a methodology for evaluating Bayesian-network learning algorithms, and apply this approach to a comparison of various approaches.
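
As a rough guide to the scoring step described above, the relative posterior probability of a structure hypothesis factors into a structure prior and a marginal likelihood. The notation below is the standard one for this setting rather than a quotation from the paper.

```latex
\[
p(S^h \mid D) \;\propto\; p(S^h)\, p(D \mid S^h)
            = p(S^h) \int p(D \mid \theta_S, S^h)\, p(\theta_S \mid S^h)\, d\theta_S .
\]
```

Because the marginal likelihood decomposes into per-node terms, the k = 1 case reduces to a maximum-weight branching problem over those terms, which is solvable in polynomial time; for k > 1 the search over structures is what makes the problem NP-hard.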

4,124 citations


Journal ArticleDOI
TL;DR: This paper reviews the literature on Bayesian experimental design, for both linear and nonlinear models, and presents a unified view of the topic by putting experimental design in a decision-theoretic framework.
Abstract: This paper reviews the literature on Bayesian experimental design. A unified view of this topic is presented, based on a decision-theoretic approach. This framework casts criteria from the Bayesian literature of design as part of a single coherent approach. The decision-theoretic structure incorporates both linear and nonlinear design problems and it suggests possible new directions to the experimental design problem, motivated by the use of new utility functions. We show that, in some special cases of linear design problems, Bayesian solutions change in a sensible way when the prior distribution and the utility function are modified to allow for the specific structure of the experiment. The decision-theoretic approach also gives a mathematical justification for selecting the appropriate optimality criterion.
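
The decision-theoretic framing in the abstract can be summarised in one standard formula (notation mine, not quoted from the review): a design d is scored by the expected utility of the experiment it will produce, averaging over the data that might be observed and the posterior that would result.

```latex
\[
U(d) = \int_{\mathcal{Y}} \int_{\Theta} u(d, \theta, y)\,
       p(\theta \mid y, d)\, p(y \mid d)\, d\theta\, dy,
\qquad
d^{*} = \arg\max_{d \in \mathcal{D}} U(d).
\]
```

Particular choices of u recover familiar criteria; for example, taking u to be the Shannon information gain of the posterior leads, in linear models, to a Bayesian analogue of D-optimality.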

1,903 citations


Journal ArticleDOI
TL;DR: By analyzing several thousand solutions to Bayesian problems, the authors found that when information was presented in frequency formats, statistically naive participants derived up to 50% of all inferences by Bayesian algorithms.
Abstract: Is the mind, by design, predisposed against performing Bayesian inference? Previous research on base rate neglect suggests that the mind lacks the appropriate cognitive algorithms. However, any claim against the existence of an algorithm, Bayesian or otherwise, is impossible to evaluate unless one specifies the information format in which it is designed to operate. The authors show that Bayesian algorithms are computationally simpler in frequency formats than in the probability formats used in previous research. Frequency formats correspond to the sequential way information is acquired in natural sampling, from animal foraging to neural networks. By analyzing several thousand solutions to Bayesian problems, the authors found that when information was presented in frequency formats, statistically naive participants derived up to 50% of all inferences by Bayesian algorithms. Non-Bayesian algorithms included simple versions of Fisherian and Neyman-Pearsonian inference. Is the mind, by design, predisposed against performing Bayesian inference? The classical probabilists of the Enlightenment, including Condorcet, Poisson, and Laplace, equated probability theory with the common sense of educated people, who were known then as "hommes éclairés." Laplace (1814/1951) declared that "the theory of probability is at bottom nothing more than good sense reduced to a calculus which evaluates that which good minds know by a sort of instinct, without being able to explain how with precision" (p. 196). The available mathematical tools, in particular the theorems of Bayes and Bernoulli, were seen as descriptions of actual human judgment (Daston, 1981, 1988). However, the years of political upheaval during the French Revolution prompted Laplace, unlike earlier writers such as Condorcet, to issue repeated disclaimers that probability theory, because of the interference of passion and desire, could not account for all relevant factors in human judgment. The Enlightenment view—that the laws of probability are the laws of the mind—moderated as it was through the French Revolution, had a profound influence on 19th- and 20th-century science. This view became the starting point for seminal contributions to mathematics, as when George Boole
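
A worked instance of the format manipulation may help; the numbers below are the often-cited mammography problem from this literature and are included purely as an illustration, not as results from the paper. In the probability format Bayes' theorem must be applied explicitly, whereas in the frequency format the answer reduces to comparing two counts.

```latex
% Probability format: explicit application of Bayes' theorem.
\[
p(H \mid E) = \frac{p(E \mid H)\,p(H)}{p(E \mid H)\,p(H) + p(E \mid \neg H)\,p(\neg H)}
            = \frac{0.80 \times 0.01}{0.80 \times 0.01 + 0.096 \times 0.99}
            \approx 0.078 .
\]
% Frequency format: of 1000 women, 10 have the disease and 8 of them test
% positive; about 95 of the remaining 990 also test positive.
\[
p(H \mid E) \approx \frac{8}{8 + 95} \approx 0.078 .
\]
```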

1,873 citations


Journal ArticleDOI
TL;DR: In this article, the authors apply the Schwarz criterion to find an approximate solution to Bayesian testing problems, at least when the hypotheses are nested and the prior on ψ is Normal.
Abstract: To compute a Bayes factor for testing H₀: ψ = ψ₀ in the presence of a nuisance parameter β, priors under the null and alternative hypotheses must be chosen. As in Bayesian estimation, an important problem has been to define automatic, or "reference," methods for determining priors based only on the structure of the model. In this article we apply the heuristic device of taking the amount of information in the prior on ψ equal to the amount of information in a single observation. Then, after transforming β to be "null orthogonal" to ψ, we take the marginal priors on β to be equal under the null and alternative hypotheses. Doing so, and taking the prior on ψ to be Normal, we find that the log of the Bayes factor may be approximated by the Schwarz criterion with an error of order O_p(n^(-1/2)), rather than the usual error of order O_p(1). This result suggests the Schwarz criterion should provide sensible approximate solutions to Bayesian testing problems, at least when the hypotheses are nested.
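
In symbols, the result described above relates the log Bayes factor to the Schwarz criterion; the notation is mine, and the paper's own statement is more careful about regularity conditions.

```latex
\[
S = \log p\!\left(y \mid \hat\theta_1, H_1\right)
  - \log p\!\left(y \mid \hat\theta_0, H_0\right)
  - \tfrac{1}{2}\,(d_1 - d_0)\log n ,
\qquad
\log B_{10} = S + O_p(1) \ \text{in general},
\]
\[
\log B_{10} = S + O_p\!\left(n^{-1/2}\right)
\quad \text{under the unit-information-style prior described in the abstract.}
\]
```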

1,235 citations


Proceedings Article
27 Nov 1995
TL;DR: This paper investigates the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations.
Abstract: The Bayesian analysis of neural networks is difficult because a simple prior over weights implies a complex prior distribution over functions. In this paper we investigate the use of Gaussian process priors over functions, which permit the predictive Bayesian analysis for fixed values of hyperparameters to be carried out exactly using matrix operations. Two methods, using optimization and averaging (via Hybrid Monte Carlo) over hyperparameters have been tested on a number of challenging problems and have produced excellent results.
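
A minimal sketch of the exact predictive computation the abstract refers to, for fixed hyperparameters. The function names, the squared-exponential kernel, and the hyperparameter values are illustrative choices, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, noise_var=0.1, **kern_args):
    """Exact GP predictive mean and variance for fixed hyperparameters."""
    K = rbf_kernel(X_train, X_train, **kern_args) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_test, **kern_args)
    K_ss = rbf_kernel(X_test, X_test, **kern_args)
    L = np.linalg.cholesky(K)                      # Cholesky for numerical stability
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                           # k_*^T (K + s2 I)^{-1} y
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0) + noise_var
    return mean, var

# Toy usage: noisy observations of a sine function.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
mu, var = gp_predict(X, y, np.linspace(-3, 3, 5))
print(mu, var)
```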

1,225 citations


Journal ArticleDOI
TL;DR: Practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks are described.
Abstract: Bayesian probability theory provides a unifying framework for data modelling. In this framework the overall aims are to find models that are well-matched to the data, and to use these models to make optimal predictions. Neural network learning is interpreted as an inference of the most probable parameters for the model, given the training data. The search in model space (i.e., the space of architectures, noise models, preprocessings, regularizers and weight decay constants) can then also be treated as an inference problem, in which we infer the relative probability of alternative models, given the data. This review describes practical techniques based on Gaussian approximations for implementation of these powerful methods for controlling, comparing and using adaptive networks.
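
The two levels of inference described above are usually written as follows; this is the standard presentation of the evidence framework with a Gaussian approximation, not text quoted from the review. Weights are inferred within a model, and models are then compared by their evidence, approximated around the most probable weights w_MP.

```latex
\[
p(\mathbf{w} \mid D, H_i) = \frac{p(D \mid \mathbf{w}, H_i)\, p(\mathbf{w} \mid H_i)}{p(D \mid H_i)},
\qquad
p(H_i \mid D) \propto p(D \mid H_i)\, p(H_i),
\]
\[
p(D \mid H_i) \simeq p(D \mid \mathbf{w}_{\mathrm{MP}}, H_i)\,
                    p(\mathbf{w}_{\mathrm{MP}} \mid H_i)\,
                    (2\pi)^{k/2} \lvert \mathbf{A} \rvert^{-1/2},
\qquad
\mathbf{A} = -\nabla\nabla \log p(\mathbf{w} \mid D, H_i)\big|_{\mathbf{w}_{\mathrm{MP}}} .
\]
```

Everything in the evidence beyond the best-fit likelihood is the Occam factor that automatically penalises over-flexible models.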

927 citations


BookDOI
TL;DR: The author brings together his views on predictive, or observable, inference and its advantages over parametric inference, devoting the book mainly to predictive applications of the Bayesian approach, with topics including process control and optimization and multivariate normal prediction problems.
Abstract: The author's research has been directed towards inference involving observables rather than parameters. In this book, he brings together his views on predictive or observable inference and its advantages over parametric inference. While the book discusses a variety of approaches to prediction including those based on parametric, nonparametric, and nonstochastic statistical models, it is devoted mainly to predictive applications of the Bayesian approach. It not only substitutes predictive analyses for parametric analyses, but it also presents predictive analyses that have no real parametric analogues. It demonstrates that predictive inference can be a critical component of even strict parametric inference when dealing with interim analyses. This approach to predictive inference will be of interest to statisticians, psychologists, econometricians, and sociologists.

750 citations


Journal ArticleDOI
TL;DR: It is described how a full Bayesian analysis can deal with unresolved issues, such as the choice between fixed- and random-effects models, the choice of population distribution in a random- effects analysis, the treatment of small studies and extreme results, and incorporation of study-specific covariates.
Abstract: Current methods for meta-analysis still leave a number of unresolved issues, such as the choice between fixed- and random-effects models, the choice of population distribution in a random-effects analysis, the treatment of small studies and extreme results, and incorporation of study-specific covariates. We describe how a full Bayesian analysis can deal with these and other issues in a natural way, illustrated by a recent published example that displays a number of problems. Such analyses are now generally available using the BUGS implementation of Markov chain Monte Carlo numerical integration techniques. Appropriate proper prior distributions are derived, and sensitivity analysis to a variety of prior assumptions carried out. Current methods are briefly summarized and compared to the full Bayes analysis.
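
A generic normal-normal random-effects model of the kind such a full Bayesian meta-analysis fits; this is the textbook form, not necessarily the exact specification used in the paper's example.

```latex
\[
y_i \mid \theta_i \sim N(\theta_i, s_i^2), \qquad
\theta_i \mid \mu, \tau \sim N(\mu, \tau^2), \qquad
\mu \sim N(m_0, v_0), \qquad \tau \sim \pi(\tau)\ \text{(a proper prior)}.
\]
```

The fixed-effect model is the limiting case tau = 0, study-specific covariates x_i enter by replacing mu with mu + x_i'beta, and sensitivity analysis amounts to varying pi(tau) and the prior on mu.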

535 citations


Journal ArticleDOI
TL;DR: Standard techniques for improved generalization from neural networks include weight decay and pruning; a comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
Abstract: Standard techniques for improved generalization from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy suggests a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error and (2) those failing to achieve this sensitivity and that therefore vanish. Since the critical value is determined adaptively during training, pruning—in the sense of setting weights to exact zeros—becomes an automatic consequence of regularization alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regularizer.
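
In the usual notation for this setting (mine, not the paper's), the regularised objective with a Laplace prior and the pruning behaviour it induces look like this:

```latex
\[
M(\mathbf{w}) = \beta E_D(\mathbf{w}) + \alpha \sum_i \lvert w_i \rvert ,
\qquad
\frac{\partial M}{\partial w_i}
  = \beta \frac{\partial E_D}{\partial w_i} + \alpha\,\operatorname{sign}(w_i) = 0
\;\Longrightarrow\;
\left\lvert \frac{\partial E_D}{\partial w_i} \right\rvert = \frac{\alpha}{\beta}
\quad \text{for } w_i \neq 0 .
\]
```

Weights whose data-error sensitivity cannot reach the critical value alpha/beta settle at exactly zero, which is the automatic pruning the abstract describes; with a Gaussian prior the penalty gradient vanishes at zero, so weights are merely shrunk rather than removed.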

362 citations


Book
16 Nov 1995
TL;DR: An introductory statistics textbook built around the Bayesian approach, covering the scientific method, experimental design, probability and Bayes' rule, inference for proportions and means, nonparametric methods, and regression analysis.
Abstract: 1. Statistics and the Scientific Method 2. Displaying and Summarizing Data 3. Designing Experiments 4. Probability and Uncertainty 5. Conditional Probability and Bayes' Rule 6. Models for Proportions 7. Densities for Proportions 8. Comparing Two Proportions 9. Densities for Two Proportions 10. General Samples and Population Means 11. Densities for Means 12. Comparing Two or More Means 13. Data Transformations and Nonparametric Methods 14. Regression Analysis. Answers to Selected Exercises



Proceedings Article
27 Nov 1995
TL;DR: A Bayesian framework for inferring the parameters of a mixture of experts model based on ensemble learning by variational free energy minimisation is presented and these methods are demonstrated on artificial problems and sunspot time series prediction.
Abstract: We present a Bayesian framework for inferring the parameters of a mixture of experts model based on ensemble learning by variational free energy minimisation. The Bayesian approach avoids the over-fitting and noise level under-estimation problems of traditional maximum likelihood inference. We demonstrate these methods on artificial problems and sunspot time series prediction.


Journal ArticleDOI
TL;DR: A new, adaptive algorithm for change detection is derived where the decision thresholds vary depending on context, thus improving detection performance substantially.
Abstract: In many conventional methods for change detection, the detections are carried out by comparing a test statistic, which is computed locally for each location on the image grid, with a global threshold. These 'non-adaptive' methods for change detection suffer from the dilemma of either causing many false alarms or missing considerable parts of non-stationary areas. This contribution presents a way out of this dilemma by viewing change detection as an inverse, ill-posed problem. As such, the problem can be solved using prior knowledge about typical properties of change masks. This reasoning leads to a Bayesian formulation of change detection, where the prior knowledge is brought to bear by appropriately specified a priori probabilities. Based on this approach, a new, adaptive algorithm for change detection is derived where the decision thresholds vary depending on context, thus improving detection performance substantially. The algorithm requires only a single raster scan per picture and increases the computational load only slightly in comparison to non-adaptive techniques.

Book ChapterDOI
David Heckerman1
18 Aug 1995
TL;DR: In this article, the authors examine Bayesian methods for learning both types of networks and show that these methods often employ assumptions to facilitate the construction of priors, including the assumptions of parameter independence, parameter modularity, and likelihood equivalence.
Abstract: Whereas acausal Bayesian networks represent probabilistic independence, causal Bayesian networks represent causal relationships. In this paper, we examine Bayesian methods for learning both types of networks. Bayesian methods for learning acausal networks are fairly well developed. These methods often employ assumptions to facilitate the construction of priors, including the assumptions of parameter independence, parameter modularity, and likelihood equivalence. We show that although these assumptions also can be appropriate for learning causal networks, we need additional assumptions in order to learn causal networks. We introduce two sufficient assumptions, called mechanism independence and component independence. We show that these new assumptions, when combined with parameter independence, parameter modularity, and likelihood equivalence, allow us to apply methods for learning acausal networks to learn causal networks.

Journal ArticleDOI
TL;DR: OLAE is described as an assessment tool that collects data from students solving problems in introductory college physics, analyses that data with probabilistic methods that determine what knowledge the student is using, and flexibly presents the results of analysis.
Abstract: We describe OLAE as an assessment tool that collects data from students solving problems in introductory college physics, analyses that data with probabilistic methods that determine what knowledge the student is using, and flexibly presents the results of analysis. For each problem, OLAE automatically creates a Bayesian net that relates knowledge, represented as first-order rules, to particular actions, such as written equations. Using the resulting Bayesian network, OLAE observes a student's behavior and computes the probabilities that the student knows and uses each of the rules.

Journal ArticleDOI
15 Mar 1995-JAMA
TL;DR: This analysis suggests that the clinical superiority of tissue-type plasminogen activator over streptokinase remains uncertain, and the usefulness of the Bayesian approach is demonstrated using the results of the recent GUSTO study of various thrombolytic strategies in acute myocardial infarction.
Abstract: Standard statistical analyses of randomized clinical trials fail to provide a direct assessment of which treatment is superior or the probability of a clinically meaningful difference. A Bayesian analysis permits the calculation of the probability that a treatment is superior based on the observed data and prior beliefs. The subjectivity of prior beliefs in the Bayesian approach is not a liability, but rather explicitly allows different opinions to be formally expressed and evaluated. The usefulness of this approach is demonstrated using the results of the recent GUSTO study of various thrombolytic strategies in acute myocardial infarction. This analysis suggests that the clinical superiority of tissue-type plasminogen activator over streptokinase remains uncertain.
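
A small sketch of the style of calculation such a Bayesian reading of a trial involves, with a normal prior on the treatment effect and an approximately normal likelihood. The function and all numbers below are placeholders for illustration; they are not the GUSTO data or the priors used in the article.

```python
from math import sqrt
from statistics import NormalDist

def posterior_prob_superior(prior_mean, prior_sd, est, se, threshold=0.0):
    """Normal prior + (approximately) normal likelihood => normal posterior.

    Returns P(true effect > threshold | data), e.g. the probability that the
    absolute mortality reduction of one thrombolytic strategy over another
    exceeds a clinically meaningful threshold.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / se ** 2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * est)
    return 1.0 - NormalDist(post_mean, sqrt(post_var)).cdf(threshold)

# Placeholder numbers: a sceptical prior centred on no difference, and an
# observed 1% absolute risk reduction with a standard error of 0.4%.
print(posterior_prob_superior(0.0, 0.005, 0.010, 0.004))                  # P(effect > 0)
print(posterior_prob_superior(0.0, 0.005, 0.010, 0.004, threshold=0.01)) # P(effect > 1%)
```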

Journal ArticleDOI
TL;DR: A Bayesian analysis of the TAR model with two regimes is presented, which provides an estimate of the threshold value directly without resorting to a subjective choice from various scatterplots and avoids sophisticated analytical and numerical multiple integration.
Abstract: . The study of non-linear time series has attracted much attention in recent years. Among the models proposed, the threshold autoregressive (TAR) model and bilinear model are perhaps the most popular ones in the literature. However, the TAR model has not been widely used in practice due to the difficulty in identifying the threshold variable and in estimating the associated threshold value. The main focal point of this paper is a Bayesian analysis of the TAR model with two regimes. The desired marginal posterior densities of the threshold value and other parameters are obtained via the Gibbs sampler. This approach avoids sophisticated analytical and numerical multiple integration. It also provides an estimate of the threshold value directly without resorting to a subjective choice from various scatterplots. We illustrate the proposed methodology by using simulation experiments and analysis of a real data set.

Proceedings Article
20 Aug 1995
TL;DR: A hierarchical clustering algorithm is proposed that constructs a set of clusters having the maximum Bayesian posterior probability, i.e., the probability that the given texts are classified into clusters.
Abstract: Text classification, the grouping of texts into several clusters, has been used as a means of improving both the efficiency and the effectiveness of text retrieval/categorization. In this paper we propose a hierarchical clustering algorithm that constructs a set of clusters having the maximum Bayesian posterior probability, the probability that the given texts are classified into clusters. We call the algorithm Hierarchical Bayesian Clustering (HBC). The advantages of HBC are experimentally verified from several viewpoints: (1) HBC can reconstruct the original clusters more accurately than other non-probabilistic algorithms; (2) when a probabilistic text categorization is extended to a cluster-based one, the use of HBC offers better performance than does the use of non-probabilistic algorithms.

Journal ArticleDOI
TL;DR: In this article, the authors analyze errors-in-variables models in full generality under a Bayesian formulation, and compute the necessary posterior distributions, utilizing various computational techniques.
Abstract: SUMMARY Use of errors-in-variables models is appropriate in many practical experimental problems. However, inference based on such models is by no means straightforward. In previous analyses, simplifying assumptions have been made in order to ease this intractability, but assumptions of this nature are unfortunate and restrictive. In this paper, we analyse errors-in-variables models in full generality under a Bayesian formulation. In order to compute the necessary posterior distributions, we utilize various computational techniques. Two specific non-linear errors-in-variables regression examples are considered; the first is a re-analysed Berkson-type model, and the second is a classical errors-in-variables model. Our analyses are compared and contrasted with those presented elsewhere in the literature.

Proceedings Article
David Heckerman1, Dan Geiger1
18 Aug 1995
TL;DR: A general Bayesian scoring metric is derived, appropriate for both discrete and Gaussian domains, from well-known statistical facts about the Dirichlet and normal--Wishart distributions.
Abstract: We examine Bayesian methods for learning Bayesian networks from a combination of prior knowledge and statistical data. In particular, we unify the approaches we presented at last year's conference for discrete and Gaussian domains. We derive a general Bayesian scoring metric, appropriate for both domains. We then use this metric in combination with well-known statistical facts about the Dirichlet and normal-Wishart distributions to derive our metrics for discrete and Gaussian domains.
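
For the discrete case, the marginal likelihood that a Dirichlet-based metric assigns to a structure hypothesis takes the familiar form below, written in the usual notation rather than the paper's; the Gaussian case is handled analogously with the normal-Wishart distribution.

```latex
\[
p(D \mid S^h) = \prod_{i=1}^{n} \prod_{j=1}^{q_i}
\frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
\prod_{k=1}^{r_i}
\frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})},
\qquad
\alpha_{ij} = \sum_k \alpha_{ijk}, \quad N_{ij} = \sum_k N_{ijk},
\]
```

where N_ijk counts the cases in which variable X_i is in its k-th state while its parents are in their j-th configuration, and the alpha_ijk are Dirichlet hyperparameters derived from the prior knowledge.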

Journal ArticleDOI
TL;DR: A Gibbs sampling algorithm is implemented from which Bayesian estimates and credible intervals for survival and movement probabilities are derived, and convergence of the algorithm is proved using a duality principle.
Abstract: SUMMARY The Arnason-Schwarz model is usually used for estimating survival and movement probabilities of animal populations from capture-recapture data. The missing data structure of this capture-recapture model is exhibited and summarised via a directed graph representation. Taking advantage of this structure we implement a Gibbs sampling algorithm from which Bayesian estimates and credible intervals for survival and movement probabilities are derived. Convergence of the algorithm is proved using a duality principle. We illustrate our approach through a real example.

Journal ArticleDOI
TL;DR: A Bayesian framework is proposed to combine end-use monitoring information with aggregate-load/appliance data, allowing load researchers to derive more accurate load shapes.
Abstract: Traditional methods of estimating kilowatt end-use load profiles may face very serious multicollinearity issues. In this article, a Bayesian framework is proposed to combine end-use monitoring information with aggregate-load/appliance data to allow load researchers to derive more accurate load shapes. Two variants are suggested: the first uses the raw end-use metered data to construct the prior means and variances; the second uses actual end-use data to construct the priors of the parameters characterizing the behavior of end uses of specific appliances. From a prediction perspective, the Bayesian methods consistently outperform the predictions generated from the conventional conditional-demand formulation.

Journal ArticleDOI
01 Apr 1995
TL;DR: In this article, a stochastic simulation Bayesian method for multitarget tracking is developed, which uses a random sample in state space to represent the posterior state estimate distribution and is illustrated by simulations involving one target in dense clutter.
Abstract: A stochastic simulation Bayesian method for multitarget tracking is developed. This method uses a random sample in state space to represent the posterior state estimate distribution. The method is illustrated by simulations involving one target in dense clutter. Comparison with nearest-neighbours and probabilistic data association shows the superiority of the proposed method.
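
A minimal sketch of the random-sample (particle) representation of the posterior that the abstract describes, reduced to a single target with a scalar state and no clutter model. The motion model, measurement model and all constants below are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, z, q_sd=0.5, r_sd=1.0):
    """One predict/update/resample cycle of a bootstrap particle filter.

    particles : (N,) array of state samples representing p(x_{t-1} | z_{1:t-1})
    z         : scalar measurement at time t
    """
    # Predict: propagate each sample through a random-walk motion model.
    particles = particles + q_sd * rng.standard_normal(particles.shape)
    # Update: reweight by the Gaussian measurement likelihood p(z | x).
    weights = weights * np.exp(-0.5 * ((z - particles) / r_sd) ** 2)
    weights /= weights.sum()
    # Resample to avoid weight degeneracy.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

# Toy usage: track a slowly drifting state from noisy measurements.
N = 1000
particles = rng.normal(0.0, 2.0, N)
weights = np.full(N, 1.0 / N)
for z in [0.2, 0.5, 0.9, 1.4]:
    particles, weights = particle_filter_step(particles, weights, z)
print(particles.mean())   # posterior mean state estimate
```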

Journal ArticleDOI
TL;DR: A theoretical framework for Bayesian adaptive training of the parameters of a discrete hidden Markov model and a semi-continuous HMM with Gaussian mixture state observation densities is presented and the proposed MAP algorithms are shown to be effective especially in the cases in which the training or adaptation data are limited.
Abstract: A theoretical framework for Bayesian adaptive training of the parameters of a discrete hidden Markov model (DHMM) and of a semi-continuous HMM (SCHMM) with Gaussian mixture state observation densities is presented. In addition to formulating the forward-backward MAP (maximum a posteriori) and the segmental MAP algorithms for estimating the above HMM parameters, a computationally efficient segmental quasi-Bayes algorithm for estimating the state-specific mixture coefficients in SCHMM is developed. For estimating the parameters of the prior densities, a new empirical Bayes method based on the moment estimates is also proposed. The MAP algorithms and the prior parameter specification are directly applicable to training speaker adaptive HMMs. Practical issues related to the use of the proposed techniques for HMM-based speaker adaptation are studied. The proposed MAP algorithms are shown to be effective especially in the cases in which the training or adaptation data are limited.
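
The building block of such MAP adaptation is easiest to see for a single Gaussian mean, written here in the form usual for this literature rather than quoted from the paper: the estimate interpolates between the prior mean and the adaptation data, with the prior dominating when data are scarce.

```latex
\[
\hat{\mu}_{\mathrm{MAP}}
= \frac{\tau\,\mu_0 + \sum_{t} \gamma_t\, x_t}{\tau + \sum_{t} \gamma_t},
\]
```

where mu_0 is the prior (e.g. speaker-independent) mean, tau the prior weight, and gamma_t the posterior probability that observation x_t was generated by the state or mixture component being adapted; as the occupancy sum grows, the estimate approaches the maximum-likelihood value.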

Journal Article
Abstract: Markov chain Monte Carlo (MCMC) methods have been used extensively in statistical physics over the last 40 years, in spatial statistics for the past 20 and in Bayesian image analysis over the last decade. In the last five years, MCMC has been introduced into significance testing, general Bayesian inference and maximum likelihood estimation. This paper presents basic methodology of MCMC, emphasizing the Bayesian paradigm, conditional probability and the intimate relationship with Markov random fields in spatial statistics. Hastings algorithms are discussed, including Gibbs, Metropolis and some other variations. Pairwise difference priors are described and are used subsequently in three Bayesian applications, in each of which there is a pronounced spatial or temporal aspect to the modeling. The examples involve logistic regression in the presence of unobserved covariates and ordinal factors; the analysis of agricultural field experiments, with adjustment for fertility gradients; and processing of low-resolution medical images obtained by a gamma camera. Additional methodological issues arise in each of these applications and in the Appendices. The paper lays particular emphasis on the calculation of posterior probabilities and concurs with others in its view that MCMC facilitates a fundamental breakthrough in applied Bayesian modeling.
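
As a concrete instance of the Hastings algorithms discussed in the paper, a random-walk Metropolis sampler for a one-dimensional posterior takes only a few lines; the target below (a normal mean with a normal prior) and the tuning constants are illustrative only.

```python
import numpy as np

def metropolis(log_post, x0, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis sampling from an unnormalised log posterior."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_post(x0)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = x + step * rng.standard_normal()    # symmetric proposal
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:    # accept with prob min(1, ratio)
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Toy usage: posterior for a normal mean with a normal prior.
data = np.array([1.2, 0.7, 1.9, 1.1])
log_post = lambda mu: -0.5 * mu ** 2 / 10.0 - 0.5 * np.sum((data - mu) ** 2)
draws = metropolis(log_post, x0=0.0)
print(draws[1000:].mean())   # posterior mean estimate after burn-in
```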

Journal ArticleDOI
TL;DR: This article argued that the essence of "Bayesian rationality" is the assignment, correct manipulation, and proper updating of subjective event probabilities when evaluating and comparing uncertain prospects, regardless of whether attitudes toward risk satisfy the expected utility property.