
Showing papers on "Bayesian inference published in 1996"


Book
15 May 1996
TL;DR: Approaches for Statistical Inference: The Bayes Approach, Model Criticism and Selection, and Performance of Bayes Procedures.
Abstract: Approaches for Statistical Inference. The Bayes Approach. The Empirical Bayes Approach. Performance of Bayes Procedures. Bayesian Computation. Model Criticism and Selection. Special Methods and Models. Case Studies. Appendices.

2,413 citations


01 Jan 1996
TL;DR: In this article, the authors consider Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data.
Abstract: This paper considers Bayesian counterparts of the classical tests for goodness of fit and their use in judging the fit of a single Bayesian model to the observed data. We focus on posterior predictive assessment, in a framework that also includes conditioning on auxiliary statistics. The Bayesian formulation facilitates the construction and calculation of a meaningful reference distribution not only for any (classical) statistic, but also for any parameter-dependent "statistic" or discrepancy. The latter allows us to propose the realized discrepancy assessment of model fitness, which directly measures the true discrepancy between data and the posited model, for any aspect of the model which we want to explore. The computation required for the realized discrepancy assessment is a straightforward byproduct of the posterior simulation used for the original Bayesian analysis. We illustrate with three applied examples. The first example, which serves mainly to motivate the work, illustrates the difficulty of classical tests in assessing the fitness of a Poisson model to a positron emission tomography image that is constrained to be nonnegative. The second and third examples illustrate the details of the posterior predictive approach in two problems: estimation in a model with inequality constraints on the parameters, and estimation in a mixture model. In all three examples, standard test statistics (either a χ2 or a likelihood ratio) are not pivotal: the difficulty is not just how to compute the reference distribution for the test, but that in the classical framework no such distribution exists, independent of the unknown model parameters.
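To make the posterior predictive idea concrete, the following is a minimal sketch (not code from the paper) of a posterior predictive check for a toy Poisson model: posterior draws of the rate come from a conjugate Gamma prior, replicated data are simulated for each draw, and a chi-square-style discrepancy is compared between observed and replicated data. All data values and prior settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts assumed to follow a Poisson(theta) model.
y = np.array([3, 7, 4, 6, 2, 5, 8, 4])

# Posterior draws of theta under a conjugate Gamma(a, b) prior.
a, b = 1.0, 1.0
theta_draws = rng.gamma(a + y.sum(), 1.0 / (b + len(y)), size=5000)

def discrepancy(data, theta):
    # Chi-square style discrepancy D(y, theta) = sum (y_i - theta)^2 / theta.
    return np.sum((data - theta) ** 2 / theta)

# For each posterior draw, compare the realized discrepancy D(y, theta)
# with the discrepancy of replicated data D(y_rep, theta).
exceed = 0
for theta in theta_draws:
    y_rep = rng.poisson(theta, size=len(y))
    if discrepancy(y_rep, theta) >= discrepancy(y, theta):
        exceed += 1

ppp = exceed / len(theta_draws)  # posterior predictive p-value
print(f"posterior predictive p-value: {ppp:.3f}")
```

Values near 0 or 1 would indicate that the posited model does not reproduce the chosen aspect of the data well.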

2,065 citations


Journal ArticleDOI
TL;DR: In this paper, a review of techniques for constructing non-informative priors is presented and some of the practical and philosophical issues that arise when they are used are discussed.
Abstract: Subjectivism has become the dominant philosophical foundation for Bayesian inference. Yet in practice, most Bayesian analyses are performed with so-called “noninformative” priors, that is, priors constructed by some formal rule. We review the plethora of techniques for constructing such priors and discuss some of the practical and philosophical issues that arise when they are used. We give special emphasis to Jeffreys's rules and discuss the evolution of his viewpoint about the interpretation of priors, away from unique representation of ignorance toward the notion that they should be chosen by convention. We conclude that the problems raised by the research on priors chosen by formal rules are serious and may not be dismissed lightly: When sample sizes are small (relative to the number of parameters being estimated), it is dangerous to put faith in any “default” solution; but when asymptotics take over, Jeffreys's rules and their variants remain reasonable choices. We also provide an annotated b...
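As a reminder of what such a formal rule looks like, here is the standard statement of Jeffreys's general rule together with its familiar Bernoulli special case; this is textbook material offered for orientation, not an excerpt from the article.

```latex
% Jeffreys's general rule: prior density proportional to the root of the
% determinant of the Fisher information matrix.
\pi_J(\theta) \;\propto\; \sqrt{\det I(\theta)}

% Bernoulli example: for a single success probability \theta,
% I(\theta) = 1/[\theta(1-\theta)], so
\pi_J(\theta) \;\propto\; \theta^{-1/2}(1-\theta)^{-1/2},
% i.e. the Beta(1/2, 1/2) prior.
```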

1,243 citations


Book
01 Sep 1996
TL;DR: In this article, a Bayesian decision theory for binocular stereopsis is proposed based on the notion of shape from texture, and the generic viewpoint assumption in a Bayes framework.
Abstract: 1. Introduction D. C. Knill, D. Kersten and A. Yuille 2. Pattern theory: a unifying perspective D. Mumford 3. Modal structure and reliable inference A. Jepson, W. Richards and D. C. Knill 4. Priors, preferences and categorical percepts W. Richards, A. Jepson and J. Feldman 5. Bayesian decision theory and psychophysics A. L. Yuille and H. H. Bulthoff 6. Observer theory, Bayes theory, and psychophysics B. M. Bennett, D. D. Hoffman, C. Prakash and S. N. Richman 7. Implications of a Bayesian formulation D. C. Knill, D. Kersten and P. Mamassian 8. Shape from texture: ideal observers and human psychophysics A. Blake, H. H. Bulthoff and D. Sheinberg 9. A computational theory for binocular stereopsis P. N. Belhumeur 10. The generic viewpoint assumption in a Bayesian framework W. T. Freeman 11. Experiencing and perceiving visual surfaces K. Nakayama and S. Shimojo 12. The perception of shading and reflectance E. H. Adelson and A. P. Pentland 13. Banishing the Homunculus H. Barlow.

1,044 citations


Journal ArticleDOI
TL;DR: A way to use Bayesian statistical inference and the principle of maximum entropy to analytically continue imaginary-time quantum Monte Carlo data is presented and applied to the symmetric, infinite-dimensional Anderson Hamiltonian.

925 citations


Journal ArticleDOI
TL;DR: In this article, the imprecise Dirichlet model is proposed for drawing inferences from multinomial data in cases where there is no prior information, with inferences expressed in terms of posterior upper and lower probabilities.
Abstract: A new method is proposed for making inferences from multinomial data in cases where there is no prior information. A paradigm is the problem of predicting the colour of the next marble to be drawn from a bag whose contents are (initially) completely unknown. In such problems we may be unable to formulate a sample space because we do not know what outcomes are possible. This suggests an invariance principle : inferences based on observations should not depend on the sample space in which the observations and future events of interest are represented. Objective Bayesian methods do not satisfy this principle. This paper describes a statistical model, called the imprecise Dirichlet model, for drawing coherent inferences from multinomial data. Inferences are expressed in terms of posterior upper and lower probabilities. The probabilities are initially vacuous, reflecting prior ignorance, but they become more precise as the number of observations increases. This model does satisfy the invariance principle. Two sets of data are analysed in detail. In the first example one red marble is observed in six drawings from a bag. Inferences from the imprecise Dirichlet model are compared with objective Bayesian and frequentist inferences. The second example is an analysis of data from medical trials which compared two treatments for cardiorespiratory failure in newborn babies. There are two problems : to draw conclusions about which treatment is more effective and to decide when the randomized trials should be terminated. This example shows how the imprecise Dirichlet model can be used to analyse data in the form of a contingency table.
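For a single category, the inferences described above reduce to simple closed-form bounds. The sketch below (illustrative, with the hyperparameter written as s as in Walley's formulation) reproduces the marble example from the abstract: one red marble observed in six draws.

```python
def idm_bounds(count, total, s=2.0):
    """Posterior lower/upper probabilities that the next observation falls in a
    category observed `count` times out of `total`, under the imprecise
    Dirichlet model with hyperparameter s (values s = 1 and s = 2 are the
    common choices discussed in the paper)."""
    lower = count / (total + s)
    upper = (count + s) / (total + s)
    return lower, upper

# The marble example from the abstract: one red marble in six draws.
for s in (1.0, 2.0):
    lo, hi = idm_bounds(1, 6, s)
    print(f"s={s:.0f}: P(next marble is red) lies in [{lo:.3f}, {hi:.3f}]")
```

The interval is vacuous ([0, 1]) before any data are seen and narrows as the number of observations grows, which is the behaviour described in the abstract.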

505 citations


Book
06 Aug 1996
TL;DR: The Bayesian Approach to Statistical Archaeology examines the role of Bayesian inference in the development of dating methods and its applications in archaeology.
Abstract: The Bayesian Approach to Statistical Archaeology. Outline of the Approach. Modelling in Archaeology. Quantifying Uncertainty: The Probability Concept. Statistical Modelling. Bivariate and Multivariate Distributions. Bayesian Inference. Implementation Issues. Interpretation of Radiocarbon Results. Spatial Analysis. Sourcing and Provenancing. Application to Other Dating Methods. The Way Forward. References. Index.

478 citations


Journal ArticleDOI
TL;DR: It is argued that a "Bayesian ecology" would make better use of pre-existing data; allow stronger conclusions to be drawn from large-scale experiments with few replicates; and be more relevant to environmental decision-making.
Abstract: In our statistical practice, we ecologists work comfortably within the hypothetico-deductive epistemology of Popper and the frequentist statistical methodology of Fisher. Consequently, our null hypotheses do not often take into account pre-existing data and do not require parameterization, our experiments demand large sample sizes, and we rarely use results from one experiment to predict the outcomes of future experiments. Comparative statistical statements such as "we reject the null hypothesis at the 0.05 level", which reflect the likelihood of our data given our hypothesis, are of little use in communicating our results to nonspecialists or in describing the degree of certitude we have in our conclusions. In contrast, Bayesian statistical inference requires the explicit assignment of prior probabilities, based on existing information, to the outcomes of experiments. Such an assignment forces the parameterization of null and alternative hypotheses. The results of these experiments, regardless of sample size, then can be used to compute posterior probabilities of our hypotheses given the available data. Inferential conclusions in a Bayesian mode also are more meaningful in environmental policy discussions: e.g., "our experiments indicate that there is a 95% probability that acid deposition will affect northeastern conifer forests." Based on comparisons with current statistical practice in ecology, I argue that a Bayesian ecology would (a) make better use of pre-existing data; (b) allow stronger conclusions to be drawn from large-scale experiments with few replicates; and (c) be more relevant to environmental decision-making.
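The core calculation being advocated, a posterior probability of a hypothesis given explicit priors and data, is just Bayes' theorem. A minimal two-hypothesis sketch with hypothetical numbers:

```python
def posterior_probability(prior_h1, lik_h1, lik_h0):
    """Posterior probability of hypothesis H1 after observing data, given the
    prior P(H1) and the likelihoods P(data | H1) and P(data | H0)."""
    num = prior_h1 * lik_h1
    return num / (num + (1.0 - prior_h1) * lik_h0)

# Hypothetical numbers: a 50/50 prior and data three times as likely under H1.
print(posterior_probability(0.5, 0.6, 0.2))  # -> 0.75
```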

449 citations


Journal ArticleDOI
TL;DR: A notion of causal independence is presented that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability.
Abstract: A new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as "or", "sum" or "max", on the contribution of each parent. We start with a simple algorithm VE for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms.
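A familiar instance of the causal independence the paper exploits is the noisy-OR model, in which each parent contributes to the effect independently and the contributions are combined with the "or" operator. The sketch below is a generic noisy-OR calculation with hypothetical cause strengths, not the VE algorithm itself.

```python
from functools import reduce

def noisy_or(active_cause_probs):
    """P(effect = true) when each active cause independently produces the
    effect with its own probability and contributions combine via OR."""
    p_effect_absent = reduce(lambda acc, p: acc * (1.0 - p), active_cause_probs, 1.0)
    return 1.0 - p_effect_absent

# Hypothetical: three active causes of an effect, with individual strengths.
print(noisy_or([0.6, 0.3, 0.1]))  # 1 - 0.4 * 0.7 * 0.9 = 0.748
```

Because the conditional probability factors into one small term per parent, inference never needs to build the full exponential-size table over all parents, which is the efficiency gain the paper quantifies.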

449 citations


Journal ArticleDOI
David Heckerman, J. S. Breese
01 Nov 1996
TL;DR: It is shown how the use of causal independence in a Bayesian network can greatly simplify probability assessment as well as probabilistic inference.
Abstract: A Bayesian network is a probabilistic representation for uncertain relationships, which has proven to be useful for modeling real-world problems. When there are many potential causes of a given effect, however, both probability assessment and inference using a Bayesian network can be difficult. In this paper, we describe causal independence, a collection of conditional independence assertions and functional relationships that are often appropriate to apply to the representation of the uncertain interactions between causes and effect. We show how the use of causal independence in a Bayesian network can greatly simplify probability assessment as well as probabilistic inference.

400 citations


Book
07 Feb 1996
TL;DR: Genesis Basic Distributional Results and Properties Order Statistics and Their Properties MLEs under Censoring and Truncation and Inference Linear Estimation under Censoring and Inference Reliability Estimation and Applications Inferences under Two-Sample and Multi-Sample Situations Tolerance Limits and Acceptance Sampling Plans Prediction Problems Bayesian Inference and Applications Conditional Inference and Applications Characterizations Goodness-of-Fit Tests Outliers and Some Related Inferential Issues Extensions to Estimation under Multiple-Outlier Models Selection and Ranking Procedures Record Values Related Distributions and Some Generalizations
Abstract: Genesis Basic Distributional Results and Properties Order Statistics and Their Properties MLEs under Censoring and Truncation and Inference Linear Estimation under Censoring and Inference Reliability Estimation and Applications Inferences under Two-Sample and Multi-Sample Situations Tolerance Limits and Acceptance Sampling Plans Prediction Problems Bayesian Inference and Applications Conditional Inference and Applications Characterizations Goodness-of-Fit Tests Outliers and Some Related Inferential Issues Extensions to Estimation under Multiple-Outlier Models Selection and Ranking Procedures Record Values Related Distributions and Some Generalizations Mixtures - Models and Applications Bivariate Exponential Distributions Inference for Multivariate Exponential Distributions Optimal Tests in Multivariate Exponential Distributions Accelerated Life Testing with Applications System Reliability and Associated Inference Exponential Regression with Applications Two-Stage and Multi-Stage Estimation Two-Stage and Multi-Stage Tests of Hypotheses Sequential Inference Competing Risks Theory and Identifiability Problems Applications in Survival Analysis Applications in Queueing Theory Exponential Classification and Applications Computer Simulations

Journal ArticleDOI
TL;DR: In this article, a hierarchical model is proposed for a Bayesian semiparametric analysis of randomised block experiments, in which a Dirichlet process is inserted at the middle stage for the distribution of the block effects.
Abstract: A model is proposed for a Bayesian semiparametric analysis of randomised block experiments. The model is a hierarchical model in which a Dirichlet process is inserted at the middle stage for the distribution of the block effects. This model allows an arbitrary distribution of block effects, and it results in effective estimates of treatment contrasts, block effects and the distribution of block effects. An effective computational strategy is presented for describing the posterior distribution.
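One common way to write a Bayesian semiparametric randomised-block model of this kind is sketched below; the paper's exact parameterisation may differ.

```latex
y_{ij} \;=\; \mu + \tau_i + \beta_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2),
\qquad
\beta_j \mid F \;\overset{\mathrm{iid}}{\sim}\; F,
\qquad F \sim \mathrm{DP}(M, F_0)
```

Here the Dirichlet process sits at the middle stage on the distribution F of the block effects, with conventional priors on the overall mean, the treatment effects, the error variance, and the hyperparameters M and F_0 at the top stage.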

Book ChapterDOI
01 Jan 1996
TL;DR: A general hierarchical model for time series analysis is presented and discussed, and a brief overview of generalizations of the fundamental hierarchical time series model concludes the article.
Abstract: Notions of Bayesian analysis are reviewed, with emphasis on Bayesian modeling and Bayesian calculation. A general hierarchical model for time series analysis is then presented and discussed. Both discrete time and continuous time formulations are discussed. A brief overview of generalizations of the fundamental hierarchical time series model concludes the article.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the performance of enumeration and several sampling-based techniques such as a Gibbs' sampler (GS), a parallel Gibbs' sampler (PGS) and several multiple maximum a posteriori (MAP) algorithms for a simple geophysical problem of inversion of resistivity sounding data.
Abstract: The posterior probability density function (PPD), σ(m | d_obs), of earth model m, where d_obs are the measured data, describes the solution of a geophysical inverse problem, when a Bayesian inference model is used to describe the problem. In many applications, the PPD is neither analytically tractable nor easily approximated and simple analytic expressions for the mean and variance of the PPD are not available. Since the complete description of the PPD is impossible in the highly multi-dimensional model space of many geophysical applications, several measures such as the highest posterior density regions, marginal PPD and several orders of moments are often used to describe the solutions. Calculation of such quantities requires evaluation of multidimensional integrals. A faster alternative to enumeration and blind Monte-Carlo integration is importance sampling which may be useful in several applications. Thus how to draw samples of m from the PPD becomes an important aspect of geophysical inversion such that importance sampling can be used in the evaluation of these multi-dimensional integrals. Importance sampling can be carried out most efficiently by a Gibbs' sampler (GS). We also introduce a method which we call the parallel Gibbs' sampler (PGS) based on genetic algorithms (GA) and show numerically that the results from the two samplers are nearly identical. We first investigate the performance of enumeration and several sampling based techniques such as a GS, PGS and several multiple maximum a posteriori (MAP) algorithms for a simple geophysical problem of inversion of resistivity sounding data. Several non-linear optimization methods based on simulated annealing (SA), GA and some of their variants can be devised which can be made to reach very close to the maximum of the PPD. Such MAP estimation algorithms also sample different points in the model space. By repeating these MAP inversions several times, it is possible to sample adequately the most significant portion(s) of the PPD and all these models can be used to construct the marginal PPD, mean, covariance, etc. We observe that the GS and PGS results are identical and indistinguishable from the enumeration scheme. Multiple MAP algorithms slightly underestimate the posterior variances although the correlation values obtained by all the methods agree very well. Multiple MAP estimation required 0.3% of the computational effort of enumeration and 40% of the effort of a GS or PGS for this problem. Next, we apply GS to the inversion of a marine seismic data set to quantify uncertainties in the derived model, given the prior distribution determined from several common midpoint gathers.
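As a concrete illustration of the importance-sampling idea invoked above, the toy sketch below estimates a posterior mean and variance by self-normalized importance sampling from a one-dimensional unnormalized "posterior"; in the paper the samples would instead come from a Gibbs or parallel Gibbs sampler over earth models. All densities and numbers here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def unnormalized_posterior(m):
    # Stand-in for sigma(m | d_obs), known only up to a normalizing constant.
    return np.exp(-0.5 * ((m - 2.0) / 0.5) ** 2)

# Draw proposal samples from q(m) = N(0, 2^2) and form importance weights.
proposal = rng.normal(0.0, 2.0, size=20000)
q_density = np.exp(-0.5 * (proposal / 2.0) ** 2) / (2.0 * np.sqrt(2 * np.pi))
weights = unnormalized_posterior(proposal) / q_density
weights /= weights.sum()

posterior_mean = np.sum(weights * proposal)
posterior_var = np.sum(weights * (proposal - posterior_mean) ** 2)
print(posterior_mean, posterior_var)  # should approach 2.0 and 0.25
```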

Proceedings Article
04 Aug 1996
TL;DR: Tree Augmented Naive Bayes (TAN) is singled out; it outperforms naive Bayes, yet at the same time maintains the computational simplicity and robustness which are characteristic of naive Bayes.
Abstract: Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong assumptions of independence among features, called naive Bayes, is competitive with state of the art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. In this paper we examine and evaluate approaches for inducing classifiers from data, based on recent results in the theory of learning Bayesian networks. Bayesian networks are factored representations of probability distributions that generalize the naive Bayes classifier and explicitly represent statements about independence. Among these approaches we single out a method we call Tree Augmented Naive Bayes (TAN), which outperforms naive Bayes, yet at the same time maintains the computational simplicity (no search involved) and robustness which are characteristic of naive Bayes. We experimentally tested these approaches using benchmark problems from the U. C. Irvine repository, and compared them against C4.5, naive Bayes, and wrapper-based feature selection methods.
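For orientation, classification with a fitted TAN model amounts to the factored score sketched below: the class prior times, for each attribute, a conditional probability given the class and at most one attribute parent. The sketch assumes the tree structure and conditional probability tables have already been estimated (for example from conditional mutual information, in the spirit of the Chow-Liu construction the paper builds on); the data structures `prior`, `parent`, and `cpt` are hypothetical conveniences, not the authors' API.

```python
import math

def tan_predict(instance, classes, prior, parent, cpt):
    """Classify an instance with a Tree Augmented Naive Bayes model.
    prior[c]                    : P(c)
    parent[i]                   : index of attribute i's attribute parent, or None
    cpt[(i, c, x_i, x_parent)]  : P(x_i | c, x_parent); x_parent is None for the root
    """
    scores = {}
    for c in classes:
        log_score = math.log(prior[c])
        for i, x_i in enumerate(instance):
            x_pa = instance[parent[i]] if parent[i] is not None else None
            log_score += math.log(cpt[(i, c, x_i, x_pa)])
        scores[c] = log_score
    return max(scores, key=scores.get)

# Tiny hypothetical model: two binary attributes; x0 is the root, x1's parent is x0.
classes = ["pos", "neg"]
prior = {"pos": 0.6, "neg": 0.4}
parent = {0: None, 1: 0}
cpt = {
    (0, "pos", 1, None): 0.8, (0, "pos", 0, None): 0.2,
    (0, "neg", 1, None): 0.3, (0, "neg", 0, None): 0.7,
    (1, "pos", 1, 1): 0.9, (1, "pos", 0, 1): 0.1,
    (1, "pos", 1, 0): 0.4, (1, "pos", 0, 0): 0.6,
    (1, "neg", 1, 1): 0.2, (1, "neg", 0, 1): 0.8,
    (1, "neg", 1, 0): 0.5, (1, "neg", 0, 0): 0.5,
}
print(tan_predict((1, 1), classes, prior, parent, cpt))  # -> "pos"
```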

Journal ArticleDOI
01 Jun 1996-Test
TL;DR: Scoring rules and some related measures for evaluating probabilities are reviewed, including decompositions of scoring rules, attributes of “goodness” of probabilities, comparability of scores, and the design of scoring rules for specific inferential and decision-making problems.
Abstract: In Bayesian inference and decision analysis, inferences and predictions are inherently probabilistic in nature. Scoring rules, which involve the computation of a score based on probability forecasts and what actually occurs, can be used to evaluate probabilities and to provide appropriate incentives for “good” probabilities. This paper reviews scoring rules and some related measures for evaluating probabilities, including decompositions of scoring rules and attributes of “goodness” of probabilities, comparability of scores, and the design of scoring rules for specific inferential and decision-making problems.
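A minimal example of such a rule is the quadratic (Brier) score for binary events, sketched below with hypothetical forecasts. The Brier score is strictly proper, so honest probability statements minimize its expected value, and it admits the calibration/refinement decompositions of the kind the paper reviews.

```python
def brier_score(forecasts, outcomes):
    """Mean quadratic (Brier) score for probability forecasts of a binary event;
    lower is better."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical forecasts of an event and what actually occurred (1 = it happened).
print(brier_score([0.9, 0.2, 0.7, 0.1], [1, 0, 1, 0]))  # -> 0.0375
```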

Journal ArticleDOI
TL;DR: The papers in this special section on Bayesian statistics exemplify the difficulties inherent in making convincing scientific arguments with Bayesian reasoning.
Abstract: Bayesian statistics involve substantial changes in the methods and philosophy of science. Before adopting Bayesian approaches, ecologists should consider carefully whether or not scientific understanding will be enhanced. Frequentist statistical methods, while imperfect, have made an unquestioned contribution to scientific progress and are a workhorse of day-to-day research. Bayesian statistics, by contrast, have a largely untested track record. The papers in this special section on Bayesian statistics exemplify the difficulties inherent in making convincing scientific arguments with Bayesian reasoning.

Posted Content
TL;DR: It is shown that the Gibbs sampler can be combined with a unidimensional deterministic integration rule applied to each coordinate of the posterior density to perform Bayesian inference on GARCH models.
Abstract: This paper explains how the Gibbs sampler can be used to perform Bayesian inference on GARCH models. Although the Gibbs sampler is usually based on the analytical knowledge of the full conditional posterior densities, such knowledge is not available in regression models with GARCH errors. We show that the Gibbs sampler can be combined with a unidimensional deterministic integration rule applied to each coordinate of the posterior density. The full conditional densities are evaluated and inverted numerically to obtain random draws of the joint posterior. The method is shown to be feasible and competitive compared to importance sampling and the Metropolis-Hastings algorithm. It is applied to estimate an asymmetric GARCH model for the return on a stock exchange index, and to compute predictive densities of option prices on the index.
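A toy sketch of the numerical step described above, evaluating an unnormalized univariate full conditional on a grid, integrating it with a deterministic (trapezoidal) rule, and inverting the resulting CDF to obtain a draw, is given below; this is essentially the "griddy Gibbs" idea. The target here is a normal density purely for illustration; in the paper the conditionals are those of GARCH parameters, for which no closed form is available.

```python
import numpy as np

rng = np.random.default_rng(2)

def griddy_draw(log_conditional, grid):
    """Draw from a univariate full conditional known only up to a constant:
    evaluate it on a grid, build the CDF with a trapezoidal rule, and invert
    the CDF numerically by interpolation."""
    log_vals = log_conditional(grid)
    dens = np.exp(log_vals - np.max(log_vals))
    cdf = np.concatenate(([0.0], np.cumsum((dens[1:] + dens[:-1]) / 2 * np.diff(grid))))
    cdf /= cdf[-1]
    return np.interp(rng.uniform(), cdf, grid)

# Toy conditional: a N(1, 0.3^2) density, so draws should concentrate near 1.
grid = np.linspace(-1.0, 3.0, 400)
draws = [griddy_draw(lambda x: -0.5 * ((x - 1.0) / 0.3) ** 2, grid) for _ in range(2000)]
print(np.mean(draws), np.std(draws))  # roughly 1.0 and 0.3
```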

Journal ArticleDOI
TL;DR: The authors developed general methods for Bayesian inference with noninformative reference priors in this model, based on a Markov chain sampling algorithm, and procedures for obtaining predictive odds ratios for regression models with different ranks.

Book
29 Jan 1996
TL;DR: Theorem for Non-Precise a priori Distribution and Non-Precise Data Bayesian Decisions Based on Non-Precise Information Outlook References List of Symbols Index.
Abstract: Non-Precise Data and Their Formal Description Non-Precise Data Non-Precise Numbers and Characterizing Functions Construction of Characterizing Functions Non-Precise Vectors Functions of Non-Precise Quantities and Non-Precise Functions Descriptive Statistics with Non-Precise Data Non-Precise Samples Histograms for Non-Precise Data Cumulative Sums for Non-Precise Data Empirical Distribution Function for Non-Precise Data Empirical Fractiles for Non-Precise Data Foundations for Statistical Inference with Non-Precise Data Combination of Non-Precise Observations Sample Moment for Non-Precise Observations Sequences of Non-Precise Observations Classical Statistical Inference for Non-Precise Data Point Estimators for Parameters Confidence Regions for Parameters Nonparametric Estimation Statistical Tests and Non-Precise Data Bayesian Inference for Non-Precise Data Bayes' Theorem for Non-Precise Data Bayesian Confidence Regions Based on Non-Precise Data Non-Precise Predictive Distributions Non-Precise a priori Distributions Bayes Theorem for Non-Precise a priori Distribution and Non-Precise Data Bayesian Decisions Based on Non-Precise Information Outlook References List of Symbols Index

Journal ArticleDOI
TL;DR: The role of Bayesian inference networks for updating student models in intelligent tutoring systems (ITSs) and the interplay among inferential issues, the psychology of learning in the domain, and the instructional approach upon which the ITS is based are highlighted.
Abstract: Probability-based inference in complex networks of interdependent variables is an active topic in statistical research, spurred by such diverse applications as forecasting, pedigree analysis, troubleshooting, and medical diagnosis. This paper concerns the role of Bayesian inference networks for updating student models in intelligent tutoring systems (ITSs). Basic concepts of the approach are briefly reviewed, but the emphasis is on the considerations that arise when one attempts to operationalize the abstract framework of probability-based reasoning in a practical ITS context. The discussion revolves around HYDRIVE, an ITS for learning to troubleshoot an aircraft hydraulics system. HYDRIVE supports generalized claims about aspects of student proficiency through probability-based combination of rule-based evaluations of specific actions. The paper highlights the interplay among inferential issues, the psychology of learning in the domain, and the instructional approach upon which the ITS is based.

Journal ArticleDOI
TL;DR: In this article, a unified approach to the nonhomogeneous Poisson process in software reliability models is given, which models the epochs of failures according to a general order statistics model or to a record value statistics model.
Abstract: A unified approach to the nonhomogeneous Poisson process in software reliability models is given. This approach models the epochs of failures according to a general order statistics model or to a record value statistics model. Their corresponding point processes can be related to the nonhomogeneous Poisson processes, for example, the Goel-Okumoto, the Musa-Okumoto, the Duane, and the Cox-Lewis processes. Bayesian inference for the nonhomogeneous Poisson processes is studied. The Gibbs sampling approach, sometimes with data augmentation and with the Metropolis algorithm, is used to compute the Bayes estimates of credible sets, mean time between failures, and the current system reliability. Model selection based on a predictive likelihood is studied. A numerical example with a real software failure data set is given.
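To fix notation, the Goel-Okumoto process mentioned above has mean value function m(t) = a(1 - exp(-bt)) and intensity λ(t) = ab exp(-bt). The sketch below evaluates the corresponding NHPP log-likelihood for hypothetical failure epochs; Bayesian inference as in the paper would place priors on a and b and explore this likelihood with Gibbs sampling or the Metropolis algorithm.

```python
import numpy as np

def goel_okumoto_loglik(a, b, failure_times, T):
    """Log-likelihood of an NHPP with Goel-Okumoto mean value function
    m(t) = a * (1 - exp(-b t)), given failure epochs observed up to time T."""
    t = np.asarray(failure_times)
    log_intensity = np.log(a) + np.log(b) - b * t   # log lambda(t_i)
    m_T = a * (1.0 - np.exp(-b * T))                # expected number of failures by T
    return log_intensity.sum() - m_T

# Hypothetical failure epochs (in hours) and observation window.
times = [5.0, 12.0, 30.0, 55.0, 90.0, 160.0]
print(goel_okumoto_loglik(a=8.0, b=0.01, failure_times=times, T=200.0))
```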

Journal ArticleDOI
TL;DR: A limiting representation of the Bayesian data density is obtained and shown to be the same general exponential form for a wide class of likelihoods and prior distributions.
Abstract: This paper develops an asymptotic theory of Bayesian inference for time series. A limiting representation of the Bayesian data density is obtained and shown to be of the same general exponential form for a wide class of likelihoods and prior distributions. Continuous time and discrete time cases are studied. In discrete time, an embedding theorem is given which shows how to embed the exponential density in a continuous time process. From the embedding we obtain a large sample approximation to the model of the data that corresponds to the exponential density. This has the form of discrete observations drawn from a nonlinear stochastic differential equation driven by Brownian motion. No assumptions concerning stationarity or rates of convergence are required in the asymptotics. Some implications for statistical testing are explored and we suggest tests that are based on likelihood ratios (or Bayes factors) of the exponential densities for discriminating between models.

Journal ArticleDOI
TL;DR: A method is derived that allows us to determine whether a model is distributed as the data over the entire CMD, and whether the relative numbers of points in parts of the diagram are different, which will be useful when attempting to model complex star formation histories.
Abstract: We present a new method designed to aid in the interpretation of the color-magnitude diagrams (CMDs) of resolved stars in nearby galaxies. A CMD is a two-dimensional distribution of data points with well understood Gaussian measurement errors created from two independent observations. The most rigorous way to interpret a CMD is to create a model CMD through Monte Carlo simulation using theoretical stellar evolution tracks to see what combination of initial conditions provides the best match with the observed data. In this paper we describe how best to quantitatively compare these types of model and data. A good model CMD must contain a spatial distribution of points that matches the data and also has the same relative numbers of red stars, blue stars, and any other features seen in the data. This kind of detailed information can be obtained by using the assumptions of Bayesian inference to calculate the likelihood of a model CMD being a good match to the data CMD. To illustrate the effectiveness of this approach, we have created several test scenarios using simplified data sets. We have derived a method that allows us to determine whether a model is distributed as the data over the entire CMD, and whether the relative numbers of points in parts of the diagram are different. We can also determine whether a good match can be made to part of the data, which will be useful when attempting to model complex star formation histories. Our examples show that the results are very sensitive to the size of the measurement errors in the data, and so it is only the accuracy of these errors that restricts our ability to distinguish the good from the bad models. Our method is sufficiently robust and automated that we can search through large areas of parameter space without having to inspect the models visually.


Journal ArticleDOI
TL;DR: Alternative approaches to forecasting, which avoid conditioning on a single model, include Bayesian model averaging and using a forecasting method which is not model-based but which is designed to be adaptable and robust.
Abstract: In time-series analysis, a model is rarely pre-specified but rather is typically formulated in an iterative, interactive way using the given time-series data. Unfortunately the properties of the fitted model, and the forecasts from it, are generally calculated as if the model were known in the first place. This is theoretically incorrect, as least squares theory, for example, does not apply when the same data are used to formulate and fit a model. Ignoring prior model selection leads to biases, not only in estimates of model parameters but also in the subsequent construction of prediction intervals. The latter are typically too narrow, partly because they do not allow for model uncertainty. Empirical results also suggest that more complicated models tend to give a better fit but poorer ex-ante forecasts. The reasons behind these phenomena are reviewed. When comparing different forecasting models, the BIC is preferred to the AIC for identifying a model on the basis of within-sample fit, but out-of-sample forecasting accuracy provides the real test. Alternative approaches to forecasting, which avoid conditioning on a single model, include Bayesian model averaging and using a forecasting method which is not model-based but which is designed to be adaptable and robust.
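As a sketch of the model-averaging alternative mentioned above, one common approximation (not necessarily the one intended by the author) weights each candidate model's forecast by exp(-BIC/2), normalized across models, as a stand-in for its posterior model probability.

```python
import math

def bma_forecast(forecasts, bics):
    """Combine point forecasts from competing models, weighting each by its
    approximate posterior model probability proportional to exp(-BIC/2)."""
    weights = [math.exp(-(b - min(bics)) / 2.0) for b in bics]
    total = sum(weights)
    return sum(w * f for w, f in zip(weights, forecasts)) / total

# Hypothetical: three time-series models, their one-step forecasts and BIC values.
print(bma_forecast([10.2, 9.8, 11.0], [152.4, 151.0, 158.9]))  # roughly 9.95
```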

Journal ArticleDOI
TL;DR: A vowel recognition task with multiple speakers is studied via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models, in which both the mixture coefficients and the mixture components are generalized linear models.
Abstract: Machine classification of acoustic waveforms as speech events is often difficult due to context dependencies. Here a vowel recognition task with multiple speakers is studied via the use of a class of modular and hierarchical systems referred to as mixtures-of-experts and hierarchical mixtures-of-experts models. The statistical model underlying the systems is a mixture model in which both the mixture coefficients and the mixture components are generalized linear models. A full Bayesian approach is used as a basis of inference and prediction. Computations are performed using Markov chain Monte Carlo methods. A key benefit of this approach is the ability to obtain a sample from the posterior distribution of any functional of the parameters of the given model. In this way, more information is obtained than can be provided by a point estimate. Also avoided is the need to rely on a normal approximation to the posterior as the basis of inference. This is particularly important in cases where the posteri...
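For reference, the mixture model underlying these systems can be written as below, with softmax gating coefficients and generalized-linear-model experts; this is the generic mixtures-of-experts form rather than the paper's exact notation, and the Bayesian treatment samples functionals of the gating and expert parameters from their posterior by MCMC.

```latex
P(y \mid x) \;=\; \sum_{k=1}^{K} g_k(x)\, P(y \mid x, \theta_k),
\qquad
g_k(x) \;=\; \frac{\exp(v_k^{\top} x)}{\sum_{j=1}^{K} \exp(v_j^{\top} x)}
```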

Journal ArticleDOI
TL;DR: In this article, the authors study Markov-equivalence classes of acyclic digraphs (ADGs) and show that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself Markov-equivalent simultaneously to all ADGs in the equivalence class.
Abstract: Acyclic digraphs (ADGs) are widely used to describe dependences among variables in multivariate distributions. In particular, the likelihood functions of ADG models admit convenient recursive factorizations that often allow explicit maximum likelihood estimates and that are well suited to building Bayesian networks for expert systems. There may, however, be many ADGs that determine the same dependence (= Markov) model. Thus, the family of all ADGs with a given set of vertices is naturally partitioned into Markov-equivalence classes, each class being associated with a unique statistical model. Statistical procedures, such as model selection or model averaging, that fail to take into account these equivalence classes, may incur substantial computational or other inefficiencies. Recent results have shown that each Markov-equivalence class is uniquely determined by a single chain graph, the essential graph, that is itself Markov-equivalent simultaneously to all ADGs in the equivalence class. Here we propose t...

Journal ArticleDOI
TL;DR: This article proposes a method for simultaneous variable selection and outlier identification based on the computation of posterior model probabilities, which avoids the problem that the result depends upon the order in which variable selection and outlier identification are carried out.

Book ChapterDOI
12 Jun 1996
TL;DR: This paper shows that different course representations can be merged together and realized in a granularity hierarchy, and that Bayesian inference can be used to propagate knowledge throughout the hierarchy.
Abstract: Adaptive testing is impractical in real world situations where many different learner traits need to be measured in a single test. Recent student modelling approaches have attempted to solve this problem using different course representations along with sound knowledge propagation schemes. This paper shows that these different representations can be merged together and realized in a granularity hierarchy. Bayesian inference can be used to propagate knowledge throughout the hierarchy. This provides information for selecting appropriate test items and maintains a measure of the student's knowledge level.