Showing papers on "Variable-order Bayesian network published in 2002"

PDF

Open Access

Dynamic bayesian networks: representation, inference and learning

[...]

01 Jan 2002

TL;DR: This thesis will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in Dbns, and how to learn DBN models from sequential data.

...read moreread less

Abstract: Dynamic Bayesian Networks: Representation, Inference and Learning by Kevin Patrick Murphy Doctor of Philosophy in Computer Science University of California, Berkeley Professor Stuart Russell, Chair Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and bio-sequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data. In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T ) time instead of O(T ), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T ) space instead of O(T ); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.

...read moreread less

2,757 citations

Journal Article•DOI•

Learning Bayesian networks from data: an information-theory based approach

[...]

Jie Cheng¹, Russell Greiner¹, Jonathan Kelly¹, David A. Bell², Weiru Liu² - Show less +1 more•Institutions (2)

University of Alberta¹, Ulster University²

01 May 2002-Artificial Intelligence

TL;DR: Algorithms that use an information-theoretic analysis to learn Bayesian network structures from data, requiring only polynomial numbers of conditional independence tests in typical cases are provided.

...read moreread less

804 citations

Journal Article•DOI•

Potential Applications and Pitfalls of Bayesian Inference of Phylogeny

[...]

John P. Huelsenbeck¹, Bret Larget², Richard E. Miller³, Fredrik Ronquist⁴•Institutions (4)

University of California, Berkeley¹, Duquesne University², Southeastern Louisiana University³, Uppsala University⁴

01 Sep 2002-Systematic Biology

TL;DR: The Bayesian inference of phylogeny appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efficiency.

...read moreread less

Abstract: Only recently has Bayesian inference of phylogeny been proposed. The method is now a practical alternative to the other methods; indeed, the method appears to possess advantages over the other methods in terms of ability to use complex models of evolution, ease of interpretation of the results, and computational efeciency. However, the method should be used cautiously. The results of a Bayesian analysis should be examined with respect to the sensitivity of the results to the priors used and the reliability of the Markov chain Monte Carlo approximation of the probabilities of trees. (Bayesian inference; Markov chain Monte Carlo; phylogeny; posterior probability.)

...read moreread less

798 citations

Book•DOI•

Bayesian methods : a social and behavioral sciences approach

[...]

Jeff Gill

29 May 2002

TL;DR: Bayesian Decision Theory Introducing Decision Theory Basic Definitions Regression-Style Models with Decision Theory James-Stein Estimation Empirical Bayes Exercises Monte Carlo and Related Iterative Methods.

...read moreread less

Abstract: BACKGROUND AND INTRODUCTION Introduction Motivation and Justification Why Are We Uncertain about Probability? Bayes' Law Conditional Inference with Bayes' Law Historical Comments The Scientific Process in Our Social Sciences Introducing Markov Chain Monte Carlo Techniques Exercises SPECIFYING BAYESIAN MODELS Purpose Likelihood Theory and Estimation The Basic Bayesian Framework Bayesian "Learning" Comments on Prior Distributions Bayesian versus Non-Bayesian Approaches Exercises Computational Addendum: R for Basic Analysis THE NORMAL AND STUDENT'S-T MODELS Why Be Normal? The Normal Model with Variance Known The Normal Model with Mean Known The Normal Model with Both Mean and Variance Unknown Multivariate Normal Model, and S Both Unknown Simulated Effects of Differing Priors Some Normal Comments The Student's t Model Normal Mixture Models Exercises Computational Addendum: Normal Examples THE BAYESIAN LINEAR MODEL The Basic Regression Model Posterior Predictive Distribution for the Data The Bayesian Linear Regression Model with Heteroscedasticity Exercises Computational Addendum THE BAYESIAN PRIOR A Prior Discussion of Priors A Plethora of Priors Conjugate Prior Forms Uninformative Prior Distributions Informative Prior Distributions Hybrid Prior Forms Nonparametric Priors Bayesian Shrinkage Exercises ASSESSING MODEL QUALITY Motivation Basic Sensitivity Analysis Robustness Evaluation Comparing Data to the Posterior Predictive Distribution Simple Bayesian Model Averaging Concluding Comments on Model Quality Exercises Computational Addendum BAYESIAN HYPOTHESIS TESTING AND THE BAYES' FACTOR Motivation Bayesian Inference and Hypothesis Testing The Bayes' Factor as Evidence The Bayesian Information Criterion (BIC) The Deviance Information Criterion (DIC) Comparing Posteriors with the Kullback-Leibler Distance Laplace Approximation of Bayesian Posterior Densities Exercises Bayesian Decision Theory Introducing Decision Theory Basic Definitions Regression-Style Models with Decision Theory James-Stein Estimation Empirical Bayes Exercises Monte Carlo and Related Iterative Methods Background Basic Monte Carlo Integration Rejection Sampling Classical Numerical Integration Gaussian Quadrature Importance Sampling/Sampling Importance Resampling Mode Finding and the EM Algorithm Survey of Random Number Generation Concluding Remarks Exercises Computational Addendum: R Code for Importance Sampling BASICS OF MARKOV CHAIN MONTE CARLO Who Is Markov and What Is He Doing with Chains? General Properties of Markov Chains The Gibbs Sampler The Metropolis-Hastings Algorithm The Hit-and-Run Algorithm The Data Augmentation Algorithm Historical Comments Exercises Computational Addendum: Simple R Graphing Routines for MCMC Implementing Bayesian Models with Markov Chain Monte Carlo Introduction to Bayesian Software Solutions It's Only a Name: BUGS Model Specification with BUGS Differences between WinBUGS and JAGS Code Technical Background about the Algorithm Epilogue Exercises BAYESIAN HIERARCHICAL MODELS Introduction to Multilevel Models Standard Multilevel Linear Models A Poisson-Gamma Hierarchical Model The General Role of Priors and Hyperpriors Exchangeability Empirical Bayes Exercises Computational Addendum: Instructions for Running JAGS, Trade Data Model SOME MARKOV CHAIN MONTE CARLO THEORY Motivation Measure and Probability Preliminaries Specific Markov Chain Properties Defining and Reaching Convergence Rates of Convergence Implementation Concerns Exercises UTILITARIAN MARKOV CHAIN MONTE CARLO Practical Considerations and Admonitions Assessing Convergence of Markov Chains Mixing and Acceleration Producing the Marginal Likelihood Integral from Metropolis- Hastings Output Rao-Blackwellizing for Improved Variance Estimation Exercises Computational Addendum: R Code for the Death Penalty Support Model and BUGS Code for the Military Personnel Model Markov Chain Monte Carlo Extensions Simulated Annealing Reversible Jump Algorithms Perfect Sampling Exercises APPENDIX A: GENERALIZED LINEAR MODEL REVIEW Terms The Generalized Linear Model Numerical Maximum Likelihood Quasi-Likelihood Exercises R for Generalized Linear Models APPENDIX B: COMMON PROBABILITY DISTRIBUTIONS REFERENCES AUTHOR INDEX SUBJECT INDEX

...read moreread less

676 citations

Journal Article•DOI•

Classical and Bayesian inference in neuroimaging: theory.

[...]

Karl J. Friston¹, William D. Penny¹, Christophe Phillips¹, Stefan J. Kiebel¹, Geoffrey E. Hinton¹, John Ashburner¹ - Show less +2 more•Institutions (1)

University College London¹

01 Jun 2002-NeuroImage

TL;DR: The procedures used in conventional data analysis are formulated in terms of hierarchical linear models and a connection between classical inference and parametric empirical Bayes (PEB) through covariance component estimation is established through covariances component estimation.

...read moreread less

647 citations

Journal Article•DOI•

Hidden Markov Models and Disease Mapping

[...]

Peter H.R. Green, Sylvia Richardson

01 Dec 2002-Journal of the American Statistical Association

TL;DR: In this paper, a new methodology to extend hidden Markov models to the spatial domain, and use this class of models to analyze spatial heterogeneity of count data on a rare phenomenon is presented.

...read moreread less

Abstract: We present new methodology to extend hidden Markov models to the spatial domain, and use this class of models to analyze spatial heterogeneity of count data on a rare phenomenon This situation occurs commonly in many domains of application, particularly in disease mapping We assume that the counts follow a Poisson model at the lowest level of the hierarchy, and introduce a finite-mixture model for the Poisson rates at the next level The novelty lies in the model for allocation to the mixture components, which follows a spatially correlated process, the Potts model, and in treating the number of components of the spatial mixture as unknown Inference is performed in a Bayesian framework using reversible jump Markov chain Monte Carlo The model introduced can be viewed as a Bayesian semiparametric approach to specifying flexible spatial distribution in hierarchical models Performance of the model and comparison with an alternative well-known Markov random field specification for the Poisson rates are de

...read moreread less

358 citations

Journal Article•DOI•

Bayesian Statistical Modelling

[...]

Stephen J Ganocy¹•Institutions (1)

Goodyear Tire and Rubber Company¹

01 Aug 2002-Technometrics

TL;DR: The stated objectives—to offer statistical methodology for use by laymen outside the grasp of supporting principles—are achieved commendably by the authors, and the extensive tables are the result of computer-intensive optimization algorithms seeking optimal precision.

...read moreread less

Abstract: implementing these tools. Supporting developments are given in Part II. The printed tables and access to the CD-ROM are given in Part III as needed to implement the methods. Detailed case studies are developed in Part IV, illustrating the range of data analyses supported by the tables. The stated objectives—to offer statistical methodology for use by laymen outside the grasp of supporting principles—are achieved commendably by the authors. The tables, both printed and electronic, are easily accessed by the novice through a self-paced study following step-by-step examples, especially as given in Part IV. At the same time, knowledgeable users deserve some explanation as to the statistical principles on which the methodology rests. Comments on these issues constitute much of the remainder of this review. Let X be the sample space containing outcomes X4n5 D 6X11 : : : 1Xn7 of independent Bernoulli trials having parameter M p 2 601 17 with value p to be determined, and let XS D X1 C X2 C ¢ ¢ ¢ C Xn. Procedures are based in principle on either X4n5 or XS , but in practice on the latter. The authors trace developments back to Jacob Bernoulli, and draw heavily on the foundations of Jerzy Neyman for “scienti c statistics.” The authors break ranks with conventional statistics on two essential grounds: (1) the range of M p and (2) the use of asymptotics in a practice often typi ed by small to moderate samples. Regarding (1), they con dently assert that users can accurately stipulate a proper subset 6p1 N p7 of 601 17, called the measurement space, wherein M p is known to lie with certainty. Accordingly, they seek solutions to problems of inference that conform to the measurement space. Classical estimation using XS=n is faulted heavily here in giving often nonconforming values. With regard to (2), the computerintensive developments on which the tables rest are essentially exact for small samples. In rough analogy with scales for weighing mice and elephants, to each measurement space there corresponds a variety of data—analytic tools, listed as (a)–(d) in the second paragraph of this report, that are supported through the printed tables and the CD-ROM. Technical developments begin with the dual issues of ‚-measurement intervals for M p in 6p1 N p7 and ‚-prediction regions in X, with con dence level ¶ ‚ as a gauge of their reliability. These constitute the “‚-measurement & prediction space.” Other methods build on these. Point estimation focuses on ‚-estimators in 6p1 N p7 depending on the sample size as well as on the con dence level ‚. These include (a) the “minimum MSE ‚-estimator,” designed to minimize the conditional mean squared error, given the ‚-measurement & prediction space, and (b) the “midpoint ‚-estimator” as the midpoint of the ‚-measurement interval for M p in 6p1 N p7. The aim of exclusion, in lieu of hypothesis testing, is “to show that the actual value p of M p is different from any value in 6p1 N p7 6p1 N p7, where the reliability of the exclusion procedure is speci ed by the signi cance level .” Thus H0 2 M p 2 6p1 N p7 is excluded from 6p1 N p7. There appear to be no errors of the second kind, because M p always belongs to 6p1 N p7, and thus no concept of the power of an exclusion procedure to exclude. The extensive tables are the result of computer-intensive optimization algorithms seeking optimal precision for each nominal reliability level, while reducing excess reliability arising from discreteness of the problem. Procedures based on X4n5 are shown by inclusion to be superior to ones based on XS . Nonetheless, the available tables use XS owing to apparent practical constraints. In particular, typical input variables are the measurement range 6p1 N p7, the sample size n, the con dence level ‚, the realization XS , and allied information pertaining to exclusion, for example. Output in turn consists of ‚-measurement intervals and other quantities of use in assessing the data. Principles undergirding the analyses are ostensibly non-Bayesian. Nonetheless, M p does become a random variable during the course of the authors’ developments, essentially through the assignment of a Bayesian uniform prior over 6p1 N p7. Despite the careful development of these methodologies, and extensive tables for their implementation, this reviewer sees serious impediments to their effective use. These reservations focus largely on the assumption that the measurement range 6p1 N p7 can itself be stipulated accurately by users. This concern pervades every stage of the scienti c method. New experiments, unless strictly con rmatory, do chart new paths, so that past experience regarding earlier measurement spaces need not carry over without modi cation. At issue are problems with misspeci cation of the range, the consequences of such misspeci cation, and possible robustness of procedures to such misspeci cation. The authors essentially remain mute on these critical issues. For if the parameter range is cast too wide, then the authors’ objections to classical methods (based on M p 2 601175 apply verbatim to their own methods, but now with regard to nonconformity with the actual (now smaller) measurement space. Consequences of prescribing too narrow a range remain to be studied. On the other hand, if the range supported by prior user knowledge is suf ciently narrow, then all statistical procedures become moot. Early in their monograph the authors appear to subscribe to the following point of view: “ If statistics is an applied eld and not a minor branch of mathematics, then more than ninety-nine percent of the published papers are useless exercises.” Apparently, Binomial Distribution Handbook for Scientists and Engineers represents their efforts to be included in the other 1%. I must leave it to the experience of other users to judge how well this objective has been met.

...read moreread less

309 citations

Speech recognition with Dynamic Bayesian Networks

[...]

G. G. Zweig

01 Jan 2002

TL;DR: In this paper, the authors show how to encode stochastic finite-state word models as DBNs, and how to construct DBN models that explicitly model the speech-articulators, accent, gender, speaking-rate, and other important phenomena.

...read moreread less

Abstract: Dynamic Bayesian networks (DBNs) are a powerful and flexible methodology for representing and computing with probabilistic models of stochastic processes. In the past decade, there has been increasing interest in applying them to practical problems, and this thesis shows that they can be used effectively in the field of automatic speech recognition. A principle characteristic of dynamic Bayesian networks is that they can model an arbitrary set of variables as they evolve over time. Moreover, an arbitrary set of conditional independence assumptions can be specified, and this allows the joint distribution to be represented in a highly factored way. Factorization allows for models with relatively few parameters, and computational efficiency. Standardized inference and learning routines allow a variety of model structures to be tested without deriving new formulae, or writing new code. The contribution of this thesis is to show how DBNs can be used in automatic speech recognition. This involves solving problems related to both representation and inference. Representationally, the thesis shows how to encode stochastic finite-state word models as DBNs, and how to construct DBNs that explicitly model the speech-articulators, accent, gender, speaking-rate, and other important phenomena. Technically, the thesis presents inference routines that are especially tailored to the requirements of speech recognition: efficient inference with deterministic constraints, variable-length utterances, and online inference. Finally, the thesis presents experimental results that indicate that real systems can be built, and that modeling important phenomena with DBNs results in higher recognition accuracy.

...read moreread less

237 citations

Journal Article•DOI•

Bayesian Model Adequacy and Choice in Phylogenetics

[...]

Jonathan P. Bollback¹•Institutions (1)

University of Rochester¹

01 Jul 2002-Molecular Biology and Evolution

TL;DR: This article presents a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions and, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.

...read moreread less

Abstract: Bayesian inference is becoming a common statistical approach to phylogenetic estimation because, among other reasons, it allows for rapid analysis of large data sets with complex evolutionary models. Conveniently, Bayesian phylogenetic methods use currently available stochastic models of sequence evolution. However, as with other model-based approaches, the results of Bayesian inference are conditional on the assumed model of evolution: inadequate models (models that poorly fit the data) may result in erroneous inferences. In this article, I present a Bayesian phylogenetic method that evaluates the adequacy of evolutionary models using posterior predictive distributions. By evaluating a model's posterior predictive performance, an adequate model can be selected for a Bayesian phylogenetic study. Although I present a single test statistic that assesses the overall (global) performance of a phylogenetic model, a variety of test statistics can be tailored to evaluate specific features (local performance) of evolutionary models to identify sources failure. The method presented here, unlike the likelihood-ratio test and parametric bootstrap, accounts for uncertainty in the phylogeny and model parameters.

...read moreread less

234 citations

Journal Article•DOI•

Bayesian curve fitting using MCMC with applications to signal segmentation

[...]

E. Punskaya¹, Christophe Andrieu, Arnaud Doucet², William J. Fitzgerald¹•Institutions (2)

University of Cambridge¹, University of Melbourne²

01 Mar 2002-IEEE Transactions on Signal Processing

TL;DR: Some Bayesian methods to address the problem of fitting a signal modeled by a sequence of piecewise constant linear regression models, for example, autoregressive or Volterra models are proposed.

...read moreread less

Abstract: We propose some Bayesian methods to address the problem of fitting a signal modeled by a sequence of piecewise constant linear (in the parameters) regression models, for example, autoregressive or Volterra models A joint prior distribution is set up over the number of the changepoints/knots, their positions, and over the orders of the linear regression models within each segment if these are unknown Hierarchical priors are developed and, as the resulting posterior probability distributions and Bayesian estimators do not admit closed-form analytical expressions, reversible jump Markov chain Monte Carlo (MCMC) methods are derived to estimate these quantities Results are obtained for standard denoising and segmentation of speech data problems that have already been examined in the literature These results demonstrate the performance of our methods

...read moreread less

202 citations

Bayesian optimization algorithm: from single level to hierarchy

[...]

David E. Goldberg, Martin Pelikan

01 Jan 2002

Book•

Hybrid Bayesian networks for reasoning about complex systems

[...]

Uri Lerner

01 Jan 2002

TL;DR: A suite of new inference algorithms, designed to deal with non-linearities present in many real-world systems, and to scale up to large hybrid models are provided, and it is shown that these algorithms often outperform the current state of the art inference algorithms for hybrid models.

...read moreread less

Abstract: Many real-world systems are naturally modeled as hybrid stochastic processes, i.e., stochastic processes that contain both discrete and continuous variables. Examples include speech recognition, target tracking, and monitoring of physical systems. The task is usually to perform probabilistic inference, i.e., infer the hidden state of the system given some noisy observations. For example, we can ask what is the probability that a certain word was pronounced given the readings of our microphone, what is the probability that a submarine is trying to surface given our sonar data, and what is the probability of a valve being open given our pressure and ow readings. Bayesian networks are a compact way to represent a probability distribution. They can be extended to dynamic Bayesian networks which represent stochastic processes. In this thesis we concentrate on hybrid (dynamic) Bayesian networks. Our contributions are three-fold: theoretical, algorithmic, and practical. From a theoretical perspective, we provide a novel complexity analysis for inference in hybrid models and show that there is a fundamental di erence between the complexity of inference in discrete models and in hybrid ones. In particular, we provide the rst NP-hardness results for inference in very simple hybrid models. From an algorithmic perspective, we provide a suite of new inference algorithms, designed to deal with non-linearities present in many real-world systems, and to scale up to large hybrid models. We show that our algorithms often outperform the current state of the art inference algorithms for hybrid models. Finally, from a practical perspective, we apply our techniques to the task of fault diagnosis in a complex real-world physical system, designed to extract oxygen from the Martian atmosphere. We demonstrate the feasibility of our approach using data collected during actual runs of the system. v Acknowledgments First and foremost I would like to thank Daphne Koller. If I had to describe Daphne in one word, excellence would be my choice. Her uncompromising standards make everybody, and mostly herself, work harder, but also bring her students to ful ll their true potential. Looking back I realize how little I knew when I came to Stanford and how much I learned from Daphne during these years: how to nd interesting research problems, how to clearly present one's work both in papers and in talks, how to teach a class, and in short how to be a good researcher. I am most grateful to Daphne for bringing me to the point of being proud of my work. Excluding Daphne, Ron Parr is the person to whom I owe my biggest debt of gratitude. Ron, who was a Research Associate in Daphne's research group, served as my uno cial second adviser during a crucial period of my Ph.D. career, and to a large extent he is responsible for putting me on the track which ultimately led to this thesis. Working closely with Ron was one of the most rewarding and enjoyable experiences of my Ph.D. career. Ron always had the time for a chat, and much of my work can be traced back to these chats. I feel extremely fortunate to have met Ron and to count him as a friend. Thank you Ron! I would also like to thank Stephen Boyd, the third member of my reading committee, who pointed out to me many interesting directions and connections that turned out to be very useful. I thank Daphne, Ron and Stephen for taking the time to read this long thesis and come up with as many useful comments as they have had. I also thank Ross Shachter and Nils Nilsson who were in my Orals committee. Working on the RWGS was truly a group e ort, and I am privileged to have worked with such an amazing group of people. Among the people at Stanford I vi would like to thank Brooks Moses who was the only one of us who truly understood the system and was able to come up with a model for it, Sheila McIlraith who brought the problem into my attention and was instrumental in keeping the project on the right track, and Maricia Scott who shared much of the workload with me and was always invaluable during the crucial moments. Brooks, Maricia and I spent many hours on constructing and debugging the model. Their company transformed the experience from a potentially painful one to an enjoyable one. On the NASA side I would rst like to thank Charlie Goodrich whose help went above and beyond any reasonable expectation. Charlie was always there to patiently answer our questions, make sure we have access to the data that we needed, and was even kind enough to host us at his home during our visit to Kennedy Space Center. I could not have hoped for a more helpful or a nicer person than Charlie. I would also like to thank Dan Clancy, Bill Larson, Jon Whitlow, Clyde Parrish, Curtis Ihl eld and Dan Keenan for their help. Zohar Manna was my adviser during my rst year at Stanford, and made my transition from Israel to Stanford much smoother and easier. Zohar let me develop my research interests during my rst year, was always there when I needed someone to talk with, and always believed in my abilities. For that I am truly grateful. Shimon Ullman from the Weizmann Institute of Science in Israel helped me a great deal in my application process and encouraged me to come to Stanford. Since then he was always there to keep track of my progress and give me a good advice when I needed one. I may not have come to Stanford without Shimon's help, and I am in his debt for that. The algorithms in this thesis were implemented using a code infrastructure called Frog or Phrog. Frog was initially created by Lise Getoor and myself, and I am convinced that had we known what we were getting ourselves into, we would not have done it. However, now that the beast walks among us, it is my hope that the various users of Frog actually bene ted from the system, despite all the pain that it caused. I would like to thank Eric Bauer, Xavier Boyen, Maricia Scott, Ben Taskar and Drago Anguelov for their help in writing Frog. Being a member of DAGS (Daphne's Approximate Group of Student) was a source vii of both pleasure and pride. Year after year, the research coming out of DAGS is of high quality and high impact, which is a testament to the extraordinary people that were and are members of DAGS, as well as to Daphne's vision and guidance. I would like to thank Drago Anguelov, Eric Bauer, Jenny Berglund, Xavier Boyen, Urszula Chajewska, Barbara Engelhardt, Raya Fratkina, Lise Getoor, Carlos Guestrin, Manfred Jaeger, Alex Kozlov, Brian Milch, Uri Nodelman, Dirk Ormoneit, Ron Parr, Avi Pfe er, Mehran Sahami, Maricia Scott, Eran Segal, Christian Shelton, Ben Taskar, and Simon Tong. I would like to thank some other friends and colleagues that I was fortunate enough to get to know along the years: Eyal Amir, Nikolaj Bj rner, Michael Colon, Arturo Crespo, Alon Efrat, Mattan Erez, Bernd Finkbeiner, Nir Friedman, Karl P eger, Henny Sipma, and Tomas Uribe. I would also like to thank my di erent sources of funding: ONR Young Investigator (PECASE) under grant number N00014-99-1-0464, ARO under the MURI program, \Integrated Approach to Intelligent Systems", grant number DAAH04-961-0341, ONR under the MURI program \Decision Making under Uncertainty", grant number N00014-00-1-0637, and NASA under grant number NAG2-1337. Finally, and above all, I would like to thank my parents, Tamar and Eliahu Lerner whose love and support were always something I could count on, and my brother Assaf who was always so proud of me | it is I who is proud to have you as a brother. This dissertation is dedicated to them. viii

...read moreread less

Journal Article•DOI•

Bayesian Clustering by Dynamics

[...]

Marco F. Ramoni¹, Paola Sebastiani², Paul R. Cohen²•Institutions (2)

Harvard University¹, University of Massachusetts Amherst²

01 Apr 2002-Machine Learning

TL;DR: This paper introduces a Bayesian method for clustering dynamic processes that models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics.

...read moreread less

Abstract: This paper introduces a Bayesian method for clustering dynamic processes. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics. To increase efficiency, the method uses an entropy-based heuristic search strategy. A controlled experiment suggests that the method is very accurate when applied to artificial time series in a broad range of conditions and, when applied to clustering sensor data from mobile robots, it produces clusters that are meaningful in the domain of application.

...read moreread less

Journal Article•DOI•

B-course: a web-based tool for bayesian and causal data analysis

[...]

Petri Myllymäki¹, Tomi Silander¹, Henry Tirri¹, Pekka Uronen¹•Institutions (1)

Helsinki Institute for Information Technology¹

01 Sep 2002-International Journal on Artificial Intelligence Tools

TL;DR: With the restrictions stated in the support material, B-Course is a powerful analysis tool exploiting several theoretically elaborate results developed recently in the fields of Bayesian and causal modeling.

...read moreread less

Abstract: B-Course is a free web-based online data analysis tool, which allows the users to analyze their data for multivariate probabilistic dependencies. These dependencies are represented as Bayesian network models. In addition to this, B-Course also offers facilities for inferring certain type of causal dependencies from the data. The software uses a novel "tutorial stylerdquo; user-friendly interface which intertwines the steps in the data analysis with support material that gives an informal introduction to the Bayesian approach adopted. Although the analysis methods, modeling assumptions and restrictions are totally transparent to the user, this transparency is not achieved at the expense of analysis power: with the restrictions stated in the support material, B-Course is a powerful analysis tool exploiting several theoretically elaborate results developed recently in the fields of Bayesian and causal modeling. B-Course can be used with most web-browsers (even Lynx), and the facilities include features such as automatic missing data handling and discretization, a flexible graphical interface for probabilistic inference on the constructed Bayesian network models (for Java enabled browsers), automatic prettyHyphen;printed layout for the networks, exportation of the models, and analysis of the importance of the derived dependencies. In this paper we discuss both the theoretical design principles underlying the B-Course tool, and the pragmatic methods adopted in the implementation of the software.

...read moreread less

Journal Article•DOI•

Bayesian prediction of spatial count data using generalized linear mixed models.

[...]

Ole F. Christensen¹, Rasmus Waagepetersen¹•Institutions (1)

Aalborg University¹

01 Jun 2002-Biometrics

TL;DR: Spatial weed count data are modeled and predicted using a generalized linear mixed model combined with a Bayesian approach and Markov chain Monte Carlo and so‐called Langevin‐Hastings updates are useful for efficient simulation of the posterior distributions.

...read moreread less

Abstract: Spatial weed count data are modeled and predicted using a generalized linear mixed model combined with a Bayesian approach and Markov chain Monte Carlo. Informative priors for a data set with sparse sampling are elicited using a previously collected data set with extensive sampling. Furthermore, we demonstrate that so-called Langevin-Hastings updates are useful for efficient simulation of the posterior distributions, and we discuss computational issues concerning prediction.

...read moreread less

Journal Article•DOI•

Identifiability of parameters in MCMC Bayesian inference of phylogeny.

[...]

Bruce Rannala¹•Institutions (1)

University of Alberta¹

01 Sep 2002-Systematic Biology

TL;DR: Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.

...read moreread less

Abstract: Methods for Bayesian inference of phylogeny using DNA sequences based on Markov chain Monte Carlo (MCMC) techniques allow the incorporation of arbitrarily complex models of the DNA substitution process, and other aspects of evolution. This has increased the realism of models, potentially improving the accuracy of the methods, and is largely responsible for their recent popularity. Another consequence of the increased complexity of models in Bayesian phylogenetics is that these models have, in several cases, become overparameterized. In such cases, some parameters of the model are not identifiable; different combinations of nonidentifiable parameters lead to the same likelihood, making it impossible to decide among the potential parameter values based on the data. Overparameterized models can also slow the rate of convergence of MCMC algorithms due to large negative correlations among parameters in the posterior probability distribution. Functions of parameters can sometimes be found, in overparameterized models, that are identifiable, and inferences based on these functions are legitimate. Examples are presented of overparameterized models that have been proposed in the context of several Bayesian methods for inferring the relative ages of nodes in a phylogeny when the substitution rate evolves over time.

...read moreread less

Journal Article•DOI•

Expected‐posterior prior distributions for model selection

[...]

José M. Pérez¹, James O. Berger•Institutions (1)

Simón Bolívar University¹

01 Aug 2002-Biometrika

TL;DR: In this article, a new method of developing prior distributions for the model parameters is presented, called the expected-posterior prior approach, which defines the priors for all models from a common underlying predictive distribution in such a way that the resulting priors are amenable to modern Markov chain Monte Carlo computational techniques.

...read moreread less

Abstract: SUMMARY We consider the problem of comparing parametric models using a Bayesian approach. A new method of developing prior distributions for the model parameters is presented, called the expected-posterior prior approach. The idea is to define the priors for all models from a common underlying predictive distribution, in such a way that the resulting priors are amenable to modern Markov chain Monte Carlo computational techniques. The approach has subjective Bayesian and default Bayesian implementations, and overcomes the most significant impediment to Bayesian model selection, that of ensuring that prior distributions for the various models are appropriately compatible.

...read moreread less

Journal Article•DOI•

A tutorial introduction to Bayesian inference for stochastic epidemic models using Markov chain Monte Carlo methods

[...]

Philip D. O'Neill¹•Institutions (1)

University of Nottingham¹

01 Nov 2002-Bellman Prize in Mathematical Biosciences

TL;DR: Recent Bayesian methods for the analysis of infectious disease outbreak data using stochastic epidemic models are reviewed and rely on Markov chain Monte Carlo methods.

...read moreread less

Abstract: Recent Bayesian methods for the analysis of infectious disease outbreak data using stochastic epidemic models are reviewed. These methods rely on Markov chain Monte Carlo methods. Both temporal and non-temporal data are considered. The methods are illustrated with a number of examples featuring different models and datasets.

...read moreread less

A Survey of Algorithms for Real-Time Bayesian Network Inference

[...]

Haipeng Guo¹, William H. Hsu¹•Institutions (1)

Kansas State University¹

01 Jan 2002

TL;DR: A survey of various exact and approximate Bayesian network inference algorithms working under real-time constraints and a framework for understanding these algorithms and the relationships between them is provided.

...read moreread less

Abstract: As Bayesian networks are applied to more complex and realistic real-world applications, the development of more efficient inference algorithms working under real-time constraints is becoming more and more important. This paper presents a survey of various exact and approximate Bayesian network inference algorithms. In particular, previous research on real-time inference is reviewed. It provides a framework for understanding these algorithms and the relationships between them. Some important issues in real-time Bayesian networks inference are also discussed.

...read moreread less

Journal Article•DOI•

Using Bayesian network for fault location on distribution feeder

[...]

Chen-Fu Chien¹, Shi-Lin Chen¹, Yih-Shin Lin¹•Institutions (1)

National Tsing Hua University¹

07 Nov 2002-IEEE Power & Energy Magazine

TL;DR: This work constructs a Bayesian network on the basis of expert knowledge and historical data for fault diagnosis on a distribution feeder in Taiwan and the experimental results validate the practical viability of the proposed approach.

...read moreread less

Abstract: The Bayesian network is a probabilistic graphical model in which a problem is structured as a set of variables (parameters) and probabilistic relationships among them. The Bayesian network has been effectively used to incorporate expert knowledge and historical data for revising the prior belief in the light of new evidence in many fields. However, little research has been done to apply the Bayesian network for fault location in power delivery systems. We construct a Bayesian network on the basis of expert knowledge and historical data for fault diagnosis on a distribution feeder in Taiwan. The experimental results validate the practical viability of the proposed approach.

...read moreread less

Methods for Record Linkage and Bayesian Networks

[...]

William E. Winkler

01 Jan 2002

TL;DR: Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model and Bayesian networks used in machine learning and formal probabilistic models that can be shown to be equivalent in many situations.

...read moreread less

Abstract: Although terminology differs, there is considerable overlap between record linkage methods based on the Fellegi-Sunter model (JASA 1969) and Bayesian networks used in machine learning (Mitchell 1997). Both are based on formal probabilistic models that can be shown to be equivalent in many situations (Winkler 2000). When no missing data are present in identifying fields and training data are available, then both can efficiently estimate parameters of interest. EM and MCMC methods can be used for automatically estimating parameters and error rates in some of the record linkage situations (Belin and Rubin 1995, Larsen and Rubin 2001).

...read moreread less

Journal Article•DOI•

Scalability of the Bayesian optimization algorithm

[...]

Martin Pelikan¹, Kumara Sastry¹, David E. Goldberg¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Nov 2002-International Journal of Approximate Reasoning

TL;DR: The paper analyzes the applicability of the methods for learning Bayesian networks in the context of genetic and evolutionary search and concludes that the combination of the two approaches yields robust, efficient, and accurate search.

...read moreread less

Dissertation•

The Discrete Acyclic Digraph Markov Model in Data Mining

[...]

J.R. Castelo Valdueza

03 Jun 2002

TL;DR: A new class of graphical Markov models, called TCI models, is introduced, which can be represented by labeled trees and form the intersection of two previously well-known classes, and the inclusion order of graphicalMarkov models is studied.

...read moreread less

Abstract: Graphical Markov models are a powerful tool for the description of complex interactions between the variables of a domain They provide a succinct description of the joint distribution of the variables This feature has led to the most successful application of graphical Markov models, that is as the core component of probabilistic expert systems The fascinating theory behind this type of models arises from three different disciplines, viz, Statistics, Graph Theory and Artificial Intelligence This interdisciplinary origin has given rich insight from different perspectives There are two main ways to find the qualitative structure of graphical Markov models Either the structure is specified by a domain expert or ``structural learning'' is applied, ie, the structure is automatically recovered from data For structural learning, one has to compare how well different models describe the data This is easy for, eg, acyclic digraph Markov models However, structural learning is still a hard problem because the number of possible models grows exponentially with the number of variables The main contributions of this thesis are as follows Firstly, a new class of graphical Markov models, called TCI models, is introduced These models can be represented by labeled trees and form the intersection of two previously well-known classes Secondly, the inclusion order of graphical Markov models is studied From this study, two new learning algorithms are derived One for heuristic search and the other for the Markov Chain Monte Carlo Method Both algorithms improve the results of previous approaches without compromising the computational cost of the learning process Finally, new diagnostics for convergence assessment of the Markov Chain Monte Carlo Method in structural learning are introduced The results of this thesis are illustrated using both synthetic and real world datasets

...read moreread less

Journal Article•DOI•

A Bayesian Diagnostic Algorithm for Student Modeling and its Evaluation

[...]

Eva Millán¹, José-Luis Pérez-de-la-Cruz¹•Institutions (1)

University of Málaga¹

19 Mar 2002-User Modeling and User-adapted Interaction

TL;DR: A new approach to diagnosis in student modeling based on the use of Bayesian Networks and Computer Adaptive Tests is presented and a new integrated Bayesian student model is defined and then combined with an Adaptive Testing algorithm.

...read moreread less

Abstract: In this paper, we present a new approach to diagnosis in student modeling based on the use of Bayesian Networks and Computer Adaptive Tests. A new integrated Bayesian student model is defined and then combined with an Adaptive Testing algorithm. The structural model defined has the advantage that it measures students' abilities at different levels of granularity, allows substantial simplifications when specifying the parameters (conditional probabilities) needed to construct the Bayesian Network that describes the student model, and supports the Adaptive Diagnosis algorithm. The validity of the approach has been tested intensively by using simulated students. The results obtained show that the Bayesian student model has excellent performance in terms of accuracy, and that the introduction of adaptive question selection methods improves its behavior both in terms of accuracy and efficiency.

...read moreread less

Journal Article•DOI•

Marginal maximum a posteriori estimation using Markov chain Monte Carlo

[...]

Arnaud Doucet¹, Simon J. Godsill¹, Christian P. Robert²•Institutions (2)

University of Cambridge¹, INSEE²

01 Jan 2002-Statistics and Computing

TL;DR: In this paper, a simple and novel MCMC strategy, called State-Augmentation for Marginal Estimation (SAME), which leads to MMAP estimates for Bayesian models is presented.

...read moreread less

Abstract: Markov chain Monte Carlo (MCMC) methods, while facilitating the solution of many complex problems in Bayesian inference, are not currently well adapted to the problem of marginal maximum a posteriori (MMAP) estimation, especially when the number of parameters is large We present here a simple and novel MCMC strategy, called State-Augmentation for Marginal Estimation (SAME), which leads to MMAP estimates for Bayesian models We illustrate the simplicity and utility of the approach for missing data interpolation in autoregressive time series and blind deconvolution of impulsive processes

...read moreread less

Journal Article•DOI•

Double Markov random fields and Bayesian image segmentation

[...]

D.E. Melas, Simon P. Wilson¹•Institutions (1)

Trinity College, Dublin¹

01 Feb 2002-IEEE Transactions on Signal Processing

TL;DR: A class of such models (the double Markov random field) for images composed of several textures is described, which is considered to be the natural hierarchical model for such a task.

...read moreread less

Abstract: Markov random fields are used extensively in model-based approaches to image segmentation and, under the Bayesian paradigm, are implemented through Markov chain Monte Carlo (MCMC) methods. We describe a class of such models (the double Markov random field) for images composed of several textures, which we consider to be the natural hierarchical model for such a task. We show how several of the Bayesian approaches in the literature can be viewed as modifications of this model, made in order to make MCMC implementation possible. From a simulation study, conclusions are made concerning the performance of these modified models.

...read moreread less

Journal Article•DOI•

Bayesian Modelling of Outstanding Liabilities Incorporating Claim Count Uncertainty

[...]

Ioannis Ntzoufras¹, Petros Dellaportas²•Institutions (2)

University of the Aegean¹, Athens University of Economics and Business²

01 Jan 2002-The North American Actuarial Journal

TL;DR: Claim counts are used to add a further hierarchical stage in the model with log-normally distributed claim amounts and its corresponding state space version, resulting in new model formulations using Bayesian theory and Markov chain Monte Carlo methods.

...read moreread less

Abstract: This paper deals with the prediction of the amount of outstanding automobile claims that an insurance company will pay in the near future We consider various competing models using Bayesian theory and Markov chain Monte Carlo methods Claim counts are used to add a further hierarchical stage in the model with log-normally distributed claim amounts and its corresponding state space version This way, we incorporate information from both the outstanding claim amounts and counts data resulting in new model formulations Implementation details and illustrations with real insurance data are provided

...read moreread less

Journal Article•DOI•

Continuous-time hidden Markov models for network performance evaluation

[...]

Wei Wei¹, Bing Wang¹, Don Towsley¹•Institutions (1)

University of Massachusetts Amherst¹

01 Sep 2002-Performance Evaluation

TL;DR: An algorithm to infer the CT-HMM from a series of end-to-end delay and loss observations of probe packets is developed, which can be used to simulate network environments for network performance evaluation.

...read moreread less

Journal Article•DOI•

Fitting hidden Markov models to psychological data

[...]

Ingmar Visser¹, Maartje E. J. Raijmakers¹, Peter C. M. Molenaar¹•Institutions (1)

University of Amsterdam¹

01 Aug 2002-Scientific Programming

TL;DR: A novel application of hidden Markov models in implicit learning is presented and this method of analyzing implicit learning data provides a comprehensive approach for addressing important theoretical issues in the field.

...read moreread less

Abstract: Markov models have been used extensively in psychology of learning. Applications of hidden Markov models are rare however. This is partially due to the fact that comprehensive statistics for model selection and model assessment are lacking in the psychological literature. We present model selection and model assessment statistics that are particularly useful in applying hidden Markov models in psychology. These statistics are presented and evaluated by simulation studies for a toy example. We compare AIC, BIC and related criteria and introduce a prediction error measure for assessing goodness-of-fit. In a simulation study, two methods of fitting equality constraints are compared. In two illustrative examples with experimental data we apply selection criteria, fit models with constraints and assess goodness-of-fit. First, data from a concept identification task is analyzed. Hidden Markov models provide a flexible approach to analyzing such data when compared to other modeling methods. Second, a novel application of hidden Markov models in implicit learning is presented. Hidden Markov models are used in this context to quantify knowledge that subjects express in an implicit learning task. This method of analyzing implicit learning data provides a comprehensive approach for addressing important theoretical issues in the field.

...read moreread less

Book Chapter•DOI•

Bayesian Analysis of Linear Models

[...]

Daniel Sorensen, Daniel Gianola¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jan 2002

Collapse