Showing papers on "Graphical model" published in 1998


Journal Article
TL;DR: A simple changepoint model is used to illustrate how to tackle a typical Bayesian modelling problem via the MCMC method, before using mixture model problems to provide illustrations of good sampler output and of the implementation of a reversible jump MCMC algorithm.
Abstract: The Markov chain Monte Carlo (MCMC) method, as a computer-intensive statistical tool, has enjoyed an enormous upsurge in interest over the last few years. This paper provides a simple, comprehensive and tutorial review of some of the most common areas of research in this field. We begin by discussing how MCMC algorithms can be constructed from standard building-blocks to produce Markov chains with the desired stationary distribution. We also motivate and discuss more complex ideas that have been proposed in the literature, such as continuous time and dimension jumping methods. We discuss some implementational issues associated with MCMC methods. We take a look at the arguments for and against multiple replications, consider how long chains should be run for and how to determine suitable starting points. We also take a look at graphical models and how graphical approaches can be used to simplify MCMC implementation. Finally, we present a couple of examples, which we use as case-studies to highlight some of the points made earlier in the text. In particular, we use a simple changepoint model to illustrate how to tackle a typical Bayesian modelling problem via the MCMC method, before using mixture model problems to provide illustrations of good sampler output and of the implementation of a reversible jump MCMC algorithm.

737 citations
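
Illustration (ours, not from the paper): the simplest of the standard building-blocks discussed above is a random-walk Metropolis step, in which proposed moves are accepted or rejected so that the target becomes the stationary distribution of the chain. A minimal Python sketch, with an arbitrary standard-normal target and step size as assumptions:

```python
import math
import random

def metropolis(log_target, x0, step=0.5, n=10000):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and accept
    with probability min(1, target(x')/target(x)); the target is then
    the stationary distribution of the resulting Markov chain."""
    x, lp = x0, log_target(x0)
    samples = []
    for _ in range(n):
        x_prop = x + random.gauss(0.0, step)
        lp_prop = log_target(x_prop)
        if random.random() < math.exp(min(0.0, lp_prop - lp)):  # accept
            x, lp = x_prop, lp_prop
        samples.append(x)  # on rejection the current state is repeated
    return samples

# Toy usage: draws from a standard normal target (log density up to a constant).
draws = metropolis(lambda x: -0.5 * x * x, x0=0.0)
```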


Book
26 Jun 1998
TL;DR: Covers probabilistic inference in graphical models, pattern classification, unsupervised learning, data compression, channel coding, and future research directions.
Abstract: Probabilistic inference in graphical models, pattern classification, unsupervised learning, data compression, channel coding, future research directions.

597 citations


Journal Article
TL;DR: The framework described in this paper is an attempt to unify adaptive models like artificial neural nets and belief nets for the problem of processing structured information, where relations between data variables are expressed by directed acyclic graphs, where both numerical and categorical values coexist.
Abstract: A structured organization of information is typically required by symbolic processing. On the other hand, most connectionist models assume that data are organized according to relatively poor structures, like arrays or sequences. The framework described in this paper is an attempt to unify adaptive models like artificial neural nets and belief nets for the problem of processing structured information. In particular, relations between data variables are expressed by directed acyclic graphs, where both numerical and categorical values coexist. The general framework proposed in this paper can be regarded as an extension of both recurrent neural networks and hidden Markov models to the case of acyclic graphs. In particular, we study the supervised learning problem as the problem of learning transductions from an input structured space to an output structured space, where transductions are assumed to admit a recursive hidden state-space representation. We introduce a graphical formalism for representing this class of adaptive transductions by means of recursive networks, i.e., cyclic graphs where nodes are labeled by variables and edges are labeled by generalized delay elements. This representation makes it possible to incorporate the symbolic and subsymbolic nature of data. Structures are processed by unfolding the recursive network into an acyclic graph called the encoding network. In so doing, inference and learning algorithms can be easily inherited from the corresponding algorithms for artificial neural networks or probabilistic graphical models.

508 citations
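
Illustration (ours, not the paper's model): the unfolding idea can be sketched in a few lines. Each node's state is computed from its own label and the states of its children, visiting the DAG from the leaves upward; the toy graph, labels, and linear transition below are made-up assumptions:

```python
from functools import lru_cache

# Toy DAG (node -> children) with numeric labels, standing in for a
# structured input such as a parse tree or a chemical compound.
children = {"a": ["b", "c"], "b": ["c"], "c": []}
label = {"a": 1.0, "b": 2.0, "c": 3.0}

@lru_cache(maxsize=None)
def state(node):
    """Recursive state update: combine the node's label with its
    children's states, mimicking the unfolded 'encoding network'."""
    return label[node] + 0.5 * sum(state(c) for c in children[node])

print(state("a"))  # the state at the root summarizes the whole structure
```

In a learned model the fixed linear combination above would be replaced by a trainable transition function, e.g. a small neural network.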


Journal Article
TL;DR: In this paper, a probabilistic framework for learning models of temporal data is presented, which uses the Bayesian network formalism, a marriage of probability theory and graph theory in which dependencies between variables are expressed graphically.
Abstract: This paper presents a probabilistic framework for learning models of temporal data. We express these models using the Bayesian network formalism, a marriage of probability theory and graph theory in which dependencies between variables are expressed graphically. The graph not only allows the user to understand which variables affect which other ones, but also serves as the backbone for efficiently computing marginal and conditional probabilities that may be required for inference and learning.

251 citations
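
The "backbone" role of the graph rests on the standard Bayesian network factorization (textbook form, not specific to this paper), where pa(X_i) denotes the parents of X_i in the directed acyclic graph:

```latex
P(X_1, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)
```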


Book Chapter
26 Mar 1998
TL;DR: In this article, the authors provide an overview of latent variable models for representing continuous variables and show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA).
Abstract: A powerful approach to probabilistic modelling involves supplementing a set of observed variables with additional latent, or hidden, variables. By defining a joint distribution over visible and latent variables, the corresponding distribution of the observed variables is then obtained by marginalization. This allows relatively complex distributions to be expressed in terms of more tractable joint distributions over the expanded variable space. One well-known example of a hidden variable model is the mixture distribution in which the hidden variable is the discrete component label. In the case of continuous latent variables we obtain models such as factor analysis. The structure of such probabilistic models can be made particularly transparent by giving them a graphical representation, usually in terms of a directed acyclic graph, or Bayesian network. In this chapter we provide an overview of latent variable models for representing continuous variables. We show how a particular form of linear latent variable model can be used to provide a probabilistic formulation of the well-known technique of principal components analysis (PCA). By extending this technique to mixtures, and hierarchical mixtures, of probabilistic PCA models we are led to a powerful interactive algorithm for data visualization. We also show how the probabilistic PCA approach can be generalized to non-linear latent variable models leading to the Generative Topographic Mapping algorithm (GTM). Finally, we show how GTM can itself be extended to model temporal data.

132 citations
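
For reference, the linear latent variable model behind probabilistic PCA has the standard form (notation ours):

```latex
\mathbf{x} = \mathbf{W}\mathbf{z} + \boldsymbol{\mu} + \boldsymbol{\epsilon},
\qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{I})
```

Marginalizing over z gives x ~ N(mu, W W^T + sigma^2 I); factor analysis is recovered by replacing the isotropic noise with a diagonal covariance.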


Journal Article
TL;DR: Computerized adaptive testing (CAT) based on item response theory (IRT) is viewed from the perspective of graphical modeling (GM), which provides methods for making inferences about multifaceted skills and knowledge and for extracting data from complex performances.
Abstract: Computerized adaptive testing (CAT) based on item response theory (IRT) is viewed from the perspective of graphical modeling (GM). GM provides methods for making inferences about multifaceted skills and knowledge, and for extracting data from complex performances. However, simply incorporating variables for all sources of variation is rarely successful. Thus, researchers must closely analyze the substance and structure of the problem to create more effective models. Researchers regularly employ sophisticated strategies to handle many sources of variability outside the IRT model. Relevant variables can play many roles without appearing in the operational IRT model per se, e.g., in validity studies, assembling tests, and constructing and modeling tasks. Some of these techniques are described from a GM perspective, as well as how to extend them to more complex assessment situations. Issues are illustrated in the context of language testing.

130 citations
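
For concreteness, the IRT models underlying CAT score a response u_ij of examinee i to item j with an item response function such as the two-parameter logistic (a standard form, not quoted from the article), with proficiency theta_i, discrimination a_j and difficulty b_j:

```latex
P(u_{ij} = 1 \mid \theta_i) \;=\; \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
```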


Book Chapter
01 Jan 1998
TL;DR: This chapter surveys the development of graphical models known as Bayesian networks, summarizes their semantical basis and assesses their properties and applications to reasoning and planning.
Abstract: This chapter surveys the development of graphical models known as Bayesian networks, summarizes their semantical basis and assesses their properties and applications to reasoning and planning.

102 citations


Book Chapter
26 Mar 1998
TL;DR: This tutorial gives an overview of basic theoretical concepts and results in the field of Bayesian networks and graphical models, beginning with a revision of the basic axioms of probability theory, in the hope of providing newcomers with a conceptual framework for following the more detailed and advanced work.
Abstract: The field of Bayesian networks, and graphical models in general, has grown enormously over the last few years, with theoretical and computational developments in many areas. As a consequence there is now a fairly large set of theoretical concepts and results for newcomers to the field to learn. This tutorial aims to give an overview of some of these topics, which hopefully will provide such newcomers with a conceptual framework for following the more detailed and advanced work. It begins with a revision of some of the basic axioms of probability theory.

96 citations
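
The axioms in question are the usual ones (standard statements, not quoted from the tutorial):

```latex
0 \le P(A) \le 1, \qquad P(\Omega) = 1, \qquad
P(A \cup B) = P(A) + P(B) \ \text{for disjoint } A, B
```

together with the product rule P(A, B) = P(A | B) P(B), from which Bayes' theorem follows.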


Journal Article
TL;DR: Partially ordered Markov models (POMMs) are a subset of the class of Markov random fields (MRFs), but have probability distributions that can be written in closed form; they are applied to texture synthesis and to the inverse problem of parameter estimation.

80 citations


Proceedings Article
24 Jul 1998
TL;DR: This work proposes a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data; the approach can be viewed as a combination of the Cheeseman-Stutz asymptotic approximation for model posterior probability and the Expectation-Maximization algorithm.
Abstract: We describe computationally efficient methods for learning mixtures in which each component is a directed acyclic graphical model (mixtures of DAGs or MDAGs). We argue that simple search-and-score algorithms are infeasible for a variety of problems, and introduce a feasible approach in which parameter and structure search is interleaved and expected data is treated as real data. Our approach can be viewed as a combination of (1) the Cheeseman-Stutz asymptotic approximation for model posterior probability and (2) the Expectation-Maximization algorithm. We evaluate our procedure for selecting among MDAGs on synthetic and real examples.

73 citations
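
The Cheeseman-Stutz approximation referred to is usually written as follows (standard form in our notation, not quoted from the paper), where D~ is the expected complete data filled in by EM and theta^ is the ML/MAP parameter estimate:

```latex
\log p(D \mid m) \;\approx\; \log p(\tilde{D} \mid m)
\;+\; \log p(D \mid \hat{\theta}, m) \;-\; \log p(\tilde{D} \mid \hat{\theta}, m)
```

The completed-data marginal likelihood is computable in closed form and is corrected by the likelihood ratio of observed to completed data.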


Journal Article
TL;DR: The aim of this paper is to investigate the kind of properties that a dependency model must verify in order to be equivalent to a singly connected graph structure, as a way of driving automated discovery and construction of singly connected networks from data.
Abstract: Graphical structures such as Bayesian networks or Markov networks are very useful tools for representing irrelevance or independency relationships, and they may be used to efficiently perform reasoning tasks. Singly connected networks are important specific cases where there is no more than one undirected path connecting each pair of variables. The aim of this paper is to investigate the kind of properties that a dependency model must verify in order to be equivalent to a singly connected graph structure, as a way of driving automated discovery and construction of singly connected networks in data. The main results are the characterizations of those dependency models which are isomorphic to singly connected graphs (either via the d-separation criterion for directed acyclic graphs or via the separation criterion for undirected graphs), as well as the development of efficient algorithms for learning singly connected graph representations of dependency models.
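
Since "no more than one undirected path between each pair of variables" is equivalent to the skeleton being acyclic (a forest), singly-connectedness is cheap to test. The union-find sketch below is an illustration of ours, not one of the paper's learning algorithms:

```python
def is_singly_connected(nodes, edges):
    """True iff the undirected skeleton has at most one path between
    every pair of nodes, i.e. iff it contains no cycle (is a forest)."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:       # u and v already connected: a second path exists
            return False
        parent[ru] = rv    # merge the two components
    return True

print(is_singly_connected("abc", [("a", "b"), ("b", "c")]))  # True
```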

Proceedings Article
01 Dec 1998
TL;DR: A call-based fraud detection system based on a hierarchical regime-switching model, in which detection is formulated as an inference problem on the regime probabilities and implemented by applying the junction tree algorithm to the underlying graphical model.
Abstract: Fraud causes substantial losses to telecommunication carriers. Detection systems which automatically detect illegal use of the network can be used to alleviate the problem. Previous approaches worked on features derived from the call patterns of individual users. In this paper we present a call-based detection system based on a hierarchical regime-switching model. The detection problem is formulated as an inference problem on the regime probabilities. Inference is implemented by applying the junction tree algorithm to the underlying graphical model. The dynamics are learned from data using the EM algorithm and subsequent discriminative training. The methods are assessed using fraud data from a real mobile communication network.
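
In the simplest (single-level, discrete-regime) case, inference on the regime probabilities reduces to the forward, or filtering, recursion of a hidden Markov model. The sketch below uses made-up two-regime parameters purely for illustration; the paper's model is hierarchical and its parameters are learned by EM:

```python
import numpy as np

# Toy two-regime switching model (illustrative parameters, not the paper's):
A = np.array([[0.95, 0.05],       # transition probabilities between
              [0.10, 0.90]])      # a 'normal' and a 'fraud' regime
lik = np.array([[0.9, 0.1],       # lik[t, k] = P(call features at t | regime k)
                [0.8, 0.2],
                [0.1, 0.7]])

alpha = np.array([0.99, 0.01]) * lik[0]   # prior regime belief * first call
alpha /= alpha.sum()
for t in range(1, len(lik)):              # forward (filtering) recursion
    alpha = (alpha @ A) * lik[t]
    alpha /= alpha.sum()                  # normalized regime posterior
print(alpha)                              # P(regime at T | calls observed so far)
```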

Proceedings Article
24 Jul 1998
TL;DR: In this paper, a classification of graphical models according to their representation as subfamilies of exponential families is provided, including linear exponential families, directed acyclic graphical models and chain graphs with no hidden variables.
Abstract: We provide a classification of graphical models according to their representation as subfamilies of exponential families. Undirected graphical models with no hidden variables are linear exponential families (LEFs); directed acyclic graphical models and chain graphs with no hidden variables, including Bayesian networks with several families of local distributions, are curved exponential families (CEFs); and graphical models with hidden variables are stratified exponential families (SEFs). An SEF is a finite union of CEFs satisfying a frontier condition. In addition, we illustrate how one can automatically generate independence and non-independence constraints on the distributions over the observable variables implied by a Bayesian network with hidden variables. The relevance of these results for model selection is examined.
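
The linear exponential family (LEF) form referred to is the standard one (our notation): natural parameter theta, sufficient statistic T(x) and log-partition function psi(theta); for an undirected model without hidden variables, T collects indicator statistics of clique configurations.

```latex
p(x \mid \theta) \;=\; h(x)\,\exp\bigl\{\theta^{\top} T(x) - \psi(\theta)\bigr\}
```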

Proceedings Article
24 Jul 1998
TL;DR: The geometry of the likelihood of the unknown parameters in a simple class of Bayesian directed graphs with hidden variables is investigated to obtain certain insights into the nature of the unidentifiability inherent in such models.
Abstract: In this paper we investigate the geometry of the likelihood of the unknown parameters in a simple class of Bayesian directed graphs with hidden variables. This enables us, before any numerical algorithms are employed, to obtain certain insights into the nature of the unidentifiability inherent in such models, the way posterior densities will be sensitive to prior densities and the typical geometrical form these posterior densities might take. Many of these insights carry over into more complicated Bayesian networks with systematic missing data.

Proceedings Article
01 Dec 1998
TL;DR: This work presents two classes of distributions, decimatable Boltzmann Machines and Tractable Belief Networks, that go beyond the standard factorized approach, and gives generalised mean-field equations for both these directed and undirected approximations.
Abstract: Graphical models provide a broad probabilistic framework with applications in speech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework for approximating these models, we present two classes of distributions, decimatable Boltzmann Machines and Tractable Belief Networks, that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problem suggest that these richer approximations compare favorably with others previously reported in the literature.
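
The variational framework mentioned here bounds the intractable log likelihood from below using any tractable distribution q over the hidden variables h (standard identity, not specific to this paper); the slack in the bound is exactly the KL divergence from q to the true posterior:

```latex
\ln p(v) \;\ge\; \sum_{h} q(h) \ln \frac{p(v, h)}{q(h)},
\qquad
\ln p(v) - \text{bound} \;=\; \mathrm{KL}\bigl(q(h) \,\Vert\, p(h \mid v)\bigr)
```

The standard mean-field choice is the fully factorized q(h) = prod_i q_i(h_i); the decimatable Boltzmann machines and tractable belief networks of this paper enlarge the family of admissible q.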

Proceedings Article
24 Jul 1998
TL;DR: The paper gives a few arguments in favour of the use of chain graphs for describing probabilistic conditional independence structures, and a separation criterion for reading independences from a chain graph is formulated.
Abstract: The paper gives a few arguments in favour of the use of chain graphs for describing probabilistic conditional independence structures. Every Bayesian network model can be equivalently introduced by means of a factorization formula with respect to a chain graph which is Markov equivalent to the Bayesian network. A graphical characterization of such graphs is given. The class of equivalent graphs can be represented by a distinguished graph called the largest chain graph. The factorization formula with respect to the largest chain graph is the basis of a proposal for representing the corresponding (discrete) probability distribution in a computer (i.e. 'parametrizing' it). This representation does not depend on the choice of a particular Bayesian network from the class of equivalent networks and seems to be the most efficient from the point of view of memory demands. A separation criterion for reading independences from a chain graph is formulated in a simpler way. It resembles the well-known d-separation criterion for Bayesian networks and can be implemented 'locally'.

Proceedings Article
01 Dec 1998
TL;DR: A real-time computer vision and machine learning system for modeling and recognizing human actions and interactions and it is demonstrated that 'synthetic agents' (Alife-style agents) can be used to develop flexible prior models of the person-to-person interactions.
Abstract: We describe a real-time computer vision and machine learning system for modeling and recognizing human actions and interactions. Two different domains are explored: recognition of two-handed motions in the martial art 'Tai Chi', and multiple-person interactions in a visual surveillance task. Our system combines top-down with bottom-up information using a feedback loop, and is formulated with a Bayesian framework. Two different graphical models (HMMs and Coupled HMMs) are used for modeling both individual actions and multiple-agent interactions, and CHMMs are shown to work more efficiently and accurately for a given amount of training. Finally, to overcome the limited amounts of training data, we demonstrate that 'synthetic agents' (Alife-style agents) can be used to develop flexible prior models of the person-to-person interactions.

Journal Article
TL;DR: While the Bayesian methodology can be applied to any type of model, as an example its use is outlined for an important, and increasingly standard, class of models in computational neuroscience: compartmental models of single neurons.
Abstract: Computational modeling is being used increasingly in neuroscience. In deriving such models, inference issues such as model selection, model complexity, and model comparison must be addressed constantly. In this article we present briefly the Bayesian approach to inference. Under a simple set of commonsense axioms, there exists essentially a unique way of reasoning under uncertainty by assigning a degree of confidence to any hypothesis or model, given the available data and prior information. Such degrees of confidence must obey all the rules governing probabilities and can be updated accordingly as more data becomes available. While the Bayesian methodology can be applied to any type of model, as an example we outline its use for an important, and increasingly standard, class of models in computational neuroscience: compartmental models of single neurons. Inference issues are particularly relevant for these models: their parameter spaces are typically very large, neurophysiological and neuroanatomical data are still sparse, and probabilistic aspects are often ignored. As a tutorial, we demonstrate the Bayesian approach on a class of one-compartment models with varying numbers of conductances. We then apply Bayesian methods on a compartmental model of a real neuron to determine the optimal amount of noise to add to the model to give it a level of spike time variability comparable to that found in the real cell.
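
The model comparison described above amounts to computing posterior probabilities of the candidate models via Bayes' rule (standard form), where the marginal likelihood P(D | M_i) integrates the likelihood over the model's parameter space and thereby penalizes excess complexity:

```latex
P(M_i \mid D) \;=\; \frac{P(D \mid M_i)\, P(M_i)}{\sum_{j} P(D \mid M_j)\, P(M_j)}
```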

01 Jan 1998
TL;DR: A variational inference algorithm for efficient probabilistic inference in dense graphical models is described and applied to the QMR-DT database; the accuracy of the algorithm is evaluated on a set of standard diagnostic cases and compared to stochastic sampling methods.
Abstract: We describe a variational approximation method for efficient probabilistic inference in dense graphical models. Variational methods are deterministic approximation procedures that provide bounds on probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We exemplify variational methods by an application to the problem of diagnostic inference in the QMR-DT database. The QMR-DT database is a large-scale graphical model based on statistical and expert knowledge in internal medicine. The size and complexity of the database render exact probabilistic diagnosis infeasible for all but a small set of cases. We describe a variational inference algorithm for the QMR-DT database, evaluate the accuracy of our algorithm on a set of standard diagnostic cases, and compare it to stochastic sampling methods.

Book Chapter
26 Mar 1998
TL;DR: Graphical models based on chain graphs, which admit both directed and undirected edges, were introduced by Lauritzen, Wermuth and Frydenberg as a generalization of graphical models based on undirected graphs and on acyclic directed graphs.
Abstract: Graphical models based on chain graphs, which admit both directed and undirected edges, were introduced by Lauritzen, Wermuth and Frydenberg as a generalization of graphical models based on undirected graphs and acyclic directed graphs. More recently Andersson, Madigan and Perlman have given an alternative Markov property for chain graphs. This raises two questions: How are the two types of chain graphs to be interpreted? In which situations should chain graph models be used, and with which Markov property?

Proceedings Article
01 Jan 1998
TL;DR: A procedure for contextual interpretation of spoken sentences within dialogs, enabling the interpreter algorithm to be efficient and task-independent and all inferences are probability-weighted.
Abstract: We describe a procedure for contextual interpretation of spoken sentences within dialogs. Task structure is represented in a graphical form, enabling the interpreter algorithm to be efficient and task-independent. Recognized spoken input may consist either of a single sentence with utterance-verification scores, or of a word lattice with arc weights. A confidence model is used throughout and all inferences are probability-weighted. The interpretation consists of a probability for each class and for each auxiliary information label needed for task completion. Anaphoric references are permitted.

Journal Article
TL;DR: A method is presented for model identification of biological systems described by stochastic linear differential equations, using a new computational technique for statistical Bayesian inference: mixed graphical models in the sense of Lauritzen and Wermuth.

Journal Article
TL;DR: In this paper Bayesian network modelling is applied to a multidimensional model of depression; the characterization of the probabilistic model exploits expert knowledge to associate latent concentrations of neurotransmitters with symptoms.

Journal Article
TL;DR: In this article, the authors discuss maximum likelihood estimation when some observations are missing in mixed graphical interaction models assuming a conditional Gaussian distribution, as introduced by Lauritzen & Wermuth (1989).
Abstract: In this paper we discuss maximum likelihood estimation when some observations are missing in mixed graphical interaction models assuming a conditional Gaussian distribution, as introduced by Lauritzen & Wermuth (1989). For the saturated case, ML estimation with missing values via the EM algorithm has been proposed by Little & Schluchter (1985). We expand their results to the special restrictions in graphical models and indicate a more efficient way to compute the E-step. The main purpose of the paper is to show that for certain missing patterns the computational effort can be considerably reduced.
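
The E- and M-steps referred to take the usual missing-data form (standard EM notation, not the paper's); the paper's contribution, as the abstract states, is to exploit the graphical restrictions and the missingness patterns so that the E-step expectation can be computed more cheaply:

```latex
Q\bigl(\theta \mid \theta^{(t)}\bigr) =
\mathrm{E}\Bigl[\log p\bigl(Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid \theta\bigr)
\,\Big|\, Y_{\mathrm{obs}}, \theta^{(t)}\Bigr],
\qquad
\theta^{(t+1)} = \arg\max_{\theta}\, Q\bigl(\theta \mid \theta^{(t)}\bigr)
```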

Proceedings Article
24 Jul 1998
TL;DR: In this paper, the authors use variational methods to approximate the stochastic distribution using multi-modal mixtures of factorized distributions, and present results for both inference and learning.
Abstract: Boltzmann machines are undirected graphical models with two-state stochastic variables, in which the logarithms of the clique potentials are quadratic functions of the node states. They have been widely studied in the neural computing literature, although their practical applicability has been limited by the difficulty of finding an effective learning algorithm. One well-established approach, known as mean field theory, represents the stochastic distribution using a factorized approximation. However, the corresponding learning algorithm often fails to find a good solution. We conjecture that this is due to the implicit uni-modality of the mean field approximation which is therefore unable to capture multi-modality in the true distribution. In this paper we use variational methods to approximate the stochastic distribution using multi-modal mixtures of factorized distributions. We present results for both inference and learning to demonstrate the effectiveness of this approach.
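
Concretely, the three distributions involved are, in standard notation (ours): the Boltzmann machine itself, its uni-modal mean-field approximation, and the multi-modal mixture of factorized distributions proposed here:

```latex
p(s) = \frac{1}{Z}\exp\Bigl(\sum_{i<j} w_{ij}\, s_i s_j + \sum_i b_i\, s_i\Bigr),
\qquad
q_{\mathrm{MF}}(s) = \prod_i q_i(s_i),
\qquad
q_{\mathrm{mix}}(s) = \sum_{m} \pi_m \prod_i q_i^{(m)}(s_i)
```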

Proceedings Article
24 Jul 1998
TL;DR: This paper analyzes irrelevance and independence relations in graphical models associated with convex sets of probability distributions (called Quasi-Bayesian networks) and presents algorithms and results for inferences with the so-called natural extensions using fractional linear programming; the properties of the type-1 extensions are clarified through a new generalization of d-separation.
Abstract: This paper analyzes irrelevance and independence relations in graphical models associated with convex sets of probability distributions (called Quasi-Bayesian networks). The basic question in Quasi-Bayesian networks is, How can irrelevance/independence relations in Quasi-Bayesian networks be detected, enforced and exploited? This paper addresses these questions through Walley's definitions of irrelevance and independence. Novel algorithms and results are presented for inferences with the so-called natural extensions using fractional linear programming, and the properties of the so-called type-1 extensions are clarified through a new generalization of d-separation.

Book
01 Jan 1998
TL;DR: In this paper, a branch-and-bound search scheme is devised to expedite the search for the most likely damage event without exhaustively examining all possible damage cases, and load-dependent Ritz vectors are incorporated into the Bayesian framework.
Abstract: There have been increased economic and societal demands to periodically monitor the safety of structures against long-term deterioration, and to ensure their safety and adequate performance during the life span of the structures. In this work, a Bayesian probabilistic framework for damage detection is proposed for the continuous monitoring of structures. The idea is to search for the most probable damage event by comparing the relative probabilities for different damage scenarios. The formulation of the relative posterior probability is based on an output error, which is defined as the difference between the estimated vibration parameters and the theoretical ones from the analytical model. The Bayesian approach is shown (1) to take into account the uncertainties in the measurement and the analytical modeling, (2) to perform damage diagnosis with a relatively small number of measurement points and a few modes, and (3) to systematically extract information from continuously obtained test data. A branch-and-bound search scheme is devised to expedite the search for the most likely damage event without exhaustively examining all possible damage cases. As an alternative to modal vectors, load-dependent Ritz vectors are incorporated into the Bayesian framework. The following advantages of Ritz vectors over modal vectors are shown: (1) in general, load-dependent Ritz vectors are more sensitive to damage than the corresponding modal vectors, and (2) by a careful selection of load patterns, substructures of interest can be made more observable. Furthermore, a procedure to extract Ritz vectors from vibration tests is proposed, and the procedure is successfully demonstrated using experimental test data.

Journal Article
TL;DR: This paper implements the search procedure as a genetic algorithm and proposes a crossover operator which operates on subgraphs; the procedure is shown to perform better than an automatic backward elimination procedure at the cost of a small increase in computational time.
Abstract: Graphical log-linear model search is usually performed by using stepwise procedures in which edges are sequentially added to or eliminated from the independence graph. In this paper we implement the search procedure as a genetic algorithm and propose a crossover operator which operates on subgraphs. In a simulation study the proposed procedure is shown to perform better than an automatic backward elimination procedure at the cost of a small increase in computational time.
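
One plausible reading of a crossover operator that "operates on subgraphs" is sketched below: recombination is restricted to a random vertex subset, inside which the child inherits parent 1's edges and outside which it inherits parent 2's. This is an illustrative assumption of ours, not necessarily the authors' exact operator:

```python
import random

def subgraph_crossover(edges1, edges2, nodes):
    """Recombine two independence graphs, given as sets of undirected
    edges (2-tuples of node names): copy parent 1 inside a random
    vertex subset S and parent 2 everywhere else."""
    S = {v for v in nodes if random.random() < 0.5}
    inside = lambda e: e[0] in S and e[1] in S
    return {e for e in edges1 if inside(e)} | {e for e in edges2 if not inside(e)}

# Toy usage on four variables.
g1 = {("a", "b"), ("b", "c")}
g2 = {("a", "b"), ("c", "d")}
child = subgraph_crossover(g1, g2, "abcd")
```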

Proceedings Article
01 Jul 1998
TL;DR: A broad class of structured probabilistic representations that extend Bayesian networks to complex domains involving many interacting, temporally evolving components is described, and it is argued that these representations can form the basis for agents that reason and act in complex uncertain environments.
Abstract: For many years, probabilistic models were largely neglected within the AI community. Now they play a fundamental role in many areas in AI, including diagnosis, planning, and learning. One of the crucial reasons for this transition is the use of structured model-based representations such as Bayesian networks. Building on this idea, we can extend the success of probabilistic modeling to much more complex domains, ones involving many components that interact and evolve over time. These domains are significantly beyond the scope of traditional Bayesian networks. I describe a broad class of structured probabilistic representations that extend Bayesian networks to deal with these new challenges. I argue that these representations can form the basis for agents that reason and act in complex uncertain environments.

Proceedings Article
21 Jun 1998
TL;DR: In this article, a subclass of relational Bayesian networks is identified that defines distributions with convergence laws for first-order properties, and the convergence properties of the distributions defined in this manner are investigated from a finite model theory perspective.
Abstract: Relational Bayesian networks are an extension of the method of probabilistic model construction by Bayesian networks. They define probability distributions on finite relational structures by conditioning the probability of a ground atom r(a_1, ..., a_n) on first-order properties of a_1, ..., a_n that have been established by previous random decisions. In this paper we investigate from a finite model theory perspective the convergence properties of the distributions defined in this manner. A subclass of relational Bayesian networks is identified that defines distributions with convergence laws for first-order properties.