
Showing papers presented at "Uncertainty in Artificial Intelligence in 1999"


Proceedings Article•
30 Jul 1999
TL;DR: This work proposes a more principled alternative to Latent Semantic Analysis, based on a mixture decomposition derived from a latent class model with a solid foundation in statistics, together with a widely applicable generalization of maximum likelihood model fitting by tempered EM to avoid overfitting.
Abstract: Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed method is based on a mixture decomposition derived from a latent class model. This results in a more principled approach which has a solid foundation in statistics. In order to avoid overfitting, we propose a widely applicable generalization of maximum likelihood model fitting by tempered EM. Our approach yields substantial and consistent improvements over Latent Semantic Analysis in a number of experiments.
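The tempered EM fitting procedure the abstract refers to can be sketched compactly. The following is a minimal NumPy illustration; the variable names, the fixed inverse-temperature default, and the small smoothing constant are ours, not the paper's (the paper anneals beta on held-out data):

```python
import numpy as np

def plsa_tempered_em(counts, n_topics, beta=0.9, n_iter=50, seed=0):
    """Tempered EM for pLSA on a document-word count matrix n(d, w).
    beta < 1 dampens the E-step posteriors (beta = 1 is standard EM)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_z = np.full(n_topics, 1.0 / n_topics)           # P(z)
    p_d_z = rng.random((n_topics, n_docs))            # P(d|z)
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))           # P(w|z)
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: tempered posterior P(z|d,w) ∝ [P(z) P(d|z) P(w|z)]^beta.
        joint = (p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]) ** beta
        post = joint / joint.sum(axis=0, keepdims=True)
        # M-step: re-estimate all three distributions from expected counts.
        ess = counts[None, :, :] * post               # n(d,w) * P(z|d,w)
        p_w_z = ess.sum(axis=1) + 1e-12
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_d_z = ess.sum(axis=2) + 1e-12
        p_d_z /= p_d_z.sum(axis=1, keepdims=True)
        p_z = ess.sum(axis=(1, 2)) + 1e-12
        p_z /= p_z.sum()
    return p_z, p_d_z, p_w_z
```

Setting beta below 1 flattens the E-step posteriors, which is what counteracts overfitting in the paper's experiments.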

2,306 citations


Proceedings Article•
30 Jul 1999
TL;DR: This paper compares the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR, and finds that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals.
Abstract: Recently, researchers have demonstrated that "loopy belief propagation" -- the use of Pearl's polytree algorithm in a Bayesian network with loops -- can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performance of "Turbo Codes" -- codes whose decoding algorithm is equivalent to loopy belief propagation in a chain-structured Bayesian network. In this paper we ask: is there something special about the error-correcting code context, or does loopy propagation work as an approximate inference scheme in a more general setting? We compare the marginals computed using loopy propagation to the exact ones in four Bayesian network architectures, including two real-world networks: ALARM and QMR. We find that the loopy beliefs often converge and when they do, they give a good approximation to the correct marginals. However, on the QMR network, the loopy beliefs oscillated and had no obvious relationship to the correct posteriors. We present some initial investigations into the cause of these oscillations, and show that some simple methods of preventing them lead to the wrong results.
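A minimal version of the procedure under study, written for a pairwise binary Markov random field (the dictionary-based encoding is our illustration, not the paper's code). On networks without loops these are exactly Pearl's exact updates; running them unchanged on a graph with loops is what "loopy" propagation means:

```python
import numpy as np

def loopy_bp(unary, pairwise, n_iter=50):
    """Synchronous sum-product belief propagation on a pairwise binary MRF.
    unary: {i: array(2)}; pairwise: {(i, j): array(2, 2)} with i < j.
    Returns (approximate) single-node marginals."""
    msgs = {}
    for (i, j) in pairwise:                      # uniform initial messages
        msgs[(i, j)] = np.ones(2) / 2
        msgs[(j, i)] = np.ones(2) / 2
    for _ in range(n_iter):
        new = {}
        for (src, dst) in msgs:
            # Orient the edge potential as psi[x_src, x_dst].
            psi = pairwise[(src, dst)] if (src, dst) in pairwise else pairwise[(dst, src)].T
            # Product of the local potential and all incoming messages except dst's.
            h = unary[src].copy()
            for (a, b) in msgs:
                if b == src and a != dst:
                    h = h * msgs[(a, b)]
            m = h @ psi                          # marginalise out x_src
            new[(src, dst)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in unary:
        b = unary[i].copy()
        for (a, c) in msgs:
            if c == i:
                b = b * msgs[(a, c)]
        beliefs[i] = b / b.sum()
    return beliefs
```

Whether these messages converge on a loopy graph, and how close the resulting beliefs then are to the true marginals, is precisely what the paper evaluates empirically on ALARM and QMR.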

1,532 citations


Proceedings Article•
30 Jul 1999
TL;DR: An algorithm that achieves faster learning by restricting the search space, which restricts the parents of each variable to belong to a small subset of candidates and is evaluated both on synthetic and real-life data.
Abstract: Learning Bayesian networks is often cast as an optimization problem, where the computational task is to find a structure that maximizes a statistically motivated score. By and large, existing learning tools address this optimization problem using standard heuristic search techniques. Since the search space is extremely large, such search procedures can spend most of the time examining candidates that are extremely unreasonable. This problem becomes critical when we deal with data sets that are large either in the number of instances, or the number of attributes. In this paper, we introduce an algorithm that achieves faster learning by restricting the search space. This iterative algorithm restricts the parents of each variable to belong to a small subset of candidates. We then search for a network that satisfies these constraints. The learned network is then used for selecting better candidates for the next iteration. We evaluate this algorithm both on synthetic and real-life data. Our results show that it is significantly faster than alternative search procedures without loss of quality in the learned structures.

637 citations


Proceedings Article•
Hagai Attias1•
30 Jul 1999
TL;DR: The Variational Bayes framework as discussed by the authors approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods.
Abstract: Current methods for learning graphical models with latent variables and a fixed structure estimate optimal values for the model parameters. Whereas this approach usually produces overfitting and suboptimal generalization performance, carrying out the Bayesian program of computing the full posterior distributions over the parameters remains a difficult problem. Moreover, learning the structure of models with latent variables, for which the Bayesian approach is crucial, is yet a harder problem. In this paper I present the Variational Bayes framework, which provides a solution to these problems. This approach approximates full posterior distributions over model parameters and structures, as well as latent variables, in an analytical manner without resorting to sampling methods. Unlike in the Laplace approximation, these posteriors are generally non-Gaussian and no Hessian needs to be computed. The resulting algorithm generalizes the standard Expectation Maximization algorithm, and its convergence is guaranteed. I demonstrate that this algorithm can be applied to a large class of models in several domains, including unsupervised clustering and blind source separation.

615 citations


Proceedings Article•
Eric Horvitz1, Andy Jacobs1, David O. Hovel1•
30 Jul 1999
TL;DR: In this paper, the authors introduce utility-directed procedures for mediating the flow of potentially distracting alerts and communications to computer users, and present models and inference procedures that balance the context-sensitive costs of deferring alerts with the cost of interruption.
Abstract: We introduce utility-directed procedures for mediating the flow of potentially distracting alerts and communications to computer users. We present models and inference procedures that balance the context-sensitive costs of deferring alerts with the cost of interruption. We describe the challenge of reasoning about such costs under uncertainty via an analysis of user activity and the content of notifications. After introducing principles of attention-sensitive alerting, we focus on the problem of guiding alerts about email messages. We dwell on the problem of inferring the expected criticality of email and discuss work on the PRIORITIES system, centering on prioritizing email by criticality and modulating the communication of notifications to users about the presence and nature of incoming email.
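The core cost-balancing test the abstract describes reduces to an expected-utility comparison. The sketch below is our hypothetical rendering; all names and the linear deferral-cost model are illustrative assumptions, not the actual PRIORITIES interface:

```python
def should_alert_now(p_critical, cost_interruption,
                     cost_deferral_per_min, minutes_until_free):
    """Alert immediately only if the expected cost of deferring the
    notification until the user is free exceeds the context-sensitive
    cost of interrupting them now. All inputs share one utility scale."""
    expected_deferral_cost = p_critical * cost_deferral_per_min * minutes_until_free
    return expected_deferral_cost > cost_interruption
```

In the paper, both the criticality probability and the interruption cost are themselves inferred under uncertainty from user activity and message content.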

469 citations


Proceedings Article•
30 Jul 1999
TL;DR: In this paper, a value iteration algorithm for factored Markov decision processes (MDPs) with large state spaces has been proposed to allow dynamic programming to be applied without the need for complete state enumeration.
Abstract: Structured methods for solving factored Markov decision processes (MDPs) with large state spaces have recently been proposed to allow dynamic programming to be applied without the need for complete state enumeration. We propose and examine a new value iteration algorithm for MDPs that uses algebraic decision diagrams (ADDs) to represent value functions and policies, assuming an ADD input representation of the MDP. Dynamic programming is implemented via ADD manipulation. We demonstrate our method on a class of large MDPs (up to 63 million states) and show that significant gains can be had when compared to tree-structured representations (with up to a thirty-fold reduction in the number of nodes required to represent optimal value functions).
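For reference, the underlying dynamic-programming computation is ordinary value iteration; the paper's contribution is performing these Bellman backups on ADD-compressed value functions rather than on the flat tables used in this sketch:

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular value iteration.
    P: (A, S, S) transition probabilities, R: (A, S) rewards.
    Returns the (near-)optimal value function and greedy policy."""
    A, S, _ = P.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * P @ V          # Bellman backup, shape (A, S)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new
```

With an ADD representation, all states sharing the same value are represented by one node, which is where the reported thirty-fold node reductions come from.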

416 citations


Journal Article•DOI•
30 Jul 1999
TL;DR: In this article, the authors consider on-line density estimation with a parameterized density from the exponential family and prove bounds on the additional total loss of the on-line algorithm over the total loss of the off-line algorithm.
Abstract: We consider on-line density estimation with a parameterized density from the exponential family. The on-line algorithm receives one example at a time and maintains a parameter that is essentially an average of the past examples. After receiving an example the algorithm incurs a loss which is the negative log-likelihood of the example w.r.t. the past parameter of the algorithm. An off-line algorithm can choose the best parameter based on all the examples. We prove bounds on the additional total loss of the on-line algorithm over the total loss of the off-line algorithm. These relative loss bounds hold for an arbitrary sequence of examples. The goal is to design algorithms with the best possible relative loss bounds. We use a certain divergence to derive and analyze the algorithms. This divergence is a relative entropy between two exponential distributions.
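A concrete instance of the protocol, for a Bernoulli density (the Laplace-style smoothing that keeps the predicted parameter strictly inside (0, 1) is our addition for numerical safety, not part of the paper's analysis):

```python
import math

def online_bernoulli_loss(xs, a=1.0, b=1.0):
    """On-line protocol: predict with the (smoothed) average of past
    examples, then incur the negative log-likelihood of the new example."""
    total, heads, n = 0.0, a, a + b
    for x in xs:
        p = heads / n                  # parameter = average of past examples
        total += -math.log(p if x == 1 else 1 - p)
        heads += x
        n += 1
    return total

def offline_bernoulli_loss(xs):
    """Comparator: best fixed parameter in hindsight (the empirical mean)."""
    p = sum(xs) / len(xs)
    if p in (0.0, 1.0):
        return 0.0
    return sum(-math.log(p if x == 1 else 1 - p) for x in xs)
```

The paper bounds the gap between these two totals (the relative loss) for arbitrary example sequences and general exponential families.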

344 citations


Proceedings Article•
30 Jul 1999
TL;DR: In this article, the authors empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers - Naive-Bayes, tree augmented Naive-Bayes, BN augmented Naive-Bayes and general BNs, where the latter two are learned using two variants of a conditional independence (CI) based BN-learning algorithm.
Abstract: In this paper, we empirically evaluate algorithms for learning four types of Bayesian network (BN) classifiers - Naive-Bayes, tree augmented Naive-Bayes, BN augmented Naive-Bayes and general BNs, where the latter two are learned using two variants of a conditional-independence (CI) based BN-learning algorithm. Experimental results show the obtained classifiers, learned using the CI based algorithms, are competitive with (or superior to) the best known classifiers, based on both Bayesian networks and other formalisms; and that the computational time for learning and using these classifiers is relatively small. Moreover, these results also suggest a way to learn yet more effective classifiers; we demonstrate empirically that this new algorithm does work as expected. Collectively, these results argue that BN classifiers deserve more attention in the machine learning and data mining communities.

343 citations


Proceedings Article•
30 Jul 1999
TL;DR: This work proposes a graphical representation of preferences that reflects conditional dependence and independence of preference statements under a ceteris paribus (all else being equal) interpretation, and describes several search algorithms for dominance testing based on this representation.
Abstract: In many domains it is desirable to assess the preferences of users in a qualitative rather than quantitative way. Such representations of qualitative preference orderings form an important component of automated decision tools. We propose a graphical representation of preferences that reflects conditional dependence and independence of preference statements under a ceteris paribus (all else being equal) interpretation. Such a representation is often compact and arguably natural. We describe several search algorithms for dominance testing based on this representation; these algorithms are quite effective, especially in specific network topologies, such as chain- and tree-structured networks, as well as polytrees.

315 citations


Proceedings Article•
30 Jul 1999
TL;DR: This paper proposes Efron's Bootstrap as a computationally efficient approach for answering confidence measures on features of Bayesian networks, and proposes to use these confidence measures to induce better structures from the data, and to detect the presence of latent variables.
Abstract: In recent years there has been significant progress in algorithms and methods for inducing Bayesian networks from data. However, in complex data analysis problems, we need to go beyond being satisfied with inducing networks with high scores. We need to provide confidence measures on features of these networks: Is the existence of an edge between two nodes warranted? Is the Markov blanket of a given node robust? Can we say something about the ordering of the variables? We should be able to address these questions, even when the amount of data is not enough to induce a high scoring network. In this paper we propose Efron's Bootstrap as a computationally efficient approach for answering these questions. In addition, we propose to use these confidence measures to induce better structures from the data, and to detect the presence of latent variables.
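The bootstrap machinery itself is generic. Below is a sketch of the non-parametric version with the structure-learning step abstracted into a black-box feature test (this interface is our illustration; the paper applies it to features such as edges and Markov blankets):

```python
import numpy as np

def bootstrap_confidence(data, feature, n_boot=200, seed=0):
    """Non-parametric bootstrap: resample the cases with replacement,
    re-run the feature test (e.g. 'is edge X->Y in the induced network?')
    on each replicate, and report the fraction of replicates where it holds."""
    rng = np.random.default_rng(seed)
    n = len(data)
    hits = 0
    for _ in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]  # resample rows
        hits += bool(feature(sample))
    return hits / n_boot
```

Features that survive resampling with high frequency are the ones the paper treats as warranted even when no single high-scoring network can be induced.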

293 citations


Proceedings Article•
30 Jul 1999
TL;DR: In this article, the authors investigate ways to represent and reason about this uncertainty in algorithms where the system attempts to learn a model of its environment and explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these.
Abstract: Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information - the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways to represent and reason about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation.

Proceedings Article•
30 Jul 1999
TL;DR: In this article, the authors extend the VAPS algorithm to the problem of learning general finite-state automata, and show that stochastic gradient descent can converge to a locally optimal finite-state controller.
Abstract: Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.

Proceedings Article•
30 Jul 1999
TL;DR: The learning method was applied to predict the causal structure and to estimate the causal parameters that exist among randomly selected pairs of nodes in ALARM that are not confounded.
Abstract: This paper describes a Bayesian method for combining an arbitrary mixture of observational and experimental data in order to learn causal Bayesian networks. Observational data are passively observed. Experimental data, such as that produced by randomized controlled trials, result from the experimenter manipulating one or more variables (typically randomly) and observing the states of other variables. The paper presents a Bayesian method for learning the causal structure and parameters of the underlying causal process that is generating the data, given that (1) the data contains a mixture of observational and experimental case records, and (2) the causal process is modeled as a causal Bayesian network. This learning method was applied using as input various mixtures of experimental and observational data that were generated from the ALARM causal Bayesian network. In these experiments, the absolute and relative quantities of experimental and observational data were varied systematically. For each of these training datasets, the learning method was applied to predict the causal structure and to estimate the causal parameters that exist among randomly selected pairs of nodes in ALARM that are not confounded. The paper reports how these structure predictions and parameter estimates compare with the true causal structures and parameters as given by the ALARM network.

Proceedings Article•
30 Jul 1999
TL;DR: A new unified approach is presented that combines approximate inference and the clique tree algorithm, thereby circumventing the need to maintain an exact representation of the clique potentials.
Abstract: The clique tree algorithm is the standard method for doing inference in Bayesian networks. It works by manipulating clique potentials - distributions over the variables in a clique. While this approach works well for many networks, it is limited by the need to maintain an exact representation of the clique potentials. This paper presents a new unified approach that combines approximate inference and the clique tree algorithm, thereby circumventing this limitation. Many known approximate inference algorithms can be viewed as instances of this approach. The algorithm essentially does clique tree propagation, using approximate inference to estimate the densities in each clique. In many settings, the computation of the approximate clique potential can be done easily using statistical importance sampling. Iterations are used to gradually improve the quality of the estimation.

Proceedings Article•
30 Jul 1999
TL;DR: In this paper, the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size, is studied and a branch-and-bound method for finding globally optimal deterministic policies is proposed.
Abstract: Solving partially observable Markov decision processes (POMDPs) is highly intractable in general, at least in part because the optimal policy may be infinitely large. In this paper, we explore the problem of finding the optimal policy from a restricted set of policies, represented as finite state automata of a given size. This problem is also intractable, but we show that the complexity can be greatly reduced when the POMDP and/or policy are further constrained. We demonstrate good empirical results with a branch-and-bound method for finding globally optimal deterministic policies, and a gradient-ascent method for finding locally optimal stochastic policies.

Proceedings Article•
30 Jul 1999
TL;DR: A new abductive, probabilistic theory of plan recognition that accounts for phenomena omitted from most previous plan recognition theories: notably the cumulative effect of a sequence of observations of partially-ordered, interleaved plans and the effect of context on plan adoption.
Abstract: We present a new abductive, probabilistic theory of plan recognition. This model differs from previous theories in being centered around a model of plan execution: most previous methods have been based on plans as formal objects or on rules describing the recognition process. We show that our new model accounts for phenomena omitted from most previous plan recognition theories: notably the cumulative effect of a sequence of observations of partially-ordered, interleaved plans and the effect of context on plan adoption. The model also supports inferences about the evolution of plan execution in situations where another agent intervenes in plan execution. This facility provides support for using plan recognition to build systems that will intelligently assist a user.

Proceedings Article•
30 Jul 1999
TL;DR: Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.
Abstract: As observations and student models become complex, educational assessments that exploit advances in technology and cognitive psychology can outstrip familiar testing models and analytic methods. Within the Portal conceptual framework for assessment design, Bayesian inference networks (BINs) record beliefs about students' knowledge and skills, in light of what they say and do. Joining evidence model BIN fragments--which contain observable variables and pointers to student model variables--to the student model allows one to update belief about knowledge and skills as observations arrive. Markov Chain Monte Carlo (MCMC) techniques can estimate the required conditional probabilities from empirical data, supplemented by expert judgment or substantive theory. Details for the special cases of item response theory (IRT) and multivariate latent class modeling are given, with a numerical example of the latter.

Proceedings Article•
30 Jul 1999
TL;DR: In this article, the authors present examples where the use of belief functions provided sound and elegant solutions to real-life problems characterized by "missing" information, i.e., problems where classes are only partially known.
Abstract: We present examples where the use of belief functions provided sound and elegant solutions to real-life problems. These are essentially characterized by 'missing' information. The examples deal with 1) discriminant analysis using a learning set where classes are only partially known; 2) an information retrieval system handling inter-document relationships; 3) the combination of data from sensors competent on partially overlapping frames; 4) the determination of the number of sources in a multi-sensor environment by studying the inter-sensor contradiction. The purpose of the paper is to report on such applications where the use of belief functions provides a convenient tool to handle 'messy' data problems.

Proceedings Article•
30 Jul 1999
TL;DR: In this article, the authors consider the problem of learning the maximum likelihood polytree from data and show that the problem is NP-hard even to approximately solve within some constant factor.
Abstract: We consider the task of learning the maximum-likelihood polytree from data. Our first result is a performance guarantee establishing that the optimal branching (or Chow-Liu tree), which can be computed very easily, constitutes a good approximation to the best polytree. We then show that it is not possible to do very much better, since the learning problem is NP-hard even to approximately solve within some constant factor.
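The optimal branching (Chow-Liu tree) that the abstract cites as the easy-to-compute approximation is a maximum spanning tree under pairwise empirical mutual information. A compact sketch, using Kruskal's algorithm with a union-find structure:

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information between two discrete data columns (nats)."""
    mi = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            px, py = np.mean(x == a), np.mean(y == b)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    """Maximum-likelihood tree: maximum spanning tree over pairwise MI."""
    n_vars = data.shape[1]
    weighted = sorted(
        ((mutual_information(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))
    def find(i):                       # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    tree = []
    for _, i, j in weighted:           # Kruskal: heaviest edges first
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

The paper's point is that this easily computed tree already approximates the best polytree well, and that doing substantially better is NP-hard.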

Proceedings Article•
30 Jul 1999
TL;DR: SPOOK implements a more expressive language that allows it to represent the battlespace domain naturally and compactly, and presents a new inference algorithm that utilizes the model structure in a fundamental way, and shows empirically that it achieves orders of magnitude speedup over existing approaches.
Abstract: In previous work, we pointed out the limitations of standard Bayesian networks as a modeling framework for large, complex domains. We proposed a new, richly structured modeling language, Object-oriented Bayesian Networks, that we argued would be able to deal with such domains. However, it turns out that OOBNs are not expressive enough to model many interesting aspects of complex domains: the existence of specific named objects, arbitrary relations between objects, and uncertainty over domain structure. These aspects are crucial in real-world domains such as battlefield awareness. In this paper, we present SPOOK, an implemented system that addresses these limitations. SPOOK implements a more expressive language that allows it to represent the battlespace domain naturally and compactly. We present a new inference algorithm that utilizes the model structure in a fundamental way, and show empirically that it achieves orders of magnitude speedup over existing approaches.

Proceedings Article•
30 Jul 1999
TL;DR: A hybrid constraint-based/ Bayesian algorithm for learning causal networks in the presence of sparse data is presented and found to consistently outperform two variations of greedy search with restarts.
Abstract: We present a hybrid constraint-based/Bayesian algorithm for learning causal networks in the presence of sparse data. The algorithm searches the space of equivalence classes of models (essential graphs) using a heuristic based on conventional constraint-based techniques. Each essential graph is then converted into a directed acyclic graph and scored using a Bayesian scoring metric. Two variants of the algorithm are developed and tested using data from randomly generated networks of sizes from 15 to 45 nodes with data sizes ranging from 250 to 2000 records. Both variations are compared to, and found to consistently outperform, two variations of greedy search with restarts.

Proceedings Article•
30 Jul 1999
TL;DR: In this article, a new method for probability elicitation from domain experts is proposed, which combines various ideas, among which are the ideas of transcribing probabilities and of using a scale with both numerical and verbal anchors for marking assessments.
Abstract: In building Bayesian belief networks, the elicitation of all probabilities required can be a major obstacle. We learned the extent of this often-cited observation in the construction of the probabilistic part of a complex influence diagram in the field of cancer treatment. Based upon our negative experiences with existing methods, we designed a new method for probability elicitation from domain experts. The method combines various ideas, among which are the ideas of transcribing probabilities and of using a scale with both numerical and verbal anchors for marking assessments. In the construction of the probabilistic part of our influence diagram, the method proved to allow for the elicitation of many probabilities in little time.

Proceedings Article•
Yishay Mansour1, Satinder Singh1•
30 Jul 1999
TL;DR: This paper proves the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy.
Abstract: Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial, worst-case, upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
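For concreteness, the algorithm whose iteration count is being bounded is Howard's policy iteration; here is a tabular sketch with exact policy evaluation via a linear solve:

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration: evaluate the current policy exactly by solving
    V = R_pi + gamma * P_pi V, then improve greedily, until stable.
    P: (A, S, S) transitions, R: (A, S) rewards."""
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        P_pi = P[policy, np.arange(S)]          # (S, S) under current policy
        R_pi = R[policy, np.arange(S)]          # (S,)
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        Q = R + gamma * P @ V                   # (A, S)
        new_policy = Q.argmax(axis=0)           # greedy improvement
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy
```

Each iteration is one pass through this loop; the paper's contribution is a worst-case bound on how many such passes are needed, independent of the discount factor.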

Proceedings Article•
30 Jul 1999
TL;DR: In this paper, a variational approximation to the logistic function is proposed to perform approximate inference in Bayesian networks containing discrete nodes with continuous parents, which is much faster than sampling but comparable in accuracy.
Abstract: We show how to use a variational approximation to the logistic function to perform approximate inference in Bayesian networks containing discrete nodes with continuous parents. Essentially, we convert the logistic function to a Gaussian, which facilitates exact inference, and then iteratively adjust the variational parameters to improve the quality of the approximation. We demonstrate experimentally that this approximation is much faster than sampling, but comparable in accuracy. We also introduce a simple new technique for handling evidence, which allows us to handle arbitrary distributions on observed nodes, as well as achieving a significant speedup in networks with discrete variables of large cardinality.
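The standard variational approximation of this kind is the Jaakkola-Jordan bound (we assume this is the one meant), which replaces the logistic function by a lower bound that is Gaussian in its argument, and hence compatible with exact Gaussian inference. A sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_lower_bound(x, xi):
    """Jaakkola-Jordan variational lower bound on the logistic function:
      sigma(x) >= sigma(xi) * exp((x - xi)/2 - lam(xi) * (x**2 - xi**2)),
    with lam(xi) = tanh(xi/2) / (4 * xi). The bound is quadratic in x in
    log-space (i.e. Gaussian-shaped) and exact at x = +/- xi."""
    lam = math.tanh(xi / 2.0) / (4.0 * xi)
    return sigmoid(xi) * math.exp((x - xi) / 2.0 - lam * (x * x - xi * xi))
```

Iteratively re-tightening the variational parameter xi at each node is the adjustment step the abstract describes.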

Proceedings Article•
30 Jul 1999
TL;DR: A novel representation called Temporal Nodes Bayesian Network (TNBN) is proposed, in which each node represents an event or state change of a variable, and an arc corresponds to a causal-temporal relation.
Abstract: Diagnosis and prediction in some domains, like medical and industrial diagnosis, require a representation that combines uncertainty management and temporal reasoning. Based on the fact that in many cases there are few state changes in the temporal range of interest, we propose a novel representation called Temporal Nodes Bayesian Network (TNBN). In a TNBN each node represents an event or state change of a variable, and an arc corresponds to a causal-temporal relation. The temporal intervals can differ in number and size for each temporal node, so this allows multiple granularity. Our approach is contrasted with a dynamic Bayesian network for a simple medical example. An empirical evaluation is presented for a more complex problem, a subsystem of a fossil power plant, in which this approach is used for fault diagnosis and event prediction with good results.

Proceedings Article•
David McAllester1, Satinder Singh1•
30 Jul 1999
TL;DR: In this article, a planning algorithm for factored POMDPs is proposed that exploits the accuracy-efficiency tradeoff in the belief state simplification introduced by Boyen and Koller.
Abstract: We are interested in the problem of planning for factored POMDPs. Building on the recent results of Kearns, Mansour and Ng, we provide a planning algorithm for factored POMDPs that exploits the accuracy-efficiency tradeoff in the belief state simplification introduced by Boyen and Koller.

Proceedings Article•
30 Jul 1999
TL;DR: This paper presents a set of conditions which are necessary and sufficient to ensure that a partial influence diagram is well-defined, and uses these conditions as a basis for the construction of an algorithm for determining whether or not a partial influence diagram is well-defined.
Abstract: Influence diagrams serve as a powerful tool for modelling symmetric decision problems. When solving an influence diagram we determine a set of strategies for the decisions involved. A strategy for a decision variable is in principle a function over its past. However, some of the past may be irrelevant for the decision, and for computational reasons it is important not to deal with redundant variables in the strategies. We show that current methods (e.g. the Decision Bayes-ball algorithm [Shachter, 1998]) do not determine the relevant past, and we present a complete algorithm. Actually, this paper takes a more general outset: when formulating a decision scenario as an influence diagram, a linear temporal ordering of the decision variables is required. This constraint ensures that the decision scenario is well-defined. However, the structure of a decision scenario often yields certain decisions conditionally independent, and it is therefore unnecessary to impose a linear temporal ordering on the decisions. In this paper we deal with partial influence diagrams, i.e. influence diagrams with only a partial temporal ordering specified. We present a set of conditions which are necessary and sufficient to ensure that a partial influence diagram is well-defined. These conditions are used as a basis for the construction of an algorithm for determining whether or not a partial influence diagram is well-defined.

Proceedings Article•
30 Jul 1999
TL;DR: Stochastic search approaches for learning Bayesian networks from incomplete data are described, including a new stochastic algorithm and an adaptive mutation operator and it is shown they all produce accurate results.
Abstract: This paper describes stochastic search approaches, including a new stochastic algorithm and an adaptive mutation operator, for learning Bayesian networks from incomplete data. This problem is characterized by a huge solution space with a highly multimodal landscape. State-of-the-art approaches all involve using deterministic approaches such as the expectation-maximization algorithm. These approaches are guaranteed to find local maxima, but do not explore the landscape for other modes. Our approach evolves structure and the missing data. We compare our stochastic algorithms and show they all produce accurate results.

Proceedings Article•
30 Jul 1999
TL;DR: In this article, the structural expectation maximization (SEM) algorithm is used to learn the structure of dynamic Bayesian networks, particularly those where some relevant variables are partially observed or even entirely unknown.
Abstract: Dynamic Bayesian networks provide a compact and natural representation for complex dynamic systems. However, in many cases, there is no expert available from whom a model can be elicited. Learning provides an alternative approach for constructing models of dynamic systems. In this paper, we address some of the crucial computational aspects of learning the structure of dynamic systems, particularly those where some relevant variables are partially observed or even entirely unknown. Our approach is based on the Structural Expectation Maximization (SEM) algorithm. The main computational cost of the SEM algorithm is the gathering of expected sufficient statistics. We propose a novel approximation scheme that allows these sufficient statistics to be computed efficiently. We also investigate the fundamental problem of discovering the existence of hidden variables without exhaustive and expensive search. Our approach is based on the observation that, in dynamic systems, ignoring a hidden variable typically results in a violation of the Markov property. Thus, our algorithm searches for such violations in the data, and introduces hidden variables to explain them. We provide empirical results showing that the algorithm is able to learn the dynamics of complex systems in a computationally tractable way.

Proceedings Article•
James Cussens1•
30 Jul 1999
TL;DR: This work shows how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data and compares the presented framework with other approaches to first-order probabilistic reasoning.
Abstract: Recent work on loglinear models in probabilistic constraint logic programming is applied to first-order probabilistic reasoning. Probabilities are defined directly on the proofs of atomic formulae, and by marginalisation on the atomic formulae themselves. We use Stochastic Logic Programs (SLPs) composed of labelled and unlabelled definite clauses to define the proof probabilities. We have a conservative extension of first-order reasoning, so that, for example, there is a one-one mapping between logical and random variables. We show how, in this framework, Inductive Logic Programming (ILP) can be used to induce the features of a loglinear model from data. We also compare the presented framework with other approaches to first-order probabilistic reasoning.