
Showing papers by "Thomas G. Dietterich published in 2012"


Proceedings Article
03 Dec 2012
TL;DR: A probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), is proposed, derived from the logistic stick-breaking process, to solve the superset label learning problem by maximizing the likelihood of the candidate label sets of training instances.
Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions.
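
To make the generative story concrete, here is a minimal NumPy sketch of the LSB-CMM sampling process. The parameter shapes (W for the stick-breaking logistics, theta for the per-component multinomials) are hypothetical choices for illustration; the paper's actual contribution, fitting these parameters by maximizing the likelihood of candidate label sets, is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsb_cmm_sample(x, W, theta):
    """Sample (component, label) for input x under the LSB-CMM generative story.

    W: (K-1, d) weights, one logistic "stick break" per component (hypothetical).
    theta: (K, L) per-component multinomial label distributions (hypothetical).
    """
    K, L = theta.shape
    v = 1.0 / (1.0 + np.exp(-(W @ x)))       # logistic break points in [0, 1]
    pi = np.empty(K)
    remaining = 1.0
    for k in range(K - 1):                   # stick-breaking: each component
        pi[k] = v[k] * remaining             # takes a fraction of what remains
        remaining *= 1.0 - v[k]
    pi[-1] = remaining                       # last component absorbs the rest
    z = rng.choice(K, p=pi)                  # map x to a mixture component
    y = rng.choice(L, p=theta[z])            # component-specific multinomial label
    return z, y
```

Training would then maximize the probability that the generated label falls inside each instance's candidate set, which requires the inference machinery developed in the paper.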

165 citations


Proceedings Article
14 Aug 2012
TL;DR: This paper considers active imitation learning with the goal of reducing this effort by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner's interactions with an environment simulator.
Abstract: In standard passive imitation learning, the goal is to learn a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner's interactions with an environment simulator. We introduce a new approach based on reducing active imitation learning to i.i.d. active learning, which can leverage progress in the i.i.d. setting. Our first contribution is to analyze reductions for both non-stationary and stationary policies, showing that the label complexity (number of queries) of active imitation learning can be substantially less than that of passive learning. Our second contribution is to introduce a practical algorithm inspired by the reductions, which is shown to be highly effective in four test domains compared to a number of alternatives.
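
As a rough illustration of the query-based setting, here is a schematic uncertainty-sampling loop. This is not the paper's exact reduction (which maps active imitation learning to i.i.d. active learning over the induced state distribution); `simulate` and `expert` are assumed interfaces.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_imitation(simulate, expert, n_queries, pool_size=200):
    """Schematic active imitation loop. simulate(policy) returns a state
    (feature vector) visited when rolling out the current policy, and
    expert(state) returns the desired action; both are assumed interfaces."""
    X, y, policy = [], [], None
    for _ in range(n_queries):
        pool = np.array([simulate(policy) for _ in range(pool_size)])
        if policy is None:
            state = pool[0]                       # no model yet: query arbitrarily
        else:
            probs = np.sort(policy.predict_proba(pool), axis=1)
            margin = probs[:, -1] - probs[:, -2]  # small margin = ambiguous state
            state = pool[np.argmin(margin)]
        X.append(state)
        y.append(expert(state))                   # one expert query per round
        if len(set(y)) > 1:                       # need two classes to fit
            policy = LogisticRegression().fit(np.array(X), np.array(y))
    return policy
```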

46 citations


Posted Content
TL;DR: In this paper, a Dynamic Bayesian Network (DBN) model is proposed for analyzing sensor observations and distinguishing sensor failures from valid data for the case of air temperature measured at 15-minute time resolution.
Abstract: Remote sensors are becoming the standard for observing and recording ecological data in the field. Such sensors can record data at fine temporal resolutions, and they can operate under extreme conditions prohibitive to human access. Unfortunately, sensor data streams exhibit many kinds of errors ranging from corrupt communications to partial or total sensor failures. This means that the raw data stream must be cleaned before it can be used by domain scientists. In our application environment, the H.J. Andrews Experimental Forest, this data cleaning is performed manually. This paper introduces a Dynamic Bayesian Network model for analyzing sensor observations and distinguishing sensor failures from valid data for the case of air temperature measured at 15-minute time resolution. The model combines an accurate distribution of long-term and short-term temperature variations with a single generalized fault model. Experiments with historical data show that the precision and recall of the method are comparable to those of the domain expert. The system is currently being deployed to perform real-time automated data cleaning.
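
A toy version of the core idea, scoring each reading under a short-term dynamics model versus a single broad fault model, might look like the following. The distributions, ranges, and threshold are illustrative stand-ins, not the paper's fitted DBN.

```python
import numpy as np
from scipy.stats import norm, uniform

def flag_faults(temps, p_fault_prior=0.01, sigma=1.5):
    """Toy per-reading fault detection: score each observation under a
    short-term dynamics model (Gaussian around the last trusted reading)
    and under a broad fault model (uniform over an assumed reportable
    range of -40 to 60 degrees C). All numbers are illustrative."""
    flags = []
    prev = temps[0]
    for t in temps[1:]:
        like_ok = norm.pdf(t, loc=prev, scale=sigma)   # valid-data model
        like_bad = uniform.pdf(t, loc=-40, scale=100)  # generalized fault model
        post_bad = (like_bad * p_fault_prior) / (
            like_bad * p_fault_prior + like_ok * (1 - p_fault_prior))
        flags.append(post_bad > 0.5)
        if post_bad <= 0.5:
            prev = t                                   # only trust valid readings
    return flags
```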

30 citations


Posted Content
TL;DR: This paper shows how to interpret knowledge of qualitative influences, and in particular of monotonicities, as constraints on probability distributions, and to incorporate this knowledge into Bayesian network learning algorithms.
Abstract: When training data is sparse, more domain knowledge must be incorporated into the learning algorithm in order to reduce the effective size of the hypothesis space. This paper builds on previous work in which knowledge about qualitative monotonicities was formally represented and incorporated into learning algorithms (e.g., Clark & Matwin's work with the CN2 rule learning algorithm). We show how to interpret knowledge of qualitative influences, and in particular of monotonicities, as constraints on probability distributions, and to incorporate this knowledge into Bayesian network learning algorithms. We show that this yields improved accuracy, particularly with very small training sets (e.g. less than 10 examples).
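
The standard formalization of such a constraint (following Wellman's qualitative probabilistic networks) states a positive influence of a parent X on a child Y as first-order stochastic dominance between the corresponding rows of Y's conditional probability table:

```latex
% A positive qualitative influence of X on Y, holding the other parents
% Z fixed: raising X never makes large values of Y less likely.
\[
x_1 \ge x_2 \;\Longrightarrow\;
P(Y \ge y \mid x_1, \mathbf{z}) \;\ge\; P(Y \ge y \mid x_2, \mathbf{z})
\quad \text{for all values } y \text{ and all parent settings } \mathbf{z}.
\]
```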

15 citations


Journal ArticleDOI
TL;DR: The application of this novel learning and problem solving architecture to the domain of airspace management, where multiple requests for the use of airspaces need to be deconflicted, reconciled, and managed automatically is described.
Abstract: We present a novel ensemble architecture for learning problem-solving techniques from a very small number of expert solutions and demonstrate its effectiveness in a complex real-world domain. The key feature of our “Generalized Integrated Learning Architecture” (GILA) is a set of heterogeneous independent learning and reasoning (ILR) components, coordinated by a central meta-reasoning executive (MRE). The ILRs are weakly coupled in the sense that all coordination during learning and performance happens through the MRE. Each ILR learns independently from a small number of expert demonstrations of a complex task. During performance, each ILR proposes partial solutions to subproblems posed by the MRE, which are then selected from and pieced together by the MRE to produce a complete solution. The heterogeneity of the learner-reasoners allows both learning and problem solving to be more effective because their abilities and biases are complementary and synergistic. We describe the application of this novel learning and problem solving architecture to the domain of airspace management, where multiple requests for the use of airspaces need to be deconflicted, reconciled, and managed automatically. Formal evaluations show that our system performs as well as or better than humans after learning from the same training data. Furthermore, GILA outperforms any individual ILR run in isolation, thus demonstrating the power of the ensemble architecture for learning and problem solving.
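
A schematic of the MRE's select-and-assemble loop might look like the sketch below; `problem`, the ILR objects, and `score` are all hypothetical interfaces introduced for illustration, not GILA's actual API.

```python
def mre_solve(problem, ilrs, score):
    """Schematic meta-reasoning executive (MRE) loop over weakly coupled
    independent learning and reasoning (ILR) components."""
    solution = []
    pending = list(problem.subproblems())    # e.g., individual airspace conflicts
    while pending:
        sub = pending.pop(0)
        proposals = [p for ilr in ilrs       # every ILR proposes a partial
                     if (p := ilr.propose(sub)) is not None]  # solution, or passes
        best = max(proposals, key=score)     # MRE selects among partial solutions
        solution.append(best)
        pending = problem.update(best, pending)  # one choice can reshape the rest
    return solution
```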

15 citations


Proceedings ArticleDOI
04 Jun 2012
TL;DR: Three machine learning efforts for modeling ecosystems are described, including a novel approach to modeling the migration of birds; a major challenge for all of these methods is scaling up to large, spatially-distributed systems.
Abstract: To avoid ecological collapse, we must manage Earth's ecosystems sustainably. Viewed as a control problem, the two central challenges of ecosystem management are to acquire a model of the system that is sufficient to guide good decision making and then optimize the control policy against that model. This paper describes three efforts aimed at addressing the first of these challenges—machine learning methods for modeling ecosystems. The first effort focuses on automated quality control of environmental sensor data. Next, we consider the problem of learning species distribution models from citizen science observational data. Finally, we describe a novel approach to modeling the migration of birds. A major challenge for all of these methods is to scale up to large, spatially-distributed systems.

9 citations


Posted Content
TL;DR: In this article, a dynamic Bayes net model of strategies in the RTS game Starcraft is presented, which combines a generative model of how strategies relate to observable quantities with a principled framework for incorporating evidence gained via scouting.
Abstract: In typical real-time strategy (RTS) games, enemy units are visible only when they are within sight range of a friendly unit. Knowledge of an opponent's disposition is limited to what can be observed through scouting. Information is costly, since units dedicated to scouting are unavailable for other purposes, and the enemy will resist scouting attempts. It is important to infer as much as possible about the opponent's current and future strategy from the available observations. We present a dynamic Bayes net model of strategies in the RTS game Starcraft that combines a generative model of how strategies relate to observable quantities with a principled framework for incorporating evidence gained via scouting. We demonstrate the model's ability to infer unobserved aspects of the game from realistic observations.
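
The inference the abstract describes can be illustrated with exact forward filtering over a single discrete strategy variable. This is a simplification for illustration: the paper's model factors strategy into several interacting variables rather than one.

```python
import numpy as np

def filter_strategies(prior, transition, obs_like_seq):
    """Forward filtering over a discrete hidden strategy variable.

    prior: (S,) initial strategy distribution.
    transition: (S, S) with transition[i, j] = P(strategy_t = j | strategy_{t-1} = i).
    obs_like_seq: iterable of (S,) likelihoods P(observation_t | strategy).
    """
    belief = np.asarray(prior, dtype=float)
    for obs_like in obs_like_seq:
        belief = belief @ transition     # predict: the strategy may evolve
        belief = belief * obs_like       # update: weight by scouting evidence
        belief = belief / belief.sum()   # renormalize to a distribution
    return belief
```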

8 citations


Proceedings Article
14 Aug 2012
TL;DR: This work presents a dynamic Bayes net model of strategies in the RTS game Starcraft that combines a generative model of how strategies relate to observable quantities with a principled framework for incorporating evidence gained via scouting and demonstrates the model's ability to infer unobserved aspects of the game from realistic observations.
Abstract: In typical real-time strategy (RTS) games, enemy units are visible only when they are within sight range of a friendly unit. Knowledge of an opponent's disposition is limited to what can be observed through scouting. Information is costly, since units dedicated to scouting are unavailable for other purposes, and the enemy will resist scouting attempts. It is important to infer as much as possible about the opponent's current and future strategy from the available observations. We present a dynamic Bayes net model of strategies in the RTS game Starcraft that combines a generative model of how strategies relate to observable quantities with a principled framework for incorporating evidence gained via scouting. We demonstrate the model's ability to infer unobserved aspects of the game from realistic observations.

6 citations


Posted Content
TL;DR: In this paper, a new approach based on reducing active imitation learning to i.i.d. active learning is proposed, which can leverage progress in the i.i.d. setting.
Abstract: In standard passive imitation learning, the goal is to learn a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner's interactions with an environment simulator. We introduce a new approach based on reducing active imitation learning to i.i.d. active learning, which can leverage progress in the i.i.d. setting. Our first contribution is to analyze reductions for both non-stationary and stationary policies, showing that the label complexity (number of queries) of active imitation learning can be substantially less than that of passive learning. Our second contribution is to introduce a practical algorithm inspired by the reductions, which is shown to be highly effective in four test domains compared to a number of alternatives.

3 citations


01 Jan 2012
TL;DR: The paper defines a hierarchical Q learning algorithm, proves its convergence, and shows experimentally that it can learn much faster than ordinary “flat” Q learning.
Abstract: This paper presents a new approach to hierarchical reinforcement learning based on the MAXQ decomposition of the value function. The MAXQ decomposition has both a procedural semantics—as a subroutine hierarchy—and a declarative semantics—as a representation of the value function of a hierarchical policy. MAXQ unifies and extends previous work on hierarchical reinforcement learning by Singh, Kaelbling, and Dayan and Hinton. Conditions under which the MAXQ decomposition can represent the optimal value function are derived. The paper defines a hierarchical Q learning algorithm, proves its convergence, and shows experimentally that it can learn much faster than ordinary “flat” Q learning. Finally, the paper discusses some interesting issues that arise in hierarchical reinforcement learning including the hierarchical credit assignment problem and non-hierarchical execution of the MAXQ hierarchy.
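
For reference, the MAXQ decomposition the abstract summarizes expresses the value function recursively over the subroutine hierarchy:

```latex
% The value of invoking child subtask a within parent subtask i at
% state s splits into an internal value and a completion term,
\[
Q(i, s, a) \;=\; V(a, s) \;+\; C(i, s, a),
\]
% where V is defined recursively over the hierarchy,
\[
V(i, s) \;=\;
\begin{cases}
\max_{a}\, Q(i, s, a) & \text{if } i \text{ is composite},\\[2pt]
\sum_{s'} P(s' \mid s, i)\, R(s' \mid s, i) & \text{if } i \text{ is a primitive action},
\end{cases}
\]
% and C(i, s, a) is the expected discounted reward of completing
% subtask i after child a terminates.
```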

2 citations