
Showing papers by "Thomas G. Dietterich published in 2014"



Proceedings Article
21 Jun 2014
TL;DR: This paper analyzes Empirical Risk Minimizing (ERM) learners that use the superset error as the empirical risk measure and gives conditions for ERM learnability and sample complexity in the realizable case.
Abstract: In the Superset Label Learning (SLL) problem, weak supervision is provided in the form of a superset of labels that contains the true label. If the classifier predicts a label outside of the superset, it commits a superset error. Most existing SLL algorithms learn a multiclass classifier by minimizing the superset error. However, only limited theoretical analysis has been dedicated to this approach. In this paper, we analyze Empirical Risk Minimizing learners that use the superset error as the empirical risk measure. SLL data can arise either in the form of independent instances or as multiple-instance bags. For both scenarios, we give the conditions for ERM learnability and sample complexity for the realizable case.
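As a concrete illustration of the learning rule analyzed here, the following is a minimal Python sketch of ERM under the superset error; the toy dataset, threshold hypothesis class, and helper names are illustrative assumptions, not from the paper.

```python
# Minimal sketch of Empirical Risk Minimization with the superset error:
# a prediction counts as an error only if it falls outside the provided
# superset of candidate labels. Data and hypotheses are toy examples.

def superset_error(h, instances, label_supersets):
    """Empirical superset risk: fraction of instances whose predicted
    label lies outside the given label superset."""
    errors = sum(1 for x, S in zip(instances, label_supersets) if h(x) not in S)
    return errors / len(instances)

def erm_superset(hypothesis_class, instances, label_supersets):
    """Return the hypothesis minimizing the empirical superset risk."""
    return min(hypothesis_class,
               key=lambda h: superset_error(h, instances, label_supersets))

# Toy usage: 1-D threshold classifiers with ambiguous (superset) labels.
instances = [0.1, 0.4, 0.6, 0.9]
label_supersets = [{0}, {0, 1}, {1}, {1}]  # each superset contains the true label
hypothesis_class = [lambda x, t=t: int(x > t) for t in (0.25, 0.5, 0.75)]
best = erm_superset(hypothesis_class, instances, label_supersets)
print(superset_error(best, instances, label_supersets))  # 0.0
```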

85 citations


Proceedings ArticleDOI
01 Oct 2014
TL;DR: This work proposes a novel search-based approach to greedy coreference resolution in which mentions are processed in order and added to previous coreference clusters, and shows that the resulting Prune-and-Score approach is superior to using a single scoring function for both decisions and outperforms several state-of-the-art approaches on multiple benchmark corpora, including OntoNotes.
Abstract: We propose a novel search-based approach for greedy coreference resolution, where the mentions are processed in order and added to previous coreference clusters. Our method is distinguished by the use of two functions to make each coreference decision: a pruning function that prunes bad coreference decisions from further consideration, and a scoring function that then selects the best among the remaining decisions. Our framework reduces learning of these functions to rank learning, which helps leverage powerful off-the-shelf rank-learners. We show that our Prune-and-Score approach is superior to using a single scoring function to make both decisions and outperforms several state-of-the-art approaches on multiple benchmark corpora including OntoNotes.
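To make the two-function decision rule concrete, here is a hedged Python sketch of a greedy Prune-and-Score resolver; the `prune_score` and `select_score` arguments stand in for the learned rank functions, and the top-k pruning rule and all names are illustrative assumptions.

```python
# Illustrative sketch of Prune-and-Score greedy coreference resolution:
# a pruning function first discards bad candidate attachments, then a
# separate scoring function selects among the survivors.

def resolve(mentions, prune_score, select_score, keep_top=5):
    """Process mentions left to right; attach each to a previous cluster
    or start a new one (represented by None)."""
    clusters = []
    for mention in mentions:
        candidates = clusters + [None]  # None = start a new cluster
        # Stage 1: prune to the top-k candidates under the pruning function.
        survivors = sorted(candidates,
                           key=lambda c: prune_score(mention, c),
                           reverse=True)[:keep_top]
        # Stage 2: the scoring function picks the best surviving decision.
        best = max(survivors, key=lambda c: select_score(mention, c))
        if best is None:
            clusters.append([mention])
        else:
            best.append(mention)
    return clusters
```

In this sketch both score functions would be trained as rankers, following the reduction to rank learning described in the abstract.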

35 citations


Proceedings Article
27 Jul 2014
TL;DR: This paper studies state aggregation as a way of reducing stochastic branching in tree search, and finds that trajectory sampling algorithms like UCT can be adapted easily, but that sparse sampling algorithms present difficulties.
Abstract: Monte Carlo tree search (MCTS) algorithms are a popular approach to online decision-making in Markov decision processes (MDPs). These algorithms can, however, perform poorly in MDPs with high stochastic branching factors. In this paper, we study state aggregation as a way of reducing stochastic branching in tree search. Prior work has studied formal properties of MDP state aggregation in the context of dynamic programming and reinforcement learning, but little attention has been paid to state aggregation in MCTS. Our main result is a performance loss bound for a class of value function-based state aggregation criteria in expectimax search trees. We also consider how to construct MCTS algorithms that operate in the abstract state space but require a simulator of the ground dynamics only. We find that trajectory sampling algorithms like UCT can be adapted easily, but that sparse sampling algorithms present difficulties. As a proof of concept, we experimentally confirm that state aggregation can improve the finite-sample performance of UCT.
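The following Python sketch illustrates the adaptation the abstract describes for trajectory-sampling algorithms: UCT statistics are keyed by abstract states while transitions come from a ground-dynamics simulator. The value-bucketing aggregation rule and all function names are illustrative assumptions, not the paper's construction.

```python
# Sketch of UCT over aggregated states: tree statistics live on abstract
# states, but only a simulator of the ground dynamics is required.

import math
from collections import defaultdict

N = defaultdict(int)    # visit counts keyed by (abstract state, action)
Q = defaultdict(float)  # mean returns keyed by (abstract state, action)

def aggregate(state, value_estimate, width=0.5):
    """Value-based aggregation: ground states whose estimated values fall
    in the same bucket share an abstract state (a toy stand-in for the
    value function-based criteria the paper analyzes)."""
    return round(value_estimate(state) / width)

def simulate(state, step, actions, value_estimate, depth=20, c=1.4):
    """One UCT trajectory; step(state, action) -> (next_state, reward)
    is the ground simulator."""
    if depth == 0:
        return 0.0
    s = aggregate(state, value_estimate)
    total = sum(N[(s, a)] for a in actions) + 1
    act = max(actions, key=lambda a: Q[(s, a)]
              + c * math.sqrt(math.log(total) / (N[(s, a)] + 1)))
    next_state, reward = step(state, act)
    ret = reward + simulate(next_state, step, actions, value_estimate,
                            depth - 1, c)
    N[(s, act)] += 1
    Q[(s, act)] += (ret - Q[(s, act)]) / N[(s, act)]  # running mean update
    return ret
```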

34 citations


Proceedings Article
27 Jul 2014
TL;DR: This paper proposes the first formal framework for scripts based on Hidden Markov Models (HMMs) and develops an Expectation Maximization algorithm for structure and parameter learning that is superior to several informed baselines at predicting missing events in partial observation sequences.
Abstract: Scripts have been proposed to model the stereotypical event sequences found in narratives. They can be applied to make a variety of inferences including filling gaps in the narratives and resolving ambiguous references. This paper proposes the first formal framework for scripts based on Hidden Markov Models (HMMs). Our framework supports robust inference and learning algorithms, which are lacking in previous clustering models. We develop an algorithm for structure and parameter learning based on Expectation Maximization and evaluate it on a number of natural datasets. The results show that our algorithm is superior to several informed baselines for predicting missing events in partial observation sequences.
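As a much-simplified illustration of the gap-filling inference the abstract evaluates, the sketch below scores a missing event by how well it bridges its observed neighbors under a transition matrix; unlike the paper's HMM framework it collapses the hidden states into a plain Markov chain over observed events, and the event vocabulary and probabilities are invented.

```python
# Toy gap-filling in an event sequence: pick the event that best bridges
# its neighbors under a (hand-made) transition matrix T.

import numpy as np

events = ["enter", "order", "eat"]           # assumed toy vocabulary
T = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.2, 0.7],
              [0.3, 0.3, 0.4]])              # T[i, j] = P(next=j | current=i)

def fill_gap(prev_event, next_event):
    """Most probable missing event between two observed neighbors."""
    i = events.index(prev_event)
    k = events.index(next_event)
    scores = T[i, :] * T[:, k]               # P(prev -> e) * P(e -> next)
    return events[int(np.argmax(scores))]

print(fill_gap("enter", "eat"))              # -> "order"
```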

29 citations


Journal Article
TL;DR: This paper considers active imitation learning, which reduces the expert effort of passive imitation learning by querying the expert about the desired action at individual states, selected based on answers to past queries and the learner's interactions with an environment simulator.
Abstract: In standard passive imitation learning, the goal is to learn a policy that performs as well as a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner's interactions with an environment simulator. We introduce a new approach based on reducing active imitation learning to active i.i.d. learning, which can leverage progress in the i.i.d. setting. Our first contribution is to analyze reductions for both non-stationary and stationary policies, showing for the first time that the label complexity (number of queries) of active imitation learning can be less than that of passive learning. Our second contribution is to introduce a practical algorithm inspired by the reductions, which is shown to be highly effective in five test domains compared to a number of alternatives.
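A hedged sketch of the query loop this abstract describes, with uncertainty sampling standing in for the active i.i.d. learner (the paper's reduction admits other choices); the simulator, expert, and learner interfaces below are illustrative assumptions.

```python
# Sketch of active imitation learning: roll out the current policy in a
# simulator, query the expert at the state the learner is least sure of,
# and retrain on the accumulated state-action pairs.

def active_imitation(simulator, expert, learner, rounds=50, horizon=100):
    dataset = []
    for _ in range(rounds):
        # Collect states by executing the learner's current policy.
        states, s = [], simulator.reset()
        for _ in range(horizon):
            states.append(s)
            s, done = simulator.step(learner.predict(s))
            if done:
                break
        # Active i.i.d. step: query the expert on the least-certain state.
        query = min(states, key=learner.confidence)
        dataset.append((query, expert(query)))
        learner.fit(dataset)   # retrain on all labeled state-action pairs
    return learner
```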

21 citations


Journal ArticleDOI
TL;DR: This paper describes recent work on an AI system that quantifies bird migration using radar data, part of the larger BirdCast project to model and forecast bird migration at large scales using radar, weather, and citizen science data.
Abstract: Bird migration occurs at the largest of global scales, but monitoring such movements can be challenging. In the US there is an operational network of weather radars providing freely accessible data for monitoring meteorological phenomena in the atmosphere. Individual radars are sensitive enough to detect birds, and can provide insight into migratory behaviors of birds at scales that are not possible using other sensors. Archived data from the WSR-88D network of US weather radars hold valuable and detailed information about the continent-scale migratory movements of birds over the last 20 years. However, significant technical challenges must be overcome to understand this information and harness its potential for science and conservation. We describe recent work on an AI system to quantify bird migration using radar data, which is part of the larger BirdCast project to model and forecast bird migration at large scales using radar, weather, and citizen science data.

16 citations


Proceedings Article
21 Jun 2014
TL;DR: This paper shows that, as the population grows large, the Collective Graphical Model distribution converges to a multivariate Gaussian distribution (GCGM) that maintains the conditional independence properties of the original CGM and supports efficient closed-form inference.
Abstract: The Collective Graphical Model (CGM) models a population of independent and identically distributed individuals when only collective statistics (i.e., counts of individuals) are observed. Exact inference in CGMs is intractable, and previous work has explored Markov Chain Monte Carlo (MCMC) and MAP approximations for learning and inference. This paper studies Gaussian approximations to the CGM. As the population grows large, we show that the CGM distribution converges to a multivariate Gaussian distribution (GCGM) that maintains the conditional independence properties of the original CGM. If the observations are exact marginals of the CGM or marginals that are corrupted by Gaussian noise, inference in the GCGM approximation can be computed efficiently in closed form. If the observations follow a different noise model (e.g., Poisson), then expectation propagation provides efficient and accurate approximate inference. The accuracy and speed of GCGM inference is compared to the MCMC and MAP methods on a simulated bird migration problem. The GCGM matches or exceeds the accuracy of the MAP method while being significantly faster.
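The closed-form step highlighted in the abstract is standard Gaussian conditioning; the sketch below shows that update for a toy count vector, with all quantities as stand-ins for the GCGM's mean and covariance.

```python
# Conditioning a Gaussian approximation on a noisy linear observation
# y = H x + N(0, R): the posterior is available in closed form.

import numpy as np

def condition_gaussian(mu, Sigma, H, y, R):
    """Posterior mean/covariance of x ~ N(mu, Sigma) given y = Hx + noise."""
    S = H @ Sigma @ H.T + R                   # innovation covariance
    K = Sigma @ H.T @ np.linalg.inv(S)        # gain
    return mu + K @ (y - H @ mu), Sigma - K @ H @ Sigma

# Toy usage: two-cell count vector, noisy reading of the first cell.
mu = np.array([10.0, 5.0])
Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
H = np.array([[1.0, 0.0]])
mu_post, Sigma_post = condition_gaussian(mu, Sigma, H,
                                         y=np.array([12.0]),
                                         R=np.array([[1.0]]))
```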

3 citations