
Showing papers by "Thomas G. Dietterich published in 2017"


Journal ArticleDOI
TL;DR: Eight ideas related to robustness that are being pursued within the AI research community are described, which touch on the fundamental question of how finite systems can survive and thrive in a complex and dangerous world.
Abstract: Recent advances in artificial intelligence are encouraging governments and corporations to deploy AI in high-stakes settings including driving cars autonomously, managing the power grid, trading on stock exchanges, and controlling autonomous weapons systems. Such applications require AI methods to be robust to both the known unknowns (those uncertain aspects of the world about which the computer can reason explicitly) and the unknown unknowns (those aspects of the world that are not captured by the system’s models). This article discusses recent progress in AI and then describes eight ideas related to robustness that are being pursued within the AI research community. While these ideas are a start, we need to devote more attention to the challenges of dealing with the known and unknown unknowns. These issues are fascinating, because they touch on the fundamental question of how finite systems can survive and thrive in a complex and dangerous world.

123 citations


Posted Content
TL;DR: A novel technique for incorporating simple binary feedback into tree-based anomaly detectors is introduced and it is shown that the Isolation Forest algorithm can significantly improve its performance by incorporating feedback, when compared with the baseline algorithm that does not incorporate feedback.
Abstract: Anomaly detectors are often used to produce a ranked list of statistical anomalies, which are examined by human analysts in order to extract the actual anomalies of interest. Unfortunately, in real-world applications, this process can be exceedingly difficult for the analyst since a large fraction of high-ranking anomalies are false positives and not interesting from the application perspective. In this paper, we aim to make the analyst's job easier by allowing for analyst feedback during the investigation process. Ideally, the feedback influences the ranking of the anomaly detector in a way that reduces the number of false positives that must be examined before discovering the anomalies of interest. In particular, we introduce a novel technique for incorporating simple binary feedback into tree-based anomaly detectors. We focus on the Isolation Forest algorithm as a representative tree-based anomaly detector, and show that we can significantly improve its performance by incorporating feedback, when compared with the baseline algorithm that does not incorporate feedback. Our technique is simple and scales well as the size of the data increases, which makes it suitable for interactive discovery of anomalies in large datasets.
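To make the idea of feedback-driven re-ranking concrete, here is a minimal toy sketch, not the paper's algorithm: a small ensemble of randomized isolation trees whose per-tree weights are adjusted by a multiplicative-weights rule when the analyst labels a queried point. All names (`isolation_depth`, `FeedbackForest`) and the specific update are invented for illustration.

```python
import random

def isolation_depth(point, data, max_depth=8, seed=0):
    """Depth at which `point` is separated from `data` by random
    axis-aligned splits; shallower depth suggests an anomaly.
    One call = one randomized 'tree', determined by `seed`."""
    rng = random.Random(seed)
    pts, depth = data, 0
    while depth < max_depth and len(pts) > 1:
        dim = rng.randrange(len(point))
        lo = min(p[dim] for p in pts)
        hi = max(p[dim] for p in pts)
        if lo == hi:
            break
        split = rng.uniform(lo, hi)
        side = point[dim] < split
        pts = [p for p in pts if (p[dim] < split) == side]
        depth += 1
    return depth

class FeedbackForest:
    """Toy tree ensemble whose per-tree weights respond to binary
    analyst feedback (a multiplicative-weights stand-in, not the
    paper's actual update)."""
    def __init__(self, data, n_trees=25):
        self.data = data
        self.w = [1.0] * n_trees  # uniform before any feedback

    def tree_score(self, point, t):
        # Higher score = more anomalous (shallower isolation).
        return 1.0 / (1 + isolation_depth(point, self.data, seed=t))

    def score(self, point):
        return (sum(w * self.tree_score(point, t)
                    for t, w in enumerate(self.w)) / sum(self.w))

    def feedback(self, point, is_anomaly, lr=0.5, thresh=0.2):
        # Boost trees whose verdict matches the analyst's label,
        # shrink the rest (`thresh` is an arbitrary cutoff).
        for t in range(len(self.w)):
            agree = (self.tree_score(point, t) > thresh) == is_anomaly
            self.w[t] *= (1 + lr) if agree else (1 - lr)
```

A false-positive label ("not an anomaly") shrinks the influence of trees that ranked the point highly, which is the re-ranking effect the abstract describes.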

39 citations


Journal ArticleDOI
TL;DR: A machine learning technique called approximate dynamic programming is applied to determine the optimal timing and location of fuel treatments and timber harvests for a fire-threatened landscape using policies that explicitly consider evolving spatial interactions created by fire spread.

21 citations


Journal ArticleDOI
TL;DR: The first visualization targeting MDP testing, MDPVIS, is presented, and the visualization's generality is shown by connecting it to two reinforcement learning frameworks that implement many different MDPs of interest in the research community.
Abstract: Markov Decision Processes (MDPs) are a formulation for optimization problems in sequential decision making. Solving MDPs often requires implementing a simulator for optimization algorithms to invoke when updating decision-making rules known as policies. The combination of simulator and optimizer is subject to failures of specification, implementation, integration, and optimization that may produce invalid policies. We present these failures as queries for a visual analytic system (MDPVIS). MDPVIS addresses three visualization research gaps. First, the data acquisition gap is addressed through a general simulator-visualization interface. Second, the data analysis gap is addressed through a generalized MDP information visualization. Finally, the cognition gap is addressed by exposing model components to the user. MDPVIS generalizes a visualization for wildfire management. We use that problem to illustrate MDPVIS and show the visualization's generality by connecting it to two reinforcement learning frameworks that implement many different MDPs of interest in the research community. Highlights: Markov decision processes (MDPs) formalize sequential decision optimization problems. Complex simulators often implement MDPs and are subject to a variety of bugs. Interactive visualizations support testing MDPs and optimization algorithms. The first visualization targeting MDP testing, MDPVIS, is presented.
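The "general simulator-visualization interface" can be illustrated with a hypothetical sketch (this is not the actual MDPVIS protocol): the simulator is wrapped so that every step emits a flat dictionary of named state variables, which a generic visualization can plot without any domain knowledge.

```python
def rollout(simulator_step, initial_state, policy, horizon):
    """Sketch of a domain-agnostic simulator interface: run `policy`
    through `simulator_step` for `horizon` steps and return one flat
    dict of named state variables per time step, ready for a generic
    visualization to consume."""
    state = dict(initial_state)
    frames = [dict(state)]  # copy each frame so later steps can't mutate it
    for _ in range(horizon):
        state = simulator_step(state, policy(state))
        frames.append(dict(state))
    return frames
```

Because the frames are plain key-value records, the same front end can render a wildfire simulator or a gridworld without modification, which is the generality claim the abstract makes.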

11 citations


Proceedings Article
01 Jan 2017
TL;DR: Three new algorithms, based on a general approach called α-min-2, are proposed for approximately solving N-POMDPs; two of them are designed to complement an existing POMDP solver.
Abstract: In many fields in computational sustainability, applications of POMDPs are inhibited by the complexity of the optimal solution. One way of delivering simple solutions is to represent the policy with a small number of α-vectors. We would like to find the best possible policy that can be expressed using a fixed number N of α-vectors. We call this the N-POMDP problem. The existing solver α-min approximately solves finite-horizon POMDPs with a controllable number of α-vectors. However α-min is a greedy algorithm without performance guarantees, and it is rather slow. This paper proposes three new algorithms, based on a general approach that we call α-min-2. These three algorithms are able to approximately solve N-POMDPs. α-min-2-fast (heuristic) and α-min-2-p (with performance guarantees) are designed to complement an existing POMDP solver, while α-min-2-solve (heuristic) is a solver itself. Complexity results are provided for each of the algorithms, and they are tested on well-known benchmarks. These new algorithms will help users to interpret solutions to POMDP problems in computational sustainability.
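The α-vector representation the abstract relies on is standard for POMDPs: a policy is a set of (α-vector, action) pairs, and the value of a belief b is V(b) = max over α of the dot product ⟨α, b⟩, with the maximizing vector's action being the one the policy takes. Restricting the set to N vectors is exactly what makes the policy small and interpretable. A minimal evaluation sketch (function name invented here):

```python
def alpha_value(belief, alpha_vectors):
    """Evaluate a POMDP policy given as (alpha_vector, action) pairs:
    V(b) = max over alpha of <alpha, b>. Returns the value and the
    action of the maximizing vector."""
    def dot(alpha):
        return sum(a * b for a, b in zip(alpha, belief))
    best_alpha, best_action = max(alpha_vectors, key=lambda va: dot(va[0]))
    return dot(best_alpha), best_action
```

With only a handful of α-vectors, a stakeholder can inspect each vector's action and the belief region where it dominates, which is the interpretability motivation behind the N-POMDP formulation.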

10 citations


Journal ArticleDOI
TL;DR: A bound on regret due to state abstraction in tree search is derived that decomposes abstraction error into three components arising from properties of the abstraction and the search algorithm.
Abstract: Sample-based tree search (SBTS) is an approach to solving Markov decision problems based on constructing a lookahead search tree using random samples from a generative model of the MDP. It encompasses Monte Carlo tree search (MCTS) algorithms like UCT as well as algorithms such as sparse sampling. SBTS is well-suited to solving MDPs with large state spaces due to the relative insensitivity of SBTS algorithms to the size of the state space. The limiting factor in the performance of SBTS tends to be the exponential dependence of sample complexity on the depth of the search tree. The number of samples required to build a search tree is O((|A|B)^d), where |A| is the number of available actions, B is the number of possible random outcomes of taking an action, and d is the depth of the tree. State abstraction can be used to reduce B by aggregating random outcomes together into abstract states. Recent work has shown that abstract tree search often performs substantially better than tree search conducted in the ground state space. This paper presents a theoretical and empirical evaluation of tree search with both fixed and adaptive state abstractions. We derive a bound on regret due to state abstraction in tree search that decomposes abstraction error into three components arising from properties of the abstraction and the search algorithm. We describe versions of popular SBTS algorithms that use fixed state abstractions, and we introduce the Progressive Abstraction Refinement in Sparse Sampling (PARSS) algorithm, which adapts its abstraction during search. We evaluate PARSS as well as sparse sampling with fixed abstractions on 12 experimental problems, and find that PARSS outperforms search with a fixed abstraction and that search with even highly inaccurate fixed abstractions outperforms search without abstraction.
These results establish progressive abstraction refinement as a promising basis for new tree search algorithms, and we propose directions for future work within the progressive refinement framework.
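The O((|A|B)^d) bound makes the payoff of abstraction easy to quantify: because B sits inside the exponentiated factor, even a modest reduction shrinks the tree dramatically. A one-line sketch (function name invented here):

```python
def sbts_samples(num_actions, branching, depth):
    """Samples needed to build a full sparse-sampling search tree:
    (|A| * B) ** d. State abstraction attacks the B factor by merging
    random outcomes into abstract states."""
    return (num_actions * branching) ** depth
```

For example, with |A| = 2 and d = 3, shrinking B from 10 to 4 via abstraction cuts the tree from 8000 samples to 512.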

9 citations


Posted Content
TL;DR: A method for factoring out some of the state and action variables so that Model-Free Monte Carlo can work in high-dimensional MDPs is described and evaluated on a very challenging wildfire management MDP.
Abstract: Policy analysts wish to visualize a range of policies for large simulator-defined Markov Decision Processes (MDPs). One visualization approach is to invoke the simulator to generate on-policy trajectories and then visualize those trajectories. When the simulator is expensive, this is not practical, and some method is required for generating trajectories for new policies without invoking the simulator. The method of Model-Free Monte Carlo (MFMC) can do this by stitching together state transitions for a new policy based on previously-sampled trajectories from other policies. This "off-policy Monte Carlo simulation" method works well when the state space has low dimension but fails as the dimension grows. This paper describes a method for factoring out some of the state and action variables so that MFMC can work in high-dimensional MDPs. The new method, MFMCi, is evaluated on a very challenging wildfire management MDP.
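The core "stitching" step of MFMC can be sketched as follows. This is a simplified illustration (names invented; the actual method, and the MFMCi factoring of exogenous variables, involve more care in the distance metric and sample reuse): at each step, look up the stored transition whose state is closest to the current state and whose action matches the new policy's choice, then jump to that transition's successor.

```python
import math

def mfmc_trajectory(start, policy, transitions, horizon):
    """Approximate a trajectory for a new `policy` by stitching
    together stored (state, action, next_state) tuples, without
    calling the simulator. States are tuples of floats; matching
    uses Euclidean distance on states plus exact action match."""
    traj, s = [start], start
    for _ in range(horizon):
        a = policy(s)
        candidates = [(st, ns) for st, act, ns in transitions if act == a]
        if not candidates:
            break  # no stored transition for this action
        _, ns = min(candidates, key=lambda c: math.dist(c[0], s))
        s = ns
        traj.append(s)
    return traj
```

As the abstract notes, this works when stored states near the query state exist, which is why performance degrades as the state dimension grows and why MFMCi factors some variables out.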

4 citations


Posted Content
TL;DR: This paper assesses the suitability of SMAC---a black-box empirical function optimization algorithm---for rapid optimization of MDP policies and confirms that SMAC is able to rapidly find good policies that make sense from the domain perspective.
Abstract: Managers of US National Forests must decide what policy to apply for dealing with lightning-caused wildfires. Conflicts among stakeholders (e.g., timber companies, home owners, and wildlife biologists) have often led to spirited political debates and even violent eco-terrorism. One way to transform these conflicts into multi-stakeholder negotiations is to provide a high-fidelity simulation environment in which stakeholders can explore the space of alternative policies and understand the tradeoffs therein. Such an environment needs to support fast optimization of MDP policies so that users can adjust reward functions and analyze the resulting optimal policies. This paper assesses the suitability of SMAC---a black-box empirical function optimization algorithm---for rapid optimization of MDP policies. The paper describes five reward function components and four stakeholder constituencies. It then introduces a parameterized class of policies that can be easily understood by the stakeholders. SMAC is applied to find the optimal policy in this class for the reward functions of each of the stakeholder constituencies. The results confirm that SMAC is able to rapidly find good policies that make sense from the domain perspective. Because the full-fidelity forest fire simulator is far too expensive to support interactive optimization, SMAC is applied to a surrogate model constructed from a modest number of runs of the full-fidelity simulator. To check the quality of the SMAC-optimized policies, the policies are evaluated on the full-fidelity simulator. The results confirm that the surrogate value estimates are valid. This is the first successful optimization of wildfire management policies using a full-fidelity simulation. The same methodology should be applicable to other contentious natural resource management problems where high-fidelity simulation is extremely expensive.
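The overall loop, which is to optimize a parameterized policy against a cheap surrogate of the expensive simulator, can be sketched without SMAC itself. In this toy version, SMAC's model-based search is replaced by plain random search (a stand-in, not the paper's method), and the surrogate is any cheap function mapping a policy parameter vector to estimated return.

```python
import random

def optimize_policy(surrogate_return, n_params, n_trials=200, seed=1):
    """Black-box policy search over parameter vectors in [0, 1]^n,
    scored by a cheap surrogate of the expensive simulator. Random
    search stands in here for SMAC's model-based optimization."""
    rng = random.Random(seed)
    best_theta, best_val = None, float('-inf')
    for _ in range(n_trials):
        theta = [rng.random() for _ in range(n_params)]
        val = surrogate_return(theta)
        if val > best_val:
            best_theta, best_val = theta, val
    return best_theta, best_val
```

The validation step the abstract describes corresponds to taking the returned `best_theta` and re-evaluating it on the full-fidelity simulator to confirm the surrogate's estimate.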

2 citations