Open Access Proceedings Article

Generalizing plans to new environments in relational MDPs

TLDR
This paper presents an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs), and proves that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space.
Abstract
A longstanding goal in planning research is the ability to generalize plans developed for some set of environments to a new but similar environment, with minimal or no replanning. Such generalization can both reduce planning time and allow us to tackle larger domains than the ones tractable for direct planning. In this paper, we present an approach to the generalization problem based on a new framework of relational Markov Decision Processes (RMDPs). An RMDP can model a set of similar environments by representing objects as instances of different classes. In order to generalize plans to multiple environments, we define an approximate value function specified in terms of classes of objects and, in a multiagent setting, by classes of agents. This class-based approximate value function is optimized relative to a sampled subset of environments, and computed using an efficient linear programming method. We prove that a polynomial number of sampled environments suffices to achieve performance close to the performance achievable when optimizing over the entire space. Our experimental results show that our method generalizes plans successfully to new, significantly larger, environments, with minimal loss of performance relative to environment-specific planning. We demonstrate our approach on a real strategic computer war game.
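
The class-based value function and the linear program over sampled environments can be made concrete with a small sketch. The snippet below is a minimal illustration under assumptions of our own (the Phi/R/P data layout, the toy environments, and the uniform state-relevance weights are not from the paper): each state's value is approximated as a weighted sum of class-level features, and Bellman constraints from every sampled environment are pooled into one LP over the shared class weights.

```python
# A minimal sketch of a class-based approximate linear program, in the spirit
# of the abstract. The data layout, names, and toy numbers are illustrative
# assumptions, not the authors' implementation.
import numpy as np
from scipy.optimize import linprog

def fit_class_weights(envs, gamma=0.95):
    """envs: list of sampled environments, each a dict with
         Phi: (S, d) class-based features; Phi[s, c] sums the basis values
              of all objects of class c in state s
         R:   (S, A) rewards
         P:   (A, S, S) transition probabilities
       Returns shared weights w with V(s) ~= Phi[s] @ w."""
    d = envs[0]["Phi"].shape[1]
    c = np.zeros(d)                      # objective: minimize the average V_w(s)
    rows, rhs = [], []
    for env in envs:
        Phi, R, P = env["Phi"], env["R"], env["P"]
        S, A = R.shape
        c += Phi.sum(axis=0) / S
        for a in range(A):
            # Require V_w(s) >= R(s, a) + gamma * E[V_w(s') | s, a] for all s.
            lhs = Phi - gamma * P[a] @ Phi
            rows.append(-lhs)            # linprog expects A_ub @ w <= b_ub
            rhs.append(-R[:, a])
    res = linprog(c, A_ub=np.vstack(rows), b_ub=np.concatenate(rhs),
                  bounds=[(None, None)] * d, method="highs")
    return res.x

# Toy usage: two sampled environments with 3 states, 2 actions, and
# 3 class-level features (the constant first column keeps the LP feasible).
rng = np.random.default_rng(0)
def toy_env():
    P = rng.random((2, 3, 3))
    P /= P.sum(axis=2, keepdims=True)    # row-normalize transitions
    Phi = np.hstack([np.ones((3, 1)), rng.random((3, 2))])
    return {"Phi": Phi, "R": rng.random((3, 2)), "P": P}

print("shared class weights:", fit_class_weights([toy_env(), toy_env()]))
```

Because the weights are indexed by class rather than by individual object, the same weight vector can then be used to score states in a new, larger environment, which is the generalization step the abstract describes.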



Citations
Journal Article

Transfer Learning for Reinforcement Learning Domains: A Survey

TL;DR: This article presents a framework that classifies transfer learning methods in terms of their capabilities and goals, and then uses it to survey the existing literature, as well as to suggest future directions for transfer learning work.
Journal Article

Markov Decision Processes

TL;DR: The theory of Markov Decision Processes is the theory of controlled Markov chains; it has found applications in areas such as computer science, engineering, operations research, biology, and economics.
Journal Article

Knowledge in Action: Logical Foundations for Specifying and Implementing Dynamical Systems

Alex M. Andrew
01 Aug 2002
TL;DR: When I started out as a newly hatched PhD student, one of the first articles I read and understood was Ray Reiter’s classic article on default logic, and I became fascinated by both default logic and, more generally, non-monotonic logics.
Proceedings Article

Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation

TL;DR: This work presents h-DQN, a framework that integrates hierarchical value functions operating at different temporal scales with intrinsically motivated deep reinforcement learning, and that allows for flexible goal specifications such as functions over entities and relations.
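
The two-timescale idea in this TL;DR can be illustrated with a tiny tabular sketch: a meta-controller picks subgoals and is trained on extrinsic reward, while a controller picks primitive actions to reach the current subgoal and is trained on intrinsic reward. The real h-DQN uses deep Q-networks with experience replay; the chain environment, goal set, and learning constants below are illustrative assumptions, not the authors' code.

```python
# Tabular sketch of a meta-controller (picks goals) over a controller
# (picks actions, rewarded intrinsically for reaching the chosen goal).
import random
from collections import defaultdict

N = 6                      # chain states 0..5; extrinsic reward at state 5
GOALS = tuple(range(N))    # candidate subgoals for the meta-controller
ACTIONS = (-1, +1)         # primitive actions: step left / right

def step(s, a):
    """Chain dynamics: extrinsic reward only for reaching the last state."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (1.0 if s2 == N - 1 else 0.0), s2 == N - 1

q_meta = defaultdict(float)   # Q over (state, goal)
q_ctrl = defaultdict(float)   # Q over (state, goal, action)
eps, alpha, gamma = 0.1, 0.5, 0.9

def pick(q, keys):
    """Epsilon-greedy choice among candidate table keys."""
    return random.choice(keys) if random.random() < eps else max(keys, key=lambda k: q[k])

for episode in range(500):
    s, done = 1, False
    while not done:
        g = pick(q_meta, [(s, goal) for goal in GOALS])[1]   # slow timescale: choose a subgoal
        s0, ext_return, t = s, 0.0, 0
        while not done and s != g and t < 2 * N:             # fast timescale: act toward the subgoal
            a = pick(q_ctrl, [(s, g, act) for act in ACTIONS])[2]
            s2, r_ext, done = step(s, a)
            r_int = 1.0 if s2 == g else 0.0                  # intrinsic reward: did we hit the subgoal?
            boot = 0.0 if (done or s2 == g) else gamma * max(q_ctrl[(s2, g, b)] for b in ACTIONS)
            q_ctrl[(s, g, a)] += alpha * (r_int + boot - q_ctrl[(s, g, a)])
            ext_return += r_ext
            s, t = s2, t + 1
        boot = 0.0 if done else gamma * max(q_meta[(s, goal)] for goal in GOALS)
        q_meta[(s0, g)] += alpha * (ext_return + boot - q_meta[(s0, g)])

print("preferred subgoal from the start state:", max(GOALS, key=lambda g: q_meta[(1, g)]))
```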
References
Book

Reinforcement Learning: An Introduction

TL;DR: This book provides a clear and simple account of the key ideas and algorithms of reinforcement learning, ranging from the field's intellectual foundations to the most recent developments and applications.
Book

Introduction to Reinforcement Learning

TL;DR: In Reinforcement Learning, Richard Sutton and Andrew Barto provide a clear and simple account of the key ideas and algorithms of reinforcement learning.
Book Chapter

Some philosophical problems from the standpoint of artificial intelligence

TL;DR: In this paper, the authors consider the problem of reasoning about whether a strategy will achieve a goal in a deterministic world and present a method to construct a sentence of first-order logic which will be true in all models of certain axioms if and only if a certain strategy can achieve a certain goal.
Book

A mathematical introduction to logic

TL;DR: An introductory textbook on mathematical logic, covering first-order logic and its metatheory and including a comparison of first- and second-order logic.
Journal Article

Hierarchical reinforcement learning with the MAXQ value function decomposition

TL;DR: The paper presents an online model-free learning algorithm, MAXQ-Q, and proves that it converges with probability 1 to a kind of locally-optimal policy known as a recursively optimal policy, even in the presence of the five kinds of state abstraction.
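
The value-function decomposition named in this TL;DR can be shown in a few lines. The sketch below covers only the evaluation recursion Q(i, s, a) = V(a, s) + C(i, s, a), not the MAXQ-Q learning algorithm or its state abstractions; the toy task hierarchy and table values are illustrative assumptions, not Dietterich's taxi domain.

```python
# MAXQ two-part decomposition: the value of a composite task is the value of
# the chosen child plus the completion value of finishing the task afterwards.
PRIMITIVE_REWARD = {("north", 0): -1.0, ("south", 0): -1.0}   # V(a, s) for primitive actions
COMPLETION = {("navigate", 0, "north"): 5.0,                  # C(task, s, child) tables
              ("navigate", 0, "south"): 0.0,
              ("root", 0, "navigate"): 2.0}
CHILDREN = {"root": ["navigate"], "navigate": ["north", "south"]}

def V(task, s):
    """Projected value of executing `task` starting in state s."""
    if task not in CHILDREN:                                  # primitive action
        return PRIMITIVE_REWARD[(task, s)]
    return max(Q(task, s, child) for child in CHILDREN[task])

def Q(task, s, child):
    """Q(task, s, child) = V(child, s) + C(task, s, child)."""
    return V(child, s) + COMPLETION[(task, s, child)]

print(V("root", 0))   # -1 (north) + 5 (complete navigate) + 2 (complete root) = 6
```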