Open Access Proceedings Article

Prioritized goal decomposition of Markov decision processes: toward a synthesis of classical and decision theoretic planning

TL;DR: An abstraction mechanism is used to generate abstract MDPs associated with different objectives, and several methods for merging the policies for these different objectives are considered.
Abstract:
We describe an approach to goal decomposition for a certain class of Markov decision processes (MDPs). An abstraction mechanism is used to generate abstract MDPs associated with different objectives, and several methods for merging the policies for these different objectives are considered. In one technique, causal (least-commitment) structures are generated for
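As a minimal illustration of the decomposition idea (names and the merging rule are hypothetical, not the authors' implementation): solve one abstract MDP per objective with value iteration, extract each objective's greedy policy, and combine them by priority.

```python
# Hypothetical sketch of per-objective MDP solving and a naive prioritized
# merge. The paper's actual merge uses causal (least-commitment) structures;
# here we simply let the highest-priority objective dictate the action.

def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """P[(s, a)] -> list of (next_state, prob); R[(s, a)] -> reward."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)])
                       for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(states, actions, P, R, V, gamma=0.9):
    """Greedy one-step-lookahead policy for a single objective's MDP."""
    return {s: max(actions,
                   key=lambda a: R[(s, a)] + gamma *
                   sum(p * V[s2] for s2, p in P[(s, a)]))
            for s in states}

def merge_by_priority(policies, states):
    """Toy merge: policies are ordered from highest to lowest priority and
    the top-priority policy wins everywhere. A real merge would detect and
    resolve conflicts between objectives."""
    return {s: policies[0][s] for s in states}
```

The per-objective solves are standard dynamic programming; only the merge step stands in for the paper's contribution.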



Citations
Journal ArticleDOI

Between MDPs and semi-MDPs: a framework for temporal abstraction in reinforcement learning

TL;DR: It is shown that options enable temporally abstract knowledge and action to be included in the reinforcement learning framework in a natural and general way and may be used interchangeably with primitive actions in planning methods such as dynamic programming and in learning methods such as Q-learning.
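The options framework summarized above can be sketched in a few lines (an illustrative rendering, following Sutton, Precup and Singh's definition of an option as an initiation set, an internal policy, and a termination condition; the update rule is the standard SMDP Q-learning backup):

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option: initiation set I, internal policy pi, and per-state
    termination probability beta. Names are illustrative."""
    initiation_set: Set[str]
    policy: Callable[[str], str]   # state -> primitive action
    beta: Callable[[str], float]   # state -> termination probability

def smdp_q_update(Q, s, option, r_cum, s_next, k, options,
                  alpha=0.1, gamma=0.9):
    """One SMDP Q-learning backup after executing `option` for k steps
    from s to s_next, with r_cum the discounted reward accumulated
    along the way. Q is keyed by (state, option)."""
    target = r_cum + (gamma ** k) * max(Q[(s_next, o)] for o in options)
    Q[(s, option)] += alpha * (target - Q[(s, option)])
    return Q
```

Because an option may last several steps, the bootstrap term is discounted by gamma to the power of the option's duration k, which is what lets options and primitive (k = 1) actions share one update rule.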
Journal ArticleDOI

Decision-theoretic planning: structural assumptions and computational leverage

TL;DR: In this article, the authors present an overview and synthesis of MDP-related methods, showing how they provide a unifying framework for modeling many classes of planning problems studied in AI.
Proceedings Article

FeUdal Networks for Hierarchical Reinforcement Learning

TL;DR: This work introduces FeUdal Networks (FuNs), a novel architecture for hierarchical reinforcement learning inspired by the feudal reinforcement learning proposal of Dayan and Hinton, and gains power and efficacy by decoupling end-to-end learning across multiple levels -- allowing it to utilise different resolutions of time.
Journal ArticleDOI

Stochastic dynamic programming with factored representations

TL;DR: This work uses dynamic Bayesian networks (with decision trees representing the local families of conditional probability distributions) to represent stochastic actions in an MDP, together with a decision-tree representation of rewards, and develops versions of standard dynamic programming algorithms that directly manipulate decision-tree representations of policies and value functions.
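A toy illustration of the tree-structured conditional probabilities this summary refers to (variable names and probabilities are invented for illustration): internal nodes test a parent variable, leaves store a probability, and contexts that share a leaf share one parameter instead of one row per joint assignment.

```python
# Hypothetical sketch: a decision tree compactly encoding
# P(wet' = true | parents) for one state variable in a factored MDP.
# Internal nodes test a boolean variable; leaves hold probabilities.

tree = {
    "test": "raining",
    "true": {"leaf": 0.9},                  # raining -> wet regardless
    "false": {"test": "sprinkler_on",
              "true": {"leaf": 0.7},
              "false": {"leaf": 0.05}},
}

def tree_prob(tree, assignment):
    """Walk the tree using a dict of boolean variable assignments and
    return the probability stored at the reached leaf."""
    while "leaf" not in tree:
        branch = "true" if assignment[tree["test"]] else "false"
        tree = tree[branch]
    return tree["leaf"]
```

A full table over (raining, sprinkler_on) would need four entries; the tree needs only three leaves because the raining = true context ignores the sprinkler.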
Posted Content

FeUdal Networks for Hierarchical Reinforcement Learning

TL;DR: FeUdal Networks (FuNs) as mentioned in this paper is a hierarchical reinforcement learning architecture that employs a Manager module and a Worker module, where the Manager operates at lower temporal resolution and sets abstract goals which are conveyed to and enacted by the Worker.
References
Book

Dynamic Programming

TL;DR: The more the authors study the information processing aspects of the mind, the more perplexed and impressed they become, and it will be a very long time before they understand these processes sufficiently to reproduce them.
Book

Markov Decision Processes: Discrete Stochastic Dynamic Programming

TL;DR: Puterman as discussed by the authors provides a uniquely up-to-date, unified, and rigorous treatment of the theoretical, computational, and applied research on Markov decision process models, focusing primarily on infinite horizon discrete time models and models with discrete state spaces while also examining models with arbitrary state spaces, finite horizon models, and continuous time discrete state models.
Book

Decisions with Multiple Objectives: Preferences and Value Trade-Offs

TL;DR: In this article, a confused decision maker, who wishes to make a reasonable and responsible choice among alternatives, can systematically probe his true feelings in order to make those critically important, vexing trade-offs between incommensurable objectives.
Journal ArticleDOI

A model for reasoning about persistence and causation

TL;DR: A model of causal reasoning that accounts for knowledge concerning cause-and-effect relationships and knowledge concerning the tendency for propositions to persist or not as a function of time passing is described.