Open Access Dissertation

Learning and acting in unknown and uncertain worlds

Noel Welsh
TLDR
A policy search method that maintains a population of policies of varying size, and a novel non-parametric Bayesian map that sets no limit on the model size, are presented; it is shown experimentally that maps can be learned from robot data with weak prior knowledge.
Abstract
This dissertation addresses the problem of learning to act in an unknown and uncertain world. This is a difficult problem. Even if a world model is available, an assumption not made here, it is known to be intractable to learn an optimal policy for controlling behaviour (Littman 1996). Assuming no world model is known leads to two approaches: model-free learning, which attempts to learn to act without a model of the environment, and model learning, which attempts to learn a model of the environment from interactions with the world. Most earlier approaches make a priori assumptions about the complexity of the model or policy required, the upshot of which is that a fixed amount of memory is available to the agent. It is well known that in a noisy environment, the type assumed within, an environment-specific amount of memory is required to act optimally. Fixing the capacity of memory before any interactions have occurred is thus a limiting assumption. The theme of this dissertation is that representing multiple policies or environment models of varying size enables us to address this problem. Both model-free learning and model learning are investigated. For the former, I present a policy search method (usable with a wide range of algorithms) that maintains a population of policies of varying size. By sharing information between policies, I show that it can learn near-optimal policies for a variety of challenging problems, and that performance is significantly improved over using the same amount of computation without information sharing. I investigate two approaches to model learning. The first is a variational Bayesian method for learning POMDPs. I show that it achieves superior results to the Bayes-adaptive algorithm (Ross, Chaib-draa and Pineau 2007) using their experimental setup. However, this experimental setup makes strong assumptions about prior information, and I show that weakening these assumptions leads to poor performance.
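The idea of a policy search that maintains a population of policies of varying size can be sketched as follows. This is a minimal illustration, not the dissertation's actual algorithm: the function name, the top-half selection scheme, and the mutate/crossover operators are all assumptions introduced here for the example.

```python
import random

def population_policy_search(init_policies, env_reward, mutate, crossover,
                             generations=100, seed=0):
    """Maintain a population of policies whose sizes may vary.

    Each generation, policies are ranked by return, the top half survive,
    and the population is refilled either by mutating a survivor (which
    may grow or shrink it, so memory is not fixed in advance) or by
    sharing structure between two survivors (crossover).
    """
    rng = random.Random(seed)
    population = list(init_policies)
    for _ in range(generations):
        ranked = sorted(population, key=env_reward, reverse=True)
        survivors = ranked[: max(2, len(ranked) // 2)]
        next_gen = list(survivors)
        while len(next_gen) < len(population):
            if rng.random() < 0.5:
                next_gen.append(mutate(rng.choice(survivors), rng))
            else:
                a, b = rng.sample(survivors, 2)
                next_gen.append(crossover(a, b, rng))
        population = next_gen
    return max(population, key=env_reward)
```

For instance, with policies represented as bit lists and reward equal to the number of ones, mutation could append a bit (changing the policy's size) and crossover could splice two policies of different lengths.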
I then address model learning for a simpler model, a topological map. I develop a novel non-parametric Bayesian map that sets no limit on the model size, and show experimentally that maps can be learned from robot data with weak prior knowledge.
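The non-parametric principle behind a map with no size limit can be illustrated with a Chinese restaurant process, a standard non-parametric Bayesian prior (used here purely as an illustration, not necessarily the prior the dissertation adopts): the number of map nodes is unbounded and is determined by the data rather than fixed in advance.

```python
import random

def crp_assignments(n_obs, alpha=1.0, seed=0):
    """Assign observations to map nodes under a Chinese restaurant process.

    Observation i joins existing node k with probability proportional to
    counts[k], or opens a new node with probability proportional to
    alpha. The number of nodes is therefore unbounded: it grows with the
    data rather than being fixed before learning.
    """
    rng = random.Random(seed)
    counts = []        # counts[k] = number of observations at node k
    assignments = []
    for i in range(n_obs):
        r = rng.random() * (i + alpha)
        for k, c in enumerate(counts):
            if r < c:
                counts[k] += 1
                assignments.append(k)
                break
            r -= c
        else:
            counts.append(1)                   # open a brand-new node
            assignments.append(len(counts) - 1)
    return assignments
```

Larger values of the concentration parameter alpha favour more nodes, which is how such priors express weak knowledge about model size.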


Citations
Dissertation

Automatic State Construction using Decision Trees for Reinforcement Learning Agents

Manix Au
TL;DR: In this article, the authors propose a new method that applies information theory and decision-tree techniques to derive a tree structure representing both the state and the policy, and evaluate the relevance of a candidate feature with respect to the cumulative expected reward.