Journal ArticleDOI

Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning

Lukas Schäfer, +3 more
- 05 Jul 2022
- arXiv: abs/2207.02249
TLDR
This work discusses the problem of teamwork adaptation, in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning, and proposes three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE, which vary in the information used for the task encoding.
Abstract
Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.
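For illustration, the encoder-decoder reconstruction objective described in the abstract could be sketched roughly as below. This is not the authors' implementation: PyTorch, the module names (TaskEncoder, TaskDecoder), the network sizes, and the single-transition input format are all assumptions made here for clarity.

```python
# Minimal sketch of an encoder-decoder task embedding (hypothetical names, PyTorch assumed).
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Maps a transition (o, a, r, o') to a task embedding z."""
    def __init__(self, obs_dim, act_dim, embed_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, obs, act, rew, next_obs):
        x = torch.cat([obs, act, rew, next_obs], dim=-1)
        return self.net(x)

class TaskDecoder(nn.Module):
    """Reconstructs the next observation and reward from (z, o, a)."""
    def __init__(self, obs_dim, act_dim, embed_dim=16, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim + obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, obs_dim + 1),  # predicted next_obs plus scalar reward
        )

    def forward(self, z, obs, act):
        out = self.net(torch.cat([z, obs, act], dim=-1))
        return out[..., :-1], out[..., -1:]  # predicted next_obs, predicted reward

def reconstruction_loss(encoder, decoder, obs, act, rew, next_obs):
    """Reconstruction objective that trains the task embedding (rew has shape [..., 1])."""
    z = encoder(obs, act, rew, next_obs)
    pred_next_obs, pred_rew = decoder(z, obs, act)
    return (nn.functional.mse_loss(pred_next_obs, next_obs)
            + nn.functional.mse_loss(pred_rew, rew))
```

In the independent, centralised, and mixed paradigms, what changes is which observations and actions are fed into the encoder; the sketch above shows only the shared reconstruction idea.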



Citations
Journal ArticleDOI

Deep Reinforcement Learning for Multi-Agent Interaction

TL;DR: A broad overview of the ongoing research portfolio of the Autonomous Agents Research Group is provided and open problems for future directions are discussed.

Generating Diverse Teammates to Train Robust Agents For Ad Hoc Teamwork

TL;DR: In this paper, an automated teammate policy generation method is presented that optimises the best-response diversity (BRDiv) metric, which measures diversity based on the compatibility of teammate policies in terms of returns.
Journal ArticleDOI

Learning Embeddings for Sequential Tasks Using Population of Agents

TL;DR: In this paper, the authors leverage the idea that two tasks are similar to each other if observing an agent's performance on one task reduces our uncertainty about its performance on the other, and use a diverse population of agents to measure similarity between tasks in sequential decision-making settings.

Generating Teammates for Training Robust Ad Hoc Teamwork Agents via Best-Response Diversity

TL;DR: In this article, an automated teammate policy generation method is presented that optimises the best-response diversity (BRDiv) metric, which measures diversity based on the compatibility of teammate policies in terms of returns.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
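As a reminder of the update rule this paper introduces, here is a minimal NumPy sketch of a single Adam step with the standard default hyperparameters; the function name and array shapes are illustrative, not taken from the paper's code.

```python
# One Adam update step using bias-corrected moment estimates (NumPy sketch).
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: running uncentred variance
    m_hat = m / (1 - beta1 ** t)              # bias correction for step t (t >= 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```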
Proceedings Article

Attention is All you Need

TL;DR: This paper proposes a simple network architecture based solely on an attention mechanism, dispensing with recurrence and convolutions entirely, and achieves state-of-the-art performance on English-to-French translation.
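The core operation of that architecture is scaled dot-product attention, sketched below in NumPy; the function name and shapes are illustrative, and multi-head projections and masking are omitted.

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (NumPy sketch).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)      # similarity of queries and keys
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                    # weighted sum of values
```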
Journal Article

Visualizing Data using t-SNE

TL;DR: A new technique called t-SNE visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map; it is a variation of Stochastic Neighbor Embedding that is much easier to optimize and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map.
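A typical way to obtain such a 2-D map is sketched below, assuming scikit-learn's TSNE implementation and placeholder random data; the variable names are illustrative.

```python
# Projecting high-dimensional data to 2-D with t-SNE (scikit-learn assumed).
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(200, 50)                                # placeholder high-dimensional data
X_2d = TSNE(n_components=2, perplexity=30).fit_transform(X)
print(X_2d.shape)                                          # (200, 2): one 2-D location per datapoint
```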
Proceedings Article

Auto-Encoding Variational Bayes

TL;DR: This work introduces a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case.
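Two ingredients of that estimator, the reparameterisation trick and the analytic KL term for a Gaussian posterior, can be sketched as follows; PyTorch is assumed and the function names are illustrative.

```python
# Reparameterisation trick and Gaussian KL term used in Auto-Encoding Variational Bayes (PyTorch sketch).
import torch

def reparameterise(mu, log_var):
    """Sample z ~ N(mu, sigma^2) as a differentiable function of (mu, log_var)."""
    std = torch.exp(0.5 * log_var)
    eps = torch.randn_like(std)
    return mu + eps * std

def kl_to_standard_normal(mu, log_var):
    """Analytic KL(q(z|x) || N(0, I)) term of the variational lower bound."""
    return -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim=-1)
```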
Proceedings ArticleDOI

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.
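A minimal sketch of such a jointly trained encoder-decoder is given below, assuming PyTorch GRUs and illustrative module and vocabulary names; teacher forcing, padding, and the training loop are omitted.

```python
# Minimal GRU encoder-decoder in the spirit of an RNN Encoder-Decoder model (PyTorch sketch).
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, hidden)
        self.tgt_emb = nn.Embedding(tgt_vocab, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encode the source sequence into a fixed-length summary state.
        _, h = self.encoder(self.src_emb(src))
        # Decode the target sequence conditioned on that state; training both parts
        # jointly with cross-entropy maximises the conditional likelihood of the target.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)
```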