Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field has gained attention recently due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, specialized algorithms are needed to learn models effectively and robustly, as learning by imitation poses its own set of challenges. In this article, we survey imitation learning methods and present design options in different steps of the learning process. We introduce the background and motivation for the field and highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games, as these domains are the most popular in the literature and provide a wide array of problems and methodologies.
We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.

/pdf/imitation-learning-a-survey-of-learning-methods-1bfrt7qge0.pdf

Imitation Learning: A Survey of Learning Methods
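
The survey abstract above describes imitation learning as learning a mapping between observations and actions from demonstrations. A minimal sketch of that idea, in its simplest supervised form (often called behavioral cloning), is shown below. The linear policy, the `expert_policy` function, and the demonstration data are all hypothetical and chosen only for illustration; they do not come from the survey.

```python
from statistics import mean


def fit_linear_policy(observations, actions):
    """Fit action = w * observation + b by ordinary least squares."""
    o_bar, a_bar = mean(observations), mean(actions)
    cov = sum((o - o_bar) * (a - a_bar) for o, a in zip(observations, actions))
    var = sum((o - o_bar) ** 2 for o in observations)
    w = cov / var
    b = a_bar - w * o_bar
    return w, b


def expert_policy(observation):
    # Hypothetical expert, used only to generate demonstrations.
    return 2.0 * observation + 1.0


# Demonstrations: (observation, action) pairs recorded from the expert.
demo_observations = [0.0, 1.0, 2.0, 3.0, 4.0]
demo_actions = [expert_policy(o) for o in demo_observations]

# Supervised learning of the observation-to-action mapping.
w, b = fit_linear_policy(demo_observations, demo_actions)


def learned_policy(observation):
    # The imitating agent acts using the fitted mapping.
    return w * observation + b
```

Because the demonstrations here are noise-free and the expert is itself linear, the fitted policy reproduces the expert exactly; real tasks typically need richer function classes and must handle states the expert never visited.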

For autonomous vehicles (AVs) to behave appropriately on roads populated by human-driven vehicles, they must be able to reason about the uncertain intentions and decisions of other drivers from rich perceptual information. Towards these capabilities, we present a probabilistic forecasting model of future interactions between a variable number of agents. We perform both standard forecasting and the novel task of conditional forecasting, which reasons about how all agents will likely respond to the goal of a controlled agent (here, the AV). We train models on real and simulated data to forecast vehicle trajectories given past positions and LIDAR. Our evaluation shows that our model is substantially more accurate in multi-agent driving scenarios than existing state-of-the-art methods. Beyond its general ability to perform conditional forecasting queries, we show that our model's predictions of all agents improve when conditioned on knowledge of the AV's goal, further illustrating its capability to model agent interactions.

/pdf/precog-prediction-conditioned-on-goals-in-visual-multi-agent-341dzkjuds.pdf

PRECOG: PREdiction Conditioned on Goals in Visual Multi-Agent Settings

Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.

Neural Relational Inference for Interacting Systems

Communication could potentially be an effective way for multi-agent cooperation. However, information sharing among all agents or in predefined communication architectures that existing methods adopt can be problematic. When there is a large number of agents, agents cannot differentiate valuable information that helps cooperative decision making from globally shared information. Therefore, communication barely helps, and could even impair the learning of multi-agent cooperation. Predefined communication architectures, on the other hand, restrict communication among agents and thus restrain potential cooperation. To tackle these difficulties, in this paper, we propose an attentional communication model that learns when communication is needed and how to integrate shared information for cooperative decision making. Our model leads to efficient and effective communication for large-scale multi-agent cooperation. Empirically, we show the strength of our model in a variety of cooperative scenarios, where agents are able to develop more coordinated and sophisticated strategies than existing methods.

Learning Attentional Communication for Multi-Agent Cooperation

Multiagent Reinforcement Learning (RL) solves complex tasks that require coordination with other agents through autonomous exploration of the environment. However, learning a complex task from scra...

/pdf/a-survey-on-transfer-learning-for-multiagent-reinforcement-3cyaywsfpw.pdf

A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems

We study the problem of imitation learning from demonstrations of multiple coordinating agents. One key challenge in this setting is that learning a good model of coordination can be difficult, since coordination is often implicit in the demonstrations and must be inferred as a latent variable. We propose a joint approach that simultaneously learns a latent coordination model along with the individual policies. In particular, our method integrates unsupervised structure learning with conventional imitation learning. We illustrate the power of our approach on a difficult problem of learning multiple policies for fine-grained behavior modeling in team sports, where different players occupy different roles in the coordinated team strategy. We show that having a coordination model to infer the roles of players yields substantially improved imitation loss compared to conventional baselines.

Coordinated Multi-Agent Imitation Learning

When learning policies for real-world domains, two important questions arise: (i) how to efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate among different competing objectives and constraints. We thus study the problem of batch policy learning under multiple constraints, and offer a systematic solution. We first propose a flexible meta-algorithm that admits any batch reinforcement learning and online learning procedure as subroutines. We then present a specific algorithmic instantiation and provide performance guarantees for the main objective and all constraints. To certify constraint satisfaction, we propose a new and simple method for off-policy policy evaluation (OPE) and derive PAC-style bounds. Our algorithm achieves strong empirical results in different domains, including in a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving. We also show experimentally that our OPE method outperforms other popular OPE techniques on a standalone basis, especially in a high-dimensional setting.

/pdf/batch-policy-learning-under-constraints-1stny4it1m.pdf

Batch Policy Learning under Constraints

The disparate experimental conditions in recent off-policy policy evaluation (OPE) literature make it difficult both for practitioners to choose a reliable estimator for their application domain and for researchers to identify fruitful research directions. In this work, we present the first detailed empirical study of a broad suite of OPE methods. Based on thousands of experiments and empirical analysis, we offer a summarized set of guidelines to advance the understanding of OPE performance in practice, and suggest directions for future research. Along the way, our empirical findings challenge several commonly held beliefs about which class of approaches tends to perform well. Our accompanying software implementation serves as a first comprehensive benchmark for OPE.

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning
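
Off-policy policy evaluation estimates how well a target policy would perform using only data logged by a different behavior policy. As a generic illustration, and not the estimator proposed or benchmarked in the papers listed here, the sketch below applies ordinary importance sampling in a one-step setting. The action names, probabilities, and logged rewards are hypothetical.

```python
def importance_sampling_value(logged, target_probs, behavior_probs):
    """Estimate the target policy's expected reward from logged (action, reward) pairs."""
    total = 0.0
    for action, reward in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the action than the behavior policy.
        weight = target_probs[action] / behavior_probs[action]
        total += weight * reward
    return total / len(logged)


# Hypothetical policies over two actions.
behavior_probs = {"left": 0.5, "right": 0.5}
target_probs = {"left": 0.1, "right": 0.9}

# Hypothetical data logged under the behavior policy: "right" pays 1, "left" pays 0.
logged = [("left", 0.0), ("right", 1.0), ("left", 0.0), ("right", 1.0)]

estimate = importance_sampling_value(logged, target_probs, behavior_probs)
```

In this toy log the estimate matches the target policy's true expected reward of 0.9. With sequential decisions the weights become products over time steps, which is one reason the variance of such estimators can grow quickly.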

Current state-of-the-art sports metrics such as “Wins-above-Replacement” in baseball, “Expected Point Value” in basketball, and “Expected Goal Value” in soccer and hockey are now commonplace in performance analysis. These measures have enhanced our ability to compare and value performance in sport. But they are inherently limited because they are tied to a discrete outcome of a specific event. With the widespread (and growing) availability of player and ball tracking data comes the potential to quantitatively analyze and compare fine-grained movement patterns. An excellent example of this was the “ghosting” system developed by the Toronto Raptors to analyze player decision-making in STATS SportVU tracking data. Specifically, the Raptors created software to predict what a defensive player should have done instead of what they actually did. Motivated by the original “ghosting” work, we showcase an automatic “data-driven ghosting” method using an advanced machine learning methodology called “deep imitation learning”, applied to a season’s worth of tracking data from a recent professional league in soccer. Our ghosting method, which avoids substantial manual human annotation, results in a data-driven system that allows us to answer the question “how should this player or team have played in a given game situation compared to the league average?”. In addition, by “fine-tuning” our league average model to the tracking data from a particular team, our ghosting technique can estimate how each team might have approached the situation. Our method enables counterfactual analysis of the effectiveness of defensive positioning as both a measurable and viewable quantity for the first time.

Data-driven ghosting using deep imitation learning

We study the problem of imitation learning from demonstrations of multiple coordinating agents. One key challenge in this setting is that learning a good model of coordination can be difficult, since coordination is often implicit in the demonstrations and must be inferred as a latent variable. We propose a joint approach that simultaneously learns a latent coordination model along with the individual policies. In particular, our method integrates unsupervised structure learning with conventional imitation learning. We illustrate the power of our approach on a difficult problem of learning multiple policies for fine-grained behavior modeling in team sports, where different players occupy different roles in the coordinated team strategy. We show that having a coordination model to infer the roles of players yields substantially improved imitation loss compared to conventional baselines.

/pdf/coordinated-multi-agent-imitation-learning-3klrin4mr8.pdf

Hoang M. Le

Papers

Coordinated Multi-Agent Imitation Learning

Batch Policy Learning under Constraints

Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning

Data-driven ghosting using deep imitation learning
