Top 19 papers published by Aviv Tamar from Technion – Israel Institute of Technology in 2019

Proceedings Article

A Deep Reinforcement Learning Perspective on Internet Congestion Control

Nathan Jay¹, Noga H. Rotman², Brighten Godfrey¹, Michael Schapira², Aviv Tamar³

University of Illinois at Urbana–Champaign¹, Hebrew University of Jerusalem², University of California, Berkeley³

24 May 2019

TL;DR: It is shown that casting congestion control as RL enables training deep network policies that capture intricate patterns in data traffic and network conditions, and leverage this to outperform the state-of-the-art.

Abstract: We present and investigate a novel and timely application domain for deep reinforcement learning (RL): Internet congestion control. Congestion control is the core networking task of modulating traffic sources’ data-transmission rates to efficiently utilize network capacity, and is the subject of extensive attention in light of the advent of Internet services such as live video, virtual reality, Internet-of-Things, and more. We show that casting congestion control as RL enables training deep network policies that capture intricate patterns in data traffic and network conditions, and leverage this to outperform the state-of-the-art. We also highlight significant challenges facing real-world adoption of RL-based congestion control, including fairness, safety, and generalization, which are not trivial to address within conventional RL formalism. To facilitate further research and reproducibility of our results, we present a test suite for RL-guided congestion control based on the OpenAI Gym interface.
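
To make the framing concrete: in the RL view, each decision step observes recent network statistics, picks a change to the sending rate, and receives a reward that trades throughput against latency and loss. The sketch below is a hypothetical, dependency-free toy environment following the Gym reset/step convention. The class name `ToyCongestionEnv`, the single-bottleneck link model, and the reward weights are illustrative assumptions, not the authors' test suite.

```python
import random


class ToyCongestionEnv:
    """Hypothetical single-link congestion-control environment (Gym-style API).

    Observation: (sending_rate, latency_ratio, loss_rate)
    Action: multiplicative rate change, e.g. 0.1 means "send 10% faster".
    One step is treated as one second of traffic (an assumption).
    """

    def __init__(self, capacity=100.0, base_latency=0.05, max_steps=200, seed=0):
        self.capacity = capacity          # bottleneck capacity in packets/s (assumed)
        self.base_latency = base_latency  # propagation delay in seconds (assumed)
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.rate = self.capacity * self.rng.uniform(0.1, 0.5)
        self.queue = 0.0
        self.steps = 0
        return self._observe(loss=0.0)

    def _observe(self, loss):
        latency = self.base_latency + self.queue / self.capacity
        return (self.rate, latency / self.base_latency, loss)

    def step(self, action):
        # Apply a bounded multiplicative rate change chosen by the agent.
        self.rate = max(1.0, self.rate * (1.0 + max(-0.5, min(0.5, action))))
        # Traffic above capacity accumulates in the queue; below capacity it drains.
        self.queue = max(0.0, self.queue + self.rate - self.capacity)
        buffer_size = self.capacity  # one second of buffering (assumed)
        dropped = max(0.0, self.queue - buffer_size)
        self.queue -= dropped
        loss = dropped / self.rate
        throughput = min(self.rate, self.capacity)  # coarse approximation
        latency = self.base_latency + self.queue / self.capacity
        # Illustrative reward: utilization minus latency and loss penalties.
        reward = (throughput / self.capacity
                  - 10.0 * (latency - self.base_latency)
                  - 5.0 * loss)
        self.steps += 1
        done = self.steps >= self.max_steps
        return self._observe(loss), reward, done, {}
```

A policy interacts with it like any Gym environment, e.g. `obs, reward, done, info = env.step(0.05)` to raise the rate by 5%. Subclassing `gym.Env` and declaring observation and action spaces would be the natural next step for use with standard RL libraries.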

166 citations

Proceedings Article

Reinforcement Learning on Variable Impedance Controller for High-Precision Robotic Assembly

Jianlan Luo¹, Eugen Solowjow¹, Wen Chengtao¹, Juan Aparicio Ojea¹, Alice M. Agogino¹, Aviv Tamar², Pieter Abbeel¹

University of California, Berkeley¹, Technion – Israel Institute of Technology²

20 May 2019

TL;DR: This paper explicitly considers incorporating operational space force/torque information into reinforcement learning; this is motivated by humans heuristically mapping perceived forces to control actions, which results in completing high-precision tasks in a fairly easy manner.

Abstract: Precise robotic manipulation skills are desirable in many industrial settings, and reinforcement learning (RL) methods hold the promise of acquiring these skills autonomously. In this paper, we explicitly consider incorporating operational space force/torque information into reinforcement learning; this is motivated by humans heuristically mapping perceived forces to control actions, which results in completing high-precision tasks in a fairly easy manner. Our approach combines RL with force/torque information by incorporating a proper operational space force controller, where we also explore different ablations on processing this information. Moreover, we propose a neural network architecture that generalizes to reasonable variations of the environment. We evaluate our method on the open-source Siemens Robot Learning Challenge, which requires precise and delicate force-controlled behavior to assemble a tight-fit gear wheel set.
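
In variable impedance control, the policy outputs controller gains rather than raw torques, and a spring-damper law turns them into a force. A minimal one-dimensional sketch, assuming the textbook impedance law F = K(x_d - x) - D·v acting on a point mass; `policy` is a hypothetical stand-in for the learned RL policy, and this is not presented as the exact operational space controller used in the paper.

```python
def impedance_force(x, v, x_des, stiffness, damping):
    """Spring-damper impedance law along one Cartesian axis: F = K(x_d - x) - D v."""
    return stiffness * (x_des - x) - damping * v


def simulate(policy, x0=0.0, x_des=0.01, mass=1.0, dt=0.001, steps=2000):
    """Roll out a point mass driven by the impedance law with policy-chosen gains.

    `policy` (hypothetical) maps (position error, velocity) to a
    (stiffness, damping) pair, i.e. the RL action is the impedance itself.
    """
    x, v = x0, 0.0
    for _ in range(steps):
        stiffness, damping = policy(x_des - x, v)
        f = impedance_force(x, v, x_des, stiffness, damping)
        v += (f / mass) * dt  # semi-implicit Euler: update velocity first
        x += v * dt           # then position with the new velocity
    return x, v
```

With a fixed stiffness K = 500 and critical damping D = 2·sqrt(K·m), the mass settles at the 1 cm target; a learned policy would instead vary (K, D) with the contact state, for example softening when force builds up during insertion.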

96 citations

Proceedings Article

Bayesian Relational Memory for Semantic Visual Navigation

Yi Wu¹, Yuxin Wu², Aviv Tamar¹, Stuart Russell¹, Georgia Gkioxari², Yuandong Tian²

University of California, Berkeley¹, Facebook²

01 Oct 2019

TL;DR: A new memory architecture, Bayesian Relational Memory (BRM), is introduced to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.

Abstract: We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards. BRM takes the form of a probabilistic relation graph over semantic entities (e.g., room types), which allows (1) capturing the layout prior from training environments, i.e., prior knowledge, (2) estimating posterior layout at test time, i.e., memory update, and (3) efficient planning for navigation, altogether. We develop a BRM agent consisting of a BRM module for producing sub-goals and a goal-conditioned locomotion module for control. When testing in unseen environments, the BRM agent outperforms baselines that do not explicitly utilize the probabilistic relational memory structure.
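
A rough sketch of the idea behind a probabilistic relation graph over room types follows. It assumes each edge ("room type A is adjacent to room type B") is an independent Bernoulli variable with a Beta prior gathered from training houses; observations in the test house update the posterior, and sub-goal planning picks the most probable path. The class and method names are hypothetical, and this simplified Beta-Bernoulli model is a stand-in, not the exact BRM formulation.

```python
import heapq
import math


class RelationGraph:
    """Hypothetical Beta-Bernoulli relation graph over semantic room types."""

    def __init__(self, prior_counts):
        # prior_counts[(a, b)] = (alpha, beta): pseudo-counts of "connected" and
        # "not connected" collected from training environments (the layout prior).
        self.counts = {}
        for (a, b), ab in prior_counts.items():
            self.counts[frozenset((a, b))] = list(ab)

    def update(self, a, b, connected):
        """Memory update: fold one test-time observation into the posterior."""
        key = frozenset((a, b))
        alpha, beta = self.counts.get(key, [1.0, 1.0])
        self.counts[key] = [alpha + 1, beta] if connected else [alpha, beta + 1]

    def prob(self, a, b):
        """Posterior mean probability that rooms a and b are connected."""
        alpha, beta = self.counts.get(frozenset((a, b)), [1.0, 1.0])
        return alpha / (alpha + beta)

    def plan(self, start, goal):
        """Most probable path: Dijkstra with edge cost -log p(connected)."""
        nodes = {n for key in self.counts for n in key} | {start, goal}
        dist, prev = {start: 0.0}, {}
        heap = [(0.0, start)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == goal:
                break
            if d > dist.get(u, math.inf):
                continue  # stale heap entry
            for w in nodes - {u}:
                p = self.prob(u, w)
                if p <= 0.0:
                    continue
                nd = d - math.log(p)
                if nd < dist.get(w, math.inf):
                    dist[w], prev[w] = nd, u
                    heapq.heappush(heap, (nd, w))
        if goal != start and goal not in prev:
            return None  # goal unreachable
        path = [goal]
        while path[-1] != start:
            path.append(prev[path[-1]])
        return path[::-1]  # sequence of sub-goals for the locomotion module
```

With a prior saying living room to bedroom is usually indirect, the planner routes through the kitchen; after enough test-time observations of a direct connection, the posterior changes and so does the plan.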

95 citations

Proceedings Article

Learning Robotic Manipulation through Visual Planning and Acting

Angelina Wang¹, Thanard Kurutach², Kara Liu², Pieter Abbeel², Aviv Tamar²

Princeton University¹, University of California, Berkeley²

22 Jun 2019

TL;DR: This work learns to imagine goal-directed object manipulation directly from raw image data of self-supervised interaction of the robot with the object, and shows that separating the problem into visual planning and visual tracking control is more efficient and more interpretable than alternative data-driven approaches.

Abstract: Planning for robotic manipulation requires reasoning about the changes a robot can effect on objects. When such interactions can be modelled analytically, as in domains with rigid objects, efficient planning algorithms exist. However, in both domestic and industrial domains, the objects of interest can be soft, or deformable, and hard to model analytically. For such cases, we posit that a data-driven modelling approach is more suitable. In recent years, progress in deep generative models has produced methods that learn to 'imagine' plausible images from data. Building on the recent Causal InfoGAN generative model, in this work we learn to imagine goal-directed object manipulation directly from raw image data of self-supervised interaction of the robot with the object. After learning, given a goal observation of the system, our model can generate an imagined plan, i.e., a sequence of images that transition the object into the desired goal. To execute the plan, we use it as a reference trajectory to track with a visual servoing controller, which we also learn from the data as an inverse dynamics model. In a simulated manipulation task, we show that separating the problem into visual planning and visual tracking control is more sample efficient and more interpretable than alternative data-driven approaches. We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.
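
The plan-then-track structure can be sketched as a loop that walks through the imagined sequence and repeatedly asks an inverse dynamics model for the action that moves the current observation toward the next imagined frame. Here observations are 2D points instead of images, and `toy_inverse_dynamics` is a hypothetical hand-written stand-in for the learned model; the Causal InfoGAN planner itself is not reproduced.

```python
import math


def track_plan(plan, observe, inverse_dynamics, apply_action,
               tol=1e-3, max_steps_per_waypoint=50):
    """Visual-servoing-style tracking of an imagined plan.

    plan: sequence of target observations; plan[0] is the start observation.
    inverse_dynamics(current, target) -> action expected to move current toward target.
    """
    for target in plan[1:]:
        for _ in range(max_steps_per_waypoint):
            current = observe()
            if math.dist(current, target) < tol:
                break  # waypoint reached; move on to the next imagined frame
            apply_action(inverse_dynamics(current, target))
    return observe()


class PointRobot:
    """Toy system whose 'observation' is simply its 2D position."""

    def __init__(self, start):
        self.pos = list(start)

    def observe(self):
        return tuple(self.pos)

    def apply(self, action):
        self.pos[0] += action[0]
        self.pos[1] += action[1]


def toy_inverse_dynamics(current, target, max_step=0.1):
    """Hypothetical stand-in for the learned inverse model: a step-limited move."""
    dx, dy = target[0] - current[0], target[1] - current[1]
    norm = math.hypot(dx, dy)
    scale = min(1.0, max_step / norm) if norm > 0 else 0.0
    return (dx * scale, dy * scale)
```

Keeping the planner and the tracker separate means each can be inspected on its own: the imagined frames show what the robot intends to do, and tracking errors show where execution diverged.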

64 citations

Posted Content

Reinforcement Learning on Variable Impedance Controller for High-Precision Robotic Assembly

Jianlan Luo¹, Eugen Solowjow¹, Wen Chengtao¹, Juan Aparicio Ojea¹, Alice M. Agogino¹, Aviv Tamar², Pieter Abbeel¹

University of California, Berkeley¹, Technion – Israel Institute of Technology²

04 Mar 2019-arXiv: Robotics

TL;DR: In this article, the authors combine RL with force/torque information by incorporating a proper operational space force controller, where they also exploit different ablations on processing this information, and propose a neural network architecture that generalizes to reasonable variations of the environment.

Abstract: Precise robotic manipulation skills are desirable in many industrial settings, and reinforcement learning (RL) methods hold the promise of acquiring these skills autonomously. In this paper, we explicitly consider incorporating operational space force/torque information into reinforcement learning; this is motivated by humans heuristically mapping perceived forces to control actions, which results in completing high-precision tasks in a fairly easy manner. Our approach combines RL with force/torque information by incorporating a proper operational space force controller, where we also explore different ablations on processing this information. Moreover, we propose a neural network architecture that generalizes to reasonable variations of the environment. We evaluate our method on the open-source Siemens Robot Learning Challenge, which requires precise and delicate force-controlled behavior to assemble a tight-fit gear wheel set.

55 citations

Posted Content

Learning Robotic Manipulation through Visual Planning and Acting

Angelina Wang¹, Thanard Kurutach², Kara Liu², Pieter Abbeel², Aviv Tamar²

Princeton University¹, University of California, Berkeley²

11 May 2019-arXiv: Robotics

TL;DR: In this article, the Causal InfoGAN generative model is used to generate a sequence of images that transition the object into the desired goal; this imagined plan is then used as a reference trajectory to track with a visual servoing controller, which is learned from the data as an inverse dynamics model.

Abstract: Planning for robotic manipulation requires reasoning about the changes a robot can effect on objects. When such interactions can be modelled analytically, as in domains with rigid objects, efficient planning algorithms exist. However, in both domestic and industrial domains, the objects of interest can be soft, or deformable, and hard to model analytically. For such cases, we posit that a data-driven modelling approach is more suitable. In recent years, progress in deep generative models has produced methods that learn to 'imagine' plausible images from data. Building on the recent Causal InfoGAN generative model, in this work we learn to imagine goal-directed object manipulation directly from raw image data of self-supervised interaction of the robot with the object. After learning, given a goal observation of the system, our model can generate an imagined plan, i.e., a sequence of images that transition the object into the desired goal. To execute the plan, we use it as a reference trajectory to track with a visual servoing controller, which we also learn from the data as an inverse dynamics model. In a simulated manipulation task, we show that separating the problem into visual planning and visual tracking control is more sample efficient and more interpretable than alternative data-driven approaches. We further demonstrate our approach on learning to imagine and execute in 3 environments, the final of which is deformable rope manipulation on a PR2 robot.

51 citations

Proceedings Article

A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems

Margaret P. Chapman¹, Jonathan Lacotte², Aviv Tamar³, Donggun Lee³, Kevin M. Smith⁴, Victoria Cheng³, Jaime F. Fisac³, Susmit Jha¹, Marco Pavone², Claire J. Tomlin³

SRI International¹, Stanford University², University of California, Berkeley³, Tufts University⁴

10 Jul 2019

TL;DR: In this paper, the authors leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon.

Abstract: A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon. Specifically, we first introduce the notion of a risk-sensitive safe set as a set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk (CVaR) measure. Second, we show how the computation of a risk-sensitive safe set can be reduced to the solution to a Markov Decision Process (MDP), where cost is assessed according to CVaR. Third, leveraging this reduction, we devise a tractable algorithm to approximate a risk-sensitive safe set and provide arguments about its correctness. Finally, we present a realistic example inspired by stormwater catchment design to demonstrate the utility of risk-sensitive reachability analysis. In particular, our approach allows a practitioner to tune the level of risk sensitivity from worst-case (which is typical for Hamilton-Jacobi reachability analysis) to risk-neutral (which is the case for stochastic reachability analysis).
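
CVaR at level α is the expected cost over the worst α-fraction of outcomes: α = 1 gives the ordinary mean (risk-neutral), and α → 0 approaches the maximum (worst case), which is the tuning knob mentioned at the end of the abstract. A minimal empirical estimator, assuming i.i.d. cost samples; the function name is hypothetical, and this shows only the risk measure, not the paper's safe-set algorithm.

```python
def empirical_cvar(costs, alpha):
    """Empirical CVaR_alpha of a cost distribution: mean of the worst alpha-fraction.

    alpha = 1.0 gives the ordinary mean (risk-neutral); small alpha approaches
    the maximum (worst case). The boundary sample receives fractional weight,
    so the value is exact for the empirical distribution.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    xs = sorted(costs, reverse=True)  # largest (worst) costs first
    mass = alpha * len(xs)            # tail mass measured in "samples"
    total, remaining = 0.0, mass
    for x in xs:
        w = min(1.0, remaining)
        total += w * x
        remaining -= w
        if remaining <= 0.0:
            break
    return total / mass
```

For the costs [1, 2, 3, 4], α = 1 gives 2.5 (the mean), α = 0.5 gives 3.5 (the mean of the two worst), and α = 0.25 gives 4 (the worst case).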

38 citations

Proceedings Article

Harnessing Reinforcement Learning for Neural Motion Planning

Tom Jurgenson¹, Aviv Tamar²

Technion – Israel Institute of Technology¹, University of California, Berkeley²

22 Jun 2019

TL;DR: In this article, a modification of the DDPG RL algorithm was proposed for motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data, which significantly improves the accuracy of the learned motion planning policy.

Abstract: Motion planning is an essential component in most of today's robotic applications. In this work, we consider the learning setting, where a set of solved motion planning problems is used to improve the efficiency of motion planning on different, yet similar problems. This setting is important in applications with rapidly changing environments such as in e-commerce, among others. We investigate a general deep learning based approach, where a neural network is trained to map an image of the domain, the current robot state, and a goal robot state to the next robot state in the plan. We focus on the learning algorithm, and compare supervised learning methods with reinforcement learning (RL) algorithms. We first establish that supervised learning approaches are inferior in their accuracy due to insufficient data on the boundary of the obstacles, an issue that RL methods mitigate by actively exploring the domain. We then propose a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data. We show that our algorithm, dubbed DDPG-MP, significantly improves the accuracy of the learned motion planning policy. Finally, we show that given enough training data, our method can plan significantly faster on novel domains than off-the-shelf sampling based motion planners. Results of our experiments are shown in this https URL.
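
At inference time, a neural motion planner of this kind is queried repeatedly: the network maps (workspace image, current state, goal state) to the next state until the goal is reached or a transition collides. A sketch of that loop, assuming a 2D point robot with circular obstacles; `toy_policy` and `toy_collides` are hypothetical stand-ins (the real policy would be the trained network, and this collision check only tests segment endpoints). DDPG-MP training itself is not shown.

```python
import math


def rollout_neural_planner(policy, workspace, start, goal, collides,
                           goal_tol=0.05, max_steps=100):
    """Query a next-state policy until the goal is reached or the plan fails.

    policy(workspace, state, goal) -> next state (stand-in for the trained network)
    collides(workspace, a, b) -> True if the transition a -> b is invalid
    Returns (path, success).
    """
    path, state = [start], start
    for _ in range(max_steps):
        nxt = policy(workspace, state, goal)
        if collides(workspace, state, nxt):
            return path, False  # invalid transition: the plan fails
        path.append(nxt)
        state = nxt
        if math.dist(state, goal) < goal_tol:
            return path, True
    return path, False  # step budget exhausted


def toy_policy(workspace, state, goal, step=0.1):
    """Hypothetical obstacle-unaware policy: step straight toward the goal."""
    dx, dy = goal[0] - state[0], goal[1] - state[1]
    d = math.hypot(dx, dy)
    s = min(1.0, step / d) if d > 0 else 0.0
    return (state[0] + dx * s, state[1] + dy * s)


def toy_collides(workspace, a, b):
    """Simplified check: is the endpoint b inside any circular obstacle (center, radius)?"""
    return any(math.dist(b, c) < r for c, r in workspace)
```

The obstacle-unaware `toy_policy` fails as soon as an obstacle lies on the straight line, which is the kind of failure near obstacle boundaries that the abstract attributes to insufficient supervised data there.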

36 citations

Posted Content

Harnessing Reinforcement Learning for Neural Motion Planning

Tom Jurgenson¹, Aviv Tamar²

Technion – Israel Institute of Technology¹, University of California, Berkeley²

01 Jun 2019-arXiv: Robotics

TL;DR: This work proposes a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data, and shows that the algorithm can plan significantly faster on novel domains than off-the-shelf sampling based motion planners.

Abstract: Motion planning is an essential component in most of today's robotic applications. In this work, we consider the learning setting, where a set of solved motion planning problems is used to improve the efficiency of motion planning on different, yet similar problems. This setting is important in applications with rapidly changing environments such as in e-commerce, among others. We investigate a general deep learning based approach, where a neural network is trained to map an image of the domain, the current robot state, and a goal robot state to the next robot state in the plan. We focus on the learning algorithm, and compare supervised learning methods with reinforcement learning (RL) algorithms. We first establish that supervised learning approaches are inferior in their accuracy due to insufficient data on the boundary of the obstacles, an issue that RL methods mitigate by actively exploring the domain. We then propose a modification of the popular DDPG RL algorithm that is tailored to motion planning domains, by exploiting the known model in the problem and the set of solved plans in the data. We show that our algorithm, dubbed DDPG-MP, significantly improves the accuracy of the learned motion planning policy. Finally, we show that given enough training data, our method can plan significantly faster on novel domains than off-the-shelf sampling based motion planners. Results of our experiments are shown in this https URL.

25 citations

Proceedings Article

Domain Randomization for Active Pose Estimation

Xinyi Ren¹, Jianlan Luo¹, Eugen Solowjow², Juan Aparicio Ojea², Abhishek Gupta¹, Aviv Tamar¹, Pieter Abbeel¹

University of California, Berkeley¹, Siemens²

20 May 2019

TL;DR: The main idea is that active perception (moving the robot to get a better estimate of pose) can be trained in simulation and transferred to the real world using domain randomization, and this approach can significantly improve the accuracy of standard pose estimation in several scenarios.

Abstract: Accurate state estimation is a fundamental component of robotic control. In robotic manipulation tasks, as is our focus in this work, state estimation is essential for identifying the positions of objects in the scene, forming the basis of the manipulation plan. However, pose estimation typically requires expensive 3D cameras or additional instrumentation such as fiducial markers to perform accurately. Recently, Tobin et al. introduced an approach to pose estimation based on domain randomization, where a neural network is trained to predict pose directly from a 2D image of the scene. The network is trained on computer-generated images with a high variation in textures and lighting, thereby generalizing to real-world images. In this work, we investigate how to improve the accuracy of domain-randomization-based pose estimation. Our main idea is that active perception (moving the robot to get a better estimate of pose) can be trained in simulation and transferred to the real world using domain randomization. In our approach, the robot trains in a domain-randomized simulation how to estimate pose from a sequence of images. We show that our approach can significantly improve the accuracy of standard pose estimation in several scenarios: when the robot holding an object moves, when reference objects are moved in the scene, or when the camera is moved around the object.
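
Domain randomization amounts to drawing fresh nuisance parameters (textures, lighting, camera placement) for every rendered training example while keeping the pose label attached, so the network cannot rely on appearance. A sketch of the sampling side for sequence-based training, in which several views share one object pose; the parameter names and ranges are illustrative assumptions, the renderer is omitted, and this sketch varies only the camera viewpoint across the sequence.

```python
import dataclasses
import math
import random


@dataclasses.dataclass
class SceneParams:
    texture_id: int         # index into a large bank of random textures
    light_intensity: float
    light_direction: tuple
    camera_offset: tuple    # viewpoint perturbation (x, y, z) in metres
    object_pose: tuple      # (x, y, yaw): the label the network must predict


def sample_randomized_scene(rng, n_textures=1000):
    """Draw one domain-randomized scene configuration (all ranges are assumptions)."""
    return SceneParams(
        texture_id=rng.randrange(n_textures),
        light_intensity=rng.uniform(0.2, 2.0),
        light_direction=(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0), -1.0),
        camera_offset=tuple(rng.gauss(0.0, 0.01) for _ in range(3)),
        object_pose=(rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2),
                     rng.uniform(-math.pi, math.pi)),
    )


def sample_training_sequence(rng, length=5, view_std=0.05):
    """One active-perception episode: appearance and pose fixed, viewpoint varies."""
    base = sample_randomized_scene(rng)
    return [
        dataclasses.replace(
            base, camera_offset=tuple(rng.gauss(0.0, view_std) for _ in range(3)))
        for _ in range(length)
    ]
```

Each `SceneParams` would be passed to a renderer, and the rendered sequence together with its shared `object_pose` label forms one training example for a sequence-input pose network.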

23 citations

Posted Content

Deep Variational Semi-Supervised Novelty Detection

Tal Daniel¹, Thanard Kurutach, Aviv Tamar

Technion – Israel Institute of Technology¹

25 Sep 2019-arXiv: Learning

TL;DR: This work proposes two variational methods for training VAEs for SSAD, and shows that this idea can be derived from principled probabilistic formulations of the problem, and proposes simple and effective algorithms.

Abstract: In anomaly detection (AD), one seeks to identify whether a test sample is abnormal, given a data set of normal samples. A recent and promising approach to AD relies on deep generative models, such as variational autoencoders (VAEs), for unsupervised learning of the normal data distribution. In semi-supervised AD (SSAD), the data also includes a small sample of labeled anomalies. In this work, we propose two variational methods for training VAEs for SSAD. The intuitive idea in both methods is to train the encoder to 'separate' between latent vectors for normal and outlier data. We show that this idea can be derived from principled probabilistic formulations of the problem, and propose simple and effective algorithms. Our methods can be applied to various data types, as we demonstrate on SSAD datasets ranging from natural images to astronomy and medicine, can be combined with any VAE model architecture, and are naturally compatible with ensembling. Compared with state-of-the-art SSAD methods that are not specific to particular data types, we obtain a marked improvement in outlier detection.
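
One way to express the "separate the latents" intuition is through the KL divergence between each encoding and the standard normal prior: normal samples are pulled toward the prior, and labeled anomalies are pushed beyond a margin. The hinge-on-KL objective below is a hypothetical illustration of that intuition, not either of the two variational methods proposed in the paper; the function names and the margin value are assumptions.

```python
import math


def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


def separation_loss(normal_latents, anomaly_latents, margin=5.0):
    """Pull normal encodings toward the prior and push labeled anomalies away from it.

    Each latent is a (mu, log_var) pair produced by a VAE encoder. Normal samples
    pay their KL (as in the usual ELBO); anomalies pay a hinge penalty until their
    KL exceeds `margin`. Reconstruction terms are omitted for brevity.
    """
    normal_term = sum(gaussian_kl(mu, lv) for mu, lv in normal_latents)
    anomaly_term = sum(max(0.0, margin - gaussian_kl(mu, lv))
                       for mu, lv in anomaly_latents)
    n = len(normal_latents) + len(anomaly_latents)
    return (normal_term + anomaly_term) / max(1, n)
```

At test time, the KL of a sample's encoding (or the full negative ELBO) can then serve as an anomaly score, with larger values indicating likelier outliers.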

Posted Content

A Risk-Sensitive Finite-Time Reachability Approach for Safety of Stochastic Dynamic Systems

Margaret P. Chapman¹, Jonathan Lacotte², Aviv Tamar¹, Donggun Lee¹, Kevin M. Smith³, Victoria Cheng¹, Jaime F. Fisac¹, Susmit Jha⁴, Marco Pavone², Claire J. Tomlin¹

University of California, Berkeley¹, Stanford University², Tufts University³, SRI International⁴

28 Feb 2019-arXiv: Systems and Control

TL;DR: This paper introduces the notion of a risk-sensitive safe set as a set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk (CVaR) measure.

Abstract: A classic reachability problem for safety of dynamic systems is to compute the set of initial states from which the state trajectory is guaranteed to stay inside a given constraint set over a given time horizon. In this paper, we leverage existing theory of reachability analysis and risk measures to devise a risk-sensitive reachability approach for safety of stochastic dynamic systems under non-adversarial disturbances over a finite time horizon. Specifically, we first introduce the notion of a risk-sensitive safe set as a set of initial states from which the risk of large constraint violations can be reduced to a required level via a control policy, where risk is quantified using the Conditional Value-at-Risk (CVaR) measure. Second, we show how the computation of a risk-sensitive safe set can be reduced to the solution to a Markov Decision Process (MDP), where cost is assessed according to CVaR. Third, leveraging this reduction, we devise a tractable algorithm to approximate a risk-sensitive safe set, and provide theoretical arguments about its correctness. Finally, we present a realistic example inspired by stormwater catchment design to demonstrate the utility of risk-sensitive reachability analysis. In particular, our approach allows a practitioner to tune the level of risk sensitivity from worst-case (which is typical for Hamilton-Jacobi reachability analysis) to risk-neutral (which is the case for stochastic reachability analysis).

Posted Content

Domain Randomization for Active Pose Estimation

Xinyi Ren¹, Jianlan Luo¹, Eugen Solowjow², Juan Aparicio Ojea², Abhishek Gupta¹, Aviv Tamar¹, Pieter Abbeel¹

University of California, Berkeley¹, Siemens²

10 Mar 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, the authors propose improving the accuracy of domain-randomization-based pose estimation through active perception: the robot learns, in a domain-randomized simulation, to estimate pose from a sequence of images, and this skill transfers to the real world.

Abstract: Accurate state estimation is a fundamental component of robotic control. In robotic manipulation tasks, as is our focus in this work, state estimation is essential for identifying the positions of objects in the scene, forming the basis of the manipulation plan. However, pose estimation typically requires expensive 3D cameras or additional instrumentation such as fiducial markers to perform accurately. Recently, Tobin et al. introduced an approach to pose estimation based on domain randomization, where a neural network is trained to predict pose directly from a 2D image of the scene. The network is trained on computer-generated images with a high variation in textures and lighting, thereby generalizing to real-world images. In this work, we investigate how to improve the accuracy of domain-randomization-based pose estimation. Our main idea is that active perception (moving the robot to get a better estimate of pose) can be trained in simulation and transferred to the real world using domain randomization. In our approach, the robot trains in a domain-randomized simulation how to estimate pose from a sequence of images. We show that our approach can significantly improve the accuracy of standard pose estimation in several scenarios: when the robot holding an object moves, when reference objects are moved in the scene, or when the camera is moved around the object.

Posted Content

Multi-Agent Reinforcement Learning with Multi-Step Generative Models

Orr Krupnik¹, Igor Mordatch², Aviv Tamar¹

Technion – Israel Institute of Technology¹, Google²

29 Jan 2019-arXiv: Learning

TL;DR: This work proposes learning models for model-based reinforcement learning based on a disentangled variational auto-encoder, and shows that this approach has better sample efficiency than a strong model-free RL baseline and can learn both cooperative and adversarial behavior from the same data.

Abstract: We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace. For non-trivial dynamical systems, MBRL typically suffers from accumulating errors. Several recent studies have addressed this problem by learning latent variable models for trajectory segments and optimizing over behavior in the latent space. In this work, we investigate whether this approach can be extended to 2-agent competitive and cooperative settings. The fundamental challenge is how to learn models that capture interactions between agents, yet are disentangled to allow for optimization of each agent's behavior separately. We propose such models based on a disentangled variational auto-encoder, and demonstrate our approach on a simulated 2-robot manipulation task, where one robot can either help or distract the other. We show that our approach has better sample efficiency than a strong model-free RL baseline, and can learn both cooperative and adversarial behavior from the same data.
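
Optimizing behavior in a learned latent space is often done with a sampling-based optimizer. The sketch below uses the cross-entropy method (CEM), a swapped-in choice rather than an optimizer named in the abstract, to optimize one agent's latent plan while the other agent's latent is held fixed, which is what a disentangled model makes possible. The `cooperative_score` function is a hypothetical stand-in for decoding both latents and evaluating the task; an adversarial agent would optimize the negated task objective instead.

```python
import random


def cem_optimize_latent(score, dim, iters=30, pop=100, elite_frac=0.2, seed=0):
    """Cross-entropy method over one agent's latent plan z (higher score is better)."""
    rng = random.Random(seed)
    mean, std = [0.0] * dim, [1.0] * dim  # start from the VAE prior N(0, I)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        samples.sort(key=score, reverse=True)
        elite = samples[:n_elite]
        # Refit a diagonal Gaussian to the elite samples.
        mean = [sum(z[i] for z in elite) / n_elite for i in range(dim)]
        std = [max(1e-3, (sum((z[i] - mean[i]) ** 2 for z in elite) / n_elite) ** 0.5)
               for i in range(dim)]
    return mean


# Hypothetical stand-in for "decode both latents and measure task progress":
# progress is highest when agent A's latent matches the fixed latent of agent B.
z_b = [0.5, -0.3]


def cooperative_score(z_a):
    return -sum((a - b) ** 2 for a, b in zip(z_a, z_b))
```

Because only agent A's latent is searched over, the same learned model can be reused with a different score to obtain helping or distracting behavior.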

Proceedings Article

Robust 2D Assembly Sequencing via Geometric Planning with Learned Scores

Tzvika Geft¹, Aviv Tamar², Ken Goldberg², Dan Halperin¹

Tel Aviv University¹, University of California, Berkeley²

01 Aug 2019

TL;DR: This work presents an approach that combines geometric planning with a deep neural network that learns to predict robustness, so that robust sequences can be planned an order of magnitude faster than with simulation, and demonstrates this approach on two-handed planar assemblies.

Abstract: To compute robust 2D assembly plans, we present an approach that combines geometric planning with a deep neural network. We train the network using the Box2D physics simulator with added stochastic noise to yield robustness scores, i.e., the success probabilities of planned assembly motions. As running a simulation for every assembly motion is impractical, we train a convolutional neural network to map assembly operations, given as an image pair of the subassemblies before and after they are mated, to a robustness score. The neural network prediction is used within a planner to quickly prune out motions that are not robust. We demonstrate this approach on two-handed planar assemblies, where the motions are one-step linear translations. Results suggest that the neural network can learn robustness to plan robust sequences an order of magnitude faster than simulation.
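
Since a robustness score is a success probability, the simulator-based labels can be read as Monte Carlo estimates, and pruning is a threshold on the (predicted) score. A sketch assuming a toy one-dimensional insertion with clearance and Gaussian lateral noise; `toy_insertion` is a hypothetical stand-in for a Box2D rollout, and in the described approach the `score_fn` passed to the pruning step would be the trained CNN rather than the simulator.

```python
import random


def robustness_score(simulate_motion, motion, noise_std, trials=200, seed=0):
    """Monte Carlo estimate of a motion's success probability under injected noise."""
    rng = random.Random(seed)
    successes = sum(simulate_motion(motion, rng.gauss(0.0, noise_std))
                    for _ in range(trials))
    return successes / trials


def prune_motions(motions, score_fn, threshold=0.9):
    """Keep only candidate motions whose (predicted) robustness meets the threshold."""
    return [m for m in motions if score_fn(m) >= threshold]


def toy_insertion(clearance, lateral_error):
    """Hypothetical stand-in for a physics rollout: succeeds if the error fits the gap."""
    return abs(lateral_error) < clearance
```

Swapping the simulator-backed `score_fn` for a learned predictor leaves `prune_motions` unchanged, which is what allows the planner to avoid one simulation per candidate motion.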

Proceedings Article

Distributional Multivariate Policy Evaluation and Exploration with the Bellman GAN

Dror Freirich, Tzahi Shimkin, Ron Meir¹, Aviv Tamar²

Technion – Israel Institute of Technology¹, University of California, Berkeley²

24 May 2019

Posted Content

Sub-Goal Trees - a Framework for Goal-Directed Trajectory Prediction and Optimization

Tom Jurgenson, Edward Groshev, Aviv Tamar

12 Jun 2019-arXiv: Learning

TL;DR: A goal-conditioned trajectory can be represented by first selecting an intermediate state between start and goal, which partitions the trajectory into two, and then recursively predicting intermediate points on each sub-segment until a complete trajectory is obtained.

Abstract: Many AI problems, in robotics and other domains, are goal-directed, essentially seeking a trajectory leading to some goal state. In such problems, the way we choose to represent a trajectory underlies algorithms for trajectory prediction and optimization. Interestingly, almost all prior work in imitation and reinforcement learning builds on a sequential trajectory representation -- calculating the next state in the trajectory given its predecessors. We propose a different perspective: a goal-conditioned trajectory can be represented by first selecting an intermediate state between start and goal, partitioning the trajectory into two. Then, recursively, predicting intermediate points on each sub-segment, until a complete trajectory is obtained. We call this representation a sub-goal tree, and building on it, we develop new methods for trajectory prediction, learning, and optimization. We show that in a supervised learning setting, sub-goal trees better account for trajectory variability, and can predict trajectories exponentially faster at test time by leveraging a concurrent computation. Then, for optimization, we derive a new dynamic programming equation for sub-goal trees, and use it to develop new planning and reinforcement learning algorithms. These algorithms, which are not based on the standard Bellman equation, naturally account for hierarchical sub-goal structure in a task. Empirical results on motion planning domains show that the sub-goal tree framework significantly improves both accuracy and prediction time.
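
The recursive construction can be sketched in a few lines. `predict_midpoint` is a hypothetical stand-in for the learned sub-goal predictor (here, plain linear interpolation). A tree of depth d produces 2^d + 1 states in d levels of predictions, and the two halves at each level are independent; that independence is where the opportunity for concurrency, and an exponential speedup over predicting 2^d states one after another, can come from.

```python
def subgoal_tree(start, goal, predict_midpoint, depth):
    """Build a trajectory by recursive sub-goal prediction.

    predict_midpoint(s, g) stands in for a learned model that proposes an
    intermediate state between s and g. Depth d yields 2**d + 1 states; the
    left and right recursive calls do not depend on each other.
    """
    if depth == 0:
        return [start, goal]
    mid = predict_midpoint(start, goal)
    left = subgoal_tree(start, mid, predict_midpoint, depth - 1)
    right = subgoal_tree(mid, goal, predict_midpoint, depth - 1)
    return left + right[1:]  # drop the duplicated midpoint


def linear_midpoint(s, g):
    """Hypothetical predictor: the geometric midpoint of two 2D states."""
    return ((s[0] + g[0]) / 2, (s[1] + g[1]) / 2)
```

For example, `subgoal_tree((0.0, 0.0), (8.0, 0.0), linear_midpoint, 3)` returns the nine evenly spaced states from x = 0 to x = 8.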

Multi-Agent Reinforcement Learning with Multi-Step Generative Models

Orr Krupnik, Igor Mordatch, Aviv Tamar

29 Jan 2019

TL;DR: In this paper, the authors propose a disentangled variational auto-encoder to learn models that capture interactions between agents while allowing each agent's behavior to be optimized separately, and demonstrate their approach on a simulated 2-robot manipulation task where one robot can either help or distract the other.

Abstract: We consider model-based reinforcement learning (MBRL) in 2-agent, high-fidelity continuous control problems -- an important domain for robots interacting with other agents in the same workspace. For non-trivial dynamical systems, MBRL typically suffers from accumulating errors. Several recent studies have addressed this problem by learning latent variable models for trajectory segments and optimizing over behavior in the latent space. In this work, we investigate whether this approach can be extended to 2-agent competitive and cooperative settings. The fundamental challenge is how to learn models that capture interactions between agents, yet are disentangled to allow for optimization of each agent's behavior separately. We propose such models based on a disentangled variational auto-encoder, and demonstrate our approach on a simulated 2-robot manipulation task, where one robot can either help or distract the other. We show that our approach has better sample efficiency than a strong model-free RL baseline, and can learn both cooperative and adversarial behavior from the same data.

Posted Content

Bayesian Relational Memory for Semantic Visual Navigation

Yi Wu¹, Yuxin Wu¹, Aviv Tamar², Stuart Russell², Georgia Gkioxari², Yuandong Tian³

University of California, Berkeley¹, Facebook², Technion – Israel Institute of Technology³

10 Sep 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this article, Bayesian Relational Memory (BRM) is introduced to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards.

Abstract: We introduce a new memory architecture, Bayesian Relational Memory (BRM), to improve the generalization ability for semantic visual navigation agents in unseen environments, where an agent is given a semantic target to navigate towards. BRM takes the form of a probabilistic relation graph over semantic entities (e.g., room types), which allows (1) capturing the layout prior from training environments, i.e., prior knowledge, (2) estimating posterior layout at test time, i.e., memory update, and (3) efficient planning for navigation, altogether. We develop a BRM agent consisting of a BRM module for producing sub-goals and a goal-conditioned locomotion module for control. When testing in unseen environments, the BRM agent outperforms baselines that do not explicitly utilize the probabilistic relational memory structure.
