Author

Mohak Bhardwaj

Bio: Mohak Bhardwaj is an academic researcher from the University of Washington. The author has contributed to research in the topics of Motion planning and Heuristics, has an h-index of 7, and has co-authored 16 publications receiving 158 citations. Previous affiliations of Mohak Bhardwaj include the Indian Institutes of Technology and Carnegie Mellon University.

Papers
Journal ArticleDOI
TL;DR: A novel data-driven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle: an oracle that at train time has full knowledge about the world map and can compute optimal decisions.
Abstract: Robot planning is the process of selecting a sequence of actions that optimize for a task-specific objective. For instance, the objective for a navigation task would be to find collision-free paths...

65 citations
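As a rough illustration of the clairvoyant-oracle idea summarized above, here is a minimal DAgger-style training loop in Python. It is a sketch, not the authors' implementation: the `envs`, `policy`, and `oracle` interfaces are all hypothetical, with the one structural assumption that the oracle sees the full world map at train time while the policy sees only partial observations.

```python
import random

def train_by_oracle_imitation(envs, policy, oracle, iters=100, mix=0.5):
    """Sketch: label states visited by a learner/oracle mixture with the
    clairvoyant oracle's optimal action, then regress the policy onto them."""
    dataset = []
    for _ in range(iters):
        env = random.choice(envs)
        obs, state = env.reset()        # obs: partial view; state: full map
        done = False
        while not done:
            expert = oracle.optimal_action(state)   # oracle sees everything
            dataset.append((obs, expert))
            # Execute the oracle's action with probability `mix`, else the learner's
            act = expert if random.random() < mix else policy.act(obs)
            obs, state, done = env.step(act)
        policy.fit(dataset)             # supervised learning on aggregated data
    return policy
```

At test time only `policy.act(obs)` is used, so the full map is never required outside training.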

Proceedings ArticleDOI
01 May 2020
TL;DR: This work proposes a differentiable extension to the GPMP2 algorithm, so that it can be trained end-to-end from data, and performs several experiments that validate the algorithm and illustrate the benefits of the proposed learning-based approach to motion planning.
Abstract: Modern trajectory optimization based approaches to motion planning are fast, easy to implement, and effective on a wide range of robotics tasks. However, trajectory optimization algorithms have parameters that are typically set in advance (and rarely discussed in detail). Setting these parameters properly can have a significant impact on the practical performance of the algorithm, sometimes making the difference between finding a feasible plan or failing at the task entirely. We propose a method for leveraging past experience to learn how to automatically adapt the parameters of Gaussian Process Motion Planning (GPMP) algorithms. Specifically, we propose a differentiable extension to the GPMP2 algorithm, so that it can be trained end-to-end from data. We perform several experiments that validate our algorithm and illustrate the benefits of our proposed learning-based approach to motion planning.

42 citations
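To make the end-to-end idea concrete, here is a minimal PyTorch sketch of a differentiable unrolled trajectory optimizer. It is not the dGPMP2 release: the cost terms are simplified stand-ins (a finite-difference smoothness term in place of the GP prior, plus a generic obstacle cost), and the learnable quantities shown, a step size and an obstacle weight, are illustrative assumptions.

```python
import torch

class DifferentiablePlanner(torch.nn.Module):
    """Unroll gradient steps on a trajectory cost whose parameters are learnable."""
    def __init__(self, n_iters=10):
        super().__init__()
        self.step_size = torch.nn.Parameter(torch.tensor(0.1))
        self.obs_weight = torch.nn.Parameter(torch.tensor(1.0))
        self.n_iters = n_iters

    def forward(self, traj, obstacle_cost):
        # traj: (T, d) initial trajectory; obstacle_cost: differentiable callable
        traj = traj.clone().requires_grad_(True)
        for _ in range(self.n_iters):
            smooth = ((traj[1:] - traj[:-1]) ** 2).sum()   # stand-in for GP prior
            cost = smooth + self.obs_weight * obstacle_cost(traj)
            grad, = torch.autograd.grad(cost, traj, create_graph=True)
            traj = traj - self.step_size * grad            # differentiable update
        return traj
```

Because every update is differentiable, an outer loop can compare the returned trajectory against expert data and backpropagate through all unrolled iterations to adapt the parameters, which is the sense in which such an optimizer is "trained end-to-end from data".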

18 Oct 2017
TL;DR: SaIL as discussed by the authors is an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles": oracles that have full information about the world and demonstrate decisions that minimize search effort.
Abstract: Robotic motion planning problems are typically solved by constructing a search tree of valid maneuvers from a start to a goal configuration. Limited onboard computation and real-time planning constraints impose a limit on how large this search tree can grow. Heuristics play a crucial role in such situations by guiding the search towards potentially good directions and consequently minimizing search effort. Moreover, a heuristic must infer such directions efficiently, using only the information uncovered by the search up to that time. However, state-of-the-art methods do not address the problem of computing a heuristic that explicitly minimizes search effort. In this paper, we do so by training a heuristic policy that maps the partial information from the search to decide which node of the search tree to expand. Unfortunately, naively training such policies leads to slow convergence and poor local minima. We present SaIL, an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles": oracles that have full information about the world and demonstrate decisions that minimize search effort. We leverage the fact that such oracles can be computed efficiently using dynamic programming, and derive performance guarantees for the learnt heuristic. We validate the approach on a spectrum of environments, showing that SaIL consistently outperforms state-of-the-art algorithms. Our approach paves the way for learning heuristics with an anytime nature: finding feasible solutions quickly and incrementally refining them over time.

36 citations
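The phrase "such oracles can be efficiently computed using dynamic programming" admits a simple concrete reading: on the fully known map, a backward Dijkstra pass from the goal yields every node's remaining cost for an all-knowing searcher, which the heuristic policy can then be trained to predict from partial search features. A sketch under that assumption (`world.neighbors` is a hypothetical interface):

```python
import heapq

def oracle_cost_to_go(world, goal):
    """Dijkstra outward from the goal on the fully known map: dist[n] is the
    effort a clairvoyant searcher would still need from node n."""
    dist = {goal: 0.0}
    pq = [(0.0, goal)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                    # stale queue entry
        for v, w in world.neighbors(u): # (neighbor, edge cost) pairs
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist
```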

Posted Content
TL;DR: SaIL is presented, an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles": oracles that have full information about the world and demonstrate decisions that minimize search effort. SaIL is validated on a spectrum of environments and consistently outperforms state-of-the-art algorithms.
Abstract: Robotic motion planning problems are typically solved by constructing a search tree of valid maneuvers from a start to a goal configuration. Limited onboard computation and real-time planning constraints impose a limit on how large this search tree can grow. Heuristics play a crucial role in such situations by guiding the search towards potentially good directions and consequently minimizing search effort. Moreover, a heuristic must infer such directions efficiently, using only the information uncovered by the search up to that time. However, state-of-the-art methods do not address the problem of computing a heuristic that explicitly minimizes search effort. In this paper, we do so by training a heuristic policy that maps the partial information from the search to decide which node of the search tree to expand. Unfortunately, naively training such policies leads to slow convergence and poor local minima. We present SaIL, an efficient algorithm that trains heuristic policies by imitating "clairvoyant oracles": oracles that have full information about the world and demonstrate decisions that minimize search effort. We leverage the fact that such oracles can be computed efficiently using dynamic programming, and derive performance guarantees for the learnt heuristic. We validate the approach on a spectrum of environments, showing that SaIL consistently outperforms state-of-the-art algorithms. Our approach paves the way for learning heuristics with an anytime nature: finding feasible solutions quickly and incrementally refining them over time.

29 citations

31 Jul 2020
TL;DR: In this article, the authors present a theoretical connection between information-theoretic MPC and entropy-regularized RL, develop a Q-learning algorithm that can leverage biased models, and validate the proposed algorithm on sim-to-sim control tasks.
Abstract: Model-free Reinforcement Learning (RL) works well when experience can be collected cheaply, and model-based RL is effective when system dynamics can be modeled accurately. However, both assumptions can be violated in real-world problems such as robotics, where querying the system can be expensive and real-world dynamics can be difficult to model. In contrast to RL, Model Predictive Control (MPC) algorithms use a simulator to optimize a simple policy class online, constructing a closed-loop controller that can effectively contend with real-world dynamics. MPC performance is usually limited by factors such as model bias and the limited horizon of optimization. In this work, we present a novel theoretical connection between information-theoretic MPC and entropy-regularized RL, and develop a Q-learning algorithm that can leverage biased models. We validate the proposed algorithm on sim-to-sim control tasks to demonstrate the improvements over optimal control and reinforcement learning from scratch. Our approach paves the way for deploying reinforcement learning algorithms on real systems in a systematic manner.

11 citations
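A minimal sketch of how an information-theoretic MPC update can be combined with a learned value function, in the spirit of the abstract above. This is an MPPI-style weighted-perturbation update with a terminal value term; all interfaces (`sim.step`, `value_fn`) are hypothetical, and the real algorithm's entropy-regularized Q-learning component is not shown.

```python
import numpy as np

def mppi_with_learned_value(sim, value_fn, state, u_init,
                            n_samples=64, lam=1.0, noise_std=0.3):
    """One MPC update: roll out noisy controls on a (possibly biased) model,
    score them with running cost plus a learned terminal value, and return
    the exponentially weighted average control sequence."""
    eps = np.random.randn(n_samples, *u_init.shape) * noise_std
    costs = np.zeros(n_samples)
    for i in range(n_samples):
        s = state
        for t in range(u_init.shape[0]):
            s, c = sim.step(s, u_init[t] + eps[i, t])
            costs[i] += c
        costs[i] += value_fn(s)         # learned value corrects for model bias
    w = np.exp(-(costs - costs.min()) / lam)   # information-theoretic weights
    w /= w.sum()
    return u_init + np.einsum("n,nhd->hd", w, eps)
```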


Cited by
Proceedings ArticleDOI
05 Mar 2019
TL;DR: In this paper, the authors use a progress monitor developed in prior work as a learnable heuristic for search, and propose two modules incorporated into an end-to-end architecture: a learned mechanism to perform backtracking, which decides whether to continue moving forward or roll back to a previous state, and a mechanism to help the agent decide which direction to go next by showing directions that are visited and their associated progress estimate.
Abstract: As deep learning continues to make progress on challenging perception tasks, there is increased interest in combining vision, language, and decision-making. Specifically, the Vision and Language Navigation (VLN) task involves navigating to a goal purely from language instructions and visual information, without explicit knowledge of the goal. Recent successful approaches have made inroads in achieving good success rates for this task, but rely on beam search, which thoroughly explores a large number of trajectories and is unrealistic for applications such as robotics. In this paper, inspired by the intuition of viewing the problem as search on a navigation graph, we propose to use a progress monitor developed in prior work as a learnable heuristic for search. We then propose two modules incorporated into an end-to-end architecture: 1) a learned mechanism to perform backtracking, which decides whether to continue moving forward or roll back to a previous state (Regret Module), and 2) a mechanism to help the agent decide which direction to go next by showing directions that are visited and their associated progress estimates (Progress Marker). Combined, the proposed approach significantly outperforms current state-of-the-art methods using greedy action selection, with a 5% absolute improvement in success rate on the test server and, more importantly, an 8% improvement in success rate normalized by path length.

157 citations
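As a purely illustrative sketch of the two modules described above (all names hypothetical, not the paper's architecture): a greedy navigation loop in which a learned head decides between committing to the best-scored direction and rolling back to the previous viewpoint.

```python
def navigate(agent, env, max_steps=40):
    path = [env.start()]
    for _ in range(max_steps):
        obs = env.observe(path[-1])
        # Progress-Marker idea: score candidate directions, exposing which
        # ones were already visited along with their stored progress estimates.
        scores = agent.score_directions(obs, visited=set(path))
        # Regret-Module idea: a learned binary choice, forward vs. rollback.
        if len(path) > 1 and agent.should_backtrack(obs, scores):
            path.pop()                  # roll back one viewpoint
            continue
        path.append(max(scores, key=scores.get))
        if env.reached_goal(path[-1]):
            break
    return path
```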

Journal ArticleDOI
TL;DR: In this article, the authors propose motion planning networks (MPNet), which uses neural networks to learn general near-optimal heuristics for path planning in seen and unseen environments.
Abstract: This article describes motion planning networks (MPNet), a computationally efficient, learning-based neural planner for solving motion planning problems. MPNet uses neural networks to learn general near-optimal heuristics for path planning in seen and unseen environments. It takes environment information, such as a raw point cloud from depth sensors, as well as a robot's initial and desired goal configurations, and recursively calls itself to bidirectionally generate connectable paths. In addition to finding directly connectable and near-optimal paths in a single pass, we show that worst-case theoretical guarantees can be proven if we merge this neural network strategy with classical sample-based planners in a hybrid approach, while still retaining significant computational and optimality improvements. To train the MPNet models, we present an active continual learning approach that enables MPNet to learn from streaming data and actively ask for expert demonstrations when needed, drastically reducing the data needed for training. We validate MPNet against gold-standard and state-of-the-art planning methods on a variety of problems, from two-dimensional to seven-dimensional robot configuration spaces, in challenging and cluttered environments, with results showing significantly and consistently stronger performance and motivating neural planning in general as a modern strategy for solving motion planning problems efficiently.

147 citations
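The recursive bidirectional behavior described above can be sketched as follows. This is a simplified illustration, not MPNet's released code: `net.predict` stands in for the learned planner, `env.steerable` for a local collision-checked connection, and the depth cutoff for the hybrid fallback to a classical sampling-based planner.

```python
def neural_plan(net, env, start, goal, depth=8):
    if env.steerable(start, goal):      # direct collision-free connection found
        return [start, goal]
    if depth == 0:
        return None                     # hand off to a classical planner here
    a = net.predict(env.obstacles, start, goal)   # step from the start side
    b = net.predict(env.obstacles, goal, start)   # step from the goal side
    parts = (neural_plan(net, env, start, a, depth - 1),
             neural_plan(net, env, a, b, depth - 1),
             neural_plan(net, env, b, goal, depth - 1))
    if all(parts):
        # Stitch the sub-paths, dropping duplicated junction states
        return parts[0][:-1] + parts[1][:-1] + parts[2]
    return None
```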

Journal ArticleDOI
24 Sep 2020
TL;DR: This work introduces a globally guided reinforcement learning approach (G2RL), which incorporates a novel reward structure that generalizes to arbitrary environments and applies G2RL to solve the multi-robot path planning problem in a fully distributed reactive manner.
Abstract: Path planning for mobile robots in large dynamic environments is a challenging problem, as the robots are required to efficiently reach their given goals while simultaneously avoiding potential conflicts with other robots or dynamic objects. In the presence of dynamic obstacles, traditional solutions usually employ re-planning strategies, which re-invoke a planning algorithm to search for an alternative path whenever the robot encounters a conflict. However, such re-planning strategies often cause unnecessary detours. To address this issue, we propose a learning-based technique that exploits environmental spatio-temporal information. Different from existing learning-based methods, we introduce a globally guided reinforcement learning approach (G2RL), which incorporates a novel reward structure that generalizes to arbitrary environments. We apply G2RL to solve the multi-robot path planning problem in a fully distributed reactive manner. We evaluate our method across different map types, obstacle densities, and numbers of robots. Experimental results show that G2RL generalizes well, outperforming existing distributed methods and performing very similarly to fully centralized state-of-the-art benchmarks.

113 citations
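One way to read the "novel reward structure that generalizes to arbitrary environments" is a reward that pays a dense bonus for covering cells of a precomputed global guidance path while leaving the local policy free to deviate. The sketch below is an interpretation with made-up constants, not the paper's exact formulation.

```python
def g2rl_style_reward(pos, guide_path, covered, step_penalty=-0.01, bonus=0.1):
    """pos: robot cell; guide_path: cells of the global guidance path;
    covered: set of guidance cells already credited (mutated in place)."""
    r = step_penalty                    # small per-step cost discourages dithering
    if pos in guide_path and pos not in covered:
        covered.add(pos)                # credit each guidance cell only once
        r += bonus                      # dense, environment-agnostic signal
    return r
```

Because the bonus rewards coverage rather than strict following, the local policy can detour around dynamic obstacles without being penalized for leaving the guidance path.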

Proceedings ArticleDOI
01 Mar 2019
TL;DR: To model language-based assistance, the authors develop a general framework termed Imitation Learning with Indirect Intervention (I3L) and propose a solution to the VNLA task that significantly improves the success rate of the learning agent over other baselines in both seen and unseen environments.
Abstract: We present Vision-based Navigation with Language-based Assistance (VNLA), a grounded vision-language task where an agent with visual perception is guided via language to find objects in photorealistic indoor environments. The task emulates a real-world scenario in that (a) the requester may not know how to navigate to the target objects and thus makes requests by only specifying high-level end-goals, and (b) the agent is capable of sensing when it is lost and querying an advisor, who is more qualified at the task, to obtain language subgoals to make progress. To model language-based assistance, we develop a general framework termed Imitation Learning with Indirect Intervention (I3L), and propose a solution that is effective on the VNLA task. Empirical results show that this approach significantly improves the success rate of the learning agent over other baselines on both seen and unseen environments. Our code and data are publicly available at https://github.com/debadeepta/vnla .

98 citations
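An illustrative sketch of the interaction pattern the abstract describes, with all interfaces hypothetical: the agent treats low action confidence as "being lost" and spends a limited budget querying the advisor for a language subgoal.

```python
def run_episode(agent, advisor, env, query_budget=3, conf_threshold=0.5):
    obs, instruction = env.reset()
    subgoal = None
    while not env.done():
        action, conf = agent.act(obs, instruction, subgoal)
        if conf < conf_threshold and query_budget > 0:
            subgoal = advisor.give_subgoal(env.state())  # indirect intervention
            query_budget -= 1
            continue                    # re-decide with the new subgoal
        obs = env.step(action)
    return env.success()
```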
