Showing papers in "arXiv: Robotics in 2020"


Journal ArticleDOI
TL;DR: This article presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multimap SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models, resulting in real-time robust operation in small and large, indoor and outdoor environments.
Abstract: This paper presents ORB-SLAM3, the first system able to perform visual, visual-inertial and multi-map SLAM with monocular, stereo and RGB-D cameras, using pin-hole and fisheye lens models. The first main novelty is a feature-based tightly-integrated visual-inertial SLAM system that fully relies on Maximum-a-Posteriori (MAP) estimation, even during the IMU initialization phase. The result is a system that operates robustly in real-time, in small and large, indoor and outdoor environments, and is 2 to 5 times more accurate than previous approaches. The second main novelty is a multiple map system that relies on a new place recognition method with improved recall. Thanks to it, ORB-SLAM3 is able to survive long periods of poor visual information: when it gets lost, it starts a new map that will be seamlessly merged with previous maps when revisiting mapped areas. Compared with visual odometry systems that only use information from the last few seconds, ORB-SLAM3 is the first system able to reuse all previous information in all the algorithm stages. This allows bundle adjustment to include co-visible keyframes that provide high-parallax observations, boosting accuracy, even if they are widely separated in time or if they come from a previous mapping session. Our experiments show that, in all sensor configurations, ORB-SLAM3 is as robust as the best systems available in the literature, and significantly more accurate. Notably, our stereo-inertial SLAM achieves an average accuracy of 3.6 cm on the EuRoC drone and 9 mm under quick hand-held motions in the room of the TUM-VI dataset, a setting representative of AR/VR scenarios. For the benefit of the community we make public the source code.

875 citations


Posted Content
TL;DR: A framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building, using an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior "sub-keyframes."
Abstract: We propose a framework for tightly-coupled lidar inertial odometry via smoothing and mapping, LIO-SAM, that achieves highly accurate, real-time mobile robot trajectory estimation and map-building. LIO-SAM formulates lidar-inertial odometry atop a factor graph, allowing a multitude of relative and absolute measurements, including loop closures, to be incorporated from different sources as factors into the system. The estimated motion from inertial measurement unit (IMU) pre-integration de-skews point clouds and produces an initial guess for lidar odometry optimization. The obtained lidar odometry solution is used to estimate the bias of the IMU. To ensure high performance in real-time, we marginalize old lidar scans for pose optimization, rather than matching lidar scans to a global map. Scan-matching at a local scale instead of a global scale significantly improves the real-time performance of the system, as does the selective introduction of keyframes, and an efficient sliding window approach that registers a new keyframe to a fixed-size set of prior "sub-keyframes." The proposed method is extensively evaluated on datasets gathered from three platforms over various scales and environments.

379 citations
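
As a rough illustration of the fixed-size "sub-keyframe" window described in the LIO-SAM abstract above, the following Python sketch keeps only the most recent keyframes and returns the set a new keyframe would be registered against. The class name, the window size, the travel threshold and the 1-D poses are all hypothetical simplifications; actual scan registration and the factor graph are omitted, and this is not the authors' implementation.

```python
from collections import deque


class SubKeyframeWindow:
    """Hypothetical sketch: fixed-size window of prior keyframes."""

    def __init__(self, size=3, min_travel=1.0):
        self.window = deque(maxlen=size)  # oldest entries drop out automatically
        self.min_travel = min_travel      # assumed keyframe-selection threshold

    def maybe_add(self, pose):
        """Add `pose` as a keyframe if the robot moved far enough.

        Returns the list of sub-keyframes the new keyframe would be
        registered against, or None if the pose was not selected.
        """
        if self.window and abs(pose - self.window[-1]) < self.min_travel:
            return None  # too close to the last keyframe: skip it
        targets = list(self.window)  # local registration targets, not a global map
        self.window.append(pose)
        return targets


win = SubKeyframeWindow(size=3, min_travel=1.0)
results = [win.maybe_add(p) for p in [0.0, 0.4, 1.2, 2.5, 3.1, 4.0, 5.5]]
```

Because the window has a fixed length, the cost of matching a new keyframe stays bounded no matter how long the trajectory grows, which is the property the abstract credits for real-time performance.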


Posted Content
TL;DR: Trajectron++ is a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data and outperforming a wide array of state-of-the-art deterministic and generative methods.
Abstract: Reasoning about human motion is an important prerequisite to safe and socially-aware robotic navigation. As a result, multi-agent behavior prediction has become a core component of modern human-robot interactive systems, such as self-driving cars. While there exist many methods for trajectory forecasting, most do not enforce dynamic constraints and do not account for environmental information (e.g., maps). Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps). Trajectron++ is designed to be tightly integrated with robotic planning and control frameworks; for example, it can produce predictions that are optionally conditioned on ego-agent motion plans. We demonstrate its performance on several challenging real-world trajectory forecasting datasets, outperforming a wide array of state-of-the-art deterministic and generative methods.

305 citations


Posted Content
TL;DR: This work presents an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals and shows that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots.
Abstract: Reproducing the diverse and agile locomotion skills of animals has been a longstanding challenge in robotics. While manually-designed controllers have been able to emulate many complex behaviors, building such controllers involves a time-consuming and difficult development process, often requiring substantial expertise of the nuances of each skill. Reinforcement learning provides an appealing alternative for automating the manual effort involved in the development of controllers. However, designing learning objectives that elicit the desired behaviors from an agent can also require a great deal of skill-specific expertise. In this work, we present an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals. We show that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire of behaviors for legged robots. By incorporating sample efficient domain adaptation techniques into the training process, our system is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment. To demonstrate the effectiveness of our system, we train an 18-DoF quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns.

172 citations


Posted Content
TL;DR: The LGSVL Simulator is introduced, a high-fidelity simulator for autonomous driving that provides end-to-end, full-stack simulation ready to be hooked up to Autoware and Apollo.
Abstract: Testing autonomous driving algorithms on real autonomous vehicles is extremely costly, and many researchers and developers in the field cannot afford a real car and the corresponding sensors. Although several free and open-source autonomous driving stacks, such as Autoware and Apollo, are available, choices of open-source simulators to use with them are limited. In this paper, we introduce the LGSVL Simulator, a high-fidelity simulator for autonomous driving. The simulator engine provides end-to-end, full-stack simulation which is ready to be hooked up to Autoware and Apollo. In addition, simulator tools are provided with the core simulation engine that allow users to easily customize sensors, create new types of controllable objects, replace some modules in the core simulator, and create digital twins of particular environments.

152 citations


Posted Content
TL;DR: The key system modules and the benchmark environments of the new release robosuite v1.0 are discussed.
Abstract: robosuite is a simulation framework for robot learning powered by the MuJoCo physics engine. It offers a modular design for creating robotic tasks as well as a suite of benchmark environments for reproducible research. This paper discusses the key system modules and the benchmark environments of our new release robosuite v1.0.

148 citations


Posted Content
TL;DR: The Transporter Network is proposed, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions and learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses.
Abstract: Robotic manipulation can be formulated as inducing a sequence of spatial displacements: where the space being moved can encompass an object, part of an object, or end effector. In this work, we propose the Transporter Network, a simple model architecture that rearranges deep features to infer spatial displacements from visual input - which can parameterize robot actions. It makes no assumptions of objectness (e.g. canonical poses, models, or keypoints), it exploits spatial symmetries, and is orders of magnitude more sample efficient than our benchmarked alternatives in learning vision-based manipulation tasks: from stacking a pyramid of blocks, to assembling kits with unseen objects; from manipulating deformable ropes, to pushing piles of small objects with closed-loop feedback. Our method can represent complex multi-modal policy distributions and generalizes to multi-step sequential tasks, as well as 6DoF pick-and-place. Experiments on 10 simulated tasks show that it learns faster and generalizes better than a variety of end-to-end baselines, including policies that use ground-truth object poses. We validate our methods with hardware in the real world. Experiment videos and code are available at this https URL

148 citations


Journal ArticleDOI
You Li, Javier Ibanez-Guzman
TL;DR: A review of state-of-the-art automotive LiDAR technologies and the perception algorithms used with those technologies can be found in this paper, where the main components from laser transmitter to its beam scanning mechanism are analyzed and compared.
Abstract: Autonomous vehicles rely on their perception systems to acquire information about their immediate surroundings. It is necessary to detect the presence of other vehicles, pedestrians and other relevant entities. Safety concerns and the need for accurate estimations have led to the introduction of Light Detection and Ranging (LiDAR) systems in complement to the camera or radar-based perception systems. This article presents a review of state-of-the-art automotive LiDAR technologies and the perception algorithms used with those technologies. LiDAR systems are introduced first by analyzing the main components, from laser transmitter to its beam scanning mechanism. Advantages/disadvantages and the current status of various solutions are introduced and compared. Then, the specific perception pipeline for LiDAR data processing, from an autonomous vehicle perspective is detailed. The model-driven approaches and the emerging deep learning solutions are reviewed. Finally, we provide an overview of the limitations, challenges and trends for automotive LiDARs and perception systems.

140 citations


Posted Content
TL;DR: The goal is to push the limits of visual SLAM algorithms in the real world by providing a challenging benchmark for testing new methods, along with large, diverse training data for learning-based methods.
Abstract: We present a challenging dataset, TartanAir, for robot navigation tasks and more. The data is collected in photo-realistic simulation environments with the presence of moving objects, changing light and various weather conditions. By collecting data in simulations, we are able to obtain multi-modal sensor data and precise ground truth labels such as the stereo RGB image, depth image, segmentation, optical flow, camera poses, and LiDAR point cloud. We set up large numbers of environments with various styles and scenes, covering challenging viewpoints and diverse motion patterns that are difficult to achieve by using physical data collection platforms. In order to enable data collection at such a large scale, we develop an automatic pipeline, including mapping, trajectory sampling, data processing, and data verification. We evaluate the impact of various factors on visual SLAM algorithms using our data. The results of state-of-the-art algorithms reveal that the visual SLAM problem is far from solved. Methods that show good performance on established datasets such as KITTI do not perform well in more difficult scenarios. Although we use simulation, our goal is to push the limits of visual SLAM algorithms in the real world by providing a challenging benchmark for testing new methods, along with large, diverse training data for learning-based methods. Our dataset is available at this http URL.

122 citations


Posted Content
TL;DR: A comprehensive survey of the topic of BTs in Artificial Intelligence and Robotic applications is presented and the existing literature is described and categorized based on methods, application areas and contributions.
Abstract: Behavior Trees (BTs) were invented as a tool to enable modular AI in computer games, but have received an increasing amount of attention in the robotics community in the last decade. With rising demands on agent AI complexity, game programmers found that the Finite State Machines (FSM) that they used scaled poorly and were difficult to extend, adapt and reuse. In BTs, the state transition logic is not dispersed across the individual states, but organized in a hierarchical tree structure, with the states as leaves. This has a significant effect on modularity, which in turn simplifies both synthesis and analysis by humans and algorithms alike. These advantages are needed not only in game AI design, but also in robotics, as is evident from the research being done. In this paper we present a comprehensive survey of the topic of BTs in Artificial Intelligence and Robotic applications. The existing literature is described and categorized based on methods, application areas and contributions, and the paper is concluded with a list of open research challenges.

111 citations
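
To make the hierarchical structure described in the Behavior Trees survey abstract above more concrete, here is a minimal Python sketch of a tree with two common composite node types, Sequence and Fallback, over leaf actions. The node classes, the door-opening task and the two-valued status are illustrative assumptions; practical BT libraries typically also include a "running" status, which is omitted here.

```python
SUCCESS, FAILURE = "success", "failure"


class Action:
    """Leaf node wrapping a callable that returns SUCCESS or FAILURE."""

    def __init__(self, name, fn):
        self.name, self.fn = name, fn

    def tick(self, log):
        log.append(self.name)  # record the order in which leaves are visited
        return self.fn()


class Sequence:
    """Succeeds only if every child succeeds, ticking left to right."""

    def __init__(self, *children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) == FAILURE:
                return FAILURE
        return SUCCESS


class Fallback:
    """Succeeds as soon as any child succeeds, ticking left to right."""

    def __init__(self, *children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) == SUCCESS:
                return SUCCESS
        return FAILURE


# Hypothetical task: pass through a door, opening it first if needed.
tree = Sequence(
    Fallback(
        Action("door_is_open", lambda: FAILURE),
        Action("open_door", lambda: SUCCESS),
    ),
    Action("drive_through", lambda: SUCCESS),
)
log = []
status = tree.tick(log)
```

The transition logic lives entirely in the composite nodes, so a leaf such as `open_door` can be reused in another tree without edits, which is the modularity argument the abstract makes against finite state machines.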


Posted Content
TL;DR: The reinforcement learning approach, which the authors call BADGR, is an end-to-end learning-based mobile robot navigation system that can be trained with autonomously-labeled off-policy data gathered in real-world environments, without any simulation or human supervision.
Abstract: Mobile robot navigation is typically regarded as a geometric problem, in which the robot's objective is to perceive the geometry of the environment in order to plan collision-free paths towards a desired goal. However, a purely geometric view of the world can be insufficient for many navigation problems. For example, a robot navigating based on geometry may avoid a field of tall grass because it believes it is untraversable, and will therefore fail to reach its desired goal. In this work, we investigate how to move beyond these purely geometric-based approaches using a method that learns about physical navigational affordances from experience. Our approach, which we call BADGR, is an end-to-end learning-based mobile robot navigation system that can be trained with self-supervised off-policy data gathered in real-world environments, without any simulation or human supervision. BADGR can navigate in real-world urban and off-road environments with geometrically distracting obstacles. It can also incorporate terrain preferences, generalize to novel environments, and continue to improve autonomously by gathering more data. Videos, code, and other supplemental material are available on our website this https URL

Proceedings ArticleDOI
TL;DR: This work proposes the new navigation solution, Navigation2, which builds on the successful legacy of ROS Navigation and is built on top of ROS2, a secure message passing framework suitable for safety critical applications and program lifecycle management.
Abstract: Developments in mobile robot navigation have enabled robots to operate in warehouses, retail stores, and on sidewalks around pedestrians. Various navigation solutions have been proposed, though few as widely adopted as ROS Navigation. 10 years on, it is still one of the most popular navigation solutions. Yet, ROS Navigation has failed to keep up with modern trends. We propose the new navigation solution, Navigation2, which builds on the successful legacy of ROS Navigation. Navigation2 uses a behavior tree for navigator task orchestration and employs new methods designed for dynamic environments applicable to a wider variety of modern sensors. It is built on top of ROS2, a secure message passing framework suitable for safety critical applications and program lifecycle management. We present experiments in a campus setting utilizing Navigation2 to operate safely alongside students over a marathon as an extension of the experiment proposed in Eppstein et al. The Navigation2 system is freely available at this https URL with a rich community and instructions.

Posted Content
TL;DR: SoftGym is presented, a set of open-source simulated benchmarks for manipulating deformable objects, with a standard OpenAI Gym API and a Python interface for creating new environments, to enable reproducible research in this important area.
Abstract: Manipulating deformable objects has long been a challenge in robotics due to its high dimensional state representation and complex dynamics. Recent success in deep reinforcement learning provides a promising direction for learning to manipulate deformable objects with data driven methods. However, existing reinforcement learning benchmarks only cover tasks with direct state observability and simple low-dimensional dynamics or with relatively simple image-based environments, such as those with rigid objects. In this paper, we present SoftGym, a set of open-source simulated benchmarks for manipulating deformable objects, with a standard OpenAI Gym API and a Python interface for creating new environments. Our benchmark will enable reproducible research in this important area. Further, we evaluate a variety of algorithms on these tasks and highlight challenges for reinforcement learning algorithms, including dealing with a state representation that has a high intrinsic dimensionality and is partially observable. The experiments and analysis indicate the strengths and limitations of existing methods in the context of deformable object manipulation that can help point the way forward for future methods development. Code and videos of the learned policies can be found on our project website.

Journal ArticleDOI
TL;DR: The coupled planning method uses stochastic, derivative-free search to plan both foothold locations and horizontal motions, because the terrain model produces local minima, and shows remarkable capability to deal with a wide range of non-coplanar terrains.
Abstract: Planning whole-body motions while taking into account the terrain conditions is a challenging problem for legged robots since the terrain model might produce many local minima. Our coupled planning method uses stochastic and derivative-free search to plan both foothold locations and horizontal motions due to the local minima produced by the terrain model. It jointly optimizes body motion, step duration and foothold selection, and it models the terrain as a cost-map. Due to the novel attitude planning method, the horizontal motion plans can be applied to various terrain conditions. The attitude planner ensures the robot's stability by imposing limits on the angular acceleration. Our whole-body controller compliantly tracks trunk motions while avoiding slippage and respecting kinematic and torque limits. Despite the use of a simplified model, which is restricted to flat terrain, our approach shows remarkable capability to deal with a wide range of non-coplanar terrains. The results are validated by experimental trials and comparative evaluations in a series of terrains of progressively increasing complexity.

Journal ArticleDOI
TL;DR: In this article, a Representation-Free Model Predictive Control (RF-MPC) framework is presented for controlling various dynamic motions of a quadrupedal robot in 3D space.
Abstract: This paper presents a novel Representation-Free Model Predictive Control (RF-MPC) framework for controlling various dynamic motions of a quadrupedal robot in three-dimensional (3D) space. Our formulation directly represents the rotational dynamics using the rotation matrix, which liberates us from the issues associated with the use of Euler angles and quaternions as the orientation representations. With a variation-based linearization scheme and a carefully constructed cost function, the MPC control law is transcribed to the standard Quadratic Program (QP) form. The MPC controller can operate at real-time rates of 250 Hz on a quadruped robot. Experimental results including periodic quadrupedal gaits and a controlled backflip validate that our control strategy could stabilize dynamic motions that involve singularity in 3D maneuvers.

Posted Content
Peng Hang, Chen Lv, Yang Xing, Chao Huang, Zhongxu Hu
TL;DR: Testing results indicate that both the Nash equilibrium and Stackelberg game theoretic approaches can provide reasonable human-like decision making for AVs.
Abstract: Considering that human-driven vehicles and autonomous vehicles (AVs) will coexist on roads in the future for a long time, how to merge AVs into the human drivers' traffic ecology and minimize the effect of AVs and their misfit with human drivers are issues worthy of consideration. Moreover, different passengers have different needs for AVs; thus, how to provide personalized choices for different passengers is another issue for AVs. Therefore, a human-like decision making framework is designed for AVs in this paper. Different driving styles and social interaction characteristics are formulated for AVs regarding driving safety, ride comfort and travel efficiency, which are considered in the modeling process of decision making. Then, Nash equilibrium and Stackelberg game theory are applied to the noncooperative decision making. In addition, the potential field method and model predictive control (MPC) are combined to deal with the motion prediction and planning for AVs, which provides predicted motion information for the decision-making module. Finally, two typical testing scenarios of lane change, i.e., merging and overtaking, are carried out to evaluate the feasibility and effectiveness of the proposed decision-making framework considering different human-like behaviors. Testing results indicate that both game-theoretic approaches can provide reasonable human-like decision making for AVs. Compared with the Nash equilibrium approach, under the normal driving style, the cost value of decision making using the Stackelberg game theoretic approach is reduced by over 20%.
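
The difference between the two solution concepts named in the abstract above can be sketched on a tiny two-player cost game. The Python below is a generic textbook illustration, not the paper's decision-making model: the cost matrices, the action labels (keep lane or merge for the AV, yield or hold speed for the human driver) and the function names are all hypothetical.

```python
def pure_nash(cost_a, cost_b):
    """Return all pure-strategy Nash equilibria (i, j) of a cost bimatrix game."""
    rows, cols = len(cost_a), len(cost_a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Neither player can lower its own cost by deviating alone.
            a_best = all(cost_a[i][j] <= cost_a[k][j] for k in range(rows))
            b_best = all(cost_b[i][j] <= cost_b[i][k] for k in range(cols))
            if a_best and b_best:
                equilibria.append((i, j))
    return equilibria


def stackelberg(cost_a, cost_b):
    """Leader (player A) commits first; follower (player B) best-responds."""
    best = None
    for i in range(len(cost_a)):
        # Follower's best response to the leader's committed action i.
        j = min(range(len(cost_b[i])), key=lambda k: cost_b[i][k])
        if best is None or cost_a[i][j] < cost_a[best[0]][best[1]]:
            best = (i, j)
    return best


# Hypothetical costs. Rows: AV keeps lane (0) or merges (1).
# Columns: human driver yields (0) or holds speed (1).
COST_AV = [[3, 3], [1, 6]]
COST_HUMAN = [[2, 1], [2, 5]]

nash_points = pure_nash(COST_AV, COST_HUMAN)
leader_choice = stackelberg(COST_AV, COST_HUMAN)
```

In this toy game there are two pure Nash equilibria, and the Stackelberg solution picks the one that is cheaper for the leader because the leader commits first. The numbers are chosen only to show that mechanism and say nothing about the cost reduction reported in the abstract.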

Posted Content
TL;DR: This paper trains an inverse dynamics model and uses it to predict actions for state-only demonstrations and considerably outperforms RL alone, and is able to learn from demonstrations with different dynamics, morphologies, and objects.
Abstract: Dexterous manipulation has been a long-standing challenge in robotics. Recently, modern model-free RL has demonstrated impressive results on a number of problems. However, complex domains like dexterous manipulation remain a challenge for RL due to the poor sample complexity. To address this, current approaches employ expert demonstrations in the form of state-action pairs, which are difficult to obtain for real-world settings such as learning from videos. In this work, we move toward a more realistic setting and explore state-only imitation learning. To tackle this setting, we train an inverse dynamics model and use it to predict actions for state-only demonstrations. The inverse dynamics model and the policy are trained jointly. Our method performs on par with state-action approaches and considerably outperforms RL alone. By not relying on expert actions, we are able to learn from demonstrations with different dynamics, morphologies, and objects.
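
The action-labeling step described in the abstract above can be sketched in a few lines of Python. This is a minimal sketch under strong assumptions: a hypothetical 1-D system with linear dynamics, a one-parameter inverse dynamics model fitted by least squares, and no policy training. The paper trains the inverse model and the policy jointly, which is not shown here, and all names below are illustrative.

```python
def fit_inverse_dynamics(transitions):
    """Least-squares fit of a ~= w * (s_next - s) from (s, a, s_next) tuples."""
    num = sum((s2 - s) * a for s, a, s2 in transitions)
    den = sum((s2 - s) ** 2 for s, a, s2 in transitions)
    return num / den


def label_demo(states, w):
    """Infer the action between each consecutive pair of demonstrated states."""
    return [w * (s2 - s) for s, s2 in zip(states, states[1:])]


# Agent's own experience in a hypothetical 1-D system: s_next = s + 0.5 * a.
GAIN = 0.5
experience = [(s, a, s + GAIN * a) for s, a in [(0.0, -1.0), (1.0, 0.5), (2.0, 2.0)]]
w = fit_inverse_dynamics(experience)

# State-only demonstration: the expert's actions were never recorded.
demo_states = [0.0, 1.0, 1.5, 1.0]
demo_actions = label_demo(demo_states, w)
```

The inferred actions are expressed in the learner's own action space, which is why, as the abstract notes, the demonstrations need not come from an agent with the same dynamics or morphology.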

Posted Content
TL;DR: The Stochastic Grounded Action Transformation (SGAT) algorithm is introduced, which models this stochasticity when grounding the simulator, and it is found experimentally, for both simulated and physical target domains, that SGAT can find policies that are robust to stochasticity in the target domain.
Abstract: Robot control policies learned in simulation do not often transfer well to the real world. Many existing solutions to this sim-to-real problem, such as the Grounded Action Transformation (GAT) algorithm, seek to correct for or ground these differences by matching the simulator to the real world. However, the efficacy of these approaches is limited if they do not explicitly account for stochasticity in the target environment. In this work, we analyze the problems associated with grounding a deterministic simulator in a stochastic real world environment, and we present examples where GAT fails to transfer a good policy due to stochastic transitions in the target domain. In response, we introduce the Stochastic Grounded Action Transformation (SGAT) algorithm, which models this stochasticity when grounding the simulator. We find experimentally, for both simulated and physical target domains, that SGAT can find policies that are robust to stochasticity in the target domain.

Journal ArticleDOI
TL;DR: LiLi-OM (Livox LiDAR-inertial odometry and mapping) is real-time capable and achieves superior accuracy over state-of-the-art systems for both LiDAR types on public data sets of mechanical LiDARs and in experiments using the Livox Horizon.
Abstract: We present a novel tightly-coupled LiDAR-inertial odometry and mapping scheme for both solid-state and mechanical LiDARs. As frontend, a feature-based lightweight LiDAR odometry provides fast motion estimates for adaptive keyframe selection. As backend, a hierarchical keyframe-based sliding window optimization is performed through marginalization for directly fusing IMU and LiDAR measurements. For the Livox Horizon, a newly released solid-state LiDAR, a novel feature extraction method is proposed to handle its irregular scan pattern during preprocessing. LiLi-OM (Livox LiDAR-inertial odometry and mapping) is real-time capable and achieves superior accuracy over state-of-the-art systems for both LiDAR types on public data sets of mechanical LiDARs and in experiments using the Livox Horizon. Source code and recorded experimental data sets are available at https://github.com/KIT-ISAS/lili-om.

Journal ArticleDOI
TL;DR: Lio is a mobile robot platform with a multi-functional arm explicitly designed for human-robot interaction and personal care assistant tasks, and complies with ISO13482 - Safety requirements for personal care robots, meaning it can be directly tested and deployed in care facilities.
Abstract: Lio is a mobile robot platform with a multi-functional arm explicitly designed for human-robot interaction and personal care assistant tasks. The robot has already been deployed in several health care facilities, where it is functioning autonomously, assisting staff and patients on an everyday basis. Lio is intrinsically safe by having full coverage in soft artificial-leather material as well as having collision detection, limited speed and forces. Furthermore, the robot has a compliant motion controller. A combination of visual, audio, laser, ultrasound and mechanical sensors is used for safe navigation and environment understanding. The ROS-enabled setup allows researchers to access raw sensor data as well as have direct control of the robot. The friendly appearance of Lio has resulted in the robot being well accepted by health care staff and patients. Fully autonomous operation is made possible by a flexible decision engine, autonomous navigation and automatic recharging. Combined with time-scheduled task triggers, this allows Lio to operate throughout the day, with a battery life of up to 8 hours and recharging during idle times. A combination of powerful on-board computing units provides enough processing power to deploy artificial intelligence and deep learning-based solutions on-board the robot without the need to send any sensitive data to cloud services, guaranteeing compliance with privacy requirements. During the COVID-19 pandemic, Lio was rapidly adjusted to perform additional functions such as disinfection and remote elevated body temperature detection. It complies with ISO13482 - Safety requirements for personal care robots, meaning it can be directly tested and deployed in care facilities.

Posted Content
TL;DR: VDO-SLAM is presented, a robust object-aware dynamic SLAM system that exploits semantic information to enable motion estimation of rigid objects in the scene without any prior knowledge of the objects' shape or motion models, resulting in accurate robot pose and spatio-temporal map estimation.
Abstract: The scene rigidity assumption, also known as the static world assumption, is common in SLAM algorithms. Most existing algorithms operating in complex dynamic environments simplify the problem by removing moving objects from consideration or tracking them separately. Such strong assumptions limit the deployment of autonomous mobile robotic systems in a wide range of important real world applications involving highly dynamic and unstructured environments. This paper presents VDO-SLAM, a robust object-aware dynamic SLAM system that exploits semantic information to enable motion estimation of rigid objects in the scene without any prior knowledge of the objects' shape or motion models. The proposed approach integrates dynamic and static structures in the environment into a unified estimation framework, resulting in accurate robot pose and spatio-temporal map estimation. We provide a way to extract velocity estimates from the pose change of moving objects in the scene, providing an important functionality for navigation in complex dynamic environments. We demonstrate the performance of the proposed system on a number of real indoor and outdoor datasets. Results show consistent and substantial improvements over state-of-the-art algorithms. An open-source version of the source code is available.

Posted Content
TL;DR: CausalWorld is proposed, a benchmark for causal structure and transfer learning in a robotic manipulation environment that is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer.
Abstract: Despite recent successes of reinforcement learning (RL), it remains a challenge for agents to transfer learned skills to related environments. To facilitate research addressing this problem, we propose CausalWorld, a benchmark for causal structure and transfer learning in a robotic manipulation environment. The environment is a simulation of an open-source robotic platform, hence offering the possibility of sim-to-real transfer. Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures. The key strength of CausalWorld is that it provides a combinatorial family of such tasks with common causal structure and underlying factors (including, e.g., robot and object masses, colors, sizes). The user (or the agent) may intervene on all causal variables, which allows for fine-grained control over how similar different tasks (or task distributions) are. One can thus easily define training and evaluation distributions of a desired difficulty level, targeting a specific form of generalization (e.g., only changes in appearance or object mass). Further, this common parametrization facilitates defining curricula by interpolating between an initial and a target task. While users may define their own task distributions, we present eight meaningful distributions as concrete benchmarks, ranging from simple to very challenging, all of which require long-horizon planning as well as precise low-level motor control. Finally, we provide baseline results for a subset of these tasks on distinct training curricula and corresponding evaluation protocols, verifying the feasibility of the tasks in this benchmark.

Journal ArticleDOI
TL;DR: This work presents an efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph and runs in real-time at full framerate using fixed lag smoothing.
Abstract: We present an efficient multi-sensor odometry system for mobile platforms that jointly optimizes visual, lidar, and inertial information within a single integrated factor graph. This runs in real-time at full framerate using fixed lag smoothing. To perform such tight integration, a new method to extract 3D line and planar primitives from lidar point clouds is presented. This approach overcomes the suboptimality of typical frame-to-frame tracking methods by treating the primitives as landmarks and tracking them over multiple scans. True integration of lidar features with standard visual features and IMU is made possible using a subtle passive synchronization of lidar and camera frames. The lightweight formulation of the 3D features allows for real-time execution on a single CPU. Our proposed system has been tested on a variety of platforms and scenarios, including underground exploration with a legged robot and outdoor scanning with a dynamically moving handheld device, for a total duration of 96 min and 2.4 km traveled distance. In these test sequences, using only one exteroceptive sensor leads to failure due to either underconstrained geometry (affecting lidar) or textureless areas caused by aggressive lighting changes (affecting vision). In these conditions, our factor graph naturally uses the best information available from each sensor modality without any hard switches.

Posted Content
TL;DR: This work introduces a method for incorporating unstructured natural language into imitation learning and demonstrates in a set of simulation experiments how this approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compares the results to a variety of alternative methods.
Abstract: Imitation learning is a popular approach for teaching motor skills to robots. However, most approaches focus on extracting policy parameters from execution traces alone (i.e., motion trajectories and perceptual data). No adequate communication channel exists between the human expert and the robot to describe critical aspects of the task, such as the properties of the target object or the intended shape of the motion. Motivated by insights into the human teaching process, we introduce a method for incorporating unstructured natural language into imitation learning. At training time, the expert can provide demonstrations along with verbal descriptions in order to describe the underlying intent (e.g., "go to the large green bowl"). The training process then interrelates these two modalities to encode the correlations between language, perception, and motion. The resulting language-conditioned visuomotor policies can be conditioned at runtime on new human commands and instructions, which allows for more fine-grained control over the trained policies while also reducing situational ambiguity. We demonstrate in a set of simulation experiments how our approach can learn language-conditioned manipulation policies for a seven-degree-of-freedom robot arm and compare the results to a variety of alternative methods.

Posted Content
TL;DR: This work introduces ACRONYM, a dataset for robot grasp planning based on physics simulation, which contains 17.7M parallel-jaw grasps spanning 8872 objects from 262 different categories, each labeled with the grasp result obtained from a physics simulator.
Abstract: We introduce ACRONYM, a dataset for robot grasp planning based on physics simulation. The dataset contains 17.7M parallel-jaw grasps, spanning 8872 objects from 262 different categories, each labeled with the grasp result obtained from a physics simulator. We show the value of this large and diverse dataset by using it to train two state-of-the-art learning-based grasp planning algorithms. Grasp performance improves significantly when compared to the original smaller dataset. Data and tools can be accessed at this https URL.

Posted Content
TL;DR: TACTO is presented as a step towards the widespread adoption of touch sensing in robotic applications and towards enabling machine learning practitioners interested in multi-modal learning and control, with a proof-of-concept that TACTO can be successfully used for Sim2Real applications.
Abstract: Simulators play an important role in prototyping, debugging and benchmarking new advances in robotics and learning for control. Although many physics engines exist, some aspects of the real world are harder than others to simulate. One of the aspects that have so far eluded accurate simulation is touch sensing. To address this gap, we present TACTO -- a fast, flexible and open-source simulator for vision-based tactile sensors. This simulator can render realistic high-resolution touch readings at hundreds of frames per second, and can be easily configured to simulate different vision-based tactile sensors, including GelSight, DIGIT and OmniTact. In this paper, we detail the principles that drove the implementation of TACTO and how they are reflected in its architecture. We demonstrate TACTO on a perceptual task, by learning to predict grasp stability using touch from 1 million grasps, and on a marble manipulation control task. We believe that TACTO is a step towards the widespread adoption of touch sensing in robotic applications, and towards enabling machine learning practitioners interested in multi-modal learning and control. TACTO is open-source at this https URL.

Posted Content
TL;DR: A reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities is proposed and instantiated to define a parametric reward function with intuitive settings for all common bipedal gaits - standing, walking, hopping, running, and skipping.
Abstract: We study the problem of realizing the full spectrum of bipedal locomotion on a real robot with sim-to-real reinforcement learning (RL). A key challenge of learning legged locomotion is describing different gaits, via reward functions, in a way that is intuitive for the designer and specific enough to reliably learn the gait across different initial random seeds or hyperparameters. A common approach is to use reference motions (e.g. trajectories of joint positions) to guide learning. However, finding high-quality reference motions can be difficult and the trajectories themselves narrowly constrain the space of learned motion. At the other extreme, reference-free reward functions are often underspecified (e.g. move forward) leading to massive variance in policy behavior, or are the product of significant reward-shaping via trial-and-error, making them exclusive to specific gaits. In this work, we propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities. We instantiate this framework to define a parametric reward function with intuitive settings for all common bipedal gaits - standing, walking, hopping, running, and skipping. Using this function we demonstrate successful sim-to-real transfer of the learned gaits to the bipedal robot Cassie, as well as a generic policy that can transition between all of the two-beat gaits.
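
To make the notion of periodic costs on forces and velocities concrete, here is a simplified sketch. It is not the paper's reward function: the probabilistic phase indicators are replaced by a hard swing/stance switch, and the function names, the normalization of inputs, and the exponential mapping to a reward are all assumptions made for illustration.

```python
import math


def in_swing(phase, swing_ratio):
    """True while the foot should be in the air; phase wraps to [0, 1)."""
    return (phase % 1.0) < swing_ratio


def foot_cost(phase, force, speed, swing_ratio):
    """Penalize contact force during swing and foot speed during stance."""
    return force if in_swing(phase, swing_ratio) else speed


def gait_reward(phase, feet, swing_ratio, offsets):
    """Combine per-foot periodic costs into a reward in (0, 1].

    feet is a list of (force, speed) pairs, assumed non-negative and already
    normalized to comparable scales; offsets are per-foot phase shifts.
    """
    total = sum(foot_cost(phase + off, f, s, swing_ratio)
                for (f, s), off in zip(feet, offsets))
    return math.exp(-total)


# Assumed gait settings: standing has no swing phase, walking alternates feet.
STANDING = {"swing_ratio": 0.0, "offsets": (0.0, 0.0)}
WALKING = {"swing_ratio": 0.5, "offsets": (0.0, 0.5)}

# Foot 0 is in swing with no contact force, foot 1 is in stance and not sliding.
r_good = gait_reward(0.1, [(0.0, 5.0), (3.0, 0.0)], **WALKING)
# Foot 0 touches the ground during swing, foot 1 slides during stance.
r_bad = gait_reward(0.1, [(2.0, 0.0), (0.0, 1.0)], **WALKING)
```

Changing only `swing_ratio` and `offsets` switches the target gait, which is the kind of intuitive parametrization the abstract describes.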

Posted Content
TL;DR: This work proposes to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied, and demonstrates that this method provides competitive performance consistently across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints.
Abstract: The goal of offline reinforcement learning is to learn a policy from a fixed dataset, without further interactions with the environment. This setting will be an increasingly important paradigm for real-world applications of reinforcement learning such as robotics, in which data collection is slow and potentially dangerous. Existing off-policy algorithms have limited performance on static datasets due to extrapolation errors from out-of-distribution actions. This leads to the challenge of constraining the policy to select actions within the support of the dataset during training. We propose to simply learn the Policy in the Latent Action Space (PLAS) such that this requirement is naturally satisfied. We evaluate our method on continuous control benchmarks in simulation and a deformable object manipulation task with a physical robot. We demonstrate that our method provides competitive performance consistently across various continuous control tasks and different types of datasets, outperforming existing offline reinforcement learning methods with explicit constraints. Videos and code are available at this https URL.

Journal ArticleDOI
TL;DR: This work proposes a unique multi-frame localization paradigm for estimating the states of a UAV in various frames of reference using multiple sensors simultaneously, which enables complex missions in GNSS and GNSS-denied environments.
Abstract: We present a multirotor Unmanned Aerial Vehicle (UAV) control and estimation system for supporting replicable research through realistic simulations and real-world experiments. We propose a unique multi-frame localization paradigm for estimating the states of a UAV in various frames of reference using multiple sensors simultaneously. The system enables complex missions in GNSS and GNSS-denied environments, including outdoor-indoor transitions and the execution of redundant estimators for backing up unreliable localization sources. Two feedback control designs are presented: one for precise and aggressive maneuvers, and the other for stable and smooth flight with a noisy state estimate. The proposed control and estimation pipeline is constructed without using the Euler/Tait-Bryan angle representation of orientation in 3D. Instead, we rely on rotation matrices and a novel heading-based convention to represent the one free rotational degree-of-freedom in 3D of a standard multirotor helicopter. We provide an actively maintained and well-documented open-source implementation, including realistic simulation of UAV, sensors, and localization systems. The proposed system is the product of years of applied research on multi-robot systems, aerial swarms, aerial manipulation, motion planning, and remote sensing. All our results have been supported by real-world system deployment that shaped the system into the form presented here. In addition, the system was utilized during the participation of our team from the CTU in Prague in the prestigious MBZIRC 2017 and 2020 robotics competitions, and also in the DARPA SubT challenge. Each time, our team was able to secure top places among the best competitors from all over the world. On each occasion, the challenges have motivated the team to improve the system and to gain a great amount of high-quality experience within tight deadlines.
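
One way to read a heading directly from a rotation matrix, without passing through Euler/Tait-Bryan angles, is sketched below. The abstract does not spell out the convention, so this assumes heading means the azimuth of the body x-axis projected onto the world horizontal plane, with R mapping body coordinates to world coordinates. The function names are hypothetical.

```python
import math


def heading_from_rotation(R):
    """Azimuth of the body x-axis projected onto the world xy-plane.

    R is a 3x3 body-to-world rotation matrix as nested lists; its first
    column is the body x-axis expressed in world coordinates (assumption).
    """
    bx_x, bx_y = R[0][0], R[1][0]
    if math.hypot(bx_x, bx_y) < 1e-9:
        raise ValueError("heading undefined: body x-axis is vertical")
    return math.atan2(bx_y, bx_x)


def rot_z(angle):
    """Rotation about the world z-axis by the given angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


heading = heading_from_rotation(rot_z(0.7))
```

The explicit error for a vertical body x-axis marks the one configuration where this definition of heading has no value.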

Posted Content
TL;DR: This paper develops a system for learning legged locomotion policies with deep RL in the real world with minimal human effort by developing a multi-task learning procedure, an automatic reset controller, and a safety-constrained RL framework.
Abstract: Reliable and stable locomotion has been one of the most fundamental challenges for legged robots. Deep reinforcement learning (deep RL) has emerged as a promising method for developing such control policies autonomously. In this paper, we develop a system for learning legged locomotion policies with deep RL in the real world with minimal human effort. The key difficulties for on-robot learning systems are automatic data collection and safety. We overcome these two challenges by developing a multi-task learning procedure and a safety-constrained RL framework. We tested our system on the task of learning to walk on three different terrains: flat ground, a soft mattress, and a doormat with crevices. Our system can automatically and efficiently learn locomotion skills on a Minitaur robot with little human intervention. The supplemental video can be found at: this https URL.