scispace - formally typeset
Search or ask a question
Author

Matia Pizzoli

Other affiliations: University of Zurich
Bio: Matia Pizzoli is an academic researcher from Sapienza University of Rome. The author has contributed to research in topics: Speaker recognition & Visual search. The author has an hindex of 10, co-authored 18 publications receiving 2259 citations. Previous affiliations of Matia Pizzoli include University of Zurich.

Papers
More filters
Proceedings ArticleDOI
29 Sep 2014
TL;DR: A semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods and applied to micro-aerial-vehicle state-estimation in GPS-denied environments is proposed.
Abstract: We propose a semi-direct monocular visual odometry algorithm that is precise, robust, and faster than current state-of-the-art methods. The semi-direct approach eliminates the need of costly feature extraction and robust matching techniques for motion estimation. Our algorithm operates directly on pixel intensities, which results in subpixel precision at high frame-rates. A probabilistic mapping method that explicitly models outlier measurements is used to estimate 3D points, which results in fewer outliers and more reliable points. Precise and high frame-rate motion estimation brings increased robustness in scenes of little, repetitive, and high-frequency texture. The algorithm is applied to micro-aerial-vehicle state-estimation in GPS-denied environments and runs at 55 frames per second on the onboard embedded computer and at more than 300 frames per second on a consumer laptop. We call our approach SVO (Semi-direct Visual Odometry) and release our implementation as open-source software.

1,814 citations

Proceedings ArticleDOI
29 Sep 2014
TL;DR: This work proposes a novel approach to depth map computation that combines Bayesian estimation and recent development on convex optimization for image processing, and demonstrates that this method outperforms state-of-the-art techniques in terms of accuracy.
Abstract: In this paper, we solve the problem of estimating dense and accurate depth maps from a single moving camera. A probabilistic depth measurement is carried out in real time on a per-pixel basis and the computed uncertainty is used to reject erroneous estimations and provide live feedback on the reconstruction progress. Our contribution is a novel approach to depth map computation that combines Bayesian estimation and recent development on convex optimization for image processing. We demonstrate that our method outperforms state-of-the-art techniques in terms of accuracy, while exhibiting high efficiency in memory usage and computing power. We call our approach REMODE (REgularized MOnocular Depth Estimation) and the CUDA-based implementation runs at 30Hz on a laptop computer.

339 citations

Journal ArticleDOI
TL;DR: A vision‐based quadrotor micro aerial vehicle that can autonomously execute a given trajectory and provide a live, dense three‐dimensional map of an area and the practical challenges and lessons learned are discussed.
Abstract: The use of mobile robots in search-and-rescue and disaster-response missions has increased significantly in recent years. However, they are still remotely controlled by expert professionals on an actuator set-point level, and they would benefit, therefore, from any bit of autonomy added. This would allow them to execute high-level commands, such as "execute this trajectory" or "map this area." In this paper, we describe a vision-based quadrotor micro aerial vehicle that can autonomously execute a given trajectory and provide a live, dense three-dimensional 3D map of an area. This map is presented to the operator while the quadrotor is mapping, so that there are no unnecessary delays in the mission. Our system does not rely on any external positioning system e.g., GPS or motion capture systems as sensing, computation, and control are performed fully onboard a smartphone processor. Since we use standard, off-the-shelf components from the hobbyist and smartphone markets, the total cost of our system is very low. Due to its low weight below 450 g, it is also passively safe and can be deployed close to humans. We describe both the hardware and the software architecture of our system. We detail our visual odometry pipeline, the state estimation and control, and our live dense 3D mapping, with an overview of how all the modules work and how they have been integrated into the final system. We report the results of our experiments both indoors and outdoors. Our quadrotor was demonstrated over 100 times at multiple trade fairs, at public events, and to rescue professionals. We discuss the practical challenges and lessons learned. Code, datasets, and videos are publicly available to the robotics community.

214 citations

Proceedings ArticleDOI
01 Nov 2012
TL;DR: NIFTi deployed a team of humans and robots (UGV, UAV) in the red-area of Mirandola, Emilia-Romagna, from Tuesday July 24 until Friday July 27, 2012, to assess damage to historical buildings, and cultural artifacts located therein.
Abstract: In May 2012, two major earthquakes occurred in the Emilia-Romagna region, Northern Italy, followed by further aftershocks and earthquakes in June 2012. This sequence of earthquakes and shocks caused multiple casualties, and widespread damage to numerous historical buildings in the region. The Italian National Fire Corps deployed disaster response and recovery of people and buildings. In June 2012, they requested the aid of the EU-funded project NIFTi, to assess damage to historical buildings, and cultural artifacts located therein. To this end, NIFTi deployed a team of humans and robots (UGV, UAV) in the red-area of Mirandola, Emilia-Romagna, from Tuesday July 24 until Friday July 27, 2012. The team worked closely together with the members of the Italian National Fire Corps involved in the red area. This paper describes the deployment, and experience.

122 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A novel algorithm integrating dense reconstructions from monocular views, Monte Carlo localization, and an iterative pose refinement is presented, which achieves high accuracy whereas appearance-based, state-of-the-art approaches fail.
Abstract: We propose a new method for the localization of a Micro Aerial Vehicle (MAV) with respect to a ground robot. We solve the problem of registering the 3D maps computed by the robots using different sensors: a dense 3D reconstruction from the MAV monocular camera is aligned with the map computed from the depth sensor on the ground robot. Once aligned, the dense reconstruction from the MAV is used to augment the map computed by the ground robot, by extending it with the information conveyed by the aerial views. The overall approach is novel, as it builds on recent developments in live dense reconstruction from moving cameras to address the problem of air-ground localization. The core of our contribution is constituted by a novel algorithm integrating dense reconstructions from monocular views, Monte Carlo localization, and an iterative pose refinement. In spite of the radically different vantage points from which the maps are acquired, the proposed method achieves high accuracy whereas appearance-based, state-of-the-art approaches fail. Experimental validation in indoor and outdoor scenarios reported an accuracy in position estimation of 0.08 meters and real time performance. This demonstrates that our new approach effectively overcomes the limitations imposed by the difference in sensors and vantage points that negatively affect previous techniques relying on matching visual features.

95 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: ORB-SLAM as discussed by the authors is a feature-based monocular SLAM system that operates in real time, in small and large indoor and outdoor environments, with a survival of the fittest strategy that selects the points and keyframes of the reconstruction.
Abstract: This paper presents ORB-SLAM, a feature-based monocular simultaneous localization and mapping (SLAM) system that operates in real time, in small and large indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.

4,522 citations

Journal ArticleDOI
TL;DR: A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation.
Abstract: This paper presents ORB-SLAM, a feature-based monocular SLAM system that operates in real time, in small and large, indoor and outdoor environments. The system is robust to severe motion clutter, allows wide baseline loop closing and relocalization, and includes full automatic initialization. Building on excellent algorithms of recent years, we designed from scratch a novel system that uses the same features for all SLAM tasks: tracking, mapping, relocalization, and loop closing. A survival of the fittest strategy that selects the points and keyframes of the reconstruction leads to excellent robustness and generates a compact and trackable map that only grows if the scene content changes, allowing lifelong operation. We present an exhaustive evaluation in 27 sequences from the most popular datasets. ORB-SLAM achieves unprecedented performance with respect to other state-of-the-art monocular SLAM approaches. For the benefit of the community, we make the source code public.

3,807 citations

Book ChapterDOI
06 Sep 2014
TL;DR: A novel direct tracking method which operates on \(\mathfrak{sim}(3)\), thereby explicitly detecting scale-drift, and an elegant probabilistic solution to include the effect of noisy depth values into tracking are introduced.
Abstract: We propose a direct (feature-less) monocular SLAM algorithm which, in contrast to current state-of-the-art regarding direct methods, allows to build large-scale, consistent maps of the environment Along with highly accurate pose estimation based on direct image alignment, the 3D environment is reconstructed in real-time as pose-graph of keyframes with associated semi-dense depth maps These are obtained by filtering over a large number of pixelwise small-baseline stereo comparisons The explicitly scale-drift aware formulation allows the approach to operate on challenging sequences including large variations in scene scale Major enablers are two key novelties: (1) a novel direct tracking method which operates on \(\mathfrak{sim}(3)\), thereby explicitly detecting scale-drift, and (2) an elegant probabilistic solution to include the effect of noisy depth values into tracking The resulting direct monocular SLAM system runs in real-time on a CPU

3,273 citations

01 Nov 2008

2,686 citations

Journal ArticleDOI
TL;DR: In this article, a robust and versatile monocular visual-inertial state estimator is presented, which is the minimum sensor suite (in size, weight, and power) for the metric six degrees of freedom (DOF) state estimation.
Abstract: One camera and one low-cost inertial measurement unit (IMU) form a monocular visual-inertial system (VINS), which is the minimum sensor suite (in size, weight, and power) for the metric six degrees-of-freedom (DOF) state estimation. In this paper, we present VINS-Mono: a robust and versatile monocular visual-inertial state estimator. Our approach starts with a robust procedure for estimator initialization. A tightly coupled, nonlinear optimization-based method is used to obtain highly accurate visual-inertial odometry by fusing preintegrated IMU measurements and feature observations. A loop detection module, in combination with our tightly coupled formulation, enables relocalization with minimum computation. We additionally perform 4-DOF pose graph optimization to enforce the global consistency. Furthermore, the proposed system can reuse a map by saving and loading it in an efficient way. The current and previous maps can be merged together by the global pose graph optimization. We validate the performance of our system on public datasets and real-world experiments and compare against other state-of-the-art algorithms. We also perform an onboard closed-loop autonomous flight on the microaerial-vehicle platform and port the algorithm to an iOS-based demonstration. We highlight that the proposed work is a reliable, complete, and versatile system that is applicable for different applications that require high accuracy in localization. We open source our implementations for both PCs ( https://github.com/HKUST-Aerial-Robotics/VINS-Mono ) and iOS mobile devices ( https://github.com/HKUST-Aerial-Robotics/VINS-Mobile ).

2,305 citations