Journal ArticleDOI

CoSLAM: Collaborative Visual SLAM in Dynamic Environments

01 Feb 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 35, Iss: 2, pp 354-366
TL;DR: Experimental results demonstrate that the proposed collaborative multi-camera SLAM system can work robustly in highly dynamic environments and produce more accurate results in static environments.
Abstract: This paper studies the problem of vision-based simultaneous localization and mapping (SLAM) in dynamic environments with multiple cameras. These cameras move independently and can be mounted on different platforms. All cameras work together to build a global map, including 3D positions of static background points and trajectories of moving foreground points. We introduce intercamera pose estimation and intercamera mapping to deal with dynamic objects in the localization and mapping process. To further enhance the system robustness, we maintain the position uncertainty of each map point. To facilitate intercamera operations, we cluster cameras into groups according to their view overlap, and manage the split and merge of camera groups in real time. Experimental results demonstrate that our system can work robustly in highly dynamic environments and produce more accurate results in static environments.
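The camera grouping described above can be pictured as connected-component clustering over a view-overlap graph: groups split when cameras stop sharing enough map points and merge when overlap reappears. A minimal Python sketch under that interpretation, with a hypothetical shared_map_points(a, b) overlap counter rather than the paper's actual implementation:

```python
from collections import defaultdict

def group_cameras(cameras, shared_map_points, min_overlap=30):
    """Cluster cameras into groups by view overlap (connected components).

    shared_map_points(a, b) is a hypothetical helper returning how many
    map points cameras a and b currently observe in common.
    """
    # Build the view-overlap graph: an edge whenever two cameras see
    # enough common map points.
    adj = defaultdict(set)
    for i, a in enumerate(cameras):
        for b in cameras[i + 1:]:
            if shared_map_points(a, b) >= min_overlap:
                adj[a].add(b)
                adj[b].add(a)

    # Connected components are the camera groups: groups split when
    # overlap edges disappear and merge when new edges appear.
    groups, seen = [], set()
    for cam in cameras:
        if cam in seen:
            continue
        stack, comp = [cam], set()
        while stack:
            c = stack.pop()
            if c in comp:
                continue
            comp.add(c)
            stack.extend(adj[c] - comp)
        seen |= comp
        groups.append(comp)
    return groups
```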
Citations
More filters
Journal ArticleDOI
TL;DR: The main sections of this paper cover major results on trajectory generation, task allocation, adversarial control, distributed sensing, monitoring, and mapping, together with the dynamic modeling and the stability and controllability conditions that are essential for cooperative flight and distributed sensing.
Abstract: The use of aerial swarms to solve real-world problems has been increasing steadily, accompanied by falling prices and improving performance of communication, sensing, and processing hardware. The commoditization of hardware has reduced unit costs, thereby lowering the barriers to entry to the field of aerial swarm robotics. A key enabling technology for swarms is the family of algorithms that allow the individual members of the swarm to communicate and allocate tasks amongst themselves, plan their trajectories, and coordinate their flight in such a way that the overall objectives of the swarm are achieved efficiently. These algorithms, often organized in a hierarchical fashion, endow the swarm with autonomy at every level, and the role of a human operator can be reduced, in principle, to interactions at a higher level without direct intervention. This technology depends on the clever and innovative application of theoretical tools from control and estimation. This paper reviews the state of the art of these theoretical tools, specifically focusing on how they have been developed for, and applied to, aerial swarms. Aerial swarms differ from swarms of ground-based vehicles in two respects: they operate in a three-dimensional space and the dynamics of individual vehicles adds an extra layer of complexity. We review dynamic modeling and conditions for stability and controllability that are essential in order to achieve cooperative flight and distributed sensing. The main sections of this paper focus on major results covering trajectory generation, task allocation, adversarial control, distributed sensing, monitoring, and mapping. Wherever possible, we indicate how the physics and subsystem technologies of aerial robots are brought to bear on these individual areas.

333 citations


Cites background from "CoSLAM: Collaborative Visual SLAM i..."

  • ...Robots may also maintain the position uncertainty of each point in the map for handling of dynamic objects [190]....

    [...]

Journal ArticleDOI
TL;DR: This article presents for the first time a survey of visual SLAM and SfM techniques that are targeted toward operation in dynamic environments and identifies three main problems: how to perform reconstruction, how to segment and track dynamic objects, and how to achieve joint motion segmentation and reconstruction.
Abstract: In the last few decades, Structure from Motion (SfM) and visual Simultaneous Localization and Mapping (visual SLAM) techniques have gained significant interest from both the computer vision and robotic communities. Many variants of these techniques have started to make an impact in a wide range of applications, including robot navigation and augmented reality. However, despite some remarkable results in these areas, most SfM and visual SLAM techniques operate based on the assumption that the observed environment is static. However, when faced with moving objects, overall system accuracy can be jeopardized. In this article, we present for the first time a survey of visual SLAM and SfM techniques that are targeted toward operation in dynamic environments. We identify three main problems: how to perform reconstruction (robust visual SLAM), how to segment and track dynamic objects, and how to achieve joint motion segmentation and reconstruction. Based on this categorization, we provide a comprehensive taxonomy of existing approaches. Finally, the advantages and disadvantages of each solution class are critically discussed from the perspective of practicality and robustness.

298 citations


Cites background from "CoSLAM: Collaborative Visual SLAM i..."

  • ...Tan et al. [152] also use a similar projection principle to detect dynamic features....

    [...]

  • ...With the proliferation of mobile and wearable devices, this natural extension of visual SLAM in dynamic environments will benefit many applications, including obstacle avoidance [63], human-robot interaction [51], people following [183], path planning [19], cooperative robotics [46], collaborative mapping [28], driverless cars [102], augmented reality (e....

    [...]

  • ...Zou and Tan [28] project features from the previous frame into the current frame and measure the distance from the tracked features....

    [...]
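The projection test quoted in the excerpts above reduces to reprojecting each mapped 3D point with the current pose estimate and flagging points whose image-space distance to the tracked feature is large. A minimal sketch assuming a calibrated pinhole model; the thresholds and surrounding outlier handling in the cited systems are more involved:

```python
import numpy as np

def flag_dynamic_points(points_3d, tracked_px, R, t, K, thresh_px=3.0):
    """Flag features as dynamic when their reprojection disagrees with
    the tracked image position.

    points_3d : (N, 3) world-frame map points
    tracked_px: (N, 2) tracked feature positions in the current frame
    R, t      : current camera pose (world -> camera)
    K         : (3, 3) intrinsic matrix
    """
    cam = points_3d @ R.T + t           # world -> camera frame
    proj = cam @ K.T                    # pinhole projection
    proj = proj[:, :2] / proj[:, 2:3]   # perspective divide
    err = np.linalg.norm(proj - tracked_px, axis=1)
    return err > thresh_px              # True = likely dynamic
```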

Proceedings ArticleDOI
Wei Tan, Haomin Liu, Zilong Dong, Guofeng Zhang, Hujun Bao
23 Dec 2013
TL;DR: A novel prior-based adaptive RANSAC algorithm (PARSAC) is proposed to efficiently remove outliers even when the inlier ratio is rather low, so that the camera pose can be reliably estimated even in very challenging situations.
Abstract: We present a novel real-time monocular SLAM system which can robustly work in dynamic environments. Different from traditional methods, our system allows parts of the scene to be dynamic or the whole scene to gradually change. The key contribution is that we propose a novel online keyframe representation and updating method to adaptively model the dynamic environments, where the appearance or structure changes can be effectively detected and handled. We reliably detect the changed features by projecting them from the keyframes to the current frame for appearance and structure comparison. Appearance changes due to occlusions can also be reliably detected and handled. The keyframes with large changed areas will be replaced by newly selected frames. In addition, we propose a novel prior-based adaptive RANSAC algorithm (PARSAC) to efficiently remove outliers even when the inlier ratio is rather low, so that the camera pose can be reliably estimated even in very challenging situations. Experimental results demonstrate that the proposed system can robustly work in dynamic environments and outperforms state-of-the-art SLAM systems (e.g. PTAM).
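PARSAC itself is only summarized here; the sketch below shows the generic idea it builds on, biasing RANSAC's minimal-sample selection with per-correspondence prior weights so that good hypotheses are found quickly even at low inlier ratios. The interfaces and weighting are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def prior_ransac(data, priors, fit, residual, n_min, iters=500, tol=1.0):
    """Generic RANSAC whose minimal samples are drawn with probability
    proportional to a per-correspondence prior weight.

    data    : (N, ...) array of correspondences
    priors  : (N,) nonnegative prior inlier likelihoods
    fit     : callable(subset) -> model
    residual: callable(model, data) -> (N,) residuals
    """
    rng = np.random.default_rng(0)
    p = priors / priors.sum()
    best_model, best_inliers = None, np.zeros(len(data), bool)
    for _ in range(iters):
        idx = rng.choice(len(data), size=n_min, replace=False, p=p)
        model = fit(data[idx])
        inliers = np.abs(residual(model, data)) < tol
        if best_model is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = model, inliers
    # Refit the best hypothesis on all of its inliers.
    if best_inliers.any():
        best_model = fit(data[best_inliers])
    return best_model, best_inliers
```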

235 citations


Cites methods from "CoSLAM: Collaborative Visual SLAM i..."

  • ...[16] A. Kawewong, N. Tongprasit, S. Tangruamsub, and O. Hasegawa....

    [...]

  • ...[43] D. Zou and P. Tan....

    [...]

  • ...In Zou and Tan [43]’s CoSLAM system, all cameras can move freely in the scene, where each camera works independently with intra-camera pose estimation, and both static and dynamic points are used to obtain inter-camera pose estimation for all cameras....

    [...]

  • ...[37] A. Taneja, L. Ballan, and M. Pollefeys....

    [...]

  • ...Taneja et al. [37] argue that changes in image appearance may not lead to changes in the geometry, and propose a graph based method....

    [...]

Proceedings ArticleDOI
01 Oct 2018
TL;DR: MaskFusion is a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems, which output a purely geometric map of a static scene.
Abstract: We present MaskFusion, a real-time, object-aware, semantic and dynamic RGB-D SLAM system that goes beyond traditional systems which output a purely geometric map of a static scene. MaskFusion recognizes, segments and assigns semantic class labels to different objects in the scene, while tracking and reconstructing them even when they move independently from the camera. As an RGB-D camera scans a cluttered scene, image-based instance-level semantic segmentation creates semantic object masks that enable realtime object recognition and the creation of an object-level representation for the world map. Unlike previous recognition-based SLAM systems, MaskFusion does not require known models of the objects it can recognize, and can deal with multiple independent motions. MaskFusion takes full advantage of using instance-level semantic segmentation to enable semantic labels to be fused into an object-aware map, unlike recent semantics enabled SLAM systems that perform voxel-level semantic segmentation. We show augmented-reality applications that demonstrate the unique features of the map output by MaskFusion: instance-aware, semantic and dynamic. Code will be made available.
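As a rough illustration of the object-level representation described above, the sketch below back-projects a depth frame into one point cloud per instance mask; the real system additionally tracks and fuses each object's model over time, so this is a simplified sketch, not MaskFusion's implementation:

```python
import numpy as np

def backproject_objects(depth, masks, K):
    """Back-project a depth image into one point cloud per instance mask.

    depth: (H, W) depth in meters
    masks: dict mapping object label -> (H, W) boolean instance mask
    K    : (3, 3) camera intrinsics
    Returns a dict mapping object label -> (M, 3) camera-frame points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    # Inverse pinhole model applied to every pixel at once.
    pts = np.stack([(u - cx) * depth / fx, (v - cy) * depth / fy, depth],
                   axis=-1)
    return {label: pts[m & (depth > 0)] for label, m in masks.items()}
```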

234 citations

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A framework for collaborative localization and mapping with multiple Micro Aerial Vehicles (MAVs) in unknown environments, in which each MAV runs an onboard monocular visual odometry algorithm; to the best of the authors' knowledge, this is the first work on real-time collaborative monocular SLAM applied to MAVs.
Abstract: This paper presents a framework for collaborative localization and mapping with multiple Micro Aerial Vehicles (MAVs) in unknown environments. Each MAV estimates its motion individually using an onboard, monocular visual odometry algorithm. The system of MAVs acts as a distributed preprocessor that streams only features of selected keyframes and relative-pose estimates to a centralized ground station. The ground station creates an individual map for each MAV and merges them together whenever it detects overlaps. This allows the MAVs to express their position in a common, global coordinate frame. The key to real-time performance is the design of data-structures and processes that allow multiple threads to concurrently read and modify the same map. The presented framework is tested in both indoor and outdoor environments with up to three MAVs. To the best of our knowledge, this is the first work on real-time collaborative monocular SLAM, which has also been applied to MAVs.
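The abstract's point about data structures that let multiple threads concurrently read and modify the same map can be illustrated with a reader-writer discipline: pose trackers take shared read access while the map-merging backend takes exclusive write access. A minimal Python sketch under that assumption; the actual system's structures are finer grained:

```python
import threading

class RWLock:
    """Many concurrent readers, one exclusive writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        with self._cond:
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        self._cond.acquire()          # blocks new readers
        while self._readers > 0:      # wait for active readers to drain
            self._cond.wait()

    def release_write(self):
        self._cond.release()

class SharedMap:
    """Map points readable by trackers while the backend merges maps."""
    def __init__(self):
        self._lock = RWLock()
        self._points = {}             # point id -> 3D position

    def read_points(self, ids):
        self._lock.acquire_read()
        try:
            return [self._points[i] for i in ids if i in self._points]
        finally:
            self._lock.release_read()

    def update_points(self, updates):
        self._lock.acquire_write()
        try:
            self._points.update(updates)
        finally:
            self._lock.release_write()
```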

204 citations


Cites background from "CoSLAM: Collaborative Visual SLAM i..."

  • ...robots allows the computation of the relative configuration of the agents, which forms a basis for multi-robot path planning and cooperative behaviors....

    [...]

References
More filters
Proceedings ArticleDOI
21 Jun 1994
TL;DR: This paper proposes a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world.
Abstract: No feature-based vision system can work unless good features can be identified and tracked from frame to frame. Although tracking itself is by and large a solved problem, selecting features that can be tracked well and correspond to physical points in the world is still hard. We propose a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world. These methods are based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work under affine image transformations. We test performance with several simulations and experiments.
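The selection criterion proposed here survives today as the "good features to track" test: accept an image window whose 2x2 gradient structure matrix has a sufficiently large minimum eigenvalue, since that is exactly what the Newton-Raphson style tracker needs to converge. OpenCV implements this criterion directly (the file name below is a placeholder):

```python
import cv2

# Shi-Tomasi "good features to track": keep corners whose minimum
# structure-matrix eigenvalue exceeds qualityLevel times the best
# corner's value.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
corners = cv2.goodFeaturesToTrack(
    img,
    maxCorners=500,     # cap on the number of returned features
    qualityLevel=0.01,  # relative minimum-eigenvalue threshold
    minDistance=8,      # enforced pixel spacing between features
)
```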

8,432 citations



Proceedings ArticleDOI
13 Nov 2007
TL;DR: A system specifically designed to track a hand-held camera in a small AR workspace, with tracking and mapping processed in parallel threads on a dual-core computer, producing detailed maps with thousands of landmarks that can be tracked at frame rate with accuracy and robustness rivalling that of state-of-the-art model-based systems.
Abstract: This paper presents a method of estimating camera pose in an unknown scene. While this has previously been attempted by adapting SLAM algorithms developed for robotic exploration, we propose a system specifically designed to track a hand-held camera in a small AR workspace. We propose to split tracking and mapping into two separate tasks, processed in parallel threads on a dual-core computer: one thread deals with the task of robustly tracking erratic hand-held motion, while the other produces a 3D map of point features from previously observed video frames. This allows the use of computationally expensive batch optimisation techniques not usually associated with real-time operation: The result is a system that produces detailed maps with thousands of landmarks which can be tracked at frame-rate, with an accuracy and robustness rivalling that of state-of-the-art model-based systems.
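The tracking/mapping split can be sketched as two threads sharing a keyframe queue: the tracker stays real-time and only hands off selected frames, while the mapper runs expensive batch optimisation in the background. A self-contained schematic with stand-in bodies, not PTAM's code:

```python
import queue
import threading
import time

keyframe_queue = queue.Queue()

def tracking_thread(frames, poses):
    """Real-time loop: estimate a pose per frame, hand keyframes to the mapper."""
    for i, frame in enumerate(frames):
        poses.append(("pose-for-frame", i))   # stand-in for pose tracking
        if i % 10 == 0:                       # stand-in keyframe heuristic
            keyframe_queue.put((i, frame))

def mapping_thread(world_map):
    """Background loop: integrate keyframes, run expensive batch optimisation."""
    while True:
        i, frame = keyframe_queue.get()
        if frame is None:                     # shutdown sentinel
            break
        world_map.append(i)                   # stand-in for map insertion
        time.sleep(0.05)                      # stand-in for bundle adjustment

poses, world_map = [], []
mapper = threading.Thread(target=mapping_thread, args=(world_map,))
tracker = threading.Thread(target=tracking_thread, args=(range(100), poses))
mapper.start(); tracker.start()
tracker.join()
keyframe_queue.put((None, None))
mapper.join()
```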

4,091 citations


"CoSLAM: Collaborative Visual SLAM i..." refers background or methods in this paper

  • ...When measuring the uncertainty in map point positions, we only consider the uncertainty in feature detection and triangulation....

    [...]

  • ...Experimental results demonstrate that our system can work robustly in highly dynamic environments and produce more accurate results in static environments....

    [...]

  • ...If the camera intrinsic parameters are known, the camera pose $\theta = (R, t)$ can be computed by minimizing the reprojection error (the distance between the image projection of 3D map points and their corresponding image feature points), namely, $\theta = \arg\min_{\theta} \sum_i \big( \| m_i - P(M_i, \theta) \| \big)$ (1), where $P(M_i, \theta)$ is the…...

    [...]

  • ...10, for each camera, its poses at neighboring frames are connected....

    [...]

  • ...We will adjust all camera poses from frame 2 to F , and adjust the map points generated within these frames, which consists of two successive steps described in the following section....

    [...]
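Equation (1), as reconstructed in the excerpt above, is a standard nonlinear least-squares problem over the pose (R, t). A minimal sketch using SciPy with an axis-angle parameterisation; the robust loss here is an assumption, not necessarily CoSLAM's choice:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(theta, points_3d, points_2d, K):
    """theta is a 6-vector: axis-angle rotation (3) + translation (3)."""
    R = Rotation.from_rotvec(theta[:3]).as_matrix()
    t = theta[3:]
    cam = points_3d @ R.T + t            # world -> camera
    proj = cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]    # perspective divide
    return (proj - points_2d).ravel()    # stacked m_i - P(M_i, theta)

def estimate_pose(points_3d, points_2d, K, theta0=None):
    theta0 = np.zeros(6) if theta0 is None else theta0
    result = least_squares(
        reprojection_residuals, theta0,
        args=(points_3d, points_2d, K),
        loss="huber",   # robustness against mismatched features (assumption)
    )
    return result.x     # optimised pose parameters
```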

Journal ArticleDOI
TL;DR: The first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera is presented, achieving real-time but drift-free performance inaccessible to structure-from-motion approaches.
Abstract: We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real time but drift-free performance inaccessible to structure from motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera
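MonoSLAM's "general motion model for smooth camera movement" is a constant-velocity model inside an extended Kalman filter: linear and angular velocities are state variables and unknown accelerations enter as process noise. A simplified sketch of the prediction step for the position/velocity block only, an illustrative reduction of the full quaternion-based state:

```python
import numpy as np

def ekf_predict(x, P, dt, accel_sigma=1.0):
    """Constant-velocity EKF prediction, state x = [position(3), velocity(3)].

    Unknown accelerations are zero-mean noise with std accel_sigma, which
    is what lets the filter track smooth but unscripted hand-held motion.
    """
    F = np.eye(6)
    F[:3, 3:] = dt * np.eye(3)              # position += velocity * dt

    # Acceleration noise integrates into both velocity and position.
    G = np.vstack([0.5 * dt**2 * np.eye(3), dt * np.eye(3)])
    Q = (accel_sigma ** 2) * (G @ G.T)

    x_pred = F @ x                          # propagate the mean
    P_pred = F @ P @ F.T + Q                # propagate the covariance
    return x_pred, P_pred
```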

3,772 citations


"CoSLAM: Collaborative Visual SLAM i..." refers background in this paper

  • ...These cameras move independently and can be mounted on different platforms....

    [...]

Journal ArticleDOI
TL;DR: This tutorial describes the simultaneous localization and mapping (SLAM) problem, the essential methods for solving it, and key implementations and demonstrations of the method.
Abstract: This paper describes the simultaneous localization and mapping (SLAM) problem and the essential methods for solving it, and summarizes key implementations and demonstrations of the method. While there are still many practical issues to overcome, especially in more complex outdoor environments, the general SLAM method is now a well-understood and established part of robotics. A second part of the tutorial summarizes more recent work addressing some of the remaining issues in SLAM, including computation, feature representation, and data association.

3,760 citations

Proceedings ArticleDOI
Andrew J. Davison
13 Oct 2003
TL;DR: This work presents a top-down Bayesian framework for single-camera localisation via mapping of a sparse set of natural features using motion modelling and an information-guided active measurement strategy, in particular addressing the difficult issue of real-time feature initialisation via a factored sampling approach.
Abstract: Ego-motion estimation for an agile single camera moving through general, unknown scenes becomes a much more challenging problem when real-time performance is required rather than under the off-line processing conditions under which most successful structure from motion work has been achieved. This task of estimating camera motion from measurements of a continuously expanding set of self-mapped visual features is one of a class of problems known as Simultaneous Localisation and Mapping (SLAM) in the robotics community, and we argue that such real-time mapping research, despite rarely being camera-based, is more relevant here than off-line structure from motion methods due to the more fundamental emphasis placed on propagation of uncertainty. We present a top-down Bayesian framework for single-camera localisation via mapping of a sparse set of natural features using motion modelling and an information-guided active measurement strategy, in particular addressing the difficult issue of real-time feature initialisation via a factored sampling approach. Real-time handling of uncertainty permits robust localisation via the creating and active measurement of a sparse map of landmarks such that regions can be re-visited after periods of neglect and localisation can continue through periods when few features are visible. Results are presented of real-time localisation for a hand-waved camera with very sparse prior scene knowledge and all processing carried out on a desktop PC.
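The factored-sampling initialisation referred to above maintains depth hypotheses along the viewing ray of a newly detected feature and reweights them as later measurements arrive, until the depth distribution is peaked enough to convert the feature into a standard map point. A schematic sketch assuming a Gaussian measurement likelihood; predict_px is a hypothetical projection callback:

```python
import numpy as np

def init_depth_particles(d_min=0.5, d_max=5.0, n=100):
    """Uniform depth hypotheses along a new feature's viewing ray."""
    depths = np.linspace(d_min, d_max, n)
    weights = np.full(n, 1.0 / n)
    return depths, weights

def update_depth_particles(depths, weights, predict_px, observed_px,
                           sigma_px=2.0):
    """Reweight each depth hypothesis by how well its predicted projection
    in a later frame matches the measured feature position.

    predict_px: callable(depth) -> predicted 2D image position
    """
    err = np.array([np.linalg.norm(predict_px(d) - observed_px)
                    for d in depths])
    weights = weights * np.exp(-0.5 * (err / sigma_px) ** 2)
    weights /= weights.sum()
    # Once the weighted variance of `depths` is small, the feature can be
    # promoted to a regular 3D map point.
    return depths, weights
```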

1,967 citations


"CoSLAM: Collaborative Visual SLAM i..." refers background in this paper

  • ...The difference between the “intracamera pose estimation” and the “intercamera pose estimation” lies in the second term of (3), where the dynamic points are included in the objective function....

    [...]
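Schematically (equation (3) is only partially quoted above, so the form below is an illustrative reconstruction rather than the paper's exact formula), the intercamera objective augments the static-point reprojection term with a second sum over dynamic points:

```latex
% Schematic form only; \theta_c = (R_c, t_c) is camera c's pose.
% Intracamera pose estimation keeps only the first (static) sum;
% the intercamera version adds the second sum over dynamic points.
\theta_c = \arg\min_{\theta_c}
    \sum_{i \in \mathrm{static}}  \bigl\| m_i - P(M_i, \theta_c) \bigr\|
  + \sum_{j \in \mathrm{dynamic}} \bigl\| m_j - P(M_j, \theta_c) \bigr\|
```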