Proceedings Article

Visual odometry

TL;DR: A system that estimates the motion of a stereo head or a single moving camera from video input, operating in real time with low delay; the motion estimates are used for navigational purposes.
Abstract: We present a system that estimates the motion of a stereo head or a single moving camera based on video input. The system operates in real-time with low delay and the motion estimates are used for navigational purposes. The front end of the system is a feature tracker. Point features are matched between pairs of frames and linked into image trajectories at video rate. Robust estimates of the camera motion are then produced from the feature tracks using a geometric hypothesize-and-test architecture. This generates what we call visual odometry, i.e. motion estimates from visual input alone. No prior knowledge of the scene nor the motion is necessary. The visual odometry can also be used in conjunction with information from other sources such as GPS, inertia sensors, wheel encoders, etc. The pose estimation method has been applied successfully to video from aerial, automotive and handheld platforms. We focus on results with an autonomous ground vehicle. We give examples of camera trajectories estimated purely from images over previously unseen distances and periods of time.
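The hypothesize-and-test architecture is essentially a RANSAC loop over minimal-sample motion hypotheses (the citing papers below note that the hypotheses come from a five-point solver). A minimal sketch of that outer loop, assuming hypothetical `solve_minimal` and `reproj_error` callables for the minimal solver and the per-track error, not the paper's actual implementation:

```python
import numpy as np

def estimate_motion_ransac(tracks, solve_minimal, reproj_error,
                           n_iters=500, threshold=1.5):
    """Hypothesize-and-test motion estimation over feature tracks."""
    best_pose, best_inliers = None, np.array([], dtype=int)
    for _ in range(n_iters):
        # Hypothesize: solve for motion from a minimal sample of tracks.
        pick = np.random.choice(len(tracks), size=5, replace=False)
        for pose in solve_minimal([tracks[i] for i in pick]):
            # Test: score the hypothesis by the size of its consensus set.
            errors = np.array([reproj_error(pose, t) for t in tracks])
            inliers = np.flatnonzero(errors < threshold)
            if len(inliers) > len(best_inliers):
                best_pose, best_inliers = pose, inliers
    return best_pose, best_inliers  # refine best_pose on the inliers afterwards
```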
Citations
Proceedings ArticleDOI
13 Nov 2007
TL;DR: A system specifically designed to track a hand-held camera in a small AR workspace, with tracking and mapping split into parallel threads on a dual-core computer, producing detailed maps with thousands of landmarks that can be tracked at frame rate with accuracy and robustness rivalling that of state-of-the-art model-based systems.
Abstract: This paper presents a method of estimating camera pose in an unknown scene. While this has previously been attempted by adapting SLAM algorithms developed for robotic exploration, we propose a system specifically designed to track a hand-held camera in a small AR workspace. We propose to split tracking and mapping into two separate tasks, processed in parallel threads on a dual-core computer: one thread deals with the task of robustly tracking erratic hand-held motion, while the other produces a 3D map of point features from previously observed video frames. This allows the use of computationally expensive batch optimisation techniques not usually associated with real-time operation: The result is a system that produces detailed maps with thousands of landmarks which can be tracked at frame-rate, with an accuracy and robustness rivalling that of state-of-the-art model-based systems.
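The tracking/mapping split described in the abstract can be pictured as two loops sharing a point map. A minimal sketch under assumed interfaces (`track_frame`, `needs_keyframe`, `triangulate`, and `bundle_adjust` are hypothetical placeholders, not PTAM's actual API):

```python
import threading, queue

keyframes = queue.Queue()        # tracker -> mapper hand-off
map_lock = threading.Lock()      # guards the shared landmark map
point_map = []                   # 3D landmarks shared by both threads

def tracking_thread(camera, track_frame, needs_keyframe):
    # Fast loop: estimate a pose for every frame against the current map.
    for frame in camera:
        with map_lock:
            pose = track_frame(frame, point_map)
        if needs_keyframe(pose):
            keyframes.put(frame)  # hand the frame to the slow thread

def mapping_thread(triangulate, bundle_adjust):
    # Slow loop: extend and refine the map with batch optimisation.
    while True:
        frame = keyframes.get()
        with map_lock:
            point_map.extend(triangulate(frame, point_map))
            bundle_adjust(point_map)  # expensive; runs off the tracking path
```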

4,091 citations


Cites methods from "Visual odometry"

  • ...When the system is first started, we employ the five-point stereo algorithm of [27] to initialise the map in a manner similar to [20, 18, 9]....


  • ...While bundle adjustment has long been a proven method for offline Structure-from-Motion (SfM), we are more directly inspired by its recent successful applications to real-time visual odometry and tracking [20, 18, 9]....


Journal ArticleDOI
TL;DR: Presents MonoSLAM, the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real-time yet drift-free performance inaccessible to structure-from-motion approaches.
Abstract: We present a real-time algorithm which can recover the 3D trajectory of a monocular camera, moving rapidly through a previously unknown scene. Our system, which we dub MonoSLAM, is the first successful application of the SLAM methodology from mobile robotics to the "pure vision" domain of a single uncontrolled camera, achieving real time but drift-free performance inaccessible to structure from motion approaches. The core of the approach is the online creation of a sparse but persistent map of natural landmarks within a probabilistic framework. Our key novel contributions include an active approach to mapping and measurement, the use of a general motion model for smooth camera movement, and solutions for monocular feature initialization and feature orientation estimation. Together, these add up to an extremely efficient and robust algorithm which runs at 30 Hz with standard PC and camera hardware. This work extends the range of robotic systems in which SLAM can be usefully applied, but also opens up new areas. We present applications of MonoSLAM to real-time 3D localization and mapping for a high-performance full-size humanoid robot and live augmented reality with a hand-held camera
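MonoSLAM maintains its sparse, persistent probabilistic map with an extended Kalman filter over a joint camera-and-landmark state. A generic EKF predict/update pair, sketched here for orientation (the motion model `f` and measurement model `h` with their Jacobians are assumed inputs, not MonoSLAM's specific models):

```python
import numpy as np

def ekf_predict(x, P, f, F_jac, Q):
    """Motion-model step (e.g. constant velocity): x' = f(x), P' = F P F^T + Q."""
    F = F_jac(x)
    return f(x), F @ P @ F.T + Q

def ekf_update(x, P, z, h, H_jac, R):
    """Fuse one landmark measurement z (e.g. an image feature position)."""
    H = H_jac(x)
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - h(x))               # correct the state
    P = (np.eye(len(x)) - K @ H) @ P     # shrink the uncertainty
    return x, P
```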

3,772 citations


Cites background from "Visual odometry"

  • ...[39] presented a real-time system based very much on the standard structure from motion methodology of frame-to-frame matching of large numbers of point features which was able to recover instantaneous motions impressively but again had no ability to rerecognize features after periods of neglect and, therefore, would lead inevitably to rapid drift in augmented reality or localization....


Journal ArticleDOI
TL;DR: Surveys the current state of SLAM, presenting what is now the de-facto standard formulation and covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers.
Abstract: Simultaneous Localization and Mapping (SLAM)consists in the concurrent construction of a model of the environment (the map), and the estimation of the state of the robot moving within it. The SLAM community has made astonishing progress over the last 30 years, enabling large-scale real-world applications, and witnessing a steady transition of this technology to industry. We survey the current state of SLAM. We start by presenting what is now the de-facto standard formulation for SLAM. We then review related work, covering a broad set of topics including robustness and scalability in long-term mapping, metric and semantic representations for mapping, theoretical performance guarantees, active SLAM and exploration, and other new frontiers. This paper simultaneously serves as a position paper and tutorial to those who are users of SLAM. By looking at the published research with a critical eye, we delineate open challenges and new research issues, that still deserve careful scientific investigation. The paper also contains the authors' take on two questions that often animate discussions during robotics conferences: Do robots need SLAM? and Is SLAM solved?
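The de-facto standard formulation the survey presents is maximum-a-posteriori estimation over a factor graph. In the usual notation (the symbols below are the standard ones, not quoted from this page), with $\mathcal{X}$ the robot trajectory and map, $z_k$ the measurements, and $h_k(\cdot)$ the measurement models:

```latex
\mathcal{X}^{\star}
  = \arg\max_{\mathcal{X}} \; p(\mathcal{X} \mid \mathcal{Z})
  = \arg\min_{\mathcal{X}} \sum_{k} \big\lVert h_k(\mathcal{X}_k) - z_k \big\rVert_{\Sigma_k}^{2}
```

where $\lVert e \rVert_{\Sigma}^{2} = e^{\top}\Sigma^{-1}e$ is the squared Mahalanobis distance under the measurement covariance $\Sigma_k$.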

1,828 citations


Additional excerpts

  • ...2012 Visual odometry Scaramuzza and Fraundorfer [115, 274]...


Journal ArticleDOI
TL;DR: Visual odometry is the process of estimating the egomotion of an agent (e.g., vehicle, human, or robot) using only the input of a single camera or multiple cameras attached to it; application domains include robotics, wearable computing, augmented reality, and automotive.
Abstract: Visual odometry (VO) is the process of estimating the egomotion of an agent (e.g., vehicle, human, and robot) using only the input of a single camera or multiple cameras attached to it. Application domains include robotics, wearable computing, augmented reality, and automotive. The term VO was coined in 2004 by Nister in his landmark paper. The term was chosen for its similarity to wheel odometry, which incrementally estimates the motion of a vehicle by integrating the number of turns of its wheels over time. Likewise, VO operates by incrementally estimating the pose of the vehicle through examination of the changes that motion induces on the images of its onboard cameras. For VO to work effectively, there should be sufficient illumination in the environment and a static scene with enough texture to allow apparent motion to be extracted. Furthermore, consecutive frames should be captured by ensuring that they have sufficient scene overlap.
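The wheel-odometry analogy is literal: each two-view estimate is a relative rigid-body transform, and the trajectory is their running composition. A minimal sketch (4x4 homogeneous transforms assumed):

```python
import numpy as np

def integrate_trajectory(relative_motions):
    """Dead-reckon camera poses by chaining 4x4 relative transforms,
    the way wheel odometry integrates wheel turns over time."""
    T_world = np.eye(4)             # pose of the first camera frame
    trajectory = [T_world]
    for T_rel in relative_motions:  # each T_rel: previous frame -> current frame
        T_world = T_world @ T_rel   # error composes into every later pose
        trajectory.append(T_world)
    return trajectory
```

Because each step's error is folded into all subsequent poses, drift accumulates with distance travelled, which is why VO is often fused with GPS, inertial sensors, or wheel encoders.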

1,371 citations


Cites background or methods from "Visual odometry"

  • ...[1], there is an advantage in using the 2-D-to-2-D and 3-D-to-2-D methods compared to the 3-D-to-3-D method for motion computation....


  • ...[1], motion estimation from 3-D-to-2-D correspondences is more accurate than from 3-D-to-3-D correspondences because it minimizes the image reprojection error (10) instead of the 3-D-to-3-D feature position error (9)....


  • ...[1], they used random sample consensus (RANSAC) [18] in the least-squares motion estimation step for outlier rejection....


  • ...In the first category are the works by the authors in [1], [24], [25], [27], and [30]–[32]....


  • ...The term VO was coined in 2004 by Nister in his landmark paper [1]....

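The excerpts above contrast two objective functions for motion estimation. A sketch of both, assuming calibrated cameras with intrinsics `K`, a candidate motion `(R, t)`, triangulated points `X_prev`, their triangulated positions `X_curr` in the current frame, and their pixel observations `u_curr` (all names are illustrative):

```python
import numpy as np

def error_3d_to_3d(R, t, X_prev, X_curr):
    """Sum of 3D position errors after aligning the two point clouds."""
    aligned = (R @ X_prev.T).T + t
    return np.sum(np.linalg.norm(X_curr - aligned, axis=1) ** 2)

def error_3d_to_2d(R, t, K, X_prev, u_curr):
    """Sum of image reprojection errors: project the 3D points into the
    current frame and compare with the measured pixel positions."""
    X_cam = (R @ X_prev.T).T + t
    proj = (K @ X_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]     # perspective division
    return np.sum(np.linalg.norm(u_curr - proj, axis=1) ** 2)
```

Minimizing the reprojection error measures the fit where the data is actually observed (in pixels), which is the reason the excerpts cite for the accuracy advantage of 3-D-to-2-D over 3-D-to-3-D.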

Journal ArticleDOI
TL;DR: This paper presents RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment to achieve globally consistent maps.
Abstract: RGB-D cameras (such as the Microsoft Kinect) are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cameras can be used for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment. Visual and depth information are also combined for view-based loop-closure detection, followed by pose optimization to achieve globally consistent maps. We evaluate RGB-D Mapping on two large indoor environments, and show that it effectively combines the visual and shape information available from RGB-D cameras.
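The "joint optimization combining visual features and shape-based alignment" can be read as a weighted two-term cost over the frame-to-frame pose. A schematic version (not the paper's exact formulation; the point-to-plane ICP term and the weight `w_icp` are assumptions):

```python
import numpy as np

def joint_alignment_cost(R, t, feat_src, feat_dst, pts_src, pts_dst,
                         normals_dst, w_icp=1.0):
    """Weighted sum of a sparse feature-association term and a dense
    point-to-plane shape term, both functions of the pose (R, t)."""
    # Sparse term: matched visual features back-projected to 3D.
    e_feat = np.sum(np.linalg.norm(
        feat_dst - ((R @ feat_src.T).T + t), axis=1) ** 2)
    # Dense term: point-to-plane distances over associated depth points.
    residual = pts_dst - ((R @ pts_src.T).T + t)
    e_icp = np.sum(np.einsum('ij,ij->i', residual, normals_dst) ** 2)
    return e_feat + w_icp * e_icp
```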

1,223 citations


Cites background from "Visual odometry"

  • ...…many techniques for 3D mapping using range scans (Thrun et al. 2000; Triebel and Burgard 2005; May et al. 2009; Newman et al. 2009), stereo cameras (Nister et al. 2004; Akbarzadeh et al. 2006; Konolige and Agrawal 2008), monocular cameras (Clemente et al. 2007), and even unsorted collections of…...



References
Journal ArticleDOI
TL;DR: New results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form; these provide the basis for an automatic system that can solve the Location Determination Problem under difficult viewing and analysis conditions.
Abstract: A new paradigm, Random Sample Consensus (RANSAC), for fitting a model to experimental data is introduced. RANSAC is capable of interpreting/smoothing data containing a significant percentage of gross errors, and is thus ideally suited for applications in automated image analysis where interpretation is based on the data provided by error-prone feature detectors. A major portion of this paper describes the application of RANSAC to the Location Determination Problem (LDP): Given an image depicting a set of landmarks with known locations, determine that point in space from which the image was obtained. In response to a RANSAC requirement, new results are derived on the minimum number of landmarks needed to obtain a solution, and algorithms are presented for computing these minimum-landmark solutions in closed form. These results provide the basis for an automatic system that can solve the LDP under difficult viewing and analysis conditions.
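The paradigm itself fits in a few lines. A generic sketch, with `fit_model` and `error` as placeholder callables for the model estimator and the per-datum residual:

```python
import random

def ransac(data, fit_model, error, sample_size, n_iters, threshold):
    """Random Sample Consensus: fit models to minimal random samples and
    keep the one with the largest consensus set of inliers."""
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        sample = random.sample(data, sample_size)        # minimal subset
        model = fit_model(sample)                        # hypothesize
        inliers = [d for d in data
                   if error(model, d) < threshold]       # test
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    if not best_inliers:
        return None, []
    return fit_model(best_inliers), best_inliers         # refit on all inliers
```

The iteration count `n_iters` is normally chosen from the expected outlier ratio so that at least one all-inlier sample is drawn with high probability.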

23,396 citations

01 Jan 1994
TL;DR: Book reference record; only bibliographic catalog metadata is available, no abstract.
Abstract: Note: Includes bibliographical references, 3 appendixes and 2 indexes.

19,881 citations


"Visual odometry" refers methods in this paper

  • ...The survivor features in each bucket are found with the quickselect algorithm [19] based on the strength of the corner response....

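Bucketing plus quickselect, as in the excerpt above, keeps the strongest corners per image region without fully sorting the responses. A sketch using NumPy's `np.partition` (a quickselect); the grid layout and counts here are illustrative, not the paper's parameters:

```python
import numpy as np

def strongest_per_bucket(xs, ys, strengths, img_w, img_h, grid=10, keep=20):
    """Keep the `keep` strongest corner responses in each grid bucket,
    using quickselect (np.partition) instead of a full sort."""
    bx = np.minimum((xs * grid / img_w).astype(int), grid - 1)
    by = np.minimum((ys * grid / img_h).astype(int), grid - 1)
    survivors = []
    for b in range(grid * grid):
        idx = np.flatnonzero(bx + grid * by == b)   # features in this bucket
        if len(idx) > keep:
            s = strengths[idx]
            # The keep-th largest response is the survival threshold.
            cut = np.partition(s, len(s) - keep)[len(s) - keep]
            idx = idx[s >= cut]                     # ties may keep a few extra
        survivors.extend(idx.tolist())
    return np.array(survivors)
```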

01 Jan 2001
TL;DR: Book reference record for Multiple View Geometry in Computer Vision; no abstract is available.
Abstract: Reference record for the book Multiple View Geometry in Computer Vision. No abstract is available.

14,282 citations

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
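The corner measure this paper introduced scores each pixel from the local gradient autocorrelation (structure tensor) matrix M, as R = det(M) - k·trace(M)². A minimal sketch with Gaussian smoothing via SciPy (parameters are the commonly used defaults, an assumption):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(img, sigma=1.0, k=0.04):
    """Harris corner response: R = det(M) - k * trace(M)^2, where M is
    the smoothed gradient autocorrelation matrix at each pixel."""
    Iy, Ix = np.gradient(img.astype(float))     # image derivatives
    Ixx = gaussian_filter(Ix * Ix, sigma)       # elements of M
    Iyy = gaussian_filter(Iy * Iy, sigma)
    Ixy = gaussian_filter(Ix * Iy, sigma)
    det_M = Ixx * Iyy - Ixy ** 2
    trace_M = Ixx + Iyy
    return det_M - k * trace_M ** 2             # large positive R = corner
```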

13,993 citations


"Visual odometry" refers methods in this paper

  • ...In each frame, we detect Harris corners [7]....


Journal ArticleDOI
TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
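The method referred to is the classic global-smoothness formulation solved by a Jacobi-style iteration. A compact sketch of that scheme (the finite-difference kernels follow the original derivative estimates; the smoothness weight and iteration count are illustrative):

```python
import numpy as np
from scipy.ndimage import convolve

def horn_schunck(I1, I2, alpha=1.0, n_iters=100):
    """Estimate optical flow (u, v) between frames I1, I2, assuming the
    flow field varies smoothly almost everywhere in the image."""
    I1, I2 = I1.astype(float), I2.astype(float)
    kx = np.array([[-1.0, 1.0], [-1.0, 1.0]]) * 0.25
    ky = np.array([[-1.0, -1.0], [1.0, 1.0]]) * 0.25
    kt = np.full((2, 2), 0.25)
    Ix = convolve(I1, kx) + convolve(I2, kx)    # spatial derivatives
    Iy = convolve(I1, ky) + convolve(I2, ky)
    It = convolve(I2, kt) - convolve(I1, kt)    # temporal derivative
    avg = np.array([[1, 2, 1], [2, 0, 2], [1, 2, 1]]) / 12.0
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(n_iters):
        u_bar = convolve(u, avg)                # neighbourhood averages
        v_bar = convolve(v, avg)
        # Jacobi update from the Euler-Lagrange equations of the energy.
        common = (Ix * u_bar + Iy * v_bar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = u_bar - Ix * common
        v = v_bar - Iy * common
    return u, v
```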

10,727 citations