Published in 2002 IEEE Aerospace Conference Proceedings, March 2002, Big Sky, Montana, USA
Stereo Vision and Rover Navigation Software
for Planetary Exploration
Steven B. Goldberg
Indelible Systems
8921 Quartz Ave
Northridge, CA 91311
+1 (818) 998 - 6895
isteve@robotics.jpl.nasa.gov
Mark W. Maimone, Larry Matthies
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA 91109
+1 (818) 354 - 0592
{mwm,lhm}@robotics.jpl.nasa.gov
http://robotics.jpl.nasa.gov/people/mwm/visnavsw/
Abstract—NASA's Mars Exploration Rover (MER) missions will land twin rovers on the surface of Mars in 2004. These rovers will have the ability to navigate safely through unknown and potentially hazardous terrain, using autonomous passive stereo vision to detect potential terrain hazards before driving into them. Unfortunately, the computational power of currently available radiation hardened processors limits the amount of distance (and therefore science) that can be safely achieved by any rover in a given time frame.

We present overviews of our current rover vision and navigation systems, to provide context for the types of computation that are required to navigate safely. We also present baseline timing results that represent a lower bound in achievable performance (useful for systems engineering studies of future missions), and describe ways to improve that performance using commercial grade (as opposed to radiation hardened) processors. In particular, we document speedups to our stereo vision system that were achieved using the vectorized operations provided by Pentium MMX technology. Timing data were derived from implementations on several platforms: a prototype Mars rover with flight-like electronics (the Athena Software Development Model (SDM) rover), a RAD6000 computing platform (as will be used in the 2003 MER missions), and research platforms with commercial Pentium III and Sparc processors.

Finally, we summarize the radiation effects analysis that suggests that commercial grade processors are likely to be adequate for Mars surface missions, and discuss the level of speedup that may accrue from using these instead of radiation hardened parts.
TABLE OF CONTENTS
1 INTRODUCTION
2 STEREO VISION ALGORITHM
3 STEREO VISION OPTIMIZATIONS
4 GESTALT NAVIGATION SYSTEM
5 BASELINE SYSTEM TIMINGS
6 FUTURE MISSIONS
7 CONCLUSION
8 ACKNOWLEDGEMENTS
9 BIOGRAPHIES
1. INTRODUCTION
Planetary rovers now have the ability to navigate safely through unknown and potentially hazardous terrain, using autonomous passive stereo vision to detect potential terrain hazards before driving into them. A local map of the terrain can be maintained onboard, by resampling and effectively managing the range data generated by stereo vision. NASA's Mars Exploration Rover (MER) missions will drive safely on the Red Planet in early 2004 using this type of technology.
Stereo vision is an attractive technology for rover navigation because it is passive; sunlight provides all the energy needed for daylight operations. Hence only a small amount of power is required for the imaging electronics to obtain knowledge about the environment. And with enough cameras or a wide enough field of view, there need be no moving parts in the system. Having fewer motors reduces the number of components that could fail.
Our navigation system relies on a geometric analysis of the world near the rover, combining various range data snapshots generated by the stereo system into a local map. We developed a system for interpreting this data, called the Grid-based Estimation of Surface Traversability Applied to Local Terrain (GESTALT) system, based on Carnegie Mellon's Morphin algorithm [9], [10].
Although the MER mission (launching in mid-2003) only requires the rover to travel at most 100 meters per day, future missions like the Smart Lander Rover being considered for 2009 will require rovers to travel even farther, hence at faster speeds. In this paper we describe our current algorithms for autonomous rover navigation, and provide baseline timings for implementations of these algorithms on a variety of platforms. Our implementations are primarily written in C and C++, but certain optimizations are hard coded in assembler to take advantage of vector operations. These timings provide a benchmark from which future rover driving capabilities can be derived.

Figure 1. Illustration of the steps involved in stereo vision processing: (a) raw images; (b) rectified images; (c) Laplacian images; (d) resulting elevation map.
2. STEREO VISION ALGORITHM
JPL has applied Stereo Vision software to rover motion control for many years. Although certain aspects of our approach have appeared before [3], [14], we will summarize the overall algorithm here before presenting new results that take advantage of some commercial vectorized processors that have only recently become available.
Our algorithm depends on certain physical properties of the stereo camera system. The pair of stereo cameras must be rigidly mounted to a camera bar. Using a pair of images of a known calibration target, a pair of geometric camera lens models is calculated using Gennery's CAHVORE formulation [2]. This formulation assumes the system will maintain its geometric calibration over some useful time period (e.g., days or weeks for research purposes, weeks or months for deployed vehicles). This is a reasonable assumption: examples of NASA-sponsored robot camera systems that have maintained their stereo calibration in spite of high vibration deployments and/or long periods of use include Dante [8], Nomad [13], Rocky 7 [12], [5], and the Mars Pathfinder Lander [11].
Our stereo vision algorithm can be described as follows:
1. To decrease the computational burden and the effect of the rigidity constraint, often the raw sensor images are reduced in size, e.g., from 1024×1024 source pixels down to 256×256 pixels, by averaging pixel values (see Figure 1a). Each so-called pyramid level reduction results in an 8-fold decrease in computation: a factor of two from each spatial dimension, and an additional factor of two from a reduction in the number of integer disparities that need to be searched. There is a cost, though: the depth resolution of the resulting range estimates doubles (i.e., becomes less precise) with each pyramid level reduction [4].
2. Each image pixel encodes the appearance of a location in the 3D world; in particular, the surface of that object nearest the camera along a certain ray. To find the pixel that represents the same object surface in the other image, it is sufficient to search only along the projection of that ray. Since the bulk of the processing time in stereo vision is spent doing this search, we simplify later processing by resampling each image so that searching these rays requires only integer operations. Pairs of images are thus rectified, ensuring that these rays (called epipolar lines) are aligned with the horizontal, as in Figure 1b.
3. We compute the Laplacian of each image to remove any pixel intensity bias, e.g., Figure 1c. Actually, our implementation computes an approximation, a Difference of Gaussians, which can be done more quickly.
4. The filtered images are then fed into a 1-D correlator that uses a 7×7 pixel window. The correlator considers a number of potential matches for each pixel in the left image of each stereo pair, assigning a score to each potential match. The range of pixels to be searched is called the disparity range, and is derived geometrically from the input range of depth values to be searched (e.g., from 30 cm to 3 meters in front of the cameras). The maximum-scored match is selected, and the camera model is applied to determine the corresponding range estimate. This process is repeated for every pixel in the left image. We take advantage of the inherent parallelism using a sliding sum implementation to compute the correlation scores efficiently (a sketch of this idea follows the list).
5. Not every range estimate is accepted, however. A variety of checks is applied to prune out unreliable estimates. One example is the peak filter: the chosen score must be better than that at adjacent pixels. A flat correlation peak would mean that many nearby pixels have the same appearance, resulting in an unresolvable ambiguity. Another is the Left/Right Line of Sight filter: the correlator is run in the reverse direction, yielding independent range estimates for pixels in the right image as well. If an estimate from a pixel in the left image fails to match that from its correspondent in the right image, the estimate is discarded.

Figure 2. Stereo Vision Results: unoptimized (light blue) vs. vector-optimized (dark red) run times for 4 functions. The X axis represents individual assembly instructions executed for each function; the first instruction in a function is on the far left, the last instruction on the far right. The Y axis represents total time in milliseconds spent executing a particular instruction, integrated over 200 iterations.
6. Having pruned some of these values, any remaining small isolated regions of range values are thrown out. These safety checks (the peak filter, the Left/Right Line of Sight filter, and this Blob filter) result in a robust set of correspondences that can be used by the onboard autonomy system.
7. Finally, each disparity value can be mapped to a 3-D (X,Y,Z) location using the geometric camera model. This information can be displayed in many forms; an elevation map is shown in Figure 1d.
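The sliding-sum correlation of step 4 can be illustrated with a short sketch. The fragment below is a deliberately simplified, single-row version using a sum-of-absolute-differences score minimized over a horizontal window; the flight implementation uses 7×7 windows (sliding sums along both rows and columns), Laplacian-filtered pyramid images, and the filters of steps 5 and 6, so the function name and details here are illustrative assumptions, not the actual code.

    #include <cstdint>
    #include <cstdlib>
    #include <limits>
    #include <vector>

    // Simplified 1-D correlator for one pair of rectified, filtered rows.
    // For each left-image pixel, find the disparity in [0, maxD) that
    // minimizes a sum of absolute differences over a window of 'win'
    // pixels. The sliding sum makes each disparity cost O(width) rather
    // than O(width * win).
    std::vector<int> matchRow(const uint8_t* left, const uint8_t* right,
                              int width, int maxD, int win) {
        std::vector<int> bestDisp(width, -1);   // -1: no full window / no match
        std::vector<int> bestScore(width, std::numeric_limits<int>::max());
        std::vector<int> diff(width, 0);
        for (int d = 0; d < maxD; ++d) {
            // Pointwise absolute differences at this disparity.
            for (int x = d; x < width; ++x)
                diff[x] = std::abs(int(left[x]) - int(right[x - d]));
            // Sliding window: add the entering pixel, drop the leaving one.
            int sum = 0;
            for (int x = d; x < d + win && x < width; ++x) sum += diff[x];
            for (int x = d + win; x < width; ++x) {
                sum += diff[x] - diff[x - win];
                int center = x - win / 2;   // score the window's center pixel
                if (sum < bestScore[center]) {
                    bestScore[center] = sum;
                    bestDisp[center] = d;
                }
            }
        }
        return bestDisp;
    }

Each accepted disparity then triangulates to range (roughly z = f·b/d for focal length f and stereo baseline b) via the calibrated camera model, which is why halving image resolution in step 1 also coarsens depth resolution.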
3. STEREO VISION OPTIMIZATIONS
The ready availability of commercial vectorized processors has allowed us to realize significant improvements in the performance of our stereo implementation. Faster interpretation of the world allows our rovers to drive safely at ever faster speeds, e.g., the Urbie robot [6], which can now drive safely at over 1 meter per second. In this section we document the speedups obtained by taking advantage of the Pentium MMX capabilities. All the graphs in Figure 2 reflect timings taken on a Pentium III 700 MHz CPU with 32 Kbyte L1 cache, 256 Kbyte L2 cache, and 512 Mbyte RAM running Windows 2000. While currently used only on Earth-based rovers, such commercial (i.e., non-radiation hardened) processors might also be used on future space missions, as we discuss in Section 6.
We focus attention on four particular functions: local 2D pixel resampling in Difference of Gaussian and Decimate, buffer preparation in Prepare Next Row, correlation score comparison in the Inner Loop, and integer-based quadratic peak finding in Compute Sub-Pixel. The first two require memory accesses that jump across image row boundaries, and the latter two perform many independent operations on 8-bit integer data. These properties make them useful candidates for vectorization.
Difference of Gaussian and Decimate: Used to filter and decimate the images before stereo calculations, this algorithm is implemented using sliding sums. Working with the original stereo images, two 240 Kbyte (512×480) buffers, this algorithm must access main memory and will incur L2 cache misses. To make the most of each L2 cache miss, the vectorized implementation operates over whole cache lines. To guarantee only whole cache lines are used, a small portion of the original algorithm is used to align the inputs. As shown in Figure 2a, the vectorized version does have a more localized occurrence of L2 cache misses, and its efficient use of Pentium III prefetching makes better use of the L1 cache. (In Figure 2a, L1 cache misses cannot be directly measured, so an estimate based on L2 cache reads is used.) This, coupled with a vectorized computation of the sliding sum values, results in a 2× speed increase.
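For concreteness, here is a minimal sketch of one pyramid-level reduction by 2×2 block averaging. It is an illustrative stand-in, not the flight code: the real implementation fuses filtering and decimation into sliding sums and is arranged to consume whole, aligned cache lines as described above.

    #include <cstdint>
    #include <vector>

    // Reduce an 8-bit image by one pyramid level by averaging 2x2 blocks.
    // Width and height are assumed even for brevity.
    std::vector<uint8_t> decimate(const std::vector<uint8_t>& src,
                                  int w, int h) {
        std::vector<uint8_t> dst((w / 2) * (h / 2));
        for (int y = 0; y < h / 2; ++y) {
            for (int x = 0; x < w / 2; ++x) {
                int s = src[(2 * y) * w + 2 * x]
                      + src[(2 * y) * w + 2 * x + 1]
                      + src[(2 * y + 1) * w + 2 * x]
                      + src[(2 * y + 1) * w + 2 * x + 1];
                dst[y * (w / 2) + x] = uint8_t((s + 2) / 4);  // rounded mean
            }
        }
        return dst;
    }

Note how each output row touches two input rows; those row-crossing memory accesses are exactly what makes cache-line alignment and prefetching profitable here.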
Prepare Next Row: This function maintains the sliding sum buffers. It accesses the left and right input images and both sliding sum buffers. Optimizing this function produces the largest gains, 4.8×, from vectorization and Pentium III cache optimizations. This is accomplished by operating over full cache lines of aligned memory. Since access to image data is not aligned, this algorithm makes use of the Pentium III prefetch instruction to assist in loading the required cache lines into the L1 cache. The other memory, namely the sliding sum buffers, is guaranteed to be aligned. In Figure 2b, the tall light blue bars indicate how much time is wasted by the C algorithm waiting for the processor to fetch operands from memory.
Inner Loop: The core of the stereo matching algorithm, this function finds the disparity with the best correlation score for left and right disparities, and saves the information necessary to generate sub-pixel estimates. The C algorithm used an unaligned data structure to store four shorts and one byte of data. This was not vectorizable, so it was rewritten to store two shorts in one data structure and one correlation value in a table. With this simpler data structure, the "Inner Loop" could be coded using vector operations, specifically vector comparisons. Using vector comparisons rather than if statements prevented pipeline thrashing and allowed four left and four right disparities to be calculated every pass. The use of a table of correlation scores was slower, but necessary to reduce the size of the data structure. By reducing the size of the data structure and using vector comparisons, the vector algorithm performs over 1.5 times faster. This algorithm could be further optimized by removing the right disparity calculation and using a correlation table, but this would sacrifice the left/right line of sight filter and sub-pixel disparity.
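The branchless minimum-update idiom at the heart of that rewrite can be sketched with intrinsics. The fragment below uses SSE2 as a readable stand-in for the MMX-era instructions the paper targets, and a hypothetical function name; it updates the running best score and best disparity for 16 pixels at once with no conditional branches.

    #include <emmintrin.h>  // SSE2 intrinsics
    #include <cstdint>

    // Replace the per-pixel branch
    //     if (score < best) { best = score; bestDisp = d; }
    // with vector min/compare/select over 16 pixels in parallel.
    void updateBest16(const uint8_t* score, uint8_t* bestScore,
                      uint8_t* bestDisp, uint8_t d) {
        __m128i s  = _mm_loadu_si128((const __m128i*)score);
        __m128i b  = _mm_loadu_si128((const __m128i*)bestScore);
        __m128i bd = _mm_loadu_si128((const __m128i*)bestDisp);
        __m128i dv = _mm_set1_epi8((char)d);

        __m128i m    = _mm_min_epu8(s, b);    // elementwise new minima
        __m128i keep = _mm_cmpeq_epi8(m, b);  // 0xFF where old best survives
        // Select: old disparity where the minimum is unchanged, else d.
        __m128i nd = _mm_or_si128(_mm_and_si128(keep, bd),
                                  _mm_andnot_si128(keep, dv));
        _mm_storeu_si128((__m128i*)bestScore, m);
        _mm_storeu_si128((__m128i*)bestDisp, nd);
    }

Because the selection is computed with masks rather than a branch, the pipeline never mispredicts; that is the "pipeline thrashing" the vectorized rewrite avoids.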
Compute Sub-Pixel: Run only after the best correlation score is found for each row, this function generates the final pixel and sub-pixel disparity image. The C version is faster here because it makes use of the spatial locality of the three best correlation scores for each disparity, while the vectorized version must do a table lookup to find two of the three scores. This overhead is partially absorbed by the use of a vectorized division rather than individual integer divisions for each sub-pixel value. In Figure 2d, the two annotations point to the timings for these divisions; the vectorized division is called one fourth as many times as the integer version, but is still 1.2× faster. Unfortunately, while this algorithm was the most recently optimized and may still admit a performance increase, at the moment it results in a 1.3× slowdown.
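The quadratic peak finding this function performs is the standard three-point parabola fit. A minimal floating-point sketch follows, assuming a minimum-is-best score as in the earlier correlator sketch (for maximum-is-best scores the signs flip); the flight code works in integer arithmetic and batches the divisions, as noted above.

    // Given correlation scores at disparities d-1, d, d+1 (sm, s0, sp),
    // fit a parabola and return the offset of its extremum from d.
    // Near a well-localized minimum the curvature is positive, and the
    // returned offset always lies in [-0.5, 0.5].
    double subPixelOffset(int sm, int s0, int sp) {
        int denom = sm - 2 * s0 + sp;    // parabola curvature
        if (denom <= 0) return 0.0;      // flat or ambiguous peak: no refinement
        return 0.5 * double(sm - sp) / double(denom);
    }
    // Refined disparity: d + subPixelOffset(score[d-1], score[d], score[d+1])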
These performance numbers, measured with Intel's VTune, have been measured without the benefit of compiler optimizations. With the compiler set to optimize for speed, the final results show the JPL stereo algorithm running 2.5 times faster. That translates to just over 20 frames per second on a Pentium III 750 with 256 Kbyte L2 cache, working on 256×240 images over 32 disparities and with a window size of 8×8.
4. GESTALT NAVIGATION SYSTEM
A primary input to any navigation system is a metrically specified waypoint. Although one could tell the rover to drive randomly, typically it will be sent to a particular point in the world. Waypoints may be specified statically, by simply giving an (X,Y,Z) value in a known world frame, or dynamically, by providing a module that can track a feature in the world and always return its current position. In what follows we assume the waypoint is static, but the extension to dynamic waypoints is trivial.
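That static/dynamic distinction amounts to a single query interface. The sketch below is our own illustration of one way to express it, not the flight software's API.

    #include <array>

    using Point3 = std::array<double, 3>;  // (X, Y, Z) in a known world frame

    // A waypoint is anything that can report the current goal position.
    struct Waypoint {
        virtual ~Waypoint() = default;
        virtual Point3 currentPosition() const = 0;  // queried each cycle
    };

    // Static waypoint: a fixed coordinate.
    struct StaticWaypoint : Waypoint {
        explicit StaticWaypoint(Point3 goal) : p(goal) {}
        Point3 currentPosition() const override { return p; }
        Point3 p;
    };

    // A dynamic waypoint would instead wrap a feature tracker and return
    // the tracked feature's latest estimated position.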
At its core, GESTALT is a set of routines that decide the next best direction for a rover to move, given the state of the world already seen, new sensor data, and a desired waypoint goal. It first checks to see if the rover has already reached its goal, or at least a point within some tolerance band around it. If so, the navigation cycle has completed and the traverse will terminate successfully.
The rover will rarely start out already at its goal, however. When it has any distance left to travel, it will evaluate its terrain information to determine the safety of all possible nearby turns. Sensed data about the terrain can come from any number or type of sensors, so long as their results are prefiltered to provide individual point measurements of (X,Y,Z) data in the rover's (not the sensor's) coordinate frame. GESTALT then chooses, from among the safe turns, the one that will best help it reach the goal. The desired turn and a short distance (e.g., 35 cm) are then sent to the low-level wheel controller, and the rover is commanded to move blindly.
While the rover is driving its next step, it will not use its imaging sensors to look for obstacles. Other types of safeguarding will likely be enabled (e.g., tilt sensors, motor current limits, potentiometers that monitor kinematic limit configurations), but no additional high-level terrain-based planning or sensing need be performed.
At the end of each step, sensors are expected to provide a reasonably accurate estimate of the rover's new position. GESTALT does not require that the rover motion exactly match what was commanded, but it does assume that wherever the rover ended up, its relative position and orientation can be reasonably inferred and provided as input. One limitation of the system is that it relies on other modules to deal with myriad position estimation problems (slipping in sand, getting stuck on a rock, freeing a jammed wheel, etc.).
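As a structural summary only, here is the traverse cycle just described, in schematic C++. Every function below is an inert stub standing in for a real subsystem (stereo sensing, map fusion, arc evaluation, localization); the names and signatures are our own illustration, not GESTALT's interface.

    #include <array>
    #include <cmath>

    using Point2 = std::array<double, 2>;
    struct Pose { double x = 0, y = 0, heading = 0; };
    struct Arc  { double curvature = 0; double length = 0.35; };

    // Stubs for the surrounding subsystems (illustrative only).
    static Pose estimatePose()              { return {}; }   // from other modules
    static void updateLocalMap(const Pose&) {}               // fuse stereo range data
    static Arc  chooseBestSafeArc(const Pose&, const Point2&) { return {}; }
    static void driveBlind(const Arc&)      {}               // short step, no imaging

    static double distanceTo(const Pose& p, const Point2& g) {
        return std::hypot(g[0] - p.x, g[1] - p.y);
    }

    // One traverse: evaluate terrain, pick a safe arc toward the goal,
    // step blindly (e.g., 35 cm), re-localize, and repeat until within
    // tolerance of the waypoint.
    void navigateTo(const Point2& goal, double tolerance) {
        Pose pose = estimatePose();
        while (distanceTo(pose, goal) > tolerance) {
            updateLocalMap(pose);
            Arc arc = chooseBestSafeArc(pose, goal);
            driveBlind(arc);
            pose = estimatePose();  // position update comes from other modules
        }
    }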

Figure 3. A successful 5 meter run through a narrow obstacle course. The upper left image shows actual obstacle locations
and actual rover path, as measured using a surveyor’s ranging theodolite. The upper right image is a picture of the test course
and rover. The bottom image is rendered at the same scale as the upper left image, and shows the local map built by the rover
during its traverse. This bottom image is one of GESTALT’s diagnostic images, and includes (1) the rover view from the left
forward-facing camera with grid superimposed, (2) an elevation image corresponding to (1), (3) the local occupancy grid with
(dark) obstacles and possible steering arcs, and (4) a ranking of possible headings showing best heading.
Choosing a Safe Direction
Range images generated by Stereo Vision are usually not sufficient, in and of themselves, to determine a safe driving path. Field of view restrictions and error recovery behaviors might force a rover to turn into an unseen area. For this reason we keep a local map of the area around the rover, so that it can reason more effectively about its surroundings. This map is maintained not from the perspective of the rover cameras, but from an overhead "bird's eye" viewpoint. Figure 3 shows an example of a rover map next to a similarly scaled (but independent) measurement of the environment.
GESTALT models the world as a grid of regularly spaced cells, with each cell typically the size of a rover's wheel. Each cell stores an 8-bit goodness and certainty value, or is tagged unknown. The resolution of the grid cells, the evaluations assigned to particular types of obstacles, and the types of tests to be performed are all parameters that may be changed prior to (some even during) a traverse; a nearly complete list can be found in Table 1.
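To make the data layout concrete, here is an illustrative (non-flight) rendering of such a grid cell and local map.

    #include <cstdint>
    #include <vector>

    struct Cell {
        uint8_t goodness  = 0;      // traversability evaluation, 0..255
        uint8_t certainty = 0;      // confidence in that evaluation, 0..255
        bool    known     = false;  // tagged unknown until terrain data arrives
    };

    class LocalMap {
    public:
        // cellSize is typically about one wheel diameter (meters).
        LocalMap(int rows, int cols, double cellSize)
            : rows_(rows), cols_(cols), cellSize_(cellSize),
              cells_(rows * cols) {}
        Cell&       at(int r, int c)       { return cells_[r * cols_ + c]; }
        const Cell& at(int r, int c) const { return cells_[r * cols_ + c]; }
    private:
        int rows_, cols_;
        double cellSize_;
        std::vector<Cell> cells_;
    };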
The GESTALT local map currently uses a configuration space representation of the environment. That is, the contents of each cell in its map represent whether a rover-sized object

References

Recent progress in local and global traversability for planetary rovers.
A portable, autonomous, urban reconnaissance robot.
Stereo ego-motion improvements for robust rover navigation.
Stochastic performance, modeling and evaluation of obstacle detectability with imaging range sensors.