Published in 2002 IEEE Aerospace Conference Proceedings, March 2002, Big Sky, Montana, USA
Stereo Vision and Rover Navigation Software
for Planetary Exploration
Steven B. Goldberg
Indelible Systems
8921 Quartz Ave
Northridge, CA 91311
+1 (818) 998 - 6895
isteve@robotics.jpl.nasa.gov
Mark W. Maimone, Larry Matthies
Jet Propulsion Laboratory
California Institute of Technology
Pasadena, CA 91109
+1 (818) 354 - 0592
{mwm,lhm}@robotics.jpl.nasa.gov
http://robotics.jpl.nasa.gov/people/mwm/visnavsw/
Abstract—NASA's Mars Exploration Rover (MER) missions will land twin rovers on the surface of Mars in 2004. These rovers will have the ability to navigate safely through unknown and potentially hazardous terrain, using autonomous passive stereo vision to detect potential terrain hazards before driving into them. Unfortunately, the computational power of currently available radiation hardened processors limits the amount of distance (and therefore science) that can be safely achieved by any rover in a given time frame.

We present overviews of our current rover vision and navigation systems, to provide context for the types of computation that are required to navigate safely. We also present baseline timing results that represent a lower bound in achievable performance (useful for systems engineering studies of future missions), and describe ways to improve that performance using commercial grade (as opposed to radiation hardened) processors. In particular, we document speedups to our stereo vision system that were achieved using the vectorized operations provided by Pentium MMX technology. Timing data were derived from implementations on several platforms: a prototype Mars rover with flight-like electronics (the Athena Software Development Model (SDM) rover), a RAD6000 computing platform (as will be used in the 2003 MER missions), and research platforms with commercial Pentium III and Sparc processors.

Finally, we summarize the radiation effects analysis that suggests that commercial grade processors are likely to be adequate for Mars surface missions, and discuss the level of speedup that may accrue from using these instead of radiation hardened parts.
TABLE OF CONTENTS
1 INTRODUCTION
2 STEREO VISION ALGORITHM
3 STEREO VISION OPTIMIZATIONS
4 GESTALT NAVIGATION SYSTEM
5 BASELINE SYSTEM TIMINGS
6 FUTURE MISSIONS
7 CONCLUSION
8 ACKNOWLEDGEMENTS
9 BIOGRAPHIES
1. INTRODUCTION
Planetary rovers now have the ability to navigate safely through unknown and potentially hazardous terrain, using autonomous passive stereo vision to detect potential terrain hazards before driving into them. A local map of the terrain can be maintained onboard, by resampling and effectively managing the range data generated by stereo vision. NASA's Mars Exploration Rover (MER) missions will drive safely on the Red Planet in early 2004 using this type of technology.
Stereo vision is an attractive technology for rover navigation because it is passive; sunlight provides all the energy needed for daylight operations. Hence only a small amount of power is required for the imaging electronics to obtain knowledge about the environment. And with enough cameras or a wide enough field of view, there need be no moving parts in the system. Having fewer motors reduces the number of components that could fail.
Our navigation system relies on a geometric analysis of the world near the rover, combining various range data snapshots generated by the stereo system into a local map. We developed a system for interpreting this data, called the Grid-based Estimation of Surface Traversability Applied to Local Terrain (GESTALT) system, based on Carnegie Mellon's Morphin algorithm [9], [10].
Although the MER mission (launching in mid-2003) only requires the rover to travel at most 100 meters per day, future missions like the Smart Lander Rover being considered for 2009 will require rovers to travel even farther, hence at faster speeds. In this paper we describe our current algorithms for autonomous rover navigation, and provide baseline timings for implementations of these algorithms on a variety of platforms. Our implementations are primarily written in C and C++, but certain optimizations are hard coded in assembler to take advantage of vector operations. These timings provide a benchmark from which future rover driving capabilities can be derived.

Figure 1. Illustration of the steps involved in stereo vision processing: (a) raw images; (b) rectified images; (c) Laplacian images; (d) resulting elevation map.
2. STEREO VISION ALGORITHM
JPL has applied Stereo Vision software to rover motion control for many years. Although certain aspects of our approach have appeared before [3], [14], we will summarize the overall algorithm here before presenting new results that take advantage of some commercial vectorized processors that have only recently become available.
Our algorithm depends on certain physical properties of the stereo camera system. The pair of stereo cameras must be rigidly mounted to a camera bar. Using a pair of images of a known calibration target, a pair of geometric camera lens models is calculated using Gennery's CAHVORE formulation [2]. This formulation assumes the system will maintain its geometric calibration over some useful time period (e.g., days or weeks for research purposes, weeks or months for deployed vehicles). This is a reasonable assumption: examples of NASA-sponsored robot camera systems that have maintained their stereo calibration in spite of high vibration deployments and/or long periods of use include Dante [8], Nomad [13], Rocky 7 [12], [5], and the Mars Pathfinder Lander [11].
Our stereo vision algorithm can be described as follows:
1. To decrease the computational burden and the effect of the rigidity constraint, often the raw sensor images are reduced in size, e.g., from 1024×1024 source pixels down to 256×256 pixels, by averaging pixel values (see Figure 1a). Each so-called pyramid level reduction results in an 8-fold decrease in computation: a factor of two from each spatial dimension, and an additional factor of two from a reduction in the number of integer disparities that need to be searched. There is a cost, though: the depth resolution of the resulting range estimates doubles (i.e., becomes less precise) with each pyramid level reduction [4].
2. Each image pixel encodes the appearance of a location in the 3D world; in particular, the surface of that object nearest the camera along a certain ray. To find the pixel that represents the same object surface in the other image, it is sufficient to search only along the projection of that ray. Since the bulk of the processing time in stereo vision is spent doing this search, we simplify later processing by resampling each image so that searching these rays requires only integer operations. Pairs of images are thus rectified, ensuring that these rays (called epipolar lines) are aligned with the horizontal, as in Figure 1b.
3. We compute the Laplacian of each image to remove any pixel intensity bias, e.g., Figure 1c. Actually, our implementation computes an approximation, a Difference of Gaussians, which can be done more quickly.
4. The filtered images are then fed into a 1-D correlator that uses a 7×7 pixel window. The correlator considers a number of potential matches for each pixel in the left image of each stereo pair, assigning a score to each potential match. The range of pixels to be searched is called the disparity range, and is derived geometrically from the input range of depth values to be searched (e.g., from 30 cm to 3 meters in front of the cameras). The maximum-scored match is selected, and the camera model is applied to determine the corresponding range estimate. This process is repeated for every pixel in the left image. We take advantage of the inherent parallelism using a sliding sum implementation to compute the correlation scores efficiently (a sketch of this idea follows the list).
5. Not every range estimate is accepted, however. A variety of checks is applied to prune out unreliable estimates. One example is the peak filter: the chosen score must be better than that at adjacent pixels. A flat correlation peak would mean that many nearby pixels have the same appearance, resulting in an unresolvable ambiguity. Another is the Left/Right Line of Sight filter: the correlator is run in the reverse direction, yielding independent range estimates for pixels in the right image as well. If an estimate from a pixel in the left image fails to match that from its correspondent in the right image, the estimate is discarded.

Figure 2. Stereo Vision Results: unoptimized (light blue) vs. vector-optimized (dark red) run times for 4 functions. The X axis represents individual assembly instructions executed for each function; the first instruction in a function is on the far left, the last instruction on the far right. The Y axis represents total time in milliseconds spent executing a particular instruction, integrated over 200 iterations.
6. Having pruned some of these values, any remaining small isolated regions of range values are thrown out. These safety checks (the peak filter, the Left/Right Line of Sight filter, and this Blob filter) result in a robust set of correspondences that can be used by the onboard autonomy system.
7. Finally, each disparity value can be mapped to a 3-D (X,Y,Z) location using the geometric camera model. This information can be displayed in many forms; an elevation map is shown in Figure 1d.
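The sliding-sum correlation of step 4 can be illustrated with a short sketch. The fragment below is a deliberately simplified, single-row version using a sum-of-absolute-differences score minimized over a horizontal window; the flight implementation uses 7×7 windows (sliding sums along both rows and columns), Laplacian-filtered pyramid images, and the filters of steps 5 and 6, so the function name and details here are illustrative assumptions, not the actual code.

    #include <cstdint>
    #include <cstdlib>
    #include <limits>
    #include <vector>

    // Simplified 1-D correlator for one pair of rectified, filtered rows.
    // For each left-image pixel, find the disparity in [0, maxD) that
    // minimizes a sum of absolute differences over a window of 'win'
    // pixels. The sliding sum makes each disparity cost O(width) rather
    // than O(width * win).
    std::vector<int> matchRow(const uint8_t* left, const uint8_t* right,
                              int width, int maxD, int win) {
        std::vector<int> bestDisp(width, -1);   // -1: no full window / no match
        std::vector<int> bestScore(width, std::numeric_limits<int>::max());
        std::vector<int> diff(width, 0);
        for (int d = 0; d < maxD; ++d) {
            // Pointwise absolute differences at this disparity.
            for (int x = d; x < width; ++x)
                diff[x] = std::abs(int(left[x]) - int(right[x - d]));
            // Sliding window: add the entering pixel, drop the leaving one.
            int sum = 0;
            for (int x = d; x < d + win && x < width; ++x) sum += diff[x];
            for (int x = d + win; x < width; ++x) {
                sum += diff[x] - diff[x - win];
                int center = x - win / 2;   // score the window's center pixel
                if (sum < bestScore[center]) {
                    bestScore[center] = sum;
                    bestDisp[center] = d;
                }
            }
        }
        return bestDisp;
    }

Each accepted disparity then triangulates to range (roughly z = f·b/d for focal length f and stereo baseline b) via the calibrated camera model, which is why halving image resolution in step 1 also coarsens depth resolution.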
3. STEREO VISION OPTIMIZATIONS
The ready availability of commercial vectorized processors has allowed us to realize significant improvements in the performance of our stereo implementation. Faster interpretation of the world allows our rovers to drive safely at ever faster speeds, e.g., the Urbie robot [6], which can now drive safely at over 1 meter per second. In this section we document the speedups obtained by taking advantage of the Pentium MMX capabilities. All the graphs in Figure 2 reflect timings taken on a Pentium III 700 MHz CPU with 32 Kbyte L1 cache, 256 Kbyte L2 cache, and 512 Mbyte RAM running Windows 2000. While currently used only on Earth-based rovers, such commercial (i.e., non-radiation hardened) processors might also be used on future space missions, as we discuss in Section 6.
We focus attention on four particular functions: local 2D pixel resampling in Difference of Gaussian and Decimate, buffer preparation in Prepare Next Row, correlation score comparison in the Inner Loop, and integer-based quadratic peak finding in Compute Sub-Pixel. The first two require memory accesses that jump across image row boundaries, and the latter two perform many independent operations on 8-bit integer data. These properties make them useful candidates for vectorization.
Difference of Gaussian and Decimate: Used to filter and decimate the images before stereo calculations, this algorithm is implemented using sliding sums. Working with the original stereo images, two 240 Kbyte (512×480) buffers, this algorithm must access main memory and will incur L2 cache misses. To make the most of each L2 cache miss, the vectorized implementation operates over whole cache lines. To guarantee only whole cache lines are used, a small portion of the original algorithm is used to align the inputs. As shown in Figure 2a, the vectorized version does have a more localized occurrence of L2 cache misses, and its efficient use of Pentium III prefetching makes better use of the L1 cache. (In Figure 2a, L1 cache misses cannot be directly measured, so an estimate based on L2 cache reads is used.) This, coupled with a vectorized computation of the sliding sum values, results in a 2× speed increase.
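For concreteness, here is a minimal sketch of one pyramid-level reduction by 2×2 block averaging. It is an illustrative stand-in, not the flight code: the real implementation fuses filtering and decimation into sliding sums and is arranged to consume whole, aligned cache lines as described above.

    #include <cstdint>
    #include <vector>

    // Reduce an 8-bit image by one pyramid level by averaging 2x2 blocks.
    // Width and height are assumed even for brevity.
    std::vector<uint8_t> decimate(const std::vector<uint8_t>& src,
                                  int w, int h) {
        std::vector<uint8_t> dst((w / 2) * (h / 2));
        for (int y = 0; y < h / 2; ++y) {
            for (int x = 0; x < w / 2; ++x) {
                int s = src[(2 * y) * w + 2 * x]
                      + src[(2 * y) * w + 2 * x + 1]
                      + src[(2 * y + 1) * w + 2 * x]
                      + src[(2 * y + 1) * w + 2 * x + 1];
                dst[y * (w / 2) + x] = uint8_t((s + 2) / 4);  // rounded mean
            }
        }
        return dst;
    }

Note how each output row touches two input rows; those row-crossing memory accesses are exactly what makes cache-line alignment and prefetching profitable here.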
Prepare Next Row: This function maintains the sliding sum buffers. It accesses the left and right input images and both sliding sum buffers. Optimizing this function produces the largest gains, 4.8×, from vectorization and Pentium III cache optimizations. This is accomplished by operating over full cache lines of aligned memory. Since access to image data is not aligned, this algorithm makes use of the Pentium III prefetch instruction to assist in loading the required cache lines into the L1 cache. The other memory, namely the sliding sum buffers, is guaranteed to be aligned. In Figure 2b, the tall light blue bars indicate how much time is wasted by the C algorithm waiting for the processor to fetch operands from memory.
Inner Loop: The core of the stereo matching algorithm, this function finds the disparity with the best correlation score for left and right disparities, and saves the information necessary to generate sub-pixel estimates. The C algorithm used an unaligned data structure to store four shorts and one byte of data. This was not vectorizable, so it was rewritten to store two shorts in one data structure and one correlation value in a table. With this simpler data structure, the "Inner Loop" could be coded using vector operations, specifically vector comparisons. Using vector comparisons rather than if statements prevented pipeline thrashing and allowed four left and four right disparities to be calculated every pass. The use of a table of correlation scores was slower, but necessary to reduce the size of the data structure. By reducing the size of the data structure and using vector comparisons, the vector algorithm performs over 1.5 times faster. This algorithm could be further optimized by removing the right disparity calculation and using a correlation table, but this would sacrifice the left/right line of sight filter and sub-pixel disparity.
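The branchless minimum-update idiom at the heart of that rewrite can be sketched with intrinsics. The fragment below uses SSE2 as a readable stand-in for the MMX-era instructions the paper targets, and a hypothetical function name; it updates the running best score and best disparity for 16 pixels at once with no conditional branches.

    #include <emmintrin.h>  // SSE2 intrinsics
    #include <cstdint>

    // Replace the per-pixel branch
    //     if (score < best) { best = score; bestDisp = d; }
    // with vector min/compare/select over 16 pixels in parallel.
    void updateBest16(const uint8_t* score, uint8_t* bestScore,
                      uint8_t* bestDisp, uint8_t d) {
        __m128i s  = _mm_loadu_si128((const __m128i*)score);
        __m128i b  = _mm_loadu_si128((const __m128i*)bestScore);
        __m128i bd = _mm_loadu_si128((const __m128i*)bestDisp);
        __m128i dv = _mm_set1_epi8((char)d);

        __m128i m    = _mm_min_epu8(s, b);    // elementwise new minima
        __m128i keep = _mm_cmpeq_epi8(m, b);  // 0xFF where old best survives
        // Select: old disparity where the minimum is unchanged, else d.
        __m128i nd = _mm_or_si128(_mm_and_si128(keep, bd),
                                  _mm_andnot_si128(keep, dv));
        _mm_storeu_si128((__m128i*)bestScore, m);
        _mm_storeu_si128((__m128i*)bestDisp, nd);
    }

Because the selection is computed with masks rather than a branch, the pipeline never mispredicts; that is the "pipeline thrashing" the vectorized rewrite avoids.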
Compute Sub-Pixel: Run only after the best correlation score is found for each row, this function generates the final pixel and sub-pixel disparity image. The C version is faster here because it makes use of the spatial locality of the three best correlation scores for each disparity, while the vectorized version must do a table lookup to find two of the three scores. This overhead is partially absorbed by the use of a vectorized division rather than individual integer divisions for each sub-pixel value. In Figure 2d, the two annotations point to the timings for these divisions; the vectorized division is called one fourth as many times as the integer version, but is still 1.2× faster. Unfortunately, while this algorithm was the most recently optimized and may still admit a performance increase, at the moment it results in a 1.3× slowdown.
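The quadratic peak finding this function performs is the standard three-point parabola fit. A minimal floating-point sketch follows, assuming a minimum-is-best score as in the earlier correlator sketch (for maximum-is-best scores the signs flip); the flight code works in integer arithmetic and batches the divisions, as noted above.

    // Given correlation scores at disparities d-1, d, d+1 (sm, s0, sp),
    // fit a parabola and return the offset of its extremum from d.
    // Near a well-localized minimum the curvature is positive, and the
    // returned offset always lies in [-0.5, 0.5].
    double subPixelOffset(int sm, int s0, int sp) {
        int denom = sm - 2 * s0 + sp;    // parabola curvature
        if (denom <= 0) return 0.0;      // flat or ambiguous peak: no refinement
        return 0.5 * double(sm - sp) / double(denom);
    }
    // Refined disparity: d + subPixelOffset(score[d-1], score[d], score[d+1])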
These performance numbers, measured with Intel's VTune, have been measured without the benefit of compiler optimizations. With the compiler set to optimize for speed, the final results show the JPL stereo algorithm running 2.5 times faster. That translates to just over 20 frames per second on a Pentium III 750 with 256 Kbyte L2 cache, working on 256×240 images over 32 disparities and with a window size of 8×8.
4. GESTALT NAVIGATION SYSTEM
A primary input to any navigation system is a metrically specified waypoint. Although one could tell the rover to drive randomly, typically it will be sent to a particular point in the world. Waypoints may be specified statically, by simply giving an (X,Y,Z) value in a known world frame, or dynamically, by providing a module that can track a feature in the world and always return its current position. In what follows we assume the waypoint is static, but the extension to dynamic waypoints is trivial.
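That static/dynamic distinction amounts to a single query interface. The sketch below is our own illustration of one way to express it, not the flight software's API.

    #include <array>

    using Point3 = std::array<double, 3>;  // (X, Y, Z) in a known world frame

    // A waypoint is anything that can report the current goal position.
    struct Waypoint {
        virtual ~Waypoint() = default;
        virtual Point3 currentPosition() const = 0;  // queried each cycle
    };

    // Static waypoint: a fixed coordinate.
    struct StaticWaypoint : Waypoint {
        explicit StaticWaypoint(Point3 goal) : p(goal) {}
        Point3 currentPosition() const override { return p; }
        Point3 p;
    };

    // A dynamic waypoint would instead wrap a feature tracker and return
    // the tracked feature's latest estimated position.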
At its core, GESTALT is a set of routines that decide the next best direction for a rover to move, given the state of the world already seen, new sensor data, and a desired waypoint goal. It first checks to see if the rover has already reached its goal, or at least a point within some tolerance band around it. If so, the navigation cycle has completed and the traverse will terminate successfully.
The rover will rarely start out already at its goal, however. When it has any distance left to travel, it will evaluate its terrain information to determine the safety of all possible nearby turns. Sensed data about the terrain can come from any number or type of sensors, so long as their results are prefiltered to provide individual point measurements of (X,Y,Z) data in the rover's (not the sensor's) coordinate frame. GESTALT then chooses, from among the safe turns, the one that will best help it reach the goal. The desired turn and a short distance (e.g., 35 cm) are then sent to the low-level wheel controller, and the rover is commanded to move blindly.
While the rover is driving its next step, it will not use its imaging sensors to look for obstacles. Other types of safeguarding will likely be enabled (e.g., tilt sensors, motor current limits, potentiometers that monitor kinematic limit configurations), but no additional high-level terrain-based planning or sensing need be performed.
At the end of each step, sensors are expected to provide a reasonably accurate estimate of the rover's new position. GESTALT does not require that the rover motion exactly match what was commanded, but it does assume that wherever the rover ended up, its relative position and orientation can be reasonably inferred and provided as input. One limitation of the system is that it relies on other modules to deal with myriad position estimation problems (slipping in sand, getting stuck on a rock, freeing a jammed wheel, etc.).
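As a structural summary only, here is the traverse cycle just described, in schematic C++. Every function below is an inert stub standing in for a real subsystem (stereo sensing, map fusion, arc evaluation, localization); the names and signatures are our own illustration, not GESTALT's interface.

    #include <array>
    #include <cmath>

    using Point2 = std::array<double, 2>;
    struct Pose { double x = 0, y = 0, heading = 0; };
    struct Arc  { double curvature = 0; double length = 0.35; };

    // Stubs for the surrounding subsystems (illustrative only).
    static Pose estimatePose()              { return {}; }   // from other modules
    static void updateLocalMap(const Pose&) {}               // fuse stereo range data
    static Arc  chooseBestSafeArc(const Pose&, const Point2&) { return {}; }
    static void driveBlind(const Arc&)      {}               // short step, no imaging

    static double distanceTo(const Pose& p, const Point2& g) {
        return std::hypot(g[0] - p.x, g[1] - p.y);
    }

    // One traverse: evaluate terrain, pick a safe arc toward the goal,
    // step blindly (e.g., 35 cm), re-localize, and repeat until within
    // tolerance of the waypoint.
    void navigateTo(const Point2& goal, double tolerance) {
        Pose pose = estimatePose();
        while (distanceTo(pose, goal) > tolerance) {
            updateLocalMap(pose);
            Arc arc = chooseBestSafeArc(pose, goal);
            driveBlind(arc);
            pose = estimatePose();  // position update comes from other modules
        }
    }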

Figure 3. A successful 5 meter run through a narrow obstacle course. The upper left image shows actual obstacle locations
and actual rover path, as measured using a surveyor’s ranging theodolite. The upper right image is a picture of the test course
and rover. The bottom image is rendered at the same scale as the upper left image, and shows the local map built by the rover
during its traverse. This bottom image is one of GESTALT’s diagnostic images, and includes (1) the rover view from the left
forward-facing camera with grid superimposed, (2) an elevation image corresponding to (1), (3) the local occupancy grid with
(dark) obstacles and possible steering arcs, and (4) a ranking of possible headings showing best heading.
Choosing a Safe Direction
Range images generated by Stereo Vision are usually not sufficient, in and of themselves, to determine a safe driving path. Field of view restrictions and error recovery behaviors might force a rover to turn into an unseen area. For this reason we keep a local map of the area around the rover, so that it can reason more effectively about its surroundings. This map is maintained not from the perspective of the rover cameras, but from an overhead "bird's eye" viewpoint. Figure 3 shows an example of a rover map next to a similarly scaled (but independent) measurement of the environment.
GESTALT models the world as a grid of regularly spaced cells, with each cell typically the size of a rover's wheel. Each cell stores an 8-bit goodness and certainty value, or is tagged unknown. The resolution of the grid cells, the evaluations assigned to particular types of obstacles, and the types of tests to be performed are all parameters that may be changed prior to (some even during) a traverse; a nearly complete list can be found in Table 1.
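To make the data layout concrete, here is an illustrative (non-flight) rendering of such a grid cell and local map.

    #include <cstdint>
    #include <vector>

    struct Cell {
        uint8_t goodness  = 0;      // traversability evaluation, 0..255
        uint8_t certainty = 0;      // confidence in that evaluation, 0..255
        bool    known     = false;  // tagged unknown until terrain data arrives
    };

    class LocalMap {
    public:
        // cellSize is typically about one wheel diameter (meters).
        LocalMap(int rows, int cols, double cellSize)
            : rows_(rows), cols_(cols), cellSize_(cellSize),
              cells_(rows * cols) {}
        Cell&       at(int r, int c)       { return cells_[r * cols_ + c]; }
        const Cell& at(int r, int c) const { return cells_[r * cols_ + c]; }
    private:
        int rows_, cols_;
        double cellSize_;
        std::vector<Cell> cells_;
    };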
The GESTALT local map currently uses a configuration space representation of the environment. That is, the contents of each cell in its map represent whether a rover-sized object

References

Recent progress in local and global traversability for planetary rovers.
A portable, autonomous, urban reconnaissance robot.
Stereo ego-motion improvements for robust rover navigation.
Stochastic performance, modeling and evaluation of obstacle detectability with imaging range sensors.