62 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 1, JANUARY 1998
GOLD: A Parallel Real-Time Stereo Vision
System for Generic Obstacle and Lane Detection
Massimo Bertozzi, Student Member, IEEE, and Alberto Broggi, Associate Member, IEEE
Abstract—This paper describes the Generic Obstacle and Lane
Detection system (GOLD), a stereo vision-based hardware and
software architecture to be used on moving vehicles to increase
road safety. Based on full-custom massively parallel hardware,
it allows the detection of both generic obstacles (without constraints
on symmetry or shape) and the lane position in a structured
environment (with painted lane markings) at a rate of 10 Hz.
Thanks to a geometrical transform supported by a specific hard-
ware module, the perspective effect is removed from both left
and right stereo images; the left image is used to detect lane markings
with a series of morphological filters, while both remapped stereo
images are used for the detection of the free space in front of the
vehicle. The output of the processing is displayed on both an on-
board monitor and a control panel to give visual feedback to the
driver. The system was tested on the mobile laboratory (MOB-
LAB) experimental land vehicle, which was driven for more than
3000 km along extra-urban roads and freeways at speeds up to 80
km/h, and demonstrated its robustness with respect to shadows
and changing illumination conditions, different road textures, and
vehicle movement.
I. INTRODUCTION
THE MAIN issues addressed in this work are lane detection
and obstacle detection, both implemented using only visual
data acquired from standard cameras installed on a mobile
vehicle.
A. Lane Detection
Road following, namely the closing of the control loop that
enables a vehicle to drive within a given portion of the road,
has been approached and implemented in different ways in research
prototype vehicles. Most of the systems developed worldwide
are based on lane detection: first, the relative position of the
vehicle with respect to the lane is computed, and then actuators
are driven to keep the vehicle in a safe position. Others [15],
[28], [38] are not based on the preliminary detection of the
road position, but, as in the case of ALVINN [43], [44], derive
the commands to issue to the actuators (steering wheel angles)
directly from visual patterns detected in the incoming images.
In any case, the knowledge of the lane position can be of use
for other purposes, such as the determination of the regions of
interest for other driving assistance functions.
Manuscript received April 5, 1996; revised March 24, 1997. This work
was supported in part by the Italian National Research Council under the
framework of the Progetto Finalizzato Trasporti 2. The associate editor
coordinating the review of this manuscript and approving it for publication
was Prof. Jeffrey J. Rodriguez.
The authors are with the Department of Information Technology, Uni-
versity of Parma, I-43100 Parma, Italy (e-mail: bertozzi@CE.UniPR.IT;
broggi@CE.UniPR.IT).
Publisher Item Identifier S 1057-7149(98)00313-3.
The main problems that must be faced in the detection
of road boundaries or lane markings are: 1) the presence of
shadows, producing artifacts onto the road surface, and thus
altering its texture, and 2) the presence of other vehicles on the
path, partly occluding the visibility of the road. Although some
systems have been designed to work on nonstructured roads
(without painted lane markings) [28] or on unstructured terrain
[39], [52], generally lane detection relies on the presence of
painted road markings on the road surface. Therefore, since
lane detection is generally based on the localization of a
specific pattern (the lane markings) in the acquired image,
it can be performed with the analysis of a single still image.
In addition, some assumptions can aid the detection algorithm
and/or speed up the processing. They range from the analysis
of specific regions of interest in the image (in which, due to
both physical and continuity constraints, it is more probable
to find the lane markings) [18] to the assumption of a fixed-
width lane (thus dealing with only parallel lane markings), to
the assumption of a precise road geometry (such as a clothoid)
[18], [33], [58], to the assumption of a flat road (the one
considered in this work).
The techniques implemented in the previously mentioned
systems range from the determination of the characteristics
of painted lane markings [30], possibly aided by color
information [19], to the use of deformable templates (such as
LOIS [31], DBS [7], or ARCADE [29]), to an edge-based
recognition using a morphological paradigm [3], [5], [59], to
a model-based approach (as implemented in VaMoRs [26]
or SCARF [17]). A model-based analysis of road markings
has also been used to perform the analysis of intersections
in city traffic images [21], [32]; nevertheless, as discussed in
[46], the use of a model-based search approach has several
drawbacks, such as the problem of using and maintaining an
appropriate geometrical road model, the difficulty in detecting
and matching complex road features, and the complexity of
the computations involved.
Moreover, some systems (such as [46]) work in the velocity
domain instead of the image domain, thus using optical-
flow techniques in order to minimize the horizontal relative
movement of the lane markings with respect to the vehicle.
Unfortunately, such a solution requires both the preliminary
detection of lane markings and the subsequent computation of
the optical flow field.
B. Obstacle Detection
The techniques used in the detection of obstacles may vary
according to the definition of “obstacle.” If “obstacle” means
1057–7149/98$10.00 © 1998 IEEE

BERTOZZI AND BROGGI: PARALLEL REAL-TIME STEREO VISION SYSTEM 63
a vehicle, then the detection is based on a search for specific
patterns, possibly supported by other features, such as shape
[56], symmetry [61], or the use of a bounding box [1]. Also,
in this case, the processing can be based on the analysis of a
single still image.
Conversely, if an obstacle is defined as any object that can
obstruct the vehicle’s driving path, or anything rising
significantly from the road surface, obstacle detection is gen-
erally reduced to the detection of the free space instead of the
recognition of specific patterns. In this case, different tech-
niques can be used, such as 1) the analysis of the optical
flow field, and 2) the processing of stereo images; both
of these require two or more images, thus leading to a
higher computational complexity, which is further increased
by the necessity to handle noise caused by vehicle movements.
Obstacle detection using the optical flow approach [13], [20] is
generally divided into two steps: first, ego-motion is computed
from the analysis of optical flow [25] or obtained from
odometry [35]; then obstacles are detected by the analysis
of the differences between the expected and the real velocity
field.
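As an illustration of this two-step scheme, the final comparison between the expected and measured velocity fields can be sketched in a few lines. The flow fields themselves (one predicted from ego-motion, one measured by a flow estimator) are assumed to be given, and the function name and threshold are hypothetical, not taken from any of the cited systems:

```python
import numpy as np

def flow_obstacle_mask(measured_flow, expected_flow, threshold=1.0):
    """Second step of flow-based obstacle detection: flag pixels whose measured
    image velocity deviates from the velocity predicted by ego-motion.
    Both flow fields are H x W x 2 arrays of (du, dv) components in pixels."""
    # Magnitude of the residual flow at every pixel.
    residual = np.linalg.norm(measured_flow - expected_flow, axis=-1)
    return residual > threshold  # boolean obstacle-candidate mask
```

Pixels belonging to the static road obey the predicted field and yield a near-zero residual; an independently moving or raised object produces a cluster of above-threshold residuals.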
On the other hand, the main problem of stereo vision
techniques is the detection of correspondences between two
stereo images (or three images, in case of trinocular vision
[49]). The advantage of the analysis of stereo images instead
of a monocular sequence of images is the possibility of
directly detecting the presence of obstacles, which, in the case of
an optical flow-based approach, is indirectly derived from the
analysis of the velocity field. Moreover, in the limiting
condition where both the vehicle and the obstacles have small
or zero speed, the optical flow-based approach fails, while
stereo vision can still detect obstacles. Furthermore, to decrease the intrinsic complexity of
stereo vision, some domain specific constraints are generally
adopted.
As in [33], the Generic Obstacle and Lane Detection
(GOLD) system addresses both lane detection and obstacle
detection at the same time. Lane detection is based on a
pattern-matching technique that relies on the presence of road
markings, while the localization of obstacles in front of the
vehicle is performed by processing pairs of stereo
images. In order to be fast and robust with respect to camera
calibration and vehicle movements, the detection of a generic
obstacle is reduced to the determination of the free space
in front of the vehicle, without any three-dimensional (3-D)
world reconstruction.
Both functionalities share the same underlying approach
(image warping), which is based on the assumption of a flat
road. Such a technique has been successfully used for the
computation of the optical flow field [36], for the detection
of obstacles in a structured environment [34], [60], or in the
automotive field [37], [42], [45] (using standard cameras) or
[50], [57] (using linear cameras). It is based on a transform
that, given a model of the road in front of the vehicle (e.g.,
a flat road), remaps the right image onto the left; any disparity
is caused by a deviation from the road model, thus revealing
possible obstacles.
Contrary to other works [33], [37], [42], GOLD performs
two warpings instead of one, remapping both images into
a different domain (the road domain), in which the subsequent
processing is extremely simplified. Hence, the reprojection
[33], [58] of the results into the road domain is no longer required.
Moreover, since both GOLD functionalities are based on the
processing of images remapped into the same domain, the
fusion of the results of the two independent processings is
straightforward.

Fig. 1. (a) MOB-LAB land vehicle. (b) Control panel used as output to
display the processing results. (c) ARGO autonomous passenger car.

Fig. 2. (a) Road markings width changes according to their position within
the image. (b) Due to the perspective effect, different pixels represent
different portions of the road.
The GOLD system has been tested on the mobile laboratory
(MOB-LAB) experimental land vehicle, integrating the results
of the Italian Research Units involved in the PROMETHEUS
project. MOB-LAB [see Fig. 1(a)] is equipped with four
cameras, two of which are used for this experiment, several
computers, monitors, and a control-panel [see Fig. 1(b)] to
give a visual feedback and warnings to the driver. The GOLD
system is now being ported to ARGO [2] [see Fig. 1(c)], a
Lancia Thema passenger car with automatic steering capabilities.
This work is organized as follows: Section II presents
the basics of the underlying approach used to remove the
perspective effect from a monocular image, while Section III
describes its application to the processing of stereo images.
Section IV describes the lane detection and obstacle detec-
tion functionalities; Section V presents the computing engine
that has been developed as a support to the GOLD system;
Section VI presents the analysis of the time performance of
the current implementation; finally, Section VII ends the paper
with a discussion about the problems of the system, their
possible solutions, and future developments.
II. INVERSE PERSPECTIVE MAPPING
Due to its intrinsic nature, low-level image processing
is efficiently performed on single instruction multiple data
(SIMD) systems by means of a massively parallel computational
paradigm. However, this approach is only meaningful in
the case of generic filters (such as noise reduction, edge
detection, and image enhancement), which consider the image
as a mere collection of pixels, independent of their semantic
content.
On the other hand, the implementation of more sophisticated
filters requires some semantic knowledge. As an example, let
us consider the specific problem of road markings detection in
an image acquired from a vehicle. Due to the perspective effect
introduced by the acquisition conditions, the road markings
width changes according to their distance from the camera [see
Fig. 2(a)]. Therefore, the correct detection of road markings
should be based on matching with patterns of different
sizes, according to the specific position within the image.
Unfortunately, this differentiated low-level processing cannot
be efficiently performed on SIMD massively parallel systems,
which by definition perform the same processing on each pixel
of the image.

Fig. 3. Relationship between the two coordinate systems.
The perspective effect associates different meanings to
different image pixels, depending on their position in the
image [see Fig. 2(b)]. Conversely, after the removal of the
perspective effect, each pixel represents the same portion of the
road,¹ allowing a homogeneous distribution of the information
among all image pixels; to remove the perspective effect, it is
necessary to know the specific acquisition conditions (camera
position, orientation, optics, etc.) and the scene represented in
the image (the road, which is now assumed to be flat). This
constitutes the a priori knowledge.
Now, recalling the example of road markings detection, the
size and shape of the matching template can be independent
of the pixel position. Therefore, road markings detection can
be conveniently divided into two steps: the first, exploiting the
a priori knowledge, is a transform that generates an image in
a new domain where the detection of the features of interest
is extremely simplified; the second, exploiting the sensorial
data, consists of a mere low-level morphological processing.

¹A pixel in the lower part of the image of Fig. 2(a) represents a few cm²
of the road, while a pixel in the middle of the same image represents a few
tens of cm², or even more.

Fig. 4. (a) The xy plane in the W space and (b) the z plane.

Fig. 5. (a) Original and remapped images. (b) In grey, the visible portion of the road.
The removal of the perspective effect allows road markings
to be detected through an extremely simple and fast morphological
processing that can be efficiently implemented on massively
parallel SIMD architectures.
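A minimal sketch of such a filter, under the assumption (used throughout this section) that remapped lane markings appear as bright, quasi-vertical stripes of nearly constant width: a pixel is kept only when it is brighter than both of its horizontal neighbors at a fixed distance m. The function name and offset m are illustrative conventions, not GOLD's exact filter:

```python
import numpy as np

def enhance_lane_markings(remapped, m=3):
    """Enhance bright quasi-vertical lines in the remapped (top-view) image:
    a pixel is kept when it is brighter than both of its horizontal neighbors
    at distance m. A single offset m works for the whole image because, after
    the removal of the perspective effect, markings have near-constant width."""
    img = remapped.astype(np.int32)
    left = np.roll(img, m, axis=1)    # neighbor m pixels to the left
    right = np.roll(img, -m, axis=1)  # neighbor m pixels to the right
    # Difference with the brighter of the two neighbors; dark pixels go to 0.
    out = np.minimum(img - left, img - right)
    out[:, :m] = 0   # image borders have no valid neighbor (np.roll wraps)
    out[:, -m:] = 0
    return np.clip(out, 0, None).astype(remapped.dtype)
```

Because the operation is identical at every pixel, it maps directly onto an SIMD architecture; on the original image, by contrast, m would have to vary with the pixel's distance from the camera.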
A. Removing the Perspective Effect
The procedure aimed at removing the perspective effect
resamples the incoming image, remapping each pixel to
a different position and producing a new two-dimensional
(2-D) array of pixels. The resulting image represents a top view
of the road region in front of the vehicle, as if it were observed
from a significant height.
Two Euclidean spaces are defined, as follows.
W = {(x, y, z)} ∈ E³, representing the 3-D world space
(world coordinates), where the real world is defined.
I = {(u, v)} ∈ E², representing the 2-D image space
(screen coordinates), where the 3-D scene is projected.
The image acquired by the camera belongs to the I space,
while the remapped image is defined as the z = 0 plane of
the W space (according to the assumption of a flat road). The
remapping process projects the acquired image onto the z = 0
plane of the 3-D world space. Fig. 3 shows the relationships
between the two spaces I and W.
1) I → W Mapping: In order to generate a 2-D view of a
3-D scene, the following parameters must be known [41].
1) Viewpoint: camera position is C = (l, d, h) ∈ W.
2) Viewing Direction: the optical axis ô is determined by the
following angles:
γ̄, the angle formed by the projection (defined by versor η̂)
of the optical axis ô on the xy plane and the x
axis [as shown in Fig. 4(a)];
θ̄, the angle formed by the optical axis ô and versor η̂
[as shown in Fig. 4(b)].
3) Aperture: camera angular aperture is 2α.
4) Resolution: camera resolution is n × n.
After simple manipulations [6], the final mapping I → W,
as a function of u and v, is given by

x(u, v) = h · cot[(θ̄ − α) + u · 2α/(n − 1)] · cos[(γ̄ − α) + v · 2α/(n − 1)] + l
y(u, v) = h · cot[(θ̄ − α) + u · 2α/(n − 1)] · sin[(γ̄ − α) + v · 2α/(n − 1)] + d
z(u, v) = 0                                                                  (1)

66 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 1, JANUARY 1998
Fig. 6. Horopter surface corresponding to different angles between the
optical axes of two stereo cameras.
with u, v ∈ {0, 1, . . . , n − 1}. Given the coordinates (u, v) of
a generic point in the I space, (1) returns the coordinates (x, y, 0)
of the corresponding point in the W space (see
Fig. 3).
2) W → I Mapping: The inverse transform
(the dual mapping) is given as follows [6]:

u(x, y, 0) = { arctan[ h · sin(arctan((y − d)/(x − l))) / (y − d) ] − (θ̄ − α) } / (2α/(n − 1))
and
v(x, y, 0) = { arctan[ (y − d)/(x − l) ] − (γ̄ − α) } / (2α/(n − 1)).         (2)
The remapping process defined by (2) removes the perspec-
tive effect and recovers the texture of the z = 0 plane of the
W space. It is implemented by scanning the array of pixels of
coordinates (x, y, 0) ∈ W which form the remapped image,
in order to associate to each of them the corresponding value
assumed by the point of coordinates (u(x, y, 0), v(x, y, 0)) ∈ I.
As an example, Fig. 5(a) shows the original and remapped
images: it is clearly visible that in this case the road markings
width is almost invariant within the whole image. The reso-
lution of the remapped image has been chosen as a trade-off
between information loss and processing time; the remapped
image shown in Fig. 5(a) has been obtained without preserving
the original aspect-ratio. Note that the lower portion of the
remapped image is undefined: this is due to the specific camera
position and orientation [see Fig. 5(b)].
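Under the stated flat-road assumption, the W → I scanning loop of (2) can be sketched in a few lines. This is a simplified model, not the hardware implementation: the angle conventions, parameter values, and nearest-neighbor sampling are illustrative assumptions, and the road-plane window (x_range, y_range) is hypothetical:

```python
import numpy as np

def ipm_remap(img, l, d, h, theta0, gamma0, alpha,
              out_size=128, x_range=(5.0, 45.0), y_range=(-5.0, 5.0)):
    """Remap an n x n camera image onto the flat-road plane z = 0 (inverse
    perspective mapping). Camera at C = (l, d, h); optical-axis angles theta0
    (elevation) and gamma0 (heading); angular aperture 2*alpha."""
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "this sketch assumes an n x n image"
    step = 2.0 * alpha / (n - 1)  # angular size of one pixel

    # Road-plane sample grid (top view), out_size x out_size pixels.
    xs = np.linspace(x_range[0], x_range[1], out_size)
    ys = np.linspace(y_range[0], y_range[1], out_size)
    X, Y = np.meshgrid(xs, ys, indexing="ij")

    # Angles under which each road point (x, y, 0) is seen from C.
    rho = np.hypot(X - l, Y - d)      # ground distance from the camera
    gamma = np.arctan2(Y - d, X - l)  # azimuth of the road point
    theta = np.arctan2(h, rho)        # depression angle of the road point

    # Dual mapping, as in (2): angles -> pixel coordinates in the source image.
    u = (theta - (theta0 - alpha)) / step
    v = (gamma - (gamma0 - alpha)) / step

    out = np.zeros((out_size, out_size), dtype=img.dtype)
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    valid = (ui >= 0) & (ui < n) & (vi >= 0) & (vi < n)  # outside FOV: undefined
    out[valid] = img[ui[valid], vi[valid]]
    return out
```

Road points falling outside the camera's angular field of view remain undefined, which is exactly the effect visible in the lower portion of the remapped image of Fig. 5(a).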
III. STEREO INVERSE PERSPECTIVE MAPPING
A 3-D description of the world using a single 2-D image
is impossible without a priori knowledge, due to the depth
loss during acquisition; for many years stereo vision has
been investigated as an answer to this problem. Generally,
traditional techniques for the processing of pairs of stereo
images are divided into the following four main steps:
1) calibration of the two cameras;
2) localization of a feature in an image;
3) identification and localization of the same feature in the
other image;
4) reconstruction of the 3-D scene.
Whenever the mapping between points corresponding to the
same feature (homologous points) can be determined, the prob-
lem of 3-D reconstruction can be solved using triangulation.
The intrinsic complexity of the determination of homologous
points can be reduced with the introduction of some domain-
specific constraints, such as the assumption of a flat road in
front of the cameras.
The set of points P ∈ W for which L(P) = R(P), where
L(P) and R(P) represent the projections of P in the I spaces
of the left and right cameras, respectively, is called the horopter,
and represents the zero-disparity surface of the stereo system [11].
This means that the two stereo views of an object whose shape
and displacement match the horopter are identical. This concept is extremely
useful when the horopter coincides with a model of the road
surface, since any deviation from this model can be easily
detected. The horopter is a spherical surface: the smaller the
difference between the orientations of the two cameras (camera
vergence), the larger its radius [22]. Assuming a small camera
vergence, as generally happens in the automotive field, the
horopter can be considered planar. As shown in Fig. 6, the
horopter can be moved acting on camera vergence parameters.
Unfortunately, the horopter cannot be made to coincide with the
z = 0 plane (representing the flat road model) using only
camera vergence; for this purpose, electronic vergence, such
as inverse perspective mapping (IPM), is required.
In this way, the search for homologous points is reduced to a
simple check of the shape of the horopter: in fact,
under the flat road hypothesis, the IPM algorithm can be used
to produce an image representing the road as seen from the top.
Applying the IPM algorithm with appropriate parameters to the stereo
images, two patches of the road surface are obtained.
Moreover, the knowledge of the parameters of the whole vision
system allows the two road patches to be brought into correspondence.
This means that, under the flat road hypothesis, pairs of
pixels having the same image coordinates in the two remapped
images are homologous points and represent the same points
in the road plane.
The flat road hypothesis can be verified by computing the dif-
ference between the two remapped images: a generic obstacle
(anything rising from the road) is detected if the difference
image presents sufficiently large clusters of nonzero pixels
having a specific shape. Due to the different positions of the
two cameras, the difference image can be computed only for
the overlapping area of the two road patches.
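The flat-road check just described can be sketched as follows. The NaN convention for marking pixels where a remapped patch is undefined, the function name, and the threshold are illustrative assumptions; the subsequent clustering of the mask is omitted:

```python
import numpy as np

def free_space_difference(left_remap, right_remap, threshold=20.0):
    """Flat-road verification: pixel-wise absolute difference of the two
    remapped (top-view) stereo images. On the road plane the two views agree
    (homologous points share the same coordinates), so anything rising from
    the road appears as clusters of above-threshold pixels. Undefined pixels
    (marked NaN) are excluded, restricting the check to the overlap area."""
    overlap = np.isfinite(left_remap) & np.isfinite(right_remap)
    diff = np.abs(left_remap.astype(np.float64) - right_remap.astype(np.float64))
    return (diff > threshold) & overlap
```

The returned boolean mask would then be searched for sufficiently large clusters with the characteristic shape mentioned above.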
In addition, it is easily shown that the IPM algorithm
maps straight lines perpendicular to the road plane into straight
lines passing through the projection C′ = (l, d, 0) of the
camera onto the z = 0 plane (see Fig. 4): using formula
(1), a vertical straight line is represented by the set of pixels
