62 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 1, JANUARY 1998
GOLD: A Parallel Real-Time Stereo Vision
System for Generic Obstacle and Lane Detection
Massimo Bertozzi, Student Member, IEEE, and Alberto Broggi, Associate Member, IEEE
Abstract—This paper describes the Generic Obstacle and Lane
Detection system (GOLD), a stereo vision-based hardware and
software architecture to be used on moving vehicles to increase
road safety. Based on full-custom massively parallel hardware,
it allows the detection of both generic obstacles (without constraints
on symmetry or shape) and the lane position in a structured
environment (with painted lane markings) at a rate of 10 Hz.
Thanks to a geometrical transform supported by a specific hard-
ware module, the perspective effect is removed from both left
and right stereo images; the left image is used to detect lane markings
with a series of morphological filters, while both remapped stereo
images are used for the detection of the free space in front of the
vehicle. The output of the processing is displayed on both an on-
board monitor and a control panel to give visual feedback to the
driver. The system was tested on the mobile laboratory (MOB-
LAB) experimental land vehicle, which was driven for more than
3000 km along extra-urban roads and freeways at speeds up to 80
km/h, and demonstrated its robustness with respect to shadows
and changing illumination conditions, different road textures, and
vehicle movement.
I. INTRODUCTION
THE MAIN issues addressed in this work are lane detection
and obstacle detection, both implemented using only visual
data acquired from standard cameras installed on a mobile
vehicle.
A. Lane Detection
Road following, namely the closing of the control loop that
enables a vehicle to drive within a given portion of the road,
has been approached and implemented in different ways in research
prototype vehicles. Most of the systems developed worldwide
are based on lane detection: first, the relative position of the
vehicle with respect to the lane is computed, and then actuators
are driven to keep the vehicle in a safe position. Others [15],
[28], [38] are not based on the preliminary detection of the
road position, but, as in the case of ALVINN [43], [44], derive
the commands to issue to the actuators (steering wheel angles)
directly from visual patterns detected in the incoming images.
In any case, the knowledge of the lane position can be of use
for other purposes, such as the determination of the regions of
interest for other driving assistance functions.
Manuscript received April 5, 1996; revised March 24, 1997. This work
was supported in part by the Italian National Research Council under the
framework of the Progetto Finalizzato Trasporti 2. The associate editor
coordinating the review of this manuscript and approving it for publication
was Prof. Jeffrey J. Rodriguez.
The authors are with the Department of Information Technology, Uni-
versity of Parma, I-43100 Parma, Italy (e-mail: bertozzi@CE.UniPR.IT;
broggi@CE.UniPR.IT).
Publisher Item Identifier S 1057-7149(98)00313-3.
The main problems that must be faced in the detection
of road boundaries or lane markings are: 1) the presence of
shadows, producing artifacts onto the road surface, and thus
altering its texture, and 2) the presence of other vehicles on the
path, partly occluding the visibility of the road. Although some
systems have been designed to work on nonstructured roads
(without painted lane markings) [28] or on unstructured terrain
[39], [52], generally lane detection relies on the presence of
painted road markings on the road surface. Therefore, since
lane detection is generally based on the localization of a
specific pattern (the lane markings) in the acquired image,
it can be performed with the analysis of a single still image.
In addition, some assumptions can aid the detection algorithm
and/or speed up the processing. They range from the analysis
of specific regions of interest in the image (in which, due to
both physical and continuity constraints, it is more probable
to find the lane markings) [18] to the assumption of a fixed-
width lane (thus dealing with only parallel lane markings), to
the assumption of a precise road geometry (such as a clothoid)
[18], [33], [58], to the assumption of a flat road (the one
considered in this work).
The techniques implemented in the previously mentioned
systems range from the determination of the characteristics
of painted lane markings [30], possibly aided by color
information [19], to the use of deformable templates (such as
LOIS [31], DBS [7], or ARCADE [29]), to an edge-based
recognition using a morphological paradigm [3], [5], [59], to
a model-based approach (as implemented in VaMoRs [26]
or SCARF [17]). A model-based analysis of road markings
has also been used to perform the analysis of intersections
in city traffic images [21], [32]; nevertheless, as discussed in
[46], the use of a model-based search approach has several
drawbacks, such as the problem of using and maintaining an
appropriate geometrical road model, the difficulty in detecting
and matching complex road features, and the complexity of
the computations involved.
Moreover, some systems (such as [46]) work in the velocity
domain instead of the image domain, thus using optical-
flow techniques in order to minimize the horizontal relative
movement of the lane markings with respect to the vehicle.
Unfortunately, such a solution requires both the preliminary
detection of lane markings and the subsequent computation of
the optical flow field.
B. Obstacle Detection
The techniques used in the detection of obstacles may vary
according to the definition of “obstacle.” If “obstacle” means
1057–7149/98$10.00 © 1998 IEEE

BERTOZZI AND BROGGI: PARALLEL REAL-TIME STEREO VISION SYSTEM 63
a vehicle, then the detection is based on a search for specific
patterns, possibly supported by other features, such as shape
[56], symmetry [61], or the use of a bounding box [1]. Also,
in this case, the processing can be based on the analysis of a
single still image.
Conversely, if an obstacle is defined as any object that can
obstruct the vehicle’s driving path, or anything rising
significantly from the road surface, obstacle detection is gen-
erally reduced to the detection of the free space instead of the
recognition of specific patterns. In this case, different tech-
niques can be used, such as 1) the analysis of the optical
flow field, and 2) the processing of stereo images; both
of these require two or more images, thus leading to a
higher computational complexity, which is further increased
by the necessity to handle noise caused by vehicle movements.
Obstacle detection using the optical flow approach [13], [20] is
generally divided into two steps: first, ego-motion is computed
from the analysis of optical flow [25] or obtained from
odometry [35]; then obstacles are detected by the analysis
of the differences between the expected and the real velocity
field.
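As an illustration of this two-step scheme, the final comparison between the expected and measured velocity fields can be sketched in a few lines. The flow fields themselves (one predicted from ego-motion, one measured by a flow estimator) are assumed to be given, and the function name and threshold are hypothetical, not taken from any of the cited systems:

```python
import numpy as np

def flow_obstacle_mask(measured_flow, expected_flow, threshold=1.0):
    """Second step of flow-based obstacle detection: flag pixels whose measured
    image velocity deviates from the velocity predicted by ego-motion.
    Both flow fields are H x W x 2 arrays of (du, dv) components in pixels."""
    # Magnitude of the residual flow at every pixel.
    residual = np.linalg.norm(measured_flow - expected_flow, axis=-1)
    return residual > threshold  # boolean obstacle-candidate mask
```

Pixels belonging to the static road obey the predicted field and yield a near-zero residual; an independently moving or raised object produces a cluster of above-threshold residuals.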
On the other hand, the main problem of stereo vision
techniques is the detection of correspondences between two
stereo images (or three images, in case of trinocular vision
[49]). The advantage of the analysis of stereo images instead
of a monocular sequence of images is the possibility of
directly detecting the presence of obstacles, which, in the case of
an optical flow-based approach, is indirectly derived from the
analysis of the velocity field. Moreover, in the limiting
condition where both the vehicle and the obstacles have small
or zero speed, the optical flow-based approach fails, while
stereo vision can still detect obstacles. Furthermore, to decrease the intrinsic complexity of
stereo vision, some domain specific constraints are generally
adopted.
As in [33], the Generic Obstacle and Lane Detection
(GOLD) system addresses both lane detection and obstacle
detection at the same time. Lane detection is based on a
pattern-matching technique that relies on the presence of road
markings, while the localization of obstacles in front of the
vehicle is performed by processing pairs of stereo
images. In order to be fast and robust with respect to camera
calibration and vehicle movements, the detection of a generic
obstacle is reduced to the determination of the free space
in front of the vehicle, without any three-dimensional (3-D)
world reconstruction.
Both functionalities share the same underlying approach
(image warping), which is based on the assumption of a flat
road. Such a technique has been successfully used for the
computation of the optical flow field [36], for the detection
of obstacles in a structured environment [34], [60], or in the
automotive field [37], [42], [45] (using standard cameras) or
[50], [57] (using linear cameras). It is based on a transform
that, given a model of the road in front of the vehicle (e.g.,
a flat road), remaps the right image onto the left; any disparity
is caused by a deviation from the road model, thus revealing
possible obstacles.
Contrary to other works [33], [37], [42], GOLD performs
two warpings instead of one, remapping both images into
a different domain (the road domain), in which the subsequent
processing is extremely simplified. Hence, the reprojection
[33], [58] of the results into the road domain is no longer required.
Moreover, since both GOLD functionalities are based on the
processing of images remapped into the same domain, the
fusion of the results of the two independent processings is
straightforward.

Fig. 1. (a) MOB-LAB land vehicle. (b) Control panel used as output to
display the processing results. (c) ARGO autonomous passenger car.

Fig. 2. (a) Road markings width changes according to their position within
the image. (b) Due to the perspective effect, different pixels represent
different portions of the road.
The GOLD system has been tested on the mobile laboratory
(MOB-LAB) experimental land vehicle, integrating the results
of the Italian Research Units involved in the PROMETHEUS
project. MOB-LAB [see Fig. 1(a)] is equipped with four
cameras, two of which are used for this experiment, several
computers, monitors, and a control-panel [see Fig. 1(b)] to
give a visual feedback and warnings to the driver. The GOLD
system is now being ported to ARGO [2] [see Fig. 1(c)], a
Lancia Thema passenger car with automatic steering capabilities.
This work is organized as follows: Section II presents
the basics of the underlying approach used to remove the
perspective effect from a monocular image, while Section III
describes its application to the processing of stereo images.
Section IV describes the lane detection and obstacle detec-
tion functionalities; Section V presents the computing engine
that has been developed as a support to the GOLD system;
Section VI presents the analysis of the time performance of
the current implementation; finally, Section VII ends the paper
with a discussion about the problems of the system, their
possible solutions, and future developments.
II. INVERSE PERSPECTIVE MAPPING
Due to its intrinsic nature, low-level image processing
is efficiently performed on single instruction multiple data
(SIMD) systems by means of a massively parallel computational
paradigm. However, this approach is only meaningful in
the case of generic filters (such as noise reduction, edge
detection, and image enhancement), which consider the image
as a mere collection of pixels, independent of their semantic
content.
On the other hand, the implementation of more sophisticated
filters requires some semantic knowledge. As an example, let
us consider the specific problem of road markings detection in
an image acquired from a vehicle. Due to the perspective effect
introduced by the acquisition conditions, the road markings
width changes according to their distance from the camera [see
Fig. 2(a)]. Therefore, the correct detection of road markings
should be based on matching with patterns of different
sizes, according to the specific position within the image.
Unfortunately, this differentiated low-level processing cannot
be efficiently performed on SIMD massively parallel systems,
which by definition perform the same processing on each pixel
of the image.

Fig. 3. Relationship between the two coordinate systems.
The perspective effect associates different meanings to
different image pixels, depending on their position in the
image [see Fig. 2(b)]. Conversely, after the removal of the
perspective effect, each pixel represents the same portion of the
road,¹ allowing a homogeneous distribution of the information
among all image pixels; to remove the perspective effect, it is
necessary to know the specific acquisition conditions (camera
position, orientation, optics, etc.) and the scene represented in
the image (the road, which is now assumed to be flat). This
constitutes the a priori knowledge.
Now, recalling the example of road markings detection, the
size and shape of the matching template can be independent
of the pixel position. Therefore, road markings detection can
be conveniently divided into two steps: the first, exploiting the
a priori knowledge, is a transform that generates an image in
a new domain where the detection of the features of interest
is extremely simplified; the second, exploiting the sensorial
data, consists of a mere low-level morphological processing.

¹A pixel in the lower part of the image of Fig. 2(a) represents a few cm²
of the road, while a pixel in the middle of the same image represents a few
tens of cm², or even more.

Fig. 4. (a) The xy plane in the W space and (b) the z plane.

Fig. 5. (a) Original and remapped images. (b) In grey, the visible portion of the road.
The removal of the perspective effect allows road markings
to be detected through an extremely simple and fast morphological
processing that can be efficiently implemented on massively
parallel SIMD architectures.
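A minimal sketch of such a filter, under the assumption (used throughout this section) that remapped lane markings appear as bright, quasi-vertical stripes of nearly constant width: a pixel is kept only when it is brighter than both of its horizontal neighbors at a fixed distance m. The function name and offset m are illustrative conventions, not GOLD's exact filter:

```python
import numpy as np

def enhance_lane_markings(remapped, m=3):
    """Enhance bright quasi-vertical lines in the remapped (top-view) image:
    a pixel is kept when it is brighter than both of its horizontal neighbors
    at distance m. A single offset m works for the whole image because, after
    the removal of the perspective effect, markings have near-constant width."""
    img = remapped.astype(np.int32)
    left = np.roll(img, m, axis=1)    # neighbor m pixels to the left
    right = np.roll(img, -m, axis=1)  # neighbor m pixels to the right
    # Difference with the brighter of the two neighbors; dark pixels go to 0.
    out = np.minimum(img - left, img - right)
    out[:, :m] = 0   # image borders have no valid neighbor (np.roll wraps)
    out[:, -m:] = 0
    return np.clip(out, 0, None).astype(remapped.dtype)
```

Because the operation is identical at every pixel, it maps directly onto an SIMD architecture; on the original image, by contrast, m would have to vary with the pixel's distance from the camera.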
A. Removing the Perspective Effect
The procedure aimed at removing the perspective effect
resamples the incoming image, remapping each pixel to
a different position and producing a new two-dimensional
(2-D) array of pixels. The resulting image represents a top view
of the road region in front of the vehicle, as if it were observed
from a significant height.
Two Euclidean spaces are defined, as follows.
W = {(x, y, z)} ∈ E³, representing the 3-D world space
(world coordinates), where the real world is defined.
I = {(u, v)} ∈ E², representing the 2-D image space
(screen coordinates), where the 3-D scene is projected.
The image acquired by the camera belongs to the I space,
while the remapped image is defined as the z = 0 plane of
the W space (according to the assumption of a flat road). The
remapping process projects the acquired image onto the z = 0
plane of the 3-D world space. Fig. 3 shows the relationships
between the two spaces I and W.
1) I → W Mapping: In order to generate a 2-D view of a
3-D scene, the following parameters must be known [41].
1) Viewpoint: camera position is C = (l, d, h) ∈ W.
2) Viewing Direction: the optical axis ô is determined by the
following angles:
γ̄, the angle formed by the projection (defined by versor η̂)
of the optical axis ô on the xy plane and the x
axis [as shown in Fig. 4(a)];
θ̄, the angle formed by the optical axis ô and versor η̂
[as shown in Fig. 4(b)].
3) Aperture: camera angular aperture is 2α.
4) Resolution: camera resolution is n × n.
After simple manipulations [6], the final mapping I → W,
as a function of u and v, is given by

x(u, v) = h · cot[(θ̄ − α) + u · 2α/(n − 1)] · cos[(γ̄ − α) + v · 2α/(n − 1)] + l
y(u, v) = h · cot[(θ̄ − α) + u · 2α/(n − 1)] · sin[(γ̄ − α) + v · 2α/(n − 1)] + d
z(u, v) = 0                                                                  (1)

66 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 7, NO. 1, JANUARY 1998
Fig. 6. Horopter surface corresponding to different angles between the
optical axes of two stereo cameras.
with u, v ∈ {0, 1, . . . , n − 1}. Given the coordinates (u, v) of
a generic point in the I space, (1) returns the coordinates (x, y, 0)
of the corresponding point in the W space (see
Fig. 3).
2) W → I Mapping: The inverse transform
(the dual mapping) is given as follows [6]:

u(x, y, 0) = { arctan[ h · sin(arctan((y − d)/(x − l))) / (y − d) ] − (θ̄ − α) } / (2α/(n − 1))
and
v(x, y, 0) = { arctan[ (y − d)/(x − l) ] − (γ̄ − α) } / (2α/(n − 1)).         (2)
The remapping process defined by (2) removes the perspec-
tive effect and recovers the texture of the z = 0 plane of the
W space. It is implemented by scanning the array of pixels of
coordinates (x, y, 0) ∈ W which form the remapped image,
in order to associate to each of them the corresponding value
assumed by the point of coordinates (u(x, y, 0), v(x, y, 0)) ∈ I.
As an example, Fig. 5(a) shows the original and remapped
images: it is clearly visible that in this case the road markings
width is almost invariant within the whole image. The reso-
lution of the remapped image has been chosen as a trade-off
between information loss and processing time; the remapped
image shown in Fig. 5(a) has been obtained without preserving
the original aspect-ratio. Note that the lower portion of the
remapped image is undefined: this is due to the specific camera
position and orientation [see Fig. 5(b)].
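Under the stated flat-road assumption, the W → I scanning loop of (2) can be sketched in a few lines. This is a simplified model, not the hardware implementation: the angle conventions, parameter values, and nearest-neighbor sampling are illustrative assumptions, and the road-plane window (x_range, y_range) is hypothetical:

```python
import numpy as np

def ipm_remap(img, l, d, h, theta0, gamma0, alpha,
              out_size=128, x_range=(5.0, 45.0), y_range=(-5.0, 5.0)):
    """Remap an n x n camera image onto the flat-road plane z = 0 (inverse
    perspective mapping). Camera at C = (l, d, h); optical-axis angles theta0
    (elevation) and gamma0 (heading); angular aperture 2*alpha."""
    n = img.shape[0]
    assert img.shape[0] == img.shape[1], "this sketch assumes an n x n image"
    step = 2.0 * alpha / (n - 1)  # angular size of one pixel

    # Road-plane sample grid (top view), out_size x out_size pixels.
    xs = np.linspace(x_range[0], x_range[1], out_size)
    ys = np.linspace(y_range[0], y_range[1], out_size)
    X, Y = np.meshgrid(xs, ys, indexing="ij")

    # Angles under which each road point (x, y, 0) is seen from C.
    rho = np.hypot(X - l, Y - d)      # ground distance from the camera
    gamma = np.arctan2(Y - d, X - l)  # azimuth of the road point
    theta = np.arctan2(h, rho)        # depression angle of the road point

    # Dual mapping, as in (2): angles -> pixel coordinates in the source image.
    u = (theta - (theta0 - alpha)) / step
    v = (gamma - (gamma0 - alpha)) / step

    out = np.zeros((out_size, out_size), dtype=img.dtype)
    ui, vi = np.round(u).astype(int), np.round(v).astype(int)
    valid = (ui >= 0) & (ui < n) & (vi >= 0) & (vi < n)  # outside FOV: undefined
    out[valid] = img[ui[valid], vi[valid]]
    return out
```

Road points falling outside the camera's angular field of view remain undefined, which is exactly the effect visible in the lower portion of the remapped image of Fig. 5(a).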
III. STEREO INVERSE PERSPECTIVE MAPPING
A 3-D description of the world using a single 2-D image
is impossible without a priori knowledge, due to the depth
loss during acquisition; for many years stereo vision has
been investigated as an answer to this problem. Generally,
traditional techniques for the processing of pairs of stereo
images are divided into the following four main steps:
1) calibration of the two cameras;
2) localization of a feature in an image;
3) identification and localization of the same feature in the
other image;
4) reconstruction of the 3-D scene.
Whenever the mapping between points corresponding to the
same feature (homologous points) can be determined, the prob-
lem of 3-D reconstruction can be solved using triangulation.
The intrinsic complexity of the determination of homologous
points can be reduced with the introduction of some domain-
specific constraints, such as the assumption of a flat road in
front of the cameras.
The set of points P ∈ W for which L(P) = R(P), where
L(P) and R(P) represent the projections of P in the I spaces
of the left and right cameras, respectively, is called the horopter,
and represents the zero-disparity surface of the stereo system [11].
This means that the two stereo views of an object whose shape
and displacement match the horopter are identical. This concept is extremely
useful when the horopter coincides with a model of the road
surface, since any deviation from this model can be easily
detected. The horopter is a spherical surface: the smaller the
difference between the orientations of the two cameras (camera
vergence), the larger its radius [22]. Assuming a small camera
vergence, as generally happens in the automotive field, the
horopter can be considered planar. As shown in Fig. 6, the
horopter can be moved acting on camera vergence parameters.
Unfortunately, the horopter cannot be made to coincide with the
z = 0 plane (representing the flat road model) using only
camera vergence; for this purpose, electronic vergence, such
as inverse perspective mapping (IPM), is required.
In this way, the search for homologous points is reduced to a
simple check of the shape of the horopter: in fact,
under the flat road hypothesis, the IPM algorithm can be used
to produce an image representing the road as seen from the top.
Applying the IPM algorithm with appropriate parameters to the stereo
images, two patches of the road surface are obtained.
Moreover, the knowledge of the parameters of the whole vision
system allows the two road patches to be brought into correspondence.
This means that, under the flat road hypothesis, pairs of
pixels having the same image coordinates in the two remapped
images are homologous points and represent the same points
in the road plane.
The flat road hypothesis can be verified by computing the dif-
ference between the two remapped images: a generic obstacle
(anything rising from the road) is detected if the difference
image presents sufficiently large clusters of nonzero pixels
having a specific shape. Due to the different positions of the
two cameras, the difference image can be computed only for
the overlapping area of the two road patches.
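The flat-road check just described can be sketched as follows. The NaN convention for marking pixels where a remapped patch is undefined, the function name, and the threshold are illustrative assumptions; the subsequent clustering of the mask is omitted:

```python
import numpy as np

def free_space_difference(left_remap, right_remap, threshold=20.0):
    """Flat-road verification: pixel-wise absolute difference of the two
    remapped (top-view) stereo images. On the road plane the two views agree
    (homologous points share the same coordinates), so anything rising from
    the road appears as clusters of above-threshold pixels. Undefined pixels
    (marked NaN) are excluded, restricting the check to the overlap area."""
    overlap = np.isfinite(left_remap) & np.isfinite(right_remap)
    diff = np.abs(left_remap.astype(np.float64) - right_remap.astype(np.float64))
    return (diff > threshold) & overlap
```

The returned boolean mask would then be searched for sufficiently large clusters with the characteristic shape mentioned above.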
In addition, it is easily shown that the IPM algorithm
maps straight lines perpendicular to the road plane into straight
lines passing through the projection C′ = (l, d, 0) of the
camera onto the z = 0 plane (see Fig. 4): using formula
(1), a vertical straight line is represented by the set of pixels
