Recognizing Objects in Range Data Using Regional Point Descriptors

Andrea Frome¹, Daniel Huber², Ravi Kolluri¹, Thomas Bülow¹*, and Jitendra Malik¹
¹ University of California Berkeley, Berkeley CA 94530, USA,
{afrome,rkolluri,malik}@cs.berkeley.edu
thomas.buelow@philips.com
² Carnegie Mellon University, Pittsburgh PA 15213, USA, dhuber@cs.cmu.edu
Abstract. Recognition of three dimensional (3D) objects in noisy and
cluttered scenes is a challenging problem in 3D computer vision. One
approach that has been successful in past research is the regional shape
descriptor. In this paper, we introduce two new regional shape descriptors:
3D shape contexts and harmonic shape contexts. We evaluate the
performance of these descriptors on the task of recognizing vehicles in
range scans of scenes using a database of 56 cars. We compare the two
novel descriptors to an existing descriptor, the spin image, showing that
the shape context based descriptors have a higher recognition rate on
noisy scenes and that 3D shape contexts outperform the others on cluttered
scenes.
1 Introduction
Recognition of three dimensional (3D) objects in noisy and cluttered scenes is a
challenging problem in 3D computer vision. Given a 3D point cloud produced by
a range scanner observing a 3D scene (Fig. 1), the goal is to identify objects in
the scene (in this case, vehicles) by comparing them to a set of candidate objects.
This problem is challenging for several reasons. First, in range scans, much of the
target object is obscured due to self-occlusion or is occluded by other objects.
Nearby objects can also act as background clutter, which can interfere with the
recognition process. Second, many classes of objects, for example the vehicles in
our experiments, are very similar in shape and size. Third, range scanners have
limited spatial resolution; the surface is only sampled at discrete points, and fine
detail in the objects is usually lost or blurred. Finally, high-speed range scanners
(e.g., flash ladars) introduce significant noise in the range measurement, making
it nearly impossible to manually identify objects.
Object recognition in such a setting is interesting in its own right, but would
also be useful in applications such as scan registration [9][6] and robot
localization. The ability to recognize objects in 2 1/2-D images such as range scans
may also prove valuable in recognizing objects in 2D images when some depth
information can be inferred from cues such as shading or motion.
* Thomas Bülow's current affiliation is with Philips Research Laboratories,
Roentgenstrasse 24-26, 22335 Hamburg.
Fig. 1. (a) An example of a cluttered scene containing trees, a house, the ground,
and a vehicle to be recognized. (b) A point cloud generated from a scan simulation of
the scene. Notice that the range shadow of the building occludes the front half of the
vehicle.
Many approaches to 3D object recognition have been put forth, including
generalized cylinders [3], superquadrics [7], geons [23], medial axis
representations [1], skeletons [4], shape distributions [19], and spherical harmonic
representations of global shape [8]. Many of these methods require that the target be
segmented from the background, which makes them difficult to apply to
real-world 3D scenes. Furthermore, many global methods have difficulty leveraging
subtle shape variations, especially with large parts of the shape missing from
the scene. At the other end of the spectrum, purely local descriptors, such as
surface curvature, are well-known for being unstable when faced with noisy data.
Regional point descriptors lie midway between the global and local approaches,
giving them the advantages of both. This is the approach that we follow in this
paper.
Methods which use regional point descriptors have proven successful in the
context of image-based recognition [17][15][2] as well as 3D recognition and
surface matching [22][13][5][21]. A regional point descriptor characterizes some
property of the scene in a local support region surrounding a basis point. In our case,
the descriptors characterize regional surface shape. Ideally, a descriptor should
be invariant to transformations of the target object (e.g., rotation and
translation in 3D) and robust to noise and clutter. The descriptor for a basis point
located on the target object in the scene will, therefore, be similar to the
descriptor for the corresponding point on a model of the target object. These
model descriptors can be stored in a pre-computed database and accessed using
fast nearest-neighbor search methods such as locality-sensitive hashing [11]. The
limited support region of descriptors makes them robust to significant levels of
occlusion. Reliable recognition is made possible by combining the results from
multiple basis points distributed across the scene.

In this paper we make the following contributions: (1) we develop the 3D
generalization of the 2D shape context descriptor, (2) we introduce the harmonic
shape context descriptor, (3) we systematically compare the performance of the
3D shape context, harmonic shape context, and spin images in recognizing
similar objects in scenes with noise or clutter. We also briefly examine the trade-off
of applying hashing techniques to speed search over a large set of objects.
The organization of the paper is as follows: in section 2, we introduce the
3D shape context and harmonic shape context descriptors and review the spin
image descriptor. Section 3 describes the representative descriptor method for
aggregating distances between point descriptors to give an overall matching score
between a query scene and model. Our data set is introduced in section 4, and
our experiments and results are presented in section 5. We finish in section 6
with a brief analysis of a method for speeding our matching process.
2 Descriptors
In this section, we provide the details of the new 3D shape context and harmonic
shape context descriptors and review the existing spin-image descriptor. All three
descriptors take as input a point cloud P and a basis point p, and capture the
regional shape of the scene at p using the distribution of points in a support
region surrounding p. The support region is discretized into bins, and a histogram
is formed by counting the number of points falling within each bin. For the 3D
shape contexts and spin-images, this histogram is used directly as the descriptor,
while with harmonic shape contexts, an additional transformation is applied.
When designing such a 3D descriptor, the first two decisions to be made
are (1) what is the shape of the support region and (2) how to map the bins
in 3D space to positions in the histogram vector. All three methods address
the second issue by aligning the support region’s “up” or north pole direction
with an estimate of the surface normal at the basis point, which leaves a degree
of freedom along the azimuth. Their differences arise from the shape of their
support region and how they remove this degree of freedom.
2.1 3D shape contexts
The 3D shape context is the straightforward extension of 2D shape contexts,
introduced by Belongie et al. [2], to three dimensions. The support region for a 3D
shape context is a sphere centered on the basis point p and its north pole oriented
with the surface normal estimate N for p (Fig. 2). The support region is divided
into bins by equally spaced boundaries in the azimuth and elevation dimensions
and logarithmically spaced boundaries along the radial dimension. We denote
the J + 1 radial divisions by $R = \{R_0 \ldots R_J\}$, the K + 1 elevation divisions by
$\Theta = \{\Theta_0 \ldots \Theta_K\}$, and the L + 1 azimuth divisions by $\Phi = \{\Phi_0 \ldots \Phi_L\}$. Each bin
corresponds to one element in the J × K × L feature vector. The first radius
division $R_0$ is the minimum radius $r_{min}$, and $R_J$ is the maximum radius $r_{max}$.
The radius boundaries are calculated as
$$R_j = \exp\left\{ \ln(r_{min}) + \frac{j}{J} \ln\left( \frac{r_{max}}{r_{min}} \right) \right\}. \qquad (1)$$
Fig. 2. Visualization of the histogram bins of the 3D shape context.
Sampling logarithmically makes the descriptor more robust to distortions in
shape with distance from the basis point. Bins closer to the center are
smaller in all three spherical dimensions, so we use a minimum radius
($r_{min} > 0$) to avoid being overly sensitive to small differences in shape very
close to the center. The Θ and Φ divisions are evenly spaced along the 180° and
360° elevation and azimuth ranges.
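The bin boundaries described above can be sketched in a few lines. This is only an illustration, not the authors' code: the function names `radial_boundaries` and `angular_boundaries` and the numeric values in the example are hypothetical, and elevation is assumed to be measured from the north pole over 0 to π.

```python
import math

def radial_boundaries(r_min, r_max, J):
    """Return the J + 1 logarithmically spaced radii R_0..R_J of equation (1)."""
    if not (0.0 < r_min < r_max):
        raise ValueError("need 0 < r_min < r_max")
    log_ratio = math.log(r_max / r_min)
    return [math.exp(math.log(r_min) + (j / J) * log_ratio) for j in range(J + 1)]

def angular_boundaries(K, L):
    """Return K + 1 evenly spaced elevation boundaries over [0, pi]
    and L + 1 evenly spaced azimuth boundaries over [0, 2*pi]."""
    theta = [math.pi * k / K for k in range(K + 1)]
    phi = [2.0 * math.pi * l / L for l in range(L + 1)]
    return theta, phi

# Illustrative values only (assumed, not taken from the paper).
R = radial_boundaries(r_min=0.1, r_max=2.0, J=4)
theta, phi = angular_boundaries(K=3, L=6)
```

Because the radii are spaced evenly in log space, the ratio between consecutive boundaries is constant, so the inner shells are much thinner than the outer ones.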
Bin(j, k, l) accumulates a weighted count $w(p_i)$ for each point $p_i$ whose
spherical coordinates relative to p fall within the radius interval $[R_j, R_{j+1})$,
elevation interval $[\Theta_k, \Theta_{k+1})$ and azimuth interval $[\Phi_l, \Phi_{l+1})$.
The contribution to the bin count for point $p_i$ is given by
$$w(p_i) = \frac{1}{\rho_i \sqrt[3]{V(j, k, l)}} \qquad (2)$$
where V(j, k, l) is the volume of the bin and $\rho_i$ is the local point density
around the bin. Normalizing by the bin volume compensates for the large
variation in bin sizes with radius and elevation. We found empirically that using the
cube root of the volume retains significant discriminative power while leaving the
descriptor robust to noise which causes points to cross over bin boundaries. The
local point density $\rho_i$ is estimated as the count of points in a sphere of radius
δ around $p_i$. This normalization accounts for variations in sampling density due
to the angle of the surface or distance to the scanner.
We have a degree of freedom in the azimuth direction that we must remove in
order to compare shape contexts calculated in different coordinate systems. To
account for this, we choose some direction to be $\Phi_0$ in an initial shape context,
and then rotate the shape context about its north pole into L positions, such
that each $\Phi_l$ division is located at the original 0° position in one of the rotations.
For descriptor data sets derived from our reference scans, L rotations for each
basis point are included, whereas in the query data sets, we include only one
position per basis point.
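Since the azimuth bins are evenly spaced, rotating a shape context about its north pole by one azimuth division amounts to a circular shift along the azimuth axis of the histogram. The sketch below illustrates that idea; the function name `azimuth_rotations` and the nested-list layout (radial, elevation, azimuth) are assumptions made for this example.

```python
def azimuth_rotations(hist):
    """Return the L azimuth-shifted copies of a J x K x L histogram.

    Copy s has every azimuth row circularly shifted by s bins, so that each
    azimuth division sits at the original zero position in exactly one copy.
    """
    L = len(hist[0][0])
    return [[[row[s:] + row[:s] for row in shell] for shell in hist] for s in range(L)]

# Toy histogram with J = 1, K = 2, L = 3 (values are arbitrary).
reference = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
rotations = azimuth_rotations(reference)

# A query computed with a different azimuth origin matches one stored rotation.
query = [[[3.0, 1.0, 2.0], [6.0, 4.0, 5.0]]]
match_index = rotations.index(query)
```

Storing all L rotations on the reference side is what allows each query basis point to be described only once.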
2.2 Harmonic shape contexts
To compute harmonic shape contexts, we begin with the histogram described
above for 3D shape contexts, but we use the bin values as samples to calculate
a spherical harmonic transformation for the shells and discard the original
histogram. The descriptor is a vector of the amplitudes of the transformation, which
are rotationally invariant in the azimuth direction, thus removing the degree of
freedom.
Any real function f(θ, φ) can be expressed as a sum of complex spherical
harmonic basis functions $Y_l^m$:
$$f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_l^m Y_l^m(\theta, \phi) \qquad (3)$$
A key property of this harmonic transformation is that a rotation in the
azimuthal direction results in a phase shift in the frequency domain, and hence
amplitudes of the harmonic coefficients $\|A_l^m\|$ are invariant to rotations in the
azimuth direction. We translate a 3D shape context into a harmonic shape
context by defining a function $f_j(\theta, \phi)$ based on the bins of the 3D shape context
in a single spherical shell $R_j \le R < R_{j+1}$ as:
$$f_j(\theta, \phi) = SC(j, k, l), \quad \theta_k < \theta \le \theta_{k+1}, \quad \phi_l < \phi \le \phi_{l+1}. \qquad (4)$$
As in [14], we choose a bandwidth b and store only the b lowest-frequency
components of the harmonic representation in our descriptor, which is given by
$HSC(l, m, j) = \|A_{l,j}^m\|$, l, m = 0 . . . b, j = 0 . . . J. For any real function,
$\|A_l^m\| = \|A_l^{-m}\|$, so we drop the coefficients $A_l^m$ for m < 0. The dimensionality
of the resulting harmonic shape context is J · b(b+1)/2, one block of b(b+1)/2
amplitudes per radial shell. Note that the number of azimuth
and elevation divisions does not affect the dimensionality of the descriptor.
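The invariance argument can be checked numerically on the azimuthal factor alone. The sketch below is deliberately simplified and is not a spherical harmonic transform: each $Y_l^m$ depends on azimuth only through a factor $e^{im\phi}$, so the code takes a discrete Fourier transform along one azimuth ring and shows that a circular shift changes only the phase, not the magnitude. The function names, the example ring, and the shell and bandwidth counts are all hypothetical.

```python
import cmath
import math

def azimuth_dft_magnitudes(ring, b):
    """Magnitudes of the b lowest azimuthal frequencies (m = 0..b-1) of one ring of bins."""
    L = len(ring)
    mags = []
    for m in range(b):
        coeff = sum(val * cmath.exp(-2j * math.pi * m * l / L)
                    for l, val in enumerate(ring))
        mags.append(abs(coeff))
    return mags

def descriptor_length(n_shells, b):
    """Number of stored amplitudes: b(b+1)/2 per radial shell."""
    return n_shells * b * (b + 1) // 2

# One ring of azimuth bin values (arbitrary) and the same ring rotated by two bins.
ring = [3.0, 0.0, 1.0, 4.0, 1.0, 5.0]
rotated = ring[2:] + ring[:2]

mags_original = azimuth_dft_magnitudes(ring, b=3)
mags_rotated = azimuth_dft_magnitudes(rotated, b=3)
max_difference = max(abs(a - c) for a, c in zip(mags_original, mags_rotated))

# Illustrative size only: 2 shells at bandwidth 3 give 2 * 3 * 4 / 2 amplitudes.
n_values = descriptor_length(n_shells=2, b=3)
```

The count b(b+1)/2 per shell corresponds to keeping, for each of the b retained degrees, only the orders m ≥ 0.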
Harmonic shape contexts are related to the rotation-invariant shape
descriptors SH(f) described in [14]. One difference between those and the harmonic
shape contexts is that one SH(f) descriptor is used to describe the global shape
of a single object. Also, the shape descriptor SH(f) is a vector of length b
whose components are the energies of the function f in the b lowest
frequencies: $SH_l(f) = \|\sum_{m=-l}^{l} A_l^m Y_l^m\|$. In contrast, harmonic shape contexts retain
the amplitudes of the individual frequency components, and, as a result, are
more descriptive.
2.3 Spin Images
We compared the performance of both of these shape context-based descriptors
to spin images [13]. Spin-images are well-known 3D shape descriptors that have
proven useful for object recognition [13], classification [20], and modeling [10].
Although spin-images were originally defined for surfaces, the adaptation to
point clouds is straightforward. The support region of a spin image at a basis
point p is a cylinder of radius $r_{max}$ and height h centered on p with its axis
aligned with the surface normal at p. The support region is divided linearly into
J segments radially and K segments vertically, forming a set of J × K rings.
The spin-image for a basis point p is computed by counting the points that fall
within each ring, forming a 2D histogram. As with the other descriptors, the
contribution of each point $q_i$ is weighted by the inverse of that point's density
estimate ($\rho_i$); however, the bins are not weighted by volume. Summing within

References
Proceedings ArticleDOI

Object recognition from local scale-invariant features

TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Journal ArticleDOI

A performance evaluation of local descriptors

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Journal ArticleDOI

Shape matching and object recognition using shape contexts

TL;DR: This paper presents work on computing shape models that are computationally fast and invariant basic transformations like translation, scaling and rotation, and proposes shape detection using a feature called shape context, which is descriptive of the shape of the object.
Proceedings ArticleDOI

Approximate nearest neighbors: towards removing the curse of dimensionality

TL;DR: In this paper, the authors present two algorithms for the approximate nearest neighbor problem in high-dimensional spaces, for data sets of size n living in R d, which require space that is only polynomial in n and d.
Proceedings ArticleDOI

A performance evaluation of local descriptors

TL;DR: It is observed that the ranking of the descriptors is mostly independent of the interest region detector and that the SIFT-based descriptors perform best and Moments and steerable filters show the best performance among the low dimensional descriptors.
Frequently Asked Questions (12)
Q1. What are the contributions mentioned in the paper "Recognizing objects in range data using regional point descriptors" ?

In this paper, the authors introduce two new regional shape descriptors: 3D shape contexts and harmonic shape contexts. The authors evaluate the performance of these descriptors on the task of recognizing vehicles in range scans of scenes using a database of 56 cars. The authors compare the two novel descriptors to an existing descriptor, the spin image, showing that the shape context based descriptors have a higher recognition rate on noisy scenes and that 3D shape contexts outperform the others on cluttered scenes. 

A benefit of the 3D shape context over the other two descriptors is that a point-to-point match gives a candidate orientation of the model in the scene which can be used to verify other point matches. 

When designing such a 3D descriptor, the first two decisions to be made are (1) what is the shape of the support region and (2) how to map the bins in 3D space to positions in the histogram vector. 

The support region of a spin image at a basis point p is a cylinder of radius $r_{max}$ and height h centered on p with its axis aligned with the surface normal at p. 
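A cylindrical support region of this kind can be binned with two coordinates per point: the distance from the normal axis and the signed height along it. The sketch below assumes linear binning into J radial and K vertical segments and an optional per-point density weight; the function name `spin_image`, its parameters, and the toy data are hypothetical, and no interpolation between bins is attempted.

```python
import math

def spin_image(points, p, normal, r_max, h, J, K, densities=None):
    """Return a J x K histogram over a cylinder of radius r_max and height h centered on p."""
    n_len = math.sqrt(sum(c * c for c in normal))
    n = tuple(c / n_len for c in normal)
    image = [[0.0] * K for _ in range(J)]
    for i, q in enumerate(points):
        d = tuple(q[a] - p[a] for a in range(3))
        beta = sum(d[a] * n[a] for a in range(3))              # signed height along the normal
        alpha = math.sqrt(max(0.0, sum(c * c for c in d) - beta * beta))  # distance from the axis
        if alpha >= r_max or not (-h / 2.0 <= beta < h / 2.0):
            continue  # outside the cylinder
        j = min(int(alpha / r_max * J), J - 1)
        k = min(int((beta + h / 2.0) / h * K), K - 1)
        weight = 1.0 / densities[i] if densities is not None else 1.0
        image[j][k] += weight  # bins are not normalized by volume
    return image

# Toy example with assumed values.
cloud = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, -0.5), (5.0, 0.0, 0.0)]
img = spin_image(cloud, p=(0.0, 0.0, 0.0), normal=(0.0, 0.0, 1.0),
                 r_max=2.0, h=2.0, J=2, K=2)
```

Because only the distance from the axis is used, the histogram is unchanged by any rotation about the normal, which is how this descriptor avoids the azimuth ambiguity.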

The authors translate a 3D shape context into a harmonic shape context by defining a function $f_j(\theta, \phi)$ based on the bins of the 3D shape context in a single spherical shell $R_j \le R < R_{j+1}$ as: $f_j(\theta, \phi) = SC(j, k, l)$, $\theta_k < \theta \le \theta_{k+1}$, $\phi_l < \phi \le \phi_{l+1}$. 

To remove outliers caused by unlucky hash divisions, the authors included in the sum in equation (5) only the 80 smallest distances between RDs and the returned reference descriptors. 

High-speed range scanners (e.g., flash ladars) introduce significant noise in the range measurement, making it nearly impossible to manually identify objects. 

The problem is that in placing a hard vote, the authors discard the relative distances between descriptors which provide information about the quality of the matches. 

The method divides the high-dimensional feature space where the descriptors lie into hypercubes, divided by a set of k randomly-chosen axis-parallel hyperplanes. 
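One way to picture this partitioning is a hash key made of k bits, one per hyperplane, each recording on which side of an axis-parallel plane a descriptor falls. The sketch below is an assumption-laden illustration, not the scheme evaluated in the paper: it uses a single hash table, draws each plane's coordinate and threshold uniformly at random, and all names and sizes are hypothetical.

```python
import random
from collections import defaultdict

def make_hash(dim, k, lo, hi, rng):
    """Build a hash function from k random axis-parallel hyperplanes.

    Each plane is a (coordinate index, threshold) pair; the key is the tuple
    of side-of-plane bits, so descriptors in the same cell share a key.
    """
    planes = [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(k)]

    def key(x):
        return tuple(x[d] >= t for d, t in planes)

    return key

def build_table(descriptors, key):
    """Map each cell key to the indices of the reference descriptors inside it."""
    table = defaultdict(list)
    for idx, x in enumerate(descriptors):
        table[key(x)].append(idx)
    return table

# Toy data: 50 random 8-dimensional reference descriptors (assumed sizes).
rng = random.Random(0)
refs = [[rng.random() for _ in range(8)] for _ in range(50)]
key = make_hash(dim=8, k=4, lo=0.0, hi=1.0, rng=rng)
table = build_table(refs, key)

# A query only needs to be compared against descriptors in its own cell.
candidates = table.get(key(refs[7]), [])
```

A true nearest neighbor that happens to lie just across a plane is missed by a single table, which is the kind of unlucky division that has to be tolerated or mitigated elsewhere in the matching process.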

In this section, the authors briefly explore the cost of using 3D shape contexts and discuss a way to bring the amount of computation required for a 3D shape context query closer to what is used for spin images while maintaining accuracy. 

The authors then sum the distances found for each $q_k$, and call this the representative descriptor cost of matching $S_q$ to $S_i$: $$cost(S_q, S_i) = \sum_{k \in \{1,\ldots,K\}} \min_{m \in \{1,\ldots,M\}} dist(q_k, p_m) \qquad (5)$$ The best match is the reference model S that minimizes this cost. 
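The representative descriptor cost of equation (5) reduces to a sum of nearest-neighbor distances. The following sketch assumes descriptors are plain numeric sequences and uses Euclidean distance for `dist`, which is an assumption since this excerpt does not fix the distance function; the function names and toy models are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two descriptors (assumed distance measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def representative_descriptor_cost(query_descs, ref_descs):
    """Sum over query descriptors of the distance to the closest reference descriptor."""
    return sum(min(dist(q, p) for p in ref_descs) for q in query_descs)

def best_match(query_descs, models):
    """Return the name of the reference model with the smallest cost."""
    return min(models, key=lambda name: representative_descriptor_cost(query_descs, models[name]))

# Toy example: two-dimensional "descriptors" for a query scene and two models.
query = [(0.0, 0.0), (1.0, 1.0)]
models = {
    "model_a": [(0.0, 0.1), (1.0, 1.0), (9.0, 9.0)],
    "model_b": [(5.0, 5.0)],
}
cost_a = representative_descriptor_cost(query, models["model_a"])
winner = best_match(query, models)
```

Each query descriptor contributes independently, so no geometric consistency between the matched points is enforced by this score.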

Scoring matches solely on the representative descriptor costs can be thought of as a lower bound on an ideal cost measure that takes geometric constraints between points into account.