What are the contributions mentioned in the paper "Recognizing objects in range data using regional point descriptors"?

In this paper, the authors introduce two new regional shape descriptors: 3D shape contexts and harmonic shape contexts. The authors evaluate the performance of these descriptors on the task of recognizing vehicles in range scans of scenes using a database of 56 cars. The authors compare the two novel descriptors to an existing descriptor, the spin image, showing that the shape context based descriptors have a higher recognition rate on noisy scenes and that 3D shape contexts outperform the others on cluttered scenes.

What is the advantage of the 3D shape context over the other two descriptors?

A benefit of the 3D shape context over the other two descriptors is that a point-to-point match gives a candidate orientation of the model in the scene which can be used to verify other point matches.

How do the authors translate a 3D shape context into a harmonic shape context?

The authors translate a 3D shape context into a harmonic shape context by defining a function $f_j(\theta, \phi)$ based on the bins of the 3D shape context in a single spherical shell $R_j \le R < R_{j+1}$ as: $f_j(\theta, \phi) = SC(j, k, l)$ for $\theta_k < \theta \le \theta_{k+1}$, $\phi_l < \phi \le \phi_{l+1}$.

How many distances between representative descriptors (RDs) and the returned reference descriptors are included in the sum?

To remove outliers caused by unlucky hash divisions, the authors included in the sum in equation (5) only the 80 smallest distances between RDs and the returned reference descriptors.

What is the main reason for the noise in the range measurement?

The noise comes from the sensor itself: high-speed range scanners (e.g., flash ladars) introduce significant noise in the range measurement, making it nearly impossible to manually identify objects.

What is the problem with placing a hard vote?

The problem is that in placing a hard vote, the authors discard the relative distances between descriptors which provide information about the quality of the matches.

What is the method used to divide the high-dimensional feature space?

The method divides the high-dimensional feature space where the descriptors lie into hypercubes, using a set of k randomly chosen axis-parallel hyperplanes.
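The partitioning described in this answer can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the threshold range `lo`/`hi`, and the use of a single hash table are all assumptions made for the example. Each of the k hyperplanes is an (axis, threshold) pair, and a descriptor's hash key records which side of each hyperplane it falls on.

```python
import random


def make_hash(dim, k, lo, hi, seed=0):
    """Draw k random axis-parallel hyperplanes as (axis, threshold) pairs.

    dim, lo, hi and seed are illustrative parameters (assumptions): the
    descriptor length and the value range from which thresholds are drawn.
    """
    rng = random.Random(seed)
    planes = [(rng.randrange(dim), rng.uniform(lo, hi)) for _ in range(k)]

    def hash_key(vec):
        # One bit per hyperplane: which side of the plane the descriptor is on.
        return tuple(int(vec[axis] > thresh) for axis, thresh in planes)

    return hash_key


def build_table(descriptors, hash_key):
    """Group descriptor indices by the hypercube (hash key) they fall into."""
    table = {}
    for idx, vec in enumerate(descriptors):
        table.setdefault(hash_key(vec), []).append(idx)
    return table
```

At query time, only the descriptors stored under the query's key would be compared exactly, which is where the speed-up over exhaustive search would come from.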

What is the cost of using 3D shape contexts?

In Section 6, the authors briefly explore the cost of using 3D shape contexts and discuss a way to bring the amount of computation required for a 3D shape context query closer to what is used for spin images while maintaining accuracy.

How is the best match chosen with the representative descriptor method?

The authors sum the distances found for each $q_k$ and call this the representative descriptor cost of matching $S_q$ to $S_i$:

$$\mathrm{cost}(S_q, S_i) = \sum_{k \in \{1,\ldots,K\}} \; \min_{m \in \{1,\ldots,M\}} \mathrm{dist}(q_k, p_m) \qquad (5)$$

The best match is the reference model S that minimizes this cost.
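Equation (5) can be sketched in a few lines. This is an illustrative sketch only: the Euclidean distance, the function names, and the dictionary of models are assumptions, and the optional `keep_smallest` argument mirrors the trimming to the smallest distances mentioned in an earlier answer rather than any documented interface.

```python
import math


def euclidean(a, b):
    """Assumed descriptor distance; the page does not state which one is used."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rd_cost(query_descs, model_descs, dist=euclidean, keep_smallest=None):
    """Representative descriptor cost: sum over q_k of min over p_m of dist."""
    nearest = [min(dist(q, p) for p in model_descs) for q in query_descs]
    if keep_smallest is not None:
        # Optionally keep only the smallest distances to suppress outliers.
        nearest = sorted(nearest)[:keep_smallest]
    return sum(nearest)


def best_match(query_descs, models):
    """Return the name of the model with the lowest cost (models: name -> list)."""
    return min(models, key=lambda name: rd_cost(query_descs, models[name]))
```

For example, `best_match(q, {"a": descs_a, "b": descs_b})` returns whichever key has the smaller summed nearest-descriptor distance.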

How should the representative descriptor cost be interpreted?

Scoring matches solely on the representative descriptor costs can be thought of as a lower bound on an ideal cost measure that takes geometric constraints between points into account.

(Open Access) Recognizing Objects in Range Data Using Regional Point Descriptors (2004) | Andrea Frome

Q: What are the first decisions to make when designing a 3D descriptor?

When designing such a 3D descriptor, the first two decisions to be made are (1) what is the shape of the support region and (2) how to map the bins in 3D space to positions in the histogram vector.

Q: What is the support region of a spin image at a basis point p?

The support region of a spin image at a basis point p is a cylinder of radius $r_{\max}$ and height h centered on p with its axis aligned with the surface normal at p.
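A minimal sketch of how such a cylindrical support region can be turned into a histogram is given here. It assumes a unit-length normal, linear binning into J radial and K vertical segments (as the paper text describes for spin images), and unit weights in place of the density weighting; the function name and argument layout are illustrative.

```python
import math


def spin_image(points, p, normal, r_max, h, J, K):
    """J x K histogram over a cylinder of radius r_max and height h centered on p.

    Assumes `normal` has unit length. Each point adds a weight of 1.0; the
    inverse-density weighting described in the paper is omitted for brevity.
    """
    hist = [[0.0] * K for _ in range(J)]
    for x in points:
        d = [x[i] - p[i] for i in range(3)]
        beta = sum(d[i] * normal[i] for i in range(3))   # signed height along the axis
        alpha_sq = sum(c * c for c in d) - beta * beta
        alpha = math.sqrt(max(alpha_sq, 0.0))            # distance from the axis
        if alpha >= r_max or not (-h / 2.0 <= beta < h / 2.0):
            continue                                     # outside the support cylinder
        j = int(alpha / r_max * J)                       # radial segment
        k = int((beta + h / 2.0) / h * K)                # vertical segment
        hist[j][k] += 1.0
    return hist
```

Because only the distance from the axis and the height along it are used, the result does not depend on the azimuth of a point around the normal.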

Recognizing Objects in Range Data Using Regional Point Descriptors

Andrea Frome, Daniel Huber, Ravi Kolluri, Thomas Bülow, and Jitendra Malik

University of California Berkeley, Berkeley CA 94530, USA, {afrome,rkolluri,malik}@cs.berkeley.edu, thomas.buelow@philips.com

Carnegie Mellon University, Pittsburgh PA 15213, USA, dhuber@cs.cmu.edu

Abstract. Recognition of three dimensional (3D) objects in noisy and cluttered scenes is a challenging problem in 3D computer vision. One approach that has been successful in past research is the regional shape descriptor. In this paper, we introduce two new regional shape descriptors: 3D shape contexts and harmonic shape contexts. We evaluate the performance of these descriptors on the task of recognizing vehicles in range scans of scenes using a database of 56 cars. We compare the two novel descriptors to an existing descriptor, the spin image, showing that the shape context based descriptors have a higher recognition rate on noisy scenes and that 3D shape contexts outperform the others on cluttered scenes.

1 Introduction

Recognition of three dimensional (3D) objects in noisy and cluttered scenes is a challenging problem in 3D computer vision. Given a 3D point cloud produced by a range scanner observing a 3D scene (Fig. 1), the goal is to identify objects in the scene (in this case, vehicles) by comparing them to a set of candidate objects. This problem is challenging for several reasons. First, in range scans, much of the target object is obscured due to self-occlusion or is occluded by other objects. Nearby objects can also act as background clutter, which can interfere with the recognition process. Second, many classes of objects, for example the vehicles in our experiments, are very similar in shape and size. Third, range scanners have limited spatial resolution; the surface is only sampled at discrete points, and fine detail in the objects is usually lost or blurred. Finally, high-speed range scanners (e.g., flash ladars) introduce significant noise in the range measurement, making it nearly impossible to manually identify objects.

Object recognition in such a setting is interesting in its own right, but would also be useful in applications such as scan registration [9][6] and robot localization. The ability to recognize objects in 2 1/2-D images such as range scans may also prove valuable in recognizing objects in 2D images when some depth information can be inferred from cues such as shading or motion.

Current affiliation is with Philips Research Laboratories, Roentgenstrasse 24-26, 22335 Hamburg.

Fig. 1. (a) An example of a cluttered scene containing trees, a house, the ground, and a vehicle to be recognized. (b) A point cloud generated from a scan simulation of the scene. Notice that the range shadow of the building occludes the front half of the vehicle.

Many approaches to 3D object recognition have been put forth, including generalized cylinders [3], superquadrics [7], geons [23], medial axis representations [1], skeletons [4], shape distributions [19], and spherical harmonic representations of global shape [8]. Many of these methods require that the target be segmented from the background, which makes them difficult to apply to real-world 3D scenes. Furthermore, many global methods have difficulty leveraging subtle shape variations, especially with large parts of the shape missing from the scene. At the other end of the spectrum, purely local descriptors, such as surface curvature, are well-known for being unstable when faced with noisy data. Regional point descriptors lie midway between the global and local approaches, giving them the advantages of both. This is the approach that we follow in this paper.

Methods which use regional point descriptors have proven successful in the context of image-based recognition [17][15][2] as well as 3D recognition and surface matching [22][13][5][21]. A regional point descriptor characterizes some property of the scene in a local support region surrounding a basis point. In our case, the descriptors characterize regional surface shape. Ideally, a descriptor should be invariant to transformations of the target object (e.g., rotation and translation in 3D) and robust to noise and clutter. The descriptor for a basis point located on the target object in the scene will, therefore, be similar to the descriptor for the corresponding point on a model of the target object. These model descriptors can be stored in a pre-computed database and accessed using fast nearest-neighbor search methods such as locality-sensitive hashing [11]. The limited support region of descriptors makes them robust to significant levels of occlusion. Reliable recognition is made possible by combining the results from multiple basis points distributed across the scene.

In this paper we make the following contributions: (1) we develop the 3D generalization of the 2D shape context descriptor, (2) we introduce the harmonic shape context descriptor, (3) we systematically compare the performance of the 3D shape context, harmonic shape context, and spin images in recognizing similar objects in scenes with noise or clutter. We also briefly examine the trade-off of applying hashing techniques to speed search over a large set of objects.

The organization of the paper is as follows: in section 2, we introduce the 3D shape context and harmonic shape context descriptors and review the spin image descriptor. Section 3 describes the representative descriptor method for aggregating distances between point descriptors to give an overall matching score between a query scene and model. Our data set is introduced in section 4, and our experiments and results are presented in section 5. We finish in section 6 with a brief analysis of a method for speeding our matching process.

2 Descriptors

In this section, we provide the details of the new 3D shape context and harmonic shape context descriptors and review the existing spin-image descriptor. All three descriptors take as input a point cloud P and a basis point p, and capture the regional shape of the scene at p using the distribution of points in a support region surrounding p. The support region is discretized into bins, and a histogram is formed by counting the number of points falling within each bin. For the 3D shape contexts and spin-images, this histogram is used directly as the descriptor, while with harmonic shape contexts, an additional transformation is applied.

When designing such a 3D descriptor, the first two decisions to be made are (1) what is the shape of the support region and (2) how to map the bins in 3D space to positions in the histogram vector. All three methods address the second issue by aligning the support region's "up" or north pole direction with an estimate of the surface normal at the basis point, which leaves a degree of freedom along the azimuth. Their differences arise from the shape of their support region and how they remove this degree of freedom.

2.1 3D shape contexts

The 3D shape context is the straightforward extension of 2D shape contexts, introduced by Belongie et al. [2], to three dimensions. The support region for a 3D shape context is a sphere centered on the basis point p and its north pole oriented with the surface normal estimate N for p (Fig. 2). The support region is divided into bins by equally spaced boundaries in the azimuth and elevation dimensions and logarithmically spaced boundaries along the radial dimension. We denote the J + 1 radial divisions by $R = \{R_0 \ldots R_J\}$, the K + 1 elevation divisions by $\Theta = \{\Theta_0 \ldots \Theta_K\}$, and the L + 1 azimuth divisions by $\Phi = \{\Phi_0 \ldots \Phi_L\}$. Each bin corresponds to one element in the J × K × L feature vector. The first radius division $R_0$ is the minimum radius $r_{\min}$, and $R_J$ is the maximum radius $r_{\max}$. The radius boundaries are calculated as

$$R_j = \exp\left\{ \ln(r_{\min}) + \frac{j}{J} \ln\left(\frac{r_{\max}}{r_{\min}}\right) \right\}. \qquad (1)$$
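As a quick check of equation (1), the radial boundaries can be computed directly. The sketch below is illustrative: the function name is an assumption, and any values passed for `r_min`, `r_max` and `J` are example inputs rather than settings taken from the paper.

```python
import math


def radial_boundaries(r_min, r_max, J):
    """Return the J + 1 logarithmically spaced radii R_0 .. R_J of equation (1)."""
    return [
        math.exp(math.log(r_min) + (j / J) * math.log(r_max / r_min))
        for j in range(J + 1)
    ]
```

With `r_min=1`, `r_max=8` and `J=3` the boundaries are approximately 1, 2, 4 and 8: consecutive radii differ by a constant ratio, which is what makes the inner shells thinner than the outer ones.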

Fig. 2. Visualization of the histogram bins of the 3D shape context.

Sampling logarithmically makes the descriptor more robust to distortions in shape with distance from the basis point. Bins closer to the center are smaller in all three spherical dimensions, so we use a minimum radius ($r_{\min} > 0$) to avoid being overly sensitive to small differences in shape very close to the center. The Θ and Φ divisions are evenly spaced along the 180° and 360° elevation and azimuth ranges.

Bin(j, k, l) accumulates a weighted count $w(p_i)$ for each point $p_i$ whose spherical coordinates relative to p fall within the radius interval $[R_j, R_{j+1})$, elevation interval $[\Theta_k, \Theta_{k+1})$ and azimuth interval $[\Phi_l, \Phi_{l+1})$. The contribution to the bin count for point $p_i$ is given by

$$w(p_i) = \frac{1}{\rho_i \sqrt[3]{V(j, k, l)}} \qquad (2)$$

where V(j, k, l) is the volume of the bin and $\rho_i$ is the local point density around the bin. Normalizing by the bin volume compensates for the large variation in bin sizes with radius and elevation. We found empirically that using the cube root of the volume retains significant discriminative power while leaving the descriptor robust to noise which causes points to cross over bin boundaries. The local point density $\rho_i$ is estimated as the count of points in a sphere of radius δ around $p_i$. This normalization accounts for variations in sampling density due to the angle of the surface or distance to the scanner.
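The binning and weighting just described can be sketched as follows. This is a simplified illustration under stated assumptions: points are already expressed in a local frame with the basis point at the origin and the normal along +z, the densities are supplied by the caller, elevation is measured from the north pole, and all function names are hypothetical. The bin volume uses the standard spherical-coordinate volume element.

```python
import math


def bin_volume(R, Theta, Phi, j, k, l):
    """Volume of the spherical bin with radial index j, elevation k, azimuth l."""
    return ((R[j + 1] ** 3 - R[j] ** 3) / 3.0
            * (math.cos(Theta[k]) - math.cos(Theta[k + 1]))
            * (Phi[l + 1] - Phi[l]))


def find_bin(edges, value):
    """Index i with edges[i] <= value < edges[i + 1], or None if out of range."""
    for i in range(len(edges) - 1):
        if edges[i] <= value < edges[i + 1]:
            return i
    return None


def shape_context_3d(points, densities, R, Theta, Phi):
    """J x K x L histogram; each point adds 1 / (rho * cube_root(bin volume))."""
    J, K, L = len(R) - 1, len(Theta) - 1, len(Phi) - 1
    hist = [[[0.0] * L for _ in range(K)] for _ in range(J)]
    for (x, y, z), rho in zip(points, densities):
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue                                   # the basis point itself
        theta = math.acos(max(-1.0, min(1.0, z / r)))  # angle from the north pole
        phi = math.atan2(y, x) % (2.0 * math.pi)       # azimuth in [0, 2*pi)
        j, k, l = find_bin(R, r), find_bin(Theta, theta), find_bin(Phi, phi)
        if j is None or k is None or l is None:
            continue                                   # outside the support region
        volume = bin_volume(R, Theta, Phi, j, k, l)
        hist[j][k][l] += 1.0 / (rho * volume ** (1.0 / 3.0))
    return hist
```

Because the intervals are half-open, a point lying exactly on the outermost boundary (for instance exactly at the south pole) is skipped in this sketch; a production version would need to decide how to treat such edge cases.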

We have a degree of freedom in the azimuth direction that we must remove in order to compare shape contexts calculated in different coordinate systems. To account for this, we choose some direction to be $\Phi_0$ in an initial shape context, and then rotate the shape context about its north pole into L positions, such that each $\Phi_l$ division is located at the original 0° position in one of the rotations. For descriptor data sets derived from our reference scans, L rotations for each basis point are included, whereas in the query data sets, we include only one position per basis point.
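Since the azimuth divisions are equally spaced, rotating a shape context about its north pole by one azimuth step amounts to a cyclic shift of the bins along the azimuth axis. The sketch below generates the L rotated copies that way; treating the rotation as a cyclic shift and the nested-list layout `hist[j][k][l]` are assumptions of this example, not details given in the paper.

```python
def azimuth_rotations(hist):
    """Return the L cyclic shifts of a J x K x L histogram along the azimuth axis."""
    L = len(hist[0][0])
    return [
        [[row[s:] + row[:s] for row in shell] for shell in hist]
        for s in range(L)
    ]
```

Storing all L copies for each reference basis point means a query descriptor, computed in a single arbitrary azimuth position, only has to be compared against stored vectors, at the price of an L-fold larger reference set.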

2.2 Harmonic shape contexts

To compute harmonic shape contexts, we begin with the histogram described above for 3D shape contexts, but we use the bin values as samples to calculate a spherical harmonic transformation for the shells and discard the original histogram. The descriptor is a vector of the amplitudes of the transformation, which are rotationally invariant in the azimuth direction, thus removing the degree of freedom.

Any real function $f(\theta, \phi)$ can be expressed as a sum of complex spherical harmonic basis functions $Y_l^m$:

$$f(\theta, \phi) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_l^m \, Y_l^m(\theta, \phi) \qquad (3)$$

A key property of this harmonic transformation is that a rotation in the azimuthal direction results in a phase shift in the frequency domain, and hence amplitudes of the harmonic coefficients $\|A_l^m\|$ are invariant to rotations in the azimuth direction. We translate a 3D shape context into a harmonic shape context by defining a function $f_j(\theta, \phi)$ based on the bins of the 3D shape context in a single spherical shell $R_j \le R < R_{j+1}$ as:

$$f_j(\theta, \phi) = SC(j, k, l), \quad \theta_k < \theta \le \theta_{k+1}, \; \phi_l < \phi \le \phi_{l+1}. \qquad (4)$$
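The invariance property can be verified in one step. The derivation below assumes the common convention in which the azimuth enters each basis function only through a factor $e^{im\phi}$, so that $Y_l^m(\theta, \phi - \alpha) = e^{-im\alpha} Y_l^m(\theta, \phi)$; the rotated function $g$ and the angle $\alpha$ are symbols introduced here for illustration.

```latex
% g is f rotated by an angle \alpha about the north pole
g(\theta, \phi) = f(\theta, \phi - \alpha)
  = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} A_l^m \, Y_l^m(\theta, \phi - \alpha)
  = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \bigl( e^{-im\alpha} A_l^m \bigr) \, Y_l^m(\theta, \phi)

% the coefficients of g differ from those of f only by a unit-modulus phase
\bigl\| e^{-im\alpha} A_l^m \bigr\| = \bigl| e^{-im\alpha} \bigr| \, \bigl\| A_l^m \bigr\| = \bigl\| A_l^m \bigr\|
```

Keeping only the amplitudes therefore discards exactly the information that depends on where the azimuth origin was placed.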

As in [14], we choose a bandwidth b and store only the b lowest-frequency components of the harmonic representation in our descriptor, which is given by $HSC(l, m, k) = \|A_{l,k}^m\|$, $l, m = 0 \ldots b$, $r = 0 \ldots K$. For any real function, $\|A_l^m\| = \|A_l^{-m}\|$, so we drop the coefficients $A_l^m$ for $m < 0$. The dimensionality of the resulting harmonic shape context is $K \cdot b(b+1)/2$. Note that the number of azimuth and elevation divisions does not affect the dimensionality of the descriptor.
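The stated dimensionality follows from a simple count, sketched here under the assumption that the retained degrees are $l = 0, \ldots, b-1$ and that, for each degree, only the orders $0 \le m \le l$ are kept; that reading is inferred from the stated total rather than spelled out in the text.

```latex
% number of retained (l, m) pairs per shell
\sum_{l=0}^{b-1} (l + 1) = \frac{b(b+1)}{2}

% small example, b = 3:
% (0,0), (1,0), (1,1), (2,0), (2,1), (2,2)  ->  6 = 3 \cdot 4 / 2
```

Multiplying the per-shell count by the number of shells gives the total length of the descriptor, independent of how finely the azimuth and elevation were originally binned.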

Harmonic shape contexts are related to the rotation-invariant shape descriptors SH(f) described in [14]. One difference between those and the harmonic shape contexts is that one SH(f) descriptor is used to describe the global shape of a single object. Also, the shape descriptor SH(f) is a vector of length b whose components are the energies of the function f in the b lowest frequencies: $SH_l(f) = \left\| \sum_{m=-l}^{l} A_l^m Y_l^m \right\|$. In contrast, harmonic shape contexts retain the amplitudes of the individual frequency components, and, as a result, are more descriptive.

2.3 Spin Images

We compared the performance of both of these shape context-based descriptors to spin images [13]. Spin-images are well-known 3D shape descriptors that have proven useful for object recognition [13], classification [20], and modeling [10]. Although spin-images were originally defined for surfaces, the adaptation to point clouds is straightforward. The support region of a spin image at a basis point p is a cylinder of radius $r_{\max}$ and height h centered on p with its axis aligned with the surface normal at p. The support region is divided linearly into J segments radially and K segments vertically, forming a set of J × K rings. The spin-image for a basis point p is computed by counting the points that fall within each ring, forming a 2D histogram. As with the other descriptors, the contribution of each point $q_i$ is weighted by the inverse of that point's density estimate ($\rho_i$); however, the bins are not weighted by volume. Summing within
