Proceedings ArticleDOI

3D object recognition using spin-images for a humanoid stereoscopic vision system

01 Oct 2006-pp 2955-2960

TL;DR: A method for quickly computing multi-resolution and interpolating spin-images for a humanoid robot having a stereoscopic vision system and the results on simulation and on real data show the effectiveness of this method.

Summary (2 min read)

Introduction

  • Moreover if the information is precise enough, it can also be used for grasping behaviour.
  • Recent works on 3D object model building make possible a description based on geometrical features.
  • This behaviour consists of two major steps: first, building an internal representation of an object unknown to the robot; second, finding this object in an unknown environment.

B. Normal computation

  • When computing spin-images, the normal computation should be as insensitive to noise as possible.
  • This is especially important for vision-based information, where the noise might be significant.
  • Using the Stanford Bunny model, and adding Gaussian noise of 20 percent of the average adjacent edge length, the most stable method found was the gravity centre of the polygons formed by the neighbours of each point.

C. Spin-image filling

  • Regarding the spin-image filling, Johnson proposes two ways: either a direct accumulation, or a bilinear interpolation.
  • The direct accumulation makes the spin-image sensitive to noise.
  • To solve this problem, a bilinear interpolation smooths the effect of noise by sharing the density information among the 4 points connected to the surface.
  • One of the most important features needed in their case is the possibility to perceive the object at different distances, and thus at different resolutions.

A. Computing resolution of an object

  • The resolution of the perceived object depends upon the stereoscopic system capabilities, the distance between the robot and the object, and any sub-sampling scheme used during image processing.
  • A reconstruction error may also be induced by the segmentation used to match two points in the right and the left images, in their case a correlation.
  • The 3D uncertainty volumes perceived by the stereo sensor are the intersection of the cones representing the pixel surfaces on the image planes.
  • They can also be interpreted as the location error of a 3D point. [8] and [9] proposed an ellipsoid-based approximation of this volume, while [10] proposed a guaranteed bounding box using interval analysis.
  • Thus, in order to extract a global resolution from the scene, the average edge length L_scene is also used.

B. Multi-resolution signature

  • The dyadic scheme consists in dividing each dimension of the spin-image by 2 between two resolutions.
  • The main question is how to share the information carried by the points which will disappear.
  • One can notice that the same quadrant may have several notations depending on the reference point used.
  • It should be stressed that in their current implementation, only the spin-images are subjected to a multi-resolution scheme.

A. Selection of the best resolution

  • From section III, the object resolution is the average edge length in the scene.
  • Then the resolution for the model’s spin-images is chosen according to Eq. 1.
  • Two spin-images (p,q) with the same resolution are compared using the normalised correlation function proposed in [3].
  • Thus, during the multi-resolution phase, the spin-images are not normalised.

B. Rigid transformation evaluation

  • The main rigid transformation is obtained as follows:
  • Some points are randomly selected in the scene.
  • Their corresponding points in the model are searched by comparing their spin-image to all the model’s spin-images as depicted in Fig.
  • If e is the real rigid transformation, then it should project the maximum number of points from the scene to the model.

C. Final correlation coefficient

  • In order to verify the main rigid transformation, points of the model are chosen randomly and verified against the scene using the proposed main rigid transformation.
  • The main correlation coefficient is the average of the 80% best correlation coefficients.
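The verification step above can be sketched in a few lines; this is a minimal illustration of "average of the 80% best coefficients", where the function name and interface are ours, not the paper's:

```python
def main_correlation_coefficient(coeffs, keep=0.8):
    """Average of the best `keep` fraction of per-point correlation
    coefficients (the paper uses the 80% best)."""
    ranked = sorted(coeffs, reverse=True)   # best coefficients first
    n = max(1, int(len(ranked) * keep))     # how many of them to keep
    return sum(ranked[:n]) / n
```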

A. Simulation

  • The previously described algorithm was tested on different situations to check its efficiency.
  • First, a Stanford Bunny spin-image was tested against a spin-image of the dinosaur represented in Fig.
  • The third case intends to simulate a single view of the complete 3D model, and the subsequent self-occlusion as shown in Fig. 10.
  • The associated rigid transformation has no rotation and no translation.
  • The resulting main correlation coefficient was 0.22.

B. OpenHRP[11] simulator

  • The HRP-2 humanoid robot is simulated inside a house environment.
  • The goal of this simulation was to try to cope with different objects present in the scene.
  • The Stanford Bunny is above a table, behind chairs, and several objects are present in the background, as depicted in Fig. 11 and Fig.
  • Using the previously described scheme, the model is found with a correlation coefficient close to 0.99.
  • One can conclude that the other objects in the scene do not decrease the efficiency of the search.

C. Real data

  • The HRP-2 humanoid robot is equipped with a trinoptic vision system.
  • Using a correlation method to match points between the left image and the right image, clouds of 3D points are computed using epipolar geometry.
  • The object used for this test is a box of cookies depicted in Fig. 12.(b).

D. Computation time

  • To build the Stanford Bunny model, it takes 6 minutes and 24 seconds for 34834 points.
  • The recognition process takes 32 seconds for a scene, using 100 spin images to compute the rigid transformation.
  • Two kinds of improvement are possible: using a compression scheme such as the Principal Component Analysis as proposed in [3], or a Wavelet based approach such as WaveMesh [13].


HAL Id: hal-01117854
https://hal.archives-ouvertes.fr/hal-01117854
Submitted on 18 Feb 2015
3D object recognition using spin-images for a humanoid
stereoscopic vision system
Olivier Stasse, Sylvain Dupitier, Kazuhito Yokoi
To cite this version:
Olivier Stasse, Sylvain Dupitier, Kazuhito Yokoi. 3D object recognition using spin-images for a humanoid stereoscopic vision system. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Oct 2006, Beijing, China. pp. 2955-2960, 10.1109/IROS.2006.282151. hal-01117854

3D object recognition using spin-images for a humanoid
stereoscopic vision system.
Olivier Stasse, Sylvain Dupitier and Kazuhito Yokoi
AIST/IS-CNRS/STIC Joint Japanese-French Robotics Laboratory (JRL)
Intelligent Systems Research Institute (IS),
National Institute of Advanced Industrial Science and Technology (AIST)
AIST Central 2, Umezono 1-1-1, Tsukuba, Ibaraki, 305-8568 Japan
{olivier.stasse,kazuhito.yokoi}@aist.go.jp
Abstract— This paper presents a 3D object recognition method based on spin-images for a humanoid robot having a stereoscopic vision system. Spin-images have been proposed to search CAD model databases, and use 3D range information. In this context, the use of a vision system is taken into account through a multi-resolution approach. A method for quickly computing multi-resolution and interpolated spin-images is proposed. Results on simulation and on real data are given, and show the effectiveness of this method.
Index Terms— Spin-images, multi-resolution, 3D recognition, humanoid robot.
I. INTRODUCTION
Efficient real-time tracking exists for collections of 2D views [1] [2]. However, in a humanoid context, 3D geometrical information is important because the high redundancy of such robots allows several kinds of 3D postures. Moreover, if the information is precise enough, it can also be used for grasping behaviour. Recent works on 3D object model building make possible a description based on geometrical features. Towards the design of a search engine for databases of CAD models, several 3D descriptors have been proposed to build signatures of 3D objects [3], [4], [5]. The recognition process proposed here is based on spin-images, proposed initially by [3]. The main difference between the conventional work and this one lies in the targeted application and a search scheme based on multi-resolution spin-images. Moreover, the computation of the multi-resolution scheme is refined and allows a fast implementation.
The targeted application is a “Treasure hunting” behaviour on an HRP-2 humanoid robot [6]. This behaviour consists of two major steps: first, building an internal representation of an object unknown to the robot; second, finding this object in an unknown environment. This behaviour is useful for a robot used in an industrial environment, or as an aid for elderly people. It may incrementally build its knowledge of its surrounding environment and of the objects it has to manipulate without any a priori models. The time constraint is crucial, as a reasonable limit has to be set on the time an end user can wait for the robot to achieve its mission. Finally, to cope with the widest set of objects, the method should rely on a limited set of assumptions.
The remainder of this paper is as follows: in section II the computation of spin-images is introduced, section III details how the multi-resolution signature of objects is computed, section IV details the search process, and finally section V presents the simulation and the experiments realized with the presented algorithm.
Fig. 1. Example of spin-image computation: a point P on a 3D mesh with its axes α and β, and the resulting discrete spin-image.
II. SPIN IMAGES
A. Description
A spin-image can be seen as an image representing the distribution of the object’s density viewed from a particular point [3]. More precisely, it is assumed that all the 3D data are given as a mesh Mesh = (V, E), where V are the vertices and E the edges. Let’s consider a vertex P ∈ V. The spin-image axes are the normal at the point P and a vector perpendicular to this normal; the former is called β, and the latter α. The support region of a spin-image is a cylinder centred on P and aligned with its normal. From this, each point of the model is assigned to a ring with a height along β and a radius along α. An example of spin-images for a dinosaur model is given in Fig. 1.
There are two parameters of importance when using spin-images: the size of the rings (δα, δβ), and the boundaries of the spin-image (α_max, β_max). The size of the rings is similar to a resolution parameter. The limitation (α_max, β_max) makes it possible to impose constraints between the point P chosen for computing the spin-image and the other points P′ of the model. This is particularly meaningful to take occlusion problems into account. In our implementation, two points should have less than 90 degrees between their normals; a greater value would imply that P′ is occluded by some other points while P is facing the camera.
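The (α, β) coordinates underlying this description can be sketched as follows; this is a minimal NumPy illustration (the function name and interface are ours, not the paper’s):

```python
import numpy as np

def spin_coordinates(P, n, X):
    """Project a model point X into the spin-image frame of the oriented
    point (P, n): beta is the signed height along the normal n, and
    alpha the radial distance to the normal axis."""
    n = n / np.linalg.norm(n)
    d = X - P
    beta = float(np.dot(d, n))                   # height along the normal
    alpha = float(np.linalg.norm(d - beta * n))  # radius around the axis
    return alpha, beta
```

Discretising (α, β) with bin sizes (δα, δβ) and clipping at (α_max, β_max) then yields the spin-image cell of X.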
B. Normal computation
When computing spin-images, the normal computation should be as insensitive to noise as possible. This is especially important for vision-based information, where the noise might be significant. Following the tests done in [7], the following methods have been tested: gravity centre of the polygons formed by the neighbours of each point; inertia matrix; normal average of each face; normal average of faces formed by neighbour points only; normal average weighted by angle; normal average weighted by sine and edge length reciprocal; normal average weighted by areas of adjacent triangles; normal average weighted by edge length reciprocals; normal average weighted by square roots of edge length reciprocals. Using the Stanford Bunny model, and adding Gaussian noise of 20 percent of the average adjacent edge length, the most stable method found was the gravity centre of the polygons formed by the neighbours of each point.
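One plausible reading of the retained estimator (the normal of the polygon formed by a vertex’s one-ring neighbours, fanned around its gravity centre) can be sketched as follows; this is our interpretation, not the paper’s code:

```python
import numpy as np

def vertex_normal_from_neighbour_polygon(neighbours):
    """Estimate a vertex normal from the polygon of its one-ring
    neighbours, fanned around the polygon's gravity centre
    (one plausible reading of the estimator retained in the paper)."""
    pts = np.asarray(neighbours, dtype=float)
    centre = pts.mean(axis=0)            # gravity centre of the neighbour polygon
    n = np.zeros(3)
    for i in range(len(pts)):
        u = pts[i] - centre
        v = pts[(i + 1) % len(pts)] - centre
        n += np.cross(u, v)              # area-weighted normal of each fan triangle
    norm = np.linalg.norm(n)
    return n / norm if norm > 0 else n
```

Because the contributions are area-weighted, a single noisy neighbour perturbs the estimate less than a naive average of face normals would.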
Fig. 2. Two ways to fill a spin-image: (a) direct filling, where the projected point M = (α, β) is accumulated entirely into the corner (α_i, β_j) of its cell; (b) bilinear interpolation, where its contribution is shared among the four corners with weights (1−a)(1−b), a(1−b), (1−a)b and ab.
C. Spin-image filling
Regarding the spin-image filling, Johnson proposes two ways: either a direct accumulation, or a bilinear interpolation. Those two methods are depicted in Fig. 2. M is the projection of a point P′ ∈ V. The first solution relates M = (α, β), lying in the cell (α_i, β_j)-(α_{i+1}, β_j)-(α_{i+1}, β_{j+1})-(α_i, β_{j+1}), to the point (α_i, β_j) regardless of its position in the cell. This makes the spin-image sensitive to noise: if M is close to a boundary, a small displacement involves an important discrete modification. To solve this problem, a bilinear interpolation smooths the effect of noise by sharing the density information among the 4 points connected to the cell. This is achieved by computing the distance of M to those 4 points, using two parameters (a, b) as depicted in Fig. 2. If the points are processed iteratively in the order {0, 1, ..., k, k+1, ..., |V|−1}, then the densities are updated as follows:

W_{i,j}(k+1) = W_{i,j}(k) + (1−a)(1−b)
W_{i+1,j}(k+1) = W_{i+1,j}(k) + a(1−b)
W_{i,j+1}(k+1) = W_{i,j+1}(k) + (1−a)b
W_{i+1,j+1}(k+1) = W_{i+1,j+1}(k) + ab

where a = (α − α_i)/δα and b = (β − β_j)/δβ. It is straightforward to check that for a point M the sum of the four contributions is one. In the remainder of this paper, for the sake of clarity, the iteration number is implicit.
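The bilinear update above can be sketched as an accumulation loop; this is an illustrative NumPy version assuming the (α, β) coordinates are already in the spin-image frame (names and interface are ours):

```python
import numpy as np

def fill_spin_image(points, delta_alpha, delta_beta, n_alpha, n_beta):
    """Accumulate (alpha, beta) pairs into a spin-image with the bilinear
    filling: each point spreads (1-a)(1-b), a(1-b), (1-a)b, ab over the
    four corners of its cell, so each point contributes exactly 1."""
    W = np.zeros((n_alpha + 1, n_beta + 1))
    for alpha, beta in points:
        i = int(alpha // delta_alpha)
        j = int(beta // delta_beta)
        if not (0 <= i < n_alpha and 0 <= j < n_beta):
            continue                     # outside (alpha_max, beta_max): discarded
        a = (alpha - i * delta_alpha) / delta_alpha
        b = (beta - j * delta_beta) / delta_beta
        W[i, j]         += (1 - a) * (1 - b)
        W[i + 1, j]     += a * (1 - b)
        W[i, j + 1]     += (1 - a) * b
        W[i + 1, j + 1] += a * b
    return W
```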
III. MULTI-RESOLUTION
One of the most important features needed in our case is the possibility to perceive the object at different distances, and thus at different resolutions. This implies building a multi-resolution signature of the object, and being able to compute the resolution at which the object has been perceived. In the following, the finest spin-image SI_{r_max} has the highest resolution, which corresponds to (δα/2^{r_max}, δβ/2^{r_max}), while the spin-image SI_k has a resolution (δα/2^k, δβ/2^k) = (δα_k, δβ_k).
A. Computing resolution of an object

Fig. 3. Model induced by the surface nature of the pixels: a pixel in the left and right images back-projects through the optical centres into a 3D volume at distance d, approximated either by a Gaussian (ellipsoid) model or by an interval-analysis model.
The resolution of the perceived object depends upon the stereoscopic system capabilities, the distance between the robot and the object, and any sub-sampling scheme used during image processing. An error may also be induced by the segmentation used to match two points in the right and left images, in our case a correlation. If the pixel is considered as a surface on the image plane, the stereoscopic vision system may be seen as a sensor which perceives 3D volumes. Those volumes are the intersection of the cones representing the surfaces on the image planes. A 2D representation is given in Fig. 3. They can also be interpreted as the location error of a 3D point. [8] and [9] proposed an ellipsoid-based approximation of this volume, while [10] proposed a guaranteed bounding box using interval analysis. Both techniques show the non-linearity of the uncertainty related to the reconstruction of a 3D point. From those previous works, it is clear that the error estimation, and here the resolution, may differ for different parts of the object. While computing the signature, the resolution of the model is given by the average edge length L_model = (1/|E|) Σ_{e∈E} ||e|| of its corresponding data. The number m of multiple-resolution pictures can be deduced from the relationship B_model = L_model · 2^m, where B_model = min{X_max, Y_max, Z_max} and {X_max, Y_max, Z_max} is the bounding box of the model. Thus, in order to extract a global resolution from the scene, the average edge length L_scene is also used. The resolution r is chosen in the signature such that:

min{ r ∈ N | L_scene < 2^r · L_model }   (1)
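Eq. 1 amounts to a short scan over the stored pyramid levels; a minimal sketch, assuming levels 0..r_max exist and that scenes coarser than every level fall back to the coarsest one (that fallback is our assumption, not stated in the paper):

```python
def select_resolution(L_scene, L_model, r_max):
    """Pick the signature resolution r for a perceived scene, following
    Eq. (1): the smallest r with L_scene < 2**r * L_model."""
    for r in range(r_max + 1):
        if L_scene < (2 ** r) * L_model:
            return r
    return r_max  # assumed fallback: scene coarser than every stored level
```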
B. Multi-resolution signature
The dyadic scheme consists in dividing each dimension of the spin-image by 2 between two resolutions. Using the direct filling way, it is possible to compute, from resolution r+1 to resolution r, the density of a point M = (i, j) in SI_r by:

W^r_{(i,j)} = W^{r+1}_{(2i,2j)} + W^{r+1}_{(2i+1,2j)} + W^{r+1}_{(2i,2j+1)} + W^{r+1}_{(2i+1,2j+1)}
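For the direct-filled case, this dyadic reduction is a plain 2×2 block sum; an illustrative NumPy sketch (our naming), assuming even image sides:

```python
import numpy as np

def downsample_direct(W_fine):
    """Dyadic reduction of a direct-filled spin-image: each coarse cell
    (i, j) sums the four fine cells (2i, 2j), (2i+1, 2j),
    (2i, 2j+1), (2i+1, 2j+1)."""
    h, w = W_fine.shape
    assert h % 2 == 0 and w % 2 == 0, "fine image sides must be even"
    return (W_fine[0::2, 0::2] + W_fine[1::2, 0::2]
            + W_fine[0::2, 1::2] + W_fine[1::2, 1::2])
```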
Using the bilinear interpolated image, the relationship between W^r and W^{r+1} is not so obvious. In Fig. 4, the points from resolutions r and r+1 are depicted. Our goal is to find a relationship between the density W^r_{(i,j)} and the densities W^{r+1}_{(2i+k,2j+l)} for k ∈ {−2,−1,0,1,2} and l ∈ {−2,−1,0,1,2}. The main question is how to share the information carried by the points which will disappear. In Fig. 4, let’s consider N_4. As this point is not present at resolution r, its contribution has to be redistributed to the four adjacent points remaining at resolution r. However, as the density of a point M depends upon its distance, if M was in Q^r_{(i,j),0,2} = Q^{r+1}_{(2i−1,2j−1),2}, then its contribution has already been partially taken into account by N^{r+1}_{(2i,2j)}, but not by N^{r+1}_{(2i,2j−2)}, N^{r+1}_{(2i−2,2j−2)}, and N^{r+1}_{(2i−2,2j)}. For these three points, an offset of (δα/2^r, δβ/2^r) has to be introduced while processing N^r_{(i,j)}.
We note Q^r_{(i,j)} the surface described by the points N^r_{(i−1,j−1)}, N^r_{(i+1,j−1)}, N^r_{(i+1,j+1)}, N^r_{(i−1,j+1)}. This surface can be cut into four quadrants Q^r_{(i,j),l}, l ∈ {0,1,2,3}, as depicted in Fig. 4. For convenience, and following those notations, those quadrants may also be divided by four and will be noted Q^r_{(i,j),l,k}, k ∈ {0,1,2,3}. One can notice that the same quadrant may have several notations depending on the reference point used. For instance Q^r_{(i,j),2} = Q^r_{(i+1,j+1),0}, or Q^r_{(i,j),0,2} = Q^{r+1}_{(2i−1,2j−1),2}.
The notation used for the variables (a, b) is now extended, as they change according to the resolution: a(M, N^r_{(i,j)}) is the distance along α from N^r_{(i,j)} to M, and b(M, N^r_{(i,j)}) is the same along β. The relationship between those variables from one resolution to the next is summarised in Tab. I.
TABLE I
COEFFICIENTS FOR COMPUTING THE MULTI-RESOLUTION BILINEAR INTERPOLATION

Area          | Distances
Q^r_{(i,j),0} | a(M, N^r_{(i,j)}) = a(M, N^{r+1}_{(2i,2j)});  b(M, N^r_{(i,j)}) = b(M, N^{r+1}_{(2i,2j)})
Q^r_{(i,j),1} | a(M, N^r_{(i,j)}) = a(M, N^{r+1}_{(2i+1,2j)}) + δα/2^{r+1};  b(M, N^r_{(i,j)}) = b(M, N^{r+1}_{(2i+1,2j)})
Q^r_{(i,j),2} | a(M, N^r_{(i,j)}) = a(M, N^{r+1}_{(2i+1,2j+1)}) + δα/2^{r+1};  b(M, N^r_{(i,j)}) = b(M, N^{r+1}_{(2i+1,2j+1)}) + δβ/2^{r+1}
Q^r_{(i,j),3} | a(M, N^r_{(i,j)}) = a(M, N^{r+1}_{(2i,2j+1)});  b(M, N^r_{(i,j)}) = b(M, N^{r+1}_{(2i,2j+1)}) + δβ/2^{r+1}
Lemma: Let’s note W^r_{(i,j)}(Q) the contribution of the quadrant Q to the density at point (i, j) of a spin-image of resolution r filled by bilinear interpolation. If N_m ∈ {N^{r+1}_{(2i+k,2j+l)}} for k ∈ {0,1,2} and l ∈ {0,1,2}, with m = 3k + l, then we have:

W^r_{(i,j)}(Q^r_{(i,j),2}) = Σ_{n=0}^{3} Σ_{m=0}^{8} (1 − a_{N_m}/δα_r)(1 − b_{N_m}/δβ_r) W^{r+1}_{N_m}(Q^r_{(i,j),2,n})
W^r_{(i+1,j)}(Q^r_{(i+1,j),3}) = Σ_{n=0}^{3} Σ_{m=0}^{8} (a_{N_m}/δα_r)(1 − b_{N_m}/δβ_r) W^{r+1}_{N_m}(Q^r_{(i,j),3,n})
W^r_{(i,j+1)}(Q^r_{(i,j+1),1}) = Σ_{n=0}^{3} Σ_{m=0}^{8} (1 − a_{N_m}/δα_r)(b_{N_m}/δβ_r) W^{r+1}_{N_m}(Q^r_{(i,j),1,n})
W^r_{(i+1,j+1)}(Q^r_{(i,j),0}) = Σ_{n=0}^{3} Σ_{m=0}^{8} (a_{N_m}/δα_r)(b_{N_m}/δβ_r) W^{r+1}_{N_m}(Q^r_{(i,j),0,n})
(2)

with a_{N_m} = a(N_m, N^r_{(i,j)}), b_{N_m} = b(N_m, N^r_{(i,j)}), and W^{r+1}_{N_m} = W^{r+1}_{N^{r+1}_{(2i+k,2j+l)}}. Finally

W^r_{(i,j)} = Σ_{n=0}^{3} W^r_{(i,j)}(Q^r_{(i,j),n})   (3)

Fig. 4. Computing bilinear interpolated spin-images from one resolution to the other: the coarse points N^r_{(i,j)}, N^r_{(i+1,j)}, N^r_{(i,j+1)}, N^r_{(i+1,j+1)}, the fine points N_0, ..., N_8 = N^{r+1}_{(2i+k,2j+l)}, the quadrants Q^r_0, ..., Q^r_3 and the sub-quadrants Q^r_{0,0}, ..., Q^r_{0,3}.
Proof: We give here a partial proof to illustrate the general concept. Let’s consider a point M ∈ Q^r_{(i,j),2,2} = Q^{r+1}_{(2i+1,2j+1),2} = Q^{r+1}_{N_4,2} at resolution r+1. The points N_4, N_5, N_7 and N_8 of the spin-image mesh are considered. The contribution provided by M to each of those points is computed as follows:

W^{r+1}_{N_4}(Q^{r+1}_{N_4,2}) = Σ_{M ∈ Q^{r+1}_{N_4,2}} (1 − a(M,N_4)/δα_{r+1}) (1 − b(M,N_4)/δβ_{r+1})
W^{r+1}_{N_5}(Q^{r+1}_{N_4,2}) = Σ_{M ∈ Q^{r+1}_{N_4,2}} (1 − a(M,N_4)/δα_{r+1}) (b(M,N_4)/δβ_{r+1})
W^{r+1}_{N_7}(Q^{r+1}_{N_4,2}) = Σ_{M ∈ Q^{r+1}_{N_4,2}} (a(M,N_4)/δα_{r+1}) (1 − b(M,N_4)/δβ_{r+1})
W^{r+1}_{N_8}(Q^{r+1}_{N_4,2}) = Σ_{M ∈ Q^{r+1}_{N_4,2}} (a(M,N_4)/δα_{r+1}) (b(M,N_4)/δβ_{r+1})

Now the same point M ∈ Q^{r+1}_{N_4,2} at resolution r can be computed through the bilinear interpolation filling. This may be written for N^r_{(i,j)}:

W^r_{(i,j)}(Q^{r+1}_{N_4,2}) = Σ_{M ∈ Q^{r+1}_{N_4,2}} (1 − a(M, N^r_{(i,j)})/δα_r) (1 − b(M, N^r_{(i,j)})/δβ_r)   (4)

From Tab. I, and having 2δα_{r+1} = δα_r, Eq. 4 can be rewritten:

W^r_{(i,j)}(Q^{r+1}_{N_4,2}) = W^r_{(i,j)}(Q^r_{(i,j),2,2})
= Σ_{M ∈ Q^{r+1}_{N_4,2}} (1 − (a(M,N_4) + δα_{r+1})/(2δα_{r+1})) (1 − (b(M,N_4) + δβ_{r+1})/(2δβ_{r+1}))
= Σ_{M ∈ Q^{r+1}_{N_4,2}} (1/2)(1 − a(M,N_4)/δα_{r+1}) · (1/2)(1 − b(M,N_4)/δβ_{r+1})
= (1/4) W^{r+1}_{N_4}(Q^{r+1}_{N_4,2})
(5)
Using the same arguments, we can find:

W^r_{(i,j)}(Q^r_{(i,j),2,0}) = W^{r+1}_{N_0}(Q^r_{(i,j),2,0}) + (1/2) W^{r+1}_{N_1}(Q^r_{(i,j),2,0}) + (1/2) W^{r+1}_{N_3}(Q^r_{(i,j),2,0}) + (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,0})
W^r_{(i,j)}(Q^r_{(i,j),2,1}) = (1/2) W^{r+1}_{N_1}(Q^r_{(i,j),2,1}) + (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,1})
W^r_{(i,j)}(Q^r_{(i,j),2,3}) = (1/2) W^{r+1}_{N_3}(Q^r_{(i,j),2,3}) + (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,3})
(6)
Thus

W^r_{(i,j)}(Q^r_{(i,j),2}) = Σ_{n=0}^{3} W^r_{(i,j)}(Q^r_{(i,j),2,n})
= W^{r+1}_{N_0}(Q^r_{(i,j),2,0}) + (1/2) W^{r+1}_{N_1}(Q^r_{(i,j),2,0}) + (1/2) W^{r+1}_{N_3}(Q^r_{(i,j),2,0}) + (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,0})
+ (1/2) W^{r+1}_{N_1}(Q^r_{(i,j),2,1}) + (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,1})
+ (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,2})
+ (1/2) W^{r+1}_{N_3}(Q^r_{(i,j),2,3}) + (1/4) W^{r+1}_{N_4}(Q^r_{(i,j),2,3})
= Σ_{n=0}^{3} Σ_{m=0}^{8} (1 − a_{N_m}/δα_r)(1 − b_{N_m}/δβ_r) W^{r+1}_{N_m}(Q^r_{(i,j),2,n})
(7)
The same arguments hold for the other points and prove the lemma.
The multi-resolution computation of the spin-images is done by first computing the most precise spin-image through examination of every point. For each point of the spin-image, four densities corresponding to each quadrant are stored. For lower-resolution images, the density is computed using the position of the point with respect to the quadrant considered and Eq. 2.
It should be stressed here that in our current implementation, only the spin-images are subjected to a multi-resolution scheme. In this first step, no sub-sampling of the mesh has been applied. Thus, while the size of the spin-images decreases in this process, the number of points does not.
IV. SEARCH PROCESS

Fig. 5. A 3D mesh extracted from the Stanford Bunny flying in the OpenHRP simulator. The scene is cut according to the bounding box of the model.

The search process described here is based on a 3D mesh. This can be either a single view of the environment or an incrementally built representation. In our current implementation, it is a single view provided by the stereoscopic system. In the following, it is called the scene. The scene is divided into sub-blocks. The sub-block size is given by the bounding box of the searched object, as depicted in Fig. 5. On each of the sub-blocks the following algorithm is applied:
1) Select the best resolution according to the average edge length;
2) Get the main rigid transformation which projects the model into the scene;
3) Check if the model is in the scene using the previously computed rigid transformation. This provides a main correlation coefficient, and the position plus orientation of the seen object in the scene.
A. Selection of the best resolution
From section III, the object resolution is the average edge length in the scene. Then the resolution for the model’s spin-images is chosen according to Eq. 1. Two spin-images (p, q) with the same resolution are compared using the following correlation function, as proposed in [3]:

R = (N Σ_i p_i q_i − Σ_i p_i Σ_i q_i) / sqrt[ (N Σ_i p_i² − (Σ_i p_i)²) (N Σ_i q_i² − (Σ_i q_i)²) ],  R ∈ [−1; 1]   (8)

with N the number of non-empty points in the spin-image of the scene, and the sums taken over those points. This correlation can be proven to be independent of the normalisation of a spin-image. Thus, during the multi-resolution phase, the spin-images are not normalised.
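Eq. 8 is a normalised (Pearson-style) correlation over the non-empty scene bins; a minimal sketch (our naming; we assume "non-empty" means strictly positive density in the scene spin-image):

```python
import numpy as np

def spin_image_correlation(p, q):
    """Correlation R of Eq. (8) between a scene spin-image p and a model
    spin-image q of the same resolution, restricted to the N non-empty
    bins of p. Scale-invariant, so unnormalised spin-images compare fine."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    mask = p > 0                      # the N non-empty points of the scene image
    p, q = p[mask], q[mask]
    n = p.size
    num = n * np.dot(p, q) - p.sum() * q.sum()
    den = np.sqrt((n * np.dot(p, p) - p.sum() ** 2)
                  * (n * np.dot(q, q) - q.sum() ** 2))
    return num / den if den > 0 else 0.0
```

The scale invariance is visible directly: multiplying q by any positive constant scales numerator and denominator identically, which is why the multi-resolution spin-images can be left unnormalised.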
B. Rigid transformation evaluation
The main rigid transformation is obtained as follows:
Some points are randomly selected in the scene. Their cor-
responding points in the model are searched by comparing

Citations
More filters

Journal ArticleDOI
TL;DR: The paper describes how real-time — or high-bandwidth — cognitive processes can be obtained by combining vision with walking and the central point of the methodology is to use appropriate models to reduce the complexity of the search space.
Abstract: Aiming at building versatile humanoid systems, we present in this paper the real-time implementation of behaviors which integrate walking and vision to achieve general functionalities. The paper describes how real-time — or high-bandwidth — cognitive processes can be obtained by combining vision with walking. The central point of our methodology is to use appropriate models to reduce the complexity of the search space. We will describe the models introduced in the different blocks of the system and their relationships: walking pattern, self-localization and map building, real-time reactive vision behaviors, and planning.

28 citations


Journal ArticleDOI
TL;DR: A systematic literature review concerning 3D object recognition and classification published between 2006 and 2016 is presented, using the methodology for systematic review proposed by Kitchenham.
Abstract: In this paper, we present a systematic literature review concerning 3D object recognition and classification. We cover articles published between 2006 and 2016 available in three scientific databases (ScienceDirect, IEEE Xplore and ACM), using the methodology for systematic review proposed by Kitchenham. Based on this methodology, we used tags and exclusion criteria to select papers about the topic under study. After the works selection, we applied a categorization process aiming to group similar object representation types, analyzing the steps applied for object recognition, the tests and evaluation performed and the databases used. Lastly, we compressed all the obtained information in a general overview and presented future prospects for the area.

26 citations


Cites methods from "3D object recognition using spin-im..."

  • ...Other works employing the SI for object matching and representation are: Stasse [277] presents a multi-resolution SI approach for object representation and recognition; Assfalg [17] shows a SI variation called Spin Image Signatures (SIS), which is developed under the SI approach with adaptations to support effective retrieval by content; Li [163] demonstrates a framework to identify partial 3D format in 3D CAD parts using the SI as descriptor; Ping [233] proposes the Tsallis entropy use to generate a concise SI representation, called Tsallis Entropy vector of Spin Image (TESI); Choi [47] proposed an improved SI version, which enhances the format discrimination performance, called Angular-Partitioned Spin Images (APSIs)....

    [...]


Proceedings ArticleDOI
09 May 2011
TL;DR: A shape model-based approach using stereo vision and machine learning for object categorization is introduced allowing proper categorization of unknown objects even when object appearance and shape substantially differ from the training set.
Abstract: Humanoid robots should be able to grasp and handle objects in the environment, even if the objects are seen for the first time. A plausible solution to this problem is to categorize these objects into existing classes with associated actions and functional knowledge. So far, efforts on visual object categorization using humanoid robots have either been focused on appearance-based methods or have been restricted to object recognition without generalization capabilities. In this work, a shape model-based approach using stereo vision and machine learning for object categorization is introduced. The state-of-the-art features for shape matching and shape retrieval were evaluated and selectively transfered into the visual categorization. Visual sensing from different vantage points allows the reconstruction of 3D mesh models of the objects found in the scene by exploiting knowledge about the environment for model-based segmentation and registration. These reconstructed 3D mesh models were used for shape feature extraction for categorization and provide sufficient information for grasping and manipulation. Finally, the visual categorization was successfully performed with a variety of features and classifiers allowing proper categorization of unknown objects even when object appearance and shape substantially differ from the training set. Experimental evaluation with the humanoid robot ARMAR-IIIa is presented.

17 citations


Cites methods from "3D object recognition using spin-im..."

  • ...In [14], spin images were used in a 3D object detection system with the humanoid robot HRP-2 [18]....

    [...]

  • ...Spin images are shape descriptors which have been applied to surface matching [12], object recognition [13][14], 3D registration [15] and 3D object retrieval [16]....

    [...]


Proceedings Article
01 Dec 2008
TL;DR: The current status of the group in trying to make a humanoid robot autonomously build an internal representation of an object, and later on to find it in an unknown environment named "treasure hunting" is described.
Abstract: This paper intends to describe the current status of our group in trying to make a humanoid robot autonomously build an internal representation of an object, and later on to find it in an unknown environment. This problem is named "treasure hunting". In both cases, the main difficulty is to be able to find the next best position of the vision sensor in order to realize the behavior while taking care of the robots limitation. We briefly describe the models and the processes we are currently investigating in building this overall behavior. Along the description we stress the current key problems faced while trying to solve this problem.

8 citations


Cites methods from "3D object recognition using spin-im..."

  • ...Depending on the task different recognitions can be used, as we have at our disposal either a 3D-edge model [25] or a Spin-Image [26]....



Dissertation
04 Apr 2013
TL;DR: The last part of this thesis tries to draw some directions where innovative ideas may break some current technical locks in humanoid robotics.
Abstract: This manuscript presents my research activities on real-time vision-based behaviors for complex robots such as humanoids. The main scientific question structuring this work is the following: "What are the decisional processes that make it possible for a humanoid robot to generate motion in real-time based upon visual information?" In soccer, humans can decide to kick a ball while running and while all the other players are constantly moving. When recast as an optimization problem for a humanoid robot, finding a solution for such a behavior is generally computationally hard. For instance, the problem of visual search considered in this work is NP-complete. The first part of this work concerns real-time motion generation. Starting from the general constraints that a humanoid robot has to fulfill to generate a feasible motion, some core problems are presented. From this, several contributions allowing a humanoid robot to react to changes in the environment are presented. They revolve around walking pattern generation, whole-body motion for obstacle avoidance, and real-time foot-step planning in constrained environments. The second part of this work concerns real-time acquisition of knowledge about the environment through computer vision. Two main behaviors are considered: visual search and visual object model construction. They are treated as a whole, taking into account the model of the sensor, the motion cost, the mechanical constraints of the robot, the geometry of the environment, as well as the limitations of the vision processes. In addition, contributions on coupling Self Localization and Map Building with walking, and on real-time foot-step generation based on visual servoing, are presented. Finally, the core technologies developed in the previous contexts were used in different applications: human-robot interaction, tele-operation, and human behavior analysis.
Based upon the feedback from several integrated demonstrators on the humanoid robot HRP-2, the last part of this thesis tries to draw some directions where innovative ideas may break some current technical locks in humanoid robotics.

7 citations


References

Journal ArticleDOI
Abstract: We present a 3D shape-based object recognition system for simultaneous recognition of multiple objects in scenes containing clutter and occlusion. Recognition is based on matching surfaces by matching points using the spin image representation. The spin image is a data level shape descriptor that is used to match surfaces represented as surface meshes. We present a compression scheme for spin images that results in efficient multiple object recognition which we verify with results showing the simultaneous recognition of multiple objects from a library of 20 models. Furthermore, we demonstrate the robust performance of recognition in the presence of clutter and occlusion through analysis of recognition trials on 100 scenes.
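The spin-image representation described above can be sketched in a few lines. The following is an illustrative sketch, not the authors' implementation: for an oriented point (p, n), every surface point x is mapped to cylindrical coordinates (alpha, beta) relative to the normal axis and accumulated into a 2D histogram, here using the bilinear filling scheme that smooths the effect of noise by sharing each point's contribution among the four surrounding bins. The function name and the parameters `bin_size` and `image_width` are arbitrary choices for the example.

```python
import numpy as np

def spin_image(points, p, n, bin_size=0.1, image_width=10):
    """Accumulate a spin image at the oriented point (p, n).

    alpha: radial distance from the normal axis through p.
    beta:  signed height along the normal n.
    Bilinear interpolation spreads each point over the 4
    surrounding bins, which reduces sensitivity to noise.
    """
    n = n / np.linalg.norm(n)
    img = np.zeros((2 * image_width, image_width))  # rows: beta, cols: alpha
    for x in points:
        d = x - p
        beta = np.dot(n, d)
        alpha = np.sqrt(max(np.dot(d, d) - beta * beta, 0.0))
        # Continuous bin coordinates (beta axis is centred vertically).
        i = image_width - beta / bin_size
        j = alpha / bin_size
        i0, j0 = int(np.floor(i)), int(np.floor(j))
        if 0 <= i0 < 2 * image_width - 1 and 0 <= j0 < image_width - 1:
            a, b = i - i0, j - j0
            img[i0,     j0]     += (1 - a) * (1 - b)
            img[i0 + 1, j0]     += a * (1 - b)
            img[i0,     j0 + 1] += (1 - a) * b
            img[i0 + 1, j0 + 1] += a * b
    return img
```

Each in-range point contributes a total mass of exactly 1 to the image, split among four bins, so the image sum counts the points that fell inside the support region.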

2,596 citations


Proceedings ArticleDOI
23 Jun 2003
TL;DR: The limitations of canonical alignment are described and an alternate method, based on spherical harmonics, for obtaining rotation invariant representations is discussed, which reduces the dimensionality of the descriptor, providing a more compact representation, which in turn makes comparing two models more efficient.
Abstract: One of the challenges in 3D shape matching arises from the fact that in many applications, models should be considered to be the same if they differ by a rotation. Consequently, when comparing two models, a similarity metric implicitly provides the measure of similarity at the optimal alignment. Explicitly solving for the optimal alignment is usually impractical. So, two general methods have been proposed for addressing this issue: (1) Every model is represented using rotation invariant descriptors. (2) Every model is described by a rotation dependent descriptor that is aligned into a canonical coordinate system defined by the model. In this paper, we describe the limitations of canonical alignment and discuss an alternate method, based on spherical harmonics, for obtaining rotation invariant representations. We describe the properties of this tool and show how it can be applied to a number of existing, orientation dependent descriptors to improve their matching performance. The advantages of this tool are two-fold: First, it improves the matching performance of many descriptors. Second, it reduces the dimensionality of the descriptor, providing a more compact representation, which in turn makes comparing two models more efficient.
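The rotation-invariance idea in this abstract can be illustrated numerically: a rotation mixes spherical-harmonic coefficients only within a fixed degree l, so the per-degree power (the sum of squared coefficient magnitudes) is a rotation-invariant signature. The sketch below, which is a demonstration rather than the paper's method, checks this for a degree-2 function on the sphere; the coefficient values, grid resolution, and rotation angle are arbitrary choices for the example.

```python
import numpy as np

def Y2(m, theta, phi):
    """Orthonormal degree-2 complex spherical harmonics (Condon-Shortley)."""
    st, ct = np.sin(theta), np.cos(theta)
    if m == 0:
        return np.sqrt(5 / (16 * np.pi)) * (3 * ct**2 - 1) + 0j
    if abs(m) == 1:
        r = np.sqrt(15 / (8 * np.pi)) * st * ct
        return (-r if m == 1 else r) * np.exp(1j * m * phi)
    r = np.sqrt(15 / (32 * np.pi)) * st**2
    return r * np.exp(1j * m * phi)

# Arbitrary coefficients of a band-limited function (degree 2 only), m = -2..2.
coeffs = [0.5 + 0.2j, 0.1j, 1.0 + 0j, -0.3 + 0j, 0.25 - 0.1j]

def f(theta, phi):
    return sum(c * Y2(m, theta, phi) for m, c in zip(range(-2, 3), coeffs))

# Midpoint quadrature grid over the sphere (theta: polar, phi: azimuth).
nt, nph = 400, 400
t = (np.arange(nt) + 0.5) * np.pi / nt
p = (np.arange(nph) + 0.5) * 2 * np.pi / nph
T, P = np.meshgrid(t, p, indexing="ij")
w = np.sin(T) * (np.pi / nt) * (2 * np.pi / nph)  # solid-angle weights

def project(vals):
    """Recover degree-2 coefficients by integrating against conj(Y2)."""
    return np.array([np.sum(vals * np.conj(Y2(m, T, P)) * w)
                     for m in range(-2, 3)])

def rotated_angles(theta, phi, a):
    """Pull the grid points back through a rotation about the x-axis,
    so f evaluated here samples the rotated function f(R^{-1} x)."""
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    c, s = np.cos(a), np.sin(a)
    y2, z2 = c * y + s * z, -s * y + c * z
    return np.arccos(np.clip(z2, -1, 1)), np.arctan2(y2, x)

c_before = project(f(T, P))
c_after = project(f(*rotated_angles(T, P, 0.7)))

# Individual coefficients change, but the degree-2 power does not.
power_before = np.sum(np.abs(c_before) ** 2)
power_after = np.sum(np.abs(c_after) ** 2)
```

Discarding the phase relations between degrees is what costs these descriptors some discriminative power, which is the trade-off the paper discusses against canonical alignment.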

1,375 citations


"3D object recognition using spin-im..." refers methods in this paper

  • ...Towards the design of a search engine for databases of CAD models, several 3D descriptors have been proposed to build signatures of 3D objects [3], [4], [5]....



Proceedings ArticleDOI
27 Sep 2004
TL;DR: The development of humanoid robot HRP-2 is presented, including its appearance design, mechanisms, electrical systems, specifications, and the features upgraded from its prototype.
Abstract: The development of humanoid robot HRP-2 is presented in this paper. HRP-2 is a humanoid robotics platform, which we developed in phase two of HRP. HRP was a humanoid robotics project run by the Ministry of Economy, Trade and Industry (METI) of Japan for five years, from FY1998 to FY2002. The biped locomotion ability of HRP-2 is improved so that HRP-2 can cope with uneven surfaces, can walk at two thirds of human speed, and can walk on a narrow path. The whole-body motion ability of HRP-2 is also improved so that HRP-2 can get up by itself if it tips over safely. In this paper, the appearance design, the mechanisms, the electrical systems, specifications, and features upgraded from its prototype are also introduced.

882 citations


"3D object recognition using spin-im..." refers background in this paper

  • ...The targeted application is a “Treasure hunting” behaviour on a HRP-2 humanoid robot [6]....



Book ChapterDOI
11 May 2004
TL;DR: Two new regional shape descriptors are introduced: 3D shape contexts and harmonic shape contexts that outperform the others on cluttered scenes on recognition of vehicles in range scans of scenes using a database of 56 cars.
Abstract: Recognition of three dimensional (3D) objects in noisy and cluttered scenes is a challenging problem in 3D computer vision. One approach that has been successful in past research is the regional shape descriptor. In this paper, we introduce two new regional shape descriptors: 3D shape contexts and harmonic shape contexts. We evaluate the performance of these descriptors on the task of recognizing vehicles in range scans of scenes using a database of 56 cars. We compare the two novel descriptors to an existing descriptor, the spin image, showing that the shape context based descriptors have a higher recognition rate on noisy scenes and that 3D shape contexts outperform the others on cluttered scenes.

844 citations


"3D object recognition using spin-im..." refers methods in this paper

  • ...Towards the design of a search engine for databases of CAD models, several 3D descriptors have been proposed to build signatures of 3D objects [3], [4], [5]....



Proceedings ArticleDOI
14 Oct 2008
TL;DR: The development of humanoid robot HRP-3 is presented; its main mechanical and structural components are designed to prevent the penetration of dust or spray, and its wrist and hand are newly designed to improve manipulation.
Abstract: In this paper, the development of humanoid robot HRP-3 is presented. HRP-3, which stands for Humanoid Robotics Platform-3, is a human-size humanoid robot developed as the succeeding model of HRP-2. One of the features of HRP-3 is that its main mechanical and structural components are designed to prevent the penetration of dust or spray. Another is that its wrist and hand are newly designed to improve manipulation. Software for a humanoid robot in a real environment is also improved. We include information on the mechanical features of HRP-3 together with the newly developed hand, as well as the technologies implemented in the HRP-3 prototype. Electrical features and some experimental results using HRP-3 are also presented.

699 citations


"3D object recognition using spin-im..." refers background in this paper

  • ...The targeted application is a “Treasure hunting” behaviour on a HRP-2 humanoid robot [6]....



Frequently Asked Questions (1)
Q1. What are the contributions mentioned in the paper "3d object recognition using spin-images for a humanoid stereoscopic vision system" ?

This paper presents a 3D object recognition method based on spin-images for a humanoid robot having a stereoscopic vision system.