Journal ArticleDOI

Isolated 3D object recognition through next view planning

01 Jan 2000-Vol. 30, Iss: 1, pp 67-76
TL;DR: A new online recognition scheme based on next view planning for the identification of an isolated 3D object using simple features is presented, built on a probabilistic reasoning framework for recognition and planning.
Abstract: In many cases, a single view of an object may not contain sufficient features to recognize it unambiguously. This paper presents a new online recognition scheme based on next view planning for the identification of an isolated 3D object using simple features. The scheme uses a probabilistic reasoning framework for recognition and planning. Our knowledge representation scheme encodes feature based information about objects as well as the uncertainty in the recognition process. This is used both in the probability calculations as well as in planning the next view. Results clearly demonstrate the effectiveness of our strategy for a reasonably complex experimental set.

Summary (2 min read)

Introduction

  • A hierarchical knowledge representation scheme facilitates recognition and the planning process.
  • A single view may not contain sufficient features to recognize the object unambiguously.
  • A simple feature set is applicable for a larger class of objects than a model base specific complex feature set.
  • The purpose of this paper is to investigate the use of suitably planned multiple views and two-dimensional (2-D) invariants for 3-D object recognition.

A. Relation with Other Work

  • Tarabanis et al. [5] survey the field of sensor planning for vision tasks.
  • The next view planning strategy acts on the basis of these hypotheses.
  • The authors use a hierarchical knowledge representation scheme which not only ensures a low-order polynomial-time complexity of the hypothesis generation process, but also plays an important role in planning the next view.
  • There are six aspects of the object shown, belonging to three classes.

A. Class Identification, Accounting for Uncertainty

  • 2) Class Probability Calculations Using the Knowledge Representation Scheme: In (2), P(f_jk | C_i) is 1 for those classes which have a link from feature-class f_jk.
  • The computation of (2) takes O(N_C) time—this is done for each feature-class.
  • Due to errors possible in the feature detection process, a degree of uncertainty is associated with the evidence.
  • The summation reduces to one term, P(C_i | f_jr) · p_jrk.

B. Object Identification

  • Based on the outcome of the class recognition scheme, the authors estimate the object probabilities as follows.
  • A particular movement may preclude the occurrence of some aspects for a given class observed.
  • Let c_ij and a_ij represent the minimum angles necessary to move out of the current assumed aspect in the clockwise and counterclockwise directions, respectively.
  • The authors construct search tree nodes corresponding to both moves.
  • From these, the authors finally select one with the minimum total movement.

A. The Planning Process and Object Recognition

  • In their object identification algorithm, aspect and object probabilities are initialized to their a priori values.
  • Else, the algorithm initiates the search process to get the best distinguishing move to resolve the ambiguity associated with this view.
  • The planning scheme is global—its reactive nature incorporates all previous movements and observations, both in the probability calculations (Section III-B) and in the planning process.
  • The authors' robust class recognition algorithm can recover from many feature detection errors at the class recognition phase itself (Section III-A-2).
  • Let denote the angular extent of the smallest aspect observed so far.

B. Bounds on the Number of Observations

  • It is instructive to consider bounds on T_avg(n), the number of observations required to disambiguate between a set of n aspects (corresponding to the initially observed class).
  • An interesting case is observed in Fig. 10(c) and (f)—an opportunistic case when the number of steps with primary moves is less than the one with both primary and auxiliary moves.
  • 3) Ordering of Feature Detectors: The third image in Fig. 9(a) shows the advantage of their scheduling of feature detectors.
  • 7) Average Number of Observations for a Given Number of Competing Aspects.

A. Experiments with Model Base II

  • The authors use the number of horizontal and vertical lines (⟨hv⟩) and the number of circles (⟨c⟩) as features.
  • The recognition scheme has the ability to correctly identify objects even when they have a large number of similar views.


IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 30, NO. 1, JANUARY 2000 67
Correspondence
Isolated 3-D Object Recognition through Next View
Planning
Sumantra Dutta Roy, Santanu Chaudhury, and Subhashis Banerjee
Abstract—In many cases, a single view of an object may not contain sufficient features to recognize it unambiguously. This paper presents a new on-line recognition scheme based on next view planning for the identification of an isolated three-dimensional (3-D) object using simple features. The scheme uses a probabilistic reasoning framework for recognition and planning. Our knowledge representation scheme encodes feature based information about objects as well as the uncertainty in the recognition process. This is used both in the probability calculations as well as in planning the next view. Results clearly demonstrate the effectiveness of our strategy for a reasonably complex experimental set.
Index Terms—Active vision, reactive planning, 3-D object recognition.
I. INTRODUCTION
In this paper, we present a new on-line scheme for the recognition
of an isolated three-dimensional (3-D) object using reactive next view
planning. A hierarchical knowledge representation scheme facilitates
recognition and the planning process. The planning process utilizes
the current observation and past history for identifying a sequence of
moves to disambiguate between similar objects.
Most model-based object recognition systems consider the problem
of recognizing objects from the image of a single view [1]–[4]. How-
ever, a single view may not contain sufficient features to recognize
the object unambiguously. In fact, two objects may have all views in
common with respect to a given feature set, and may be distinguished
only through a sequence of views. Further, in recognizing 3-D objects
from a single view, recognition systems often use complex feature sets
[2]. In many cases, it may be possible to achieve the same, incurring less
error and smaller processing cost using a simpler feature set and suit-
ably planning multiple observations. A simple feature set is applicable
for a larger class of objects than a model base specific complex feature
set. Model base-specific complex features such as 3-D invariants have
been proposed only for special cases so far (e.g., [3]). The purpose of
this paper is to investigate the use of suitably planned multiple views
and two-dimensional (2-D) invariants for 3-D object recognition.
A. Relation with Other Work
With an active sensor, object recognition involves identification of a
view of an object and if necessary, planning further views. Tarabanis
et al. [5] survey the field of sensor planning for vision tasks. We can
compare various active 3-D object recognition systems on the basis of
the following four issues.
1) Nature of the Next View Planning Strategy: The system should
plan moves with maximum ability to discriminate between views
Manuscript received October 23, 1997; revised May 5, 1998.
S. Dutta Roy and S. Banerjee are with the Department of Computer Sci-
ence and Engineering, Indian Institute of Technology, New Delhi-110 016, India
(e-mail: sumantra@ee.iitd.ernet.in; suban@cse.iitd.ernet.in).
S. Chaudhury is with the Department of Electrical Engineering, Indian Insti-
tute of Technology, New Delhi 110 016, India.
Publisher Item Identifier S 1083-4427(00)01177-2.
common to more than one object in the model base. The cost in-
curred in this process should also be minimal. The system should,
preferably be on-line and reactive—the past and present inputs
should guide the planning mechanism at each stage.
While the scheme of Maver and Bajcsy [6] is on-line, that of
Gremban and Ikeuchi [7] is not. Due to the combinatorial nature
of the problem, an off-line approach may not always be feasible.
2) Uncertainty Handling Capability of the Hypothesis Generation
Mechanism: The occlusion-based next view planning approach
of Maver and Bajcsy [6], as well as that of Gremban and Ikeuchi
[7] are essentially deterministic. A probabilistic strategy can
make the system more robust and resistant to errors compared to
a deterministic one. Dickinson et al. [8] use Bayesian methods
to handle uncertainty, while Hutchinson and Kak [9] use the
Dempster–Shafer theory.
3) Efficient Representation of Domain Knowledge: The knowledge
representation scheme should support an efficient mechanism
to generate hypotheses on the basis of the evidence received. It
should also play a role in optimally planning the next view.
Dickinson et al. [8] use a hierarchical representation scheme
based on volumetric primitives, which are associated with a high
feature extraction cost. Due to the non-hierarchical nature of
Hutchinson and Kak’s system [9], many redundant hypotheses
are proposed, which have to be later removed through consis-
tency checks.
4) Speed and Efficiency of Algorithms for Both Hypothesis Gen-
eration and Next View Planning: It is desirable to have algo-
rithms with low order polynomial-time complexity to generate
hypotheses accurately and fast. The next view planning strategy
acts on the basis of these hypotheses.
In Hutchinson and Kak’s system [9], although the poly-
nomial-time formulation overcomes the exponential time
complexity associated with assigning beliefs to all possible
hypotheses, their system still has the overhead of intersection
computation in creating common frames of discernment. Con-
sistency checks have to be used to remove the many redundant
hypotheses produced earlier. Though Dickinson et al. [8] use
Bayes nets for hypothesis generation, their system incurs the
overhead of tracking the region of interest through successive
frames.
The next view planning strategy that this paper presents is reactive
and on-line—the evidence obtained from each view is used in the hy-
pothesis generation and the planning process. Our probabilistic hypoth-
esis generation mechanism can handle cases of feature detection errors.
We use a hierarchical knowledge representation scheme which not only
ensures a low-order polynomial-time complexity of the hypothesis gen-
eration process, but also plays an important role in planning the next
view. The hierarchy itself enforces different constraints to prune the
set of possible hypotheses. The scheme is independent of the type of
features used, unlike that of [8]. We present results of over 100 exper-
iments with our recognition scheme on two sets of models. Extensive
experimentation shows the effectiveness of our proposed strategy of
using simple features and multiple views for recognizing complex 3-D
shapes.
The organization of the rest of the paper is as follows: Section II
presents our knowledge representation scheme. We discuss hypothesis
generation for class and object recognition in Section III. Section IV
1083-4427/00$10.00 © 2000 IEEE

describes our algorithm for planning the next view. In Section V we
demonstrate the working of our system on two sets of objects. We sum-
marize the salient features of our scheme and discuss areas for further
work in Section VI.
II. THE KNOWLEDGE REPRESENTATION SCHEME
A view of a 3-D object is characterized by a set of features. With re-
spect to a particular feature set and over a particular range of viewing
angles, a view of a 3-D object is independent of the viewpoint. Koen-
derink and van Doorn [10] define aspects as topologically equivalent
classes of object appearances. Ikeuchi et al. generalize this definition:
object appearances may be grouped into equivalence classes with re-
spect to a feature set. These equivalence classes are aspects [11]. In this
context, we define the following terms:
Class A: Class (or, aspect-class) is a set of aspects, equiva-
lent with respect to a feature set.
Feature-Class: A feature-class is a set of equivalent aspects de-
fined for one particular feature.
Fig. 1 shows a simple example of an object with its associated aspects and classes. The locus of view-directions is one-dimensional (1-D) and we assume orthographic projection. The basis of the different classes is the number of horizontal lines (h) and vertical lines (v) in a particular view of the object. Thus, a class may be represented as ⟨hv⟩. There are six aspects of the object shown, belonging to three classes. In this example, for simplicity we assume only one feature detector so that each feature-class is also a class.
We propose a new knowledge representation scheme encoding domain knowledge about the object, relations between different aspects, and the correspondence of these aspects with feature detectors. Fig. 2 illustrates an example of this scheme. We use this knowledge representation scheme both in belief updating as well as in next view planning. Sections III and IV discuss these topics, respectively. The representation scheme consists of two parts.
1) The Feature-Dependence Subnet: In the feature-dependence subnet, F represents the complete set of features {F_j} used for characterizing views.

A feature node F_j is associated with feature-classes f_jk. Factors such as noise and nonadaptive thresholds can introduce errors in the feature detection process. Let p_jlk represent the probability that the feature-class present is f_jl, given that the detector for feature F_j detects it to be f_jk. We define p_jlk as the ratio of the number of times the detector for feature F_j interprets feature-class f_jl as f_jk, and the number of times the feature detector reports the feature-class as f_jk. The F_j node stores a table of these values for its corresponding feature detector.

A class node C_i stores its a priori probability, P(C_i). A link between class C_i and feature-class f_jk indicates that f_jk forms a subset of features observed in C_i. This accounts for a PART-OF relation between the two. Thus, a class represents an n-vector [f_1j f_2j ... f_nj]. Since a class cannot be independent of any feature, each class has n input edges corresponding to the n features.
2) The Class-Aspect Subnet: The class-aspect subnet encodes the relationships between classes, aspects, and objects. O represents the set of all objects {O_i}.

An object node O_i stores its probability, P(O_i). An aspect node a_ij stores its angular extent θ_ij (in degrees), its probability P(a_ij), its parent class C_j, and its neighboring aspects.

Aspect a_ij has a PART-OF relationship with its parent object O_i. Thus, the 3-tuple ⟨O_i, C_j, θ_ik⟩ represents an aspect.
Fig. 1. Aspects and classes of an object.
Fig. 2. Example of the knowledge representation scheme.
Aspect node a_ij has exactly one link to any object (O_i) and exactly one link to its parent class C_j.
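As a concrete illustration, the two subnets can be written down as plain data structures. The sketch below is hypothetical: the toy model base (two objects, three classes, two features) is ours, not the paper's.

```python
# Hypothetical toy model base illustrating the representation scheme.
# Class-aspect subnet: an aspect is a 3-tuple <object, class, extent>.
ASPECTS = [
    # (object, parent class, angular extent in degrees)
    ("O1", "C1", 120.0),
    ("O1", "C2", 240.0),
    ("O2", "C1", 200.0),
    ("O2", "C3", 160.0),
]

# Feature-dependence subnet: a class is an n-vector of feature-classes,
# one per feature (here n = 2).
CLASS_FEATURES = {
    "C1": {"F1": "f11", "F2": "f21"},
    "C2": {"F1": "f12", "F2": "f21"},
    "C3": {"F1": "f11", "F2": "f22"},
}

def extent_sum(obj):
    """The aspects of an object partition its 360-degree view circle."""
    return sum(e for o, _, e in ASPECTS if o == obj)

print(extent_sum("O1"), extent_sum("O2"))  # 360.0 360.0
```

The consistency constraints the hierarchy enforces (each class has exactly n incoming feature edges, each aspect links to exactly one object and one class) can be checked directly on such structures.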
III. HYPOTHESIS GENERATION
The recognition system takes any arbitrary view of an object as input.
Using a set of features (the feature-classes), it generates hypotheses
about the likely identity of the class. This is, in turn, used for gener-
ating hypotheses about the object’s identity. The interaction of the hy-
pothesis generation part with the rest of the system is shown in Fig. 3.
Hypothesis generation consists of two steps, namely, class identification and object identification.
A. Class Identification, Accounting for Uncertainty
Our algorithm suitably schedules feature detectors to perform prob-
abilistic class identification. In what follows, we discuss its various as-
pects. Fig. 4 presents the overall algorithm.
1) Ordering of Feature Detectors: A proper ordering of feature de-
tectors speeds up the class recognition process. At any stage, we choose
the hitherto unused feature detector for which the feature-class corre-
sponding to the most probable class has the least number of outgoing
arcs, i.e., the least out-degree. This is done in order to obtain that fea-
ture-class which has the largest discriminatory power in terms of the
number of classes it could correspond to. For example, in Fig. 2, if all feature detectors are unused and C_2 has the highest a priori probability, F_3 will be tried first, followed by F_2 and F_1, if required.
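A minimal sketch of this ordering rule on an invented model base; out_degree counts the classes a feature-class links to, and breaking ties by list order is our assumption, not the paper's:

```python
# Toy class -> (feature -> feature-class) table; illustrative only.
CLASS_FEATURES = {
    "C1": {"F1": "f11", "F2": "f21", "F3": "f31"},
    "C2": {"F1": "f11", "F2": "f22", "F3": "f32"},
    "C3": {"F1": "f12", "F2": "f22", "F3": "f32"},
}

def out_degree(feature, feature_class):
    """Number of classes this feature-class has links to."""
    return sum(1 for fc in CLASS_FEATURES.values()
               if fc[feature] == feature_class)

def next_detector(most_probable_class, unused):
    """Pick the unused detector whose feature-class for the most
    probable class has the least out-degree (Section III-A-1)."""
    fcs = CLASS_FEATURES[most_probable_class]
    return min(unused, key=lambda f: out_degree(f, fcs[f]))

# For C1: f11 has out-degree 2; f21 and f31 each have out-degree 1,
# so F2 is scheduled first (it wins the tie with F3 by list order).
print(next_detector("C1", ["F1", "F2", "F3"]))  # F2
```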

Fig. 3. Flow diagram depicting the flow of information and control in our system.
Fig. 4. Class recognition algorithm.
2) Class Probability Calculations Using the Knowledge Representation Scheme: We obtain the a priori probability of class C_i as

    P(C_i) = Σ_p P(O_p) · Σ_q P(a_pq | O_p).    (1)

Here, aspects a_pq belong to class C_i. Let N_F, N_C, and N_a denote the number of feature-classes associated with feature detector F_j, the number of classes, and the number of aspects, respectively. P(a_pq | O_p) is θ_pq / 360. We can compute P(C_i) from our knowledge representation scheme by considering each aspect node belonging to an object and testing if it has a link to node C_i; this takes O(N_C + N_a) time. (The N_C term is for the initialization of class probabilities to 0.)
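A sketch of (1) on a toy model base with equiprobable objects and P(a_pq | O_p) = θ_pq/360; all numbers are illustrative:

```python
ASPECTS = [  # (object, class, angular extent in degrees); toy values
    ("O1", "C1", 120.0), ("O1", "C2", 240.0),
    ("O2", "C1", 200.0), ("O2", "C3", 160.0),
]
P_OBJ = {"O1": 0.5, "O2": 0.5}

def class_prior(ci):
    """P(C_i) = sum_p P(O_p) * sum_q P(a_pq | O_p) -- Eq. (1),
    where the inner sum runs over aspects of O_p belonging to C_i."""
    return sum(P_OBJ[o] * extent / 360.0
               for o, c, extent in ASPECTS if c == ci)

print(class_prior("C1"))  # 0.5*120/360 + 0.5*200/360
```

Because each object's aspects cover its whole view circle, the class priors sum to one.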
Let the detector for feature F_j report the feature-class obtained to be f_jk. Given this evidence, we obtain the probability of class C_i from the Bayes rule

    P(C_i | f_jk) = P(C_i) · P(f_jk | C_i) / Σ_m [P(C_m) · P(f_jk | C_m)].    (2)

P(f_jk | C_i) is 1 for those classes which have a link from feature-class f_jk. It is 0 for the rest. The computation of (2) takes O(N_C) time—this is done for each feature-class. Hence, the computation of P(f_jk | C_i) for all feature-classes f_jk for feature detector F_j takes time O(N_F · N_C).
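The update in (2) amounts to normalizing prior-times-likelihood over all classes, with a 0/1 likelihood read off the class/feature-class links. A sketch with hypothetical priors and links:

```python
# Toy priors and links; illustrative only. P(f_jk | C_i) is 1 iff
# class C_i has a link from feature-class f_jk, else 0.
PRIORS = {"C1": 0.4, "C2": 0.35, "C3": 0.25}
LINKS = {"f11": {"C1", "C2"}, "f12": {"C3"}}  # feature-class -> classes

def class_posterior(fjk):
    """Eq. (2): normalize P(C_i) * P(f_jk | C_i) over all classes."""
    num = {c: PRIORS[c] * (1.0 if c in LINKS[fjk] else 0.0)
           for c in PRIORS}
    z = sum(num.values())
    return {c: v / z for c, v in num.items()}

p = class_posterior("f11")
print(p)  # mass is shared by C1 and C2; C3 drops to 0
```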
For an error-free situation, P(C_i | f_jk) is P'(C_i), the a posteriori probability of class C_i. However, due to errors possible in the feature detection process, a degree of uncertainty is associated with the evidence. The value of P'(C_i) is, then,

    P'(C_i) = Σ_l P(C_i | f_jl) · p_jlk    (3)

where the f_jl's are feature-classes associated with feature F_j. According to our knowledge representation scheme, only one feature-class under feature F_j, say f_jr, has a link to class C_i. The summation reduces to one term, P(C_i | f_jr) · p_jrk. Thus, our knowledge representation scheme also enables recovery from feature detection errors.
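A sketch of (3). The confusion table p_jlk (probability that the true feature-class is f_jl when the detector reports f_jk) is invented for illustration; as in the text, only one feature-class per feature links to a given class, so each corrected value reduces to a single term:

```python
# p[l][k] = p_jlk: P(true feature-class f_jl | detector reports f_jk).
# Hypothetical values; each reported column sums to 1.
P_CONFUSION = {
    "f11": {"f11": 0.9, "f12": 0.2},
    "f12": {"f11": 0.1, "f12": 0.8},
}
# P(C_i | f_jl): indicator-like toy values, since exactly one
# feature-class under the feature links to each class.
POSTERIOR = {"f11": {"C1": 1.0, "C2": 0.0},
             "f12": {"C1": 0.0, "C2": 1.0}}

def corrected_posterior(reported):
    """P'(C_i) = sum_l P(C_i | f_jl) * p_jlk -- Eq. (3)."""
    classes = POSTERIOR[reported].keys()
    return {c: sum(POSTERIOR[fl][c] * P_CONFUSION[fl][reported]
                   for fl in POSTERIOR) for c in classes}

print(corrected_posterior("f11"))  # {'C1': 0.9, 'C2': 0.1}
```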
B. Object Identification

Based on the outcome of the class recognition scheme, we estimate the object probabilities as follows. Initially, we calculate the a priori probability of each aspect as

    P(a_jk) = P(O_j) · P(a_jk | O_j).    (4)

Fig. 5. (a) The notation used (Section IV) and (b) a case when our algorithm is not guaranteed to succeed (Section IV-A).
If there are N objects in the model base, we initialize P(O_j) to 1/N before the first observation. For the first observation, P(a_jk | O_j) is θ_jk / 360. A priori aspect probability calculations take O(N_a) time.

For any subsequent observation, we have to account for the movement in the probability calculations. For example, a particular movement may preclude the occurrence of some aspects for a given class observed. The value of P(a_jk | O_j) is given by

    P(a_jk | O_j) = θ'_jk / 360    (5)

where θ'_jk (θ'_jk ∈ [0, θ_jk]) represents the angular range possible within aspect a_jk for the move(s) taken to reach this position. Due to the movement made, we could have observed only m (0 ≤ m ≤ r) aspects out of a total of r aspects belonging to class C_i.
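Equations (4) and (5) can be sketched in one function: before the first view the reachable range θ' is the full aspect extent, and after a move it shrinks to the range still consistent with the movement. Numbers are illustrative:

```python
def aspect_prior(p_obj, extent, reachable=None):
    """Eq. (4): P(a_jk) = P(O_j) * P(a_jk | O_j), with P(a_jk | O_j)
    from Eq. (5): theta'/360. For the first observation theta' is the
    full extent; after a move it is the still-possible range."""
    theta = extent if reachable is None else reachable
    assert 0.0 <= theta <= extent  # theta' lies in [0, theta_jk]
    return p_obj * theta / 360.0

print(aspect_prior(0.5, 120.0))        # first view: 0.5 * 120/360
print(aspect_prior(0.5, 120.0, 40.0))  # after a move: 0.5 * 40/360
```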
Let the class recognition phase report the observed class to be C_i. Let us assume that C_i could have come from aspects a_{j_1 k_1}, a_{j_2 k_2}, ..., a_{j_m k_m}, where j_1, j_2, ..., j_m are not necessarily different. We obtain the a posteriori probability of aspect a_jk given this evidence using the Bayes rule

    P(a_jk | C_i) = P(a_jk) · P(C_i | a_jk) / Σ_{p=1}^{m} [P(a_{j_p k_p}) · P(C_i | a_{j_p k_p})].    (6)

P(C_i | a_jk) is 1 for aspects with a link to class C_i, 0 otherwise. Finally, we obtain the a posteriori probability

    P(O_j) = Σ_l P(a_{j k_l} | C_i)    (7)

where the aspects a_{j k_l} belong to class C_i.
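A sketch of (6) and (7) over a hypothetical set of competing aspects of the observed class; the 0/1 factor P(C_i | a) is implicit because only matching aspects are listed:

```python
# aspect -> (parent object, prior P(a)); toy values for the aspects
# whose class matches the observed one.
COMPETING = {"a11": ("O1", 0.10), "a21": ("O2", 0.25), "a22": ("O2", 0.15)}

def aspect_posteriors(competing):
    """Eq. (6): renormalize the priors of the competing aspects."""
    z = sum(p for _, p in competing.values())
    return {a: p / z for a, (_, p) in competing.items()}

def object_posteriors(competing):
    """Eq. (7): P(O_j) is the sum over O_j's competing aspects."""
    post = aspect_posteriors(competing)
    out = {}
    for a, (obj, _) in competing.items():
        out[obj] = out.get(obj, 0.0) + post[a]
    return out

print(object_posteriors(COMPETING))  # O2 collects most of the mass
```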
If the probability of some object is above a predetermined threshold
(experimentally determined, e.g., 0.87 for Model Base I), the algorithm
reports a success, and stops. If not, it means that the view of the object
is not sufficient to identify the object unambiguously. We have to take
the next view.
In our hierarchical scheme, the link conditional probabilities (rep-
resenting relations between nodes) themselves enforce consistency
checks at each level of evidence. The feature evidence is progressively
refined as it passes through different levels in the hierarchy, leading to
simpler evidence propagation and less computational cost. This is an
advantage of our scheme over that proposed in [9].
IV. NEXT VIEW PLANNING
The class observed in the class recognition phase could have come
from many aspects in the model base, each with its own range of po-
sitions within the aspect. Due to this ambiguity, one has to search for
Fig. 6. Partially constructed search tree.
Fig. 7. Object recognition algorithm.
the best move to discern between these competing aspects subject to
memory and processing limitations, if any. The parameters described
above characterize the state of the system. The planning process aims
to determine a move from the current step, which would uniquely iden-
tify the given object. We pose the planning problem as that of a forward
search in the state space which takes us to a state in which the aspect
list corresponding to the class observed has exactly one node. We use a
search tree for this purpose. A search tree node represents the following
information: [Fig. 5(a)] the unique class observed for the angular movement made so far, the aspects possible for this angle-class pair, and for each aspect, the range of positions possible within it (s_ij – e_ij). Here, s_ij and e_ij denote the two positions within aspect a_ij where the current viewpoint can be, as a result of the movement made thus far; s_ij ≤ e_ij, and s_ij, e_ij ∈ [0, θ_ij], where θ_ij is the angular extent of aspect a_ij. A leaf node is one which has either one aspect associated with it or corresponds to a total angular movement of 360° or more from the root node.
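A node of this search tree might be sketched as below; the field names and example values are ours, not the authors':

```python
from dataclasses import dataclass, field

@dataclass
class SearchNode:
    observed_class: str          # unique class for the movement so far
    total_move: float            # degrees moved from the root node
    # aspect -> (s, e): possible viewpoint positions inside the aspect
    ranges: dict = field(default_factory=dict)

    def is_leaf(self):
        """Leaf: one aspect left, or >= 360 degrees moved in total."""
        return len(self.ranges) == 1 or self.total_move >= 360.0

n = SearchNode("C1", 40.0, {"a11": (10.0, 30.0), "a21": (0.0, 20.0)})
print(n.is_leaf())  # False: two aspects still compete
```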
Fig. 6 shows an example of a partially constructed search tree. From
a view point, we categorize possible moves as follows.

Fig. 8. Model Base I: the objects, from left, are O_1, O_2, O_3, O_4, O_5, O_6, O_7, and O_8, respectively.
Fig. 9. Some experiments with Model Base I: initial class ⟨232⟩. The objects are O [(a) and (c)] and O [(b) and (d)], respectively. (a) ⟨232⟩ → ⟨231(221)⟩ → ⟨232⟩ → ⟨221⟩ → ⟨232⟩. (b) ⟨232⟩ → ⟨221⟩ → ⟨221⟩ → ⟨221⟩. (c) ⟨232⟩ → ⟨232⟩ → ⟨221⟩. (d) ⟨232⟩ → ⟨221⟩ → ⟨221⟩ → ⟨221⟩. The numbers above the arrows denote the number of turntable steps. A negative sign indicates a clockwise movement. (The figure in parentheses shows an example of recovery from feature detection errors.)
Primary Move: A primary move represents a move from an aspect by the minimum angle needed to move out of it.

Auxiliary Move: An auxiliary move represents a move from an aspect by an angle corresponding to the primary move of another competing aspect.

Let c_ij and a_ij represent the minimum angles necessary to move out of the current assumed aspect in the clockwise and counterclockwise directions, respectively. Three cases are possible.

1) Type I Move: c_ij and a_ij both take us out of the current aspect to a single aspect in each of the two directions—a_ip and a_iq, respectively. We construct search tree nodes corresponding to both moves.

2) Type II Move: Exactly one out of c_ij and a_ij takes us to a single aspect a_ip. For the other direction, the aspect we would reach depends upon the initial position (∈ [s_ij, e_ij]) in the current aspect. We construct a search tree node corresponding to the former move.

3) Type III Move: Whether we move in the clockwise or the counterclockwise direction, the aspect reached depends on the initial position in the current aspect. We choose the move which leads

Citations
Journal ArticleDOI
TL;DR: A broad survey of developments in active vision in robotic applications over the last 15 years is provided, e.g. object recognition and modeling, site reconstruction and inspection, surveillance, tracking and search, as well as robotic manipulation and assembly, localization and mapping, navigation and exploration.
Abstract: In this paper we provide a broad survey of developments in active vision in robotic applications over the last 15 years. With increasing demand for robotic automation, research in this area has received much attention. Among the many factors that can be attributed to a high-performance robotic system, the planned sensing or acquisition of perceptions on the operating environment is a crucial component. The aim of sensor planning is to determine the pose and settings of vision sensors for undertaking a vision-based task that usually requires obtaining multiple views of the object to be manipulated. Planning for robot vision is a complex problem for an active system due to its sensing uncertainty and environmental uncertainty. This paper describes such problems arising from many applications, e.g. object recognition and modeling, site reconstruction and inspection, surveillance, tracking and search, as well as robotic manipulation and assembly, localization and mapping, navigation and exploration. A bundle of solutions and methods have been proposed to solve these problems in the past. They are summarized in this review while enabling readers to easily refer solution methods for practical applications. Representative contributions, their evaluations, analyses, and future research trends are also addressed in an abstract level.

398 citations

Journal ArticleDOI
TL;DR: It is argued that the next step in the evolution of object recognition algorithms will require radical and bold steps forward in terms of the object representations, as well as the learning and inference algorithms used.

312 citations



Journal ArticleDOI
TL;DR: This paper surveys important approaches to active 3-D object recognition and reviews existing approaches towards another important application of an active sensor namely, that of scene analysis and interpretation.

138 citations



Patent
20 Feb 2008
TL;DR: In this paper, a view-based approach is presented that does not show the drawbacks of previous methods because it is robust to image noise, object occlusions, clutter, and contrast changes.
Abstract: The present invention provides a system and method for recognizing a 3D object in a single camera image and for determining the 3D pose of the object with respect to the camera coordinate system. In one typical application, the 3D pose is used to make a robot pick up the object. A view-based approach is presented that does not show the drawbacks of previous methods because it is robust to image noise, object occlusions, clutter, and contrast changes. Furthermore, the 3D pose is determined with a high accuracy. Finally, the presented method allows the recognition of the 3D object as well as the determination of its 3D pose in a very short computation time, making it also suitable for real-time applications. These improvements are achieved by the methods disclosed herein.

117 citations

Journal ArticleDOI
TL;DR: A hierarchical view-based approach that addresses typical problems of previous methods is applied and is robust to noise, occlusions, and clutter to an extent that is sufficient for many practical applications, and is invariant to contrast changes.
Abstract: This paper describes an approach for recognizing instances of a 3D object in a single camera image and for determining their 3D poses. A hierarchical model is generated solely based on the geometry information of a 3D CAD model of the object. The approach does not rely on texture or reflectance information of the object's surface, making it useful for a wide range of industrial and robotic applications, e.g., bin-picking. A hierarchical view-based approach that addresses typical problems of previous methods is applied: It handles true perspective, is robust to noise, occlusions, and clutter to an extent that is sufficient for many practical applications, and is invariant to contrast changes. For the generation of this hierarchical model, a new model image generation technique by which scale-space effects can be taken into account is presented. The necessary object views are derived using a similarity-based aspect graph. The high robustness of an exhaustive search is combined with an efficient hierarchical search. The 3D pose is refined by using a least-squares adjustment that minimizes geometric distances in the image, yielding a position accuracy of up to 0.12 percent with respect to the object distance, and an orientation accuracy of up to 0.35 degree in our tests. The recognition time is largely independent of the complexity of the object, but depends mainly on the range of poses within which the object may appear in front of the camera. For efficiency reasons, the approach allows the restriction of the pose range depending on the application. Typical runtimes are in the range of a few hundred ms.

115 citations


Cites background from "Isolated 3D object recognition thro..."

  • ...Because no 3D data but only a single monocular image is available in many cases, the automation level of various industrial processes can be improved significantly if the pose of such objects can be determined reliably from a single image....

    [...]

References
Book
31 Jul 1985
TL;DR: The book updates the research agenda with chapters on possibility theory, fuzzy logic and approximate reasoning, expert systems, fuzzy control, fuzzy data analysis, decision making and fuzzy set models in operations research.
Abstract: Fuzzy Set Theory - And Its Applications, Third Edition is a textbook for courses in fuzzy set theory. It can also be used as an introduction to the subject. The character of a textbook is balanced with the dynamic nature of the research in the field by including many useful references to develop a deeper understanding among interested readers. The book updates the research agenda (which has witnessed profound and startling advances since its inception some 30 years ago) with chapters on possibility theory, fuzzy logic and approximate reasoning, expert systems, fuzzy control, fuzzy data analysis, decision making and fuzzy set models in operations research. All chapters have been updated. Exercises are included.

7,877 citations


"Isolated 3D object recognition thro..." refers background in this paper

  • ...Introduction: Most model-based object recognition systems consider the problem of recognizing objects from the image of a single view. However, a single view may not contain sufficient features to recognize the object unambiguously. In single-view object recognition, systems often need to use complex…...

    [...]

Book
01 Sep 1991
TL;DR: This two-volume set is an authoritative, comprehensive, modern work on computer vision that covers all of the different areas of vision with a balanced and unified approach.
Abstract: From the Publisher: This two-volume set is an authoritative, comprehensive, modern work on computer vision that covers all of the different areas of vision with a balanced and unified approach. The discussion in "Volume I" focuses on image in, and image out or feature set out. "Volume II" covers the higher level techniques of illumination, perspective projection, analytical photogrammetry, motion, image matching, consistent labeling, model matching, and knowledge-based vision systems.

3,571 citations


"Isolated 3D object recognition thro..." refers methods in this paper

  • ...We represent a class as . We use Hough transform-based line and circle detectors [12]....

    [...]

  • ...1) Polyhedral Objects: We use as features the number of horizontal and vertical lines, and the number of nonbackground segmented regions in an image. We represent a class as . We use a Hough transform-based line detector [12]....

    [...]

  • ...For getting the number of regions in the image, we perform sequential labeling (connected components: pixel labeling) [12] on a thresholded gradient image....

    [...]
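The sequential-labeling step quoted above can be illustrated with a minimal two-pass connected-components sketch in Python (the binary image below is a made-up stand-in for a thresholded gradient image, not data from the paper):

```python
def label_regions(img):
    """Two-pass sequential labeling (4-connectivity) of a binary image.

    img: list of rows of 0/1 values. Returns (labels, n_regions), where
    labels assigns each foreground pixel a compact region number 1..n_regions.
    """
    rows, cols = len(img), len(img[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = {}  # union-find over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    next_label = 1
    # First pass: assign provisional labels, record label equivalences.
    for r in range(rows):
        for c in range(cols):
            if not img[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = up
                union(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1

    # Second pass: resolve equivalences and renumber compactly.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                if root not in remap:
                    remap[root] = len(remap) + 1
                labels[r][c] = remap[root]
    return labels, len(remap)

# Hypothetical binary image, e.g. the result of thresholding a gradient image.
img = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1],
]
labels, n_regions = label_regions(img)  # n_regions == 4
```

The region count `n_regions` is the kind of simple feature the paper uses alongside the Hough-based line counts.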

Journal ArticleDOI
TL;DR: This book introduces many novel mathematical operations based on the concept of level of confidence, presents many generalizations, and covers several operations and functions of fuzzy numbers, such as integer modulo operations, trigonometric functions, and hyperbolic functions.
Abstract: We were rather pleased to read the review of our book, Introduction to Fuzzy Arithmetic: Theory and Applications. This review was done quite carefully by Caroline M. Eastman of the University of South Carolina, and we are grateful to her for pointing out many interesting, positive aspects as well as some shortcomings of our book. As members of the fuzzy community, we are concerned with studies and developments of concepts and techniques basic to the analysis of uncertainty arising from human perception, thinking, and reasoning processes. In this book we present such concepts and some novel tools for dealing with uncertainties. We start our introduction with the definition for the interval of confidence [a1, a2], where a1 and a2 represent, respectively, the lower and upper bounds of our (subjective) confidence. Next, we introduce some arithmetic operations on these numbers. We then introduce the level of presumption α ∈ [0, 1] and, using it, introduce the uncertain or fuzzy number that is so pervasive in our reasoning process. The reviewer has rightly pointed out that in certain situations, interval arithmetic can be considered a subset of fuzzy arithmetic, the main topic of our book. However, we intentionally did not want to confuse the issue by introducing interval arithmetic and then giving a generalization. We liked our approach, as have many other researchers and students who have used the book. In our approach, we have been guided throughout by a desire to lay a firm foundation for the definition of fuzzy numbers using the basic concept of level of confidence. We have introduced many novel mathematical operations based on this concept and have presented many generalizations. In addition, we have presented several operations and functions of fuzzy numbers, such as integer modulo operations, trigonometric functions, and hyperbolic functions.
These studies have been included for students as well as researchers who wish to have an extended view of the theory. We have attempted to give a thorough exposition of fuzzy numbers; this exposition is illustrated by about 115 worked-out examples, 150 diagrams, and 90 tables. We did not include problems or exercises, which would have put this book in the category of a textbook. The subtitle of the book is "Theory and Applications," but as is rightly

2,238 citations


"Isolated 3D object recognition thro..." refers background in this paper

  • ... [3] A. Zisserman, D. Forsyth, J. Mundy, C. Rothwell, J. Liu, and N. Pillow,...

    [...]

  • ...From [3] and [6] we have the following properties of binary operations....

    [...]

  • ...Model base-specific complex features such as 3-D invariants have been proposed only for special cases so far (e.g., [3])....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a precise definition of the 3D object recognition problem is proposed, and basic concepts associated with this problem are discussed, and a review of relevant literature is provided.
Abstract: A general-purpose computer vision system must be capable of recognizing three-dimensional (3-D) objects. This paper proposes a precise definition of the 3-D object recognition problem, discusses basic concepts associated with this problem, and reviews the relevant literature. Because range images (or depth maps) are often used as sensor input instead of intensity images, techniques for obtaining, processing, and characterizing range data are also surveyed.

1,146 citations

Frequently Asked Questions (14)
Q1. What have the authors contributed in "Isolated 3D object recognition through next view planning"?

This paper presents a new on-line recognition scheme based on next view planning for the identification of an isolated three-dimensional ( 3-D ) object using simple features. 

The next view planning strategy that this paper presents is reactive and on-line—the evidence obtained from each view is used in the hypothesis generation and the planning process. 

For getting the number of regions in the image, the authors perform sequential labeling (connected components: pixel labeling) [12] on a thresholded gradient image. 

While the authors use simple features for the purpose of illustration, one may use other features such as texture, color, specularities, and reflectance ratios. 

With an active sensor, object recognition involves identification of a view of an object and if necessary, planning further views. 

Over 100 experiments demonstrate the effectiveness of using simple features and multiple views even on a relatively complex class of objects with a high degree of ambiguity associated with a view of the object. 

Due to the non-hierarchical nature of Hutchinson and Kak’s system [9], many redundant hypotheses are proposed, which have to be later removed through consistency checks. 

The sequence of moves until observation 3 could correspond to O4, O5, O6, and O7 with probabilities 0.877, 0.102, 0.014, and 0.007, respectively. 
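Posterior probabilities like those quoted above come from a standard Bayesian evidence step, which can be sketched as follows (a minimal illustration; the prior and likelihood values below are hypothetical, not the paper's):

```python
def update_beliefs(priors, likelihoods):
    """One Bayesian evidence step: posterior_i ∝ prior_i · P(evidence | O_i),
    normalized so the posteriors over all candidate objects sum to 1."""
    unnorm = {o: priors[o] * likelihoods[o] for o in priors}
    total = sum(unnorm.values())
    return {o: p / total for o, p in unnorm.items()}

# Hypothetical example: four candidate objects with a uniform prior;
# the features seen in the current view strongly favour O4.
priors = {"O4": 0.25, "O5": 0.25, "O6": 0.25, "O7": 0.25}
likelihoods = {"O4": 0.80, "O5": 0.10, "O6": 0.06, "O7": 0.04}
posterior = update_beliefs(priors, likelihoods)
```

Applying such a step after each observation concentrates probability mass on the object consistent with the whole sequence of views.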

The knowledge representation scheme should support an efficient mechanism to generate hypotheses on the basis of the evidence received. 

The authors can compute P(Ci) from their knowledge representation scheme by considering each aspect node belonging to an object and testing if it has a link to node Ci; this takes O(NC + Na) time. 
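A minimal sketch of that computation, assuming the aspect-to-class links have been flattened into (class, probability) pairs (the data below is hypothetical, not the paper's knowledge base):

```python
def class_priors(aspect_links):
    """Accumulate P(C_i) with one scan over the aspect nodes.

    aspect_links: iterable of (class_id, p_aspect) pairs, one per aspect node,
    where p_aspect is that aspect's a priori probability. A single pass over
    all Na aspects (plus the per-class dictionary entries) gives the
    O(NC + Na) behaviour described above.
    """
    p = {}
    for class_id, p_aspect in aspect_links:
        p[class_id] = p.get(class_id, 0.0) + p_aspect
    return p

# Hypothetical aspect table: each aspect node links to the class whose
# feature values it exhibits.
aspect_links = [("C1", 0.2), ("C1", 0.3), ("C2", 0.4), ("C3", 0.1)]
pC = class_priors(aspect_links)
```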

If the view indeed corresponds to the most probable aspect at a particular stage, then their search process using primary and auxiliary moves is guaranteed to perform aspect resolution and uniquely identify the object in the following step, assuming no feature detection errors. 

Though Dickinson et al. [8] use Bayes nets for hypothesis generation, their system incurs the overhead of tracking the region of interest through successive frames. 

In the first image in Fig. 13(b), due to the shadow of the wing on the fuselage of the aircraft, the feature detector detects four vertical lines instead of three, the correct number. 

Their robust class recognition algorithm can recover from many feature detection errors at the class recognition phase itself (Section III-A-2).