Learning View-Model Joint Relevance
for 3D Object Retrieval
Ke Lu, Member, IEEE, Ning He, Jian Xue, Jiyang Dong, and Ling Shao, Senior Member, IEEE
Abstract: 3D object retrieval has attracted extensive research efforts and become an important task in recent years. It is noted that how to measure the relevance between 3D objects is still a difficult issue. Most of the existing methods employ just the model-based or view-based approaches, which may lead to incomplete information for 3D object representation. In this paper, we propose to jointly learn the view-model relevance among 3D objects for retrieval, in which the 3D objects are formulated in different graph structures. With the view information, the multiple views of 3D objects are employed to formulate the 3D object relationship in an object hypergraph structure. With the model data, the model-based features are extracted to construct an object graph to describe the relationship among the 3D objects. Learning on the two graphs is conducted to estimate the relevance among the 3D objects, in which the view/model graph weights can also be optimized in the learning process. This is the first work to jointly explore the view-based and model-based relevance among 3D objects in a graph-based framework. The proposed method has been evaluated on three datasets. The experimental results and the comparison with the state-of-the-art methods demonstrate the effectiveness, in terms of retrieval accuracy, of the proposed 3D object retrieval method.

Index Terms: 3D object retrieval, view information, model data, joint learning.
I. INTRODUCTION
3D objects have been widely applied in many diverse applications [1]-[3], e.g., computer graphics, the medical industry, and virtual reality, due to the fast advances in graphics hardware, computing techniques and networks. Large-scale databases of 3D objects are growing rapidly, which leads to a high demand for effective and efficient 3D object retrieval algorithms.

Manuscript received July 28, 2014; revised November 10, 2014; accepted January 19, 2015. Date of publication January 28, 2015; date of current version March 6, 2015. This work was supported in part by the National Natural Science Foundation of China under Grant 61271435, Grant 61370138, and Grant U1301251, in part by the Beijing Natural Science Foundation under Grant 4152017 and Grant 4141003, in part by the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions under Grant IDHT20130225, in part by the National Program on Key Basic Research Project (973 Programs) under Grant 2011CB706901-4, and in part by the Instrument Developing Project of the Chinese Academy of Sciences under Grant YZ201321. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Marios S. Pattichis.

K. Lu is with the University of Chinese Academy of Sciences, Beijing 100049, China, and also with the Beijing Center for Mathematics and Information Interdisciplinary Sciences, Beijing 100190, China (e-mail: luk@ucas.ac.cn).
N. He is with Beijing Union University, Beijing 100191, China (e-mail: xxthening@buu.edu.cn).
J. Xue and J. Dong are with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: xuejian@ucas.ac.cn; dongjiyang12@mails.ucas.ac.cn).
L. Shao is with the Department of Computer Science and Digital Technologies, Northumbria University, Newcastle upon Tyne, NE1 8ST, U.K. (e-mail: ling.shao@ieee.org).

Fig. 1. Example views of two 3D objects.
Recently, extensive research efforts have been dedicated to 3D object retrieval technologies [4]-[7]. Existing 3D object retrieval approaches can be broadly divided into two paradigms, i.e., model-based methods and view-based methods.
In model-based methods [8]-[10], 3D objects are described by model-based features, such as low-level features (e.g., the volumetric descriptor [11], the surface distribution [9] and surface geometry [8], [12], [13]) or high-level features, e.g., the method in [14]. In [14], both visual and geometric characteristics are taken into consideration, and a high-level semantic space mapping from the low-level features is further learned with user relevance feedback; this semantic space is another Euclidean space and can be regarded as a dimension reduction or feature selection method. One advantage of model-based methods is that they can preserve the global spatial information of 3D objects. Although model-based methods are effective, they explicitly require 3D model information, which limits their applications: the 3D model information is not always available, especially in some practical applications.
In view-based methods [15]-[17], 3D objects are represented by a group of images captured from different directions. Depending on the method, these views may be captured with a static camera array or without such a camera array constraint. In view-based methods, the matching between two 3D objects is accomplished via multiple-view matching. Figure 1 shows some examples of multiple views of 3D objects. The view-based methods benefit from existing image processing/matching technologies. These methods make 3D object retrieval more flexible because they do not require explicit 3D model information. Existing works [18] also show that view-based methods can be highly discriminative for 3D objects and can provide better retrieval performance than model-based methods [3], [19].

Fig. 2. The framework of the proposed method.

Compared with model-based methods, one disadvantage of view-based methods is that, when the camera array information is not available, it is difficult to describe the spatial relationship among the different views.
One typical scenario in which 3D model information is not available is searching for objects in the real world. For example, when a tourist finds something interesting and wants to find similar objects in a dataset, it is hard to obtain the model information; the tourist can only take several pictures. In this case, model-based methods cannot work and only image-based methods can be applied. For model-based methods, computer-aided design (CAD) is a very important application area. Other areas where model-based methods work well are entertainment, such as 3D TV and games, and the medical field, such as tele-medical treatment and diagnosis. It is noted that visual information has recently become more important in these applications. Both the model information and the view-based information can bring in useful perspectives, which can further improve the performance.
It is noted that most existing methods separate the model-based methods and the view-based methods, and employ either model information or view features for 3D object retrieval. In this work, we propose to jointly employ both the model and the view information for 3D object relevance estimation. In the view part, representative views are first selected for each object, and then the view-level distances are calculated. Following the method in [20], an object hypergraph is constructed using the view star expansion. In the model part, the spatial structure circular descriptor [21] is extracted and a simple graph is generated using the pairwise object distances. In this way, the view information and the model data can be formulated in two graph structures. Learning on the two graphs is conducted to estimate the relevance among 3D objects, in which the graph weights can also be optimized. Figure 2 demonstrates the schematic framework of the proposed approach. Evaluation on three datasets has shown superior 3D object retrieval accuracy compared with the state-of-the-art methods.
The rest of the paper is organized as follows. Related work
on 3D object retrieval is reviewed in Section II. The proposed
method is provided in Section III. Experiments and discussion
are given in Section IV. We conclude the paper in Section V.
II. RELATED WORK
In this section, we briefly review existing methods for 3D object retrieval. To represent 3D objects, low-level features, such as the volumetric descriptor [11] and surface geometry [8], [12], and high-level features, such as the method in [14], have been employed in previous works.
For model-based 3D object retrieval, the shape descriptor plays an important role in 3D object representation. According to [22], 3D shape descriptors can be divided into four categories, i.e., histogram-based methods [9], [23]-[25], transform-based methods [26]-[29], graph-based methods [30]-[32] and view-based methods [21], [33], [34].
In histogram-based methods, a histogram-like feature is extracted from the 3D model to collect numerical values of certain attributes. Typical histogram-based descriptors include the shape distribution [9], the generalized shape distribution [23], the extended Gaussian image [24] and the 3D Hough transform [25]. In transform-based methods, transform coefficients are employed as the 3D shape descriptor, such as the 3D Fourier transform [26], the spherical trace transform [27], the radialized spherical extent function [28], and the concrete radialized spherical projection [29]. Graph-based methods aim to represent 3D objects by graph structures, and the comparison between 3D objects turns into the matching of two graphs. Some typical graph-based methods include Reeb graphs [30], [31] and skeletal graphs [32].
Given the 3D model, the spatial structure circular descriptor (SSCD) was introduced in [21]; it projects the model information onto a circular region to preserve the global spatial information of the 3D model. In this method, the histogram of each SSCD view is calculated to measure the distance between two 3D objects. A panoramic view, named PANORAMA, was employed in [35] for 3D model representation. In PANORAMA, the panoramic view is generated by projecting the model onto the lateral surface of a cylinder, and the distance between two models is calculated by matching their two PANORAMA images. Leng et al. [14] employed both the DBuffer descriptor [36], which contains six depth buffer images from the front, lateral and vertical views, and GEDT coefficients [37] as the descriptors. These two descriptors are combined into the 982-dimensional TUGE descriptor. With user feedback, these low-level features are mapped to a high-level semantic space, which is another Euclidean space and can be regarded as a dimension reduction or feature selection method. A bipartite graph learning method is introduced in [38], where the comparison between two groups of multiple views is formulated as a bipartite graph. A learning-based method for bipartite graph matching is proposed in [39].

In view-based 3D object retrieval methods, how to generate multiple views is an important issue. Some existing methods employ predefined camera arrays to capture views, while other works have no such constraint. The Light Field Descriptor (LFD) [33] was the first view-based 3D object retrieval method. In LFD, each 3D object is represented by several groups of representative views. Each group contains 10 views, and Zernike moments and Fourier descriptors are employed as the view features. The minimal distance between two groups of views from two compared 3D objects is employed as the pairwise object distance. Different from LFD, the Elevation Descriptor (ED) [34] employs six range views from different directions of the 3D object. A depth histogram is extracted to describe the EDs, and the matching between two groups of EDs is conducted to calculate the distance between two 3D objects. In the Compact Multi-View Descriptor (CMVD) [18], 18 views are captured from the 18 vertices of a 32-hedron. Seven characteristic views are generated in [40] from different directions. In the camera constraint-free view method (CCFV) [41], a set of representative views is selected from the originally captured multiple views via view clustering, and a probabilistic matching method is then employed to calculate the similarity between each two 3D objects.
Some other methods first generate a large set of raw views, and then select representative views from this view pool. One typical method is Adaptive Views Clustering (AVC) [15]. In AVC, 320 initial views are first captured, and representative views, generally about 20 to 40, are selected from these raw views. The comparison between 3D objects is formulated as a probabilistic approach measuring the posterior probability of the target object given the query. In [41], a positive matching model and a negative matching model were used to measure the relevance between a target object and the query. This was the first attempt to explore the relevance of one candidate object with both positive and negative samples, and the evaluation has shown satisfactory performance. In [42], the curvature scale space was employed as the view descriptor, which was further combined with Zernike moments to measure the distance between two 3D models. In the depth gradient image (DGI) model [43], both the surface and the contour information are synthesized, which avoids restrictions concerning the layout and visibility of the models in the scene.
Distance estimation between two groups of views is an important problem in view-based 3D object retrieval. Gao et al. [44] proposed a learning-based Hausdorff distance for 3D object retrieval. In this method, a Mahalanobis distance metric is learned for the view-level distance measure, which can be further used in the object-level Hausdorff distance calculation. This method addresses the challenge that the labels are at the object level while the distance metric is at the view level. To estimate the relevance among 3D objects, semi-supervised learning has been investigated in recent years. In [20], a hypergraph structure was employed to formulate the relationship among 3D objects. In this method, view clustering is conducted to generate the hyperedges, which are used to connect the 3D objects. Based on different view clustering results, multiple hypergraphs can be constructed, and learning is conducted on the hypergraph to estimate the relevance among 3D objects. This method extends existing view-based 3D object retrieval methods to a semi-supervised learning approach, which has been justified as the state of the art. A Gaussian mixture model (GMM) was used in [45] to formulate the distribution of the multiple views of 3D objects. In this method, the KL divergence is employed to measure the distance between two 3D objects.
III. LEARNING VIEW-MODEL JOINT RELEVANCE FOR 3D OBJECT RETRIEVAL
In this section, we introduce the view-model joint relevance learning method for 3D object retrieval. This method explores both the view information and the model data of 3D objects. The proposed method is composed of three key components, as shown in Figure 2. Given the view information of 3D objects, the proposed method first constructs a hypergraph to formulate the relationship among 3D objects through their view connections. Then, with the model data, a spatial structure circular descriptor is extracted from each 3D model, and the distance between each two 3D models is used to generate a simple graph that explores the relationship among the 3D models. Finally, learning on the joint view-model graphs is conducted to estimate the relevance among 3D objects.
A. View-Based Hypergraph Generation
Here the view-based hypergraph is generated following the method in [20], briefly introduced as follows. Let $O = \{O_1, O_2, \ldots, O_n\}$ denote the $n$ 3D objects in the dataset, and let $V_i = \{v_{i1}, v_{i2}, \ldots, v_{in_i}\}$ denote the $n_i$ views of the $i$-th 3D object $O_i$. In this part, we aim to explore the relevance among 3D objects with the multiple view information.
Generally, although multiple views can represent rich information about 3D objects, they also bring in redundant data, which may incur a high computational cost and even lead to false results. Here we first select representative views for each 3D object, and only these representative views are employed in the 3D object retrieval process.
Given the $n_i$ views $V_i = \{v_{i1}, v_{i2}, \ldots, v_{in_i}\}$ of $O_i$, we conduct hierarchical agglomerative clustering (HAC) [46] to group these views into view clusters. The HAC method is selected because it guarantees that the intracluster distance between each pair of views does not exceed a given threshold. Here the widely employed Zernike moments [47] are used as the view features; they are robust to image rotation, scaling and translation, and have been used in many 3D object retrieval tasks [15], [20], [33], [48]. The 49-D Zernike moments are extracted from each view of the 3D objects. With the view clustering results, one representative view is selected from each view cluster. Here we let $V_i = \{v_{i1}, v_{i2}, \ldots, v_{im_i}\}$ denote the $m_i$ representative views of $O_i$. In our experiments, $m_i$ mostly ranges from 5 to 20.
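To make the view-selection step concrete, the following is a minimal Python sketch of HAC-based representative view selection, not the authors' implementation. The complete-linkage criterion, the medoid choice of representative, the threshold value, and the hypothetical `view_features` array (one 49-D Zernike feature row per view) are our assumptions; the paper only requires that intracluster pairwise distances stay below a threshold.

```python
# Sketch: representative-view selection via HAC (Section III-A).
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist, squareform

def select_representative_views(view_features, threshold):
    """Cluster views by HAC and return one representative index per cluster."""
    dists = pdist(view_features, metric="euclidean")
    # Complete linkage cut at `threshold` keeps every intra-cluster
    # pairwise distance below the threshold, matching the HAC property
    # cited in the text.
    labels = fcluster(linkage(dists, method="complete"),
                      t=threshold, criterion="distance")
    sq = squareform(dists)
    reps = []
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        # Pick the medoid: the view closest to all others in its cluster
        # (one reasonable choice of "representative"; an assumption here).
        reps.append(idx[np.argmin(sq[np.ix_(idx, idx)].sum(axis=1))])
    return sorted(reps)
```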
Hypergraphs have been used in many multimedia information retrieval tasks, such as image retrieval [49], [50], and have shown their superiority in representing high-order information. In our work, we propose to employ star expansion to construct an object hypergraph with views to formulate the relationship among 3D objects. Here we denote the object hypergraph as $G_H = (V_H, E_H, W_H)$. For the $n$ objects in the dataset, there are $n$ vertices in $G_H$, where each vertex represents one 3D object.

The hyperedges are generated as follows. We assume there are in total $n_r$ representative views over all $n$ objects. We first calculate the Zernike moments-based distance between each two views, so that the top $K$ closest views can be found for each representative view. For each representative view, one hyperedge is constructed, which connects the objects owning views among the top $K$ closest views. In our experiments, $K$ is set as 10. Figure 3 shows an example of hyperedge generation. In total, $n_r$ hyperedges are generated for $G_H$. The weight of a hyperedge $e_H$ can be calculated by

$$w(e_H) = \frac{1}{K} \sum_{v_x} \exp\left( -\frac{d(v_x, v_c)^2}{\sigma_H^2} \right) \qquad (1)$$

where $v_c$ is the central view of the hyperedge, $v_x$ is one of the top $K$ closest views to $v_c$, $d(v_x, v_c)$ is the distance between $v_c$ and $v_x$, and $\sigma_H$ is empirically set as the median of all view pair distances.

Fig. 3. An illustration of hyperedge construction. In this figure, there are seven objects with representative views. One view from O_4 is selected as the central view, and its four closest views are located in the figure, which are from O_1, O_3, O_6 and O_7. Then the corresponding hyperedge connects O_1, O_3, O_4, O_6 and O_7.

Given the object hypergraph $G_H = (V_H, E_H, W_H)$, the incidence matrix $H$ can be generated by

$$h(v_H, e_H) = \begin{cases} 1, & \text{if } v_H \in e_H \\ 0, & \text{if } v_H \notin e_H \end{cases} \qquad (2)$$

The vertex degree of $v_H$ can be defined as

$$\rho(v_H) = \sum_{e_H \in E_H} \omega(e_H)\, h(v_H, e_H). \qquad (3)$$

The edge degree of $e_H$ can be defined as

$$\rho(e_H) = \sum_{v_H \in V_H} h(v_H, e_H). \qquad (4)$$

The vertex degree matrix and the edge degree matrix can be denoted by two diagonal matrices $D_v$ and $D_e$. In the constructed hypergraph, when two 3D objects share more similar views, they are connected by more hyperedges with high weights, which indicates a high correlation between these 3D objects.
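The star-expansion construction of Eqs. (1)-(4) can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `rep_features` is a hypothetical (n_r x 49) Zernike feature array over all representative views, `owner` maps each representative view to the index of the object it belongs to, and the dense distance computation is for clarity only.

```python
# Sketch: object hypergraph via star expansion, Eqs. (1)-(4).
import numpy as np

def build_hypergraph(rep_features, owner, n_objects, K=10):
    n_r = len(rep_features)
    D = np.linalg.norm(rep_features[:, None] - rep_features[None, :], axis=2)
    sigma_H = np.median(D[np.triu_indices(n_r, k=1)])  # median view-pair distance
    H = np.zeros((n_objects, n_r))       # incidence matrix, Eq. (2)
    w = np.zeros(n_r)                    # hyperedge weights, Eq. (1)
    for c in range(n_r):                 # one hyperedge per representative view
        nn = np.argsort(D[c])[1:K + 1]   # top-K closest views to the central view
        w[c] = np.mean(np.exp(-D[c, nn] ** 2 / sigma_H ** 2))
        H[owner[nn], c] = 1              # connect the objects owning these views
        H[owner[c], c] = 1               # plus the central view's own object (cf. Fig. 3)
    d_v = H @ w                          # vertex degrees, Eq. (3)
    d_e = H.sum(axis=0)                  # edge degrees, Eq. (4)
    return H, w, d_v, d_e
```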
B. Model-Based Graph Generation

Given the model data of 3D objects, here we further explore the model-based object relationship. The spatial structure circular descriptor (SSCD) [21] is employed as the model feature. SSCD represents the depth information of the model surface on the projection minimal bounding box of the 3D model, and a depth histogram is generated as the feature for the 3D model. Following [21], bipartite graph matching is conducted to measure the distance between each two 3D models, i.e., $d_{SSCD}(O_i, O_j)$.

Here, the relationship among 3D objects is formulated in a simple object graph structure $G = (V, E, W)$. Each vertex in $G$ represents one 3D object, i.e., there are $n$ vertices in $G$. The weight of an edge $e(i, j)$ in $G$ is calculated using the similarity between the two corresponding 3D objects $O_i$ and $O_j$ as

$$W(v_i, v_j) = \exp\left( -\frac{d_{SSCD}(O_i, O_j)^2}{\sigma_s^2} \right) \qquad (5)$$

where $d_{SSCD}(O_i, O_j)$ is the distance between $O_i$ and $O_j$, and $\sigma_s$ is set as the median of all model pair distances.
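A minimal sketch of Eq. (5), assuming the pairwise SSCD distances have already been computed by bipartite matching (the zeroed diagonal, i.e., no self-loops, is a common convention we adopt here, not something the paper specifies):

```python
# Sketch: model-based object graph, Eq. (5).
import numpy as np

def build_model_graph(d_sscd):
    """d_sscd: hypothetical symmetric (n x n) matrix of SSCD distances."""
    n = d_sscd.shape[0]
    sigma_s = np.median(d_sscd[np.triu_indices(n, k=1)])  # median model-pair distance
    W = np.exp(-d_sscd ** 2 / sigma_s ** 2)               # edge weights, Eq. (5)
    np.fill_diagonal(W, 0.0)                              # no self-loops (assumption)
    return W
```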
C. Learning on the Joint Graphs

Now we have two formulations of the relationship among 3D objects, i.e., view-based and model-based. Here these two formulations are jointly explored to estimate the relevance among 3D objects. In this part, we first introduce the learning framework in which the view-based and the model-based information are regarded with equal weight, and then we propose a joint learning framework to learn the optimal combination weight for each modality.

1) The Initial Learning Framework: Here we start from the learning framework which regards the different modalities, i.e., model and view, as equal. The 3D object retrieval task can be formulated as a one-class classification problem, as shown in [51]. The main objective is to learn the optimal pairwise object relevance under both the graph and the hypergraph structure. Given the initial labeled data (the query object in our case), an empirical loss term can be added as a constraint for the learning process. The transductive inference can be formulated as a regularization:

$$\arg\min_{f} \left\{ K_V(f) + K_M(f) + \mu R(f) \right\} \qquad (6)$$

In this formulation, $f$ is the to-be-learned relevance vector, $K_V(f)$ is the regularizer term on the view-based hypergraph structure, $K_M(f)$ is the regularizer term on the model-based graph structure, and $R(f)$ is the empirical loss. This objective function aims to minimize the empirical loss and the regularizers on the model-based graph and the view-based hypergraph simultaneously, which leads to the optimal relevance vector $f$ for retrieval. The two regularizers and the empirical loss term are defined as follows.

The view-based hypergraph regularizer $K_V(f)$ is defined as

$$K_V(f) = \frac{1}{2} \sum_{e_H \in E_H} \sum_{u,v \in V_H} \frac{w_H(e_H)\, h(u, e_H)\, h(v, e_H)}{\rho(e_H)} \left( \frac{f(u)}{\sqrt{\rho(u)}} - \frac{f(v)}{\sqrt{\rho(v)}} \right)^{2} = f^{T} (I - \Theta_H)\, f, \qquad (7)$$

where $\Theta_H$ is defined as $\Theta_H = D_v^{-1/2} H W_H D_e^{-1} H^{T} D_v^{-1/2}$. Here we denote $\Delta_H = I - \Theta_H$, so $K_V(f)$ can be written as

$$K_V(f) = f^{T} \Delta_H f. \qquad (8)$$

The model-based graph regularizer $K_M(f)$ is defined as

$$K_M(f) = \frac{1}{2} \sum_{u,v \in V} w(e(u,v)) \left( \frac{f(u)}{\sqrt{d(u)}} - \frac{f(v)}{\sqrt{d(v)}} \right)^{2} = f^{T} (I - \Theta_S)\, f, \qquad (9)$$

where $\Theta_S = D^{-1/2} W D^{-1/2}$. Here we denote $\Delta_S = I - \Theta_S$, so $K_M(f)$ can be written as

$$K_M(f) = f^{T} \Delta_S f. \qquad (10)$$

The empirical loss term $R(f)$ is defined as

$$R(f) = \| f - y \|^{2}, \qquad (11)$$

where $y$ is the initial label vector. In the retrieval process, it is defined as an $n \times 1$ vector in which only the query is set as 1 and all other components are set as 0.

Now the objective function can be rewritten as

$$\arg\min_{f} \left\{ f^{T} \Delta_H f + f^{T} \Delta_S f + \mu \| f - y \|^{2} \right\}, \qquad (12)$$

and $f$ can be solved by

$$f = \left( I + \frac{1}{\mu} (\Delta_H + \Delta_S) \right)^{-1} y. \qquad (13)$$

Here $f$ is the relevance of all the objects in the dataset with respect to the query object. A large relevance value indicates high similarity between the object and the query: the higher the corresponding relevance value, the more similar the two objects. With the generated object relevance $f$, all the objects in the dataset can be sorted in descending order according to $f$.
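Putting Eqs. (7)-(13) together, the equal-weight relevance vector has a direct closed-form solve. The sketch below builds the two normalized Laplacians and solves the linear system; it is illustrative only, reusing the outputs of the earlier sketches, with `mu` left as a free parameter.

```python
# Sketch: equal-weight transductive solve, Eqs. (7)-(13).
import numpy as np

def normalized_laplacians(H, w, W):
    d_v, d_e = H @ w, H.sum(axis=0)
    Dv_is = np.diag(1.0 / np.sqrt(d_v))
    theta_H = Dv_is @ H @ np.diag(w) @ np.diag(1.0 / d_e) @ H.T @ Dv_is
    delta_H = np.eye(H.shape[0]) - theta_H            # Eq. (8)
    D_is = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    delta_S = np.eye(W.shape[0]) - D_is @ W @ D_is    # Eq. (10)
    return delta_H, delta_S

def solve_relevance(delta_H, delta_S, query_idx, n, mu=1.0):
    y = np.zeros(n)
    y[query_idx] = 1.0                                # label vector of Eq. (11)
    f = np.linalg.solve(np.eye(n) + (delta_H + delta_S) / mu, y)  # Eq. (13)
    return np.argsort(-f), f                          # descending relevance ranking
```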
2) Learning the Combination Weights: We note that the view information and the model information may not have the same impact on 3D object representation. In some scenarios the view information may be more important, while in other cases the model data may play the more important role. Under such circumstances, we further learn the optimal weights for the view information and the model data. In this part, we introduce the learning framework embedding the combination weight learning. The objective for the learning process is composed of three parts, i.e., the graph/hypergraph structure regularizers, the empirical loss, and the combination weight regularizer.

Here we let $\alpha$ and $\beta$ denote the combination weights for the view-based and the model-based information respectively, where $\alpha + \beta = 1$. After adding the $l_2$ norm on the combination weights, the objective function can be further revised as

$$\arg\min_{f, \alpha, \beta} \left\{ \alpha f^{T} \Delta_H f + \beta f^{T} \Delta_S f + \mu \| f - y \|^{2} + \eta (\alpha^{2} + \beta^{2}) \right\}, \qquad (14)$$

where $\alpha + \beta = 1$.

The solution of the above optimization task is provided as follows. To solve the above objective function, we alternately optimize $f$ and $\alpha/\beta$. We first fix $\alpha$ and $\beta$, and optimize $f$. Now the objective function changes to

$$\arg\min_{f} \left\{ \alpha f^{T} \Delta_H f + \beta f^{T} \Delta_S f + \mu \| f - y \|^{2} \right\}. \qquad (15)$$

Following Eq. (13), it can be solved by

$$f = \left( I + \frac{1}{\mu} (\alpha \Delta_H + \beta \Delta_S) \right)^{-1} y. \qquad (16)$$

Then we optimize $\alpha/\beta$ with fixed $f$. Here we employ the Lagrangian method, and the objective function changes to

$$\arg\min_{\alpha, \beta} \left\{ \alpha f^{T} \Delta_H f + \beta f^{T} \Delta_S f + \eta (\alpha^{2} + \beta^{2}) + \xi (\alpha + \beta - 1) \right\}. \qquad (17)$$

Solving the above optimization problem, we can obtain

$$\xi = -\frac{f^{T} \Delta_H f + f^{T} \Delta_S f}{2} - \eta, \qquad (18)$$

$$\alpha = \frac{1}{2} + \frac{f^{T} \Delta_S f - f^{T} \Delta_H f}{4 \eta}, \qquad (19)$$

and

$$\beta = \frac{1}{2} + \frac{f^{T} \Delta_H f - f^{T} \Delta_S f}{4 \eta}. \qquad (20)$$

The above alternating optimization is repeated until the optimal $f$ is achieved, which can then be used for 3D object retrieval. With the learned combination weights, the model-based and the view-based data can be optimally explored simultaneously and the relevance vector $f$ can be obtained. The main merit of the proposed method is that it jointly explores the view information and the model data of 3D objects in a hypergraph/graph framework for 3D object retrieval.
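The alternating optimization of Eqs. (14)-(20) can be sketched as follows. The fixed iteration count and the clipping safeguard on the weights are our additions (the paper iterates until the optimal f is achieved); `mu` and `eta` are free parameters.

```python
# Sketch: alternating optimization of f and (alpha, beta), Eqs. (14)-(20).
import numpy as np

def joint_learning(delta_H, delta_S, query_idx, n, mu=1.0, eta=1.0, iters=10):
    y = np.zeros(n)
    y[query_idx] = 1.0
    alpha = beta = 0.5                                 # start from equal weights
    for _ in range(iters):
        A = np.eye(n) + (alpha * delta_H + beta * delta_S) / mu
        f = np.linalg.solve(A, y)                      # Eq. (16)
        rH, rS = f @ delta_H @ f, f @ delta_S @ f      # regularizer values
        alpha = 0.5 + (rS - rH) / (4 * eta)            # Eq. (19)
        beta = 0.5 + (rH - rS) / (4 * eta)             # Eq. (20)
        # Safeguard (our addition): keep the weights in [0, 1] when eta is small.
        alpha = float(np.clip(alpha, 0.0, 1.0))
        beta = 1.0 - alpha
    return f, alpha, beta
```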

References

M. Steinbach, G. Karypis, and V. Kumar, "A comparison of document clustering techniques," in Proc. KDD Workshop on Text Mining, 2000.

A. E. Johnson and M. Hebert, "Using spin images for efficient object recognition in cluttered 3D scenes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 5, 1999.

M. Hilaga, Y. Shinagawa, T. Kohmura, and T. L. Kunii, "Topology matching for fully automatic similarity estimation of 3D shapes," in Proc. ACM SIGGRAPH, 2001.

A. Khotanzad and Y. H. Hong, "Invariant image recognition by Zernike moments," IEEE Trans. Pattern Anal. Mach. Intell., vol. 12, no. 5, 1990.

R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin, "Shape distributions," ACM Trans. Graph., vol. 21, no. 4, 2002.