DPF - A Perceptual Distance Function for Image Retrieval
Beitao Li, Edward Chang, Ching-Tung Wu
Electrical & Computer Engineering, U.C. Santa Barbara
beitao@engineering.ucsb.edu, echang@ece.ucsb.edu
Abstract
For almost a decade, content-based image retrieval has been an active research area, yet one fundamental problem remains largely unsolved: how to measure perceptual similarity. To measure perceptual similarity, most researchers employ the Minkowski-type metric. Our extensive data-mining experiments on visual data show that, unfortunately, the Minkowski metric is not very effective in modeling perceptual similarity. Our experiments also show that the traditional "static" feature weighting approaches are not sufficient for retrieving various similar images. In this paper, we report our discovery of a perceptual distance function through mining a large set of visual data. We call the discovered function the dynamic partial distance function (DPF). When we empirically compare DPF to Minkowski-type distance functions, DPF performs significantly better in finding similar images. The effectiveness of DPF can be well explained by similarity theories in cognitive psychology.
Keywords: content-based image retrieval, data mining, perceptual distance function, similarity search.
1 Introduction
Research in content-based image retrieval has steadily gained momentum in recent years as a result of the dramatic increase in the volume of digital images. To achieve effective retrieval, an image system must be able to accurately characterize and quantify perceptual similarity. However, a fundamental challenge remains largely unanswered: how to measure perceptual similarity. Various distance functions, such as the Minkowski metric, the earth mover's distance [5], and fuzzy logic, have been used to measure similarity between feature vectors representing images. Unfortunately, our experiments show that they frequently overlook obviously similar images and hence are not adequate for measuring perceptual similarity.
Quantifying perceptual similarity is a difficult problem. Indeed, we may be decades away from fully understanding how human perception works. In this project, we mine visual data extensively to reverse-engineer a good perceptual distance function for measuring image similarity. Our mining hypothesis is this: suppose most of the similar images can be clustered in a feature space. We can then claim with high confidence that 1) the feature space can adequately capture visual perception, and 2) the distance function used for clustering images in that feature space can accurately model perceptual similarity.
We perform our mining operation in two stages. In the first stage, we isolate the distance-function factor (we use the Euclidean distance) to find a reasonable feature set. In the second stage, we freeze the features to discover a perceptual distance function that can better cluster similar images in the feature space. In other words, our goal is to find a function that keeps similar images close together in the feature space while keeping dissimilar images away. We call the discovered function the dynamic partial distance function (DPF). We empirically compare DPF to Minkowski-type distance functions and show that DPF performs remarkably better.
Briefly, the contributions of this paper are as follows:
- We construct a mining dataset to find a feature set that can adequately represent images. In that feature space, we find distinct patterns of similar and dissimilar images, which lead to the discovery of DPF.
- Through empirical study, we demonstrate that DPF is very effective in finding images that have been transformed by rotation, scaling, downsampling, and cropping, as well as images that are perceptually similar to the query image (e.g., images belonging to the same video shot). Our testbed shows that DPF outperforms Minkowski-type functions by 25 percentiles in recall.
2 Discovering DPF
To ensure that sound inferences can be drawn from our mining results, we carefully construct the training dataset. First, we prepare a dataset comprehensive enough to cover a diversified set of images. To achieve this goal, we collect 60,000 JPEG images from Corel CDs and from the Internet. Second, we define "similarity" in a slightly restrictive way so that individuals' subjectivity can be safely excluded. (We address the problem of learning subjective perception in [1, 6].) For each image in the 60,000-image set, we perform 24 transformations, including scaling, downsampling, cropping, rotation, and format transformation. (Details of these transformations are explained in the extended version of this paper [4].) The total number of images in the testbed is 1.5 million.
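The precise recipes for the 24 transformations appear only in [4]. Purely as an illustration of how such variants might be generated, here is a minimal Python sketch using the Pillow library; the specific parameters (scale factors, crop box, rotation angle) are our own assumptions, not the paper's:

```python
from PIL import Image  # pip install pillow

def make_variants(path):
    """Yield illustrative transformed variants of one image. The paper's
    actual 24 transformations are specified in its extended version [4];
    the parameters here are hypothetical."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    yield "scaled", img.resize((w // 2, h // 2))                          # scaling
    yield "cropped", img.crop((w // 8, h // 8, 7 * w // 8, 7 * h // 8))   # cropping
    yield "rotated", img.rotate(45, expand=True)                          # rotation
    # downsampling simulated as resolution loss at the original size:
    yield "downsampled", img.resize((w // 4, h // 4)).resize((w, h))
    # format transformation, e.g. JPEG -> GIF:  img.save("variant.gif", "GIF")
```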
Our experimental results (see Section 3) show that the perceptual distance function discovered during the mining process on this training dataset, which has a slightly restrictive definition of similarity, can be used effectively to find other perceptually similar images. In other words, our testbed consists of a reasonable representation of similar images, and the mining results (i.e., training results) can be generalized to testing data consisting of perceptually similar images produced by other methods (e.g., changing camera parameters).

[Figure 1: The Distributions of Feature Distances. (a) Similar Images; (b) Dissimilar Images. x-axis: feature distance, from 0 to 1; y-axis: frequency, on a logarithmic scale from 1.0E-06 to 1.0E-01.]
From each image, we extract 144 features, including color, texture, and shape, as its representation. We discuss what these features are, and why they are chosen, in [4]. In the remainder of this section, we focus on examining the Minkowski metric and its family. We explain why these functions are ineffective for measuring image similarity, and present our DPF solution.
2.1 Minkowski Metric and Its Limitations
The Minkowski metric is widely used for measuring similarity between objects (e.g., images). Suppose two objects X and Y are represented by two p-dimensional vectors (x_1, x_2, ..., x_p) and (y_1, y_2, ..., y_p), respectively. The Minkowski metric d(X, Y) is defined as

    d(X, Y) = \left( \sum_{i=1}^{p} |x_i - y_i|^r \right)^{1/r},    (1)

where r is the Minkowski factor for the norm. In particular, when r is set to 2, it is the well-known Euclidean distance; when r is 1, it is the Manhattan distance (or L_1 distance). An object located a smaller distance from a query object is deemed more similar to the query object. Measuring similarity by the Minkowski metric is based on one assumption: the similar objects should be close to the query object in all dimensions.
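In code, Eq. (1) is a one-liner. The following sketch is ours, not the authors', and assumes the feature vectors are numpy arrays with values normalized to [0, 1]:

```python
import numpy as np

def minkowski(x, y, r=2.0):
    """Minkowski distance between feature vectors x and y (Eq. 1).
    r = 2 gives the Euclidean distance; r = 1 gives the Manhattan (L1) distance."""
    return float(np.sum(np.abs(x - y) ** r) ** (1.0 / r))
```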
A variant of the Minkowski function, the weighted Minkowski distance function, has also been applied to measure image similarity. The basic idea is to introduce weighting to identify important features. By assigning each feature a weighting coefficient w_i (i = 1, ..., p), the weighted Minkowski distance function is defined as

    d_w(X, Y) = \left( \sum_{i=1}^{p} w_i |x_i - y_i|^r \right)^{1/r}.    (2)
By applying a static weighting vector for measuring similarity, the weighted Minkowski distance function assumes that similar images resemble the query image(s) in the same features. For example, the weighted Minkowski function implicitly assumes that the important features for finding a scaled image are the same as the important features for finding a cropped image.
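The static variant of Eq. (2) differs from Eq. (1) only by the fixed weight vector, as this sketch (again ours, under the same assumptions) makes explicit:

```python
def weighted_minkowski(x, y, w, r=2.0):
    """Weighted Minkowski distance (Eq. 2). w is a static per-feature
    weight vector, chosen once and applied to every image pair."""
    return float(np.sum(w * np.abs(x - y) ** r) ** (1.0 / r))
```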
We can summarize the assumptions of the Minkowski metric as follows:
- Minkowski function: all similar images must be similar in all features.
- Weighted Minkowski function: all similar images are similar in the same way (e.g., in the same set of features) [7].
We questioned the above assumptions upon observing how similar objects are located in the feature space. For this purpose, we carried out extensive data mining work on the 1.5M-image dataset. To better discuss our findings, we introduce a term we have found useful in our data mining work. We define the feature distance on the i-th feature as δ_i = |x_i - y_i|, for i = 1, ..., p.
In our mining work, we first tallied the feature distances between similar images (denoted δ+), and also those between dissimilar images (denoted δ−). Since we normalized feature values to be between zero and one, both δ+ and δ− range between zero and one. Figure 1 presents the distributions of δ+ and δ−. The x-axis shows the possible values of δ, from zero to one; the y-axis (in logarithmic scale) shows the percentage of the features at each δ value.
The figure shows that δ+ and δ− have different distribution patterns. The distribution of δ+ is heavily skewed toward small values (Figure 1(a)), whereas the distribution of δ− is more even (Figure 1(b)). We can also see from Figure 1(a) that a moderate portion of δ+ lies in the high value range (≥ 0.5), which indicates that similar images may be quite dissimilar in many features. This observation suggests that the assumption of the Minkowski metric is inaccurate: similar images are not necessarily similar in all features.
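As a sketch of this tallying step (our own reconstruction; the paper gives no code), the distributions of Figure 1 can be obtained by pooling per-feature distances over many image pairs and histogramming them. The bin count below is our choice, not the paper's:

```python
def delta_histogram(pairs, bins=18):
    """Tally per-feature distances over a set of (x, y) feature-vector pairs
    and histogram them, as in Figure 1. Pass all similar pairs to estimate
    the delta+ distribution, all dissimilar pairs for delta-."""
    deltas = np.concatenate([np.abs(x - y) for x, y in pairs])
    counts, edges = np.histogram(deltas, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum(), edges  # fraction of features in each delta bin
```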
Next, we examined whether similar images resemble the query images in the same way. We tallied the feature distances (δ+) of the 144 features for different kinds of image transformations. Figure 2 presents four representative transformations: GIF, cropped, rotated, and scaled. The x-axis of the figure depicts the feature numbers, from 1 to 144. The first 108 features are various color features, and the last 36 are texture features. The figure shows that various similar images can resemble the query images in very different ways. GIF images have larger δ+ in color features (the first 108 features) than in texture features (the last 36 features). In contrast, cropped images have larger δ+ in texture features than in color features. For rotated images, the δ+ in colors comes close to zero, although its texture feature distance is much greater. A similar pattern appears in the scaled and the rotated images. However, the magnitude of the δ+ of scaled images is very different from that of rotated images.
[Figure 2: The Average Feature Distances. (a) GIF Images; (b) Cropped Images; (c) Rotated Images; (d) Scaled Images. x-axis: feature number, from 1 to 144; y-axis: average distance.]

We summarize our observations as follows:
- Similar feature distance is distributed differently from dissimilar feature distance. Similar feature distance skews toward small values, while dissimilar feature distance shows a more even distribution.
- Similar images do not resemble the query images in all features.
- Images similar to the query images can be similar in differing features. For example, some images resemble the query image in texture, others in color.
The above observations not only refute the assumptions of Minkowski-type distance functions, but also provide hints as to how a good distance function would work. The first point is that a distance function does not need to consider all features equally, since similar images may match only some features of the query images. The second point is that a distance function should weight features dynamically, since various similar images may resemble the query image in differing ways. Traditional relevance feedback methods [3] learn a set of "optimal" feature weights for a query. For instance, if the user is more interested in color than in texture, color features are weighted more heavily when similarity is computed. What we have discovered here is that this "static" weighting is insufficient. An effective distance function must weight features differently when comparing the query image to different images. These points lead to the design of the dynamic partial distance function.
2.2 Dynamic Partial Distance Function
Based on the observations explained above, we designed a distance function to better represent perceptual similarity. Let δ_i = |x_i - y_i|, for i = 1, ..., p. We first define the set Δ_m as

    Δ_m = { the smallest m δ's of (δ_1, ..., δ_p) }.

Then we define the Dynamic Partial Distance Function (DPF) as

    d(m, r) = \left( \sum_{\delta_i \in \Delta_m} \delta_i^r \right)^{1/r}.    (3)
DPF has two adjustable parameters: m and r. Parameter m can range from 1 to p. When m = p, DPF degenerates to the Minkowski metric. When m < p, it counts only the smallest m feature distances between two objects, and the influence of the (p − m) largest feature distances is eliminated. DPF dynamically selects features to be considered for different pairs of objects. This is achieved by the introduction of Δ_m, which changes dynamically for different pairs of objects. In Section 3, we will show that DPF makes similar images aggregate more compactly and locate closer to the query images, while simultaneously keeping the dissimilar images away from the query images. In other words, similar and dissimilar images are better separated by DPF.
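A minimal sketch of Eq. (3), in the same style as the earlier snippets (ours, assuming normalized numpy feature vectors):

```python
def dpf(x, y, m, r=2.0):
    """Dynamic Partial Distance Function (Eq. 3): keep only the m smallest
    per-feature distances, so the p - m largest ones are ignored.
    With m = p this reduces to the Minkowski metric of Eq. 1."""
    delta = np.abs(x - y)         # feature distances delta_i
    delta_m = np.sort(delta)[:m]  # the set Delta_m
    return float(np.sum(delta_m ** r) ** (1.0 / r))
```

Note that which features survive the sort differs from pair to pair; this is exactly the "dynamic" behavior that the static weights of Eq. (2) cannot express.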
3 Empirical Study
Our empirical study consists of two parts: training and testing. In the training part, we used the same 1.5M-image dataset to predict the optimal m value. In the testing part, we used a 50K-image dataset to examine the effectiveness of DPF.
3.1 Predicting m Through Training
We used the 60,000 original images to perform queries. Applying DPF with different m values to the 1.5M-image dataset, we tallied the distances from these 60,000 queries to their similar images and to their dissimilar images, respectively. We then computed the average and the standard deviation of these distances. We denote the average distance of the similar images to their queries as μ_d^+ and that of the dissimilar images as μ_d^−; we denote the standard deviation of the similar images' distances as σ_d^+ and that of the dissimilar images as σ_d^−.
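A sketch of this sweep (our reconstruction, reusing the dpf function above; the container layout is hypothetical):

```python
def sweep_m(queries, similar, dissimilar, m_values, r=3.0):
    """For each candidate m, compute the mean and standard deviation of DPF
    distances from queries to their similar and dissimilar images (the
    mu/sigma curves of Figure 3). similar[q] and dissimilar[q] map a query
    index to lists of feature vectors."""
    stats = {}
    for m in m_values:
        d_pos = [dpf(queries[q], img, m, r)
                 for q in range(len(queries)) for img in similar[q]]
        d_neg = [dpf(queries[q], img, m, r)
                 for q in range(len(queries)) for img in dissimilar[q]]
        stats[m] = (np.mean(d_pos), np.std(d_pos),
                    np.mean(d_neg), np.std(d_neg))
    return stats
```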
Figure 3 depicts the effect of m (on the x-axis) on μ_d^+, μ_d^−, σ_d^+, and σ_d^−. Figure 3(a) shows that as m becomes smaller, both μ_d^+ and μ_d^− decrease. The average distance of similar images (μ_d^+), however, decreases at a faster pace than that of dissimilar images (μ_d^−). For instance, when we decrease m from 144 to 130, μ_d^+ decreases from 1.0 to about 0.3, a 70% decrease, whereas μ_d^− decreases from 3.2 to about 2.0, a 38% decrease. This gap indicates that μ_d^+ is more sensitive to the m value than μ_d^−. Figure 3(b) shows that the standard deviations σ_d^+ and σ_d^− follow the same trend as the average distances. When m decreases, similar images become more compact in the feature space at a faster pace than dissimilar images do. Our training result indicates that when m is set to 114, similar images are best clustered.

[Figure 3: The Effect of DPF. (a) Average of Distances; (b) Standard Deviation of Distances. x-axis: m, from 144 down to 4; y-axis: distance to queries in (a), standard deviation in (b), plotted separately for similar and dissimilar images.]

[Figure 4: Search Performance of Different m at r = 3. (a) Precision vs. Recall; (b) Recall vs. k, for m = 144, 134, 124, 114, 104, 94, and 84.]
3.2 Testing New Distance Functions
The test dataset consists of 100 similar-image sets, each composed of 30 images. Of these 30 images, we have the original image, 24 transformed images (using the same transformation methods described in Section 2), and five images that are visually identified as similar. We then added 50K randomly crawled Web images to these 100 × 30 images to form our testset. We conducted 100 queries using the 100 original images. For each query, we recorded the ranks of its similar images. We experimented with m values from 84 to 144, with r fixed at three. Figure 4 depicts the experimental results.
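The reported curves are standard precision and recall at a rank cutoff; a minimal per-query helper (our own, not the authors' code) might look like:

```python
def recall_precision_at_k(ranked_ids, relevant_ids, k):
    """Recall and precision at cutoff k for one query. ranked_ids is the
    retrieval order produced by some distance function; relevant_ids is the
    ground-truth similar set (here, the 29 similar images per query)."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids), hits / k
```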
The precision-recall curves for selected m values are plotted in Figure 4(a). The peak search performance is achieved when m = 114, which does significantly better than the Minkowski distance (m = 144). Figure 4(b) plots the recall at selected m values for top-k retrievals. As we decrease the value of m from 144, the recall improves steadily until m reaches 114, where the peak performance is achieved. Our DPF outperforms the Minkowski distance function by 25 percentiles in recall.

Because of space limitations, we present extensive experimental results, and our comparison between the weighted version of DPF and the weighted Minkowski metric, in [4]. DPF consistently and significantly outperforms Minkowski-type functions for finding similar images.
4 Conclusion
In this work we tackled one fundamental problem in image retrieval, namely how to measure perceptual similarity between images, using data mining techniques. We discovered the dynamic partial distance function (DPF) through mining a large set of visual data, and showed that DPF outperformed the traditional functions by significant margins. The effectiveness of DPF can be explained by similarity theories in cognitive psychology [2, 4].
References
[1] E. Chang and B. Li. MEGA - the maximizing expected generalization algorithm for learning complex query concepts (extended version). Technical Report, http://www-db.stanford.edu/~echang/mega.ps, November 2000.
[2] R. L. Goldstone. Similarity, interactive activation, and mapping. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20:3-28, 1994.
[3] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databases through multiple examples. VLDB, 1998.
[4] B. Li, E. Chang, and Y. Wu. Dynamic partial function - a perceptual distance function for measuring similarity (extended version). http://www-db.stanford.edu/~echang/dpf-ext.pdf, February 2002.
[5] Y. Rubner, C. Tomasi, and L. Guibas. Adaptive color-image embedding for database navigation. Proceedings of the Asian Conference on Computer Vision, January 1998.
[6] S. Tong and E. Chang. Support vector machine active learning for image retrieval. Proceedings of ACM International Conference on Multimedia, pages 107-118, October 2001.
[7] X. S. Zhou and T. S. Huang. Comparing discriminating transformations and SVM for learning during multimedia retrieval. Proceedings of ACM Conference on Multimedia, pages 137-146, 2001.