DPF - A Perceptual Distance Function for Image Retrieval
Beitao Li, Edward Chang, Ching-Tung Wu
Electrical & Computer Engineering, U.C. Santa Barbara
beitao@engineering.ucsb.edu, echang@ece.ucsb.edu
Abstract
For almost a decade, content-based image retrieval has been an active research area, yet one fundamental problem remains largely unsolved: how to measure perceptual similarity. To measure perceptual similarity, most researchers employ the Minkowski-type metric. Our extensive data-mining experiments on visual data show that, unfortunately, the Minkowski metric is not very effective in modeling perceptual similarity. Our experiments also show that the traditional "static" feature weighting approaches are not sufficient for retrieving various similar images. In this paper, we report our discovery of a perceptual distance function through mining a large set of visual data. We call the discovered function the dynamic partial distance function (DPF). When we empirically compare DPF to Minkowski-type distance functions, DPF performs significantly better in finding similar images. The effectiveness of DPF can be well explained by similarity theories in cognitive psychology.
Keywords: content-based image retrieval, data mining, perceptual distance function, similarity search.
1 Introduction
Research in content-based image retrieval has steadily gained momentum in recent years as a result of the dramatic increase in the volume of digital images. To achieve effective retrieval, an image system must be able to accurately characterize and quantify perceptual similarity. However, a fundamental challenge remains largely unanswered: how to measure perceptual similarity. Various distance functions, such as the Minkowski metric, the earth mover's distance [5], and fuzzy logic, have been used to measure similarity between feature vectors representing images. Unfortunately, our experiments show that they frequently overlook obviously similar images and hence are not adequate for measuring perceptual similarity.
Quantifying perceptual similarity is a difficult problem. Indeed, we may be decades away from fully understanding how human perception works. In this project, we mine visual data extensively to reverse-engineer a good perceptual distance function for measuring image similarity. Our mining hypothesis is this: suppose most of the similar images can be clustered in a feature space. We can then claim with high confidence that 1) the feature space can adequately capture visual perception, and 2) the distance function used for clustering images in that feature space can accurately model perceptual similarity.
We perform our mining operation in two stages. In the first stage, we isolate the distance-function factor (we use the Euclidean distance) to find a reasonable feature set. In the second stage, we freeze the features to discover a perceptual distance function that can better cluster similar images in the feature space. In other words, our goal is to find a function that keeps similar images close together in the feature space while keeping dissimilar images away. We call the discovered function the dynamic partial distance function (DPF). We empirically compare DPF to Minkowski-type distance functions and show that DPF performs remarkably better.
Briefly, the contributions of this paper are as follows:
- We construct a mining dataset to find a feature set that can adequately represent images. In that feature space, we find distinct patterns of similar and dissimilar images, which lead to the discovery of DPF.
- Through empirical study, we demonstrate that DPF is very effective in finding images that have been transformed by rotation, scaling, downsampling, and cropping, as well as images that are perceptually similar to the query image (e.g., images belonging to the same video shot). Our testbed shows that DPF outperforms Minkowski-type functions by 25 percentiles in recall.
2 Discovering DPF
To ensure that sound inferences can be drawn from our mining results, we carefully construct the training dataset. First, we prepare a dataset comprehensive enough to cover a diversified set of images. To achieve this goal, we collect 60,000 JPEG images from Corel CDs and from the Internet. Second, we define "similarity" in a slightly restrictive way so that individuals' subjectivity can be safely excluded. (We address the problem of learning subjective perception in [1, 6].) For each image in the 60,000-image set, we perform 24 transformations, including scaling, downsampling, cropping, rotation, and format transformation. (Details of these transformations are explained in the extended version of this paper [4].) The total number of images in the testbed is 1.5 million.
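The precise recipes for the 24 transformations appear only in [4]. Purely as an illustration of how such variants might be generated, here is a minimal Python sketch using the Pillow library; the specific parameters (scale factors, crop box, rotation angle) are our own assumptions, not the paper's:

```python
from PIL import Image  # pip install pillow

def make_variants(path):
    """Yield illustrative transformed variants of one image. The paper's
    actual 24 transformations are specified in its extended version [4];
    the parameters here are hypothetical."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    yield "scaled", img.resize((w // 2, h // 2))                          # scaling
    yield "cropped", img.crop((w // 8, h // 8, 7 * w // 8, 7 * h // 8))   # cropping
    yield "rotated", img.rotate(45, expand=True)                          # rotation
    # downsampling simulated as resolution loss at the original size:
    yield "downsampled", img.resize((w // 4, h // 4)).resize((w, h))
    # format transformation, e.g. JPEG -> GIF:  img.save("variant.gif", "GIF")
```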
Our experimental results (see Section 3) show that the perceptual distance function discovered during the mining process on this training dataset, which has a slightly restrictive definition of similarity, can be used effectively to find other perceptually similar images. In other words, our testbed consists of a reasonable representation of similar images, and the mining results (i.e., training results) can be generalized to testing data consisting of perceptually similar images produced by other methods (e.g., changing camera parameters).

[Figure 1: The Distributions of Feature Distances. (a) Similar Images; (b) Dissimilar Images. x-axis: feature distance, from 0 to 1; y-axis: frequency, on a logarithmic scale from 1.0E-06 to 1.0E-01.]
From each image, we extract 144 features, including color, texture, and shape, as its representation. We discuss what these features are, and why they are chosen, in [4]. In the remainder of this section, we focus on examining the Minkowski metric and its family. We explain why these functions are ineffective for measuring image similarity, and present our DPF solution.
2.1 Minkowski Metric and Its Limitations
The Minkowski metric is widely used for measuring similarity between objects (e.g., images). Suppose two objects X and Y are represented by two p-dimensional vectors (x_1, x_2, ..., x_p) and (y_1, y_2, ..., y_p), respectively. The Minkowski metric d(X, Y) is defined as

    d(X, Y) = \left( \sum_{i=1}^{p} |x_i - y_i|^r \right)^{1/r},    (1)

where r is the Minkowski factor for the norm. In particular, when r is set to 2, it is the well-known Euclidean distance; when r is 1, it is the Manhattan distance (or L_1 distance). An object located a smaller distance from a query object is deemed more similar to the query object. Measuring similarity by the Minkowski metric is based on one assumption: the similar objects should be close to the query object in all dimensions.
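In code, Eq. (1) is a one-liner. The following sketch is ours, not the authors', and assumes the feature vectors are numpy arrays with values normalized to [0, 1]:

```python
import numpy as np

def minkowski(x, y, r=2.0):
    """Minkowski distance between feature vectors x and y (Eq. 1).
    r = 2 gives the Euclidean distance; r = 1 gives the Manhattan (L1) distance."""
    return float(np.sum(np.abs(x - y) ** r) ** (1.0 / r))
```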
A variant of the Minkowski function, the weighted Minkowski distance function, has also been applied to measure image similarity. The basic idea is to introduce weighting to identify important features. By assigning each feature a weighting coefficient w_i (i = 1, ..., p), the weighted Minkowski distance function is defined as

    d_w(X, Y) = \left( \sum_{i=1}^{p} w_i |x_i - y_i|^r \right)^{1/r}.    (2)
By applying a static weighting vector for measuring similarity, the weighted Minkowski distance function assumes that similar images resemble the query image(s) in the same features. For example, the weighted Minkowski function implicitly assumes that the important features for finding a scaled image are the same as the important features for finding a cropped image.
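The static variant of Eq. (2) differs from Eq. (1) only by the fixed weight vector, as this sketch (again ours, under the same assumptions) makes explicit:

```python
def weighted_minkowski(x, y, w, r=2.0):
    """Weighted Minkowski distance (Eq. 2). w is a static per-feature
    weight vector, chosen once and applied to every image pair."""
    return float(np.sum(w * np.abs(x - y) ** r) ** (1.0 / r))
```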
We can summarize the assumptions of the Minkowski metric as follows:
- Minkowski function: all similar images must be similar in all features.
- Weighted Minkowski function: all similar images are similar in the same way (e.g., in the same set of features) [7].
We questioned the above assumptions upon observing how similar objects are located in the feature space. For this purpose, we carried out extensive data mining work on the 1.5M-image dataset. To better discuss our findings, we introduce a term we have found useful in our data mining work. We define the feature distance on the i-th feature as δ_i = |x_i - y_i|, for i = 1, ..., p.
In our mining work, we first tallied the feature distances between similar images (denoted δ+), and also those between dissimilar images (denoted δ−). Since we normalized feature values to be between zero and one, both δ+ and δ− range between zero and one. Figure 1 presents the distributions of δ+ and δ−. The x-axis shows the possible values of δ, from zero to one; the y-axis (in logarithmic scale) shows the percentage of the features at each δ value.
The figure shows that δ+ and δ− have different distribution patterns. The distribution of δ+ is heavily skewed toward small values (Figure 1(a)), whereas the distribution of δ− is more even (Figure 1(b)). We can also see from Figure 1(a) that a moderate portion of δ+ lies in the high value range (≥ 0.5), which indicates that similar images may be quite dissimilar in many features. This observation suggests that the assumption of the Minkowski metric is inaccurate: similar images are not necessarily similar in all features.
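As a sketch of this tallying step (our own reconstruction; the paper gives no code), the distributions of Figure 1 can be obtained by pooling per-feature distances over many image pairs and histogramming them. The bin count below is our choice, not the paper's:

```python
def delta_histogram(pairs, bins=18):
    """Tally per-feature distances over a set of (x, y) feature-vector pairs
    and histogram them, as in Figure 1. Pass all similar pairs to estimate
    the delta+ distribution, all dissimilar pairs for delta-."""
    deltas = np.concatenate([np.abs(x - y) for x, y in pairs])
    counts, edges = np.histogram(deltas, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum(), edges  # fraction of features in each delta bin
```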
Next, we examined whether similar images resemble the query images in the same way. We tallied the feature distances (δ+) of the 144 features for different kinds of image transformations. Figure 2 presents four representative transformations: GIF, cropped, rotated, and scaled. The x-axis of the figure depicts the feature numbers, from 1 to 144. The first 108 features are various color features, and the last 36 are texture features. The figure shows that various similar images can resemble the query images in very different ways. GIF images have larger δ+ in color features (the first 108 features) than in texture features (the last 36 features). In contrast, cropped images have larger δ+ in texture features than in color features. For rotated images, the δ+ in colors comes close to zero, although its texture feature distance is much greater. A similar pattern appears in the scaled and the rotated images. However, the magnitude of the δ+ of scaled images is very different from that of rotated images.
[Figure 2: The Average Feature Distances. (a) GIF Images; (b) Cropped Images; (c) Rotated Images; (d) Scaled Images. x-axis: feature number, from 1 to 144; y-axis: average distance.]

We summarize our observations as follows:
- Similar feature distance is distributed differently from dissimilar feature distance. Similar feature distance skews toward small values, while dissimilar feature distance shows a more even distribution.
- Similar images do not resemble the query images in all features.
- Images similar to the query images can be similar in differing features. For example, some images resemble the query image in texture, others in color.
The above observations not only refute the assumptions of Minkowski-type distance functions, but also provide hints as to how a good distance function would work. The first point is that a distance function does not need to consider all features equally, since similar images may match only some features of the query images. The second point is that a distance function should weight features dynamically, since various similar images may resemble the query image in differing ways. Traditional relevance feedback methods [3] learn a set of "optimal" feature weights for a query. For instance, if the user is more interested in color than in texture, color features are weighted more heavily when similarity is computed. What we have discovered here is that this "static" weighting is insufficient. An effective distance function must weight features differently when comparing the query image to different images. These points lead to the design of the dynamic partial distance function.
2.2 Dynamic Partial Distance Function
Based on the observations explained above, we designed a distance function to better represent perceptual similarity. Let δ_i = |x_i - y_i|, for i = 1, ..., p. We first define the set Δ_m as

    Δ_m = { the smallest m δ's of (δ_1, ..., δ_p) }.

Then we define the Dynamic Partial Distance Function (DPF) as

    d(m, r) = \left( \sum_{\delta_i \in \Delta_m} \delta_i^r \right)^{1/r}.    (3)
DPF has two adjustable parameters: m and r. Parameter m can range from 1 to p. When m = p, DPF degenerates to the Minkowski metric. When m < p, it counts only the smallest m feature distances between two objects, and the influence of the (p − m) largest feature distances is eliminated. DPF dynamically selects features to be considered for different pairs of objects. This is achieved by the introduction of Δ_m, which changes dynamically for different pairs of objects. In Section 3, we will show that DPF makes similar images aggregate more compactly and locate closer to the query images, while simultaneously keeping the dissimilar images away from the query images. In other words, similar and dissimilar images are better separated by DPF.
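A minimal sketch of Eq. (3), in the same style as the earlier snippets (ours, assuming normalized numpy feature vectors):

```python
def dpf(x, y, m, r=2.0):
    """Dynamic Partial Distance Function (Eq. 3): keep only the m smallest
    per-feature distances, so the p - m largest ones are ignored.
    With m = p this reduces to the Minkowski metric of Eq. 1."""
    delta = np.abs(x - y)         # feature distances delta_i
    delta_m = np.sort(delta)[:m]  # the set Delta_m
    return float(np.sum(delta_m ** r) ** (1.0 / r))
```

Note that which features survive the sort differs from pair to pair; this is exactly the "dynamic" behavior that the static weights of Eq. (2) cannot express.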
3 Empirical Study
Our empirical study consists of two parts: training and testing. In the training part, we used the same 1.5M-image dataset to predict the optimal m value. In the testing part, we used a 50K-image dataset to examine the effectiveness of DPF.
3.1 Predicting m Through Training
We used the 60,000 original images to perform queries. Applying DPF with different m values to the 1.5M-image dataset, we tallied the distances from these 60,000 queries to their similar images and to their dissimilar images, respectively. We then computed the average and the standard deviation of these distances. We denote the average distance of the similar images to their queries as μ_d^+ and that of the dissimilar images as μ_d^−; we denote the standard deviation of the similar images' distances as σ_d^+ and that of the dissimilar images as σ_d^−.
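A sketch of this sweep (our reconstruction, reusing the dpf function above; the container layout is hypothetical):

```python
def sweep_m(queries, similar, dissimilar, m_values, r=3.0):
    """For each candidate m, compute the mean and standard deviation of DPF
    distances from queries to their similar and dissimilar images (the
    mu/sigma curves of Figure 3). similar[q] and dissimilar[q] map a query
    index to lists of feature vectors."""
    stats = {}
    for m in m_values:
        d_pos = [dpf(queries[q], img, m, r)
                 for q in range(len(queries)) for img in similar[q]]
        d_neg = [dpf(queries[q], img, m, r)
                 for q in range(len(queries)) for img in dissimilar[q]]
        stats[m] = (np.mean(d_pos), np.std(d_pos),
                    np.mean(d_neg), np.std(d_neg))
    return stats
```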
Figure 3 depicts the effect of m (on the x-axis) on μ_d^+, μ_d^−, σ_d^+, and σ_d^−. Figure 3(a) shows that as m becomes smaller, both μ_d^+ and μ_d^− decrease. The average distance of similar images (μ_d^+), however, decreases at a faster pace than that of dissimilar images (μ_d^−). For instance, when we decrease m from 144 to 130, μ_d^+ decreases from 1.0 to about 0.3, a 70% decrease, whereas μ_d^− decreases from 3.2 to about 2.0, a 38% decrease. This gap indicates that μ_d^+ is more sensitive to the m value than μ_d^−. Figure 3(b) shows that the standard deviations σ_d^+ and σ_d^− follow the same trend as the average distances. When m decreases, similar images become more compact in the feature space at a faster pace than dissimilar images do. Our training result indicates that when m is set to 114, similar images are best clustered.

[Figure 3: The Effect of DPF. (a) Average of Distances; (b) Standard Deviation of Distances. x-axis: m, from 144 down to 4; y-axis: distance to queries in (a), standard deviation in (b), plotted separately for similar and dissimilar images.]

[Figure 4: Search Performance of Different m at r = 3. (a) Precision vs. Recall; (b) Recall vs. k, for m = 144, 134, 124, 114, 104, 94, and 84.]
3.2 Testing New Distance Functions
The test dataset consists of 100 similar-image sets, each composed of 30 images. Of these 30 images, we have the original image, 24 transformed images (using the same transformation methods described in Section 2), and five images that are visually identified as similar. We then added 50K randomly crawled Web images to these 100 × 30 images to form our testset. We conducted 100 queries using the 100 original images. For each query, we recorded the ranks of its similar images. We experimented with m values from 84 to 144, with r fixed at three. Figure 4 depicts the experimental results.
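The reported curves are standard precision and recall at a rank cutoff; a minimal per-query helper (our own, not the authors' code) might look like:

```python
def recall_precision_at_k(ranked_ids, relevant_ids, k):
    """Recall and precision at cutoff k for one query. ranked_ids is the
    retrieval order produced by some distance function; relevant_ids is the
    ground-truth similar set (here, the 29 similar images per query)."""
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids), hits / k
```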
The precision-recall curves for selected m values are plotted in Figure 4(a). The peak search performance is achieved when m = 114, which does significantly better than the Minkowski distance (m = 144). Figure 4(b) plots the recall at selected m values for top-k retrievals. As we decrease the value of m from 144, the recall improves steadily until m reaches 114, where the peak performance is achieved. Our DPF outperforms the Minkowski distance function by 25 percentiles in recall.

Because of space limitations, we present extensive experimental results, and our comparison between the weighted version of DPF and the weighted Minkowski metric, in [4]. DPF consistently and significantly outperforms Minkowski-type functions for finding similar images.
4 Conclusion
In this work we tackled one fundamental problem in image retrieval, namely how to measure perceptual similarity between images, using data mining techniques. We discovered the dynamic partial distance function (DPF) through mining a large set of visual data, and showed that DPF outperformed the traditional functions by significant margins. The effectiveness of DPF can be explained by similarity theories in cognitive psychology [2, 4].
References
[1] E. Chang and B. Li. MEGA - the maximizing expected generalization algorithm for learning complex query concepts (extended version). Technical Report, http://www-db.stanford.edu/~echang/mega.ps, November 2000.
[2] R. L. Goldstone. Similarity, interactive activation, and mapping. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20:3-28, 1994.
[3] Y. Ishikawa, R. Subramanya, and C. Faloutsos. MindReader: Querying databases through multiple examples. VLDB, 1998.
[4] B. Li, E. Chang, and Y. Wu. Dynamic partial function - a perceptual distance function for measuring similarity (extended version). http://www-db.stanford.edu/~echang/dpf-ext.pdf, February 2002.
[5] Y. Rubner, C. Tomasi, and L. Guibas. Adaptive color-image embedding for database navigation. Proceedings of the Asian Conference on Computer Vision, January 1998.
[6] S. Tong and E. Chang. Support vector machine active learning for image retrieval. Proceedings of ACM International Conference on Multimedia, pages 107-118, October 2001.
[7] X. S. Zhou and T. S. Huang. Comparing discriminating transformations and SVM for learning during multimedia retrieval. Proceedings of ACM Conference on Multimedia, pages 137-146, 2001.