A Clustering Based Approach to Efficient Image Retrieval
Ruofei Zhang, Zhongfei (Mark) Zhang
Department of Computer Science
Thomas J. Watson School of Engineering and Applied Science
State University of New York at Binghamton, Binghamton, NY 13902
E-mails: {rzhang, zhongfei}@cs.binghamton.edu
Abstract
This paper addresses the issue of effective and
efficient content based image retrieval by presenting a
novel indexing and retrieval methodology that integrates
color, texture, and shape information, and applies these
features in regions obtained
through unsupervised segmentation, as opposed to
applying them to the whole image domain. In order to
address the typical color feature “inaccuracy” problem in
the literature, fuzzy logic is applied to the traditional color
histogram to alleviate the problem to a certain degree. The
similarity is defined through a balanced combination
between global and regional similarity measures
incorporating all the features. In order to further improve
the retrieval efficiency, a secondary clustering technique is
developed and employed to significantly save query
processing time without compromising the retrieval
precision. An implemented prototype system has
demonstrated a promising retrieval performance for a test
database containing 2000 general-purpose color images,
as compared with its peer systems in the literature.
1. Introduction
Content-based image retrieval (CBIR) concerns
automatic or semi-automatic retrieval of image data from
an imagery database based on semantic similarity between
the imagery content. The semantic similarity is typically
defined through a set of imagery features. These features
are extracted from shape, texture, or color properties
defined in the imagery domain. The relevance between a
query image and images in the database is ranked
according to the similarity measure computed from the
features. Due to its wide application potential, CBIR
research has received intensive attention over the last few
years.
In this paper we present a novel approach to addressing
the general-purpose CBIR problem. This approach
integrates semantics-intensive clustering-based
segmentation with fuzzy color histogram as well as texture
and shape features to index imagery data, and
consequently, is called FUZZYCLUB. A computationally
efficient distance metric is proposed in FUZZYCLUB to
reduce the query processing time. The response time is
further improved by imposing a secondary clustering
technique to achieve the high scalability in the case of very
large image databases.
The paper is organized as follows. We begin with a
brief review of the related work. Then we introduce the
unsupervised segmentation technique employed in
FUZZYCLUB, which is followed by the development of
all the indexing features defined in FUZZYCLUB, with a
focus on the fuzzy color histogram. The distance metric
and the overall similarity issues between two images are
subsequently discussed, followed by the introduction to the
secondary clustering technique in the region feature vector
space to improve retrieval efficiency. Finally, the retrieval
performance of FUZZYCLUB is evaluated with a
comparison with its two peer systems in the literature, and
the paper is concluded.
2. Related Work and Significance
A broad range of research efforts and commercial
products [1,2,3] has been reported to address the general-purpose
CBIR problem. Almost all of the approaches proposed are
based on indexing imagery in a feature space. Typical
features are color, texture, shape, region, and appearance
[4,6,7,9-16]. The most popularly used features are color
histogram and its variants. They are used in systems such
as IBM QBIC [7] and Berkeley Chabot [8]. Color
histogram is computationally efficient, and generally
insensitive to small changes in camera position. However,
a color histogram provides only a very coarse
characterization of an image, resulting in a very coarse
indexing; images with similar histograms may have
dramatically different semantics. The “coarseness” of the
color histogram approach is due to the total loss of spatial
information of pixels in images. To retain the spatial

information of a color histogram, many research efforts
have been made in the literature. Pass and Zabih [4] described a split
histogram called the color coherence vector (CCV). Each of
its buckets $j$ contains the pixels of a given color $j$, divided into two
classes based on the pixels' spatial coherence. The feature
is also extended by successive refinement, with buckets of
a CCV further subdivided based on additional features.
Huang et al [10] proposed color correlograms to integrate
color and spatial information. Given
$n$ inter-pixel distances, a correlogram is defined as a set of $n$
matrices $\gamma^{(k)}$, where an element $\gamma^{(k)}_{c_i, c_j}$ is the probability
that a pixel with color $c_i$ is at a distance $k$ away from a
pixel with color $c_j$. Rao et al [11] generalized the color
spatial distribution by computing the color histogram with
specific geometric relationships between pixels of each
color histogram bucket. Cinque et al [12] proposed another
color histogram refinement method, called Spatial-
Chromatic Histogram, in which the average position of
the pixels of each color and its standard deviation are
recorded to add the spatial information into the traditional
histogram. All of the refinement efforts failed to reflect the
fuzzy nature of the color features inherently exhibited in
the color histogram itself.
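To make the flavor of these refinements concrete, here is a hedged sketch of the CCV idea of Pass and Zabih [4] (not the authors' code): a pixel is "coherent" if it belongs to a sufficiently large 4-connected component of its own quantized color. The function name and the coherence threshold `tau` are illustrative choices:

```python
import numpy as np
from collections import deque

def color_coherence_vector(img, tau=4):
    """Sketch of a CCV. `img` is a 2-D array of quantized color indices;
    a pixel is coherent if its 4-connected same-color component has at
    least `tau` pixels. Returns {color: (coherent, incoherent)} counts."""
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    ccv = {}
    for y in range(h):
        for x in range(w):
            if seen[y, x]:
                continue
            color = img[y, x]
            # BFS over the 4-connected component of identical color.
            comp, queue = [], deque([(y, x)])
            seen[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] \
                            and img[ny, nx] == color:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            coh, inc = ccv.get(color, (0, 0))
            if len(comp) >= tau:
                coh += len(comp)      # large component: coherent pixels
            else:
                inc += len(comp)      # small component: incoherent pixels
            ccv[color] = (coh, inc)
    return ccv
```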
Ravela and Manmatha [13] proposed an appearance
based image indexing technique using Gaussian derivative
filters at several scales to compute low order 2D
differential invariants as indexing features. Recently,
region based features are developed to address the partial
matching capability for robust CBIR. A region-based
retrieval system segments images into regions (objects),
and retrieves images based on the similarity between
regions. Typical region based CBIR systems include
Berkeley Blobworld[15], UCSB Netra[16], Columbia
VisualSEEK[9], and Stanford IRM[6], of which [9,15,16]
are the classic region based CBIR systems which require
significant user interaction in defining or selecting region
features, preventing a friendly interface to users,
especially non-professional users. Another problem in
the classic region based CBIR systems is that they focus
too much on the region-based similarity as opposed to the
similarity with a balanced focus between regions and
global images. Wang et al [6] proposed an integrated
regional matching scheme for CBIR, which allows for
matching a region in one image against several regions
from another image. As a result, the similarity between
two images is defined as the weighted sum of the distances
in a feature space between all regions from different
images. Compared with the classic region-based CBIR
systems, this scheme decreases the impact of inaccurate
region segmentation by smoothing over the “inaccuracy”
in distance. Nevertheless, the color representation of each
region is simplistic such that much of the rich color
information in a region is lost, as it fails to explicitly
express the “inaccuracy” of the color feature exhibited by
the fuzzy nature in the feature extraction and human
perception of color.
In designing FUZZYCLUB, we keep the
following three principles in mind. First, we intend to
apply pattern recognition techniques to connect low level
features to high level semantics. Consequently,
FUZZYCLUB is also a region-based methodology, as
opposed to indexing images in the whole image domain.
Second, we intend to address the color “inaccuracy” issue
typically existing in color based image retrieval in the
literature. With this consideration, we apply fuzzy logic to
the system. Third, we intend to improve the query
processing time to avoid the typical linear search problem
in the literature; this drives us to develop the secondary
clustering technique currently employed in FUZZYCLUB.
As a result, comparing with the existing techniques and
systems, FUZZYCLUB exhibits the following distinctive
advantages: (i) it solves the color “inaccuracy” problem
typically existing in color based CBIR systems to a certain
degree; (ii) it develops a balanced scheme in similarity
measure between regional and global matching in order to
capture as much semantic information as possible without
sacrificing the efficiency; (iii) it “pre-organizes” image
databases to further improve retrieval efficiency without
compromising retrieval effectiveness. The novelty of
FUZZYCLUB is its improvement of the existing
techniques and its incorporation and combination of these
techniques together in a single system.
3. Image Segmentation
The very first step of FUZZYCLUB is to segment an
image into different regions based on color and spatial
variation features using a modified version of the k-means
algorithm [17]. Due to its unsupervised learning nature,
we can adaptively update the number of regions in an
iterative process to accommodate the fact that the number
of regions in an image is unknown before the segmentation.
Image indexing is then performed based on the color, texture,
and shape features in each region, as well as the global,
overall combination of the regional features.
To segment an image into regions, FUZZYCLUB first
partitions an image into 4 by 4 blocks to compromise
between texture granularity and computation time. To
apply the k-means algorithm, a feature vector consisting of
six features from each block is defined as follows. Three
of the features are the average color components in a 4 by
4 block. The LAB color space is used due to its desired
property of the perceptual color difference proportional to
the numerical difference in the LAB space. These features
are denoted as
$\{C_1, C_2, C_3\}$.
The other three features are used to capture the texture
information of the image, represented by the energy in the

high frequency bands of the Haar wavelet transform [18],
i.e., the square roots of the second order moments of
wavelet coefficients in high frequency bands. To obtain
these moments, a Haar wavelet transform is applied to the
L component of the image. After a one-level wavelet
transform, a 4 by 4 block is decomposed into four
frequency bands; each band contains 2*2 coefficients.
Without loss of generality, suppose the coefficients in the
HL band are $\{c_{k,l},\, c_{k,l+1},\, c_{k+1,l},\, c_{k+1,l+1}\}$. Then the
texture feature of this block in the HL band is computed as:

$$f = \left(\frac{1}{4}\sum_{i=0}^{1}\sum_{j=0}^{1} c_{k+i,\,l+j}^{2}\right)^{1/2} \quad (1)$$
The other two features are computed similarly in the LH
and HH bands. These three features of the block are
denoted as
$\{T_1, T_2, T_3\}$. They can be used to discern
texture by showing variations in different directions.
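As a minimal sketch of the above (assuming a 4 by 4 block already converted to LAB, an orthonormal one-level Haar transform, and a hypothetical array layout the paper does not specify), the six block features $\{C_1, C_2, C_3, T_1, T_2, T_3\}$ can be computed as:

```python
import numpy as np

def block_features(lab_block):
    """Six-dimensional feature vector for a 4x4x3 LAB block (assumed layout)."""
    lab_block = np.asarray(lab_block, dtype=float)
    # Color features C1..C3: average L, A, B components of the block.
    color = lab_block.reshape(-1, 3).mean(axis=0)

    # One-level orthonormal Haar transform of the L component: each 2x2
    # cell (a b / c d) yields one coefficient per frequency band.
    L = lab_block[:, :, 0]
    a, b = L[0::2, 0::2], L[0::2, 1::2]
    c, d = L[1::2, 0::2], L[1::2, 1::2]
    HL = (a - b + c - d) / 2.0   # horizontal detail band (2x2 coefficients)
    LH = (a + b - c - d) / 2.0   # vertical detail band
    HH = (a - b - c + d) / 2.0   # diagonal detail band

    # Texture features T1..T3: Eq. (1), square root of the averaged
    # squared coefficients in each high-frequency band.
    texture = [np.sqrt((band ** 2).mean()) for band in (HL, LH, HH)]
    return np.concatenate([color, texture])
```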
After obtaining the feature vectors for all the blocks,
we perform the normalization on both color and texture
features to whiten them such that the effects of different
feature ranges are eliminated. Consequently the k-means
algorithm [17] is used to cluster the feature vectors into
several classes with every class in the feature space
corresponding to one spatial region in the image space.
Since clustering is performed in the feature space, blocks
in each cluster do not necessarily form a connected region
in the image. This way, we preserve the natural clustering
of objects in general-purpose images. The k-means
algorithm does not specify how many clusters to choose.
We adaptively select the number of clusters C by gradually
increasing C until a stop constraint is satisfied. The
average number of clusters for all images in the database
varies in accordance with the adjustment of the stop
constraint. In the k-means algorithm we use a color-texture
weighted $L_2$ distance metric as

$$d = w_c \sum_{i=1}^{3}\left(c_i^{1} - c_i^{2}\right)^{2} + w_t \sum_{i=4}^{6}\left(t_i^{1} - t_i^{2}\right)^{2} \quad (2)$$

to describe the distance between blocks. In our
implemented prototype system of FUZZYCLUB, we set
$w_c = 0.65$ and $w_t = 0.35$.
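The adaptive clustering step can be sketched as follows. The stop constraint and the deterministic farthest-point initialization are illustrative assumptions (the paper does not fix either); note that the weighted metric of Eq. (2) can be realized with plain Euclidean k-means by scaling the color dimensions by $\sqrt{w_c}$ and the texture dimensions by $\sqrt{w_t}$:

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Farthest-point initialization (deterministic), then Lloyd iterations.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def segment_blocks(feats, wc=0.65, wt=0.35, max_c=8, threshold=0.5):
    """Cluster whitened 6-d block features, growing the number of clusters C
    until a stop constraint is met. The distortion threshold is a
    hypothetical stand-in for the paper's unspecified stop constraint."""
    X = np.asarray(feats, dtype=float).copy()
    X[:, :3] *= np.sqrt(wc)   # color dimensions, weighted per Eq. (2)
    X[:, 3:] *= np.sqrt(wt)   # texture dimensions
    for c in range(2, max_c + 1):
        labels, centers = kmeans(X, c)
        distortion = np.mean(((X - centers[labels]) ** 2).sum(axis=1))
        if distortion < threshold:
            break
    return labels
```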
After the segmentation, FUZZYCLUB is ready for
image indexing. In other words, image indexing is based
on the features defined in the regions obtained from the
image segmentation.
4. Region-Based Features
Within each region, we define three types of features:
color, texture, and shape, along with the conventional
geometric information as the feature vector for image
indexing.
4.1 Color Features
Color is the most popularly used type of features in
image indexing. On the other hand, due to its inherent
“inaccuracy” in describing the same semantic
content, caused by differences in color quantization and/or by the
uncertainty of human perception, it is important to capture
this “inaccuracy” when defining the features. We apply
fuzzy logic [22][5] to the traditional color histogram to
help capture this uncertainty in color indexing.
We assume that any color is a fuzzy set [20]. That
means we will associate with any color $c$ a fuzzy membership
function $\mu_c : \mathcal{U} \rightarrow [0,1]$, where $\mathcal{U}$ is the color
universe; for any color $c'$ of the color universe,
$\mu_c(c')$ is the resemblance degree of the color $c'$
to the color $c$. The fuzzy model we define should
follow the property that the resemblance degree decreases
as the inter-color distance increases. The natural choice,
according to the typical soft computing literature [20], is to
impose a smooth decay of the resemblance function when
the inter-color distance increases. Since the LAB color
space is of the equivalence between the perceptual inter-
color distance and the actual Euclidean distance between
the color space coordinates, we define a Gaussian operator
to be the fuzzy resemblance function:

$$\mu_c(c') = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{d^2(c, c')}{2\sigma^2}\right\} \quad (3)$$

where $d$ is the Euclidean distance between colors $c$ and $c'$
in LAB space, and $\sigma$ is the average distance between
colors, defined as

$$\sigma = \frac{2}{B(B-1)} \sum_{i=1}^{B-1} \sum_{k=i+1}^{B} d(c_i, c_k) \quad (4)$$
where B is the number of buckets in the color histogram.
This fuzzy color model enables us to enlarge the
influence of a given color to its neighboring colors,
according to the uncertainty principle and the perceptual
similarity. This means that each time a color $c'$ is found in
the image, it influences all the quantized colors
according to their resemblance to the color $c'$. Numerically,
this is expressed as:

$$h(c) = \sum_{c' \in \mathcal{U}} h_1(c')\, \mu_{c'}(c) \quad (5)$$

where $\mathcal{U}$ is the color universe in the image and $h_1(\cdot)$ is
the normalized, traditional color histogram. This fuzzy
histogram operation in fact is the linear convolution
between the traditional color histogram and the fuzzy color
model. This convolution expresses the histogram
smoothing, provided that the color model is indeed a

smoothing, low-pass filtering kernel. The use of Gaussian
function as the color model helps generate such a smooth
histogram, which subsequently helps reduce the
quantization errors [21].
In the implementation of the prototype system of
FUZZYCLUB, the LAB color space is quantized into 96
buckets by using uniform quantization (L by 6, A by 4, B
by 4). Then Eq. 5 is applied to obtain the fuzzy histogram
for each region. The resemblance $\mu_c(c')$ for each bucket in the histogram
is pre-computed based on Eqs. 3 and 4, and is
implemented as a lookup table to reduce the online
computation.
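Under this quantization, Eqs. (3)–(5) amount to one precomputed matrix and a matrix–vector product per region. A minimal sketch (the function names and the final renormalization are our assumptions; the paper leaves the latter implicit):

```python
import numpy as np

def resemblance_matrix(centers):
    """Precompute mu_{c_j}(c_i) over the B histogram bucket centers.
    `centers` is a (B, 3) array of LAB coordinates of the bucket centers."""
    diff = centers[:, None, :] - centers[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))              # pairwise LAB distances
    B = len(centers)
    # Eq. (4): average inter-bucket distance.
    iu = np.triu_indices(B, k=1)
    sigma = 2.0 * d[iu].sum() / (B * (B - 1))
    # Eq. (3): Gaussian resemblance between bucket colors.
    return np.exp(-d ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def fuzzy_histogram(h1, M):
    # Eq. (5): each conventional-histogram count spreads over similar colors.
    h = M @ h1
    return h / h.sum()   # renormalize (our assumption)
```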
4.2 Texture Features
The texture features are defined as the centroid vector
of all the three-component Haar wavelet moment vectors
defined in Section 3 for the blocks of a region.
4.3 Shape Features
The shape features are defined as a vector containing
three components for the normalized inertia [19] of order 1
to 3 of a region, respectively. For a region H in 2-
dimensional Euclidean space
2
(i.e., an image space),
the normalized inertia of order p is
2/1
),(:),(
2/22
)]([
])
ˆ
()
ˆ
[(
),(
p
Hyxyx
p
HV
yyxx
pHl
+
+
=
(6)
where V(H) is the number of pixels in the region H, and
)
ˆ
,
ˆ
( yx is the centroid of H. The minimum normalized
inertia is achieved by spheres. Denote the
th
p order
normalized inertia of spheres as
p
L
. The following
features are used to describe the shape of a region:
11
/)1,( LHlS = ,
22
/)2,( LHlS = ,
33
/)3,( LHlS =
(7)
Now the indexing vector of a region consists of the
fuzzy color histogram, the three texture feature
components, and the three shape feature components
defined in Eq. 7. In addition, the location of the region,
represented as the centroid coordinates of the region, and
the area of the region, represented as the total number of
pixels, are also captured as part of the indexing feature
vector, resulting in a complete feature vector for each
region.
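The shape features of Eqs. (6)–(7) can be sketched directly from pixel coordinates. Here the reference value $L_p$ is approximated numerically from a rasterized disc (the 2-D "sphere"), an implementation choice the paper does not spell out:

```python
import numpy as np

def normalized_inertia(coords, p):
    # Eq. (6): `coords` is an (n, 2) array of pixel coordinates of region H.
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    r2 = ((coords - centroid) ** 2).sum(axis=1)
    return (r2 ** (p / 2.0)).sum() / len(coords) ** (1.0 + p / 2.0)

def disc_inertia(p, radius=100):
    # L_p: normalized inertia of a rasterized disc, which minimizes l(H, p).
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    coords = np.argwhere(x ** 2 + y ** 2 <= radius ** 2)
    return normalized_inertia(coords, p)

def shape_features(coords):
    # Eq. (7): S_p = l(H, p) / L_p for p = 1, 2, 3.
    return np.array([normalized_inertia(coords, p) / disc_inertia(p)
                     for p in (1, 2, 3)])
```

A disc-shaped region yields features near 1, while elongated regions score higher, which is what makes the ratios useful as shape descriptors.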
5. Region Matching and Image Similarity
To compute the distance between two regions, we
apply the $L_2$ distance metric to the fuzzy color histogram, texture
vector, and shape vector, respectively. For the fuzzy
histogram, we define the distance metric as:

$$d_C^{pq} = \frac{\sum_{i=1}^{B}\left[h_p(i) - h_q(i)\right]^{2}}{B} \quad (8)$$

where $B$ is the number of buckets in the histogram, and
$h_p(i)$ and $h_q(i)$ are the fuzzy histograms for regions $p$
and $q$, respectively. Similarly, the $L_2$ distance metric is
applied to the texture vector and the shape vector,
respectively:

$$d_T^{pq} = \left\|T_p - T_q\right\| \quad (9)$$

$$d_S^{pq} = \left\|S_p - S_q\right\| \quad (10)$$

where $T_p$, $S_p$ are the texture feature vector and the shape
feature vector for region $p$, respectively, and $T_q$, $S_q$ are
those for region $q$.
To measure the distance between two regions, we
separate the contribution from the color and texture
features from that from the shape features, as the former is
considered more reliable in image indexing than the latter.
Consequently, given two regions $p$ and $q$, the inter-region
distance on color and texture is defined as

$$D_{pq} = \left(d_C^{pq}\right)^{2} + \left(d_T^{pq}\right)^{2} \quad (11)$$

The intra-region distance (i.e., the deviation) for region $p$
on color and texture is defined as

$$D_{pp} = \left[\frac{1}{N_p}\sum_{j=1}^{N_p} \left\|Z_p^j - \bar{Z}_p\right\|^{2}\right]^{1/2} \quad (12)$$

where $N_p$ is the number of blocks in region $p$, $Z_p^j$ is the
color-texture vector of block $j$ in region $p$ defined in
Section 3, and $\bar{Z}_p$ is the centroid in region $p$ of all
the color-texture vectors for all the blocks in this region.
Conceptually, the overall distance between two regions
should increase when the inter-region distance increases
and when the intra-region distance decreases. Hence, we
define the overall distance between two regions $p$ and $q$ as
follows:

$$\mathrm{DIST}_{pq} = w\,\frac{D_{pq}}{D_{pp} + D_{qq}} + (1-w)\, d_S^{pq} \quad (13)$$

where $w$ is a weight. In the prototype system of
FUZZYCLUB, we set $w$ to 0.7. Since all components are
normalized, this overall distance between two regions is
also normalized. The separation between the contribution
from the color and texture features and that from the shape
features allows us to adjust the weights for these different
contributions. The heavier weight given to the former in the
prototype system reflects our belief that the former is more

reliable than the latter, which is verified by the
experimental results.
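Putting Eqs. (8)–(13) together, the overall region distance can be sketched as below. The region-descriptor layout (a dict with `hist`, `texture`, `shape`, and per-block color-texture vectors under `blocks`) is purely illustrative, not the paper's data structure:

```python
import numpy as np

def region_distance(p, q, w=0.7):
    """Overall distance DIST_pq of Eq. (13) between two region descriptors."""
    B = len(p['hist'])
    d_c = ((p['hist'] - q['hist']) ** 2).sum() / B        # Eq. (8)
    d_t = np.linalg.norm(p['texture'] - q['texture'])     # Eq. (9)
    d_s = np.linalg.norm(p['shape'] - q['shape'])         # Eq. (10)
    D_pq = d_c ** 2 + d_t ** 2                            # Eq. (11)

    def intra(r):                                         # Eq. (12)
        Z = np.asarray(r['blocks'], dtype=float)
        return np.sqrt(((Z - Z.mean(axis=0)) ** 2).sum(axis=1).mean())

    # Eq. (13): inter-region distance normalized by the intra-region
    # deviations, combined with the shape distance.
    return w * D_pq / (intra(p) + intra(q)) + (1.0 - w) * d_s
```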
Given the definition of the distance between two
regions, we are ready to compute the global similarity
between two images. Suppose we have M regions in image
1 and N regions in image 2, the following algorithm
computes the global similarity between image 1 and image
2:
Step 1: compute the distance between one region in
image 1 and all regions in image 2. For each region i in
image 1, the distance between this region and image 2 is
defined as:
$$R_{i,\mathrm{Image2}} = \min_{j} \mathrm{DIST}(i, j) \quad (14)$$

where $j$ enumerates each region in image 2. This
definition captures the minimum distance between a region
and all the regions in an image, which maximizes the
potential similarity between the region and the image.

Step 2: similarly, the distance between a region $j$ in
image 2 and image 1 is defined as

$$R_{j,\mathrm{Image1}} = \min_{i} \mathrm{DIST}(i, j) \quad (15)$$

where $i$ enumerates each region in image 1.
Step 3: now we have $M+N$ distances. We define the
distance between two images (1 and 2) as follows:

$$\mathrm{ImageDist}(1,2) = \frac{\sum_{i=1}^{M} w_{1i}\, R_{i,\mathrm{Image2}} + \sum_{j=1}^{N} w_{2j}\, R_{j,\mathrm{Image1}}}{2} \quad (16)$$

where $w_{1i}$ is the weight for region $i$ in image 1, and $w_{2j}$
is the weight for region $j$ in image 2. Since we consider that a
region with a larger area plays a more significant role in
contributing to the overall similarity value between two
images than a region with a smaller area, we define
$w_{1i} = N_{1i}/N_1$, where $N_{1i}$ is the number of blocks in region
$i$ and $N_1$ is the total number of blocks in image 1, and
define $w_{2j}$ similarly for image 2.
This definition of the overall similarity between two
images captured by the overall distance between the
images is a balanced scheme in similarity measure
between regional and global matching. As compared with
many existing similarity measures in the literature, this
definition strives to incorporate as much semantic
information as possible and at the same time also achieves
a computational efficiency. Given this definition, for each
query image q, it is straightforward to compute
$\mathrm{ImageDist}(q, d)$ for every image $d$ in the database in
the retrieval.
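Steps 1–3 above can be sketched compactly. Here `dist` stands for the region distance $\mathrm{DIST}$ of Eq. (13), and the weights are the area fractions $N_{1i}/N_1$ and $N_{2j}/N_2$ defined above:

```python
import numpy as np

def image_distance(regions1, weights1, regions2, weights2, dist):
    """ImageDist of Eq. (16) between two segmented images."""
    # Pairwise region distances between the two images.
    D = np.array([[dist(r1, r2) for r2 in regions2] for r1 in regions1])
    R1 = D.min(axis=1)   # Eq. (14): best match in image 2 for each region i
    R2 = D.min(axis=0)   # Eq. (15): best match in image 1 for each region j
    # Eq. (16): area-weighted average of both directions.
    return (np.dot(weights1, R1) + np.dot(weights2, R2)) / 2.0
```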
6. Secondary Clustering and Image Retrieval
The time of image retrieval depends to a large degree
on the number of images in the database in almost all
CBIR systems. Many existing systems attempt to compare
the query image with every target image in the database
to find the top matching images, resulting in an essentially
linear search, which is prohibitive when the database is
large. We believe that it is not necessary to conduct a
whole database comparison. In fact, it is possible to exploit
a priori information regarding the “organization” of the
images in the database in the feature space before a query
is posed, such that when a query is received, only a part of
the database needs to be searched while a large portion of
the database may be eliminated in the search. This
certainly saves significant query processing time without
compromising the retrieval precision.
To achieve this goal, in FUZZYCLUB we add a pre-
retrieval screening phase to the feature space after a
database is indexed by applying a secondary k-means
algorithm to the distance $\mathrm{DIST}_{pq}$ in the region feature
vector space to cluster all the regions in the database into
classes. The philosophy is that regions with similar {color,
texture, shape} features should be grouped together in the
same class. This secondary clustering is performed offline,
and each region’s indexing data along with its associated
class ID is recorded in the index files. Consequently, in the
prototype implementation of FUZZYCLUB, the image
database is indexed in terms of a three level tree structure,
one for the region level, one for the class level, and one for
the image level.
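A sketch of this offline secondary clustering and the resulting screening is given below, in a simplified two-level form (class count, initialization, and data layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np
from collections import defaultdict

def build_secondary_index(region_vectors, region_to_image, k, iters=30):
    """Offline: cluster all region feature vectors in the database into k
    classes with k-means and record which images own a region of each class."""
    X = np.asarray(region_vectors, dtype=float)
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    class_to_images = defaultdict(set)
    for r, lab in enumerate(labels):
        class_to_images[int(lab)].add(region_to_image[r])
    return centers, dict(class_to_images)

def candidate_images(query_vectors, centers, class_to_images):
    # Online screening: assign each query region to its nearest class
    # centroid, then take the union of images containing any region of
    # those classes; only these candidates need full similarity ranking.
    cands = set()
    for q in np.asarray(query_vectors, dtype=float):
        j = int(np.argmin(((centers - q) ** 2).sum(axis=-1)))
        cands |= class_to_images.get(j, set())
    return cands
```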
Assuming an image database is indexed based on the
features defined in Sections 4 and 5, and is “organized”
based on the secondary clustering, given a query image,
FUZZYCLUB processes the query as follows:
Step 1: Perform the query image segmentation to
obtain all the regions; say we have regions $Q_i,\ i \in [0, N-1]$,
in the query image.

Step 2: Compute the distances between each region
$Q_i$ and all class centroids in the database to determine
which class $Q_i$ belongs to by the minimum-distance-wins
principle. Assume that region $Q_i$ belongs to
class $C_j,\ j \in [0, K-1]$.

Step 3: Retrieve all the regions in the database which
belong to the class $C_j$. These regions
comprise a region set $T_{jd}$. The images containing any
regions in the set $T_{jd}$ are subsequently retrieved from the
index structure. These images comprise an image set $I_d$.
