A Clustering Based Approach to Efficient Image Retrieval
Ruofei Zhang, Zhongfei (Mark) Zhang
Department of Computer Science
Thomas J. Watson School of Engineering and Applied Science
State University of New York at Binghamton, Binghamton, NY 13902
E-mails: {rzhang, zhongfei}@cs.binghamton.edu
Abstract
This paper addresses the issue of effective and
efficient content based image retrieval by presenting a
novel indexing and retrieval methodology that integrates
color, texture, and shape information, and applies these
features in regions obtained
through unsupervised segmentation, as opposed to
applying them to the whole image domain. In order to
address the typical color feature “inaccuracy” problem in
the literature, fuzzy logic is applied to the traditional color
histogram to alleviate the problem to a certain degree. The
similarity is defined through a balanced combination
between global and regional similarity measures
incorporating all the features. In order to further improve
the retrieval efficiency, a secondary clustering technique is
developed and employed to significantly save query
processing time without compromising the retrieval
precision. An implemented prototype system has
demonstrated a promising retrieval performance for a test
database containing 2000 general-purpose color images,
as compared with its peer systems in the literature.
1. Introduction
Content-based image retrieval (CBIR) concerns
automatic or semi-automatic retrieval of image data from
an imagery database based on semantic similarity between
the imagery content. The semantic similarity is typically
defined through a set of imagery features. These features
are extracted from shape, texture, or color properties
defined in the imagery domain. The relevance between a
query image and images in the database is ranked
according to the similarity measure computed from the
features. Due to its wide application potential, CBIR
research has received intensive attention over the last few
years.
In this paper we present a novel approach to addressing
the general-purpose CBIR problem. This approach
integrates semantics-intensive clustering-based
segmentation with fuzzy color histogram as well as texture
and shape features to index imagery data, and
consequently, is called FUZZYCLUB. A computationally
efficient distance metric is proposed in FUZZYCLUB to
reduce the query processing time. The response time is
further improved by imposing a secondary clustering
technique to achieve the high scalability in the case of very
large image databases.
The paper is organized as follows. We begin with a
brief review of the related work. Then we introduce the
unsupervised segmentation technique employed in
FUZZYCLUB, which is followed by the development of
all the indexing features defined in FUZZYCLUB, with a
focus on the fuzzy color histogram. The distance metric
and the overall similarity issues between two images are
subsequently discussed, followed by the introduction to the
secondary clustering technique in the region feature vector
space to improve retrieval efficiency. Finally, the retrieval
performance of FUZZYCLUB is evaluated with a
comparison with its two peer systems in the literature, and
the paper is concluded.
2. Related Work and Significance
A broad range of research efforts and commercial
products [1,2,3] has been reported to address the general-purpose
CBIR problem. Almost all of the approaches proposed are
based on indexing imagery in a feature space. Typical
features are color, texture, shape, region, and appearance
[4,6,7,9-16]. The most popularly used features are color
histogram and its variants. They are used in systems such
as IBM QBIC [7] and Berkeley Chabot [8]. Color
histogram is computationally efficient, and generally
insensitive to small changes in camera position. However,
a color histogram provides only a very coarse
characterization of an image, resulting in a very coarse
indexing; images with similar histograms may have
dramatically different semantics. The “coarseness” of the
color histogram approach is due to the total loss of spatial
information of pixels in images. To retain the spatial

information of a color histogram, many research efforts
have been made in the literature. Pass and Zabih [4] described a split
histogram called the color coherence vector (CCV). Each of
its buckets $j$ contains the pixels of a given color $j$, divided into two
classes based on the pixels' spatial coherence. The feature
is also extended by successive refinement, with buckets of
a CCV further subdivided based on additional features.
Huang et al [10] proposed color correlograms to integrate
color and spatial information. Given
$n$ inter-pixel distances, a correlogram is defined as a set of $n$
matrices $\gamma^{(k)}$, where an element $\gamma^{(k)}_{c_i, c_j}$ is the probability
that a pixel with color $c_i$ is at a distance $k$ away from a
pixel with color $c_j$. Rao et al [11] generalized the color
spatial distribution by computing the color histogram with
specific geometric relationships between pixels of each
color histogram bucket. Cinque et al [12] proposed another
color histogram refinement method, called Spatial-
Chromatic Histogram, in which the average position of
the pixels of each color and its standard deviation are
recorded to add the spatial information into the traditional
histogram. All of the refinement efforts failed to reflect the
fuzzy nature of the color features inherently exhibited in
the color histogram itself.
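To make the flavor of these refinements concrete, here is a hedged sketch of the CCV idea of Pass and Zabih [4] (not the authors' code): a pixel is "coherent" if it belongs to a sufficiently large 4-connected component of its own quantized color. The function name and the coherence threshold `tau` are illustrative choices:

```python
import numpy as np
from collections import deque

def color_coherence_vector(img, tau=4):
    """Sketch of a CCV. `img` is a 2-D array of quantized color indices;
    a pixel is coherent if its 4-connected same-color component has at
    least `tau` pixels. Returns {color: (coherent, incoherent)} counts."""
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    ccv = {}
    for y in range(h):
        for x in range(w):
            if seen[y, x]:
                continue
            color = img[y, x]
            # BFS over the 4-connected component of identical color.
            comp, queue = [], deque([(y, x)])
            seen[y, x] = True
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                    if 0 <= ny < h and 0 <= nx < w and not seen[ny, nx] \
                            and img[ny, nx] == color:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            coh, inc = ccv.get(color, (0, 0))
            if len(comp) >= tau:
                coh += len(comp)      # large component: coherent pixels
            else:
                inc += len(comp)      # small component: incoherent pixels
            ccv[color] = (coh, inc)
    return ccv
```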
Ravela and Manmatha [13] proposed an appearance
based image indexing technique using Gaussian derivative
filters at several scales to compute low order 2D
differential invariants as indexing features. Recently,
region based features are developed to address the partial
matching capability for robust CBIR. A region-based
retrieval system segments images into regions (objects),
and retrieves images based on the similarity between
regions. Typical region based CBIR systems include
Berkeley Blobworld[15], UCSB Netra[16], Columbia
VisualSEEK[9], and Stanford IRM[6], of which [9,15,16]
are the classic region based CBIR systems which require
significant user interaction in defining or selecting region
features, preventing a friendly interface to users,
especially non-professional users. Another problem in
the classic region based CBIR systems is that they focus
too much on the region-based similarity as opposed to the
similarity with a balanced focus between regions and
global images. Wang et al [6] proposed an integrated
regional matching scheme for CBIR, which allows for
matching a region in one image against several regions
from another image. As a result, the similarity between
two images is defined as the weighted sum of the distances
in a feature space between all regions from different
images. Compared with the classic region-based CBIR
systems, this scheme decreases the impact of inaccurate
region segmentation by smoothing over the “inaccuracy”
in distance. Nevertheless, the color representation of each
region is simplistic such that much of the rich color
information in a region is lost, as it fails to explicitly
express the “inaccuracy” of the color feature exhibited by
the fuzzy nature in the feature extraction and human
perception of color.
In designing FUZZYCLUB, we keep the
following three principles in mind. First, we intend to
apply pattern recognition techniques to connect low level
features to high level semantics. Consequently,
FUZZYCLUB is also a region-based methodology, as
opposed to indexing images in the whole image domain.
Second, we intend to address the color “inaccuracy” issue
typically existing in color based image retrieval in the
literature. With this consideration, we apply fuzzy logic to
the system. Third, we intend to improve the query
processing time to avoid the typical linear search problem
in the literature; this drives us to develop the secondary
clustering technique currently employed in FUZZYCLUB.
As a result, comparing with the existing techniques and
systems, FUZZYCLUB exhibits the following distinctive
advantages: (i) it solves the color “inaccuracy” problem
typically existing in color based CBIR systems to a certain
degree; (ii) it develops a balanced scheme in similarity
measure between regional and global matching in order to
capture as much semantic information as possible without
sacrificing the efficiency; (iii) it “pre-organizes” image
databases to further improve retrieval efficiency without
compromising retrieval effectiveness. The novelty of
FUZZYCLUB is its improvement of the existing
techniques and its incorporation and combination of these
techniques together in a single system.
3. Image Segmentation
The very first step of FUZZYCLUB is to segment an
image into different regions based on color and spatial
variation features using a modified version of the k-means
algorithm [17]. Due to its unsupervised learning nature,
we can adaptively update the number of regions in an
iterative process to accommodate the fact that the number
of regions in an image is unknown before the segmentation.
Image indexing is then performed based on the color, texture,
and shape features in each region, as well as the global,
overall combination of the regional features.
To segment an image into regions, FUZZYCLUB first
partitions an image into 4 by 4 blocks to compromise
between texture granularity and computation time. To
apply the k-means algorithm, a feature vector consisting of
six features from each block is defined as follows. Three
of the features are the average color components in a 4 by
4 block. The LAB color space is used due to its desired
property of the perceptual color difference proportional to
the numerical difference in the LAB space. These features
are denoted as
$\{C_1, C_2, C_3\}$.
The other three features are used to capture the texture
information of the image, represented by the energy in the

high frequency bands of the Haar wavelet transform [18],
i.e., the square roots of the second order moments of
wavelet coefficients in high frequency bands. To obtain
these moments, a Haar wavelet transform is applied to the
L component of the image. After a one-level wavelet
transform, a 4 by 4 block is decomposed into four
frequency bands; each band contains 2*2 coefficients.
Without loss of generality, suppose the coefficients in the
HL band are $\{c_{k,l},\, c_{k,l+1},\, c_{k+1,l},\, c_{k+1,l+1}\}$. Then the
texture feature of this block in the HL band is computed as:

$$f = \left(\frac{1}{4}\sum_{i=0}^{1}\sum_{j=0}^{1} c_{k+i,\,l+j}^{2}\right)^{1/2} \quad (1)$$
The other two features are computed similarly in the LH
and HH bands. These three features of the block are
denoted as
$\{T_1, T_2, T_3\}$. They can be used to discern
texture by showing variations in different directions.
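As a minimal sketch of the above (assuming a 4 by 4 block already converted to LAB, an orthonormal one-level Haar transform, and a hypothetical array layout the paper does not specify), the six block features $\{C_1, C_2, C_3, T_1, T_2, T_3\}$ can be computed as:

```python
import numpy as np

def block_features(lab_block):
    """Six-dimensional feature vector for a 4x4x3 LAB block (assumed layout)."""
    lab_block = np.asarray(lab_block, dtype=float)
    # Color features C1..C3: average L, A, B components of the block.
    color = lab_block.reshape(-1, 3).mean(axis=0)

    # One-level orthonormal Haar transform of the L component: each 2x2
    # cell (a b / c d) yields one coefficient per frequency band.
    L = lab_block[:, :, 0]
    a, b = L[0::2, 0::2], L[0::2, 1::2]
    c, d = L[1::2, 0::2], L[1::2, 1::2]
    HL = (a - b + c - d) / 2.0   # horizontal detail band (2x2 coefficients)
    LH = (a + b - c - d) / 2.0   # vertical detail band
    HH = (a - b - c + d) / 2.0   # diagonal detail band

    # Texture features T1..T3: Eq. (1), square root of the averaged
    # squared coefficients in each high-frequency band.
    texture = [np.sqrt((band ** 2).mean()) for band in (HL, LH, HH)]
    return np.concatenate([color, texture])
```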
After obtaining the feature vectors for all the blocks,
we perform the normalization on both color and texture
features to whiten them such that the effects of different
feature ranges are eliminated. Consequently the k-means
algorithm [17] is used to cluster the feature vectors into
several classes with every class in the feature space
corresponding to one spatial region in the image space.
Since clustering is performed in the feature space, blocks
in each cluster do not necessarily form a connected region
in the image. This way, we preserve the natural clustering
of objects in general-purpose images. The k-means
algorithm does not specify how many clusters to choose.
We adaptively select the number of clusters C by gradually
increasing C until a stop constraint is satisfied. The
average number of clusters for all images in the database
varies in accordance with the adjustment of the stop
constraint. In the k-means algorithm we use a color-texture
weighted $L_2$ distance metric as

$$d = w_c \sum_{i=1}^{3}\left(c_i^{1} - c_i^{2}\right)^{2} + w_t \sum_{i=4}^{6}\left(t_i^{1} - t_i^{2}\right)^{2} \quad (2)$$

to describe the distance between blocks. In our
implemented prototype system of FUZZYCLUB, we set
$w_c = 0.65$ and $w_t = 0.35$.
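The adaptive clustering step can be sketched as follows. The stop constraint and the deterministic farthest-point initialization are illustrative assumptions (the paper does not fix either); note that the weighted metric of Eq. (2) can be realized with plain Euclidean k-means by scaling the color dimensions by $\sqrt{w_c}$ and the texture dimensions by $\sqrt{w_t}$:

```python
import numpy as np

def kmeans(X, k, iters=50):
    # Farthest-point initialization (deterministic), then Lloyd iterations.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(-1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def segment_blocks(feats, wc=0.65, wt=0.35, max_c=8, threshold=0.5):
    """Cluster whitened 6-d block features, growing the number of clusters C
    until a stop constraint is met. The distortion threshold is a
    hypothetical stand-in for the paper's unspecified stop constraint."""
    X = np.asarray(feats, dtype=float).copy()
    X[:, :3] *= np.sqrt(wc)   # color dimensions, weighted per Eq. (2)
    X[:, 3:] *= np.sqrt(wt)   # texture dimensions
    for c in range(2, max_c + 1):
        labels, centers = kmeans(X, c)
        distortion = np.mean(((X - centers[labels]) ** 2).sum(axis=1))
        if distortion < threshold:
            break
    return labels
```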
After the segmentation, FUZZYCLUB is ready for
image indexing. In other words, image indexing is based
on the features defined in the regions obtained from the
image segmentation.
4. Region-Based Features
Within each region, we define three types of features:
color, texture, and shape, along with the conventional
geometric information as the feature vector for image
indexing.
4.1 Color Features
Color is the most popularly used type of features in
image indexing. On the other hand, due to its inherent
“inaccuracy” in describing the same semantic
content, caused by differences in color quantization and/or by the
uncertainty of human perception, it is important to capture
this “inaccuracy” when defining the features. We apply
fuzzy logic [22][5] to the traditional color histogram to
help capture this uncertainty in color indexing.
We assume that any color is a fuzzy set [20]. That
means we will associate with any color $c$ a fuzzy membership
function $\mu_c : \mathcal{U} \rightarrow [0,1]$, where $\mathcal{U}$ is the color
universe; for any color $c'$ of the color universe,
$\mu_c(c')$ is the resemblance degree of the color $c'$
to the color $c$. The fuzzy model we define should
follow the property that the resemblance degree decreases
as the inter-color distance increases. The natural choice,
according to the typical soft computing literature [20], is to
impose a smooth decay of the resemblance function when
the inter-color distance increases. Since the LAB color
space is of the equivalence between the perceptual inter-
color distance and the actual Euclidean distance between
the color space coordinates, we define a Gaussian operator
to be the fuzzy resemblance function:

$$\mu_c(c') = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{d^2(c, c')}{2\sigma^2}\right\} \quad (3)$$

where $d$ is the Euclidean distance between colors $c$ and $c'$
in LAB space, and $\sigma$ is the average distance between
colors, defined as

$$\sigma = \frac{2}{B(B-1)} \sum_{i=1}^{B-1} \sum_{k=i+1}^{B} d(c_i, c_k) \quad (4)$$
where B is the number of buckets in the color histogram.
This fuzzy color model enables us to enlarge the
influence of a given color to its neighboring colors,
according to the uncertainty principle and the perceptual
similarity. This means that each time a color $c'$ is found in
the image, it influences all the quantized colors
according to their resemblance to the color $c'$. Numerically,
this is expressed as:

$$h(c) = \sum_{c' \in \mathcal{U}} h_1(c')\, \mu_{c'}(c) \quad (5)$$

where $\mathcal{U}$ is the color universe in the image and $h_1(\cdot)$ is
the normalized, traditional color histogram. This fuzzy
histogram operation in fact is the linear convolution
between the traditional color histogram and the fuzzy color
model. This convolution expresses the histogram
smoothing, provided that the color model is indeed a

smoothing, low-pass filtering kernel. The use of Gaussian
function as the color model helps generate such a smooth
histogram, which subsequently helps reduce the
quantization errors [21].
In the implementation of the prototype system of
FUZZYCLUB, the LAB color space is quantized into 96
buckets by using uniform quantization (L by 6, A by 4, B
by 4). Then Eq. 5 is applied to obtain the fuzzy histogram
for each region. The resemblance $\mu_c(c')$ for each bucket in the histogram
is pre-computed based on Eqs. 3 and 4, and is
implemented as a lookup table to reduce the online
computation.
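Under this quantization, Eqs. (3)–(5) amount to one precomputed matrix and a matrix–vector product per region. A minimal sketch (the function names and the final renormalization are our assumptions; the paper leaves the latter implicit):

```python
import numpy as np

def resemblance_matrix(centers):
    """Precompute mu_{c_j}(c_i) over the B histogram bucket centers.
    `centers` is a (B, 3) array of LAB coordinates of the bucket centers."""
    diff = centers[:, None, :] - centers[None, :, :]
    d = np.sqrt((diff ** 2).sum(-1))              # pairwise LAB distances
    B = len(centers)
    # Eq. (4): average inter-bucket distance.
    iu = np.triu_indices(B, k=1)
    sigma = 2.0 * d[iu].sum() / (B * (B - 1))
    # Eq. (3): Gaussian resemblance between bucket colors.
    return np.exp(-d ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)

def fuzzy_histogram(h1, M):
    # Eq. (5): each conventional-histogram count spreads over similar colors.
    h = M @ h1
    return h / h.sum()   # renormalize (our assumption)
```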
4.2 Texture Features
The texture features are defined as the centroid vector
of all the three-component Haar wavelet moment vectors
defined in Section 3 for the blocks of a region.
4.3 Shape Features
The shape features are defined as a vector containing
three components for the normalized inertia [19] of order 1
to 3 of a region, respectively. For a region H in 2-
dimensional Euclidean space
2
(i.e., an image space),
the normalized inertia of order p is
2/1
),(:),(
2/22
)]([
])
ˆ
()
ˆ
[(
),(
p
Hyxyx
p
HV
yyxx
pHl
+
+
=
(6)
where V(H) is the number of pixels in the region H, and
)
ˆ
,
ˆ
( yx is the centroid of H. The minimum normalized
inertia is achieved by spheres. Denote the
th
p order
normalized inertia of spheres as
p
L
. The following
features are used to describe the shape of a region:
11
/)1,( LHlS = ,
22
/)2,( LHlS = ,
33
/)3,( LHlS =
(7)
Now the indexing vector of a region consists of the
fuzzy color histogram, the three texture feature
components, and the three shape feature components
defined in Eq. 7. In addition, the location of the region,
represented as the centroid coordinates of the region, and
the area of the region, represented as the total number of
pixels, are also captured as part of the indexing feature
vector, resulting in a complete feature vector for each
region.
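The shape features of Eqs. (6)–(7) can be sketched directly from pixel coordinates. Here the reference value $L_p$ is approximated numerically from a rasterized disc (the 2-D "sphere"), an implementation choice the paper does not spell out:

```python
import numpy as np

def normalized_inertia(coords, p):
    # Eq. (6): `coords` is an (n, 2) array of pixel coordinates of region H.
    coords = np.asarray(coords, dtype=float)
    centroid = coords.mean(axis=0)
    r2 = ((coords - centroid) ** 2).sum(axis=1)
    return (r2 ** (p / 2.0)).sum() / len(coords) ** (1.0 + p / 2.0)

def disc_inertia(p, radius=100):
    # L_p: normalized inertia of a rasterized disc, which minimizes l(H, p).
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    coords = np.argwhere(x ** 2 + y ** 2 <= radius ** 2)
    return normalized_inertia(coords, p)

def shape_features(coords):
    # Eq. (7): S_p = l(H, p) / L_p for p = 1, 2, 3.
    return np.array([normalized_inertia(coords, p) / disc_inertia(p)
                     for p in (1, 2, 3)])
```

A disc-shaped region yields features near 1, while elongated regions score higher, which is what makes the ratios useful as shape descriptors.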
5. Region Matching and Image Similarity
To compute the distance between two regions, we
apply the $L_2$ distance metric to the fuzzy color histogram, texture
vector, and shape vector, respectively. For the fuzzy
histogram, we define the distance metric as:

$$d_C^{pq} = \frac{\sum_{i=1}^{B}\left[h_p(i) - h_q(i)\right]^{2}}{B} \quad (8)$$

where $B$ is the number of buckets in the histogram, and
$h_p(i)$ and $h_q(i)$ are the fuzzy histograms for regions $p$
and $q$, respectively. Similarly, the $L_2$ distance metric is
applied to the texture vector and the shape vector,
respectively:

$$d_T^{pq} = \left\|T_p - T_q\right\| \quad (9)$$

$$d_S^{pq} = \left\|S_p - S_q\right\| \quad (10)$$

where $T_p$, $S_p$ are the texture feature vector and the shape
feature vector for region $p$, respectively, and $T_q$, $S_q$ are
those for region $q$.
To measure the distance between two regions, we
separate the contribution from the color and texture
features from that from the shape features, as the former is
considered more reliable in image indexing than the latter.
Consequently, given two regions $p$ and $q$, the inter-region
distance on color and texture is defined as

$$D_{pq} = \left(d_C^{pq}\right)^{2} + \left(d_T^{pq}\right)^{2} \quad (11)$$

The intra-region distance (i.e., the deviation) for region $p$
on color and texture is defined as

$$D_{pp} = \left[\frac{1}{N_p}\sum_{j=1}^{N_p} \left\|Z_p^j - \bar{Z}_p\right\|^{2}\right]^{1/2} \quad (12)$$

where $N_p$ is the number of blocks in region $p$, $Z_p^j$ is the
color-texture vector of block $j$ in region $p$ defined in
Section 3, and $\bar{Z}_p$ is the centroid in region $p$ of all
the color-texture vectors for all the blocks in this region.
Conceptually, the overall distance between two regions
should increase when the inter-region distance increases
and when the intra-region distance decreases. Hence, we
define the overall distance between two regions $p$ and $q$ as
follows:

$$\mathrm{DIST}_{pq} = w\,\frac{D_{pq}}{D_{pp} + D_{qq}} + (1-w)\, d_S^{pq} \quad (13)$$

where $w$ is a weight. In the prototype system of
FUZZYCLUB, we set $w$ to 0.7. Since all components are
normalized, this overall distance between two regions is
also normalized. The separation between the contribution
from the color and texture features and that from the shape
features allows us to adjust the weights for these different
contributions. The heavier weight given to the former in the
prototype system reflects our belief that the former is more

reliable than the latter, which is verified by the
experimental results.
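Putting Eqs. (8)–(13) together, the overall region distance can be sketched as below. The region-descriptor layout (a dict with `hist`, `texture`, `shape`, and per-block color-texture vectors under `blocks`) is purely illustrative, not the paper's data structure:

```python
import numpy as np

def region_distance(p, q, w=0.7):
    """Overall distance DIST_pq of Eq. (13) between two region descriptors."""
    B = len(p['hist'])
    d_c = ((p['hist'] - q['hist']) ** 2).sum() / B        # Eq. (8)
    d_t = np.linalg.norm(p['texture'] - q['texture'])     # Eq. (9)
    d_s = np.linalg.norm(p['shape'] - q['shape'])         # Eq. (10)
    D_pq = d_c ** 2 + d_t ** 2                            # Eq. (11)

    def intra(r):                                         # Eq. (12)
        Z = np.asarray(r['blocks'], dtype=float)
        return np.sqrt(((Z - Z.mean(axis=0)) ** 2).sum(axis=1).mean())

    # Eq. (13): inter-region distance normalized by the intra-region
    # deviations, combined with the shape distance.
    return w * D_pq / (intra(p) + intra(q)) + (1.0 - w) * d_s
```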
Given the definition of the distance between two
regions, we are ready to compute the global similarity
between two images. Suppose we have M regions in image
1 and N regions in image 2, the following algorithm
computes the global similarity between image 1 and image
2:
Step 1: compute the distance between one region in
image 1 and all regions in image 2. For each region i in
image 1, the distance between this region and image 2 is
defined as:
$$R_{i,\mathrm{Image2}} = \min_{j} \mathrm{DIST}(i, j) \quad (14)$$

where $j$ enumerates each region in image 2. This
definition captures the minimum distance between a region
and all the regions in an image, which maximizes the
potential similarity between the region and the image.

Step 2: similarly, the distance between a region $j$ in
image 2 and image 1 is defined as

$$R_{j,\mathrm{Image1}} = \min_{i} \mathrm{DIST}(i, j) \quad (15)$$

where $i$ enumerates each region in image 1.
Step 3: now we have $M+N$ distances. We define the
distance between two images (1 and 2) as follows:

$$\mathrm{ImageDist}(1,2) = \frac{\sum_{i=1}^{M} w_{1i}\, R_{i,\mathrm{Image2}} + \sum_{j=1}^{N} w_{2j}\, R_{j,\mathrm{Image1}}}{2} \quad (16)$$

where $w_{1i}$ is the weight for region $i$ in image 1, and $w_{2j}$
is the weight for region $j$ in image 2. Since we consider that a
region with a larger area plays a more significant role in
contributing to the overall similarity value between two
images than a region with a smaller area, we define
$w_{1i} = N_{1i}/N_1$, where $N_{1i}$ is the number of blocks in region
$i$ and $N_1$ is the total number of blocks in image 1, and
define $w_{2j}$ similarly for image 2.
This definition of the overall similarity between two
images captured by the overall distance between the
images is a balanced scheme in similarity measure
between regional and global matching. As compared with
many existing similarity measures in the literature, this
definition strives to incorporate as much semantic
information as possible and at the same time also achieves
a computational efficiency. Given this definition, for each
query image q, it is straightforward to compute
$\mathrm{ImageDist}(q, d)$ for every image $d$ in the database in
the retrieval.
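Steps 1–3 above can be sketched compactly. Here `dist` stands for the region distance $\mathrm{DIST}$ of Eq. (13), and the weights are the area fractions $N_{1i}/N_1$ and $N_{2j}/N_2$ defined above:

```python
import numpy as np

def image_distance(regions1, weights1, regions2, weights2, dist):
    """ImageDist of Eq. (16) between two segmented images."""
    # Pairwise region distances between the two images.
    D = np.array([[dist(r1, r2) for r2 in regions2] for r1 in regions1])
    R1 = D.min(axis=1)   # Eq. (14): best match in image 2 for each region i
    R2 = D.min(axis=0)   # Eq. (15): best match in image 1 for each region j
    # Eq. (16): area-weighted average of both directions.
    return (np.dot(weights1, R1) + np.dot(weights2, R2)) / 2.0
```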
6. Secondary Clustering and Image Retrieval
The time of image retrieval depends to a large degree
on the number of images in the database in almost all
CBIR systems. Many existing systems attempt to compare
the query image with every target image in the database
to find the top matching images, resulting in an essentially
linear search, which is prohibitive when the database is
large. We believe that it is not necessary to conduct a
whole database comparison. In fact, it is possible to exploit
a priori information regarding the “organization” of the
images in the database in the feature space before a query
is posed, such that when a query is received, only a part of
the database needs to be searched while a large portion of
the database may be eliminated in the search. This
certainly saves significant query processing time without
compromising the retrieval precision.
To achieve this goal, in FUZZYCLUB we add a pre-
retrieval screening phase to the feature space after a
database is indexed by applying a secondary k-means
algorithm to the distance $\mathrm{DIST}_{pq}$ in the region feature
vector space to cluster all the regions in the database into
classes. The philosophy is that regions with similar {color,
texture, shape} features should be grouped together in the
same class. This secondary clustering is performed offline,
and each region’s indexing data along with its associated
class ID is recorded in the index files. Consequently, in the
prototype implementation of FUZZYCLUB, the image
database is indexed in terms of a three level tree structure,
one for the region level, one for the class level, and one for
the image level.
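A sketch of this offline secondary clustering and the resulting screening is given below, in a simplified two-level form (class count, initialization, and data layout are illustrative assumptions, not the paper's implementation):

```python
import numpy as np
from collections import defaultdict

def build_secondary_index(region_vectors, region_to_image, k, iters=30):
    """Offline: cluster all region feature vectors in the database into k
    classes with k-means and record which images own a region of each class."""
    X = np.asarray(region_vectors, dtype=float)
    rng = np.random.default_rng(0)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    class_to_images = defaultdict(set)
    for r, lab in enumerate(labels):
        class_to_images[int(lab)].add(region_to_image[r])
    return centers, dict(class_to_images)

def candidate_images(query_vectors, centers, class_to_images):
    # Online screening: assign each query region to its nearest class
    # centroid, then take the union of images containing any region of
    # those classes; only these candidates need full similarity ranking.
    cands = set()
    for q in np.asarray(query_vectors, dtype=float):
        j = int(np.argmin(((centers - q) ** 2).sum(axis=-1)))
        cands |= class_to_images.get(j, set())
    return cands
```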
Assuming an image database is indexed based on the
features defined in Sections 4 and 5, and is “organized”
based on the secondary clustering, given a query image,
FUZZYCLUB processes the query as follows:
Step 1: Perform the query image segmentation to
obtain all the regions; say we have regions $Q_i,\ i \in [0, N-1]$,
in the query image.

Step 2: Compute the distances between each region
$Q_i$ and all class centroids in the database to determine
which class $Q_i$ belongs to by the minimum-distance-wins
principle. Assume that region $Q_i$ belongs to
class $C_j,\ j \in [0, K-1]$.

Step 3: Retrieve all the regions in the database which
belong to the class $C_j$. These regions
comprise a region set $T_{jd}$. The images containing any
regions in the set $T_{jd}$ are subsequently retrieved from the
index structure. These images comprise an image set $I_d$.
