SIMPLIcity: semantics-sensitive integrated matching for picture libraries
Summary
1 INTRODUCTION
- With the steady growth of computer power, the rapidly declining cost of storage, and ever-increasing access to the Internet, digital acquisition of information has become increasingly popular in recent years.
- The automatic derivation of semantically-meaningful information from the content of an image is the focus of interest for most research on image databases.
- The image "semantics," i.e., the meanings of an image, has several levels.
- Content-based image retrieval (CBIR) is the set of techniques for retrieving semantically-relevant images from an image database based on automatically-derived image features.
1.1.1 Histogram Search
- Histogram search algorithms [4], [18] characterize an image by its color distribution or histogram.
- Many distances have been used to define the similarity of two color histogram representations.
- Euclidean distance and its variations are the most commonly used [4].
- The drawback of a global histogram representation is that information about object location, shape, and texture [10] is discarded.
- Color histogram search is sensitive to intensity variation, color distortions, and cropping.
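The global color histogram described above, and its sensitivity to intensity variation, can be sketched in a few lines. This is a minimal illustration, not the indexing code of any particular system; the bin count of 8 per channel is an illustrative choice.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Quantize each RGB channel into `bins` levels and count pixel
    occurrences, yielding a normalized bins**3 global histogram."""
    # image: H x W x 3 uint8 array
    quantized = (image.astype(np.uint32) * bins) // 256      # per-channel bin index
    codes = (quantized[..., 0] * bins + quantized[..., 1]) * bins + quantized[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(np.float64)
    return hist / hist.sum()                                 # normalize to a distribution

def histogram_distance(h1, h2):
    """Euclidean distance between two normalized histograms,
    the most commonly used choice per [4]."""
    return float(np.linalg.norm(h1 - h2))
```

Because the histogram discards all spatial information, two very different images can share a histogram, while a uniform brightness shift of the same image moves mass across bins and produces a nonzero distance.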
1.1.2 Color Layout Search
- The "color layout" approach attempts to overcome the drawback of histogram search.
- In simple color layout indexing [4], images are partitioned into blocks and the average color of each block is stored.
- Thus, the color layout is essentially a low resolution representation of the original image.
- As with pixel representation, although information such as shape is preserved in the color layout representation, the retrieval system cannot perceive it directly.
- This system is also limited to intensity-level image representations.
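Simple color layout indexing, as described above, amounts to storing a low-resolution thumbnail of block-average colors. A minimal sketch, with the 4x4 grid size as an illustrative assumption:

```python
import numpy as np

def color_layout(image, grid=4):
    """Partition the image into a grid x grid array of blocks and
    store each block's mean color: a low-resolution representation
    of the original image."""
    h, w, c = image.shape
    bh, bw = h // grid, w // grid
    layout = np.empty((grid, grid, c))
    for i in range(grid):
        for j in range(grid):
            block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            layout[i, j] = block.reshape(-1, c).mean(axis=0)
    return layout

def layout_distance(a, b):
    """Block-wise Euclidean distance between two color layouts."""
    return float(np.sqrt(((a - b) ** 2).sum()))
```

Comparing layouts block by block preserves coarse spatial information, but the fixed grid makes the representation sensitive to shifting and scaling, which is the weakness systems like WALRUS [14] try to address.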
1.1.3 Region-Based Search
- Region-based retrieval systems attempt to overcome the deficiencies of color layout search by representing images at the object-level.
- The motivation is to shift part of the comparison task to the users.
- The user's semantic understanding of an image is at a higher level than the region representation.
- On the other hand, because of the great difficulty of achieving accurate segmentation, the systems in [11], [2] often partition one object into several regions, none of which is representative of the object, especially for images without distinctive objects and scenes.
- Region strings are converted to composite region template (CRT) descriptor matrices that provide the relative ordering of symbols.
1.3 Overview of the SIMPLIcity System
- CBIR is a complex and challenging problem spanning diverse disciplines, including computer vision, color perception, image processing, image classification, statistical clustering, psychology, human-computer interaction (HCI), and specific application domain dependent criteria.
- While the authors do not claim to solve all the problems related to CBIR, they have made some advances toward the final goal of close-to-human-level automatic image understanding and retrieval performance.
- The authors discuss issues related to the design and implementation of a semantics-sensitive CBIR system for picture libraries.
- An experimental system, the SIMPLIcity (Semantics-sensitive Integrated Matching for Picture LIbraries) system, has been developed to validate the methods.
- The authors summarize the main contributions as follows.
1.3.1 Semantics-Sensitive Image Retrieval
- The capability of existing CBIR systems is limited in large part by fixing a set of features used for retrieval.
- The authors propose a semantics-sensitive approach to the problem of searching general-purpose image databases.
- Semantic classification methods are used to categorize images so that semantically-adaptive searching methods applicable to each category can be applied.
- Automatic classification methods can be used to categorize a general-purpose picture library into semantic classes including "graph," "photograph," "textured," "nontextured," "benign," "objectionable," "indoor," "outdoor," "city," "landscape," "with people," and "without people."
- Automatic derivation of optimal features is a challenging and important issue in its own right.
1.3.2 Image Classification
- For the purpose of searching picture libraries such as those on the Web or in a patient digital library, the authors initially focus on techniques to classify images into the classes "textured" versus "nontextured" and "graph" versus "photograph."
- Several other classification methods have been previously developed elsewhere, including "city" versus "landscape" [26] and "with people" versus "without people" [1].
- The authors report on several classification methods they have developed and their performance.
1.3.3 Integrated Region Matching (IRM) Similarity Measure
- Besides using semantics classification, another strategy of SIMPLIcity to better capture the image semantics is to define a robust region-based similarity measure, the Integrated Region Matching (IRM) metric.
- Image segmentation is an extremely difficult process and is still an open problem in computer vision.
- Traditionally, region-based matching is performed on individual regions [2], [11].
- The IRM metric the authors have developed has the following major advantage: compared with retrieval based on individual regions, the overall "soft similarity" approach in IRM reduces the adverse effect of inaccurate segmentation, an important property lacking in previous systems.
- In many cases, knowing that one object usually appears with another helps to clarify the semantics of a particular region.
1.4 Outline of the Paper
- The remainder of the paper is organized as follows:
- The semantics-sensitive architecture is further introduced in Section 2.
- The image segmentation algorithm is described in Section 3.
- Classification methods are presented in Section 4.
- In Section 6, experiments and results are described.
2 SEMANTICS-SENSITIVE ARCHITECTURE
- The architecture of the SIMPLIcity retrieval system is presented in Fig. 1.
- During indexing, the system partitions an image into 4 × 4 pixel blocks and extracts a feature vector for each block.
- A statistical clustering [8] algorithm is then used to quickly segment the image into regions.
- For an image in the database, its semantic type is first checked and then its signature is extracted from the corresponding database.
- Once the signature of the query image is obtained, similarity scores between the query image and images in the database with the same semantic type are computed and sorted to provide the list of images that appear to have the closest semantics.
3 THE IMAGE SEGMENTATION METHOD
- The authors describe the image segmentation procedure based on the k-means algorithm [8] using color and spatial variation features.
- A low D(k) indicates high purity in the clustering process.
- Clustering stops when the first derivative of distortion with respect to k, D(k) − D(k − 1), is below a threshold in comparison with the average derivative at k = 2, 3; a low D(k) − D(k − 1) indicates convergence of the clustering process.
- After a one-level wavelet transform, a 4 × 4 block is decomposed into four frequency bands, as shown in Fig.
- An image with vertical strips thus has high energy in the HL band and low energy in the LH band.
4 THE IMAGE CLASSIFICATION METHODS
- The image classification methods described in this section have been developed mainly for searching picture libraries such as Web images.
- The authors are initially interested in classifying images into the classes textured versus nontextured, graph versus photograph, and objectionable versus benign.
- Karu et al. provided an overview of texture-related research [10].
- Other classification methods such as city versus landscape [26] and with people versus without people [1] were developed elsewhere.
4.1 Textured versus Nontextured Classification
- The authors describe the algorithm to classify images into the semantic classes textured or nontextured.
- Fig. 4 shows some sample textured images.
- The classification of an image as textured or nontextured is performed by thresholding the average χ² statistic over all m regions in the image, χ̄² = (1/m) Σ_{i=1..m} χ²_i.
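One plausible reading of this test, sketched below: since regions in textured images scatter across the whole image while regions in nontextured images clump, a χ² goodness-of-fit statistic of each region's block positions against a uniform spatial spread distinguishes the two. The grid size and the threshold value are illustrative assumptions, not the paper's tuned parameters.

```python
import numpy as np

def region_scatter_chi2(coords, shape, grid=4):
    """Chi-square goodness of fit of one region's block positions
    (rows of `coords`) against a uniform spread over a grid x grid
    partition of an image of size `shape`. Low values mean the
    region scatters over the image (texture-like); high values mean
    it clumps (object-like)."""
    rows = np.minimum(coords[:, 0] * grid // shape[0], grid - 1)
    cols = np.minimum(coords[:, 1] * grid // shape[1], grid - 1)
    counts = np.bincount(rows * grid + cols, minlength=grid * grid)
    expected = len(coords) / (grid * grid)
    return float(((counts - expected) ** 2 / expected).sum())

def is_textured(regions, shape, threshold=50.0):
    """Threshold the average statistic over all m regions:
    chi2_bar = (1/m) * sum_i chi2_i.  Threshold is illustrative."""
    chi2_bar = np.mean([region_scatter_chi2(r, shape) for r in regions])
    return chi2_bar < threshold
```

A region whose blocks fall uniformly over the image yields a small χ², so a low average marks the image as textured.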
4.2 Graph versus Photograph Classification
- An image is a photograph if it is a continuous-tone image.
- The authors have developed a graph-photograph classification method.
- The classifier partitions an image into blocks and classifies every block into either of the two classes.
- If the percentage of blocks classified as photograph is higher than a threshold, the image is marked as photograph; otherwise, graph.
- The authors achieved 100 percent sensitivity for photographic images and higher than 95 percent specificity.
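The final image-level decision described above reduces to a simple vote over the block-level classifications. A minimal sketch, with the 50 percent threshold as an illustrative assumption (the paper's block classifier itself, based on wavelet-coefficient statistics, is not reproduced here):

```python
def classify_image(block_is_photo, threshold=0.5):
    """Given the per-block classifier outputs (True = photograph),
    mark the image 'photograph' if the fraction of photographic
    blocks exceeds `threshold`; otherwise mark it 'graph'."""
    frac = sum(block_is_photo) / len(block_is_photo)
    return "photograph" if frac > threshold else "graph"
```

Raising the threshold trades sensitivity for specificity: more blocks must look photographic before the image is accepted as a photograph.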
5 THE IRM SIMILARITY MEASURE
- The integrated region matching (IRM) measure of image similarity is described.
- An advantage of the overall similarity measure is its robustness against poor segmentation (Fig. 6), an important property lacking in previous work [2], [11].
- Every point in the space corresponds to the feature vector or the descriptor of a region.
- Unlike vector distances such as the Euclidean distance, it is not obvious how to define a distance between two sets of feature points.
- The distance should be sufficiently consistent with a person's concept of semantic "closeness" of two images.
5.1 Integrated Region Matching (IRM)
- Every match between images is characterized by links between regions and their significance credits.
- If a graph represents an admissible matching, the distance between the two region sets is the sum of all the weighted edge lengths, i.e., d(R1, R2) = Σ_{i,j} s_{i,j} d_{i,j} (Eq. 4).
- The SIMPLIcity system uses the area percentage scheme.
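The weighted matching above can be sketched with a greedy credit-assignment scheme: link the closest region pair first, give it the largest admissible significance credit, and repeat until all credit (here, region area percentage) is spent. This is a sketch of the IRM idea under that greedy assumption, not a verbatim reproduction of the paper's algorithm.

```python
import numpy as np

def irm_distance(dist, s1, s2):
    """Integrated Region Matching sketch.  `dist` is an n1 x n2 matrix
    of region-to-region distances; s1, s2 are the regions' area
    percentages (each summing to 1).  Greedily link the closest pair
    first with credit min(remaining_i, remaining_j); the result is
    d(R1, R2) = sum_ij s_ij * d_ij, as in Eq. 4."""
    s1, s2 = np.array(s1, float), np.array(s2, float)
    S = np.zeros_like(dist, dtype=float)          # significance matrix
    for flat in np.argsort(dist, axis=None):      # pairs, closest first
        i, j = divmod(int(flat), dist.shape[1])
        credit = min(s1[i], s2[j])
        if credit > 0:
            S[i, j] = credit
            s1[i] -= credit
            s2[j] -= credit
        if s1.sum() <= 1e-12:                     # all credit spent
            break
    return float((S * dist).sum())
```

Because every region contributes in proportion to its area rather than through a single best match, a region split in two by poor segmentation still matches its counterpart softly instead of being dropped.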
5.2 Distance between Regions
- Now, the authors discuss the definition of distance between a region pair, d(r, r′).
- Denote the γth order normalized inertia of spheres as L_γ.
- If two regions match very well in shape, their color and texture distance is attenuated by a smaller weight to provide the final distance.
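The shape-attenuation idea above can be sketched as follows. The constants and the function shape are hypothetical placeholders for illustration, not the paper's actual weighting:

```python
def region_distance(d_colortexture, d_shape, good=0.2, w_good=0.5):
    """Sketch of shape-conditioned region distance: when two regions
    match very well in shape (shape distance below `good`), attenuate
    the color/texture distance by the smaller weight `w_good`.  Both
    constants are illustrative assumptions."""
    weight = w_good if d_shape < good else 1.0
    return weight * d_colortexture
```

The effect is that region pairs agreeing in shape are pulled closer together, so shape evidence reinforces, rather than replaces, the color and texture comparison.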
5.3 Characteristics of IRM
- To study the characteristics of the IRM distance, the authors performed 100 random queries on their COREL photograph data set.
- Based on the 5.6 million IRM distances obtained, the authors estimated the distribution of the IRM distance.
- The authors may notify the user that two images are considered to be very close when the IRM distance between the two images is less than 15.
- Likewise, the authors may advise the user that two images are considerably different when the IRM distance between the two images is greater than 50.
6 EXPERIMENTS
- The SIMPLIcity system has been implemented with a general-purpose image database including about 200,000 pictures, which are stored in JPEG format with size 384 × 256 or 256 × 384.
- Two classification methods, graph-photograph and textured-nontextured, have been used in their experiments.
- WBIIS had been compared with the original IBM QBIC system and found to perform better [28].
- It is difficult to design a fair comparison with existing region-based searching algorithms such as the Blobworld system and the NeTra system, which depend on additional information provided by the user during the query process.
- A list of online image retrieval demonstration Web sites can be found on their site.
6.1 Accuracy
- The authors evaluated the accuracy of the system in two ways.
- First, the authors used a 200,000-image COREL database to compare with existing systems such as EMD-based color histogram and WBIIS.
- Then, the authors designed systematic evaluation methods to judge the performance statistically.
- The SIMPLIcity system has demonstrated much improved accuracy over the other systems.
6.2 Query Comparison
- The authors compare the SIMPLIcity system with the WBIIS (Wavelet-Based Image Indexing and Searching) system [28] with the same image database.
- Due to space limitations, the authors show only two rows of images, containing the top 11 matches to each query.
- The authors chose the numbers "11" and "29" before viewing the results.
- In each query, the authors decide the relevance to the query image before viewing the query results.
- To view the images better or to see more matched images, users can visit the demonstration Web site and use the query image ID to repeat the retrieval.
6.3.1 Performance on Image Queries
- To provide numerical results, the authors tested 27 sample images chosen randomly from nine categories, three from each category.
- The categories of images tested are listed in Table 1a.
- Images in the "sports and public events" class contain people in a game or public event, such as a festival.
- On average, the precision and the weighted precision of SIMPLIcity are higher than those of WBIIS by 0.227 and 0.273, respectively.
6.3.2 Performance on Image Categorization
- The SIMPLIcity system was also evaluated based on a subset of the COREL database, formed by 10 image categories (shown in Table 1b), each containing 100 pictures.
- The recall within the first 100 retrieved images is identical to the precision in this special case.
- The authors used LUV color space and a matching metric similar to the EMD described in [18] to extract color histogram features and match in the categorized image database.
- The authors call the one with less filled color bins the Color Histogram 1 system and the other the Color Histogram 2 system.
- For this reason, the authors cannot evaluate this system using the COREL database of 200,000 images and the 27 sample query images described in the previous section.
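The equivalence of recall and precision noted above follows directly from the setup: with exactly 100 relevant images per category and the first 100 retrievals examined, both measures divide the same hit count by 100. A small sketch of the computation (function name and signature are illustrative):

```python
def precision_recall_at_k(retrieved_labels, query_label, k, n_relevant):
    """Precision@k = hits/k and recall@k = hits/n_relevant.  When
    k == n_relevant (here, 100 images per category), the two are
    numerically identical."""
    hits = sum(1 for lab in retrieved_labels[:k] if lab == query_label)
    return hits / k, hits / n_relevant
```

This is why the categorized-subset evaluation can report a single number per query instead of a full precision-recall curve.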
6.4.1 Speed
- The algorithm has been implemented on a Pentium III 450MHz PC using the Linux operating system.
- On average, one second is needed to segment an image and to compute the features of all regions.
- The speed is much faster than other region-based methods.
- Fast indexing has provided us with the capability of handling external queries and sketch queries in real time.
- If the query image is not already in the database, one extra second of CPU time is spent to extract the feature from the query image.
Frequently Asked Questions (19)
Q2. What future works have the authors mentioned in the paper "Simplicity: semantics-sensitive integrated matching for picture libraries" ?
The authors are planning to build a sharable testbed for statistical evaluation of different CBIR systems.
Q3. What is the focus of research on image databases?
The automatic derivation of semantically-meaningful information from the content of an image is the focus of interest for most research on image databases.
Q4. What is the algorithm used to classify graph images?
The algorithm the authors used to classify image blocks is based on a probability density analysis of wavelet coefficients in high frequency bands.
Q5. How fast is the query image retrieval?
When the query image is in the database, it takes about 1.5 seconds of CPU time on average to sort all the images in the 200,000-image database using the IRM similarity measure.
Q6. What is the main task of designing a signature?
The main task of designing a signature is to bridge the gap between image semantics and the pixel representation, that is, to create a better correlation with image semantics.
Q7. What is the purpose of region-based retrieval systems?
Region-based retrieval systems attempt to overcome the deficiencies of color layout search by representing images at the object-level.
Q8. How long does it take to compute the feature vectors for the color images?
To compute the feature vectors for the 200,000 color images of size 384 × 256 in their general-purpose image database requires approximately 60 hours.
Q9. What are the three categories of CBIR systems?
Existing general-purpose CBIR systems roughly fall into three categories depending on the approach to extract signatures: histogram, color layout, and region-based search.
Q10. What is the recent approach to reduce the shifting and scaling sensitivity for color layout search?
The approach taken by the recent WALRUS system [14] to reduce the shifting and scaling sensitivity for color layout search is to exhaustively reproduce many subimages based on an original image.
Q11. How many features can be obtained by uniformly quantizing features?
If texture and shape features are also used to distinguish patterns, the number of patterns in the library will increase dramatically, roughly exponentially in the number of features if patterns are obtained by uniformly quantizing features.
Q12. How many images are stored in the SIMPLIcity system?
The SIMPLIcity system has been implemented with a general-purpose image database including about 200,000 pictures, which are stored in JPEG format with size 384 × 256 or 256 × 384.
Q13. What is the way to extract color histogram features from the categorized image database?
The authors used LUV color space and a matching metric similar to the EMD described in [18] to extract color histogram features and match in the categorized image database.
Q14. Why does the system depend on the pattern library?
Because the definition of the CRT descriptor matrix relies on the pattern library, the system performance depends critically on the library.
Q15. How much time is spent to extract the feature from the query image?
If the query image is not already in the database, one extra second of CPU time is spent to extract the feature from the query image.
Q16. How do the authors notify the user that two images are considered very close?
The authors may notify the user that two images are considered to be very close when the IRM distance between the two images is less than 15.
Q17. How do the authors use the image ID to view the images better?
To view the images better or to see more matched images, users can visit the demonstration Web site and use the query image ID to repeat the retrieval.
Q18. What is the way to determine whether an image is textured?
As shown by the segmentation results in Fig. 3, regions in textured images tend to scatter in the entire image, whereas nontextured images are usually partitioned into clumped regions.
Q19. How fast is the application of SIMPLIcity to a database of general-purpose images?
The application of SIMPLIcity to a database of about 200,000 general-purpose images shows more accurate and much faster retrieval compared with the existing algorithms.