Content-based Representation and Retrieval of Visual Media: A State-of-the-Art Review
(To appear in Multimedia Tools and Applications, special issue on Representation and Retrieval of Visual Media.)
Philippe Aigrain
Institut de Recherche en Informatique de Toulouse, Universite Paul Sabatier
118, route de Narbonne, F-31062 Toulouse Cedex, France
HongJiang Zhang
Broadband Information Systems Lab, Hewlett-Packard Labs
1501 Page Mill Road, Palo Alto, CA 94304, USA
Dragutin Petkovic
IBM Almaden Research Center
San Jose, CA 95120-6099, USA
Abstract
This paper reviews a number of recently available techniques in content analysis of visual media and their application to the indexing, retrieval, abstracting, relevance assessment, interactive perception, annotation and re-use of visual documents.
1. Background
A few years ago, the problems of representation and retrieval of visual media were confined to specialized image databases (geographical, medical, pilot experiments in computerized slide libraries), to the professional applications of the audiovisual industries (production, broadcasting and archives), and to computerized training or education. The present development of multimedia technology and information highways has put content processing of visual media at the core of key application domains: digital and interactive video, large distributed digital libraries, and multimedia publishing. Though the most important investments have been targeted at the information infrastructure (networks, servers, coding and compression, delivery models, multimedia systems architecture), a growing number of researchers have realized that content processing will be a key asset in putting together successful applications. The need for content processing techniques has been made evident from a variety of angles, ranging from achieving better quality in compression, allowing user choice of programs in video-on-demand, and achieving better productivity in video production, to providing access to large still image databases and integrating still images and video in multimedia publishing and cooperative work.
Content-based retrieval of visual media and representation of visual documents in human-computer interfaces are based on the availability of content representation data (time-structure for time-based media, image signatures, object and motion data). When it is possible, the human production of this descriptive data is so time-consuming, and thus costly, that it is almost impossible to generate it for large document spaces. There is some hope that for video documents some of this data will be created at production time and coded in the document itself. Nonetheless, it will never be available for many existing documents, and when considering the history of media and carriers one is led to a very cautious estimate of how often this type of information will really be available even in future documents. Thus, there is a clear need for automatic analysis tools which are able to extract representation data from the documents.
The researchers involved in content processing efforts come from various backgrounds, for instance:
- the publishing, entertainment, retail or document industry, where researchers try to extend their activity to visual documents, or to integrate them in hypertext-based new document types,
- the AV hardware and software industry, primarily interested in digital editing tools and other programme production tools,
- academic laboratories where research had been conducted for some time on computer analysis and access to existing visual media, such as the MIT Media Laboratory, the Institute of Systems Science in Singapore, or IRIT in France,
- large telecommunication company laboratories, where researchers are primarily interested in cooperative work and remote access to visual media,
- the robotics vision, signal processing, image sequence processing for security, or data compression research communities, who try to find new applications for their models of images or human perception,
- computer hardware manufacturers developing digital library or visual media research programs.
These researchers originally used very different models and techniques and often conflicting vocabulary. After a few years of lively confusion and exciting achievements, it is now possible to draw a clearer panorama of the state of this emerging field, and to outline some of its possible directions of development.
In this paper, we review the methods available for different types of visual content analysis and representation, together with their applications, and survey some open research problems. Section 2 covers various visual features for representing and comparing image content. Section 3 reviews video content parsing and representation algorithms and schemes, including temporal segmentation, video abstraction, shot comparison and soundtrack analysis. Section 4 presents applications of visual representation schemes in content-based image and video retrieval and browsing. Finally, Section 5 summarizes our survey and current research directions.
2. The many facets of image similarity
Retrieval of still images by similarity, i.e. retrieving images which are similar to an already retrieved image (retrieval by example) or to a model or schema, is a relatively old idea. Some might date it to the mnemotechnical ideas of antiquity, but more seriously it appeared in specialized geographical information system databases around 1980, in particular in the Query by Pictorial Example system of IMAID [CF80]. From the start, it was clear that retrieval by similarity called for specific definitions of what it means to be similar. In the mapping system, a satellite image was matched to existing map images from the point of view of similarity of road and river networks, easily extracted from images by edge detection. Apart from paper models [Aig87], it was only in the beginning of the 90s that researchers started to look at retrieval by similarity in large sets of heterogeneous images with no specific model of their semantic contents. The prototype systems of Kato [Kat92], followed by the availability of the QBIC commercial system using several types of similarities [FSN+95], contributed to making this idea more and more popular.
A system for retrieval by similarity rests on three components (a minimal sketch of such a pipeline follows this list):
- extraction of features or image signatures from the images, and an efficient representation and storage strategy for this precomputed data,
- a set of similarity measures, each of which captures some perceptually meaningful definition of similarity, and which should be efficiently computable when matching an example with the whole database,
- a user interface for the choice of which definition(s) of similarity should be applied for retrieval, for the ordered and visually efficient presentation of retrieved images, and for supporting relevance feedback.
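As a concrete illustration of these three components, the following is a minimal Python/NumPy sketch of a retrieval-by-example pipeline; the coarse RGB-histogram signature and L1 measure are placeholder choices for illustration only, not the design of any particular system discussed in this paper.

# Minimal sketch of a retrieval-by-example pipeline (illustrative only).
# Assumes images are NumPy uint8 arrays of shape (H, W, 3) with values 0-255.
import numpy as np

def extract_signature(image):
    # Placeholder signature: a coarse, normalized RGB histogram.
    hist, _ = np.histogramdd(image.reshape(-1, 3),
                             bins=(8, 8, 8), range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def dissimilarity(sig_a, sig_b):
    # L1 distance between signatures; any perceptually grounded
    # measure could be substituted here.
    return np.abs(sig_a - sig_b).sum()

def build_index(images):
    # Precompute and store signatures for the whole collection.
    return [extract_signature(img) for img in images]

def query_by_example(example, index, k=10):
    # Rank the collection by dissimilarity to the example image.
    q = extract_signature(example)
    scores = [(dissimilarity(q, sig), i) for i, sig in enumerate(index)]
    return sorted(scores)[:k]
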
Recent work has made it evident that:
- A large number of meaningful types of similarity can and must be defined. Only some of these definitions are associated with efficient feature extraction mechanisms and (dis)similarity measures.
- Since there are many definitions of similarity and the discriminating power of each of the measures is likely to degrade significantly for large image databases, the user interface and the feature storage strategy components of the systems will play a more and more essential role. We will come back to this point in Section 4.1.
- Visual content-based retrieval is best utilized when combined with traditional search, both at the user interface and the system level. The basic reason for this is that we do not see content-based retrieval replacing the ability of parametric (SQL) search, text and keywords to represent the rich semantic content of the visual material (names, places, actions, prices, etc.). The key is to apply content-based retrieval where appropriate, that is, where the use of text and keywords is suboptimal. Examples of such applications are those where visual appearance (e.g. color, texture, shape, motion) is an important search argument, as in stock photo/video, art, retail, on-line shopping, etc. Not only does content-based retrieval reduce the high variability among human indexers, it also enables more "fuzzy" browsing and search, which in many applications is an essential part of the process. It is obvious then that content-based retrieval involves strong user interaction, thus necessitating the development of special fast browsers and UI techniques (a minimal sketch of combining a keyword filter with similarity ranking follows this list).
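The filter-then-rank combination argued for above can be sketched as follows. This sketch assumes the hypothetical extract_signature and dissimilarity helpers from the previous example, plus per-image keyword sets; it is only one of many possible ways of coupling parametric and visual search.

# Sketch of combining traditional (keyword/parametric) search with
# content-based ranking: filter by metadata first, then rank the
# surviving candidates by visual dissimilarity to the example image.
def hybrid_search(example, index, metadata, keywords, k=10):
    # index: precomputed signatures; metadata: list of keyword sets,
    # aligned with the index; keywords: a set of required keywords.
    candidates = [i for i, kw in enumerate(metadata) if keywords <= kw]
    q = extract_signature(example)
    ranked = sorted((dissimilarity(q, index[i]), i) for i in candidates)
    return ranked[:k]
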
In this section, we briefly survey the various types of similarity definitions and associated feature extraction mechanisms and measures for systems which do not assume any specific image domain or a priori semantic knowledge about the images.
Gudivada has listed possible types of similarity for retrieval in [Gudivada95]: color similarity, texture similarity, shape similarity, spatial similarity, etc. Some of these types can be considered over all of an image or only part of it, and can be considered independently of scale or angle or not, depending on whether one is interested in the scene represented by the image or in the image per se.
2.1 Color similarity
Color distribution similarity has been one of the first choices [HK92, FSN+95] because, if one chooses a proper representation and measure, it can be partially reliable even in the presence of changes in lighting, view angle, and scale. For the capture of properties of the global color distribution in images, the need for a perceptually meaningful color model leads to the choice of HLS (Hue-Luminosity-Saturation) models, and of measures based on the first three moments of color distributions [SO94] in preference to histogram distances. It has been proposed in [AJL95] to use hue and saturation distributions only when one wants to capture lighting-independent color distribution properties, which are good signatures of a scene when the scale does not change too much. In this case one can identify the hue-saturation perceptual space with the complex unit disc and define measures using statistical moments in this space. This is useful to avoid the biases of measures which do not take into account the circular nature of hue, and could be further refined to distinguish between true spectral hues and the purples. Stricker and Orengo have argued in [SO94] for the importance of including the third moment (distribution skewness) in the definition of the similarity measure.
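A minimal sketch of a color-moment signature computed in a hue/saturation space, written in the spirit of [SO94] and [AJL95] but not reproducing either paper's exact formulation; the embedding of (hue, saturation) in the complex unit disc follows the idea described above, and matplotlib is assumed to be available for the RGB-to-HSV conversion.

# Sketch of color-moment signatures in a hue/saturation space (illustrative).
import numpy as np
from matplotlib.colors import rgb_to_hsv  # assumes matplotlib is installed

def color_signature(image_rgb):
    # image_rgb: float array in [0, 1], shape (H, W, 3).
    hsv = rgb_to_hsv(image_rgb).reshape(-1, 3)
    h, s, v = hsv[:, 0], hsv[:, 1], hsv[:, 2]

    # Embed (hue, saturation) in the complex unit disc so that the
    # circular nature of hue is respected (cf. [AJL95]).
    z = s * np.exp(2j * np.pi * h)

    def first_three_moments(x):
        mean = x.mean()
        centered = x - mean
        return mean, (centered ** 2).mean(), (centered ** 3).mean()

    # Moments of the disc embedding (lighting-independent part) plus
    # moments of the value channel, following the three-moment idea of [SO94].
    return (first_three_moments(z.real) + first_three_moments(z.imag)
            + first_three_moments(v))
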
One important difficulty with color similarity is that when using it for retrieval, a user will often be looking for an image "with a red object such as this one". This problem of restricting color similarity to a spatial component, and more generally of combining spatial similarity and color similarity, is also present for texture similarity. It explains why prototype and commercial systems have included complex ad hoc mechanisms in their user interfaces to combine various similarity functions.
2.2 Texture similarity
For texture as for color, it is essential to define a well-founded perceptual space. Picard and Liu [PL94] have shown that it is possible to do so using the Wold decomposition of the texture considered as a luminance field. One gets three components (periodic, evanescent and random) corresponding to the bi-dimensional periodicity, mono-dimensional orientation, and complexity of the analyzed texture. Experiments have shown that these independent components agree well with the perceptual evaluation of texture similarity [TMY79]. The related similarity measures have led to remarkably efficient results, including for the retrieval of large-scale textures such as images of buildings and cars [PM95]. In the QBIC system, the Tamura texture features (coarseness, contrast and directionality) are used [FSN+95]. But of course one is again confronted with the problem of combining texture information with the spatial organization of several textures (see below).
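For illustration only, the following sketch computes two crude proxies for texture attributes mentioned above (a contrast statistic and a gradient-based directionality statistic); these are not the Wold or Tamura formulations, merely simple statistics one might start from.

# Rough, illustrative texture statistics; NOT the Tamura or Wold models.
import numpy as np

def texture_sketch(gray):
    # gray: 2-D float array (luminance field).
    contrast = gray.std() / (gray.mean() + 1e-8)

    # Gradient-orientation histogram as a crude directionality signature.
    gy, gx = np.gradient(gray)
    angles = np.arctan2(gy, gx)
    magnitude = np.hypot(gx, gy)
    hist, _ = np.histogram(angles, bins=16, range=(-np.pi, np.pi),
                           weights=magnitude)
    hist = hist / (hist.sum() + 1e-8)
    # A peaked histogram suggests a strongly oriented texture.
    directionality = hist.max()
    return contrast, directionality, hist
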
2.3 Shape similarity
A proper definition of shape similarity calls for the distinction between shape similarity in images (similarity between actual geometrical shapes appearing in the images) and shape similarity between the objects depicted by the images, i.e. similarity modulo a number of geometrical transformations corresponding to changes in view angle, optical parameters and scale. In some cases, one wants to include even the deformation of non-rigid bodies. The first type of similarity has attracted research work only for calibrated image databases of special types of objects, such as ceramic plates. Even in this case, researchers have tried to define shape representations which are scale-independent, resting on curvature, angle statistics and contour complexity. Systems such as QBIC [FSN+95] use circularity, eccentricity, major axis orientation (not angle-independent) and algebraic moments. It should be noted that in some cases the user of a retrieval system will want a definition of shape similarity which is dependent on view angle (for instance, will want to retrieve trapezoids with a horizontal base and not other trapezoids).
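A minimal sketch of how global descriptors of the kind listed for QBIC (circularity, eccentricity, major-axis orientation) can be computed from a binary object mask using second-order central moments; the perimeter estimate is deliberately crude and the whole routine is illustrative, not QBIC's implementation.

# Sketch of simple global shape descriptors from a binary mask (illustrative).
import numpy as np

def shape_descriptors(mask):
    # mask: 2-D boolean array, True on the object; assumed non-empty.
    ys, xs = np.nonzero(mask)
    area = len(xs)
    cx, cy = xs.mean(), ys.mean()

    # Central second-order moments.
    mu20 = ((xs - cx) ** 2).mean()
    mu02 = ((ys - cy) ** 2).mean()
    mu11 = ((xs - cx) * (ys - cy)).mean()

    # Major-axis orientation (not rotation-invariant, as noted above).
    orientation = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

    # Eccentricity from the eigenvalues of the covariance matrix.
    common = np.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lam_max = (mu20 + mu02 + common) / 2
    lam_min = (mu20 + mu02 - common) / 2
    eccentricity = np.sqrt(1 - lam_min / lam_max) if lam_max > 0 else 0.0

    # Crude perimeter: object pixels with at least one background 4-neighbour.
    padded = np.pad(mask, 1)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    circularity = 4 * np.pi * area / (perimeter ** 2) if perimeter else 0.0
    return circularity, eccentricity, orientation
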
In the general case, a promising approach has been proposed by Sclaroff and Pentland [PS91, SP95], in which shapes are represented as canonical deformations of prototype objects. In this approach, a "physical" model of the 2D shape is built using a new form of Galerkin's interpolation method (finite-element discretization). The possible deformation modes are analyzed using the Karhunen-Loeve transform. This yields an ordered list of deformation modes corresponding to rigid-body modes (translation, rotation), low-frequency non-rigid modes associated with global deformations, and higher-frequency modes associated with localized deformations.
As for color and texture, the present schemes for shape similarity modelling are faced with serious difficulties when images include several objects or a background. A preliminary segmentation, as well as a modelling of spatial relationships between shapes, is then necessary (are we interested in finding images where one region represents a shape similar to a given prototype, or images matching some spatial organization of several shapes?).
2.4 Spatial similarity
Gudivada and Raghavan [GR95] have treated spatial similarity in the situation in which it is assumed that images have been (automatically or manually) segmented into meaningful objects, each object being associated with its centroid and a symbolic name. Such a representation is called a symbolic image, and it is relatively easy to define similarity functions for such images modulo transformations such as rotation, scaling and translation. Efforts have also been made to address spatial similarity directly (without segmentation and object indexing). This was the case, for instance, in the original work of Kato [Kat92], in the limited case of direct spatial similarity (without geometrical transformation), using a number of ad hoc statistical features computed on very low-resolution images.
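A minimal sketch of comparing two symbolic images, here represented as hypothetical name-to-centroid dictionaries, after normalizing away translation and uniform scale; handling rotation, as in [GR95], would require an additional alignment step not shown here.

# Sketch of comparing "symbolic images" (name -> centroid) up to
# translation and uniform scaling (illustrative, not the [GR95] method).
import numpy as np

def normalize(symbolic):
    # symbolic: dict mapping object name to (x, y) centroid.
    names = sorted(symbolic)
    pts = np.array([symbolic[n] for n in names], dtype=float)
    pts -= pts.mean(axis=0)                       # translation invariance
    scale = np.sqrt((pts ** 2).sum(axis=1).mean())
    if scale > 0:
        pts /= scale                              # scale invariance
    return dict(zip(names, pts))

def symbolic_dissimilarity(sym_a, sym_b):
    # Compare only the objects present in both symbolic images.
    a, b = normalize(sym_a), normalize(sym_b)
    shared = set(a) & set(b)
    if not shared:
        return float("inf")
    return float(np.mean([np.linalg.norm(a[n] - b[n]) for n in shared]))
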
2.5 Object presence analysis
Finding the images in a set in which a particular object or type of object appears (all images with cars, all shots in a video in which a given character is present) is a particular case of similarity computation. Once again, the range of applicable methods is defined by the invariants of the object to be recognized. For color images, and for images whose color does not change, local color distribution is efficient, and can be reliable even when changes in scale or angle occur [NT92]. In the general case, the best results so far have been obtained with texture-based models [PPDH94]. A pyramidal analysis of texture (with the whole image considered as the texture

References
- Query by image and video content: the QBIC system.
- Similarity of color images.
- Photobook: content-based manipulation of image databases.
- Automatic partitioning of full-motion video.
- Photobook: tools for content-based manipulation of image databases.