
Adaptive Color Image Compression based on Visual Attention
Nabil Ouerhani, Javier Bracamonte, Heinz Hügli, Michael Ansorge, Fausto Pellandini
Institute of Microtechnology, University of Neuchâtel
Rue A.-L. Breguet 2, CH-2000 Neuchâtel, Switzerland
{Nabil.Ouerhani, Javier.Bracamonte}@unine.ch
Abstract
This paper reports an adaptive still color image compression method which produces automatically-selected ROIs with a higher reconstruction quality with respect to the rest of the input image. The ROIs are generated on-the-fly with a purely data-driven technique based on visual attention. Inspired by biological vision, the multicue visual attention algorithm detects the most visually-salient regions of an image. Thus, when operating in systems with low bit-rate constraints, the adaptive coding scheme favors the allocation of a higher number of bits to those image regions that are more conspicuous to the human visual system. The compressed image files produced by this adaptive method are fully compatible with the JPEG standard, which favors their widespread utilization.
1 Introduction
Visual attention is the ability to rapidly detect the interesting parts of a given scene. Using visual attention in a computer vision system permits a rapid selection of a subset of the available sensory information before further processing. The selected locations usually correlate with the conspicuous parts of the scene.
Various computational models of visual attention have been presented in previous works [1, 2]. These models are, in general, data-driven and based on the feature integration principle [9], which is inspired by psychophysical studies on human visual attention. Known as saliency-based, the model presented in [2] considers a variety of scene features (intensity, orientation, and color) to compute a set of conspicuity maps, which are then combined into the final saliency map. The conspicuity operator is a kind of "contrast detector" which, applied to a feature map, brings out the regions of the scene containing salient information and thus provides a set of relevant regions-of-interest (ROIs).
The availability of a set of preidentified ROIs can be associated with spatially adaptive image coding in order to obtain higher compression ratios while keeping a good reconstruction quality within the visually important regions of the image.
Previous works have dealt with the problem of identifying ROIs in order to spatially adapt the compression according to the relative importance of regions [3, 7, 10]. A recent work [5, 6] presented an algorithm based on automatically preidentified ROIs, computed by means of a biologically plausible technique. That technique deals, however, only with grey-scale images; chromatic features were not considered.
In this work, we investigate a different biologically inspired technique to identify visually relevant regions in an image. In contrast with the method discussed above, the multicue saliency-based model of visual attention combines chromatic as well as monochromatic features to identify ROIs in color images.
This paper is organized as follows. Section 2 presents the saliency-based model of visual attention. The adaptive image compression algorithm is described in Section 3. Section 4 reports the results of experiments involving a variety of color images, in order to assess the effectiveness of visual attention in the field of adaptive color image compression. Finally, the conclusions are stated in Section 5.
2 Visual attention model
2.1 Saliency-based model
According to a generally accepted model of visual perception [4], a visual attention task can be achieved in three main steps (Fig. 1).
1) First, a number n of features are extracted from the scene by computing the so-called feature maps. A feature map represents the image of the scene based on a well-defined feature; this leads to a multi-feature representation of the scene. Five feature maps have been considered in our implementation: a) The difference between the red and the green components (R - G), b) The difference between the blue and the yellow components (B - Y), c) The intensity
image, and d) The gradient orientation map.

Figure 1. Scheme of a computational model of attention.

Published in Proceedings of the 11th International Conference on Image Analysis and Processing (ICIAP), 416-421, 2001, which should be used for any reference to this work.
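As an illustrative sketch of this first step (not the authors' implementation; in particular, the yellow component is assumed here to be the mean of the red and green channels), the chromatic and intensity feature maps can be computed pixel-wise:

```python
def feature_maps(rgb):
    """Compute R-G, B-Y and intensity feature maps from an RGB image
    given as a list of rows of (r, g, b) tuples in [0, 255]."""
    rg, by, intensity = [], [], []
    for row in rgb:
        rg.append([r - g for r, g, b in row])
        # Assumption: the yellow component is the mean of red and green.
        by.append([b - (r + g) / 2.0 for r, g, b in row])
        intensity.append([(r + g + b) / 3.0 for r, g, b in row])
    return rg, by, intensity

# Tiny 1x2 image: one pure-red pixel and one pure-blue pixel.
rg, by, intensity = feature_maps([[(255, 0, 0), (0, 0, 255)]])
print(rg[0], by[0], intensity[0])   # [255, 0] [-127.5, 255.0] [85.0, 85.0]
```

The gradient orientation map would additionally require a spatial derivative of the intensity map, which is omitted in this sketch.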
2) In a second step, each feature map is transformed into its conspicuity map. Each conspicuity map highlights the parts of the scene that, according to a specific feature, strongly differ from their surroundings. In biologically plausible models, this is usually achieved with a center-surround mechanism. In practice, this mechanism can be implemented with a difference-of-Gaussians filter, which is applied to the feature maps to extract the local activity of each feature type.
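A minimal one-dimensional sketch of such a center-surround stage (illustrative only; the kernel size and sigma values are assumptions, not the paper's parameters):

```python
import math

def dog_kernel(size, sigma_c, sigma_s):
    """1-D difference-of-Gaussians: a narrow excitatory center minus a
    wide inhibitory surround. Each lobe is normalized to unit sum, so
    the kernel sums to zero and uniform regions yield no response."""
    half = size // 2
    center = [math.exp(-x * x / (2 * sigma_c ** 2)) for x in range(-half, half + 1)]
    surround = [math.exp(-x * x / (2 * sigma_s ** 2)) for x in range(-half, half + 1)]
    cs, ss = sum(center), sum(surround)
    return [c / cs - s / ss for c, s in zip(center, surround)]

def center_surround(signal, kernel):
    """Valid-mode convolution of a 1-D feature signal with the kernel."""
    half = len(kernel) // 2
    return [sum(kernel[j] * signal[i - half + j] for j in range(len(kernel)))
            for i in range(half, len(signal) - half)]

# A flat signal with one bright spot: the response is ~0 over the flat
# part and peaks at the spot, i.e. local contrast is brought out.
signal = [10.0] * 7 + [200.0] + [10.0] * 7
response = center_surround(signal, dog_kernel(7, 1.0, 3.0))
print(response.index(max(response)))   # output position of the bright spot
```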
3) In the last stage of the attention model, the n conspicuity maps C_i are integrated together, in a competitive way, into a saliency map S in accordance with Equation 1:

    S = Σ_{i=1}^{n} w_i C_i    (1)
The competition between conspicuity maps is usually established by selecting the weights w_i according to a weighting function w, like the one presented in [2]: w = (M - m)^2, where M is the maximum activity of the conspicuity map and m is the average of all its local maxima. Thus, w measures how much the most active locations differ from the average. This weighting function promotes conspicuity maps in which a small number of strong activity peaks is present; maps that contain numerous comparable peak responses are demoted. Note that this competitive mechanism is purely data-driven and does not require any a priori knowledge about the analyzed scene.
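The weighting and integration above can be sketched as follows (illustrative only; the local maxima are taken here as strict interior peaks of 1-D maps, a simplifying assumption):

```python
def weight(cmap):
    """w = (M - m)^2: M is the map's global maximum, m the mean of its
    local maxima (strict interior peaks of a 1-D conspicuity map)."""
    peaks = [b for a, b, c in zip(cmap, cmap[1:], cmap[2:]) if b > a and b > c]
    M = max(cmap)
    m = sum(peaks) / len(peaks) if peaks else M
    return (M - m) ** 2

def saliency(conspicuity_maps):
    """S = sum_i w_i * C_i (Equation 1), maps combined pointwise."""
    ws = [weight(c) for c in conspicuity_maps]
    return [sum(w * c[k] for w, c in zip(ws, conspicuity_maps))
            for k in range(len(conspicuity_maps[0]))]

single_peak = [0, 9, 0, 1, 0, 1, 0]   # one strong peak: promoted
many_peaks = [0, 8, 0, 8, 0, 8, 0]    # comparable peaks: weight 0, demoted
print(weight(single_peak) > weight(many_peaks))   # True
```

A map with one dominant peak contributes strongly to S, while a map whose maxima all have comparable amplitude is effectively suppressed, exactly the competition described above.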

Figure 2. Salient regions selection. Applying a WTA mechanism to a saliency map permits the selection of the most conspicuous parts of the image.
2.2 Selection of salient locations
At any given time, the maximum of the saliency map defines the most salient location, to which the focus of attention (FOA) should be directed. A "winner-take-all" (WTA) mechanism [2] is used to detect the significant regions successively. Given a saliency map computed by the saliency-based model of visual attention, the WTA mechanism starts by selecting the location with the maximum value on the map. This selected region is considered the most salient part of the image (the winner), and the FOA is shifted to it. Local inhibition is then activated in the saliency map, in an area around the current FOA. This yields dynamic shifts of the FOA by allowing the next most salient location to subsequently become the winner. Moreover, the inhibition mechanism prevents the FOA from returning to previously attended locations. An example of

salient-region selection based on the WTA mechanism is given in Figure 2.

Figure 3. Adaptive JPEG-based image compression algorithm.
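The WTA selection with inhibition of return described in Section 2.2 can be sketched as follows (a minimal illustration; the inhibited area is assumed here to be a square neighborhood that is simply zeroed, rather than a graded inhibition):

```python
def wta_scan(saliency_map, n_foci, inhibition_radius):
    """Winner-take-all with inhibition of return: repeatedly select the
    global maximum of the saliency map as the focus of attention (FOA),
    then zero a square neighborhood around it so the next-most-salient
    location can win and the FOA never returns to an attended spot."""
    s = [row[:] for row in saliency_map]          # work on a copy
    h, w = len(s), len(s[0])
    foci = []
    for _ in range(n_foci):
        y, x = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: s[p[0]][p[1]])
        foci.append((y, x))
        for i in range(max(0, y - inhibition_radius), min(h, y + inhibition_radius + 1)):
            for j in range(max(0, x - inhibition_radius), min(w, x + inhibition_radius + 1)):
                s[i][j] = 0.0                     # local inhibition
    return foci

smap = [[0.0, 0.1, 0.0, 0.0],
        [0.1, 0.9, 0.1, 0.0],
        [0.0, 0.0, 0.0, 0.5],
        [0.0, 0.2, 0.0, 0.0]]
print(wta_scan(smap, 2, 1))   # [(1, 1), (2, 3)]
```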
3 Adaptive compression algorithm
Figure 3 shows a block diagram of the adaptive coding method. This scheme follows the same operations as the baseline JPEG algorithm, albeit with a quantization unit that has been modified to receive an additional input: a binary image produced by the visual attention stage. This binary information indicates to the quantizer whether to execute a short-step or a large-step quantization of the DCT coefficients, depending on whether a given 8x8-pixel block lies inside or outside any of the identified ROIs.
The modified quantization unit uses the same normalization array N proposed in the official JPEG document [8]. This unit also requires two scale-factor parameters: sf_0 and sf_1. These values can be permanently set by the user, or left to be set by the system in correspondence with a required compression bit rate for a given image. For those 8x8-pixel blocks with a majority of pixels lying within the ROIs, the quantization is executed using the normalization array previously scaled by sf_0. For the rest of the blocks, N is scaled by sf_1 before quantization. To preserve image detail within the ROIs, sf_0 is usually chosen in the interval [0.5, 1], while sf_1 is generally selected to be a real number larger than two.
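The ROI-dependent quantization can be sketched as follows (illustrative only: the 2x2 block, the normalization values, and the scale factors sf_0 = 0.75 and sf_1 = 4.0 are assumptions, not the paper's 8x8 tables):

```python
def quantize_block(dct_block, norm_array, in_roi, sf0=0.75, sf1=4.0):
    """ROI-dependent quantization: blocks inside a ROI divide the DCT
    coefficients by the normalization array scaled by the small factor
    sf0 (fine steps); all other blocks use the larger factor sf1."""
    sf = sf0 if in_roi else sf1
    return [[round(c / (q * sf)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct_block, norm_array)]

# Toy 2x2 "DCT block" and normalization array (illustrative values,
# not the JPEG luminance table).
block = [[120.0, 36.0], [30.0, 10.0]]
N = [[16.0, 11.0], [12.0, 14.0]]
print(quantize_block(block, N, in_roi=True))    # [[10, 4], [3, 1]]
print(quantize_block(block, N, in_roi=False))   # [[2, 1], [1, 0]]
```

The coarse quantization drives the small coefficients of non-ROI blocks toward zero, which is where the extra compression comes from.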
The binary image that is produced by the visual attention stage and used by the quantization unit represents overhead information to be embedded in the compressed bitstream. This data is required for a decoder to execute the corresponding ROI-dependent inverse quantization of the DCT coefficients. The presence of this overhead data precludes the compressed image from being reconstructed by a standard JPEG decoder.

Figure 4. Quantizer of the adaptive compression algorithm to produce JPEG compatibility.

Figure 5. Adaptive versus non-adaptive compression: Example 1. (a) Original image; (b) identified ROIs; (c) standard JPEG; (d) adaptive compression.

If JPEG compatibility is not
an issue, then any given JPEG decoder can be straightforwardly modified to accept the additional binary information and accordingly execute the decoding of the compressed bitstream.
JPEG compatibility, however, could be required in a large number of systems, and it can easily be achieved in exchange for two additional quantization operations, as shown in Figure 4. After the DCT operation, the initial ROI-dependent quantization (Q_x) is followed by a corresponding ROI-dependent inverse quantization (Q_x^-1). After this point, the overhead data is no longer required and the current DCT coefficients can be re-normalized using a regular, ROI-independent JPEG quantization (Q). This procedure produces a spatially adaptive compressed image which is fully compatible with the JPEG standard. This was the scheme used to produce the results presented in Section 4.
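The Q_x -> Q_x^-1 -> Q chain of Figure 4 can be sketched for a single DCT coefficient as follows (the step sizes are illustrative assumptions; in JPEG each of the 64 coefficients has its own step):

```python
def jpeg_compatible_quantize(coeff, roi_step, std_step):
    """The ROI-dependent quantization Qx and its inverse Qx^-1 decide
    how much detail survives; the final ROI-independent quantization Q
    makes the result decodable by any baseline JPEG decoder, so no
    ROI overhead data needs to be embedded in the bitstream."""
    qx = round(coeff / roi_step)          # Qx   (ROI-dependent)
    dequantized = qx * roi_step           # Qx^-1
    return round(dequantized / std_step)  # Q    (standard JPEG)

# The same coefficient inside vs. outside a ROI (steps are assumptions):
print(jpeg_compatible_quantize(100.0, 12.0, 16.0))   # 6 -> decodes to 96
print(jpeg_compatible_quantize(100.0, 64.0, 16.0))   # 8 -> decodes to 128
```

A standard decoder multiplies the result by the standard step only, yet the coefficient inside the ROI ends up much closer to its original value (96 vs. 128 against 100) because the detail-preserving decision was already baked in by Q_x and Q_x^-1.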
4 Experiments
This section reports experiments involving two sets of color images, in order to assess the usefulness of visual attention in the field of color image compression by means of the adaptive coding scheme presented in Section 3. For each example, a color image is acquired; using this image as input, the visual attention algorithm computes a set of ROIs. In these experiments the number of identified ROIs has, for simplicity, been limited to three. A yellow circle is drawn around each of the identified ROIs.

Figure 6. Adaptive versus non-adaptive compression: Example 2. (a) Original image; (b) identified ROIs; (c) standard JPEG; (d) adaptive compression.

Afterwards,
the color image is compressed using two methods: a) standard JPEG, and b) the JPEG-based adaptive compression algorithm. With both methods, and for all the experiments, the overall compression ratio was 100:1.
The images of the reported experiments are shown in Figs. 5 and 6; they mainly feature two persons facing the camera. Based on the features described in Section 2, the two persons stand out from the rest of the scene and are thus natural candidates for the ROIs to be automatically identified by the visual attention algorithm. Figures 5b) and 6b) show that, as expected, the two persons' faces figure among the three most salient regions of the image.
The adaptive compression algorithm takes into account the relative importance of these image regions. Consequently, the reconstructed images (bottom-right images in Figs. 5 and 6) preserve the visual details of the two faces, which may be relevant for the recognition of the two persons. On the other hand, the persons' faces have lost important perceptual details under the standard JPEG method (bottom-left images in Figs. 5 and 6); in the latter case, one may have difficulty identifying the two persons.
The advantage of the adaptive algorithm is highlighted in Fig. 7, where the rightmost ROI has been zoomed in. These examples clearly validate the adaptive color image compression algorithm based on visual attention. Despite the unavailability of any a priori knowledge about the analyzed images, the reported coding scheme permits the preservation of perceptually important image details.

References
- A feature-integration theory of attention
- A model of saliency-based visual attention for rapid scene analysis
- Algorithms for defining visual regions-of-interest: comparison with eye fixations
- Perceptual adaptive JPEG coding
- A prototype for data-driven visual attention
5 Conclusions
After introducing the biologically inspired saliency-based model of visual attention, which permits the identification of perceptually salient regions-of-interest on color images, this paper reported an adaptive still color image compression method. The method produces automatically-selected ROIs with a higher reconstruction quality with respect to the rest of the input image, and the compressed image files it produces are fully compatible with the JPEG standard, which favors their widespread utilization. Furthermore, the reported visual attention algorithm can be extended to detect ROIs in temporally changing scenes, by introducing motion as an additional scene feature into the model.