
Adaptive Color Image Compression based on Visual Attention
Nabil Ouerhani, Javier Bracamonte, Heinz Hügli, Michael Ansorge, Fausto Pellandini
Institute of Microtechnology, University of Neuchâtel
Rue A.-L. Breguet 2, CH-2000 Neuchâtel, Switzerland
{Nabil.Ouerhani, Javier.Bracamonte}@unine.ch
Abstract
This paper reports an adaptive still color image compression method which produces automatically-selected ROIs with a higher reconstruction quality with respect to the rest of the input image. The ROIs are generated on-the-fly with a purely data-driven technique based on visual attention. Inspired by biological vision, the multicue visual attention algorithm detects the most visually-salient regions of an image. Thus, when operating in systems with low bit-rate constraints, the adaptive coding scheme favors the allocation of a higher number of bits to those image regions that are more conspicuous to the human visual system. The compressed image files produced by this adaptive method are fully compatible with the JPEG standard, which favors their widespread utilization.
1 Introduction
Visual attention is the ability to rapidly detect the interesting parts of a given scene. Using visual attention in a computer vision system permits a rapid selection of a subset of the available sensory information before further processing. The selected locations usually correlate with the conspicuous parts of the scene.
Various computational models of visual attention have been presented in previous works [1, 2]. These models are, in general, data-driven and based on the feature integration principle [9], which is inspired by psychophysical studies on human visual attention. Known as saliency-based, the model presented in [2] considers a variety of scene features (intensity, orientation, and color) to compute a set of conspicuity maps, which are then combined into the final saliency map. The conspicuity operator is a kind of "contrast detector" which, applied to a feature map, brings out the regions of the scene containing salient information and thus provides a set of relevant regions-of-interest (ROIs).
The availability of a set of preidentified ROIs can be associated with spatially adaptive image coding in order to obtain higher compression ratios while keeping a good reconstruction quality within the visually important regions of the image.
Previous works have dealt with the problem of identifying ROIs in order to spatially adapt the compression according to the relative importance of regions [3, 7, 10]. A recent work [5, 6] presented an algorithm based on automatically preidentified ROIs, computed by means of a biologically plausible technique. That technique deals, however, only with grey-scale images; chromatic features were not considered.
In this work, we investigate a different biologically inspired technique to identify visually relevant regions in an image. In contrast with the method discussed above, the multicue saliency-based model of visual attention combines chromatic as well as monochromatic features to identify ROIs in color images.
This paper is organized as follows. Section 2 presents the saliency-based model of visual attention. The adaptive image compression algorithm is described in Section 3. Section 4 reports the results of experiments involving a variety of color images, in order to assess the effectiveness of visual attention in the field of adaptive color image compression. Finally, the conclusions are stated in Section 5.
2 Visual attention model
2.1 Saliency-based model
According to a generally accepted model of visual perception [4], a visual attention task can be achieved in three main steps (Fig. 1).
1) First, a number n of features are extracted from the scene by computing the so-called feature maps. A feature map represents the image of the scene based on a well-defined feature; this leads to a multi-feature representation of the scene. Five feature maps have been considered in our implementation: a) The difference between the red and the green components (R - G), b) The difference between the blue and the yellow components (B - Y), c) The intensity
image, and d) The gradient orientation map.

Figure 1. Scheme of a computational model of attention.

Published in Proceedings of the 11th International Conference on Image Analysis and Processing (ICIAP), 416-421, 2001, which should be used for any reference to this work.
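As an illustrative sketch of this first step (not the authors' implementation; in particular, the yellow component is assumed here to be the mean of the red and green channels), the chromatic and intensity feature maps can be computed pixel-wise:

```python
def feature_maps(rgb):
    """Compute R-G, B-Y and intensity feature maps from an RGB image
    given as a list of rows of (r, g, b) tuples in [0, 255]."""
    rg, by, intensity = [], [], []
    for row in rgb:
        rg.append([r - g for r, g, b in row])
        # Assumption: the yellow component is the mean of red and green.
        by.append([b - (r + g) / 2.0 for r, g, b in row])
        intensity.append([(r + g + b) / 3.0 for r, g, b in row])
    return rg, by, intensity

# Tiny 1x2 image: one pure-red pixel and one pure-blue pixel.
rg, by, intensity = feature_maps([[(255, 0, 0), (0, 0, 255)]])
print(rg[0], by[0], intensity[0])   # [255, 0] [-127.5, 255.0] [85.0, 85.0]
```

The gradient orientation map would additionally require a spatial derivative of the intensity map, which is omitted in this sketch.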
2) In a second step, each feature map is transformed into its conspicuity map. Each conspicuity map highlights the parts of the scene that, according to a specific feature, strongly differ from their surroundings. In biologically plausible models, this is usually achieved with a center-surround mechanism. In practice, this mechanism can be implemented with a difference-of-Gaussians filter, which is applied to the feature maps to extract the local activity of each feature type.
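A minimal one-dimensional sketch of such a center-surround stage (illustrative only; the kernel size and sigma values are assumptions, not the paper's parameters):

```python
import math

def dog_kernel(size, sigma_c, sigma_s):
    """1-D difference-of-Gaussians: a narrow excitatory center minus a
    wide inhibitory surround. Each lobe is normalized to unit sum, so
    the kernel sums to zero and uniform regions yield no response."""
    half = size // 2
    center = [math.exp(-x * x / (2 * sigma_c ** 2)) for x in range(-half, half + 1)]
    surround = [math.exp(-x * x / (2 * sigma_s ** 2)) for x in range(-half, half + 1)]
    cs, ss = sum(center), sum(surround)
    return [c / cs - s / ss for c, s in zip(center, surround)]

def center_surround(signal, kernel):
    """Valid-mode convolution of a 1-D feature signal with the kernel."""
    half = len(kernel) // 2
    return [sum(kernel[j] * signal[i - half + j] for j in range(len(kernel)))
            for i in range(half, len(signal) - half)]

# A flat signal with one bright spot: the response is ~0 over the flat
# part and peaks at the spot, i.e. local contrast is brought out.
signal = [10.0] * 7 + [200.0] + [10.0] * 7
response = center_surround(signal, dog_kernel(7, 1.0, 3.0))
print(response.index(max(response)))   # output position of the bright spot
```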
3) In the last stage of the attention model, the n conspicuity maps C_i are integrated together, in a competitive way, into a saliency map S in accordance with Equation 1:

    S = Σ_{i=1}^{n} w_i C_i    (1)
The competition between conspicuity maps is usually established by selecting the weights w_i according to a weighting function w, like the one presented in [2]: w = (M - m)^2, where M is the maximum activity of the conspicuity map and m is the average of all its local maxima. Thus, w measures how much the most active locations differ from the average. This weighting function promotes conspicuity maps in which a small number of strong activity peaks is present; maps that contain numerous comparable peak responses are demoted. Note that this competitive mechanism is purely data-driven and does not require any a priori knowledge about the analyzed scene.
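The weighting and integration above can be sketched as follows (illustrative only; the local maxima are taken here as strict interior peaks of 1-D maps, a simplifying assumption):

```python
def weight(cmap):
    """w = (M - m)^2: M is the map's global maximum, m the mean of its
    local maxima (strict interior peaks of a 1-D conspicuity map)."""
    peaks = [b for a, b, c in zip(cmap, cmap[1:], cmap[2:]) if b > a and b > c]
    M = max(cmap)
    m = sum(peaks) / len(peaks) if peaks else M
    return (M - m) ** 2

def saliency(conspicuity_maps):
    """S = sum_i w_i * C_i (Equation 1), maps combined pointwise."""
    ws = [weight(c) for c in conspicuity_maps]
    return [sum(w * c[k] for w, c in zip(ws, conspicuity_maps))
            for k in range(len(conspicuity_maps[0]))]

single_peak = [0, 9, 0, 1, 0, 1, 0]   # one strong peak: promoted
many_peaks = [0, 8, 0, 8, 0, 8, 0]    # comparable peaks: weight 0, demoted
print(weight(single_peak) > weight(many_peaks))   # True
```

A map with one dominant peak contributes strongly to S, while a map whose maxima all have comparable amplitude is effectively suppressed, exactly the competition described above.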

Figure 2. Salient regions selection. Applying a WTA mechanism to a saliency map permits the selection of the most conspicuous parts of the image.
2.2 Selection of salient locations
At any given time, the maximum of the saliency map defines the most salient location, to which the focus of attention (FOA) should be directed. A "winner-take-all" (WTA) mechanism [2] is used to detect the significant regions successively. Given a saliency map computed by the saliency-based model of visual attention, the WTA mechanism starts by selecting the location with the maximum value on the map. This selected region is considered the most salient part of the image (the winner), and the FOA is shifted to it. Local inhibition is then activated in the saliency map, in an area around the current FOA. This yields dynamic shifts of the FOA by allowing the next most salient location to subsequently become the winner. Moreover, the inhibition mechanism prevents the FOA from returning to previously attended locations. An example of

salient-region selection based on the WTA mechanism is given in Figure 2.

Figure 3. Adaptive JPEG-based image compression algorithm.
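The WTA selection with inhibition of return described in Section 2.2 can be sketched as follows (a minimal illustration; the inhibited area is assumed here to be a square neighborhood that is simply zeroed, rather than a graded inhibition):

```python
def wta_scan(saliency_map, n_foci, inhibition_radius):
    """Winner-take-all with inhibition of return: repeatedly select the
    global maximum of the saliency map as the focus of attention (FOA),
    then zero a square neighborhood around it so the next-most-salient
    location can win and the FOA never returns to an attended spot."""
    s = [row[:] for row in saliency_map]          # work on a copy
    h, w = len(s), len(s[0])
    foci = []
    for _ in range(n_foci):
        y, x = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: s[p[0]][p[1]])
        foci.append((y, x))
        for i in range(max(0, y - inhibition_radius), min(h, y + inhibition_radius + 1)):
            for j in range(max(0, x - inhibition_radius), min(w, x + inhibition_radius + 1)):
                s[i][j] = 0.0                     # local inhibition
    return foci

smap = [[0.0, 0.1, 0.0, 0.0],
        [0.1, 0.9, 0.1, 0.0],
        [0.0, 0.0, 0.0, 0.5],
        [0.0, 0.2, 0.0, 0.0]]
print(wta_scan(smap, 2, 1))   # [(1, 1), (2, 3)]
```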
3 Adaptive compression algorithm
Figure 3 shows a block diagram of the adaptive coding method. This scheme follows the same operations as the baseline JPEG algorithm, albeit with a quantization unit that has been modified to receive an additional input: a binary image produced by the visual attention stage. This binary information indicates to the quantizer whether to execute a short-step or a large-step quantization of the DCT coefficients, depending on whether a given 8x8-pixel block lies inside or outside any of the identified ROIs.
The modified quantization unit uses the same normalization array N proposed in the official JPEG document [8]. This unit also requires two scale-factor parameters: sf_0 and sf_1. These values can be permanently set by the user, or left to be set by the system in correspondence with a required compression bit rate for a given image. For those 8x8-pixel blocks with a majority of pixels lying within the ROIs, the quantization is executed using the normalization array previously scaled by sf_0. For the rest of the blocks, N is scaled by sf_1 before quantization. To preserve image detail within the ROIs, sf_0 is usually chosen in the interval [0.5, 1], while sf_1 is generally selected to be a real number larger than two.
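The ROI-dependent quantization can be sketched as follows (illustrative only: the 2x2 block, the normalization values, and the scale factors sf_0 = 0.75 and sf_1 = 4.0 are assumptions, not the paper's 8x8 tables):

```python
def quantize_block(dct_block, norm_array, in_roi, sf0=0.75, sf1=4.0):
    """ROI-dependent quantization: blocks inside a ROI divide the DCT
    coefficients by the normalization array scaled by the small factor
    sf0 (fine steps); all other blocks use the larger factor sf1."""
    sf = sf0 if in_roi else sf1
    return [[round(c / (q * sf)) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(dct_block, norm_array)]

# Toy 2x2 "DCT block" and normalization array (illustrative values,
# not the JPEG luminance table).
block = [[120.0, 36.0], [30.0, 10.0]]
N = [[16.0, 11.0], [12.0, 14.0]]
print(quantize_block(block, N, in_roi=True))    # [[10, 4], [3, 1]]
print(quantize_block(block, N, in_roi=False))   # [[2, 1], [1, 0]]
```

The coarse quantization drives the small coefficients of non-ROI blocks toward zero, which is where the extra compression comes from.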
The binary image that is produced by the visual attention stage and used by the quantization unit represents overhead information to be embedded in the compressed bitstream. This data is required for a decoder to execute the corresponding ROI-dependent inverse quantization of the DCT coefficients. The presence of this overhead data precludes the compressed image from being reconstructed by a standard JPEG decoder.

Figure 4. Quantizer of the adaptive compression algorithm to produce JPEG compatibility.

Figure 5. Adaptive versus non-adaptive compression: Example 1. (a) Original image; (b) identified ROIs; (c) standard JPEG; (d) adaptive compression.

If JPEG compatibility is not
an issue, then any given JPEG decoder can be straightforwardly modified to accept the additional binary information and accordingly execute the decoding of the compressed bitstream.
JPEG compatibility, however, could be required in a large number of systems, and it can easily be achieved in exchange for two additional quantization operations, as shown in Figure 4. After the DCT operation, the initial ROI-dependent quantization (Q_x) is followed by a corresponding ROI-dependent inverse quantization (Q_x^-1). After this point, the overhead data is no longer required and the current DCT coefficients can be re-normalized using a regular, ROI-independent JPEG quantization (Q). This procedure produces a spatially adaptive compressed image which is fully compatible with the JPEG standard. This was the scheme used to produce the results presented in Section 4.
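The Q_x -> Q_x^-1 -> Q chain of Figure 4 can be sketched for a single DCT coefficient as follows (the step sizes are illustrative assumptions; in JPEG each of the 64 coefficients has its own step):

```python
def jpeg_compatible_quantize(coeff, roi_step, std_step):
    """The ROI-dependent quantization Qx and its inverse Qx^-1 decide
    how much detail survives; the final ROI-independent quantization Q
    makes the result decodable by any baseline JPEG decoder, so no
    ROI overhead data needs to be embedded in the bitstream."""
    qx = round(coeff / roi_step)          # Qx   (ROI-dependent)
    dequantized = qx * roi_step           # Qx^-1
    return round(dequantized / std_step)  # Q    (standard JPEG)

# The same coefficient inside vs. outside a ROI (steps are assumptions):
print(jpeg_compatible_quantize(100.0, 12.0, 16.0))   # 6 -> decodes to 96
print(jpeg_compatible_quantize(100.0, 64.0, 16.0))   # 8 -> decodes to 128
```

A standard decoder multiplies the result by the standard step only, yet the coefficient inside the ROI ends up much closer to its original value (96 vs. 128 against 100) because the detail-preserving decision was already baked in by Q_x and Q_x^-1.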
4 Experiments
This section reports experiments involving two sets of color images, in order to assess the usefulness of visual attention in the field of color image compression by means of the adaptive coding scheme presented in Section 3. For each example, a color image is acquired; using this image as input, the visual attention algorithm computes a set of ROIs. In these experiments the number of identified ROIs has, for simplicity, been limited to three. A yellow circle is drawn around each of the identified ROIs.

Figure 6. Adaptive versus non-adaptive compression: Example 2. (a) Original image; (b) identified ROIs; (c) standard JPEG; (d) adaptive compression.

Afterwards,
the color image is compressed using two methods: a) standard JPEG, and b) the JPEG-based adaptive compression algorithm. With both methods, and for all the experiments, the overall compression ratio was 100:1.
The images of the reported experiments are shown in Figs. 5 and 6; they mainly feature two persons facing the camera. Based on the features described in Section 2, the two persons stand out from the rest of the scene and are thus natural candidates for the ROIs to be automatically identified by the visual attention algorithm. Figures 5b) and 6b) show that, as expected, the two persons' faces figure among the three most salient regions of the image.
The adaptive compression algorithm takes into account the relative importance of these image regions. Consequently, the reconstructed images (bottom-right images in Figs. 5 and 6) preserve the visual details of the two faces, which may be relevant for the recognition of the two persons. On the other hand, the persons' faces have lost important perceptual details under the standard JPEG method (bottom-left images in Figs. 5 and 6); in the latter case, one may have difficulty identifying the two persons.
The advantage of the adaptive algorithm is highlighted in Fig. 7, where the rightmost ROI has been zoomed in. These examples clearly validate the adaptive color image compression algorithm based on visual attention. Despite the unavailability of any a priori knowledge about the analyzed images, the reported coding scheme permits the preservation of perceptually important image details.

References
- A feature-integration theory of attention
- A model of saliency-based visual attention for rapid scene analysis
- Algorithms for defining visual regions-of-interest: comparison with eye fixations
- Perceptual adaptive JPEG coding
- A prototype for data-driven visual attention
5 Conclusions
After introducing the biologically inspired saliency-based model of visual attention, which permits the identification of perceptually salient regions-of-interest on color images, this paper reported an adaptive still color image compression method. The method produces automatically-selected ROIs with a higher reconstruction quality with respect to the rest of the input image, and the compressed image files it produces are fully compatible with the JPEG standard, which favors their widespread utilization. Furthermore, the reported visual attention algorithm can be extended to detect ROIs in temporally changing scenes, by introducing motion as an additional scene feature into the model.