
A Coherent Computational Approach to
Model Bottom-Up Visual Attention
Olivier Le Meur, Patrick Le Callet, Member, IEEE,
Dominique Barba, Senior Member, IEEE, and Dominique Thoreau
Abstract—Visual attention is a mechanism which filters out redundant visual information and detects the most relevant parts of our visual
field. Automatic determination of the most visually relevant areas would be useful in many applications such as image and video coding,
watermarking, video browsing, and quality assessment. Many research groups are currently investigating computational modeling of the
visual attention system. The first published computational models have been based on some basic and well-understood Human Visual
System (HVS) properties. These models feature a single perceptual layer that simulates only one aspect of the visual system. More recent
models integrate complex features of the HVS and simulate hierarchical perceptual representation of the visual input. The bottom-up
mechanism is the most occurring feature found in modern models. This mechanism refers to involuntary attention (i.e., salient spatial
visual features that effortlessly or involuntary attract our attention). This paper presents a coherent computational approach to the
modeling of the bottom-up visual attention. This model is mainly based on the current understanding of the HVS behavior. Contrast
sensitivity functions, perceptual decomposition, visual masking, and center-surround interactions are some of the features implemented
in this model. The performance of this algorithm is assessed using natural images and experimental measurements from an eye-
tracking system. Two well-known metrics (the correlation coefficient and the Kullback-Leibler divergence) are used to validate this
model. A further metric is also defined. The results from this model are finally compared to those from a reference bottom-up model.
Index Terms—Computationally modeled human vision, bottom-up visual attention, coherent modeling, eye tracking experiments.
1 INTRODUCTION
VISUAL attention is one of the most important features of
the human visual system. Rather than speaking about
the usefulness of visual attention, which seems obvious, it is
worth lingering over its description. The first attempt dates back
to 1890, when James [1] suggested that everyone knows what
attention is. It is the taking possession by the mind, in clear and
vivid form, of one out of what seem several simultaneously possible
objects or trains of thought. In other words, visual attention
serves as a mediating mechanism that involves competition
between different aspects of the visual scene and selects the
most relevant areas to the detriment of others.
Nevertheless, our environment presents far more per-
ceptual information than can be effectively processed. In
order to keep the essential visual information, humans have
developed a particular strategy, first outlined by James.
This strategy, confirmed during the last two decades,
involves two mechanisms. The first refers to the sensory
attention driven by environmental events, commonly called
bottom-up or stimulus-driven. The second one is the
voluntary attention to both external and internal stimuli,
commonly called top-down or goal-driven.
Most recent computational models of visual attention can
be placed in two categories. A recent trend concerns a
statistical signal-based approach [2] which consists of
automatically predicting salient regions of the visual scene
by directly using image statistics at the point of gaze. In fact,
several studies have recently reported [3], [4], [5] that the
human fixation regions present higher spatial contrast and
spatial entropy than random fixation regions. These studies
show that human eyes movements are not necessarily
random but rather driven by particular features. The second
category consists of models [6], [7], [8], [9], [10], [11] built
around two important concepts: the Feature Integration
Theory (FIT) from Treisman and Gelade [12] and a neurally
plausible architecture proposed by Koch and Ullman [13].
The FIT suggests that visual information is analyzed in
parallel from different maps. These maps are retinotopically
organized according to locations in our visual field. There is a
map for each early visual feature. From this theory, several
frameworks for simulating human visual attention have been
designed. The most interesting one has been proposed by
Koch and Ullman [13]. Their framework is based on the
concept of saliency map which is a two-dimensional topo-
graphic representation of conspicuity for every pixel in the
image. Fig. 1 illustrates the general structure of their model. It
mainly consists of early visual feature extraction, feature
map building, and feature map fusion.
In this paper, a new bottom-up model based on the FIT and
the plausible architecture proposed by Koch and Ullman [13]
is described. Its purpose is to automatically detect the most
relevant parts of a color picture displayed on a television
screen. The general philosophy of this approach is to design a
biologically-inspired algorithm that performs better than
. O. Le Meur and D. Thoreau are with the Video Compression Laboratory,
Thomson, 1 avenue Belle Fontaine-CS 17616, 35576 Cesson-Sévigné
Cedex, France. E-mail: {olivier.le-meur, dominique.thoreau}@thomson.net.
. P. Le Callet and D. Barba are with the Institut de Recherche en
Communications et Cybernétique de Nantes (IRCCyN) Laboratory, Ecole
Polytechnique de l'Université de Nantes, Rue Christian Pauc-BP 50609,
44306 Nantes Cedex 3, France.
E-mail: {patrick.lecallet, dominique.barba}@polytech.univ-nantes.fr.
Manuscript received 20 July 2004; revised 24 Aug. 2005; accepted 12 Sept.
2005; published online 13 Mar. 2006.
Recommended for acceptance by M. Srinivasan.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number TPAMI-0367-0704.
0162-8828/06/$20.00 © 2006 IEEE. Published by the IEEE Computer Society

conventional approaches. The proposed model is based on a
coherent psychovisual space from which a saliency map is
deduced. This space, well justified by psychophysical
experiments, is used to combine the visual features (intensity,
color, orientation, spatial frequencies, etc.) of the image, which are
normalized to their individual visibility thresholds. Accurate
nonlinear models simulating the behavior of visual cells are used
to calculate the visibility threshold associated with each value of
each component. From this coherent psychovisual space, a
new way of calculating a saliency map is proposed.
The paper is organized as follows: Section 2 gives insight
into the natural mechanisms that allow us to reduce the
amount of visual information. Experiments are conducted
to record and track real observers' eye movements with an
eye tracking apparatus. These experiments aim to build the
ground truth required to achieve a performance assessment
of the bottom-up model described here. These experiments are
presented in Section 3. The proposed coherent computa-
tional approach to model the bottom-up visual attention is
described in Section 4. In Section 5, the performance of this
model is evaluated, both qualitatively and quantitatively,
using relevant metrics. A particular saliency-based applica-
tion is then briefly described. Finally, the results are
summarized and some conclusions are drawn in Section 6.
2 THE NATURAL SELECTION OF THE VISUAL
INFORMATION
2.1 A Passive Selection
The HVS acts as a passive selector, acknowledging some stimuli
but rejecting others. The first information reduction occurs
in the retina, where the photoreceptors only process the
wavelengths of visible light. The neural signal is then
processed by ganglion cells, which are insensitive to uniform
illumination. This particular property is due to the spatial
organization of their receptive fields (RF). This fundamental
notion was first emphasized in the work of Hartline [33]: The
RF is defined as a particular region of the retina within which
an appropriate stimulation gives a relevant response. The RF
presents an antagonistic center-surround organization. The
center is roughly circular and is surrounded by an annulus. These
two regions provide an opposite response for the same
stimulation. This center-surround organization is responsi-
ble for our high sensitivity to contrast and to spatial
frequency, leading to the definition of the Contrast Sensitivity
Function (CSF).
The responses stemming from the retinal neurons are then
transmitted to the primary visual cortex. Hubel and Wiesel,
who received the Nobel Prize in Physiology or Medicine
in 1981, discovered that the RF structure of cortical cells
is considerably different from that of retinal
and lateral geniculate nucleus (LGN) cells. The RFs of retinal
and LGN cells have a circular structure with a center-
surround organization whereas the cortical cells present an
elongated RF and respond best to a particular orientation
and to a particular spatial frequency. In addition, recent
studies [15], [16], [17], [18], [19], [20] have shown that a
cortical cell's response can be influenced by stimuli outside
its classical RF. These contextual influences are mediated
by long-range connections linking cells with nonoverlap-
ping receptive fields. Studies by Kapadia et al. [19], [20]
show that a cell's response can be greatly enhanced by the
presentation of coaligned, co-oriented stimuli in its
neighborhood, and that it increases with the number of appropriate
stimuli placed outside the CRF. Generally speaking, the
contour and feature linking [21], [23], [43] and texture segmen-
tation [22] are assumed to be closely related to the long-
range connections.
2.2 An Active Selection
Human beings have a collection of passive mechanisms
lessening the amount of incoming visual information. For
instance, the signal stemming from the photoreceptors is
assumed to be compressed by a factor of about 130:1, before it
is transmitted to the visual cortex. Nevertheless, the visual
system is still faced with too much information. To deal with
the still overwhelming amount of input, an active selection,
involving eye movement, is required to allocate processing
resources to some parts of our visual field. Oculomotor
mechanisms involve different types of eye movements. A
saccade is a rapid eye movement that allows the gaze to jump from one
location to another. The purpose of this type of eye move-
ment, occurring up to three times per second, is to direct a
small part of our visual field into the fovea in order to achieve
a closer inspection. This last step corresponds to a fixation.
Saccades are therefore a major instrument of selective
visual attention. This active selection is assumed to be
controlled by two major mechanisms called bottom-up and
top-down control. The former, the bottom-up attentional
selection, is linked to involuntary attention. This mechanism
is fast, involuntary, and stimulus-driven. Our attention is
effortlessly drawn to salient parts in our visual field. These
salient parts consist of abrupt onsets [25] or local
singularities [12]. An image containing one green circle (called
target) located among a number of red circles (distractors) is a
classic example. The target is easily seen against the red
circles due to its local singularity (its local hue), no matter how
many distractors are present. The appearance of a new
perceptual object, whether consistent with the context of the
scene or not, can also attract our attention [24], [26]. Several studies
have shown that observers tend to make longer and more
frequent fixations on such objects [24].
The second control, top-down attentional selection,
refers to voluntary attention closely linked to the experience
Fig. 1. Framework proposed by Koch and Ullman. Early visual features
are extracted from the visual input into several separate parallel
channels. After this extraction and a particular treatment, a feature
map is obtained for each channel. Next, the saliency map is built by
fusing all these maps.

of the observers and to the task they have in mind.
Compared to the bottom-up attentional selection, the top-
down mechanism, voluntary and task-driven, is slower.
3 EYE TRACKING EXPERIMENTS
3.1 Apparatus and Procedure
In order to track and record real observers' eye movements,
experiments were conducted using an eye tracker from
Cambridge Research Corporation. This apparatus is
mounted on a rigid headrest for greater measurement
accuracy (less than 0.5 degree of error on the fixation point).
Experiments were conducted in normalized conditions
(ITU-R BT.500-10) at a viewing distance of four times the
TV monitor height. Ten natural color images with various
contents have been selected. The quality of these pictures was
then degraded using different techniques (spatial filtering,
JPEG and JPEG2000 coding, etc.). Forty-six pictures were finally
obtained. Every image was seen in random order by up to
40 observers for 15 seconds each in a task-free viewing mode.
The collected data correspond to regular time samples (every
20 ms) of the eye gaze position on the monitor.
3.2 Human Fixation Density Map Computation
A fixation map, which encodes the conspicuous locations, is
computed from the collected data. For a particular picture
and for each observer, the samples corresponding to
saccades are filtered out. A data point is removed if the
number of data points included in a square window is below a
given threshold. The size of the window and the threshold
are functions of the viewing distance, the accuracy of the eye
tracker (0.25 degrees of visual angle) and the resolution of
the display (800 × 600 pixels). In practice, the size of the
window and the threshold are, respectively, 9 × 9 (corre-
sponding to 0.25 degrees of visual angle) and 5 (correspond-
ing to the number of data points required in the previously
defined window).
All fixation patterns for a given picture are added together
providing a spatial distribution of human fixations (see
examples in Fig. 2). The resulting map is then smoothed
using a two-dimensional Gaussian filter. Its standard devia-
tion is determined according to the accuracy of the eye-
tracking apparatus. The result is a fixation density map [34],
which represents the observers' regions of interest (RoI). This
is often compared to a landscape map [35] consisting of peaks
and valleys (see examples in Fig. 2).
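
To make the procedure concrete, here is a short Python sketch of the two steps just described: removing saccade samples with the 9 × 9 window and threshold of 5 given above, then accumulating the surviving samples and smoothing with a 2D Gaussian. The function names, the smoothing standard deviation, and the assumption that the gaze data arrive as (x, y) pixel positions are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def filter_saccades(samples, win=9, min_count=5):
    """Keep only gaze samples belonging to fixations: a sample survives
    if at least `min_count` samples fall in a win x win pixel window
    centered on it (9 x 9 ~ 0.25 deg and threshold 5, as in the text)."""
    pts = np.asarray(samples, dtype=float)      # shape (N, 2): (x, y)
    half = win // 2
    kept = []
    for x, y in pts:
        inside = (np.abs(pts[:, 0] - x) <= half) & (np.abs(pts[:, 1] - y) <= half)
        if inside.sum() >= min_count:
            kept.append((x, y))
    return kept

def fixation_density_map(all_observers, width=800, height=600, sigma=4.0):
    """Accumulate the filtered fixations of every observer, then smooth
    with a 2D Gaussian; sigma (an assumed value) stands in for the
    eye tracker accuracy mentioned in the text."""
    fmap = np.zeros((height, width))
    for samples in all_observers:
        for x, y in filter_saccades(samples):
            xi = min(max(int(round(x)), 0), width - 1)   # clip to screen
            yi = min(max(int(round(y)), 0), height - 1)
            fmap[yi, xi] += 1.0
    density = gaussian_filter(fmap, sigma=sigma)
    return density / density.max()              # normalized to [0, 1]
```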
3.3 Conclusions from Empirical Data
3.3.1 Coverage
Coverage has been previously defined by Wooding [34] in
the following terms: the coverage is a measure of the amount of
the original stimulus covered by the fixations. The coverage
value is therefore given by the ratio between the number of
fixated pixels and the number of inspected pixels. A
threshold, called T , is required in order to decide whether
a pixel is fixated or not.
The coverage value is assessed on the human fixation
density maps for three threshold values (0.25, 0.5, and 0.75) and
for three viewing times (2 s, 8 s, and 14 s). Table 1 gives the
results for three pictures (Kayak (see Fig. 3), Rapids (second
row of the Fig. 2), and ChurchandCapitol).
As expected, the coverage value increases with increas-
ing viewing time and decreasing threshold T . Moreover, the
coverage value is highly dependent on the picture content:
for picture Kayak, the coverage is equal to 21 percent for a
viewing time of 14s and for a threshold of 0.75 whereas, in
the same condition, the coverage is about 40 percent for
picture ChurchandCapitol. It is worth noticing that only a
small area of the pictures (on average 36 percent across the
three thresholds T for 14 s of viewing time) has been fixated.
In fact, humans tend to fixate areas of interest rather than
scan the whole scene.
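
As a minimal sketch, the coverage measure reduces to a thresholded pixel count; here `density` is assumed to be a fixation density map normalized to [0, 1], such as the one built in the sketch of Section 3.2.

```python
def coverage(density, T=0.25):
    """Wooding's coverage: fraction of pixels whose normalized
    fixation density exceeds the threshold T."""
    return float((density > T).sum()) / density.size
```

With T = 0.75 and 14 s of viewing, this would return about 0.21 for a picture like Kayak and about 0.40 for ChurchandCapitol, per Table 1.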
3.3.2 Bias toward the Central Part of Pictures
Fig. 2 shows the spatial distribution and the density of human
fixations. These results are coherent with a well-known
property of the human visual strategy. Observers have a
general tendency to stare at the central locations of the screen.
This tendency is not reduced with the viewing time: It can be
Fig. 2. (a) The original picture, (b) the spatial distribution of human fixations for 14 s of viewing time, (c) fixation density map obtained by convolving
the spatial distribution with a 2D Gaussian filter, and (d) highlighted human RoI (Regions of Interest) obtained by redrawing the original picture,
leaving the nonfixated areas in darkness.

shown that observers continue to focus on these areas rather
than scan the whole picture. There are at least two plausible
explanations: The nonuniform distribution of photoreceptors
is a biological candidate. However, it seems more logical to
tackle this question by introducing a top-down or higher-
level explanation as proposed by Parkhurst et al. [14]. The
great majority of visually important information is tradition-
ally located in the central part of the picture frame.
Consequently, observers unconsciously tend to select central
locations in order to catch the potentially most important
visual information.
4 THE PROPOSED COMPUTATIONAL MODEL
The model proposed in this paper is based on the architecture
of Koch and Ullman. The model designed by Itti et al. [7] was
one of the first to take advantage of such an architecture. It has
been chosen as the benchmark for the model presented here
and is therefore briefly described hereafter.
The first step of Itti et al.’s model consists of the extraction
of early visual features. The visual input is broken down into
three separate feature channels (color, intensity, and orienta-
tion). Each channel is obtained from Gaussian pyramids as in
[32]. This allows the computation of different spatial scales by
progressively applying a low-pass filter and subsampling the
visual features. In order to take into account the organization
of the visual cells, a center-surround mechanism based on a
Difference of Gaussians (DoG) is applied at each scale. The
resulting maps are then linearly summed across feature
channels to form the saliency map.
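
For illustration, a rough Python sketch of this pyramid-plus-center-surround structure follows. It is indicative only: the actual model of Itti et al. [7] uses several center-surround scale pairs per channel and a dedicated normalization operator, both omitted here, and every name below is ours.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(channel, levels=5):
    """Progressively low-pass filter and subsample, as in [32]."""
    pyr = [np.asarray(channel, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr

def center_surround(pyr, center=1, surround=3):
    """DoG-like response: difference between a fine ('center') level
    and a coarse ('surround') level upsampled to the center's size."""
    c, s = pyr[center], pyr[surround]
    s_up = zoom(s, (c.shape[0] / s.shape[0], c.shape[1] / s.shape[1]), order=1)
    return np.abs(c - s_up)

def itti_like_saliency(channels):
    """Average the center-surround maps of the feature channels
    (color, intensity, orientation) into a single saliency map."""
    maps = [center_surround(gaussian_pyramid(ch)) for ch in channels]
    return sum(maps) / len(maps)
```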
Although this model provides good results on several
types of picture, it contains arbitrary steps that are difficult
to justify with respect to the HVS:
. several normalization steps are applied before and
after the fusion step,
. each channel is normalized independently to a
common scale in order to be independent of the
feature extraction mechanisms, and
. there are strong links between visual sensitivity and
viewing distance; however, these links have been
overlooked.
The proposed computational bottom-up model has been
developed bearing numerous properties of human visual
cells in mind. Three aspects of the vision process are
sequentially tackled, namely, the visibility, the perception,
and the perceptual grouping. The complete flow chart is
shown in Fig. 3 and described in the following sections.
4.1 Visibility Process
The visibility process simulates the limited sensitivity of the
HVS. Despite the seemingly complex mechanisms under-
lying human vision, the visual system is not able to
perceive all the information present in the visual field with the
same accuracy. A coherent normalization is first used to
scale all the visual data. A value of 1 represents a feature
which is just noticeable. All the normalized data is grouped
into a psychovisual space. This space is built from the
following set of basic mechanisms, all identified and
validated through psychophysical experiments.
4.1.1 Transformation of the RGB Luminance into
Krauskopf's Color Space
There are two different types of photoreceptors in the retina:
cones and rods. As TV displays operate at luminance levels
that do not correspond to scotopic conditions (low light levels),
rods can be neglected. Cones form the basis of color perception
and work at photopic conditions. Cones are of three types:
L-cones, M-cones, and S-cones which are sensitive to long,
medium, and short wavelengths, respectively. They are
mainly located in the central part of the retina, called fovea,
which is 2 degrees in diameter. Both psychological and
physiological experiments give evidence for the theory of an
early transformation in the HVS of the L, M, and S signals
issued from cone absorption. This transformation provides
an opponent-color space in which the signals are less
correlated. The principal components of the opponent-color
space are black-white (B-W), red-green (R-G), and blue-
yellow (B-Y). There is a variety of opponent-color spaces
which differ in the way they combine the different cone
responses. The color space proposed by Krauskopf was
validated through psychophysical experiments. These experi-
ments are based on the interaction between a color masking
signal and a color stimulus signal in terms of the differential
visibility threshold¹ (DVT) of the stimulus. The color
orientations of the masking and stimulus signals, respectively,
for which the DVT value is minimum are determined. These
experiments were carried out with both still and time-varying
stimuli. The color space is given by relation (1):
$$
\begin{pmatrix} A \\ Cr_1 \\ Cr_2 \end{pmatrix}
=
\begin{pmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0.5 & 0.5 & -1 \end{pmatrix}
\begin{pmatrix} L \\ M \\ S \end{pmatrix}
\qquad (1)
$$
1. The differential visibility threshold of a stimulus superimposed on a
background (masking signal) is defined as the magnitude required for the
stimulus to be just noticeable.
TABLE 1
Coverage Evolution as a Function of Viewing Time
and Picture Content

A is a pure achromatic perceptual signal, whereas $Cr_1$ and
$Cr_2$ are pure chromatic perceptual signals.
During these experiments, the adaptation effects through a
mechanism of "desensibilization" [16] were taken into account.
While Krauskopf used only a temporal "desensibilization"
mechanism, a spatial "desensibilization" mechanism was used
here. Both methods produced the same result.
4.1.2 Early Visual Features Extraction
It was previously mentioned that visual cells can be
characterized by a radial spatial frequency and by orienta-
tion. It could therefore be interesting to group visual cells
sharing similar properties. The early visual feature extrac-
tion, performed by a perceptual channel decomposition,
consists of splitting the 2D spatial frequency domain both in
spatial radial frequency and in orientation. This decomposi-
tion is applied to each of the three perceptual components.
Psychophysical experiments [17] show that psychovisual
spatial frequency partitioning for the achromatic component
leads to 17 psychovisual channels in standard TV viewing
conditions, while only five channels are obtained for each
chromatic component (see Fig. 3). Each resulting subband
or channel may be regarded as the neural image correspond-
ing to a population of visual cells tuned to a range of spatial
frequency and to a particular orientation.
The achromatic subbands are distributed over four
crowns, denoted I, II, III, and IV (see Fig. 3). Chromatic
subbands are distributed over two crowns, denoted I and II. The
main properties of these decompositions and the main
differences from a similar transform, called the cortex
transform [27], are a nondyadic radial selectivity and an
orientation selectivity that increases with radial frequency
(except for the chromatic components).
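
The sketch below illustrates, under assumed parameter values, how such a decomposition maps a spatial-frequency coordinate onto a (crown, orientation) channel. The crown boundaries and per-crown orientation counts are placeholders chosen so that they sum to the 17 achromatic channels mentioned above; the exact partition of [17] is not reproduced.

```python
import numpy as np

CROWN_EDGES = [1.5, 5.7, 14.2, 28.2]   # upper radial limits, cycles/degree (assumed)
ORIENTATIONS = [1, 4, 6, 6]            # sectors per crown: 1 + 4 + 6 + 6 = 17

def channel_index(fx, fy):
    """Map a 2D spatial-frequency coordinate (cycles/degree) to a
    (crown, orientation-sector) pair, or None above the last crown."""
    radial = np.hypot(fx, fy)
    theta = np.arctan2(fy, fx) % np.pi          # orientation in [0, pi)
    for crown, edge in enumerate(CROWN_EDGES):
        if radial <= edge:
            n = ORIENTATIONS[crown]
            sector = min(int(theta / (np.pi / n)), n - 1)
            return crown, sector
    return None
```

Note how the orientation selectivity grows with radial frequency, as the text describes: the lowest crown is isotropic (one sector), while the higher crowns are split into four and then six orientation sectors.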
4.1.3 Contrast Sensitivity Functions
Contrast sensitivity functions (CSF) have been widely used to
measure the visibility of natural image components. In fact,
these components can be described by a set of Fourier functions
and their amplitudes. The visibility of a specific component can
be assessed by applying a CSF in the frequency domain. When
the amplitude of a frequency component is greater than a
threshold $CT_0$, the frequency component is perceptible. This
threshold is called the visibility threshold, and its inverse
defines the value of the CSF at this spatial frequency.
Fig. 3. Flow chart of the proposed computational model of bottom-up visual selective attention. It presents three aspects of vision: visibility,
perception, and perceptual grouping. The visibility part, also called the psychovisual space, simulates the limited sensitivity of the human eyes and
takes into account the major properties of the retinal cells. The perception part is used to suppress redundant visual information by simulating the
behavior of cortical cells. Finally, the non-CRF interactions and the saliency map building are achieved by the perceptual grouping.

REFERENCES
[1] W. James, The Principles of Psychology. New York: Henry Holt, 1890.
[7] L. Itti, C. Koch, and E. Niebur, "A Model of Saliency-Based Visual Attention for Rapid Scene Analysis," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, Nov. 1998.
[12] A.M. Treisman and G. Gelade, "A Feature-Integration Theory of Attention," Cognitive Psychology, vol. 12, pp. 97-136, 1980.
[13] C. Koch and S. Ullman, "Shifts in Selective Visual Attention: Towards the Underlying Neural Circuitry," Human Neurobiology, vol. 4, pp. 219-227, 1985.
[32] P.J. Burt and E.H. Adelson, "The Laplacian Pyramid as a Compact Image Code," IEEE Trans. Communications, vol. 31, no. 4, pp. 532-540, Apr. 1983.