Foveated Image and Video Coding
Zhou Wang and Alan C. Bovik
The human visual system (HVS) is highly space-variant in sampling, coding,
processing, and understanding of visual information. The visual sensitivity is high-
est at the point of fixation and decreases dramatically with distance from the point
of fixation. By taking advantage of this phenomenon, foveated image and video
coding systems achieve increased compression efficiency by removing considerable
high-frequency information redundancy from the regions away from the fixation
point without significant loss in perceived image or video quality.
This chapter has three major purposes. The first is to introduce the back-
ground of the foveation feature of the HVS that motivates the research effort of
foveated image processing. The second is to review various foveation techniques
that have been used to construct image and video coding systems. The third is
to describe in more detail a specific example of such a system, which delivers rate
scalable codestreams ordered according to foveation-based perceptual importance,
and has a wide range of potential applications such as video communications over
heterogeneous, time-varying, multi-user and interactive networks.
1.1 Foveated Human Vision and Foveated Image
Processing
Let us start by looking at the anatomy of the human eye. A simplified structure
is illustrated in Figure 1.1. The light that passes through the optics of the eye is
projected onto the retina and sampled by the photoreceptors in the retina. The
retina has two major types of photoreceptors, known as cones and rods.

Chapter 14 in Digital Video Image Quality and Perceptual Coding (H. R. Wu and K. R. Rao, eds.), Marcel Dekker Series in Signal Processing and Communications, Nov. 2005.

Figure 1.1: Structure of the human eye (showing the cornea, pupil, lens, retina, fovea, and optic nerve).

The rods support achromatic vision at low illumination levels and the cone receptors are
responsible for daylight vision. The cones and rods are non-uniformly distributed
over the surface of the retina [1, 2]. The region of highest visual acuity is the fovea,
which contains no rods but has the highest concentration of cones, approximately 50,000 in total [2]. Figure 1.2 shows the variation of photoreceptor density with retinal eccentricity, which is defined as the visual angle (in degrees) between the
fovea and the location of the photoreceptor. The density of the cone cells is highest
at zero eccentricity (the fovea) and drops rapidly with increasing eccentricity. The
photoreceptors deliver data to the plexiform layers of the retina, which provide
both direct and inter-connections from the photoreceptors to the ganglion cells.
The distribution of ganglion cells is also highly non-uniform as shown in Figure
1.2. The density of the ganglion cells drops even faster than the density of the
cone receptors. The receptive fields of the ganglion cells also vary with eccentricity
[1, 2].
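The falloff of sampling density with eccentricity is often summarized by a simple inverse-linear model. The sketch below is an illustrative approximation only: the function name, the model form, and the half-resolution eccentricity of 2.3 degrees are assumptions for the sake of the example, not the measured curves of Figure 1.2.

```python
import numpy as np

def relative_density(eccentricity_deg, e2=2.3):
    """Relative sampling density versus retinal eccentricity.

    Inverse-linear approximation d(e) = e2 / (e2 + e), where e2 (the
    "half-resolution eccentricity", assumed to be 2.3 degrees here) is
    the eccentricity at which density falls to half its foveal value.
    """
    e = np.asarray(eccentricity_deg, dtype=float)
    return e2 / (e2 + e)

# Density is 1 at the fovea (e = 0) and halves at e = e2:
print(float(relative_density(0.0)))   # 1.0
print(float(relative_density(2.3)))   # 0.5
```

Such a model captures the qualitative behavior of Figure 1.2: a sharp peak at zero eccentricity followed by a rapid, monotonic decline.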
The density distributions of cone receptors and ganglion cells play important
roles in determining the ability of our eyes to resolve what we see. When a
human observer gazes at a point in a real-world image, a variable resolution image
is transmitted through the front visual channel into the information processing
units in the human brain. The region around the point of fixation (or foveation
point) is projected onto the fovea, sampled with the highest density, and perceived
by the observer with the highest contrast sensitivity. The sampling density and the
contrast sensitivity decrease dramatically with increasing eccentricity. An example
is shown in Figure 1.3, where Figure 1.3(a) is the original “Goldhill” image and
Figure 1.3(b) is a foveated version of that image. At a certain viewing distance, if attention is focused on the man in the lower part of the image, the foveated and original images are almost indistinguishable.
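Producing a foveated image such as Figure 1.3(b) starts from a per-pixel eccentricity map for the chosen fixation point. The sketch below assumes a simplified flat-screen geometry with the viewing distance expressed in image widths; the function name and the default distance of 3 image widths are illustrative assumptions, not calibrated display parameters.

```python
import numpy as np

def eccentricity_map(width, height, fix_x, fix_y, view_dist=3.0):
    """Per-pixel retinal eccentricity (in degrees) for a fixation point.

    view_dist is the viewing distance measured in image widths, so a
    distance of d pixels from fixation subtends roughly
    atan(d / (width * view_dist)) of visual angle.
    """
    xs = np.arange(width) - fix_x                       # horizontal offsets
    ys = np.arange(height) - fix_y                      # vertical offsets
    d = np.hypot(xs[np.newaxis, :], ys[:, np.newaxis])  # distance in pixels
    return np.degrees(np.arctan(d / (width * view_dist)))

# Fixation on the lower part of a 512x512 image:
ecc = eccentricity_map(512, 512, fix_x=256, fix_y=400)
print(float(ecc[400, 256]))  # 0.0 (zero eccentricity at the fixation point)
```

Combined with a sensitivity model like the one above, such a map determines how much resolution each region of the image can lose without visible effect at the assumed viewing distance.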
Despite the highly space-variant sampling and processing features of the HVS,
traditional digital image processing and computer vision systems represent images
on uniformly sampled rectangular lattices, which have the advantages of simple
acquisition, storage, indexing, and computation.

Figure 1.2: Photoreceptor and ganglion cell density versus retinal eccentricity (from [1]).

Nowadays, most digital images and video sequences are stored, processed, transmitted, and displayed in rectangular matrix format, in which each entry represents one sampling point. In recent
years, there has been growing interest in research work on foveated image process-
ing [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46], which
is targeted at a number of application fields. Significant examples include image
quality assessment [33, 38], image segmentation [24], stereo 3D scene perception
[22], volume data visualization [9], object tracking [25], and image watermarking
[42]. Nevertheless, the majority of research has been focused on foveated image and
video coding, communication and related issues. The major motivation is that con-
siderable high frequency information redundancy exists in the peripheral regions,
so that more efficient image compression can be obtained by removing or reducing
such information redundancy. As a result, the bandwidth required to transmit the
image and video information over communication channels is significantly reduced.
Foveation techniques also supply some additional benefits in visual communica-
tions. For example, in noisy communication environments, foveation provides a
natural way for unequal error-protection of different spatial regions in the image
and video streams being transmitted. Such an error-resilient coding scheme has
been shown to be more robust than protecting all the image regions equally [27, 46]. For
another example, in an interactive multi-point communication environment where
information about the foveated regions at the terminals of the communication net-
works is available, higher perceptual quality images can be achieved by applying
foveated coding techniques [40].
Figure 1.3: Sample foveated image. (a) original “Goldhill” image; (b) foveated “Goldhill” image.

Perfect foveation of discretely-sampled images with smoothly varying resolution turns out to be a difficult theoretical as well as implementation problem. In the
next section, we review various practical foveation techniques that approximate
perfect foveation. Section 1.3 discusses a continuously rate-scalable foveated image
and video coding system that has a number of good features in favor of network
visual communications.
1.2 Foveation Methods
The foveation approaches proposed in the literature may be roughly classified into three categories: geometric methods, filtering-based methods, and multiresolution methods. These categories are closely related, and the third may be viewed as a combination of the first two.
1.2.1 Geometric Methods
The general idea of the geometric methods is to make use of the foveated retinal
sampling geometry. We wish to associate such a highly non-uniform sampling ge-
ometry with a spatially-adaptive coordinate transform, which we call the foveation
coordinate transform. When the transform is applied to the non-uniform retinal
sampling points, uniform sampling density is obtained in the new coordinate sys-
tem. A typically used solution is the logmap transform [13] defined as
w = log(z + a) , (1.1)
where a is a constant, and z and w are complex numbers representing the positions in the original coordinate and the transformed coordinate, respectively. While the logmap transform is empirical, it is shown in [34] that precise mathematical solutions of the foveation coordinate transforms may be derived directly from given retinal sampling distributions.

Figure 1.4: Application of foveation coordinate transform to images. (a) original image; (b) transformed image.
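A minimal sketch of the logmap transform of Eq. (1.1) applied to pixel offsets is given below; the choice a = 1 and the use of raw pixel coordinates in place of calibrated visual angles are illustrative assumptions.

```python
import numpy as np

def logmap(x, y, a=1.0):
    """Logmap foveation coordinate transform w = log(z + a) of Eq. (1.1).

    x and y are pixel offsets from the fixation point (so the fovea sits
    at the origin of the z-plane); a = 1.0 is an illustrative choice.
    """
    z = x + 1j * y
    w = np.log(z + a)
    return w.real, w.imag

# Equal steps in the image plane shrink with eccentricity after the logmap:
xs = np.array([0.0, 10.0, 100.0, 110.0])
u, _ = logmap(xs, np.zeros_like(xs))
print(u[1] - u[0] > u[3] - u[2])  # True: foveal spacing is expanded
```

This compression of peripheral distances is what yields uniform sampling density in the transformed coordinate system.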
The foveated retinal sampling geometry can be used in different ways. The
first method is to apply the foveation coordinate transform directly to a uniform
resolution image, thus the underlying image space is mapped onto the new coor-
dinate system as exemplified by Figure 1.4. In the transform domain, the image
is treated as a uniform resolution image, and regular uniform-resolution image
processing techniques, such as linear and non-linear filtering and compression, are
applied. Finally, the inverse coordinate transform is employed to obtain a “foveatedly” processed image. The difficulty with this method is that image pixels originally located on an integer grid are moved to non-integer positions, making them difficult to index. Interpolation and resampling procedures must be applied
in both the transform and the inverse transform domains. These procedures not
only significantly complicate the system, but may also cause further distortions.
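As a toy instance of this first method, the sketch below quantizes coordinates in the logmap domain and resamples by nearest neighbor, so that peripheral pixels collapse onto shared source positions while foveal pixels map almost to themselves. The constants a and step, and the use of nearest-neighbor resampling, are illustrative choices rather than any published scheme.

```python
import numpy as np

def logmap_foveate(img, fix_x, fix_y, a=4.5, step=0.05):
    """Foveate an image by quantizing coordinates in the logmap domain.

    Each pixel position z (relative to the fixation point) is mapped to
    w = log(z + a), w is rounded to a grid of spacing `step`, and the
    inverse map exp(w) - a selects the nearest source pixel.  Near the
    fixation point the round trip is almost the identity; far from it,
    many output pixels share one source, mimicking resolution falloff.
    """
    h, w_ = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w_]
    z = (xs - fix_x) + 1j * (ys - fix_y)
    wq = np.round(np.log(z + a) / step) * step   # quantize in logmap domain
    zz = np.exp(wq) - a                          # inverse transform
    sx = np.clip(np.rint(zz.real) + fix_x, 0, w_ - 1).astype(int)
    sy = np.clip(np.rint(zz.imag) + fix_y, 0, h - 1).astype(int)
    return img[sy, sx]

img = np.arange(64 * 64, dtype=float).reshape(64, 64)
out = logmap_foveate(img, fix_x=32, fix_y=32)
```

Even this crude sketch exhibits the distortions discussed above: the rounding to integer source positions is exactly the interpolation/resampling step that complicates real systems.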
The second approach is the superpixel method [13, 16, 6, 15, 14], in which
local image pixel groups are averaged and mapped into superpixels, whose sizes
are determined by the retinal sampling density. Figure 1.5 shows a sophisticated
superpixel look-up table given in [13], which attempts to adhere to the logmap structure. However, the number and variety of superpixel shapes make the scheme inconvenient to manipulate. In [16], a more practical superpixel method is used, where
all the superpixels have rectangular shapes. In [14], a multistage superpixel ap-