
Foveated Image and Video Coding
Zhou Wang and Alan C. Bovik

Chapter 14 in Digital Video Image Quality and Perceptual Coding (H. R. Wu and K. R. Rao, eds.), Marcel Dekker Series in Signal Processing and Communications, Nov. 2005.
The human visual system (HVS) is highly space-variant in sampling, coding,
processing, and understanding of visual information. The visual sensitivity is high-
est at the point of fixation and decreases dramatically with distance from the point
of fixation. By taking advantage of this phenomenon, foveated image and video
coding systems achieve increased compression efficiency by removing considerable
high-frequency information redundancy from the regions away from the fixation
point without significant loss of the reconstructed image or video quality.
This chapter has three major purposes. The first is to introduce the background
of the foveation feature of the HVS that motivates research on foveated image
processing. The second is to review various foveation techniques that have been
used to construct image and video coding systems. The third is to describe in more
detail a specific example of such a system, which delivers rate-scalable codestreams
ordered according to foveation-based perceptual importance and has a wide range
of potential applications, such as video communication over heterogeneous,
time-varying, multi-user, and interactive networks.
1.1 Foveated Human Vision and Foveated Image Processing
Let us start by looking at the anatomy of the human eye. A simplified structure
is illustrated in Figure 1.1. The light that passes through the optics of the eye is
projected onto the retina and sampled by the photoreceptors in the retina. The
retina has two major types of photoreceptors, known as cones and rods.

Figure 1.1: Structure of the human eye (cornea, pupil, lens, retina, fovea, optic nerve).

The rods
support achromatic vision under low illumination, while the cones are responsible
for daylight vision. The cones and rods are non-uniformly distributed over the
surface of the retina [1, 2]. The region of highest visual acuity is the fovea, which
contains no rods but has the highest concentration of cones (approximately 50,000)
[2]. Figure 1.2 shows the variation of photoreceptor density with retinal
eccentricity, which is defined as the visual angle (in degrees) between the
fovea and the location of the photoreceptor. The density of the cone cells is highest
at zero eccentricity (the fovea) and drops rapidly with increasing eccentricity. The
photoreceptors deliver data to the plexiform layers of the retina, which provide
both direct connections and interconnections from the photoreceptors to the ganglion cells.
The distribution of ganglion cells is also highly non-uniform as shown in Figure
1.2. The density of the ganglion cells drops even faster than the density of the
cone receptors. The receptive fields of the ganglion cells also vary with eccentricity
[1, 2].
The density distributions of the cone receptors and ganglion cells play important
roles in determining the ability of our eyes to resolve what we see. When a
human observer gazes at a point in a real-world image, a variable resolution image
is transmitted through the front visual channel into the information processing
units in the human brain. The region around the point of fixation (or foveation
point) is projected onto the fovea, sampled with the highest density, and perceived
by the observer with the highest contrast sensitivity. The sampling density and the
contrast sensitivity decrease dramatically with increasing eccentricity. An example
is shown in Figure 1.3, where Figure 1.3(a) is the original “Goldhill” image and
Figure 1.3(b) is a foveated version of that image. At a certain viewing distance, if
attention is focused on the man in the lower part of the image, then the foveated
and the original images are almost indistinguishable.
Despite the highly space-variant sampling and processing features of the HVS,
traditional digital image processing and computer vision systems represent images
on uniformly sampled rectangular lattices, which have the advantages of simple
acquisition, storage, indexing and computation. Nowadays, most digital images
and video sequences are stored, processed, transmitted and displayed in rectangular
matrix format, in which each entry represents one sampling point.

Figure 1.2: Photoreceptor and ganglion cell density versus retinal eccentricity (from [1]).

In recent
years, there has been growing interest in research work on foveated image process-
ing [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46], which
is targeted at a number of application fields. Significant examples include image
quality assessment [33, 38], image segmentation [24], stereo 3D scene perception
[22], volume data visualization [9], object tracking [25], and image watermarking
[42]. Nevertheless, the majority of research has been focused on foveated image and
video coding, communication and related issues. The major motivation is that con-
siderable high-frequency information redundancy exists in the peripheral regions,
so more efficient image compression can be obtained by removing or reducing
such information redundancy. As a result, the bandwidth required to transmit the
image and video information over communication channels is significantly reduced.
Foveation techniques also supply some additional benefits in visual communica-
tions. For example, in noisy communication environments, foveation provides a
natural way for unequal error-protection of different spatial regions in the image
and video streams being transmitted. Such an error-resilient coding scheme has
been shown to be more robust than protecting all image regions equally [27, 46]. As
another example, in an interactive multi-point communication environment where
information about the foveated regions at the terminals of the communication net-
works is available, higher perceptual quality images can be achieved by applying
foveated coding techniques [40].
Figure 1.3: Sample foveated image. (a) Original “Goldhill” image; (b) foveated “Goldhill” image.

Perfect foveation of discretely-sampled images with smoothly varying resolution
turns out to be a difficult theoretical as well as implementation problem. In the
next section, we review various practical foveation techniques that approximate
perfect foveation. Section 1.3 discusses a continuously rate-scalable foveated image
and video coding system that has a number of features well suited to networked
visual communication.
1.2 Foveation Methods
The foveation approaches proposed in the literature may be roughly classified into
three categories: geometric methods, filtering-based methods, and multiresolution
methods. These categories are closely related, and the third may be viewed as a
combination of the first two.
1.2.1 Geometric Methods
The general idea of the geometric methods is to make use of the foveated retinal
sampling geometry. We wish to associate such a highly non-uniform sampling ge-
ometry with a spatially-adaptive coordinate transform, which we call the foveation
coordinate transform. When the transform is applied to the non-uniform retinal
sampling points, uniform sampling density is obtained in the new coordinate sys-
tem. A commonly used solution is the logmap transform [13], defined as

    w = log(z + a),                                                  (1.1)

where a is a constant, and z and w are complex numbers representing positions in
the original and transformed coordinate systems, respectively.

Figure 1.4: Application of the foveation coordinate transform to images. (a) Original image; (b) transformed image.

While
the logmap transform is empirical, it is shown in [34] that precise mathematical
solutions of the foveation coordinate transforms may be derived directly from given
retinal sampling distributions.
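As a concrete illustration of Eq. (1.1), the sketch below applies the logmap transform to pixel coordinates expressed as complex numbers relative to a fixation point. The fixation point, the constant a, and the grid size are illustrative assumptions, not values taken from the chapter.

```python
import numpy as np

def logmap(x, y, x0, y0, a=1.0):
    """Foveation coordinate transform w = log(z + a) of Eq. (1.1).

    (x, y): pixel coordinates; (x0, y0): assumed fixation point.
    The constant a (here 1.0) and the fixation point are illustrative choices.
    """
    z = (x - x0) + 1j * (y - y0)   # position as a complex number relative to fixation
    w = np.log(z + a)              # complex logarithm
    return w.real, w.imag          # transformed coordinates

# Example: map an 8x8 grid of pixel positions fixated near its center.
xs, ys = np.meshgrid(np.arange(8.0), np.arange(8.0))
u, v = logmap(xs, ys, x0=3.5, y0=3.5)
```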
The foveated retinal sampling geometry can be used in different ways. The
first method is to apply the foveation coordinate transform directly to a uniform
resolution image, thus the underlying image space is mapped onto the new coor-
dinate system as exemplified by Figure 1.4. In the transform domain, the image
is treated as a uniform resolution image, and regular uniform-resolution image
processing techniques, such as linear and non-linear filtering and compression, are
applied. Finally, the inverse coordinate transform is employed to obtain a “foveat-
edly” processed image. The difficulty with this method is that image pixels
originally located on the integer grid are moved to non-integer positions, making
them difficult to index. Interpolation and resampling procedures have to be applied
in both the transform and the inverse transform domains. These procedures not
only significantly complicate the system, but may also cause further distortions.
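A minimal sketch of this first approach is given below, assuming a grayscale image stored as a NumPy array: each sample of the logmap-domain image is pulled from the source image by inverting Eq. (1.1) and interpolating at the resulting non-integer position. The output grid, its extent, and the use of bilinear interpolation are assumptions made for illustration, not the chapter's prescription.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def logmap_warp(image, x0, y0, a=1.0, out_shape=(128, 256)):
    """Resample a grayscale image onto a uniform grid in the logmap domain.

    For each output sample w = u + jv we invert Eq. (1.1), z = exp(w) - a,
    to find the (generally non-integer) source position, then interpolate.
    The output grid extents and bilinear interpolation are illustrative choices.
    """
    H, W = image.shape
    r_max = np.hypot(max(x0, W - x0), max(y0, H - y0))            # farthest corner distance
    u = np.linspace(np.log(a), np.log(r_max + a), out_shape[1])   # log-radius axis
    v = np.linspace(-np.pi, np.pi, out_shape[0], endpoint=False)  # angular axis
    uu, vv = np.meshgrid(u, v)
    z = np.exp(uu + 1j * vv) - a                                  # invert w = log(z + a)
    cols = np.clip(z.real + x0, 0, W - 1)                         # non-integer source x
    rows = np.clip(z.imag + y0, 0, H - 1)                         # non-integer source y
    return map_coordinates(image, [rows, cols], order=1, mode='nearest')
```

Processing would then operate on the warped array, and a second interpolation step, inverting this mapping, would bring the result back to the original pixel grid; these are the extra resampling stages the text warns about.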
The second approach is the superpixel method [13, 16, 6, 15, 14], in which
local image pixel groups are averaged and mapped into superpixels, whose sizes
are determined by the retinal sampling density. Figure 1.5 shows a sophisticated
superpixel look-up table given in [13], which attempts to adhere to the logmap
structure. However, the number and variety of superpixel shapes make it incon-
venient to manipulate. In [16], a more practical superpixel method is used, where
all the superpixels have rectangular shapes. In [14], a multistage superpixel ap-
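To make the rectangular variant concrete, the sketch below partitions a grayscale image into disjoint square superpixels whose side length grows with eccentricity and replaces each superpixel by its average. The power-of-two size schedule, the tile size, and the growth constant are illustrative assumptions rather than the exact scheme of [16].

```python
import numpy as np

def superpixel_foveate(image, x0, y0, max_block=8, scale=0.03):
    """Rectangular-superpixel foveation sketch (in the spirit of [16]).

    The image is split into max_block x max_block tiles; each tile is subdivided
    into disjoint square superpixels whose side length (a power of two, up to
    max_block) grows with the eccentricity of the tile center, and each
    superpixel is replaced by its mean value. All constants are illustrative.
    """
    H, W = image.shape
    out = image.astype(float).copy()
    for r0 in range(0, H, max_block):
        for c0 in range(0, W, max_block):
            # Superpixel size for this tile, growing with eccentricity of the tile center.
            ecc = np.hypot(c0 + max_block / 2 - x0, r0 + max_block / 2 - y0)
            level = int(np.clip(int(np.log2(1 + scale * ecc)), 0, int(np.log2(max_block))))
            size = 2 ** level
            # Average each disjoint size x size superpixel inside the tile.
            for r in range(r0, min(r0 + max_block, H), size):
                for c in range(c0, min(c0 + max_block, W), size):
                    block = out[r:r + size, c:c + size]
                    block[...] = block.mean()
    return out
```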
