Open Access Journal Article

Down-scaling for better transform compression

TLDR
It is shown how down-sampling an image to a low resolution, then using JPEG at the lower resolution, and subsequently interpolating the result to the original resolution can improve the overall PSNR performance of the compression process.
Abstract
The most popular lossy image compression method used on the Internet is the JPEG standard. JPEG's good compression performance and low computational and memory complexity make it an attractive method for natural image compression. Nevertheless, as we go to low bit rates that imply lower quality, JPEG introduces disturbing artifacts. It is known that, at low bit rates, a down-sampled image, when JPEG compressed, visually beats the high resolution image compressed via JPEG to be represented by the same number of bits. Motivated by this idea, we show how down-sampling an image to a low resolution, then using JPEG at the lower resolution, and subsequently interpolating the result to the original resolution can improve the overall PSNR performance of the compression process. We give an analytical model and a numerical analysis of the down-sampling, compression and up-sampling process, that makes explicit the possible quality/compression trade-offs. We show that the image auto-correlation can provide a good estimate for establishing the down-sampling factor that achieves optimal performance. Given a specific budget of bits, we determine the down-sampling factor necessary to get the best possible recovered image in terms of PSNR.


Down-Scaling for Better Transform Compression
Alfred M. Bruckstein, Michael Elad, and Ron Kimmel
Abstract—The most popular lossy image compression method
used on the Internet is the JPEG standard. JPEG’s good compres-
sion performance and low computational and memory complexity
make it an attractive method for natural image compression.
Nevertheless, as we go to low bit rates that imply lower quality,
JPEG introduces disturbing artifacts. It is known that at low bit
rates a down-sampled image when JPEG compressed visually
beats the high resolution image compressed via JPEG to be
represented with the same number of bits. Motivated by this
idea, we show how down-sampling an image to a low resolution,
then using JPEG at the lower resolution, and subsequently
interpolating the result to the original resolution can improve the
overall PSNR performance of the compression process. We give an
analytical model and a numerical analysis of the down-sampling,
compression and up-sampling process, that makes explicit the
possible quality/compression trade-offs. We show that the image
auto-correlation can provide a good estimate for establishing the
down-sampling factor that achieves optimal performance. Given
a specific budget of bits, we determine the down-sampling factor
necessary to get the best possible recovered image in terms of
PSNR.
Index Terms—Bit allocation, image down-sampling, JPEG com-
pression, quantization.
I. INTRODUCTION
THE most popular lossy image compression method used
on the Internet is the JPEG standard [1]. Fig. 1 presents a
basic block diagram of the JPEG encoder. JPEG uses the Dis-
crete Cosine Transform (DCT) on image blocks of size 8 × 8
pixels. The fact that JPEG operates on small blocks is moti-
vated by both computational/memory considerations and the
need to account for the nonstationarity of the image. A quality
measure determines the (uniform) quantization steps for each
of the 64 DCT coefficients. The quantized coefficients of each
block are then zigzag-scanned into one vector that goes through
a run-length coding of the zero sequences, thereby clustering
long runs of insignificant, low-energy coefficients into short and
compact descriptors. Finally, the run-length sequence is fed to an
entropy coder, which can be a Huffman coding algorithm with either
a known dictionary or a dictionary extracted from the specific
statistics of the given image. A different alternative supported
by the standard is arithmetic coding.
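
As a rough illustration of the encoder path just described, the Python sketch below performs the block DCT, uniform quantization, and zigzag scan. The flat quantization table and the helper names are our own simplifications, not part of the JPEG standard, and the entropy-coding stage is omitted.

    # Minimal sketch of a JPEG-style transform step: 8x8 block DCT,
    # uniform quantization, and zigzag scanning (entropy coding omitted).
    # The flat quantization table Q is illustrative, not the standard table.
    import numpy as np
    from scipy.fft import dctn, idctn

    def zigzag_indices(n=8):
        """Return (row, col) pairs of an n x n block in zigzag order."""
        return sorted(((r, c) for r in range(n) for c in range(n)),
                      key=lambda rc: (rc[0] + rc[1],
                                      rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

    def encode_block(block, Q):
        """DCT-transform and quantize one 8x8 block; return zigzag-ordered levels."""
        coeffs = dctn(block.astype(float) - 128.0, norm='ortho')  # level-shifted 2-D DCT
        levels = np.round(coeffs / Q).astype(int)                 # uniform quantization
        return [levels[r, c] for r, c in zigzag_indices()]

    def decode_block(zz, Q):
        """Undo the zigzag scan, dequantize, and apply the inverse DCT."""
        levels = np.zeros((8, 8))
        for val, (r, c) in zip(zz, zigzag_indices()):
            levels[r, c] = val
        return idctn(levels * Q, norm='ortho') + 128.0

    # Coarser quantization steps give lower bit rates (and larger errors).
    Q = np.full((8, 8), 16.0)
    block = np.random.randint(0, 256, (8, 8))
    recon = decode_block(encode_block(block, Q), Q)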
Manuscript received May 29, 2001; revised April 22, 2003. The associate ed-
itor coordinating the review of this manuscript and approving it for publication
was Prof. Trac D. Tran.
A. M. Bruckstein and R. Kimmel are with the Computer Science Department,
The Technion—Israel Institute of Technology, Haifa 32000, Israel (e-mail:
freddy@cs.technion.ac.il; ron@cs.technion.ac.il).
M. Elad is with the Computer Science Department—SCCM Program, Stan-
ford University, Stanford, CA 94305 USA (e-mail: elad@sccm.stanford.edu).
Digital Object Identifier 10.1109/TIP.2003.816023

JPEG’s good middle and high rate compression performance
and low computational and memory complexity make it an at-
tractive method for natural image compression. Nevertheless, as
we go to low bit rates that imply lower quality, the JPEG com-
pression algorithm introduces disturbing blocking artifacts. It
appears that at low bit rates a down-sampled image when JPEG
compressed and later interpolated, visually beats the high res-
olution image compressed directly via JPEG using the same
number of bits. Whereas this property is known to some in the
industry (see, for example, [9]), it was never explicitly proposed
nor treated in the scientific literature. One might argue, however,
that the hierarchical JPEG algorithm implicitly uses this idea
when low bit-rate compression is considered [1].
Let us first establish this interesting property through a
simple experiment, testing the compression-decompression
performance both by visual inspection and quantitative
mean-square-error comparisons. An experimental result dis-
played in Fig. 2 shows indeed that both visually and in terms
of the Mean Square Error (or PSNR), one obtains better results
using down-sampling, compression, and interpolation after the
decompression. Two comments are in order at this stage: i)
throughout this paper, all experiments are done using Matlab
v.6.1. Thus, simple IJG-JPEG is used with fixed quantization
tables, and control over the compression is achieved via the
Quality parameter; and ii) throughout this paper, all experi-
ments applying down-sampling use an anti-aliasing pre-filter,
as Matlab 6.1 suggests, through its standard image resizing
function.
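
The following Python sketch (our own, not the authors' code) reproduces the flavor of this experiment with Pillow instead of Matlab's IJG-JPEG and imresize. The file name lena.png, the 0.5 scale factor, and the quality settings are illustrative; in practice one would tune the two quality values so that both variants spend the same number of bytes.

    # Direct JPEG vs. down-sample + JPEG + up-sample at comparable file sizes.
    # Pillow's JPEG encoder and LANCZOS filter stand in for Matlab's tools,
    # so the numbers will differ from those reported in the paper.
    import io
    import numpy as np
    from PIL import Image

    def psnr(a, b):
        mse = np.mean((np.asarray(a, float) - np.asarray(b, float)) ** 2)
        return 10 * np.log10(255.0 ** 2 / mse)

    def jpeg_roundtrip(img, quality):
        buf = io.BytesIO()
        img.save(buf, format='JPEG', quality=quality)
        nbytes = buf.tell()
        buf.seek(0)
        return Image.open(buf), nbytes          # decoded image and size in bytes

    def scaled_roundtrip(img, factor, quality):
        small = img.resize((int(img.width * factor), int(img.height * factor)),
                           Image.LANCZOS)        # LANCZOS low-pass filters on shrink
        dec, nbytes = jpeg_roundtrip(small, quality)
        return dec.resize(img.size, Image.LANCZOS), nbytes

    img = Image.open('lena.png').convert('L')    # hypothetical test image path
    direct, n1 = jpeg_roundtrip(img, quality=10)           # low quality, low rate
    scaled, n2 = scaled_roundtrip(img, 0.5, quality=50)    # tune so n2 is close to n1
    print('direct:', n1, 'bytes, PSNR', psnr(img, direct))
    print('scaled:', n2, 'bytes, PSNR', psnr(img, scaled))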
Let us explain this behavior from an intuitive perspective. As-
sume that for a given image we use blocks of 8 × 8 pixels in the
coding procedure. As we allocate too few bits (say 4 bits per
block on average), only the DC coefficients are coded and the
resulting image after decompression consists of essentially con-
stant valued blocks. Such an image will clearly exhibit strong
blocking artifacts. If instead the image is down-sampled by a
factor of 2, the coder is now effectively working with blocks of
16 × 16 and has an average budget of 4 × 4 = 16 bits to code
the coefficients. Thus, some bits will be allocated to higher order
DCT coefficients as well, and the blocks will exhibit more de-
tail. Moreover, as we up-scale the image at the decoding stage
we add another improving ingredient to the process, since inter-
polation further blurs the blocking effects. Thus, the down-sam-
pling approach is expected to produce results that are better both
visually and quantitatively.
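
To make the budget bookkeeping concrete, the short calculation below (our own illustration, assuming a total budget of 4 bits per 8 × 8 block at full resolution for a 512 × 512 image) shows how the average number of bits available per coded block grows as the image is down-scaled:

    # Bits available per 8x8 coded block when the image is down-scaled by
    # "factor" before compression, for a fixed total bit budget (illustrative).
    def bits_per_block(total_bits, width, height, factor):
        coded_blocks = (width * factor / 8.0) * (height * factor / 8.0)
        return total_bits / coded_blocks

    total = 4 * (512 / 8) * (512 / 8)      # 4 bits per block at full resolution
    for f in (1.0, 0.5, 0.25):
        print(f, bits_per_block(total, 512, 512, f))   # -> 4, 16, 64 bits per block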
In this paper we propose an analytical explanation to
the above phenomenon, along with a practical algorithm
to automatically choose the optimal down-sampling factor
for best PSNR. Following the method outlined in [4], we
derive an analytical model of the compression-decompression
reconstruction error as a function of the memory budget (i.e.,
the total number of bits), the (statistical) characteristics of the
image, and the down-sampling factor. We show that a simplistic

Fig. 1. JPEG encoder block diagram.
Fig. 2. Original image (on the left), JPEG compressed-decompressed image (middle), and down-sampled, JPEG compressed-decompressed, and up-sampled image
(right). The down-sampling factor was 0.5. The compressed 256 × 256 “Lena” image in both cases used 0.25 bpp, inducing MSEs of 219.5 and 193.12, respectively.
The compressed 512 × 512 “Barbara” image in both cases used 0.21 bpp, inducing MSEs of 256.04 and 248.42, respectively.
second order statistical model provides good estimates for the
down-sampling factors that achieve optimal performance.
This paper is organized as follows. Sections II–IV present the
analytic model and explore its theoretical implications. In Sec-
tion II we start the analysis by developing a model that describes
the compression-decompression error based on the quantiza-
tion error and the assumption that the image is a realization of
a Markov random field. Section III then introduces the impact
of bit-allocation so as to relate the expected error to the given
bit-budget. In Section IV we first establish several important pa-
rameters used by the model, and then use the obtained formula-
tion in order to graphically describe the trade-offs between the
total bit-budget, the expected error, and the coding block-size.
Section V describes an experimental setup that validates the pro-
posed model and its applicability for choosing the best down-sam-
pling factor for a given image with a given bit budget. Finally,
Section VI ends the paper with some concluding remarks.
II. ANALYSIS OF A CONTINUOUS “JPEG-STYLE” IMAGE
REPRESENTATION MODEL
In this section we start building a theoretical model for ana-
lyzing the expected reconstruction error when doing compres-
sion-decompression as a function of the total bits budget, the

characteristics of the image, and the down-sampling factor. Our
model considers the image over a continuous domain rather than
a discrete one, in order to simplify the derivation. The steps are
as follows.
1) We derive the expected compression-decompression
mean-square-error for a general image representation.
Slicing of the image domain into equal square blocks is
assumed.
2) We use the fact that the coding is done in the transform
domain using an orthonormal basis, to derive the error in-
duced by truncation only.
3) We extend the calculation to account for quantization
error of the nontruncated coefficients.
4) We specialize the image transform to the DCT basis.
5) We introduce an approximation for the quantization error,
as a function of the allocated bits.
6) We explore several possible bit-allocation policies and
introduce the overall bit-budget as a parameter into our
model.
At the end of this process we obtain an expression for the ex-
pected error as a function of the bit budget, scaling factor, and
the image characteristics. This function eventually allows us
to determine the optimal down-sampling factor in JPEG-like
image coding.
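
Operationally, once such an error model is available, the optimal down-sampling factor for a given bit budget can be found by a simple one-dimensional search. The short Python sketch below is ours; expected_mse stands in for the analytical expression developed in Sections II–IV and is not part of the paper.

    # Schematic use of the error model: pick the down-scaling factor that
    # minimizes the predicted MSE for a given bit budget. expected_mse is a
    # placeholder for the analytical expression derived in the paper.
    def optimal_factor(bit_budget, expected_mse, candidates=None):
        candidates = candidates or [i / 10.0 for i in range(2, 11)]  # 0.2 ... 1.0
        return min(candidates, key=lambda s: expected_mse(bit_budget, s))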
A. Compression-Decompression Expected Error
Assume we are given images on the unit square, realizations
of a 2-D random process, with second order statistics given
by (1).
Note that here we assume that the image is stationary. This is a
marked deviation from the real-life scenario, and this assump-
tion is made mainly to simplify our analysis. Nevertheless, as we
shall hereafter see, the obtained model succeeds in predicting
the down-sampling effect on compression-decompression per-
formance.
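
For concreteness, one standard second order model consistent with the wide-sense stationary Markov assumption used here is the separable exponential autocorrelation below; the symbols sigma^2 and lambda are our own placeholders and need not match the paper's equation (1) exactly.

    E[ y(x_1, x_2) \, y(x_1 + \tau_1, x_2 + \tau_2) ]
        = \sigma^2 \, e^{-\lambda ( |\tau_1| + |\tau_2| )}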
We assume that the image domain is sliced into equal square
regions (slices).
Assume that due to our coding of the original image we obtain
a compressed-decompressed result, which is an approximation
of the original image. We can measure the error in approximating
the original by this result as in (2), where the per-slice error
terms are defined in (3).
We shall, of course, be interested in the expected mean square
error of the digitization, i.e., (4). Note that the assumed wide-sense
stationarity of the image process implies that the expected per-slice
error is independent of the slice index, i.e., we have the same
expected mean square error over each slice of the image. Thus,
we can write (5).
Up to now we considered the quality measure used to evaluate
the approximation of the image in the digitization process. We
shall next consider the set of basis functions needed for repre-
senting the image over each slice.
B. Bases for Representing the Image Over Slices
In order to represent the image over each slice, we have to
choose an orthonormal basis of functions. The basis functions
must satisfy the usual orthonormality condition: the inner product
of any two of them over the slice is one if their indices coincide
and zero otherwise. If the set is indeed an orthonormal basis,
then we can write (6) as a representation of the image over the
slice in terms of an infinite set of coefficients (7). Suppose now
that we approximate the image over the slice by using only a
finite subset of the orthonormal functions, i.e., consider (8).
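
A hedged sketch of (6)-(8) in our own notation, with phi_{kl} the orthonormal basis functions over a slice and S the finite set of retained indices:

    y(x_1, x_2) = \sum_{k,l \ge 0} c_{kl} \, \phi_{kl}(x_1, x_2),
    \qquad
    c_{kl} = \int\!\!\int_{\text{slice}} y(x_1, x_2) \, \phi_{kl}(x_1, x_2) \, dx_1 dx_2,
    \qquad
    \hat{y}(x_1, x_2) = \sum_{(k,l) \in S} c_{kl} \, \phi_{kl}(x_1, x_2).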

The optimal coefficients in the approximation above turn out
to be the corresponding coefficients from the infinite representa-
tion. The mean square error of this approximation over a single
slice is given by (9), and hence (10). Its expectation is given by
(11), and hence (12).
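
In our notation, the content of (9)-(12) is the standard orthonormal-truncation result: the per-slice squared error equals the energy of the discarded coefficients, and its expectation equals the sum of their variances,

    \int\!\!\int_{\text{slice}} ( y - \hat{y} )^2 \, dx_1 dx_2 = \sum_{(k,l) \notin S} c_{kl}^2,
    \qquad
    E\Big[ \int\!\!\int_{\text{slice}} ( y - \hat{y} )^2 \, dx_1 dx_2 \Big]
        = \sum_{(k,l) \notin S} \sigma_{kl}^2,
    \quad \text{with } \sigma_{kl}^2 = E[ c_{kl}^2 ].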
C. Effect of Quantization of the Expansion Coefficient
Suppose that in the approximation (13) we can only use a finite
number of bits in representing the coefficients, which take values
in R. If a coefficient is represented/encoded with b bits, we can
describe it only via a quantized value that takes on 2^b values,
i.e., a set of 2^b representation levels. Let us now see how the
resulting quantization errors affect the overall error. We have
(14), and some algebraic steps lead to the expected error given
by (15).
Hence, in order to evaluate the expected error for a particular
representation, in which the image is sliced into pieces, over
each piece we use a subset of the possible basis functions, and
we quantize the retained coefficients with a finite number of
bits, we have to evaluate the resulting overall error expression.
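
A hedged sketch of that expression in our own notation, with q_{kl} the quantization error of a retained coefficient and sigma_{kl}^2 the variance of a discarded one:

    E[ \varepsilon^2 ]
        = \sum_{(k,l) \in S} E[ q_{kl}^2 ]
        + \sum_{(k,l) \notin S} \sigma_{kl}^2 .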
D. An Important Particular Case: Markov Process With
Separable Cosine Bases
We now return to the assumption that the statistics of the
image are given by (1), and we choose a separable cosine basis
over each slice.
This choice of using the DCT basis is motivated by our desire to
model the JPEG behavior. As is well known [6], the DCT offers
a good approximation of the KLT if the image is modeled as a
2-D random Markov field with very high correlation factor.

To compute the expected error for this case we need to eval-
uate the variances of the expansion coefficients, defined in (16).
We have (17); therefore, by separating the integrations we obtain
(18). Changing variables of integration yields (19). Let us define,
for compactness, the integral in (20); then we see that (21) holds.
An Appendix at the end of the paper derives this integral in closed
form, leading to the expression in (22), and hence (23).
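
The coefficient variances can also be checked numerically. The Python sketch below is our own, assuming the separable exponential autocorrelation introduced earlier and a 1-D orthonormal cosine basis on each axis; it evaluates the variances by direct integration rather than through the closed form of (22), and the symbols lam and delta are illustrative placeholders.

    # Numerical estimate of the separable coefficient variances under an
    # assumed exponential autocorrelation exp(-lam*|t1| - lam*|t2|) on a
    # square slice of side delta, with a 1-D cosine basis on each axis.
    import numpy as np
    from scipy.integrate import dblquad

    def phi(k, delta):
        """Orthonormal cosine basis function on [0, delta]."""
        if k == 0:
            return lambda u: np.sqrt(1.0 / delta)
        return lambda u: np.sqrt(2.0 / delta) * np.cos(np.pi * k * u / delta)

    def variance_1d(k, lam, delta):
        """One-axis factor: double integral of exp(-lam*|u-v|) * phi_k(u) * phi_k(v)."""
        f = phi(k, delta)
        val, _ = dblquad(lambda v, u: np.exp(-lam * abs(u - v)) * f(u) * f(v),
                         0.0, delta, lambda u: 0.0, lambda u: delta)
        return val

    def variance_2d(k, l, lam, delta):
        # Separable autocorrelation and basis => product of two 1-D factors.
        return variance_1d(k, lam, delta) * variance_1d(l, lam, delta)

    print(variance_2d(0, 0, lam=10.0, delta=1.0 / 8))  # low frequencies dominate
    print(variance_2d(1, 1, lam=10.0, delta=1.0 / 8))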
E. Incorporating the Effect of Coefficient Quantization
According to rate-distortion theory, if we assume either uni-
form or Gaussian random variables, there is a formula for evalu-
ating the mean-square-error due to quantization. This formula,
known to be accurate at high rates, is given in (24) [3]: a distribu-
tion-dependent constant times the coefficient variance, attenuated
exponentially in the number of bits allocated to that coefficient.
Putting the above results together, we get that the expected mean
square error in representing images from a process with Markov
statistics, by slicing the image plane into slices and using, over
each slice, a cosine basis, is given by (25). This expression gives
the expected error in terms of the slicing, the subset of retained
coefficients, and the bits allocated to them.
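
A hedged sketch of the structure of (24) and (25) in our own notation, where c is a distribution-dependent constant, sigma_{kl}^2 the coefficient variances, b_{kl} the bits given to coefficient (k, l), and S the set of retained coefficients; the paper's (25) additionally plugs in the Markov-model variances of (22)-(23) and accounts for the slicing:

    E[ q_{kl}^2 ] \approx c \, \sigma_{kl}^2 \, 2^{-2 b_{kl}},
    \qquad
    E[ \varepsilon^2 ] \approx \sum_{(k,l) \in S} c \, \sigma_{kl}^2 \, 2^{-2 b_{kl}}
                              + \sum_{(k,l) \notin S} \sigma_{kl}^2 .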
III. SLICING AND BIT-ALLOCATION OPTIMIZATION PROBLEMS
Suppose we consider (26).

Citations
Book

Super Resolution of Images and Video

TL;DR: It is clear that there is a strong interplay between the tools and techniques developed for SR and a number of other inverse problems encountered in sig...
Journal Article

Adaptive downsampling to improve image compression at low bit rates

TL;DR: This paper presents a new algorithm, based on the adaptive decision of appropriate downsampling directions/ratios and quantization steps, in order to achieve higher coding quality at low bit rates while taking local visual significance into consideration.
Journal Article

Lossy Point Cloud Geometry Compression via End-to-End Learning

TL;DR: A novel end-to-end Learned Point Cloud Geometry Compression framework that efficiently compresses point cloud geometry using deep neural network (DNN) based variational autoencoders (VAE) and exceeds the geometry-based point cloud compression (G-PCC) algorithm standardized by the Moving Picture Experts Group (MPEG).
Journal Article

Convolutional Neural Network-Based Block Up-Sampling for Intra Frame Coding

TL;DR: A new CNN structure for up-sampling is explored, which features deconvolution of feature maps, multi-scale fusion, and residue learning, making the network both compact and efficient.
Journal Article

Learning a Convolutional Neural Network for Image Compact-Resolution

TL;DR: The requirements of image CR are translated into operable optimization targets for training CNN-CR: the visual quality of the compact-resolved image is ensured by constraining its difference from a naively downsampled version, and the information loss of image CR is measured by upsampling/super-resolving the compact-resolved image and comparing that to the original image.
References
Book

Fundamentals of digital image processing

TL;DR: This book discusses two-dimensional systems and mathematical preliminaries and their applications in image analysis and computer vision, as well as image reconstruction from projections and image enhancement.
Book

Vector Quantization and Signal Compression

TL;DR: The authors explain the design and implementation of quantizers, addressing the very labor-intensive and therefore time-consuming and expensive process of designing and implementing a quantizer.
Journal Article

High performance scalable image compression with EBCOT

TL;DR: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT), capable of modeling the spatially varying visual masking phenomenon.
Proceedings Article

High performance scalable image compression with EBCOT

TL;DR: A new image compression algorithm is proposed, based on independent embedded block coding with optimized truncation of the embedded bit-streams (EBCOT), capable of modeling the spatially varying visual masking phenomenon.
Frequently Asked Questions (2)
Q1. What have the authors contributed in "Down-scaling for better transform compression"?

Nevertheless, as the authors go to low bit rates that imply lower quality, JPEG introduces disturbing artifacts. Motivated by this idea, the authors show how down-sampling an image to a low resolution, then using JPEG at the lower resolution, and subsequently interpolating the result to the original resolution can improve the overall PSNR performance of the compression process. The authors show that the image auto-correlation can provide a good estimate for establishing the down-sampling factor that achieves optimal performance.

Further work is required in order to explore extensions and implementation issues, such as efficient estimation of the image statistics, local extraction of second order statistics, hierarchical slicing of the image into various block sizes, and more. Further work is also required to replace the approach presented here with a locally adaptive one, as is done naturally by wavelet coders.