Journal ArticleDOI

Information Content Weighting for Perceptual Image Quality Assessment

01 May 2011-IEEE Transactions on Image Processing (IEEE Trans Image Process)-Vol. 20, Iss: 5, pp 1185-1198
TL;DR: This paper aims to test the hypothesis that when viewing natural images, the optimal perceptual weights for pooling should be proportional to local information content, which can be estimated in units of bit using advanced statistical models of natural images.
Abstract: Many state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage structure: local quality/distortion measurement followed by pooling. While significant progress has been made in measuring local image quality/distortion, the pooling stage is often done in ad-hoc ways, lacking theoretical principles and reliable computational models. This paper aims to test the hypothesis that when viewing natural images, the optimal perceptual weights for pooling should be proportional to local information content, which can be estimated in units of bit using advanced statistical models of natural images. Our extensive studies based upon six publicly-available subject-rated image databases concluded with three useful findings. First, information content weighting leads to consistent improvement in the performance of IQA algorithms. Second, surprisingly, with information content weighting, even the widely criticized peak signal-to-noise-ratio can be converted to a competitive perceptual quality measure when compared with state-of-the-art algorithms. Third, the best overall performance is achieved by combining information content weighting with multiscale structural similarity measures.

Summary (3 min read)

I. INTRODUCTION

  • In recent years, there has been an increasing interest in developing objective image quality assessment (IQA) methods that can automatically predict human behaviors in evaluating image quality [1]–[3].
  • Spatial domain methods such as the mean squared error (MSE) and the structural similarity (SSIM) index [4], [5] compute pixel- or patch-wise distortion/quality measures in space, while block-discrete cosine transform [6] and wavelet-based [7]–[11] approaches define localized quality/distortion measures across scale, space and orientation.
  • This is supported by a number of interesting recent studies [14]–[16], where it has been shown that sizable performance gain can be obtained by combining objective local quality measures with subjective human fixation or region-of-interest detection data.
  • The existing pooling approaches can be roughly categorized in the following ways.

• Local quality/distortion-based pooling

  • The intuitive idea that more emphasis should be put on high distortion regions can be implemented in a more straightforward way by local quality/distortion-based pooling.
  • This can be done by using a nonuniform weighting approach, where the weight may be determined by an error visibility detection map [17].
  • It may also be computed using the local quality/distortion measure itself [13], such that the overall quality/distortion measure is given by (2), where the weighting function w(m) is monotonically increasing when m is a distortion measure (i.e., a larger value indicates higher distortion), and monotonically decreasing when m is a quality measure (i.e., a larger value indicates higher quality).
  • Another method to assign more weight to low quality regions is to sort all local m_i values and use a small percentile of them that corresponds to the lowest quality regions.
  • Local quality/distortion-based pooling has been shown to be effective in improving IQA performance, as reported in [13], [19], though the implementations are often heuristic (for example, in the selection of the weighting function w(m) and the percentile), without theoretical guiding principles.

• Saliency-based pooling

  • Here the authors use "saliency" as a general term that represents low-level local image features that are of perceptual significance (as opposed to high-level components such as human faces).
  • The motivation behind saliency-based pooling approaches is that visual attention is attracted to distinctive saliency features and, thus, more importance should be given to the associated regions in the image.
  • This can range from simple features such as local variance [13] or contrast [20] to sophisticated computational models based upon automatic point of gaze predictions from low-level vision features [19] , [21] - [24] .
  • It has also been found that motion information is another useful feature to use in the pooling stage of video quality assessment algorithms [25] - [27] .

• Object-based pooling

  • Different from low-level vision based saliency approaches, object-based pooling methods resort to high-level cognitive vision based image understanding algorithms that help detect and/or segment significant regions from the image.
  • What are lacking are not heuristic tricks but general theoretical principles that are not only qualitatively sensible but also quantitatively manageable, so that reliable computational models for pooling can be derived.
  • In essence, their approach is saliency-based, but the resulting weighting function also has interesting connections with the quality/distortion-based pooling method, which the authors will discuss later in Section II.
  • Information theoretic methods are by no means new for IQA.
  • In fact, their work is inspired by the success of the visual information fidelity (VIF) method [34] , though VIF was not originally proposed for pooling purpose.

II. INFORMATION CONTENT WEIGHTING

  • The computation of image information content relies on good statistical image models.
  • The remaining task is, thus, the statistical modeling of groups of neighboring pixels (or coefficients).
  • To simplify the computation, the authors assume that the mixing multiplier z takes a fixed value at each location (but varies over space and scale).
  • This was demonstrated empirically in [34] using an image synthesis approach, where images under different types of distortions were compared with synthesized distortion images using the local attenuation/noise model.
  • As a result, the mutual information evaluations I(x; x'), I(y; y') and I(x'; y') can be calculated based upon the determinants of the covariance matrices [41], as given in (13)-(18); (16) can be simplified using the fact that E[x x'^T] = C_x (19), where E[·] is the expectation operator and the authors have used the fact that x and n are independent.

A. Information Content Weighted PSNR

  • Let x_i and y_i be the ith pixel in the original image x and the distorted image y, respectively.
  • The MSE and PSNR between the two images are given by MSE = (1/N) Σ_i (x_i − y_i)^2 (34) and PSNR = 10 log_10(L^2/MSE) (35), where N is the total number of pixels in the image and L is the maximum dynamic range.
  • Here the authors define an information content weighted MSE (IW-MSE) and an information content weighted PSNR (IW-PSNR) measure by incorporating the Laplacian pyramid transform [40] domain information content weights computed as in (28); a sketch is given below.
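As a minimal sketch (ours, not the authors' implementation), IW-MSE/IW-PSNR can be computed from precomputed pyramid subbands and weight maps; the cross-scale combination rule below (a weighted product with hypothetical exponents `betas`) is an assumption of this sketch.

```python
import numpy as np

def iw_psnr(x_subbands, y_subbands, weight_maps, betas, L=255.0):
    """IW-PSNR sketch.

    x_subbands, y_subbands: per-scale Laplacian pyramid subbands of the
    reference and distorted images; weight_maps: information content
    weight maps (cf. (28)) per scale; betas: assumed per-scale
    exponents; L: maximum dynamic range.
    """
    iw_mse = 1.0
    for xs, ys, w, b in zip(x_subbands, y_subbands, weight_maps, betas):
        scale_mse = np.sum(w * (xs - ys) ** 2) / np.sum(w)  # info-weighted MSE at one scale
        iw_mse *= scale_mse ** b                            # assumed cross-scale product
    return 10.0 * np.log10(L ** 2 / iw_mse)
```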

B. Information Content Weighted MultiScale SSIM

  • The basic spatial domain SSIM algorithm [5] is based upon separate comparisons of local luminance, contrast and structure between an original and a distorted image.
  • Here, μ, σ and σ_xy represent the mean, standard deviation and cross-correlation evaluations, respectively.
  • It has been found that the performance of the single-scale SSIM algorithm depends upon the scale at which it is applied [42], [43].
  • Interestingly, the measured weight function peaks at middle-resolution scales and drops at both low- and high-resolution scales, consistent with the contrast sensitivity function extensively studied in the vision literature [12].
  • The final overall IW-SSIM measure is then computed as a weighted product across scales (47), using the same set of scale weights as in MS-SSIM; a sketch is given below.
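A pooling-only sketch of IW-SSIM (ours, assuming the per-scale SSIM maps and information content weight maps have already been computed; the exponents play the role of the MS-SSIM scale weights):

```python
import numpy as np

def iw_ssim(ssim_maps, weight_maps, scale_weights):
    """Information content weighted pooling across scales, cf. (47)."""
    score = 1.0
    for s, w, beta in zip(ssim_maps, weight_maps, scale_weights):
        pooled = np.sum(w * s) / np.sum(w)   # info-weighted mean at one scale
        score *= pooled ** beta              # MS-SSIM-style scale weighting
    return score
```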

C. Interpretation of VIF Based Upon Information Content Weighting

  • Based upon the interpretation in its original publication, the VIF algorithm [34] does not seem to fit into the two-stage framework shown in Fig. 1 , because the information content is summed over the entire image space before the fidelity ratio is computed VIF (48).
  • Here the authors show that with some simple transformations, VIF indeed can be nicely interpreted using the same two-stage framework.
  • Specifically, the authors can write VIF VIF (49) where they have defined a local VIF measure (which follows the same philosophy as the general VIF concept [34] ) EQUATION ) and a weighting function (51) Interestingly, this weight definition is essentially an information content measure, although different from what they use in their approach [as in (12) ].
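The algebraic identity behind (49)-(51) is easy to verify numerically: given local mutual information values, the global fidelity ratio equals a weighted average of local ratios (a sketch; the inputs are assumed precomputed).

```python
import numpy as np

def vif_two_stage(I_xy, I_xx):
    """I_xy[i] = I(x_i; y'_i), I_xx[i] = I(x_i; x'_i) per location.
    Returns VIF written in two-stage (local measure + pooling) form."""
    I_xy = np.asarray(I_xy, dtype=float)
    I_xx = np.asarray(I_xx, dtype=float)
    local_vif = I_xy / I_xx                    # (50): local VIF measure
    w = I_xx                                   # (51): information content weight
    return np.sum(w * local_vif) / np.sum(w)   # (49): equals I_xy.sum()/I_xx.sum()
```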


Information Content Weighting for Perceptual
Image Quality Assessment
Zhou Wang, Member, IEEE, and Qiang Li, Member, IEEE
Abstract—Many state-of-the-art perceptual image quality as-
sessment (IQA) algorithms share a common two-stage structure:
local quality/distortion measurement followed by pooling. While
significant progress has been made in measuring local image
quality/distortion, the pooling stage is often done in ad-hoc ways,
lacking theoretical principles and reliable computational models.
This paper aims to test the hypothesis that when viewing natural
images, the optimal perceptual weights for pooling should be
proportional to local information content, which can be estimated
in units of bit using advanced statistical models of natural images.
Our extensive studies based upon six publicly-available sub-
ject-rated image databases concluded with three useful findings.
First, information content weighting leads to consistent improve-
ment in the performance of IQA algorithms. Second, surprisingly,
with information content weighting, even the widely criticized
peak signal-to-noise-ratio can be converted to a competitive
perceptual quality measure when compared with state-of-the-art
algorithms. Third, the best overall performance is achieved by
combining information content weighting with multiscale struc-
tural similarity measures.
Index Terms—Gaussian scale mixture (GSM), image quality
assessment (IQA), pooling, information content measure, peak
signal-to-noise-ratio (PSNR), structural similarity (SSIM), statis-
tical image modeling.
I. INTRODUCTION
IN RECENT years, there has been an increasing interest
in developing objective image quality assessment (IQA)
methods that can automatically predict human behaviors in
evaluating image quality [1]–[3]. Such perceptual IQA mea-
sures have broad applications in the evaluation, control, design
and optimization of image acquisition, communication, pro-
cessing and display systems. Depending upon the availability
of a “perfect quality” reference image, they may be classified
into full-reference (FR, where the reference image is fully
accessible when evaluating the distorted image), reduced-refer-
ence (RR, where only partial information about the reference
Manuscript received January 21, 2010; revised June 07, 2010 and
September 06, 2010; accepted November 04, 2010. Date of publication
November 15, 2010; date of current version April 15, 2011. This work was
supported in part by Natural Sciences and Engineering Research Council of
Canada in the forms of Discovery, Strategic and Collaborative Research and
Development (CRD) Grants, and in part by an Ontario Early Researcher
Award. The associate editor coordinating the review of this manuscript and
approving it for publication was Dr. Alex C. Kot.
Z. Wang is with Department of Electrical and Computer Engineering, Uni-
versity of Waterloo, Waterloo, ON, N2L 3G1, Canada (e-mail: zhouwang@
ieee.org).
Q. Li is with Media Excel Inc., Austin, TX, 78759 USA.
Digital Object Identifier 10.1109/TIP.2010.2092435
image is available) and no-reference (NR, where no access to
the reference image is allowed) algorithms [3].
Many state-of-the-art IQA measures (especially FR algo-
rithms) adopted a common two-stage structure, as illustrated in
Fig. 1. In the first stage, image quality/distortion is evaluated
locally, where the locality may be defined in space, scale
(or spatial frequency) and orientation. For example, spatial
domain methods such as the mean squared error (MSE) and
the structural similarity (SSIM) index [4], [5] compute pixel-
or patch-wise distortion/quality measures in space, while
block-discrete cosine transform [6] and wavelet-based [7]–[11]
approaches define localized quality/distortion measures across
scale, space and orientation. Such localized measurement
approaches are consistent with our current understanding about
the human visual system (HVS), where it has been found that
the responses of many neurons in the primary visual cortex are
highly tuned to the stimuli that are “narrow-band” in frequency,
space and orientation [12]. The local measurement process
typically results in a quality/distortion map defined either in
the spatial domain or in the transform domain (e.g., wavelet
subbands). A spatial domain example is shown in Fig. 2. To
assess the quality of a JPEG compressed image (b) given a
reference image (a), two local quality/distortion measures,
absolute error and the SSIM index, were computed, resulting in an
absolute error map (c) and an SSIM map (d). Careful inspection
shows that the SSIM index better reflects the spatial variations
of perceived image quality. For example, the blockiness in the
sky is clearly indicated in Fig. 2(d) but not in Fig. 2(c). To
convert such quality/distortion maps into a single quality score,
a pooling algorithm is employed in the second stage of the IQA
algorithm.
In the literature, significant progress has been made in the de-
sign of the first stage, i.e., local quality measurement [1]–[3], but
much less is understood about the pooling stage. The potential
of spatial pooling has been demonstrated by experimenting with
different pooling strategies [13] or optimizing spatially varying
weights to maximize the correlation between objective and sub-
jective image quality ratings [14]. A common hypothesis un-
derlying nearly all existing schemes is that the pooling strategy
should be correlated with human visual fixation or visual re-
gion-of-interest detection. This is supported by a number of in-
teresting recent studies [14]–[16], where it has been shown that
sizable performance gain can be obtained by combining objec-
tive local quality measures with subjective human fixation or
region-of-interest detection data. In practice, however, the sub-
jective data is not available, and the pooling stage is often done
in simplistic or ad-hoc ways, lacking theoretical principles as
the basis for the development of reliable computational models.

The existing pooling approaches can be roughly categorized in
the following ways.
Minkowski pooling
Let m_i be the local quality/distortion value at the ith location in the quality/distortion map. The Minkowski summation is given by

$$ M = \left[ \frac{1}{N} \sum_{i=1}^{N} m_i^p \right]^{1/p} \qquad (1) $$

where N is the total number of samples in the map, and p is the Minkowski exponent. To give a specific example, let m_i represent the absolute error as in Fig. 2(c); then (1) is directly related to the l_p norm (subject to a monotonic nonlinearity). As special cases, p = 1 corresponds to the mean absolute error (MAE), and p = 2 to the MSE. As p increases, more emphasis is shifted to the high distortion regions. Intuitively, this makes sense because when most of the distortion in an image is concentrated in a small region, humans tend to pay more attention to this low quality region and give an overall quality score lower than the direct average of the quality map [13]. In the extreme case p → ∞, (1) converges to max_i {m_i}, i.e., the measure is completely determined by the highest distortion point. In practice, the value of p typically ranges from 1 to 4 [5]–[10]. In [13], it was shown that Minkowski pooling can help improve the performance of IQA algorithms, but the best p value depends upon the underlying local metric and there is no simple method to derive it.
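As a concrete illustration (ours, not from the paper), a NumPy sketch of Minkowski pooling as reconstructed in (1); the exponent p is the only tuning parameter.

```python
import numpy as np

def minkowski_pooling(m, p):
    """Minkowski pooling of a local quality/distortion map m, cf. (1).

    p = 1 yields the mean absolute value (MAE for an error map), p = 2
    relates to the MSE, and large p is increasingly dominated by the
    highest-distortion locations.
    """
    m = np.abs(np.asarray(m, dtype=float))
    return np.mean(m ** p) ** (1.0 / p)
```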
Local quality/distortion-based pooling
The intuitive idea that more emphasis should be put on high distortion regions can be implemented in a more straightforward way by local quality/distortion-based pooling. This can be done by using a nonuniform weighting approach, where the weight may be determined by an error visibility detection map [17]. It may also be computed using the local quality/distortion measure itself [13], such that the overall quality/distortion measure is given by

$$ M = \frac{\sum_{i=1}^{N} w(m_i)\, m_i}{\sum_{i=1}^{N} w(m_i)} \qquad (2) $$

where the weighting function w(m) is monotonically increasing when m is a distortion measure (i.e., a larger value indicates higher distortion), and monotonically decreasing when m is a quality measure (i.e., a larger value indicates higher quality). Another method to assign more weight to low quality regions is to sort all m_i values and use a small percentile of them that corresponds to the lowest quality regions. For example, in [18] and [19], the worst 5% or 6% distortion values were employed in computing the overall quality scores. Local quality/distortion-based pooling has been shown to be effective in improving IQA performance, as reported in [13], [19], though the implementations are often heuristic (for example, in the selection of the weighting function w(m) and the percentile), without theoretical guiding principles.
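Both variants described above can be sketched in a few lines of NumPy; the default weighting function and the percentile value below are illustrative choices, not the paper's prescriptions.

```python
import numpy as np

def distortion_weighted_pooling(m, weight_fn=lambda m: 1.0 + m):
    """Pooling as in (2). The default weight_fn is a hypothetical
    increasing function, appropriate when m is a distortion map; a
    decreasing function should be used for quality maps."""
    m = np.asarray(m, dtype=float)
    w = weight_fn(m)
    return np.sum(w * m) / np.sum(w)

def worst_percentile_pooling(quality_map, fraction=0.06):
    """Average only the worst `fraction` of local quality values
    (cf. the 5%-6% rules of [18], [19]); assumes larger values
    indicate higher quality."""
    q = np.sort(np.asarray(quality_map, dtype=float).ravel())
    k = max(1, int(round(fraction * q.size)))
    return q[:k].mean()
```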
Fig. 1. Two-stage structure of IQA systems.
Saliency-based pooling
Here we use "saliency" as a general term that represents low-level local image features that are of perceptual significance (as opposed to high-level components such as human faces). The motivation behind saliency-based pooling approaches is that visual attention is attracted to distinctive saliency features and, thus, more importance should be given to the associated regions in the image. A saliency map {s_i}, created by computing saliency at each image location, can be used as a visual attention predictor, as well as a weighting function for IQA pooling as follows:

$$ M = \frac{\sum_{i=1}^{N} s_i\, m_i}{\sum_{i=1}^{N} s_i} \qquad (3) $$

Given an infinite number of possible saliency features, the question is what saliency should be used to create {s_i}. This can range from simple features such as local variance [13] or contrast [20] to sophisticated computational models based upon automatic point of gaze predictions from low-level vision features [19], [21]–[24]. It has also been found that motion information is another useful feature to use in the pooling stage of video quality assessment algorithms [25]–[27].
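As one minimal example of (3) (ours), the sketch below uses the local variance of the reference image, one of the simple saliency features mentioned above (cf. [13]), as the weight map; the window size and stabilizer are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def variance_saliency_pooling(quality_map, ref_image, win=11, eps=1e-8):
    """Saliency-weighted pooling as in (3), with local variance of the
    reference image serving as the saliency map s_i."""
    img = np.asarray(ref_image, dtype=float)
    mu = uniform_filter(img, size=win)
    var = np.maximum(uniform_filter(img ** 2, size=win) - mu ** 2, 0.0)
    s = var + eps                                 # local-variance saliency weights
    return np.sum(s * quality_map) / np.sum(s)
```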
Object-based pooling
Different from low-level vision based saliency approaches, object-based pooling methods resort to high-level cognitive vision based image understanding algorithms that help detect and/or segment significant regions from the image. A similar weighting approach as in (3) may be employed, except that the weight map {s_i} is generated from object detection or segmentation algorithms. More weight can be assigned to segmented foreground objects [28] or to human faces [26], [29]–[31]. Although object-based weighting has demonstrated improved performance for specific scenarios (e.g., when the image contains distinguishable human faces), it may not be easily applied to general situations, where it is not always an easy task to find distinctive objects that attract visual attention.
In summary, all of the previous pooling strategies are well motivated and have achieved certain levels of success. Combinations of different strategies have also been shown to be a useful approach [19], [25], [26], [31]. However, the existing pooling algorithms tend to be ad-hoc, and model parameters are often set by experimenting with subject-rated image databases. What are lacking are not heuristic tricks but general theoretical principles that are not only qualitatively sensible but also quantitatively manageable, so that reliable computational models for pooling can be derived.
In this research, we look at the IQA pooling problem from
an information theoretic point of view. The general belief is
that the HVS is an optimal information extractor, as widely

Fig. 2. (a) Original image. (b) Distorted image (by JPEG compression). (c) Absolute error map—brighter indicates better quality (smaller absolute difference).
(d) SSIM index map—brighter indicates better quality (larger SSIM value).
hypothesized in computational vision science [32]. To achieve
such optimality, the image components that contain more infor-
mation content would attract more visual attention [33]. Using
statistical information theory, the local information content can
be quantified in units of bit, provided that a statistical image
model is available. The local information content measure can
then be employed for IQA weighting. In essence, our approach
is saliency-based, but the resulting weighting function also has
interesting connections with the quality/distortion-based pooling
method, which we will discuss later in Section II. Information
theoretic methods are by no means new for IQA. In fact,
our work is inspired by the success of the visual information
fidelity (VIF) method [34], though VIF was not originally
proposed for pooling purpose. In [27], based upon statistical
models of Bayesian motion perception [35], motion informa-
tion content and perceptual uncertainty were computed for
video quality assessment. In our preliminary work [13], simple
local information-based weighting demonstrated promising
results for improving IQA performance. In this paper, we build
our information content weighting method upon advanced
statistical image models and combine it with multiscale IQA
methods. This results in superior performance in our extensive
tests using six independent databases, which in turn, provides
strong support of our general hypothesis.
II. INFORMATION CONTENT WEIGHTING
The computation of image information content relies on
good statistical image models. In [13], a rather crude spatial

domain local Gaussian model is assumed for spatial pooling
of IQA. Inspired by several recent successful approaches in
image denoising [36] and IQA [34], [37], [38], here we adopt
the Gaussian scale mixture (GSM) model for natural images.
As in many other image models, to reduce the high dimen-
sionality of natural images, a Markov assumption is made that
the probability density of a pixel (or a transform coefficient)
is fully determined by the pixels (coefficients) within a spatial
(and/or scale) neighborhood. The remaining task is, thus,
the statistical modeling of groups of neighboring pixels (or
coefficients). GSM has been found to be a powerful model for this
purpose [39], where the neighborhood is typically composed
of a set of neighboring coefficients in a multiresolution image
transform domain. It has been shown that the GSM framework
can be easily adapted to account for the marginal statistics of
multiresolution transform coefficients of natural images, where
the density exhibits strong non-Gaussianity, with sharp peak
at zero and heavy tails [32]. Meanwhile, GSM is also effective
in describing the amplitude-dependency between neighboring
coefficients [39].
Let x be a length-N column vector that contains a group of N neighboring transform coefficients (e.g., wavelet or Laplacian pyramid transform [40] coefficients). We model it as a GSM, which can be expressed as a product of two independent components

$$ \mathbf{x} = z\,\mathbf{u} \qquad (4) $$

where u is a zero-mean Gaussian vector with covariance matrix C_u, and z is called a mixing multiplier. The general form of GSM allows z to be a random variable that has a certain distribution in a continuous scale. To simplify the computation, we assume that z only takes a fixed value at each location (but varies over space and scale). The benefit of this simplification is that when z is fixed and given, x is simply a zero-mean Gaussian vector with covariance

$$ \mathbf{C}_x = z^2\,\mathbf{C}_u. \qquad (5) $$
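As a numerical sanity check on the model (ours, not part of the paper), one can draw GSM samples and verify (5) empirically; the covariance C_u below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 9                                   # neighborhood size (e.g., a 3x3 window)
A = rng.standard_normal((N, N))
C_u = A @ A.T / N                       # an arbitrary example covariance matrix

z = 1.5                                 # fixed mixing multiplier, as assumed above
u = rng.multivariate_normal(np.zeros(N), C_u, size=100_000)
x = z * u                               # GSM samples, as in (4)

C_x_empirical = x.T @ x / x.shape[0]
print(np.allclose(C_x_empirical, z**2 * C_u, rtol=0.05, atol=0.05))  # ~ (5)
```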
An important concept that we learned from the information
theoretical IQA approaches [34], [37] is that the information
contained in an image is not equated with the amount of in-
formation perceived by the visual system. The mutual informa-
tion between the images before and after the visual perceptual
channel provides a more useful measure. Following this idea,
we propose a model to compute perceptual information content,
which is illustrated in Fig. 3. First, the reference signal x passes through a distortion channel, resulting in a distorted signal

$$ \mathbf{y} = g\,\mathbf{x} + \mathbf{v} \qquad (6) $$

where the distortion is modeled based upon a gain factor g followed by additive independent Gaussian noise contamination v with covariance C_v = σ_v^2 I (where I represents the identity matrix). Although this model seems to be overly simplistic in capturing all potential types of distortions, such as the blocking and ringing artifacts that often appear in compressed images, it was claimed to achieve a reasonable balance in terms of the level of perceptual annoyance across distortion types [34]. This was demonstrated empirically in [34] using an image synthesis approach, where images under different types of distortions were compared with synthesized distortion images using the local attenuation/noise model. Although the real and synthesized distorted images look different in terms of the types of artifacts, the synthesized images reproduced more reasonably balanced perceptual annoyance than an additive noise-only distortion model [34]. Stronger and more theoretical justifications of this distortion model are still yet to be discovered.

Fig. 3. Diagram for computing information content.
Next, both the reference and distorted signals pass through a perceptual visual noise channel

$$ \mathbf{x}' = \mathbf{x} + \mathbf{n} \qquad (7) $$

$$ \mathbf{y}' = \mathbf{y} + \mathbf{n}' \qquad (8) $$

where n and n' are assumed to be independent white Gaussian noise with diagonal covariance C_n = C_n' = σ_n^2 I. This simple one-parameter (σ_n^2) visual distortion model aims to capture the lumped uncertainty of the visual system [34]. Similar to (5), we can then compute the covariance matrices of y, x' and y' as

$$ \mathbf{C}_y = g^2 z^2\,\mathbf{C}_u + \sigma_v^2\,\mathbf{I} \qquad (9) $$

$$ \mathbf{C}_{x'} = z^2\,\mathbf{C}_u + \sigma_n^2\,\mathbf{I} \qquad (10) $$

$$ \mathbf{C}_{y'} = g^2 z^2\,\mathbf{C}_u + (\sigma_v^2 + \sigma_n^2)\,\mathbf{I} \qquad (11) $$
Since all the computation in the rest of this section assumes a fixed and known multiplier z, for notational convenience, we drop the conditional notation |z in all the derivations. Based upon the approach given in [34], at each location, the information of the original and distorted images perceived by the visual system can be computed by the mutual information I(x; x') and I(y; y'), respectively. Here we move one step further to estimate the total perceptual information content from both images. More specifically, we compute the sum of I(x; x') and I(y; y') minus the common information shared between x' and y'. This results in a total information content weight measure given by

$$ w = I(\mathbf{x}; \mathbf{x}') + I(\mathbf{y}; \mathbf{y}') - I(\mathbf{x}'; \mathbf{y}'). \qquad (12) $$

To compute (12), it is useful to be aware that x, y, x' and y' are all Gaussian for a given fixed z. As a result, the mutual information evaluations I(x; x'), I(y; y') and I(x'; y') can be calculated based upon the determinants of the covariances [41] by

$$ I(\mathbf{x}; \mathbf{x}') = \frac{1}{2} \log_2 \frac{|\mathbf{C}_x|\,|\mathbf{C}_{x'}|}{|\mathbf{C}_{(x,x')}|} \qquad (13) $$

$$ I(\mathbf{y}; \mathbf{y}') = \frac{1}{2} \log_2 \frac{|\mathbf{C}_y|\,|\mathbf{C}_{y'}|}{|\mathbf{C}_{(y,y')}|} \qquad (14) $$

$$ I(\mathbf{x}'; \mathbf{y}') = \frac{1}{2} \log_2 \frac{|\mathbf{C}_{x'}|\,|\mathbf{C}_{y'}|}{|\mathbf{C}_{(x',y')}|} \qquad (15) $$

where

$$ \mathbf{C}_{(x,x')} = \begin{bmatrix} \mathbf{C}_x & E[\mathbf{x}\mathbf{x}'^T] \\ E[\mathbf{x}'\mathbf{x}^T] & \mathbf{C}_{x'} \end{bmatrix} \qquad (16) $$

$$ \mathbf{C}_{(y,y')} = \begin{bmatrix} \mathbf{C}_y & E[\mathbf{y}\mathbf{y}'^T] \\ E[\mathbf{y}'\mathbf{y}^T] & \mathbf{C}_{y'} \end{bmatrix} \qquad (17) $$

$$ \mathbf{C}_{(x',y')} = \begin{bmatrix} \mathbf{C}_{x'} & E[\mathbf{x}'\mathbf{y}'^T] \\ E[\mathbf{y}'\mathbf{x}'^T] & \mathbf{C}_{y'} \end{bmatrix}. \qquad (18) $$

Equation (16) can be simplified based upon the fact that

$$ E[\mathbf{x}\mathbf{x}'^T] = E[\mathbf{x}(\mathbf{x}+\mathbf{n})^T] = \mathbf{C}_x \qquad (19) $$

where E[·] is the expectation operator and we have used the fact that x and n are independent. This leads to

$$ |\mathbf{C}_{(x,x')}| = |\mathbf{C}_x|\,|\mathbf{C}_{x'} - \mathbf{C}_x| = |\mathbf{C}_x|\,\sigma_n^{2N}. \qquad (20) $$

Similarly, we can derive

$$ E[\mathbf{y}\mathbf{y}'^T] = \mathbf{C}_y \qquad (21) $$

$$ E[\mathbf{x}'\mathbf{y}'^T] = g\,z^2\,\mathbf{C}_u \qquad (22) $$

and

$$ |\mathbf{C}_{(y,y')}| = |\mathbf{C}_y|\,\sigma_n^{2N}. \qquad (23) $$

Combining (12), (13), (14), (15), (20), and (23), we can simplify our information content weight computation to the following expression:

$$ w = \frac{1}{2} \log_2 \frac{|\mathbf{C}_{(x',y')}|}{\sigma_n^{4N}}. \qquad (24) $$

Plugging (22), (10), and (11) into (18), we have

$$ \mathbf{C}_{(x',y')} = \begin{bmatrix} z^2\,\mathbf{C}_u + \sigma_n^2\,\mathbf{I} & g\,z^2\,\mathbf{C}_u \\ g\,z^2\,\mathbf{C}_u & g^2 z^2\,\mathbf{C}_u + (\sigma_v^2 + \sigma_n^2)\,\mathbf{I} \end{bmatrix}. \qquad (25) $$
To compute the determinant of C_(x',y'), it is useful to apply an eigenvalue decomposition to the covariance matrix C_u = Q Λ Q^T, where Q is an orthogonal matrix, and Λ is a diagonal matrix with eigenvalues λ_j, for j = 1, ..., N, along its diagonal entries. Equation (25) can then be expressed as

$$ \mathbf{C}_{(x',y')} = \begin{bmatrix} \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q} \end{bmatrix} \begin{bmatrix} z^2\boldsymbol{\Lambda} + \sigma_n^2\mathbf{I} & g\,z^2\boldsymbol{\Lambda} \\ g\,z^2\boldsymbol{\Lambda} & g^2 z^2\boldsymbol{\Lambda} + (\sigma_v^2 + \sigma_n^2)\mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{Q}^T & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}^T \end{bmatrix}. \qquad (26) $$

Since Q is orthogonal and the matrix between the two orthogonal factors in (26) is composed of diagonal blocks, the determinant of C_(x',y') can be easily computed as

$$ |\mathbf{C}_{(x',y')}| = \prod_{j=1}^{N} \left[ (z^2\lambda_j + \sigma_n^2)(g^2 z^2\lambda_j + \sigma_v^2 + \sigma_n^2) - g^2 z^4 \lambda_j^2 \right]. \qquad (27) $$

Plugging this into (24) and simplifying the expression, we obtain

$$ w = \frac{1}{2} \log_2 \prod_{j=1}^{N} \left( 1 + \frac{\sigma_v^2}{\sigma_n^2} + \frac{(1+g^2)\,z^2\lambda_j}{\sigma_n^2} + \frac{z^2\lambda_j\,\sigma_v^2}{\sigma_n^4} \right). \qquad (28) $$
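The closed form (28) translates directly into code. The following sketch (ours, not the authors' implementation) computes the weight of one neighborhood from the eigenvalues of C_u and the estimated parameters:

```python
import numpy as np

def info_content_weight(C_u, z, g, sigma_v2, sigma_n2):
    """Information content weight w of one neighborhood, cf. (28).

    sigma_v2 and sigma_n2 are the variances sigma_v^2 and sigma_n^2,
    so sigma_n^4 = sigma_n2**2.
    """
    lam = np.linalg.eigvalsh(C_u)        # eigenvalues lambda_j of C_u
    terms = (1.0
             + sigma_v2 / sigma_n2
             + (1.0 + g ** 2) * z ** 2 * lam / sigma_n2
             + z ** 2 * lam * sigma_v2 / sigma_n2 ** 2)
    return 0.5 * np.sum(np.log2(terms))
```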
Although the derivation mentioned here is completely based upon evaluations of local information content, the resulting weight function (28) shows some interesting connections with the local distortion/quality-weighted pooling method described in Section I. In particular, based upon the distortion model (6), the variations from x to y are characterized by the gain factor g and the random distortion v. Since g is a scale factor along the signal direction, it does not cause structural changes of the signal. Therefore, the structural distortions are essentially captured by v. Note that the weight function (28) increases monotonically with σ_v^2. This implies that more weight is given to the regions with larger distortions, which is in line with the philosophy behind quality/distortion-weighted pooling.
To finish the computation in (28), we need to estimate a set of parameters, including C_u (and thus its eigenvalues λ_j), z, g and σ_v^2. As in [36], we estimate C_u using

$$ \hat{\mathbf{C}}_u = \frac{1}{M} \sum_{k=1}^{M} \mathbf{x}_k \mathbf{x}_k^T \qquad (29) $$

where M is the number of evaluation windows in the subband, and x_k is the kth neighborhood coefficient vector. This needs to be computed only once for each subband. The multiplier z is spatially varying and can be estimated using a maximum likelihood estimator [39]

$$ \hat{z}^2 = \frac{1}{N}\, \mathbf{x}^T \mathbf{C}_u^{-1} \mathbf{x}. \qquad (30) $$

Finally, the distortion parameters g and σ_v^2 can be obtained by least square regression that optimizes

$$ \hat{g} = \arg\min_g \|\mathbf{y} - g\,\mathbf{x}\|^2. \qquad (31) $$

Taking the derivative of the squared error function with respect to g and setting it equal to zero, we have

$$ \hat{g} = \frac{\mathbf{x}^T \mathbf{y}}{\mathbf{x}^T \mathbf{x}}. \qquad (32) $$

Substituting this into (6), we can estimate v using v̂ = y − ĝ x, which leads to

$$ \hat{\sigma}_v^2 = \frac{1}{N} (\mathbf{y} - \hat{g}\,\mathbf{x})^T (\mathbf{y} - \hat{g}\,\mathbf{x}). \qquad (33) $$
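Putting the estimators (29)-(33) together for one subband (a sketch under the stated model; the small constant guarding against division by zero is our own addition):

```python
import numpy as np

def estimate_parameters(X, Y, eps=1e-12):
    """X, Y: (M, N) arrays whose M rows are the neighborhood coefficient
    vectors x_k, y_k of the reference and distorted subbands."""
    M, N = X.shape
    C_u = X.T @ X / M                                              # (29)
    z2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C_u), X) / N    # (30): ML z^2 per window
    g = np.sum(X * Y, axis=1) / (np.sum(X * X, axis=1) + eps)      # (32): least squares gain
    sigma_v2 = np.sum((Y - g[:, None] * X) ** 2, axis=1) / N       # (33): residual noise variance
    return C_u, z2, g, sigma_v2
```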

Citations
Journal ArticleDOI
TL;DR: A novel feature similarity (FSIM) index for full reference IQA is proposed based on the fact that human visual system (HVS) understands an image mainly according to its low-level features.
Abstract: Image quality assessment (IQA) aims to use computational models to measure the image quality consistently with subjective evaluations. The well-known structural similarity index brings IQA from pixel- to structure-based stage. In this paper, a novel feature similarity (FSIM) index for full reference IQA is proposed based on the fact that human visual system (HVS) understands an image mainly according to its low-level features. Specifically, the phase congruency (PC), which is a dimensionless measure of the significance of a local structure, is used as the primary feature in FSIM. Considering that PC is contrast invariant while the contrast information does affect HVS' perception of image quality, the image gradient magnitude (GM) is employed as the secondary feature in FSIM. PC and GM play complementary roles in characterizing the image local quality. After obtaining the local quality map, we use PC again as a weighting function to derive a single quality score. Extensive experiments performed on six benchmark IQA databases demonstrate that FSIM can achieve much higher consistency with the subjective evaluations than state-of-the-art IQA metrics.

4,028 citations

Journal ArticleDOI
TL;DR: It is shown that the quality of the results improves significantly with better loss functions, even when the network architecture is left unchanged, and a novel, differentiable error function is proposed.
Abstract: Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems. The impact of the loss layer of neural networks, however, has not received much attention in the context of image processing: the default and virtually only choice is $\ell _2$ . In this paper, we bring attention to alternative choices for image restoration. In particular, we show the importance of perceptually-motivated losses when the resulting image is to be evaluated by a human observer. We compare the performance of several losses, and propose a novel, differentiable error function. We show that the quality of the results improves significantly with better loss functions, even when the network architecture is left unchanged.

1,758 citations


Cites methods from "Information Content Weighting for P..."

  • ...One of these is the Information Weigthed SSIM (IW-SSIM), a modification of MS-SSIM that also includes a weighting scheme proportional to the local image information [20]....

    [...]

Journal ArticleDOI
TL;DR: It is found that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images combined with a novel pooling strategy-the standard deviation of the GMS map-can predict accurately perceptual image quality.
Abstract: It is an important task to faithfully evaluate the perceptual quality of output images in many applications, such as image compression, image restoration, and multimedia streaming. A good image quality assessment (IQA) model should not only deliver high quality prediction accuracy, but also be computationally efficient. The efficiency of IQA metrics is becoming particularly important due to the increasing proliferation of high-volume visual data in high-speed networks. We present a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). The image gradients are sensitive to image distortions, while different local structures in a distorted image suffer different degrees of degradations. This motivates us to explore the use of global variation of gradient based local quality map for overall image quality prediction. We find that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images combined with a novel pooling strategy-the standard deviation of the GMS map-can predict accurately perceptual image quality. The resulting GMSD algorithm is much faster than most state-of-the-art IQA methods, and delivers highly competitive prediction accuracy. MATLAB source code of GMSD can be downloaded at http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.

1,211 citations


Cites methods from "Information Content Weighting for P..."

  • ...With the gradient magnitude images m_r and m_d in hand, the gradient magnitude similarity (GMS) map is computed as follows: GMS(i) = [2 m_r(i) m_d(i) + c] / [m_r(i)^2 + m_d(i)^2 + c] (4), where c is a positive constant that supplies numerical stability, L is the range of the image intensity....

    [...]

  • ...Let’s use some examples to analyze the GMS induced LQM....

    [...]

Journal ArticleDOI
TL;DR: This paper describes a recently created image database, TID2013, intended for evaluation of full-reference visual quality assessment metrics, and methodology for determining drawbacks of existing visual quality metrics is described.
Abstract: This paper describes a recently created image database, TID2013, intended for evaluation of full-reference visual quality assessment metrics. With respect to TID2008, the new database contains a larger number (3000) of test images obtained from 25 reference images, 24 types of distortions for each reference image, and 5 levels for each type of distortion. Motivations for introducing 7 new types of distortions and one additional level of distortions are given; examples of distorted images are presented. Mean opinion scores (MOS) for the new database have been collected by performing 985 subjective experiments with volunteers (observers) from five countries (Finland, France, Italy, Ukraine, and USA). The availability of MOS allows the use of the designed database as a fundamental tool for assessing the effectiveness of visual quality. Furthermore, existing visual quality metrics have been tested with the proposed database and the collected results have been analyzed using rank order correlation coefficients between MOS and considered metrics. These correlation indices have been obtained both considering the full set of distorted images and specific image subsets, for highlighting advantages and drawbacks of existing, state of the art, quality metrics. Approaches to thorough performance analysis for a given metric are presented to detect practical situations or distortion types for which this metric is not adequate enough to human perception. The created image database and the collected MOS values are freely available for downloading and utilization for scientific purposes. We have created a new large database. This database contains a larger number of distorted images and distortion types. MOS values for all images are obtained and provided. Analysis of correlation between MOS and a wide set of existing metrics is carried out. Methodology for determining drawbacks of existing visual quality metrics is described.

943 citations


Cites methods from "Information Content Weighting for P..."

  • ...Correspondence to HVS has been evaluated for the following metrics (quality indices): SFF [44], componentwise FSIM and its color version FSIMc [20], PSNR-HA and PSNR-HMA [43], SR-SIM [45], MSSIM [46], MAD index [27], IW-SSIM [19], MSDDM [47], IW-PSNR [19], color version of PSNR which takes into account color in a manner similar to PSNR-HA [43], VSNR [48], PSNR-HVS [49], PSNR-HVS-M [40], SSIM [9], NQM [50], DCTune [51], VIF and a pixel based version of VIF (VIFP) [52], UQI [53], WSNR [54], CWSSIM [55], XYZ [56], LINLAB [57], IFC [58], BMMF [59]....

    [...]

Proceedings ArticleDOI
23 Jun 2014
TL;DR: A Convolutional Neural Network is described to accurately predict image quality without a reference image to achieve state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments.
Abstract: In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using hand-crafted features that are employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.

942 citations


Cites methods from "Information Content Weighting for P..."

  • ...When reference images are available, Full Reference (FR) IQA methods [14, 22, 16, 17, 19] can be ap-...

    [...]

References
Book
01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.

45,034 citations

Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, which can be applied to both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu//spl sim/lcv/ssim/.

40,609 citations

Journal ArticleDOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

10,525 citations

01 Jan 1998
TL;DR: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented, which breaks down the complex problem of scene understanding by rapidly selecting conspicuous locations to be analyzed in detail.

8,566 citations

Journal ArticleDOI
TL;DR: A technique for image encoding in which local operators of many scales but identical shape serve as the basis functions, which tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
Abstract: We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a lowpass filtered copy of the image from the image itself. The result is a net data compression since the difference, or error, image has low variance and entropy, and the low-pass filtered image may represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.

6,975 citations