Journal ArticleDOI

Information Content Weighting for Perceptual Image Quality Assessment

01 May 2011-IEEE Transactions on Image Processing (IEEE Trans Image Process)-Vol. 20, Iss: 5, pp 1185-1198
TL;DR: This paper aims to test the hypothesis that when viewing natural images, the optimal perceptual weights for pooling should be proportional to local information content, which can be estimated in units of bit using advanced statistical models of natural images.
Abstract: Many state-of-the-art perceptual image quality assessment (IQA) algorithms share a common two-stage structure: local quality/distortion measurement followed by pooling. While significant progress has been made in measuring local image quality/distortion, the pooling stage is often done in ad-hoc ways, lacking theoretical principles and reliable computational models. This paper aims to test the hypothesis that when viewing natural images, the optimal perceptual weights for pooling should be proportional to local information content, which can be estimated in units of bit using advanced statistical models of natural images. Our extensive studies based upon six publicly-available subject-rated image databases concluded with three useful findings. First, information content weighting leads to consistent improvement in the performance of IQA algorithms. Second, surprisingly, with information content weighting, even the widely criticized peak signal-to-noise-ratio can be converted to a competitive perceptual quality measure when compared with state-of-the-art algorithms. Third, the best overall performance is achieved by combining information content weighting with multiscale structural similarity measures.

Summary (3 min read)

I. INTRODUCTION

  • In recent years, there has been an increasing interest in developing objective image quality assessment (IQA) methods that can automatically predict human behaviors in evaluating image quality [1]–[3].
  • Spatial domain methods such as the mean squared error (MSE) and the structural similarity (SSIM) index [4], [5] compute pixel- or patch-wise distortion/quality measures in space, while block-discrete cosine transform [6] and wavelet-based [7]–[11] approaches define localized quality/distortion measures across scale, space and orientation.
  • This is supported by a number of interesting recent studies [14]–[16], where it has been shown that sizable performance gain can be obtained by combining objective local quality measures with subjective human fixation or region-of-interest detection data.
  • The existing pooling approaches can be roughly categorized in the following ways.

• Local quality/distortion-based pooling

  • The intuitive idea that more emphasis should be put on high distortion regions can be implemented in a more straightforward way by local quality/distortion-based pooling.
  • This can be done by using a nonuniform weighting approach, where the weight may be determined by an error visibility detection map [17].
  • It may also be computed using the local quality/distortion measure itself [13], such that the overall quality/distortion measure is given by (2), where the weighting function w(m) is monotonically increasing when m is a distortion measure (i.e., a larger value indicates higher distortion), and monotonically decreasing when m is a quality measure (i.e., a larger value indicates higher quality).
  • Another method to assign more weight to low quality regions is to sort all local m_i values and use a small percentile of them that corresponds to the lowest quality regions.
  • Local quality/distortion-based pooling has been shown to be effective in improving IQA performance, as reported in [13], [19], though the implementations are often heuristic (for example, in the selection of the weighting function w(m) and the percentile), without theoretical guiding principles.

• Saliency-based pooling

  • Here the authors use "saliency" as a general term that represents low-level local image features that are of perceptual significance (as opposed to high-level components such as human faces).
  • The motivation behind saliency-based pooling approaches is that visual attention is attracted to distinctive saliency features and, thus, more importance should be given to the associated regions in the image.
  • This can range from simple features such as local variance [13] or contrast [20] to sophisticated computational models based upon automatic point of gaze predictions from low-level vision features [19] , [21] - [24] .
  • It has also been found that motion information is another useful feature to use in the pooling stage of video quality assessment algorithms [25] - [27] .

• Object-based pooling

  • Different from low-level vision based saliency approaches, object-based pooling methods resort to high-level cognitive vision based image understanding algorithms that help detect and/or segment significant regions from the image.
  • What are lacking are not heuristic tricks but general theoretical principles that are not only qualitatively sensible but also quantitatively manageable, so that reliable computational models for pooling can be derived.
  • In essence, their approach is saliency-based, but the resulting weighting function also has interesting connections with the quality/distortion-based pooling method, which the authors will discuss later in Section II.
  • Information theoretic methods are by no means new for IQA.
  • In fact, their work is inspired by the success of the visual information fidelity (VIF) method [34] , though VIF was not originally proposed for pooling purpose.

II. INFORMATION CONTENT WEIGHTING

  • The computation of image information content relies on good statistical image models.
  • The remaining task is, thus, the statistical modeling of groups of neighboring pixels (or coefficients).
  • To simplify the computation, the authors assume that the mixing multiplier z takes a fixed value at each location (but varies over space and scale).
  • This was demonstrated empirically in [34] using an image synthesis approach, where images under different types of distortions were compared with synthesized distortion images using the local attenuation/noise model.
  • As a result, the mutual information evaluations I(x; x'), I(y; y') and I(x'; y') can be calculated based upon the determinants of the covariance matrices [41], as given in (13)-(18); (16) can be simplified using the fact that E[x x'^T] = C_x (19), where E[·] is the expectation operator and the authors have used the fact that x and n are independent.

A. Information Content Weighted PSNR

  • Let x_i and y_i be the ith pixel in the original image x and the distorted image y, respectively.
  • The MSE and PSNR between the two images are given by MSE = (1/N) Σ_i (x_i − y_i)^2 (34) and PSNR = 10 log_10(L^2/MSE) (35), where N is the total number of pixels in the image and L is the maximum dynamic range.
  • Here the authors define an information content weighted MSE (IW-MSE) and an information content weighted PSNR (IW-PSNR) measure by incorporating the Laplacian pyramid transform [40] domain information content weights computed as in (28); a sketch is given below.
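As a minimal sketch (ours, not the authors' implementation), IW-MSE/IW-PSNR can be computed from precomputed pyramid subbands and weight maps; the cross-scale combination rule below (a weighted product with hypothetical exponents `betas`) is an assumption of this sketch.

```python
import numpy as np

def iw_psnr(x_subbands, y_subbands, weight_maps, betas, L=255.0):
    """IW-PSNR sketch.

    x_subbands, y_subbands: per-scale Laplacian pyramid subbands of the
    reference and distorted images; weight_maps: information content
    weight maps (cf. (28)) per scale; betas: assumed per-scale
    exponents; L: maximum dynamic range.
    """
    iw_mse = 1.0
    for xs, ys, w, b in zip(x_subbands, y_subbands, weight_maps, betas):
        scale_mse = np.sum(w * (xs - ys) ** 2) / np.sum(w)  # info-weighted MSE at one scale
        iw_mse *= scale_mse ** b                            # assumed cross-scale product
    return 10.0 * np.log10(L ** 2 / iw_mse)
```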

B. Information Content Weighted MultiScale SSIM

  • The basic spatial domain SSIM algorithm [5] is based upon separate comparisons of local luminance, contrast and structure between an original and a distorted image.
  • Here, μ, σ and σ_xy represent the mean, standard deviation and cross-correlation evaluations, respectively.
  • It has been found that the performance of the single-scale SSIM algorithm depends upon the scale at which it is applied [42], [43].
  • Interestingly, the measured weight function peaks at middle-resolution scales and drops at both low- and high-resolution scales, consistent with the contrast sensitivity function extensively studied in the vision literature [12].
  • The final overall IW-SSIM measure is then computed as a weighted product across scales (47), using the same set of scale weights as in MS-SSIM; a sketch is given below.
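A pooling-only sketch of IW-SSIM (ours, assuming the per-scale SSIM maps and information content weight maps have already been computed; the exponents play the role of the MS-SSIM scale weights):

```python
import numpy as np

def iw_ssim(ssim_maps, weight_maps, scale_weights):
    """Information content weighted pooling across scales, cf. (47)."""
    score = 1.0
    for s, w, beta in zip(ssim_maps, weight_maps, scale_weights):
        pooled = np.sum(w * s) / np.sum(w)   # info-weighted mean at one scale
        score *= pooled ** beta              # MS-SSIM-style scale weighting
    return score
```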

C. Interpretation of VIF Based Upon Information Content Weighting

  • Based upon the interpretation in its original publication, the VIF algorithm [34] does not seem to fit into the two-stage framework shown in Fig. 1 , because the information content is summed over the entire image space before the fidelity ratio is computed VIF (48).
  • Here the authors show that with some simple transformations, VIF indeed can be nicely interpreted using the same two-stage framework.
  • Specifically, the authors can write VIF VIF (49) where they have defined a local VIF measure (which follows the same philosophy as the general VIF concept [34] ) EQUATION ) and a weighting function (51) Interestingly, this weight definition is essentially an information content measure, although different from what they use in their approach [as in (12) ].
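The algebraic identity behind (49)-(51) is easy to verify numerically: given local mutual information values, the global fidelity ratio equals a weighted average of local ratios (a sketch; the inputs are assumed precomputed).

```python
import numpy as np

def vif_two_stage(I_xy, I_xx):
    """I_xy[i] = I(x_i; y'_i), I_xx[i] = I(x_i; x'_i) per location.
    Returns VIF written in two-stage (local measure + pooling) form."""
    I_xy = np.asarray(I_xy, dtype=float)
    I_xx = np.asarray(I_xx, dtype=float)
    local_vif = I_xy / I_xx                    # (50): local VIF measure
    w = I_xx                                   # (51): information content weight
    return np.sum(w * local_vif) / np.sum(w)   # (49): equals I_xy.sum()/I_xx.sum()
```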


Information Content Weighting for Perceptual
Image Quality Assessment
Zhou Wang, Member, IEEE, and Qiang Li, Member, IEEE
Abstract—Many state-of-the-art perceptual image quality as-
sessment (IQA) algorithms share a common two-stage structure:
local quality/distortion measurement followed by pooling. While
significant progress has been made in measuring local image
quality/distortion, the pooling stage is often done in ad-hoc ways,
lacking theoretical principles and reliable computational models.
This paper aims to test the hypothesis that when viewing natural
images, the optimal perceptual weights for pooling should be
proportional to local information content, which can be estimated
in units of bit using advanced statistical models of natural images.
Our extensive studies based upon six publicly-available sub-
ject-rated image databases concluded with three useful findings.
First, information content weighting leads to consistent improve-
ment in the performance of IQA algorithms. Second, surprisingly,
with information content weighting, even the widely criticized
peak signal-to-noise-ratio can be converted to a competitive
perceptual quality measure when compared with state-of-the-art
algorithms. Third, the best overall performance is achieved by
combining information content weighting with multiscale struc-
tural similarity measures.
Index Terms—Gaussian scale mixture (GSM), image quality
assessment (IQA), pooling, information content measure, peak
signal-to-noise-ratio (PSNR), structural similarity (SSIM), statis-
tical image modeling.
I. INTRODUCTION
IN RECENT years, there has been an increasing interest
in developing objective image quality assessment (IQA)
methods that can automatically predict human behaviors in
evaluating image quality [1]–[3]. Such perceptual IQA mea-
sures have broad applications in the evaluation, control, design
and optimization of image acquisition, communication, pro-
cessing and display systems. Depending upon the availability
of a “perfect quality” reference image, they may be classified
into full-reference (FR, where the reference image is fully
accessible when evaluating the distorted image), reduced-refer-
ence (RR, where only partial information about the reference
Manuscript received January 21, 2010; revised June 07, 2010 and
September 06, 2010; accepted November 04, 2010. Date of publication
November 15, 2010; date of current version April 15, 2011. This work was
supported in part by Natural Sciences and Engineering Research Council of
Canada in the forms of Discovery, Strategic and Collaborative Research and
Development (CRD) Grants, and in part by an Ontario Early Researcher
Award. The associate editor coordinating the review of this manuscript and
approving it for publication was Dr. Alex C. Kot.
Z. Wang is with Department of Electrical and Computer Engineering, Uni-
versity of Waterloo, Waterloo, ON, N2L 3G1, Canada (e-mail: zhouwang@
ieee.org).
Q. Li is with Media Excel Inc., Austin, TX, 78759 USA.
Digital Object Identifier 10.1109/TIP.2010.2092435
image is available) and no-reference (NR, where no access to
the reference image is allowed) algorithms [3].
Many state-of-the-art IQA measures (especially FR algo-
rithms) adopted a common two-stage structure, as illustrated in
Fig. 1. In the first stage, image quality/distortion is evaluated
locally, where the locality may be defined in space, scale
(or spatial frequency) and orientation. For example, spatial
domain methods such as the mean squared error (MSE) and
the structural similarity (SSIM) index [4], [5] compute pixel-
or patch-wise distortion/quality measures in space, while
block-discrete cosine transform [6] and wavelet-based [7]–[11]
approaches define localized quality/distortion measures across
scale, space and orientation. Such localized measurement
approaches are consistent with our current understanding about
the human visual system (HVS), where it has been found that
the responses of many neurons in the primary visual cortex are
highly tuned to the stimuli that are “narrow-band” in frequency,
space and orientation [12]. The local measurement process
typically results in a quality/distortion map defined either in
the spatial domain or in the transform domain (e.g., wavelet
subbands). A spatial domain example is shown in Fig. 2. To
assess the quality of a JPEG compressed image (b) given a
reference image (a), two local quality/distortion measures,
absolute error and the SSIM index, were computed, resulting in an
absolute error map (c) and an SSIM map (d). Careful inspection
shows that the SSIM index better reflects the spatial variations
of perceived image quality. For example, the blockiness in the
sky is clearly indicated in Fig. 2(d) but not in Fig. 2(c). To
convert such quality/distortion maps into a single quality score,
a pooling algorithm is employed in the second stage of the IQA
algorithm.
In the literature, significant progress has been made in the de-
sign of the first stage, i.e., local quality measurement [1]–[3], but
much less is understood about the pooling stage. The potential
of spatial pooling has been demonstrated by experimenting with
different pooling strategies [13] or optimizing spatially varying
weights to maximize the correlation between objective and sub-
jective image quality ratings [14]. A common hypothesis un-
derlying nearly all existing schemes is that the pooling strategy
should be correlated with human visual fixation or visual re-
gion-of-interest detection. This is supported by a number of in-
teresting recent studies [14]–[16], where it has been shown that
sizable performance gain can be obtained by combining objec-
tive local quality measures with subjective human fixation or
region-of-interest detection data. In practice, however, the sub-
jective data is not available, and the pooling stage is often done
in simplistic or ad-hoc ways, lacking theoretical principles as
the basis for the development of reliable computational models.

The existing pooling approaches can be roughly categorized in
the following ways.
Minkowski pooling
Let m_i be the local quality/distortion value at the ith location in the quality/distortion map. The Minkowski summation is given by

$$ M = \left[ \frac{1}{N} \sum_{i=1}^{N} m_i^p \right]^{1/p} \qquad (1) $$

where N is the total number of samples in the map, and p is the Minkowski exponent. To give a specific example, let m_i represent the absolute error as in Fig. 2(c); then (1) is directly related to the l_p norm (subject to a monotonic nonlinearity). As special cases, p = 1 corresponds to the mean absolute error (MAE), and p = 2 to the MSE. As p increases, more emphasis is shifted to the high distortion regions. Intuitively, this makes sense because when most of the distortion in an image is concentrated in a small region, humans tend to pay more attention to this low quality region and give an overall quality score lower than the direct average of the quality map [13]. In the extreme case p → ∞, (1) converges to max_i {m_i}, i.e., the measure is completely determined by the highest distortion point. In practice, the value of p typically ranges from 1 to 4 [5]–[10]. In [13], it was shown that Minkowski pooling can help improve the performance of IQA algorithms, but the best p value depends upon the underlying local metric and there is no simple method to derive it.
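As a concrete illustration (ours, not from the paper), a NumPy sketch of Minkowski pooling as reconstructed in (1); the exponent p is the only tuning parameter.

```python
import numpy as np

def minkowski_pooling(m, p):
    """Minkowski pooling of a local quality/distortion map m, cf. (1).

    p = 1 yields the mean absolute value (MAE for an error map), p = 2
    relates to the MSE, and large p is increasingly dominated by the
    highest-distortion locations.
    """
    m = np.abs(np.asarray(m, dtype=float))
    return np.mean(m ** p) ** (1.0 / p)
```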
Local quality/distortion-based pooling
The intuitive idea that more emphasis should be put on high distortion regions can be implemented in a more straightforward way by local quality/distortion-based pooling. This can be done by using a nonuniform weighting approach, where the weight may be determined by an error visibility detection map [17]. It may also be computed using the local quality/distortion measure itself [13], such that the overall quality/distortion measure is given by

$$ M = \frac{\sum_{i=1}^{N} w(m_i)\, m_i}{\sum_{i=1}^{N} w(m_i)} \qquad (2) $$

where the weighting function w(m) is monotonically increasing when m is a distortion measure (i.e., a larger value indicates higher distortion), and monotonically decreasing when m is a quality measure (i.e., a larger value indicates higher quality). Another method to assign more weight to low quality regions is to sort all m_i values and use a small percentile of them that corresponds to the lowest quality regions. For example, in [18] and [19], the worst 5% or 6% distortion values were employed in computing the overall quality scores. Local quality/distortion-based pooling has been shown to be effective in improving IQA performance, as reported in [13], [19], though the implementations are often heuristic (for example, in the selection of the weighting function w(m) and the percentile), without theoretical guiding principles.
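Both variants described above can be sketched in a few lines of NumPy; the default weighting function and the percentile value below are illustrative choices, not the paper's prescriptions.

```python
import numpy as np

def distortion_weighted_pooling(m, weight_fn=lambda m: 1.0 + m):
    """Pooling as in (2). The default weight_fn is a hypothetical
    increasing function, appropriate when m is a distortion map; a
    decreasing function should be used for quality maps."""
    m = np.asarray(m, dtype=float)
    w = weight_fn(m)
    return np.sum(w * m) / np.sum(w)

def worst_percentile_pooling(quality_map, fraction=0.06):
    """Average only the worst `fraction` of local quality values
    (cf. the 5%-6% rules of [18], [19]); assumes larger values
    indicate higher quality."""
    q = np.sort(np.asarray(quality_map, dtype=float).ravel())
    k = max(1, int(round(fraction * q.size)))
    return q[:k].mean()
```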
Fig. 1. Two-stage structure of IQA systems.
Saliency-based pooling
Here we use "saliency" as a general term that represents low-level local image features that are of perceptual significance (as opposed to high-level components such as human faces). The motivation behind saliency-based pooling approaches is that visual attention is attracted to distinctive saliency features and, thus, more importance should be given to the associated regions in the image. A saliency map {s_i}, created by computing saliency at each image location, can be used as a visual attention predictor, as well as a weighting function for IQA pooling as follows:

$$ M = \frac{\sum_{i=1}^{N} s_i\, m_i}{\sum_{i=1}^{N} s_i} \qquad (3) $$

Given an infinite number of possible saliency features, the question is what saliency should be used to create {s_i}. This can range from simple features such as local variance [13] or contrast [20] to sophisticated computational models based upon automatic point of gaze predictions from low-level vision features [19], [21]–[24]. It has also been found that motion information is another useful feature to use in the pooling stage of video quality assessment algorithms [25]–[27].
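As one minimal example of (3) (ours), the sketch below uses the local variance of the reference image, one of the simple saliency features mentioned above (cf. [13]), as the weight map; the window size and stabilizer are arbitrary assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def variance_saliency_pooling(quality_map, ref_image, win=11, eps=1e-8):
    """Saliency-weighted pooling as in (3), with local variance of the
    reference image serving as the saliency map s_i."""
    img = np.asarray(ref_image, dtype=float)
    mu = uniform_filter(img, size=win)
    var = np.maximum(uniform_filter(img ** 2, size=win) - mu ** 2, 0.0)
    s = var + eps                                 # local-variance saliency weights
    return np.sum(s * quality_map) / np.sum(s)
```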
Object-based pooling
Different from low-level vision based saliency approaches, object-based pooling methods resort to high-level cognitive vision based image understanding algorithms that help detect and/or segment significant regions from the image. A similar weighting approach as in (3) may be employed, except that the weight map {s_i} is generated from object detection or segmentation algorithms. More weight can be assigned to segmented foreground objects [28] or to human faces [26], [29]–[31]. Although object-based weighting has demonstrated improved performance for specific scenarios (e.g., when the image contains distinguishable human faces), it may not be easily applied to general situations, where it is not always an easy task to find distinctive objects that attract visual attention.
In summary, all of the previous pooling strategies are well motivated and have achieved certain levels of success. Combinations of different strategies have also been shown to be a useful approach [19], [25], [26], [31]. However, the existing pooling algorithms tend to be ad-hoc, and model parameters are often set by experimenting with subject-rated image databases. What are lacking are not heuristic tricks but general theoretical principles that are not only qualitatively sensible but also quantitatively manageable, so that reliable computational models for pooling can be derived.
In this research, we look at the IQA pooling problem from
an information theoretic point of view. The general belief is
that the HVS is an optimal information extractor, as widely

Fig. 2. (a) Original image. (b) Distorted image (by JPEG compression). (c) Absolute error map—brighter indicates better quality (smaller absolute difference).
(d) SSIM index map—brighter indicates better quality (larger SSIM value).
hypothesized in computational vision science [32]. To achieve
such optimality, the image components that contain more infor-
mation content would attract more visual attention [33]. Using
statistical information theory, the local information content can
be quantified in units of bit, provided that a statistical image
model is available. The local information content measure can
then be employed for IQA weighting. In essence, our approach
is saliency-based, but the resulting weighting function also has
interesting connections with the quality/distortion-based pooling
method, which we will discuss later in Section II. Information
theoretic methods are by no means new for IQA. In fact,
our work is inspired by the success of the visual information
fidelity (VIF) method [34], though VIF was not originally
proposed for pooling purpose. In [27], based upon statistical
models of Bayesian motion perception [35], motion informa-
tion content and perceptual uncertainty were computed for
video quality assessment. In our preliminary work [13], simple
local information-based weighting demonstrated promising
results for improving IQA performance. In this paper, we build
our information content weighting method upon advanced
statistical image models and combine it with multiscale IQA
methods. This results in superior performance in our extensive
tests using six independent databases, which in turn, provides
strong support of our general hypothesis.
II. INFORMATION CONTENT WEIGHTING
The computation of image information content relies on
good statistical image models. In [13], a rather crude spatial

domain local Gaussian model is assumed for spatial pooling
of IQA. Inspired by several recent successful approaches in
image denoising [36] and IQA [34], [37], [38], here we adopt
the Gaussian scale mixture (GSM) model for natural images.
As in many other image models, to reduce the high dimen-
sionality of natural images, a Markov assumption is made that
the probability density of a pixel (or a transform coefficient)
is fully determined by the pixels (coefficients) within a spatial
(and/or scale) neighborhood. The remaining task is, thus,
the statistical modeling of groups of neighboring pixels (or
coefficients). GSM has been found to be a powerful model for this
purpose [39], where the neighborhood is typically composed
of a set of neighboring coefficients in a multiresolution image
transform domain. It has been shown that the GSM framework
can be easily adapted to account for the marginal statistics of
multiresolution transform coefficients of natural images, where
the density exhibits strong non-Gaussianity, with sharp peak
at zero and heavy tails [32]. Meanwhile, GSM is also effective
in describing the amplitude-dependency between neighboring
coefficients [39].
Let x be a length-N column vector that contains a group of N neighboring transform coefficients (e.g., wavelet or Laplacian pyramid transform [40] coefficients). We model it as a GSM, which can be expressed as a product of two independent components

$$ \mathbf{x} = z\,\mathbf{u} \qquad (4) $$

where u is a zero-mean Gaussian vector with covariance matrix C_u, and z is called a mixing multiplier. The general form of GSM allows z to be a random variable that has a certain distribution in a continuous scale. To simplify the computation, we assume that z only takes a fixed value at each location (but varies over space and scale). The benefit of this simplification is that when z is fixed and given, x is simply a zero-mean Gaussian vector with covariance

$$ \mathbf{C}_x = z^2\,\mathbf{C}_u. \qquad (5) $$
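As a numerical sanity check on the model (ours, not part of the paper), one can draw GSM samples and verify (5) empirically; the covariance C_u below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 9                                   # neighborhood size (e.g., a 3x3 window)
A = rng.standard_normal((N, N))
C_u = A @ A.T / N                       # an arbitrary example covariance matrix

z = 1.5                                 # fixed mixing multiplier, as assumed above
u = rng.multivariate_normal(np.zeros(N), C_u, size=100_000)
x = z * u                               # GSM samples, as in (4)

C_x_empirical = x.T @ x / x.shape[0]
print(np.allclose(C_x_empirical, z**2 * C_u, rtol=0.05, atol=0.05))  # ~ (5)
```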
An important concept that we learned from the information
theoretical IQA approaches [34], [37] is that the information
contained in an image is not equated with the amount of in-
formation perceived by the visual system. The mutual informa-
tion between the images before and after the visual perceptual
channel provides a more useful measure. Following this idea,
we propose a model to compute perceptual information content,
which is illustrated in Fig. 3. First, the reference signal x passes through a distortion channel, resulting in a distorted signal

$$ \mathbf{y} = g\,\mathbf{x} + \mathbf{v} \qquad (6) $$

where the distortion is modeled based upon a gain factor g followed by additive independent Gaussian noise contamination v with covariance C_v = σ_v^2 I (where I represents the identity matrix). Although this model seems to be overly simplistic in capturing all potential types of distortions, such as the blocking and ringing artifacts that often appear in compressed images, it was claimed to achieve a reasonable balance in terms of the level of perceptual annoyance across distortion types [34]. This was demonstrated empirically in [34] using an image synthesis approach, where images under different types of distortions were compared with synthesized distortion images using the local attenuation/noise model. Although the real and synthesized distorted images look different in terms of the types of artifacts, the synthesized images reproduced more reasonably balanced perceptual annoyance than an additive noise-only distortion model [34]. Stronger and more theoretical justifications of this distortion model are still yet to be discovered.

Fig. 3. Diagram for computing information content.
Next, both the reference and distorted signals pass through a perceptual visual noise channel

$$ \mathbf{x}' = \mathbf{x} + \mathbf{n} \qquad (7) $$

$$ \mathbf{y}' = \mathbf{y} + \mathbf{n}' \qquad (8) $$

where n and n' are assumed to be independent white Gaussian noise with diagonal covariance C_n = C_n' = σ_n^2 I. This simple one-parameter (σ_n^2) visual distortion model aims to capture the lumped uncertainty of the visual system [34]. Similar to (5), we can then compute the covariance matrices of y, x' and y' as

$$ \mathbf{C}_y = g^2 z^2\,\mathbf{C}_u + \sigma_v^2\,\mathbf{I} \qquad (9) $$

$$ \mathbf{C}_{x'} = z^2\,\mathbf{C}_u + \sigma_n^2\,\mathbf{I} \qquad (10) $$

$$ \mathbf{C}_{y'} = g^2 z^2\,\mathbf{C}_u + (\sigma_v^2 + \sigma_n^2)\,\mathbf{I} \qquad (11) $$
Since all the computation in the rest of this section assumes a fixed and known multiplier z, for notational convenience, we drop the conditional notation |z in all the derivations. Based upon the approach given in [34], at each location, the information of the original and distorted images perceived by the visual system can be computed by the mutual information I(x; x') and I(y; y'), respectively. Here we move one step further to estimate the total perceptual information content from both images. More specifically, we compute the sum of I(x; x') and I(y; y') minus the common information shared between x' and y'. This results in a total information content weight measure given by

$$ w = I(\mathbf{x}; \mathbf{x}') + I(\mathbf{y}; \mathbf{y}') - I(\mathbf{x}'; \mathbf{y}'). \qquad (12) $$

To compute (12), it is useful to be aware that x, y, x' and y' are all Gaussian for a given fixed z. As a result, the mutual information evaluations I(x; x'), I(y; y') and I(x'; y') can be calculated based upon the determinants of the covariances [41] by

$$ I(\mathbf{x}; \mathbf{x}') = \frac{1}{2} \log_2 \frac{|\mathbf{C}_x|\,|\mathbf{C}_{x'}|}{|\mathbf{C}_{(x,x')}|} \qquad (13) $$

$$ I(\mathbf{y}; \mathbf{y}') = \frac{1}{2} \log_2 \frac{|\mathbf{C}_y|\,|\mathbf{C}_{y'}|}{|\mathbf{C}_{(y,y')}|} \qquad (14) $$

$$ I(\mathbf{x}'; \mathbf{y}') = \frac{1}{2} \log_2 \frac{|\mathbf{C}_{x'}|\,|\mathbf{C}_{y'}|}{|\mathbf{C}_{(x',y')}|} \qquad (15) $$

where

$$ \mathbf{C}_{(x,x')} = \begin{bmatrix} \mathbf{C}_x & E[\mathbf{x}\mathbf{x}'^T] \\ E[\mathbf{x}'\mathbf{x}^T] & \mathbf{C}_{x'} \end{bmatrix} \qquad (16) $$

$$ \mathbf{C}_{(y,y')} = \begin{bmatrix} \mathbf{C}_y & E[\mathbf{y}\mathbf{y}'^T] \\ E[\mathbf{y}'\mathbf{y}^T] & \mathbf{C}_{y'} \end{bmatrix} \qquad (17) $$

$$ \mathbf{C}_{(x',y')} = \begin{bmatrix} \mathbf{C}_{x'} & E[\mathbf{x}'\mathbf{y}'^T] \\ E[\mathbf{y}'\mathbf{x}'^T] & \mathbf{C}_{y'} \end{bmatrix}. \qquad (18) $$

Equation (16) can be simplified based upon the fact that

$$ E[\mathbf{x}\mathbf{x}'^T] = E[\mathbf{x}(\mathbf{x}+\mathbf{n})^T] = \mathbf{C}_x \qquad (19) $$

where E[·] is the expectation operator and we have used the fact that x and n are independent. This leads to

$$ |\mathbf{C}_{(x,x')}| = |\mathbf{C}_x|\,|\mathbf{C}_{x'} - \mathbf{C}_x| = |\mathbf{C}_x|\,\sigma_n^{2N}. \qquad (20) $$

Similarly, we can derive

$$ E[\mathbf{y}\mathbf{y}'^T] = \mathbf{C}_y \qquad (21) $$

$$ E[\mathbf{x}'\mathbf{y}'^T] = g\,z^2\,\mathbf{C}_u \qquad (22) $$

and

$$ |\mathbf{C}_{(y,y')}| = |\mathbf{C}_y|\,\sigma_n^{2N}. \qquad (23) $$

Combining (12), (13), (14), (15), (20), and (23), we can simplify our information content weight computation to the following expression:

$$ w = \frac{1}{2} \log_2 \frac{|\mathbf{C}_{(x',y')}|}{\sigma_n^{4N}}. \qquad (24) $$

Plugging (22), (10), and (11) into (18), we have

$$ \mathbf{C}_{(x',y')} = \begin{bmatrix} z^2\,\mathbf{C}_u + \sigma_n^2\,\mathbf{I} & g\,z^2\,\mathbf{C}_u \\ g\,z^2\,\mathbf{C}_u & g^2 z^2\,\mathbf{C}_u + (\sigma_v^2 + \sigma_n^2)\,\mathbf{I} \end{bmatrix}. \qquad (25) $$
To compute the determinant of C_(x',y'), it is useful to apply an eigenvalue decomposition to the covariance matrix C_u = Q Λ Q^T, where Q is an orthogonal matrix, and Λ is a diagonal matrix with eigenvalues λ_j, for j = 1, ..., N, along its diagonal entries. Equation (25) can then be expressed as

$$ \mathbf{C}_{(x',y')} = \begin{bmatrix} \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q} \end{bmatrix} \begin{bmatrix} z^2\boldsymbol{\Lambda} + \sigma_n^2\mathbf{I} & g\,z^2\boldsymbol{\Lambda} \\ g\,z^2\boldsymbol{\Lambda} & g^2 z^2\boldsymbol{\Lambda} + (\sigma_v^2 + \sigma_n^2)\mathbf{I} \end{bmatrix} \begin{bmatrix} \mathbf{Q}^T & \mathbf{0} \\ \mathbf{0} & \mathbf{Q}^T \end{bmatrix}. \qquad (26) $$

Since Q is orthogonal and the matrix between the two orthogonal factors in (26) is composed of diagonal blocks, the determinant of C_(x',y') can be easily computed as

$$ |\mathbf{C}_{(x',y')}| = \prod_{j=1}^{N} \left[ (z^2\lambda_j + \sigma_n^2)(g^2 z^2\lambda_j + \sigma_v^2 + \sigma_n^2) - g^2 z^4 \lambda_j^2 \right]. \qquad (27) $$

Plugging this into (24) and simplifying the expression, we obtain

$$ w = \frac{1}{2} \log_2 \prod_{j=1}^{N} \left( 1 + \frac{\sigma_v^2}{\sigma_n^2} + \frac{(1+g^2)\,z^2\lambda_j}{\sigma_n^2} + \frac{z^2\lambda_j\,\sigma_v^2}{\sigma_n^4} \right). \qquad (28) $$
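The closed form (28) translates directly into code. The following sketch (ours, not the authors' implementation) computes the weight of one neighborhood from the eigenvalues of C_u and the estimated parameters:

```python
import numpy as np

def info_content_weight(C_u, z, g, sigma_v2, sigma_n2):
    """Information content weight w of one neighborhood, cf. (28).

    sigma_v2 and sigma_n2 are the variances sigma_v^2 and sigma_n^2,
    so sigma_n^4 = sigma_n2**2.
    """
    lam = np.linalg.eigvalsh(C_u)        # eigenvalues lambda_j of C_u
    terms = (1.0
             + sigma_v2 / sigma_n2
             + (1.0 + g ** 2) * z ** 2 * lam / sigma_n2
             + z ** 2 * lam * sigma_v2 / sigma_n2 ** 2)
    return 0.5 * np.sum(np.log2(terms))
```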
Although the derivation mentioned here is completely based upon evaluations of local information content, the resulting weight function (28) shows some interesting connections with the local distortion/quality-weighted pooling method described in Section I. In particular, based upon the distortion model (6), the variations from x to y are characterized by the gain factor g and the random distortion v. Since g is a scale factor along the signal direction, it does not cause structural changes of the signal. Therefore, the structural distortions are essentially captured by v. Note that the weight function (28) increases monotonically with σ_v^2. This implies that more weight is given to the regions with larger distortions, which is in line with the philosophy behind quality/distortion-weighted pooling.
To finish the computation in (28), we need to estimate a set of parameters, including C_u (and thus its eigenvalues λ_j), z, g and σ_v^2. As in [36], we estimate C_u using

$$ \hat{\mathbf{C}}_u = \frac{1}{M} \sum_{k=1}^{M} \mathbf{x}_k \mathbf{x}_k^T \qquad (29) $$

where M is the number of evaluation windows in the subband, and x_k is the kth neighborhood coefficient vector. This needs to be computed only once for each subband. The multiplier z is spatially varying and can be estimated using a maximum likelihood estimator [39]

$$ \hat{z}^2 = \frac{1}{N}\, \mathbf{x}^T \mathbf{C}_u^{-1} \mathbf{x}. \qquad (30) $$

Finally, the distortion parameters g and σ_v^2 can be obtained by least square regression that optimizes

$$ \hat{g} = \arg\min_g \|\mathbf{y} - g\,\mathbf{x}\|^2. \qquad (31) $$

Taking the derivative of the squared error function with respect to g and setting it equal to zero, we have

$$ \hat{g} = \frac{\mathbf{x}^T \mathbf{y}}{\mathbf{x}^T \mathbf{x}}. \qquad (32) $$

Substituting this into (6), we can estimate v using v̂ = y − ĝ x, which leads to

$$ \hat{\sigma}_v^2 = \frac{1}{N} (\mathbf{y} - \hat{g}\,\mathbf{x})^T (\mathbf{y} - \hat{g}\,\mathbf{x}). \qquad (33) $$
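Putting the estimators (29)-(33) together for one subband (a sketch under the stated model; the small constant guarding against division by zero is our own addition):

```python
import numpy as np

def estimate_parameters(X, Y, eps=1e-12):
    """X, Y: (M, N) arrays whose M rows are the neighborhood coefficient
    vectors x_k, y_k of the reference and distorted subbands."""
    M, N = X.shape
    C_u = X.T @ X / M                                              # (29)
    z2 = np.einsum('ij,jk,ik->i', X, np.linalg.inv(C_u), X) / N    # (30): ML z^2 per window
    g = np.sum(X * Y, axis=1) / (np.sum(X * X, axis=1) + eps)      # (32): least squares gain
    sigma_v2 = np.sum((Y - g[:, None] * X) ** 2, axis=1) / N       # (33): residual noise variance
    return C_u, z2, g, sigma_v2
```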

Citations
Journal ArticleDOI
TL;DR: A novel feature similarity (FSIM) index for full reference IQA is proposed based on the fact that human visual system (HVS) understands an image mainly according to its low-level features.
Abstract: Image quality assessment (IQA) aims to use computational models to measure the image quality consistently with subjective evaluations. The well-known structural similarity index brings IQA from pixel- to structure-based stage. In this paper, a novel feature similarity (FSIM) index for full reference IQA is proposed based on the fact that human visual system (HVS) understands an image mainly according to its low-level features. Specifically, the phase congruency (PC), which is a dimensionless measure of the significance of a local structure, is used as the primary feature in FSIM. Considering that PC is contrast invariant while the contrast information does affect HVS' perception of image quality, the image gradient magnitude (GM) is employed as the secondary feature in FSIM. PC and GM play complementary roles in characterizing the image local quality. After obtaining the local quality map, we use PC again as a weighting function to derive a single quality score. Extensive experiments performed on six benchmark IQA databases demonstrate that FSIM can achieve much higher consistency with the subjective evaluations than state-of-the-art IQA metrics.

4,028 citations

Journal ArticleDOI
TL;DR: It is shown that the quality of the results improves significantly with better loss functions, even when the network architecture is left unchanged, and a novel, differentiable error function is proposed.
Abstract: Neural networks are becoming central in several areas of computer vision and image processing and different architectures have been proposed to solve specific problems. The impact of the loss layer of neural networks, however, has not received much attention in the context of image processing: the default and virtually only choice is $\ell _2$ . In this paper, we bring attention to alternative choices for image restoration. In particular, we show the importance of perceptually-motivated losses when the resulting image is to be evaluated by a human observer. We compare the performance of several losses, and propose a novel, differentiable error function. We show that the quality of the results improves significantly with better loss functions, even when the network architecture is left unchanged.

1,758 citations


Cites methods from "Information Content Weighting for P..."

  • ...One of these is the Information Weigthed SSIM (IW-SSIM), a modification of MS-SSIM that also includes a weighting scheme proportional to the local image information [20]....

    [...]

Journal ArticleDOI
TL;DR: It is found that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images combined with a novel pooling strategy-the standard deviation of the GMS map-can predict accurately perceptual image quality.
Abstract: It is an important task to faithfully evaluate the perceptual quality of output images in many applications, such as image compression, image restoration, and multimedia streaming. A good image quality assessment (IQA) model should not only deliver high quality prediction accuracy, but also be computationally efficient. The efficiency of IQA metrics is becoming particularly important due to the increasing proliferation of high-volume visual data in high-speed networks. We present a new effective and efficient IQA model, called gradient magnitude similarity deviation (GMSD). The image gradients are sensitive to image distortions, while different local structures in a distorted image suffer different degrees of degradations. This motivates us to explore the use of global variation of gradient based local quality map for overall image quality prediction. We find that the pixel-wise gradient magnitude similarity (GMS) between the reference and distorted images combined with a novel pooling strategy-the standard deviation of the GMS map-can predict accurately perceptual image quality. The resulting GMSD algorithm is much faster than most state-of-the-art IQA methods, and delivers highly competitive prediction accuracy. MATLAB source code of GMSD can be downloaded at http://www4.comp.polyu.edu.hk/~cslzhang/IQA/GMSD/GMSD.htm.

1,211 citations


Cites methods from "Information Content Weighting for P..."

  • ...With the gradient magnitude images m_r and m_d in hand, the gradient magnitude similarity (GMS) map is computed as follows: GMS(i) = [2 m_r(i) m_d(i) + c] / [m_r(i)^2 + m_d(i)^2 + c] (4), where c is a positive constant that supplies numerical stability, L is the range of the image intensity....

    [...]

  • ...Let’s use some examples to analyze the GMS induced LQM....

    [...]

Journal ArticleDOI
TL;DR: This paper describes a recently created image database, TID2013, intended for evaluation of full-reference visual quality assessment metrics, and methodology for determining drawbacks of existing visual quality metrics is described.
Abstract: This paper describes a recently created image database, TID2013, intended for evaluation of full-reference visual quality assessment metrics. With respect to TID2008, the new database contains a larger number (3000) of test images obtained from 25 reference images, 24 types of distortions for each reference image, and 5 levels for each type of distortion. Motivations for introducing 7 new types of distortions and one additional level of distortions are given; examples of distorted images are presented. Mean opinion scores (MOS) for the new database have been collected by performing 985 subjective experiments with volunteers (observers) from five countries (Finland, France, Italy, Ukraine, and USA). The availability of MOS allows the use of the designed database as a fundamental tool for assessing the effectiveness of visual quality. Furthermore, existing visual quality metrics have been tested with the proposed database and the collected results have been analyzed using rank order correlation coefficients between MOS and considered metrics. These correlation indices have been obtained both considering the full set of distorted images and specific image subsets, for highlighting advantages and drawbacks of existing, state of the art, quality metrics. Approaches to thorough performance analysis for a given metric are presented to detect practical situations or distortion types for which this metric is not adequate enough to human perception. The created image database and the collected MOS values are freely available for downloading and utilization for scientific purposes. We have created a new large database. This database contains a larger number of distorted images and distortion types. MOS values for all images are obtained and provided. Analysis of correlation between MOS and a wide set of existing metrics is carried out. Methodology for determining drawbacks of existing visual quality metrics is described.

943 citations


Cites methods from "Information Content Weighting for P..."

  • ...Correspondence to HVS has been evaluated for the following metrics (quality indices): SFF [44], componentwise FSIM and its color version FSIMc [20], PSNR-HA and PSNR-HMA [43], SR-SIM [45], MSSIM [46], MAD index [27], IW-SSIM [19], MSDDM [47], IW-PSNR [19], color version of PSNR which takes into account color in a manner similar to PSNR-HA [43], VSNR [48], PSNR-HVS [49], PSNR-HVS-M [40], SSIM [9], NQM [50], DCTune [51], VIF and a pixel based version of VIF (VIFP) [52], UQI [53], WSNR [54], CWSSIM [55], XYZ [56], LINLAB [57], IFC [58], BMMF [59]....

    [...]

Proceedings ArticleDOI
23 Jun 2014
TL;DR: A Convolutional Neural Network is described to accurately predict image quality without a reference image to achieve state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments.
Abstract: In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using hand-crafted features that are employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.

942 citations


Cites methods from "Information Content Weighting for P..."

  • ...When reference images are available, Full Reference (FR) IQA methods [14, 22, 16, 17, 19] can be ap-...

    [...]

References
Book
01 Jan 1991
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.

45,034 citations

Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, which can be applied to both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu//spl sim/lcv/ssim/.

40,609 citations

Journal ArticleDOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.

10,525 citations

01 Jan 1998
TL;DR: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented, which breaks down the complex problem of scene understanding by rapidly selecting conspicuous locations to be analyzed in detail.

8,566 citations

Journal ArticleDOI
TL;DR: A technique for image encoding in which local operators of many scales but identical shape serve as the basis functions, which tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
Abstract: We describe a technique for image encoding in which local operators of many scales but identical shape serve as the basis functions. The representation differs from established techniques in that the code elements are localized in spatial frequency as well as in space. Pixel-to-pixel correlations are first removed by subtracting a lowpass filtered copy of the image from the image itself. The result is a net data compression since the difference, or error, image has low variance and entropy, and the low-pass filtered image may represented at reduced sample density. Further data compression is achieved by quantizing the difference image. These steps are then repeated to compress the low-pass image. Iteration of the process at appropriately expanded scales generates a pyramid data structure. The encoding process is equivalent to sampling the image with Laplacian operators of many scales. Thus, the code tends to enhance salient image features. A further advantage of the present code is that it is well suited for many image analysis tasks as well as for image compression. Fast algorithms are described for coding and decoding.

6,975 citations