What is the slope factor for an SSIM estimator?

More specifically, an RR-SSIM estimator can be written as $\hat{S} = 1 - \alpha D_n$ (6), where $\alpha$ is the slope factor that needs to be learned from training images.

What is the standard deviation of the DNT coefficients?

Let $\sigma_r$ and $\sigma_d$ be the vectors containing the standard deviation $\sigma$ values of the DNT coefficients from each subband in the reference and distorted images, respectively.

How can the authors reconstruct the distorted image?

The authors can then compute the wavelet coefficients using $\hat{z}_{inv}\,\nu_{repaired}$, followed by an inverse wavelet transform to construct the repaired image.

(Open Access) Reduced-reference SSIM estimation (2010) | Abdul Rehman

Q: What contributions have the authors mentioned in the paper "Reduced-reference SSIM estimation"?

Here the authors propose a reduced-reference approach that estimates SSIM with only partial information about the original image. Specifically, the authors extract statistical features from a multi-scale, multi-orientation divisive normalization transform and develop a distortion measure by following the philosophy analogous to that in the construction of SSIM. The authors use the LIVE database to test the proposed distortion measure, which shows strong correlations with both SSIM and subjective evaluations. The authors also demonstrate how their reduced-reference features may be employed to partially repair a distorted image.

Q: How many distortions were used in the training data?

Their training data included 29 reference images altered with 50 levels of distortions for five types of distortions, including Gaussian Blur, JPEG2000 compression, JPEG compression, fast fading channel distortion of JPEG2000 compressed bitstream and white Gaussian noise.

Q: How is the normalization applied to the SSIM?

Divisive normalization is then applied using 13 neighboring coefficients, including 9 spatial neighbors from the same subband, 1 from the parent subband, and 3 from the same spatial location in the other orientation bands at the same scale.

Q: How many features are extracted for each subband?

Three features, $\sigma_r$, $k_r$ and $d(p_m\|p)$, are extracted for each subband, resulting in a total of 36 RR features for a reference image.

Q: What is the way to estimate a distorted image?

By assuming independence between subbands, the subband-level distortion measure of Eq. (2) can be combined to provide an overall distortion assessment of the whole image [4].

Q: What is the scale of the distortion measure?

$D = \log\left(1 + \frac{1}{D_0}\sum_{k=1}^{K}\left|\hat{d}^k(p^k\|q^k)\right|\right)$, (3) where $K$ is the total number of subbands, $p^k$ and $q^k$ are the probability distributions of the $k$-th subband of the reference and distorted images, respectively, $\hat{d}^k$ represents the KLD between $p^k$ and $q^k$, and $D_0$ is a constant to control the scale of the distortion measure.

REDUCED-REFERENCE SSIM ESTIMATION

Abdul Rehman and Zhou Wang

Dept. of Electrical & Computer Engineering, University of Waterloo, Waterloo, ON, Canada

Email: a5rehman@uwaterloo.ca, zhouwang@ieee.org

ABSTRACT

The structural similarity (SSIM) index has been shown to be a good perceptual image quality predictor. In many real-world applications such as network visual communications, however, SSIM is not applicable because its computation requires full access to the original image. Here we propose a reduced-reference approach that estimates SSIM with only partial information about the original image. Specifically, we extract statistical features from a multi-scale, multi-orientation divisive normalization transform and develop a distortion measure by following the philosophy analogous to that in the construction of SSIM. We found an interesting linear relationship between our reduced-reference SSIM estimate and full-reference SSIM when the image distortion type is fixed. A regression-by-discretization method is then applied to normalize our measure between image distortion types. We use the LIVE database to test the proposed distortion measure, which shows strong correlations with both SSIM and subjective evaluations. We also demonstrate how our reduced-reference features may be employed to partially repair a distorted image.

Index Terms— reduced-reference image quality assessment,

structural similarity, natural image statistics, divisive normalization

transform, regression by discretization, image repairing

1. INTRODUCTION

Multimedia content delivered over networks suffers from various types of distortions on its way to the destination. It is highly desirable to measure the perceptual similarity of the received content with the original. The structural similarity (SSIM) index [1] was shown to correlate well with perceived image quality and has found a wide variety of applications, ranging from image coding, restoration and fusion, to watermarking and biometrics [2]. However, direct SSIM evaluation is not possible in practical visual communication applications, because it is a full-reference (FR) measure that requires the original image at the receiver [2]. On the other hand, the lack of knowledge of natural scene statistics and the human visual system (HVS) creates a great challenge for no-reference image quality assessment (NR-IQA), especially for the general-purpose case. Reduced-reference (RR) IQA, a compromise between FR and NR, is designed to employ only a set of RR features extracted from the reference image for quality evaluation of the distorted image at the receiver [2].

To the best of our knowledge, only a few schemes have been presented in the literature for general-purpose RR-IQA. In [3], the marginal distribution of wavelet subband coefficients is modeled using a generalized Gaussian density function, and the variations of marginal distributions are used to quantify image distortion. This led to an effective RR-IQA method with low RR data rate. This scheme was further improved in [4] by making use of a divisive normalization transform (DNT). An RR video SSIM metric was proposed in [5] for quantifying visual degradations caused by channel transmission error. It is based on local spatial statistical features and uses distributed source coding techniques to reduce the required bandwidth to transmit RR features, though the resulting RR data rate is still much higher than those in [3] and [4].

In this paper, we propose a new approach for the design of a low-data-rate general-purpose RR-IQA method. Instead of directly constructing an RR algorithm to predict subjective quality evaluations, we develop our method as an attempt to estimate SSIM. The benefits of this approach are twofold. First, the successful design principle in the construction of SSIM can be naturally incorporated into the development of our algorithm. Second, when the algorithm design involves a supervised machine learning stage, it is much easier to obtain training data, because SSIM can be readily computed, as opposed to the expensive and time-consuming subjective tests. Our experiments using the LIVE database [6] show that this is a useful approach, as the resulting RR-SSIM estimator exhibits good performance in predicting not only FR-SSIM, but also subjective scores. Moreover, we use a simple image deblurring example to show that the RR features employed in our approach can also be used to partially repair a distorted image.

2. RR SSIM ESTIMATION

The proposed RR-SSIM estimation algorithm starts from a feature extraction process of the reference image based on a multi-scale multi-orientation divisive normalization transform (DNT). Divisive normalization was found to be a simple but effective mechanism to account for many neuronal behaviors in biological perceptual systems [7]. In [7], a DNT is defined by using a Gaussian scale mixture (GSM) model of image wavelet coefficients. A vector $Y$ of length $N$ is a GSM if it can be represented as the product of two independent components: $Y \doteq zU$, where $z$ is a scalar random variable called the mixing multiplier, and $U$ is a zero-mean Gaussian distributed random vector with covariance $C_U$. It was found that the histogram of the normalized wavelet coefficient vector, $\nu = Y/\hat{z}$, can be modeled by a zero-mean Gaussian density function [7], where $\hat{z}$ is a local estimate of the multiplier $z$ using a maximum-likelihood estimator [7]:

$\hat{z} = \sqrt{Y^T C_U^{-1} Y / N}$. (1)
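As an illustration of the normalization step, the following minimal sketch computes a local multiplier estimate and one normalized coefficient. It assumes the maximum-likelihood estimate takes the form $\hat{z} = \sqrt{Y^T C_U^{-1} Y / N}$ and, purely to stay within the Python standard library, that $C_U$ is diagonal. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import math

def estimate_multiplier(y, cu_diag):
    """Return z_hat = sqrt(Y^T C_U^{-1} Y / N).

    y       : neighborhood vector Y of wavelet coefficients
    cu_diag : diagonal of C_U (assumption: a diagonal covariance,
              so the inverse is an element-wise division)
    """
    n = len(y)
    quad_form = sum(v * v / c for v, c in zip(y, cu_diag))
    return math.sqrt(quad_form / n)

def normalize_coefficient(center, y, cu_diag, eps=1e-12):
    """Divide the center coefficient by the local multiplier estimate."""
    return center / max(estimate_multiplier(y, cu_diag), eps)

# Hypothetical 4-coefficient neighborhood and covariance diagonal.
neighborhood = [0.5, -1.0, 2.0, 0.0]
cu_diag = [1.0, 1.0, 4.0, 1.0]
z_hat = estimate_multiplier(neighborhood, cu_diag)
nu = normalize_coefficient(2.0, neighborhood, cu_diag)
```

A full implementation would use the complete covariance matrix estimated from the subband neighborhoods rather than a diagonal.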

As a result, the DNT coefficient distribution of each subband is characterized by a single parameter $\sigma$, the standard deviation of the Gaussian distribution. This provides a very efficient summary of the reference image. In addition to $\sigma$, the Kullback-Leibler divergence (KLD) between the model Gaussian distribution, $p_m(x)$, and the true probability distribution of the DNT-domain coefficients, $p(x)$, denoted by $d(p_m\|p)$, is extracted as the second feature for each subband. The subband distortion of the distorted image can be evaluated by the KLD between the probability distribution of the original image, $p(x)$, and that of the distorted image, $q(x)$:

$\hat{d}(p\|q) = d(p_m\|q) - d(p_m\|p)$, (2)

where $d(p_m\|q)$ is the KLD between the model Gaussian distribution and the distribution computed from the distorted image. As demonstrated in [3, 4], different types of distortions affect the statistics of the reference image in a different manner, but are all summarized in Eq. (2) to a single distortion measure.

IEEE Inter. Conf. Image Processing, Hong Kong, China, Sept. 26-29, 2010.

By assuming independence between subbands, the subband-level distortion measure of Eq. (2) can be combined to provide an overall distortion assessment of the whole image [4]:

$D = \log\left(1 + \frac{1}{D_0}\sum_{k=1}^{K}\left|\hat{d}^k(p^k\|q^k)\right|\right)$, (3)

where $K$ is the total number of subbands, $p^k$ and $q^k$ are the probability distributions of the $k$-th subband of the reference and distorted images, respectively, $\hat{d}^k$ represents the KLD between $p^k$ and $q^k$, and $D_0$ is a constant to control the scale of the distortion measure.
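The subband and image-level measures of Eqs. (2) and (3) can be sketched as follows. This is a rough illustration under stated assumptions: distributions are represented as normalized histograms over shared bins, the logarithm is taken as natural, and the default value of `d0` is a placeholder, since the text gives no value for $D_0$. All function names are hypothetical.

```python
import math

def kld(p, q, eps=1e-12):
    """Discrete KLD d(p||q) = sum_i p_i * log(p_i / q_i) over histogram bins.

    eps guards against empty bins (an implementation choice of this sketch).
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def subband_distortion(p_model, q_distorted, d_model_ref):
    """Eq. (2): d_hat(p||q) = d(p_m||q) - d(p_m||p).

    d_model_ref is the RR feature d(p_m||p) sent along with the reference.
    """
    return kld(p_model, q_distorted) - d_model_ref

def overall_distortion(d_hats, d0=0.1):
    """Eq. (3): D = log(1 + (1/D0) * sum_k |d_hat_k|); d0 is a placeholder."""
    return math.log(1.0 + sum(abs(d) for d in d_hats) / d0)

# Hypothetical 3-bin histograms for a single subband.
p_model = [0.25, 0.50, 0.25]
q_distorted = [0.30, 0.40, 0.30]
d_hat = subband_distortion(p_model, q_distorted, d_model_ref=0.0)
D = overall_distortion([d_hat])
```

Only the model parameters and the scalar $d(p_m\|p)$ need to be transmitted per subband; the receiver computes $d(p_m\|q)$ from the distorted image itself.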

Fig. 1. Equal-distortion contours with respect to the central reference vectors. (a) MSE measure; (b) SSIM measure.

The limitation of the measure in Eq. (3) is that it does not take into account the relationship (or structures) between the distortions across different subbands. Such distortion structure is a critical issue behind the philosophy of the SSIM approach [1], which attempts to distinguish structural and non-structural distortions. Figure 1 provides a graphical explanation in the vector space of image components, where the image components can be pixels, wavelet coefficients, or extracted features from the reference image. For the purpose of illustration, two-dimensional diagrams are shown here. However, the actual dimensions may be equal to the number of pixels or features being compared. In the graphs for both MSE and SSIM measures, we use three vectors to represent three reference images, and the contour around each vector represents the set of images that have the same level of distortion with respect to the reference. Unlike the MSE metric, SSIM is totally adaptive according to the reference signal. In particular, if the distortion is consistent with the underlying reference signal (the reference vector direction), we call it a non-structural distortion, which is much less objectionable than structural distortions (for example, the distortions perpendicular to the reference vector direction). This is reflected in the shapes of the equal-distortion contours. Here we make a first attempt to extend this idea for RR IQA by applying it to the subband standard deviation measures of the reference and distorted images. This is intuitively sensible because, when the distorted image is a globally contrast-scaled (contrast reduction or enhancement) version of the reference image, the standard deviations of all subbands should scale by the same factor, which is considered a consistent non-structural distortion and is less objectionable than the case in which the subband standard deviations change in different ways.

Let $\sigma_r$ and $\sigma_d$ be the vectors containing the standard deviation $\sigma$ values of the DNT coefficients from each subband in the reference and distorted images, respectively. We define a new RR distortion measure as

$D_n = g(\sigma_r, \sigma_d)\,\log\left(1 + \frac{1}{D_0}\sum_{k=1}^{K}\left|\hat{d}^k(p^k\|q^k)\right|\right)$, (4)

where the key feature is the function $g(\sigma_r, \sigma_d)$ added in front of Eq. (3). This function should serve the purpose of identifying and distinguishing the consistent non-structural distortion directions in the feature vector space of subband $\sigma$ values, so as to scale the distortion measure $D$ in a way that structural distortions are penalized more than non-structural distortions. Motivated by the successful normalized correlation formulation in SSIM [1], we define $g(\sigma_r, \sigma_d)$ as

$g(\sigma_r, \sigma_d) = \frac{|\sigma_r|^2 + |\sigma_d|^2 + C}{2\,|\sigma_r \cdot \sigma_d| + C}$, (5)

where $\sigma_r \cdot \sigma_d$ represents the dot product between the two vectors, and the constant $C$ is included to avoid instability when $\sigma_r \cdot \sigma_d$ is close to 0. This function is lower-bounded by 1, when $\sigma_r$ and $\sigma_d$ are fully correlated, or in other words, when their orientations in the feature vector space are completely consistent. With the decrease of correlation, $g(\sigma_r, \sigma_d)$ increases, thus giving more penalty to inconsistent structural distortions.
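The behavior of the weighting function can be checked numerically with a short sketch. It assumes Eq. (5) has the form $(|\sigma_r|^2 + |\sigma_d|^2 + C) / (2|\sigma_r \cdot \sigma_d| + C)$ and that $D_n$ is the product of $g$ and the measure $D$ of Eq. (3). The value of `c` and the two-element vectors are made up for illustration.

```python
def g(sigma_r, sigma_d, c=1e-6):
    """Weighting function of Eq. (5) on two vectors of subband sigmas."""
    energy = sum(a * a for a in sigma_r) + sum(b * b for b in sigma_d)
    dot = sum(a * b for a, b in zip(sigma_r, sigma_d))
    return (energy + c) / (2.0 * abs(dot) + c)

def d_n(sigma_r, sigma_d, d, c=1e-6):
    """Eq. (4): scale the KLD-based measure D by g(sigma_r, sigma_d)."""
    return g(sigma_r, sigma_d, c) * d

sigma_ref = [3.0, 4.0]
same = g(sigma_ref, [3.0, 4.0])      # identical vectors
rotated = g(sigma_ref, [4.0, 3.0])   # same norm, different direction
far = g(sigma_ref, [5.0, 0.0])       # same norm, direction further away
```

With these inputs `same` equals 1, and the value grows as the direction of the distorted vector moves away from the reference, so the same $D$ is penalized more when the subband deviations change inconsistently.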

Fig. 2. Relationship between SSIM and $D_n$ for blur, JPEG compression, JPEG2000 compression, and noise contamination distortions.

Figure 2 shows the $D_n$ results computed for 4 different distortion types at different distortion levels, and compares them with the corresponding SSIM values calculated for the distorted images. Interestingly, for each fixed distortion type, $D_n$ exhibits a nearly perfect linear relationship with SSIM. We regard this as an outcome of the similarity between their design principles, even though the principle is applied to completely different domains of signal representation. The clean linear relationship also helps us to design an SSIM predictor based on $D_n$ because the remaining job is just to estimate the normalization slope factor across distortion types. More specifically, an RR-SSIM estimator can be written as

$\hat{S} = 1 - \alpha D_n$, (6)

where $\alpha$ is the slope factor that needs to be learned from training images. In particular, we adopted a regression-by-discretization approach [8], which is a regression scheme that employs a classifier on a copy of the data that has the class attribute discretized, and the predicted value is the expected value of the mean class value for each discretized interval. A decision tree classifier was built using $|\sigma_r - \sigma_d|$ and $|k_r - k_d|$ as the attributes, where $k_r$ and $k_d$ are the kurtosis values of the DNT coefficients computed from the original and distorted images, respectively.
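To make the role of the slope factor concrete, the sketch below fits $\alpha$ for a single, known distortion type by least squares on the line $\hat{S} = 1 - \alpha D_n$. This is a simplified stand-in and not the regression-by-discretization scheme with a decision tree described above; the training pairs are synthetic numbers invented for the example.

```python
def fit_slope(d_values, ssim_values):
    """Least-squares alpha for S = 1 - alpha * D with the intercept fixed at 1.

    Minimizing sum_i (S_i - (1 - alpha * D_i))^2 over alpha gives
    alpha = sum_i D_i * (1 - S_i) / sum_i D_i^2.
    """
    numerator = sum(d * (1.0 - s) for d, s in zip(d_values, ssim_values))
    denominator = sum(d * d for d in d_values)
    return numerator / denominator

def estimate_ssim(alpha, d_n):
    """Eq. (6): RR-SSIM estimate from the distortion measure D_n."""
    return 1.0 - alpha * d_n

# Synthetic (D_n, SSIM) pairs lying exactly on a line, for illustration only.
train_d = [5.0, 10.0, 20.0]
train_ssim = [0.9, 0.8, 0.6]
alpha = fit_slope(train_d, train_ssim)
s_hat = estimate_ssim(alpha, 15.0)
```

In the method described in the text, the slope is instead predicted per image from $|\sigma_r - \sigma_d|$ and $|k_r - k_d|$, so the distortion type does not have to be known at the receiver.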

3. IMPLEMENTATION AND VALIDATION

To extract RR features, the reference image is first decomposed into 12 subbands using a three-scale four-orientation steerable pyramid transform [9]. Divisive normalization is then applied using 13 neighboring coefficients, including 9 spatial neighbors from the same subband, 1 from the parent subband, and 3 from the same spatial location in the other orientation bands at the same scale. Three features, $\sigma_r$, $k_r$ and $d(p_m\|p)$, are extracted for each subband, resulting in a total of 36 RR features for a reference image. These RR features are used for SSIM estimation of the distorted image using the approach described in Section 2.
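The three per-subband features can be sketched for one subband as follows, assuming the DNT coefficients are already available as a flat list. The histogram bin count, the `eps` guard, the use of moments about zero, and the synthetic Gaussian-like input are choices of this sketch, not details given in the text.

```python
import math
from statistics import NormalDist

def subband_features(coeffs, bins=32, eps=1e-12):
    """Return (sigma, kurtosis, d(p_m||p)) for one subband of DNT coefficients."""
    n = len(coeffs)
    # Moments about zero, matching the zero-mean Gaussian model in the text.
    sigma = math.sqrt(sum(c * c for c in coeffs) / n)
    kurtosis = sum(c ** 4 for c in coeffs) / (n * sigma ** 4)

    # Empirical distribution p: normalized histogram over [min, max].
    lo, hi = min(coeffs), max(coeffs)
    width = (hi - lo) / bins
    counts = [0] * bins
    for c in coeffs:
        counts[min(int((c - lo) / width), bins - 1)] += 1
    p = [k / n for k in counts]

    # Model distribution p_m: Gaussian mass in each bin, renormalized.
    model = NormalDist(0.0, sigma)
    mass = [model.cdf(lo + (i + 1) * width) - model.cdf(lo + i * width)
            for i in range(bins)]
    total = sum(mass)
    p_m = [m / total for m in mass]

    kld = sum(m * math.log((m + eps) / (q + eps))
              for m, q in zip(p_m, p) if m > 0)
    return sigma, kurtosis, kld

# Deterministic Gaussian-like test data (quantiles of N(0, 2^2)).
coeffs = [NormalDist(0.0, 2.0).inv_cdf((i + 0.5) / 2000) for i in range(2000)]
sigma, kurtosis, kld = subband_features(coeffs)
features_per_image = 12 * 3  # 12 subbands times 3 features
```

For data that are close to Gaussian, the kurtosis is close to 3 and the divergence is small; the sketch does not handle degenerate subbands in which all coefficients are equal.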

A training process is needed to determine the slope factor $\alpha$ based on the observed differences between subband standard deviation and kurtosis. Our training data included 29 reference images altered with 50 levels of distortions for five types of distortions, including Gaussian blur, JPEG2000 compression, JPEG compression, fast fading channel distortion of JPEG2000 compressed bitstream and white Gaussian noise. Decision trees were built using the open source data mining tool WEKA [10].

The proposed scheme is tested using the LIVE database [6], which contains seven data sets with a total of 779 distorted images. Figure 3 shows the scatter plot, and Table 1 reports the mean absolute error (MAE) and Pearson linear correlation coefficient (PLCC) between FR SSIM and our RR SSIM estimate. It can be seen that the proposed SSIM estimator achieves high prediction accuracy across various types of distortions.

To further validate the proposed algorithm, we compare three objective IQA algorithms, namely peak signal-to-noise ratio (PSNR), SSIM, and our RR SSIM estimate, with subjective quality evaluations (in particular, the differences of mean opinion scores) available in the LIVE database [6]. Four metrics are employed for evaluation, which include PLCC and MAE after nonlinear mapping between subjective and objective scores, Spearman’s rank correlation coefficient (SRCC), and Kendall’s rank correlation coefficient (KRCC). The results are shown in Table 2. It can be observed that in general the proposed method performs worse than SSIM (as expected) and significantly outperforms PSNR. It should be mentioned that the comparison is unfair to the proposed method, because the other two are FR measures. However, it outperforms the existing general-purpose RR measures in the literature [3, 4].
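The four evaluation metrics named above can be computed with a few lines of standard-library Python. The sketch assumes there are no tied scores (so simple ranks and Kendall's tau-a suffice) and omits the nonlinear mapping that the text applies before PLCC and MAE. The function names and toy score lists are hypothetical.

```python
import math

def mae(x, y):
    """Mean absolute error between two score lists."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Ranks 1..n in ascending order (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def srcc(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return plcc(ranks(x), ranks(y))

def krcc(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

# Toy scores: one adjacent pair is swapped in the second list.
objective = [1.0, 2.0, 3.0, 4.0]
subjective = [1.0, 3.0, 2.0, 4.0]
```

The rank-based metrics depend only on ordering, which is why they need no nonlinear mapping between objective and subjective scales.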

Fig. 3. SSIM versus RR SSIM estimation $\hat{S}$ for LIVE database (legend: JPG2K, JPG, Noise, Blur).

Table 1. MAE and PLCC between SSIM and RR SSIM estimation $\hat{S}$ for LIVE database

Data set    MAE       PLCC
JP2 (1)     0.0107    0.9829
JP2 (2)     0.0098    0.9894
JPG (1)     0.0147    0.9603
JPG (2)     0.0111    0.9877
Noise       0.0178    0.9816
Blur        0.0156    0.9624
FF          0.0206    0.9760
All data    0.0155    0.9802

4. IMAGE REPAIRING USING RR FEATURES

Since the RR features reflect certain statistical properties about the reference signal, they may be used to partially “repair” the distorted image. Here we provide an example that uses RR features to correct a blurred image. Since blur reduces energy at mid and high frequencies, the subband standard deviation $\sigma_d$ of DNT coefficients in the distorted image is smaller than that of the reference image, $\sigma_r$. The most straightforward way to enforce a corrected image to have the same statistical property as the reference image is to scale up all the DNT coefficients in each subband $i$ of the distorted image by a fixed scale factor $s_i = \sigma_r^i / \sigma_d^i$:

$\nu_{repaired} = s_i\,\nu_d$. (7)

Figure 4(d) compares the histograms of the reference, distorted and repaired DNT coefficients. It can be observed that the histogram of scaled DNT coefficients is very close to that of the reference image.

To reconstruct the repaired image, it remains to invert the DNT, where the critical issue is to estimate the local scalar multiplier $\hat{z}$. Based on Eq. (1), the scalar multiplier for the inverse DNT is given by

$\hat{z}_{inv} = \sqrt{(sY)^T (s^2 C_U)^{-1} (sY) / N} = \sqrt{Y^T C_U^{-1} Y / N} = \hat{z}$. (8)

This greatly simplifies the inversion, as we have already calculated $\hat{z}$. We can then compute the wavelet coefficients using $\hat{z}_{inv}\,\nu_{repaired}$, followed by an inverse wavelet transform to construct the repaired image. An example is given in Fig. 4, where the blurred image is successfully repaired, and the effect is reflected by both SSIM and the proposed RR SSIM measures.

Fig. 4. Repairing blurred image using RR features. (a) Original “building” image; (b) Blurred image, SSIM = 0.674, $\hat{S}$ = 0.662; (c) Repaired image, SSIM = 0.918, $\hat{S}$ = 0.928; (d) DNT coefficient histograms of original, distorted and repaired images.

Table 2. Performance comparison of IQA measures using the LIVE database

          PLCC                       MAE                        SRCC                       KRCC
Data set  PSNR     SSIM     Ŝ        PSNR     SSIM     Ŝ        PSNR     SSIM     Ŝ        PSNR     SSIM     Ŝ
JP2 (1)   0.9331   0.9687   0.9597   6.5033   4.7620   4.9860   0.9264   0.9637   0.9555   0.7600   0.8332   0.8140
JP2 (2)   0.8740   0.9691   0.9632   9.9656   5.2016   5.2320   0.8549   0.9604   0.9539   0.6640   0.8290   0.8163
JPG (1)   0.8866   0.9667   0.9449   8.6900   4.7096   5.6854   0.8779   0.9637   0.9493   0.7026   0.8364   0.8096
JPG (2)   0.9167   0.9851   0.9761   10.013   4.6077   5.7997   0.7699   0.9215   0.8979   0.5776   0.7774   0.7240
Noise     0.9879   0.9830   0.9773   3.4195   4.2499   4.8172   0.9854   0.9694   0.9642   0.8939   0.8523   0.8345
Blur      0.7840   0.9483   0.9154   9.0550   4.6651   7.5136   0.7823   0.9517   0.8692   0.5847   0.8010   0.7158
FF        0.8897   0.9552   0.9316   9.9898   6.1810   8.0113   0.8907   0.9556   0.9138   0.7069   0.8207   0.7473
All       0.8721   0.9449   0.9212   10.5248  6.9325   8.3641   0.8755   0.9479   0.9214   0.6864   0.7963   0.7561
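The repair step can be illustrated with a small sketch: scale the distorted DNT coefficients of one subband so that their standard deviation matches the reference value, then check that the multiplier estimate is unchanged when both the neighborhood and its covariance are scaled. The sketch assumes zero-mean coefficients, a diagonal $C_U$, and the multiplier form $\sqrt{Y^T C_U^{-1} Y / N}$; all names and numbers are hypothetical.

```python
import math

def rms(values):
    """Standard deviation about zero (zero-mean assumption)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def repair_subband(nu_distorted, sigma_ref):
    """Scale DNT coefficients by s = sigma_ref / sigma_distorted."""
    s = sigma_ref / rms(nu_distorted)
    return [s * v for v in nu_distorted], s

def multiplier(y, cu_diag):
    """z_hat = sqrt(Y^T C_U^{-1} Y / N) for a diagonal C_U (assumption)."""
    return math.sqrt(sum(v * v / c for v, c in zip(y, cu_diag)) / len(y))

# Made-up distorted coefficients and reference sigma for one subband.
nu_d = [0.5, -0.5, 1.0, -1.0]
nu_repaired, s = repair_subband(nu_d, sigma_ref=1.2)

# Scaling Y by s and C_U by s^2 leaves the multiplier unchanged.
y = [0.5, -1.0, 2.0, 0.0]
cu_diag = [1.0, 1.0, 4.0, 1.0]
z_hat = multiplier(y, cu_diag)
z_inv = multiplier([s * v for v in y], [s * s * c for c in cu_diag])

# Wavelet-domain coefficients before the inverse wavelet transform.
wavelet = [z_hat * v for v in nu_repaired]
```

Because the multiplier is unchanged by the scaling, the values of $\hat{z}$ computed during the forward transform can be reused for the inversion.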

5. CONCLUSIONS

We propose an RR SSIM estimation algorithm by incorporating DNT-domain image statistical properties and the design principle of the SSIM approach. Our experiments show that the proposed SSIM estimate correlates well not only with FR SSIM, but also with subjective evaluations of image quality. We also demonstrate that the RR features being used can be employed to partially repair a distorted image. The proposed method has a fairly low RR data rate and is applicable to various types of distortions. It has good potential to be employed in real-world visual communication systems for quality monitoring and resource allocation purposes. It may also be a useful tool in image quality optimization problems when the reference image is not fully available.

6. ACKNOWLEDGMENT

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada in the form of Discovery and Strategic Grants, and in part by the Ontario Ministry of Research & Innovation in the form of an Early Researcher Award, which are gratefully acknowledged.

7. REFERENCES

[1] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600–612, 2004.

[2] Z. Wang and A. C. Bovik, Modern Image Quality Assessment, Morgan & Claypool Publishers, March 2006.

[3] Z. Wang, G. Wu, H. R. Sheikh, E. P. Simoncelli, E.-H. Yang, and A. C. Bovik, “Quality-aware images,” IEEE Trans. Image Processing, vol. 15, no. 6, pp. 1680–1689, June 2006.

[4] Q. Li and Z. Wang, “Reduced-reference image quality assessment using divisive normalization-based image representation,” IEEE Journal on Selected Topics in Signal Processing, vol. 3, no. 2, pp. 202–211, 2009.

[5] A. Albonico, G. Valenzise, M. Naccari, M. Tagliasacchi, and S. Tubaro, “A reduced-reference video structural similarity metric based on no-reference estimation of channel-induced distortion,” in IEEE Inter. Conf. Acoustics, Speech and Signal Processing, 2009, pp. 1857–1860.

[6] H. R. Sheikh, Z. Wang, A. C. Bovik, and L. K. Cormack, “Image and video quality assessment research at LIVE,” http://live.ece.utexas.edu/research/quality/.

[7] M. J. Wainwright and E. P. Simoncelli, “Scale mixtures of Gaussians and the statistics of natural images,” in Adv. Neural Info. Processing Systems, 2000, pp. 855–861, MIT Press.

[8] S. M. Weiss and N. Indurkhya, “Rule-based machine learning methods for functional prediction,” Journal of Artificial Intelligence Research, vol. 3, pp. 383–403, 1995.

[9] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable multiscale transforms,” IEEE Trans. Information Theory, vol. 38, no. 2 pt II, pp. 587–607, 1992.

[10] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: An update,” SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10–18, 2009.
