Journal Article•DOI•

Hdr-vqm

Manish Narwaria¹, Matthieu Perreira Da Silva¹, Patrick Le Callet¹•Institutions (1)

01 Jul 2015-Signal Processing-image Communication (Elsevier)-Vol. 35, pp 46-60

TL;DR: An objective HDR video quality measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based decomposition is presented, which is one of the first objective method for high dynamic range video quality estimation.

read less

Abstract: High dynamic range (HDR) signals fundamentally differ from the traditional low dynamic range (LDR) ones in that pixels are related (proportional) to the physical luminance in the scene (i.e. scene-referred). For that reason, the existing LDR video quality measurement methods may not be directly used for assessing quality in HDR videos. To address that, we present an objective HDR video quality measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based decomposition. Video quality is then computed based on a spatio-temporal analysis that relates to human eye fixation behavior during video viewing. Consequently, the proposed method does not involve expensive computations related to explicit motion analysis in the HDR video signal, and is therefore computationally tractable. We also verified its prediction performance on a comprehensive, in-house subjective HDR video database with 90 sequences, and it was found to be better than some of the existing methods in terms of correlation with subjective scores (for both across sequence and per sequence cases). A software implementation of the proposed scheme is also made publicly available for free download and use. HighlightsThe paper presents one of the first objective method for high dynamic range video quality estimation.It is based on analysis of short term video segments taking into account human viewing behavior.The method described in the paper would be useful in scenarios where HDR video quality needs to be determined in an HDR video chain study.

...read moreread less

Summary (4 min read)

Jump to: [Introduction] – [II. BACKGROUND] – [III. THE PROPOSED OBJECTIVE HDR VIDEO QUALITY MEASURE] – [A. From native HDR values to emitted luminance: modeling the display processing] – [B. Transformation from emitted to perceived luminance] – [C. Computation of subband error signal] – [D. From spatio-temporal subband errors to overall video quality: The Pooling step] – [IV. HDR VIDEO DATASET] – [A. Test material preparation] – [B. Rating methodology] – [C. Display] – [A. Correlation based comparisons] – [B. Outlier ratio analysis] – [VI. DISCUSSION] – [VII. CONCLUDING THOUGHTS] and [ACKNOWLEDGMENT]

Introduction

High Dynamic Range (HDR) signals fundamentally differ from the traditional low dynamic range (LDR) ones in that pixels are related to the physical luminance in the scene (i.e. scenereferred).
Such high luminance values often exceed the capabilities of the traditional low dynamic range (LDR) capturing and display devices.
It is therefore important to develop objective methods for HDR video quality measurement and benchmark their performance against subjective ground truth.
The latter quality assessment method employs a computational model to provide estimates of the subjective video quality.

II. BACKGROUND

Humans perceive the outside visual world through the interaction between luminance (measured in candela per square meter cd/m2) and the eyes.
The rods are more sensitive than cones but do not provide color vision.
With regards to human eyes, their dynamic range depends on the time allowed to adjust or adapt to the given luminance levels.
HDR imaging technologies therefore aim to overcome the inadequacies of the LDR capture and display technologies via better video signal capture, representation and display, so that the dynamic range of the video can better match the instantaneous range of the eye.
This, nonetheless, is sufficient for most purposes.

III. THE PROPOSED OBJECTIVE HDR VIDEO QUALITY MEASURE

It takes as input the source and the distorted HDR video sequences.
Note that throughout the paper the authors use the notation src and hrc (hypothetical reference circuit) to respectively denote reference and distorted video sequences.
As shown in the figure, the first two steps are meant to convert the native input luminance to perceived luminance.
These can therefore be seen as pre-processing steps.
The last step is that of error pooling which is achieved via spatio-temporal processing of the subband errors.

A. From native HDR values to emitted luminance: modeling the display processing

The authors begin with two observations with regard to HDR video signal representation.
Therefore, the exact scene luminance at each pixel location will be, generally, unknown.
With regard to HDR displays, the inherent hardware limitations will impose a limit on the maximum luminance that can be displayed.
While one can adopt different strategies (from simple ones like linear scaling to more sophisticated ones) for the said display based pre-processing, this is not the main focus of the work.
For the purpose of the method described in this paper, it is sufficient to highlight that in the general case, it is important that the characteristics of the HDR display are taken into account and the HDR video transformed (pre-processed) accordingly i.e. Nsrc → Esrc and Nhrc → Ehrc, for objective HDR video quality estimation.

B. Transformation from emitted to perceived luminance

The second step in the design of HDR-VQM concerns the transformation of the emitted luminance to perceived luminance i.e. Esrc → Psrc and Ehrc → Phrc as indicated in Figure 1.
An implication of such non-linearity is that the changes introduced by an HDR video December 1, 2014 DRAFT processing algorithm in the emitted luminance may not have a direct correspondence to the actual modification of visual quality.
To further quantify this, it was found that the linear correlation between the original and transformed signals was 0.9334 for PU encoding and 0.9071 for logarithmic, for the range between 1 - 200 cd/m2.
Thus, PU encoding can better approximate the response of HVS which is approximately linear at lower luminance and increasingly logarithmic for higher luminance values.

C. Computation of subband error signal

The proposed HDR-VQM is based on spatio-temporal analysis of an error video whose frames denote the localized perceptual error between a source and distorted video.
The authors first describe the steps to obtain the subband error signal and then present the details of the spatio-temporal processing.
The authors employed log-Gabor filters, introduced in [10], to calculate the perceptual error at different scales and orientations.
Video frames in the perceived luminance domain (i.e Psrc and Phrc) were decomposed into a set of subbands by computing the inverse DFT of the product of the frames’s DFT with frequency domain filter defined in (1).
The authors can then obtain the total error at each pixel in each video frame by pooling across scales and orientations.

D. From spatio-temporal subband errors to overall video quality: The Pooling step

Video signals propagate information along both spatial and temporal dimensions.
Due to visual acuity limitations of the eye, humans fixate their attention to local regions when viewing a video because only a small area of the eye retina, generally referred to as fovea, has a high visual acuity.
In the light of this, a reasonable strategy for objective video quality measurement is by analyzing the video in a spatio-temporal (ST) dimension [12], [13], [14], so that the impact of distortions can be localized along both spatial and temporal axes.
2) Spatial and Long term temporal pooling:.
Therefore, the authors first perform spatial pooling on spatio-temporal error frames STv,ts in order to obtain the short-term quality scores, as illustrated in Figure 3.

IV. HDR VIDEO DATASET

To the best of their knowledge there are currently no publicly available subjectively annotated HDR video datasets dealing with the issue of visual quality.
Therefore, for verifying the prediction performance of HDR-VQM and other objective methods, an in-house and comprehensive HDR video dataset was used.
This section provides a brief description of the dataset.

A. Test material preparation

The dataset used 10 source HDR video sequences3.
The spatial versus temporal information measures (computed on tone mapped version of video frames) for each source sequence is shown in Figure 4.
In general, any backward-compatible HDR compression scheme comprises [16] of 3 steps: (a) forward tone mapping in order to convert HDR video to LDR (8-bit precision), (b) compression and decompression of the LDR video by a standard LDR video compression method, (c) inverse tone mapping of the decoded LDR bit stream to reconstruct HDR video.
The LDR video was encoded and decoded using H.264/AVC at different bit rates.

B. Rating methodology

The authors study involved 25 paid observers who were not expert in image or video processing.
They were seated in a standardized room conforming to the International Telecommunication Union Recommendation (ITU-R) BT500-11 recommendations [17].
Prior to the test, observers were screened for visual acuity by using a Monoyer optometric table and for normal color vision by using Ishiharas tables.
For rating the test stimuli, the authors adopted the absolute category rating with hidden reference (ACR-HR), which is one of the rating methods recommended by the ITU December 1, 2014 DRAFT in Rec. ITU-T P.910 [18].
The rating method also includes the source sequences (i.e. undistorted) to be shown as any other test stimulus without informing the observers.

C. Display

For displaying the HDR video sequences, SIM2 Solar47 HDR display was used which has a maximum displayable luminance of 4000 cd/m2.
In their study this was set to 200 cd/m2 as it provided comfortable viewing conditions for the observers [20].
The authors however observed that this approach suffers from at least two drawbacks.
To ameliorate these two issues, the authors opted for a temporally more coherent strategy and the normalization factor was determined as the maximum of the mean of top 5% luminance values of all the frames in an HDR video sequence.
Then, the native HDR values N were converted to emitted luminance values E as E = N × 179 max(MT5) (6) where the multiplication factor of 179 is the luminous efficacy of equal energy white light that is defined and used by the Radiance file format (RGBE) for the conversion to actual luminance value.

A. Correlation based comparisons

The first set of experimental results are reported in terms of two criteria: Pearson linear correlation coefficient Cp (for prediction accuracy) and Spearman rank order correlation coefficient Cs (for monotonicity), between the subjective score and the objective prediction.
The better performance of HDR-VQM relative to these methods therefore indicates the added value of taking into account frequency and orientation information.
As a result, for similar MOS values across sequences, the corresponding RPSNR values can be quite different.
The RPSNR value for the first condition was 32.70 dB while the corresponding subjective score was 1.04.
Of course, one should rely on correlation based comparisons and outlier ratio analysis (presented in the next subsection) to draw more general conclusions about the performance of different objective methods.

B. Outlier ratio analysis

Outlier ratio analysis is another approach to evaluate objective methods for their prediction accuracy.
Particularly, it can be very useful in applications such as video compression where one is generally interested in the rate distortion (RD) behavior of objective methods i.e. how the objective visual quality varies with bit rates for different source sequences and to what extent that compares with the subjective video quality.
Therefore, the authors first computed the absolute prediction error between the subjective MOS and logistically transformed objective scores for each of the 80 test conditions.
The authors find that HDR-VQM has the least number of outliers (22%).
The main advantage of outlier analysis is that it helps to evaluate metric accuracy by taking into account the variability or uncertainty (expressed via 95% confidence intervals in their dataset) in subjective opinions, which are ignored in correlation based comparisons.

VI. DISCUSSION

The previous sections proposed and verified the performance of an objective HDR video quality estimator HDR-VQM.
Also recall that in (3) the authors did not employ a more sophisticated weighting such as one based on CSF.
The authors find that the relative execution time for HDRVQM is reasonable considering the improvements (i.e. smallest % of outliers) in performance over other methods.
The reader will also appreciate the fact that video quality judgment, in general, can depend on several extraneous factors (such as display type, viewing distance, ambient lighting conditions etc.) apart from the distortions themselves.
This allows HDR-VQM to adapt to some of the physical factors that may affect subjective quality December 1, 2014 DRAFT judgment.

VII. CONCLUDING THOUGHTS

HDR imaging is increasingly becoming popular in the multimedia signal processing community primarily as a tool towards enhancing the immersive video experience of the user.
There are very few works that address the issue of assessing the impact of HDR video processing algorithms on the perceptual video quality both from subjective and objective angles.
To that extent and within the scope of its application, HDR-VQM is a reasonable objective tool for HDR video quality measurement.
To enable others to use the proposed method as well as validate it independently, a software implementation will soon be made available online for free download and use.

ACKNOWLEDGMENT

The authors wish to thank Romuald Pepion for his help in generating the subjective test results used in this paper.
This work has been supported by the NEVEx project FUI11 financed by the French government.

Did you find this useful? Give us your feedback

Figures (9)

Fig. 1: Block diagram of the proposed HDR-VQM.

TABLE I: Outlier ratio analysis for the 80 test conditions in the HDR video dataset.

Fig. 4: Spatial and temporal information plot.

Fig. 5: Numerical results, (a) Cp and Cs for all 80 distorted sequences, (b) F-test results. Figure best viewed in color.

Fig. 6: Cp values for per source (src) sequence (these values are computed without the logistic fitting function). Figure best viewed in color.

Fig. 8: % of outliers and the relative computational complexity (expressed as relative execution time with respect to RPSNR) for different methods. Figure best viewed in color.

Fig. 2: Comparison of responses for two transformation functions in two different ranges of luminance, (a) 1 - 200 cd/m2, (b) 1000 - 10000 cd/m2. Figure best viewed in color.

Fig. 7: Scatter plots for different objective methods, (a) RPSNR, (b) P-PSNR, (c) P-SSIM, (d) HDR-VDP-2 and (e) HDR-VQM. Figure best viewed in color.

Content maybe subject to copyright Report

DRAFT SUBMITTED TO SPIC 1

HDR-VQM: An Objective Quality Measure for

High Dynamic Range Video

Manish Narwaria, Matthieu Perreira Da Silva, Patrick Le Callet

Abstract

High Dynamic Range (HDR) signals fundamentally differ from the traditional low dynamic range

(LDR) ones in that pixels are related (proportional) to the physical luminance in the scene (i.e. scene-

referred). For that reason, the existing LDR video quality measurement methods may not be directly

used for assessing quality in HDR videos. To address that, we present an objective HDR video quality

measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based

decomposition. Video quality is then computed based on a spatio-temporal analysis that relates to

human eye ﬁxation behavior during video viewing. Consequently, the proposed method does not involve

expensive computations related to explicit motion analysis in the HDR video signal, and is therefore

computationally tractable. We also veriﬁed its prediction performance on a comprehensive, in-house

subjective HDR video database with 90 sequences, and it was found to be better than some of the existing

methods in terms of correlation with subjective scores (for both across sequence and per sequence cases).

A software implementation of the proposed scheme is also made publicly available for free download

and use.

Index Terms

High Dynamic Range (HDR) video quality, objective quality, spatio-temporal analysis.

I. INTRODUCTION

The advent of better technologies in the ﬁeld of visual signal capture and processing has

fueled a paradigm shift in todays’ multimedia communication systems. As a result, the notion

The authors are with IRCCyN/IVC group, University of Nantes, 44306, France e-mail: (manish.narwaria@univ-nantes.fr,

matthieu.perreiradasilva@univ-nantes.fr, patrick.lecallet@univ-nantes.fr).

December 1, 2014 DRAFT

DRAFT SUBMITTED TO SPIC 2

of network-centric Quality of Service (QoS) in multimedia systems is being extended by relying

on the concept of Quality of Experience (QoE) [1]. In this quest of increasing the immersive

video experience and the overall QoE of the end user, newer technologies such as 3D, ultra

high deﬁnition (UHD) and, more recently, High Dynamic Range (HDR) imaging have gained

prominence within the multimedia signal processing community. HDR in particular has attracted

attention since it in a way revisits the way we capture and display natural scenes. This is motivated

by the fact that natural scenes often exhibit large ranges of illumination values. However, such

high luminance values often exceed the capabilities of the traditional low dynamic range (LDR)

capturing and display devices. Consequently, it is not possible to properly expose the dark and

the bright areas simultaneously in one image (or video) during capture. This may lead to over-

exposure (saturated pixels that are fully white) and/or under-exposure (very dark or noisy pixels

as sensor’s response falls below its noise threshold). In both cases, visual information is either

lost or altered. HDR imaging focuses on minimizing such losses and therefore aims at improving

the quality of the displayed pixels by incorporating higher contrast and luminance.

As a result, HDR imaging has attracted attention from both academia and industry, and there

has been interest and effort to develop tools/algorithms for HDR video processing [2]. For

instance, there have been recent efforts within the Moving Picture Experts Group (MPEG) for

extending High Efﬁciency Video Coding (HEVC) to HDR. Likewise, the JPEG has announced

extensions that will feature the original JPEG standard with support for HDR image compression.

However, there is lack of such effort to quantify and measure the impact of such tools on

HDR video quality using both subjective and objective approaches. The issue assumes further

signiﬁcance given that most of the existing objective methods may not be directly applicable

for HDR quality estimation [3], [4] and [5] (note that these studies only deal with HDR images

and not video). It is therefore important to develop objective methods for HDR video quality

measurement and benchmark their performance against subjective ground truth.

With regards to visual quality measurement, both subjective and objective approaches can

be used. The former involves the use of human subjects to judge and rate the quality of the

test stimuli. With appropriate laboratory conditions and a sufﬁciently large subject panel, it

remains the most accurate method. The latter quality assessment method employs a computa-

tional (mathematical) model to provide estimates of the subjective video quality. While such

objective models may not mimic subjective opinions accurately in a general scenario, they can

December 1, 2014 DRAFT

DRAFT SUBMITTED TO SPIC 3

be reasonably effective in speciﬁc conditions/applications. Hence, they can be an important tool

towards automating the testing and standardization of HDR video processing algorithms such

as HDR video compression, post-processing, inverse video tone mapping etc. especially when

subjective tests may not be feasible. In light of this, we present a computationally tractable HDR

video quality estimation method based on HDR signal transformation and subsequent analysis

of spatio-temporal segments, and also verify its prediction performance based on a test bed of

90 subjectively rated compressed HDR video sequences. To the best of our knowledge, our

study is amongst the ﬁrst few efforts towards the design and veriﬁcation of an objective quality

measurement method for HDR video, and is therefore of interest to the video signal processing

community both from subjective and objective quality view points.

II. BACKGROUND

Humans perceive the outside visual world through the interaction between luminance (mea-

sured in candela per square meter cd/m

) and the eyes. Luminance ﬁrst passes through the

cornea, a transparent membrane. Then it enters the pupil, an aperture that is modiﬁed by the iris,

a muscular diaphragm. Subsequently, light is refracted by the lens and hits the photoreceptors in

the retina. There are two types of photoreceptors: cones and rods. The cones are located mostly

in the fovea. They are more sensitive at luminance levels between 10

-2

cd/m

to 10

cd/m

(referred to as the photopic or daylight vision) [6]. Further, color vision is due to three types of

cones: short, middle and long wavelength cones. The rods, on the other hand, are sensitive at

luminance levels between 10

-6

cd/m

to 10 cd/m

(scotopic or night vision). The rods are more

sensitive than cones but do not provide color vision.

Pertaining to the luminance levels found in the real world, direct sunlight at noon can be of

the order in excess of 10

cd/m

while a starlit night in the range of 10

-1

cd/m

. This corresponds

to more than 8 orders of magnitude. With regards to human eyes, their dynamic range depends

on the time allowed to adjust or adapt to the given luminance levels. Due to the presence of rods

and cones, human eyes have a remarkable ability to adjust to varying luminance levels, both

dynamically (i.e. instantaneous) and over a period of time (i.e. adaptation time). Given sufﬁcient

adaptation time, the dynamic range of human eyes is about 13 orders of magnitude. However,

without adaptation, the instantaneous human vision range is smaller and they are capable of

dynamically adjusting so that a person can see about 5 orders of magnitude throughout the

December 1, 2014 DRAFT

DRAFT SUBMITTED TO SPIC 4

entire range. Since the typical frequency in video signals does not allow sufﬁcient adaptation

time, the dynamic vision range (5 orders of magnitude) is more relevant in the context of this

paper as well as HDR video processing in general. However, typical digital imaging sensors

(assuming the typical single exposure setting) and LDR displays are not capable of dealing with

such large dynamic range present in the real world, and most of them (both capturing sensors

and displays) can handle upto 3 orders of magnitude. Due to this limitation, the scenes captured

and viewed via LDR technologies will have lower contrast (visual details are either saturated or

noisy) and smaller color gamut than what the eyes can perceive. This in turn can decrease the

immersive experience quotient of the end-user.

HDR imaging technologies therefore aim to overcome the inadequacies of the LDR capture

and display technologies via better video signal capture, representation and display, so that the

dynamic range of the video can better match the instantaneous range of the eye. In particular,

the major distinguishing factor of HDR imaging (in comparison to the traditional LDR one)

is its focus on capturing and displaying scenes as natively (i.e. how they appear in the real

world) as possible by considering physical luminance of the scene in question. Two important

points should, however, be mentioned at the very outset. First, it may be emphasized that in

HDR imaging one usually deals with proportional (and not absolute) luminance values. More

speciﬁcally, unless there is a prior and accurate camera calibration, luminance values in an

HDR video ﬁle represent the real world luminance upto an unknown scale

. This, nonetheless,

is sufﬁcient for most purposes. Secondly, the HDR displays currently available cannot display

luminance beyond the speciﬁed limit, given the hardware limitations. This necessitates a pre-

processing step for both subjective and objective HDR video quality measurement, as elaborated

further in the next section. Despite the two mentioned caveats, HDR imaging can improve the

viewer experience signiﬁcantly as compared to LDR

imaging and, thus an active research area.

Even with calibration, the HDR values represent real physical luminance with certain error. This is because the camera

spectral sensitivity functions which relate scene radiance with captured RGB triplets cannot match the luminous efﬁciency

function of the human visual system.

The terms LDR and HDR are also sometimes respectively referred to as lower or standard dynamic range (SDR) and Higher

Dynamic Range (to explicitly indicate that the range captured is only relatively higher than LDR but not the entire dynamic

range present in a real scene). We, however, do away with such precise distinctions and always assume that the terms HDR and

LDR are used in a relative context throughout this paper.

December 1, 2014 DRAFT

DRAFT SUBMITTED TO SPIC 5

Gabor

filtering

)(

src

ost

)(

hrc

ost

Subband comparison,

pooling across scale,

orientation

src

hrc

Transformation

into emitted

luminance

src

hrc

src

hrc

Transformation into

perceived luminance

Subband

errors

Short term

temporal

pooling

HDR-VQM

src

hrc

Error pooling

Spatial

pooling

Long term

temporal

pooling

Fig. 1: Block diagram of the proposed HDR-VQM.

As already mentioned in the introduction, this paper seeks to address the issue of objective

video quality measurement for HDR video. This is in light of the need to develop and validate

such algorithms to objectively evaluate the perceptual impact of various HDR video processing

tools on video quality. We describe the details of the method and verify its performance in the

following sections.

III. THE PROPOSED OBJECTIVE HDR VIDEO QUALITY MEASURE

A block diagram outlining the major steps in the proposed HDR-VQM is shown in Figure

1. It takes as input the source and the distorted HDR video sequences. Note that throughout the

paper we use the notation src (source) and hrc (hypothetical reference circuit) to respectively

denote reference and distorted video sequences. As shown in the ﬁgure, the ﬁrst two steps are

meant to convert the native input luminance to perceived luminance. These can therefore be seen

as pre-processing steps. Next, the impact of distortions is analyzed by comparing the different

frequency and orientation subbands in src and hrc. The last step is that of error pooling which

is achieved via spatio-temporal processing of the subband errors. This comprises of short term

temporal pooling, spatial pooling and ﬁnally, a long term pooling. A separate block diagram

explaining the error pooling in HDR-VQM is shown in Figure 3. In the following sub-sections,

we elaborate on the various steps in HDR-VQM.

December 1, 2014 DRAFT

HTML Viewer

Frequently Asked Questions (2)

Q1. What are the contributions mentioned in the paper "Hdr-vqm: an objective quality measure for high dynamic range video" ?

To address that, the authors present an objective HDR video quality measure ( HDR-VQM ) based on signal pre-processing, transformation, and subsequent frequency based decomposition. The authors also verified its prediction performance on a comprehensive, in-house subjective HDR video database with 90 sequences, and it was found to be better than some of the existing methods in terms of correlation with subjective scores ( for both across sequence and per sequence cases ).

Q2. What future works have the authors mentioned in the paper "Hdr-vqm: an objective quality measure for high dynamic range video" ?

The immediate future work will ensue further refinement of the presented method in view of some of the mentioned limitations as well as further validation with larger HDR video datasets.

Hdr-vqm

Summary (4 min read)

Introduction

II. BACKGROUND

III. THE PROPOSED OBJECTIVE HDR VIDEO QUALITY MEASURE

A. From native HDR values to emitted luminance: modeling the display processing

B. Transformation from emitted to perceived luminance

C. Computation of subband error signal

D. From spatio-temporal subband errors to overall video quality: The Pooling step

IV. HDR VIDEO DATASET

A. Test material preparation

B. Rating methodology

C. Display

A. Correlation based comparisons

B. Outlier ratio analysis

VI. DISCUSSION

VII. CONCLUDING THOUGHTS

ACKNOWLEDGMENT

Figures (9)

Citations

Cites background or methods or result from "Hdr-vqm"

Cites background from "Hdr-vqm"

References

"Hdr-vqm" refers methods in this paper

"Hdr-vqm" refers methods in this paper

"Hdr-vqm" refers background or methods in this paper

"Hdr-vqm" refers methods in this paper

Related Papers (5)

Frequently Asked Questions (2)

Q1. What are the contributions mentioned in the paper "Hdr-vqm: an objective quality measure for high dynamic range video" ?

Q2. What future works have the authors mentioned in the paper "Hdr-vqm: an objective quality measure for high dynamic range video" ?