Journal ArticleDOI

HDR-VQM

TL;DR: An objective HDR video quality measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based decomposition is presented; it is one of the first objective methods for high dynamic range video quality estimation.
Abstract: High dynamic range (HDR) signals fundamentally differ from the traditional low dynamic range (LDR) ones in that pixels are related (proportional) to the physical luminance in the scene (i.e. scene-referred). For that reason, the existing LDR video quality measurement methods may not be directly used for assessing quality in HDR videos. To address that, we present an objective HDR video quality measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based decomposition. Video quality is then computed based on a spatio-temporal analysis that relates to human eye fixation behavior during video viewing. Consequently, the proposed method does not involve expensive computations related to explicit motion analysis in the HDR video signal, and is therefore computationally tractable. We also verified its prediction performance on a comprehensive, in-house subjective HDR video database with 90 sequences, and it was found to be better than some of the existing methods in terms of correlation with subjective scores (for both across sequence and per sequence cases). A software implementation of the proposed scheme is also made publicly available for free download and use.

Highlights: The paper presents one of the first objective methods for high dynamic range video quality estimation. It is based on analysis of short-term video segments taking into account human viewing behavior. The method described in the paper would be useful in scenarios where HDR video quality needs to be determined in an HDR video chain study.

Summary (4 min read)

Introduction

  • High Dynamic Range (HDR) signals fundamentally differ from the traditional low dynamic range (LDR) ones in that pixels are related to the physical luminance in the scene (i.e. scene-referred).
  • Such high luminance values often exceed the capabilities of the traditional low dynamic range (LDR) capturing and display devices.
  • It is therefore important to develop objective methods for HDR video quality measurement and benchmark their performance against subjective ground truth.
  • The latter quality assessment method employs a computational model to provide estimates of the subjective video quality.

II. BACKGROUND

  • Humans perceive the outside visual world through the interaction between luminance (measured in candela per square meter cd/m2) and the eyes.
  • The rods are more sensitive than cones but do not provide color vision.
  • With regards to human eyes, their dynamic range depends on the time allowed to adjust or adapt to the given luminance levels.
  • HDR imaging technologies therefore aim to overcome the inadequacies of the LDR capture and display technologies via better video signal capture, representation and display, so that the dynamic range of the video can better match the instantaneous range of the eye.
  • This, nonetheless, is sufficient for most purposes.

III. THE PROPOSED OBJECTIVE HDR VIDEO QUALITY MEASURE

  • It takes as input the source and the distorted HDR video sequences.
  • Note that throughout the paper the authors use the notation src and hrc (hypothetical reference circuit) to respectively denote reference and distorted video sequences.
  • As shown in the figure, the first two steps are meant to convert the native input luminance to perceived luminance.
  • These can therefore be seen as pre-processing steps.
  • The last step is that of error pooling which is achieved via spatio-temporal processing of the subband errors.

A. From native HDR values to emitted luminance: modeling the display processing

  • The authors begin with two observations with regard to HDR video signal representation.
  • Therefore, the exact scene luminance at each pixel location will be, generally, unknown.
  • With regard to HDR displays, the inherent hardware limitations will impose a limit on the maximum luminance that can be displayed.
  • While one can adopt different strategies (from simple ones like linear scaling to more sophisticated ones) for the said display based pre-processing, this is not the main focus of the work.
  • For the purpose of the method described in this paper, it is sufficient to highlight that in the general case, it is important that the characteristics of the HDR display are taken into account and the HDR video transformed (pre-processed) accordingly i.e. Nsrc → Esrc and Nhrc → Ehrc, for objective HDR video quality estimation.

B. Transformation from emitted to perceived luminance

  • The second step in the design of HDR-VQM concerns the transformation of the emitted luminance to perceived luminance i.e. Esrc → Psrc and Ehrc → Phrc as indicated in Figure 1.
  • An implication of such non-linearity is that the changes introduced by an HDR video December 1, 2014 DRAFT processing algorithm in the emitted luminance may not have a direct correspondence to the actual modification of visual quality.
  • To further quantify this, it was found that the linear correlation between the original and transformed signals was 0.9334 for PU encoding and 0.9071 for logarithmic encoding, over the range 1–200 cd/m2.
  • Thus, PU encoding better approximates the response of the HVS, which is approximately linear at lower luminance and increasingly logarithmic at higher luminance values.
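The quantitative difference between the two encodings can be illustrated with a small numerical sketch. Note that `pu_like_encode` below is a toy stand-in (linear below an assumed 10 cd/m2 knee, logarithmic above), not the actual PU curve from the paper, so the resulting correlation values are only indicative:

```python
import numpy as np

def log_encode(lum):
    """Purely logarithmic luminance encoding."""
    return np.log10(lum)

def pu_like_encode(lum, knee=10.0):
    """Toy stand-in for PU encoding: roughly linear below an assumed
    knee luminance, logarithmic above it (the real PU curve is derived
    from contrast detection thresholds, not this piecewise form)."""
    return np.where(lum < knee, lum / knee, 1.0 + np.log10(lum / knee))

def pearson(x, y):
    """Pearson linear correlation coefficient."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return float(np.mean(x * y))

lum = np.linspace(1.0, 200.0, 10000)   # cd/m^2, the range cited above
r_log = pearson(lum, log_encode(lum))
r_pu = pearson(lum, pu_like_encode(lum))
```

Both encodings correlate strongly but imperfectly with the linear signal; the paper's exact figures come from the true PU curve, which this sketch only approximates.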

C. Computation of subband error signal

  • The proposed HDR-VQM is based on spatio-temporal analysis of an error video whose frames denote the localized perceptual error between a source and distorted video.
  • The authors first describe the steps to obtain the subband error signal and then present the details of the spatio-temporal processing.
  • The authors employed log-Gabor filters, introduced in [10], to calculate the perceptual error at different scales and orientations.
  • Video frames in the perceived luminance domain (i.e. Psrc and Phrc) were decomposed into a set of subbands by computing the inverse DFT of the product of each frame's DFT with the frequency domain filter defined in (1).
  • The authors can then obtain the total error at each pixel in each video frame by pooling across scales and orientations.
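A minimal sketch of this subband-error computation is given below, using a radial log-Gabor transfer function in the DFT domain. The centre frequencies, bandwidth ratio, and the absolute-difference pooling across scales are assumptions for illustration (the paper also pools across orientations, omitted here):

```python
import numpy as np

def log_gabor_radial(shape, f0, sigma_ratio=0.55):
    """Radial log-Gabor transfer function
    G(f) = exp(-log(f/f0)^2 / (2*log(sigma_ratio)^2)), with zero DC
    response. f0 (cycles/pixel) and sigma_ratio are assumed values."""
    h, w = shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    f[0, 0] = 1.0                          # avoid log(0); DC zeroed below
    g = np.exp(-(np.log(f / f0) ** 2) / (2 * np.log(sigma_ratio) ** 2))
    g[0, 0] = 0.0
    return g

def subband(frame, f0):
    """One subband: inverse DFT of the product of the frame's DFT
    with the frequency-domain filter."""
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * log_gabor_radial(frame.shape, f0)))

def subband_error(p_src, p_hrc, f0s=(0.25, 0.125, 0.0625)):
    """Per-pixel error pooled across scales; orientations are omitted
    and the pooling rule (sum of absolute subband differences) is an
    assumption, not the paper's exact formula."""
    err = np.zeros(p_src.shape)
    for f0 in f0s:
        err += np.abs(subband(p_src, f0) - subband(p_hrc, f0))
    return err

rng = np.random.default_rng(0)
src = rng.random((64, 64))                       # perceived-luminance frames
hrc = src + 0.05 * rng.standard_normal((64, 64))
err_map = subband_error(src, hrc)
```

By construction the error map is zero for identical inputs and grows with the distortion energy falling inside the chosen bands.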

D. From spatio-temporal subband errors to overall video quality: The Pooling step

  • Video signals propagate information along both spatial and temporal dimensions.
  • Due to visual acuity limitations of the eye, humans fixate their attention to local regions when viewing a video because only a small area of the eye retina, generally referred to as fovea, has a high visual acuity.
  • In the light of this, a reasonable strategy for objective video quality measurement is by analyzing the video in a spatio-temporal (ST) dimension [12], [13], [14], so that the impact of distortions can be localized along both spatial and temporal axes.
  • 2) Spatial and long-term temporal pooling:
  • Therefore, the authors first perform spatial pooling on spatio-temporal error frames STv,ts in order to obtain the short-term quality scores, as illustrated in Figure 3.
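The three pooling stages can be sketched roughly as follows; the window length, the worst-5% spatial pooling fraction, and the plain means are assumed illustrative choices, not the paper's exact parameters:

```python
import numpy as np

def short_term_pooling(error_video, win=10):
    """Average the subband-error frames over non-overlapping short-term
    windows (spatio-temporal tubes collapsed to one error map per
    window; the window length is an assumed parameter)."""
    t = (error_video.shape[0] // win) * win
    return error_video[:t].reshape(-1, win, *error_video.shape[1:]).mean(axis=1)

def spatial_pooling(st_maps, worst_fraction=0.05):
    """Pool each short-term error map spatially by averaging its worst
    5% of pixels (percentile pooling; the fraction is assumed), giving
    one short-term score per window."""
    flat = st_maps.reshape(st_maps.shape[0], -1)
    k = max(1, int(worst_fraction * flat.shape[1]))
    return np.sort(flat, axis=1)[:, -k:].mean(axis=1)

def long_term_pooling(short_term_scores):
    """Collapse the short-term scores into a single value; a plain mean
    is used here purely for illustration."""
    return float(np.mean(short_term_scores))

rng = np.random.default_rng(1)
err_video = rng.random((50, 32, 32))     # 50 error frames of 32x32
st_maps = short_term_pooling(err_video)
overall = long_term_pooling(spatial_pooling(st_maps))
```

Percentile-style spatial pooling reflects the observation above that fixated, strongly distorted regions dominate perceived quality more than the spatial average would suggest.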

IV. HDR VIDEO DATASET

  • To the best of their knowledge, there are currently no publicly available subjectively annotated HDR video datasets dealing with the issue of visual quality.
  • Therefore, for verifying the prediction performance of HDR-VQM and other objective methods, an in-house and comprehensive HDR video dataset was used.
  • This section provides a brief description of the dataset.

A. Test material preparation

  • The dataset used 10 source HDR video sequences.
  • The spatial versus temporal information measures (computed on tone mapped versions of the video frames) for each source sequence are shown in Figure 4.
  • In general, any backward-compatible HDR compression scheme comprises three steps [16]: (a) forward tone mapping in order to convert HDR video to LDR (8-bit precision), (b) compression and decompression of the LDR video by a standard LDR video compression method, (c) inverse tone mapping of the decoded LDR bit stream to reconstruct HDR video.
  • The LDR video was encoded and decoded using H.264/AVC at different bit rates.
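The three-step backward-compatible chain can be sketched as below. The log-curve tone mapping operator, the omission of the actual H.264/AVC codec (only 8-bit quantization is modeled), and all parameter choices are simplifying assumptions:

```python
import numpy as np

def forward_tone_map(hdr):
    """(a) Toy global tone mapping operator (log curve) standing in for
    a real TMO; maps HDR luminance into [0, 1]."""
    return np.log1p(hdr) / np.log1p(hdr.max())

def quantize_8bit(ldr):
    """(b) Reduce to 8-bit precision; the actual H.264/AVC encode and
    decode step is omitted, so quantization is the only loss here."""
    return np.round(ldr * 255.0).astype(np.uint8)

def inverse_tone_map(ldr8, hdr_max):
    """(c) Invert the log curve to reconstruct HDR values; real schemes
    transmit the inverse-TMO parameters as side information."""
    return np.expm1((ldr8.astype(float) / 255.0) * np.log1p(hdr_max))

rng = np.random.default_rng(2)
hdr = rng.random((16, 16)) * 4000.0      # luminance in cd/m^2
recon = inverse_tone_map(quantize_8bit(forward_tone_map(hdr)), hdr.max())
```

Even without a codec, the round trip through 8-bit precision introduces a small reconstruction error, which is exactly the kind of distortion the dataset's test conditions exercise at varying bit rates.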

B. Rating methodology

  • The authors' study involved 25 paid observers who were not experts in image or video processing.
  • They were seated in a standardized room conforming to the International Telecommunication Union Recommendation ITU-R BT.500-11 [17].
  • Prior to the test, observers were screened for visual acuity by using a Monoyer optometric table and for normal color vision by using Ishihara's tables.
  • For rating the test stimuli, the authors adopted the absolute category rating with hidden reference (ACR-HR), which is one of the rating methods recommended by the ITU in Rec. ITU-T P.910 [18].
  • The rating method also includes the source sequences (i.e. undistorted) to be shown as any other test stimulus without informing the observers.
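In ACR-HR, hidden-reference removal is applied after the test: a differential viewer score DV = V(test) − V(hidden reference) + 5 is computed per observer and averaged. A minimal sketch (the ratings below are hypothetical):

```python
import numpy as np

def acr_hr_dmos(test_scores, hidden_ref_scores):
    """ACR-HR differential viewer scores per Rec. ITU-T P.910:
    DV = V(test) - V(hidden reference) + 5, computed per observer and
    averaged into a differential mean opinion score (5 = excellent)."""
    dv = np.asarray(test_scores, float) - np.asarray(hidden_ref_scores, float) + 5.0
    return float(dv.mean())

# Hypothetical ratings from five observers on the 5-point ACR scale.
test_ratings = [3, 4, 3, 2, 3]          # distorted sequence
ref_ratings = [5, 5, 4, 5, 5]           # hidden reference, same observers
dmos = acr_hr_dmos(test_ratings, ref_ratings)
```

The offset of 5 keeps the differential score on the same 5-point scale as the raw ratings.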

C. Display

  • For displaying the HDR video sequences, a SIM2 Solar47 HDR display was used, which has a maximum displayable luminance of 4000 cd/m2.
  • In their study this was set to 200 cd/m2 as it provided comfortable viewing conditions for the observers [20].
  • The authors however observed that this approach suffers from at least two drawbacks.
  • To ameliorate these two issues, the authors opted for a temporally more coherent strategy and the normalization factor was determined as the maximum of the mean of top 5% luminance values of all the frames in an HDR video sequence.
  • Then, the native HDR values N were converted to emitted luminance values E as E = N × 179 / max(MT5) (6), where the multiplication factor of 179 is the luminous efficacy of equal energy white light that is defined and used by the Radiance file format (RGBE) for the conversion to actual luminance value.
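Equation (6) can be sketched directly; the per-frame top-5% statistic and the Radiance luminous-efficacy constant follow the description above, while the (frames, height, width) array layout is an assumption:

```python
import numpy as np

def native_to_emitted(video, top_fraction=0.05):
    """Sketch of Eq. (6): for each frame take the mean of its top 5%
    luminance values, let max(MT5) be the maximum of those means over
    all frames, then E = N * 179 / max(MT5). The factor 179 is the
    luminous efficacy constant of the Radiance (RGBE) format."""
    means = []
    for frame in video:
        flat = np.sort(frame, axis=None)
        k = max(1, int(top_fraction * flat.size))
        means.append(flat[-k:].mean())           # mean of brightest 5%
    return video * 179.0 / max(means)

rng = np.random.default_rng(3)
native = rng.random((8, 32, 32)) * 100.0         # native HDR values N
emitted = native_to_emitted(native)
```

Because a single normalization factor is computed for the whole sequence, the conversion is temporally coherent (no frame-to-frame flicker) and invariant to a global rescaling of the native values.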

A. Correlation based comparisons

  • The first set of experimental results are reported in terms of two criteria: Pearson linear correlation coefficient Cp (for prediction accuracy) and Spearman rank order correlation coefficient Cs (for monotonicity), between the subjective score and the objective prediction.
  • The better performance of HDR-VQM relative to these methods therefore indicates the added value of taking into account frequency and orientation information.
  • As a result, for similar MOS values across sequences, the corresponding RPSNR values can be quite different.
  • The RPSNR value for the first condition was 32.70 dB while the corresponding subjective score was 1.04.
  • Of course, one should rely on correlation based comparisons and outlier ratio analysis (presented in the next subsection) to draw more general conclusions about the performance of different objective methods.
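The two criteria can be computed with a few lines of NumPy; the MOS and prediction values below are hypothetical, and the Spearman coefficient is implemented as the Pearson correlation of ranks without tie correction:

```python
import numpy as np

def pearson_cp(subjective, objective):
    """Pearson linear correlation coefficient (prediction accuracy)."""
    s = np.asarray(subjective, float) - np.mean(subjective)
    o = np.asarray(objective, float) - np.mean(objective)
    return float((s @ o) / np.sqrt((s @ s) * (o @ o)))

def spearman_cs(subjective, objective):
    """Spearman rank-order correlation (monotonicity): the Pearson
    correlation of the ranks; no tie correction in this sketch."""
    rank = lambda x: np.argsort(np.argsort(x)).astype(float)
    return pearson_cp(rank(subjective), rank(objective))

mos = [1.0, 2.1, 3.0, 3.9, 4.8]      # hypothetical subjective scores
pred = [0.9, 2.5, 2.8, 4.2, 4.6]     # hypothetical objective predictions
cp = pearson_cp(mos, pred)
cs = spearman_cs(mos, pred)
```

A metric can order sequences perfectly (Cs = 1) while still deviating linearly from the MOS (Cp < 1), which is why both criteria are reported.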

B. Outlier ratio analysis

  • Outlier ratio analysis is another approach to evaluate objective methods for their prediction accuracy.
  • Particularly, it can be very useful in applications such as video compression where one is generally interested in the rate distortion (RD) behavior of objective methods i.e. how the objective visual quality varies with bit rates for different source sequences and to what extent that compares with the subjective video quality.
  • Therefore, the authors first computed the absolute prediction error between the subjective MOS and logistically transformed objective scores for each of the 80 test conditions.
  • The authors find that HDR-VQM has the least number of outliers (22%).
  • The main advantage of outlier analysis is that it helps to evaluate metric accuracy by taking into account the variability or uncertainty (expressed via 95% confidence intervals in their dataset) in subjective opinions, which is ignored in correlation based comparisons.
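A condition counts as an outlier when its absolute prediction error exceeds the 95% confidence interval of its subjective score. A minimal sketch with hypothetical numbers:

```python
import numpy as np

def outlier_ratio(mos, predicted, ci95):
    """Fraction of test conditions whose absolute prediction error
    |MOS - predicted| exceeds the 95% confidence interval of the
    subjective score; predictions are assumed to be already mapped
    (e.g. logistically) onto the MOS scale."""
    err = np.abs(np.asarray(mos, float) - np.asarray(predicted, float))
    return float(np.mean(err > np.asarray(ci95, float)))

mos = [1.0, 2.0, 3.0, 4.0]       # hypothetical MOS values
pred = [1.1, 2.6, 3.2, 3.9]      # mapped objective scores
ci = [0.3, 0.3, 0.3, 0.3]        # hypothetical 95% confidence intervals
ratio = outlier_ratio(mos, pred, ci)     # one condition out of four
```

Errors smaller than the subjective uncertainty are thus not penalized, which is exactly the property correlation-based comparisons lack.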

VI. DISCUSSION

  • The previous sections proposed and verified the performance of an objective HDR video quality estimator HDR-VQM.
  • Also recall that in (3) the authors did not employ a more sophisticated weighting such as one based on CSF.
  • The authors find that the relative execution time for HDR-VQM is reasonable considering the improvements (i.e. smallest % of outliers) in performance over other methods.
  • The reader will also appreciate the fact that video quality judgment, in general, can depend on several extraneous factors (such as display type, viewing distance, ambient lighting conditions etc.) apart from the distortions themselves.
  • This allows HDR-VQM to adapt to some of the physical factors that may affect subjective quality judgment.

VII. CONCLUDING THOUGHTS

  • HDR imaging is increasingly becoming popular in the multimedia signal processing community primarily as a tool towards enhancing the immersive video experience of the user.
  • There are very few works that address the issue of assessing the impact of HDR video processing algorithms on the perceptual video quality both from subjective and objective angles.
  • To that extent and within the scope of its application, HDR-VQM is a reasonable objective tool for HDR video quality measurement.
  • To enable others to use the proposed method as well as validate it independently, a software implementation will soon be made available online for free download and use.

ACKNOWLEDGMENT

  • The authors wish to thank Romuald Pepion for his help in generating the subjective test results used in this paper.
  • This work has been supported by the NEVEx project FUI11 financed by the French government.


DRAFT SUBMITTED TO SPIC 1
HDR-VQM: An Objective Quality Measure for
High Dynamic Range Video
Manish Narwaria, Matthieu Perreira Da Silva, Patrick Le Callet
Abstract
High Dynamic Range (HDR) signals fundamentally differ from the traditional low dynamic range
(LDR) ones in that pixels are related (proportional) to the physical luminance in the scene (i.e. scene-
referred). For that reason, the existing LDR video quality measurement methods may not be directly
used for assessing quality in HDR videos. To address that, we present an objective HDR video quality
measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based
decomposition. Video quality is then computed based on a spatio-temporal analysis that relates to
human eye fixation behavior during video viewing. Consequently, the proposed method does not involve
expensive computations related to explicit motion analysis in the HDR video signal, and is therefore
computationally tractable. We also verified its prediction performance on a comprehensive, in-house
subjective HDR video database with 90 sequences, and it was found to be better than some of the existing
methods in terms of correlation with subjective scores (for both across sequence and per sequence cases).
A software implementation of the proposed scheme is also made publicly available for free download
and use.
Index Terms
High Dynamic Range (HDR) video quality, objective quality, spatio-temporal analysis.
I. INTRODUCTION
The authors are with the IRCCyN/IVC group, University of Nantes, 44306, France (e-mail: manish.narwaria@univ-nantes.fr,
matthieu.perreiradasilva@univ-nantes.fr, patrick.lecallet@univ-nantes.fr).

The advent of better technologies in the field of visual signal capture and processing has
fueled a paradigm shift in today's multimedia communication systems. As a result, the notion
of network-centric Quality of Service (QoS) in multimedia systems is being extended by relying
on the concept of Quality of Experience (QoE) [1]. In this quest of increasing the immersive
video experience and the overall QoE of the end user, newer technologies such as 3D, ultra
high definition (UHD) and, more recently, High Dynamic Range (HDR) imaging have gained
prominence within the multimedia signal processing community. HDR in particular has attracted
attention since it in a way revisits the way we capture and display natural scenes. This is motivated
by the fact that natural scenes often exhibit large ranges of illumination values. However, such
high luminance values often exceed the capabilities of the traditional low dynamic range (LDR)
capturing and display devices. Consequently, it is not possible to properly expose the dark and
the bright areas simultaneously in one image (or video) during capture. This may lead to over-
exposure (saturated pixels that are fully white) and/or under-exposure (very dark or noisy pixels
as sensor’s response falls below its noise threshold). In both cases, visual information is either
lost or altered. HDR imaging focuses on minimizing such losses and therefore aims at improving
the quality of the displayed pixels by incorporating higher contrast and luminance.
As a result, HDR imaging has attracted attention from both academia and industry, and there
has been interest and effort to develop tools/algorithms for HDR video processing [2]. For
instance, there have been recent efforts within the Moving Picture Experts Group (MPEG) for
extending High Efficiency Video Coding (HEVC) to HDR. Likewise, the JPEG has announced
extensions that will feature the original JPEG standard with support for HDR image compression.
However, there is lack of such effort to quantify and measure the impact of such tools on
HDR video quality using both subjective and objective approaches. The issue assumes further
significance given that most of the existing objective methods may not be directly applicable
for HDR quality estimation [3], [4] and [5] (note that these studies only deal with HDR images
and not video). It is therefore important to develop objective methods for HDR video quality
measurement and benchmark their performance against subjective ground truth.
With regards to visual quality measurement, both subjective and objective approaches can
be used. The former involves the use of human subjects to judge and rate the quality of the
test stimuli. With appropriate laboratory conditions and a sufficiently large subject panel, it
remains the most accurate method. The latter quality assessment method employs a computa-
tional (mathematical) model to provide estimates of the subjective video quality. While such
objective models may not mimic subjective opinions accurately in a general scenario, they can
be reasonably effective in specific conditions/applications. Hence, they can be an important tool
towards automating the testing and standardization of HDR video processing algorithms such
as HDR video compression, post-processing, inverse video tone mapping etc. especially when
subjective tests may not be feasible. In light of this, we present a computationally tractable HDR
video quality estimation method based on HDR signal transformation and subsequent analysis
of spatio-temporal segments, and also verify its prediction performance based on a test bed of
90 subjectively rated compressed HDR video sequences. To the best of our knowledge, our
study is amongst the first few efforts towards the design and verification of an objective quality
measurement method for HDR video, and is therefore of interest to the video signal processing
community both from subjective and objective quality view points.
II. BACKGROUND
Humans perceive the outside visual world through the interaction between luminance (measured
in candela per square meter, cd/m^2) and the eyes. Luminance first passes through the cornea,
a transparent membrane. Then it enters the pupil, an aperture that is modified by the iris, a
muscular diaphragm. Subsequently, light is refracted by the lens and hits the photoreceptors in
the retina. There are two types of photoreceptors: cones and rods. The cones are located mostly
in the fovea. They are more sensitive at luminance levels between 10^-2 cd/m^2 and 10^8 cd/m^2
(referred to as the photopic or daylight vision) [6]. Further, color vision is due to three types of
cones: short, middle and long wavelength cones. The rods, on the other hand, are sensitive at
luminance levels between 10^-6 cd/m^2 and 10 cd/m^2 (scotopic or night vision). The rods are more
sensitive than cones but do not provide color vision.
Pertaining to the luminance levels found in the real world, direct sunlight at noon can be of
the order in excess of 10^7 cd/m^2 while a starlit night is in the range of 10^-1 cd/m^2. This
corresponds to more than 8 orders of magnitude. With regards to human eyes, their dynamic range
depends on the time allowed to adjust or adapt to the given luminance levels. Due to the presence
of rods and cones, human eyes have a remarkable ability to adjust to varying luminance levels,
both dynamically (i.e. instantaneously) and over a period of time (i.e. adaptation time). Given
sufficient adaptation time, the dynamic range of human eyes is about 13 orders of magnitude.
However, without adaptation, the instantaneous human vision range is smaller: the eyes are
capable of dynamically adjusting so that a person can see about 5 orders of magnitude throughout the
entire range. Since the typical frequency in video signals does not allow sufficient adaptation
time, the dynamic vision range (5 orders of magnitude) is more relevant in the context of this
paper as well as HDR video processing in general. However, typical digital imaging sensors
(assuming the typical single exposure setting) and LDR displays are not capable of dealing with
such large dynamic range present in the real world, and most of them (both capturing sensors
and displays) can handle up to 3 orders of magnitude. Due to this limitation, the scenes captured
and viewed via LDR technologies will have lower contrast (visual details are either saturated or
noisy) and smaller color gamut than what the eyes can perceive. This in turn can decrease the
immersive experience quotient of the end-user.
HDR imaging technologies therefore aim to overcome the inadequacies of the LDR capture
and display technologies via better video signal capture, representation and display, so that the
dynamic range of the video can better match the instantaneous range of the eye. In particular,
the major distinguishing factor of HDR imaging (in comparison to the traditional LDR one)
is its focus on capturing and displaying scenes as natively (i.e. how they appear in the real
world) as possible by considering physical luminance of the scene in question. Two important
points should, however, be mentioned at the very outset. First, it may be emphasized that in
HDR imaging one usually deals with proportional (and not absolute) luminance values. More
specifically, unless there is a prior and accurate camera calibration, luminance values in an
HDR video file represent the real world luminance up to an unknown scale¹. This, nonetheless,
is sufficient for most purposes. Secondly, the HDR displays currently available cannot display
luminance beyond the specified limit, given the hardware limitations. This necessitates a pre-
processing step for both subjective and objective HDR video quality measurement, as elaborated
further in the next section. Despite the two mentioned caveats, HDR imaging can improve the
viewer experience significantly as compared to LDR² imaging and is, thus, an active research area.
¹Even with calibration, the HDR values represent real physical luminance with certain error. This is because the camera
spectral sensitivity functions which relate scene radiance with captured RGB triplets cannot match the luminous efficiency
function of the human visual system.
²The terms LDR and HDR are also sometimes respectively referred to as lower or standard dynamic range (SDR) and Higher
Dynamic Range (to explicitly indicate that the range captured is only relatively higher than LDR but not the entire dynamic
range present in a real scene). We, however, do away with such precise distinctions and always assume that the terms HDR and
LDR are used in a relative context throughout this paper.
[Figure 1: block diagram of the proposed HDR-VQM. The native HDR videos Nsrc and Nhrc are
transformed into emitted luminance (Esrc, Ehrc) and then into perceived luminance (Psrc, Phrc);
log-Gabor filtering produces subbands l(s, o, t) for src and hrc, which are compared and pooled
across scale and orientation into subband errors; error pooling then proceeds via short term
temporal pooling, spatial pooling and long term temporal pooling to yield HDR-VQM.]
Fig. 1: Block diagram of the proposed HDR-VQM.
As already mentioned in the introduction, this paper seeks to address the issue of objective
video quality measurement for HDR video. This is in light of the need to develop and validate
such algorithms to objectively evaluate the perceptual impact of various HDR video processing
tools on video quality. We describe the details of the method and verify its performance in the
following sections.
III. THE PROPOSED OBJECTIVE HDR VIDEO QUALITY MEASURE
A block diagram outlining the major steps in the proposed HDR-VQM is shown in Figure
1. It takes as input the source and the distorted HDR video sequences. Note that throughout the
paper we use the notation src (source) and hrc (hypothetical reference circuit) to respectively
denote reference and distorted video sequences. As shown in the figure, the first two steps are
meant to convert the native input luminance to perceived luminance. These can therefore be seen
as pre-processing steps. Next, the impact of distortions is analyzed by comparing the different
frequency and orientation subbands in src and hrc. The last step is that of error pooling which
is achieved via spatio-temporal processing of the subband errors. This comprises short term
temporal pooling, spatial pooling and finally, a long term pooling. A separate block diagram
explaining the error pooling in HDR-VQM is shown in Figure 3. In the following sub-sections,
we elaborate on the various steps in HDR-VQM.
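Putting these stages together, the Figure 1 pipeline can be summarized as a skeleton in which every stage is a deliberately simplified stand-in (clipping for display modeling, a log curve for perceptual encoding, absolute differences for subband comparison, plain means for pooling):

```python
import numpy as np

def to_emitted(n, display_peak=4000.0):
    """N -> E: display modeling reduced to clipping at an assumed peak."""
    return np.clip(n, 0.0, display_peak)

def to_perceived(e):
    """E -> P: log encoding standing in for the PU transform."""
    return np.log10(np.maximum(e, 1e-6))

def subband_err(p_src, p_hrc):
    """Subband comparison reduced to a per-pixel absolute difference."""
    return np.abs(p_src - p_hrc)

def hdr_vqm_sketch(src, hrc, win=10):
    """(frames, h, w) native HDR videos -> scalar error index
    (0 means identical; lower is better in this toy version)."""
    err = subband_err(to_perceived(to_emitted(src)),
                      to_perceived(to_emitted(hrc)))
    t = (err.shape[0] // win) * win
    st = err[:t].reshape(-1, win, *err.shape[1:]).mean(axis=1)  # short-term temporal
    per_window = st.reshape(st.shape[0], -1).mean(axis=1)       # spatial
    return float(per_window.mean())                             # long-term temporal

rng = np.random.default_rng(4)
src = rng.random((20, 16, 16)) * 1000.0
hrc = src * (1.0 + 0.01 * rng.standard_normal(src.shape))
q = hdr_vqm_sketch(src, hrc)
```

The skeleton preserves only the stage ordering of Figure 1; each stand-in would be replaced by the operators detailed in the corresponding subsection.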

Citations
Journal ArticleDOI
TL;DR: It is suggested that the performance of most full-reference metrics can be improved by considering non-linearities of the human visual system, while further efforts are necessary to improve performance of no-reference quality metrics for HDR content.
Abstract: Recent advances in high dynamic range (HDR) capture and display technologies have attracted a lot of interest from scientific, professional, and artistic communities. As in any technology, the evaluation of HDR systems in terms of quality of experience is essential. Subjective evaluations are time consuming and expensive, and thus objective quality assessment tools are needed as well. In this paper, we report and analyze the results of an extensive benchmarking of objective quality metrics for HDR image quality assessment. In total, 35 objective metrics were benchmarked on a database of 20 HDR contents encoded with 3 compression algorithms at 4 bit rates, leading to a total of 240 compressed HDR images, using subjective quality scores as ground truth. Performance indexes were computed to assess the accuracy, monotonicity, and consistency of the metric estimation of subjective scores. Statistical analysis was performed on the performance indexes to discriminate small differences between metrics. Results demonstrated that metrics designed for HDR content, i.e., HDR-VDP-2 and HDR-VQM, are the most reliable predictors of perceived quality. Finally, our findings suggested that the performance of most full-reference metrics can be improved by considering non-linearities of the human visual system, while further efforts are necessary to improve performance of no-reference quality metrics for HDR content.

86 citations


Cites background or methods or result from "Hdr-vqm"

  • ...[15] found that HDR-VQM was performing significantly better than HDR-VDP-2 for both video and still image content....

  • ...Full-reference metrics: To the best of our knowledge, there are only two metrics for HDR quality assessment that have a publicly available implementation: (1) HDR-VDP: high dynamic range visible difference predictor [10, 12, 13] and (2) HDR-VQM: an objective quality measure for high dynamic range video [15]....

  • ...The authors of [15] found that HDR-VQM is the best metric, far beyond HDR-VDP-2, in contradiction to the findings of [9], which showed lower performance for HDR-VQM when compared to HDR-VDP-2....

  • ...HDR-VDP-2 [15], which makes it a suitable alternative to HDR-VDP-2....

  • ...[15] have reported that their HDR-VQM metric performs similar or slightly better than HDR-VDP-2 for HDR image quality assessment....

Proceedings ArticleDOI
05 Jun 2019
TL;DR: Results show that the proposed metric outperforms its counterparts in terms of correlation with mean opinion scores, and can be viewed as an extension for point clouds of the MSDM metric suited for 3D meshes.
Abstract: In this paper, we present PC-MSDM, an objective metric for visual quality assessment of 3D point clouds. This full-reference metric is based on local curvature statistics and can be viewed as an extension for point clouds of the MSDM metric suited for 3D meshes. We evaluate its performance on an open subjective dataset of point clouds compressed by octree pruning; results show that the proposed metric outperforms its counterparts in terms of correlation with mean opinion scores.

62 citations


Cites background from "Hdr-vqm"

  • ...A large number of top-down image quality metrics have been proposed; as for bottom-up models, they have been recently extended to new imaging formats such as HDR-VQM [12]....

Journal ArticleDOI
TL;DR: FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception; it is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification, and contrast masking.
Abstract: FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field-of-view, such as Virtual and Augmented Reality displays, and associated methods, such as foveated rendering. Our metric is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification and contrast masking. It accounts for physical specification of the display (luminance, size, resolution) and viewing distance. To validate the metric, we collected a novel foveated rendering dataset which captures quality degradation due to sampling and reconstruction. To demonstrate our algorithm's generality, we test it on 3 independent foveated video datasets, and on a large image quality dataset, achieving the best performance across all datasets when compared to the state-of-the-art.

61 citations

Journal ArticleDOI
TL;DR: This report presents the key research and models that exploit the limitations of perception to tackle visual quality and workload alike, and presents the open problems and promising future research targeting the question of how to minimize the effort to compute and display only the necessary pixels while still offering a user full visual experience.
Abstract: Advances in computer graphics enable us to create digital images of astonishing complexity and realism. However, processing resources are still a limiting factor. Hence, many costly but desirable aspects of realism are often not accounted for, including global illumination, accurate depth of field and motion blur, spectral effects, etc., especially in real-time rendering. At the same time, there is a strong trend towards more pixels per display due to larger displays, higher pixel densities or larger fields of view. Further observable trends in current display technology include more bits per pixel (high dynamic range), wider color gamut/fidelity, increasing refresh rates (better motion depiction), and an increasing number of displayed views per pixel (stereo, multi-view, all the way to holographic or lightfield displays). These developments cause significant unsolved technical challenges due to aspects such as limited compute power and bandwidth. Fortunately, the human visual system has certain limitations, which mean that providing the highest possible visual quality is not always necessary. In this report, we present the key research and models that exploit the limitations of perception to tackle visual quality and workload alike. Moreover, we present the open problems and promising future research targeting the question of how we can minimize the effort to compute and display only the necessary pixels while still offering a user full visual experience.

58 citations

Journal ArticleDOI
TL;DR: This paper uses two sequential convolutional neural networks to model the entire HDR video reconstruction process and produces high‐quality HDR videos and is an order of magnitude faster than the state‐of‐the‐art techniques for sequences with two and three alternating exposures.
Abstract: A practical way to generate a high dynamic range (HDR) video using off‐the‐shelf cameras is to capture a sequence with alternating exposures and reconstruct the missing content at each frame. Unfortunately, existing approaches are typically slow and are not able to handle challenging cases. In this paper, we propose a learning‐based approach to address this difficult problem. To do this, we use two sequential convolutional neural networks (CNN) to model the entire HDR video reconstruction process. In the first step, we align the neighboring frames to the current frame by estimating the flows between them using a network, which is specifically designed for this application. We then combine the aligned and current images using another CNN to produce the final HDR frame. We perform an end‐to‐end training by minimizing the error between the reconstructed and ground truth HDR images on a set of training scenes. We produce our training data synthetically from existing HDR video datasets and simulate the imperfections of standard digital cameras using a simple approach. Experimental results demonstrate that our approach produces high‐quality HDR videos and is an order of magnitude faster than the state‐of‐the‐art techniques for sequences with two and three alternating exposures.
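The second (merge) stage of the pipeline described above replaces a classical weighted average of the aligned radiance estimates. For contrast, here is a sketch of that non-learned baseline, not the paper's CNN; it assumes frames are linear-light in [0, 1] and already aligned:

```python
import numpy as np

def merge_exposures(aligned, exposure_times):
    """Baseline HDR merge of already-aligned alternating-exposure frames.

    Stand-in for a learned merge step: each frame's radiance estimate
    (pixel value divided by exposure time) is averaged with a triangle
    weight that de-emphasizes clipped highlights and noisy shadows.
    Frames are assumed linear-light in [0, 1].
    """
    num = np.zeros_like(aligned[0], dtype=np.float64)
    den = np.zeros_like(aligned[0], dtype=np.float64)
    for frame, t in zip(aligned, exposure_times):
        w = 1.0 - np.abs(2.0 * frame - 1.0)  # triangle weight, peaks at mid-gray
        num += w * frame / t                  # per-frame radiance estimate
        den += w
    return num / np.maximum(den, 1e-8)
```

For a static scene, the per-frame radiance estimates agree, so the weighted average simply recovers the scene radiance regardless of the exposure alternation.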

54 citations

References
Proceedings ArticleDOI
09 Nov 2003
TL;DR: This paper proposes a multiscale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions, and develops an image synthesis method to calibrate the parameters that define the relative importance of different scales.
Abstract: The structural similarity image quality paradigm is based on the assumption that the human visual system is highly adapted for extracting structural information from the scene, and therefore a measure of structural similarity can provide a good approximation to perceived image quality. This paper proposes a multiscale structural similarity method, which supplies more flexibility than previous single-scale methods in incorporating the variations of viewing conditions. We develop an image synthesis method to calibrate the parameters that define the relative importance of different scales. Experimental comparisons demonstrate the effectiveness of the proposed method.
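As a rough illustration of the multiscale idea, here is a minimal sketch of MS-SSIM in NumPy. It is not the published implementation: it uses whole-image statistics instead of a sliding 11x11 Gaussian window and plain 2x2 averaging for the dyadic downsampling; the per-scale weights are the commonly reported calibrated values.

```python
import numpy as np

def ssim_components(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Global (single-window) SSIM luminance and contrast-structure terms."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    cs = (2 * cov + c2) / (vx + vy + c2)  # combined contrast-structure term
    return luminance, cs

def ms_ssim(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    """Simplified MS-SSIM: contrast-structure at every scale, luminance at the coarsest."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    score = 1.0
    for i, w in enumerate(weights):
        l, cs = ssim_components(x, y)
        cs = max(cs, 1e-8)  # guard: anti-correlated inputs would make the fractional power NaN
        if i == len(weights) - 1:
            score *= (l * cs) ** w  # luminance enters only at the final scale
        else:
            score *= cs ** w
            # dyadic downsampling by 2x2 averaging
            x = (x[::2, ::2] + x[1::2, ::2] + x[::2, 1::2] + x[1::2, 1::2]) / 4
            y = (y[::2, ::2] + y[1::2, ::2] + y[::2, 1::2] + y[1::2, 1::2]) / 4
    return score
```

Identical inputs score 1, and any distortion can only lower the product, since each luminance and contrast-structure term is bounded above by 1.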

4,333 citations


"Hdr-vqm" refers methods in this paper

  • ...We compared the performance of HDR-VQM with a few popular LDR methods including PSNR and multi-scale SSIM [11]....

  • ...We compared the performance of HDR-VQM with a few popular LDR methods including NTIA-VQM [14], PSNR, and multi-scale SSIM [13]....

  • ...The inputs to all these methods were the perceived luminance values Psrc and Phrc, and hence we refer to them as P-PSNR and P-SSIM....

  • ...It is also interesting to point out that the proposed HDR-VQM, P-PSNR, and P-SSIM compute quality based on perceived luminance....

  • ...Thus, we compute the said error by using the following equation (a similar formulation has been used in previous works such as [13], although not for directly modeling the masking effect):...

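The P-PSNR idea quoted above (plain PSNR fed with perceived luminance) can be sketched as follows. The perceptual transform used here is a simple log10 encoding, an assumption for illustration only; the paper's actual perceived-luminance mapping for Psrc and Phrc is not reproduced.

```python
import numpy as np

def p_psnr(lum_src, lum_hrc, peak):
    """PSNR computed on 'perceived' luminance (the P-PSNR comparison idea).

    Assumption: a log10 encoding stands in for the perceptual transform.
    lum_src / lum_hrc are physical luminance maps in cd/m^2; peak is the
    display's peak luminance.
    """
    p_src = np.log10(np.maximum(lum_src, 1e-6))  # clamp to avoid log(0)
    p_hrc = np.log10(np.maximum(lum_hrc, 1e-6))
    mse = np.mean((p_src - p_hrc) ** 2)
    if mse == 0:
        return float("inf")
    p_peak = np.log10(peak)
    return 10 * np.log10(p_peak ** 2 / mse)
```

The point of the transform is that equal luminance errors in dark and bright regions are not equally visible; encoding before the MSE step weights errors more perceptually than PSNR on raw luminance.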
Journal ArticleDOI
TL;DR: The results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy into first-order redundancy.
Abstract: The relative efficiency of any particular image-coding scheme should be defined only in relation to the class of images that the code is likely to encounter. To understand the representation of images by the mammalian visual system, it might therefore be useful to consider the statistics of images from the natural environment (i.e., images with trees, rocks, bushes, etc). In this study, various coding schemes are compared in relation to how they represent the information in such natural images. The coefficients of such codes are represented by arrays of mechanisms that respond to local regions of space, spatial frequency, and orientation (Gabor-like transforms). For many classes of image, such codes will not be an efficient means of representing information. However, the results obtained with six natural images suggest that the orientation and the spatial-frequency tuning of mammalian simple cells are well suited for coding the information in such images if the goal of the code is to convert higher-order redundancy (e.g., correlation between the intensities of neighboring pixels) into first-order redundancy (i.e., the response distribution of the coefficients). Such coding produces a relatively high signal-to-noise ratio and permits information to be transmitted with only a subset of the total number of cells. These results support Barlow's theory that the goal of natural vision is to represent the information in the natural environment with minimal redundancy.
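A log-Gabor filter of the kind this reference introduces is easiest to build directly in the frequency domain: a log-Gaussian radial profile (which has no DC response, unlike an ordinary Gabor) multiplied by a Gaussian angular profile. The bandwidth parameters below are common defaults, not values taken from the HDR-VQM paper.

```python
import numpy as np

def log_gabor_filter(shape, f0, theta0, sigma_f=0.55, sigma_theta=np.pi / 8):
    """Frequency-domain log-Gabor filter: log-Gaussian radial x Gaussian angular."""
    rows, cols = shape
    fy = np.fft.fftfreq(rows)[:, None]
    fx = np.fft.fftfreq(cols)[None, :]
    f = np.sqrt(fx ** 2 + fy ** 2)
    f[0, 0] = 1.0  # avoid log(0) at DC; the DC response is forced to 0 below
    radial = np.exp(-(np.log(f / f0) ** 2) / (2 * np.log(sigma_f) ** 2))
    radial[0, 0] = 0.0  # a log-Gabor has no DC component
    theta = np.arctan2(fy, fx)
    # wrapped angular difference so orientations near +/- pi behave correctly
    dtheta = np.arctan2(np.sin(theta - theta0), np.cos(theta - theta0))
    angular = np.exp(-(dtheta ** 2) / (2 * sigma_theta ** 2))
    return radial * angular

def filter_image(img, f0=0.1, theta0=0.0):
    """Complex subband response: multiply the image spectrum by the filter."""
    spectrum = np.fft.fft2(img)
    return np.fft.ifft2(spectrum * log_gabor_filter(img.shape, f0, theta0))
```

A bank of such filters over several center frequencies f0 and orientations theta0 yields the scale/orientation decomposition that methods like HDR-VQM compute perceptual errors on.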

3,077 citations


"Hdr-vqm" refers methods in this paper

  • ...We employed log-Gabor filters, introduced in [12], to calculate the perceptual error at different scales and orientations....

Journal ArticleDOI
TL;DR: The independent test results from the VQEG FR-TV Phase II tests are summarized, as well as results from eleven other subjective data sets that were used to develop the NTIA General Model.
Abstract: The National Telecommunications and Information Administration (NTIA) General Model for estimating video quality and its associated calibration techniques were independently evaluated by the Video Quality Experts Group (VQEG) in their Phase II Full Reference Television (FR-TV) test. The NTIA General Model was the only video quality estimator that was in the top performing group for both the 525-line and 625-line video tests. As a result, the American National Standards Institute (ANSI) adopted the NTIA General Model and its associated calibration techniques as a North American Standard in 2003. The International Telecommunication Union (ITU) has also included the NTIA General Model as a normative method in two Draft Recommendations. This paper presents a description of the NTIA General Model and its associated calibration techniques. The independent test results from the VQEG FR-TV Phase II tests are summarized, as well as results from eleven other subjective data sets that were used to develop the method.

1,268 citations


"Hdr-vqm" refers background or methods in this paper

  • ...We compared the performance of HDR-VQM with a few popular LDR methods including NTIA-VQM [14], PSNR, and multi-scale SSIM [13]....

  • ...In the light of this, a reasonable strategy for objective video quality measurement is by analyzing the video in a spatio-temporal (ST) dimension [14], [15], [16], so that the impact of distortions can be localized along both spatial and temporal axes....

Proceedings ArticleDOI
25 Jul 2011
TL;DR: The visibility metric is shown to provide much improved predictions as compared to the original HDR-VDP and VDP metrics, especially for low luminance conditions, and is comparable to or better than for the MS-SSIM, which is considered one of the most successful quality metrics.
Abstract: Visual metrics can play an important role in the evaluation of novel lighting, rendering, and imaging algorithms. Unfortunately, current metrics only work well for narrow intensity ranges, and do not correlate well with experimental data outside these ranges. To address these issues, we propose a visual metric for predicting visibility (discrimination) and quality (mean-opinion-score). The metric is based on a new visual model for all luminance conditions, which has been derived from new contrast sensitivity measurements. The model is calibrated and validated against several contrast discrimination data sets, and image quality databases (LIVE and TID2008). The visibility metric is shown to provide much improved predictions as compared to the original HDR-VDP and VDP metrics, especially for low luminance conditions. The image quality predictions are comparable to or better than for the MS-SSIM, which is considered one of the most successful quality metrics. The code of the proposed metric is available on-line.

691 citations

12 Mar 2013
TL;DR: The concepts and ideas cited in this paper mainly refer to the Quality of Experience of multimedia communication systems, but may be helpful also for other areas where QoE is an issue, and the document will not reflect the opinion of each individual person at all points.
Abstract: This White Paper is a contribution of the European Network on Quality of Experience in Multimedia Systems and Services, Qualinet (COST Action IC 1003, see www.qualinet.eu), to the scientific discussion about the term "Quality of Experience" (QoE) and its underlying concepts. It resulted from the need to agree on a working definition for this term which facilitates the communication of ideas within a multidisciplinary group, where a joint interest around multimedia communication systems exists, however approached from different perspectives. Thus, the concepts and ideas cited in this paper mainly refer to the Quality of Experience of multimedia communication systems, but may be helpful also for other areas where QoE is an issue. The Network of Excellence (NoE) Qualinet aims at extending the notion of network-centric Quality of Service (QoS) in multimedia systems, by relying on the concept of Quality of Experience (QoE). The main scientific objective is the development of methodologies for subjective and objective quality metrics taking into account current and new trends in multimedia communication systems as witnessed by the appearance of new types of content and interactions. A substantial scientific impact on fragmented efforts carried out in this field will be achieved by coordinating the research of European experts under the catalytic COST umbrella. The White Paper has been compiled on the basis of a first open call for ideas which was launched for the February 2012 Qualinet Meeting held in Prague, Czech Republic. The ideas were presented as short statements during that meeting, reflecting the ideas of the persons listed under the headline "Contributors" in the previous section. During the Prague meeting, the ideas have been further discussed and consolidated in the form of a general structure of the present document. 
An open call for authors was issued at that meeting, to which the persons listed as "Authors" in the previous section have announced their willingness to contribute in the preparation of individual sections. For each section, a coordinating author was assigned who coordinated the writing of that section, and who is underlined in the author list preceding each section. The individual sections were then integrated and aligned by an editing group (listed as "Editors" in the previous section), and the entire document was iterated with the entire group of authors. Furthermore, the draft text was discussed with the participants of the Dagstuhl Seminar 12181 "Quality of Experience: From User Perception to Instrumental Metrics", which was held in Schloss Dagstuhl, Germany, May 1-4, 2012, and a number of changes were proposed, resulting in the present document. As a result of the writing process and the large number of contributors, authors and editors, the document will not reflect the opinion of each individual person at all points. Still, we hope that it is found to be useful for everybody working in the field of Quality of Experience of multimedia communication systems, and most probably also beyond that field.

686 citations


"Hdr-vqm" refers methods in this paper

  • ...of network-centric Quality of Service (QoS) in multimedia systems is being extended by relying on the concept of Quality of Experience (QoE) [1]....

Frequently Asked Questions (2)
Q1. What are the contributions mentioned in the paper "Hdr-vqm: an objective quality measure for high dynamic range video" ?

To address that, the authors present an objective HDR video quality measure (HDR-VQM) based on signal pre-processing, transformation, and subsequent frequency based decomposition. The authors also verified its prediction performance on a comprehensive, in-house subjective HDR video database with 90 sequences, and it was found to be better than some of the existing methods in terms of correlation with subjective scores (for both across-sequence and per-sequence cases).

The immediate future work will involve further refinement of the presented method in view of some of the mentioned limitations, as well as further validation with larger HDR video datasets.