Proceedings ArticleDOI

Characterizing perceptual artifacts in compressed video streams

25 Feb 2014 - Proceedings of SPIE (International Society for Optics and Photonics) - Vol. 9014, pp 173-182
TL;DR: This paper reexamines the perceptual artifacts created by standard video compression, summarizing commonly observed spatial and temporal perceptual distortions in compressed video, with emphasis on the perceptual temporal artifacts that have not been well identified or accounted for in previous studies.
Abstract: To achieve optimal video quality under bandwidth and power constraints, modern video coding techniques employ lossy coding schemes, which often create compression artifacts that may lead to degradation of perceptual video quality. Understanding and quantifying such perceptual artifacts play important roles in the development of effective video compression, streaming and quality enhancement systems. Moreover, the characteristics of compression artifacts evolve over time due to the continuous adoption of novel coding structures and strategies during the development of new video compression standards. In this paper, we reexamine the perceptual artifacts created by standard video compression, summarizing commonly observed spatial and temporal perceptual distortions in compressed video, with emphasis on the perceptual temporal artifacts that have not been well identified or accounted for in previous studies. Furthermore, a floating effect detection method is proposed that not only detects the existence of floating, but also segments the spatial regions where floating occurs∗.

Summary (4 min read)

1. INTRODUCTION

  • The demand for high-performance network video communications has been increasing exponentially in recent years.
  • Poor video quality keeps challenging viewers’ patience and has become a core threat to the video service ecosystem.
  • Since compression is a major source of video quality degradation, the authors focus on perceptual artifacts generated by standard video compression techniques in the current work.
  • Various types of artifacts created by standard compression schemes have been summarized previously.
  • Objective VQA techniques have also been designed to automatically evaluate the perceptual quality of compressed video streams.

2. PERCEPTUAL ARTIFACTS IN COMPRESSED VIDEO

  • Both spatial and temporal artifacts may exist in compressed video, where spatial artifacts refer to the distortions that can be observed in individual frames while temporal artifacts can only be seen during video playback.
  • Both spatial and temporal artifacts can be further divided into categories and subcategories of more specific distortion types.
  • A detailed description of the appearance and causes of each type of perceptual compression artifact will be given in the following sections.
  • In addition to these artifacts, there are a number of other perceptual video artifacts that are often seen in real-world visual communication applications.
  • Since compression is not the main cause of these artifacts, they are beyond the major focus of the current paper.

2.1 Spatial Artifacts

  • Block-based video coding schemes create various spatial artifacts due to block partitioning and quantization.
  • These artifacts include blurring, blocking, ringing, basis pattern effect, and color bleeding.
  • They are detected without reference to temporally neighboring frames, and thus can be better identified when the video is paused.
  • Due to the complexity of modern compression techniques, these artifacts are interrelated with each other, and the classification here is mainly based on their visual appearance.

2.1.1 Blurring

  • All modern video compression methods involve a frequency transform step followed by a quantization process that often removes small amplitude transform coefficients.
  • Since the energy of natural visual signals concentrates at low frequencies, quantization reduces high-frequency energy in such signals, resulting in a significant blurring effect in the reconstructed signals.
  • A visual example is given in Fig. 2, where the left picture is a reference frame extracted from the original video, and the middle and right pictures are two decoded H.264/AVC frames with the de-blocking filter turned off and on, respectively.
  • It can be observed that without de-blocking filtering, the majority of blur occurs within each block while the blocking artifact across the block boundaries is quite severe, for example, in the marked rectangular region in Fig. 2(b).

2.1.2 Blocking

  • Blocking artifact or blockiness is a very common type of distortion frequently seen in reconstructed video produced by video compression standards, which use blocks of various sizes as the basic units for frequency transformation, quantization and motion estimation/compensation, thus producing false discontinuities across block boundaries.
  • Their visual appearance may be different, depending on the region where blockiness occurs.
  • Mosaic effect usually occurs when there are luminance transitions in large low-energy regions (e.g., walls, black/white boards, and desk surfaces).
  • Due to quantization within each block, nearly all AC coefficients are quantized to zero, and thus each block is reconstructed as a constant DC block, where the DC values vary from block to block.
  • The false edge effect, a fake edge appearing near a true edge, is often created by a combination of motion estimation/compensation based inter-frame prediction and the blocking effect in the previous frame, where blockiness in the previous frame is transformed to the current frame via motion compensation as artificial edges.

2.1.3 Ringing

  • Sharp transitions in images such as strong edges and lines are transformed to many coefficients in frequency domain representations.
  • The quantization process results in partial loss or distortion of these coefficients.
  • When the remaining coefficients are combined to reconstruct the edges or lines, artificial wave-like or ripple structures are created in nearby regions, known as the ringing artifacts.
  • Such ringing artifacts are most significant when the edges or lines are sharp and strong, and when the regions near the edges or lines are smooth, where the visual masking effect is the weakest.
  • It is worth noting that when the ringing effect is combined with object motion in consecutive video frames, a special temporal artifact called mosquito noise is observed, which will be discussed later.

2.1.4 Basis pattern effect

  • The origin of the basis pattern effect is similar to that of the ringing effect, but the spatial regions where the basis pattern effect occurs are not restricted to sharp edges or lines.
  • More specifically, in certain texture regions with moderate energy, when the transform coefficients are quantized, there is a possibility that only one transform coefficient remains (while all other coefficients are quantized to zero or nearly zero).
  • As a result, when the image signal is reconstructed using a single coefficient, the basis pattern (e.g., a DCT basis) associated with the coefficient is created as a representation of the image structure.
  • Since the basis pattern effect usually occurs at texture regions, its visibility depends on the nature of the texture region.
  • If the region is in the foreground and attracts visual attention, the basis pattern effect has a strong impact on perceived video quality; by contrast, if the region is in the background and does not attract visual attention, the effect is often ignored by human observers.

2.1.5 Color bleeding

  • Color bleeding is a result of inconsistent image rendering across the luminance and chromatic channels.
  • In the most popular YCbCr 4:2:0 video format, the color channels Cb and Cr have half the resolution of the luminance channel Y in both the horizontal and vertical dimensions.
  • After compression, all luminance and chromatic channels exhibit various types of distortions (such as blurring, blocking and ringing described earlier), and more importantly, these distortions are inconsistent across color channels.
  • Moreover, because of the lower resolution in the chromatic channels, the rendering processes inevitably involve interpolation operations, leading to additional inconsistent color spreading in the rendering result.
  • In the literature, it was shown that chromatic distortion is helpful in color image quality assessment,9 but how color bleeding affects the overall perceptual quality of compressed video is still an unsolved problem.

2.2 Temporal Artifacts

  • Temporal artifacts refer to those distortion effects that are not observed when the video is paused, but only during video playback.
  • Temporal artifacts are of particular interest to us for two reasons.
  • First, as compared to spatial artifacts, temporal artifacts evolve more significantly with the development of video coding techniques; an artifact that is prominent in video coded with earlier standards may be largely reduced in the latest HEVC coded video.
  • Second, objective evaluation of such artifacts is more challenging, and popular VQA models often fail to account for these artifacts.

2.2.1 Flickering

  • Flickering artifact generally refers to frequent luminance or chrominance changes along the temporal dimension that do not appear in the uncompressed reference video.
  • Mosquito noise is a joint effect of object motion and time-varying spatial artifacts (such as ringing and motion prediction error) near sharp object boundaries.
  • Specifically, the ringing and motion prediction errors are most manifest at the regions near the boundaries of objects.
  • Coarse-granularity flickering refers to low-frequency sudden luminance changes in large spatial regions that could extend to the entire video frame, whereas fine-granularity flickering appears as flashing on a frame-by-frame basis; an illustrative flicker measure is sketched after this list.
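As a concrete illustration, a simple block-wise flicker indicator can be computed from frame-to-frame luminance fluctuations. This is a minimal sketch of such a measure, not the paper's method; the function name `flicker_map` and the block size are our own assumptions:

```python
import numpy as np

def flicker_map(frames, block=16):
    """Illustrative flicker indicator (not the paper's method): mean absolute
    luminance change of co-located blocks across consecutive frames.
    `frames` is an array of grayscale frames with shape (T, H, W)."""
    frames = np.asarray(frames, dtype=np.float64)
    T, H, W = frames.shape
    diffs = np.abs(np.diff(frames, axis=0))              # (T-1, H, W)
    Hc, Wc = H - H % block, W - W % block                # crop to block grid
    d = diffs[:, :Hc, :Wc].reshape(T - 1, Hc // block, block, Wc // block, block)
    return d.mean(axis=(0, 2, 4))                        # per-block flicker score
```

Comparing such maps between the reference and compressed videos would highlight regions whose luminance fluctuates only in the compressed stream.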

2.2.3 Floating

  • Floating refers to the appearance of illusive motion in certain regions as opposed to their surrounding background.
  • Visually, these regions create a strong perceptual illusion as if they were floating on top of the surrounding background.
  • Many video encoders choose to encode the blocks in texture regions with zero motion and Skip mode, so the reconstructed texture fails to follow the motion of its surroundings, producing texture floating.
  • Unlike texture floating, edge neighborhood floating may appear without global motion.
  • Previously, this effect was also called stationary area temporal fluctuations.

3. TEXTURE FLOATING DETECTION

  • Among all types of temporal artifacts, texture floating is perhaps the least identified in the literature, but in the authors’ study it is found to be highly eye-catching and visually annoying when it exists.
  • Texture floating is typically observed in video frames with global camera motion, including translation, rotation, and zooming.
  • Therefore, the authors define two threshold energy parameters E1 and E2 in their algorithm to constrain the energy range for texture floating detection.
  • In the reconstruction of video frames, erroneous motion estimation/compensation leads to significant distortions along the temporal direction.
  • Fig. 7 demonstrates the performance of the proposed algorithm; a hedged sketch of such a detector follows this list.
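The paper's detector is not fully specified in this summary, so the following Python sketch is only a loose illustration of the stated idea: restrict attention to blocks whose local energy falls between thresholds E1 and E2 and that stay nearly unchanged between frames despite global motion. All names, threshold values, and the variance/difference measures here are our assumptions, and a real detector would additionally estimate the global motion itself:

```python
import numpy as np

def texture_floating_candidates(prev, curr, E1=20.0, E2=400.0, block=16, eps=1.0):
    """Hypothetical sketch, not the published algorithm. Flags blocks whose
    local energy (variance) lies in (E1, E2), i.e., moderate texture, yet
    which remain almost identical between consecutive frames, suggesting
    zero-motion/Skip coding under global camera motion."""
    H, W = curr.shape
    Hc, Wc = H - H % block, W - W % block                 # crop to block grid
    cb = curr[:Hc, :Wc].astype(np.float64).reshape(Hc // block, block, Wc // block, block)
    pb = prev[:Hc, :Wc].astype(np.float64).reshape(Hc // block, block, Wc // block, block)
    energy = cb.var(axis=(1, 3))                          # local signal energy
    change = np.abs(cb - pb).mean(axis=(1, 3))            # temporal activity
    return (energy > E1) & (energy < E2) & (change < eps)
```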

4. CONCLUSION

  • The authors reexamine perceptual artifacts created by state-of-the-art video compression technologies.
  • In particular, the fine classification, the new naming approach, and the corresponding descriptions of the temporal flickering and floating effects are new to the literature.
  • Related features are identified and a novel objective floating artifact detection algorithm is proposed, which not only detects the existence of texture floating, but also locates the texture floating regions in each video frame.
  • The current work also lays out a work plan for future studies.
  • Firstly, objective VQA methods need to be reexamined and further developed to detect each compression artifact reliably and efficiently.
  • Secondly, video encoders may be designed to eliminate or minimize the impact of these perceptual artifacts.


Characterizing Perceptual Artifacts in Compressed
Video Streams
Kai Zeng, Tiesong Zhao, Abdul Rehman and Zhou Wang
Dept. of Electrical & Computer Engineering, University of Waterloo, Waterloo, ON, Canada
ABSTRACT
To achieve optimal video quality under bandwidth and power constraints, modern video coding techniques em-
ploy lossy coding schemes, which often create compression artifacts that may lead to degradation of perceptual
video quality. Understanding and quantifying such perceptual artifacts play important roles in the development
of effective video compression, streaming and quality enhancement systems. Moreover, the characteristics of
compression artifacts evolve over time due to the continuous adoption of novel coding structures and strategies
during the development of new video compression standards. In this paper, we reexamine the perceptual arti-
facts created by standard video compression, summarizing commonly observed spatial and temporal perceptual
distortions in compressed video, with emphasis on the perceptual temporal artifacts that have not been well
identified or accounted for in previous studies. Furthermore, a floating effect detection method is proposed that
not only detects the existence of floating, but also segments the spatial regions where floating occurs∗.
Keywords: video compression, video quality assessment, compression artifact, H.264-MPEG4/AVC, HEVC,
flickering, floating detection
1. INTRODUCTION
The demand for high-performance network video communications has been increasing exponentially in recent
years. According to Cisco Visual Networking Index, the sum of all forms of video (TV, VoD, Internet, and
P2P) will constitute approximately 90 percent of global consumer traffic by 2015.[1] A high-performance video
compression technology is critical for current industrial video communication systems to catch up with such
increasing demand. A fundamental issue in the design of video compression systems is to achieve an optimal
compromise between the availability of resources (i.e. bandwidth, power, and time) and the perceptual quality
of the compressed video. The constraint in available resources often leads to degradations of perceptual quality
by introducing compression artifacts in the decoded video. For example, a large quantization step could reduce
power consumption, encoding time, as well as the bandwidth needed to encode the video, but, unfortunately,
also results in video quality degradation.
Consumers’ expectations for better Quality-of-Experience (QoE) nowadays are higher than ever before.
Despite the fast technological development in telecommunication and display devices, poor video quality
originating from compression and streaming processes has disappointed a large number of consumers, resulting
in major revenue loss in the digital media communication industry. Based on a recent viewer experience
study,[2] “In 2012, global premium content brands lost $2.16 billion of revenue due to poor quality video
streams and are expected to miss out on an astounding $20 billion through 2017”. Poor video quality keeps
challenging viewers’ patience and has become a core threat to the video service ecosystem. According to the
same study,[2] roughly 60% of all video streams experienced quality degradation in 2012. In another recent
study,[3] 90.4% of interviewees rated end-user video quality monitoring as either “critical”, “very important”,
or “important” to their video initiatives, and almost half of the customer phone calls are related to video
quality problems in Video-on-Demand (VOD) services and HDTV. Additionally, even though 58.1% of the
interviewed subjects reported that end-user QoE is “critical” and requires monitoring, only 31% said they use
network monitoring tools to discover quality problems.[3] Therefore, there is an urgent need for effective and
efficient objective video quality assessment (VQA) tools in current media network communication systems that
can provide reliable quality measurements of end users’ visual QoE.

Further author information: (Send correspondence to Kai Zeng)
Kai Zeng: E-mail: kzeng@uwaterloo.ca, Telephone: 1 519 888 4567 ext. 31449
Tiesong Zhao: E-mail: ztiesong@uwaterloo.ca, Telephone: 1 519 888 4567 ext. 31448
Abdul Rehman: E-mail: abdul.rehman@uwaterloo.ca, Telephone: 1 519 888 4567 ext. 31449
Zhou Wang: E-mail: zhou.wang@uwaterloo.ca, Telephone: 1 519 888 4567 ext. 35301
Image and video examples that demonstrate various types of spatial and temporal compression artifacts are
available at https://ece.uwaterloo.ca/~z70wang/research/compression_artifacts/.
Presented at: IS&T/SPIE Annual Symposium on Electronic Imaging, San Francisco, CA, Feb. 2-6, 2014
Published in: Human Vision and Electronic Imaging XIX, Proc. SPIE, vol. 9014. @SPIE
Since compression is a major source of video quality degradation, we focus on perceptual artifacts generated
by standard video compression techniques in the current work. Various types of artifacts created by standard
compression schemes have been summarized previously.[4] Objective VQA techniques have also been designed to
automatically evaluate the perceptual quality of compressed video streams.[5] However, recent studies suggest
that widely recognized VQA models (though promising) only achieve limited success in predicting the perceptual
coding gain between state-of-the-art video coding techniques, and problems often occur when specific temporal
artifacts appear in the compressed video streams.[6] This is likely due to the adoption of the novel coding
structures and strategies in the latest development of video compression standards such as H.264/AVC[7] and
the high efficiency video coding (HEVC).[8] This motivates us to reexamine the perceptual artifacts created by
video compression, with emphasis on the perceptual temporal artifacts that have not been well identified or
accounted for in previous studies.
In this paper, we first attempt to elaborate the various spatial and temporal artifacts originating from standard
video compression. These include both conventional artifacts and those that emerged recently in the new coding
standards, such as various flickering and floating effects. Examples are provided to demonstrate the artifacts in
different categories. Possible reasons and consequences of these artifacts together with their perceptual impact
are discussed in the context of compression. Finally, an objective floating artifact detection scheme is proposed,
which not only detects the existence of floating, but also indicates the location of floating regions in each video
frame.
2. PERCEPTUAL ARTIFACTS IN COMPRESSED VIDEO
A diagram that summarizes various types of compression artifacts is given in Fig. 1. Both spatial and temporal
artifacts may exist in compressed video, where spatial artifacts refer to the distortions that can be observed in
individual frames while temporal artifacts can only be seen during video playback. Both spatial and temporal
artifacts can be further divided into categories and subcategories of more specific distortion types. A detailed
description of the appearance and causes of each type of perceptual compression artifact will be given in the
following sections. In addition to these artifacts, there are a number of other perceptual video artifacts that
are often seen in real-world visual communication applications. These include those artifacts generated during
video acquisition (e.g., camera noise, camera motion blur, and line/frame jittering), during video transmission
in error-prone networks (e.g., video freezing, jittering, and erroneously decoded blocks caused by packet loss and
delay), and during video post-processing and display (e.g., post deblocking and noise filtering, spatial scaling,
retargeting, chromatic aberration, and pincushion distortion). Since compression is not the main cause of these
artifacts, they are beyond the major focus of the current paper.
2.1 Spatial Artifacts
Block-based video coding schemes create various spatial artifacts due to block partitioning and quantization.
These artifacts include blurring, blocking, ringing, basis pattern effect, and color bleeding. They are detected
without reference to temporally neighboring frames, and thus can be better identified when the video is paused.
Due to the complexity of modern compression techniques, these artifacts are interrelated with each other, and
the classification here is mainly based on their visual appearance.
2.1.1 Blurring
All modern video compression methods involve a frequency transform step followed by a quantization process
that often removes small-amplitude transform coefficients. Since the energy of natural visual signals concentrates
at low frequencies, quantization reduces high-frequency energy in such signals, resulting in a significant blurring
effect in the reconstructed signals. Perceptually, blurring typically manifests itself as a loss of spatial details or
sharpness at edges or texture regions in the image. Since in block-based coding schemes, frequency transformation
and quantization are usually conducted within individual image blocks, blurring caused by such processes is often
created inside the blocks.

[Figure 1. Categorization of perceptual artifacts created by video compression. Spatial artifacts: blurring; blocking (mosaic effect, staircase effect, false edge); ringing; basis pattern effect; color bleeding. Temporal artifacts: flickering (mosquito noise, fine-granularity flickering, coarse-granularity flickering); jerkiness; floating (edge neighborhood floating, texture floating).]
Another source of blurring effect is in-loop de-blocking filtering, which is employed to reduce the blocking
artifact across block boundaries, and is adopted as an option by state-of-the-art video coding standards such as
H.264/AVC and HEVC. The de-blocking operators are essentially spatially adaptive low-pass filters that smooth
the block boundaries, and thus produce a perceptual blurring effect.
A visual example is given in Fig. 2, where the left picture is a reference frame extracted from the original
video, and the middle and right pictures are two decoded H.264/AVC frames with the de-blocking filter turned
off and on, respectively. It can be observed that without de-blocking filtering, the majority of blur occurs within
each block while the blocking artifact across the block boundaries is quite severe, for example, in the marked
rectangular region in Fig. 2(b). When the de-blocking filter is turned on, much smoother luminance transition
is observed in the same region, as shown in Fig. 2(c), but the overall appearance of the picture is more blurry.
Figure 2. An example of spatial artifacts created by video compression. (a) Reference frame; (b) Compressed frame with
de-blocking filter turned off; (c) Compressed frame with de-blocking filter turned on.
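To make this mechanism concrete, here is a minimal Python sketch (ours, not code from the paper) that reproduces compression blur by coarsely quantizing block DCT coefficients; the quantization step of 40 and the helper names are arbitrary illustrative choices:

```python
import numpy as np
from scipy.fft import dctn, idctn

def quantize_block_dct(block, step=40.0):
    """Uniformly quantize the DCT coefficients of one block; small
    high-frequency coefficients round to zero, which blurs the block."""
    coef = dctn(block.astype(np.float64), norm='ortho')
    return idctn(np.round(coef / step) * step, norm='ortho')

def blur_demo(img, step=40.0, bs=8):
    """Apply block-wise DCT quantization to a grayscale image whose
    dimensions are assumed (for brevity) to be multiples of bs."""
    out = np.empty(img.shape, dtype=np.float64)
    for i in range(0, img.shape[0], bs):
        for j in range(0, img.shape[1], bs):
            out[i:i+bs, j:j+bs] = quantize_block_dct(img[i:i+bs, j:j+bs], step)
    return out
```

Raising `step` removes more coefficients and strengthens the in-block blur; since each block is processed independently, the same demo also produces visible block boundaries.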
2.1.2 Blocking
Blocking artifact or blockiness is a very common type of distortion frequently seen in reconstructed video produced
by video compression standards, which use blocks of various sizes as the basic units for frequency transformation,
quantization and motion estimation/compensation, thus producing false discontinuities across block boundaries.
Although all blocking effects are generated because of similar reasons mentioned above, their visual appearance
may be different, depending on the region where blockiness occurs. Therefore, here we further classify the
blocking effects into three subcategories.

Figure 3. An example of blocking artifacts. (a) Reference frame; (b) Compressed frame with three types of blocking
artifacts: mosaic effect (elliptical region); staircase effect (rectangular region); false edge (triangular region).
Mosaic effect usually occurs when there are luminance transitions in large low-energy regions (e.g., walls,
black/white boards, and desk surfaces). Due to quantization within each block, nearly all AC coefficients
are quantized to zero, and thus each block is reconstructed as a constant DC block, where the DC values
vary from block to block. When all blocks are put together, mosaic effect manifests as abrupt luminance
change from one block to another across the space. The mosaic effect is highly visible and annoying to
the visual system, where the visual masking effect (which stands for the reduced visibility of one image
component due to the existence of another neighboring image component) is the weakest at smooth regions.
An example is shown in the marked elliptical region in Fig. 3(b).
Staircase effect typically happens along a diagonal line or curve, which, when mixed with the false
horizontal and vertical edges at block boundaries, creates fake staircase structures. In Fig. 3(b), an example
of staircase effect is highlighted in the marked rectangle region.
False edge is a fake edge that appears near a true edge. This is often created by a combination of motion
estimation/compensation based inter-frame prediction and blocking effect in the previous frame, where
blockiness in the previous frame is transformed to the current frame via motion compensation as artificial
edges. An example is given in the triangle marked region in Fig. 3(b).
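The mosaic effect described above is easy to reproduce: keep only each block's DC value, as happens when quantization zeroes all AC coefficients. A minimal sketch (ours, with an arbitrary block size):

```python
import numpy as np

def dc_only_blocks(img, bs=8):
    """Reconstruct each bs x bs block from its DC (mean) value alone; the
    block-to-block DC differences then appear as false discontinuities."""
    H, W = img.shape
    Hc, Wc = H - H % bs, W - W % bs                       # crop to block grid
    b = img[:Hc, :Wc].astype(np.float64).reshape(Hc // bs, bs, Wc // bs, bs)
    dc = b.mean(axis=(1, 3), keepdims=True)               # per-block DC value
    return np.broadcast_to(dc, b.shape).reshape(Hc, Wc)
```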
2.1.3 Ringing
Sharp transitions in images such as strong edges and lines are transformed to many coefficients in frequency
domain representations. The quantization process results in partial loss or distortion of these coefficients. When
the remaining coefficients are combined to reconstruct the edges or lines, artificial wave-like or ripple structures
are created in nearby regions, known as the ringing artifacts. Such ringing artifacts are most significant when
the edges or lines are sharp and strong, and when the regions near the edges or lines are smooth, where the
visual masking effect is the weakest. Fig. 4(b) shows an example of ringing artifacts. It is worth noting that
when the ringing effect is combined with object motion in consecutive video frames, a special temporal artifact
called mosquito noise is observed, which will be discussed later.
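The underlying mechanism is the classic Gibbs phenomenon, which can be demonstrated in one dimension by discarding the high-frequency DCT coefficients of a step edge (an illustrative sketch, not from the paper):

```python
import numpy as np
from scipy.fft import dct, idct

# Reconstruct a sharp step edge from its 16 lowest-frequency DCT coefficients;
# the reconstruction oscillates (rings) on both sides of the edge.
x = np.zeros(64)
x[32:] = 1.0                        # ideal step edge
c = dct(x, norm='ortho')
c[16:] = 0.0                        # quantization-like loss of high frequencies
x_rec = idct(c, norm='ortho')       # ripples appear near index 32
```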
2.1.4 Basis pattern effect
The origin of the basis pattern effect is similar to that of the ringing effect, but the spatial regions where the basis
pattern effect occurs are not restricted to sharp edges or lines. More specifically, in certain texture regions with
moderate energy, when the transform coefficients are quantized, there is a possibility that only one transform
coefficient remains (while all other coefficients are quantized to zero or nearly zero). As a result, when the
image signal is reconstructed using a single coefficient, the basis pattern (e.g., a DCT basis) associated with the
coefficient is created as a representation of the image structure. An example is shown in Fig. 5(b), in which the

Figure 4. An example of ringing artifact. (a) Reference frame; (b) Compressed frame with ringing artifact.
basis pattern effect is highlighted in the marked rectangular regions. Since the basis pattern effect usually occurs
at texture regions, its visibility depends on the nature of the texture region. If the region is in the foreground
and attracts visual attention, the basis pattern effect will have a strong impact on perceived video quality. By
contrast, if the region is in the background and does not attract visual attention, then the effect is often ignored
by human observers.
Figure 5. An example of basis pattern effect. (a) Reference frame; (b) Compressed frame with basis pattern effect.
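Following the description above, the effect can be simulated by keeping a single surviving AC coefficient per block, so the reconstruction is dominated by one DCT basis function (an illustrative sketch with assumed names):

```python
import numpy as np
from scipy.fft import dctn, idctn

def keep_strongest_ac(block):
    """Keep the DC term plus only the single strongest AC coefficient of a
    block; the result shows the corresponding DCT basis pattern."""
    coef = dctn(block.astype(np.float64), norm='ortho')
    ac = coef.copy()
    ac[0, 0] = 0.0                                   # ignore DC when searching
    k = np.unravel_index(np.argmax(np.abs(ac)), ac.shape)
    kept = np.zeros_like(coef)
    kept[0, 0] = coef[0, 0]                          # DC value
    kept[k] = coef[k]                                # lone surviving AC term
    return idctn(kept, norm='ortho')
```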
2.1.5 Color bleeding
Color bleeding is a result of inconsistent image rendering across the luminance and chromatic channels. For
example, in the most popular YCbCr 4:2:0 video format, the color channels Cb and Cr have half the resolution
of the luminance channel Y in both horizontal and vertical dimensions. After compression, all luminance and
chromatic channels exhibit various types of distortions (such as blurring, blocking and ringing described earlier),
and more importantly, these distortions are inconsistent across color channels. Moreover, because of the lower
resolution in the chromatic channels, the rendering processes inevitably involve interpolation operations, leading
to additional inconsistent color spreading in the rendering result. In the literature, it was shown that chromatic
distortion is helpful in color image quality assessment,[9] but how color bleeding affects the overall perceptual
quality of compressed video is still an unsolved problem. An example of color bleeding is given in the highlighted
elliptical region in Fig. 6(b).
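The resolution mismatch can be illustrated with a 4:2:0-style round trip on a chroma plane: downsampling by two and then upsampling back spreads chroma values across a 2x2 neighborhood, so chroma edges no longer align with luma edges. A minimal sketch (ours; averaging and nearest-neighbor interpolation are arbitrary choices):

```python
import numpy as np

def chroma_420_roundtrip(chroma):
    """Average-downsample a chroma plane by 2 in each dimension, then
    upsample by nearest-neighbor replication, mimicking 4:2:0 handling."""
    c = chroma.astype(np.float64)
    H, W = c.shape
    c = c[:H - H % 2, :W - W % 2]                        # crop to even size
    low = c.reshape(c.shape[0] // 2, 2, c.shape[1] // 2, 2).mean(axis=(1, 3))
    return np.repeat(np.repeat(low, 2, axis=0), 2, axis=1)
```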
2.2 Temporal Artifacts
Temporal artifacts refer to those distortion effects that are not observed when the video is paused, but only during
video playback. Temporal artifacts are of particular interest to us for two reasons. First, as compared to
spatial artifacts, temporal artifacts evolve more significantly with the development of video coding techniques.

Citations
Journal ArticleDOI
TL;DR: An overview of selected issues pertaining to QoE and its recent applications in video transmission, with consideration of the compelling features of QoE (i.e., context and human factors).
Abstract: The increasing popularity of video (i.e., audio-visual) applications or services over both wired and wireless links has prompted recent growing interests in the investigations of quality of experience (QoE) in online video transmission. Conventional video quality metrics, such as peak-signal-to-noise-ratio and quality of service, only focus on the reception quality from the systematic perspective. As a result, they cannot represent the true visual experience of an individual user. Instead, the QoE introduces a user experience-driven strategy which puts special emphasis on the contextual and human factors in addition to the transmission system. This advantage has raised the popularity and widespread usage of QoE in video transmission. In this paper, we present an overview of selected issues pertaining to QoE and its recent applications in video transmission, with consideration of the compelling features of QoE (i.e., context and human factors). The selected issues include QoE modeling with influence factors in the end-to-end chain of video transmission, QoE assessment (including subjective test and objective QoE monitoring) and QoE management of video transmission over different types of networks. Through the literature review, we observe that the context and human factors in QoE-aware video transmission have attracted significant attentions since the past two to three years. A vast number of high quality works were published in this area, and will be highlighted in this survey. In addition to a thorough summary of recent progresses, we also present an outlook of future developments on QoE assessment and management in video transmission, especially focusing on the context and human factors that have not been addressed yet and the technical challenges that have not been completely solved so far. We believe that our overview and findings can provide a timely perspective on the related issues and the future research directions in QoE-oriented services over video communications.

118 citations


Cites background from "Characterizing perceptual artifacts..."

  • ...compression artifacts on video quality were discussed in [39]....


Proceedings ArticleDOI
15 Oct 2018
TL;DR: A Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages of blind video quality assessment into one, where the feature extractor and the regressor are jointly optimized.
Abstract: Blind video quality assessment (BVQA) algorithms are traditionally designed with a two-stage approach - a feature extraction stage that computes typically hand-crafted spatial and/or temporal features, and a regression stage working in the feature space that predicts the perceptual quality of the video. Unlike the traditional BVQA methods, we propose a Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages into one, where the feature extractor and the regressor are jointly optimized. Our model uses a multi-task DNN framework that not only estimates the perceptual quality of the test video but also provides a probabilistic prediction of its codec type. This framework allows us to train the network with two complementary sets of labels, both of which can be obtained at low cost. The training process is composed of two steps. In the first step, early convolutional layers are pre-trained to extract spatiotemporal quality-related features with the codec classification subtask. In the second step, initialized with the pre-trained feature extractor, the whole network is jointly optimized with the two subtasks together. An additional critical step is the adoption of 3D convolutional layers, which creates novel spatiotemporal features that lead to a significant performance boost. Experimental results show that the proposed model clearly outperforms state-of-the-art BVQA methods.The source code of V-MEON is available at https://ece.uwaterloo.ca/~zduanmu/acmmm2018bvqa.

96 citations


Cites background from "Characterizing perceptual artifacts..."

  • ...Since most video compression distortions manifest themselves spatiotemporally [43], it is of vital importance for a BVQA model to be capable of discovering...


  • ...However, such a framework fails to take into account the following influencing factors in video perceptual quality: 1) motion-induced blindness [5, 28] to spatial distortions; 2) possible temporal artifacts or incoherence [29, 43]; 3) codec-specific distortions [43]; and 4) interactions between spatial and temporal artifacts [9]....


Journal ArticleDOI
TL;DR: This paper provides a first comprehensive review of the most common visual distortions that alter 360-degree signals undergoing state of the art processing in common applications, essential as a basis for benchmarking different processing techniques, allowing the effective design of new algorithms and applications.
Abstract: Omnidirectional (or 360°) images and videos are emergent signals being used in many areas, such as robotics and virtual/augmented reality. In particular, for virtual reality applications, they allow an immersive experience in which the user can interactively navigate through a scene with three degrees of freedom, wearing a head-mounted display. Current approaches for capturing, processing, delivering, and displaying 360° content, however, present many open technical challenges and introduce several types of distortions in the visual signal. Some of the distortions are specific to the nature of 360° images and often differ from those encountered in classical visual communication frameworks. This paper provides a first comprehensive review of the most common visual distortions that alter 360° signals going through the different processing elements of the visual communication pipeline. While their impact on viewers’ visual perception and the immersive experience at large is still unknown—thus, it is an open research topic—this review serves the purpose of proposing a taxonomy of the visual distortions that can be encountered in 360° signals. Their underlying causes in the end-to-end 360° content distribution pipeline are identified. This taxonomy is essential as a basis for comparing different processing techniques, such as visual enhancement, encoding, and streaming strategies, and allowing the effective design of new algorithms and applications. It is also a useful resource for the design of psycho-visual studies aiming to characterize human perception of 360° content in interactive and immersive applications.

66 citations


Cites background from "Characterizing perceptual artifacts..."

  • ...the literature, both for standard 2D [7], [8], [9], [10] and stereoscopic 3D signals [11], [12], [13], [14]....


  • ...Flickering refers to frequent changes in luminance or chrominance along the temporal dimension that do not appear in uncompressed video, and can be divided into mosquito noise (when it occurs at the borders of moving objects), coarse-granularity flickering (when it suddenly occurs in large spatial areas) and fine-granularity flickering (when it appears to be flashing on a frame-by-frame basis) [10]....


  • ...For a more indepth discussion on 2D video artifacts, we refer the reader to [10], [9], [8], [7]....


Proceedings ArticleDOI
TL;DR: In this paper, a multi-frame convolutional neural network (MF-CNN) is proposed to enhance the quality of compressed video, in which a non-PQF and its nearest two PQFs serve as the input.
Abstract: The past few years have witnessed great success in applying deep learning to enhance the quality of compressed image/video. The existing approaches mainly focus on enhancing the quality of a single frame, ignoring the similarity between consecutive frames. In this paper, we investigate that heavy quality fluctuation exists across compressed video frames, and thus low quality frames can be enhanced using the neighboring high quality frames, seen as Multi-Frame Quality Enhancement (MFQE). Accordingly, this paper proposes an MFQE approach for compressed video, as a first attempt in this direction. In our approach, we firstly develop a Support Vector Machine (SVM) based detector to locate Peak Quality Frames (PQFs) in compressed video. Then, a novel Multi-Frame Convolutional Neural Network (MF-CNN) is designed to enhance the quality of compressed video, in which the non-PQF and its nearest two PQFs are as the input. The MF-CNN compensates motion between the non-PQF and PQFs through the Motion Compensation subnet (MC-subnet). Subsequently, the Quality Enhancement subnet (QE-subnet) reduces compression artifacts of the non-PQF with the help of its nearest PQFs. Finally, the experiments validate the effectiveness and generality of our MFQE approach in advancing the state-of-the-art quality enhancement of compressed video. The code of our MFQE approach is available at this https URL

61 citations

Posted Content
Yi Xu, Longwen Gao, Kai Tian, Shuigeng Zhou, Huyang Sun
TL;DR: Xu et al. proposed an end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM) that exploits multiple consecutive frames.
Abstract: Video compression artifact reduction aims to recover high-quality videos from low-quality compressed videos. Most existing approaches use a single neighboring frame or a pair of neighboring frames (preceding and/or following the target frame) for this task. Furthermore, as frames of high quality overall may contain low-quality patches, and high-quality patches may exist in frames of low quality overall, current methods focusing on nearby peak-quality frames (PQFs) may miss high-quality details in low-quality frames. To remedy these shortcomings, in this paper we propose a novel end-to-end deep neural network called non-local ConvLSTM (NL-ConvLSTM in short) that exploits multiple consecutive frames. An approximate non-local strategy is introduced in NL-ConvLSTM to capture global motion patterns and trace the spatiotemporal dependency in a video sequence. This approximate strategy makes the non-local module work in a fast and low space-cost way. Our method uses the preceding and following frames of the target frame to generate a residual, from which a higher quality frame is reconstructed. Experiments on two datasets show that NL-ConvLSTM outperforms the existing methods.

33 citations

References
Journal ArticleDOI
TL;DR: A general, spatio-spectrally localized multiscale framework for evaluating dynamic video fidelity that integrates both spatial and temporal aspects of distortion assessment and is found to be quite competitive with, and even outperform, algorithms developed and submitted to the VQEG FRTV Phase 1 study, as well as more recent VQA algorithms tested on this database.
Abstract: There has recently been a great deal of interest in the development of algorithms that objectively measure the integrity of video signals. Since video signals are being delivered to human end users in an increasingly wide array of applications and products, it is important that automatic methods of video quality assessment (VQA) be available that can assist in controlling the quality of video being delivered to this critical audience. Naturally, the quality of motion representation in videos plays an important role in the perception of video quality, yet existing VQA algorithms make little direct use of motion information, thus limiting their effectiveness. We seek to ameliorate this by developing a general, spatio-spectrally localized multiscale framework for evaluating dynamic video fidelity that integrates both spatial and temporal (and spatio-temporal) aspects of distortion assessment. Video quality is evaluated not only in space and time, but also in space-time, by evaluating motion quality along computed motion trajectories. Using this framework, we develop a full reference VQA algorithm for which we coin the term the MOtion-based Video Integrity Evaluation index, or MOVIE index. It is found that the MOVIE index delivers VQA scores that correlate quite closely with human subjective judgment, using the Video Quality Expert Group (VQEG) FRTV Phase 1 database as a test bed. Indeed, the MOVIE index is found to be quite competitive with, and even outperform, algorithms developed and submitted to the VQEG FRTV Phase 1 study, as well as more recent VQA algorithms tested on this database.

729 citations

Reference BookDOI
TL;DR: A reference handbook on digital picture quality and perceptual coding, spanning coding fundamentals, coding artifacts, quality assessment, and perceptually motivated encoder design.
Abstract: The book covers picture coding and human visual system fundamentals (digital picture compression and coding structures, human vision and vision modeling, and a catalog of coding artifacts and visual distortions including blocking, basis image effect, blurring, color bleeding, staircase effect, ringing, mosaic patterns, false contouring, false edges, MC mismatch, mosquito effect, and stationary area fluctuations); picture quality assessment and metrics (subjective video quality testing, perceptual video quality metrics, the Picture Quality Scale, SSIM-based assessment, vision-model-based impairment metrics, just-noticeable-difference models, no-reference metrics, and the work of the Video Quality Experts Group); and perceptual coding and processing of digital pictures (HVS-based perceptual video encoders, perceptual and foveated image/video coding, artifact reduction by post-processing, reduction of color bleeding in DCT block-coded video, and error resilience for video coding services).

333 citations

Journal ArticleDOI
TL;DR: This paper presents a comprehensive analysis and classification of the numerous coding artifacts which are introduced into the reconstructed video sequence through the use of the hybrid MC/DPCM/DCT video coding algorithm.

331 citations


"Characterizing perceptual artifacts..." refers background or methods in this paper

  • ...Previously, this effect was also called stationary area temporal fluctuations.(4)...


  • ...Various types of artifacts created by standard compression schemes had been summarized previously.(4) Objective VQA techniques had also been designed to automatically evaluate the perceptual quality of compressed video streams....


Journal ArticleDOI
TL;DR: The general conclusion is that existing VQA algorithms are not well-equipped to handle distortions that vary over time.
Abstract: We introduce a new video quality database that models video distortions in heavily-trafficked wireless networks and that contains measurements of human subjective impressions of the quality of videos. The new LIVE Mobile Video Quality Assessment (VQA) database consists of 200 distorted videos created from 10 RAW HD reference videos, obtained using a RED ONE digital cinematographic camera. While the LIVE Mobile VQA database includes distortions that have been previously studied such as compression and wireless packet-loss, it also incorporates dynamically varying distortions that change as a function of time, such as frame-freezes and temporally varying compression rates. In this article, we describe the construction of the database and detail the human study that was performed on mobile phones and tablets in order to gauge the human perception of quality on mobile devices. The subjective study portion of the database includes both the differential mean opinion scores (DMOS) computed from the ratings that the subjects provided at the end of each video clip, as well as the continuous temporal scores that the subjects recorded as they viewed the video. The study involved over 50 subjects and resulted in 5,300 summary subjective scores and time-sampled subjective traces of quality. In the behavioral portion of the article we analyze human opinion using statistical techniques, and also study a variety of models of temporal pooling that may reflect strategies that the subjects used to make the final decision on video quality. Further, we compare the quality ratings obtained from the tablet and the mobile phone studies in order to study the impact of these different display modes on quality. We also evaluate several objective image and video quality assessment (IQA/VQA) algorithms with regards to their efficacy in predicting visual quality. A detailed correlation analysis and statistical hypothesis testing is carried out. Our general conclusion is that existing VQA algorithms are not well-equipped to handle distortions that vary over time. The LIVE Mobile VQA database, along with the subject DMOS and the continuous temporal scores is being made available to researchers in the field of VQA at no cost in order to further research in the area of video quality assessment.

299 citations


"Characterizing perceptual artifacts..." refers methods in this paper

  • ...7(a) is the original frame extracted from the “PO Org.yuv” sequence from LIVE Mobile Video Quality Database.17 Fig....


  • ...yuv” sequence from LIVE Mobile Video Quality Database.(17) Fig....


Journal ArticleDOI
TL;DR: In this paper, a double-layer image quality assessment system is proposed, which uses descriptors based on the color correlogram, analyzing the alterations in the color distribution of an image as a consequence of the occurrence of distortions, for the reduced reference data.
Abstract: Reduced-reference systems can predict in real-time the perceived quality of images for digital broadcasting, only requiring that a limited set of features, extracted from the original undistorted signals, is transmitted together with the image data. This paper uses descriptors based on the color correlogram, analyzing the alterations in the color distribution of an image as a consequence of the occurrence of distortions, for the reduced reference data. The processing architecture relies on a double layer at the receiver end. The first layer identifies the kind of distortion that may affect the received signal. The second layer deploys a dedicated prediction module for each type of distortion; every predictor yields an objective quality score, thus completing the estimation process. Computational-intelligence models are used extensively to support both layers with empirical training. The double-layer architecture implements a general purpose image quality assessment system, not being tied up to specific distortions and, at the same time, it allows us to benefit from the accuracy of specific, distortion-targeted metrics. Experimental results based on subjective quality data confirm the general validity of the approach.

66 citations

Frequently Asked Questions (2)
Q1. What are the contributions mentioned in the paper "Characterizing perceptual artifacts in compressed video streams"?

In this paper, the authors reexamine the perceptual artifacts created by standard video compression, summarizing commonly observed spatial and temporal perceptual distortions in compressed video, with emphasis on the perceptual temporal artifacts that have not been well identified or accounted for in previous studies. Furthermore, a floating effect detection method is proposed that not only detects the existence of floating, but also segments the spatial regions where floating occurs∗. 

The current work also lays out a work plan for future studies. Firstly, objective VQA methods need to be reexamined and further developed to detect each compression artifact reliably and efficiently. Secondly, video encoders may be designed to eliminate or minimize the impact of these perceptual artifacts.