Characterizing perceptual artifacts in compressed video streams
Summary
1. INTRODUCTION
- The demand for high-performance network video communications has been increasing exponentially in recent years.
- Poor video quality continues to test viewers’ patience and has become a core threat to the video service ecosystem.
- Since compression is a major source of video quality degradation, the authors focus on perceptual artifacts generated by standard video compression techniques in the current work.
- Various types of artifacts created by standard compression schemes have been summarized previously.
- Objective VQA techniques have also been designed to automatically evaluate the perceptual quality of compressed video streams.
2. PERCEPTUAL ARTIFACTS IN COMPRESSED VIDEO
- Both spatial and temporal artifacts may exist in compressed video, where spatial artifacts refer to the distortions that can be observed in individual frames while temporal artifacts can only be seen during video playback.
- Both spatial and temporal artifacts can be further divided into categories and subcategories of more specific distortion types.
- A detailed description of the appearance and causes of each type of perceptual compression artifact is given in the following sections.
- In addition to these artifacts, there are a number of other perceptual video artifacts that are often seen in real-world visual communication applications.
- Since compression is not the main cause of these artifacts, they are beyond the major focus of the current paper.
2.1 Spatial Artifacts
- Block-based video coding schemes create various spatial artifacts due to block partitioning and quantization.
- These artifacts include blurring, blocking, ringing, basis pattern effect, and color bleeding.
- They can be detected without reference to temporally neighboring frames, and thus are more easily identified when the video is paused.
- Due to the complexity of modern compression techniques, these artifacts are interrelated with each other, and the classification here is mainly based on their visual appearance.
2.1.1 Blurring
- All modern video compression methods involve a frequency transform step followed by a quantization process that often removes small amplitude transform coefficients.
- Since the energy of natural visual signals concentrates at low frequencies, quantization reduces the high-frequency energy in such signals, resulting in a significant blurring effect in the reconstructed signals.
- A visual example is given in Fig. 2, where the left picture is a reference frame extracted from the original video, and the middle and right pictures are two decoded H.264/AVC frames with the de-blocking filter turned off and on, respectively.
- It can be observed that without de-blocking filtering, the majority of blur occurs within each block while the blocking artifact across the block boundaries is quite severe, for example, in the marked rectangular region in Fig. 2(b).
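The transform-then-quantize mechanism described above can be illustrated with a minimal sketch. It assumes a 1-D orthonormal DCT over a single row of 8 pixels and a uniform quantizer; the pixel values, the step size of 20, and all function names are hypothetical choices for illustration, not taken from the paper or from any codec.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        out.append((math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)) * s)
    return out

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1 / n) * c[0]
        s += sum(math.sqrt(2 / n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                 for k in range(1, n))
        out.append(s)
    return out

def quantize(coeffs, step):
    """Uniform quantizer: coefficients with magnitude below step/2 become zero."""
    return [step * round(c / step) for c in coeffs]

def high_freq_energy(coeffs):
    """Energy in the upper half of the spectrum."""
    return sum(v * v for v in coeffs[len(coeffs) // 2:])

# Hypothetical pixel row: a smooth ramp plus fine alternating detail.
row = [100 + 4 * i + (3 if i % 2 else -3) for i in range(8)]

coeffs = dct(row)
quantized = quantize(coeffs, step=20)   # step size is an arbitrary assumption
blurred = idct(quantized)

hf_before = high_freq_energy(coeffs)
hf_after = high_freq_energy(quantized)
print("high-frequency energy before/after:", hf_before, hf_after)
print("reconstructed row:", [round(v, 1) for v in blurred])
```

In this toy case the small high-frequency coefficients that carry the alternating detail fall below the quantizer's dead zone, so the reconstructed row keeps only the smooth trend.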
2.1.2 Blocking
- Blocking artifact or blockiness is a very common type of distortion frequently seen in reconstructed video produced by video compression standards, which use blocks of various sizes as the basic units for frequency transformation, quantization and motion estimation/compensation, thus producing false discontinuities across block boundaries.
- The visual appearance of blocking artifacts may differ depending on the region where blockiness occurs.
- The mosaic effect usually occurs when there are luminance transitions in large low-energy regions (e.g., walls, black/white boards, and desk surfaces).
- Due to quantization within each block, nearly all AC coefficients are quantized to zero, and thus each block is reconstructed as a constant DC block, where the DC values vary from block to block.
- Artificial edges are often created by a combination of motion estimation/compensation based inter-frame prediction and the blocking effect in the previous frame, where blockiness in the previous frame is carried into the current frame via motion compensation.
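The mosaic effect described above (every AC coefficient quantized to zero, leaving one constant DC value per block) can be sketched as follows. The sketch relies on the fact that keeping only the DC term of an orthonormal block DCT reconstructs the block mean. The 8x8 block size, the ramp image, and the function name are assumptions made for illustration.

```python
BLOCK = 8  # assumed block size

def dc_only(frame, block=BLOCK):
    """Replace every block x block tile by its mean (only the DC term survives)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            tile = [frame[y][x]
                    for y in range(by, by + block)
                    for x in range(bx, bx + block)]
            mean = sum(tile) / len(tile)
            for y in range(by, by + block):
                for x in range(bx, bx + block):
                    out[y][x] = mean
    return out

# Hypothetical low-energy region: a gentle horizontal luminance ramp,
# 8 rows by 16 columns, i.e. two horizontally adjacent blocks.
frame = [[100 + x for x in range(16)] for _ in range(8)]

mosaic = dc_only(frame)
row = mosaic[0]
steps = [row[x + 1] - row[x] for x in range(15)]
print(steps)
```

The original ramp rises by one gray level per pixel. After DC-only reconstruction the signal is flat inside each block and the whole rise is concentrated at the block boundary, which is the false discontinuity perceived as blockiness.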
2.1.3 Ringing
- Sharp transitions in images such as strong edges and lines are transformed to many coefficients in frequency domain representations.
- The quantization process results in partial loss or distortion of these coefficients.
- When the remaining coefficients are combined to reconstruct the edges or lines, artificial wave-like or ripple structures are created in nearby regions, known as the ringing artifacts.
- Such ringing artifacts are most significant when the edges or lines are sharp and strong, and when the regions near the edges or lines are smooth, where the visual masking effect is the weakest.
- It is worth noting that when the ringing effect is combined with object motion in consecutive video frames, a special temporal artifact called mosquito noise is observed, which will be discussed later.
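The ripple structure near a sharp edge can be reproduced with a minimal sketch. As a simplifying assumption, the loss of coefficients under quantization is modeled by simply discarding all but the first few DCT coefficients of a 1-D step edge; the signal, the number of retained coefficients, and the function names are hypothetical.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
            * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [math.sqrt(1 / n) * c[0]
            + sum(math.sqrt(2 / n) * c[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for k in range(1, n))
            for i in range(n)]

# Sharp edge between two perfectly smooth regions (values 0 and 100).
edge = [0] * 8 + [100] * 8

coeffs = dct(edge)
KEEP = 4  # assumed number of surviving low-frequency coefficients
truncated = [c if k < KEEP else 0.0 for k, c in enumerate(coeffs)]
ringing = idct(truncated)

print([round(v, 1) for v in ringing])
print("undershoot:", round(min(ringing), 2), "overshoot:", round(max(ringing), 2))
```

The reconstruction oscillates around the original flat levels and leaves the original value range on both sides of the edge. Because the neighboring regions are smooth, nothing masks these ripples.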
2.1.4 Basis pattern effect
- The origin of the basis pattern effect is similar to that of the ringing effect, but the spatial regions where the basis pattern effect occurs are not restricted to sharp edges or lines.
- More specifically, in certain texture regions with moderate energy, when the transform coefficients are quantized, there is a possibility that only one transform coefficient remains (while all other coefficients are quantized to zero or nearly zero).
- As a result, when the image signal is reconstructed using a single coefficient, the basis pattern (e.g., a DCT basis) associated with the coefficient is created as a representation of the image structure.
- Since the basis pattern effect usually occurs at texture regions, its visibility depends on the nature of the texture region.
- For example, if the region is in the background and does not attract visual attention, the effect is often ignored by human observers.
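To make the single-coefficient case concrete, the following sketch reconstructs an 8x8 block from one 2-D DCT coefficient. The coefficient position (u, v) = (0, 3), its amplitude, and the function name are arbitrary assumptions for illustration.

```python
import math

N = 8  # assumed transform size

def basis_pattern(u, v, amplitude, n=N):
    """Block reconstructed when only the DCT coefficient at (u, v) is non-zero."""
    def alpha(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [[amplitude * alpha(u) * alpha(v)
             * math.cos(math.pi * (2 * y + 1) * u / (2 * n))
             * math.cos(math.pi * (2 * x + 1) * v / (2 * n))
             for x in range(n)]
            for y in range(n)]

# Only the coefficient at vertical frequency 0, horizontal frequency 3 survives.
block = basis_pattern(u=0, v=3, amplitude=80.0)

signs = "".join("+" if p > 0 else "-" for p in block[0])
print(signs)
for r in block[:2]:
    print([round(p, 1) for p in r])
```

Every row of the reconstructed block is the same cosine pattern, so whatever texture was originally present is replaced by regular vertical stripes, i.e. the DCT basis function itself.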
2.1.5 Color bleeding
- Color bleeding is a result of inconsistent image rendering across the luminance and chromatic channels.
- In the most popular YCbCr 4:2:0 video format, the color channels Cb and Cr have half resolution of the luminance channel Y in both horizontal and vertical dimensions.
- After compression, all luminance and chromatic channels exhibit various types of distortions (such as blurring, blocking and ringing described earlier), and more importantly, these distortions are inconsistent across color channels.
- Moreover, because of the lower resolution in the chromatic channels, the rendering processes inevitably involve interpolation operations, leading to additional inconsistent color spreading in the rendering result.
- In the literature, it was shown that chromatic distortion is helpful in color image quality assessment [9], but how color bleeding affects the overall perceptual quality of compressed video is still an unsolved problem.
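The contribution of chroma subsampling and interpolation to color spreading can be sketched without any compression at all. The sketch assumes a 2x2 box-filter average for the 4:2:0 downsampling and nearest-neighbor interpolation for rendering; real encoders and players may use other filters, and the sample values and function names are hypothetical.

```python
def subsample_420(plane):
    """Average each 2x2 neighborhood, halving the resolution in both dimensions."""
    return [[(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
             for x in range(0, len(plane[0]), 2)]
            for y in range(0, len(plane), 2)]

def upsample_nearest(plane):
    """Repeat every chroma sample over a 2x2 area to return to full resolution."""
    out = []
    for row in plane:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# Hypothetical Cb plane (4 rows x 6 columns) with a color edge between
# columns 2 and 3, which is not aligned with the 2x2 sampling grid.
cb = [[200, 200, 200, 50, 50, 50] for _ in range(4)]

restored = upsample_nearest(subsample_420(cb))
print(cb[0])
print(restored[0])
```

In the restored plane, the two columns on either side of the edge take an intermediate chroma value, while a full-resolution luminance channel would still show the edge at its original position. This mismatch is one ingredient of the inconsistency across channels described above.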
2.2 Temporal Artifacts
- Temporal artifacts refer to those distortion effects that are not observed when the video is paused but only during video playback.
- Temporal artifacts are of particular interest to us for two reasons.
- First, as compared to spatial artifacts, temporal artifacts evolve more significantly with the development of video coding techniques.
- [...] video, but is largely reduced in the latest HEVC coded video.
- Second, objective evaluation of such artifacts is more challenging, and popular VQA models often fail to account for these artifacts.
2.2.1 Flickering
- Flickering artifacts generally refer to frequent luminance or chrominance changes along the temporal dimension that do not appear in the uncompressed reference video.
- Mosquito noise is a joint effect of object motion and time-varying spatial artifacts (such as ringing and motion prediction error) near sharp object boundaries.
- Specifically, the ringing and motion prediction error are most manifest at the regions near the boundaries of objects.
- Coarse-granularity flickering refers to low-frequency sudden luminance changes in large spatial regions that could extend to the entire video frame.
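One simple way to look for coarse-granularity flickering is to track how the frame-level mean luminance of the decoded video drifts relative to the reference. The sketch below is only an illustrative heuristic under that assumption: the measure, the threshold, and the toy frames are hypothetical and are not the method of the paper.

```python
def mean_luma(frame):
    """Average luminance of a frame given as a list of pixel rows."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def luma_jumps(reference, decoded, threshold=2.0):
    """Frame indices where the decoded-minus-reference mean luminance
    changes by more than `threshold` from the previous frame."""
    offsets = [mean_luma(d) - mean_luma(r) for r, d in zip(reference, decoded)]
    return [t for t in range(1, len(offsets))
            if abs(offsets[t] - offsets[t - 1]) > threshold]

# Toy sequence: a static reference scene, and a decoded stream whose whole
# frame becomes 5 gray levels brighter from frame 3 onward.
reference = [[[120] * 4 for _ in range(4)] for _ in range(6)]
decoded = [[[120 + (5 if t >= 3 else 0)] * 4 for _ in range(4)] for t in range(6)]

print(luma_jumps(reference, decoded))
```

Comparing against the reference matters here: a luminance change that is also present in the uncompressed video (for example a scene cut) does not count as flickering under the definition above.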
2.2.3 Floating
- Floating refers to the appearance of illusory motion in certain regions relative to their surrounding background.
- Visually these regions create a strong perceptual illusion as if they were floating on top of the surrounding background.
- Many video encoders choose to encode the blocks in the texture regions with zero motion and Skip mode.
- Unlike texture floating, edge neighborhood floating may appear without global motion.
- Previously, this effect was also called stationary area temporal fluctuations.
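The interaction between global motion and zero-motion Skip blocks can be shown on a 1-D toy signal. The sketch assumes that a skipped block is reconstructed by copying the co-located pixels of the previous decoded frame; the pixel values, the pan of one pixel, and the function names are hypothetical.

```python
def shift_right(row, n=1):
    """Global camera pan: content moves n pixels to the right (left edge repeated)."""
    return [row[0]] * n + row[:-n]

def decode_with_skip(prev_decoded, current, skip_ranges):
    """Inside each (start, end) range, copy the co-located pixels of the
    previous decoded frame (zero motion, no residual); elsewhere use `current`."""
    out = list(current)
    for start, end in skip_ranges:
        out[start:end] = prev_decoded[start:end]
    return out

prev = [10, 20, 30, 40, 50, 60, 70, 80]
cur = shift_right(prev)                      # the whole scene pans by one pixel
decoded = decode_with_skip(prev, cur, [(2, 6)])

print("true frame:   ", cur)
print("decoded frame:", decoded)
```

Pixels 2 to 5 of the decoded frame stay exactly where they were in the previous frame while everything around them follows the pan. During playback such a region appears to move relative to its surroundings.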
3. TEXTURE FLOATING DETECTION
- Among all types of temporal artifacts, texture floating is perhaps the least identified in the literature, but the authors' study finds it highly eye-catching and visually annoying when it exists.
- Texture floating is typically observed in video frames with global camera motion, including translation, rotation and zooming.
- Therefore, the authors define two threshold energy parameters E1, E2 in their algorithm to constrain the energy range for texture floating detection.
- In the reconstruction of video frames, erroneous motion estimation/compensation leads to significant distortions along temporal direction.
- Fig. 7 demonstrates the performance of the proposed algorithm.
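The summary states only that two thresholds E1 and E2 constrain the energy range considered for texture floating detection. The sketch below illustrates such an energy gate in isolation. It is not the authors' algorithm: defining block energy as pixel variance, the threshold values, and the toy blocks are all assumptions made for illustration.

```python
def block_energy(block):
    """Assumed energy measure: variance of the pixel values in the block."""
    pixels = [p for row in block for p in row]
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)

def in_energy_range(block, e1, e2):
    """True when the block energy lies in the closed range [e1, e2]."""
    return e1 <= block_energy(block) <= e2

E1, E2 = 10.0, 1000.0  # hypothetical thresholds

blocks = {
    "flat":    [[128] * 4 for _ in range(4)],                       # no texture
    "texture": [[128 + (6 if (x + y) % 2 else -6) for x in range(4)]
                for y in range(4)],                                  # mild texture
    "edge":    [[0, 0, 255, 255] for _ in range(4)],                 # strong edge
}

candidates = [name for name, b in blocks.items() if in_energy_range(b, E1, E2)]
print({name: block_energy(b) for name, b in blocks.items()})
print(candidates)
```

With these assumed values, only the moderately textured block passes the gate, while the flat block and the strong edge are excluded from further analysis.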
4. CONCLUSION
- The authors reexamine perceptual artifacts created by state-of-the-art video compression technologies.
- In particular, the fine classification, the new naming approach, and the corresponding descriptions of the temporal flickering and floating effects are new to the literature.
- Related features are identified and a novel objective floating artifact detection algorithm is proposed, which not only detects the existence of texture floating, but also locates the texture floating regions in each video frame.
- The current work also lays out a work plan for future studies.
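- Firstly, objective VQA methods need to be reexamined and further developed to detect each compression artifact reliably and efficiently.
- Secondly, video encoders may be designed to eliminate or minimize the impact of these perceptual artifacts.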
Frequently Asked Questions (2)
Q2. What are the future works in "Characterizing perceptual artifacts in compressed video streams"?
The current work also lays out a work plan for future studies. Firstly, objective VQA methods need to be reexamined and further developed to detect each compression artifact reliably and efficiently. Secondly, video encoders may be designed to eliminate or minimize the impact of these perceptual artifacts.