
Showing papers on "Video quality published in 2021"


Journal ArticleDOI
TL;DR: Versatile Video Coding (VVC) was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today's media content and emerging applications, as mentioned in this paper.
Abstract: Versatile Video Coding (VVC) was finalized in July 2020 as the most recent international video coding standard. It was developed by the Joint Video Experts Team (JVET) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG) to serve an ever-growing need for improved video compression as well as to support a wider variety of today’s media content and emerging applications. This paper provides an overview of the novel technical features for new applications and the core compression technologies for achieving significant bit rate reductions in the neighborhood of 50% over its predecessor for equal video quality, the High Efficiency Video Coding (HEVC) standard, and 75% over the currently most-used format, the Advanced Video Coding (AVC) standard. It is explained how these new features in VVC provide greater versatility for applications. Highlighted applications include video with resolutions beyond standard- and high-definition, video with high dynamic range and wide color gamut, adaptive streaming with resolution changes, computer-generated and screen-captured video, ultralow-delay streaming, 360° immersive video, and multilayer coding e.g., for scalability. Furthermore, early implementations are presented to show that the new VVC standard is implementable and ready for real-world deployment.

250 citations


Journal ArticleDOI
01 Jan 2021
TL;DR: In this paper, the Rapid and Accurate Video Quality Evaluator (RAPIQUE) model is proposed for video quality prediction, which combines and leverages the advantages of both quality-aware scene statistics features and semantics-aware deep convolutional features.
Abstract: Blind or no-reference video quality assessment of user-generated content (UGC) has become a trending, challenging, heretofore unsolved problem. Accurate and efficient video quality predictors suitable for this content are thus in great demand to achieve more intelligent analysis and processing of UGC videos. Previous studies have shown that natural scene statistics and deep learning features are both sufficient to capture spatial distortions, which contribute to a significant aspect of UGC video quality issues. However, these models are either incapable or inefficient for predicting the quality of complex and diverse UGC videos in practical applications. Here we introduce an effective and efficient video quality model for UGC content, which we dub the Rapid and Accurate Video Quality Evaluator (RAPIQUE), which we show performs comparably to state-of-the-art (SOTA) models but with orders-of-magnitude faster runtime. RAPIQUE combines and leverages the advantages of both quality-aware scene statistics features and semantics-aware deep convolutional features, allowing us to design the first general and efficient spatial and temporal (space-time) bandpass statistics model for video quality modeling. Our experimental results on recent large-scale UGC video quality databases show that RAPIQUE delivers top performances on all the datasets at a considerably lower computational expense. We hope this work promotes and inspires further efforts towards practical modeling of video quality problems for potential real-time and low-latency applications.

100 citations
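
The abstract above describes fusing quality-aware scene statistics with semantics-aware deep features. As a rough illustration of the scene-statistics side only, the sketch below computes mean-subtracted contrast-normalized (MSCN) coefficients for one frame and summarizes them with a few sample statistics; the window size and the chosen summary statistics are assumptions for illustration, not the published RAPIQUE feature set.

```python
# Hypothetical sketch of the "quality-aware scene statistics" branch of a
# RAPIQUE-like model: MSCN (mean-subtracted, contrast-normalized) coefficients
# summarized by a few sample statistics. Window size and feature set are
# illustrative assumptions, not the published feature definition.
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import kurtosis, skew

def mscn(frame: np.ndarray, sigma: float = 7 / 6) -> np.ndarray:
    """Divisive normalization used by many NSS-based quality models."""
    frame = frame.astype(np.float64)
    mu = gaussian_filter(frame, sigma)
    sigma_map = np.sqrt(np.abs(gaussian_filter(frame * frame, sigma) - mu * mu))
    return (frame - mu) / (sigma_map + 1.0)

def spatial_nss_features(frame: np.ndarray) -> np.ndarray:
    """Summarize MSCN coefficients; deep semantic features would be concatenated here."""
    coeffs = mscn(frame)
    return np.array([coeffs.std(), kurtosis(coeffs, axis=None), skew(coeffs, axis=None)])

if __name__ == "__main__":
    luma = np.random.rand(480, 640) * 255.0   # stand-in for a decoded video frame
    print(spatial_nss_features(luma))
```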


Journal ArticleDOI
26 Feb 2021
TL;DR: A technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility is provided.
Abstract: The AV1 video compression format is developed by the Alliance for Open Media consortium. It achieves more than a 30% reduction in bit rate compared to its predecessor VP9 for the same decoded video quality. This article provides a technical overview of the AV1 codec design that enables the compression performance gains with considerations for hardware feasibility.

95 citations


Journal ArticleDOI
TL;DR: In this article, the VIDeo quality EVALuator (VIDEVAL) is proposed to improve the performance of VQA models for UGC/consumer videos.
Abstract: Recent years have witnessed an explosion of user-generated content (UGC) videos shared and streamed over the Internet, thanks to the evolution of affordable and reliable consumer capture devices, and the tremendous popularity of social media platforms. Accordingly, there is a great need for accurate video quality assessment (VQA) models for UGC/consumer videos to monitor, control, and optimize this vast content. Blind quality prediction of in-the-wild videos is quite challenging, since the quality degradations of UGC videos are unpredictable, complicated, and often commingled. Here we contribute to advancing the UGC-VQA problem by conducting a comprehensive evaluation of leading no-reference/blind VQA (BVQA) features and models on a fixed evaluation architecture, yielding new empirical insights on both subjective video quality studies and objective VQA model design. By employing a feature selection strategy on top of efficient BVQA models, we are able to extract 60 out of 763 statistical features used in existing methods to create a new fusion-based model, which we dub the VIDeo quality EVALuator (VIDEVAL), that effectively balances the trade-off between VQA performance and efficiency. Our experimental results show that VIDEVAL achieves state-of-the-art performance at considerably lower computational cost than other leading models. Our study protocol also defines a reliable benchmark for the UGC-VQA problem, which we believe will facilitate further research on deep learning-based VQA modeling, as well as perceptually-optimized efficient UGC video processing, transcoding, and streaming. To promote reproducible research and public evaluation, an implementation of VIDEVAL has been made available online: https://github.com/vztu/VIDEVAL .

74 citations
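
To make the feature-selection-plus-fusion recipe above concrete, here is a hedged sketch that keeps 60 of 763 candidate features with a univariate selector and fuses them with a support vector regressor trained on MOS; the synthetic data, the selector choice and the SVR settings are placeholder assumptions rather than the released VIDEVAL pipeline.

```python
# Illustrative sketch (not the released VIDEVAL code): pick a small subset of
# candidate BVQA features and fuse them with an SVR trained on MOS labels.
# The synthetic data, the number of selected features and the SVR settings
# are placeholder assumptions.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 763))          # 763 candidate statistical features per video
mos = rng.uniform(1.0, 5.0, size=200)    # subjective mean opinion scores

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=60),     # keep a compact 60-feature subset
    SVR(C=1.0, gamma="scale"),
)
model.fit(X[:150], mos[:150])
pred = model.predict(X[150:])
print("SRCC on held-out videos:", spearmanr(pred, mos[150:])[0])
```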


Journal ArticleDOI
TL;DR: FovVideoVDP as mentioned in this paper is a video difference metric that models the spatial, temporal, and peripheral aspects of perception, which is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification and contrast masking.
Abstract: FovVideoVDP is a video difference metric that models the spatial, temporal, and peripheral aspects of perception. While many other metrics are available, our work provides the first practical treatment of these three central aspects of vision simultaneously. The complex interplay between spatial and temporal sensitivity across retinal locations is especially important for displays that cover a large field-of-view, such as Virtual and Augmented Reality displays, and associated methods, such as foveated rendering. Our metric is derived from psychophysical studies of the early visual system, which model spatio-temporal contrast sensitivity, cortical magnification and contrast masking. It accounts for physical specification of the display (luminance, size, resolution) and viewing distance. To validate the metric, we collected a novel foveated rendering dataset which captures quality degradation due to sampling and reconstruction. To demonstrate our algorithm's generality, we test it on 3 independent foveated video datasets, and on a large image quality dataset, achieving the best performance across all datasets when compared to the state-of-the-art.

61 citations
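
As a rough illustration of the peripheral ingredient mentioned above, the sketch below down-weights per-pixel error with retinal eccentricity using a simple cortical-magnification-style falloff; the constant e2, the pixels-per-degree value and the weighted-MSE form are illustrative assumptions, not the calibrated FovVideoVDP model.

```python
# Minimal illustration of the "peripheral" ingredient of a foveated metric:
# per-pixel errors are down-weighted with retinal eccentricity using a simple
# cortical-magnification-style falloff 1 / (1 + e / e2). The constant e2 and
# the use of a plain weighted MSE are assumptions for illustration only.
import numpy as np

def eccentricity_map(height, width, gaze_xy, pix_per_deg):
    """Retinal eccentricity (degrees) of every pixel for a given gaze point."""
    ys, xs = np.mgrid[0:height, 0:width]
    dist_pix = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1])
    return dist_pix / pix_per_deg

def foveated_error(ref, test, gaze_xy, pix_per_deg=60.0, e2=2.3):
    ecc = eccentricity_map(*ref.shape, gaze_xy, pix_per_deg)
    weight = 1.0 / (1.0 + ecc / e2)          # crude sensitivity falloff with eccentricity
    err = (ref.astype(float) - test.astype(float)) ** 2
    return float(np.sum(weight * err) / np.sum(weight))

ref = np.random.rand(1080, 1920)
test = ref + 0.05 * np.random.rand(1080, 1920)
print(foveated_error(ref, test, gaze_xy=(960, 540)))
```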


Journal ArticleDOI
TL;DR: A novel live video transcoding and streaming scheme that maximizes the video bitrate and decreases time-delays and bitrate variations in vehicular fog-computing (VFC)-enabled IoV is proposed, by jointly optimizing vehicle scheduling, bitrate selection, and computational/spectrum resource allocation.
Abstract: With the rapid development of the automotive industry and telecommunication technologies, live streaming services in the Internet of Vehicles (IoV) play an ever more crucial role in vehicular infotainment systems. However, it is a big challenge to provide a high-quality, low-latency, and low-bitrate-variance live streaming service for vehicles due to the dynamic properties of the wireless resources and channels of IoV. To address this challenge, we propose a novel live video transcoding and streaming scheme that maximizes the video bitrate and decreases time delays and bitrate variations in vehicular fog-computing (VFC)-enabled IoV, by jointly optimizing vehicle scheduling, bitrate selection, and computational/spectrum resource allocation. This joint optimization problem is modeled as a Markov decision process (MDP), considering the time-varying characteristics of the available resources and wireless channels of IoV. A soft actor–critic deep reinforcement learning (DRL) algorithm based on the maximum entropy framework is subsequently utilized to solve the above MDP. Extensive simulation results based on a real-world dataset show that, compared to other baseline algorithms, the proposed scheme can effectively improve video quality while decreasing latency and bitrate variations, and achieves excellent performance in terms of learning speed and stability.

53 citations
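
The MDP formulation above balances bitrate against latency and bitrate variation. A minimal sketch of the kind of per-chunk reward such a formulation typically optimizes is shown below; the penalty weights are illustrative assumptions, not values from the paper.

```python
# Sketch of a per-chunk QoE-style reward of the kind optimized by RL-based
# streaming schemes: reward bitrate, penalize rebuffering and bitrate switches.
# The weights (alpha, beta) are illustrative assumptions, not the paper's values.
def chunk_reward(bitrate_mbps: float,
                 prev_bitrate_mbps: float,
                 rebuffer_s: float,
                 alpha: float = 4.0,
                 beta: float = 1.0) -> float:
    return (bitrate_mbps
            - alpha * rebuffer_s
            - beta * abs(bitrate_mbps - prev_bitrate_mbps))

# Example: the agent switched from 2.5 to 4.0 Mb/s and caused 0.2 s of stalling.
print(chunk_reward(4.0, 2.5, 0.2))
```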


Proceedings ArticleDOI
19 Jun 2021
TL;DR: The first NTIRE challenge on quality enhancement of compressed video, as discussed by the authors, has three tracks: Tracks 1 and 2 aim at enhancing videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing videos compressed by x265 at a fixed bit-rate.
Abstract: This paper reviews the first NTIRE challenge on quality enhancement of compressed video, with a focus on the proposed solutions and results. In this challenge, the new Large-scale Diverse Video (LDV) dataset is employed. The challenge has three tracks. Tracks 1 and 2 aim at enhancing the videos compressed by HEVC at a fixed QP, while Track 3 is designed for enhancing the videos compressed by x265 at a fixed bit-rate. In addition, the quality enhancement of Tracks 1 and 3 targets improving the fidelity (PSNR), while Track 2 targets enhancing the perceptual quality. The three tracks attracted a total of 482 registrations. In the test phase, 12 teams, 8 teams and 11 teams submitted the final results of Tracks 1, 2 and 3, respectively. The proposed methods and solutions gauge the state-of-the-art of video quality enhancement. The homepage of the challenge: https://github.com/RenYang-home/NTIRE21_VEnh

43 citations


Proceedings ArticleDOI
20 Jun 2021
TL;DR: In this article, a local-to-global region-based no-reference perceptual video quality assessment (VQA) architecture is proposed to predict global video quality and achieves state-of-the-art performance on 3 UGC datasets.
Abstract: No-reference (NR) perceptual video quality assessment (VQA) is a complex, unsolved, and important problem for social and streaming media applications. Efficient and accurate video quality predictors are needed to monitor and guide the processing of billions of shared, often imperfect, user-generated content (UGC). Unfortunately, current NR models are limited in their prediction capabilities on real-world, "in-the-wild" UGC video data. To advance progress on this problem, we created the largest (by far) subjective video quality dataset, containing 38,811 real-world distorted videos and 116,433 space-time localized video patches (‘v-patches’), and 5.5M human perceptual quality annotations. Using this, we created two unique NR-VQA models: (a) a local-to-global region-based NR VQA architecture (called PVQ) that learns to predict global video quality and achieves state-of-the-art performance on 3 UGC datasets, and (b) a first-of-a-kind space-time video quality mapping engine (called PVQ Mapper) that helps localize and visualize perceptual distortions in space and time. The entire dataset and prediction models are freely available at https://live.ece.utexas.edu/research.php.

39 citations


Journal ArticleDOI
TL;DR: This work proposes a no-reference video quality assessment method, aiming to achieve high-generalization capability in cross-content, -resolution and -frame rate quality prediction, and proposes a pyramid temporal aggregation module by involving the short-term and long-term memory to aggregate the frame-level quality.
Abstract: In this work, we propose a no-reference video quality assessment method, aiming to achieve high-generalization capability in cross-content, -resolution and -frame rate quality prediction. In particular, we evaluate the quality of a video by learning effective feature representations in spatial-temporal domain. In the spatial domain, to tackle the resolution and content variations, we impose the Gaussian distribution constraints on the quality features. The unified distribution can significantly reduce the domain gap between different video samples, resulting in more generalized quality feature representation. Along the temporal dimension, inspired by the mechanism of visual perception, we propose a pyramid temporal aggregation module by involving the short-term and long-term memory to aggregate the frame-level quality. Experiments show that our method outperforms the state-of-the-art methods on cross-dataset settings, and achieves comparable performance on intra-dataset configurations, demonstrating the high-generalization capability of the proposed method. The codes are released at.

36 citations
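
As a hedged illustration of the short-term/long-term aggregation idea described above, the sketch below pools frame-level quality scores with window averaging plus a memory term that emphasizes the worst recent quality; the window sizes and mixing weight are assumptions, not the paper's module.

```python
# Illustrative temporal pooling in the spirit of the pyramid aggregation
# described above: short-term averaging inside small windows, plus a
# long-term "memory" term that emphasizes the worst recent quality.
# Window sizes and the 0.5/0.5 mixing weight are assumptions.
import numpy as np

def aggregate_quality(frame_scores, short_win=8, memory_win=30):
    scores = np.asarray(frame_scores, dtype=float)
    # Short-term: mean quality within non-overlapping windows.
    n = len(scores) // short_win * short_win
    short_term = scores[:n].reshape(-1, short_win).mean(axis=1)
    # Long-term: recency-weighted memory of the worst quality in a sliding window.
    long_term = np.array([scores[max(0, i - memory_win):i + 1].min()
                          for i in range(len(scores))]).mean()
    return 0.5 * short_term.mean() + 0.5 * long_term

print(aggregate_quality(np.random.uniform(40, 80, size=300)))
```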


Journal ArticleDOI
TL;DR: This work implements histogram bit shifting based reversible data hiding by embedding the encrypted watermark in selected video frames, with the frame-selection threshold optimized using the Firefly Algorithm; the method is capable of hiding high-capacity data in the video signal.
Abstract: In recent years, there has been increasing interest in protecting multimedia data and copyrights due to the high exchange of information. Attackers try to obtain confidential information from various sources, which underlines the importance of securing the data. Many researchers have implemented techniques to hide secret information in order to maintain the integrity and privacy of data. To protect confidential data, histogram-based reversible data hiding combined with other cryptographic algorithms is widely used. Therefore, in the proposed work, a robust method for securing digital video is suggested. We implement histogram bit shifting based reversible data hiding by embedding the encrypted watermark in featured video frames. Histogram bit shifting is used for hiding highly secured watermarks, so that security for the watermark symbol is also achieved. The novelty of the work is that only a few unique frames, selected based on a quality threshold, hold the encrypted watermark symbol. The optimal value of this threshold is obtained using the Firefly Algorithm. The proposed method is capable of hiding high-capacity data in the video signal. The experimental results show higher capacity and video quality compared to other reversible data hiding techniques. The recovered watermark provides better identity verification against various attacks. A high value of PSNR and low values of BER and MSE are reported in the results.

35 citations
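
The following is a minimal sketch of the classic histogram-shifting embedding step that the scheme above builds on: find a peak bin and an empty bin in the frame histogram, shift the range between them, and hide bits at the peak. The Firefly-optimized frame selection and the watermark encryption are omitted, and the assumption of an available empty bin above the peak is a simplification.

```python
# Minimal sketch of classic histogram-shifting reversible embedding in one
# 8-bit frame: locate a peak bin and a (nearly) empty bin, shift the pixels
# between them by one, and hide bits at the peak. The frame selection and
# watermark encryption described above are omitted; this illustrates only the
# embedding core, under the assumption that an empty bin exists above the peak.
import numpy as np

def embed_bits(frame: np.ndarray, bits):
    hist = np.bincount(frame.ravel(), minlength=256)
    peak = int(hist.argmax())
    zero = int(hist[peak + 1:].argmin()) + peak + 1   # assumed empty bin above the peak
    out = frame.astype(np.int32).copy()
    # Shift the histogram range (peak, zero) right by one to free the bin peak+1.
    out[(out > peak) & (out < zero)] += 1
    # Embed: a '1' moves a peak pixel to peak+1, a '0' leaves it untouched.
    peak_positions = np.argwhere(out == peak)
    for (y, x), bit in zip(peak_positions, bits):
        out[y, x] = peak + int(bit)
    return out.astype(np.uint8), peak, zero

frame = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)
marked, peak, zero = embed_bits(frame, bits=[1, 0, 1, 1, 0])
print("peak bin:", peak, "zero bin:", zero)
```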


Journal ArticleDOI
TL;DR: A new subjective resource, called the LIVE-YouTube-HFR (LIVE-YT-HFR) dataset, which is comprised of 480 videos having 6 different frame rates, obtained from 16 diverse contents, and is made available online for public use and evaluation purposes.
Abstract: High frame rate (HFR) videos are becoming increasingly common with the tremendous popularity of live, high-action streaming content such as sports. Although HFR contents are generally of very high quality, high bandwidth requirements make them challenging to deliver efficiently, while simultaneously maintaining their quality. To optimize trade-offs between bandwidth requirements and video quality, in terms of frame rate adaptation, it is imperative to understand the intricate relationship between frame rate and perceptual video quality. Towards advancing progression in this direction we designed a new subjective resource, called the LIVE-YouTube-HFR (LIVE-YT-HFR) dataset, which is comprised of 480 videos having 6 different frame rates, obtained from 16 diverse contents. In order to understand the combined effects of compression and frame rate adjustment, we also processed videos at 5 compression levels at each frame rate. To obtain subjective labels on the videos, we conducted a human study yielding 19,000 human quality ratings obtained from a pool of 85 human subjects. We also conducted a holistic evaluation of existing state-of-the-art Full and No-Reference video quality algorithms, and statistically benchmarked their performance on the new database. The LIVE-YT-HFR database has been made available online for public use and evaluation purposes, with hopes that it will help advance research in this exciting video technology direction. It may be obtained at https://live.ece.utexas.edu/research/LIVE_YT_HFR/LIVE_YT_HFR/index.html .

Journal ArticleDOI
TL;DR: A MultiScale Relative Standard Deviation Similarity (MS-RSDS) model for SCV quality evaluation is developed, which has relatively low computational complexity, outperforms other IQA/VQA models and can capture spatiotemporal distortions accurately.
Abstract: With the spread of application scenarios such as remote office and cloud collaboration, Screen Content Video (SCV) and its processing, which show different characteristics from Natural Scene Video (NSV) and its processing, are increasingly attracting researchers' attention. Among these processing techniques, quality evaluation plays an important role in various media processing systems. Despite extensive research on general Image Quality Assessment (IQA) and Video Quality Assessment (VQA), quality assessment of SCVs remains undeveloped. In particular, SCVs always suffer from compression degradations in all kinds of application scenarios. In this article, we first study subjective SCV quality assessment. Specifically, we construct a Compressed Screen Content Video Quality (CSCVQ) database with 165 distorted SCVs compressed from 11 of the most common screen application scenarios using the H.264, HEVC and HEVC-SCC formats. Twenty subjects were recruited to participate in the subjective test on the CSCVQ database. Then we study objective SCV quality assessment and propose an SCV quality measure. We observe that localized protruding information such as curves and dots can be well captured by the local relative standard deviation, which can then be used to measure intra-frame quality. Based on this observation, we develop a MultiScale Relative Standard Deviation Similarity (MS-RSDS) model for SCV quality evaluation. In our model, the relative standard deviation similarity between the reference and distorted SCVs is measured from frame differences between two adjacent frames, which can capture the spatiotemporal distortions accurately. A multiscale strategy is also applied to strengthen the original single-scale model. Extensive experiments are performed to compare the proposed model with the most popular and state-of-the-art quality assessment models on the CSCVQ database. Experimental results show that our proposed MS-RSDS model, which has relatively low computational complexity, outperforms other IQA/VQA models.
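
As a single-scale, hedged sketch of the idea described above, the code below compares local relative-standard-deviation (std/mean) maps computed on frame differences of the reference and distorted videos using an SSIM-style similarity; the window size, stability constants and the omission of the multiscale stage are simplifying assumptions.

```python
# Single-scale sketch of the idea behind MS-RSDS: compare local "relative
# standard deviation" (std / mean) maps computed on the difference between two
# adjacent frames, for the reference and the distorted screen-content video.
# Window size, stability constants and the absence of the multiscale stage are
# simplifying assumptions.
import numpy as np
from scipy.ndimage import uniform_filter

def relative_std_map(frame_diff, win=7, eps=1.0):
    x = np.abs(frame_diff.astype(np.float64))
    mu = uniform_filter(x, win)
    var = uniform_filter(x * x, win) - mu * mu
    return np.sqrt(np.clip(var, 0.0, None)) / (mu + eps)

def rsds(ref_prev, ref_cur, dis_prev, dis_cur, c=0.03):
    r = relative_std_map(ref_cur - ref_prev)
    d = relative_std_map(dis_cur - dis_prev)
    sim = (2.0 * r * d + c) / (r * r + d * d + c)   # SSIM-style similarity of the two maps
    return float(sim.mean())

rng = np.random.default_rng(1)
ref0, ref1 = rng.random((2, 240, 320))
print(rsds(ref0, ref1, ref0, ref1 + 0.01 * rng.random((240, 320))))
```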

Journal ArticleDOI
TL;DR: A novel edge-based ABR algorithm is designed that makes bitrate and video chunk source decisions by considering network conditions, QoE objectives, and edge resource availability jointly and achieves quality improvements with a fraction of the computation.
Abstract: Edge computing provides the potential to improve users' Quality of Experience (QoE) in ever-increasing video delivery. However, existing edge-based solutions cannot fully utilize the edge computing power and storage capacity. This paper proposes VIdeo Super-resolution and CAching (VISCA), an edge-assisted adaptive video streaming solution, which integrates super-resolution and edge caching to improve users' QoE. We design a novel edge-based ABR algorithm that makes bitrate and video chunk source decisions by considering network conditions, QoE objectives, and edge resource availability jointly. VISCA utilizes super-resolution to enhance the cached low-quality video at the edge. The super-resolution models used are trained for the most popular videos only in order to achieve quality improvements with a fraction of the computation. A novel cache strategy is also adopted to maximize caching efficiency. To assess the performance of VISCA, an implemented prototype of VISCA was deployed in synthetic and real network contexts. Compared with the existing video streaming solutions, VISCA improves video quality by 28.2%-251.2% and reduces rebuffering time by 16.1%-95.6% in all considered scenarios.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Wang et al. as discussed by the authors proposed a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality for video quality assessment.
Abstract: Video quality assessment for User Generated Content (UGC) is an important topic in both industry and academia. Most existing methods only focus on one aspect of perceptual quality assessment, such as technical quality or compression artifacts. In this paper, we create a large scale dataset to comprehensively investigate characteristics of generic UGC video quality. Besides the subjective ratings and content labels of the dataset, we also propose a DNN-based framework to thoroughly analyze the importance of content, technical quality, and compression level in perceptual quality. Our model is able to provide quality scores as well as human-friendly quality indicators, bridging the gap between low-level video signals and human perceptual quality. Experimental results show that our model achieves state-of-the-art correlation with Mean Opinion Scores (MOS).

Journal ArticleDOI
TL;DR: A reinforcement learning (RL)-based UAV anti-jamming video transmission scheme to choose the video compression quantization parameter, the channel coding rate, the modulation and power control strategies against jamming attacks is proposed.
Abstract: Unmanned aerial vehicles (UAVs) that are widely utilized for video capturing, processing and transmission have to address jamming attacks with dynamic topology and limited energy. In this paper, we propose a reinforcement learning (RL)-based UAV anti-jamming video transmission scheme to choose the video compression quantization parameter, the channel coding rate, the modulation and power control strategies against jamming attacks. More specifically, this scheme applies RL to choose the UAV video compression and transmission policy based on the observed video task priority, the UAV-controller channel state and the received jamming power. This scheme enables the UAV to guarantee the video quality-of-experience (QoE) and reduce the energy consumption without relying on the jamming model or the video service model. A safe RL-based approach is further proposed, which uses deep learning to accelerate the UAV learning process and reduce the video transmission outage probability. The computational complexity is provided and the optimal utility of the UAV is derived and verified via simulations. Simulation results show that the proposed schemes significantly improve the video quality and reduce the transmission latency and energy consumption of the UAV compared with existing schemes.

Journal ArticleDOI
TL;DR: In this article, the authors propose a computational framework for objective quality assessment of 360 images, embodying viewing conditions and behaviors in a unified way, and construct a set of specific quality measures within the proposed framework, and demonstrate their promises on three VR quality databases.
Abstract: Omnidirectional images (also referred to as static 360 panoramas) impose viewing conditions much different from those of regular 2D images. How humans perceive image distortions in immersive virtual reality (VR) environments is an important problem which has received little attention. We argue that, apart from the distorted panorama itself, two types of VR conditions are crucial in determining the viewing behaviors of users and the perceived quality of the panorama: the starting point and the exploration time. We first carry out a psychophysical experiment to investigate the interplay among the VR viewing conditions, the user viewing behaviors, and the perceived quality of 360 images. Then, we provide a thorough analysis of the collected human data, leading to several insightful findings. Moreover, we propose a computational framework for objective quality assessment of 360 images, embodying viewing conditions and behaviors in a unified way. Specifically, we first transform an omnidirectional image into several video representations using different user viewing behaviors under different viewing conditions. We then leverage advanced 2D full-reference video quality models to compute the perceived quality. We construct a set of specific quality measures within the proposed framework, and demonstrate their promise on three VR quality databases.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new model called Space-Time Chips (ST Chips), which uses highly-localized space-time slices called ST Chips to implicitly capture motion.
Abstract: We propose a new model for no-reference video quality assessment (VQA). Our approach uses a new idea of highly-localized space-time (ST) slices called Space-Time Chips (ST Chips). ST Chips are localized cuts of video data along directions that implicitly capture motion. We use perceptually-motivated bandpass and normalization models to first process the video data, and then select oriented ST Chips based on how closely they fit parametric models of natural video statistics. We show that the parameters that describe these statistics can be used to reliably predict the quality of videos, without the need for a reference video. The proposed method implicitly models ST video naturalness, and deviations from naturalness. We train and test our model on several large VQA databases, and show that our model achieves state-of-the-art performance at reduced cost, without requiring motion computation.
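
A hedged sketch of the general idea follows: cut small space-time slices of a video volume along a few candidate displacement directions and use how well a generalized Gaussian fits their mean-removed coefficients as a naturalness cue. The chip geometry, candidate directions and the use of scipy.stats.gennorm are illustrative assumptions, not the published model.

```python
# Illustrative sketch of the ST-Chips idea: cut small space-time slices of a
# video volume along a few candidate displacement directions, and use how well
# a generalized Gaussian fits their (mean-removed) coefficients as a
# naturalness cue. Chip size, candidate directions and the use of
# scipy.stats.gennorm are simplifying assumptions, not the published model.
import numpy as np
from scipy.stats import gennorm

def extract_chip(video, y, x, direction, length=8, size=8):
    """Stack a size x size patch that translates by `direction` each frame."""
    dy, dx = direction
    rows = []
    for t in range(length):
        yy, xx = y + t * dy, x + t * dx
        rows.append(video[t, yy:yy + size, xx:xx + size].ravel())
    return np.concatenate(rows)

def best_direction(video, y=32, x=32,
                   directions=((0, 0), (0, 1), (1, 0), (1, 1))):
    scores = {}
    for d in directions:
        chip = extract_chip(video, y, x, d)
        chip = chip - chip.mean()
        beta, loc, scale = gennorm.fit(chip, floc=0.0)
        scores[d] = gennorm.logpdf(chip, beta, loc, scale).sum()
    return max(scores, key=scores.get)   # direction whose chip best fits the parametric model

video = np.random.rand(8, 64, 64)
print("selected space-time direction:", best_direction(video))
```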

Journal ArticleDOI
Nuowen Kan, Junni Zou, Chenglin Li, Wenrui Dai, Hongkai Xiong
TL;DR: The rate adaptation problem for tile-based 360-degree video streaming is formulated as a non-linear discrete optimization problem that aims at maximizing the long-term user QoE under a bandwidth-constrained network, and is modeled as a Markov decision process (MDP) in which a deep reinforcement learning-based algorithm dynamically learns the optimal bitrate allocation of tiles.
Abstract: Tile-based rate adaptation can improve the quality of experience (QoE) for adaptive 360-degree video streaming under constrained network conditions, which, however, is a challenging problem due to the requirements of accurate prediction of users' viewports and optimal bitrate allocation for tiles. In this paper, we propose a strategy that deploys reinforcement learning-based Rate Adaptation with adaptive Prediction and Tiling for 360-degree video streaming, named RAPT360, to address these challenges. Specifically, to improve the accuracy of state-of-the-art viewport prediction approaches, we fit a time-varying Laplace distribution-based probability density function of the prediction error for different prediction lengths. On this basis, we develop a viewport identification method to determine the viewport area of a user depending on the buffer occupancy, where the obtained viewport can cover the real viewport with any given probability confidence level. We then propose a viewport-aware adaptive tiling scheme to improve bandwidth efficiency, where three types of tile granularities are allocated according to the shape and position of the 2-D projection of that viewport. By establishing an adaptive streaming model and a QoE metric specific to 360-degree videos, we finally formulate the rate adaptation problem for tile-based 360-degree video streaming as a non-linear discrete optimization problem that aims at maximizing the long-term user QoE under a bandwidth-constrained network. To efficiently solve this problem, we model the rate adaptation logic as a Markov decision process (MDP) and employ a deep reinforcement learning (DRL)-based algorithm to dynamically learn the optimal bitrate allocation of tiles. Extensive experimental results show that RAPT360 achieves a performance gain of at least 1.47 dB in average chunk QoE, including a video quality improvement of at least 1.33 dB, in comparison to existing strategies for tile-based adaptive 360-degree video streaming.
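
To illustrate the probabilistic step described above, the sketch below models the viewport-centre prediction error with a Laplace distribution whose scale grows with the prediction horizon and integrates it over tile extents to obtain per-tile viewing probabilities; the tiling grid, the linear scale growth and the independent yaw/pitch assumption are simplifications for illustration.

```python
# Sketch of the probabilistic step described above: model the viewport-centre
# prediction error as a Laplace distribution whose scale grows with the
# prediction horizon, and integrate it over each tile to get a viewing
# probability. The tiling grid, the linear scale growth and the assumption of
# independent yaw/pitch errors (and no yaw wrap-around) are simplifications.
import numpy as np
from scipy.stats import laplace

def tile_view_probabilities(pred_yaw, pred_pitch, horizon_s,
                            tiles_yaw=12, tiles_pitch=6,
                            base_scale_deg=5.0, growth_per_s=8.0):
    scale = base_scale_deg + growth_per_s * horizon_s
    yaw_edges = np.linspace(-180, 180, tiles_yaw + 1)
    pitch_edges = np.linspace(-90, 90, tiles_pitch + 1)
    p_yaw = np.diff(laplace.cdf(yaw_edges, loc=pred_yaw, scale=scale))
    p_pitch = np.diff(laplace.cdf(pitch_edges, loc=pred_pitch, scale=scale))
    return np.outer(p_pitch, p_yaw)          # (tiles_pitch, tiles_yaw) probability grid

probs = tile_view_probabilities(pred_yaw=20.0, pred_pitch=-5.0, horizon_s=2.0)
print(probs.shape, probs.sum())               # sums to <= 1; tail mass outside the grid is ignored
```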

Journal ArticleDOI
TL;DR: In this article, a prediction tool based on DenseNet (a convolutional neural network) is proposed to decrease the VVC coding complexity, which can reduce the coding complexity of 46.10% in VTM10.0 intra coding, while Bjontegaard delta bit rate (BDBR) only increases by 1.86%.
Abstract: The Joint Video Experts Team (JVET) is currently developing a new video coding standard called H.266/Versatile Video Coding (VVC). Compared with High Efficiency Video Coding (HEVC), VVC has added a variety of coding tools. These tools have greatly improved video compression efficiency while maintaining a high level of video quality. However, due to the increase in computational complexity, the encoding time is much longer than that of HEVC. We propose a prediction tool based on DenseNet (a convolutional neural network) to decrease the VVC coding complexity. Using a Convolutional Neural Network (CNN), we predict the probability that the edges of the $4 \times 4$ blocks in each $64 \times 64$ block lie on a partition boundary. Then, using these probability vectors, we skip unnecessary rate distortion optimization (RDO) in advance and speed up encoding. The proposed method can reduce the coding complexity by 46.10% in VTM10.0 intra coding, while the Bjontegaard delta bit rate (BDBR) only increases by 1.86%. For sequences with a resolution greater than 1080p, the acceleration efficiency reaches 64.81%, while the BDBR loss increases by only 1.92%.

Journal ArticleDOI
TL;DR: The results show the superiority of the proposed approach in terms of delivered video quality, cache-hit-ratio and backhaul link usage.
Abstract: 360° video is becoming an increasingly popular technology on commercial social platforms and a vital part of emerging Virtual Reality/Augmented Reality (VR/AR) applications. However, the delivery of 360° video content in mobile networks is challenging because of its size. The encoding of 360° video into multiple quality layers and tiles and edge cache-assisted video delivery have been proposed as a remedy to the excess bandwidth requirements of 360° video delivery systems. Existing works using the above tools have shown promising performance for Video-on-Demand (VoD) 360° delivery, but they cannot be straightforwardly extended to a live-streaming setup. Motivated by the above, we study edge cache-assisted 360° live video streaming to increase the overall quality of the delivered 360° videos to users and reduce the service cost. We employ Long Short-Term Memory (LSTM) networks to forecast the evolution of the content requests and prefetch content to caches. To further enhance the delivered video quality, users located in the overlap of the coverage areas of multiple Small Base Stations (SBSs) are allowed to receive data from any of these SBSs. We evaluate and compare the performance of our algorithm with Least Frequently Used (LFU), Least Recently Used (LRU), and First In First Out (FIFO) algorithms. The results show the superiority of the proposed approach in terms of delivered video quality, cache-hit-ratio and backhaul link usage.

Journal ArticleDOI
TL;DR: The Structural Similarity (SSIM) Index is a very widely used image/video quality model that continues to play an important role in the perceptual evaluation of compression algorithms, encoding recipes and numerous other image processing algorithms as mentioned in this paper.
Abstract: The Structural Similarity (SSIM) Index is a very widely used image/video quality model that continues to play an important role in the perceptual evaluation of compression algorithms, encoding recipes and numerous other image/video processing algorithms. Several public implementations of the SSIM and Multiscale-SSIM (MS-SSIM) algorithms have been developed, which differ in efficiency and performance. This “bendable ruler” makes the process of quality assessment of encoding algorithms unreliable. To address this situation, we studied and compared the functions and performances of popular and widely used implementations of SSIM, and we also considered a variety of design choices. Based on our studies and experiments, we have arrived at a collection of recommendations on how to use SSIM most effectively, including ways to reduce its computational burden.
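
Since the abstract stresses that SSIM implementations differ in their design choices, the single-scale sketch below makes the usual knobs explicit (Gaussian window sigma, K1/K2 constants, dynamic range); the defaults follow commonly cited values, but the exact recipe, including whether to downsample first, varies between public implementations.

```python
# Single-scale SSIM sketch that makes the usual implementation choices
# explicit (Gaussian window sigma, K1/K2 constants, dynamic range); these
# defaults follow commonly cited values, but as the abstract notes, public
# implementations differ in exactly these knobs and in pre-downsampling.
import numpy as np
from scipy.ndimage import gaussian_filter

def ssim(x, y, data_range=255.0, sigma=1.5, k1=0.01, k2=0.03):
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mu_x, mu_y = gaussian_filter(x, sigma), gaussian_filter(y, sigma)
    var_x = gaussian_filter(x * x, sigma) - mu_x ** 2
    var_y = gaussian_filter(y * y, sigma) - mu_y ** 2
    cov_xy = gaussian_filter(x * y, sigma) - mu_x * mu_y
    ssim_map = ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return float(ssim_map.mean())

ref = np.random.randint(0, 256, (480, 640)).astype(np.float64)
print(ssim(ref, ref + np.random.normal(0, 5, ref.shape)))
```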

Journal ArticleDOI
TL;DR: Quality assessment of 360 video from the cross-lab tests carried out by the Immersive Media Group (IMG) of the Video Quality Experts Group (VQEG) is studied, demonstrating the validity of Absolute Category Rating (ACR) and Degradation Category Rating (DCR) for subjective tests with 360 videos.
Abstract: Recently an impressive development in immersive technologies, such as Augmented Reality (AR), Virtual Reality (VR) and 360 video, has been witnessed. However, methods for quality assessment have not been keeping up. This paper studies quality assessment of 360 video from the cross-lab tests (involving ten laboratories and more than 300 participants) carried out by the Immersive Media Group (IMG) of the Video Quality Experts Group (VQEG). These tests were addressed to assess and validate subjective evaluation methodologies for 360 video. Audiovisual quality, simulator sickness symptoms, and exploration behavior were evaluated with short (from 10 seconds to 30 seconds) 360 sequences. The following factors' influences were also analyzed: assessment methodology, sequence duration, Head-Mounted Display (HMD) device, uniform and non-uniform coding degradations, and simulator sickness assessment methods. The obtained results have demonstrated the validity of Absolute Category Rating (ACR) and Degradation Category Rating (DCR) for subjective tests with 360 videos, the possibility of using 10-second videos (with or without audio) when addressing quality evaluation of coding artifacts, as well as any commercial HMD (satisfying minimum requirements). Also, more efficient methods than the long Simulator Sickness Questionnaire (SSQ) have been proposed to evaluate related symptoms with 360 videos. These results have been instrumental for the development of the ITU-T Recommendation P.919. Finally, the annotated dataset from the tests is made publicly available for the research community.

Journal ArticleDOI
TL;DR: Different objective image and video quality assessment algorithms are evaluated, including both FIQA / FVQA algorithms and non-foveated algorithms, on the so called LIVE-Facebook Technologies Foveated-Compressed Virtual Reality (LIVE-FBT-FCVR) databases, and a statistical evaluation of the relative performances of these algorithms is presented.
Abstract: In Virtual Reality (VR), the requirements of much higher resolution and smooth viewing experiences under rapid and often real-time changes in viewing direction, leads to significant challenges in compression and communication. To reduce the stresses of very high bandwidth consumption, the concept of foveated video compression is being accorded renewed interest. By exploiting the space-variant property of retinal visual acuity, foveation has the potential to substantially reduce video resolution in the visual periphery, with hardly noticeable perceptual quality degradations. Accordingly, foveated image / video quality predictors are also becoming increasingly important, as a practical way to monitor and control future foveated compression algorithms. Towards advancing the development of foveated image / video quality assessment (FIQA / FVQA) algorithms, we have constructed 2D and (stereoscopic) 3D VR databases of foveated / compressed videos, and conducted a human study of perceptual quality on each database. Each database includes 10 reference videos and 180 foveated videos, which were processed by 3 levels of foveation on the reference videos. Foveation was applied by increasing compression with increased eccentricity. In the 2D study, each video was of resolution $7680\times 3840$ and was viewed and quality-rated by 36 subjects, while in the 3D study, each video was of resolution $5376\times 5376$ and rated by 34 subjects. Both studies were conducted on top of a foveated video player having low motion-to-photon latency (~50ms). We evaluated different objective image and video quality assessment algorithms, including both FIQA / FVQA algorithms and non-foveated algorithms, on our so called LIVE-Facebook Technologies Foveation-Compressed Virtual Reality (LIVE-FBT-FCVR) databases. We also present a statistical evaluation of the relative performances of these algorithms. The LIVE-FBT-FCVR databases have been made publicly available and can be accessed at https://live.ece.utexas.edu/research/LIVEFBTFCVR/index.html .

Proceedings ArticleDOI
17 Oct 2021
TL;DR: Wang et al. as discussed by the authors proposed a perceptual hierarchical network (PHIQNet) with an integrated attention module that can appropriately simulate the visual mechanisms of contrast sensitivity and selective attention in IQA.
Abstract: No-reference video quality assessment has not been widely benefited from deep learning, mainly due to the complexity, diversity and particularity of modelling spatial and temporal characteristics in quality assessment scenario. Image quality assessment (IQA) performed on video frames plays a key role in NR-VQA. A perceptual hierarchical network (PHIQNet) with an integrated attention module is first proposed that can appropriately simulate the visual mechanisms of contrast sensitivity and selective attention in IQA. Subsequently, perceptual quality features of video frames derived from PHIQNet are fed into a long short-term convolutional Transformer (LSCT) architecture to predict the perceived video quality. LSCT consists of CNN formulating quality features in video frames within short-term units that are then fed into Transformer to capture the long-range dependence and attention allocation over temporal units. Such architecture is in line with the intrinsic properties of VQA. Experimental results on publicly available video quality databases have demonstrated that the LSCT architecture based on PHIQNet significantly outperforms state-of-the-art video quality models.

Journal ArticleDOI
TL;DR: The proposed end-to-end neural network model combines spherical convolutional neural networks (CNN) and non-local neural networks, which can effectively extract complex spatiotemporal information of the panoramic video.
Abstract: Panoramic video and stereoscopic panoramic video are essential carriers of virtual reality content, so it is crucial to establish quality assessment models for them for the standardization of the virtual reality industry. However, evaluating the quality of panoramic video is very challenging at present. One reason is that the spatial information of the panoramic video is warped by the projection process, and conventional video quality assessment (VQA) methods have difficulty dealing with this problem. Another reason is that traditional VQA methods struggle to capture the complex global temporal information in panoramic video. In response to the above issues, this paper presents an end-to-end neural network model to evaluate the quality of panoramic video and stereoscopic panoramic video. Compared to other panoramic video quality assessment methods, our proposed method combines spherical convolutional neural networks (CNN) and non-local neural networks, which can effectively extract the complex spatiotemporal information of the panoramic video. We evaluate the method on two databases, VRQ-TJU and VR-VQA48. Experiments show the effectiveness of the different modules in our method, and our method outperforms other state-of-the-art related methods.

Journal ArticleDOI
TL;DR: A novel reinforcement learning (RL) based viewport-adaptive streaming framework called RLVA is proposed, which optimizes the 360-degree video streaming in viewport prediction, prefetch scheduling and rate adaptation and effectively reduces the impact of viewports prediction errors.
Abstract: 360-degree video streaming has higher bandwidth requirements compared with traditional video to achieve the same user-perceived playback quality. Since users only view part of the entire video, viewport-adaptive streaming is an effective approach to guarantee video quality. However, the performance of viewport-adaptive schemes is highly dependent on bandwidth estimation and viewport prediction. To overcome these issues, we propose a novel reinforcement learning (RL) based viewport-adaptive streaming framework called RLVA, which optimizes 360-degree video streaming in viewport prediction, prefetch scheduling and rate adaptation. Firstly, RLVA adopts the ${t}$ location-scale distribution rather than the Gaussian distribution to describe the viewport prediction error characteristics more accurately, and derives the tile viewing probability from this distribution. Besides, a tile prefetch scheduling algorithm is proposed to update the tiles according to the latest prediction results, which further reduces the adverse effect of prediction errors. Furthermore, the tile viewing probabilities are treated as input states of the RL algorithm. In this way, RL can adjust its policy to adapt to both the network conditions and the viewport prediction error. Through extensive evaluations, the simulation results show that the proposed RLVA outperforms other viewport-adaptive methods by about 48%-668% in terms of Quality of Experience (QoE) improvement and effectively reduces the impact of viewport prediction errors.

Journal ArticleDOI
TL;DR: In this paper, the influence of the involved parameters is studied based on characteristics of the video, wireless channel capacity, and receivers' aspects, which degrade the quality of experience (QoE).
Abstract: The development of smart devices has led to demand for high-quality streaming video over wireless communications. In multimedia technology, Ultra-High Definition (UHD) video quality plays an important role, since smart devices are capable of capturing and processing high-quality video content. Because delivering high-quality video streams over wireless networks is challenging for end-users, network behavior factors such as the delay of arriving packets, the delay variation between packets, and packet loss affect the Quality of Experience (QoE). Moreover, the characteristics of the video and of the devices are additional factors that influence the QoE. In this research work, the influence of the involved parameters is studied based on the characteristics of the video, the wireless channel capacity, and receivers' aspects, which degrade the QoE. Then, the impact of the aforementioned parameters on both subjective and objective QoE is studied. A smart algorithm for video streaming services is proposed to optimize the assessment and management of the QoE of clients (end-users). The proposed algorithm includes two approaches: first, using a machine-learning model to predict QoE; second, according to the QoE prediction, the algorithm manages the video quality of the end-users by offering better video quality. As a result, the proposed algorithm, which is based on least absolute shrinkage and selection operator (LASSO) regression, outperforms previously proposed methods for predicting and managing the QoE of streaming video over wireless networks.
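
A hedged sketch of the prediction half of such an algorithm is given below: a LASSO regressor maps network and content features to an estimated MOS. The feature names and the synthetic training data are placeholder assumptions, not the paper's dataset or feature set.

```python
# Sketch of the QoE-prediction half described above: a LASSO regressor maps
# network and content features to an estimated MOS. The feature names and the
# synthetic training data are placeholder assumptions, not the paper's dataset.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Hypothetical features: [delay_ms, jitter_ms, loss_pct, bitrate_mbps, resolution_height]
X = rng.normal(size=(500, 5)) * [50, 10, 1, 5, 400] + [100, 20, 1, 8, 1080]
mos = 4.5 - 0.01 * X[:, 0] - 0.02 * X[:, 1] - 0.5 * X[:, 2] + 0.05 * X[:, 3] \
      + rng.normal(0, 0.2, 500)

model = make_pipeline(StandardScaler(), Lasso(alpha=0.05))
model.fit(X[:400], mos[:400])
print("predicted MOS for held-out sessions:", model.predict(X[400:405]).round(2))
```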

Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this article, the authors proposed an efficient approach for a video anomaly detection system which is capable of running at the edge devices, e.g., on a roadside camera, and achieved an F1-score of 0.9157 along with 8.4027 root mean square error (RMSE) and ranked fourth in the competition.
Abstract: Due to its relevance in intelligent transportation systems, anomaly detection in traffic videos has recently received much interest. It remains a difficult problem due to a variety of factors influencing the video quality of a real-time traffic feed, such as temperature, perspective, lighting conditions, and so on. Even though state-of-the-art methods perform well on the available benchmark datasets, they need a large amount of external training data as well as substantial computational resources. In this paper, we propose an efficient approach for a video anomaly detection system which is capable of running at the edge devices, e.g., on a roadside camera. The proposed approach comprises a preprocessing module that detects changes in the scene and removes the corrupted frames, a two-stage background modelling module and a two-stage object detector. Finally, a backtracking anomaly detection algorithm computes a similarity statistic and decides on the onset time of the anomaly. We also propose a sequential change detection algorithm that can quickly adapt to a new scene and detect changes in the similarity statistic. Experimental results on the Track 4 test set of the 2021 AI City Challenge show the efficacy of the proposed framework as we achieve an F1-score of 0.9157 along with 8.4027 root mean square error (RMSE) and are ranked fourth in the competition.
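
As a hedged sketch of the sequential change-detection idea mentioned above, the code below accumulates downward drift of a per-frame similarity statistic CUSUM-style and flags the anomaly onset when a threshold is crossed; the drift, threshold and synthetic similarity trace are illustrative assumptions, not the paper's algorithm.

```python
# Sketch of the sequential change-detection idea mentioned above: a CUSUM-style
# statistic accumulates downward drift in a per-frame similarity score and
# flags the anomaly onset when it crosses a threshold. The drift and threshold
# values, and the synthetic similarity trace, are illustrative assumptions.
import numpy as np

def cusum_onset(similarity, drift=0.05, threshold=1.0):
    """Return the frame index at which the similarity score is declared to have dropped."""
    baseline = np.mean(similarity[:30])       # calibrate on the first frames of the scene
    stat = 0.0
    for i, s in enumerate(similarity):
        stat = max(0.0, stat + (baseline - s) - drift)
        if stat > threshold:
            return i
    return None

rng = np.random.default_rng(2)
sim = np.concatenate([rng.normal(0.9, 0.02, 200),    # normal traffic
                      rng.normal(0.7, 0.02, 100)])   # stalled vehicle changes the scene
print("anomaly onset at frame:", cusum_onset(sim))
```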

Journal ArticleDOI
TL;DR: In this paper, a subjective experiment was carried out to measure subjective video quality on both luma and chroma distortions, introduced both in isolation and together, and the subjective scores were evaluated by 34 subjects in a controlled environmental setting.
Abstract: Measuring the quality of digital videos viewed by human observers has become a common practice in numerous multimedia applications, such as adaptive video streaming, quality monitoring, and other digital TV applications. Here we explore a significant, yet relatively unexplored problem: measuring perceptual quality on videos arising from both luma and chroma distortions from compression. Toward investigating this problem, it is important to understand the kinds of chroma distortions that arise, how they relate to luma compression distortions, and how they can affect perceived quality. We designed and carried out a subjective experiment to measure subjective video quality on both luma and chroma distortions, introduced both in isolation as well as together. Specifically, the new subjective dataset comprises a total of 210 videos afflicted by distortions caused by varying levels of luma quantization commingled with different amounts of chroma quantization. The subjective scores were evaluated by 34 subjects in a controlled environmental setting. Using the newly collected subjective data, we were able to demonstrate important shortcomings of existing video quality models, especially in regards to chroma distortions. Further, we designed an objective video quality model which builds on existing video quality algorithms, by considering the fidelity of chroma channels in a principled way. We also found that this quality analysis implies that there is room for reducing bitrate consumption in modern video codecs by creatively increasing the compression factor on chroma channels. We believe that this work will both encourage further research in this direction, as well as advance progress on the ultimate goal of jointly optimizing luma and chroma compression in modern video encoders.

Journal ArticleDOI
Laizhong Cui, Dongyuan Su, Shu Yang, Zhi Wang, Zhong Ming
TL;DR: TCLiVi significantly improves the video quality and decreases the rebuffering time, consequently increasing the QoE score by 40.84% on average, and is self-adaptive in different scenarios.
Abstract: Currently, video content accounts for the majority of network traffic. With increased live streaming, rigorous requirements have been introduced for better Quality of Experience (QoE). It is challenging to meet satisfactory QoE in live streaming, where the aim is to achieve a balance between 1) enhancing the video quality and stability and 2) reducing the rebuffering time and end-to-end delay, under different scenarios with various network conditions and user preferences, where the fluctuation in the network throughput degrades the QoE severely. In this paper, we propose an approach to improve the QoE for live video streaming based on Deep Reinforcement Learning (DRL). The new approach jointly adjusts the streaming parameters, including the video bitrate and target buffer size. With the basic DRL framework, TCLiVi can automatically generate the inference model based on the playback information, to achieve the joint optimization of the video quality, stability, rebuffering time and latency parameters. We evaluate our framework on real-world data in different live streaming broadcast scenarios, such as a talent show and a sports competition under different network conditions. We compare TCLiVi with other algorithms, such as the Double DQN, MPC and Buffer-based algorithms. The simulation results show that TCLiVi significantly improves the video quality and decreases the rebuffering time, consequently increasing the QoE score by 40.84% in average. We also show that TCLiVi is self-adaptive in different scenarios.