
Showing papers on "Video quality published in 2023"


Proceedings ArticleDOI
TL;DR: In this paper, the authors proposed an energy-efficient video coding strategy based on a new and fast region-of-interest (ROI) detection method, which segments the frame into four regions and extracts two different classes of the ROI for coding and transmission using variable quality levels based on their relevance.
Abstract: Video surveillance systems that involve embedded visual sensor nodes have significant constraints due to their limited energy sources. Reducing the power consumption of the in-node processing and the required bandwidth while maintaining a high QoS is still a challenging task. The difficulty increases when a smart task must be performed on the received video at the destination. In this context, this paper proposes an energy-efficient video coding strategy based on a new and fast region-of-interest (ROI) detection method. The lightweight ROI detection method segments the frame into four regions, and the coding strategy extracts two different classes of ROI for coding and transmission at variable quality levels based on their relevance. Furthermore, the strategy excludes the regions of lower importance and any non-ROI region with insignificant movement. We assess the strategy’s ability to support object recognition tasks at the destination under quality degradation. The performance results using different datasets demonstrate a better trade-off between ROI quality awareness, energy consumption, and bandwidth savings for the proposed strategy compared to other methods. This results in a 96% reduction in bandwidth and a 93% reduction in energy for some sequences, at the expense of a 1-4 dB decrease in PSNR compared to the MJPEG standard, while the recognition accuracy of the YOLOv3 model at the destination outperforms the other techniques by about 4% to 22%.
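
A rough sketch of the region-dependent coding decision described above: each block gets a finer or coarser QP depending on its assumed ROI class, and non-ROI blocks with negligible motion are skipped. The class names, QP values, and motion threshold below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

# Hypothetical quantization parameters per region class: more relevant
# regions get a lower QP (higher quality); values are illustrative only.
QP_BY_CLASS = {"roi_primary": 22, "roi_secondary": 30}
MOTION_SKIP_THRESHOLD = 0.5  # assumed threshold for "insignificant movement"

def plan_block_coding(region_class, motion_magnitude):
    """Decide how to code one block: return a QP, or None to skip it."""
    if region_class in QP_BY_CLASS:
        return QP_BY_CLASS[region_class]
    # Non-ROI blocks with negligible motion are excluded from transmission.
    if motion_magnitude < MOTION_SKIP_THRESHOLD:
        return None
    return 38  # assumed coarse QP for the remaining background blocks

# Example: a frame segmented into labeled blocks with per-block motion.
blocks = [("roi_primary", 2.1), ("roi_secondary", 0.9),
          ("background", 0.1), ("background", 1.4)]
print([plan_block_coding(c, m) for c, m in blocks])  # [22, 30, None, 38]
```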

5 citations


Journal ArticleDOI
TL;DR: In this article, the benefits and challenges of multi-path video transmission over wireless networks are discussed, and a comprehensive review of the state of the art is provided.

4 citations


Journal ArticleDOI
TL;DR: In this article, a hybrid 6T/8T memory architecture is proposed, where the 8-bit luminance pixels are stored favourably in consonance with their effect on the output quality.

4 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed an adaptation framework model that relates the QP (quantization parameter) of the H.264 and H.265 codecs to the QoS of 5G wireless technology.
Abstract: Nowadays, smart multimedia network services have become crucial in the healthcare system. The network Quality of Service (QoS) parameters widely affect the efficiency and accuracy of multimedia streaming in wireless environments. This paper proposes an adaptation framework model that relates the QP (quantization parameter) of the H.264 and H.265 codecs to the QoS of 5G wireless technology. In addition, the effects of QP and packet loss have been studied because of their impact on video streaming. The packet loss characteristic of a 5G wireless network is emulated to determine the impact of QP on the received video quality using objective and subjective quality metrics such as PSNR (peak signal-to-noise ratio), SSIM (structural similarity), and DMOS (differential mean opinion score). In this research, a testbed is implemented to stream the encoded video from the server to the end users. The application model framework automatically evaluates the QoE (Quality of Experience). Accordingly, the model detects network packet loss and selects the optimum QP value to enhance the QoE for end users. The application has been tested on low- and high-motion videos at full high-definition (HD) resolution (1920 × 1080), taken from https://www.xiph.org/downloads/. Test results based on the objective and subjective quality measurements indicate that optimal values of QP = 35 and QP = 30 were chosen for low and high motion, respectively, to satisfy user QoE requirements.
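
A minimal sketch of the QP-selection idea described above: given quality measurements indexed by QP and packet loss rate, pick the largest QP (lowest bitrate) whose received quality still meets a target. The PSNR values and threshold are placeholders, not the paper's measurements.

```python
# Minimal sketch of QP selection from measured quality under packet loss.
# The PSNR values below are placeholders, not the paper's measurements.
measurements = {
    # (qp, packet_loss_percent): mean PSNR in dB of the received video
    (25, 0.5): 37.2, (30, 0.5): 35.8, (35, 0.5): 33.9,
    (25, 1.0): 33.1, (30, 1.0): 34.0, (35, 1.0): 34.6,
}

def select_qp(packet_loss_percent, min_psnr=33.0):
    """Return the lowest-bitrate QP (largest value) that still meets the target."""
    candidates = [qp for (qp, plr), psnr in measurements.items()
                  if plr == packet_loss_percent and psnr >= min_psnr]
    return max(candidates) if candidates else None

print(select_qp(1.0))  # -> 35 under these illustrative numbers
```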

4 citations


Journal ArticleDOI
TL;DR: In this article, the authors evaluated the quality of information regarding premature ejaculation on TikTok by using validated quality assessment tools, including the Patient Education Materials Assessment Tool (PEMAT) and the 5-point modified DISCERN instrument.

3 citations


Proceedings ArticleDOI
20 Apr 2023
TL;DR: In this article, a reduced reference Transcoding Quality Prediction Model (TQPM) is proposed to determine the visual quality score of a video possibly transcoded in multiple stages, where the quality is predicted using Discrete Cosine Transform (DCT)-energy-based features of the video (i.e., the video's brightness, spatial texture information, and temporal activity) and the target bitrate representation of each transcoding stage.
Abstract: In recent years, video streaming applications have increased the demand for Video Quality Assessment (VQA). Reduced reference video quality assessment (RR-VQA) is a category of VQA where certain features (e.g., texture, edges) of the original video are provided for quality assessment. It is a popular research area for various applications such as social media, online games, and video streaming. This paper introduces a reduced reference Transcoding Quality Prediction Model (TQPM) to determine the visual quality score of a video possibly transcoded in multiple stages. The quality is predicted using Discrete Cosine Transform (DCT)-energy-based features of the video (i.e., the video's brightness, spatial texture information, and temporal activity) and the target bitrate representation of each transcoding stage. To do that, the problem is formulated, and a Long Short-Term Memory (LSTM)-based quality prediction model is presented. Experimental results illustrate that, on average, TQPM yields PSNR, SSIM, and VMAF predictions with an R2 score of 0.83, 0.85, and 0.87, respectively, and Mean Absolute Error (MAE) of 1.31 dB, 1.19 dB, and 3.01, respectively, for single-stage transcoding. Furthermore, an R2 score of 0.84, 0.86, and 0.91, respectively, and MAE of 1.32 dB, 1.33 dB, and 3.25, respectively, are observed for a two-stage transcoding scenario. Moreover, the average processing time of TQPM for 4s segments is 0.328s, making it a practical VQA method in online streaming applications.
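
A minimal PyTorch sketch of an LSTM-based quality predictor of the kind described above, taking one feature vector per transcoding stage (DCT-energy-style features plus target bitrate) and regressing PSNR/SSIM/VMAF. The dimensions and feature ordering are assumptions, not the published TQPM architecture or weights.

```python
import torch
import torch.nn as nn

class TranscodingQualityLSTM(nn.Module):
    """Toy LSTM regressor: one feature vector per transcoding stage in,
    predicted PSNR/SSIM/VMAF scores out. Dimensions are assumptions."""
    def __init__(self, feat_dim=4, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 3)  # [PSNR, SSIM, VMAF]

    def forward(self, x):            # x: (batch, num_stages, feat_dim)
        _, (h, _) = self.lstm(x)     # h: (1, batch, hidden_dim)
        return self.head(h[-1])      # (batch, 3)

# Two-stage transcoding example: [brightness, spatial texture, temporal
# activity, target bitrate] per stage -- illustrative numbers only.
stages = torch.tensor([[[0.4, 0.7, 0.3, 3.0],
                        [0.4, 0.7, 0.3, 1.5]]])
print(TranscodingQualityLSTM()(stages).shape)  # torch.Size([1, 3])
```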

2 citations


Journal ArticleDOI
TL;DR: In this article, a new gaming-specific No-Reference Video Quality Assessment (NR VQA) model called the Gaming Video Quality Evaluator (GAMIVAL) was proposed, which combines and leverages the advantages of spatial and temporal gaming distorted scene statistics models, a neural noise model, and deep semantic features.
Abstract: The mobile cloud gaming industry has been rapidly growing over the last decade. When streaming gaming videos are transmitted to customers' client devices from cloud servers, algorithms that can monitor distorted video quality without having any reference video available are desirable tools. However, creating No-Reference Video Quality Assessment (NR VQA) models that can accurately predict the quality of streaming gaming videos rendered by computer graphics engines is a challenging problem, since gaming content generally differs statistically from naturalistic videos, often lacks detail, and contains many smooth regions. Until recently, the problem has been further complicated by the lack of adequate subjective quality databases of mobile gaming content. We have created a new gaming-specific NR VQA model called the Gaming Video Quality Evaluator (GAMIVAL), which combines and leverages the advantages of spatial and temporal gaming distorted scene statistics models, a neural noise model, and deep semantic features. Using a support vector regression (SVR) as a regressor, GAMIVAL achieves superior performance on the new LIVE-Meta Mobile Cloud Gaming (LIVE-Meta MCG) video quality database.
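
The final stage described above maps pooled features to a quality score with a support vector regressor; a generic scikit-learn sketch of that step, with random placeholder features and labels rather than GAMIVAL's actual features, might look like this:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 16))     # placeholder per-video feature vectors
y_train = rng.uniform(0, 100, size=200)  # placeholder MOS labels

# Feature scaling plus an RBF-kernel SVR, a common setup for NR-VQA regressors.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X_train, y_train)
print(model.predict(X_train[:3]))  # predicted quality scores for three videos
```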

2 citations


Journal ArticleDOI
TL;DR: In this paper, a UGC video quality assessment (VQA) database was constructed to provide useful guidance for UGC video coding and processing on the hosting platform, and an objective quality assessment algorithm was developed to automatically evaluate the quality of the transcoded videos based on the corrupted reference.
Abstract: Recently, we have observed an exponential increase in user-generated content (UGC) videos. The distinguishing characteristic of UGC videos originates from the video production and delivery chain, as they are usually acquired and processed by non-professional users before uploading to the hosting platforms for sharing. As such, these videos usually undergo multiple distortion stages that may affect visual quality before ultimately being viewed. Inspired by the increasing consensus that the optimization of the video coding and processing shall be fully driven by the perceptual quality, in this paper, we propose to study the quality of the UGC videos from both objective and subjective perspectives. We first construct a UGC video quality assessment (VQA) database, aiming to provide useful guidance for the UGC video coding and processing in the hosting platform. The database contains source UGC videos uploaded to the platform and their transcoded versions that are ultimately enjoyed by end-users, along with their subjective scores. Furthermore, we develop an objective quality assessment algorithm that automatically evaluates the quality of the transcoded videos based on the corrupted reference, which is in accordance with the application scenarios of UGC video sharing in the hosting platforms. The information from the corrupted reference is well leveraged and the quality is predicted based on the inferred quality maps with deep neural networks (DNN). Experimental results show that the proposed method yields superior performance. Both subjective and objective evaluations of the UGC videos also shed light on the design of perceptual UGC video coding.

2 citations


Journal ArticleDOI
TL;DR: The LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) dataset as discussed by the authors is the outcome of a large-scale subjective study of mobile cloud gaming video quality assessment (MCG-VQA) on a diverse set of gaming videos.
Abstract: We present the outcomes of a recent large-scale subjective study of Mobile Cloud Gaming Video Quality Assessment (MCG-VQA) on a diverse set of gaming videos. Rapid advancements in cloud services, faster video encoding technologies, and increased access to high-speed, low-latency wireless internet have all contributed to the exponential growth of the Mobile Cloud Gaming industry. Consequently, the development of methods to assess the quality of real-time video feeds to end-users of cloud gaming platforms has become increasingly important. However, due to the lack of a large-scale public Mobile Cloud Gaming Video dataset containing a diverse set of distorted videos with corresponding subjective scores, there has been limited work on the development of MCG-VQA models. To accelerate progress toward these goals, we created a new dataset, named the LIVE-Meta Mobile Cloud Gaming (LIVE-Meta-MCG) video quality database, composed of 600 landscape and portrait gaming videos, on which we collected 14,400 subjective quality ratings from an in-lab subjective study. Additionally, to demonstrate the usefulness of the new resource, we benchmarked multiple state-of-the-art VQA algorithms on the database. The new database will be made publicly available on our website: https://live.ece.utexas.edu/research/LIVE-Meta-Mobile-Cloud-Gaming/index.html

1 citation


Proceedings ArticleDOI
01 Jan 2023
TL;DR: In this article, a large-scale HDR video quality dataset for sports content is presented, which includes the above-mentioned important issues in live streaming, along with a method of merging multiple datasets using anchor videos.
Abstract: High Dynamic Range (HDR) video streaming has become more popular because of its faithful color and brightness presentation. However, the live streaming of HDR, especially of sports content, has unique challenges, as it is usually encoded and distributed in real time without the post-production workflow. A set of unique problems that occur only in live streaming, e.g. resolution and frame rate crossover, intra-frame pulsing video quality defects, and the complex relationship between rate-control mode and video quality, are more salient when the videos are streamed in HDR format. These issues are typically ignored by other subjective databases, disregarding the fact that they have a significant impact on the perceived quality of the videos. In this paper, we present a large-scale HDR video quality dataset for sports content that includes the above-mentioned important issues in live streaming, and a method of merging multiple datasets using anchor videos. We also benchmarked existing video quality metrics on the new dataset, particularly over the novel scopes included in the database, to evaluate the effectiveness and efficiency of the existing models. We found that despite the strong overall performance over the entire database, most of the tested models perform poorly when predicting human preference for various encoding parameters, such as frame rate and adaptive quantization.

1 citation


Journal ArticleDOI
TL;DR: In this article, a deep-learning-based methodology is proposed to enhance the availability of video streaming systems by developing a prediction model for video streaming quality, required power consumption, and required bandwidth based on video codec parameters.
Abstract: Availability is one of the primary goals of smart networks, especially if the network is under heavy video streaming traffic. In this paper, we propose a deep-learning-based methodology to enhance the availability of video streaming systems by developing a prediction model for video streaming quality, required power consumption, and required bandwidth based on video codec parameters. The H.264/AVC codec, which is one of the most popular codecs used in video streaming and conferencing communications, is chosen as a case study in this paper. We model the predicted consumed power, the predicted perceived video quality, and the predicted required bandwidth for the video codec based on video resolution and quantization parameters. We train, validate, and test the developed models through extensive experiments using several videos. Results show that an accurate model can be built for the needed purpose and that the video streaming quality, required power consumption, and required bandwidth can be predicted accurately, which can be utilized to enhance network availability in a cooperative environment.
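
A toy sketch of the kind of multi-output prediction described above: a small neural regressor mapping codec parameters (resolution, QP) to quality, power, and bandwidth. The synthetic data and network size are assumptions for illustration, not the paper's trained model.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Inputs: [width, height, quantization parameter]; targets: [quality (PSNR, dB),
# power (W), bandwidth (Mbps)]. All values below are synthetic placeholders.
X = np.column_stack([
    rng.choice([1280, 1920], size=300),
    rng.choice([720, 1080], size=300),
    rng.integers(20, 45, size=300),
])
y = np.column_stack([
    50 - 0.4 * X[:, 2] + rng.normal(0, 0.5, 300),            # PSNR falls with QP
    2.0 + X[:, 0] * X[:, 1] * 1e-6 + rng.normal(0, 0.05, 300),  # power vs. pixels
    0.5 + 40.0 / X[:, 2] + rng.normal(0, 0.1, 300),           # bitrate vs. QP
])

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 32),
                                   max_iter=2000, random_state=0))
model.fit(X, y)
print(model.predict([[1920, 1080, 30]]))  # -> [quality, power, bandwidth]
```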

Journal ArticleDOI
TL;DR: In this article, a feature set called HDRMAX features is introduced, which, when included in Video Quality Assessment (VQA) algorithms designed for Standard Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic Range (HDR) videos that are inadequately accounted for by these algorithms.
Abstract: We introduce a novel feature set, which we call HDRMAX features, which, when included in Video Quality Assessment (VQA) algorithms designed for Standard Dynamic Range (SDR) videos, sensitizes them to distortions of High Dynamic Range (HDR) videos that are inadequately accounted for by these algorithms. While these features are not specific to HDR and also augment the quality prediction performance of VQA models on SDR content, they are especially effective on HDR. HDRMAX features modify powerful priors drawn from Natural Video Statistics (NVS) models by enhancing their measurability where they visually impact the brightest and darkest local portions of videos, thereby capturing distortions that are often poorly accounted for by existing VQA models. As a demonstration of the efficacy of our approach, we show that, while current state-of-the-art VQA models perform poorly on 10-bit HDR databases, their performances are greatly improved by the inclusion of HDRMAX features when tested on HDR and 10-bit distorted videos.

Journal ArticleDOI
TL;DR: In this paper, the authors present the results of subjective and objective quality assessments of H.264-, H.265-, and VP9-encoded video, and show the impact of environmental conditions on the video quality perceived by the user.
Abstract: The paper presents the results of subjective and objective quality assessments of H.264-, H.265-, and VP9-encoded video. Most of the literature is devoted to subjective quality assessment in well-defined laboratory circumstances. However, the end users usually watch the films in their home environments, which may be different from the conditions recommended for laboratory measurements. This may cause significant differences in the quality assessment scores. Thus, the aim of the research is to show the impact of environmental conditions on the video quality perceived by the user. The subjective assessment was made in two different environments: in the laboratory and in users’ homes, where people often watch movies on their laptops. The video signal was assessed by young viewers who were not experts in the field of quality assessment. The tests were performed taking into account different image resolutions and different bit rates. The research showed strong correlations between the obtained results and the coding bit rates used, and revealed a significant difference between the quality scores obtained in the laboratory and at home. In conclusion, it must be underlined that laboratory tests are necessary for comparative purposes, while the assessment of the video quality experienced by end users should be performed under circumstances that are as close as possible to the user’s home environment.


Journal ArticleDOI
TL;DR: In this article, an adaptive solution for underwater video transmissions is proposed that is specifically designed for Multi-Input Multi-Output (MIMO)-based Software-Defined Acoustic Modems (SDAMs).
Abstract: Achieving reliable acoustic wireless video transmissions in the extreme and uncertain underwater environment is a challenge due to the limited bandwidth and the error-prone nature of the channel. Aiming at optimizing the received video quality and the user’s experience, an adaptive solution for underwater video transmissions is proposed that is specifically designed for Multi-Input Multi-Output (MIMO)-based Software-Defined Acoustic Modems (SDAMs). To keep the video distortion under an acceptable threshold and to keep the Physical-Layer Throughput (PLT) high, cross-layer techniques utilizing diversity-spatial multiplexing and Unequal Error Protection (UEP) are presented along with the scalable video compression at the application layer. Specifically, the scalability of the utilized SDAM with high processing capabilities is exploited in the proposed structure along with the temporal, spatial, and quality scalability of the Scalable Video Coding (SVC) H.264/MPEG-4 AVC compression standard. The transmitter broadcasts one video stream and realizes multicasting to different users. Experimental results at the Sonny Werblin Recreation Center, Rutgers University, NJ, are presented. Several scenarios for unknown channels at the transmitter are experimentally considered when the hydrophones are placed in different locations in the pool to achieve the required SVC-based video Quality of Service (QoS) and Quality of Experience (QoE), given the channel state information and the robustness of the different SVC scalability modes. The video quality level is determined by the best communication link, while the transmission scheme is decided based on the worst communication link, which guarantees that each user is able to receive the video with appropriate quality.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a highly efficient VSR network that makes use of data from the decompressed video, such as frame type, Group of Pictures (GOP), macroblock type, and motion vectors.
Abstract: Video compression technology for Ultra-High Definition (UHD) and 8K UHD video has been established and is being widely adopted by major broadcasting companies and video content providers, allowing them to produce high-quality videos that meet the demands of today’s consumers. However, broadcasting high-resolution video content is unlikely to become easy in the near future due to limited network bandwidth and data storage resources. An alternative solution to overcome the challenges of broadcasting high-resolution video content is to downsample UHD or 8K video at the transmission side using existing infrastructure, and then utilize Video Super-Resolution (VSR) technology at the receiving end to recover the original quality of the video content. Current deep-learning-based VSR methods fail to consider that the video delivered to viewers goes through a compression and decompression process, which can introduce additional distortion and loss of information. Therefore, it is crucial to develop VSR methods that are specifically designed to work with the compression–decompression pipeline. In general, the information available in the compressed video is not sufficiently utilized by existing VSR models. This research proposes a highly efficient VSR network making use of data from the decompressed video, such as frame type, Group of Pictures (GOP), macroblock type, and motion vectors. The proposed Convolutional Neural Network (CNN)-based lightweight VSR model is suitable for real-time video services. The performance of the model is extensively evaluated through a series of experiments, demonstrating its effectiveness and applicability in practical scenarios.
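
For orientation, a minimal sub-pixel CNN upscaler of the kind used as a building block in lightweight VSR models is sketched below in PyTorch. It shows only generic upscaling and does not use the codec side information (frame type, GOP, macroblock type, motion vectors) that the proposed network exploits; the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """ESPCN-style sub-pixel upscaler: a few conv layers followed by a
    PixelShuffle. Layer sizes are illustrative, not the paper's model."""
    def __init__(self, scale=2, channels=3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, channels * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a 2x larger frame
        )

    def forward(self, x):
        return self.body(x)

lr = torch.rand(1, 3, 270, 480)   # a downsampled frame
print(TinySR()(lr).shape)         # torch.Size([1, 3, 540, 960])
```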

Journal ArticleDOI
TL;DR: In this paper, a tile-size-aware deep neural network (DNN) model with a decoupled self-attention architecture is proposed to accurately and efficiently predict the transmission time of video tiles.
Abstract: Mobile 360-degree video streaming has grown significantly in popularity but the quality of experience (QoE) suffers from insufficient and variable wireless network bandwidth. Recently, saliency-driven 360-degree streaming overcomes the buffer size limitation of head movement trajectory (HMT)-driven solutions and thus strikes a better balance between video quality and rebuffering. However, inaccurate network estimations and intrinsic saliency bias still challenge saliency-based streaming approaches, limiting further QoE improvement. To address these challenges, we design a robust saliency-driven quality adaptation algorithm for 360-degree video streaming, RoSal360. Specifically, we present a practical, tile-size-aware deep neural network (DNN) model with a decoupled self-attention architecture to accurately and efficiently predict the transmission time of video tiles. Moreover, we design a reinforcement learning (RL)-driven online correction algorithm to robustly compensate the improper quality allocations due to saliency bias. Through extensive prototype evaluations over real wireless network environments including commodity WiFi, 4G/LTE, and 5G links in the wild, RoSal360 significantly enhances the video quality and reduces the rebuffering ratio, thereby improving the viewer QoE, compared to the state-of-the-art algorithms.

Journal ArticleDOI
01 Mar 2023-Sensors
TL;DR: In this article, the adverse impact of packet loss on the quality of video encoded with various combinations of compression parameters and resolutions was analyzed for the H.264 and H.265 formats at five bit rates.
Abstract: Video delivered over IP networks in real-time applications such as videotelephony or live streaming, which utilize the RTP protocol over unreliable UDP, is often prone to degradation caused by multiple sources. The most significant is the combined effect of video compression and its transmission over the communication channel. This paper analyzes the adverse impact of packet loss on the quality of video encoded with various combinations of compression parameters and resolutions. For the purposes of the research, a dataset containing 11,200 full HD and ultra HD video sequences encoded to H.264 and H.265 formats at five bit rates was compiled with a simulated packet loss rate (PLR) ranging from 0 to 1%. Objective assessment was conducted by using peak signal-to-noise ratio (PSNR) and Structural Similarity Index (SSIM) metrics, whereas the well-known absolute category rating (ACR) was used for subjective evaluation. Analysis of the results confirmed the presumption that video quality decreases as the packet loss rate rises, regardless of compression parameters. The experiments further led to a finding that the quality of sequences affected by PLR declines with increasing bit rate. Additionally, the paper includes recommendations on compression parameters for use under various network conditions.
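
For reference, the two objective metrics used in the study can be computed per frame with scikit-image; a small sketch with synthetic frames standing in for a decoded reference/degraded pair:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def frame_quality(reference, degraded):
    """Full-reference PSNR and SSIM for one 8-bit RGB frame pair."""
    psnr = peak_signal_noise_ratio(reference, degraded, data_range=255)
    ssim = structural_similarity(reference, degraded, channel_axis=-1,
                                 data_range=255)
    return psnr, ssim

# Synthetic stand-ins for a decoded reference frame and its lossy counterpart.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8)
deg = np.clip(ref.astype(int) + rng.integers(-5, 6, ref.shape), 0, 255).astype(np.uint8)
print(frame_quality(ref, deg))
```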

Journal ArticleDOI
TL;DR: In this article, a semantic-aware (SA) video compression (SAC) framework was proposed, which compresses separately and simultaneously the region-of-interest and region-out-of-interest of automotive camera video frames before transmitting them to processing unit(s), where the data are used for perception tasks such as object detection, semantic segmentation, etc.
Abstract: Assisted and automated driving functions in vehicles exploit sensor data to build situational awareness; however, the amount of data required by these functions might exceed the bandwidth of current wired vehicle communication networks. Consequently, sensor data reduction and automotive camera video compression need investigation. However, conventional video compression schemes, such as H.264 and H.265, have been mainly optimised for human vision. In this paper, we propose a semantic-aware (SA) video compression (SAC) framework that compresses separately and simultaneously the region-of-interest and region-out-of-interest of automotive camera video frames before transmitting them to processing unit(s), where the data are used for perception tasks such as object detection, semantic segmentation, etc. Using our newly proposed technique, the region-of-interest (ROI), encapsulating most of the road stakeholders, retains higher quality by using a lower compression ratio. The experimental results show that, under the same overall compression ratio, our proposed SAC scheme maintains similar or better image quality, measured according to traditional metrics and to our newly proposed semantic-aware metrics. The newly proposed metrics, namely SA-PSNR, SA-SSIM, and iIoU, give more emphasis to ROI quality, which has an immediate impact on the planning and decisions of assisted and automated driving functions. Using our SA-X264 compression, SA-PSNR and SA-SSIM increase by 2.864 and 0.008, respectively, compared to traditional H.264, with higher ROI quality and the same compression ratio. Finally, a segmentation-based perception algorithm has been used to compare reconstructed frames, demonstrating a 2.7% mIoU improvement when using the proposed SAC method versus traditional compression techniques.
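
The intent behind an ROI-emphasizing metric such as SA-PSNR can be illustrated by weighting the reconstruction error inside and outside an ROI mask before converting to PSNR. The weighting scheme below is an assumption made for illustration, not the paper's actual SA-PSNR definition.

```python
import numpy as np

def roi_weighted_psnr(ref, deg, roi_mask, roi_weight=0.8):
    """Hypothetical ROI-weighted PSNR: squared error inside the ROI mask
    contributes with weight `roi_weight`, the rest with the remainder.
    This illustrates the idea only, not the paper's SA-PSNR formula."""
    err = (ref.astype(np.float64) - deg.astype(np.float64)) ** 2
    roi = roi_mask.astype(bool)
    mse_roi = err[roi].mean() if roi.any() else 0.0
    mse_bg = err[~roi].mean() if (~roi).any() else 0.0
    mse = roi_weight * mse_roi + (1.0 - roi_weight) * mse_bg
    return 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")

ref = np.full((4, 4), 128, dtype=np.uint8)
deg = ref.copy(); deg[0, 0] = 120                      # distortion inside the ROI
mask = np.zeros((4, 4), dtype=bool); mask[:2, :2] = True
print(roi_weighted_psnr(ref, deg, mask))               # ~37.1 dB
```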

Posted ContentDOI
14 Apr 2023
TL;DR: In this paper, a reinforcement learning framework powered by two hierarchical agents is proposed to perform both frame-level and video-level quality assessment of ultrasound images, to guarantee both their perceptual and diagnostic values.
Abstract: Ultrasound is the primary modality to examine fetal growth during pregnancy, while the image quality could be affected by various factors. Quality assessment is essential for controlling the quality of ultrasound images to guarantee both the perceptual and diagnostic values. Existing automated approaches often require heavy structural annotations and the predictions may not necessarily be consistent with the assessment results by human experts. Furthermore, the overall quality of a scan and the correlation between the quality of frames should not be overlooked. In this work, we propose a reinforcement learning framework powered by two hierarchical agents that collaboratively learn to perform both frame-level and video-level quality assessments. It is equipped with a specially-designed reward mechanism that considers temporal dependency among frame quality and only requires sparse binary annotations to train. Experimental results on a challenging fetal brain dataset verify that the proposed framework could perform dual-level quality assessment and its predictions correlate well with the subjective assessment results.

Posted ContentDOI
15 Apr 2023
TL;DR: In this paper, a two-stage approach is proposed to generate high-quality videos within a small time window while modeling the video holistically based on the input guidance; it improves over the state of the art by up to 9.5% in objective metrics and is preferred by users more than 80% of the time.
Abstract: We tackle the long video generation problem, i.e., generating videos beyond the output length of video generation models. Due to computational resource constraints, video generation models can only generate video clips that are relatively short compared with the length of real videos. Existing works apply a sliding window approach to generate long videos at inference time, which is often limited to generating recurrent events or homogeneous content. To generate long videos covering diverse content and multiple events, we propose to use additional guidance to control the video generation process. We further present a two-stage approach to the problem, which allows us to utilize existing video generation models to generate high-quality videos within a small time window while modeling the video holistically based on the input guidance. The proposed approach is complementary to existing efforts on video generation, which focus on generating realistic video within a fixed time window. Extensive experiments on challenging real-world videos validate the benefit of the proposed method, which improves over the state of the art by up to 9.5% in objective metrics and is preferred by users more than 80% of the time.

Posted ContentDOI
23 May 2023
TL;DR: In this paper, the authors propose a framework called Reparo for creating loss-resilient video conferencing using generative deep learning models, which involves generating missing information when a frame or part of a frame is lost.
Abstract: Loss of packets in video conferencing often results in poor quality and video freezing. Attempting to retransmit the lost packets is usually not practical due to the requirement for real-time playback. Using Forward Error Correction (FEC) to recover the lost packets is challenging since it is difficult to determine the appropriate level of redundancy. In this paper, we propose a framework called Reparo for creating loss-resilient video conferencing using generative deep learning models. Our approach involves generating missing information when a frame or part of a frame is lost. This generation is conditioned on the data received so far, and the model's knowledge of how people look, dress, and interact in the visual world. Our experiments on publicly available video conferencing datasets show that Reparo outperforms state-of-the-art FEC-based video conferencing in terms of both video quality (measured by PSNR) and video freezes.

Posted ContentDOI
16 May 2023
TL;DR: Wen et al., as mentioned in this paper, proposed a quality assessment model specialized for low-light video enhancement, named Light-VQA, which handcrafts corresponding features and integrates them with deep-learning-based semantic features as the overall spatial information.
Abstract: Recently, User-Generated Content (UGC) videos have become ubiquitous in our daily lives. However, due to the limitations of photographic equipment and techniques, UGC videos often contain various degradations, among which one of the most visually unfavorable effects is underexposure. Therefore, corresponding video enhancement algorithms such as Low-Light Video Enhancement (LLVE) have been proposed to deal with this specific degradation. However, different from video enhancement algorithms, almost all existing Video Quality Assessment (VQA) models are built generally rather than specifically, measuring the quality of a video from a comprehensive perspective. To the best of our knowledge, there is no VQA model specially designed for videos enhanced by LLVE algorithms. To this end, we first construct a Low-Light Video Enhancement Quality Assessment (LLVE-QA) dataset in which 254 original low-light videos are collected and then enhanced by leveraging 8 LLVE algorithms to obtain 2,060 videos in total. Moreover, we propose a quality assessment model specialized in LLVE, named Light-VQA. More concretely, since brightness and noise have the most impact on low-light enhanced VQA, we handcraft corresponding features and integrate them with deep-learning-based semantic features as the overall spatial information. As for temporal information, in addition to deep-learning-based motion features, we also investigate the handcrafted brightness consistency among video frames, and the overall temporal information is their concatenation. Subsequently, spatial and temporal information is fused to obtain the quality-aware representation of a video. Extensive experimental results show that our Light-VQA achieves the best performance against the current state of the art (SOTA) on LLVE-QA and a public dataset. The dataset and code can be found at https://github.com/wenzhouyidu/Light-VQA.
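
A toy example of what handcrafted brightness and brightness-consistency features could look like for a grayscale clip; the definitions here are illustrative stand-ins, not Light-VQA's actual features.

```python
import numpy as np

def brightness_features(frames):
    """Toy handcrafted features over a grayscale video (T, H, W) in [0, 255]:
    the mean brightness, and a brightness-consistency score defined here as the
    std of frame-to-frame mean differences (lower = more consistent).
    These definitions are illustrative, not Light-VQA's actual features."""
    means = frames.reshape(frames.shape[0], -1).mean(axis=1)
    consistency = float(np.std(np.diff(means))) if len(means) > 1 else 0.0
    return {"mean_brightness": float(means.mean()),
            "brightness_consistency": consistency}

rng = np.random.default_rng(0)
video = rng.integers(10, 60, size=(30, 120, 160)).astype(np.float64)  # dark clip
print(brightness_features(video))
```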

Posted ContentDOI
22 Mar 2023
TL;DR: In this paper, a data-driven approach for modeling temporal distortions (e.g., frame freezes or skips) that occur during videoconferencing calls is presented.
Abstract: Current state-of-the-art video quality models, such as VMAF, give excellent prediction results by comparing the degraded video with its reference video. However, they do not consider temporal distortions (e.g., frame freezes or skips) that occur during videoconferencing calls. In this paper, we present a data-driven approach for modeling such distortions automatically by training an LSTM with subjective quality ratings labeled via crowdsourcing. The videos were collected from live videoconferencing calls in 83 different network conditions. We applied QR codes as markers on the source videos to create aligned references and compute temporal features based on the alignment vectors. Using these features together with VMAF core features, our proposed model achieves a PCC of 0.99 on the validation set. Furthermore, our model outputs per-frame quality that gives detailed insight into the cause of video quality impairments. The VCM model and dataset are open-sourced at https://github.com/microsoft/Video_Call_MOS.
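
A small sketch of deriving freeze/skip statistics from an alignment vector (the source frame index shown at each received frame), in the spirit of the temporal features described above; the exact features used by the model are not reproduced here.

```python
import itertools
import numpy as np

def temporal_features(alignment):
    """Simple freeze/skip statistics from an alignment vector: the source frame
    index displayed at each received frame. A sketch of the idea of features
    computed from alignment vectors, not the paper's exact feature set."""
    step = np.diff(np.asarray(alignment))
    freeze_runs = [len(list(g)) for v, g in itertools.groupby(step) if v == 0]
    return {
        "freeze_ratio": float(np.mean(step == 0)),  # repeated source frames
        "skip_ratio": float(np.mean(step > 1)),     # dropped source frames
        "longest_freeze": max(freeze_runs, default=0),
    }

# Example: playback freezes on source frame 3 for three extra frames, then skips.
print(temporal_features([0, 1, 2, 3, 3, 3, 3, 7, 8, 9]))
```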

Posted ContentDOI
16 Mar 2023
TL;DR: In this article, a video quality assessment dataset for perceptual video enhancement (VDPVE) is presented; it consists of 1211 videos with different enhancements, which can be divided into three sub-datasets.
Abstract: Recently, many video enhancement methods have been proposed to improve video quality from different aspects such as color, brightness, contrast, and stability. Therefore, how to evaluate the quality of the enhanced video in a way consistent with human visual perception is an important research topic. However, most video quality assessment methods mainly calculate video quality by estimating the distortion degrees of videos from an overall perspective. Few researchers have specifically proposed a video quality assessment method for video enhancement, and there is also no comprehensive video quality assessment dataset publicly available. Therefore, we construct a Video quality assessment dataset for Perceptual Video Enhancement (VDPVE) in this paper. The VDPVE has 1211 videos with different enhancements, which can be divided into three sub-datasets: the first sub-dataset has 600 videos with color, brightness, and contrast enhancements; the second sub-dataset has 310 videos with deblurring; and the third sub-dataset has 301 deshaked videos. We invited 21 subjects (20 valid subjects) to rate all enhanced videos in the VDPVE. After normalizing and averaging the subjective opinion scores, the mean opinion score of each video can be obtained. Furthermore, we split the VDPVE into a training set, a validation set, and a test set, and verify the performance of several state-of-the-art video quality assessment methods on the test set of the VDPVE.
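
The "normalizing and averaging" step for subjective scores is commonly done by z-scoring each subject's ratings and then averaging across subjects; a sketch of that generic recipe follows (the paper's exact normalization may differ).

```python
import numpy as np

def mean_opinion_scores(ratings):
    """ratings: (num_subjects, num_videos) raw opinion scores. Z-score-normalize
    each subject's ratings to remove individual bias and scale, map back to a
    score-like range, then average across subjects. A common recipe, not
    necessarily the VDPVE paper's exact procedure."""
    ratings = np.asarray(ratings, dtype=float)
    z = (ratings - ratings.mean(axis=1, keepdims=True)) / ratings.std(axis=1, keepdims=True)
    rescaled = z * ratings.std() + ratings.mean()
    return rescaled.mean(axis=0)  # one MOS per video

raw = [[3, 4, 5, 2], [2, 3, 5, 1], [4, 4, 5, 3]]  # 3 subjects x 4 videos
print(mean_opinion_scores(raw))
```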


Journal ArticleDOI
TL;DR: In this paper, the authors explore possible origins of the differences and assist the practitioner by defining minimum recommendations to ensure that the quality of the video data is maintained through the transcoding process.
Abstract: Video data received for analysis often come in a variety of file formats and compression schemes. These data are often transcoded to a consistent file format for forensic examination and/or ingestion into a video analytics system. The file format most often requested is MP4, a very common and universally accepted format. The practical application of this transcoding process, across the analytical community, has generated differences in video quality. This study sought to explore possible origins of the differences and assist the practitioner by defining minimum recommendations to ensure that the quality of the video data is maintained through the transcoding process. This study sought to generate real-world data by asking participants to transcode provided video files to an MP4 file format using programs they would typically utilize to perform this task. The transcoded results were evaluated based on measurable metrics of quality. As the results were analyzed, determining why these differences might have occurred became less about a particular software application and more about the settings employed by the practitioner or the capabilities of the program. This study supports the need for any video examiner who is transcoding video data to be cognizant of the settings utilized by the programs employed for transcoding video data, as loss of video quality can affect analytics as well as further analysis.
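
As an illustration of quality-conscious transcoding settings of the kind the study recommends being aware of, the following Python snippet invokes ffmpeg with a constant-quality (CRF) H.264 encode. The file names are placeholders and the specific values are examples, not the study's recommended settings.

```python
import subprocess

# Illustrative ffmpeg transcode to MP4/H.264 using a quality-oriented CRF mode
# rather than a low default bitrate; values are examples, not recommendations.
cmd = [
    "ffmpeg", "-i", "evidence_input.avi",   # placeholder input file
    "-c:v", "libx264",
    "-crf", "18",            # near-visually-lossless constant-quality target
    "-preset", "slow",       # slower preset = better quality per bit
    "-pix_fmt", "yuv420p",   # keep broad player compatibility
    "-c:a", "copy",          # do not re-encode audio
    "transcoded_output.mp4",
]
subprocess.run(cmd, check=True)
```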

Proceedings ArticleDOI
17 Apr 2023
TL;DR: In this article, an approach based on deep contextual video compression is used to increase efficiency; it relies heavily on specialized algorithms for extracting additional information characterizing the difference between closely spaced frames.
Abstract: The structure of existing neural network video compression methods in most cases includes predictive encoding, which uses a subtraction operation between the predicted and current frames to remove redundancy. To increase efficiency, an approach based on deep contextual video compression is used. In addition to the difference frame, this approach relies heavily on specialized algorithms for extracting additional information characterizing the difference between closely spaced frames. The use of context in this case makes it possible to achieve better reconstruction quality for video sequences, in particular for complex textures with a large number of high frequencies. This implies that the proposed method can potentially lead to significant savings in storage and transmission costs while maintaining high-quality video output. This article presents the results of computational experiments to evaluate the effectiveness of the investigated method of deep contextual video compression on real video sequences. Experimental findings demonstrate the advantages of the considered technique in PSNR/bpp coordinates when compared to the performance of three common video codecs: H.264, H.265, and VP9.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a novel NR-VQA scheme using systematic sampling of spatiotemporal planes (XY, XT, and YT) based on the high standard deviation (σ) of their high-frequency bands to represent distortion.
Abstract: Due to the growing demand for high-quality video services in 4G and 5G applications, measuring the quantitative quality of video services is expected to become a vital task. The no-reference video quality assessment (NR-VQA) work published so far regresses computationally complex statistical transforms or convolutional neural network (CNN) features to predict a quality score. In this paper, we propose a novel NR-VQA scheme using systematic sampling of spatiotemporal planes (XY, XT, and YT) based on the high standard deviation (σ) of their high-frequency bands to represent distortion. The human visual system (HVS) is highly sensitive to structural information in visual scenes, and distortions disrupt the structural properties. The proposed scheme encodes two-level, three-dimensional structural video information using novel Local Spatiotemporal Tetra Patterns (LSTP) on the sampled highest-σ planes from each block of planes. In addition, we extract quality-aware deep features from the second-highest-σ sampled video frames (XY-spatial) from each block using a fine-tuned CNN model. The extracted LSTP and deep quality-aware features of the two highest-σ frames are average pooled and concatenated with the top hundred σ values of other frames to form video-level final features. Finally, the concatenated features are fed to a support vector regressor (SVR) to predict the perceptual quality scores of test videos. The proposed method is evaluated on ten publicly available standard exhaustive VQA databases containing synthetic, authentic, and mixed distortions. Comprehensive, robust, and extensive experiments indicate that the proposed model outperforms all the state-of-the-art VQA models and is consistent with human subjective assessment.
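
The plane-sampling idea, picking the slices whose high-frequency content has the largest standard deviation, can be sketched as follows; the Laplacian used as the high-frequency band here is an illustrative stand-in for the paper's filtering.

```python
import numpy as np
from scipy.ndimage import laplace

def sample_planes_by_highfreq_std(planes, top_k=2):
    """planes: array (N, H, W) of spatiotemporal slices (e.g., XY, XT, or YT
    planes from one block of frames). Rank planes by the standard deviation of
    a simple high-frequency band (Laplacian response) and return the indices of
    the top_k planes. The band-pass choice here is an illustrative stand-in."""
    sigmas = [laplace(p.astype(np.float64)).std() for p in planes]
    return np.argsort(sigmas)[::-1][:top_k]

rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0, 1, 64), np.linspace(0, 1, 64))  # low detail
textured = rng.normal(size=(64, 64))                              # high detail
print(sample_planes_by_highfreq_std(np.stack([smooth, textured]), top_k=1))  # [1]
```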

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an efficient rate control (ERC) model, which uses texture- and motion-based spatial-temporal information to guide the bit allocation at the CTU level.
Abstract: Despite the fact that Versatile Video Coding (VVC) has achieved superior coding performance, two major problems remain for the rate control (RC) model in VVC. First, the regions that attract human visual attention are not rendered clearly enough in the coded video due to the deviation between the target bit allocation strategy of the coding tree unit (CTU) in RC and the human visual attention mechanism (HVAM). Second, there are significant quality fluctuations in the coded video frames due to an inappropriate model updating speed. To address the above problems, we propose an efficient rate control (ERC) model. Specifically, in order to make the coded video more consistent with the attention of human eyes, we extract texture- and motion-based spatial-temporal information to guide the bit allocation at the CTU level. Furthermore, based on the quasi-Newton algorithm and bit error, we propose an adaptive parameter updating (APU) method with the proper updating speed to precisely control the bits per frame. The proposed ERC outperforms the default RC model of VVC Test Model (VTM) 9.1 by saving the average Bjøntegaard Delta Rate (BD-Rate) on full-frame video sequences by 3.60% and 4.94% under low delay P (LDP) and random access (RA) configurations, respectively, with higher bitrate accuracy. Moreover, the Peak Signal-to-Noise Ratio (PSNR) and actual coded bits per frame in the video coded by the proposed ERC are more stable.
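
The CTU-level bit allocation guided by spatial-temporal importance can be illustrated as a simple proportional split of a frame's bit budget; this sketches only the allocation idea, not VTM's actual rate control model.

```python
import numpy as np

def allocate_ctu_bits(frame_budget_bits, st_weights):
    """Distribute a frame's bit budget over CTUs in proportion to per-CTU
    spatial-temporal importance weights (e.g., derived from texture and motion).
    This shows only the proportional-allocation idea, not VTM's RC model."""
    w = np.asarray(st_weights, dtype=float)
    w = w / w.sum()
    return np.round(frame_budget_bits * w).astype(int)

# Four CTUs; the second and third are deemed more salient (texture/motion).
print(allocate_ctu_bits(120_000, [0.5, 2.0, 1.5, 1.0]))  # [12000 48000 36000 24000]
```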