Showing papers on "Video quality published in 2022"


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a no-reference video quality assessment method that imposes Gaussian distribution constraints on spatial quality features to reduce the domain gap between different video samples, together with a pyramid temporal aggregation module that uses short-term and long-term memory to aggregate frame-level quality.
Abstract: In this work, we propose a no-reference video quality assessment method, aiming to achieve high-generalization capability in cross-content, -resolution and -frame rate quality prediction. In particular, we evaluate the quality of a video by learning effective feature representations in spatial-temporal domain. In the spatial domain, to tackle the resolution and content variations, we impose the Gaussian distribution constraints on the quality features. The unified distribution can significantly reduce the domain gap between different video samples, resulting in more generalized quality feature representation. Along the temporal dimension, inspired by the mechanism of visual perception, we propose a pyramid temporal aggregation module by involving the short-term and long-term memory to aggregate the frame-level quality. Experiments show that our method outperforms the state-of-the-art methods on cross-dataset settings, and achieves comparable performance on intra-dataset configurations, demonstrating the high-generalization capability of the proposed method. The codes are released at https://github.com/Baoliang93/GSTVQA

19 citations
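
The pyramid temporal aggregation described above can be illustrated with a small numerical sketch: frame-level quality scores are pooled over sliding windows of several lengths (a stand-in for short-term memory) and over the whole sequence (long-term memory), and the levels are then combined. This is only an illustration of the idea, not the released GSTVQA code; the window sizes and the min/mean pooling operators are assumptions.

```python
import numpy as np

def pyramid_temporal_aggregation(frame_scores, window_sizes=(4, 8, 16)):
    """Aggregate frame-level quality scores into a single video score.

    Short-term memory: worst-case (min) pooling inside sliding windows of
    several lengths; long-term memory: the global mean. The pooling
    operators and window sizes here are illustrative assumptions.
    """
    scores = np.asarray(frame_scores, dtype=float)
    levels = []
    for w in window_sizes:
        if len(scores) < w:
            continue
        # min over each sliding window, then averaged across the sequence
        windows = np.lib.stride_tricks.sliding_window_view(scores, w)
        levels.append(windows.min(axis=1).mean())
    levels.append(scores.mean())          # long-term component
    return float(np.mean(levels))

# toy usage: 30 frames with a short quality dip
scores = np.concatenate([np.full(10, 4.0), np.full(5, 2.0), np.full(15, 4.0)])
print(round(pyramid_temporal_aggregation(scores), 3))
```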


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a method for mining video frames from the buffered testing video stream to construct a fine-tuning set, whose VSOD results can be directly applied as pseudo-learning objectives to fine-tune a completely new spatial model that has been pretrained on the DAVIS-TR set.
Abstract: Previous video salient object detection (VSOD) approaches have mainly focused on the perspective of network design for achieving performance improvements. However, with the recent slowdown in the development of deep learning techniques, it might become increasingly difficult to anticipate another breakthrough solely via complex networks. Therefore, this paper proposes a universal learning scheme to obtain a further 3% performance improvement for all state-of-the-art (SOTA) VSOD models. The major highlight of our method is that we propose ‘motion quality’, a new concept for mining video frames from the ‘buffered’ testing video stream to construct a fine-tuning set. Using our approach, the salient objects of all frames in this set are well detected by the ‘target SOTA model’ (the one we want to improve). Thus, the VSOD results of the mined set, which were previously derived by the target SOTA model, can be directly applied as pseudo-learning objectives to fine-tune a completely new spatial model that has been pretrained on the widely used DAVIS-TR set. Since some spatial scenes of the buffered testing video stream have already been seen, the fine-tuned spatial model can perform very well on the remaining unseen testing frames, outperforming the target SOTA model significantly. Although offline model fine-tuning requires additional time costs, the performance gain can still benefit scenarios without speed requirements. Moreover, its semisupervised methodology might have considerable potential to inspire the VSOD community in the future.

13 citations


Proceedings ArticleDOI
08 Jul 2022
TL;DR: Wang et al. as mentioned in this paper proposed a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation, which can be applied to any dataset without parameter tuning.
Abstract: With the rapid growth of in-the-wild videos taken by non-specialists, blind video quality assessment (VQA) has become a challenging and demanding problem. Although much effort has been made to solve this problem, it remains unclear how the human visual system (HVS) relates to the temporal quality of videos. Meanwhile, recent work has found that the frames of natural video transformed into the perceptual domain of the HVS tend to form a straight trajectory of the representations. With the obtained insight that distortion impairs the perceived video quality and results in a curved trajectory of the perceptual representation, we propose a temporal perceptual quality index (TPQI) to measure the temporal distortion by describing the graphic morphology of the representation. Specifically, we first extract the video perceptual representations from the lateral geniculate nucleus (LGN) and primary visual area (V1) of the HVS, and then measure the straightness and compactness of their trajectories to quantify the degradation in naturalness and content continuity of video. Experiments show that the perceptual representation in the HVS is an effective way of predicting subjective temporal quality, and thus TPQI can, for the first time, achieve comparable performance to the spatial quality metric and be even more effective in assessing videos with large temporal variations. We further demonstrate that by combining with NIQE, a spatial quality metric, TPQI can achieve top performance over popular in-the-wild video datasets. More importantly, TPQI does not require any additional information beyond the video being evaluated and thus can be applied to any dataset without parameter tuning. Source code is available at https://github.com/UoLMM/TPQI-VQA.

13 citations
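
The geometric idea behind TPQI can be sketched numerically: given per-frame representations, straightness relates to the angles between successive displacement vectors and compactness to their lengths. The sketch below is a toy version of that geometry; the actual TPQI extracts LGN and V1 representations and combines the two terms differently.

```python
import numpy as np

def trajectory_stats(reps):
    """reps: (T, D) array of per-frame representations.
    Returns the mean curvature angle (radians) and mean step length."""
    reps = np.asarray(reps, dtype=float)
    diffs = np.diff(reps, axis=0)                       # displacement vectors
    norms = np.linalg.norm(diffs, axis=1) + 1e-12
    unit = diffs / norms[:, None]
    cosines = np.clip(np.sum(unit[:-1] * unit[1:], axis=1), -1.0, 1.0)
    curvature = np.arccos(cosines).mean()               # 0 => straight trajectory
    compactness = norms.mean()
    return curvature, compactness

# toy usage: a straight trajectory vs a noisy (curved) one
t = np.linspace(0, 1, 20)[:, None]
straight = np.hstack([t, 2 * t])
noisy = straight + 0.05 * np.random.default_rng(0).standard_normal(straight.shape)
print(trajectory_stats(straight)[0], trajectory_stats(noisy)[0])
```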


Proceedings ArticleDOI
10 Oct 2022
TL;DR: Wang et al. as mentioned in this paper proposed an end-to-end spatial feature extraction network to directly learn the quality-aware spatial feature representation from raw pixels of the video frames.
Abstract: Quality assessment for User Generated Content (UGC) videos plays an important role in ensuring the viewing experience of end-users. Previous UGC video quality assessment (VQA) studies either use the image recognition model or the image quality assessment (IQA) models to extract frame-level features of UGC videos for quality regression, which are regarded as sub-optimal solutions because of the domain shifts between these tasks and the UGC VQA task. In this paper, we propose a very simple but effective UGC VQA model, which tries to address this problem by training an end-to-end spatial feature extraction network to directly learn the quality-aware spatial feature representation from raw pixels of the video frames. We also extract the motion features to measure the temporal-related distortions that the spatial features cannot model. The proposed model utilizes very sparse frames to extract spatial features and dense frames (i.e. the video chunk) with a very low spatial resolution to extract motion features, which thereby has low computational complexity. With the better quality-aware features, we only use a simple multilayer perceptron (MLP) network to regress them into the chunk-level quality scores, and then the temporal average pooling strategy is adopted to obtain the video-level quality score. We further introduce a multi-scale quality fusion strategy to solve the problem of VQA across different spatial resolutions, where the multi-scale weights are obtained from the contrast sensitivity function of the human visual system. The experimental results show that the proposed model achieves the best performance on five popular UGC VQA databases, which demonstrates the effectiveness of the proposed model.

13 citations
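
The final scoring stage described above, chunk-level regression followed by temporal average pooling and multi-scale fusion, can be sketched as follows. The uniform default weights are placeholders; the paper derives its weights from the contrast sensitivity function.

```python
import numpy as np

def video_score(chunk_scores):
    """Temporal average pooling of chunk-level quality predictions."""
    return float(np.mean(chunk_scores))

def multiscale_fusion(scores_per_scale, weights=None):
    """Fuse quality scores predicted at several spatial resolutions.

    The paper derives the weights from the contrast sensitivity function;
    the uniform default used here is just a placeholder assumption.
    """
    scores = np.asarray(scores_per_scale, dtype=float)
    w = np.ones_like(scores) if weights is None else np.asarray(weights, float)
    return float(np.sum(w * scores) / np.sum(w))

# toy usage: three chunks scored at three resolutions
chunks_at_scale = [[3.8, 4.1, 4.0], [3.5, 3.9, 3.7], [3.2, 3.6, 3.4]]
per_scale = [video_score(c) for c in chunks_at_scale]
print(round(multiscale_fusion(per_scale, weights=[0.5, 0.3, 0.2]), 3))
```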


Journal ArticleDOI
TL;DR: In this article, the authors provide a comparative subjective quality evaluation between VVC and HEVC for 8K resolution videos, and evaluate the perceived quality improvement offered by 8K over UHD 4K resolution.
Abstract: With the growing data consumption of emerging video applications and users’ requirement for higher resolutions, up to 8K, a huge effort has been made in video compression technologies. Recently, versatile video coding (VVC) has been standardized by the Moving Picture Experts Group (MPEG), providing a significant improvement in compression performance over its predecessor, high efficiency video coding (HEVC). In this paper, we provide a comparative subjective quality evaluation between the VVC and HEVC standards for 8K resolution videos. In addition, we evaluate the perceived quality improvement offered by 8K over UHD 4K resolution. The compression performance of both the VVC and HEVC standards has been evaluated in the random access (RA) coding configuration, using their respective reference software, the VVC test model (VTM-11) and the HEVC test model (HM-16.20). Objective measurements using the PSNR, MS-SSIM and VMAF metrics have shown that the bitrate gains offered by VVC over HEVC for 8K video content are around 31%, 26% and 35%, respectively. Subjectively, VVC offers an average of around 41% bitrate reduction over HEVC for the same visual quality. A compression gain of 50% has been reached for some tested video sequences according to a Student’s t-test analysis. In addition, for most tested scenes, a significant visual difference between uncompressed 4K and 8K has been noticed.

12 citations
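
Bitrate savings of this kind are conventionally reported as Bjøntegaard delta rates computed from matched rate-quality points. The sketch below is the standard cubic-fit BD-rate calculation, shown with made-up numbers, and is not the exact tooling used in the paper.

```python
import numpy as np

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Bjøntegaard delta rate (%) of the test codec vs. the reference.

    Fits log-rate as a cubic polynomial of quality and integrates the gap
    over the overlapping quality range; negative values mean bitrate savings.
    """
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(quality_ref, lr_ref, 3)
    p_test = np.polyfit(quality_test, lr_test, 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100

# toy usage with made-up rate/PSNR points (kbps, dB)
print(round(bd_rate([1000, 2000, 4000, 8000], [34, 37, 40, 43],
                    [700, 1400, 2800, 5600], [34, 37, 40, 43]), 2))
```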


Journal ArticleDOI
TL;DR: In this article, the effects of visual attention on the QoE of VR 360-degree videos were evaluated through subjective tests where participants watched degraded versions of 360-videos through a Head-Mounted Display with integrated eye-tracking sensors.
Abstract: The research domain on the Quality of Experience (QoE) of 2D video streaming has been well established. However, a new video format is emerging and gaining popularity and availability: VR 360-degree video. The processing and transmission of 360-degree videos brings along new challenges such as large bandwidth requirements and the occurrence of different distortions. The viewing experience is also substantially different from 2D video: it offers more interactive freedom on the viewing angle but can also be more demanding and cause cybersickness. The first goal of this article is to complement earlier research by Tran et al. (2017) [39] testing the effects of quality degradation, freezing, and content on the QoE of 360-videos. The second goal is to test the contribution of visual attention as an influence factor in the QoE assessment. Data was gathered through subjective tests where participants watched degraded versions of 360-videos through a Head-Mounted Display with integrated eye-tracking sensors. After each video they answered questions regarding their quality perception, experience, perceptual load, and cybersickness. Our results showed that the participants rated the overall QoE rather low, and the ratings decreased with added degradations and freezing events. Cybersickness was found not to be an issue. The effects of the manipulations on visual attention were minimal. Attention was mainly directed by content, but also by surprising elements. The addition of eye-tracking metrics did not further explain individual differences in subjective ratings. Nevertheless, it was found that looking at moving objects increased the negative effect of freezing events and made participants less sensitive to quality distortions. More research is needed to conclude whether visual attention is an influence factor on the QoE in 360-video.

11 citations


Journal ArticleDOI
TL;DR: This paper provides a comprehensive overview of work in the field of source video identification by examining existing techniques, such as photo response non-uniformity (PRNU) and machine learning approaches.

11 citations


Journal ArticleDOI
TL;DR: This paper proposes a relay-assisted cognitive radio network that uses the multiple description coding technique for video transmission, showing a 9% average improvement in outage performance over the non-relay network.
Abstract: Multimedia content delivery, such as video transmission over wireless networks, imposes significant challenges, including spectrum capacity and packet losses. Cognitive radio (CR) technology was developed to address the spectrum issue, while multiple description coding (MDC) is one of the promising source coding techniques to alleviate packet loss problems and exploit the benefit of path diversity. In MDC, the source information is split into several descriptions, which are then transmitted over a network with multiple paths. The quality of the received data increases with the number of descriptions received at the receiver. In this paper, the proposed system comprises a relay-assisted cognitive radio network using the MDC technique for video transmission. In the simulations, the outage performance of the MDC scheme was compared over two networks: a relay-assisted network and a non-relay network. Then, the outage probability was used to estimate the video quality, i.e., the peak signal-to-noise ratio (PSNR) of the received video. The results obtained show the benefit of the relay-assisted network, with a 9% improvement in average outage performance over the non-relay network. Furthermore, the video performance improved by an average of 9% in PSNR compared to the non-relay system.

11 citations
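
The outage comparison can be illustrated with a small Monte-Carlo sketch over Rayleigh fading, where the relayed path is limited by its weaker hop and the receiver keeps the better of the direct and relayed links. The SNR values, target rate and decode-and-forward model are illustrative assumptions, not the system model of the paper.

```python
import numpy as np

def outage_prob(avg_snr_direct, avg_snr_hop, rate=1.0, use_relay=True, n=200_000, seed=0):
    """Monte-Carlo outage probability over Rayleigh fading.

    The relayed path is modeled as decode-and-forward: it is limited by the
    weaker of its two hops, and the receiver keeps the better of the direct
    and relayed links. All SNR values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    thr = 2 ** rate - 1                              # SNR threshold for the target rate
    snr_d = avg_snr_direct * rng.exponential(size=n)
    if use_relay:
        hop1 = avg_snr_hop * rng.exponential(size=n)
        hop2 = avg_snr_hop * rng.exponential(size=n)
        snr = np.maximum(snr_d, np.minimum(hop1, hop2))
    else:
        snr = snr_d
    return float(np.mean(snr < thr))

print(outage_prob(4.0, 8.0, use_relay=False), outage_prob(4.0, 8.0, use_relay=True))
```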


Journal ArticleDOI
01 Mar 2022-Sensors
TL;DR: A novel, innovative deep learning-based approach for NR-VQA that relies on a set of pre-trained convolutional neural networks applied in parallel to versatilely characterize the potential image and video distortions, setting a new state of the art on two large benchmark video quality assessment databases with authentic distortions.
Abstract: With the constantly growing popularity of video-based services and applications, no-reference video quality assessment (NR-VQA) has become a very hot research topic. Over the years, many different approaches have been introduced in the literature to evaluate the perceptual quality of digital videos. Due to the advent of large benchmark video quality assessment databases, deep learning has attracted a significant amount of attention in this field in recent years. This paper presents a novel, innovative deep learning-based approach for NR-VQA that relies on a set of pre-trained convolutional neural networks (CNNs) applied in parallel to versatilely characterize the potential image and video distortions. Specifically, temporally pooled and saliency weighted video-level deep features are extracted with the help of a set of pre-trained CNNs and mapped onto perceptual quality scores independently from each other. Finally, the quality scores coming from the different regressors are fused together to obtain the perceptual quality of a given video sequence. Extensive experiments demonstrate that the proposed method sets a new state of the art on two large benchmark video quality assessment databases with authentic distortions. Moreover, the presented results underline that the decision fusion of multiple deep architectures can significantly benefit NR-VQA.

10 citations
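
The pooling and fusion steps described above can be sketched as follows: per-frame deep features are pooled with saliency weights into a video-level feature, and the scores produced by the separate regressors are fused. The plain averaging used for fusion here is an assumption for illustration.

```python
import numpy as np

def saliency_weighted_pooling(frame_features, saliency):
    """Pool per-frame CNN features into a video-level feature vector,
    weighting each frame by its (normalized) saliency."""
    feats = np.asarray(frame_features, dtype=float)     # (T, D)
    w = np.asarray(saliency, dtype=float)
    w = w / (w.sum() + 1e-12)
    return feats.T @ w                                   # (D,)

def fuse_scores(per_network_scores):
    """Decision fusion of the quality scores coming from the different
    regressors; plain averaging here is an illustrative assumption."""
    return float(np.mean(per_network_scores))

# toy usage: 10 frames, 16-D features from one backbone
rng = np.random.default_rng(1)
video_feat = saliency_weighted_pooling(rng.random((10, 16)), rng.random(10))
print(video_feat.shape, fuse_scores([3.9, 4.2, 4.0]))
```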


Journal ArticleDOI
TL;DR: A novel weakly supervised solution to video moment localization by introducing Contrastive Negative sample Mining (CNM), which uses a learnable Gaussian mask to generate positive samples, highlighting the video frames most related to the query, and considers other frames of the video and the whole video as easy and hard negative samples respectively.
Abstract: Video moment localization aims at localizing the video segments which are most related to the given free-form natural language query. The weakly supervised setting, where only video-level description is available during training, is getting more and more attention due to its lower annotation cost. Prior weakly supervised methods mainly use sliding windows to generate temporal proposals, which are independent of video content and of low quality, and train the model to distinguish matched video-query pairs from unmatched ones collected from different videos, while neglecting that what the model really needs is to distinguish the unaligned segments within the video. In this work, we propose a novel weakly supervised solution by introducing Contrastive Negative sample Mining (CNM). Specifically, we use a learnable Gaussian mask to generate positive samples, highlighting the video frames most related to the query, and consider other frames of the video and the whole video as easy and hard negative samples respectively. We then train our network with the Intra-Video Contrastive loss to make our positive and negative samples more discriminative. Our method has two advantages: (1) Our proposal generation process with a learnable Gaussian mask is more efficient and makes our positive samples higher quality. (2) The more difficult intra-video negative samples enable our model to distinguish highly confusing scenes. Experiments on two datasets show the effectiveness of our method. Code can be found at https://github.com/minghangz/cnm.

10 citations
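
The positive/negative construction can be sketched with a toy example: a Gaussian mask over the frame axis defines the positive moment, the remaining frames act as easy negatives and the whole video as a hard negative, and a softmax-style contrastive loss favors the positive. The exact loss in CNM differs in detail; the similarity values below are made up.

```python
import numpy as np

def gaussian_mask(num_frames, center, width):
    """Soft mask over frames, peaked at the predicted moment center."""
    t = np.linspace(0, 1, num_frames)
    return np.exp(-0.5 * ((t - center) / width) ** 2)

def intra_video_contrastive_loss(frame_sims, mask, tau=0.1):
    """Toy contrastive objective: the masked (positive) frames should match
    the query better than the unmasked frames of the same video (easy
    negatives) and the whole video (hard negative). frame_sims holds
    query-frame similarities; the exact CNM loss differs in detail."""
    pos = np.sum(mask * frame_sims) / np.sum(mask)
    easy_neg = np.sum((1 - mask) * frame_sims) / np.sum(1 - mask)
    hard_neg = np.mean(frame_sims)
    logits = np.array([pos, easy_neg, hard_neg]) / tau
    log_probs = logits - np.log(np.sum(np.exp(logits)))
    return float(-log_probs[0])          # maximize probability of the positive

sims = np.array([0.2, 0.3, 0.8, 0.9, 0.7, 0.3, 0.2, 0.1])
print(round(intra_video_contrastive_loss(sims, gaussian_mask(8, 0.4, 0.1)), 3))
```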


Journal ArticleDOI
TL;DR: In this article , the authors provide 1,081 hours of time-synchronous video measurements at network, transport, and application layer with the native YouTube streaming client on mobile devices, including 80 network scenarios with 171 individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3 G/4 G traces, and 4,022 runs with pre-defined bandwidth changes.
Abstract: Around 4.9 billion Internet users worldwide watch billions of hours of online video every day. As a result, streaming is by far the predominant type of traffic in communication networks. According to Google statistics, three out of five video views come from mobile devices. Thus, in view of the continuous technological advances in end devices and increasing mobile use, datasets for mobile streaming are indispensable in research but only sparsely dealt with in the literature so far. With this public dataset, we provide 1,081 hours of time-synchronous video measurements at network, transport, and application layer with the native YouTube streaming client on mobile devices. The dataset includes 80 network scenarios with 171 individual bandwidth settings measured in 5,181 runs with limited bandwidth, 1,939 runs with emulated 3G/4G traces, and 4,022 runs with pre-defined bandwidth changes. This corresponds to 332 GB of video payload. We present the most relevant quality indicators for scientific use, i.e., initial playback delay, streaming video quality, adaptive video quality changes, video rebuffering events, and streaming phases.
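
Quality indicators such as those listed above are typically derived from a per-run playback event log. The sketch below shows one way to compute initial playback delay, rebuffering statistics and quality switches from such a log; the event names and log format are assumptions, not the schema of this dataset.

```python
def streaming_indicators(events):
    """Derive basic streaming quality indicators from a playback event log.

    events: list of (timestamp_s, event, value) tuples with events such as
    'play', 'stall', 'resume' and 'quality' (value = representation height).
    The event names are illustrative, not the schema used in the dataset."""
    start = events[0][0]
    initial_delay = next(t for t, e, _ in events if e == "play") - start
    stalls, stall_time, last_stall = 0, 0.0, None
    qualities, switches, last_q = [], 0, None
    for t, e, v in events:
        if e == "stall":
            stalls, last_stall = stalls + 1, t
        elif e == "resume" and last_stall is not None:
            stall_time += t - last_stall
            last_stall = None
        elif e == "quality":
            if last_q is not None and v != last_q:
                switches += 1
            qualities.append(v)
            last_q = v
    return {"initial_playback_delay_s": initial_delay,
            "rebuffering_events": stalls,
            "rebuffering_time_s": stall_time,
            "quality_switches": switches,
            "mean_quality": sum(qualities) / len(qualities)}

log = [(0.0, "request", None), (1.8, "play", None), (1.8, "quality", 720),
       (20.0, "stall", None), (22.5, "resume", None), (22.5, "quality", 480),
       (60.0, "quality", 720)]
print(streaming_indicators(log))
```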

Journal ArticleDOI
TL;DR: In this paper, a review and classification of the latest research work on RR-based image and video quality assessment is presented; quality assessment approaches in general can be divided into three main categories: full reference (FR), reduced reference (RR) and no-reference (NR).
Abstract: With the growing demand for image and video-based applications, the requirements of consistent quality assessment metrics of image and video have increased. Different approaches have been proposed in the literature to estimate the perceptual quality of images and videos. These approaches can be divided into three main categories: full reference (FR), reduced reference (RR) and no-reference (NR). In RR methods, instead of providing the original image or video as a reference, we need to provide certain features (i.e., texture, edges, etc.) of the original image or video for quality assessment. During the last decade, RR-based quality assessment has been a popular research area for a variety of applications such as social media, online games, and video streaming. In this paper, we present a review and classification of the latest research work on RR-based image and video quality assessment. We have also summarized different databases used in the field of 2D and 3D image and video quality assessment. This paper would be helpful for specialists and researchers to stay well-informed about recent progress of RR-based image and video quality assessment. The review and classification presented in this paper will also be useful to gain understanding of multimedia quality assessment and state-of-the-art approaches used for the analysis. In addition, it will help the reader select appropriate quality assessment methods and parameters for their respective applications.


Journal ArticleDOI
TL;DR: The ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database as mentioned in this paper contains 437 videos generated by applying various levels of combined space-time subsampling and video compression on 15 diverse video contents.
Abstract: Video dimensions are continuously increasing to provide more realistic and immersive experiences to global streaming and social media viewers. However, increments in video parameters such as spatial resolution and frame rate are inevitably associated with larger data volumes. Transmitting increasingly voluminous videos through limited bandwidth networks in a perceptually optimal way is a current challenge affecting billions of viewers. One recent practice adopted by video service providers is space-time resolution adaptation in conjunction with video compression. Consequently, it is important to understand how different levels of space-time subsampling and compression affect the perceptual quality of videos. Towards making progress in this direction, we constructed a large new resource, called the ETRI-LIVE Space-Time Subsampled Video Quality (ETRI-LIVE STSVQ) database, containing 437 videos generated by applying various levels of combined space-time subsampling and video compression on 15 diverse video contents. We also conducted a large-scale human study on the new dataset, collecting about 15,000 subjective judgments of video quality. We provide a rate-distortion analysis of the collected subjective scores, enabling us to investigate the perceptual impact of space-time subsampling at different bit rates. We also evaluated and compared the performance of leading video quality models on the new database.

Journal ArticleDOI
TL;DR: In this paper, a new objective quality metric is proposed that is adapted to the complex characteristics of immersive video (IV), which is prone to errors caused by the processing and compression of multiple input views and by virtual view synthesis.
Abstract: This paper presents a new objective quality metric that was adapted to the complex characteristics of immersive video (IV), which is prone to errors caused by processing and compression of multiple input views and virtual view synthesis. The proposed metric, IV-PSNR, contains two techniques that allow for the evaluation of quality loss for typical immersive video distortions: corresponding pixel shift and global component difference. The performed experiments compared the proposal with 31 state-of-the-art quality metrics, showing their performance in the assessment of quality in immersive video coding and processing, and in other applications, using the commonly used image quality assessment databases TID2013 and CVIQ. As presented, IV-PSNR outperforms other metrics in immersive video applications and can still be efficiently used in the evaluation of different images and videos. Moreover, basing the metric on the calculation of PSNR allowed the computational complexity to remain low. A publicly available, efficient implementation of the IV-PSNR software was provided by the authors of this paper and is used by ISO/IEC MPEG for evaluation and research on the upcoming MPEG Immersive video (MIV) coding standard.
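
The two techniques named above can be approximated in a few lines: search a small window around each pixel for the best-matching reference sample (corresponding pixel shift) and remove the global mean difference before computing PSNR (global component difference). This is a simplified single-channel illustration, not the reference IV-PSNR software; the window size and offset handling are assumptions.

```python
import numpy as np

def iv_psnr_like(ref, test, shift=2, peak=255.0):
    """PSNR variant tolerant to small pixel shifts and a global offset.

    For every test pixel the error is taken against the best-matching
    reference pixel inside a (2*shift+1)^2 window, after removing the
    global mean difference between the images. Illustrative only."""
    ref = ref.astype(float) - (ref.mean() - test.mean())   # global component difference
    test = test.astype(float)
    h, w = test.shape
    best_err = np.full((h, w), np.inf)
    for dy in range(-shift, shift + 1):
        for dx in range(-shift, shift + 1):
            shifted = np.roll(np.roll(ref, dy, axis=0), dx, axis=1)
            best_err = np.minimum(best_err, (shifted - test) ** 2)
    mse = best_err.mean() + 1e-12
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (64, 64)).astype(float)
moved = np.roll(img, 1, axis=1) + rng.normal(0, 2, img.shape)
print(round(iv_psnr_like(img, moved), 2))   # stays high despite the 1-pixel shift
```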

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a spatiotemporal modeling approach to evaluate the quality of omnidirectional video, in which the smoothed distortion value is recursively calculated and consolidated according to the characteristics of temporal variations.
Abstract: Omnidirectional video, also known as 360-degree video, has become increasingly popular nowadays due to its ability to provide immersive and interactive visual experiences. However, the ultra high resolution and the spherical observation space brought by the large spherical viewing range make omnidirectional video distinctly different from traditional 2D video. To date, video quality assessment (VQA) for omnidirectional video is still an open issue. The existing VQA metrics for omnidirectional video only consider the spatial characteristics of distortions, but the temporal change of spatial distortions can also considerably influence human visual perception. In this paper, we propose a spatiotemporal modeling approach to evaluate the quality of omnidirectional video. Firstly, we construct a spatiotemporal quality assessment unit to evaluate the average distortion in the temporal dimension at the eye fixation level, based upon which the smoothed distortion value is recursively calculated and consolidated by the characteristics of temporal variations. Then, we give a detailed solution of how to integrate the three existing spatial VQA metrics into our approach. Besides, the cross-format omnidirectional video distortion measurement is also investigated. Finally, the spatiotemporal distortion of the whole video sequence is obtained by pooling. Based on the modeling approach, a full reference objective quality assessment metric for omnidirectional video is derived, namely OV-PSNR. The experimental results show that our proposed OV-PSNR greatly improves the prediction performance of the existing VQA metrics for omnidirectional video.
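
The recursive temporal consolidation described above can be sketched as an asymmetric exponential smoother over per-frame distortion, followed by pooling into a PSNR-like score. The smoothing constants and the asymmetry rule are placeholders, not the values used by OV-PSNR.

```python
import numpy as np

def smoothed_distortion(frame_distortion, alpha_up=0.1, alpha_down=0.5):
    """Recursively smooth per-frame distortion values.

    Increases in distortion are followed quickly (alpha_down) while
    recoveries are absorbed slowly (alpha_up), mimicking the asymmetry of
    temporal quality perception. The constants are illustrative."""
    d = np.asarray(frame_distortion, dtype=float)
    s = np.empty_like(d)
    s[0] = d[0]
    for t in range(1, len(d)):
        a = alpha_down if d[t] > s[t - 1] else alpha_up
        s[t] = (1 - a) * s[t - 1] + a * d[t]
    return s

def pooled_score(frame_distortion, peak=255.0):
    """Pool the smoothed distortion of a sequence into a PSNR-like score."""
    mse = smoothed_distortion(frame_distortion).mean() + 1e-12
    return 10 * np.log10(peak ** 2 / mse)

print(round(pooled_score([20, 22, 21, 120, 118, 25, 22, 21]), 2))
```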

Journal ArticleDOI
TL;DR: In this paper, the authors used ambulatory assessment data from individuals with schizophrenia and controls over a week to assess factors influencing sufficiency and useability of video and audio data, including the effect of relevant variables on video provision and video quality.
Abstract: Ambulatory audio and video recording provides a wealth of information which can be used for a broad range of applications, including digital phenotyping, telepsychiatry, and telepsychology. However, these technologies are in their infancy, and guidelines for their use and analysis have yet to be established. The current project used ambulatory assessment data from individuals with schizophrenia (N = 52) and controls (N = 55) over a week to assess factors influencing sufficiency and useability of video and audio data. Logistic multilevel models examined the effect of relevant variables on video provision and video quality. There was no difference by group in video provision or quality. Videos were less likely to be provided later in the study and later in the day. Video quality was lower later in the day, particularly for controls. Participants were more likely to provide videos if alone or at home than in other settings. Black participants were less likely to have analyzable video frames than White participants. These results suggest potential racial disparities in camera technologies and/or facial analysis algorithms. Implications of these findings and recommendations for future study development, such as instructions to provide to participants to optimize video quality, are discussed.

Journal ArticleDOI
TL;DR: In this article , the authors investigated the user fairness power allocation for video transmission in a non-orthogonal multiple access (NOMA)-assisted wireless system with a pair of users.
Abstract: In this paper, we investigate the user fairness power allocation (PA) for video transmission in a non-orthogonal multiple access (NOMA)-assisted wireless system with a pair of users. Considering that video transmission is usually delay-sensitive, the video quality can be adaptively adjusted based on the network conditions. First, a new performance metric, called the average video quality degradation probability, is proposed to describe the adaptive video quality. Then, based on the principles of NOMA and the user queue model, the average video quality degradation probability of both users is derived. Next, a bisection-search-based PA algorithm is proposed to balance the video quality between the NOMA pair. Simulation results demonstrate that the proposed NOMA scheme is more effective in achieving user fairness than the fixed NOMA scheme, the fractional transmit power allocation (FTPA) based NOMA scheme, the achievable rate fairness (ARF) NOMA scheme, and the optimized time division multiple access (TDMA) scheme.
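
The bisection step can be sketched as follows: with the power fraction of user 1 as the variable, one user's degradation probability decreases while the other's increases, so their crossing point balances the pair. The toy probability curves below are stand-ins, not the expressions derived in the paper.

```python
def bisection_power_allocation(prob_user1, prob_user2, tol=1e-6):
    """Find the power fraction a in (0, 1) given to user 1 such that the two
    users' video-quality degradation probabilities are balanced.

    prob_user1(a) must be decreasing in a and prob_user2(a) increasing;
    bisection then finds the crossing point of the two curves."""
    lo, hi = 1e-6, 1 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_user1(mid) > prob_user2(mid):
            lo = mid        # user 1 still worse off: give it more power
        else:
            hi = mid
    return 0.5 * (lo + hi)

# toy stand-in degradation curves (assumptions, not the paper's expressions)
p1 = lambda a: (1 - a) ** 2          # improves as user 1 gets more power
p2 = lambda a: 0.5 * a ** 2          # degrades as user 2 loses power
a_star = bisection_power_allocation(p1, p2)
print(round(a_star, 4), round(p1(a_star), 4), round(p2(a_star), 4))
```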

Journal ArticleDOI
TL;DR: In this paper, the quality of information about vestibular schwannoma on YouTube was assessed using recognized scoring systems, and it was examined whether video quality metrics correlated with video popularity based on metadata analysis.
Abstract: Patients frequently use the internet to gain information and make decisions about their health conditions. This work aims to assess the quality of information about vestibular schwannoma on a popular video sharing platform, YouTube (Alphabet Inc.). The objective was to assess the quality of the most popular vestibular schwannoma videos using recognized scoring systems and to examine whether video quality metrics correlated with video popularity based on metadata analysis. The study was a cross-sectional analysis of publicly available videos. The YouTube website was systematically searched on separate days with a formal search strategy to identify videos relevant to vestibular schwannoma. Each video was viewed and scored by three independent assessors, using scores for quality and disease-specific accuracy. Popularity metrics were analyzed and compared to video quality. Patient surveys were conducted to further assess their perspectives of the included videos. A total of 23 YouTube videos were included. In terms of Essential and Ideal Video Completeness Criteria, the mean scores ranged from 4.8 to 5.0 (out of 12), indicating moderate video quality. The average DISCERN score ranged from 30.0 to 36.7, indicating lower reliability. The mean JAMA scores ranged from 1.96 to 2.48, indicating average quality. Based on metrics including the DISCERN and JAMA instruments, the information in the YouTube videos was of low to average quality and reliability. Rater scoring was reliable. Viewer engagement correlated poorly with video quality except for JAMA metrics. Overall, video quality on YouTube with respect to vestibular schwannoma is of low to average quality, and viewer engagement and popularity correlated poorly with video quality. Clinicians should direct their patients to high-quality videos and should consider uploading their own high-quality videos.

Journal ArticleDOI
TL;DR: In this paper, a bandpass filter based computational model of the lateral geniculate nucleus (LGN) and V1 regions of the human visual system (HVS) was used to validate the perceptual straightening hypothesis.
Abstract: In this work, we address the challenging problem of completely blind video quality assessment (BVQA) of user generated content (UGC). The challenge is twofold since the quality prediction model is oblivious of human opinion scores, and there are no well-defined distortion models for UGC content. Our solution is inspired by a recent computational neuroscience model which hypothesizes that the human visual system (HVS) transforms a natural video input to follow a straighter temporal trajectory in the perceptual domain. A bandpass filter based computational model of the lateral geniculate nucleus (LGN) and V1 regions of the HVS was used to validate the perceptual straightening hypothesis. We hypothesize that distortions in natural videos lead to loss in straightness (or increased curvature) in their transformed representations in the HVS. We provide extensive empirical evidence to validate our hypothesis. We quantify the loss in straightness as a measure of temporal quality, and show that this measure delivers acceptable quality prediction performance on its own. Further, the temporal quality measure is combined with a state-of-the-art blind spatial (image) quality metric to design a blind video quality predictor that we call STraightness Evaluation Metric (STEM). STEM is shown to deliver state-of-the-art performance over the class of BVQA algorithms on five UGC VQA datasets including KoNViD-1K, LIVE-Qualcomm, LIVE-VQC, CVD and YouTube-UGC. Importantly, our solution is completely blind i.e., training-free, generalizes very well, is explainable, has few tunable parameters, and is simple and easy to implement.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a brief overview of adaptive video streaming quality assessment and compare different variations of objective QoE assessment models with or without using machine learning techniques for adaptive video streaming.

Journal ArticleDOI
TL;DR: The LIVE Livestream Database as discussed by the authors is a video quality database specifically designed for live streaming VQA research, which includes 315 videos of 45 source sequences from 33 original contents impaired by 6 types of distortions.
Abstract: Video livestreaming is gaining prevalence among video streaming services, especially for the delivery of live, high motion content such as sporting events. The quality of these livestreaming videos can be adversely affected by any of a wide variety of events, including capture artifacts and distortions incurred during coding and transmission. High motion content can cause or exacerbate many kinds of distortion, such as motion blur and stutter. Because of this, the development of objective Video Quality Assessment (VQA) algorithms that can predict the perceptual quality of high motion, live streamed videos is greatly desired. Important resources for developing these algorithms are appropriate databases that exemplify the kinds of live streaming video distortions encountered in practice. Towards making progress in this direction, we built a video quality database specifically designed for live streaming VQA research. The new video database is called the Laboratory for Image and Video Engineering (LIVE) Livestream Database. The LIVE Livestream Database includes 315 videos of 45 source sequences from 33 original contents impaired by 6 types of distortions. We also performed a subjective quality study using the new database, whereby more than 12,000 human opinions were gathered from 40 subjects. We demonstrate the usefulness of the new resource by performing a holistic evaluation of the performance of current state-of-the-art (SOTA) VQA models. We envision that researchers will find the dataset to be useful for the development, testing, and comparison of future VQA models. The LIVE Livestream database is being made publicly available for these purposes at https://live.ece.utexas.edu/research/LIVE_APV_Study/apv_index.html.
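
Benchmarking VQA models on a database like this is usually reported through Spearman (SROCC) and Pearson (PLCC) correlations and RMSE between model predictions and mean opinion scores. A minimal evaluation sketch, with made-up numbers:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def evaluate_vqa_model(predicted, mos):
    """Correlate a VQA model's predictions with subjective scores (MOS).

    SROCC captures monotonic agreement, PLCC linear agreement, and RMSE the
    absolute error; these are the usual headline numbers in VQA studies."""
    predicted, mos = np.asarray(predicted, float), np.asarray(mos, float)
    srocc = spearmanr(predicted, mos)[0]
    plcc = pearsonr(predicted, mos)[0]
    rmse = float(np.sqrt(np.mean((predicted - mos) ** 2)))
    return {"SROCC": srocc, "PLCC": plcc, "RMSE": rmse}

print(evaluate_vqa_model([55, 62, 70, 40, 81], [50, 60, 72, 45, 78]))
```

In practice a monotonic (e.g., logistic) mapping is usually fitted to the predictions before computing PLCC and RMSE.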

Proceedings ArticleDOI
15 Mar 2022
TL;DR: In this article, the authors present an analysis of modern video quality assessments used to evaluate the quality of video sequences during signal streaming, which will be useful to video compression researchers and for creating additional test materials, planning future experiments, and improving adaptive codecs.
Abstract: Video is projected to occupy 82% of Internet traffic by 2022, as video streaming takes up more and more Internet bandwidth. With video traffic skyrocketing, improvements in video encoding technology are critical for video streaming companies. Assessing the quality of video compression in terms of human perception is an important area of research. Subjective evaluations of video quality require a lot of time for experiments and are not feasible in real-time applications. Therefore, interest in the development of models for the objective evaluation of video quality has grown strongly in recent years. The main problems with modern objective quality assessments are unreliable information provided by developers and poor correlation with the end user's visual perception of artefacts. In this paper, we present an analysis of modern video quality assessments used to evaluate the quality of video sequences during signal streaming. This work will be useful to video compression researchers, as well as for creating additional test materials, planning future experiments, and improving adaptive codecs.

Proceedings ArticleDOI
10 Jun 2022
TL;DR: Zhang et al. as mentioned in this paper proposed a user perception-based video experience optimization for energy-constrained mobile video streaming, by jointly considering the inherent connection between device state of motion, brightness scaling factor, video bitrate and environment context, and their combined impact on user's visual perception.
Abstract: Brightness scaling (BS) is an emerging and promising technique with outstanding energy efficiency on mobile video streaming. However, existing BS-based approaches totally neglect the inherent interaction effect between the BS factor, video bitrate and environment context, and their combined impact on the user's visual perception in mobile scenarios, leading to an imbalance between energy consumption and the user's quality of experience (QoE). In this paper, we propose PEO, a novel user-Perception-based video Experience Optimization for energy-constrained mobile video streaming, which jointly considers the inherent connection between the device's state of motion, the BS factor, the video bitrate and the resulting user-perceived quality. Specifically, by capturing the motion of the on-the-run device, PEO first infers the optimal bitrate and BS factor, thereby avoiding bitrate inefficiency for energy saving while guaranteeing the user-perceived QoE. On that basis, we formulate device motion-aware and user perception-aware video streaming as an optimization problem, present an optimal algorithm to maximize the objective function, and thus propose an online bitrate selection algorithm. Our evaluation (based on trace analysis and a user study) shows that, compared with state-of-the-art techniques, PEO can raise the perceived quality by 23.8%-41.3% and save up to 25.2% energy consumption.
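
The selection step described above can be viewed as choosing, for the current motion state, the bitrate and brightness-scaling pair with the best modeled perceived quality that fits an energy budget. The quality and energy models in the sketch are toy assumptions, not PEO's learned models.

```python
def select_bitrate_and_brightness(motion, energy_budget,
                                  bitrates=(1.0, 2.5, 5.0, 8.0),
                                  bs_factors=(1.0, 0.9, 0.8, 0.7)):
    """Pick the (bitrate in Mbps, brightness-scaling factor) pair with the
    best modeled perceived quality whose modeled energy cost fits the budget.

    perceived_quality and energy_cost are toy stand-ins: when the device is
    in motion the user is assumed less sensitive to bitrate and dimming."""
    def perceived_quality(rate, bs):
        sensitivity = 0.5 if motion else 1.0
        return sensitivity * (rate ** 0.5) * bs + (1 - sensitivity)

    def energy_cost(rate, bs):
        return 0.4 * rate + 3.0 * bs          # network + display energy (arbitrary units)

    feasible = [(perceived_quality(r, b), r, b)
                for r in bitrates for b in bs_factors
                if energy_cost(r, b) <= energy_budget]
    if not feasible:
        return min(bitrates), min(bs_factors)
    _, rate, bs = max(feasible)
    return rate, bs

print(select_bitrate_and_brightness(motion=True, energy_budget=4.0))
print(select_bitrate_and_brightness(motion=False, energy_budget=4.0))
```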

Journal ArticleDOI
TL;DR: ViTrack is a framework for efficient multi-video tracking that uses computation resources on the edge for commodity video surveillance systems and leverages a Markov-model-based approach to efficiently recover missing information and finally derive the complete trajectory.
Abstract: Nowadays, video surveillance systems are widely deployed in various places, e.g., schools, parks, airports, roads, etc. However, existing video surveillance systems are far from full utilization due to the high computation overhead of video processing. In this work, we present ViTrack, a framework for efficient multi-video tracking using computation resources on the edge for commodity video surveillance systems. At the heart of ViTrack lies a two-layer spatial/temporal compressed target detection method that significantly reduces the computation overhead by combining videos from multiple cameras. Further, ViTrack derives the video relationship and camera information even in the absence of camera location, direction, etc. To alleviate the impact of variant video quality and missing targets, ViTrack leverages a Markov model based approach to efficiently recover missing information and finally derive the complete trajectory. We implement ViTrack on a real deployed video surveillance system with 110 cameras. The experiment results demonstrate that ViTrack can provide efficient trajectory tracking with processing time 45x less than the existing approach. For 110 video cameras, ViTrack can run on a Dell OptiPlex 390 computer to track given targets in almost real time. We believe ViTrack can enable practical video analysis for widely deployed commodity video surveillance systems.
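
The Markov-model recovery step can be illustrated with a small Viterbi sketch that fills in the most likely camera sequence when some detections are missing. The transition matrix and observation likelihoods below are made up; ViTrack's actual model is learned from the deployment.

```python
import numpy as np

def recover_trajectory(trans, obs_prob):
    """Viterbi-style recovery of the most likely camera sequence.

    trans: (C, C) camera-to-camera transition probabilities; obs_prob:
    (T, C) per-time likelihood of the target being seen by each camera,
    with uniform rows where the detection is missing. Toy illustration of
    the Markov-model idea, not ViTrack's code."""
    T, C = obs_prob.shape
    log_t, log_o = np.log(trans + 1e-12), np.log(obs_prob + 1e-12)
    score = log_o[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_t            # cand[i, j]: come from i, go to j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_o[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

trans = np.array([[0.7, 0.3, 0.0], [0.2, 0.6, 0.2], [0.0, 0.3, 0.7]])
obs = np.array([[0.9, 0.05, 0.05], [1/3, 1/3, 1/3],   # second frame: detection missing
                [0.1, 0.8, 0.1], [0.05, 0.1, 0.85]])
print(recover_trajectory(trans, obs))
```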


Proceedings ArticleDOI
01 Mar 2022
TL;DR: In this article, an evaluation of the latest MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC) for live gaming video streaming applications is presented in terms of both objective and subjective quality measures.
Abstract: This paper presents an evaluation of the latest MPEG-5 Part 2 Low Complexity Enhancement Video Coding (LCEVC) for live gaming video streaming applications. The results are presented in terms of both objective and subjective quality measures. Our results indicate that LCEVC outperforms both x264 and x265 codecs in terms of bitrate savings using VMAF. Using subjective results, it is found that LCEVC outperforms the respective base codecs, especially for low bitrates. This effect is much more dominant for x264 as compared to x265, with marginal absolute improvement of quality scores for x265.

Journal ArticleDOI
TL;DR: In this paper, an effective transcoding- and recommending-based caching algorithm (TRBA) is proposed to maximally reduce the video delivery delay while satisfying users' requirements by iteratively buffering the most valuable video versions, one video version each time, until no more video version or cache space is available.
Abstract: Edge caching can significantly improve the quality of wireless video services by deploying cache servers at network edges. Recently, video conversion and recommendation have been introduced to improve the caching performance at the edges. Specifically, they work to produce lower quality versions of videos via video converting (for the former) or provide alternative similar videos when requested videos are not available by using video recommendation (for the latter). However, existing work in this aspect has utilized these two techniques separately, which largely limits their capabilities in providing improved video services. In this article, we study how to jointly utilize these two techniques in edge caching for improved caching performance. The objective is to maximally reduce the video delivery delay while satisfying users’ requirements. We first formulate the optimal video caching problem in this case and derive its NP-hardness. We then propose an effective transcoding- and recommending-based caching algorithm (TRBA). The TRBA works in a greedy manner to iteratively buffer the most valuable video versions, one video version each time, until no more video version or cache space is available. We define the value of a video version as the reduced delivery delay if this version were buffered divided by the extra cache space required for its buffering. The computational complexity of TRBA is deduced as $O(|V|^3|Q|^3)$, where $|V|$ and $|Q|$ represent the total number of videos and the number of versions per video, respectively. Numerical results demonstrate that, compared with existing caching algorithms, TRBA can significantly improve the caching performance.
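
The greedy rule stated above (value = reduced delivery delay per unit of extra cache space) translates directly into a short sketch. The sizes and delay reductions are made-up inputs, and the sketch ignores the interaction between transcoding and recommendation that the full TRBA accounts for.

```python
def greedy_cache(versions, capacity):
    """Greedy TRBA-style caching sketch.

    versions: list of (name, size, delay_reduction) candidate video versions.
    Repeatedly buffer the version with the highest value, defined as the
    delivery-delay reduction divided by the extra cache space it needs,
    until nothing else fits. Input numbers are illustrative."""
    cached, free = [], capacity
    remaining = list(versions)
    while True:
        candidates = [(dr / size, name, size) for name, size, dr in remaining
                      if size <= free]
        if not candidates:
            return cached
        _, best_name, best_size = max(candidates)
        cached.append(best_name)
        free -= best_size
        remaining = [v for v in remaining if v[0] != best_name]

videos = [("A_1080p", 6, 30), ("A_480p", 2, 18), ("B_1080p", 5, 20), ("B_480p", 2, 9)]
print(greedy_cache(videos, capacity=8))
```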

Book ChapterDOI
TL;DR: In this article, the authors proposed a new quality metric which extends the peak signal to noise ratio metric with features of the human visual system measured using modern LCD screens, and compared the commonly used quality metrics with metrics containing data modelling human perception.
Abstract: Nowadays, numerous video compression quality assessment metrics are available. Some of these metrics are “objective” and only tangentially represent how a human observer rates video quality. On the other hand, models of the human visual system have been shown to be effective at describing spatial coding. In this work, we propose a new quality metric which extends the peak signal to noise ratio metric with features of the human visual system measured using modern LCD screens. We also analyse the current visibility models of the early visual system and compare the commonly used quality metrics with metrics containing data modelling human perception. We examine Pearson’s linear correlation coefficient of the various video compression quality metrics with human subjective scores on videos from the publicly available Netflix data set. Of the metrics tested, our new proposed metric is found to have the most stable high performance in predicting subjective video compression quality.