
Showing papers on "Video quality published in 2019"


Journal ArticleDOI
TL;DR: A new approach for learning-based video quality assessment is proposed, based on the idea of computing features at two levels: low complexity features are computed for the full sequence first, and then high complexity features are extracted from a subset of representative video frames, selected by using the low complexity features.
Abstract: Smartphones and other consumer devices capable of capturing video content and sharing it on social media in nearly real time are widely available at a reasonable cost. Thus, there is a growing need for no-reference video quality assessment (NR-VQA) of consumer-produced video content, typically characterized by capture impairments that are qualitatively different from those observed in professionally produced video content. To date, most of the NR-VQA models in prior art have been developed for assessing coding and transmission distortions, rather than capture impairments. In addition, the most accurate NR-VQA methods known in prior art are often computationally complex, and therefore impractical for many real-life applications. In this paper, we propose a new approach for learning-based video quality assessment, based on the idea of computing features at two levels: low complexity features are computed for the full sequence first, and then high complexity features are extracted from a subset of representative video frames, selected by using the low complexity features. We have compared the proposed method against several relevant benchmark methods using three recently published annotated public video quality databases, and our results show that the proposed method can predict subjective video quality more accurately than the benchmark methods. The best performing prior method achieves nearly the same accuracy, but at substantially higher computational cost.
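The two-level idea above lends itself to a short sketch. The Python below is a toy illustration under assumed feature choices (mean luminance and frame differences as the cheap features, gradient statistics as the expensive ones); the selection rule and all function names are hypothetical, not the authors' design.

```python
import numpy as np

def low_complexity_features(frames):
    """Per-frame mean luminance and temporal difference (cheap to compute)."""
    means = np.array([f.mean() for f in frames])
    diffs = np.abs(np.diff(means, prepend=means[0]))
    return np.stack([means, diffs], axis=1)

def select_representative(features, k=8):
    """Pick k frames spread across the range of low-complexity activity."""
    activity = features[:, 1]
    order = np.argsort(activity)
    idx = order[np.linspace(0, len(order) - 1, k).astype(int)]
    return np.sort(idx)

def high_complexity_features(frame):
    """Stand-in for an expensive descriptor (here: gradient-magnitude stats)."""
    gy, gx = np.gradient(frame.astype(np.float64))
    mag = np.hypot(gx, gy)
    return np.array([mag.mean(), mag.std()])

# Toy usage on random "frames" (H x W grayscale arrays).
frames = [np.random.rand(72, 128) for _ in range(120)]
lo = low_complexity_features(frames)
subset = select_representative(lo, k=8)
hi = np.array([high_complexity_features(frames[i]) for i in subset])
video_feature = np.concatenate([lo.mean(axis=0), hi.mean(axis=0)])
print(video_feature.shape)  # fixed-length vector for a quality regressor
```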

203 citations


Journal ArticleDOI
TL;DR: The LIVE Video Quality Challenge Database (LIVE-VQC), presented in this paper, is a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions.
Abstract: The great variations in videographic skills, camera designs, compression and processing protocols, communication and bandwidth environments, and displays lead to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, complex, and often commingled distortions that are difficult or impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Toward advancing NR video quality prediction, we have constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4776 unique participants took part in the study, yielding over 205,000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC), by conducting a comparison with leading NR video quality predictors on it. This is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download at this link: http://live.ece.utexas.edu/research/LIVEVQC/index.html .

176 citations


Proceedings ArticleDOI
15 Oct 2019
TL;DR: This work proposes an objective no-reference video quality assessment method that integrates both content-dependency and temporal-memory effects into a deep neural network, and outperforms five state-of-the-art methods by a large margin.
Abstract: Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method by integrating both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, specifically, 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
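A minimal PyTorch sketch of the pipeline shape described above: content-aware features from a pretrained classifier, a GRU for long-term dependencies, and a crude hysteresis-like pooling step. Layer sizes, the pooling window, and the TinyVSFA name are simplifications and assumptions; the authors' released code is the reference implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TinyVSFA(nn.Module):
    def __init__(self, feat_dim=512, hidden=32):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights in practice
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames):           # frames: (T, 3, H, W), one video
        with torch.no_grad():            # content-aware features, not finetuned
            f = self.features(frames).flatten(1)   # (T, feat_dim)
        h, _ = self.gru(f.unsqueeze(0))            # (1, T, hidden)
        q = self.head(h).squeeze()                 # per-frame quality, (T,)
        # Crude hysteresis-like pooling: recent minimum, then average, reflecting
        # that viewers punish quality drops more than they reward recovery.
        pooled = torch.stack([q[max(0, t - 5):t + 1].min() for t in range(len(q))])
        return pooled.mean()

score = TinyVSFA()(torch.rand(16, 3, 112, 112))
print(float(score))
```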

170 citations


Journal ArticleDOI
TL;DR: A comprehensive overview of recent and ongoing work in the field of QoE modeling for HTTP adaptive streaming is presented, as well as existing challenges and shortcomings.
Abstract: With the increased usage of video services, the focus has recently shifted from the traditional quality of service-based video delivery to quality of experience (QoE)-based video delivery. Over the past 15 years, many video quality assessment metrics have been proposed with the goal to predict the video quality as perceived by the end user. HTTP adaptive streaming (HAS) has recently gained much attention and is currently used by the majority of video streaming services, such as Netflix and YouTube. HAS, using reliable transport protocols such as TCP, does not suffer from image artifacts due to packet losses, which are common in traditional streaming technologies. Hence, the QoE models developed for other streaming technologies alone are not sufficient. Recently, many works have focused on developing QoE models targeting HAS-based applications. Also, the recently published ITU-T Recommendation series P.1203 proposes a parametric bitstream-based model for the quality assessment of progressive download and adaptive audiovisual streaming services over a reliable transport. The main contribution of this paper is to present a comprehensive overview of recent and ongoing work in the field of QoE modeling for HAS. The HAS QoE models, influence factors, and subjective test methodologies are discussed, as well as existing challenges and shortcomings. The survey can serve as a guideline for researchers interested in QoE modeling for HAS and also discusses possible future work.

112 citations


Journal ArticleDOI
TL;DR: In rigorous experiments, the proposed algorithms demonstrate state-of-the-art performance on multiple video applications and are made available as part of the open source package at https://github.com/Netflix/vmaf.
Abstract: The recently developed video multi-method assessment fusion (VMAF) framework integrates multiple quality-aware features to accurately predict the video quality. However, VMAF does not yet exploit important principles of temporal perception that are relevant to perceptual video distortion measurement. Here, we propose two improvements to the VMAF framework, called spatiotemporal VMAF and ensemble VMAF, which leverage perceptually-motivated space–time features that are efficiently calculated at multiple scales. We also conducted a large subjective video study, which we have found to be an excellent resource for training our feature-based approaches. In rigorous experiments, we found that the proposed algorithms demonstrate state-of-the-art performance on multiple video applications. The compared algorithms will be made available as part of the open source package at https://github.com/Netflix/vmaf .
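The two ingredients named above, multi-scale space-time features and ensemble fusion, can be sketched generically. The frame-difference features and the two-model ensemble below are illustrative stand-ins, not the actual spatiotemporal VMAF or ensemble VMAF definitions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def spacetime_features(video, scales=(1, 2, 4)):
    """video: (T, H, W) grayscale. Frame-difference energy at several scales."""
    feats = []
    for s in scales:
        v = video[:, ::s, ::s]                  # cheap spatial downscale
        td = np.abs(np.diff(v, axis=0))         # temporal gradient
        feats += [td.mean(), td.std()]
    return np.array(feats)

# Toy training data: feature vectors vs. (synthetic) subjective scores.
X = np.array([spacetime_features(np.random.rand(24, 64, 64)) for _ in range(40)])
y = np.random.rand(40) * 100

models = [RandomForestRegressor(n_estimators=50, random_state=0), SVR()]
for m in models:
    m.fit(X, y)
ensemble_score = np.mean([m.predict(X[:1])[0] for m in models])  # fused prediction
print(ensemble_score)
```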

90 citations


Journal ArticleDOI
TL;DR: A new mobile video quality database containing videos afflicted with distortions caused by 26 different stalling patterns is presented; the database is made publicly available to help advance state-of-the-art research on user-centric mobile network planning and management.
Abstract: Over-the-top mobile adaptive video streaming is invariably influenced by volatile network conditions, which can cause playback interruptions (stalling or rebuffering events) and bitrate fluctuations, thereby impairing users’ quality of experience (QoE). Video quality assessment models that can accurately predict users’ QoE under such volatile network conditions are rapidly gaining attention, since these methods could enable more efficient design of quality control protocols for media-driven services such as YouTube, Amazon, Netflix, and many others. However, the development of improved QoE prediction models requires data sets of videos afflicted with diverse stalling events that have been labeled with ground-truth subjective opinion scores. Toward this end, we have created a new mobile video quality database that we call LIVE Mobile Stall Video Database-II. Our database contains a total of 174 videos afflicted with distortions caused by 26 different stalling patterns. We describe the way we simulated the diverse stalling events to create a corpus of distorted videos, and we detail the human study we conducted to obtain continuous-time subjective scores from 54 subjects. We also present the outcomes of our comprehensive analysis of the impact of several factors that influence subjective QoE, and report the performance of existing QoE-prediction models on our data set. We are making the database (videos, subjective data, and video metadata) publicly available in order to help advance state-of-the-art research on user-centric mobile network planning and management. The database may be accessed at http://live.ece.utexas.edu/research/LIVEStallStudy/liveMobile.html .
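For intuition, here is a minimal sketch of how a stalling pattern might be imposed on a pristine playback timeline, in the spirit of the simulated stalling events described above. The (start, duration) pattern format and the frame-freezing rule are assumptions for illustration.

```python
import numpy as np

def apply_stalls(frames, fps, stalls):
    """frames: list of frames; stalls: list of (start_sec, duration_sec)."""
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        t = i / fps
        for start, dur in stalls:
            if abs(t - start) < 0.5 / fps:                   # stall hits this frame
                out.extend([frame] * int(round(dur * fps)))  # frozen frames
    return out

frames = [np.full((4, 4), i, dtype=np.uint8) for i in range(90)]  # 3 s @ 30 fps
stalled = apply_stalls(frames, fps=30, stalls=[(1.0, 0.5), (2.0, 1.0)])
print(len(frames), len(stalled))  # 90 -> 135 frames after two stalls
```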

87 citations


Journal ArticleDOI
Yu Zhang1, Xinbo Gao1, Lihuo He1, Wen Lu1, Ran He1 
TL;DR: A general-purpose no-reference VQA framework based on weakly supervised learning with a convolutional neural network (CNN) and a resampling strategy; it is on a par with some state-of-the-art VQA metrics and has promising robustness.
Abstract: Due to the 3D spatiotemporal regularities of natural videos and small-scale video quality databases, effective objective video quality assessment (VQA) metrics are difficult to obtain but highly desirable. In this paper, we propose a general-purpose no-reference VQA framework that is based on weakly supervised learning with a convolutional neural network (CNN) and a resampling strategy. First, an eight-layer CNN is trained by weakly supervised learning to construct the relationship between the deformations of the 3D discrete cosine transform of video blocks and the corresponding weak labels judged by a full-reference (FR) VQA metric. Thus, the CNN obtains the quality assessment capacity converted from the FR-VQA metric, and the effective features of the distorted videos can be extracted through the trained network. Then, we map the frequency histogram calculated from the quality score vectors predicted by the trained network onto the perceptual quality. In particular, to improve the performance of the mapping function, we transfer the frequency histograms of the distorted images and videos to resample the training set. The experiments are carried out on several widely used VQA databases. The experimental results demonstrate that the proposed method is on a par with some state-of-the-art VQA metrics and has promising robustness.
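The 3D-DCT front end is easy to sketch: split the video into small space-time blocks, transform each, and summarize coefficient statistics whose histogram is later mapped to quality. The block size and the DC-to-AC summary below are assumptions; the paper instead trains a CNN on such blocks using FR-metric weak labels.

```python
import numpy as np
from scipy.fft import dctn

def block_3d_dct_features(video, bs=8):
    """video: (T, H, W) array; returns per-block DC-to-AC energy ratios."""
    T, H, W = video.shape
    feats = []
    for t in range(0, T - bs + 1, bs):
        for y in range(0, H - bs + 1, bs):
            for x in range(0, W - bs + 1, bs):
                block = video[t:t+bs, y:y+bs, x:x+bs]
                c = dctn(block, norm='ortho')       # 3D DCT of the block
                dc = abs(c[0, 0, 0])
                ac = np.abs(c).sum() - dc
                feats.append(dc / (ac + 1e-8))
    return np.array(feats)

video = np.random.rand(16, 32, 32)
f = block_3d_dct_features(video)
hist, _ = np.histogram(f, bins=10, range=(0, 2))   # frequency histogram,
print(hist / max(hist.sum(), 1))                    # mapped to quality downstream
```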

76 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: Experimental results on two publicly available video quality datasets demonstrate that the proposed quality metric outperforms the other compared NR quality metrics.
Abstract: Video quality assessment (VQA) is a challenging task due to the complexity of modeling perceived quality characteristics in both the spatial and temporal domains. A novel no-reference (NR) video quality metric (VQM) is proposed in this paper based on two deep neural networks (NN), namely a 3D convolution network (3D-CNN) and a recurrent NN composed of long short-term memory (LSTM) units. 3D-CNNs are utilized to extract local spatiotemporal features from small cubic clips in the video, and the features are then fed into the LSTM networks to predict the perceived video quality. Such a design can elaborately tackle the issue of insufficient training data while also efficiently capturing perceptual quality features in both the spatial and temporal domains. Experimental results on two publicly available video quality datasets demonstrate that the proposed quality metric outperforms the other compared NR quality metrics.
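A compact PyTorch sketch matching the described design: a small 3D-CNN embeds cubic clips, and an LSTM aggregates the clip sequence into one score. Channel counts, clip sizes, and the class name are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class C3DLSTMQuality(nn.Module):
    def __init__(self):
        super().__init__()
        self.c3d = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, clips):            # clips: (B, N, 1, D, H, W)
        B, N = clips.shape[:2]
        f = self.c3d(clips.flatten(0, 1)).flatten(1)   # (B*N, 32) clip features
        h, _ = self.lstm(f.view(B, N, -1))             # (B, N, 64) over clip sequence
        return self.head(h[:, -1]).squeeze(-1)         # (B,) predicted quality

model = C3DLSTMQuality()
clips = torch.rand(2, 6, 1, 8, 16, 16)   # 2 videos x 6 cubic clips each
print(model(clips).shape)                # torch.Size([2])
```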

70 citations


Proceedings ArticleDOI
Chen Li1, Mai Xu1, Lai Jiang1, Shanyi Zhang1, Xiaoming Tao2 
15 Jun 2019
TL;DR: A viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, considering the auxiliary tasks of viewport proposal and viewport saliency prediction; experiments validate the effectiveness of the approach, which also achieves comparable performance on the two auxiliary tasks.
Abstract: Recent years have witnessed the growing interest in visual quality assessment (VQA) for 360° video. Unfortunately, the existing VQA approaches do not consider the facts that: 1) Observers only see viewports of 360° video, rather than patches or whole 360° frames. 2) Within the viewport, only salient regions can be perceived by observers with high resolution. Thus, this paper proposes a viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, considering both auxiliary tasks of viewport proposal and viewport saliency prediction. Our V-CNN approach is composed of two stages, i.e., viewport proposal and VQA. In the first stage, the viewport proposal network (VP-net) is developed to yield several potential viewports, seen as the first auxiliary task. In the second stage, a viewport quality network (VQ-net) is designed to rate the VQA score for each proposed viewport, in which the saliency map of the viewport is predicted and then utilized in VQA score rating. Consequently, another auxiliary task of viewport saliency prediction can be achieved. More importantly, the main task of VQA on 360° video can be accomplished via integrating the VQA scores of all viewports. The experiments validate the effectiveness of our V-CNN approach in significantly advancing the state-of-the-art performance of VQA on 360° video. In addition, our approach achieves comparable performance in two auxiliary tasks. The code of our V-CNN approach is available at https://github.com/Archer-Tatsu/V-CNN.
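The final integration step, pooling per-viewport scores into one video score, can be illustrated in a few lines. Weighting each proposed viewport's score by its predicted saliency mass is one plausible reading of the text; the numbers and the weighting rule below are hypothetical.

```python
import numpy as np

viewport_scores = np.array([3.8, 4.1, 2.9])   # VQ-net outputs (hypothetical)
saliency_mass = np.array([0.5, 0.3, 0.2])     # mass of each predicted saliency map

video_score = np.average(viewport_scores, weights=saliency_mass)
print(video_score)   # one quality score for the whole 360-degree video
```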

64 citations


Journal ArticleDOI
TL;DR: Two machine learning models are developed and incorporated into the MEC App for popular video prediction and radio channel quality prediction, which makes it possible to consider the effect of non-negligible round-trip times and adjust the video quality more accurately.
Abstract: ETSI multi-access edge computing (MEC) provides an IT service environment and cloud-computing capabilities at the edge of the mobile network, enabling application and content providers to deploy new use cases, such as intelligent video acceleration, with low latency and high bandwidth. Specifically, ETSI MEC introduces an MEC server that implements the edge-cloud platform to host partial server-side service logics in the form of MEC applications (MEC Apps). In this paper, we aim to implement the first proof-of-concept (PoC) in the literature for the MEC-enhanced mobile video streaming service. Our PoC consists of Android User Apps, an MEC App, and the YouTube server. The MEC App implements two main functions: popular video caching and radio analytics/video quality adaptation. The User App provides general functions of a YouTube video streaming app and can access the videos from the cache server or the YouTube server under the MEC server's guidance. In addition to the PoC implementation, this paper further develops two machine learning models to be incorporated into the MEC App for popular video prediction and radio channel quality prediction, which makes it possible to consider the effect of non-negligible round-trip times and adjust the video quality more accurately. The experimental results confirm that our models, together with other advantages from MEC, can guarantee good performance for the mobile video streaming service. Finally, we model and investigate the effectiveness of the MEC architecture for improving the quality of experience of video-streaming users.

60 citations


Proceedings ArticleDOI
18 Jun 2019
TL;DR: This work develops and presents Requet, a system for REal-time QUality of experience metric detection for Encrypted Traffic, and shows that Requet outperforms a baseline system in the accuracy of predicting buffer low warnings, video state, and video resolution.
Abstract: As video traffic dominates the Internet, it is important for operators to detect video Quality of Experience (QoE) in order to ensure adequate support for video traffic. With wide deployment of end-to-end encryption, traditional deep packet inspection based traffic monitoring approaches are becoming ineffective. This poses a challenge for network operators to monitor user QoE and improve upon their experience. To resolve this issue, we develop and present a system for REal-time QUality of experience metric detection for Encrypted Traffic, Requet. Requet uses a detection algorithm we develop to identify video and audio chunks from the IP headers of encrypted traffic. Features extracted from the chunk statistics are used as input to a Machine Learning (ML) algorithm to predict QoE metrics, specifically, buffer warning (low buffer, high buffer), video state (buffer increase, buffer decay, steady, stall), and video resolution. We collect a large YouTube dataset consisting of diverse video assets delivered over various WiFi network conditions to evaluate the performance. We compare Requet with a baseline system based on previous work and show that Requet outperforms the baseline system in accuracy of predicting buffer low warning, video state, and video resolution by 1.12X, 1.53X, and 3.14X, respectively.
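The pipeline shape described for Requet can be sketched end to end: group encrypted packets into chunks from IP-header timing and size information, summarize chunk statistics, and classify. The gap-based chunking heuristic and the features below are illustrative assumptions, not Requet's actual detection algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def packets_to_chunks(times, sizes, gap=0.1):
    """Split a (time, size) packet stream into chunks at idle gaps."""
    chunks, cur = [], [sizes[0]]
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > gap:   # silence gap ends the current chunk
            chunks.append(cur)
            cur = []
        cur.append(sizes[i])
    chunks.append(cur)
    return chunks

def chunk_features(chunks):
    sizes = np.array([sum(c) for c in chunks], dtype=float)
    return np.array([sizes.mean(), sizes.std(), len(chunks), sizes.max()])

# Toy training set: chunk-statistic features labeled with a video state.
rng = np.random.default_rng(0)
X = np.array([chunk_features(packets_to_chunks(np.sort(rng.uniform(0, 10, 200)),
                                               rng.integers(100, 1500, 200)))
              for _ in range(30)])
y = rng.choice(['increase', 'steady', 'stall'], size=30)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```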

Journal ArticleDOI
TL;DR: Two NR machine learning-based quality estimation models for gaming video streaming, NR-GVSQI and NR-GVSQE, are presented, and it is shown that the proposed models outperform the current state-of-the-art no-reference metrics while also reaching a prediction accuracy comparable to the best known full-reference metric.
Abstract: Recent years have seen increasing growth and popularity of gaming services, both interactive and passive. While interactive gaming video streaming applications have received much attention, passive gaming video streaming, in spite of its huge success and growth in recent years, has seen much less interest from the research community. For the continued growth of such services in the future, it is imperative that the end user gaming quality of experience (QoE) is estimated so that it can be controlled and maximized to ensure user acceptance. Previous quality assessment studies have shown the not-so-satisfactory performance of existing no-reference (NR) video quality assessment (VQA) metrics. Also, due to the inherent nature and different requirements of gaming video streaming applications, as well as the fact that gaming videos are perceived differently from non-gaming content (as they are usually computer generated and contain artificial/synthetic content), there is a need for application-specific, lightweight, no-reference gaming video quality prediction models. In this paper, we present two NR machine learning-based quality estimation models for gaming video streaming, NR-GVSQI and NR-GVSQE, using NR features such as bitrate, resolution, and temporal information. We evaluate their performance on different gaming video datasets and show that the proposed models outperform the current state-of-the-art no-reference metrics, while also reaching a prediction accuracy comparable to the best known full-reference metric.
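In the spirit of the described models, quality can be regressed from lightweight NR features such as bitrate, resolution, and temporal information. The regressor choice and synthetic data below are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Features per stream: [bitrate_kbps, height_px, temporal_information]
X = np.array([[600, 480, 25.0], [1500, 720, 30.5], [4000, 1080, 28.2],
              [900, 480, 60.1], [2500, 720, 55.7], [6000, 1080, 50.3]])
y = np.array([2.1, 3.4, 4.5, 1.8, 3.0, 4.2])     # synthetic MOS labels

model = GradientBoostingRegressor(random_state=0).fit(X, y)
print(model.predict([[2000, 720, 40.0]]))        # predicted MOS for a new stream
```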

Journal ArticleDOI
01 Feb 2019
TL;DR: The experimental results show that the proposed Software-Defined UAV Networking (SD-UAVNet) architecture can effectively mitigate the challenges of UAVNets and provides suitable Quality of Experience (QoE) to end-users.
Abstract: Unmanned Aerial Vehicles (UAVs) empower people to reach endangered areas in emergency situations. By collaborating with each other, multiple UAVs forming a UAV network (UAVNet) can work together to perform specific tasks in a more efficient and intelligent way than a single UAV. UAVNets pose special characteristics of high dynamics, unstable aerial wireless links, and UAV collision probabilities. To address these challenges, we propose a Software-Defined UAV Networking (SD-UAVNet) architecture, which facilitates the management of UAV networks through a centralized SDN UAV controller. In addition, we introduce a use-case scenario to evaluate the optimal UAV relay node placement for live video surveillance services with the proposed architecture. In the SD-UAVNet architecture, the controller considers global UAV-relevant context information to optimize the UAVs’ movements, select proper routing paths, and prevent UAV collisions, in order to determine relay node deployment and guarantee satisfactory video quality. The experimental results show that the proposed SD-UAVNet architecture can effectively mitigate the challenges of UAVNets and provides suitable Quality of Experience (QoE) to end-users.

Journal ArticleDOI
TL;DR: This paper proposes a video delivery strategy for dynamic streaming services which maximizes time-average streaming quality under a playback delay constraint in wireless caching networks and proves that the proposed video delivery algorithm works reliably and can control the tradeoff between video quality and playback latency.
Abstract: This paper proposes a video delivery strategy for dynamic streaming services which maximizes time-average streaming quality under a playback delay constraint in wireless caching networks. The network where popular videos encoded by scalable video coding are already stored in randomly distributed caching nodes is considered under adaptive video streaming concepts, and distance-based interference management is investigated in this paper. In this network model, a streaming user makes delay-constrained decisions depending on stochastic network states: 1) caching node for video delivery, 2) video quality, and 3) the quantity of video chunks to receive. Since wireless link activation for video delivery may introduce delays, different timescales for updating caching node association, video quality adaptation, and chunk amounts are considered. After associating with a caching node for video delivery, the streaming user chooses combinations of quality and chunk amounts in the small timescale. The dynamic decision making process for video quality and chunk amounts at each slot is modeled using Markov decision process, and the caching node decision is made based on the framework of Lyapunov optimization. Our intensive simulations verify that the proposed video delivery algorithm works reliably and also can control the tradeoff between video quality and playback latency.
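The Lyapunov-based association step typically reduces to a drift-plus-penalty rule, which the following toy sketch illustrates. The cost model, the per-node quantities, and the parameter V are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def choose_caching_node(backlogs, delays, qualities, V=10.0):
    """Drift-plus-penalty: trade queue stability against streaming quality."""
    score = backlogs * delays - V * qualities   # drift term minus V * reward
    return int(np.argmin(score))

backlogs = np.array([3.0, 1.0, 5.0])    # queued chunks per candidate node
delays = np.array([0.2, 0.5, 0.1])      # expected delivery delay (s)
qualities = np.array([2.0, 4.0, 3.0])   # deliverable SVC quality layers
print(choose_caching_node(backlogs, delays, qualities))
```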

Journal ArticleDOI
TL;DR: This letter studies the optimal multicast of tiled 360 virtual reality (VR) video from one server (base station or access point) to multiple users, obtaining globally optimal closed-form solutions of the two non-convex problems involved.
Abstract: In this letter, we study optimal multicast of tiled 360 virtual reality (VR) video from one server (base station or access point) to multiple users. We consider random viewing directions and random channel conditions, and adopt time division multiple access. For given video quality, we optimize the transmission time and power allocation to minimize the average transmission energy. For given transmission energy budget, we optimize the transmission time and power allocation as well as the encoding rate of each tile to maximize the received video quality. These two optimization problems are challenging non-convex problems. We obtain globally optimal closed-form solutions of the two non-convex problems, which reveal important design insights for multicast of tiled 360 VR video. Finally, numerical results demonstrate the advantage of the proposed solutions.

Journal ArticleDOI
TL;DR: Temporal down-sampling is utilized to enable both subjective and objective comparisons across a range of frame rates; benchmarking shows that metrics which explicitly account for temporal distortions provide improved correlation with subjective opinions compared to generic quality metrics such as PSNR.
Abstract: High frame rates are acknowledged to increase the perceived quality of certain video content. However, the lack of high frame rate test content has previously restricted the scope of research in this area—especially in the context of immersive video formats. This problem has been addressed through the publication of a high frame rate video database BVI-HFR, which was captured natively at 120 fps. BVI-HFR spans a variety of scenes, motions, and colors, and is shown to be representative of BBC broadcast content. In this paper, temporal down-sampling is utilized to enable both subjective and objective comparisons across a range of frame rates. A large-scale subjective experiment has demonstrated that high frame rates lead to increases in perceived quality, and that a degree of content dependence exists—notably related to camera motion. Various image and video quality metrics have been benchmarked on these subjective evaluations, and analysis shows that those which explicitly account for temporal distortions (e.g., FRQM) provide improved correlation with subjective opinions compared to generic quality metrics such as PSNR.
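Temporal down-sampling as used above can be sketched in two standard variants, frame dropping and frame averaging, which differ in the motion blur they introduce. The parameters below are assumptions for illustration.

```python
import numpy as np

def downsample_drop(video, k):
    return video[::k]                      # keep every k-th frame: 120 -> 120/k fps

def downsample_average(video, k):
    T = (len(video) // k) * k
    return video[:T].reshape(-1, k, *video.shape[1:]).mean(axis=1)  # merge k frames

video = np.random.rand(120, 8, 8)          # one second at 120 fps
print(downsample_drop(video, 4).shape)     # (30, 8, 8): 30 fps, sharp frames
print(downsample_average(video, 4).shape)  # (30, 8, 8): 30 fps, blurred motion
```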

Journal ArticleDOI
TL;DR: This paper proposes REQUEST, a video chunk request policy for Dynamic Adaptive Streaming over HTTP (DASH) in a smartphone, which can utilize both LTE and Wi-Fi and significantly outperforms other existing schemes in terms of average video bitrate, rebuffering, and resource waste.
Abstract: Exploiting both LTE and Wi-Fi links simultaneously enhances the performance of video streaming services in a smartphone. However, it is challenging to achieve seamless and high quality video while saving battery energy and LTE data usage to prolong the usage time of a smartphone. In this paper, we propose REQUEST, a video chunk request policy for Dynamic Adaptive Streaming over HTTP (DASH) in a smartphone, which can utilize both LTE and Wi-Fi. REQUEST enables seamless DASH video streaming with near optimal video quality under given budgets of battery energy and LTE data usage. Through extensive simulation and measurement in a real environment, we demonstrate that REQUEST significantly outperforms other existing schemes in terms of average video bitrate, rebuffering, and resource waste.
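A REQUEST-like policy must decide, per chunk, how to spend the LTE budget versus Wi-Fi. The toy rule below (thresholds and budget model assumed) illustrates the trade-off; the paper's actual policy is derived under explicit energy and LTE data budgets.

```python
def choose_link(buffer_s, wifi_kbps, bitrate_kbps, lte_data_mb, low_buffer=4.0):
    """Pick the link(s) for the next chunk under a simple budget heuristic."""
    if wifi_kbps >= bitrate_kbps:
        return 'wifi'                 # Wi-Fi alone sustains playback
    if buffer_s < low_buffer and lte_data_mb > 0:
        return 'lte+wifi'             # spend LTE budget to avoid rebuffering
    return 'wifi'                     # drain buffer rather than budget

print(choose_link(buffer_s=2.5, wifi_kbps=1800, bitrate_kbps=2500, lte_data_mb=120))
```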

Proceedings ArticleDOI
Tianchi Huang1, Chao Zhou, Rui-Xiao Zhang1, Chenglei Wu1, Xin Yao1, Lifeng Sun1 
15 Oct 2019
TL;DR: This paper proposes Comyco, a video quality-aware ABR approach that substantially improves on learning-based methods by tackling their low sample efficiency and lack of awareness of video quality information.
Abstract: Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn strong strategies without any presumptions, have become one of the research hotspots for adaptive streaming. However, they still suffer from several issues, i.e., low sample efficiency and lack of awareness of video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that greatly improves on learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by an instant solver, which not only avoids redundant exploration but also makes better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video quality rather than higher video bitrate. To achieve this, we construct Comyco's neural network architecture, video datasets, and QoE metrics with video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements in Comyco's sample efficiency in comparison to prior work, with a 1700x reduction in the number of samples required and a 16x reduction in training time. Moreover, the results illustrate that Comyco outperforms previously proposed methods, with improvements in average QoE of 7.5%-16.79%. In particular, Comyco also surpasses the state-of-the-art approach Pensieve by 7.37% in average video quality under the same rebuffering time.
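Comyco's core training idea, imitating an expert solver rather than exploring, can be sketched as behavioral cloning. The bitrate ladder, the state encoding, and the trivial throughput-aware "expert" below are stand-ins; the paper's instant solver optimizes a quality-aware QoE objective.

```python
import torch
import torch.nn as nn

BITRATES = [300, 750, 1200, 1850, 2850]    # kbps ladder (assumed)

policy = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, len(BITRATES)))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def expert_action(throughput_kbps):
    """Stand-in expert: highest sustainable bitrate for the known throughput."""
    ok = [i for i, b in enumerate(BITRATES) if b <= throughput_kbps]
    return ok[-1] if ok else 0

for step in range(200):                    # behavioral cloning loop
    throughput = torch.rand(16, 1) * 3000
    buffer = torch.rand(16, 1) * 10
    last_q = torch.randint(0, len(BITRATES), (16, 1)).float()
    state = torch.cat([throughput / 3000, buffer / 10, last_q / 4], dim=1)
    target = torch.tensor([expert_action(t.item()) for t in throughput])
    loss = loss_fn(policy(state), target)  # match the expert's choice
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```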

Journal ArticleDOI
Liyang Sun1, Fanyi Duanmu1, Yong Liu1, Yao Wang1, Yinghua Ye2, Hang Shi2, David Dai2 
TL;DR: The proposed two-tier system achieves a high level of quality of experience in the face of network bandwidth and user FoV dynamics, with periodic and adaptive optimization frameworks that adapt to bandwidth variations and FoV prediction errors in real time.
Abstract: 360° video on-demand streaming is a key component of the emerging virtual reality and augmented reality applications. In such applications, sending the entire 360° video demands extremely high network bandwidth that may not be affordable by today’s networks. On the other hand, sending only the predicted user’s field of view (FoV) is not viable as it is hard to achieve perfect FoV prediction in on-demand streaming, where it is better to prefetch the video multiple seconds ahead to absorb the network bandwidth fluctuation. This paper proposes a two-tier solution, where the base tier delivers the entire 360° span at a lower quality with a long prefetching buffer, and the enhancement tier delivers the predicted FoV at a higher quality using a short buffer. The base tier provides robustness to both network bandwidth variations and FoV prediction errors. The enhancement tier improves the video quality if it is delivered in time and the FoV prediction is accurate. We study the optimal rate allocation between the two tiers and buffer provisioning for the enhancement tier to achieve the optimal trade-off between video quality and streaming robustness. We also design periodic and adaptive optimization frameworks to adapt to the bandwidth variations and FoV prediction errors in real time. Through simulations driven by real LTE and WiGig network bandwidth traces and user FoV traces, we demonstrate that the proposed two-tier system can achieve a high level of quality of experience in the face of network bandwidth and user FoV dynamics.
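As a toy illustration of the rate split discussed above, the sketch below reserves a floor for the base tier (robustness first) and gives the remainder to the FoV enhancement tier. The split rule and numbers are assumptions; the paper derives the allocation from an optimization.

```python
def allocate_rates(total_kbps, base_floor_kbps=2000, base_share=0.3):
    """Split a rate budget between full-sphere base and FoV enhancement tiers."""
    base = max(base_floor_kbps, total_kbps * base_share)   # robustness floor
    return base, max(0.0, total_kbps - base)               # rest to the FoV tier

base, enh = allocate_rates(10000)
print(base, enh)   # e.g. 3000.0 kbps full-sphere base, 7000.0 kbps FoV enhancement
```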

Proceedings ArticleDOI
11 Oct 2019
TL;DR: A large-scale measurement campaign on an operational mobile video telephony service is conducted, showing that the application-layer video codec and transport-layer protocols remain highly uncoordinated, which represents one major reason for the low QoE.
Abstract: Despite the pervasive use of real-time video telephony services, the users' quality of experience (QoE) remains unsatisfactory, especially over the mobile Internet. Previous work studied the problem via controlled experiments, while a systematic and in-depth investigation in the wild is still missing. To bridge the gap, we conduct a large-scale measurement campaign on an operational mobile video telephony service. Our measurement logs fine-grained performance metrics over 1 million video call sessions. Our analysis shows that the application-layer video codec and transport-layer protocols remain highly uncoordinated, which represents one major reason for the low QoE. We thus propose a machine learning based framework to resolve the issue. Instead of blindly following the transport layer's estimation of network capacity, the framework reviews historical logs of both layers and extracts high-level features of codec/network dynamics, based on which it determines the highest bitrates for forthcoming video frames without incurring congestion. To attain this ability, we train the framework with the aforementioned massive data traces using a custom-designed imitation learning algorithm, which enables it to learn from past experience. We have implemented the framework and incorporated it into the measured service. Our experiments show that it outperforms state-of-the-art solutions, improving video quality while reducing stalling time severalfold under various practical scenarios.

Proceedings ArticleDOI
05 Jun 2019
TL;DR: A no-reference video quality machine learning model that uses only the recorded video to predict video quality scores and outperforms VMAF for subjective gaming QoE prediction, even though it does not require any reference video.
Abstract: Popularity of streaming services for gaming videos has increased tremendously over the last years, e.g., Twitch and YouTube Gaming. Compared to classical video streaming applications, gaming videos have additional requirements. For example, it is important that videos are streamed live with only a small delay. In addition, users expect low stalling, short waiting times, and in general high video quality during streaming, e.g., using HTTP-based adaptive streaming. These requirements lead to different challenges for quality prediction in the case of streamed gaming videos. We describe newly developed features and a no-reference video quality machine learning model that uses only the recorded video to predict video quality scores. In different evaluation experiments, we compare our proposed model nofu with state-of-the-art reduced- or full-reference models and metrics. In addition, we trained a no-reference baseline model using BRISQUE+NIQE features. We show that our model has a similar or better performance than other models. Furthermore, nofu outperforms VMAF for subjective gaming QoE prediction, even though nofu does not require any reference video.

Journal ArticleDOI
TL;DR: This study developed a novel architecture for no-reference VQA based on features obtained from pretrained convolutional neural networks, transfer learning, temporal pooling, and regression; experiments demonstrated that the proposed method performs better than other state-of-the-art algorithms.
Abstract: Video quality assessment (VQA) is an important element of various applications ranging from automatic video streaming to display technology. Furthermore, visual quality measurements require a balanced investigation of visual content and features. Previous studies have shown that the features extracted from a pretrained convolutional neural network are highly effective for a wide range of applications in image processing and computer vision. In this study, we developed a novel architecture for no-reference VQA based on the features obtained from pretrained convolutional neural networks, transfer learning, temporal pooling, and regression. In particular, we obtained solutions by only applying temporally pooled deep features and without using manually derived features. The proposed architecture was trained based on the recently published Konstanz natural video quality database (KoNViD-1k), which contains 1200 video sequences with authentic distortion unlike other publicly available databases. The experimental results obtained based on KoNViD-1k demonstrated that the proposed method performed better than other state-of-the-art algorithms. Furthermore, these results were confirmed by tests using the LIVE VQA database, which contains artificially distorted videos.
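The stated pipeline, per-frame deep features, temporal pooling, then regression, is compact enough to sketch. The ResNet-18 backbone, mean/std pooling, and SVR regressor below are assumed choices within the family the paper describes, not its exact configuration.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVR

backbone = models.resnet18(weights=None)          # pretrained weights in practice
backbone.fc = torch.nn.Identity()                 # expose 512-d features
backbone.eval()

def video_descriptor(frames):                     # frames: (T, 3, H, W)
    with torch.no_grad():
        f = backbone(frames).numpy()              # (T, 512) per-frame features
    return np.concatenate([f.mean(axis=0), f.std(axis=0)])   # temporal pooling

X = np.stack([video_descriptor(torch.rand(8, 3, 112, 112)) for _ in range(10)])
y = np.random.rand(10) * 5                        # synthetic MOS labels
reg = SVR().fit(X, y)                             # pooled features -> quality
print(reg.predict(X[:2]))
```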

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A study of subjective and objective quality assessment of 4K ultra-high-definition videos of short duration, similar to DASH segment lengths, covering a wide range of conditions so that models trained on the data are more general and applicable to a wider range of real-world applications.
Abstract: 4K television screens, and even screens with higher resolutions, are currently available on the market. Moreover, video streaming providers are able to stream videos in 4K resolution and beyond. Therefore, it becomes increasingly important to have a proper understanding of video quality, especially in the case of 4K videos. To this effect, in this paper, we present a study of subjective and objective quality assessment of 4K ultra-high-definition videos of short duration, similar to DASH segment lengths. As a first step, we conducted four subjective quality evaluation tests for compressed versions of the 4K videos. The videos were encoded using three different video codecs, namely H.264, HEVC, and VP9. The resolutions of the compressed videos ranged from 360p to 2160p, with framerates varying from 15 fps to 60 fps. All the source 4K contents used were at 60 fps. We included low-quality conditions in terms of bitrate, resolution, and framerate to ensure that the tests cover a wide range of conditions, and that, e.g., possible models trained on this data are more general and applicable to a wider range of real world applications. The results of the subjective quality evaluation are analyzed to assess the impact of different factors such as bitrate, resolution, framerate, and content. In the second step, different state-of-the-art objective quality models, e.g., Netflix's VMAF, were applied to all videos and their performance was analyzed in comparison with the subjective ratings. The videos, subjective scores (both MOS and confidence interval per sequence), and objective scores are made public for use by the community for further research.

Posted Content
TL;DR: In this article, the authors develop models that infer quality metrics (i.e., startup delay and resolution) for encrypted streaming video services, demonstrate that the models are practical through a 16-month deployment in 66 homes, and provide new insights about the relationship between Internet speed and the quality of the corresponding video streams for a variety of services.
Abstract: Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (i.e., startup delay and resolution) for encrypted streaming video services. Our paper builds on previous work, but extends it in several ways. First, the model works in deployment settings where the video sessions and segments must be identified from a mix of traffic and the time precision of the collected traffic statistics is more coarse (e.g., due to aggregation). Second, we develop a single composite model that works for a range of different services (i.e., Netflix, YouTube, Amazon, and Twitch), as opposed to just a single service. Third, unlike many previous models, the model performs predictions at finer granularity (e.g., the precise startup delay instead of just detecting short versus long delays), allowing us to draw better conclusions on the ongoing streaming quality. Fourth, we demonstrate the model is practical through a 16-month deployment in 66 homes and provide new insights about the relationships between Internet "speed" and the quality of the corresponding video streams, for a variety of services; we find that higher speeds provide only minimal improvements to startup delay and resolution.

Journal ArticleDOI
TL;DR: YouTube is a major modality for patient education, yet the quality of facial plastic surgery content on YouTube has not been evaluated; this study assesses YouTube as an informative resource on facial plastic surgery procedures.
Abstract: This study investigates the video quality and creator qualification of YouTube videos about facial plastic surgery procedures.

Journal ArticleDOI
TL;DR: An application-layer scheme to jointly exploit the available bandwidth from the LTE and Wi-Fi networks in 360-degree video streaming, together with a novel buffer strategy to mitigate the influence of the short-time prediction problem when transmitting 360-degree videos over time-varying networks.

Proceedings ArticleDOI
01 Apr 2019
TL;DR: The architecture, called iView, intelligently determines video quality and reduces latency without pre-programmed models or assumptions, advocating multimodal learning and deep reinforcement learning in the design.
Abstract: Recently, the fusion of 360° video and multi-viewpoint video, called multi-viewpoint (MVP) 360° interactive video, has emerged and created much more immersive and interactive user experience, but calls for a low latency solution to request the high-definition contents. Such viewing-related features as head movement have been recently studied, but several key issues still need to be addressed. On the viewer side, it is not clear how to effectively integrate different types of viewing-related features. At the session level, questions such as how to optimize the video quality under dynamic networking conditions and how to build an end-to-end mapping between these features and the quality selection remain to be answered. The solutions to these questions are further complicated given the many practical challenges, e.g., incomplete feature extraction and inaccurate prediction. This paper presents an architecture, called iView, to address the aforementioned issues in an MVP 360° interactive video scenario. To fully understand the viewing-related features and provide a one-step solution, we advocate multimodal learning and deep reinforcement learning in the design. iView intelligently determines video quality and reduces the latency without pre-programmed models or assumptions. We have evaluated iView with multiple real-world video and network datasets. The results showed that our solution effectively utilizes the features of video frames, networking throughput, head movements, and viewpoint selections, achieving at least 27.2%, 15.4%, and 2.8% improvements on the three video datasets, respectively, compared with several state-of-the-art methods.

Journal ArticleDOI
TL;DR: A general-purpose no-reference video quality assessment algorithm based on a long short-term memory (LSTM) network and a pretrained convolutional neural network (CNN) is introduced, which outperforms other state-of-the-art algorithms.
Abstract: A general-purpose no-reference video quality assessment algorithm based on a long short-term memory (LSTM) network and a pretrained convolutional neural network (CNN) is introduced. Considering video sequences as a time series of deep features extracted with the help of a CNN, an LSTM network is trained to predict subjective quality scores. In contrast to previous methods, the resulting algorithm was trained on the recently published Konstanz Natural Video Quality Database (KoNViD-1k), which is the only publicly available database that contains sequences with authentic distortions. The results of experiments on KoNViD-1k demonstrate that the proposed method outperforms other state-of-the-art algorithms. Furthermore, these results are also confirmed using tests on the LIVE Video Quality Assessment Database, which consists of artificially distorted videos.

Journal ArticleDOI
TL;DR: Compared with the state-of-the-art designs, the proposed design demonstrates advantages in computational complexity, bit rate, video quality, throughput, reliability, and flexibility.
Abstract: The growing demand for high-performance ultra-high-definition video coding leads to H.265/high-efficiency video coding (HEVC), where the increased computational complexity and data/timing dependence hinder its coding throughput. To address these challenges, this paper presents four algorithm adaptations and a fully parallel hardware architecture for an H.265/HEVC intra encoder. To the best of our knowledge, this is the first fully parallel H.265/HEVC intra encoder. This design supports 35 prediction modes and all coding tree unit partitions. All prediction units (PUs) are independently processed in four prediction engines for high parallelism. An appropriate set of intra prediction modes, RDO candidates, and CABAC rate estimate instances is assigned to each prediction engine, where internal computational tasks are pipelined and scheduled to maximize the processing throughput. Compared with the HM-15.0 software, the proposed algorithm adaptations lead to a reduction of 27% in computational workload, while the average BD-rate and BD-PSNR are 4.39% and −0.21 dB, respectively. This BD-rate is lower than the existing designs with the same video resolution. FPGA implementation of the proposed design shows that it operates at 120 MHz and supports 45 fps of 1080P video sequences using 201-K logic elements and 120-KB on-chip SRAM. ASIC implementation of the proposed design in TSMC 90-nm technology shows that its clock frequency reaches 320 MHz with a hardware gate count of 2288 K, and that it supports real-time encoding of 30 fps of 4-K video sequences. Compared with the state-of-the-art designs, our proposed design demonstrates advantages in computational complexity, bit rate, video quality, throughput, reliability, and flexibility.
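The BD-rate figure quoted above comes from the standard Bjøntegaard metric: fit log-bitrate versus PSNR curves for two encoders and average the horizontal gap over the shared PSNR range. A compact sketch with toy rate-distortion points (the sample numbers are illustrative, not from the paper):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def bd_rate(r1, p1, r2, p2):
    """Average bitrate difference (%) of encoder 2 relative to encoder 1."""
    lr1, lr2 = np.log(r1), np.log(r2)
    lo, hi = max(min(p1), min(p2)), min(max(p1), max(p2))   # shared PSNR range
    i1 = P.polyint(P.polyfit(p1, lr1, 3))    # integral of log-rate over PSNR
    i2 = P.polyint(P.polyfit(p2, lr2, 3))
    avg = ((P.polyval(hi, i2) - P.polyval(lo, i2))
           - (P.polyval(hi, i1) - P.polyval(lo, i1))) / (hi - lo)
    return (np.exp(avg) - 1) * 100

r_ref = [1000, 2000, 4000, 8000]; psnr_ref = [33.0, 36.0, 39.0, 42.0]
r_new = [1100, 2150, 4300, 8500]; psnr_new = [33.1, 36.2, 39.1, 42.2]
print(bd_rate(r_ref, psnr_ref, r_new, psnr_new))  # positive = % more bits needed
```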

Journal ArticleDOI
TL;DR: The proposed algorithm, which is completely blind (requiring no reference videos or training on subjective scores), is called the Motion and Disparity-based 3D video quality evaluator (MoDi3D); it delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and the proposed LFOVIAS3DPh2 S3D video dataset.
Abstract: We present a new subjective and objective study on full high-definition (HD) stereoscopic (3D or S3D) video quality. In the subjective study, we constructed an S3D video dataset with 12 pristine and 288 test videos, and the test videos are generated by applying the H.264 and H.265 compression, blur, and frame freeze artifacts. We also propose a no reference (NR) objective video quality assessment (QA) algorithm that relies on measurements of the statistical dependencies between the motion and disparity subband coefficients of S3D videos. Inspired by the Generalized Gaussian Distribution (GGD) approach, we model the joint statistical dependencies between the motion and disparity components as following a Bivariate Generalized Gaussian Distribution (BGGD). We estimate the BGGD model parameters ($\alpha, \beta$) and the coherence measure ($\Psi$) from the eigenvalues of the sample covariance matrix (M) of the BGGD. In turn, we model the BGGD parameters of pristine S3D videos using a Multivariate Gaussian (MVG) distribution. The likelihood of a test video’s MVG model parameters coming from the pristine MVG model is computed and shown to play a key role in the overall quality estimation. We also estimate the global motion content of each video by averaging the SSIM scores between pairs of successive video frames. To estimate the test S3D video’s spatial quality, we apply the popular 2D NR unsupervised NIQE image QA model on a frame-by-frame basis on both views. The overall quality of a test S3D video is finally computed by pooling the test S3D video’s likelihood estimates, global motion strength, and spatial quality scores. The proposed algorithm, which is completely blind (requiring no reference videos or training on subjective scores), is called the Motion and Disparity-based 3D video quality evaluator (MoDi3D). We show that MoDi3D delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and our proposed LFOVIAS3DPh2 S3D video dataset.
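The pristine-model likelihood step described above follows the NIQE-style recipe: fit a multivariate Gaussian to feature vectors from pristine videos, then score a test video by the log-likelihood of its features under that model. The 3-D feature vector (alpha, beta, Psi) follows the text; the data below is synthetic.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic (alpha, beta, Psi) vectors standing in for pristine-video features.
pristine = np.random.default_rng(0).normal([1.0, 2.0, 0.5], 0.1, size=(50, 3))
mu, cov = pristine.mean(axis=0), np.cov(pristine, rowvar=False)
model = multivariate_normal(mean=mu, cov=cov)    # MVG fit to pristine features

test_feats = np.array([1.05, 1.9, 0.55])         # features of a test video
print(model.logpdf(test_feats))                  # higher = closer to pristine
```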