
Showing papers on "Video quality published in 2019"


Journal ArticleDOI
TL;DR: A new approach for learning-based video quality assessment is proposed, based on the idea of computing features at two levels: low complexity features are computed for the full sequence first, and then high complexity features are extracted from a subset of representative video frames, selected by using the low complexity features.
Abstract: Smartphones and other consumer devices capable of capturing video content and sharing it on social media in nearly real time are widely available at a reasonable cost. Thus, there is a growing need for no-reference video quality assessment (NR-VQA) of consumer-produced video content, typically characterized by capture impairments that are qualitatively different from those observed in professionally produced video content. To date, most of the NR-VQA models in prior art have been developed for assessing coding and transmission distortions, rather than capture impairments. In addition, the most accurate NR-VQA methods known in prior art are often computationally complex, and therefore impractical for many real-life applications. In this paper, we propose a new approach for learning-based video quality assessment, based on the idea of computing features at two levels: low complexity features are computed for the full sequence first, and then high complexity features are extracted from a subset of representative video frames, selected by using the low complexity features. We have compared the proposed method against several relevant benchmark methods using three recently published annotated public video quality databases, and our results show that the proposed method can predict subjective video quality more accurately than the benchmark methods. The best performing prior method achieves nearly the same accuracy, but at substantially higher computational cost.
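The two-level idea above lends itself to a short sketch. The Python below is a toy illustration under assumed feature choices (mean luminance and frame differences as the cheap features, gradient statistics as the expensive ones); the selection rule and all function names are hypothetical, not the authors' design.

```python
import numpy as np

def low_complexity_features(frames):
    """Per-frame mean luminance and temporal difference (cheap to compute)."""
    means = np.array([f.mean() for f in frames])
    diffs = np.abs(np.diff(means, prepend=means[0]))
    return np.stack([means, diffs], axis=1)

def select_representative(features, k=8):
    """Pick k frames spread across the range of low-complexity activity."""
    activity = features[:, 1]
    order = np.argsort(activity)
    idx = order[np.linspace(0, len(order) - 1, k).astype(int)]
    return np.sort(idx)

def high_complexity_features(frame):
    """Stand-in for an expensive descriptor (here: gradient-magnitude stats)."""
    gy, gx = np.gradient(frame.astype(np.float64))
    mag = np.hypot(gx, gy)
    return np.array([mag.mean(), mag.std()])

# Toy usage on random "frames" (H x W grayscale arrays).
frames = [np.random.rand(72, 128) for _ in range(120)]
lo = low_complexity_features(frames)
subset = select_representative(lo, k=8)
hi = np.array([high_complexity_features(frames[i]) for i in subset])
video_feature = np.concatenate([lo.mean(axis=0), hi.mean(axis=0)])
print(video_feature.shape)  # fixed-length vector for a quality regressor
```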

203 citations


Journal ArticleDOI
TL;DR: The LIVE Video Quality Challenge Database (LIVE-VQC), presented in this paper, is a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions.
Abstract: The great variations in videographic skills, camera designs, compression and processing protocols, communication and bandwidth environments, and displays lead to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, complex, and often commingled distortions that are difficult or impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Toward advancing NR video quality prediction, we have constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4776 unique participants took part in the study, yielding over 205,000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC), by conducting a comparison with leading NR video quality predictors on it. This is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is available for download at this link: http://live.ece.utexas.edu/research/LIVEVQC/index.html .

176 citations


Proceedings ArticleDOI
15 Oct 2019
TL;DR: This work proposes an objective no-reference video quality assessment method that integrates both content-dependency and temporal-memory effects into a deep neural network, and outperforms five state-of-the-art methods by a large margin.
Abstract: Quality assessment of in-the-wild videos is a challenging problem because of the absence of reference videos and shooting distortions. Knowledge of the human visual system can help establish methods for objective quality assessment of in-the-wild videos. In this work, we show that two eminent effects of the human visual system, namely content-dependency and temporal-memory effects, can be used for this purpose. We propose an objective no-reference video quality assessment method by integrating both effects into a deep neural network. For content-dependency, we extract features from a pre-trained image classification neural network for its inherent content-aware property. For temporal-memory effects, long-term dependencies, especially the temporal hysteresis, are integrated into the network with a gated recurrent unit and a subjectively-inspired temporal pooling layer. To validate the performance of our method, experiments are conducted on three publicly available in-the-wild video quality assessment databases: KoNViD-1k, CVD2014, and LIVE-Qualcomm. Experimental results demonstrate that our proposed method outperforms five state-of-the-art methods by a large margin, specifically, 12.39%, 15.71%, 15.45%, and 18.09% overall performance improvements over the second-best method VBLIINDS, in terms of SROCC, KROCC, PLCC, and RMSE, respectively. Moreover, the ablation study verifies the crucial role of both the content-aware features and the modeling of temporal-memory effects. The PyTorch implementation of our method is released at https://github.com/lidq92/VSFA.
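A minimal PyTorch sketch of the pipeline shape described above: content-aware features from a pretrained classifier, a GRU for long-term dependencies, and a crude hysteresis-like pooling step. Layer sizes, the pooling window, and the TinyVSFA name are simplifications and assumptions; the authors' released code is the reference implementation.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TinyVSFA(nn.Module):
    def __init__(self, feat_dim=512, hidden=32):
        super().__init__()
        backbone = models.resnet18(weights=None)  # pretrained weights in practice
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, frames):           # frames: (T, 3, H, W), one video
        with torch.no_grad():            # content-aware features, not finetuned
            f = self.features(frames).flatten(1)   # (T, feat_dim)
        h, _ = self.gru(f.unsqueeze(0))            # (1, T, hidden)
        q = self.head(h).squeeze()                 # per-frame quality, (T,)
        # Crude hysteresis-like pooling: recent minimum, then average, reflecting
        # that viewers punish quality drops more than they reward recovery.
        pooled = torch.stack([q[max(0, t - 5):t + 1].min() for t in range(len(q))])
        return pooled.mean()

score = TinyVSFA()(torch.rand(16, 3, 112, 112))
print(float(score))
```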

170 citations


Journal ArticleDOI
TL;DR: A comprehensive overview of recent and ongoing work in the field of QoE modeling for HTTP adaptive streaming is presented, as well as existing challenges and shortcomings.
Abstract: With the increased usage of video services, the focus has recently shifted from the traditional quality of service-based video delivery to quality of experience (QoE)-based video delivery. Over the past 15 years, many video quality assessment metrics have been proposed with the goal to predict the video quality as perceived by the end user. HTTP adaptive streaming (HAS) has recently gained much attention and is currently used by the majority of video streaming services, such as Netflix and YouTube. HAS, using reliable transport protocols such as TCP, does not suffer from image artifacts due to packet losses, which are common in traditional streaming technologies. Hence, the QoE models developed for other streaming technologies alone are not sufficient. Recently, many works have focused on developing QoE models targeting HAS-based applications. Also, the recently published ITU-T Recommendation series P.1203 proposes a parametric bitstream-based model for the quality assessment of progressive download and adaptive audiovisual streaming services over a reliable transport. The main contribution of this paper is to present a comprehensive overview of recent and ongoing work in the field of QoE modeling for HAS. The HAS QoE models, influence factors, and subjective test methodologies are discussed, as well as existing challenges and shortcomings. The survey can serve as a guideline for researchers interested in QoE modeling for HAS and also discusses possible future work.

112 citations


Journal ArticleDOI
TL;DR: In rigorous experiments, the proposed algorithms demonstrate state-of-the-art performance on multiple video applications and are made available as part of the open source package at https://github.com/Netflix/vmaf.
Abstract: The recently developed video multi-method assessment fusion (VMAF) framework integrates multiple quality-aware features to accurately predict the video quality. However, VMAF does not yet exploit important principles of temporal perception that are relevant to perceptual video distortion measurement. Here, we propose two improvements to the VMAF framework, called spatiotemporal VMAF and ensemble VMAF, which leverage perceptually-motivated space–time features that are efficiently calculated at multiple scales. We also conducted a large subjective video study, which we have found to be an excellent resource for training our feature-based approaches. In rigorous experiments, we found that the proposed algorithms demonstrate state-of-the-art performance on multiple video applications. The compared algorithms will be made available as part of the open source package at https://github.com/Netflix/vmaf .
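The two ingredients named above, multi-scale space-time features and ensemble fusion, can be sketched generically. The frame-difference features and the two-model ensemble below are illustrative stand-ins, not the actual spatiotemporal VMAF or ensemble VMAF definitions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def spacetime_features(video, scales=(1, 2, 4)):
    """video: (T, H, W) grayscale. Frame-difference energy at several scales."""
    feats = []
    for s in scales:
        v = video[:, ::s, ::s]                  # cheap spatial downscale
        td = np.abs(np.diff(v, axis=0))         # temporal gradient
        feats += [td.mean(), td.std()]
    return np.array(feats)

# Toy training data: feature vectors vs. (synthetic) subjective scores.
X = np.array([spacetime_features(np.random.rand(24, 64, 64)) for _ in range(40)])
y = np.random.rand(40) * 100

models = [RandomForestRegressor(n_estimators=50, random_state=0), SVR()]
for m in models:
    m.fit(X, y)
ensemble_score = np.mean([m.predict(X[:1])[0] for m in models])  # fused prediction
print(ensemble_score)
```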

90 citations


Journal ArticleDOI
TL;DR: A new mobile video quality database containing videos afflicted with distortions caused by 26 different stalling patterns is presented; the database is made publicly available to help advance state-of-the-art research on user-centric mobile network planning and management.
Abstract: Over-the-top mobile adaptive video streaming is invariably influenced by volatile network conditions, which can cause playback interruptions (stalling or rebuffering events) and bitrate fluctuations, thereby impairing users’ quality of experience (QoE). Video quality assessment models that can accurately predict users’ QoE under such volatile network conditions are rapidly gaining attention, since these methods could enable more efficient design of quality control protocols for media-driven services such as YouTube, Amazon, Netflix, and many others. However, the development of improved QoE prediction models requires data sets of videos afflicted with diverse stalling events that have been labeled with ground-truth subjective opinion scores. Toward this end, we have created a new mobile video quality database that we call LIVE Mobile Stall Video Database-II. Our database contains a total of 174 videos afflicted with distortions caused by 26 different stalling patterns. We describe the way we simulated the diverse stalling events to create a corpus of distorted videos, and we detail the human study we conducted to obtain continuous-time subjective scores from 54 subjects. We also present the outcomes of our comprehensive analysis of the impact of several factors that influence subjective QoE, and report the performance of existing QoE-prediction models on our data set. We are making the database (videos, subjective data, and video metadata) publicly available in order to help advance state-of-the-art research on user-centric mobile network planning and management. The database may be accessed at http://live.ece.utexas.edu/research/LIVEStallStudy/liveMobile.html .
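For intuition, here is a minimal sketch of how a stalling pattern might be imposed on a pristine playback timeline, in the spirit of the simulated stalling events described above. The (start, duration) pattern format and the frame-freezing rule are assumptions for illustration.

```python
import numpy as np

def apply_stalls(frames, fps, stalls):
    """frames: list of frames; stalls: list of (start_sec, duration_sec)."""
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        t = i / fps
        for start, dur in stalls:
            if abs(t - start) < 0.5 / fps:                   # stall hits this frame
                out.extend([frame] * int(round(dur * fps)))  # frozen frames
    return out

frames = [np.full((4, 4), i, dtype=np.uint8) for i in range(90)]  # 3 s @ 30 fps
stalled = apply_stalls(frames, fps=30, stalls=[(1.0, 0.5), (2.0, 1.0)])
print(len(frames), len(stalled))  # 90 -> 135 frames after two stalls
```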

87 citations


Journal ArticleDOI
Yu Zhang1, Xinbo Gao1, Lihuo He1, Wen Lu1, Ran He1 
TL;DR: A general-purpose no-reference VQA framework based on weakly supervised learning with a convolutional neural network (CNN) and a resampling strategy; it is on a par with some state-of-the-art VQA metrics and has promising robustness.
Abstract: Due to the 3D spatiotemporal regularities of natural videos and small-scale video quality databases, effective objective video quality assessment (VQA) metrics are difficult to obtain but highly desirable. In this paper, we propose a general-purpose no-reference VQA framework that is based on weakly supervised learning with a convolutional neural network (CNN) and a resampling strategy. First, an eight-layer CNN is trained by weakly supervised learning to construct the relationship between the deformations of the 3D discrete cosine transform of video blocks and the corresponding weak labels judged by a full-reference (FR) VQA metric. Thus, the CNN obtains the quality assessment capacity converted from the FR-VQA metric, and the effective features of the distorted videos can be extracted through the trained network. Then, we map the frequency histogram calculated from the quality score vectors predicted by the trained network onto the perceptual quality. In particular, to improve the performance of the mapping function, we transfer the frequency histograms of the distorted images and videos to resample the training set. The experiments are carried out on several widely used VQA databases. The experimental results demonstrate that the proposed method is on a par with some state-of-the-art VQA metrics and has promising robustness.
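The 3D-DCT front end is easy to sketch: split the video into small space-time blocks, transform each, and summarize coefficient statistics whose histogram is later mapped to quality. The block size and the DC-to-AC summary below are assumptions; the paper instead trains a CNN on such blocks using FR-metric weak labels.

```python
import numpy as np
from scipy.fft import dctn

def block_3d_dct_features(video, bs=8):
    """video: (T, H, W) array; returns per-block DC-to-AC energy ratios."""
    T, H, W = video.shape
    feats = []
    for t in range(0, T - bs + 1, bs):
        for y in range(0, H - bs + 1, bs):
            for x in range(0, W - bs + 1, bs):
                block = video[t:t+bs, y:y+bs, x:x+bs]
                c = dctn(block, norm='ortho')       # 3D DCT of the block
                dc = abs(c[0, 0, 0])
                ac = np.abs(c).sum() - dc
                feats.append(dc / (ac + 1e-8))
    return np.array(feats)

video = np.random.rand(16, 32, 32)
f = block_3d_dct_features(video)
hist, _ = np.histogram(f, bins=10, range=(0, 2))   # frequency histogram,
print(hist / max(hist.sum(), 1))                    # mapped to quality downstream
```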

76 citations


Proceedings ArticleDOI
01 Sep 2019
TL;DR: Experimental results on two publicly available video quality datasets demonstrate that the proposed quality metric outperforms the other compared NR quality metrics.
Abstract: Video quality assessment (VQA) is a challenging task due to the complexity of modeling perceived quality characteristics in both the spatial and temporal domains. A novel no-reference (NR) video quality metric (VQM) is proposed in this paper based on two deep neural networks (NN), namely a 3D convolution network (3D-CNN) and a recurrent NN composed of long short-term memory (LSTM) units. 3D-CNNs are utilized to extract local spatiotemporal features from small cubic clips in the video, and the features are then fed into the LSTM networks to predict the perceived video quality. Such a design can elaborately tackle the issue of insufficient training data while also efficiently capturing perceptual quality features in both the spatial and temporal domains. Experimental results on two publicly available video quality datasets demonstrate that the proposed quality metric outperforms the other compared NR quality metrics.
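A compact PyTorch sketch matching the described design: a small 3D-CNN embeds cubic clips, and an LSTM aggregates the clip sequence into one score. Channel counts, clip sizes, and the class name are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class C3DLSTMQuality(nn.Module):
    def __init__(self):
        super().__init__()
        self.c3d = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.lstm = nn.LSTM(32, 64, batch_first=True)
        self.head = nn.Linear(64, 1)

    def forward(self, clips):            # clips: (B, N, 1, D, H, W)
        B, N = clips.shape[:2]
        f = self.c3d(clips.flatten(0, 1)).flatten(1)   # (B*N, 32) clip features
        h, _ = self.lstm(f.view(B, N, -1))             # (B, N, 64) over clip sequence
        return self.head(h[:, -1]).squeeze(-1)         # (B,) predicted quality

model = C3DLSTMQuality()
clips = torch.rand(2, 6, 1, 8, 16, 16)   # 2 videos x 6 cubic clips each
print(model(clips).shape)                # torch.Size([2])
```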

70 citations


Proceedings ArticleDOI
Chen Li1, Mai Xu1, Lai Jiang1, Shanyi Zhang1, Xiaoming Tao2 
15 Jun 2019
TL;DR: A viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, considering the auxiliary tasks of viewport proposal and viewport saliency prediction; experiments validate the effectiveness of the approach, which also achieves comparable performance on the two auxiliary tasks.
Abstract: Recent years have witnessed the growing interest in visual quality assessment (VQA) for 360° video. Unfortunately, the existing VQA approaches do not consider the facts that: 1) Observers only see viewports of 360° video, rather than patches or whole 360° frames. 2) Within the viewport, only salient regions can be perceived by observers with high resolution. Thus, this paper proposes a viewport-based convolutional neural network (V-CNN) approach for VQA on 360° video, considering both auxiliary tasks of viewport proposal and viewport saliency prediction. Our V-CNN approach is composed of two stages, i.e., viewport proposal and VQA. In the first stage, the viewport proposal network (VP-net) is developed to yield several potential viewports, seen as the first auxiliary task. In the second stage, a viewport quality network (VQ-net) is designed to rate the VQA score for each proposed viewport, in which the saliency map of the viewport is predicted and then utilized in VQA score rating. Consequently, another auxiliary task of viewport saliency prediction can be achieved. More importantly, the main task of VQA on 360° video can be accomplished via integrating the VQA scores of all viewports. The experiments validate the effectiveness of our V-CNN approach in significantly advancing the state-of-the-art performance of VQA on 360° video. In addition, our approach achieves comparable performance in two auxiliary tasks. The code of our V-CNN approach is available at https://github.com/Archer-Tatsu/V-CNN.
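The final integration step, pooling per-viewport scores into one video score, can be illustrated in a few lines. Weighting each proposed viewport's score by its predicted saliency mass is one plausible reading of the text; the numbers and the weighting rule below are hypothetical.

```python
import numpy as np

viewport_scores = np.array([3.8, 4.1, 2.9])   # VQ-net outputs (hypothetical)
saliency_mass = np.array([0.5, 0.3, 0.2])     # mass of each predicted saliency map

video_score = np.average(viewport_scores, weights=saliency_mass)
print(video_score)   # one quality score for the whole 360-degree video
```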

64 citations


Journal ArticleDOI
TL;DR: Two machine learning models are developed and incorporated into the MEC App for popular video prediction and radio channel quality prediction, which makes it possible to consider the effect of non-negligible round-trip times and adjust the video quality more accurately.
Abstract: ETSI multi-access edge computing (MEC) provides an IT service environment and cloud-computing capabilities at the edge of the mobile network, enabling application and content providers to deploy new use cases, such as intelligent video acceleration, with low latency and high bandwidth. Specifically, ETSI MEC introduces an MEC server that implements the edge-cloud platform to host partial server-side service logics in the form of MEC applications (MEC Apps). In this paper, we aim to implement the first proof-of-concept (PoC) in the literature for the MEC-enhanced mobile video streaming service. Our PoC consists of Android User Apps, an MEC App, and the YouTube server. The MEC App implements two main functions: popular video caching and radio analytics/video quality adaptation. The User App provides general functions of a YouTube video streaming app and can access the videos from the cache server or the YouTube server under the MEC server's guidance. In addition to the PoC implementation, this paper further develops two machine learning models to be incorporated into the MEC App for popular video prediction and radio channel quality prediction, which makes it possible to consider the effect of non-negligible round-trip times and adjust the video quality more accurately. The experimental results confirm that our models, together with other advantages from MEC, can guarantee good performance for the mobile video streaming service. Finally, we model and investigate the effectiveness of the MEC architecture for improving the quality of experience of video-streaming users.

60 citations


Proceedings ArticleDOI
18 Jun 2019
TL;DR: This work develops and presents Requet, a system for REal-time QUality of experience metric detection for Encrypted Traffic, and shows that Requet outperforms a baseline system in the accuracy of predicting buffer low warnings, video state, and video resolution.
Abstract: As video traffic dominates the Internet, it is important for operators to detect video Quality of Experience (QoE) in order to ensure adequate support for video traffic. With wide deployment of end-to-end encryption, traditional deep packet inspection based traffic monitoring approaches are becoming ineffective. This poses a challenge for network operators to monitor user QoE and improve upon their experience. To resolve this issue, we develop and present a system for REal-time QUality of experience metric detection for Encrypted Traffic, Requet. Requet uses a detection algorithm we develop to identify video and audio chunks from the IP headers of encrypted traffic. Features extracted from the chunk statistics are used as input to a Machine Learning (ML) algorithm to predict QoE metrics, specifically, buffer warning (low buffer, high buffer), video state (buffer increase, buffer decay, steady, stall), and video resolution. We collect a large YouTube dataset consisting of diverse video assets delivered over various WiFi network conditions to evaluate the performance. We compare Requet with a baseline system based on previous work and show that Requet outperforms the baseline system in accuracy of predicting buffer low warning, video state, and video resolution by 1.12X, 1.53X, and 3.14X, respectively.
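The pipeline shape described for Requet can be sketched end to end: group encrypted packets into chunks from IP-header timing and size information, summarize chunk statistics, and classify. The gap-based chunking heuristic and the features below are illustrative assumptions, not Requet's actual detection algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def packets_to_chunks(times, sizes, gap=0.1):
    """Split a (time, size) packet stream into chunks at idle gaps."""
    chunks, cur = [], [sizes[0]]
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > gap:   # silence gap ends the current chunk
            chunks.append(cur)
            cur = []
        cur.append(sizes[i])
    chunks.append(cur)
    return chunks

def chunk_features(chunks):
    sizes = np.array([sum(c) for c in chunks], dtype=float)
    return np.array([sizes.mean(), sizes.std(), len(chunks), sizes.max()])

# Toy training set: chunk-statistic features labeled with a video state.
rng = np.random.default_rng(0)
X = np.array([chunk_features(packets_to_chunks(np.sort(rng.uniform(0, 10, 200)),
                                               rng.integers(100, 1500, 200)))
              for _ in range(30)])
y = rng.choice(['increase', 'steady', 'stall'], size=30)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```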

Journal ArticleDOI
TL;DR: Two NR machine learning-based quality estimation models for gaming video streaming, NR-GVSQI and NR-GVSQE, are presented, and it is shown that the proposed models outperform the current state-of-the-art no-reference metrics while also reaching a prediction accuracy comparable to the best known full-reference metric.
Abstract: Recent years have seen increasing growth and popularity of gaming services, both interactive and passive. While interactive gaming video streaming applications have received much attention, passive gaming video streaming, in spite of its huge success and growth in recent years, has seen much less interest from the research community. For the continued growth of such services in the future, it is imperative that the end user gaming quality of experience (QoE) is estimated so that it can be controlled and maximized to ensure user acceptance. Previous quality assessment studies have shown the not-so-satisfactory performance of existing no-reference (NR) video quality assessment (VQA) metrics. Also, due to the inherent nature and different requirements of gaming video streaming applications, as well as the fact that gaming videos are perceived differently from non-gaming content (as they are usually computer generated and contain artificial/synthetic content), there is a need for application-specific, lightweight, no-reference gaming video quality prediction models. In this paper, we present two NR machine learning-based quality estimation models for gaming video streaming, NR-GVSQI and NR-GVSQE, using NR features such as bitrate, resolution, and temporal information. We evaluate their performance on different gaming video datasets and show that the proposed models outperform the current state-of-the-art no-reference metrics, while also reaching a prediction accuracy comparable to the best known full-reference metric.
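In the spirit of the described models, quality can be regressed from lightweight NR features such as bitrate, resolution, and temporal information. The regressor choice and synthetic data below are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Features per stream: [bitrate_kbps, height_px, temporal_information]
X = np.array([[600, 480, 25.0], [1500, 720, 30.5], [4000, 1080, 28.2],
              [900, 480, 60.1], [2500, 720, 55.7], [6000, 1080, 50.3]])
y = np.array([2.1, 3.4, 4.5, 1.8, 3.0, 4.2])     # synthetic MOS labels

model = GradientBoostingRegressor(random_state=0).fit(X, y)
print(model.predict([[2000, 720, 40.0]]))        # predicted MOS for a new stream
```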

Journal ArticleDOI
01 Feb 2019
TL;DR: The experimental results show that the proposed Software-Defined UAV Networking (SD-UAVNet) architecture can effectively mitigate the challenges of UAVNets and provides suitable Quality of Experience (QoE) to end-users.
Abstract: Unmanned Aerial Vehicles (UAVs) empower people to reach endangered areas in emergency situations. By collaborating with each other, multiple UAVs forming a UAV network (UAVNet) can work together to perform specific tasks in a more efficient and intelligent way than a single UAV. UAVNets pose special characteristics of high dynamics, unstable aerial wireless links, and UAV collision probabilities. To address these challenges, we propose a Software-Defined UAV Networking (SD-UAVNet) architecture, which facilitates the management of UAV networks through a centralized SDN UAV controller. In addition, we introduce a use-case scenario to evaluate the optimal UAV relay node placement for live video surveillance services with the proposed architecture. In the SD-UAVNet architecture, the controller considers global UAV-relevant context information to optimize the UAVs’ movements, select proper routing paths, and prevent UAV collisions, in order to determine relay node deployment and guarantee satisfactory video quality. The experimental results show that the proposed SD-UAVNet architecture can effectively mitigate the challenges of UAVNets and provides suitable Quality of Experience (QoE) to end-users.

Journal ArticleDOI
TL;DR: This paper proposes a video delivery strategy for dynamic streaming services which maximizes time-average streaming quality under a playback delay constraint in wireless caching networks and proves that the proposed video delivery algorithm works reliably and can control the tradeoff between video quality and playback latency.
Abstract: This paper proposes a video delivery strategy for dynamic streaming services which maximizes time-average streaming quality under a playback delay constraint in wireless caching networks. The network where popular videos encoded by scalable video coding are already stored in randomly distributed caching nodes is considered under adaptive video streaming concepts, and distance-based interference management is investigated in this paper. In this network model, a streaming user makes delay-constrained decisions depending on stochastic network states: 1) caching node for video delivery, 2) video quality, and 3) the quantity of video chunks to receive. Since wireless link activation for video delivery may introduce delays, different timescales for updating caching node association, video quality adaptation, and chunk amounts are considered. After associating with a caching node for video delivery, the streaming user chooses combinations of quality and chunk amounts in the small timescale. The dynamic decision making process for video quality and chunk amounts at each slot is modeled using Markov decision process, and the caching node decision is made based on the framework of Lyapunov optimization. Our intensive simulations verify that the proposed video delivery algorithm works reliably and also can control the tradeoff between video quality and playback latency.
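The Lyapunov-based association step typically reduces to a drift-plus-penalty rule, which the following toy sketch illustrates. The cost model, the per-node quantities, and the parameter V are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def choose_caching_node(backlogs, delays, qualities, V=10.0):
    """Drift-plus-penalty: trade queue stability against streaming quality."""
    score = backlogs * delays - V * qualities   # drift term minus V * reward
    return int(np.argmin(score))

backlogs = np.array([3.0, 1.0, 5.0])    # queued chunks per candidate node
delays = np.array([0.2, 0.5, 0.1])      # expected delivery delay (s)
qualities = np.array([2.0, 4.0, 3.0])   # deliverable SVC quality layers
print(choose_caching_node(backlogs, delays, qualities))
```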

Journal ArticleDOI
TL;DR: This letter studies the optimal multicast of tiled 360 virtual reality (VR) video from one server (base station or access point) to multiple users, obtaining globally optimal closed-form solutions of the two non-convex problems involved.
Abstract: In this letter, we study optimal multicast of tiled 360 virtual reality (VR) video from one server (base station or access point) to multiple users. We consider random viewing directions and random channel conditions, and adopt time division multiple access. For given video quality, we optimize the transmission time and power allocation to minimize the average transmission energy. For given transmission energy budget, we optimize the transmission time and power allocation as well as the encoding rate of each tile to maximize the received video quality. These two optimization problems are challenging non-convex problems. We obtain globally optimal closed-form solutions of the two non-convex problems, which reveal important design insights for multicast of tiled 360 VR video. Finally, numerical results demonstrate the advantage of the proposed solutions.

Journal ArticleDOI
TL;DR: Temporal down-sampling is utilized to enable both subjective and objective comparisons across a range of frame rates; benchmarking shows that metrics which explicitly account for temporal distortions provide improved correlation with subjective opinions compared to generic quality metrics such as PSNR.
Abstract: High frame rates are acknowledged to increase the perceived quality of certain video content. However, the lack of high frame rate test content has previously restricted the scope of research in this area—especially in the context of immersive video formats. This problem has been addressed through the publication of a high frame rate video database BVI-HFR, which was captured natively at 120 fps. BVI-HFR spans a variety of scenes, motions, and colors, and is shown to be representative of BBC broadcast content. In this paper, temporal down-sampling is utilized to enable both subjective and objective comparisons across a range of frame rates. A large-scale subjective experiment has demonstrated that high frame rates lead to increases in perceived quality, and that a degree of content dependence exists—notably related to camera motion. Various image and video quality metrics have been benchmarked on these subjective evaluations, and analysis shows that those which explicitly account for temporal distortions (e.g., FRQM) provide improved correlation with subjective opinions compared to generic quality metrics such as PSNR.
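Temporal down-sampling as used above can be sketched in two standard variants, frame dropping and frame averaging, which differ in the motion blur they introduce. The parameters below are assumptions for illustration.

```python
import numpy as np

def downsample_drop(video, k):
    return video[::k]                      # keep every k-th frame: 120 -> 120/k fps

def downsample_average(video, k):
    T = (len(video) // k) * k
    return video[:T].reshape(-1, k, *video.shape[1:]).mean(axis=1)  # merge k frames

video = np.random.rand(120, 8, 8)          # one second at 120 fps
print(downsample_drop(video, 4).shape)     # (30, 8, 8): 30 fps, sharp frames
print(downsample_average(video, 4).shape)  # (30, 8, 8): 30 fps, blurred motion
```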

Journal ArticleDOI
TL;DR: This paper proposes REQUEST, a video chunk request policy for Dynamic Adaptive Streaming over HTTP (DASH) in a smartphone, which can utilize both LTE and Wi-Fi and significantly outperforms other existing schemes in terms of average video bitrate, rebuffering, and resource waste.
Abstract: Exploiting both LTE and Wi-Fi links simultaneously enhances the performance of video streaming services in a smartphone. However, it is challenging to achieve seamless and high quality video while saving battery energy and LTE data usage to prolong the usage time of a smartphone. In this paper, we propose REQUEST, a video chunk request policy for Dynamic Adaptive Streaming over HTTP (DASH) in a smartphone, which can utilize both LTE and Wi-Fi. REQUEST enables seamless DASH video streaming with near optimal video quality under given budgets of battery energy and LTE data usage. Through extensive simulation and measurement in a real environment, we demonstrate that REQUEST significantly outperforms other existing schemes in terms of average video bitrate, rebuffering, and resource waste.
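A REQUEST-like policy must decide, per chunk, how to spend the LTE budget versus Wi-Fi. The toy rule below (thresholds and budget model assumed) illustrates the trade-off; the paper's actual policy is derived under explicit energy and LTE data budgets.

```python
def choose_link(buffer_s, wifi_kbps, bitrate_kbps, lte_data_mb, low_buffer=4.0):
    """Pick the link(s) for the next chunk under a simple budget heuristic."""
    if wifi_kbps >= bitrate_kbps:
        return 'wifi'                 # Wi-Fi alone sustains playback
    if buffer_s < low_buffer and lte_data_mb > 0:
        return 'lte+wifi'             # spend LTE budget to avoid rebuffering
    return 'wifi'                     # drain buffer rather than budget

print(choose_link(buffer_s=2.5, wifi_kbps=1800, bitrate_kbps=2500, lte_data_mb=120))
```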

Proceedings ArticleDOI
Tianchi Huang1, Chao Zhou, Rui-Xiao Zhang1, Chenglei Wu1, Xin Yao1, Lifeng Sun1 
15 Oct 2019
TL;DR: This paper proposes Comyco, a video quality-aware ABR approach that substantially improves on learning-based methods by tackling their low sample efficiency and lack of awareness of video quality information.
Abstract: Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn strong strategies without any presumptions, have become one of the research hotspots for adaptive streaming. However, they still suffer from several issues, i.e., low sample efficiency and lack of awareness of video quality information. In this paper, we propose Comyco, a video quality-aware ABR approach that greatly improves on learning-based methods by tackling the above issues. Comyco trains the policy via imitating expert trajectories given by an instant solver, which not only avoids redundant exploration but also makes better use of the collected samples. Meanwhile, Comyco attempts to pick the chunk with higher perceptual video quality rather than higher video bitrate. To achieve this, we construct Comyco's neural network architecture, video datasets, and QoE metrics with video quality features. Using trace-driven and real-world experiments, we demonstrate significant improvements in Comyco's sample efficiency in comparison to prior work, with a 1700x reduction in the number of samples required and a 16x reduction in training time. Moreover, the results illustrate that Comyco outperforms previously proposed methods, with improvements in average QoE of 7.5%-16.79%. In particular, Comyco also surpasses the state-of-the-art approach Pensieve by 7.37% in average video quality under the same rebuffering time.
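Comyco's core training idea, imitating an expert solver rather than exploring, can be sketched as behavioral cloning. The bitrate ladder, the state encoding, and the trivial throughput-aware "expert" below are stand-ins; the paper's instant solver optimizes a quality-aware QoE objective.

```python
import torch
import torch.nn as nn

BITRATES = [300, 750, 1200, 1850, 2850]    # kbps ladder (assumed)

policy = nn.Sequential(nn.Linear(3, 32), nn.ReLU(), nn.Linear(32, len(BITRATES)))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def expert_action(throughput_kbps):
    """Stand-in expert: highest sustainable bitrate for the known throughput."""
    ok = [i for i, b in enumerate(BITRATES) if b <= throughput_kbps]
    return ok[-1] if ok else 0

for step in range(200):                    # behavioral cloning loop
    throughput = torch.rand(16, 1) * 3000
    buffer = torch.rand(16, 1) * 10
    last_q = torch.randint(0, len(BITRATES), (16, 1)).float()
    state = torch.cat([throughput / 3000, buffer / 10, last_q / 4], dim=1)
    target = torch.tensor([expert_action(t.item()) for t in throughput])
    loss = loss_fn(policy(state), target)  # match the expert's choice
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```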

Journal ArticleDOI
Liyang Sun1, Fanyi Duanmu1, Yong Liu1, Yao Wang1, Yinghua Ye2, Hang Shi2, David Dai2 
TL;DR: The proposed two-tier system achieves a high level of quality of experience in the face of network bandwidth and user FoV dynamics, with periodic and adaptive optimization frameworks that adapt to bandwidth variations and FoV prediction errors in real time.
Abstract: 360° video on-demand streaming is a key component of the emerging virtual reality and augmented reality applications. In such applications, sending the entire 360° video demands extremely high network bandwidth that may not be affordable by today’s networks. On the other hand, sending only the predicted user’s field of view (FoV) is not viable as it is hard to achieve perfect FoV prediction in on-demand streaming, where it is better to prefetch the video multiple seconds ahead to absorb the network bandwidth fluctuation. This paper proposes a two-tier solution, where the base tier delivers the entire 360° span at a lower quality with a long prefetching buffer, and the enhancement tier delivers the predicted FoV at a higher quality using a short buffer. The base tier provides robustness to both network bandwidth variations and FoV prediction errors. The enhancement tier improves the video quality if it is delivered in time and the FoV prediction is accurate. We study the optimal rate allocation between the two tiers and buffer provisioning for the enhancement tier to achieve the optimal trade-off between video quality and streaming robustness. We also design periodic and adaptive optimization frameworks to adapt to the bandwidth variations and FoV prediction errors in real time. Through simulations driven by real LTE and WiGig network bandwidth traces and user FoV traces, we demonstrate that the proposed two-tier system can achieve a high level of quality of experience in the face of network bandwidth and user FoV dynamics.
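As a toy illustration of the rate split discussed above, the sketch below reserves a floor for the base tier (robustness first) and gives the remainder to the FoV enhancement tier. The split rule and numbers are assumptions; the paper derives the allocation from an optimization.

```python
def allocate_rates(total_kbps, base_floor_kbps=2000, base_share=0.3):
    """Split a rate budget between full-sphere base and FoV enhancement tiers."""
    base = max(base_floor_kbps, total_kbps * base_share)   # robustness floor
    return base, max(0.0, total_kbps - base)               # rest to the FoV tier

base, enh = allocate_rates(10000)
print(base, enh)   # e.g. 3000.0 kbps full-sphere base, 7000.0 kbps FoV enhancement
```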

Proceedings ArticleDOI
11 Oct 2019
TL;DR: A large-scale measurement campaign on an operational mobile video telephony service is conducted, showing that the application-layer video codec and transport-layer protocols remain highly uncoordinated, which represents one major reason for the low QoE.
Abstract: Despite the pervasive use of real-time video telephony services, the users' quality of experience (QoE) remains unsatisfactory, especially over the mobile Internet. Previous work studied the problem via controlled experiments, while a systematic and in-depth investigation in the wild is still missing. To bridge the gap, we conduct a large-scale measurement campaign on an operational mobile video telephony service. Our measurement logs fine-grained performance metrics over 1 million video call sessions. Our analysis shows that the application-layer video codec and transport-layer protocols remain highly uncoordinated, which represents one major reason for the low QoE. We thus propose a machine learning based framework to resolve the issue. Instead of blindly following the transport layer's estimation of network capacity, the framework reviews historical logs of both layers and extracts high-level features of codec/network dynamics, based on which it determines the highest bitrates for forthcoming video frames without incurring congestion. To attain this ability, we train the framework with the aforementioned massive data traces using a custom-designed imitation learning algorithm, which enables it to learn from past experience. We have implemented the framework and incorporated it into the measured service. Our experiments show that it outperforms state-of-the-art solutions, improving video quality while reducing stalling time severalfold under various practical scenarios.

Proceedings ArticleDOI
05 Jun 2019
TL;DR: A no-reference video quality machine learning model that uses only the recorded video to predict video quality scores and outperforms VMAF for subjective gaming QoE prediction, even though it does not require any reference video.
Abstract: Popularity of streaming services for gaming videos has increased tremendously over the last years, e.g., Twitch and YouTube Gaming. Compared to classical video streaming applications, gaming videos have additional requirements. For example, it is important that videos are streamed live with only a small delay. In addition, users expect low stalling, short waiting times, and in general high video quality during streaming, e.g., using HTTP-based adaptive streaming. These requirements lead to different challenges for quality prediction in the case of streamed gaming videos. We describe newly developed features and a no-reference video quality machine learning model that uses only the recorded video to predict video quality scores. In different evaluation experiments, we compare our proposed model nofu with state-of-the-art reduced- or full-reference models and metrics. In addition, we trained a no-reference baseline model using BRISQUE+NIQE features. We show that our model has a similar or better performance than other models. Furthermore, nofu outperforms VMAF for subjective gaming QoE prediction, even though nofu does not require any reference video.

Journal ArticleDOI
TL;DR: This study developed a novel architecture for no-reference VQA based on features obtained from pretrained convolutional neural networks, transfer learning, temporal pooling, and regression; experiments demonstrated that the proposed method performs better than other state-of-the-art algorithms.
Abstract: Video quality assessment (VQA) is an important element of various applications ranging from automatic video streaming to display technology. Furthermore, visual quality measurements require a balanced investigation of visual content and features. Previous studies have shown that the features extracted from a pretrained convolutional neural network are highly effective for a wide range of applications in image processing and computer vision. In this study, we developed a novel architecture for no-reference VQA based on the features obtained from pretrained convolutional neural networks, transfer learning, temporal pooling, and regression. In particular, we obtained solutions by only applying temporally pooled deep features and without using manually derived features. The proposed architecture was trained based on the recently published Konstanz natural video quality database (KoNViD-1k), which contains 1200 video sequences with authentic distortion unlike other publicly available databases. The experimental results obtained based on KoNViD-1k demonstrated that the proposed method performed better than other state-of-the-art algorithms. Furthermore, these results were confirmed by tests using the LIVE VQA database, which contains artificially distorted videos.
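The stated pipeline, per-frame deep features, temporal pooling, then regression, is compact enough to sketch. The ResNet-18 backbone, mean/std pooling, and SVR regressor below are assumed choices within the family the paper describes, not its exact configuration.

```python
import numpy as np
import torch
import torchvision.models as models
from sklearn.svm import SVR

backbone = models.resnet18(weights=None)          # pretrained weights in practice
backbone.fc = torch.nn.Identity()                 # expose 512-d features
backbone.eval()

def video_descriptor(frames):                     # frames: (T, 3, H, W)
    with torch.no_grad():
        f = backbone(frames).numpy()              # (T, 512) per-frame features
    return np.concatenate([f.mean(axis=0), f.std(axis=0)])   # temporal pooling

X = np.stack([video_descriptor(torch.rand(8, 3, 112, 112)) for _ in range(10)])
y = np.random.rand(10) * 5                        # synthetic MOS labels
reg = SVR().fit(X, y)                             # pooled features -> quality
print(reg.predict(X[:2]))
```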

Proceedings ArticleDOI
01 Dec 2019
TL;DR: A study of subjective and objective quality assessment of 4K ultra-high-definition videos of short duration, similar to DASH segment lengths, covering a wide range of conditions so that models trained on the data are more general and applicable to a wider range of real-world applications.
Abstract: 4K television screens, and even screens with higher resolutions, are currently available on the market. Moreover, video streaming providers are able to stream videos in 4K resolution and beyond. Therefore, it becomes increasingly important to have a proper understanding of video quality, especially in the case of 4K videos. To this effect, in this paper, we present a study of subjective and objective quality assessment of 4K ultra-high-definition videos of short duration, similar to DASH segment lengths. As a first step, we conducted four subjective quality evaluation tests for compressed versions of the 4K videos. The videos were encoded using three different video codecs, namely H.264, HEVC, and VP9. The resolutions of the compressed videos ranged from 360p to 2160p, with framerates varying from 15 fps to 60 fps. All the source 4K contents used were at 60 fps. We included low-quality conditions in terms of bitrate, resolution, and framerate to ensure that the tests cover a wide range of conditions, and that, e.g., possible models trained on this data are more general and applicable to a wider range of real world applications. The results of the subjective quality evaluation are analyzed to assess the impact of different factors such as bitrate, resolution, framerate, and content. In the second step, different state-of-the-art objective quality models, e.g., Netflix's VMAF, were applied to all videos and their performance was analyzed in comparison with the subjective ratings. The videos, subjective scores (both MOS and confidence interval per sequence), and objective scores are made public for use by the community for further research.

Posted Content
TL;DR: In this article, the authors develop models that infer quality metrics (i.e., startup delay and resolution) for encrypted streaming video services, demonstrate that the models are practical through a 16-month deployment in 66 homes, and provide new insights about the relationship between Internet speed and the quality of the corresponding video streams for a variety of services.
Abstract: Inferring the quality of streaming video applications is important for Internet service providers, but the fact that most video streams are encrypted makes it difficult to do so. We develop models that infer quality metrics (i.e., startup delay and resolution) for encrypted streaming video services. Our paper builds on previous work, but extends it in several ways. First, the model works in deployment settings where the video sessions and segments must be identified from a mix of traffic and the time precision of the collected traffic statistics is more coarse (e.g., due to aggregation). Second, we develop a single composite model that works for a range of different services (i.e., Netflix, YouTube, Amazon, and Twitch), as opposed to just a single service. Third, unlike many previous models, the model performs predictions at finer granularity (e.g., the precise startup delay instead of just detecting short versus long delays), allowing us to draw better conclusions on the ongoing streaming quality. Fourth, we demonstrate the model is practical through a 16-month deployment in 66 homes and provide new insights about the relationships between Internet "speed" and the quality of the corresponding video streams, for a variety of services; we find that higher speeds provide only minimal improvements to startup delay and resolution.

Journal ArticleDOI
TL;DR: YouTube is a major modality for patient education, yet the quality of facial plastic surgery content on YouTube has not been evaluated; this study assesses YouTube as an informative resource on facial plastic surgery procedures.
Abstract: This study investigates the video quality and creator qualification of YouTube videos about facial plastic surgery procedures.

Journal ArticleDOI
TL;DR: An application-layer scheme to jointly exploit the available bandwidth from the LTE and Wi-Fi networks in 360-degree video streaming, together with a novel buffer strategy to mitigate the influence of the short-time prediction problem when transmitting 360-degree videos over time-varying networks.

Proceedings ArticleDOI
01 Apr 2019
TL;DR: The architecture, called iView, intelligently determines video quality and reduces latency without pre-programmed models or assumptions, advocating multimodal learning and deep reinforcement learning in the design.
Abstract: Recently, the fusion of 360° video and multi-viewpoint video, called multi-viewpoint (MVP) 360° interactive video, has emerged and created much more immersive and interactive user experience, but calls for a low latency solution to request the high-definition contents. Such viewing-related features as head movement have been recently studied, but several key issues still need to be addressed. On the viewer side, it is not clear how to effectively integrate different types of viewing-related features. At the session level, questions such as how to optimize the video quality under dynamic networking conditions and how to build an end-to-end mapping between these features and the quality selection remain to be answered. The solutions to these questions are further complicated given the many practical challenges, e.g., incomplete feature extraction and inaccurate prediction. This paper presents an architecture, called iView, to address the aforementioned issues in an MVP 360° interactive video scenario. To fully understand the viewing-related features and provide a one-step solution, we advocate multimodal learning and deep reinforcement learning in the design. iView intelligently determines video quality and reduces the latency without pre-programmed models or assumptions. We have evaluated iView with multiple real-world video and network datasets. The results showed that our solution effectively utilizes the features of video frames, networking throughput, head movements, and viewpoint selections, achieving at least 27.2%, 15.4%, and 2.8% improvements on the three video datasets, respectively, compared with several state-of-the-art methods.

Journal ArticleDOI
TL;DR: A general-purpose no-reference video quality assessment algorithm based on a long short-term memory (LSTM) network and a pretrained convolutional neural network (CNN) is introduced, which outperforms other state-of-the-art algorithms.
Abstract: A general-purpose no-reference video quality assessment algorithm based on a long short-term memory (LSTM) network and a pretrained convolutional neural network (CNN) is introduced. Considering video sequences as a time series of deep features extracted with the help of a CNN, an LSTM network is trained to predict subjective quality scores. In contrast to previous methods, the resulting algorithm was trained on the recently published Konstanz Natural Video Quality Database (KoNViD-1k), which is the only publicly available database that contains sequences with authentic distortions. The results of experiments on KoNViD-1k demonstrate that the proposed method outperforms other state-of-the-art algorithms. Furthermore, these results are also confirmed using tests on the LIVE Video Quality Assessment Database, which consists of artificially distorted videos.

Journal ArticleDOI
TL;DR: Compared with the state-of-the-art designs, the proposed design demonstrates advantages in computational complexity, bit rate, video quality, throughput, reliability, and flexibility.
Abstract: The growing demand for high-performance ultra-high-definition video coding leads to H.265/high-efficiency video coding (HEVC), where the increased computational complexity and data/timing dependence hinder its coding throughput. To address these challenges, this paper presents four algorithm adaptations and a fully parallel hardware architecture for an H.265/HEVC intra encoder. To the best of our knowledge, this is the first fully parallel H.265/HEVC intra encoder. This design supports 35 prediction modes and all coding tree unit partitions. All prediction units (PUs) are independently processed in four prediction engines for high parallelism. An appropriate set of intra prediction modes, RDO candidates, and CABAC rate estimate instances is assigned to each prediction engine, where internal computational tasks are pipelined and scheduled to maximize the processing throughput. Compared with the HM-15.0 software, the proposed algorithm adaptations lead to a reduction of 27% in computational workload, while the average BD-rate and BD-PSNR are 4.39% and −0.21 dB, respectively. This BD-rate is lower than the existing designs with the same video resolution. FPGA implementation of the proposed design shows that it operates at 120 MHz and supports 45 fps of 1080P video sequences using 201-K logic elements and 120-KB on-chip SRAM. ASIC implementation of the proposed design in TSMC 90-nm technology shows that its clock frequency reaches 320 MHz with a hardware gate count of 2288 K, and that it supports real-time encoding of 30 fps of 4-K video sequences. Compared with the state-of-the-art designs, our proposed design demonstrates advantages in computational complexity, bit rate, video quality, throughput, reliability, and flexibility.
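The BD-rate figure quoted above comes from the standard Bjøntegaard metric: fit log-bitrate versus PSNR curves for two encoders and average the horizontal gap over the shared PSNR range. A compact sketch with toy rate-distortion points (the sample numbers are illustrative, not from the paper):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def bd_rate(r1, p1, r2, p2):
    """Average bitrate difference (%) of encoder 2 relative to encoder 1."""
    lr1, lr2 = np.log(r1), np.log(r2)
    lo, hi = max(min(p1), min(p2)), min(max(p1), max(p2))   # shared PSNR range
    i1 = P.polyint(P.polyfit(p1, lr1, 3))    # integral of log-rate over PSNR
    i2 = P.polyint(P.polyfit(p2, lr2, 3))
    avg = ((P.polyval(hi, i2) - P.polyval(lo, i2))
           - (P.polyval(hi, i1) - P.polyval(lo, i1))) / (hi - lo)
    return (np.exp(avg) - 1) * 100

r_ref = [1000, 2000, 4000, 8000]; psnr_ref = [33.0, 36.0, 39.0, 42.0]
r_new = [1100, 2150, 4300, 8500]; psnr_new = [33.1, 36.2, 39.1, 42.2]
print(bd_rate(r_ref, psnr_ref, r_new, psnr_new))  # positive = % more bits needed
```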

Journal ArticleDOI
TL;DR: The proposed algorithm, which is completely blind (requiring no reference videos or training on subjective scores), is called the Motion and Disparity-based 3D video quality evaluator (MoDi3D); it delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and the proposed LFOVIAS3DPh2 S3D video dataset.
Abstract: We present a new subjective and objective study on full high-definition (HD) stereoscopic (3D or S3D) video quality. In the subjective study, we constructed an S3D video dataset with 12 pristine and 288 test videos, and the test videos are generated by applying the H.264 and H.265 compression, blur, and frame freeze artifacts. We also propose a no reference (NR) objective video quality assessment (QA) algorithm that relies on measurements of the statistical dependencies between the motion and disparity subband coefficients of S3D videos. Inspired by the Generalized Gaussian Distribution (GGD) approach, we model the joint statistical dependencies between the motion and disparity components as following a Bivariate Generalized Gaussian Distribution (BGGD). We estimate the BGGD model parameters ($\alpha, \beta$) and the coherence measure ($\Psi$) from the eigenvalues of the sample covariance matrix (M) of the BGGD. In turn, we model the BGGD parameters of pristine S3D videos using a Multivariate Gaussian (MVG) distribution. The likelihood of a test video’s MVG model parameters coming from the pristine MVG model is computed and shown to play a key role in the overall quality estimation. We also estimate the global motion content of each video by averaging the SSIM scores between pairs of successive video frames. To estimate the test S3D video’s spatial quality, we apply the popular 2D NR unsupervised NIQE image QA model on a frame-by-frame basis on both views. The overall quality of a test S3D video is finally computed by pooling the test S3D video’s likelihood estimates, global motion strength, and spatial quality scores. The proposed algorithm, which is completely blind (requiring no reference videos or training on subjective scores), is called the Motion and Disparity-based 3D video quality evaluator (MoDi3D). We show that MoDi3D delivers competitive performance over a wide variety of datasets, including the IRCCYN dataset, the WaterlooIVC Phase I dataset, the LFOVIA dataset, and our proposed LFOVIAS3DPh2 S3D video dataset.
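The pristine-model likelihood step described above follows the NIQE-style recipe: fit a multivariate Gaussian to feature vectors from pristine videos, then score a test video by the log-likelihood of its features under that model. The 3-D feature vector (alpha, beta, Psi) follows the text; the data below is synthetic.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic (alpha, beta, Psi) vectors standing in for pristine-video features.
pristine = np.random.default_rng(0).normal([1.0, 2.0, 0.5], 0.1, size=(50, 3))
mu, cov = pristine.mean(axis=0), np.cov(pristine, rowvar=False)
model = multivariate_normal(mean=mu, cov=cov)    # MVG fit to pristine features

test_feats = np.array([1.05, 1.9, 0.55])         # features of a test video
print(model.logpdf(test_feats))                  # higher = closer to pristine
```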