
Showing papers on "Video quality published in 2018"


Proceedings ArticleDOI
16 Apr 2018
TL;DR: This work designs a framework that ties together front-end devices with more powerful backend "helpers" to allow deep learning to be executed locally or remotely in the cloud/edge, and implements an Android application that performs real-time object detection for AR applications.
Abstract: Deep learning shows great promise in providing more intelligence to augmented reality (AR) devices, but few AR apps use deep learning due to lack of infrastructure support. Deep learning algorithms are computationally intensive, and front-end devices cannot deliver sufficient compute power for real-time processing. In this work, we design a framework that ties together front-end devices with more powerful backend “helpers” (e.g., home servers) to allow deep learning to be executed locally or remotely in the cloud/edge. We consider the complex interaction between model accuracy, video quality, battery constraints, network data usage, and network conditions to determine an optimal offloading strategy. Our contributions are: (1) extensive measurements to understand the tradeoffs between video quality, network conditions, battery consumption, processing delay, and model accuracy; (2) a measurement-driven mathematical framework that efficiently solves the resulting combinatorial optimization problem; (3) an Android application that performs real-time object detection for AR applications, with experimental results that demonstrate the superiority of our approach.
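The offloading decision described above can be caricatured as a tiny utility search. The option encoding, weights, and delay budget below are illustrative assumptions, not the paper's actual combinatorial formulation:

```python
def choose_offload(options, battery_weight=0.5, max_delay_ms=100):
    """Pick the best (accuracy, delay_ms, energy_mJ) configuration:
    among options meeting the real-time delay budget, maximize
    accuracy minus a battery-cost penalty (weights are assumptions)."""
    feasible = [o for o in options if o[1] <= max_delay_ms]
    if not feasible:
        return min(options, key=lambda o: o[1])  # fall back to the fastest option
    return max(feasible, key=lambda o: o[0] - battery_weight * o[2] / 1000.0)
```

Each option might represent, e.g., a (model size, local-vs-remote) pair measured offline; the paper solves a richer optimization over video quality, battery, and network conditions.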

346 citations


Proceedings ArticleDOI
15 Oct 2018
TL;DR: This work conducts an IRB-approved user study and develops novel online algorithms that determine which spatial portions to fetch and their corresponding qualities for Flare, a practical system for streaming 360-degree videos on commodity mobile devices.
Abstract: Flare is a practical system for streaming 360-degree videos on commodity mobile devices. It takes a viewport-adaptive approach, which fetches only portions of a panoramic scene that cover what a viewer is about to perceive. We conduct an IRB-approved user study where we collect head movement traces from 130 diverse users to gain insights on how to design the viewport prediction mechanism for Flare. We then develop novel online algorithms that determine which spatial portions to fetch and their corresponding qualities. We also innovate other components in the streaming pipeline such as decoding and server-side transmission. Through extensive evaluations (~400 hours' playback on WiFi and ~100 hours over LTE), we show that Flare significantly improves the QoE in real-world settings. Compared to non-viewport-adaptive approaches, Flare yields up to 18x quality level improvement on WiFi, and achieves high bandwidth reduction (up to 35%) and video quality enhancement (up to 4.9x) on LTE.
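The core of viewport-adaptive fetching is mapping a predicted viewport to the tiles that cover it. The sketch below assumes an 8x4 equirectangular tile grid and a fixed field of view; Flare's actual prediction and scheduling pipeline is far more sophisticated:

```python
import math

def tiles_in_viewport(yaw_deg, pitch_deg, fov_h=100.0, fov_v=90.0,
                      cols=8, rows=4):
    """Return the set of (row, col) equirectangular tiles that a
    predicted viewport (yaw/pitch of the view center, in degrees)
    overlaps. Tile grid: cols x rows over 360 x 180 degrees."""
    tile_w, tile_h = 360.0 / cols, 180.0 / rows
    picked = set()
    steps = 16  # sample the viewport rectangle and map samples to tiles
    for i in range(steps + 1):
        for j in range(steps + 1):
            y = (yaw_deg + (i / steps - 0.5) * fov_h) % 360.0
            p = max(-90.0, min(90.0, pitch_deg + (j / steps - 0.5) * fov_v))
            col = int(y // tile_w) % cols
            row = min(rows - 1, int((p + 90.0) // tile_h))
            picked.add((row, col))
    return picked
```

Only the returned tiles would be requested at high quality; everything else can be skipped or fetched at minimal quality.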

201 citations


Journal ArticleDOI
TL;DR: A new database comprising a total of 208 videos that model six common in-capture distortions of digital videos is presented; several top-performing no-reference IQA and VQA algorithms are evaluated on it to study how real-world in-capture distortions challenge both human viewers and automatic perceptual quality prediction models.
Abstract: Digital videos often contain visual distortions that are introduced by the camera’s hardware or processing software during the capture process. These distortions often detract from a viewer’s quality of experience. Understanding how human observers perceive the visual quality of digital videos is of great importance to camera designers. Thus, the development of automatic objective methods that accurately quantify the impact of visual distortions on perception has greatly accelerated. Video quality algorithm design and verification require realistic databases of distorted videos and human judgments of them. However, most current publicly available video quality databases have been created under highly controlled conditions using graded, simulated, and post-capture distortions (such as jitter and compression artifacts) on high-quality videos. The commercial plethora of hand-held mobile video capture devices produces videos often afflicted by a variety of complex distortions generated during the capturing process. These in-capture distortions are not well-modeled by the synthetic, post-capture distortions found in existing VQA databases. Toward overcoming this limitation, we designed and created a new database that we call the LIVE-Qualcomm mobile in-capture video quality database, comprising a total of 208 videos, which model six common in-capture distortions. We also conducted a subjective quality assessment study using this database, in which each video was assessed by 39 unique subjects. Furthermore, we evaluated several top-performing no-reference IQA and VQA algorithms on the new database and studied how real-world in-capture distortions challenge both human viewers and automatic perceptual quality prediction models. The new database is freely available at: http://live.ece.utexas.edu/research/incaptureDatabase/index.html .

120 citations


Proceedings ArticleDOI
10 Jun 2018
TL;DR: This work develops a novel tile-based layered approach to stream 360° content on smartphones to avoid bandwidth wastage while maintaining high video quality and is the first 360° streaming framework that takes into account the practical limitations of Android based smartphones.
Abstract: The popularity of 360° videos has grown rapidly due to the immersive user experience. 360° videos are displayed as a panorama and the view automatically adapts with the head movement. Existing systems stream 360° videos in a similar way as regular videos, where all data of the panoramic view is transmitted. This is wasteful since a user only views a small portion of the 360° view. To save bandwidth, recent works propose the tile-based streaming, which divides the panoramic view to multiple smaller sized tiles and streams only the tiles within a user's field of view (FoV) predicted based on the recent head position. Interestingly, the tile-based streaming has only been simulated or implemented on desktops. We find that it cannot run in real-time even on the latest smartphone (e.g., Samsung S7, Samsung S8 and Huawei Mate 9) due to hardware and software limitations. Moreover, it results in significant video quality degradation due to head movement prediction error, which is hard to avoid. Motivated by these observations, we develop a novel tile-based layered approach to stream 360° content on smartphones to avoid bandwidth wastage while maintaining high video quality. Through real system experiments, we show our approach can achieve up to 69% improvement in user QoE and 49% in bandwidth savings over existing approaches. To the best of our knowledge, this is the first 360° streaming framework that takes into account the practical limitations of Android based smartphones.
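A minimal sketch of the layered idea, with made-up unit base-layer and enhancement-layer costs: every tile gets a base layer so that head-movement prediction errors never leave a blank region, and any leftover budget buys enhancement layers for the predicted FoV tiles:

```python
def allocate_layers(all_tiles, predicted_fov, budget_units, enh_cost=3):
    """Layered tiling: every tile gets the base layer (cost 1 unit);
    remaining budget buys enhancement layers for predicted-FoV tiles,
    so a wrong head-position prediction still leaves base quality."""
    plan = {t: 1 for t in all_tiles}           # base layer everywhere
    budget = budget_units - len(all_tiles)
    for t in predicted_fov:
        if budget >= enh_cost:
            plan[t] += 1
            budget -= enh_cost
    return plan
```

This is what distinguishes the layered approach from pure tile-based streaming, where a mispredicted tile is simply missing at display time.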

114 citations


Book ChapterDOI
Woojae Kim1, Jongyoo Kim2, Sewoong Ahn1, Jinwoo Kim1, Sanghoon Lee1 
08 Sep 2018
TL;DR: A novel full-reference VQA framework named Deep Video Quality Assessor (DeepVQA) is proposed to quantify spatio-temporal visual perception via a convolutional neural network (CNN) and a convolutional neural aggregation network (CNAN), together with an attention-based temporal pooling method to handle the temporal variation of distortions.
Abstract: Incorporating spatio-temporal human visual perception into video quality assessment (VQA) remains a formidable issue. Previous statistical or computational models of spatio-temporal perception have limited applicability to general VQA algorithms. In this paper, we propose a novel full-reference (FR) VQA framework named Deep Video Quality Assessor (DeepVQA) to quantify spatio-temporal visual perception via a convolutional neural network (CNN) and a convolutional neural aggregation network (CNAN). Our framework learns spatio-temporal sensitivity behavior in accordance with the subjective scores. In addition, to handle the temporal variation of distortions, we propose a novel temporal pooling method using an attention model. In the experiments, we show that DeepVQA achieves state-of-the-art prediction accuracy of more than 0.9 correlation, ~5% higher than that of conventional methods on the LIVE and CSIQ video databases.
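Attention-based temporal pooling can be illustrated with a toy stand-in (not DeepVQA's learned attention): weight per-frame quality scores so that badly distorted frames dominate, mirroring the human tendency to judge a video by its worst moments:

```python
import numpy as np

def attention_pool(frame_scores, temperature=1.0):
    """Softmax-attention temporal pooling: frames judged more severely
    distorted (lower score) get higher weight; a uniform video is
    unaffected, while brief bad intervals pull the score down."""
    s = np.asarray(frame_scores, dtype=float)
    w = np.exp(-s / temperature)   # lower score -> larger weight
    w /= w.sum()
    return float(np.dot(w, s))
```

In DeepVQA the weights are produced by a trained attention model rather than this fixed softmax of the scores themselves.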

104 citations


Proceedings ArticleDOI
Hyunho Yeo, Youngmok Jung1, Jaehong Kim, Jinwoo Shin1, Dongsu Han1 
08 Oct 2018
TL;DR: A new video delivery framework is presented that utilizes client computation and recent advances in deep neural networks (DNNs) to reduce the bandwidth dependency of delivering high-quality video, enhancing video quality independently of the available bandwidth.
Abstract: Internet video streaming has experienced tremendous growth over the last few decades. However, the quality of existing video delivery critically depends on the bandwidth resource. Consequently, user quality of experience (QoE) inevitably suffers when network conditions become unfavorable. We present a new video delivery framework that utilizes client computation and recent advances in deep neural networks (DNNs) to reduce this bandwidth dependency for delivering high-quality video. The use of DNNs enables us to enhance video quality independently of the available bandwidth. We design a practical system that addresses several challenges, such as client heterogeneity, interaction with bitrate adaptation, and DNN transfer, in enabling the idea. Our evaluation using 3G and broadband network traces shows the proposed system outperforms the current state of the art, enhancing the average QoE by 43.08% using the same bandwidth budget or saving 17.13% of bandwidth while providing the same user QoE.
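The framework's key idea, that client-side DNN inference can substitute for bandwidth, can be caricatured as choosing a rung of a bitrate ladder and then crediting the DNN with a quality lift. The ladder values and the one-rung gain below are illustrative assumptions, not measurements from the paper:

```python
def choose_delivery(bandwidth_kbps, ladder, dnn_gain_levels=1):
    """Pick the highest sustainable rung of a bitrate ladder (index
    into `ladder`, an ascending list of bitrates in kbps), then credit
    client-side super-resolution with lifting perceived quality by
    `dnn_gain_levels` rungs (a toy stand-in for the DNN enhancement)."""
    sustainable = [q for q, br in enumerate(ladder) if br <= bandwidth_kbps]
    base = sustainable[-1] if sustainable else 0
    return min(base + dnn_gain_levels, len(ladder) - 1)
```

The effect is that the same perceived quality is reached with a lower downloaded bitrate, which is exactly the bandwidth-saving trade the evaluation quantifies.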

97 citations


Journal ArticleDOI
TL;DR: A large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions, is constructed, and the value of the new resource, called the LIVE Video Quality Challenge Database (LIVE-VQC), is demonstrated by comparing leading NR video quality predictors on it.
Abstract: The great variations of videographic skills, camera designs, compression and processing protocols, and displays lead to an enormous variety of video impairments. Current no-reference (NR) video quality models are unable to handle this diversity of distortions. This is true in part because available video quality assessment databases contain very limited content, fixed resolutions, were captured using a small number of camera devices by a few videographers and have been subjected to a modest number of distortions. As such, these databases fail to adequately represent real world videos, which contain very different kinds of content obtained under highly diverse imaging conditions and are subject to authentic, often commingled distortions that are impossible to simulate. As a result, NR video quality predictors tested on real-world video data often perform poorly. Towards advancing NR video quality prediction, we constructed a large-scale video quality assessment database containing 585 videos of unique content, captured by a large number of users, with wide ranges of levels of complex, authentic distortions. We collected a large number of subjective video quality scores via crowdsourcing. A total of 4776 unique participants took part in the study, yielding more than 205,000 opinion scores, resulting in an average of 240 recorded human opinions per video. We demonstrate the value of the new resource, which we call the LIVE Video Quality Challenge Database (LIVE-VQC), by conducting a comparison of leading NR video quality predictors on it. This study is the largest video quality assessment study ever conducted along several key dimensions: number of unique contents, capture devices, distortion types and combinations of distortions, study participants, and recorded subjective scores. The database is publicly available for download.

97 citations


Proceedings ArticleDOI
15 Oct 2018
TL;DR: A Video Multi-task End-to-end Optimized neural Network (V-MEON) is proposed that merges the two stages of blind video quality assessment into one, where the feature extractor and the regressor are jointly optimized.
Abstract: Blind video quality assessment (BVQA) algorithms are traditionally designed with a two-stage approach - a feature extraction stage that computes typically hand-crafted spatial and/or temporal features, and a regression stage working in the feature space that predicts the perceptual quality of the video. Unlike the traditional BVQA methods, we propose a Video Multi-task End-to-end Optimized neural Network (V-MEON) that merges the two stages into one, where the feature extractor and the regressor are jointly optimized. Our model uses a multi-task DNN framework that not only estimates the perceptual quality of the test video but also provides a probabilistic prediction of its codec type. This framework allows us to train the network with two complementary sets of labels, both of which can be obtained at low cost. The training process is composed of two steps. In the first step, early convolutional layers are pre-trained to extract spatiotemporal quality-related features with the codec classification subtask. In the second step, initialized with the pre-trained feature extractor, the whole network is jointly optimized with the two subtasks together. An additional critical step is the adoption of 3D convolutional layers, which creates novel spatiotemporal features that lead to a significant performance boost. Experimental results show that the proposed model clearly outperforms state-of-the-art BVQA methods. The source code of V-MEON is available at https://ece.uwaterloo.ca/~zduanmu/acmmm2018bvqa.
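A hedged sketch of the multi-task objective (the actual V-MEON loss and architecture differ): a quality-regression term plus a codec-classification cross-entropy term, jointly minimized, with the mixing weight `alpha` as an assumption:

```python
import numpy as np

def multitask_loss(q_pred, q_true, codec_logits, codec_label, alpha=0.5):
    """Joint objective for one sample: L1 quality-regression loss plus
    cross-entropy on the codec-classification subtask, weighted by alpha."""
    reg = abs(q_pred - q_true)
    logits = np.asarray(codec_logits, dtype=float)
    logp = logits - np.log(np.exp(logits).sum())   # log-softmax
    ce = -logp[codec_label]
    return reg + alpha * ce
```

The codec labels come essentially for free (the encoder is known at training time), which is why the subtask is such a cheap source of supervision for pre-training.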

96 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper describes an open dataset and software for ITU-T Rec. P.1203, the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming, and shows the significant performance improvements of using bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.
Abstract: This paper describes an open dataset and software for ITU-T Rec. P.1203. As the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming (HAS), it has been extensively trained and validated on over a thousand audiovisual sequences containing HAS-typical effects (such as stalling, coding artifacts, and quality switches). Our dataset comprises four of the 30 official subjective databases at a bitstream feature level. The paper also includes subjective results and the model performance. Our software implementation of the standard has also been made publicly available and is used for all the analyses presented. Among other previously unpublished details, we show the significant performance improvements of using bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.

95 citations


Journal ArticleDOI
TL;DR: A study of subjective and objective quality assessment of compressed 4K ultra-high-definition (UHD) videos in an immersive viewing environment is presented, investigating the added value of UHD over conventional high definition (HD) in terms of perceptual quality.
Abstract: We present a study of subjective and objective quality assessment of compressed 4K ultra-high-definition (UHD) videos in an immersive viewing environment. First, we conduct a subjective quality evaluation experiment for 4K UHD videos compressed by three state-of-the-art video coding techniques, i.e., Advanced Video Coding, High Efficiency Video Coding, and VP9. In particular, we aim at investigating added values of UHD over conventional high definition (HD) in terms of perceptual quality. The results are systematically analyzed in various viewpoints, such as coding scheme, bitrate, and video content. Second, existing state-of-the-art objective quality assessment techniques are benchmarked using the subjective data in order to investigate their validity and limitation for 4K UHD videos. Finally, the video and subjective data are made publicly available for further research by the research community.

90 citations


Journal ArticleDOI
TL;DR: Recent advances, such as different projection methods benefiting video coding, specialized video quality evaluation metrics and optimized methods for transmission, are all presented and classified in this paper.

Journal ArticleDOI
TL;DR: A new depth perception quality metric (DPQM) is proposed and it is verified that it outperforms existing metrics on the authors' published 3D video extension of High Efficiency Video Coding (3D-HEVC) video database and validated by applying the crucial part of the DPQM to a novel blind stereoscopic video quality evaluator (BSVQE).
Abstract: Stereoscopic video quality assessment (SVQA) is a challenging problem. How to measure depth perception quality independently under different distortion categories and degrees, and especially how to exploit depth perception to assist the overall quality assessment of 3D videos, has not been well investigated. In this paper, we propose a new depth perception quality metric (DPQM) and verify that it outperforms existing metrics on our published 3D video extension of High Efficiency Video Coding (3D-HEVC) video database. Furthermore, we validate its effectiveness by applying the crucial part of the DPQM to a novel blind stereoscopic video quality evaluator (BSVQE) for overall 3D video quality assessment. In the DPQM, we introduce the feature of auto-regressive prediction-based disparity entropy (ARDE) measurement and the feature of energy weighted video content measurement, which are inspired by the free-energy principle and the binocular vision mechanism. In the BSVQE, the binocular summation and difference operations are integrated together with the fusion natural scene statistic measurement and the ARDE measurement to reveal the key influence from texture and disparity. Experimental results on three stereoscopic video databases demonstrate that our method outperforms state-of-the-art SVQA algorithms for both symmetrically and asymmetrically distorted stereoscopic video pairs of various distortion types.

Posted Content
TL;DR: The objective of this paper is to study the relationship between 3D quality and bitrate at different frame rates and show that increasing the frame rate of 3D videos beyond 60 fps may not be visually distinguishable.
Abstract: Increasing the frame rate of a 3D video generally results in improved Quality of Experience (QoE). However, higher frame rates involve a higher degree of complexity in capturing, transmission, storage, and display. The question that arises here is what frame rate guarantees high viewing quality of experience given the existing/required 3D devices and technologies (3D cameras, 3D TVs, compression, transmission bandwidth, and storage capacity). This question has already been addressed for the case of 2D video, but not for 3D. The objective of this paper is to study the relationship between 3D quality and bitrate at different frame rates. Our performance evaluations show that increasing the frame rate of 3D videos beyond 60 fps may not be visually distinguishable. In addition, our experiments show that when the available bandwidth is reduced, the highest possible 3D quality of experience can be achieved by adjusting (decreasing) the frame rate instead of increasing the compression ratio. The results of our study are of particular interest to network providers for rate adaptation in variable bitrate channels.

Journal ArticleDOI
TL;DR: A two time-scale dynamic caching scheme for ABR streaming in VNs, in which the video quality adaptation at the application layer and cache placement at the BS are performed at a larger time- Scale while the video data transmission at the physical layer is performed at smaller time- scale.
Abstract: Adaptive bitrate (ABR) streaming has recently been deployed in vehicular networks (VNs) to deal with time-varying channels caused by factors such as high user mobility. Caching at the wireless edge (e.g., the base station, BS) to support ABR streaming is a challenging problem. In this paper, we propose a two time-scale dynamic caching scheme for ABR streaming in VNs, in which the video quality adaptation at the application layer and cache placement at the BS are performed at a larger time-scale, while the video data transmission at the physical layer is performed at a smaller time-scale. The Lyapunov optimization technique is employed to maximize the time-averaged network reward, which is the weighted sum of video quality and backhaul saving. Without prior knowledge of channel statistics, we develop a dynamic cache algorithm (DCA) to obtain the video quality adaptation, cache placement, and radio bandwidth allocation decisions. For an arbitrary sample path of channel states, we compare the network reward achieved by DCA with that achieved by an optimal T-slot lookahead algorithm, i.e., one with knowledge of the future channel path over an interval of T time slots. Simulation results demonstrate the advantages of DCA for ABR streaming in time-varying VNs over the static cache approach.
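Per-slot decisions in Lyapunov drift-plus-penalty schemes like DCA reduce to weighing a reward term against a queue-backlog term. A toy single-queue version (the control parameter V and the action encoding are assumptions for illustration):

```python
def dpp_select(queue_backlog, actions, V=10.0):
    """Lyapunov drift-plus-penalty decision: each action is a
    (reward, bits_added) pair. Choose the action maximizing
    V*reward - queue_backlog*bits_added, trading video quality
    (reward) against queue growth. Larger V favors reward;
    a large backlog favors draining the queue."""
    return max(actions, key=lambda a: V * a[0] - queue_backlog * a[1])
```

This is why no prior channel statistics are needed: the backlog itself summarizes the history, and the per-slot rule adapts as it grows or shrinks.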

Proceedings ArticleDOI
23 Jul 2018
TL;DR: The proposed spherical structural similarity index (S-SSIM) outperforms state-of-the-art objective quality assessment metrics in omnidirectional video quality assessment.
Abstract: Objective quality assessment plays a crucial role in the evaluation and optimization processes of Virtual Reality (VR) technologies, for which state-of-the-art objective quality evaluation metrics for omnidirectional video, i.e., 360 degree video, are typically derived from traditional MSE (or PSNR). Here we propose an objective omnidirectional video quality assessment method based on structural similarity (SSIM) in the spherical domain. Adopting the relationship of the structural similarity between the 2-D plane and sphere, the interference brought by the projection between the two domains can be well handled in the assessment process. The performance of the proposed spherical structural similarity (S-SSIM) index is evaluated with a subjective omnidirectional video quality assessment database. As demonstrated in the experimental results, the proposed S-SSIM outperforms state-of-the-art objective quality assessment metrics in omnidirectional video quality assessment.
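One ingredient of sphere-aware quality metrics is down-weighting the over-sampled polar rows of an equirectangular frame. The cos(latitude) row weights below follow the WS-PSNR/WS-SSIM convention and are only in the spirit of S-SSIM, not its exact formulation:

```python
import math

def latitude_weights(height):
    """Per-row cos(latitude) weights for an equirectangular frame:
    rows near the poles are over-sampled by the projection, so their
    contribution to a spherical quality score is down-weighted.
    Returned weights are normalized to sum to 1."""
    w = [math.cos((j + 0.5 - height / 2) * math.pi / height)
         for j in range(height)]
    s = sum(w)
    return [x / s for x in w]
```

A plane-domain SSIM map multiplied row-wise by such weights approximates averaging the similarity over the sphere instead of over the distorted 2-D projection.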

Journal ArticleDOI
TL;DR: A node association algorithm is proposed that maximizes time-averaged video quality for multiple users under a playback delay constraint, along with two ways to cope with request collisions: scheduling of one user and non-orthogonal multiple access.
Abstract: This paper considers one-hop device-to-device-assisted wireless caching networks that cache video files of varying quality levels, with the assumption that the base station can control the video quality but cache-enabled devices cannot. Two problems arise in such a caching network: the file placement problem and the node association problem. This paper suggests a method to cache videos of different qualities, and thus of varying file sizes, by maximizing the sum of video quality measures that users can enjoy. There exists an interesting tradeoff between video quality and video diversity, i.e., the ability to provision diverse video files. By caching high-quality files, the cache-enabled devices can provide high-quality video, but cannot cache a variety of files. Conversely, when the device caches various files, it cannot provide a good quality for file-requesting users. In addition, when multiple devices cache the same file but their qualities are different, advanced node association is required for file delivery. This paper proposes a node association algorithm that maximizes time-averaged video quality for multiple users under a playback delay constraint. In this algorithm, we also consider request collision, the situation where several users request files from the same device at the same time, and we propose two ways to cope with the collision: scheduling of one user and non-orthogonal multiple access. Simulation results verify that the proposed caching method and the node association algorithm work reliably.

Journal ArticleDOI
TL;DR: A modified display protocol of the high resolution sequences for the subjective rating test is proposed, in which an optimal display resolution is determined based on the geometry constraints between screen and human eyes, to ensure the reliability of subjective quality opinion in terms of video coding.
Abstract: With the development of virtual reality, higher quality panoramic videos are in great demand to guarantee the immersive viewing experience. Therefore, quality assessment is of great importance to the related technologies. Considering the geometric transformation in projection and the limited resolution of head-mounted devices (HMDs), a modified display protocol of the high resolution sequences for the subjective rating test is proposed, in which an optimal display resolution is determined based on the geometry constraints between screen and human eyes. By sampling the videos to the optimal resolution before coding, the proposed method significantly alleviates the interference of HMD sampling while displaying, thus ensuring the reliability of subjective quality opinion in terms of video coding. Using the proposed display protocol, a subjective quality database for panoramic videos is established for video coding applications. The proposed database contains 50 distorted sequences obtained from ten raw panoramic video sequences. Distortions are introduced with High Efficiency Video Coding compression. Each sequence is evaluated by 30 subjects on video quality, following the absolute category rating with hidden reference method. The rating scores and differential mean opinion scores (DMOSs) are recorded and included in the database. With the proposed database, several state-of-the-art objective quality assessment methods are further evaluated with correlation analysis. The database, including the video sequences, subjective rating scores and DMOS, can be used to facilitate future research on coding applications.
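The resolution-matching argument can be sketched as a pixels-per-degree calculation (the HMD numbers in the test are hypothetical): sample the panorama at the same angular density the HMD panel can display, so the headset does not resample while displaying:

```python
def optimal_panorama_width(hmd_horizontal_px, hmd_fov_deg):
    """Resolution matching: the HMD shows hmd_fov_deg degrees of the
    scene with hmd_horizontal_px pixels, so sampling the full
    360-degree panorama at the same pixels-per-degree avoids
    display-side resampling interference."""
    ppd = hmd_horizontal_px / hmd_fov_deg
    return round(ppd * 360.0)
```

The paper's protocol additionally accounts for the projection geometry between the screen and the eyes; this sketch captures only the angular-density matching.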

Proceedings ArticleDOI
TL;DR: QARC (video Quality Aware Rate Control) is proposed, a rate control algorithm that aims to obtain higher perceptual video quality with a possibly lower sending rate and transmission latency; evaluated via trace-driven simulation, it outperforms existing approaches with improvements in average video quality.
Abstract: Due to the fluctuation of throughput under various network conditions, how to choose a proper bitrate adaptively for real-time video streaming has become an important and interesting issue. Recent work focuses on providing high video bitrates instead of high video quality. Nevertheless, we notice that there exists a trade-off between sending bitrate and video quality, which motivates us to focus on how to strike a balance between them. In this paper, we propose QARC (video Quality Aware Rate Control), a rate control algorithm that aims to achieve higher perceptual video quality with a possibly lower sending rate and transmission latency. Starting from scratch, QARC uses a deep reinforcement learning (DRL) algorithm to train a neural network to select future bitrates based on previously observed network status and past video frames, and we design a neural network to predict future perceptual video quality as a vector, taking the place of the raw picture in the DRL's inputs. We evaluate QARC over a trace-driven emulation. As expected, QARC outperforms existing approaches.

Journal ArticleDOI
TL;DR: A variety of recurrent dynamic neural networks are proposed that conduct continuous-time subjective QoE prediction on video streams impaired by both compression artifacts and rebuffering events, and ways of aggregating different models into a forecasting ensemble that delivers improved results with reduced forecasting variance are evaluated.
Abstract: Streaming video services represent a very large fraction of global bandwidth consumption. Due to the exploding demands of mobile video streaming services, coupled with limited bandwidth availability, video streams are often transmitted through unreliable, low-bandwidth networks. This unavoidably leads to two types of major streaming-related impairments: compression artifacts and/or rebuffering events. In streaming video applications, the end-user is a human observer; hence being able to predict the subjective Quality of Experience (QoE) associated with streamed videos could lead to the creation of perceptually optimized resource allocation strategies driving higher quality video streaming services. We propose a variety of recurrent dynamic neural networks that conduct continuous-time subjective QoE prediction. By formulating the problem as one of time-series forecasting, we train a variety of recurrent neural networks and non-linear autoregressive models to predict QoE using several recently developed subjective QoE databases. These models combine multiple, diverse neural network inputs, such as predicted video quality scores, rebuffering measurements, and data related to memory and its effects on human behavioral responses, using them to predict QoE on video streams impaired by both compression artifacts and rebuffering events. Instead of finding a single time-series prediction model, we propose and evaluate ways of aggregating different models into a forecasting ensemble that delivers improved results with reduced forecasting variance. We also deploy appropriate new evaluation metrics for comparing time-series predictions in streaming applications. Our experimental results demonstrate improved prediction performance that approaches human performance. An implementation of this work can be found at https://github.com/christosbampis/NARX_QoE_release .
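The ensemble step can be sketched in a few lines: averaging per-timestep forecasts from several models keeps the signal while reducing forecast variance (a simple mean is shown; the paper evaluates richer aggregation schemes):

```python
import numpy as np

def ensemble_forecast(predictions):
    """Average per-timestep QoE forecasts from several models.
    `predictions` has shape (n_models, n_steps); the ensemble mean
    preserves the common signal while cutting per-model variance."""
    P = np.asarray(predictions, dtype=float)
    return P.mean(axis=0)
```

For uncorrelated model errors, the variance of the mean falls roughly as 1/n_models, which is the reduced-forecasting-variance effect the abstract refers to.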

Proceedings ArticleDOI
15 Oct 2018
TL;DR: QARC (video Quality Aware Rate Control) is proposed: a rate control algorithm that aims to obtain higher perceptual video quality with a possibly lower sending rate and transmission latency.
Abstract: Real-time video streaming is now one of the main applications in all network environments. Due to the fluctuation of throughput under various network conditions, how to choose a proper bitrate adaptively has become an important and interesting issue. To tackle this problem, most proposed rate control methods work to provide high video bitrates instead of high video quality. Nevertheless, we notice that there exists a trade-off between sending bitrate and video quality, which motivates us to focus on how to reach a balance between them. In this paper, we propose QARC (video Quality Aware Rate Control), a rate control algorithm that aims to obtain higher perceptual video quality with a possibly lower sending rate and transmission latency. Starting from scratch, QARC uses a deep reinforcement learning (DRL) algorithm to train a neural network to select future bitrates based on previously observed network status and past video frames. To overcome the "state explosion problem", we design a neural network to predict future perceptual video quality as a vector, taking the place of the raw picture in the DRL's inputs. We evaluate QARC via trace-driven simulation, outperforming existing approaches with improvements in average video quality of 18%-25% and decreases in average latency of 23%-45%. Meanwhile, comparing QARC with an offline optimal high-bitrate method under various network conditions, we find that QARC also yields solid results.
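The trade-off QARC optimizes can be caricatured as a scalar DRL reward that pays for perceptual quality and charges for sending rate and latency; the coefficients below are illustrative assumptions, not the paper's actual reward function:

```python
def qarc_reward(perceptual_quality, sending_rate_mbps, latency_s,
                alpha=1.0, beta=0.5, gamma=2.0):
    """Toy reward in the spirit of QARC: favor predicted perceptual
    quality while penalizing sending rate and transmission latency
    (alpha, beta, gamma are made-up weights for illustration)."""
    return (alpha * perceptual_quality
            - beta * sending_rate_mbps
            - gamma * latency_s)
```

A pure bitrate-maximizing agent would set beta to a negative value; making the rate a cost is what pushes the policy toward "same quality, fewer bits".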

Posted Content
TL;DR: The LIVE-NFLX-II database is introduced: a highly realistic database containing subjective QoE responses across design dimensions such as bitrate adaptation algorithms, network conditions, and video content, built on recent advancements in content-adaptive encoding.
Abstract: Measuring Quality of Experience (QoE) and integrating these measurements into video streaming algorithms is a multi-faceted problem that fundamentally requires the design of comprehensive subjective QoE databases and metrics. To achieve this goal, we have recently designed the LIVE-NFLX-II database, a highly-realistic database which contains subjective QoE responses to various design dimensions, such as bitrate adaptation algorithms, network conditions and video content. Our database builds on recent advancements in content-adaptive encoding and incorporates actual network traces to capture realistic network variations on the client device. Using our database, we study the effects of multiple streaming dimensions on user experience and evaluate video quality and quality of experience models. We believe that the tools introduced here will help inspire further progress on the development of perceptually-optimized client adaptation and video streaming strategies. The database is publicly available at this http URL.

Journal ArticleDOI
TL;DR: The proposed solution aims to deliver high visual quality, in real time, around the users' fixation points while lowering the quality everywhere else, substantially reducing the overall bandwidth requirements for supporting VR video experiences.
Abstract: This paper presents a novel approach to content delivery for video streaming services. It exploits information from connected eye-trackers embedded in the next generation of VR Head Mounted Displays (HMDs). The proposed solution aims to deliver high visual quality, in real time, around the users' fixation points while lowering the quality everywhere else. The goal of the proposed approach is to substantially reduce the overall bandwidth requirements for supporting VR video experiences while delivering high levels of user-perceived quality. The prerequisites to achieve these results are: (1) mechanisms that can cope with different degrees of latency in the system and (2) solutions that support fast adaptation of video quality in different parts of a frame, without requiring a large increase in bitrate. A novel codec configuration, capable of supporting near-instantaneous video quality adaptation in specific portions of a video frame, is presented. The proposed method exploits in-built properties of HEVC encoders; while it introduces a moderate amount of error, these errors are undetectable by users. Fast adaptation is the key to enabling gaze-aware streaming and its reduction in bandwidth. A testbed implementing gaze-aware streaming, together with a prototype HMD with an in-built eye tracker, is presented and was used for testing with real users. The studies quantified the bandwidth savings achievable by the proposed approach and characterized the relationships between Quality of Experience (QoE) and network latency. The results showed that up to 83% less bandwidth is required to deliver high QoE levels to the users, as compared to conventional solutions.
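The per-region quality adaptation behind gaze-aware streaming can be sketched as a foveated quantization-parameter (QP) assignment. The thresholds and the linear ramp below are illustrative assumptions; the paper's actual method relies on an HEVC-specific codec configuration:

```python
def tile_qp(tile_center, gaze, qp_fovea=22, qp_periphery=40, radius=0.15):
    """Map a frame region to an HEVC QP based on its distance from the
    current fixation point (normalized [0,1] x [0,1] coordinates).
    Within `radius` of the gaze the region gets the high-quality QP;
    the QP then ramps linearly up to the periphery value over one more
    radius. All parameter values are illustrative assumptions."""
    dx = tile_center[0] - gaze[0]
    dy = tile_center[1] - gaze[1]
    dist = (dx * dx + dy * dy) ** 0.5
    if dist <= radius:
        return qp_fovea
    # Linear ramp beyond the fovea, clamped at the periphery QP.
    frac = min((dist - radius) / radius, 1.0)
    return round(qp_fovea + frac * (qp_periphery - qp_fovea))
```

Lower QP means higher quality (and more bits), so this concentrates bitrate where the eye-tracker says the user is looking.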

Proceedings ArticleDOI
12 Jun 2018
TL;DR: In this article, the performance of objective video quality assessment metrics for gaming videos considering passive streaming applications is evaluated on a dataset of 24 reference videos and 576 compressed sequences obtained by encoding them at 24 different resolution-bitrate pairs.
Abstract: Video quality assessment is imperative to estimate, and hence manage, the Quality of Experience (QoE) of video streaming applications for the end user. Recent years have seen tremendous advancement in the field of objective video quality assessment (VQA) metrics, with the development of models that can predict the quality of videos streamed over the Internet. However, no work so far has attempted to study the performance of such quality assessment metrics on gaming videos, which are synthetic in nature and have different streaming requirements than traditionally streamed videos. Towards this end, we present in this paper a study of the performance of objective quality assessment metrics for gaming videos, considering passive streaming applications. Objective quality assessment with eight widely used VQA metrics is performed on a dataset of 24 reference videos and 576 compressed sequences obtained by encoding them at 24 different resolution-bitrate pairs. We present an evaluation of the performance behavior of the VQA metrics. Our results indicate that VMAF predicts subjective video quality ratings best, while NIQE turns out to be a promising no-reference alternative in some scenarios.
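Benchmarking a VQA metric against subjective ratings typically reduces to rank correlation between metric outputs and mean opinion scores. A minimal Spearman rank-order correlation (SROCC) implementation, assuming no tied scores, is:

```python
import numpy as np

def srocc(metric_scores, mos):
    """Spearman rank-order correlation coefficient, the standard way
    objective VQA metrics are benchmarked against subjective ratings.
    Minimal sketch: assumes no tied scores (ties need average ranks)."""
    rx = np.argsort(np.argsort(metric_scores))  # ranks of metric scores
    ry = np.argsort(np.argsort(mos))            # ranks of MOS values
    rx = rx - rx.mean()
    ry = ry - ry.mean()
    # Pearson correlation of the two rank vectors.
    return float((rx * ry).sum() / np.sqrt((rx**2).sum() * (ry**2).sum()))
```

In practice, library routines that handle ties (e.g. average ranks) should be preferred; this sketch only shows the computation being reported when a study quotes a correlation of a metric with MOS.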

Journal ArticleDOI
TL;DR: A new high-definition video quality database, referred to as BVI-HD, which contains 32 reference and 384 distorted video sequences plus subjective scores, to compare the subjective quality between HEVC and synthesised content, and evaluate the performance of nine state-of-the-art, full-reference objective quality metrics.
Abstract: This paper introduces a new high-definition video quality database, referred to as BVI-HD, which contains 32 reference and 384 distorted video sequences plus subjective scores. The reference material in this database was carefully selected to optimize the coverage range and distribution uniformity of five low-level video features, while the included 12 distortions, using both original high efficiency video coding (HEVC) and HEVC with synthesis mode, represent state-of-the-art approaches to compression. The range of quantization parameters included in the database for HEVC compression was determined by a subjective study, the results of which indicate that a wider range of QP values should be used than the current recommendation. The subjective opinion scores for all 384 distorted videos were collected from a total of 86 subjects, using a double stimulus test methodology. Based on these results, we compare the subjective quality between HEVC and synthesised content, and evaluate the performance of nine state-of-the-art, full-reference objective quality metrics. This database has now been made available online, representing a valuable resource to those concerned with compression performance evaluation and objective video quality assessment.

Journal ArticleDOI
TL;DR: A freely available data set (VRQ-TJU) for VR quality assessment is proposed, with subjective scores for each sample, and an end-to-end 3-D convolutional neural network is introduced to predict VR video quality without a reference VR video.
Abstract: Virtual reality (VR), a new type of simulation and interaction technology, has aroused widespread attention and research interest. It is necessary to evaluate VR quality and provide a standard for this rapidly developing technology. To the best of our knowledge, only a few researchers have built benchmark databases and designed related algorithms, which has hindered the further development of VR technology. In this paper, a freely available data set (VRQ-TJU) for VR quality assessment is proposed, with subjective scores for each sample. The validity of the designed database has been demonstrated using traditional multimedia quality assessment metrics. In addition, an end-to-end 3-D convolutional neural network is introduced to predict VR video quality without a reference VR video. This method extracts spatiotemporal features and does not require hand-crafted features. At the same time, a new score fusion strategy is designed based on the characteristics of the VR video projection process. Taking pre-processed VR video patches as input, the network captures local spatiotemporal features and obtains a score for every patch. Then, the new quality score fusion strategy is applied to get the final score. This approach shows strong performance on the database.
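A projection-aware patch-score fusion can be illustrated with cosine-latitude weighting, which compensates for the oversampling of polar regions in an equirectangular projection. This is one plausible fusion rule under that assumption, not necessarily the paper's exact strategy:

```python
import math

def fuse_patch_scores(patch_scores, patch_latitudes):
    """Area-weighted fusion of per-patch quality scores for an
    equirectangular VR video. Each patch is weighted by cos(latitude),
    since equirectangular frames oversample content near the poles.
    Illustrative sketch, not the paper's exact fusion strategy."""
    weights = [math.cos(math.radians(lat)) for lat in patch_latitudes]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, patch_scores)) / total
```

A patch at the equator (latitude 0) thus counts twice as much as one at latitude 60 degrees, matching their relative areas on the viewing sphere.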

Journal ArticleDOI
TL;DR: This paper simplifies the design of DASH by exploiting only client-side buffer state information, proposes a pure buffer-based DASH scheme to optimize user QoE, and shows that this approach outperforms alternative approaches.
Abstract: Recently, the prevalence of mobile devices together with the outburst of user-generated content has fueled the tremendous growth of Internet traffic consumed by video streaming. To improve user-perceived quality-of-experience (QoE), dynamic adaptive streaming over HTTP (DASH) has been widely adopted by practical systems to make streaming smooth under limited bandwidth. However, previous DASH approaches mostly performed complicated rate adaptation based on bandwidth estimation, which has been proven to be unreliable over HTTP. In this paper, we simplify the design by exploiting only client-side buffer state information and propose a pure buffer-based DASH scheme to optimize user QoE. Our approach not only avoids the drawback caused by inaccurate bandwidth estimation, but also incurs very limited overhead. We explicitly define an integrated user QoE model, which takes playback freezing, bitrate switches, and video quality into account, and then formulate the problem as a non-linear stochastic optimal control problem. Next, we utilize control theory to design a dynamic buffer-based controller for DASH, which determines the video bitrate of each chunk to be requested while stabilizing the buffer level. Extensive experiments have been conducted to validate the advantages of our approach, and the results show that it outperforms alternative approaches.
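A pure buffer-based controller in this spirit can be sketched as a piecewise mapping from buffer occupancy to the bitrate ladder, in the style of buffer-based adaptation (BBA). The reservoir/cushion thresholds below are illustrative assumptions, not the control-theoretic controller designed in the paper:

```python
def buffer_based_bitrate(buffer_s, ladder, reservoir=5.0, cushion=20.0):
    """Map buffer occupancy (seconds) to a bitrate from the ladder,
    using no bandwidth estimate at all (BBA-style sketch).

    Below the reservoir: lowest rate, to avoid rebuffering.
    Above reservoir + cushion: highest rate.
    In between: linear interpolation across the ladder.
    Threshold values are illustrative assumptions."""
    ladder = sorted(ladder)
    if buffer_s <= reservoir:
        return ladder[0]
    if buffer_s >= reservoir + cushion:
        return ladder[-1]
    frac = (buffer_s - reservoir) / cushion
    idx = int(frac * (len(ladder) - 1))
    return ladder[idx]
```

Because the decision depends only on the client's own buffer level, it sidesteps the unreliable HTTP bandwidth estimation the abstract criticizes.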

Proceedings ArticleDOI
20 May 2018
TL;DR: In this paper, the authors propose a framework for viewport-driven rate-distortion optimized video streaming that integrates the user view navigation pattern and the spatiotemporal rate-distortion characteristics to maximize the delivered user quality of experience for the given network/system resources.
Abstract: The growing popularity of virtual and augmented reality communications and 360° video streaming is moving video communication systems into much more dynamic and resource-limited operating settings. The enormous data volume of 360° videos requires an efficient use of network bandwidth to maintain the desired quality of experience for the end user. To this end, we propose a framework for viewport-driven rate-distortion optimized 360° video streaming that integrates the user view navigation pattern and the spatiotemporal rate-distortion characteristics of the 360° video content to maximize the delivered user quality of experience for the given network/system resources. The framework comprises a methodology for constructing dynamic heat maps that capture the likelihood of the user navigating different spatial segments of a 360° video over time, an analysis and characterization of its spatiotemporal rate-distortion characteristics that leverages preprocessed spatial tiling of the 360° view sphere, and an optimization problem formulation that characterizes the delivered user quality of experience given the user navigation patterns, 360° video encoding decisions, and the available system/network resources. Our experimental results demonstrate the advantages of our framework over the conventional approach of streaming a monolithic uniformly-encoded 360° video and over a state-of-the-art reference method. Considerable video quality gains of 4-5 dB are demonstrated for two popular 4K 360° videos.
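The viewport-driven optimization can be illustrated with a greedy knapsack-style allocation that weights each tile's quality gain by its heat-map viewing probability. This is a sketch under assumed per-level rates and gains; the paper formulates and solves a full optimization problem:

```python
def allocate_tile_levels(view_prob, step_rate, step_gain, budget):
    """Greedy per-tile quality-level allocation for tiled 360 streaming.

    view_prob : per-tile viewing probability from the navigation heat map
    step_rate[l], step_gain[l] : extra bitrate and quality gain of
        upgrading a tile from level l-1 to level l (level 0 is free)
    budget    : total extra bitrate available

    Repeatedly buys the upgrade with the highest probability-weighted
    gain per bit until no upgrade fits the remaining budget.
    """
    n = len(view_prob)
    levels = [0] * n
    spent = 0.0
    while True:
        best, best_val = None, 0.0
        for i in range(n):
            l = levels[i] + 1
            if l < len(step_rate) and spent + step_rate[l] <= budget:
                val = view_prob[i] * step_gain[l] / step_rate[l]
                if val > best_val:
                    best, best_val = i, val
        if best is None:
            return levels
        levels[best] += 1
        spent += step_rate[levels[best]]
```

Tiles the user is likely to look at absorb most of the budget, which is how viewport-driven streaming beats a monolithic uniformly-encoded stream.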

Journal ArticleDOI
TL;DR: This paper presents a database consisting of videos at full high definition and ultra-high definition resolutions, together with a QoE evaluation framework comprising a learning-based model during playback and an exponential model during rebuffering, and performs an objective evaluation of popular video quality assessment and continuous-time QoE metrics over the constructed database.
Abstract: A continuous evaluation of the end user’s quality-of-experience (QoE) is essential for efficient video streaming. This is crucial for networks with constrained resources that offer time-varying channel quality to its users. In hypertext transfer protocol-based video streaming, the QoE is measured by quantifying the perceptual impact of distortions caused by rate adaptation or interruptions in playback due to rebuffering events. The resulting impact on the QoE due to these distortions has been studied individually in the literature. However, the QoE is determined by an interplay of these distortions, and therefore necessitates a combined study of them. To the best of our knowledge, there is no publicly available database that studies these distortions jointly on a continuous time basis. In this paper, our contributions are twofold. First, we present a database consisting of videos at full high definition and ultrahigh definition resolutions. We consider various levels of rate adaptation and rebuffering distortions together in these videos as experienced in a typical realistic setting. A subjective evaluation of these videos is conducted on a continuous time scale. Second, we present a QoE evaluation framework comprising a learning-based model during playback and an exponential model during rebuffering. Furthermore, we perform an objective evaluation of popular video quality assessment and continuous time QoE metrics over the constructed database. The objective evaluation study demonstrates that the performance of the proposed QoE model is superior to that of the objective metrics. The database is publicly available for download at http://www.iith.ac.in/~lfovia/downloads.html .
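The exponential rebuffering component of such a framework can be sketched as QoE decaying exponentially while playback is stalled. The time constant below is an assumption for illustration, not the paper's fitted parameter:

```python
import math

def qoe_during_rebuffer(q0, t, tau=4.0):
    """Continuous-time QoE during a rebuffering event (sketch).

    q0  : QoE score at the moment playback stalls
    t   : seconds elapsed since the stall began
    tau : assumed decay time constant (illustrative, not fitted)
    """
    return q0 * math.exp(-t / tau)
```

The longer the stall, the faster perceived quality collapses toward zero, which is why continuous-time QoE models treat rebuffering separately from playback.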

Proceedings ArticleDOI
01 Dec 2018
TL;DR: A new No-Reference (NR) gaming video quality metric called NR-GVQM is presented, with performance comparable to state-of-the-art Full-Reference metrics, along with two approaches to reduce computational complexity.
Abstract: Gaming, already a popular pastime, has recently expanded its associated services into live streaming. Live gaming video streaming is not limited to cloud gaming services such as GeForce Now; it also includes passive streaming, where players' gameplay is streamed both live and on-demand over services such as Twitch.tv and YouTube Gaming. So far, typical video quality assessment methods have been used for gaming videos; however, their performance remains quite unsatisfactory. In this paper, we present a new No-Reference (NR) gaming video quality metric called NR-GVQM with performance comparable to state-of-the-art Full-Reference (FR) metrics. NR-GVQM is designed by training a Support Vector Regression (SVR) with a Gaussian kernel, using nine frame-level indexes such as naturalness and blockiness as input features and Video Multimethod Assessment Fusion (VMAF) scores as the ground truth. On a publicly available dataset of gaming videos, NR-GVQM achieves a correlation score of 0.98 with VMAF and 0.89 with MOS scores. We further present two approaches to reduce computational complexity.
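The regression at the heart of NR-GVQM (frame-level features in, VMAF scores as ground truth) can be sketched with a Gaussian-kernel regressor. Kernel ridge regression below is a closed-form stand-in for the paper's SVR, with illustrative hyperparameters:

```python
import numpy as np

def fit_gaussian_krr(X, y, gamma=1.0, lam=1e-3):
    """Gaussian-kernel ridge regression: a closed-form stand-in for
    the SVR used by NR-GVQM. X holds frame-level feature vectors
    (e.g. naturalness, blockiness); y holds VMAF target scores.
    gamma and lam are assumed hyperparameters."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)                       # Gaussian kernel matrix
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

    def predict(Xq):
        d2q = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2q) @ alpha
    return predict
```

An actual reproduction would swap in an SVR with the Gaussian (RBF) kernel and tune its hyperparameters; the feature-to-VMAF regression pipeline is the same.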

Journal ArticleDOI
TL;DR: The quality of anterior cervical discectomy and fusion (ACDF) videos available on YouTube is low, with the majority of videos produced by unreliable sources; these videos should not be recommended as patient education tools for ACDF.
Abstract: Study design: Cross-sectional study. Purpose: To assess the quality of anterior cervical discectomy and fusion (ACDF) videos available on YouTube and identify factors associated with video quality. Overview of literature: Patients commonly use the internet as a source of information regarding their surgeries. However, there is currently limited information regarding the quality of online videos about ACDF. Methods: A search was performed on YouTube using the phrase 'anterior cervical discectomy and fusion.' The Journal of the American Medical Association (JAMA), DISCERN, and Health on the Net (HON) systems were used to rate the first 50 videos obtained. Information about each video was collected, including number of views, duration since the video was posted, percentage positivity (defined as the number of likes the video received, divided by the total number of likes and dislikes of that video), number of comments, and the author of the video. Relationships between video quality and these factors were investigated. Results: The average number of views for each video was 96,239. The most common videos were those published by surgeons and those containing patient testimonies. Overall, the video quality was poor, with mean scores of 1.78/5 using the DISCERN criteria, 1.63/4 using the JAMA criteria, and 1.96/8 using the HON criteria. Surgeon authors' videos scored higher than patient testimony videos when reviewed using the HON or JAMA systems. However, no other factors were found to be associated with video quality. Conclusions: The quality of ACDF videos on YouTube is low, with the majority of videos produced by unreliable sources. Therefore, these YouTube videos should not be recommended as patient education tools for ACDF.