
Showing papers on "Video quality published in 2016"


Proceedings ArticleDOI
10 Apr 2016
TL;DR: This work formulates bitrate adaptation as a utility maximization problem and devises an online control algorithm called BOLA that uses Lyapunov optimization techniques to minimize rebuffering and maximize video quality, and proves that BOLA achieves a time-average utility that is within an additive term O(1/V) of the optimal value.
Abstract: Modern video players employ complex algorithms to adapt the bitrate of the video that is shown to the user. Bitrate adaptation requires a tradeoff between reducing the probability that the video freezes and enhancing the quality of the video shown to the user. A bitrate that is too high leads to frequent video freezes (i.e., rebuffering), while a bitrate that is too low leads to poor video quality. Video providers segment the video into short chunks and encode each chunk at multiple bitrates. The video player adaptively chooses the bitrate of each chunk that is downloaded, possibly choosing different bitrates for successive chunks. While bitrate adaptation holds the key to a good quality of experience for the user, current video players use ad-hoc algorithms that are poorly understood. We formulate bitrate adaptation as a utility maximization problem and devise an online control algorithm called BOLA that uses Lyapunov optimization techniques to minimize rebuffering and maximize video quality. We prove that BOLA achieves a time-average utility that is within an additive term O(1/V) of the optimal value, for a control parameter V related to the video buffer size. Further, unlike prior work, our algorithm does not require any prediction of available network bandwidth. We empirically validate our algorithm in a simulated network environment using an extensive collection of network traces. We show that our algorithm achieves near-optimal utility and in many cases significantly higher utility than current state-of-the-art algorithms. Our work has immediate impact on real-world video players and BOLA is part of the reference player implementation for the evolving DASH standard for video transmission.
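A minimal sketch of a BOLA-style selection rule is given below: each available bitrate is scored by a buffer-dependent ratio of utility to segment size, and the highest positive score wins. The parameter values, the logarithmic utility, and the function name bola_choose_bitrate are illustrative assumptions rather than the paper's exact formulation.

```python
import math

def bola_choose_bitrate(buffer_level, bitrates_kbps, segment_sec, V=0.93, gamma_p=5.0):
    """Pick the bitrate index maximizing a BOLA-style buffer/utility ratio.

    buffer_level: current buffer occupancy in seconds.
    bitrates_kbps: available bitrates, ascending.
    V, gamma_p: control parameters (illustrative values, not from the paper).
    Returns None if every score is non-positive (the player should wait).
    """
    Q = buffer_level / segment_sec                      # buffer level in segments
    utilities = [math.log(b / bitrates_kbps[0]) for b in bitrates_kbps]
    best, best_score = None, 0.0
    for m, (b, v) in enumerate(zip(bitrates_kbps, utilities)):
        size = b * segment_sec                          # relative segment size
        score = (V * (v + gamma_p) - Q) / size
        if score > best_score:
            best, best_score = m, score
    return best

# Example: 20 s of buffer, 4 s segments -> a middle bitrate is selected.
print(bola_choose_bitrate(20.0, [300, 750, 1500, 3000, 6000], 4.0))
```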

508 citations


Journal ArticleDOI
TL;DR: This work develops a new VQA model called the video intrinsic integrity and distortion evaluation oracle (VIIDEO), which is able to predict the quality of distorted videos without any external knowledge about the pristine source, anticipated distortions, or human judgments of video quality.
Abstract: Considerable progress has been made toward developing still picture perceptual quality analyzers that do not require any reference picture and that are not trained on human opinion scores of distorted images. However, there do not yet exist any such completely blind video quality assessment (VQA) models. Here, we attempt to bridge this gap by developing a new VQA model called the video intrinsic integrity and distortion evaluation oracle (VIIDEO). The new model does not require the use of any additional information other than the video being quality evaluated. VIIDEO embodies models of intrinsic statistical regularities that are observed in natural videos, which are used to quantify disturbances introduced due to distortions. An algorithm derived from the VIIDEO model is thereby able to predict the quality of distorted videos without any external knowledge about the pristine source, anticipated distortions, or human judgments of video quality. Even with such a paucity of information, we are able to show that the VIIDEO algorithm performs much better than the legacy full reference quality measure MSE on the LIVE VQA database and delivers performance comparable with a leading human judgment trained blind VQA model. We believe that the VIIDEO algorithm is a significant step toward making real-time monitoring of completely blind video quality possible. The software release of VIIDEO can be obtained online ( http://live.ece.utexas.edu/research/quality/VIIDEO_release.zip ).
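As a rough illustration of the kind of intrinsic temporal regularity such a model can exploit, the sketch below computes mean-subtracted, contrast-normalized statistics of a frame difference. It is a generic natural-scene-statistics computation under assumed parameters (Gaussian window width, the listed statistics), not the released VIIDEO algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_frame_difference_stats(frame_t, frame_t1, sigma=1.16):
    """Illustrative NSS-style statistics of a grayscale frame difference.

    Applies local mean subtraction and divisive normalization (common in
    natural-scene-statistics models) to the difference of two consecutive
    frames, then returns simple sample statistics of the normalized field.
    """
    diff = frame_t1.astype(np.float64) - frame_t.astype(np.float64)
    mu = gaussian_filter(diff, sigma)
    sigma_local = np.sqrt(np.abs(gaussian_filter(diff * diff, sigma) - mu * mu))
    mscn = (diff - mu) / (sigma_local + 1.0)    # mean-subtracted, contrast-normalized
    return {
        "mean": float(mscn.mean()),
        "var": float(mscn.var()),
        "kurtosis": float(((mscn - mscn.mean()) ** 4).mean() / (mscn.var() ** 2 + 1e-12)),
    }
```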

291 citations


Proceedings ArticleDOI
TL;DR: In this article, a viewport-adaptive 360-degree video streaming system is proposed to reduce the bandwidth waste, while still providing an immersive experience, by preparing multiple video representations, which differ not only by their bit-rate, but also by the qualities of different scene regions.
Abstract: The delivery and display of 360-degree videos on Head-Mounted Displays (HMDs) presents many technical challenges. 360-degree videos are ultra high resolution spherical videos, which contain an omnidirectional view of the scene. However, only a portion of this scene is displayed on the HMD. Moreover, HMDs need to respond to head movements within 10 ms, which prevents the server from sending only the displayed video part based on client feedback. To reduce the bandwidth waste, while still providing an immersive experience, a viewport-adaptive 360-degree video streaming system is proposed. The server prepares multiple video representations, which differ not only by their bit-rate, but also by the qualities of different scene regions. The client chooses a representation for the next segment such that its bit-rate fits the available throughput and a full-quality region matches its viewing direction. We investigate the impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user, showing that the cube map layout offers the best quality for the given bit-rate budget. An evaluation with a dataset of users navigating 360-degree videos demonstrates that segments need to be short enough to enable frequent view switches.
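The client-side choice can be pictured with the hedged sketch below: among representations whose bitrate fits the measured throughput, pick the one whose full-quality region is closest to the current viewing direction. The representation metadata (hq_center_deg) and the tie-breaking rule are assumptions for illustration, not the authors' exact system.

```python
def choose_representation(representations, head_yaw_deg, throughput_kbps):
    """Pick the 360-degree representation whose full-quality region best
    matches the current viewing direction, subject to the rate budget.

    representations: list of dicts with 'bitrate_kbps' and 'hq_center_deg'
    (the yaw of the region encoded at full quality). Illustrative only.
    """
    feasible = [r for r in representations if r["bitrate_kbps"] <= throughput_kbps]
    if not feasible:
        return min(representations, key=lambda r: r["bitrate_kbps"])

    def angular_dist(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)

    # Prefer the closest full-quality region; break ties by the higher bitrate.
    return min(feasible, key=lambda r: (angular_dist(r["hq_center_deg"], head_yaw_deg),
                                        -r["bitrate_kbps"]))

reps = [{"bitrate_kbps": b, "hq_center_deg": c}
        for b in (5000, 10000) for c in (0, 90, 180, 270)]
print(choose_representation(reps, head_yaw_deg=75, throughput_kbps=8000))
```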

228 citations


Journal ArticleDOI
TL;DR: The merits of an HTTP/2 push-based approach to segment duration reduction, a measurement study on the available bandwidth in real 4G/LTE networks, and the induced bit-rate overhead for HEVC-encoded video segments with a sub-second duration are discussed.
Abstract: In HTTP Adaptive Streaming, video content is temporally divided into multiple segments, each encoded at several quality levels. The client can adapt the requested video quality to network changes, generally resulting in a smoother playback. Unfortunately, live streaming solutions still often suffer from playout freezes and a large end-to-end delay. By reducing the segment duration, the client can use a smaller temporal buffer and respond even faster to network changes. However, since segments are requested subsequently, this approach is susceptible to high round-trip times. In this letter, we discuss the merits of an HTTP/2 push-based approach. We present the details of a measurement study on the available bandwidth in real 4G/LTE networks, and analyze the induced bit-rate overhead for HEVC-encoded video segments with a sub-second duration. Through an extensive evaluation with the generated video content, we show that the proposed approach results in a higher video quality (+7.5%) and a lower freeze time (−50.4%), and reduces the live delay compared with traditional solutions over HTTP/1.1.

201 citations


Journal ArticleDOI
TL;DR: The tests showed that bit rate savings of 59% on average can be achieved by HEVC for the same perceived video quality, which is higher than the 44% bit rate saving demonstrated with the PSNR objective quality metric.
Abstract: The High Efficiency Video Coding (HEVC) standard (ITU-T H.265 and ISO/IEC 23008-2) has been developed with the main goal of providing significantly improved video compression compared with its predecessors. In order to evaluate this goal, verification tests were conducted by the Joint Collaborative Team on Video Coding of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29. This paper presents the subjective and objective results of a verification test in which the performance of the new standard is compared with its highly successful predecessor, the Advanced Video Coding (AVC) video compression standard (ITU-T H.264 and ISO/IEC 14496-10). The test used video sequences with resolutions ranging from 480p up to ultra-high definition, encoded at various quality levels using the HEVC Main profile and the AVC High profile. In order to provide a clear evaluation, this paper also discusses various aspects for the analysis of the test results. The tests showed that bit rate savings of 59% on average can be achieved by HEVC for the same perceived video quality, which is higher than a bit rate saving of 44% demonstrated with the PSNR objective quality metric. However, it has been shown that the bit rates required to achieve good quality of compressed content, as well as the bit rate savings relative to AVC, are highly dependent on the characteristics of the tested content.
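Savings of this kind are commonly summarized with the Bjontegaard delta-rate between two rate-quality curves. The sketch below is a generic BD-rate computation (cubic fit of log rate versus a quality score, integrated over the overlapping quality range); it is provided as background and is not claimed to be the verification test's exact methodology.

```python
import numpy as np

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Bjontegaard delta-rate: average bit rate difference (%) between two
    rate-quality curves over their overlapping quality range.

    Fits cubic polynomials of log-rate as a function of quality (e.g. PSNR
    or MOS) and integrates the gap; negative values mean bit rate savings.
    Each input needs at least four sorted rate/quality points.
    """
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    p_ref = np.polyfit(quality_ref, lr_ref, 3)
    p_test = np.polyfit(quality_test, lr_test, 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100

# Example with four rate/PSNR points per codec (illustrative numbers):
# bd_rate([1000, 2000, 4000, 8000], [34.0, 36.5, 39.0, 41.5],
#         [600, 1200, 2400, 4800], [34.2, 36.8, 39.2, 41.6])
```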

191 citations


Proceedings ArticleDOI
01 Sep 2016
TL;DR: It is demonstrated by experimental results that the proposed JND analysis performed in the difference domain, called the D-method, achieves a lower BIC (Bayesian information criteria) value than the previously proposed G-method.
Abstract: A compressed video quality assessment dataset based on the just noticeable difference (JND) model, called MCL-JCV, was recently constructed and released. In this work, we explain its design objectives, selected video content and subject test procedures. Then, we conduct statistical analysis on the collected JND data. We compute the difference between every two adjacent JND points and propose an outlier detection algorithm to remove unreliable data. We also show that each JND difference group can be well approximated by a normal distribution, so that we can adopt the Gaussian mixture model (GMM) to characterize the distribution of multiple JND points. Finally, it is demonstrated by experimental results that the proposed JND analysis performed in the difference domain, called the D-method, achieves a lower BIC (Bayesian information criterion) value than the previously proposed G-method.
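A minimal sketch of the difference-domain modeling step is shown below: adjacent JND points are differenced, Gaussian mixture models with a varying number of components are fitted, and the fit with the lowest BIC is kept. The data layout and the component range are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_jnd_differences(jnd_points, max_components=4, random_state=0):
    """Fit Gaussian mixture models to JND *differences* and compare by BIC.

    jnd_points: array of shape (n_subjects, n_jnd_levels) holding the JND
    positions (e.g. QP or bitrate indices) per subject. The differences
    between adjacent JND points are modeled, and the GMM with the lowest
    BIC is returned together with its component count.
    """
    diffs = np.diff(np.asarray(jnd_points, dtype=float), axis=1).reshape(-1, 1)
    fits = []
    for k in range(1, max_components + 1):
        gmm = GaussianMixture(n_components=k, random_state=random_state).fit(diffs)
        fits.append((gmm.bic(diffs), k, gmm))
    best_bic, best_k, best_gmm = min(fits, key=lambda t: t[0])
    return best_k, best_bic, best_gmm
```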

178 citations


Journal ArticleDOI
TL;DR: A new NR-VQA metric based on the spatiotemporal natural video statistics in 3D discrete cosine transform (3D-DCT) domain is proposed, which is universal for multiple types of distortions and robust to different databases.
Abstract: It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features are first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via an efficient linear support vector regression model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; and 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-the-art NR-VQA metrics and the top-performing full-reference VQA and reduced-reference VQA metrics.
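The sketch below illustrates the overall pipeline under stated assumptions: simple statistics of 3D-DCT coefficients are pooled over spatiotemporal blocks and fed to a linear support vector regressor. The specific statistics and block size are illustrative stand-ins, not the paper's exact feature set; training_videos and mos in the trailing comment are hypothetical names.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.svm import LinearSVR

def block_3d_dct_features(video, block=8):
    """Pool simple statistics of 3D-DCT coefficients over spatiotemporal
    blocks of a grayscale video given as a (T, H, W) array."""
    T, H, W = video.shape
    feats = []
    for t in range(0, T - block + 1, block):
        for y in range(0, H - block + 1, block):
            for x in range(0, W - block + 1, block):
                cube = video[t:t + block, y:y + block, x:x + block].astype(np.float64)
                coeffs = dctn(cube, norm="ortho")
                ac = coeffs.ravel()[1:]                 # discard the DC coefficient
                feats.append([np.mean(np.abs(ac)), np.std(ac),
                              np.mean(ac ** 4) / (np.var(ac) ** 2 + 1e-12)])
    return np.mean(feats, axis=0)                        # pooled feature vector

# Quality prediction with linear support vector regression (hypothetical data):
# X = np.vstack([block_3d_dct_features(v) for v in training_videos])
# model = LinearSVR().fit(X, mos)
```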

168 citations


Proceedings ArticleDOI
14 Nov 2016
TL;DR: This work develops predictive models for detecting different levels of QoE degradation caused by three key influence factors, i.e., stalling, the average video quality and the quality variations, and shows that despite encryption this methodology is able to accurately detect QoE problems with 72%-92% accuracy.
Abstract: Tracking and maintaining satisfactory QoE for video streaming services is becoming a greater challenge for mobile network operators than ever before. Downloading and watching video content on mobile devices is currently a growing trend among users, which is causing a demand for higher bandwidth and better provisioning throughout the network infrastructure. At the same time, popular demand for privacy has led many online streaming services to adopt end-to-end encryption, leaving providers with only a handful of indicators for identifying QoE issues. In order to address these challenges, we propose a novel methodology for detecting video streaming QoE issues from encrypted traffic. We develop predictive models for detecting different levels of QoE degradation caused by three key influence factors, i.e., stalling, the average video quality and the quality variations. The models are then evaluated on the production network of a large scale mobile operator, where we show that despite encryption our methodology is able to accurately detect QoE problems with 72%-92% accuracy, while even higher performance is achieved when dealing with cleartext traffic.
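A hedged sketch of this kind of pipeline is shown below: windowed throughput statistics, which remain observable despite encryption, are extracted per flow and fed to a standard classifier. The feature set, window length, and the names flows and degradation_labels are assumptions for illustration, not the paper's model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def flow_features(packet_sizes, packet_times, window_s=10.0):
    """Aggregate per-flow features that remain observable under encryption:
    throughput statistics over fixed-length time windows."""
    t = np.asarray(packet_times, dtype=float)
    t = t - t[0]
    bins = np.arange(0, t.max() + window_s, window_s)
    bytes_per_win = np.histogram(t, bins=bins, weights=packet_sizes)[0]
    thr = bytes_per_win * 8 / window_s / 1e3            # kbps per window
    return [thr.mean(), thr.std(), thr.min(), thr.max(),
            np.percentile(thr, 25), np.percentile(thr, 75)]

# Train a classifier to flag sessions with QoE degradation (hypothetical data):
# X = np.array([flow_features(sizes, times) for sizes, times in flows])
# clf = RandomForestClassifier(n_estimators=200).fit(X, degradation_labels)
```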

155 citations


Proceedings ArticleDOI
TL;DR: This paper is dedicated to a discussion of immersive media distribution formats and the quality estimation process; the accuracy and reliability of the proposed objective quality estimation method have been verified with spherical panoramic images, demonstrating good correlation with subjective quality estimation carried out by a group of experts.
Abstract: Virtual reality (VR)/augmented reality (AR) applications allow users to view artificial content of a surrounding space, simulating a presence effect with the help of special applications or devices. Synthetic content production is a well-known process from the computer graphics domain, and its pipeline has already been established in the industry. However, emerging multimedia formats for immersive entertainment applications, such as free-viewpoint television (FTV) or spherical panoramic video, require different approaches to content management and quality assessment. The international standardization of FTV has been promoted by MPEG. This paper is dedicated to a discussion of the immersive media distribution format and the quality estimation process. The accuracy and reliability of the proposed objective quality estimation method have been verified with spherical panoramic images, demonstrating good correlation with subjective quality estimation carried out by a group of experts.

145 citations


Journal ArticleDOI
TL;DR: This paper proposes an efficient general-purpose no-reference video quality assessment (VQA) framework that is based on 3D shearlet transform and convolutional neural network and demonstrates that SACONVA performs well in predicting video quality and is competitive with current state-of-the-art full-reference VQA methods and general-purpose NR-VQA algorithms.
Abstract: In this paper, we propose an efficient general-purpose no-reference (NR) video quality assessment (VQA) framework that is based on 3D shearlet transform and convolutional neural network (CNN). Taking video blocks as input, simple and efficient primary spatiotemporal features are extracted by 3D shearlet transform, which are capable of capturing natural scene statistics properties. Then, CNN and logistic regression are concatenated to exaggerate the discriminative parts of the primary features and predict a perceptual quality score. The resulting algorithm, which we name shearlet- and CNN-based NR VQA (SACONVA), is tested on well-known VQA databases of Laboratory for Image & Video Engineering, Image & Video Processing Laboratory, and CSIQ. The testing results have demonstrated that SACONVA performs well in predicting video quality and is competitive with current state-of-the-art full-reference VQA methods and general-purpose NR-VQA algorithms. Besides, SACONVA is extended to classify different video distortion types in these three databases and achieves excellent classification accuracy. In addition, we also demonstrate that SACONVA can be directly applied in real applications such as blind video denoising.

119 citations


Proceedings Article
16 Mar 2016
TL;DR: The design and implementation of Critical Feature Analytics is presented and it is demonstrated that CFA leads to significant improvements in video quality; e.g., 32% less buffering time and 12% higher bitrate than a random decision maker.
Abstract: Many prior efforts have suggested that Internet video Quality of Experience (QoE) could be dramatically improved by using data-driven prediction of video quality for different choices (e.g., CDN or bitrate) to make optimal decisions. However, building such a prediction system is challenging on two fronts. First, the relationships between video quality and observed session features can be quite complex. Second, video quality changes dynamically. Thus, we need a prediction model that is (a) expressive enough to capture these complex relationships and (b) capable of updating quality predictions in near real-time. Unfortunately, several seemingly natural solutions (e.g., simple machine learning approaches and simple network models) fail on one or more fronts. Thus, the potential benefits promised by these prior efforts remain unrealized. We address these challenges and present the design and implementation of Critical Feature Analytics (CFA). The design of CFA is driven by domain-specific insights that video quality is typically determined by a small subset of critical features whose criticality persists over several tens of minutes. This enables a scalable and accurate workflow where we automatically learn critical features for different sessions on coarse-grained timescales, while updating quality predictions in near real-time. Using a combination of a real-world pilot deployment and trace-driven analysis, we demonstrate that CFA leads to significant improvements in video quality; e.g., 32% less buffering time and 12% higher bitrate than a random decision maker.
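As a toy illustration of the critical-feature idea, the sketch below estimates a session's quality from recent sessions that match it only on its critical features. The feature names, the matching rule, and the helper predict_quality are hypothetical; learning which features are critical on coarse timescales, as CFA does, is not reproduced here.

```python
import statistics

def predict_quality(session, history, critical_features):
    """Estimate a session's quality from recent sessions that match it on
    its *critical* features only.

    history: list of (features_dict, quality) tuples from the recent past.
    critical_features: feature names assumed critical for this type of
    session, e.g. ['CDN', 'ASN'] (names are illustrative).
    """
    key = tuple(session[f] for f in critical_features)
    matches = [q for feats, q in history
               if tuple(feats[f] for f in critical_features) == key]
    return statistics.mean(matches) if matches else None

history = [({"CDN": "A", "ASN": 7018, "bitrate": 1500}, 0.9),
           ({"CDN": "A", "ASN": 7018, "bitrate": 1500}, 0.7),
           ({"CDN": "B", "ASN": 7018, "bitrate": 1500}, 0.4)]
print(predict_quality({"CDN": "A", "ASN": 7018, "bitrate": 1500},
                      history, ["CDN", "ASN"]))        # averages the two CDN-A sessions
```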

Journal ArticleDOI
TL;DR: A content-aware CMT (CMT-CA) solution, featuring unequal frame-level scheduling based on estimated video parameters and feedback channel status, is proposed to minimize the total distortion of parallel video transmission over multiple wireless access networks.
Abstract: Delivering high-definition (HD) wireless video under stringent delay constraints is challenging with regard to the limited network resources and high transmission rate. Concurrent multipath transfer (CMT) using the stream control transmission protocol (SCTP) exploits the multihoming feature of mobile devices to establish associations with different access networks. In this paper, we study multihomed HD video communication with SCTP over heterogeneous wireless networks. The existing CMT schemes mainly treat the traffic data in a content-agnostic fashion, and thus cannot effectively leverage the scarce wireless resources to maximize the perceived video quality. To address this critical issue, we propose a content-aware CMT (CMT-CA) solution featuring unequal frame-level scheduling based on estimated video parameters and feedback channel status. First, we develop an analytical framework to model the total distortion of parallel video transmission over multiple wireless access networks. Second, we introduce a joint congestion control and data distribution scheme to minimize the total distortion based on online quality evaluation and a Markov decision process (MDP). The performance of CMT-CA is evaluated through extensive semi-physical emulations in Exata involving HD video encoded with the H.264 codec. Experimental results show that CMT-CA outperforms the reference schemes in terms of video peak signal-to-noise ratio (PSNR), end-to-end delay, and goodput. Conversely, CMT-CA achieves the same video quality with 20 percent bandwidth conservation.

Journal ArticleDOI
TL;DR: A Social-aware video multiCast (SoCast) system is designed, leveraging device-to-device (D2D) communications to stimulate effective cooperation among mobile users (clients) by making use of two types of important social ties, i.e., social trust and social reciprocity.
Abstract: To meet the explosive demand for delivering high-definition video streams over cellular networks, we design a Social-aware video multiCast (SoCast) system leveraging device-to-device (D2D) communications. One salient feature of SoCast is to stimulate effective cooperation among mobile users (clients) by making use of two types of important social ties, i.e., social trust and social reciprocity. By using SoCast, clients form groups to obtain missing packets from other clients and restore incomplete video frames, according to the unique video encoding structure. In return, the user perception of the mobile video quality can be substantially improved. Specifically, we first cast the problem of social-tie-based group formation among clients for cooperative video multicast as a coalitional game, and then devise a distributed algorithm to obtain the core solution (group formation) for the formulated coalitional game. Further, a resource allocation scheme is proposed for the base station to handle D2D radio resource requests from client groups. Extensive numerical studies using real video traces corroborate the significant gain of using SoCast.

Journal ArticleDOI
TL;DR: An optical flow-based full-reference video quality assessment (FR-VQA) algorithm for assessing the perceptual quality of natural videos, based on the hypothesis that distortions affect flow statistics both locally and globally, is presented.
Abstract: We present a simple yet effective optical flow-based full-reference video quality assessment (FR-VQA) algorithm for assessing the perceptual quality of natural videos. Our algorithm is based on the premise that local optical flow statistics are affected by distortions and the deviation from pristine flow statistics is proportional to the amount of distortion. We characterize the local flow statistics using the mean, the standard deviation, the coefficient of variation (CV), and the minimum eigenvalue (λ_min) of the local flow patches. Temporal distortion is estimated as the change in the CV of the distorted flow with respect to the reference flow, and the correlation between λ_min of the reference and of the distorted patches. We rely on the robust multi-scale structural similarity index for spatial quality estimation. The computed temporal and spatial distortions are then pooled using a perceptually motivated heuristic to generate a spatio-temporal quality score. The proposed method is shown to be competitive with the state-of-the-art when evaluated on the LIVE SD database, the EPFL-PoliMI SD database, and the LIVE Mobile HD database. The distortions considered in these databases include those due to compression, packet loss, wireless channel errors, and rate adaptation. Our algorithm is flexible enough to allow for any robust FR spatial distortion metric for spatial distortion estimation. In addition, the proposed method is not only parameter-free but also independent of the choice of the optical flow algorithm. Finally, we show that the replacement of the optical flow vectors in our proposed method with the much coarser block motion vectors also results in an acceptable FR-VQA algorithm. Our algorithm is called the flow similarity index.
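The patch-level flow statistics can be sketched as follows, using Farneback optical flow from OpenCV. The patch size, the flow estimator, and the covariance-based minimum eigenvalue are assumptions for illustration, and the pooling into the final flow similarity index is not reproduced.

```python
import numpy as np
import cv2

def local_flow_statistics(prev_gray, next_gray, patch=16):
    """Per-patch optical flow statistics of the kind used by flow-based VQA
    models: mean magnitude, standard deviation, coefficient of variation,
    and the minimum eigenvalue of the patch's 2x2 flow covariance.
    Inputs are consecutive 8-bit grayscale frames."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    H, W = mag.shape
    stats = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            m = mag[y:y + patch, x:x + patch]
            vecs = flow[y:y + patch, x:x + patch].reshape(-1, 2)
            cov = np.cov(vecs, rowvar=False)
            lam_min = float(np.min(np.linalg.eigvalsh(cov)))
            mu, sd = m.mean(), m.std()
            stats.append((mu, sd, sd / (mu + 1e-6), lam_min))
    return np.array(stats)
```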

Proceedings ArticleDOI
10 May 2016
TL;DR: A Video Control Plane is built which enforces Video Quality Fairness among concurrent video flows generated by heterogeneous client devices and a max-min fairness optimization problem is solved at run-time.
Abstract: In this paper, we investigate several network-assisted streaming approaches which rely on active cooperation between video streaming applications and the network. We build a Video Control Plane which enforces Video Quality Fairness among concurrent video flows generated by heterogeneous client devices. For this purpose, a max-min fairness optimization problem is solved at run-time. We compare two approaches to actuate the optimal solution in an SDN network: the first one allocates network bandwidth slices to video flows, and the second one guides video players in the video bitrate selection. Performance is assessed through several QoE-related metrics, such as Video Quality Fairness, video quality, and switching frequency. The impact of client-side adaptation algorithms is also investigated.
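A minimal sketch of max-min fair bitrate selection under a shared capacity budget is given below: the flow currently at the lowest bitrate is repeatedly upgraded while its next level still fits. This progressive-filling rule over discrete bitrates is an illustrative stand-in for the paper's optimization, not its exact formulation.

```python
def max_min_fair_bitrates(flows, capacity_kbps):
    """Max-min fair selection of discrete video bitrates under a shared
    capacity budget: repeatedly upgrade the flow at the lowest bitrate,
    as long as its next level still fits.

    flows: dict flow_id -> sorted list of available bitrates (kbps).
    """
    choice = {f: 0 for f in flows}                       # start at the lowest level
    used = sum(flows[f][0] for f in flows)
    while True:
        # The flow with the lowest current bitrate that can still be upgraded.
        candidates = [f for f in flows if choice[f] + 1 < len(flows[f])]
        if not candidates:
            break
        f = min(candidates, key=lambda f: flows[f][choice[f]])
        delta = flows[f][choice[f] + 1] - flows[f][choice[f]]
        if used + delta > capacity_kbps:
            break
        choice[f] += 1
        used += delta
    return {f: flows[f][i] for f, i in choice.items()}

print(max_min_fair_bitrates({"tv": [1000, 3000, 6000], "phone": [300, 750, 1500]}, 5000))
```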

Proceedings ArticleDOI
19 Aug 2016
TL;DR: Two algorithm optimizations for a distributed cloud-based encoding pipeline are described, including per-title complexity analysis for bitrate-resolution selection and per-chunk bitrate control for consistent-quality encoding, which result in more efficient bandwidth usage and more consistent video quality.
Abstract: A cloud-based encoding pipeline which generates streams for video-on-demand distribution typically processes a wide diversity of content that exhibit varying signal characteristics. To produce the best quality video streams, the system needs to adapt the encoding to each piece of content, in an automated and scalable way. In this paper, we describe two algorithm optimizations for a distributed cloud-based encoding pipeline: (i) per-title complexity analysis for bitrate-resolution selection; and (ii) per-chunk bitrate control for consistent-quality encoding. These improvements result in a number of advantages over a simple “one-size-fits-all” encoding system, including more efficient bandwidth usage and more consistent video quality.
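A hedged sketch of per-title ladder selection is shown below: from trial encodes of a single title, keep only the bitrate-resolution points on the rate-quality Pareto frontier, so that no retained point is matched or beaten in quality by a cheaper encode. The input format and quality scale are assumptions for illustration.

```python
def select_ladder(trial_encodes):
    """Per-title ladder selection sketch: keep the (bitrate, resolution)
    points on the rate-quality Pareto frontier of one title's trial
    encodes. Inputs are assumed tuples of (bitrate_kbps, height, quality)."""
    pts = sorted(trial_encodes)                          # ascending bitrate
    ladder = []
    best_quality = float("-inf")
    for bitrate, height, quality in pts:
        if quality > best_quality:                       # strictly improves quality
            ladder.append((bitrate, height))
            best_quality = quality
    return ladder

trials = [(500, 360, 72), (500, 540, 68), (1500, 540, 84),
          (1500, 720, 82), (3000, 720, 91), (3000, 1080, 89)]
print(select_ladder(trials))    # -> [(500, 360), (1500, 540), (3000, 720)]
```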

Proceedings ArticleDOI
06 Jun 2016
TL;DR: A new methodology for evaluating the performance of objective models is proposed, based on determining the classification abilities of the models in two scenarios inspired by real applications, which enables easy evaluation of performance on data from multiple subjective experiments.
Abstract: There are several standard methods for evaluating the performance of models for objective quality assessment with respect to the results of subjective tests. However, all of them suffer from one or more of the following drawbacks: they do not consider the uncertainty in the subjective scores, requiring the models to make a definite decision where the correct behavior is not known; they are vulnerable to the quality range of the stimuli in the experiments; and, in order to compare the models, they require a mapping of predicted values to the subjective scores, thus not comparing the models exactly as they are used in real scenarios. In this paper, a new methodology for evaluating the performance of objective models is proposed. The method is based on determining the classification abilities of the models in two scenarios inspired by real applications. It does not suffer from the previously stated drawbacks and enables easy evaluation of performance on data from multiple subjective experiments. Moreover, techniques to determine the statistical significance of the performance differences are suggested. The proposed framework is tested on several selected metrics and datasets, showing its ability to provide complementary information about the models' behavior alongside other state-of-the-art methods.

Journal ArticleDOI
TL;DR: A novel scheduling framework dubbed delAy Stringent COded Transmission (ASCOT), featuring frame-level data protection and allocation over multiple wireless access networks, which outperforms existing transmission schemes in improving video PSNR, reducing end-to-end delay, and increasing goodput.
Abstract: Delivering high-quality mobile video with limited radio resources is challenging due to the time-varying channel status and stringent Quality of Service (QoS) requirements. Multi-homing support enables mobile terminals to establish multiple simultaneous associations for enhancing transmission performance. In this paper, we study the multi-homed communication of delay-constrained High Definition (HD) video in heterogeneous wireless networks. The low-delay encoded HD video streaming consists exclusively of Intra (I) and Predicted (P) frames. In capacity-limited wireless networks, it is highly possible that the large-size I frames experience deadline violations and induce severe quality degradations. To address this challenging problem, we propose a novel scheduling framework dubbed delAy Stringent COded Transmission (ASCOT), featuring frame-level data protection and allocation over multiple wireless access networks. First, we perform online video distortion estimation to simulate multi-homed transmission impairments according to the feedback channel status and input video data. Second, we control the frame protection level by adapting the Forward Error Correction (FEC) coding redundancy and video data allocation to achieve the target quality. The performance of the proposed ASCOT is evaluated through semi-physical emulations in Exata using real-time H.264 video streaming. Experimental results show that ASCOT outperforms existing transmission schemes in improving the video PSNR (Peak Signal-to-Noise Ratio), reducing the end-to-end delay, and increasing the goodput. Conversely, ASCOT achieves the same video quality with approximately 20 percent bandwidth conservation.

Journal ArticleDOI
TL;DR: This paper presents a novel rate control framework based on the Lagrange multiplier in high-efficiency video coding that outperforms the rate control used in the HEVC Test Model by providing more accurate rate regulation, lower video quality fluctuation, and more stable buffer fullness.
Abstract: Video quality fluctuation plays a significant role in human visual perception, and hence, many rate control approaches have been developed to maintain consistent quality for video communication. This paper presents a novel rate control framework based on the Lagrange multiplier in high-efficiency video coding. Under the assumption of constant quality control, a new relationship between the distortion and the Lagrange multiplier is established. Based on the proposed distortion model and buffer status, we obtain a computationally feasible solution to the problem of minimizing the distortion variation across video frames at the coding tree unit level. Extensive simulation results show that our method outperforms the rate control used in the HEVC Test Model (HM) by providing more accurate rate regulation, lower video quality fluctuation, and more stable buffer fullness. The average peak signal-to-noise ratio (PSNR) and PSNR deviation improvements are about 0.37 dB and 57.14% in low-delay (P and B) video communication, where the complexity overhead is about 4.44%.

Journal ArticleDOI
TL;DR: A generic framework utilizing the powerful and elastic cloud computing services for crowdsourced live streaming with heterogeneous broadcasters and viewers is presented, and an optimal scheduler for allocating cloud instances with no regional constraints is developed.
Abstract: With the advances in personal computing devices and the prevalence of broadband network and wireless mobile network accesses, end-users are no longer pure content consumers, but contributors, too. In today's crowdsourced streaming systems, numerous broadcasters lively stream their video content, e.g., live events or online game scenes, to fellow viewers. Compared to professional video producers and broadcasters, these new generation broadcasters are geo-distributed globally and highly heterogeneous in terms of the generated video quality and the network/system configurations. The scalability and heterogeneity challenges therefore lie on both broadcasters and the viewers, which call for massive transcoding, and two critical issues: 1) choosing video representation set that maximizes viewer satisfaction and 2) allocating computational resources that minimize operational costs, must be systematically optimized in the global scale. In this paper, we present a generic framework utilizing the powerful and elastic cloud computing services for crowdsourced live streaming with heterogeneous broadcasters and viewers. We jointly consider the viewer satisfaction and the service availability/pricing of geo-distributed cloud resources for transcoding. We develop an optimal scheduler for allocating cloud instances with no regional constraints. We then extend the solution to accommodate regional constraints, and discuss a series of practical enhancements, including popularity forecasting, initialization latency, and viewer feedbacks. Our solutions have been evaluated under diverse networks and cloud system configurations as well as parameter settings. The trace-driven simulation confirms the superiority of our design, while our Planetlab-based experiment offers further practical hints toward real-world migration.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: This paper describes a novel general-purpose NR-IQA framework which is based on deep Convolutional Neural Networks (CNN), Directly taking a raw image as input and outputting the image quality score, which provides an end-to-end solution to the NR- IQA problem and frees us from designing hand-crafted features.
Abstract: The state-of-the-art general-purpose no-reference image or video quality assessment (NR-I/VQA) algorithms usually rely on elaborated hand-crafted features which capture the Natural Scene Statistics (NSS) properties. However, designing these features is usually not an easy problem. In this paper, we describe a novel general-purpose NR-IQA framework which is based on deep Convolutional Neural Networks (CNN). Directly taking a raw image as input and outputting the image quality score, this new framework integrates the feature learning and regression into one optimization process, which provides an end-to-end solution to the NR-IQA problem and frees us from designing hand-crafted features. This approach achieves excellent performance on the LIVE dataset and is very competitive with other state-of-the-art NR-IQA algorithms.

Journal ArticleDOI
TL;DR: A combination of geometric transformations and outlier rejection is used to obtain a robust inter-frame motion estimation, together with a Kalman filter based on an ANN-learned model of the MAV that includes the control action for motion intention estimation.
Abstract: The emerging branch of micro aerial vehicles (MAVs) has attracted great interest for their indoor navigation capabilities, but they require high-quality video for tele-operated or autonomous tasks. A common problem of on-board video quality is the effect of undesired movements, and different approaches address it with either mechanical stabilizers or video stabilization software. Very few video stabilization algorithms in the literature can be applied in real time, and they do not discriminate at all between intentional movements of the tele-operator and undesired ones. In this paper, a novel technique is introduced for real-time video stabilization with low computational cost, without generating false movements or decreasing the performance of the stabilized video sequence. Our proposal uses a combination of geometric transformations and outlier rejection to obtain a robust inter-frame motion estimation, and a Kalman filter based on an ANN-learned model of the MAV that includes the control action for motion intention estimation.
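The inter-frame motion estimation step can be sketched with OpenCV as below: corners are tracked between frames and a similarity transform is fitted with RANSAC so outlying matches are rejected. The feature tracker, its parameters, and the returned motion parametrization are assumptions; the ANN-learned model and the Kalman filter for motion intention are not reproduced here.

```python
import numpy as np
import cv2

def interframe_motion(prev_gray, curr_gray):
    """Robust inter-frame motion estimate: track corners with Lucas-Kanade,
    then fit a similarity transform with RANSAC so outlying matches
    (moving objects, bad tracks) are rejected. Returns (dx, dy, dtheta)."""
    pts_prev = cv2.goodFeaturesToTrack(prev_gray, 200, 0.01, 10)
    pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                   pts_prev, None)
    good = status.ravel() == 1
    M, _ = cv2.estimateAffinePartial2D(pts_prev[good], pts_curr[good],
                                       method=cv2.RANSAC)
    if M is None:                                       # estimation failed
        return 0.0, 0.0, 0.0
    dx, dy = float(M[0, 2]), float(M[1, 2])
    dtheta = float(np.arctan2(M[1, 0], M[0, 0]))
    return dx, dy, dtheta
```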

Journal ArticleDOI
TL;DR: This paper takes peak signal-to-noise ratio (PSNR) as the measurement of video quality, considers both the downlink transmission energy and the reception energy in a D2D-assisted cellular video stream sharing scenario, and proposes two energy-saving solutions.
Abstract: Cellular networks are widely used, and device-to-device (D2D)-assisted approaches have been proposed for improving spectrum efficiency, overall throughput and energy efficiency. However, few of them have considered the downlink transmission for multiple concurrent devices from an energy efficiency perspective. In this paper, we focus on D2D-assisted cellular communication in a video stream sharing scenario. Two energy-saving solutions for downlink transmission are proposed, with a constraint on the D2D cluster's energy consumption. We take peak signal-to-noise ratio (PSNR) as the measurement of video quality and consider both the downlink transmission energy and the reception energy. In particular, we propose a D2D cluster formation approach and a D2D caching approach, both for the purpose of energy saving, with a distributed merge-and-split algorithm adopted from coalitional game theory and a relaxation factor defined to constrain the total energy consumption of each cluster. Both the D2D cluster and D2D caching approaches are effective at saving energy for the BS combined with all user devices; however, the D2D cluster approach brings an unfairness problem between the cluster head and the other cluster nodes. Therefore, we compare the two approaches in terms of energy saving performance as well as fairness. Moreover, a centralized algorithm for D2D clustering is also proposed as a benchmark for the distributed D2D cluster algorithm. Simulation results show a considerable amount of energy saving in the proposed D2D cluster- and caching-assisted cellular network for the video stream sharing problem.

Journal ArticleDOI
TL;DR: Experimental results show that CAASS can dynamically adjust the service level according to the environment variation and outperforms the existing streaming approaches in adaptive streaming media distribution according to peak signal-to-noise ratio (PSNR).
Abstract: We consider the problem of streaming media transmission in a heterogeneous network from a multisource server to multiple home terminals. In the wired network, the transmission performance is limited by the network state (e.g., bandwidth variation, jitter, and packet loss). In the wireless network, multiple user terminals can cause bandwidth competition. Thus, streaming media distribution in a heterogeneous network becomes a severe challenge that is critical for QoS guarantees. In this paper, we propose a context-aware adaptive streaming media distribution system (CAASS), which implements a context-aware module to perceive the environment parameters and uses a strategy analysis (SA) module to deduce the most suitable service level. This approach is able to improve the video quality for guaranteeing streaming QoS. We formulate the optimization problem of the QoS relationship with the environment parameters based on the QoS testing algorithm for IPTV in ITU-T G.1070. We evaluate the performance of the proposed CAASS through 12 types of experimental environments using a prototype system. Experimental results show that CAASS can dynamically adjust the service level according to the environment variation (e.g., network state and terminal performance) and outperforms the existing streaming approaches in adaptive streaming media distribution in terms of peak signal-to-noise ratio (PSNR).

Journal ArticleDOI
TL;DR: While Google Glass provides a great breadth of functionality as a wearable device with two-way communication capabilities, current hardware limitations prevent its use as a telementoring device in surgery, as the video quality is inadequate for safe telementoring.
Abstract: The goal of telementoring is to recreate face-to-face encounters with a digital presence. Open-surgery telementoring is limited by the lack of surgeon's point-of-view cameras. Google Glass is a wearable computer that looks like a pair of glasses but is equipped with wireless connectivity, a camera, and a viewing screen for video conferencing. This study aimed to assess the safety of using Google Glass by assessing the video quality of a telementoring session. Thirty-four (n = 34) surgeons at a single institution were surveyed and blindly compared video captured with Google Glass versus an Apple iPhone 5 during the open cholecystectomy portion of a Whipple procedure. Surgeons were asked to evaluate the quality of the video and its adequacy for safe use in telementoring. Thirty-four of 107 invited surgical attendings (32%) responded to the anonymous survey. A total of 50% rated the Google Glass video as fair, with the other 50% rating it as bad to poor. A total of 52.9% of respondents rated the Apple iPhone video as good. A significantly greater proportion of respondents felt the Google Glass video quality was inadequate for telementoring compared with the Apple iPhone's (82.4% vs 26.5%, p < 0.0001). The intraclass correlation coefficient was 0.924 (95% CI 0.660–0.999, p < 0.001). While Google Glass provides a great breadth of functionality as a wearable device with two-way communication capabilities, current hardware limitations prevent its use as a telementoring device in surgery, as the video quality is inadequate for safe telementoring. As the device is still in the initial phases of development, future iterations or competitor devices may provide a better telementoring application for wearable devices.

Proceedings ArticleDOI
22 Aug 2016
TL;DR: This work developed a heuristic to identify policing from server-side traces and built a pipeline to deploy it at scale on traces from a large online content provider, collected from hundreds of servers worldwide.
Abstract: Large flows like videos consume significant bandwidth. Some ISPs actively manage these high volume flows with techniques like policing, which enforces a flow rate by dropping excess traffic. While the existence of policing is well known, our contribution is an Internet-wide study quantifying its prevalence and impact on video quality metrics. We developed a heuristic to identify policing from server-side traces and built a pipeline to deploy it at scale on traces from a large online content provider, collected from hundreds of servers worldwide. Using a dataset of 270 billion packets served to 28,400 client ASes, we find that, depending on region, up to 7% of lossy transfers are policed. Loss rates are on average six times higher when a trace is policed, and it impacts video playback quality. We show that alternatives to policing, like pacing and shaping, can achieve traffic management goals while avoiding the deleterious effects of policing.
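A hedged sketch of a policing indicator in this spirit is shown below: a flow is flagged when its per-second goodput is clamped to a near-constant rate while a non-trivial share of bytes is retransmitted. The window length, thresholds, and input format are assumptions for illustration, not the paper's actual heuristic.

```python
import numpy as np

def looks_policed(delivered_bytes, retrans_bytes, timestamps, tol=0.15):
    """Flag a flow that behaves as if policed, from a server-side trace.

    delivered_bytes[i], retrans_bytes[i]: payload and retransmitted bytes of
    the i-th packet sent at timestamps[i] (seconds). A policed flow tends to
    be clamped to a near-constant goodput while showing substantial loss
    whenever it tries to exceed that rate.
    """
    t = np.asarray(timestamps, dtype=float)
    if t[-1] - t[0] < 5.0:                               # trace too short to judge
        return False
    windows = np.arange(t[0], t[-1], 1.0)                # 1-second windows
    goodput = np.histogram(t, bins=windows, weights=delivered_bytes)[0]
    loss = np.histogram(t, bins=windows, weights=retrans_bytes)[0]
    active = goodput > 0
    if active.sum() < 5:
        return False
    rate = goodput[active]
    flat_rate = rate.std() / rate.mean() < tol           # clamped to ~constant rate
    lossy = loss.sum() / (goodput.sum() + loss.sum()) > 0.01
    return bool(flat_rate and lossy)
```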

Journal ArticleDOI
TL;DR: This paper presents a mathematical framework to analyze the frame-level energy-quality tradeoff for delay-constrained multihomed video communication over multiple communication paths and develops scheduling algorithms for prioritized frame scheduling and unequal loss protection to achieve target video quality with minimum device energy consumption.
Abstract: The technological evolutions in wireless communication systems prompt the bandwidth aggregation (e.g., Wi-Fi and LTE radio interfaces) for concurrent video transmission to hand-held devices. However, multipath video transport to the battery-limited mobile terminals is confronted with challenging technical problems: 1) high-quality real-time video streaming is throughput-demanding and delay-sensitive; 2) mobile device energy and video quality are not adequately considered in conventional multipath protocols; and 3) wireless networks are error-prone and bandwidth-limited. To enable the energy-efficient and quality-guaranteed live video streaming over heterogeneous wireless access networks, this paper proposes an energy-video aware multipath transport protocol (EVIS). First, we present a mathematical framework to analyze the frame-level energy-quality tradeoff for delay-constrained multihomed video communication over multiple communication paths. Second, we develop scheduling algorithms for prioritized frame scheduling and unequal loss protection to achieve target video quality with minimum device energy consumption. EVIS is able to effectively leverage video frame priority and rateless Raptor coding to jointly optimize energy efficiency and perceived quality. We conduct performance evaluation through extensive emulations in Exata involving real-time H.264 video streaming. Emulation results demonstrate that EVIS advances the state-of-the-art with remarkable improvements in energy conservation, video peak signal-to-noise ratio (PSNR), end-to-end delay, and goodput.

Journal ArticleDOI
TL;DR: This proposal enhances the legacy multicast transmission over LTE systems by exploiting multiuser diversity and the users' channel quality feedbacks and is designed to take advantage of the frequency selectivity in the subgroup formation.
Abstract: The growing demand for mobile multicast services, such as Internet Protocol television (IPTV) and video streaming, requires effective radio resource management (RRM) to handle traffic with strict quality-of-service constraints over Long-Term Evolution (LTE) and beyond systems. Special care is needed to limit system performance degradation when multiple multicast streams are simultaneously transmitted. To this aim, this paper proposes an RRM policy based on a subgrouping technique for the delivery of scalable multicast video flows in a cell. Our proposal enhances the legacy multicast transmission over LTE systems by exploiting multiuser diversity and the users' channel quality feedbacks. Moreover, it is designed to take advantage of the frequency selectivity in the subgroup formation. Simulation results demonstrate the effectiveness of the proposed scheme, which outperforms existing approaches from the literature. It succeeds in achieving higher spectral efficiency and guaranteeing adequate video quality to all multicast receivers and improved quality to those with good channel conditions.

Journal ArticleDOI
TL;DR: A no-reference video quality metric that takes three impairment factors into account (initial buffering delay, temporal interruptions or pauses, and video resolution changes during a video transmission) is proposed and implemented on PCs and mobile hand-held devices.
Abstract: Currently, the dynamic adaptive streaming over HTTP (DASH) standard is used to change the video resolution according to the end user's network capacity. However, if the video resolution changes frequently, the attention of the users can be affected, decreasing their quality of experience (QoE). In this context, this paper proposes a no-reference video quality metric that takes three impairment factors into account: initial buffering delay, temporal interruptions or pauses, and video resolution changes during a video transmission. The temporal locations of pauses and resolution changes are also considered. In order to perform this task, extensive subjective tests were conducted. Experimental results show that the evaluators' QoE is highly correlated with the frequency of pauses, video resolution changes and initial delay. Based on these results, a video streaming quality metric for DASH services (VsQM_DASH) is formulated and implemented on PC and mobile hand-held devices. Finally, the performance of VsQM_DASH is validated with additional subjective tests, reaching a Pearson correlation coefficient of 0.99. Furthermore, experimental results demonstrate that the proposed metric has low complexity because it is based on application-level parameters, and its energy consumption is negligible in relation to the energy consumed by the video player; therefore, it can be easily implemented in consumer electronic devices.

Journal ArticleDOI
TL;DR: This work introduces an adaptation algorithm for HTTP-based live streaming called LOLYPOP (short for low-latency prediction-based adaptation), which is designed to operate with a transport latency of a few seconds, and leverages Transmission Control Protocol throughput predictions on multiple time scales.
Abstract: Recently, Hypertext Transfer Protocol (HTTP)-based adaptive streaming has become the de facto standard for video streaming over the Internet. It allows clients to dynamically adapt media characteristics to the varying network conditions to ensure a high quality of experience (QoE)—that is, minimize playback interruptions while maximizing video quality at a reasonable level of quality changes. In the case of live streaming, this task becomes particularly challenging due to the latency constraints. The challenge further increases if a client uses a wireless access network, where the throughput is subject to considerable fluctuations. Consequently, live streams often exhibit latencies of up to 20 to 30 seconds. In the present work, we introduce an adaptation algorithm for HTTP-based live streaming called LOLYPOP (short for low-latency prediction-based adaptation), which is designed to operate with a transport latency of a few seconds. To reach this goal, LOLYPOP leverages Transmission Control Protocol throughput predictions on multiple time scales, from 1 to 10 seconds, along with estimations of the relative prediction error distributions. In addition to satisfying the latency constraint, the algorithm heuristically maximizes the QoE by maximizing the average video quality as a function of the number of skipped segments and quality transitions. To select an efficient prediction method, we studied the performance of several time series prediction methods in IEEE 802.11 wireless access networks. We evaluated LOLYPOP under a large set of experimental conditions, limiting the transport latency to 3 seconds, against a state-of-the-art adaptation algorithm called FESTIVE. We observed that the average selected video representation index is up to a factor of 3 higher than with the baseline approach. We also observed that LOLYPOP is able to reach points from a broader region in the QoE space, and thus it is better adjustable to the user profile or service provider requirements.
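The prediction-based selection can be sketched as below: the empirical distribution of past relative prediction errors turns a throughput prediction into a conservative estimate, and the highest sustainable bitrate is chosen. The risk level, the error definition, and the function name select_representation are assumptions in the spirit of the approach, not the algorithm's exact specification.

```python
import numpy as np

def select_representation(bitrates_kbps, predicted_tput_kbps,
                          past_relative_errors, risk=0.1):
    """Pick the highest bitrate the connection is likely to sustain.

    past_relative_errors: observed values of (actual - predicted) / predicted
    for earlier throughput predictions on the same time scale. The low
    quantile of this distribution discounts the current prediction so that
    it is exceeded with probability about 1 - risk.
    """
    errors = np.asarray(past_relative_errors, dtype=float)
    pessimistic = predicted_tput_kbps * (1.0 + np.quantile(errors, risk))
    feasible = [b for b in sorted(bitrates_kbps) if b <= pessimistic]
    return feasible[-1] if feasible else min(bitrates_kbps)

print(select_representation([300, 750, 1500, 3000, 6000],
                            predicted_tput_kbps=4000,
                            past_relative_errors=[-0.3, -0.1, 0.0, 0.05, 0.2],
                            risk=0.2))                   # -> 3000
```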