scispace - formally typeset

Showing papers on "Video quality published in 2017"


Proceedings ArticleDOI
30 Jun 2017
TL;DR: KoNViD-1k is reported on, a subjectively annotated VQA database consisting of 1,200 public-domain video sequences, fairly sampled from a large public video dataset, YFCC100m, aimed at ‘in the wild’ authentic distortions.
Abstract: Subjective video quality assessment (VQA) strongly depends on semantics, context, and the types of visual distortions. Currently, all existing VQA databases include only a small number of video sequences with artificial distortions. The development and evaluation of objective quality assessment methods would benefit from having larger datasets of real-world video sequences with corresponding subjective mean opinion scores (MOS), in particular for deep learning purposes. In addition, the training and validation of any VQA method intended to be ‘general purpose’ requires a large dataset of video sequences that are representative of the whole spectrum of available video content and all types of distortions. We report our work on KoNViD-1k, a subjectively annotated VQA database consisting of 1,200 public-domain video sequences, fairly sampled from a large public video dataset, YFCC100m. We present the challenges and choices we have made in creating such a database aimed at ‘in the wild’ authentic distortions, depicting a wide variety of content.
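As a hedged illustration (not taken from the paper), the mean opinion scores (MOS) in such databases are typically the per-video average of individual subjective ratings, often after discarding invalid scores; a minimal Python sketch:

```python
def mean_opinion_scores(ratings, scale=(1, 5)):
    """Average raw subjective ratings into a per-video MOS.

    ratings: dict mapping video_id -> list of individual scores.
    Scores outside the rating scale are discarded as invalid.
    """
    lo, hi = scale
    mos = {}
    for vid, scores in ratings.items():
        valid = [s for s in scores if lo <= s <= hi]
        if valid:
            mos[vid] = sum(valid) / len(valid)
    return mos
```

Real studies additionally screen out unreliable raters (e.g., per ITU-R BT.500 procedures), which this sketch omits.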

217 citations


Journal ArticleDOI
TL;DR: This survey paper looks at emerging research into the application of client-side, server-side, and in-network rate adaptation techniques to support DASH-based content delivery and provides context and motivation for their application.
Abstract: With companies such as Netflix and YouTube accounting for more than 50% of the peak download traffic on North American fixed networks in 2015, video streaming represents a significant source of Internet traffic. Multimedia delivery over the Internet has evolved rapidly over the past few years. The last decade has seen video streaming transitioning from User Datagram Protocol to Transmission Control Protocol-based technologies. Dynamic adaptive streaming over HTTP (DASH) has recently emerged as a standard for Internet video streaming. A range of rate adaptation mechanisms are proposed for DASH systems in order to deliver video quality that matches the throughput of dynamic network conditions for a richer user experience. This survey paper looks at emerging research into the application of client-side, server-side, and in-network rate adaptation techniques to support DASH-based content delivery. We provide context and motivation for the application of these techniques and review significant works in the literature from the past decade. These works are categorized according to the feedback signals used and the end-node that performs or assists with the adaptation. We also provide a review of several notable video traffic measurement and characterization studies and outline open research questions in the field.
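The core idea behind throughput-based client-side rate adaptation in DASH can be sketched as follows (an illustrative simplification; the function name, bitrate ladder, and safety margin are assumptions, not drawn from any specific surveyed work):

```python
def select_bitrate(throughput_bps, ladder_bps, safety=0.8):
    """Pick the highest representation whose bitrate fits within a
    safety margin of the estimated network throughput."""
    budget = safety * throughput_bps
    eligible = [b for b in ladder_bps if b <= budget]
    # Fall back to the lowest rung when even it exceeds the budget.
    return max(eligible) if eligible else min(ladder_bps)
```

Practical ABR controllers also smooth the throughput estimate and consider buffer occupancy, which this sketch leaves out.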

216 citations


Proceedings ArticleDOI
21 May 2017
TL;DR: The impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user is investigated, showing that the cube map layout offers the best quality for the given bit-rate budget.
Abstract: The delivery and display of 360-degree videos on Head-Mounted Displays (HMDs) presents many technical challenges. 360-degree videos are ultra high resolution spherical videos, which contain an omnidirectional view of the scene. However, only a portion of this scene is displayed on the HMD. Moreover, HMDs need to respond to head movements within 10 ms, which prevents the server from sending only the displayed video part based on client feedback. To reduce the bandwidth waste, while still providing an immersive experience, a viewport-adaptive 360-degree video streaming system is proposed. The server prepares multiple video representations, which differ not only by their bit-rate, but also by the qualities of different scene regions. The client chooses a representation for the next segment such that its bit-rate fits the available throughput and its full-quality region matches the viewing direction. We investigate the impact of various spherical-to-plane projections and quality arrangements on the video quality displayed to the user, showing that the cube map layout offers the best quality for the given bit-rate budget. An evaluation with a dataset of users navigating 360-degree videos demonstrates that segments need to be short enough to enable frequent view switches.
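The client-side selection logic described above can be sketched roughly like this (a hypothetical illustration; the representation fields and fallback policy are assumed, not the paper's implementation):

```python
def pick_representation(representations, throughput_bps, view_region):
    """Choose the representation whose full-quality region matches the
    predicted viewport and whose bitrate fits the available throughput."""
    affordable = [r for r in representations if r["bitrate"] <= throughput_bps]
    if not affordable:
        return min(representations, key=lambda r: r["bitrate"])
    matching = [r for r in affordable if r["qer"] == view_region]
    # Prefer a viewport match; otherwise take the best affordable quality.
    pool = matching or affordable
    return max(pool, key=lambda r: r["bitrate"])
```

This captures the two-dimensional choice (bitrate and quality region) that distinguishes viewport-adaptive streaming from plain DASH adaptation.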

213 citations


Journal ArticleDOI
TL;DR: A new video quality database is created, which simulates a typical video streaming application, using long video sequences and interesting Netflix content, and it is found that objective video quality models are unreliable for QoE prediction on videos suffering from both rebuffering events and bitrate changes.
Abstract: HTTP adaptive streaming is being increasingly deployed by network content providers, such as Netflix and YouTube. By dividing video content into data chunks encoded at different bitrates, a client is able to request the appropriate bitrate for the segment to be played next based on the estimated network conditions. However, this can introduce a number of impairments, including compression artifacts and rebuffering events, which can severely impact an end-user’s quality of experience (QoE). We have recently created a new video quality database, which simulates a typical video streaming application, using long video sequences and interesting Netflix content. Going beyond previous efforts, the new database contains highly diverse and contemporary content, and it includes the subjective opinions of a sizable number of human subjects regarding the effects on QoE of both rebuffering and compression distortions. We observed that rebuffering is always obvious and unpleasant to subjects, while bitrate changes may be less obvious due to content-related dependencies. Transient bitrate drops were preferable over rebuffering only on low complexity video content, while consistently low bitrates were poorly tolerated. We evaluated different objective video quality assessment algorithms on our database and found that objective video quality models are unreliable for QoE prediction on videos suffering from both rebuffering events and bitrate changes. This implies the need for more general QoE models that take into account objective quality models, rebuffering-aware information, and memory. The publicly available video content as well as metadata for all of the videos in the new database can be found at http://live.ece.utexas.edu/research/LIVE_NFLXStudy/nflx_index.html .

155 citations


Journal ArticleDOI
Zhengfang Duanmu1, Kai Zeng1, Kede Ma1, Abdul Rehman1, Zhou Wang1 
TL;DR: This work builds a streaming video database and carries out a subjective user study to investigate the human responses to the combined effect of video compression, initial buffering, and stalling, and proposes a novel QoE prediction approach named Streaming QOE Index that accounts for the instantaneous quality degradation due to perceptual video presentation impairment, the playback stalling events, and the instantaneous interactions between them.
Abstract: With the rapid growth of streaming media applications, there has been a strong demand for quality-of-experience (QoE) measurement and QoE-driven video delivery technologies. Most existing methods rely on bitrate and global statistics of stalling events for QoE prediction. This is problematic for two reasons. First, using the same bitrate to encode different video content results in drastically different presentation quality. Second, the interactions between video presentation quality and playback stalling experiences are not accounted for. In this work, we first build a streaming video database and carry out a subjective user study to investigate the human responses to the combined effect of video compression, initial buffering, and stalling. We then propose a novel QoE prediction approach named Streaming QoE Index that accounts for the instantaneous quality degradation due to perceptual video presentation impairment, the playback stalling events, and the instantaneous interactions between them. Experimental results show that the proposed model is in close agreement with subjective opinions and significantly outperforms existing QoE models. The proposed model provides highly effective and efficient means for QoE prediction in video streaming services.
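To make the idea concrete, here is a toy model (not the authors' actual Streaming QoE Index; the stall penalty and memory constants are invented for illustration) combining instantaneous presentation quality, stalling events, and a memory effect:

```python
def toy_streaming_qoe(frame_quality, stalled, stall_penalty=20.0, memory=0.95):
    """Exponentially smoothed QoE: each instant contributes either the
    presentation quality (0-100 scale) or a negative stall penalty, with
    past experience carried forward through the `memory` factor."""
    smoothed = frame_quality[0]
    history = []
    for q, s in zip(frame_quality, stalled):
        instant = -stall_penalty if s else q
        smoothed = memory * smoothed + (1 - memory) * instant
        history.append(smoothed)
    return sum(history) / len(history)
```

The memory term models the finding that stalls depress perceived quality beyond the stall itself, which pure per-frame averaging misses.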

144 citations


Journal ArticleDOI
TL;DR: A new family of I/VQA models, which this work calls the spatial efficient entropic differencing for quality assessment (SpEED-QA) model, relies on local spatial operations on image frames and frame differences to compute perceptually relevant image/video quality features in an efficient way.
Abstract: Many image and video quality assessment (I/VQA) models rely on data transformations of image/video frames, which increases their programming and computational complexity. By comparison, some of the most popular I/VQA models deploy simple spatial bandpass operations at a couple of scales, making them attractive for efficient implementation. Here we design reduced-reference image and video quality models of this type that are derived from the high-performance reduced reference entropic differencing (RRED) I/VQA models. A new family of I/VQA models, which we call the spatial efficient entropic differencing for quality assessment (SpEED-QA) model, relies on local spatial operations on image frames and frame differences to compute perceptually relevant image/video quality features in an efficient way. Software for SpEED-QA is available at: http://live.ece.utexas.edu/research/Quality/SpEED_Demo.zip.
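The flavor of entropic-differencing quality features can be shown with a simplified sketch (this is not the released SpEED-QA code; it approximates local entropy with a Gaussian model over block variances, and omits the bandpass and frame-difference stages):

```python
import numpy as np

def block_entropies(frame, block=8):
    """Per-block Gaussian differential entropy (bits), estimated from
    the sample variance of each non-overlapping block."""
    h = frame.shape[0] - frame.shape[0] % block
    w = frame.shape[1] - frame.shape[1] % block
    blocks = frame[:h, :w].reshape(h // block, block, w // block, block)
    var = blocks.transpose(0, 2, 1, 3).reshape(-1, block * block).var(axis=1)
    return 0.5 * np.log2(2 * np.pi * np.e * (var + 1e-8))

def entropic_difference(ref, dis, block=8):
    """Mean absolute local-entropy difference between reference and
    distorted frames: larger values indicate more perceptual damage."""
    return float(np.mean(np.abs(block_entropies(ref, block)
                                - block_entropies(dis, block))))
```

In RRED/SpEED-style models the same idea is applied to bandpassed frames and frame differences, yielding reduced-reference features rather than a full-reference score.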

140 citations


Journal ArticleDOI
TL;DR: Wang et al. built a large-scale JND-based coded video quality dataset consisting of 220 5-s sequences in four resolutions (1920 × 1080, 1280 × 720, 960 × 540, and 640 × 360).

118 citations


Journal ArticleDOI
TL;DR: An overview of selected issues pertaining to QoE and its recent applications in video transmission, with consideration of the compelling features of QoE (i.e., context and human factors).
Abstract: The increasing popularity of video (i.e., audio-visual) applications or services over both wired and wireless links has prompted recent growing interest in the investigation of quality of experience (QoE) in online video transmission. Conventional video quality metrics, such as peak-signal-to-noise-ratio and quality of service, only focus on the reception quality from the systematic perspective. As a result, they cannot represent the true visual experience of an individual user. Instead, QoE introduces a user experience-driven strategy which puts special emphasis on the contextual and human factors in addition to the transmission system. This advantage has raised the popularity and widespread usage of QoE in video transmission. In this paper, we present an overview of selected issues pertaining to QoE and its recent applications in video transmission, with consideration of the compelling features of QoE (i.e., context and human factors). The selected issues include QoE modeling with influence factors in the end-to-end chain of video transmission, QoE assessment (including subjective tests and objective QoE monitoring), and QoE management of video transmission over different types of networks. Through the literature review, we observe that the context and human factors in QoE-aware video transmission have attracted significant attention over the past two to three years. A vast number of high-quality works have been published in this area and are highlighted in this survey. In addition to a thorough summary of recent progress, we also present an outlook on future developments in QoE assessment and management in video transmission, especially focusing on the context and human factors that have not been addressed yet and the technical challenges that have not been completely solved so far. We believe that our overview and findings can provide a timely perspective on the related issues and the future research directions in QoE-oriented services over video communications.

118 citations


Proceedings ArticleDOI
07 Jun 2017
TL;DR: The author and colleagues used VMAF to measure the quality of a 4K dataset encoded with the RealMedia video CODEC at a range of bitrates and gathered subjective quality assessments from a group of viewers for the same dataset.
Abstract: Measuring video quality with standard metrics ensures that operators can deliver to consumers the desired quality of experience (QoE) at an optimal cost. Such metrics also allow CODEC engineers to optimize the performance of their encoding algorithms. This paper briefly surveys existing video quality metrics and then presents results of the new Video Multi-Method Assessment Fusion (VMAF) metric [1] proposed by Netflix. The author and colleagues used VMAF to measure the quality of a 4K dataset encoded with the RealMedia video CODEC at a range of bitrates. They also gathered subjective quality assessments from a group of viewers for the same dataset. The paper presents findings of correlation between subjective and objective results.
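Correlation between subjective scores and an objective metric such as VMAF is commonly reported as a Pearson (linear) coefficient; a self-contained sketch of that computation:

```python
def pearson(xs, ys):
    """Pearson linear correlation coefficient between two equal-length
    lists of scores (e.g., subjective MOS vs. objective metric values)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)
```

Quality-assessment studies often also report the Spearman rank-order coefficient, which is the same formula applied to the ranks of the scores.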

115 citations


Journal ArticleDOI
TL;DR: This work proposes a systematic solution for content delivery over UDNs by integrating collaboration with intelligence by designing a hybrid video coding scheme that is flexible and robust to the dynamic wireless environment.
Abstract: With the increasing popularity of traffic-intensive video applications, UDNs are treated as one of the most promising technologies for massive video delivery. However, due to the drastic interference between neighboring cells, how to achieve high energy and spectrum efficiency is still an open and challenging problem. This work proposes a systematic solution for content delivery over UDNs by integrating collaboration with intelligence. In particular, we first design a hybrid video coding scheme that is flexible and robust to the dynamic wireless environment. Then an active and proactive video update strategy is designed by intelligently alleviating the impact of the interference. Finally, a collaborative video scheduling scheme is developed to maximize the video quality as well as the energy and spectrum efficiency. Importantly, we summarize three fundamental design guidelines, and believe that they are useful for improving the transmission capacity of UDNs.

103 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: This paper reviews standard approaches toward 360 degree video encoding and compares these to a new, as yet unpublished, approach by Oculus which is referred to as the offset cubic projection, which can produce better or similar visual quality while using less than 50% pixels under reasonable assumptions about user behavior.
Abstract: 360 degree video is a new generation of video streaming technology that promises greater immersiveness than standard video streams. This level of immersiveness is similar to that produced by virtual reality devices -- users can control the field of view using head movements rather than needing to manipulate external devices. Although 360 degree video could revolutionize streaming technology, large scale adoption is hindered by a number of factors. 360 degree video streams have larger bandwidth requirements, require faster responsiveness to user inputs, and users may be more sensitive to lower quality streams. In this paper, we review standard approaches toward 360 degree video encoding and compare these to a new, as yet unpublished, approach by Oculus which we refer to as the offset cubic projection. Compared to the standard cubic encoding, the offset cube encodes a distorted version of the spherical surface, devoting more information (i.e., pixels) to the view in a chosen direction. We estimate that the offset cube representation can produce better or similar visual quality while using less than 50% of the pixels under reasonable assumptions about user behavior, resulting in 5.6% to 16.4% average savings in video bitrate. During 360 degree video streaming, Oculus uses a combination of quality level adaptation and view orientation adaptation. We estimate that this combination of streaming adaptation in two dimensions can cause over 57% extra segments to be downloaded compared to an ideal downloading strategy, wasting 20% of the total downloading bandwidth.

Journal ArticleDOI
TL;DR: This paper presents a novel and practically feasible system architecture named MVP (mobile edge virtualization with adaptive prefetching), which enables content providers to embed their content intelligence as a virtual network function into the mobile network operator's infrastructure edge in order to achieve quality of experience (QoE)-assured 4K video on demand (VoD) delivery across the global Internet.
Abstract: Internet video streaming applications have been demanding more bandwidth and higher video quality, especially with the advent of virtual reality and augmented reality applications. While adaptive streaming protocols like MPEG-DASH (dynamic adaptive streaming over HTTP) allow video quality to be flexibly adapted, e.g., degraded when mobile network conditions deteriorate, this is not an option if the application itself requires guaranteed 4K quality at all times. On the other hand, conventional end-to-end transmission control protocol (TCP) has been struggling to support 4K video delivery across long-distance Internet paths containing both fixed and mobile network segments with heterogeneous characteristics. In this paper, we present a novel and practically feasible system architecture named MVP (mobile edge virtualization with adaptive prefetching), which enables content providers to embed their content intelligence as a virtual network function into the mobile network operator's infrastructure edge. Based on this architecture, we present a context-aware adaptive video prefetching scheme in order to achieve quality of experience (QoE)-assured 4K video on demand (VoD) delivery across the global Internet. Through experiments based on a real LTE-A network infrastructure, we demonstrate that our proposed scheme is able to achieve QoE-assured 4K VoD streaming, especially when the video source is located remotely in the public Internet, in which case none of the state-of-the-art solutions is able to support such an objective at global Internet scale.

Journal ArticleDOI
TL;DR: The main components of a surveillance system are presented and studied thoroughly, and the most important deep learning algorithms are presented, along with the smart analytics that they utilize.

Journal ArticleDOI
TL;DR: A methodology for the classification of end users’ QoE when watching YouTube videos is presented, based only on statistical properties of encrypted network traffic, and YouTube’s adaptation algorithm is analysed, providing valuable insight into the logic behind the quality level selection strategy.
Abstract: Due to the widespread use of encryption in Over-The-Top video streaming traffic, network operators generally lack insight into application-level quality indicators (e.g., video quality levels, buffer underruns, stalling duration). They are thus faced with the challenge of finding solutions for monitoring service performance and estimating customer Quality of Experience (QoE) degradations based solely on passive monitoring solutions deployed within their network. We address this challenge by considering the concrete case of YouTube, whereby we present a methodology for the classification of end users' QoE when watching YouTube videos, based only on statistical properties of encrypted network traffic. We have developed a system called YouQ which includes tools for monitoring and analysis of application-level quality indicators and corresponding traffic traces. Collected data is then used for the development of machine learning models for QoE classification based on computed traffic features per video session. To test the YouQ system and methodology, we collected a dataset corresponding to 1060 different YouTube videos streamed across 39 different bandwidth scenarios, and tested various classification models. Classification accuracy was found to be up to 84% when using three QoE classes ("low", "medium" or "high") and up to 91% when using binary classification (classes "low" and "high"). To improve the models in the future, we discuss why and when prediction errors occur. Moreover, we have analysed YouTube's adaptation algorithm, thus providing valuable insight into the logic behind the quality level selection strategy, which may also be of interest in improving future QoE estimation algorithms.
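The kind of per-session traffic features such a classifier might consume can be sketched as follows (the feature names are hypothetical; the actual YouQ feature set is not reproduced here):

```python
def session_features(chunk_bytes, chunk_times_s):
    """Summary statistics of an encrypted video session's download
    pattern, usable as machine-learning features for QoE classification."""
    n = len(chunk_bytes)
    total = sum(chunk_bytes)
    duration = chunk_times_s[-1] - chunk_times_s[0]
    mean = total / n
    var = sum((b - mean) ** 2 for b in chunk_bytes) / n
    return {
        "mean_throughput_bps": 8 * total / duration if duration > 0 else 0.0,
        "mean_chunk_bytes": mean,
        "chunk_bytes_std": var ** 0.5,
        "chunk_count": n,
    }
```

Feature dictionaries like this would then be fed to a standard classifier (e.g., a random forest) trained against ground-truth QoE labels from instrumented playback sessions.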

Proceedings ArticleDOI
01 May 2017
TL;DR: The scalable video quality model part of the P.1203 Recommendation series, developed in a competition within ITU-T Study Group 12 previously referred to as P.NATS, provides integral quality predictions for media sessions of 1 to 5 min in length for HTTP Adaptive Streaming with up to HD video resolution.
Abstract: The paper presents the scalable video quality model part of the P.1203 Recommendation series, developed in a competition within ITU-T Study Group 12 previously referred to as P.NATS. It provides integral quality predictions for media sessions of 1 to 5 min in length for HTTP Adaptive Streaming (HAS) with up to HD video resolution. The model is available in four modes of operation for different levels of media-related bitstream information, reflecting different types of encryption of the media stream. The video quality model presented in this paper delivers short-term video quality estimates that serve as input to the integration component of the P.1203 model. The scalable approach consists of using the same components for spatial and temporal scaling degradations across all modes. The third component of the model addresses video coding artifacts. To this aim, a single model parameter is introduced that can be derived from different types of bitstream input information. Depending on the complexity of the available input, one of four scaling-levels of the model is applied. The paper presents the different novelties of the model and scientific choices made during its development, the test design, and an analysis of the model performance across the different modes.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed algorithm can save 44.92% encoding time on average with negligible loss of video quality.
Abstract: Screen content coding (SCC) has evolved into the extension of the High Efficiency Video Coding (HEVC) standard. Low-latency, real-time transport between devices in the form of screen content video is becoming popular in many applications. However, the complexity of the encoder is still very high for intra prediction in HEVC-based SCC. This paper proposes a fast intra prediction method based on content property analysis for HEVC-based SCC. First, coding units (CUs) are classified into natural content CUs (NCCUs) and screen content CUs (SCCUs), based on the statistical characteristics of the content. For NCCUs, the newly adopted prediction modes, including intra block copy mode and palette mode, are skipped if the DC or Planar mode is the best mode after testing the traditional intra prediction rough modes. In addition, the quadtree partition process is also terminated, because homogeneous and smooth blocks usually choose a large CU size. For SCCUs, a rank-based decision strategy is introduced to terminate the splitting process of the current CU. For all CUs, the bits per pixel of the current CU are used to make a CU size decision. Meanwhile, the depth information of neighboring CUs and the co-located CU is utilized to further improve the performance. Experimental results show that the proposed algorithm can save 44.92% encoding time on average with negligible loss of video quality.
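The NCCU/SCCU split rests on content statistics; as a hedged illustration (the criterion and threshold below are invented, not the paper's), screen content blocks tend to contain very few distinct sample values compared with camera-captured content:

```python
def classify_cu(pixels, max_screen_colors=8):
    """Label a coding unit as screen content (SCCU) or natural content
    (NCCU) from a simple color-count statistic: computer-generated
    blocks (text, UI graphics) typically use few distinct values,
    while natural blocks show near-continuous tonal variation."""
    return "SCCU" if len(set(pixels)) <= max_screen_colors else "NCCU"
```

An encoder could use such a cheap classification to decide which prediction modes (e.g., intra block copy, palette) are worth evaluating for a given block.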

Journal ArticleDOI
TL;DR: This work investigates the temporal masking effect from the perspective of perceived utility, which allows it to preserve the quality of the high utility content and substitute the low utility regions with the corresponding smooth version.
Abstract: In this work, we propose a utility-driven preprocessing technique for high-efficiency screen content video (SCV) compression based on the temporal masking effect, which was found to be a fundamental attribute that plays an important role in human visual perception of video quality, but has not been fully exploited in the context of SCV coding. Specifically, we investigate the temporal masking effect from the perspective of perceived utility, which allows us to preserve the quality of the high utility content and substitute the low utility regions with the corresponding smooth version. To distinguish the regional utilities, a specifically designed block type identification algorithm for screen content is employed to measure the local properties. Subsequently, the Gaussian filter is applied to smooth out the high-frequency components in the detected low utility regions to save consumption bits. Validation based on subjective testing shows that the proposed approach is capable of achieving significant bitrate savings with little sacrifice in the final utility compared with the conventional SCV coding scheme.

Journal ArticleDOI
TL;DR: This paper presents a comprehensive survey of the most significant research activities in the area of client-side HTTP-based adaptive video streaming, decomposing the ABR module into three subcomponents, namely: resource estimation function, chunk request scheduling, and adaptation module.
Abstract: HTTP adaptive streaming (HAS) is the most recent attempt regarding video quality adaptation. It enables cheap and easy-to-implement streaming technology without the need for a dedicated infrastructure. By using a combination of TCP and HTTP, it has the advantage of reusing all the existing technologies designed for the ordinary web. Equally important is that HAS traffic passes through firewalls and works well when NAT is deployed. The rate adaptation controller of HAS, commonly called adaptive bitrate selection (ABR), is currently receiving a lot of attention from both industry and academia. However, most of the research efforts concentrate on a specific aspect or a particular methodology without considering the overall context. This paper presents a comprehensive survey of the most significant research activities in the area of client-side HTTP-based adaptive video streaming. It starts by decomposing the ABR module into three subcomponents, namely: resource estimation function, chunk request scheduling, and adaptation module. Each subcomponent encapsulates a particular function that is vital to the operation of an ABR scheme. A review of each of the subcomponents and how they interact with each other is presented. Furthermore, those external factors that are known to have a direct impact on the performance of an ABR module, such as content nature, CDN, and context, are discussed. In conclusion, this paper provides an extensive reference for further research in the field.

Proceedings ArticleDOI
18 Mar 2017
TL;DR: This work proposes using layered encoding for 360-degree video to improve QoE by reducing the probability of video freezes and the latency of response to the user head movements, which reduces the storage requirements significantly and improves in-network cache performance.
Abstract: Virtual reality and 360-degree video streaming are growing rapidly; however, streaming 360-degree video is very challenging due to high bandwidth requirements. To address this problem, the video quality is adjusted according to the user viewport prediction. High quality video is only streamed for the user viewport, reducing the overall bandwidth consumption. Existing solutions use shallow buffers limited by the accuracy of viewport prediction. Therefore, playback is prone to video freezes which are very destructive for the Quality of Experience (QoE). We propose using layered encoding for 360-degree video to improve QoE by reducing the probability of video freezes and the latency of response to the user head movements. Moreover, this scheme reduces the storage requirements significantly and improves in-network cache performance.

Proceedings ArticleDOI
19 Oct 2017
TL;DR: A generic theoretical model is proposed to find out the optimal set of quality-variable video versions based on traces of head positions of users watching a 360-degree video, and a simplified version of the model with two quality levels and restricted shapes for the QER is solved.
Abstract: With the decreasing price of Head-Mounted Displays (HMDs), 360-degree videos are becoming popular. Streaming such videos through the Internet with state-of-the-art streaming architectures requires, to provide a high feeling of immersion, much more bandwidth than the median user's access bandwidth. To decrease bandwidth consumption while providing high immersion to users, scientists and specialists proposed to prepare and encode 360-degree videos into quality-variable video versions and to implement viewport-adaptive streaming. Quality-variable versions are different versions of the same video with non-uniformly spread quality: there exist some so-called Quality Emphasized Regions (QERs). With viewport-adaptive streaming the client, based on head movement prediction, downloads the video version with the high quality region closest to where the user will watch. In this paper we propose a generic theoretical model to find the optimal set of quality-variable video versions based on traces of head positions of users watching a 360-degree video. We propose extensions to adapt the model to popular quality-variable version implementations such as tiling and offset projection. We then solve a simplified version of the model with two quality levels and restricted shapes for the QER. With this simplified model, we show that an optimal set of four quality-variable video versions prepared by a streaming server, together with perfect head movement prediction, allows for 45% bandwidth savings to display video with the same average quality as state-of-the-art solutions, or allows an increase of 102% in displayed quality for the same bandwidth budget.

Journal ArticleDOI
TL;DR: This paper develops a priority-based channel allocation scheme to assign channels to the mobile stations based on their QoE requirements, and proposes a handoff management technique to overcome the interruptions caused by the handoff.
Abstract: Cognitive radio (CR) is among the promising solutions for overcoming the spectrum scarcity problem in the forthcoming fifth-generation (5G) cellular networks, where mobile stations are expected to support multimode operations to maintain connectivity to various radio access points. However, particularly for multimedia services, because of the time-varying channel capacity, the random arrivals of legacy users, and the non-negligible delay caused by spectrum handoff, it is challenging to achieve seamless streaming with minimum quality of experience (QoE) degradation. The objective of this paper is to manage spectrum handoff delays by allocating channels based on the user QoE expectations, minimizing the latency, providing seamless multimedia service, and improving QoE. First, to minimize the handoff delays, we use channel usage statistical information to compute the channel quality. Based on this, the cognitive base station maintains a ranking index of the available channels to facilitate the cognitive mobile stations. Second, to enhance channel utilization, we develop a priority-based channel allocation scheme to assign channels to the mobile stations based on their QoE requirements. Third, to minimize handoff delays, we employ the hidden Markov model (HMM) to predict the state of the future time slot. However, due to sensing errors, the scheme performs spectrum sensing proactively and executes handoffs reactively. Fourth, we propose a handoff management technique to overcome the interruptions caused by the handoff. Specifically, when a handoff is predicted, we use scalable video coding to extract the base layer and transmit it during a certain interval before the handoff occurs, so that it can be shown during handoff delays, hence providing seamless service. Our simulation results highlight the performance gain of the proposed framework in terms of channel utilization and received video quality.

Journal ArticleDOI
TL;DR: The proposed ELBA scheme effectively leverages wireless channel diversity and video frame priority to enable energy-minimized, quality-guaranteed streaming to multihomed devices within the imposed deadline, and demonstrates performance advantages over existing bandwidth aggregation schemes in energy conservation, video quality, and end-to-end delay.
Abstract: The spectrum limitation of single wireless networks prompts the bandwidth aggregation of heterogeneous access media (e.g., LTE and Wi-Fi) to support high-quality real-time video services. Energy consumption of mobile devices is of vital significance to provide user-satisfied multimedia streaming applications. However, it is challenging to develop an energy-efficient bandwidth aggregation scheme with regard to the stringent delay and quality constraints imposed by wireless video transmission. To address this critical problem, this paper presents an Energy-quaLity aware Bandwidth Aggregation (ELBA) scheme. First, we develop an analytical framework to model the delay-constrained energy-quality tradeoff for multipath video transmission over heterogeneous wireless networks. Second, we propose a bandwidth aggregation framework that integrates energy-minimized rate adaptation, delay-constrained unequal protection, and quality-aware packet distribution. The proposed ELBA scheme effectively leverages wireless channel diversity and video frame priority to enable energy-minimized, quality-guaranteed streaming to multihomed devices within the imposed deadline. We conduct the performance evaluation through both experiments over real wireless networks and extensive emulations on the Exata platform. Experimental results demonstrate the performance advantages of ELBA over existing bandwidth aggregation schemes in energy conservation, video quality, and end-to-end delay.

Journal ArticleDOI
TL;DR: A novel automated and computationally efficient video assessment method that enables accurate real-time (online) analysis of delivered quality in an adaptable and scalable manner; it is flexible, dynamically adaptable to new content, and scalable with the number of videos.
Abstract: Video content providers put stringent requirements on the quality assessment methods realized on their services. They need to be accurate, real-time, adaptable to new content, and scalable as the video set grows. In this letter, we introduce a novel automated and computationally efficient video assessment method. It enables accurate real-time (online) analysis of delivered quality in an adaptable and scalable manner. Offline deep unsupervised learning processes are employed at the server side and inexpensive no-reference measurements at the client side. This provides both real-time assessment and performance comparable to the full-reference counterpart, while maintaining its no-reference characteristics. We tested our approach on the LIMP Video Quality Database (an extensive packet-loss-impaired video set), obtaining a correlation between 78% and 91% with the FR benchmark (the Video Quality Metric, VQM). Due to its unsupervised learning essence, our method is flexible and dynamically adaptable to new content and scalable with the number of videos.
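The reported 78%-91% agreement is a correlation between the no-reference predictions and the full-reference benchmark scores. A minimal Pearson correlation check (the score values below are made up for illustration):

```python
def pearson(xs, ys):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical no-reference predictions vs. full-reference scores.
nr_scores = [3.1, 2.4, 4.0, 1.5, 3.6]
fr_scores = [3.0, 2.6, 4.2, 1.4, 3.5]
print(round(pearson(nr_scores, fr_scores), 3))
```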

Journal ArticleDOI
TL;DR: The proposed approximation has nearly the same arithmetic complexity and hardware requirement as those of recently proposed related methods, but involves significantly less error energy and offers better peak signal-to-noise ratio than the others when DCTs of length more than 8 are used.
Abstract: An approximate kernel for the discrete cosine transform (DCT) of length 4 is derived from the 4-point DCT defined by the High Efficiency Video Coding (HEVC) standard and used for the computation of DCT and inverse DCT (IDCT) of power-of-two lengths. There are two reasons for considering the DCT of length 4 as the basic module. First, it allows computation of DCTs of lengths 4, 8, 16, and 32 prescribed by the HEVC. Second, the DCTs generated by the 4-point DCT not only involve lower complexity, but also offer better compression performance. Fully parallel and area-constrained architectures for the proposed approximate DCT are proposed to have flexible tradeoff between the area and time complexities. In addition, a reconfigurable architecture is proposed where an 8-point DCT can be used in place of a pair of 4-point DCTs. Using the same reconfiguration scheme, a 32-point DCT could be configured for parallel computation of two 16-point DCTs or four 8-point DCTs or eight 4-point DCTs. The proposed reconfigurable design can support real-time coding for high-definition video sequences in the 8k ultrahigh-definition television format ( $7680\times 4320$ at 30 frames/s). A unified forward and inverse transform architecture is also proposed where the hardware complexity is reduced by sharing hardware between the DCT and IDCT computations. The proposed approximation has nearly the same arithmetic complexity and hardware requirement as those of recently proposed related methods, but involves significantly less error energy and offers better peak signal-to-noise ratio than the others when DCTs of length more than 8 are used. A detailed comparison of the complexity, energy efficiency, and compression performance of different DCT approximation schemes for video coding is also presented. It is shown that the proposed approximation provides a better compressed-image quality than other approximate DCTs. 
The proposed method can perform HEVC-compliant video coding with marginal degradation of video quality and a slight increase in the bit rate, at a fraction of the computational complexity of exact HEVC coding.
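For reference, the exact HEVC 4-point integer DCT kernel is shown below next to a hypothetical power-of-two approximation (the 83/36 multipliers replaced by shift-friendly 64/32). The approximate kernel is an illustrative stand-in, not the one derived in the paper:

```python
# Exact 4-point integer DCT kernel defined by the HEVC standard.
C4 = [[64,  64,  64,  64],
      [83,  36, -36, -83],
      [64, -64, -64,  64],
      [36, -83,  83, -36]]

# Hypothetical multiplier-free approximation: every entry is a power
# of two, so each product reduces to an arithmetic shift.
C4_approx = [[64,  64,  64,  64],
             [64,  32, -32, -64],
             [64, -64, -64,  64],
             [32, -64,  64, -32]]

def transform(kernel, x):
    """Apply a 4-point transform kernel to a length-4 sample vector."""
    return [sum(k * v for k, v in zip(row, x)) for row in kernel]

x = [10, 12, 11, 9]
exact = transform(C4, x)
approx = transform(C4_approx, x)
print(exact, approx)
```

The DC coefficient (first output) is identical in both cases; the approximation perturbs only the odd-frequency terms.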

Proceedings ArticleDOI
28 Nov 2017
TL;DR: POI360 is designed as a portable interactive 360° video telephony system that jointly investigates panoramic video compression and responsive video stream rate control; its adaptive compression scheme dynamically adjusts the compression strategy to stabilize the video quality within the ROI under varying user input and network conditions.
Abstract: Panoramic or 360° video streaming has been supported by a wide range of content providers and mobile devices. Yet existing work primarily focused on streaming on-demand 360° videos stored on servers. In this paper, we examine a more challenging problem: Can we stream real-time interactive 360° videos across existing LTE cellular networks, so as to trigger new applications such as ubiquitous 360° video chat and panoramic outdoor experience sharing? To explore the feasibility and challenges underlying this vision, we design POI360, a portable interactive 360° video telephony system that jointly investigates both panoramic video compression and responsive video stream rate control. Since legacy spatial compression algorithms for 360° video suffer from severe quality fluctuations as the user changes her region-of-interest (ROI), we design an adaptive compression scheme that dynamically adjusts the compression strategy to stabilize the video quality within the ROI under varying user input and network conditions. In addition, to meet the responsiveness requirement of panoramic video telephony, we leverage the diagnostic statistics on commodity phones to promptly detect cellular link congestion, hence significantly boosting the rate control responsiveness. Extensive field tests of our real-time POI360 prototype validate its effectiveness in enabling panoramic video telephony over highly dynamic cellular networks.
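The core idea of ROI-aware compression can be sketched as a per-tile QP map that spends bits where the viewer looks. The fixed offsets below are illustrative; POI360 adapts them dynamically to user input and network conditions:

```python
def roi_qp_map(tile_in_roi, base_qp, roi_offset=-6, bg_offset=6):
    """Assign a lower QP (higher quality) to tiles inside the
    region-of-interest and a higher QP (coarser quantization) to
    background tiles. Offsets are illustrative placeholders."""
    return {tile: base_qp + (roi_offset if in_roi else bg_offset)
            for tile, in_roi in tile_in_roi.items()}

# Hypothetical 3-tile frame where only tile 't0' is in the ROI.
qps = roi_qp_map({'t0': True, 't1': False, 't2': False}, base_qp=30)
print(qps)
```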

Journal ArticleDOI
TL;DR: This work proposes a first-of-a-kind continuous QoE prediction engine based on a nonlinear autoregressive model with exogenous outputs that is driven by an objective measure of perceptual video quality, rebuffering-aware information, and a QoE memory descriptor that accounts for recency.
Abstract: Streaming video data accounts for a large portion of mobile network traffic. Given the throughput and buffer limitations that currently affect mobile streaming, compression artifacts and rebuffering events commonly occur. Being able to predict the effects of these impairments on perceived video quality of experience (QoE) could lead to improved resource allocation strategies enabling the delivery of higher quality video. Toward this goal, we propose a first-of-a-kind continuous QoE prediction engine. Prediction is based on a nonlinear autoregressive model with exogenous outputs. Our QoE prediction model is driven by three QoE-aware inputs: an objective measure of perceptual video quality, rebuffering-aware information, and a QoE memory descriptor that accounts for recency. We evaluate our method on a recent QoE dataset containing continuous-time subjective scores.
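A linear stand-in for the autoregressive idea: each QoE sample is predicted from past QoE samples plus the three exogenous QoE-aware signals. The paper uses a nonlinear model with trained weights; every coefficient below is an illustrative placeholder:

```python
def narx_step(qoe_hist, quality, rebuffer, memory,
              a=(0.6, 0.2), b=(0.15, -0.5, -0.1)):
    """One prediction step: y[t] = a1*y[t-1] + a2*y[t-2]
    + b1*quality + b2*rebuffer + b3*memory. Coefficients are
    made-up placeholders, not fitted model parameters."""
    return (a[0] * qoe_hist[-1] + a[1] * qoe_hist[-2]
            + b[0] * quality + b[1] * rebuffer + b[2] * memory)

hist = [70.0, 72.0]                     # last two QoE samples
# A rebuffering event (rebuffer=1) pulls the predicted QoE down.
y_ok = narx_step(hist, quality=80, rebuffer=0, memory=5)
y_stall = narx_step(hist, quality=80, rebuffer=1, memory=5)
print(y_ok, y_stall)
```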

Posted Content
TL;DR: An effort to build a large-scale JND-based coded video quality dataset called the VideoSet, which is an acronym for “Video Subject Evaluation Test (SET)”, and a description of the properties of the collected JND data.
Abstract: A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., $1920 \times 1080$, $1280 \times 720$, $960 \times 540$ and $640 \times 360$). For each of the 880 video clips, we encode it using the H.264 codec with $QP=1, \cdots, 51$ and measure the first three JND points with 30+ subjects. The dataset is called the "VideoSet", which is an acronym for "Video Subject Evaluation Test (SET)". This work describes the subjective test procedure, detection and removal of outlying measured data, and the properties of collected JND data. Finally, the significance and implications of the VideoSet to future video coding research and standardization efforts are pointed out. All source/coded video clips as well as measured JND data included in the VideoSet are available to the public in the IEEE DataPort.
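A simplified reading of a JND point: the smallest QP at which a sufficient fraction of subjects notices a difference against the anchor. The actual VideoSet protocol is more elaborate (binary search per subject, outlier removal); the fractions below are synthetic:

```python
# Hypothetical per-QP fractions of subjects who saw a difference
# against the QP=1 anchor (real data comes from 30+ subjects).
noticed = {qp: 0.0 for qp in range(1, 52)}
for qp in range(30, 52):
    noticed[qp] = min(1.0, 0.1 * (qp - 29))

def first_jnd(noticed, threshold=0.75):
    """Smallest QP whose 'noticed' fraction reaches the threshold."""
    for qp in sorted(noticed):
        if noticed[qp] >= threshold:
            return qp
    return None

print(first_jnd(noticed))
```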

Proceedings ArticleDOI
23 Oct 2017
TL;DR: This paper performs rigorous analysis of 1300 VR head traces and proposes a multicast DASH-based tiled streaming solution, including a new tile weighting approach and a rate adaptation algorithm, to be utilized in mobile networks that support multicast such as LTE.
Abstract: Streaming virtual reality (VR) content is becoming increasingly popular. Advances in VR technologies now allow providing users with an immersive experience by live streaming popular events, such as the Super Bowl, in the form of 360-degree videos. Such services are highly interactive and impose substantial load on the network, especially cellular networks with inconsistent link capacities. In this paper, we perform rigorous analysis of 1300 VR head traces and propose a multicast DASH-based tiled streaming solution, including a new tile weighting approach and a rate adaptation algorithm, to be utilized in mobile networks that support multicast, such as LTE. Our proposed solution weighs video tiles based on users' viewports, divides users into subgroups based on their channel conditions and tile weights, and determines the bitrate for each tile in each subgroup. Tiles in the viewports of users are assigned the highest bitrate, while other tiles are assigned bitrates proportional to the probability of users changing their viewports to include those tiles. We compare the proposed solution against the closest ones in the literature using simulated LTE networks and show that it substantially outperforms them. For example, it assigns up to 46% higher video bitrates to video tiles in the users' viewports than current approaches, which substantially improves the video quality experienced by the users, without increasing the total load imposed on the network.
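The tile bitrate rule described above can be sketched as follows: viewport tiles get the top bitrate, and the leftover budget is split among the remaining tiles in proportion to their weights. Tile names, weights, and budget are hypothetical:

```python
def allocate_tile_bitrates(weights, budget, max_rate):
    """Assign the highest bitrate to viewport tiles (weight 1.0) and
    split the remaining budget among the other tiles in proportion
    to the probability that a viewport moves onto them."""
    rates = {}
    viewport = [t for t, w in weights.items() if w >= 1.0]
    rest = {t: w for t, w in weights.items() if w < 1.0}
    for t in viewport:
        rates[t] = max_rate
    remaining = budget - max_rate * len(viewport)
    total_w = sum(rest.values())
    for t, w in rest.items():
        rates[t] = remaining * w / total_w
    return rates

# Hypothetical 4-tile example (rates in Mbps): tile A is in the viewport.
r = allocate_tile_bitrates({'A': 1.0, 'B': 0.5, 'C': 0.3, 'D': 0.2},
                           budget=10.0, max_rate=4.0)
print(r)
```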

Proceedings ArticleDOI
04 Jul 2017
TL;DR: It is found that most of the objective quality measures are well correlated with subjective quality, and among the evaluated quality measures, PSNR is shown to be the most appropriate for 360 video communications.
Abstract: 360 videos are becoming increasingly popular on video streaming platforms. However, a good quality metric for 360 videos is still an open issue. In this work, we investigate both objective and subjective quality metrics for 360 videos. The goals are to understand the perceived quality range provided by existing mobile 360 videos and, especially, to identify appropriate objective quality metrics for 360 video communications. To that end, a subjective test is conducted in this study. Then, the relationship between objective quality and subjective quality is investigated. Specifically, ten objective quality measures are computed, covering coding distortion, cross-format distortion, and end-to-end distortion measurement. It is found that most of the objective quality measures are well correlated with subjective quality. Also, among the evaluated quality measures, PSNR is shown to be the most appropriate for 360 video communications.
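PSNR, which the study finds most appropriate for 360 video communications, reduces to a few lines. This is the standard definition, shown on a toy flat pixel block rather than a real frame:

```python
import math

def psnr(ref, dist, peak=255.0):
    """Peak signal-to-noise ratio (dB) between two equal-size frames,
    given here as flat lists of 8-bit pixel values."""
    mse = sum((r - d) ** 2 for r, d in zip(ref, dist)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

ref = [100] * 16           # a flat 4x4 reference block
dist = [110] + [100] * 15  # one pixel off by 10 -> MSE = 100/16 = 6.25
print(round(psnr(ref, dist), 2))
```

Note that for 360 video, variants such as spherically weighted PSNR are often computed to account for projection distortion.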

Posted Content
TL;DR: This work proposes Video Assessment of TemporaL Artifacts and Stalls (Video ATLAS): a machine learning framework where a number of QoE-related features are combined, including objective quality features, rebuffering-aware features and memory-driven features to makeQoE predictions.
Abstract: Mobile streaming video data accounts for a large and increasing percentage of wireless network traffic. The available bandwidths of modern wireless networks are often unstable, leading to difficulties in delivering smooth, high-quality video. Streaming service providers such as Netflix and YouTube attempt to adapt their systems to adjust in response to these bandwidth limitations by changing the video bitrate or, failing that, allowing playback interruptions (rebuffering). Being able to predict end user' quality of experience (QoE) resulting from these adjustments could lead to perceptually-driven network resource allocation strategies that would deliver streaming content of higher quality to clients, while being cost effective for providers. Existing objective QoE models only consider the effects on user QoE of video quality changes or playback interruptions. For streaming applications, adaptive network strategies may involve a combination of dynamic bitrate allocation along with playback interruptions when the available bandwidth reaches a very low value. Towards effectively predicting user QoE, we propose Video Assessment of TemporaL Artifacts and Stalls (Video ATLAS): a machine learning framework where we combine a number of QoE-related features, including objective quality features, rebuffering-aware features and memory-driven features to make QoE predictions. We evaluated our learning-based QoE prediction model on the recently designed LIVE-Netflix Video QoE Database which consists of practical playout patterns, where the videos are afflicted by both quality changes and rebuffering events, and found that it provides improved performance over state-of-the-art video quality metrics while generalizing well on different datasets. The proposed algorithm is made publicly available at this http URL release_v2.rar.