
Showing papers presented at "ACM SIGMM Conference on Multimedia Systems in 2020"


Proceedings ArticleDOI
27 May 2020
TL;DR: The proposed dataset is the first to provide complementary 4K sequences up to 120 fps and is therefore particularly valuable for cutting-edge multimedia applications and should be included in subjective and objective quality assessments of next-generation VVC codecs.
Abstract: This paper provides an overview of our open Ultra Video Group (UVG) dataset that is composed of 16 versatile 4K (3840×2160) test video sequences. These natural sequences were captured either at 50 or 120 frames per second (fps) and stored online in raw 8-bit and 10-bit 4:2:0 YUV formats. The dataset is published on our website (ultravideo.cs.tut.fi) under a non-commercial Creative Commons BY-NC license. In this paper, all UVG sequences are described in detail and characterized by their spatial and temporal perceptual information, rate-distortion behavior, and coding complexity with the latest HEVC/H.265 and VVC/H.266 reference video codecs. The proposed dataset is the first to provide complementary 4K sequences up to 120 fps and is therefore particularly valuable for cutting-edge multimedia applications. Our evaluations also show that it comprehensively complements the existing 4K test set in VVC standardization, so we recommend including it in subjective and objective quality assessments of next-generation VVC codecs.
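For illustration, here is a minimal sketch of how the spatial and temporal perceptual information mentioned above is commonly computed (in the spirit of ITU-T P.910) for raw YUV sequences; the planar 8-bit 4:2:0 layout, the default resolution and the frame cap are our assumptions for this sketch, not taken from the dataset description.

```python
# Hedged sketch: SI/TI-style characterization of a raw 8-bit 4:2:0 planar YUV file.
# File layout, resolution defaults and frame cap are illustrative assumptions only.
import numpy as np
import cv2

def si_ti(yuv_path, width=3840, height=2160, max_frames=50):
    y_size = width * height
    uv_size = (width // 2) * (height // 2)
    si_vals, ti_vals, prev = [], [], None
    with open(yuv_path, "rb") as f:
        for _ in range(max_frames):
            raw = f.read(y_size)
            if len(raw) < y_size:
                break
            f.seek(2 * uv_size, 1)              # skip the U and V planes
            y = np.frombuffer(raw, np.uint8).reshape(height, width).astype(np.float64)
            gx = cv2.Sobel(y, cv2.CV_64F, 1, 0)
            gy = cv2.Sobel(y, cv2.CV_64F, 0, 1)
            si_vals.append(np.hypot(gx, gy).std())      # per-frame spatial information
            if prev is not None:
                ti_vals.append((y - prev).std())        # per-frame temporal information
            prev = y
    if not si_vals:
        return 0.0, 0.0
    return max(si_vals), (max(ti_vals) if ti_vals else 0.0)
```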

231 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: This paper presents a 5G trace dataset collected from a major Irish mobile operator, composed of client-side cellular key performance indicators (KPIs) covering channel-related metrics, context-related metrics, cell-related metrics and throughput information; it is the first publicly available dataset that contains throughput, channel and context information for 5G networks.
Abstract: In this paper, we present a 5G trace dataset collected from a major Irish mobile operator. The dataset is generated from two mobility patterns (static and car), and across two application patterns (video streaming and file download). The dataset is composed of client-side cellular key performance indicators (KPIs) comprised of channel-related metrics, context-related metrics, cell-related metrics and throughput information. These metrics are generated from a well-known non-rooted Android network monitoring application, G-NetTrack Pro. To the best of our knowledge, this is the first publicly available dataset that contains throughput, channel and context information for 5G networks. To supplement our real-time 5G production network dataset, we also provide a 5G large scale multi-cell ns-3 simulation framework. The availability of the 5G/mmwave module for the ns-3 mmwave network simulator provides an opportunity to improve our understanding of the dynamic reasoning for adaptive clients in 5G multi-cell wireless scenarios. The purpose of our framework is to provide additional information (such as competing metrics for users connected to the same cell), thus providing otherwise unavailable information about the base station (eNodeB or eNB) environment and scheduling principle to the end user. Our framework permits other researchers to investigate this interaction through the generation of their own synthetic datasets.
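As a hypothetical usage sketch, client-side KPIs from a G-NetTrack-style export could be aligned on a one-second grid as below; the column names ("Timestamp", "DL_bitrate", "RSRP", "RSRQ", "SNR") are our assumptions, not the dataset's documented schema.

```python
# Hedged sketch: align throughput and channel KPIs from a G-NetTrack-style CSV export.
# Column names are illustrative assumptions, not the published schema.
import pandas as pd

def load_kpis(csv_path):
    df = pd.read_csv(csv_path)
    df["Timestamp"] = pd.to_datetime(df["Timestamp"])
    cols = ["DL_bitrate", "RSRP", "RSRQ", "SNR"]     # throughput plus channel metrics
    return (df.set_index("Timestamp")[cols]
              .resample("1s")                        # one-second bins
              .mean())
```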

95 citations


Proceedings ArticleDOI
Liyang Sun, Yixiang Mao, Tongyu Zong, Yong Liu, Yao Wang
27 May 2020
TL;DR: The idea of "flocking" is used to improve both field-of-view (FoV) prediction and caching on edge servers for live 360-degree video streaming, together with a collaborative FoV prediction scheme in which the actual FoV information of users in the front of the flock is used to predict the FoV of users behind them.
Abstract: Streaming of live 360-degree video allows users to follow a live event from any view point and has already been deployed on some commercial platforms. However, the current systems can only stream the video at relatively low quality because the entire 360-degree video is delivered to the users under limited bandwidth. In this paper, we propose to use the idea of "flocking" to improve the performance of both prediction of field of view (FoV) and caching on the edge servers for live 360-degree video streaming. By assigning variable playback latencies to all the users in a streaming session, a "streaming flock" is formed and led by low latency users in the front of the flock. We propose a collaborative FoV prediction scheme where the actual FoV information of users in the front of the flock is utilized to predict the FoV of users behind them. We further propose a network condition aware flocking strategy to reduce video freezes and increase the chance of collaborative FoV prediction for all users. Flocking also facilitates caching as video tiles downloaded by the front users can be cached by an edge server to serve the users at the back of the flock, thereby reducing the traffic in the core network. We propose a latency-FoV based caching strategy and investigate the potential gain of applying transcoding on the edge server. We conduct experiments using real-world user FoV traces and WiGig network bandwidth traces to evaluate the gains of the proposed strategies over benchmarks. Our experimental results demonstrate that the proposed streaming system can roughly double the effective video rate, which is the video rate inside a user's actual FoV, compared to the prediction only based on the user's own past FoV trajectory, while reducing video freezes. Furthermore, edge caching can reduce the traffic in the core network by about 80%, which can be increased to 90% with transcoding on the edge server.
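The collaborative prediction idea can be sketched as blending the actual viewing directions of front (low-latency) users with a back user's own last direction; the blending rule and the weight below are illustrative assumptions, not the paper's trained predictor.

```python
# Hedged sketch of flock-assisted FoV prediction: blend front users' actual viewing
# directions with the user's own last direction. Weight alpha is an assumption.
import numpy as np

def predict_view_direction(own_last_dir, front_dirs, alpha=0.7):
    """own_last_dir: unit 3-vector; front_dirs: list of unit 3-vectors at the target time."""
    if not front_dirs:
        return np.asarray(own_last_dir)          # no flock information available yet
    crowd = np.mean(front_dirs, axis=0)          # consensus of the front users
    blended = alpha * crowd + (1 - alpha) * np.asarray(own_last_dir)
    return blended / np.linalg.norm(blended)
```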

40 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: This paper proposes a latency compensation technique using game adaptation that mitigates the influence of delay on QoE and shows that the majority of the proposed adaptation techniques lead to significant improvements in cloud gaming QoE.
Abstract: Cloud Gaming (CG) is an immersive multimedia service that promises many benefits. In CG, the games are rendered in a cloud server, and the resulting scenes are streamed as a video sequence to the client. Using CG, users are not forced to update their gaming hardware frequently, and available games can be played on any operating system or suitable device. However, cloud gaming requires a reliable and low-latency network, which makes it a very challenging service. Transmission latency strongly affects the playability of a cloud game and consequently reduces the users' Quality of Experience (QoE). In this paper, we propose a latency compensation technique using game adaptation that mitigates the influence of delay on QoE. This technique uses five game characteristics for the adaptation. These characteristics, in addition to an Aim-assistance technique, were implemented in four games for evaluation. A subjective study using 194 participants was conducted using a crowdsourcing approach. The results showed that the majority of the proposed adaptation techniques led to significant improvements in cloud gaming QoE.

31 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: PMData is a dataset that combines traditional lifelogging data with sports-activity data, enabling the development of novel data analysis and machine-learning applications where, for instance, additional sports data is used to predict and analyze everyday developments, like a person's weight and sleep patterns.
Abstract: In this paper, we present PMData: a dataset that combines traditional lifelogging data with sports-activity data. Our dataset enables the development of novel data analysis and machine-learning applications where, for instance, additional sports data is used to predict and analyze everyday developments, like a person's weight and sleep patterns; and applications where traditional lifelog data is used in a sports context to predict athletes' performance. PMData combines input from Fitbit Versa 2 smartwatch wristbands, the PMSys sports logging smartphone application, and Google Forms. Logging data has been collected from 16 persons for five months. Our initial experiments show that novel analyses are possible, but there is still room for improvement.

30 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: This paper provides a gaming video quality dataset that considers hardware accelerated engines for video compression using the H.264 standard, and builds two novel parametric-based models, a planning and a monitoring model, for gaming quality estimation.
Abstract: The gaming industry has been one of the largest digital markets for decades and is steadily developing, as evident from new emerging gaming services such as gaming video streaming, online gaming, and cloud gaming. While the market is rapidly growing, the quality of these services depends strongly on network characteristics as well as resource management. With the advancement of encoding technologies such as hardware accelerated engines, fast encoding is possible for delay sensitive applications such as cloud gaming. Therefore, already existing video quality models do not offer good performance for cloud gaming applications. Thus, in this paper, we provide a gaming video quality dataset that considers hardware accelerated engines for video compression using the H.264 standard. In addition, we investigate the performance of signal-based and parametric video quality models on the new gaming video dataset. Finally, we build two novel parametric-based models, a planning and a monitoring model, for gaming quality estimation. Both models are based on perceptual video quality dimensions and can be used to optimize the resource allocation of gaming video streaming services.

29 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: A novel algorithm for bitrate adaptation in HTTP Adaptive Streaming (HAS), based on Online Convex Optimization (OCO), is shown to provide a robust adaptation strategy which, unlike most of the state-of-the-art techniques, does not require parameter tuning, channel model assumptions, throughput estimation or application-specific adjustments.
Abstract: Achieving low latency is paramount for live streaming scenarios, which are nowadays becoming increasingly popular. In this paper, we propose a novel algorithm for bitrate adaptation in HTTP Adaptive Streaming (HAS), based on Online Convex Optimization (OCO). The proposed algorithm, named Learn2Adapt-LowLatency (L2A-LL), is shown to provide a robust adaptation strategy which, unlike most of the state-of-the-art techniques, does not require parameter tuning, channel model assumptions, throughput estimation or application-specific adjustments. These properties make it very suitable for users who typically experience fast variations in channel characteristics. The proposed algorithm has been implemented in DASH-IF's reference video player (dash.js) and has been made publicly available for research purposes at [22]. Real experiments show that L2A-LL reduces latency significantly, while providing a high average streaming bit-rate, without impairing the overall Quality of Experience (QoE); a result that is independent of the channel and application scenarios. The presented optimization framework is robust due to its design principle; its ability to learn allows for modular QoE prioritization and facilitates easy adjustments to consider applications beyond live streaming and/or multiple user classes.
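To give a flavor of an OCO-style adaptation loop, the sketch below keeps a probability vector over the bitrate ladder and updates it multiplicatively from a per-segment loss; the loss definition, learning rate and class name are our own illustrative assumptions, not the published L2A-LL update rule.

```python
# Hedged sketch of online-learning bitrate selection (generic OCO flavor, not L2A-LL).
import numpy as np

class OnlineBitrateLearner:
    def __init__(self, ladder_kbps, eta=0.1):
        self.ladder = np.asarray(ladder_kbps, dtype=float)
        self.w = np.ones(len(self.ladder)) / len(self.ladder)   # weights over the ladder
        self.eta = eta

    def choose(self):
        return int(np.argmax(self.w))                # index of the currently preferred rung

    def update(self, throughput_kbps, latency_s, target_latency_s=3.0):
        # Loss grows when a rung exceeds measured throughput or live latency drifts upward.
        over = np.maximum(self.ladder / max(throughput_kbps, 1.0) - 1.0, 0.0)
        drift = max(latency_s - target_latency_s, 0.0)
        loss = over + 0.5 * drift * (self.ladder / self.ladder.max())
        self.w *= np.exp(-self.eta * loss)           # multiplicative-weights style update
        self.w /= self.w.sum()
```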

27 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: The experimental results show that the proposed coding scheme makes Kvazaar 125 times as fast as the HEVC reference software HM on the Intel Xeon E5-2699 v4 22-core processor at the additional coding cost of only 2.4% on average.
Abstract: High Efficiency Video Coding (HEVC) is the key to economic video transmission and storage in the current multimedia applications but tackling its inherent computational complexity requires powerful video codec implementations. This paper presents Kvazaar 2.0 HEVC encoder that is the new release of our academic open-source software (github.com/ultravideo/kvazaar). Kvazaar 2.0 introduces novel inter coding functionality that is built on advanced rate-distortion optimization (RDO) scheme and speeded up with several early termination mechanisms, SIMD-optimized coding tools, and parallelization strategies. Our experimental results show that the proposed coding scheme makes Kvazaar 125 times as fast as the HEVC reference software HM on the Intel Xeon E5-2699 v4 22-core processor at the additional coding cost of only 2.4% on average. In constant quantization parameter (QP) coding, Kvazaar is also 3 times as fast as the respective preset of the well-known practical x265 HEVC encoder and is still able to attain 10.7% lower average bit rate than x265 for the same objective visual quality. These results indicate that Kvazaar has become one of the leading open-source HEVC encoders in practical high-efficiency video coding.
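The rate-distortion optimization referenced above boils down to picking, for each block, the candidate that minimizes J = D + λ·R; below is a minimal sketch of that decision, where the candidate representation and the example numbers are ours, not Kvazaar's internal data structures.

```python
# Hedged sketch of an RDO mode decision: minimize J = D + lambda * R over candidates.
def rdo_select(candidates, lam):
    """candidates: iterable of (mode, distortion, rate_bits); returns the best mode."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]

# Illustrative example: lower distortion vs. lower rate, weighted by lambda.
best = rdo_select([("intra", 1200.0, 300), ("merge", 1500.0, 40)], lam=10.0)
```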

23 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: The design and development of an architecture intended for volumetric videoconferencing that provides a highly realistic 3D representation of the participants, based on pointclouds are introduced.
Abstract: The advent of affordable 3D capture and display hardware is making volumetric videoconferencing feasible. This technology increases the immersion of the participants, breaking the flat restriction of 2D screens, by allowing them to collaborate and interact in shared virtual reality spaces. In this paper, we introduce the design and development of an architecture intended for volumetric videoconferencing that provides a highly realistic 3D representation of the participants, based on pointclouds. A pointcloud representation is suitable for real-time applications like video conferencing, due to its low complexity and because it does not need a time-consuming reconstruction process. As the transport protocol, we selected low-latency DASH, due to its popularity and client-based adaptation mechanisms for tiling. This paper presents the architectural design, details the implementation, and provides some referential results. The demo will showcase the system in action, enabling volumetric videoconferencing using pointclouds.

22 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: This paper re-visit and extend several important components in adaptive streaming systems to enhance the low-latency performance, which includes bitrate adaptation, playback control and throughput measurement modules.
Abstract: Live streaming remains a challenge in the adaptive streaming space due to the stringent requirements for not just quality and rebuffering, but also latency. Many solutions have been proposed to tackle streaming in general, but only few have looked into better catering to the more challenging low-latency live streaming scenarios. In this paper, we re-visit and extend several important components (collectively called Low-on-Latency, LoL) in adaptive streaming systems to enhance the low-latency performance. LoL includes bitrate adaptation (both heuristic and learning-based), playback control and throughput measurement modules.
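One of the extended components, playback control, is essentially a catch-up policy: speed up slightly when live latency exceeds the target and slow down when the buffer runs low. The sketch below illustrates such a policy; the thresholds and rate bounds are our assumptions, not LoL's tuning.

```python
# Hedged sketch of low-latency playback-rate control (catch-up logic), illustrative only.
def playback_rate(live_latency_s, target_latency_s, buffer_s,
                  min_rate=0.95, max_rate=1.05, min_buffer_s=0.5):
    if buffer_s < min_buffer_s:
        return min_rate                                  # protect against stalls first
    error = live_latency_s - target_latency_s
    rate = 1.0 + 0.05 * max(min(error, 1.0), -1.0)       # clamp latency error to +/- 1 s
    return max(min_rate, min(max_rate, rate))
```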

21 citations


Proceedings ArticleDOI
27 May 2020
TL;DR: CAdViSE, a Cloud-based Adaptive Video Streaming Evaluation framework for the automated testing of adaptive media players, is introduced; it demonstrates a test environment that can be instantiated in a cloud infrastructure, examines multiple media players with different network attributes at defined points of the experiment time, and concludes the evaluation with visualized statistics and insights into the results.
Abstract: Attempting to cope with fluctuations of network conditions in terms of available bandwidth, latency and packet loss, and to deliver the highest quality of video (and audio) content to users, research on adaptive video streaming has attracted intense efforts from the research community and huge investments from technology giants. How successful these efforts and investments are, is a question that needs precise measurements of the results of those technological advancements. HTTP-based Adaptive Streaming (HAS) algorithms, which seek to improve video streaming over the Internet, introduce video bitrate adaptivity in a way that is scalable and efficient. However, how each HAS implementation takes into account the wide spectrum of variables and configuration options, brings a high complexity to the task of measuring the results and visualizing the statistics of the performance and quality of experience. In this paper, we introduce CAdViSE, our Cloud-based Adaptive Video Streaming Evaluation framework for the automated testing of adaptive media players. The paper aims to demonstrate a test environment which can be instantiated in a cloud infrastructure, examines multiple media players with different network attributes at defined points of the experiment time, and finally concludes the evaluation with visualized statistics and insights into the results.

Proceedings ArticleDOI
Craig Gutterman, Brayn Fridman, Trey Gilliland, Yusheng Hu, Gil Zussman
27 May 2020
TL;DR: Stallion, a new adaptive bitrate (ABR) scheme for STAndard Low-LAtency vIdeo cONtrol, shows a 1.8x increase in bitrate and a 4.3x reduction in the number of stalls.
Abstract: As video traffic continues to dominate the Internet, interest in near-second low-latency streaming has increased. Existing low-latency streaming platforms rely on using tens of seconds of video in the buffer to offer a seamless experience. Striving for near-second latency requires the receiver to make quick decisions regarding the download bitrate and the playback speed. To cope with these challenges, we design a new adaptive bitrate (ABR) scheme, Stallion, for STAndard Low-LAtency vIdeo cONtrol. Stallion uses a sliding window to measure the mean and standard deviation of both the bandwidth and latency. We evaluate Stallion and compare it to the standard DASH DYNAMIC algorithm over a variety of networking conditions. Stallion shows a 1.8x increase in bitrate and a 4.3x reduction in the number of stalls.
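The sliding-window rule described in the abstract can be sketched as estimating a safe download rate from the windowed mean minus a multiple of the standard deviation; the window length, safety factor k and class names below are assumptions for illustration.

```python
# Hedged sketch of sliding-window rate estimation and bitrate selection (illustrative).
from collections import deque
import statistics

class SlidingEstimator:
    def __init__(self, window=10, k=1.0):
        self.bw = deque(maxlen=window)       # recent bandwidth samples (kbps)
        self.k = k

    def add(self, bandwidth_kbps):
        self.bw.append(bandwidth_kbps)

    def safe_rate(self):
        if len(self.bw) < 2:
            return self.bw[-1] if self.bw else 0.0
        return max(statistics.mean(self.bw) - self.k * statistics.stdev(self.bw), 0.0)

def pick_bitrate(ladder_kbps, safe_rate_kbps):
    eligible = [b for b in sorted(ladder_kbps) if b <= safe_rate_kbps]
    return eligible[-1] if eligible else min(ladder_kbps)
```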

Proceedings ArticleDOI
27 May 2020
TL;DR: UbiPoint is introduced, a freehand mid-air interaction technique that uses the monocular camera embedded in smartglasses to detect the user's hand without relying on gloves, markers, or sensors, enabling intuitive and non-intrusive interaction.
Abstract: Throughout the past decade, numerous interaction techniques have been designed for mobile and wearable devices. Among these devices, smartglasses mostly rely on hardware interfaces such as touchpad and buttons, which are often cumbersome and counterintuitive to use. Furthermore, smartglasses feature cheap and low-power hardware preventing the use of advanced pointing techniques. To overcome these issues, we introduce UbiPoint, a freehand mid-air interaction technique. UbiPoint uses the monocular camera embedded in smartglasses to detect the user's hand without relying on gloves, markers, or sensors, enabling intuitive and non-intrusive interaction. We introduce a computationally fast and light-weight algorithm for fingertip detection, which is especially suited for the limited hardware specifications and the short battery life of smartglasses. UbiPoint processes pictures at a rate of 20 frames per second with high detection accuracy - no more than 6 pixels of deviation. Our evaluation shows that UbiPoint, as a mid-air non-intrusive interface, delivers a better experience for smartglasses users and interactions, with users completing typical tasks 1.82 times faster than when using the original hardware.

Proceedings ArticleDOI
27 May 2020
TL;DR: Open software that enables the evaluation of heterogeneous head motion prediction methods on various common grounds, together with a description of the algorithms used to compute the saliency maps estimated either from the raw video content or from the users' statistics.
Abstract: The streaming transmission of 360° videos is a major challenge for the development of Virtual Reality and requires a reliable head motion predictor to identify which region of the sphere to send in high quality and save data rate. Different head motion predictors have been proposed recently. Some of these works have similar evaluation metrics or even share the same dataset; however, none of them compares with the others. In this article, we introduce open software that enables the evaluation of heterogeneous head motion prediction methods on various common grounds. The goal is to ease the development of new head/eye motion prediction methods. We first propose an algorithm to create a uniform data structure from each of the datasets. We also describe the algorithms used to compute the saliency maps, estimated either from the raw video content or from the users' statistics. We exemplify how to run existing approaches on customizable settings, and finally present the targeted usage of our open framework: how to train and evaluate a new prediction method, and compare it with existing approaches and baselines in common settings. The entire material (code, datasets, neural network weights and documentation) is publicly available.
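As a hedged sketch of what a "uniform data structure" for head-motion traces could look like, the snippet below converts a per-dataset (timestamp, yaw, pitch) sample into a common record with a unit direction vector; the field names and angle conventions are ours, not the framework's actual schema.

```python
# Hedged sketch: normalize heterogeneous head-motion samples into one common record.
import numpy as np

def to_uniform_sample(timestamp_s, yaw_deg, pitch_deg, user_id, video_id):
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    direction = np.array([np.cos(pitch) * np.cos(yaw),    # unit viewing direction
                          np.cos(pitch) * np.sin(yaw),
                          np.sin(pitch)])
    return {"user": user_id, "video": video_id,
            "t": float(timestamp_s), "direction": direction}
```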

Proceedings ArticleDOI
27 May 2020
TL;DR: This demo shows multi-viewpoints and overlays with an adaptive bit rate viewport-dependent streaming framework that allows for user interaction, such as switching between viewpoints and enabling/disabling of the overlays.
Abstract: The second edition of the Omnidirectional MediA Format (OMAF) standard developed by the Moving Picture Experts Group (MPEG) defines two major features: overlays and multi-viewpoints. Overlays help in enhancing the immersive experience by providing additional information about the omnidirectional background video content. The multi-viewpoint feature enables the content to be captured/experienced from multiple spatial locations. These two powerful features, along with interactivity, provide the content provider with new possibilities for storytelling (for example, non-linear) using immersive media. In this demo, we show multi-viewpoints and overlays with an adaptive bit rate viewport-dependent streaming framework. The framework uses tiles for multi-viewpoints which, along with overlays, are encoded at multiple qualities using the HEVC Main 10 profile. The encoded tiles of multi-viewpoints and overlay videos are encapsulated in ISO Base Media File Format (ISOBMFF) and fragmented as Dynamic Adaptive Streaming over HTTP (MPEG-DASH) segments. The DASH segments are then fetched by the OMAF player based on the user's viewing conditions and rendered on the user device. Additionally, the framework allows for user interaction, such as switching between viewpoints and enabling/disabling of the overlays.

Proceedings ArticleDOI
27 May 2020
TL;DR: A system has been developed to detect fingerspelling in ASL and Bengali Sign Language using (data) gloves containing some suitably positioned sensors and shows a promising accuracy of (up to) 96%.
Abstract: Sign language is a method of communication primarily used by the hearing impaired and mute persons. In this method, letters and words are expressed by hand gestures. In fingerspelling, meaningful words are constructed by signaling multiple letters in a sequence. In this paper, a system has been developed to detect fingerspelling in American Sign Language (ASL) and Bengali Sign Language (BdSL) using (data) gloves containing some suitably positioned sensors. The methodologies employed can be used even in resource-constrained environments. The system is capable of accurately detecting both static and dynamic symbols in the alphabets. The system shows a promising accuracy of (up to) 96%. Furthermore, this work presents a novel approach to perform a continuous assessment of symbols from a stream of run-time data.

Proceedings ArticleDOI
27 May 2020
TL;DR: This application contains generic interfaces that allow for easy deployment of various augmented/mixed reality clients using the same server implementation and uses 6DoF head movement prediction techniques, WebRTC protocol and hardware video encoding to ensure low-latency in the processing chain.
Abstract: Volumetric video is an emerging technology for immersive representation of 3D spaces that captures objects from all directions using multiple cameras and creates a dynamic 3D model of the scene. However, processing volumetric content requires high amounts of processing power and is still a very demanding task for today's mobile devices. To mitigate this, we propose a volumetric video streaming system that offloads the rendering to a powerful cloud/edge server and only sends the rendered 2D view to the client instead of the full volumetric content. We use 6DoF head movement prediction techniques, WebRTC protocol and hardware video encoding to ensure low-latency in different parts of the processing chain. We demonstrate our system using both a browser-based client and a Microsoft HoloLens client. Our application contains generic interfaces that allow for easy deployment of various augmented/mixed reality clients using the same server implementation.

Proceedings ArticleDOI
27 May 2020
TL;DR: Experimental results demonstrate that the proposed method is promising and can bring some of the benefits of expensive hyperspectral cameras to the low-cost and pervasive RGB cameras, enabling many new applications and enhancing the performance of others.
Abstract: A hyperspectral camera captures a scene in many frequency bands across the spectrum, providing rich information and facilitating numerous applications. The potential of hyperspectral imaging has been established for decades. However, to date, hyperspectral imaging has only seen success in specialized and large-scale industrial and military applications. This is mainly due to the high cost of hyperspectral cameras (upwards of $20K) and the complexity of the acquisition system, which puts the technology out of reach for many commercial and end-user applications. In this paper, we propose a deep learning based approach to convert RGB image sequences taken by regular cameras to (partial) hyperspectral images. This can enable, for example, low-cost mobile phones to leverage the characteristics of hyperspectral images in implementing novel applications. We show the benefits of the conversion model by designing a vein localization and visualization application that traditionally uses hyperspectral images. Our application uses only RGB images and produces accurate results. Vein visualization is important for point-of-care medical applications. We collected hyperspectral data to validate the proposed conversion model. Experimental results demonstrate that the proposed method is promising and can bring some of the benefits of expensive hyperspectral cameras to the low-cost and pervasive RGB cameras, enabling many new applications and enhancing the performance of others. We also evaluate the vein visualization application and show its accuracy.
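As a purely illustrative sketch of the RGB-to-spectral mapping idea, a small convolutional network can map a 3-channel RGB image to N spectral bands; the architecture, band count and class name below are generic assumptions, not the paper's actual model.

```python
# Hedged sketch: a generic CNN mapping RGB inputs to N spectral bands (illustrative only).
import torch
import torch.nn as nn

class RGB2Spectral(nn.Module):
    def __init__(self, n_bands=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_bands, kernel_size=1),      # per-pixel spectral estimate
        )

    def forward(self, rgb):                 # rgb: (batch, 3, H, W) in [0, 1]
        return self.net(rgb)                # (batch, n_bands, H, W)

model = RGB2Spectral()
bands = model(torch.rand(1, 3, 128, 128))   # toy input just to show the shapes
```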

Proceedings ArticleDOI
27 May 2020
TL;DR: Two open-source software tools are presented: one measures and registers resource-consumption metrics for any Windows program, and the other measures QoS metrics from DASH streaming sessions by running on top of TShark, if a non-secure HTTP connection is used.
Abstract: When designing and deploying multimedia systems, it is essential to accurately know the necessary requirements and the Quality of Service (QoS) offered to the customers. This paper presents two open-source software tools that contribute to these key needs. The first tool is able to measure and register resource-consumption metrics for any Windows program (i.e. process id), like the CPU, GPU and RAM usage. Unlike the Task Manager, which requires manual visual inspection for just a subset of these metrics, the developed tool runs on top of the Powershell to periodically measure these metrics, calculate statistics, and register them in log files. The second tool is able to measure QoS metrics from DASH streaming sessions by running on top of TShark, if a non-secure HTTP connection is used. For each DASH chunk, the tool registers: the round-trip time from request to download, the number of TCP segments and bytes, the effective bandwidth, the selected DASH representation, and the associated parameters in the MPD (e.g., resolution, bitrate). It also registers the MPD and the total amount of downloaded frames and bytes. The advantage of this second tool is that these metrics can be registered regardless of the player used, even from a device connected to the same network as the DASH player.
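The first tool is PowerShell-based and Windows-specific; as a hedged, cross-platform approximation of the same idea (not the authors' tool), per-process CPU and RAM could be logged periodically with psutil as below. GPU usage is not covered by this sketch.

```python
# Hedged sketch: periodic per-process CPU/RAM logging with psutil (not the authors' tool).
import csv
import time
import psutil

def monitor(pid, out_csv, interval_s=1.0, samples=60):
    proc = psutil.Process(pid)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time", "cpu_percent", "rss_mb"])
        for _ in range(samples):
            writer.writerow([time.time(),
                             proc.cpu_percent(interval=None),      # CPU since last call
                             proc.memory_info().rss / 1e6])        # resident memory in MB
            time.sleep(interval_s)
```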

Proceedings ArticleDOI
27 May 2020
TL;DR: This paper develops component power models that provide online estimation of the power draw for each component involved in Adaptive Bitrate streaming, and quantifies the power breakdown in ABR streaming for both regular videos and the emerging 360° panoramic videos.
Abstract: Adaptive Bitrate (ABR) streaming is widely used in commercial video services. In this paper, we profile energy consumption of ABR streaming on mobile devices. This profiling is important, since the insights can help developing more energy-efficient ABR streaming pipelines and techniques. We first develop component power models that provide online estimation of the power draw for each component involved in ABR streaming. Using these models, we then quantify the power breakdown in ABR streaming for both regular videos and the emerging 360° panoramic videos. Our measurements validate the accuracy of the power models and provide a number of insights. We discuss use cases of the developed power models, and explore two energy reduction strategies for ABR streaming. Evaluation demonstrates that these simple strategies can provide up to 30% energy savings, with little degradation in viewing quality.
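A component power model of the kind described typically sums per-component estimates driven by observable streaming parameters; the linear form, component split and coefficients below are placeholders for illustration, not the paper's fitted models.

```python
# Hedged sketch: sum of per-component power estimates for an ABR streaming session.
# Coefficients and the component breakdown are illustrative assumptions only.
def streaming_power_mw(bitrate_mbps, display_brightness, decoder_active,
                       net_coeff=120.0, disp_coeff=900.0, dec_base=250.0):
    p_network = net_coeff * bitrate_mbps            # radio/download component
    p_display = disp_coeff * display_brightness     # brightness assumed in [0, 1]
    p_decoder = dec_base if decoder_active else 0.0
    return p_network + p_display + p_decoder
```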

Proceedings ArticleDOI
27 May 2020
TL;DR: SALI360 is proposed that practically solves the mismatch by utilizing the characteristics of the human vision system (HVS) and improves viewers' quality of perception (QoP) while reducing content size with geometry-based 360° content encoding.
Abstract: In accordance with the recent enhancement of display technology, users demand a higher quality of streaming service, which escalates the bandwidth requirement. Considering the recent advent of high-FPS (frames per second) 4K and 8K resolution 360° videos, such bandwidth concern further intensifies in 360° Virtual Reality (VR) content streaming even at a larger scale. However, the currently available bandwidth in most of the developed countries can hardly support the bandwidth required to stream such a scale of content. To address the mismatch between the demand for a higher quality of streaming service and the saturating network improvements, we propose SALI360, which practically solves the mismatch by utilizing the characteristics of the human vision system (HVS). By pre-rendering a set of regions - where viewers are expected to fixate - on 360° VR content in higher quality than the other regions, SALI360 improves viewers' quality of perception (QoP) while reducing content size with geometry-based 360° content encoding. In our user experiment, we compare the performance of SALI360 to the existing 360° content-encoding techniques based on 20 viewers' head movement and eye gaze traces. To evaluate viewers' QoP, we propose FoL (field of look) that captures viewers' quality perception area in the visual focal field (8°) rather than a wide (around 90°) field of view (FoV). Results of our experimental 360° VR video streaming show that SALI360 achieves a 53.3% PSNR improvement in FoL, while gaining a 9.3% PSNR improvement in FoV. In addition, our subjective study on 93 participants verifies that SALI360 improves viewers' QoP in the 360° VR streaming service.
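The FoL evaluation amounts to computing quality only inside a viewer-centered region; below is a sketch of mask-restricted PSNR, with the construction of the 8° focal-field mask itself left out (the function and its arguments are our own illustration).

```python
# Hedged sketch: PSNR restricted to a boolean region mask (e.g., a focal-field mask).
import numpy as np

def masked_psnr(ref, dist, mask, peak=255.0):
    """ref, dist: uint8 arrays of the same shape; mask: boolean array of that shape."""
    err = (ref.astype(np.float64) - dist.astype(np.float64))[mask]
    mse = np.mean(err ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```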

Proceedings ArticleDOI
27 May 2020
TL;DR: This paper addresses and analyzes the main problems raised by the use of the existing HTTP adaptive streaming algorithms and proposes two methodologies to make these algorithms more efficient in P2P networks regardless of the ABR algorithm used, one favoring overall QoE and one favoring P2P efficiency.
Abstract: As video traffic becomes the dominant part of the global Internet traffic, keeping a good quality of experience (QoE) becomes more challenging. To improve QoE, HTTP adaptive streaming with various adaptive bitrate (ABR) algorithms has been massively deployed for video delivery. Based on their required input information, these algorithms can be classified into buffer-based, throughput-based or hybrid buffer-throughput algorithms. Nowadays, due to their low cost and high scalability, peer-to-peer (P2P) networks have become an efficient alternative for video delivery over the Internet, and many attempts at merging HTTP adaptive streaming and P2P networks have surfaced. However, the impact of merging these two approaches is still not clear enough, and interestingly, the existing HTTP adaptive streaming algorithms lack testing in a P2P environment. In this paper, we address and analyze the main problems raised by the use of the existing HTTP adaptive streaming algorithms in the context of P2P networks. We propose two methodologies to make these algorithms more efficient in P2P networks regardless of the ABR algorithm used, one favoring overall QoE and one favoring P2P efficiency. Additionally, we propose two new metrics to quantify the P2P efficiency for ABR delivery over P2P.

Proceedings ArticleDOI
27 May 2020
TL;DR: The GPAC open-source multimedia framework has recently undergone a major re-architecture to offer developers and end users a completely configurable media pipeline in a simple way; this paper reviews the core concepts of the new design, their reasoning and the new features they unlock.
Abstract: Modern multimedia frameworks mix a variety of functionalities, such as network inputs and outputs, multiplexing stacks, compression, uncompressed domain effects and scripting, and require realtime processing for live services. They usually end up becoming very difficult to apprehend for end users and/or third-party developers, with complex testing and maintenance. The GPAC open-source media framework is no exception here. After 15 years of development and experiences in interactive media content, the possibilities offered by the framework were heavily restrained by a fixed media pipeline approach, despite the large number of tools available in its code base. In this paper, we discuss the major re-architecture undergone by GPAC to offer developers and end users a completely configurable media pipeline in a simple way, review the core concepts of this new design, their reasoning and the new features they unlock. We show how various complex use cases can now simply be achieved and how the re-architecture improved GPAC stability, making it a first-class candidate for research, commercial and educational projects involving multimedia processing.

Proceedings ArticleDOI
27 May 2020
TL;DR: This paper proposes re-designed decoding tasks that parallelize the decoder using load-balanced task parallelization and CTU (Coding Tree Unit) based data parallelization, overcoming the limitations of existing parallelization techniques by fully utilizing the available CPU computation resources without compromising coding efficiency or memory bandwidth.
Abstract: The Versatile Video Coding (VVC) standard is currently being prepared as the latest video coding standard of ITU-T and ISO/IEC. The primary goal of VVC, expected to be finalized in 2020, is to further improve compression performance compared to its predecessor HEVC. The frame-level, slice-level and wavefront parallel processing (WPP) existing in VTM (the VVC Test Model) do not fully utilize the CPU capabilities available in today's multicore systems. Moreover, the VTM decoder processes the decoding tasks sequentially. This design is not parallelization friendly. This paper proposes re-designed decoding tasks that parallelize the decoder using (1) load-balanced task parallelization and (2) CTU (Coding Tree Unit) based data parallelization. The design overcomes the limitations of the existing parallelization techniques by fully utilizing the available CPU computation resources without compromising coding efficiency or memory bandwidth. The parallelization of CABAC and the slice decoding tasks is based on a load sharing scheme, while parallelization of each sub-module of the slice decoding task uses CTU-level data parallelization. The parallelization scheme may either remain restricted within an individual decoding task or utilize between-task parallelization. Such parallelization techniques achieve real-time VVC decoding on multi-core CPUs for bitstreams generated with VTM5.0 using the Random-Access configuration. An overall average decoding time reduction of 88.97% (w.r.t. the VTM5.0 decoder) is achieved for 4K sequences on a 10-core processor.

Proceedings ArticleDOI
27 May 2020
TL;DR: This work develops QuRate, a quality-aware and user-centric frame rate adaptation mechanism to tackle the power consumption issue in immersive video streaming and demonstrates that QuRate is capable of extending the smartphone battery life by up to 1.24X while maintaining the perceivable video quality during immersiveVideo streaming.
Abstract: Smartphones have recently become a popular platform for deploying the computation-intensive virtual reality (VR) applications, such as immersive video streaming (a.k.a., 360-degree video streaming). One specific challenge involving the smartphone-based head mounted display (HMD) is to reduce the potentially huge power consumption caused by the immersive video. To address this challenge, we first conduct an empirical power measurement study on a typical smartphone immersive streaming system, which identifies the major power consumption sources. Then, we develop QuRate, a quality-aware and user-centric frame rate adaptation mechanism to tackle the power consumption issue in immersive video streaming. QuRate optimizes the immersive video power consumption by modeling the correlation between the perceivable video quality and the user behavior. Specifically, QuRate builds on top of the user's reduced level of concentration on the video frames during view switching and dynamically adjusts the frame rate without impacting the perceivable video quality. We evaluate QuRate with a comprehensive set of experiments involving 5 smartphones, 21 users, and 6 immersive videos using empirical user head movement traces. Our experimental results demonstrate that QuRate is capable of extending the smartphone battery life by up to 1.24X while maintaining the perceivable video quality during immersive video streaming. Also, we conduct an Institutional Review Board (IRB)-approved subjective user study to further validate the minimum video quality impact caused by QuRate.
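The user-centric mechanism can be sketched as lowering the frame rate while the head is moving fast (view switching) and restoring it once the viewport is stable; the thresholds and frame rates below are illustrative assumptions, not QuRate's calibrated values.

```python
# Hedged sketch: frame-rate adaptation keyed to head-motion speed, with hysteresis
# to avoid oscillation near the threshold. All numeric values are illustrative.
def choose_frame_rate(head_speed_deg_s, current_fps,
                      hi_thresh=30.0, lo_thresh=15.0,
                      stable_fps=60, switching_fps=30):
    if head_speed_deg_s > hi_thresh:
        return switching_fps          # viewer is switching views; detail matters less
    if head_speed_deg_s < lo_thresh:
        return stable_fps             # viewport is stable; restore the full frame rate
    return current_fps                # in between: keep the current rate
```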

Proceedings ArticleDOI
27 May 2020
TL;DR: This paper proposes a hierarchical Software Defined Network (SDN) controller architecture to near-optimally allocate a gaming session to a DC while minimizing network delay and maximizing bandwidth utilization, and proposes Online Convex Optimization (OCO) as a practical solution.
Abstract: Gaming on demand is an emerging service that combines techniques from Cloud Computing and Online Gaming. This new paradigm is garnering prominence in the gaming industry and leading to a new "anywhere and anytime" online gaming model. Despite its advantages, cloud gaming's Quality of Experience (QoE) is challenged by high and varying end-to-end communication delay. Since the significant part of the computational processing, including game rendering and video compression, is performed on the cloud, properly allocating game requests to the geographically distributed data centers (DCs) can lead to QoE improvements resulting from lower delays. In this paper, we propose a hierarchical Software Defined Network (SDN) controller architecture to near-optimally allocate a gaming session to a DC while minimizing network delay and maximizing bandwidth utilization. To do so, we formulate an optimization problem and propose Online Convex Optimization (OCO) as a practical solution. Simulation results indicate that the proposed method can provide close-to-optimal solutions and outperforms classic offline techniques, e.g., Lagrangean relaxation. In addition, the proposed model improves the bandwidth utilization of DCs, and reduces the end-to-end delay and delay variation experienced by gamers. As a byproduct, our proposed method also achieves better fairness among multiple competing players in comparison with existing methods.

Proceedings ArticleDOI
27 May 2020
TL;DR: This paper presents open software, called BiQPS, that uses a Long Short-Term Memory (LSTM) network to predict the overall quality of HAS sessions; BiQPS is found to outperform four existing models.
Abstract: HTTP Adaptive Streaming (HAS) has become a popular solution for multimedia delivery nowadays. However, because of throughput fluctuations, video quality may vary dramatically. Also, stalling events may occur during a streaming session, causing negative impacts on user experience. Therefore, a main challenge in HAS is how to evaluate the overall quality of a session taking into account the impacts of quality variations and stalling events. In this paper, we present open software, called BiQPS, that uses a Long Short-Term Memory (LSTM) network to predict the overall quality of HAS sessions. The prediction is based on bitstream-level parameters, so it can be directly applied in practice. Experimental results show that BiQPS outperforms four existing models. Our software has been made available to the public at https://github.com/TranHuyen1191/BiQPS.
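A generic sketch of an LSTM-based session quality predictor operating on per-segment, bitstream-level features is shown below; the feature choice, layer sizes, class name and framework are our assumptions, and BiQPS itself is available at the URL above.

```python
# Hedged sketch: an LSTM reads one feature vector per segment (e.g., quality level,
# stall duration) and outputs a single overall-quality score. Illustrative only.
import torch
import torch.nn as nn

class SessionQualityLSTM(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                       # x: (batch, segments, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1]).squeeze(-1)   # predicted overall session quality

model = SessionQualityLSTM()
session = torch.rand(1, 20, 2)                  # toy session: 20 segments, 2 features each
score = model(session)
```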

Proceedings ArticleDOI
27 May 2020
TL;DR: A broad investigation of video encoding efficiency for variable segment durations is provided and a measurement study evaluates the impact of segment duration variability on the performance of HAS using three adaptation heuristics and the dash.js reference implementation.
Abstract: HTTP Adaptive Streaming (HAS) is the de-facto standard for video delivery over the Internet. It enables dynamic adaptation of video quality by splitting a video into small segments and providing multiple quality levels per segment. So far, HAS services typically utilize a fixed segment duration. This reduces the encoding and streaming variability and thus allows a faster encoding of the video content and a reduced prediction complexity for adaptive bit rate algorithms. Due to the content-agnostic placement of I-frames at the beginning of each segment, additional encoding overhead is introduced. In order to mitigate this overhead, variable segment durations, which take encoder placed I-frames into account, have been proposed recently. Hence, a lower number of I-frames is needed, thus achieving a lower video bitrate without quality degradation. While several proposals exploiting variable segment durations exist, no comparative study highlighting the impact of this technique on coding efficiency and adaptive streaming performance has been conducted yet. This paper conducts such a holistic comparison within the adaptive video streaming eco-system. Firstly, it provides a broad investigation of video encoding efficiency for variable segment durations. Secondly, a measurement study evaluates the impact of segment duration variability on the performance of HAS using three adaptation heuristics and the dash.js reference implementation. Our results show that variable segment durations increased the Quality of Experience for 54% of the evaluated streaming sessions, while reducing the overall bitrate by 7% on average.

Proceedings ArticleDOI
27 May 2020
TL;DR: This work presents MANTIS, a time-shifted prefetching solution that prefetches content during off-peak periods of network connectivity, and develops an accurate prediction algorithm using a K-nearest neighbor classifier approach.
Abstract: The load on wireless cellular networks is not uniformly distributed through the day, and is significantly higher during peak periods. In this context, we present MANTIS, a time-shifted prefetching solution that prefetches content during off-peak periods of network connectivity. We specifically focus on YouTube given that it represents a significant portion of overall wireless data-usage. We make the following contributions: first, we collect and analyze a real-life dataset of YouTube watch history from 206 users comprised of over 1.8 million videos spanning over a 1-year period and present insights on a typical user's viewing behavior; second, we develop an accurate prediction algorithm using a K-nearest neighbor classifier approach; third, we evaluate the prefetching algorithm on two different datasets and show that MANTIS is able to reduce the traffic during peak periods by 34%; and finally, we develop a proof-of-concept prototype for MANTIS and perform a user study.
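The prediction step can be sketched with a K-nearest-neighbor classifier over per-video features; the features and training samples below are invented for illustration, and only the KNN approach itself comes from the abstract.

```python
# Hedged sketch: KNN prediction of whether a candidate video will be watched,
# so it can be prefetched off-peak. Feature values are illustrative placeholders.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy features: [channel affinity, subscribed flag, days since upload]
X_train = np.array([[0.9, 1, 2], [0.1, 0, 30], [0.7, 1, 5], [0.2, 0, 20]])
y_train = np.array([1, 0, 1, 0])                # 1 = watched, 0 = not watched

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

candidate = np.array([[0.8, 1, 3]])
prefetch = bool(model.predict(candidate)[0])    # True -> schedule off-peak prefetch
```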

Proceedings ArticleDOI
27 May 2020
TL;DR: This paper introduces and publicly releases lip reading resources for the Romanian language, and proposes two strong baselines via the VGG-M and Inception-V4 state-of-the-art deep network architectures.
Abstract: Automatic lip reading is a challenging and important research topic as it allows transcribing visual-only recordings of a speaker into editable text. There are many useful applications of such technology, ranging from aiding hearing-impaired people to improving general automatic speech recognition. In this paper, we introduce and publicly release lip reading resources for the Romanian language. Two distinct collections are proposed: (i) the wild LRRo data is designed for an Internet in-the-wild, ad-hoc scenario, coming with more than 35 different speakers, 1.1k words, a vocabulary of 21 words, and more than 20 hours; (ii) the lab LRRo data addresses a lab-controlled scenario for more accurate data, coming with 19 different speakers, 6.4k words, a vocabulary of 48 words, and more than 5 hours. This is the first resource available for Romanian lip reading and would serve as a pioneering foundation for this under-resourced language. Nevertheless, given the fact that word-level models are not strongly language dependent, these resources will also contribute to the general lip-reading task via transfer learning. To provide a validation and reference for future developments, we propose two strong baselines via the VGG-M and Inception-V4 state-of-the-art deep network architectures.