
Showing papers presented at "ACM SIGMM Conference on Multimedia Systems in 2021"


Proceedings ArticleDOI
24 Jun 2021
TL;DR: Meshroom as discussed by the authors is a photogrammetry pipeline for reconstructing 3D scenes from a set of unordered images; its node-graph architecture allows the user to customize the different pipelines to adjust them to domain-specific needs.
Abstract: This paper introduces the Meshroom software and its underlying 3D computer vision framework AliceVision. This solution provides a photogrammetry pipeline to reconstruct 3D scenes from a set of unordered images. It also features other pipelines for fusing multi-bracketing low dynamic range images into high dynamic range, stitching multiple images into a panorama and estimating the motion of a moving camera. Meshroom's node-graph architecture allows the user to customize the different pipelines to adjust them to their domain specific needs. The user can interactively add other processing nodes to modify a pipeline, export intermediate data to analyze the result of the algorithms and easily compare the outputs given by different sets of parameters. The software package is released in open source and relies on open file formats. These features enable researchers to conveniently run the pipelines, access and visualize the data at each step, thus promoting the sharing and the reproducibility of the results.

77 citations


Proceedings ArticleDOI
15 Jul 2021
TL;DR: In this article, the authors present a VR communication framework that enables remote communication in virtual environments with real-time photorealistic user representation based on RGBD cameras and web browser clients, deployed on common off-the-shelf hardware devices.
Abstract: Tools and platforms that enable remote communication and collaboration provide a strong contribution to societal challenges. Virtual meetings and conferencing, in particular, can help to reduce commutes and lower our ecological footprint, and can alleviate physical distancing measures in case of global pandemics. In this paper, we outline how to bridge the gap between common video conferencing systems and emerging social VR platforms to allow immersive communication in Virtual Reality (VR). We present a novel VR communication framework that enables remote communication in virtual environments with real-time photorealistic user representation based on colour-and-depth (RGBD) cameras and web browser clients, deployed on common off-the-shelf hardware devices. The paper's main contribution is threefold: (a) a new VR communication framework, (b) a novel approach for real-time depth data transmitting as a 2D grayscale for 3D user representation, including a central MCU-based approach for this new format and (c) a technical evaluation of the system with respect to processing delay, CPU and GPU usage.
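
The depth-as-2D-grayscale idea above can be illustrated in a few lines: quantize each metric depth map from the RGBD camera into an 8-bit image so it can ride an ordinary 2D video codec, and dequantize on the receiving client. A minimal sketch under an assumed working range; it is not the authors' actual format.

```python
import numpy as np

NEAR_MM, FAR_MM = 500, 5000  # assumed working range of the RGBD camera, in millimetres

def depth_to_grayscale(depth_mm: np.ndarray) -> np.ndarray:
    """Quantize a metric depth map into an 8-bit grayscale frame for a 2D video codec."""
    clipped = np.clip(depth_mm, NEAR_MM, FAR_MM)
    normalized = (clipped - NEAR_MM) / (FAR_MM - NEAR_MM)
    return (normalized * 255).astype(np.uint8)

def grayscale_to_depth(gray: np.ndarray) -> np.ndarray:
    """Recover an approximate depth map on the receiving client."""
    return gray.astype(np.float32) / 255 * (FAR_MM - NEAR_MM) + NEAR_MM

# Round-trip error is bounded by the quantization step, here about 17.6 mm.
depth = np.random.uniform(NEAR_MM, FAR_MM, (480, 640)).astype(np.float32)
print(np.abs(grayscale_to_depth(depth_to_grayscale(depth)) - depth).max())
```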

14 citations


Proceedings ArticleDOI
15 Jul 2021
TL;DR: CrossRoI is presented, a resource-efficient system that enables real-time video analytics at scale by harnessing the videos' content associations and redundancy across a fleet of cameras to drastically reduce the communication and computation costs.
Abstract: Video cameras are pervasively deployed at city scale for public good or community safety (e.g., traffic monitoring or suspected person tracking). However, analyzing large-scale video feeds in real time is data intensive and poses severe challenges to today's network and computation systems. We present CrossRoI, a resource-efficient system that enables real-time video analytics at scale by harnessing the videos' content associations and redundancy across a fleet of cameras. CrossRoI exploits the intrinsic physical correlations of cross-camera viewing fields to drastically reduce the communication and computation costs. CrossRoI removes the redundant appearances of the same objects in multiple cameras without harming the comprehensive coverage of the scene. CrossRoI operates in two phases - an offline phase to establish cross-camera correlations, and an efficient online phase for real-time video inference. Experiments on real-world video feeds show that CrossRoI achieves 42% ~ 65% reduction in network overhead and 25% ~ 34% reduction in response delay in real-time video analytics applications with more than 99% query accuracy, when compared to baseline methods. If integrated with state-of-the-art frame filtering systems, the performance gains of CrossRoI reach 50% ~ 80% (network overhead) and 33% ~ 61% (end-to-end delay).
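
The online phase described above can be pictured as applying a per-camera RoI mask before encoding, so overlapping coverage is transmitted only once. A minimal sketch of that idea, assuming the offline phase has already produced a binary mask per camera; it is not the CrossRoI implementation.

```python
import numpy as np

def apply_roi_mask(frame: np.ndarray, roi_mask: np.ndarray) -> np.ndarray:
    """Zero out pixels outside this camera's assigned regions of interest.

    Blacked-out regions compress to almost nothing, so each camera's stream
    effectively carries only the areas not already covered by another camera.
    """
    return frame * roi_mask[..., None]          # roi_mask is HxW with values in {0, 1}

# Example: camera 1's right half overlaps camera 2's view, so it is dropped offline.
frame = np.random.randint(0, 255, (1080, 1920, 3), dtype=np.uint8)
mask_cam1 = np.ones((1080, 1920), dtype=np.uint8)
mask_cam1[:, 960:] = 0
masked = apply_roi_mask(frame, mask_cam1)
print(f"fraction of pixels camera 1 still transmits: {mask_cam1.mean():.0%}")
```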

14 citations


Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this paper, the authors leverage the aforementioned modern networking paradigms and design network-assistance for/by HAS clients to improve HAS systems performance and CDN/network utilization.
Abstract: Video streaming has become one of the most prevalent, bandwidth-hungry, and latency-sensitive Internet applications. HTTP Adaptive Streaming (HAS) has become the dominant video delivery mechanism over the Internet. Lack of coordination among the clients and lack of awareness of the network in pure client-based adaptive video bitrate approaches have caused problems, such as sub-optimal data throughput from Content Delivery Network (CDN) or origin servers, high CDN costs, and unsatisfactory user experience. Recent studies have shown that network-assisted HAS techniques that utilize modern networking paradigms, e.g., Software Defined Networking (SDN), Network Function Virtualization (NFV), and edge computing, can significantly improve HAS system performance. In this doctoral study, we leverage the aforementioned modern networking paradigms and design network assistance for/by HAS clients to improve HAS system performance and CDN/network utilization. We present four fundamental research questions to target different challenges in devising a network-assisted HAS system.

13 citations


Proceedings ArticleDOI
15 Jul 2021
TL;DR: Li et al. as mentioned in this paper proposed CEVAS, a Cloud-Edge collaborative Video Analytics system empowered by fine-grained Serverless pipelines, which builds flexible serverless-based infrastructures to facilitate fine-grained and adaptive partitioning of cloud-edge workloads for multiple concurrent query pipelines.
Abstract: The ever-growing deployment scale of surveillance cameras and the users' increasing appetite for real-time queries have urged online video analytics. Synergizing the virtually unlimited cloud resources with agile edge processing would deliver an ideal online video analytics system; yet, given the complex interaction and dependency within and across video query pipelines, it is easier said than done. This paper starts with a measurement study to acquire a deep understanding of video query pipelines on real-world camera streams. We identify the potentials and practical challenges towards cloud-edge collaborative video analytics. We then argue that the newly emerged serverless computing paradigm is the key to achieving fine-grained resource partitioning with minimum dependency. We accordingly propose CEVAS, a Cloud-Edge collaborative Video Analytics system empowered by fine-grained Serverless pipelines. It builds flexible serverless-based infrastructures to facilitate fine-grained and adaptive partitioning of cloud-edge workloads for multiple concurrent query pipelines. With the optimized design of individual modules and their integration, CEVAS achieves real-time responses to highly dynamic input workloads. We have developed a prototype of CEVAS over Amazon Web Services (AWS) and conducted extensive experiments with real-world video streams and queries. The results show that by judiciously coordinating the fine-grained serverless resources in the cloud and at the edge, CEVAS reduces the cloud expenditure of a pure cloud scheme by 86.9% and its data transfer overhead by 74.4%, and improves the analysis throughput of a pure edge scheme by up to 20.6%. Thanks to the fine-grained video content-aware forecasting, CEVAS is also more adaptive than the state-of-the-art cloud-edge collaborative scheme.
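
The cloud-edge partitioning described above can be sketched as a per-stage placement decision driven by processing and uplink-transfer costs under a latency deadline. The stage names, costs, and greedy rule below are illustrative assumptions, not CEVAS internals (which also rely on serverless functions and content-aware forecasting).

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    edge_latency_ms: float    # per-frame processing time in an edge function
    cloud_latency_ms: float   # per-frame processing time in a cloud function
    upload_kb: float          # data crossing the uplink if the stage runs in the cloud

def partition(pipeline, uplink_kbps, deadline_ms):
    """Greedy per-stage cloud/edge placement under a per-frame latency deadline."""
    placement, total_ms = {}, 0.0
    for stage in pipeline:
        transfer_ms = stage.upload_kb * 8 / uplink_kbps * 1000
        cloud_ms = stage.cloud_latency_ms + transfer_ms
        placement[stage.name] = "edge" if stage.edge_latency_ms <= cloud_ms else "cloud"
        total_ms += min(stage.edge_latency_ms, cloud_ms)
    # If the deadline cannot be met, fall back to an all-cloud placement.
    return placement if total_ms <= deadline_ms else {s.name: "cloud" for s in pipeline}

pipeline = [Stage("decode", 4, 1, 200), Stage("detect", 40, 8, 50), Stage("track", 6, 2, 5)]
print(partition(pipeline, uplink_kbps=20000, deadline_ms=60))
```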

12 citations


Proceedings ArticleDOI
15 Jul 2021
TL;DR: In this paper, the authors present a networked, high-performance graphics system that combines dynamic, high quality, ray traced global illumination computed on a server with direct illumination and primary visibility computed on the client.
Abstract: We present a networked, high-performance graphics system that combines dynamic, high-quality, ray traced global illumination computed on a server with direct illumination and primary visibility computed on a client. This approach provides many of the image quality benefits of real-time ray tracing on low-power and legacy hardware, while maintaining a low latency response and mobile form factor. As opposed to streaming full frames from rendering servers to end clients, our system distributes the graphics pipeline over a network by computing diffuse global illumination on a remote machine. Diffuse global illumination is computed using a recent irradiance volume representation combined with a new lossless, HEVC-based, hardware-accelerated encoding, and a perceptually-motivated update scheme. Our experimental implementation streams thousands of irradiance probes per second and requires less than 50 Mbps of throughput, reducing the consumed bandwidth by 99.4% when streaming at 60 Hz compared to traditional lossless texture compression. The bandwidth reduction achieved with our approach allows higher quality and lower latency graphics than state-of-the-art remote rendering via video streaming. In addition, our split-rendering solution decouples remote computation from local rendering and so does not limit local display update rate or display resolution.

10 citations


Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, the authors present a dynamic point cloud dataset that depicts humans interacting in social XR settings, captured with commodity hardware as a total of 45 unique sequences covering several use cases for social XR.
Abstract: Real-time, immersive telecommunication systems are quickly becoming a reality, thanks to the advances in acquisition, transmission, and rendering technologies. Point clouds in particular serve as a promising representation in these types of systems, offering photorealistic rendering capabilities with low complexity. Further development of transmission, coding, and quality evaluation algorithms, though, is currently hindered by the lack of publicly available datasets that represent realistic scenarios of remote communication between people in real time. In this paper, we release a dynamic point cloud dataset that depicts humans interacting in social XR settings. Using commodity hardware, we capture a total of 45 unique sequences, according to several use cases for social XR. As part of our release, we provide annotated raw material, resulting point cloud sequences, and an auxiliary software toolbox to acquire, process, encode, and visualize data, suitable for real-time applications. The dataset can be accessed via the following link: https://www.dis.cwi.nl/cwipc-sxr-dataset/.

10 citations


Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, the authors present a 10 x 10 LF capture matrix composed of 100 cameras, each with a 1920 x 1056 resolution, and use it to record videos under real and varying illumination and scene dynamics conditions.
Abstract: We present a 4D Light Field (LF) video dataset, collected by a custom-made camera matrix, to be used for designing and testing algorithms and systems for LF video coding, processing, and streaming. Compared to existing LF datasets, ours provides LF videos, as opposed to only images, and at higher frame resolution, a higher number of viewpoints, and/or higher framerate, offering the best visual quality LF video dataset. To achieve this, we built a 10 x 10 LF capture matrix composed of 100 cameras, each with a 1920 x 1056 resolution. We used this matrix to record videos in real and varying illumination and scene dynamics conditions. The dataset contains a total of nine groups of LF videos: eight groups collected with a fixed camera matrix position and orientation recording indoor potted plants, furniture, etc., and the last group collected by rotating around an outdoor environment with roadside vehicles, pedestrians, etc. Each group of LF videos consists of 100 video streams encoded with H.265/HEVC. Scene changes vary from static to slightly dynamic to highly dynamic, providing a good level of diversity. As an example, we present the results of a depth estimation method and show that our dataset can be used for applications such as object detection, 3D modeling, and others.

10 citations


Proceedings ArticleDOI
Liyang Sun1, Tongyu Zong1, Siquan Wang1, Yong Liu1, Yao Wang1 
15 Jul 2021
TL;DR: In this article, a detailed chunk-level dynamic model was developed to characterize how video rate and playback speed jointly control the evolution of a live streaming session, and the optimal joint video rate-playback speed adaptation was studied as a non-linear optimal control problem.
Abstract: It is highly challenging to simultaneously achieve high-rate and low-latency in live video streaming. Chunk-based streaming and playback speed adaptation are two promising new trends to achieve high user Quality-of-Experience (QoE). To thoroughly understand their potentials, we develop a detailed chunk-level dynamic model that characterizes how video rate and playback speed jointly control the evolution of a live streaming session. Leveraging on the model, we first study the optimal joint video rate-playback speed adaptation as a non-linear optimal control problem. We further develop model-free joint adaptation strategies using deep reinforcement learning. Through extensive experiments, we demonstrate that our proposed joint adaptation algorithms significantly outperform rate-only adaptation algorithms and the recently proposed low-latency video streaming algorithms that separately adapt video rate and playback speed without joint optimization. In a wide-range of network conditions, the model-based and model-free algorithms can achieve close-to-optimal trade-offs tailored for users with different QoE preferences.
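
The chunk-level dynamics the paper models can be sketched as a simple recursion: each downloaded chunk adds its duration to the buffer, playback drains the buffer at the chosen speed, and live latency shrinks only while the client plays faster than real time. A toy model under simplified assumptions (constant bandwidth, fixed chunk duration), not the paper's exact formulation.

```python
def simulate_session(bitrates_kbps, speeds, bandwidth_kbps, chunk_dur=1.0):
    """Evolve client buffer (s of media) and live latency (s) chunk by chunk."""
    buffer_s, latency_s, total_stall = 0.0, 2.0, 0.0
    for rate, speed in zip(bitrates_kbps, speeds):
        download_s = rate * chunk_dur / bandwidth_kbps     # time to fetch this chunk
        playing_s = min(buffer_s / speed, download_s)      # time actually spent playing
        stall_s = download_s - playing_s                   # time spent rebuffering
        buffer_s = buffer_s - playing_s * speed + chunk_dur
        latency_s += stall_s - (speed - 1.0) * playing_s   # faster-than-1x playback catches up
        total_stall += stall_s
    return round(buffer_s, 2), round(latency_s, 2), round(total_stall, 2)

# Playing slightly faster than real time (1.05x) trades a bit of fidelity for lower latency.
print(simulate_session([3000] * 20, [1.05] * 20, bandwidth_kbps=4000))
```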

10 citations


Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this paper, the authors investigate the rate-distortion characteristics of full ultra-high definition (UHD) 360° videos and capture corresponding head movement navigation data of virtual reality (VR) headsets.
Abstract: We investigate the rate-distortion (R-D) characteristics of full ultra-high definition (UHD) 360° videos and capture corresponding head movement navigation data of virtual reality (VR) headsets. We use the navigation data to analyze how users explore the 360° look-around panorama for such content and formulate related statistical models. The developed R-D characteristics and modeling capture the spatiotemporal encoding efficiency of the content at multiple scales and can be exploited to enable higher operational efficiency in key use cases. The high quality expectations for next generation immersive media necessitate the understanding of these intrinsic navigation and content characteristics of full UHD 360° videos.
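
Rate-distortion characteristics of this kind are often summarized by fitting a parametric quality-versus-rate curve per sequence. A hedged illustration using a logarithmic model Q(R) = a + b ln(R) fitted by least squares; the operating points below are placeholders, not the paper's measurements.

```python
import numpy as np

# Placeholder (bitrate Mbps, PSNR dB) operating points for one 360-degree sequence.
rates = np.array([5, 10, 20, 40, 80], dtype=float)
psnr = np.array([34.1, 37.0, 39.6, 41.8, 43.5])

# Fit Q(R) = a + b * ln(R) by least squares on the log-rate axis.
b, a = np.polyfit(np.log(rates), psnr, deg=1)
model = lambda r: a + b * np.log(r)

# The fitted curve lets us interpolate the rate needed for a target quality.
target_q = 40.0
required_rate = np.exp((target_q - a) / b)
print(f"Q(R) = {a:.2f} + {b:.2f} ln(R); Q(40 Mbps) ~ {model(40):.1f} dB; "
      f"~{required_rate:.1f} Mbps needed for {target_q} dB")
```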

9 citations


Proceedings ArticleDOI
15 Jul 2021
TL;DR: LiveROI as discussed by the authors employs an action recognition algorithm to analyze the video content and uses the analysis results as the basis of viewport prediction; to eliminate the need for historical video/user data, it employs adaptive user preference modeling and word embedding to dynamically select the video viewport at runtime based on the user's head orientation.
Abstract: Virtual reality (VR) streaming can provide immersive video viewing experience to the end users but with huge bandwidth consumption. Recent research has adopted selective streaming to address the bandwidth challenge, which predicts and streams the user's viewport of interest with high quality and the other portions of the video with low quality. However, the existing viewport prediction mechanisms mainly target the video-on-demand (VOD) scenario relying on historical video and user trace data to build the prediction model. The community still lacks an effective viewport prediction approach to support live VR streaming, the most engaging and popular VR streaming experience. We develop a region of interest (ROI)-based viewport prediction approach, namely LiveROI, for live VR streaming. LiveROI employs an action recognition algorithm to analyze the video content and uses the analysis results as the basis of viewport prediction. To eliminate the need of historical video/user data, LiveROI employs adaptive user preference modeling and word embedding to dynamically select the video viewport at runtime based on the user head orientation. We evaluate LiveROI with 12 VR videos viewed by 48 users obtained from a public VR head movement dataset. The results show that LiveROI achieves high prediction accuracy and significant bandwidth savings with real-time processing to support live VR streaming.
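
The content-driven prediction at the core of LiveROI can be sketched as scoring candidate viewport regions by the embedding similarity between the actions recognized in them and a continuously updated user-preference vector. The toy embeddings, region names, and update rule below are illustrative assumptions, not the authors' model.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Toy word embeddings for recognized actions (a real system would use pretrained embeddings).
embed = {"dunk": np.array([0.9, 0.1, 0.2]),
         "dribble": np.array([0.7, 0.3, 0.1]),
         "crowd_cheering": np.array([0.1, 0.9, 0.4])}

def predict_viewport(region_actions, preference_vec):
    """Pick the region whose recognized action best matches the user preference."""
    scores = {region: cosine(embed[action], preference_vec)
              for region, action in region_actions.items()}
    return max(scores, key=scores.get), scores

def update_preference(preference_vec, watched_action, alpha=0.2):
    """Exponentially blend the preference toward what the user actually watched."""
    return (1 - alpha) * preference_vec + alpha * embed[watched_action]

pref = embed["dribble"].copy()
region_actions = {"tile_left": "crowd_cheering", "tile_center": "dunk"}
print(predict_viewport(region_actions, pref))
pref = update_preference(pref, "dunk")   # adapt to the viewport the user actually chose
```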

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, the authors proposed four methods to improve the detection accuracy of COSMOS, which range from differential sensing and fake-or-fact checking that detect contradicting or fake captions to object-caption matching.
Abstract: The growing prevalence of visual disinformation has become an important problem to solve nowadays. Cheapfake is a new term used for the altered media generated by non-AI techniques. In their recent COSMOS work, the authors developed a self-supervised training strategy that detected whether different captions for a given image were out-of-context, meaning that even though pointing to the same object(s) in the image, the captions implied different meanings. In this paper, we propose four methods to improve the detection accuracy of COSMOS. These methods range from differential sensing and fake-or-fact checking that detect contradicting or fake captions to object-caption matching and threshold adjustment that modify the baseline algorithm for improved accuracy.

Proceedings ArticleDOI
15 Jul 2021
TL;DR: In this paper, the authors leverage a simple and intuitive method to resolve the fundamental problem of bandwidth estimation for low latency live streaming through the use of a hybrid of an existing chunk parser and proposed filtering of downloaded chunk data.
Abstract: A growing number of users are interested in low-latency over-the-top (OTT) applications such as online video gaming, video chat, online casinos, sports betting, and live auctions. OTT applications face challenges in delivering low-latency live streams using Dynamic Adaptive Streaming over HTTP (DASH) due to large playback buffers and video segment durations. A potential solution to this issue is the use of HTTP chunked transfer encoding (CTE) with the common media application format (CMAF). This combination allows the delivery of each segment in several chunks to the client, starting before the segment is fully available in real-time. However, CTE and CMAF alone are not sufficient as they do not address other limitations and challenges at the client side, including inaccurate bandwidth measurement, latency control, and bitrate selection. In this paper, we leverage a simple and intuitive method to resolve the fundamental problem of bandwidth estimation for low-latency live streaming through the use of a hybrid of an existing chunk parser and proposed filtering of downloaded chunk data. Next, we model the playback buffer as an M/D/1/K queue to limit the playback delay. The combination of these techniques is collectively called QLive. QLive uses the relationship between the estimated bandwidth, total buffer capacity, instantaneous playback speed, and buffer occupancy to decide the playback speed and the bitrate of the representation to download. We evaluated QLive under a diverse set of scenarios and found that it controls the latency to meet the given latency requirement, with an average latency up to 21 times lower than the compared methods. The average playback speed of QLive ranges between 1.01X and 1.26X, and it plays back at 1X speed up to 97% longer than the compared algorithms, without sacrificing the quality of the video. Moreover, the proposed bandwidth estimator has a 94% accuracy and is unaffected by a spike in instantaneous playback latency, unlike the compared state-of-the-art counterparts.
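
The bandwidth-estimation idea can be sketched as follows: with chunked transfer, per-segment throughput is biased by idle gaps while the next chunk is being produced, so the client measures throughput per chunk and keeps only samples whose download time was mostly spent receiving bytes. A minimal sketch with an assumed busy-time threshold; it is not QLive's exact parser or filter.

```python
def filter_chunk_samples(chunks, min_busy_ratio=0.8):
    """Keep only chunks whose download was mostly 'busy' (actually receiving bytes).

    Each chunk is (bytes_received, wall_time_s, busy_time_s); busy_time is the time
    spent receiving data, while wall_time also includes idle waiting for the chunk.
    """
    samples = []
    for nbytes, wall_s, busy_s in chunks:
        if wall_s > 0 and busy_s / wall_s >= min_busy_ratio:
            samples.append(nbytes * 8 / busy_s)   # bits per second over busy time only
    return samples

def estimate_bandwidth(samples):
    """Harmonic mean is robust to a few optimistic outliers."""
    return len(samples) / sum(1 / s for s in samples) if samples else 0.0

chunks = [(150_000, 0.30, 0.28), (150_000, 1.10, 0.25), (160_000, 0.32, 0.30)]
print(f"{estimate_bandwidth(filter_chunk_samples(chunks)) / 1e6:.2f} Mbps")
```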

Proceedings ArticleDOI
Bo Wang1, Yuan Zhang1, Size Qian1, Zipeng Pan1, Yuhong Xie1 
24 Jun 2021
TL;DR: In this article, a hybrid receiver-side congestion control (HRCC) framework was proposed, which combines a heuristic congestion control scheme with an RL-Agent that periodically generates a gain coefficient to tune the bandwidth estimated by the heuristic scheme.
Abstract: Web real-time communication (WebRTC) employs congestion control to ensure the quality of experience (QoE). Different from congestion control schemes for TCP, WebRTC keeps a low-level playback buffer that considers excessively delayed packets as losses, which makes the congestion control for WebRTC more challenging. Existing heuristic schemes estimate the network conditions based on hand-crafted rules that may be suboptimal, leading to under-utilization or over-utilization of link capacity in many cases. On the other hand, the existing learning-based schemes train a model that acts in a large action space, which is hard to converge to a stable status and has low performance over unpredictable network conditions. In this paper, we propose a hybrid receiver-side congestion control (HRCC) framework, which combines a heuristic congestion control scheme with an RL-Agent that periodically generates a gain coefficient to tune the bandwidth estimated by the heuristic scheme. Extensive simulation experiments demonstrate that the HRCC's RL-Agent effectively tunes the bandwidth estimate of the heuristic scheme. The hybrid scheme achieves higher bandwidth utilization than the fully heuristic scheme with similar queuing delay and packet loss and outperforms the fully RL-based scheme on overall performance.
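
The hybrid structure can be captured in a few lines: a heuristic estimator produces a baseline bandwidth estimate, and an RL agent periodically emits a multiplicative gain that corrects it. The heuristic rule, gain interval, and observation vector below are placeholders, not the paper's design.

```python
class HybridController:
    """Heuristic bandwidth estimate, periodically scaled by an RL-produced gain."""

    def __init__(self, rl_policy, gain_interval_ms=400):
        self.rl_policy = rl_policy            # maps an observation vector to a gain
        self.gain_interval_ms = gain_interval_ms
        self.gain = 1.0
        self.last_gain_update = 0.0

    def heuristic_estimate(self, recv_rate_kbps, delay_gradient):
        # Placeholder delay-based rule: back off when queuing delay is growing.
        return recv_rate_kbps * (0.85 if delay_gradient > 0 else 1.05)

    def target_bitrate(self, now_ms, recv_rate_kbps, delay_gradient, loss_rate):
        if now_ms - self.last_gain_update >= self.gain_interval_ms:
            obs = [recv_rate_kbps, delay_gradient, loss_rate, self.gain]
            self.gain = float(self.rl_policy(obs))   # e.g. constrained to [0.5, 2.0]
            self.last_gain_update = now_ms
        return self.heuristic_estimate(recv_rate_kbps, delay_gradient) * self.gain

ctrl = HybridController(rl_policy=lambda obs: 1.1)   # stand-in for a trained agent
print(ctrl.target_bitrate(now_ms=500, recv_rate_kbps=2500, delay_gradient=-0.2, loss_rate=0.0))
```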

Proceedings ArticleDOI
24 Jun 2021
TL;DR: uvgVenctester as discussed by the authors is an open-source test automation framework for video encoder performance and conformance testing with the desired set of test video sequences, which includes support for the popular AVC, HEVC, VVC, VP9, and AV1 video coding formats and the state-of-the-art HM, Kvazaar, x265, VTM, VVenC, SVT-VP9, and SVT-AV1 video encoders.
Abstract: The agile and efficient development of modern video encoders calls for automated testing methodologies. This paper presents the first-of-its-kind open-source test automation framework called uvgVenctester (github.com/ultravideo/uvgVenctester) that is designed for comprehensive performance and conformance testing of video encoders with the desired set of test video sequences. Our framework comes with built-in support for the popular AVC, HEVC, VVC, VP9, and AV1 video coding formats and the state-of-the-art HM, Kvazaar, x265, VTM, VVenC, SVT-VP9, and SVT-AV1 video encoders. Furthermore, there are no technical limitations to adopting other formats or encoders. Developers can evaluate the encoder of interest under the three primary usage scenarios: 1) conformance testing of the encoded bitstream; 2) rate-distortion-complexity comparison with the other encoders; and 3) systematic exploration of encoding parameters. The framework provides commonly used analysis tools to quantify encoding quality, speed, and bitrate with a versatile set of absolute and comparative results such as Bjontegaard Delta (BD)-Rate for PSNR, SSIM, and VMAF quality metrics. The supported output formats include CSV, graphs, and comparison tables. They ensure that the results are available in human- and machine-readable formats. To the best of our knowledge, the proposed framework is currently the most comprehensive and modular open-source software toolset for video encoder benchmarking.
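
The BD-Rate numbers such a framework reports follow the standard Bjontegaard calculation: fit a low-order polynomial of log-bitrate as a function of quality for each encoder, integrate both over the overlapping quality range, and convert the average log-rate gap into a percentage. A sketch of that calculation for illustration; it is not taken from the uvgVenctester source.

```python
import numpy as np

def bd_rate(rates_ref, quality_ref, rates_test, quality_test):
    """Average bitrate difference (%) of 'test' vs 'ref' at equal quality (BD-rate)."""
    log_r_ref, log_r_test = np.log10(rates_ref), np.log10(rates_test)
    # Cubic fit of log-rate as a function of quality (PSNR, SSIM, or VMAF).
    p_ref = np.polyfit(quality_ref, log_r_ref, 3)
    p_test = np.polyfit(quality_test, log_r_test, 3)
    lo = max(min(quality_ref), min(quality_test))
    hi = min(max(quality_ref), max(quality_test))
    int_ref = np.polyval(np.polyint(p_ref), hi) - np.polyval(np.polyint(p_ref), lo)
    int_test = np.polyval(np.polyint(p_test), hi) - np.polyval(np.polyint(p_test), lo)
    avg_diff = (int_test - int_ref) / (hi - lo)
    return (10 ** avg_diff - 1) * 100   # negative = test encoder needs fewer bits

rates_ref = [1000, 2000, 4000, 8000]          # kbps, e.g. a reference encoder
psnr_ref = [33.9, 36.5, 38.9, 41.0]
rates_test = [900, 1800, 3600, 7200]          # kbps, e.g. the encoder under test
psnr_test = [34.2, 36.9, 39.3, 41.4]
print(f"BD-rate: {bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):.1f}%")
```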

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, the authors evaluate the performance of Low-Latency HTTP Live Streaming (LL-HLS) and Low-Latency Dynamic Adaptive Streaming over HTTP (LL-DASH)-based players.
Abstract: Reducing end-to-end streaming latency is critical to HTTP-based live video streaming. There are currently two technologies in this domain: Low-Latency HTTP Live Streaming (LL-HLS) and Low-Latency Dynamic Adaptive Streaming over HTTP (LL-DASH). Many players support LL-HLS and/or LL-DASH protocols, including Apple's AVPlayer, Shaka player, HLS.js, Dash.js, and others. This paper is dedicated to the analysis of the performance of low-latency players and streaming protocols. The evaluation is based on a series of live streaming experiments, repeated using identical video content, encoders, encoding profiles, and network conditions, emulated by using traces of real-world networks. Several performance metrics, such as average stream bitrate, the amounts of downloaded media data, streaming latency, as well as buffering and stream switching statistics, are captured and reported in our experiments. These results are subsequently used to describe the observed differences in the performance of LL-HLS and LL-DASH-based players.

Proceedings ArticleDOI
24 Jun 2021
TL;DR: QPlane as discussed by the authors is an alternative toolkit for RL training of fixed wing aircraft, which is easily modifiable for different scenarios and is replicable and flexible for ease of implementation to high performance computing.
Abstract: Reinforcement Learning (RL) is a fast-growing field of research that is mostly applied in the realm of video games due to the compatibility of RL and game tasks. AI Gym has established itself as the gold standard toolkit for Reinforcement Learning research. Unfortunately, toolkits like AI Gym are highly optimized for benchmark purposes and may not always be suitable for real-world problems. Additionally, fixed-wing flight simulation has specific requirements and may need other solutions. In this paper, we propose QPlane as an alternative toolkit for RL training of fixed-wing aircraft. QPlane was developed in an effort to create an RL toolkit for fixed-wing aircraft simulation that is easily modifiable for different scenarios. QPlane is replicable and flexible for ease of implementation on high-performance computing, and is modular for quick environment and algorithm replacement. In this paper we present and discuss details of QPlane, as well as proof-of-concept results.

Proceedings ArticleDOI
Stefan Pham1, Mariana Avelino1, Daniel Silhavy1, Troung-Sinh An1, Stefan Arbanowski1 
24 Jun 2021
TL;DR: In this paper, the authors consider SAND (Server and Network Assisted DASH), CMCD (Common Media Client Data) and Streaming Quality of Experience Events, Properties and Metrics (CTA-2066) as standards to enable interoperable, standard-based streaming analytics for the predominant streaming formats MPEG-DASH and HLS.
Abstract: As OTT (over-the-top) media streaming and underlying technologies have matured, streaming analytics has become more important, especially in a heterogeneous device ecosystem, where new devices or software updates can potentially cause streaming issues. In this paper we consider SAND (Server and Network Assisted DASH), CMCD (Common Media Client Data) and Streaming Quality of Experience Events, Properties and Metrics (CTA-2066) as standards to enable interoperable, standard-based streaming analytics for the predominant streaming formats MPEG-DASH and HLS. We focus on the visualization aspect of streaming metrics in UI (user interface) dashboards.
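
As background, CMCD reporting amounts to the player attaching a small set of standardized keys (encoded bitrate, buffer length, measured throughput, session/content IDs, and so on) to each media request, typically as a CMCD query parameter, so that CDNs and analytics backends can correlate client state with delivery logs. A simplified sketch of building such a request; the key subset and values are illustrative.

```python
from urllib.parse import quote

def build_cmcd_query(metrics: dict) -> str:
    """Serialize CMCD key/value pairs into a single URL query parameter."""
    parts = []
    for key, value in sorted(metrics.items()):   # CMCD keys are sequenced alphabetically
        if isinstance(value, str):
            parts.append(f'{key}="{value}"')      # string values are quoted
        elif isinstance(value, bool):
            if value:
                parts.append(key)                  # boolean true is key-only
        else:
            parts.append(f"{key}={value}")
    return "CMCD=" + quote(",".join(parts), safe="")

metrics = {"br": 3200,                 # encoded bitrate of the requested object, kbps
           "bl": 21300,                # current buffer length, ms
           "mtp": 48000,               # measured throughput, kbps
           "sid": "6e2fb550-c457-11e9-bb97",   # session id (illustrative value)
           "cid": "movie-1234"}        # content id (illustrative value)
segment_url = "https://cdn.example.com/video/seg_42.m4s?" + build_cmcd_query(metrics)
print(segment_url)
```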

Proceedings ArticleDOI
15 Jul 2021
TL;DR: AMP as discussed by the authors is a system that ensures the authentication of media via certifying provenance by creating one or more publisher-signed manifests for a media instance uploaded by a content provider, which are stored in a database allowing fast lookup from applications such as browsers.
Abstract: Advances in graphics and machine learning have led to the general availability of easy-to-use tools for modifying and synthesizing media. The proliferation of these tools threatens to cast doubt on the veracity of all media. One approach to thwarting the flow of fake media is to detect modified or synthesized media through machine learning methods. While detection may help in the short term, we believe that it is destined to fail as the quality of fake media generation continues to improve. Soon, neither humans nor algorithms will be able to reliably distinguish fake versus real content. Thus, pipelines for assuring the source and integrity of media will be required---and increasingly relied upon. We present AMP, a system that ensures the authentication of media via certifying provenance. AMP creates one or more publisher-signed manifests for a media instance uploaded by a content provider. These manifests are stored in a database allowing fast lookup from applications such as browsers. For reference, the manifests are also registered and signed by a permissioned ledger, implemented using the Confidential Consortium Framework (CCF). CCF employs both software and hardware techniques to ensure the integrity and transparency of all registered manifests. AMP, through its use of CCF, enables a consortium of media providers to govern the service while making all its operations auditable. The authenticity of the media can be communicated to the user via visual elements in the browser, indicating that an AMP manifest has been successfully located and verified.
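
The provenance flow can be sketched in miniature: the publisher hashes the media, binds the hash and metadata into a manifest, and signs it; a verifier checks the signature against the publisher's public key and re-hashes the media. The Ed25519 key handling below illustrates the concept only and is not AMP's manifest format or its CCF integration.

```python
import hashlib, json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

publisher_key = Ed25519PrivateKey.generate()
public_key = publisher_key.public_key()

def create_manifest(media_bytes: bytes, publisher: str):
    """Publisher side: bind the media hash to metadata and sign it."""
    manifest = json.dumps({"sha256": hashlib.sha256(media_bytes).hexdigest(),
                           "publisher": publisher}, sort_keys=True).encode()
    return manifest, publisher_key.sign(manifest)

def verify(media_bytes: bytes, manifest: bytes, signature: bytes) -> bool:
    """Verifier side (e.g. a browser): check the signature and that the hash matches."""
    try:
        public_key.verify(signature, manifest)
    except InvalidSignature:
        return False
    claimed = json.loads(manifest)["sha256"]
    return claimed == hashlib.sha256(media_bytes).hexdigest()

video = b"...encoded media bytes..."
manifest, sig = create_manifest(video, publisher="Example News")
print(verify(video, manifest, sig), verify(video + b"tampered", manifest, sig))
```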

Proceedings ArticleDOI
15 Jul 2021
TL;DR: In this paper, a framework for quality-aware adaptive bitrate (ABR) streaming involving a per-session data budget constraint is proposed, with two planning strategies: one for the case where fine-grained perceptual quality information is known to the planning scheme, and another for the case where such information is not available.
Abstract: Over-the-top video (OTT) streaming accounts for the majority of traffic on cellular networks, and also places a heavy demand on users' limited monthly cellular data budgets. In contrast to much of traditional research that focuses on improving the quality, we explore a different direction---using data budget information to better manage the data usage of mobile video streaming, while minimizing the impact on users' quality of experience (QoE). Specifically, we propose a novel framework for quality-aware Adaptive Bitrate (ABR) streaming involving a per-session data budget constraint. Under the framework, we develop two planning based strategies, one for the case where fine-grained perceptual quality information is known to the planning scheme, and another for the case where such information is not available. Evaluations for a wide range of network conditions, using different videos covering a variety of content types and encodings, demonstrate that both these strategies use much less data compared to state-of-the-art ABR schemes, while still providing comparable QoE. Our proposed approach is designed to work in conjunction with existing ABR streaming workflows, enabling ease of adoption.
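
The budget-aware selection idea can be sketched as capping the chosen bitrate by the per-segment allowance implied by the remaining data budget, rather than always taking the highest sustainable rate. The bitrate ladder, safety margin, and even spreading of the budget below are assumptions, not the paper's planning algorithm.

```python
def pick_bitrate(ladder_kbps, throughput_kbps, budget_mb_left, segments_left, seg_dur_s=4):
    """Highest bitrate that both fits the throughput and keeps the session within budget."""
    # Per-segment data allowance if the remaining budget is spread evenly.
    allowance_kbps = (budget_mb_left * 8000) / (segments_left * seg_dur_s)
    cap = min(throughput_kbps * 0.9, allowance_kbps)   # 0.9 = assumed safety margin
    feasible = [r for r in ladder_kbps if r <= cap]
    return max(feasible) if feasible else min(ladder_kbps)

ladder = [400, 1200, 2400, 4800, 8000]                 # kbps
# Plenty of throughput, but the remaining budget only allows ~1600 kbps per segment.
print(pick_bitrate(ladder, throughput_kbps=9000, budget_mb_left=120, segments_left=150))
```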

Proceedings ArticleDOI
15 Jul 2021
TL;DR: In this article, the authors propose Livelyzer, a generalized active measurement and black-box testing framework for analyzing the performance of this component in popular live streaming software and services under controlled settings.
Abstract: Over-the-top (OTT) live video traffic has grown significantly, fueled by fundamental shifts in how users consume video content (e.g., increased cord-cutting) and by improvements in camera technologies, computing power, and wireless resources. A key determining factor for the end-to-end live streaming QoE is the design of the first-mile upstream ingest path that captures and transmits the live content in real-time, from the broadcaster to the remote video server. This path often involves either a Wi-Fi or cellular component, and is likely to be bandwidth-constrained with time-varying capacity, making the task of high-quality video delivery challenging. Today, there is little understanding of the state of the art in the design of this critical path, with existing research focused mainly on the downstream distribution path, from the video server to end viewers. To shed more light on the first-mile ingest aspect of live streaming, we propose Livelyzer, a generalized active measurement and black-box testing framework for analyzing the performance of this component in popular live streaming software and services under controlled settings. We use Livelyzer to characterize the ingest behavior and performance of several live streaming platforms, identify design deficiencies that lead to poor performance, and propose best practice design recommendations to improve the same.

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this paper, the authors show how SVC Scalable Video can be adaptated in the network in an effective way, when the Big Packet Protocol (BPP) is used.
Abstract: The essence of this work is to show how SVC scalable video can be adapted in the network in an effective way when the Big Packet Protocol (BPP) is used. This demo shows the advantages of BPP, which is a recently proposed transport protocol devised for real-time applications. We will show that in-network adaptation can be provided using this new protocol. We show how a network node can change the packets during their transmission, but still present a very usable video stream to the client. The preliminary results show that BPP is a good alternative transport for video transmission.

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, the authors proposed a dataset capturing statistics of several large-scale real-world streaming events, delivering videos to different devices (TVs, desktops, mobiles, tablets, etc.), and over different networks (from 2.5G, 3G, and other early generation mobile networks to 5G and broadband).
Abstract: We propose a dataset capturing statistics of several large-scale real-world streaming events, delivering videos to different devices (TVs, desktops, mobiles, tablets, etc.), and over different networks (from 2.5G, 3G, and other early-generation mobile networks to 5G and broadband). The data we capture include network-related statistics, playback statistics (session- and player-event-level), and characteristics of the encoded streams. Such data should enable a broad range of possible applications and uses in the research community: from analysis of the effectiveness of algorithms in streaming players to studies of QoE metrics and end-to-end system optimizations. Examples of such possible studies based on the proposed datasets are also provided.

Proceedings ArticleDOI
Filip Lemic1, Jakob Struye1, Jeroen Famaey1
24 Jun 2021
TL;DR: In this article, the authors present a simulator that maps VR users' movement in virtual worlds to their movement in shared physical spaces constrained through redirected walking, and that captures a set of performance metrics characterizing the number of perceivable resets and the distances between such resets for each user.
Abstract: Full-immersive multiuser Virtual Reality (VR) setups envision supporting seamless mobility of the VR users in the virtual worlds, while simultaneously constraining them inside shared physical spaces through redirected walking. For enabling high data rate and low latency delivery of video content in such setups, the supporting wireless networks will have to utilize highly directional communication links, where these links will ideally have to “track” the mobile VR users for maintaining the Line-of-Sight (LoS) connectivity. The design decisions about the mobility patterns of the VR users in the virtual worlds will thus have a substantial effect on the mobility of these users in the physical environments, and therefore also on the performance of the underlying networks. Hence, there is a need for a tool that can provide a mapping between design decisions about the users' mobility in the virtual worlds, and their effects on the mobility in constrained physical setups. To address this issue, we have developed and in this paper present a simulator for enabling this functionality. Given a set of VR users with their virtual movement trajectories, the outline of the physical deployment environment, and a redirected walking algorithm for avoiding physical collisions, the simulator is able to derive the physical movements of the users. Based on the derived physical movements, the simulator can capture a set of performance metrics characterizing the number of perceivable resets and the distances between such resets for each user. The simulator is also able to indicate the predictability of the physical movement trajectories, which can serve as an indication of the complexity of supporting a given virtual movement pattern by the underlying networks.

Proceedings ArticleDOI
24 Jun 2021
TL;DR: REEFT-360 as discussed by the authors is a real-time emulation framework that captures tile-quality adaptation under time-varying bandwidth conditions, paired with a multi-step evaluation process that allows the calculation of MS-SSIM scores and other frame-based metrics while accounting for the user's head movements.
Abstract: With 360° video streaming, the user's field of view (a.k.a. viewport) is at all times determined by the user's current viewing direction. Since any two users are unlikely to look in the exact same direction as each other throughout the viewing of a video, the frame-by-frame video sequence displayed during a playback session is typically unique. This complicates the direct comparison of the perceived Quality of Experience (QoE) using popular metrics such as the Multiscale-Structural Similarity (MS-SSIM). Furthermore, there is an absence of light-weight emulation frameworks for tiled-based 360° video streaming that allow easy testing of different algorithm designs and tile sizes. To address these challenges, we present REEFT-360, which consists of (1) a real-time emulation framework that captures tile-quality adaptation under time-varying bandwidth conditions and (2) a multi-step evaluation process that allows the calculation of MS-SSIM scores and other frame-based metrics, while accounting for the user's head movements. Importantly, the framework allows speedy implementation and testing of alternative head-movement prediction and tile-based prefetching solutions, allows testing under a wide range of network conditions, and can be used either with a human user or head-movement traces. The developed software tool is shared with the paper. We also present proof-of-concept evaluation results that highlight the importance of including a human subject in the evaluation.

Proceedings ArticleDOI
15 Jul 2021
TL;DR: In this paper, the authors study three different methods to produce a foveated video stream of real-time rendered graphics in a remote rendered system: (1) foveated shading as part of the rendering pipeline, (2) foveation as a post-processing step after rendering and before video encoding, and (3) foveated video encoding.
Abstract: Remote rendering systems comprise powerful servers that render graphics on behalf of low-end client devices and stream the graphics as compressed video, enabling high end gaming and Virtual Reality on those devices. One key challenge with them is the amount of bandwidth required for streaming high quality video. Humans have spatially non-uniform visual acuity: We have sharp central vision but our ability to discern details rapidly decreases with angular distance from the point of gaze. This phenomenon called foveation can be taken advantage of to reduce the need for bandwidth. In this paper, we study three different methods to produce a foveated video stream of real-time rendered graphics in a remote rendered system: 1) foveated shading as part of the rendering pipeline, 2) foveation as post processing step after rendering and before video encoding, 3) foveated video encoding. We report results from a number of experiments with these methods. They suggest that foveated rendering alone does not help save bandwidth. Instead, the two other methods decrease the resulting video bitrate significantly but they also have different quality per bit and latency profiles, which makes them desirable solutions in slightly different situations.
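
One common way to realize the foveated-encoding method (3) is a per-block quantization-parameter (QP) offset that grows with angular eccentricity from the gaze point. The eccentricity-to-QP mapping below is an invented example, not one of the paper's configurations.

```python
import numpy as np

def qp_offset_map(width_blocks, height_blocks, gaze_xy, fov_deg=110.0, max_offset=10):
    """Per-block QP offsets: 0 near the gaze point, up to max_offset in the periphery."""
    ys, xs = np.mgrid[0:height_blocks, 0:width_blocks]
    # Approximate angular eccentricity of each block centre from the gaze point.
    deg_per_block = fov_deg / width_blocks
    ecc_deg = np.hypot(xs - gaze_xy[0], ys - gaze_xy[1]) * deg_per_block
    # Assumed mapping: full quality within ~5 degrees, linear falloff to max_offset at 30 degrees.
    offsets = np.clip((ecc_deg - 5.0) / 25.0, 0.0, 1.0) * max_offset
    return offsets.round().astype(np.int8)

# 1920x1080 frame with 64x64 blocks -> 30x17 blocks; gaze slightly left of centre.
qp = qp_offset_map(30, 17, gaze_xy=(12, 8))
print(qp[8, 12], qp[8, 29])   # near-gaze block vs far-periphery block
```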

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, a content-aware playback speed control (CAPSC) algorithm is proposed for live streaming of sports content, which allows the streaming client to slow down the playback when there is a risk of stalling.
Abstract: There are two main factors that determine the viewer experience during the live streaming of sports content: latency and stalls. Latency should be low and stalls should not occur. Yet, these two factors work against each other and it is not trivial to strike the best trade-off between them. One of the best tools we have today to manage this trade-off is the adaptive playback speed control. This tool allows the streaming client to slow down the playback when there is a risk of stalling and increase the playback when there is no risk of stalling but the live latency is higher than desired. While adaptive playback generally works well, the artifacts due to the changes in the playback speed should preferably be unnoticeable to the viewers. However, this mostly depends on the portion of the audio/video content subject to the playback speed change. In this paper, we advance the state-of-the-art by developing a content-aware playback speed control (CAPSC) algorithm and demonstrate a number of examples showing its significance. We make the running code available and provide a demo page hoping that it will be a useful tool for the developers and content providers.
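
Setting the content-aware part aside, the underlying control rule can be sketched simply: slow playback down when the buffer is at risk, speed it up when latency exceeds the target, and keep the change within limits that stay unobtrusive. The thresholds below are assumptions; CAPSC's content awareness (choosing which portions of the audio/video absorb the change) is not modelled here.

```python
def playback_speed(buffer_s, latency_s, target_latency_s=3.0,
                   low_buffer_s=1.0, max_speedup=1.1, max_slowdown=0.9):
    """Pick a playback speed that balances stall risk against live latency."""
    if buffer_s < low_buffer_s:
        # Risk of stalling: slow down proportionally to how empty the buffer is.
        return max(max_slowdown, 1.0 - 0.1 * (low_buffer_s - buffer_s) / low_buffer_s)
    if latency_s > target_latency_s:
        # Behind the live edge with a healthy buffer: speed up, but stay subtle.
        return min(max_speedup, 1.0 + 0.05 * (latency_s - target_latency_s))
    return 1.0

for buf, lat in [(0.4, 3.5), (2.5, 5.0), (2.5, 2.8)]:
    print(buf, lat, round(playback_speed(buf, lat), 3))
```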

Proceedings ArticleDOI
15 Jul 2021
TL;DR: EScALation as mentioned in this paper proposes a frame sampling technique that utilizes the temporal correlation between frames and selects key frame(s) from a temporally correlated set of frames to perform bounding box detection.
Abstract: Spatio-temporal action localization aims to detect the spatial location and the start/end time of the action in a video. The state-of-the-art approach uses convolutional neural networks to extract possible bounding boxes for the action in each frame and then link bounding boxes into action tubes based on the location and the class-specific score of each bounding box. Though this approach has been successful at achieving a good localization accuracy, it is computation-intensive. High-end GPUs are usually demanded for it to achieve real-time performance. In addition, this approach does not scale well on a large number of action classes. In this work, we present a framework, EScALation, for making spatio-temporal action localization efficient and scalable. Our framework involves two main strategies. One is the frame sampling technique that utilizes the temporal correlation between frames and selects key frame(s) from a temporally correlated set of frames to perform bounding box detection. The other is the class filtering technique that exploits bounding box information to predict the action class prior to linking bounding boxes. We compare EScALation with the state-of-the-art approach on UCF101-24 and J-HMDB-21 datasets. One of our experiments shows EScALation is able to save 72.2% of the time with only 6.1% loss of mAP. In addition, we show that EScALation scales better to a large number of action classes than the state-of-the-art approach.
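
The frame-sampling strategy can be sketched as running the expensive detector only on key frames, declared whenever a simple inter-frame difference exceeds a threshold, and reusing those detections for the correlated frames in between. The difference measure and threshold are illustrative; the real system also applies class filtering before linking boxes into tubes.

```python
import numpy as np

def select_key_frames(frames, diff_threshold=12.0):
    """Return indices of frames on which to run the (expensive) bounding-box detector.

    A new key frame is declared whenever the mean absolute pixel difference from the
    previous key frame exceeds the threshold; intermediate frames reuse its detections.
    """
    key_indices = [0]
    reference = frames[0].astype(np.int16)
    for i, frame in enumerate(frames[1:], start=1):
        if np.abs(frame.astype(np.int16) - reference).mean() > diff_threshold:
            key_indices.append(i)
            reference = frame.astype(np.int16)
    return key_indices

# Synthetic clip: 30 static frames followed by 30 brighter frames -> 2 key frames.
clip = np.concatenate([np.full((30, 112, 112), 60, np.uint8),
                       np.full((30, 112, 112), 120, np.uint8)])
print(select_key_frames(clip))   # e.g. [0, 30]
```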

Proceedings ArticleDOI
24 Jun 2021
TL;DR: In this article, the authors present a visualization tool that helps assess the visual quality of a 3D representation employing various coding schemes, allowing for subjective testing by showing the differences between the selected encoding parameters.
Abstract: Recent years have seen a new uptake in immersive media and eXtended Reality (XR). And due to a global pandemic, computer-mediated communication over video conferencing tools became a new normal of everyday remote collaboration and virtual meetings. Social XR leverages XR technologies for remote communication and collaboration. But in order for XR to facilitate a high level of (social) presence and thus high-quality mediated social contact between users, we need high-quality 3D representations of users. One approach to providing detailed 3D user representations as new immersive media is to use point clouds or meshes, but these representation formats come with complexity in compression bitrate and processing time. In the example of virtual meetings, compression has to fulfill stringent requirements such as low latency and high quality. As the compression techniques for 3D immersive media steadily advance, it is important to be able to compare different compression techniques on their technical and visual merits in an easy way. The proposed demonstrator in this paper is a visualization tool that helps assess the visual quality of a 3D representation employing various coding schemes. The complete end-to-end rendering/encoding chain can be easily assessed, allowing for subjective testing by showing the differences between the selected encoding parameters. The tool presented in this demo paper offers an improved and easy visual process for the comparison of encoders of immersive media.

Proceedings ArticleDOI
24 Jun 2021
TL;DR: EvLag as discussed by the authors is a tool for adding latency to user input devices in Linux, regardless of the application being run, enabling user studies for systems and software that cannot be modified (e.g., commercial games).
Abstract: Understanding the effects of latency on interaction is important for building software, such as computer games, that perform well over a range of system configurations. Unfortunately, user studies evaluating latency must each write their own code to add latency to user input and, even worse, must limit themselves to open source applications. To address these shortcomings, this paper presents EvLag, a tool for adding latency to user input devices in Linux. EvLag provides a custom amount of latency for each device regardless of the application being run, enabling user studies for systems and software that cannot be modified (e.g., commercial games). Evaluation shows EvLag has low overhead and accurately adds the expected amount of latency to user input. In addition, EvLag can log user input events for post study analysis with several utilities provided to facilitate output event parsing.
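
Conceptually, such a tool sits between the kernel input device and applications: it grabs the real device, holds each event for the configured delay, and re-emits it through a virtual device. A rough sketch of that pattern with the python-evdev bindings, shown as an illustration of the mechanism rather than EvLag's implementation; the device path and delay are placeholders.

```python
import time
from collections import deque
from evdev import InputDevice, UInput

DELAY_S = 0.100                         # 100 ms of added input latency (placeholder)
dev = InputDevice("/dev/input/event4")  # placeholder path of the physical mouse/keyboard
dev.grab()                              # applications now only see the virtual device
ui = UInput.from_device(dev, name="delayed-input")

pending = deque()                       # (release_time, event) queue; FIFO preserves order
for event in dev.read_loop():           # blocking loop; a real tool would poll non-blockingly
    pending.append((time.monotonic() + DELAY_S, event))
    # Re-emit every queued event whose delay has elapsed.
    while pending and pending[0][0] <= time.monotonic():
        _, delayed = pending.popleft()
        ui.write(delayed.type, delayed.code, delayed.value)
    ui.syn()
```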