
Showing papers presented at "ACM SIGMM Conference on Multimedia Systems in 2018"


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents a novel dataset of 360° videos with associated eye and head movement data, a follow-up to the authors' previous dataset for still images; the dataset and its associated code are made publicly available to support research on visual attention for 360° content.
Abstract: Research on visual attention in 360° content is crucial to understand how people perceive and interact with this immersive type of content, to develop efficient techniques for processing, encoding, delivering and rendering it, and to offer a high quality of experience to end users. The availability of public datasets is essential to support and facilitate the research activities of the community. Recently, some studies have been presented analyzing the exploration behaviors of people watching 360° videos, and a few datasets have been published. However, the majority of these works only consider head movements as a proxy for gaze data, despite the importance of eye movements in the exploration of omnidirectional content. Thus, this paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to our previous dataset for still images [14]. Head and eye tracking data was obtained from 57 participants during a free-viewing experiment with 19 videos. In addition, guidelines on how to obtain saliency maps and scanpaths from the raw data are provided, along with statistics on exploration behaviors, such as the impact of the longitudinal starting position when watching omnidirectional videos. This dataset and its associated code are made publicly available to support research on visual attention for 360° content.
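As an illustration of the kind of processing the guidelines above describe, the following is a minimal, hypothetical sketch of turning gaze samples into a fixation-density ("saliency") map over an equirectangular frame. The coordinate convention, resolution and Gaussian spread are assumptions, not the dataset's actual format.

```python
# Hypothetical sketch: accumulate gaze samples (longitude/latitude in degrees)
# into a blurred fixation-density map over an equirectangular frame.
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(gaze_lon_deg, gaze_lat_deg, width=2048, height=1024, sigma_deg=3.5):
    sal = np.zeros((height, width), dtype=np.float64)
    # Map longitude [-180, 180) and latitude [-90, 90] to pixel coordinates.
    x = ((np.asarray(gaze_lon_deg) + 180.0) / 360.0 * width).astype(int) % width
    y = ((90.0 - np.asarray(gaze_lat_deg)) / 180.0 * height).astype(int).clip(0, height - 1)
    np.add.at(sal, (y, x), 1.0)
    # Approximate foveal spread with an isotropic Gaussian (ignores pole distortion).
    sal = gaussian_filter(sal, sigma=sigma_deg / 360.0 * width)
    return sal / sal.max() if sal.max() > 0 else sal

# Example: random gaze samples concentrated near the equator.
rng = np.random.default_rng(0)
smap = saliency_map(rng.uniform(-180, 180, 500), rng.normal(0, 20, 500))
print(smap.shape, smap.max())
```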

138 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: The purpose of this dataset is to provide additional information (such as competing metrics for users connected to the same cell), thus giving the end user otherwise unavailable insight into the eNodeB environment and its scheduling principle.
Abstract: In this paper, we present a 4G trace dataset composed of client-side cellular key performance indicators (KPIs) collected from two major Irish mobile operators, across different mobility patterns (static, pedestrian, car, bus and train). The 4G trace dataset contains 135 traces, with an average duration of fifteen minutes per trace, with viewable throughput ranging from 0 to 173 Mbit/s at a granularity of one sample per second. Our traces are generated from a well-known non-rooted Android network monitoring application, G-NetTrack Pro. This tool enables capturing various channel-related KPIs, context-related metrics, downlink and uplink throughput, and also cell-related information. To the best of our knowledge, this is the first publicly available dataset that contains throughput, channel and context information for 4G networks. To supplement our real-time 4G production network dataset, we also provide a synthetic dataset generated from a large-scale 4G ns-3 simulation that includes one hundred users randomly scattered across a seven-cell cluster. The purpose of this dataset is to provide additional information (such as competing metrics for users connected to the same cell), thus providing the end user with otherwise unavailable information about the eNodeB environment and scheduling principle. In addition to this dataset, we also provide the code and context information to allow other researchers to generate their own synthetic datasets.
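For a sense of how a trace like those described could be consumed, here is an illustrative loader; the file name and column labels ("DL_bitrate" in kbit/s, one row per second) are assumptions about the CSV layout rather than the dataset's documented schema.

```python
# Illustrative only: read one per-second trace and summarise downlink throughput.
import pandas as pd

def summarise_trace(path):
    df = pd.read_csv(path)
    # Assumes one sample per second and downlink bitrate logged in kbit/s.
    return {
        "duration_s": len(df),
        "mean_dl_mbps": df["DL_bitrate"].mean() / 1000.0,
        "peak_dl_mbps": df["DL_bitrate"].max() / 1000.0,
    }

# Usage (hypothetical path):
# print(summarise_trace("traces/car/trace_017.csv"))
```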

121 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: Three novel adaptive bitrate algorithms are developed that provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and fewer bitrate oscillations, and perform very well for live streams that require low latency, a challenging scenario for ABR algorithms.
Abstract: Modern video streaming uses adaptive bitrate (ABR) algorithms that run inside video players and continually adjust the quality (i.e., bitrate) of the video segments that are downloaded and rendered to the user. To maximize the quality-of-experience of the user, ABR algorithms must stream at a high bitrate with low rebuffering and low bitrate oscillations. Further, a good ABR algorithm is responsive to user and network events and can be used in demanding scenarios such as low-latency live streaming. Recent research papers provide an abundance of ABR algorithms, but fall short on many of the above real-world requirements. We develop Sabre, an open-source, publicly available simulation tool that enables fast and accurate simulation of adaptive streaming environments. We used Sabre to design and evaluate BOLA-E and DYNAMIC, two novel ABR algorithms. We also developed a FAST SWITCHING algorithm that can replace segments that have already been downloaded with higher-bitrate (thus higher-quality) segments. The new algorithms provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and fewer bitrate oscillations. In addition, these algorithms react faster to user events such as startup and seek, and respond more quickly to network events such as improvements in throughput. Further, they perform very well for live streams that require low latency, a challenging scenario for ABR algorithms. Overall, our algorithms offer superior video QoE and responsiveness for real-life adaptive video streaming, in comparison to the state-of-the-art. Importantly, all three algorithms presented in this paper are now part of the official DASH reference player dash.js and are being used by video providers in production environments. While our evaluation and implementation are focused on the DASH environment, our algorithms are equally applicable to other adaptive streaming formats such as Apple HLS.
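To make the ABR decision loop the abstract refers to concrete, here is a deliberately simple throughput- and buffer-aware rate pick. This is not the paper's BOLA-E, DYNAMIC, or FAST SWITCHING logic; the safety margin and buffer threshold are invented parameters.

```python
# Generic illustrative rate selection, not the paper's algorithms.
def pick_bitrate(bitrates_kbps, throughput_est_kbps, buffer_s,
                 safety=0.9, low_buffer_s=10.0):
    """Return the highest bitrate that fits the discounted throughput estimate;
    fall back to the lowest rate when the buffer is nearly empty."""
    if buffer_s < low_buffer_s:
        return min(bitrates_kbps)
    budget = throughput_est_kbps * safety
    feasible = [b for b in sorted(bitrates_kbps) if b <= budget]
    return feasible[-1] if feasible else min(bitrates_kbps)

print(pick_bitrate([300, 750, 1500, 3000, 6000],
                   throughput_est_kbps=4200, buffer_s=22))  # -> 3000
```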

99 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper describes an open dataset and software for ITU-T Rec. P.1203, the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming, and shows the significant performance improvements of using bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.
Abstract: This paper describes an open dataset and software for ITU-T Rec. P.1203. As the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming (HAS), it has been extensively trained and validated on over a thousand audiovisual sequences containing HAS-typical effects (such as stalling, coding artifacts, quality switches). Our dataset comprises four of the 30 official subjective databases at a bitstream feature level. The paper also includes the subjective results and the model's performance. Our software implementation of the standard has been made publicly available as well and is used for all the analyses presented. Among other previously unpublished details, we show the significant performance improvements of using bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.
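The model-performance figures mentioned above are typically reported as correlation and error against subjective scores. The sketch below shows that computation on made-up numbers; it does not use the P.1203 software itself, and all score values are placeholders.

```python
# Minimal sketch: Pearson correlation and RMSE between predicted and subjective MOS.
# The values below are invented for illustration only.
import numpy as np
from scipy.stats import pearsonr

predicted_mos = np.array([3.8, 2.1, 4.5, 3.0, 1.9])
subjective_mos = np.array([3.6, 2.4, 4.4, 3.3, 2.0])

r, _ = pearsonr(predicted_mos, subjective_mos)
rmse = np.sqrt(np.mean((predicted_mos - subjective_mos) ** 2))
print(f"PCC={r:.3f}  RMSE={rmse:.3f}")
```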

95 citations


Proceedings ArticleDOI
Liyang Sun, Fanyi Duanmu, Yong Liu, Yao Wang, Yinghua Ye, Hang Shi, David Dai
12 Jun 2018
TL;DR: Novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction to achieve a high-level of Quality-of-Experience (QoE) in the challenging 5G wireless network environment.
Abstract: 360° video streaming is a key component of the emerging Virtual Reality (VR) and Augmented Reality (AR) applications. In 360° video streaming, a user may freely navigate through the captured 360° video scene by changing her desired Field-of-View. High-throughput and low-delay data transfers enabled by 5G wireless networks can potentially facilitate an untethered 360° video streaming experience. Meanwhile, the high volatility of 5G wireless links presents unprecedented challenges for smooth 360° video streaming. In this paper, novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction. We systematically investigate various design trade-offs on streaming quality and robustness. Through simulations driven by real 5G network bandwidth traces and user viewing direction traces, we demonstrate that the proposed 360° video streaming solutions can achieve a high level of Quality-of-Experience (QoE) in the challenging 5G wireless network environment.

69 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: It is shown that there exist significant latency-throughput trade-offs, but the behavior is very complex; several factors that affect performance and yield this complex behavior are demonstrated.
Abstract: We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems. However, the system performance depends largely on the utilization of hardware accelerators, which are able to speed up the execution of the underlying mathematical operations tremendously through massive parallelism. Our contribution is performance characterization of multiple CNN-based models for object recognition and detection with several different hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that there exist significant latency-throughput trade-offs, but the behavior is very complex. We demonstrate and discuss several factors that affect the performance and yield this complex behavior.
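A rough timing harness in the spirit of the measurements described (latency vs. throughput for CNN inference) is sketched below. The callable model, input shape, and batch sizes are placeholders; the paper's own setups (TensorFlow/TensorRT, embedded vs. server platforms) differ.

```python
# Illustrative benchmark loop for any inference callable; not the paper's harness.
import time
import numpy as np

def benchmark(infer_fn, input_shape, batch_sizes=(1, 8, 32), repeats=20):
    results = {}
    for b in batch_sizes:
        x = np.random.rand(b, *input_shape).astype(np.float32)
        infer_fn(x)                      # warm-up run
        t0 = time.perf_counter()
        for _ in range(repeats):
            infer_fn(x)
        dt = (time.perf_counter() - t0) / repeats
        results[b] = {"latency_ms": dt * 1000, "throughput_img_s": b / dt}
    return results

# Usage with any callable model, e.g. a Keras model or an ONNX Runtime session wrapper:
# print(benchmark(lambda x: model(x, training=False), (224, 224, 3)))
```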

68 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: The paper presents the general angular ranges of the subjects' exploration behavior as well as an analysis of the areas where most of the time was spent, and several methods for evaluating head rotation data are presented and discussed.
Abstract: In this paper, we present a viewing test with 48 subjects watching 20 different entertaining omnidirectional videos on an HTC Vive Head Mounted Display (HMD) in a task-free scenario. While the subjects were watching the contents, we recorded their head movements. The obtained dataset is publicly available, in addition to the links and timestamps of the source contents used. In this study, subjects were also asked to fill in the Simulator Sickness Questionnaire (SSQ) after every viewing session. In this paper, the SSQ results are presented first. Several methods for evaluating head rotation data are then presented and discussed. In the course of the study, the collected dataset is published along with the scripts for evaluating the head rotation data. The paper presents the general angular ranges of the subjects' exploration behavior as well as an analysis of the areas where most of the time was spent. The collected information can also be presented as head-saliency maps. In the case of videos, head-saliency data can be used for training saliency models, as information for evaluating decisions during content creation, or as part of streaming solutions for region-of-interest-specific coding as with the latest tile-based streaming solutions, as discussed also in standardization bodies such as MPEG.
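As a hedged illustration of the head-rotation evaluation mentioned above, the helper below converts an orientation quaternion to yaw/pitch angles, from which angular exploration ranges or head-saliency maps could be accumulated. It uses the standard aerospace Z-Y-X convention; the actual logging format of the dataset (often a Y-up axis convention for HMDs) is an assumption and may require remapping the axes.

```python
# Hypothetical helper: unit quaternion (w, x, y, z) -> yaw/pitch in degrees,
# standard Z-Y-X (aerospace) convention.
import math

def quat_to_yaw_pitch(w, x, y, z):
    yaw = math.degrees(math.atan2(2.0 * (w * z + x * y),
                                  1.0 - 2.0 * (y * y + z * z)))
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x)))))
    return yaw, pitch

print(quat_to_yaw_pitch(1.0, 0.0, 0.0, 0.0))  # identity orientation -> (0.0, 0.0)
```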

64 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: A unique dataset containing sensor data collected from patients suffering from depression, which contains motor activity recordings of 23 unipolar and bipolar depressed patients and 32 healthy controls is presented.
Abstract: Wearable sensors measuring different parts of people's activity are a common technology nowadays. In research, data collected using these devices also draws attention. Nevertheless, datasets containing sensor data in the field of medicine are rare. Often, data is non-public and only results are published. This makes it hard for other researchers to reproduce and compare results, or even collaborate. In this paper we present a unique dataset containing sensor data collected from patients suffering from depression. The dataset contains motor activity recordings of 23 unipolar and bipolar depressed patients and 32 healthy controls. For each patient we provide sensor data over several days of continuous measuring as well as some demographic data. The severity of the patients' depressive state was labeled using ratings done by medical experts on the Montgomery-Asberg Depression Rating Scale (MADRS). In this respect, the dataset presented here can be useful for better exploring and understanding the association between depression and motor activity. By making this dataset available, we invite and enable interested researchers to tackle this challenging and important societal problem.
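The following is an illustrative loader for the kind of continuous actigraphy recordings described, computing a per-day activity profile for one participant. The file path and column names ("timestamp", "activity") are assumptions about the dataset layout, not its documented schema.

```python
# Illustrative only: per-day activity statistics from a continuous actigraphy CSV.
import pandas as pd

def daily_activity(path):
    df = pd.read_csv(path, parse_dates=["timestamp"])
    df["date"] = df["timestamp"].dt.date
    return df.groupby("date")["activity"].agg(["mean", "std", "max"])

# Usage (hypothetical file):
# print(daily_activity("condition/condition_1.csv").head())
```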

63 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents the first characterization of the prefetch aggressiveness tradeoff, and provides insights into how to best design tomorrow's delivery systems for 360° videos, allowing content providers to reduce bandwidth costs and improve users' playback experiences.
Abstract: With 360° video, only a limited fraction of the full view is displayed at each point in time. This has prompted the design of streaming delivery techniques that allow alternative playback qualities to be delivered for each candidate viewing direction. However, while prefetching based on the user's expected viewing direction is best done close to playback deadlines, large buffers are needed to protect against shortfalls in future available bandwidth. This results in conflicting goals and an important prefetch aggressiveness tradeoff problem regarding how far ahead in time from the current play-point prefetching should be done. This paper presents the first characterization of this tradeoff. The main contributions include an empirical characterization of head movement behavior based on data from viewing sessions of four different categories of 360° video, an optimization-based comparison of the prefetch aggressiveness tradeoffs seen for these video categories, and a data-driven discussion of further optimizations, which include a novel system design that allows both tradeoff objectives to be targeted simultaneously. By qualitatively and quantitatively analyzing the above tradeoffs, we provide insights into how to best design tomorrow's delivery systems for 360° videos, allowing content providers to reduce bandwidth costs and improve users' playback experiences.

50 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: In this paper, the authors proposed a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions.
Abstract: The number of bandwidth-hungry applications and services is constantly growing. HTTP adaptive streaming of audio-visual content accounts for the majority of today's internet traffic. Although internet bandwidth also increases constantly, audio-visual compression technology remains indispensable, and we currently face the challenge of dealing with multiple video codecs. This paper proposes a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions. We adopt state-of-the-art encoding and packaging options and also provide basic quality metrics along with the DASH segments. Additionally, we briefly introduce a multi-codec DASH scheme and possible usage scenarios. Finally, we provide a preliminary evaluation of the encoding efficiency in the context of HTTP adaptive streaming services and applications.
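A small sketch of how such a multi-codec dataset could be inspected is shown below: it lists codec, bandwidth, and resolution of every representation in a DASH MPD. The manifest path is a placeholder; attribute names follow the MPEG-DASH MPD schema.

```python
# Illustrative MPD inspection using only the standard library.
import xml.etree.ElementTree as ET

MPD_NS = "{urn:mpeg:dash:schema:mpd:2011}"

def list_representations(mpd_path):
    root = ET.parse(mpd_path).getroot()
    for aset in root.iter(MPD_NS + "AdaptationSet"):
        for rep in aset.iter(MPD_NS + "Representation"):
            codec = rep.get("codecs") or aset.get("codecs")
            print(rep.get("id"), codec, rep.get("bandwidth"),
                  rep.get("width"), rep.get("height"))

# Usage (hypothetical path):
# list_representations("av1/manifest.mpd")
```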

46 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: PERCEIVE is a two-stage method for predicting the perceived quality of adaptive VR videos streamed through mobile networks: it first predicts VR video playout performance from network Quality of Service (QoS) indicators and then uses the predicted playout metrics to model and estimate end-user perceived quality.
Abstract: The demand for Virtual Reality (VR) video streaming to mobile devices is booming, as VR becomes accessible to the general public. However, the variability of conditions of mobile networks affects the perception of this type of high-bandwidth-demanding service in unexpected ways. In this situation, there is a need for novel performance assessment models fit to the new VR applications. In this paper, we present PERCEIVE, a two-stage method for predicting the perceived quality of adaptive VR videos when streamed through mobile networks. By means of machine learning techniques, our approach is able to first predict adaptive VR video playout performance, using network Quality of Service (QoS) indicators as predictors. In a second stage, it employs the predicted VR video playout performance metrics to model and estimate end-user perceived quality. The evaluation of PERCEIVE has been performed in a real-world environment, in which VR videos are streamed while subjected to LTE/4G network conditions. The accuracy of PERCEIVE has been assessed by means of the residual error between predicted and measured values. Our approach predicts the different performance metrics of the VR playout with an average prediction error lower than 3.7% and estimates the perceived quality with a prediction error lower than 4% for over 90% of all the tested cases. Moreover, it allows us to pinpoint the QoS conditions that affect adaptive VR streaming services the most.
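A schematic two-stage regression in the same spirit is sketched below: stage 1 maps network QoS features to playout metrics, stage 2 maps those metrics to a quality score. This is not PERCEIVE itself; feature names and the random data are placeholders.

```python
# Schematic two-stage prediction pipeline (illustrative data and features).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
qos = rng.random((200, 4))       # e.g. RSRP, RSRQ, throughput, RTT (assumed features)
playout = rng.random((200, 3))   # e.g. bitrate, stall ratio, quality switches (assumed)
mos = rng.random(200) * 4 + 1    # synthetic 1..5 opinion scores

stage1 = RandomForestRegressor(n_estimators=50).fit(qos, playout)   # QoS -> playout
stage2 = RandomForestRegressor(n_estimators=50).fit(playout, mos)   # playout -> quality

predicted_mos = stage2.predict(stage1.predict(qos))
print(predicted_mos[:5])
```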

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This demo paper presents efforts towards Social VR services based on photo-realistic video recordings, and focuses on the communication between multiple people (max 3) and the integration of new media formats to represent users as 3D point clouds.
Abstract: Virtual Reality (VR) and 360-degree video are set to become part of the future social environment, enriching and enhancing the way we share experiences and collaborate remotely. While Social VR applications are gaining momentum, most Social VR services focus on animated avatars. In this demo, we present our efforts towards Social VR services based on photo-realistic video recordings. In this demo paper, we focus on two parts: the communication between multiple people (at most three) and the integration of new media formats to represent users as 3D point clouds. We enhance a green screen (chroma key) like cut-out of the person with depth data, allowing point cloud based rendering in the client. Further, the paper presents a user study with 54 people evaluating a three-person communication use case and a technical analysis of moving towards 3D representations of users. This demo consists of two shared virtual environments to communicate and interact with others, i.e. i) a 360-degree virtual space with users being represented as 2D video streams (with the background removed) and ii) a 3D space with users being represented as point clouds (based on color and depth video data).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: In this paper, a traffic profiling solution is proposed to passively estimate parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers by observing IP packet arrivals and detecting the state of an HAS client's play-back buffer in real time.
Abstract: Accurate cross-layer information is very useful to optimize mobile networks for specific applications. However, providing application-layer information to lower protocol layers has become very difficult due to the wide adoption of end-to-end encryption and due to the absence of cross-layer signaling standards. As an alternative, this paper presents a traffic profiling solution to passively estimate parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers. By observing IP packet arrivals, our machine learning system identifies video flows and detects the state of an HAS client's play-back buffer in real time. Our experiments with YouTube's mobile client show that Random Forests achieve very high accuracy even with a strong variation of link quality. Since this high performance is achieved at IP level with a small, generic feature set, our approach requires no Deep Packet Inspection (DPI), comes at low complexity, and does not interfere with end-to-end encryption. Traffic profiling is, thus, a powerful new tool for monitoring and managing even encrypted HAS traffic in mobile networks.
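A schematic version of the pipeline described above is sketched below: simple features are derived from IP packet arrivals in a time window and a Random Forest classifies the player's buffer state. Features, labels, and data are illustrative placeholders, not the paper's feature set.

```python
# Schematic traffic-profiling pipeline with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(arrival_times, packet_sizes):
    """Per-window features: byte volume, packet count, mean/std inter-arrival time."""
    iat = np.diff(arrival_times) if len(arrival_times) > 1 else np.array([0.0])
    return [packet_sizes.sum(), len(arrival_times), iat.mean(), iat.std()]

rng = np.random.default_rng(2)
X = np.array([window_features(np.sort(rng.random(50)), rng.integers(60, 1500, 50))
              for _ in range(300)])
y = rng.integers(0, 2, 300)   # 0 = buffer filling, 1 = steady state (synthetic labels)

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict(X[:5]))
```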

Proceedings ArticleDOI
12 Jun 2018
TL;DR: Four datasets extracted from gynecologic, laparoscopic interventions are published with the intent of encouraging research in the field of post-surgical automatic media analysis, designed with the following use cases in mind: medical image retrieval based on a query image, and detection of instrument counts, surgical actions and anatomical structures.
Abstract: Modern imaging technology enables medical practitioners to perform minimally invasive surgery (MIS), i.e. a variety of medical interventions inflicting minimal trauma upon patients, hence greatly improving their recoveries. Not only patients but also surgeons can benefit from this technology, as recorded media can be utilized for speeding up tedious and time-consuming tasks such as treatment planning or case documentation. In order to improve the predominantly manually conducted process of analyzing said media, with this work we publish four datasets extracted from gynecologic, laparoscopic interventions with the intent of encouraging research in the field of post-surgical automatic media analysis. These datasets are designed with the following use cases in mind: medical image retrieval based on a query image, detection of instrument counts, surgical actions and anatomical structures, as well as distinguishing on which anatomical structure a certain action is performed. Furthermore, we provide suggestions for evaluation metrics and first baseline experiments.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The newly released Canadian French Emotional (CaFE) speech dataset is introduced and details about its design and content are given.
Abstract: Until recently, there was no emotional speech dataset available in Canadian French. This was a limiting factor for research activities not only in Canada, but also elsewhere. This paper introduces the newly released Canadian French Emotional (CaFE) speech dataset and gives details about its design and content. This dataset contains six different sentences, pronounced by six male and six female actors, in six basic emotions plus one neutral emotion. The six basic emotions are acted in two different intensities. The audio is digitally recorded at high-resolution (192 kHz sampling rate, 24 bits per sample). This new dataset is freely available under a Creative Commons license (CC BY-NC-SA 4.0).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The primary goal of the dataset is to provide the wide range of video content required for validating DASH Quality of Experience (QoE) delivery over networks, ranging from constrained cellular and satellite systems to future high speed architectures such as the proposed 5G mmwave technology.
Abstract: In this paper we present a Multi-Profile Ultra High Definition (UHD) DASH dataset composed of both AVC (H.264) and HEVC (H.265) video content, generated from three well-known open-source 4K video clips. The representation rates and resolutions of our dataset range from 40Mbps in 4K down to 235kbps in 320x240, and are comparable to rates utilised by on-demand services such as Netflix, YouTube and Amazon Prime. We provide our dataset for both real-time testbed evaluation and trace-based simulation. The real-time testbed content provides a means of evaluating DASH adaptation techniques on physical hardware, while our trace-based content offers simulation over frameworks such as ns-2 and ns-3. We also provide the original pre-DASH MP4 files and our associated DASH generation scripts, so as to provide researchers with a mechanism to create their own DASH profile content locally. This improves the reproducibility of results and removes re-buffering issues caused by delay/jitter/losses in the Internet. The primary goal of our dataset is to provide the wide range of video content required for validating DASH Quality of Experience (QoE) delivery over networks, ranging from constrained cellular and satellite systems to future high-speed architectures such as the proposed 5G mmWave technology.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The Valid.IoT Framework is proposed as an attachable IoT framework component that can be linked to a variety of platforms to generate QoI vectors and interpolated sensory data with plausibility and quality estimations.
Abstract: Heterogeneous sensor device networks with diverse maintainers and information collected via social media as well as crowdsourcing tend to be elements of uncertainty in IoT and Smart City networks. Often, there is no ground truth available that can be used to check the plausibility and concordance of the new information. This paper proposes the Valid.IoT Framework as an attachable IoT framework component that can be linked to generate QoI vectors and Interpolated sensory data with plausibility and quality estimations to a variety of platforms. The framework utilises extended infrastructure knowledge and infrastructure-aware interpolation algorithms to validate crowdsourced and device generated sensor information through sensor fusion.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This demo captures and presents the user's EEG, heart rate, EDA and head motion during the use of an AT VR application; the prototype is composed of a sensor system (wearable sensors for acquiring biological signals) and a presentation system (a virtual wheelchair simulator that interfaces to a typical LCD display).
Abstract: The key aim of various assistive technology (AT) systems is to augment an individual's functioning whilst supporting an enhanced quality of life (QoL). In recent times, we have seen the emergence of Virtual Reality (VR) based assistive technology systems made possible by the availability of commercially available Head Mounted Displays (HMDs). The use of VR for AT aims to support levels of interaction and immersion not previously possible with more traditional AT solutions. Crucial to the success of these technologies is understanding, from the user perspective, the influencing factors that affect the user Quality of Experience (QoE). In addition to the typical QoE metrics, other factors to consider are human behavior such as mental and emotional state, posture and gestures. In terms of trying to objectively quantify such factors, there is a wide range of wearable sensors that are able to monitor physiological signals and provide reliable data. In this demo, we capture and present the user's EEG, heart rate, EDA and head motion during the use of an AT VR application. The prototype is composed of a sensor system, made up of wearable sensors for acquiring biological signals, and a presentation system, the virtual wheelchair simulator that interfaces to a typical LCD display.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents a practical implementation integrated in the dash.js reference player and provides substantial comparisons against the state-of-the-art methods using trace-driven and real-world experiments.
Abstract: In streaming media, it is imperative to deliver a good viewer experience to preserve customer loyalty. Prior research has shown that this is rather difficult when shared Internet resources struggle to meet the demand from streaming clients that are largely designed to behave in their own self-interest. To date, several schemes for adaptive streaming have been proposed to address this challenge with varying success. In this paper, we take a different approach and develop a game-theoretic solution. We present a practical implementation integrated in the dash.js reference player and provide substantial comparisons against the state-of-the-art methods using trace-driven and real-world experiments. Our approach outperforms its competitors in the average viewer experience by 38.5% and in video stability by 62%.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: An MPEG DASH-SRD player for Android and the Samsung Gear VR is built, featuring FoV-based quality decision and a replacement strategy to allow the tiles' buffers to build up while keeping their state up-to-date with the current FoV as much as bandwidth allows.
Abstract: Streaming Virtual Reality (VR), even under the mere form of 360° videos, is much more complex than for regular videos because, to lower the required rates, the transmission decisions must take the user's head position into account. The way the user exploits her/his freedom is therefore crucial for the network load. In turn, the way the user moves depends on the video content itself. VR is however a whole new medium, for which the film-making language does not exist yet; its "grammar" is only now being invented. We present a strongly inter-disciplinary approach to improve the streaming of 360° videos: designing high-level content manipulations (film editing) to limit and even control the user's motion in order to consume less bandwidth while maintaining the user's experience. We build an MPEG DASH-SRD player for Android and the Samsung Gear VR, featuring FoV-based quality decisions and a replacement strategy that allows the tiles' buffers to build up while keeping their state up-to-date with the current FoV as much as bandwidth allows. The editing strategies we design have been integrated within the player, and the streaming module has been extended to benefit from the editing. Two sets of user experiments enabled us to show that editing indeed impacts head velocity (reduction of up to 30%), consumed bandwidth (reduction of up to 25%) and subjective assessment. Attention-driving tools from other communities can hence be designed in order to improve streaming. We believe this innovative work opens up the path to a whole new field of possibilities in defining degrees of freedom to be wielded for VR streaming optimization.
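For intuition about the FoV-based quality decision mentioned above, here is an illustrative per-tile quality assignment: tiles whose centres fall inside the current field of view get the high quality, the rest the low one. The tile grid (equal longitude bands) and FoV width are assumptions, not the player's actual logic.

```python
# Illustrative FoV-based tile quality decision for an equirectangular tiling.
def tile_qualities(fov_center_deg, fov_width_deg=100.0, n_tiles=8):
    qualities = []
    for i in range(n_tiles):
        center = -180.0 + (i + 0.5) * 360.0 / n_tiles
        # Shortest angular distance between tile centre and FoV centre.
        dist = abs((center - fov_center_deg + 180.0) % 360.0 - 180.0)
        qualities.append("high" if dist <= fov_width_deg / 2.0 else "low")
    return qualities

print(tile_qualities(fov_center_deg=30.0))
```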

Proceedings ArticleDOI
12 Jun 2018
TL;DR: A public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months is provided, providing the basis for experience-based video analytics.
Abstract: Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal behind this kind of surgery is to replace the human eye lens with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under microscopy, but co-mounted cameras allow recording and archiving the procedure. Currently, the recorded videos are used in a postoperative manner for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work proposes Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences, and presents how Cardea can be integrated into privacy-protecting camera apps and online social media and networking sites.
Abstract: The growing popularity of mobile and wearable devices with built-in cameras and of social media sites is now threatening people's visual privacy. Motivated by recent user studies showing that people's visual privacy concerns are closely related to context, we propose Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences. We define four context elements in a photo, including location, scene, others' presence, and hand gestures. Users can specify their context-dependent privacy preferences based on the above four elements. Cardea will offer fine-grained visual privacy protection service to those who request protection using their identifiable information. We present how Cardea can be integrated into: a) privacy-protecting camera apps, where captured photos will be processed before being saved locally; and b) online social media and networking sites, where uploaded photos will first be examined to protect individuals' visual privacy before they become visible to others. Our evaluation results on an implemented prototype demonstrate that Cardea is effective with 86% overall accuracy and is welcomed by users, showing a promising future for context-aware visual privacy protection for photo taking and sharing.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work aims to enable the web with computer vision by bringing hundreds of OpenCV functions to the open web platform, leveraging OpenCV's efficiency, completeness, API maturity, and its community's collective knowledge.
Abstract: The Web is the world's most ubiquitous compute platform and the foundation of the digital economy. Ever since its birth in the early 1990s, web capabilities have been increasing in both quantity and quality. However, in spite of all such progress, computer vision is not mainstream on the web yet. The reasons are historical and include lack of sufficient performance of JavaScript, lack of camera support in the standard web APIs, and lack of comprehensive computer-vision libraries. These problems are about to get solved, resulting in the potential of an immersive and perceptual web with transformational effects including in online shopping, education, and entertainment, among others. This work aims to enable the web with computer vision by bringing hundreds of OpenCV functions to the open web platform. OpenCV is the most popular computer-vision library, with a comprehensive set of vision functions and a large developer community. OpenCV is implemented in C++ and, up until now, was not available in web browsers without the help of unpopular native plugins. This work leverages OpenCV's efficiency, completeness, API maturity, and its community's collective knowledge. It is provided in a format that is easy for JavaScript engines to highly optimize and has an API that is easy for web programmers to adopt and use to develop applications. In addition, OpenCV parallel implementations that target SIMD units and multiprocessors can be ported to equivalent web primitives, providing better performance for real-time and interactive use cases.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: VideoNOC is presented, a prototype of a flexible and scalable platform to infer objective video QoE metrics (e.g., bitrate, rebuffering) for MNOs, and provides valuable insights on a number of design choices by content providers.
Abstract: Video streaming traffic is rapidly growing in mobile networks. Mobile Network Operators (MNOs) are expected to keep up with this growing demand, while maintaining a high video Quality of Experience (QoE). This makes it critical for MNOs to have a solid understanding of users' video QoE with a goal to help with network planning, provisioning and traffic management. However, designing a system to measure video QoE has several challenges: i) large scale of video traffic data and diversity of video streaming services, ii) cross-layer constraints due to complex cellular network architecture, and iii) extracting QoE metrics from network traffic. In this paper, we present VideoNOC, a prototype of a flexible and scalable platform to infer objective video QoE metrics (e.g., bitrate, rebuffering) for MNOs. We describe the design and architecture of VideoNOC, and outline the methodology to generate a novel data source for fine-grained video QoE monitoring. We then demonstrate some of the use cases of such a monitoring system. VideoNOC reveals video demand across the entire network, provides valuable insights on a number of design choices by content providers (e.g., OS-dependent performance, video player parameters like buffer size, range of encoding bitrates, etc.) and helps analyze the impact of network conditions on video QoE (e.g., mobility and high demand).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The MMTF-14K multi-faceted dataset is primarily designed for the evaluation of video-based recommender systems, but it also supports the exploration of other multimedia tasks such as popularity prediction, genre classification and auto-tagging.
Abstract: In this paper we propose a new dataset, i.e., the MMTF-14K multi-faceted dataset. It is primarily designed for the evaluation of video-based recommender systems, but it also supports the exploration of other multimedia tasks such as popularity prediction, genre classification and auto-tagging (aka tag prediction). The data consists of 13,623 Hollywood-type movie trailers, ranked by 138,492 users, generating a total of almost 12.5 million ratings. To address a broader community, metadata, audio and visual descriptors are also pre-computed and provided along with several baseline benchmarking results for uni-modal and multi-modal recommendation systems. This creates a rich collection of data for benchmarking and supports future development of this field.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper studies a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras, and introduces several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC) and for the implementation of VP switching.
Abstract: Full immersion inside a Virtual Reality (VR) scene requires six Degrees of Freedom (6DoF) applications where the user is allowed to perform translational and rotational movements within the virtual space. The implementation of 6DoF applications is however still an open question. In this paper we study a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras. The user can only switch positions to predefined viewpoints (VPs). We focus on the new challenges that are introduced by adaptive MVP 360-degree video streaming. We introduce several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC), and for the implementation of VP switching. We model three video-segment download strategies for an adaptive streaming client as Mixed Integer Linear Programming (MILP) problems: an omniscient download scheduler; one where the client proactively downloads all VPs to guarantee fast VP switches; and one where the client reacts to the user's navigation pattern. We recorded one MVP 360-degree video with three VPs, implemented a mobile MVP 360-degree video player, and recorded the viewing patterns of multiple users navigating the content. We solved the adaptive streaming optimization problems on this video considering the collected navigation traces. The results emphasize the gains obtained by using tiles in terms of objective quality of the delivered content. They also emphasize the importance of performing further study on VP switching prediction to reduce the bandwidth consumption and to measure the impact of VP switching delay on the subjective Quality of Experience (QoE).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper proposes an approach for modeling sensory effects as first-class entities, enabling multimedia applications to synchronize sensorial media to interactive audiovisual content in a high-level specification, making complete descriptions of mulsemedia applications possible with multimedia models and languages.
Abstract: Multimedia applications are usually composed of audiovisual content. Traditional multimedia conceptual models, and consequently declarative multimedia authoring languages, do not support the definition of multiple sensory effects. Multiple sensorial media (mulsemedia) applications consider the use of sensory effects that can stimulate touch, smell and taste, in addition to hearing and sight. Therefore, mulsemedia applications have usually been developed using general-purpose programming languages. In order to fill this gap, this paper proposes an approach for modeling sensory effects as first-class entities, enabling multimedia applications to synchronize sensorial media to interactive audiovisual content in a high-level specification. Thus, complete descriptions of mulsemedia applications become possible with multimedia models and languages. In order to validate our ideas, an interactive mulsemedia application example is presented and specified with NCL (Nested Context Language) and Lua. Lua components are used for translating sensory effect high-level attributes to MPEG-V SEM (Sensory Effect Metadata) files. A sensory effect simulator was developed to receive SEM files and simulate mulsemedia application rendering.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work builds on the FOVE's Unity API to design a gaze-adaptive streaming system using one low- and one high-resolution segment from which the foveal region is cropped with per-frame filters, which is permitted in none of the current proposals.
Abstract: While Virtual Reality (VR) represents a revolution in the user experience, current VR systems are flawed in different aspects. The difficulty of focusing naturally in current headsets incurs visual discomfort and cognitive overload, while high-end headsets require tethered, powerful hardware for scene synthesis. One of the major solutions envisioned to address these problems is foveated rendering. We consider the problem of streaming stored 360° videos to a VR headset equipped with eye-tracking and foveated rendering capabilities. Our long-term research goal is to build high-performing foveated streaming systems that allow the playback buffer to build up and absorb network variations, which none of the current proposals permit. We present our foveated streaming prototype based on the FOVE, one of the first commercially available headsets with an integrated eye-tracker. We build on the FOVE's Unity API to design a gaze-adaptive streaming system using one low- and one high-resolution segment from which the foveal region is cropped with per-frame filters. The low- and high-resolution frames are then merged at the client to approach the natural focusing process.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: A new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC), reducing the network bandwidth as well as the computational complexity on the decoder side.
Abstract: The computing power and bandwidth available to current VR systems are limited compared to what high-quality VR requires. To overcome these limits, this study proposes a new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC). The proposed SHVC and HEVC encoders generate a bitstream whose tiles can be transmitted independently; therefore, the bitstream generated by the proposed encoder can be extracted in units of tiles. In accordance with what is discussed in the standard, the proposed extractor extracts the bitstream of the tiles corresponding to the viewport. The SHVC video bitstream extracted by the proposed method consists of (i) an SHVC base layer (BL) which represents the entire 360-degree area and (ii) an SHVC enhancement layer (EL) for selective streaming of viewport (region of interest (ROI)) tiles. When the proposed HEVC encoder is used, low- and high-resolution sequences are separately encoded as the BL and EL of SHVC. By streaming the BL (low resolution) and selected EL (high resolution) tiles within the ROI instead of streaming the whole high-quality 360-degree video, the proposed method can reduce the network bandwidth as well as the computational complexity on the decoder side. Experimental results show more than 47% bandwidth reduction.
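A back-of-the-envelope sketch of why streaming a low-resolution base layer plus only the viewport's enhancement-layer tiles saves bandwidth follows. The bitrates and tile counts are invented for illustration; the paper reports more than 47% reduction for its own content and configuration.

```python
# Illustrative bandwidth arithmetic with assumed bitrates and tile counts.
bl_kbps = 4000            # whole 360-degree scene at low resolution (assumed)
el_kbps_full = 16000      # whole scene at high resolution (assumed)
tiles_total, tiles_in_viewport = 12, 4

full_quality = el_kbps_full
viewport_dependent = bl_kbps + el_kbps_full * tiles_in_viewport / tiles_total
print(f"{(1 - viewport_dependent / full_quality) * 100:.0f}% bandwidth reduction")  # ~42% here
```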

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper provides a data set with a larger number of LTE network traces than any publicly available data set, along with an Android app to gather further traces and R scripts to clean, sort, and analyze the data.
Abstract: Mobile networks, especially LTE networks, are used more and more for high-bandwidth services like multimedia or video streams. The quality of the data connection plays a major role in the perceived quality of a service. Videos may be presented in low quality or experience many stalling events when the connection is too slow to buffer the next frames for playback. So far, no publicly available data set exists that contains a larger number of LTE network traces and can be used for deeper analysis. In this data set, we provide 546 traces of 5 minutes each with a sample rate of 100 ms. Thereof, 377 traces are pure LTE data. We furthermore provide an Android app to gather further traces as well as R scripts to clean, sort, and analyze the data.