
Showing papers presented at "ACM SIGMM Conference on Multimedia Systems in 2018"


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents a novel dataset of 360° videos with associated eye and head movement data, a follow-up to the authors' previous dataset for still images; the dataset and its associated code are made publicly available to support research on visual attention for 360° content.
Abstract: Research on visual attention in 360° content is crucial to understand how people perceive and interact with this immersive type of content, to develop efficient techniques for processing, encoding, delivering and rendering it, and to offer a high quality of experience to end users. The availability of public datasets is essential to support and facilitate the research activities of the community. Recently, some studies have been presented analyzing the exploration behaviors of people watching 360° videos, and a few datasets have been published. However, the majority of these works only consider head movements as a proxy for gaze data, despite the importance of eye movements in the exploration of omnidirectional content. Thus, this paper presents a novel dataset of 360° videos with associated eye and head movement data, which is a follow-up to our previous dataset for still images [14]. Head and eye tracking data was obtained from 57 participants during a free-viewing experiment with 19 videos. In addition, guidelines on how to obtain saliency maps and scanpaths from the raw data are provided, along with statistics on exploration behaviors, such as the impact of the longitudinal starting position when watching omnidirectional videos. This dataset and its associated code are made publicly available to support research on visual attention for 360° content.
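As an illustration of the kind of processing the guidelines above describe, the following is a minimal, hypothetical sketch of turning gaze samples into a fixation-density ("saliency") map over an equirectangular frame. The coordinate convention, resolution and Gaussian spread are assumptions, not the dataset's actual format.

```python
# Hypothetical sketch: accumulate gaze samples (longitude/latitude in degrees)
# into a blurred fixation-density map over an equirectangular frame.
import numpy as np
from scipy.ndimage import gaussian_filter

def saliency_map(gaze_lon_deg, gaze_lat_deg, width=2048, height=1024, sigma_deg=3.5):
    sal = np.zeros((height, width), dtype=np.float64)
    # Map longitude [-180, 180) and latitude [-90, 90] to pixel coordinates.
    x = ((np.asarray(gaze_lon_deg) + 180.0) / 360.0 * width).astype(int) % width
    y = ((90.0 - np.asarray(gaze_lat_deg)) / 180.0 * height).astype(int).clip(0, height - 1)
    np.add.at(sal, (y, x), 1.0)
    # Approximate foveal spread with an isotropic Gaussian (ignores pole distortion).
    sal = gaussian_filter(sal, sigma=sigma_deg / 360.0 * width)
    return sal / sal.max() if sal.max() > 0 else sal

# Example: random gaze samples concentrated near the equator.
rng = np.random.default_rng(0)
smap = saliency_map(rng.uniform(-180, 180, 500), rng.normal(0, 20, 500))
print(smap.shape, smap.max())
```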

138 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: The purpose of this dataset is to provide additional information (such as competing metrics for users connected to the same cell), thus giving the end user otherwise unavailable insight into the eNodeB environment and its scheduling principle.
Abstract: In this paper, we present a 4G trace dataset composed of client-side cellular key performance indicators (KPIs) collected from two major Irish mobile operators, across different mobility patterns (static, pedestrian, car, bus and train). The 4G trace dataset contains 135 traces, with an average duration of fifteen minutes per trace, with viewable throughput ranging from 0 to 173 Mbit/s at a granularity of one sample per second. Our traces are generated from a well-known non-rooted Android network monitoring application, G-NetTrack Pro. This tool enables capturing various channel-related KPIs, context-related metrics, downlink and uplink throughput, and also cell-related information. To the best of our knowledge, this is the first publicly available dataset that contains throughput, channel and context information for 4G networks. To supplement our real-time 4G production network dataset, we also provide a synthetic dataset generated from a large-scale 4G ns-3 simulation that includes one hundred users randomly scattered across a seven-cell cluster. The purpose of this dataset is to provide additional information (such as competing metrics for users connected to the same cell), thus providing the end user with otherwise unavailable information about the eNodeB environment and scheduling principle. In addition to this dataset, we also provide the code and context information to allow other researchers to generate their own synthetic datasets.
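For a sense of how a trace like those described could be consumed, here is an illustrative loader; the file name and column labels ("DL_bitrate" in kbit/s, one row per second) are assumptions about the CSV layout rather than the dataset's documented schema.

```python
# Illustrative only: read one per-second trace and summarise downlink throughput.
import pandas as pd

def summarise_trace(path):
    df = pd.read_csv(path)
    # Assumes one sample per second and downlink bitrate logged in kbit/s.
    return {
        "duration_s": len(df),
        "mean_dl_mbps": df["DL_bitrate"].mean() / 1000.0,
        "peak_dl_mbps": df["DL_bitrate"].max() / 1000.0,
    }

# Usage (hypothetical path):
# print(summarise_trace("traces/car/trace_017.csv"))
```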

121 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: Three novel adaptive bitrate algorithms are developed that provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and fewer bitrate oscillations, and perform very well for live streams that require low latency, a challenging scenario for ABR algorithms.
Abstract: Modern video streaming uses adaptive bitrate (ABR) algorithms that run inside video players and continually adjust the quality (i.e., bitrate) of the video segments that are downloaded and rendered to the user. To maximize the quality-of-experience of the user, ABR algorithms must stream at a high bitrate with low rebuffering and low bitrate oscillations. Further, a good ABR algorithm is responsive to user and network events and can be used in demanding scenarios such as low-latency live streaming. Recent research papers provide an abundance of ABR algorithms, but fall short on many of the above real-world requirements. We develop Sabre, an open-source, publicly available simulation tool that enables fast and accurate simulation of adaptive streaming environments. We used Sabre to design and evaluate BOLA-E and DYNAMIC, two novel ABR algorithms. We also developed a FAST SWITCHING algorithm that can replace segments that have already been downloaded with higher-bitrate (thus higher-quality) segments. The new algorithms provide higher QoE to the user in terms of higher bitrate, fewer rebuffers, and fewer bitrate oscillations. In addition, these algorithms react faster to user events such as startup and seek, and respond more quickly to network events such as improvements in throughput. Further, they perform very well for live streams that require low latency, a challenging scenario for ABR algorithms. Overall, our algorithms offer superior video QoE and responsiveness for real-life adaptive video streaming, in comparison to the state-of-the-art. Importantly, all three algorithms presented in this paper are now part of the official DASH reference player dash.js and are being used by video providers in production environments. While our evaluation and implementation are focused on the DASH environment, our algorithms are equally applicable to other adaptive streaming formats such as Apple HLS.
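To make the ABR decision loop the abstract refers to concrete, here is a deliberately simple throughput- and buffer-aware rate pick. This is not the paper's BOLA-E, DYNAMIC, or FAST SWITCHING logic; the safety margin and buffer threshold are invented parameters.

```python
# Generic illustrative rate selection, not the paper's algorithms.
def pick_bitrate(bitrates_kbps, throughput_est_kbps, buffer_s,
                 safety=0.9, low_buffer_s=10.0):
    """Return the highest bitrate that fits the discounted throughput estimate;
    fall back to the lowest rate when the buffer is nearly empty."""
    if buffer_s < low_buffer_s:
        return min(bitrates_kbps)
    budget = throughput_est_kbps * safety
    feasible = [b for b in sorted(bitrates_kbps) if b <= budget]
    return feasible[-1] if feasible else min(bitrates_kbps)

print(pick_bitrate([300, 750, 1500, 3000, 6000],
                   throughput_est_kbps=4200, buffer_s=22))  # -> 3000
```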

99 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper describes an open dataset and software for ITU-T Rec. P.1203, the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming, and shows the significant performance improvements of using bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.
Abstract: This paper describes an open dataset and software for ITU-T Rec. P.1203. As the first standardized Quality of Experience model for audiovisual HTTP Adaptive Streaming (HAS), it has been extensively trained and validated on over a thousand audiovisual sequences containing HAS-typical effects (such as stalling, coding artifacts, quality switches). Our dataset comprises four of the 30 official subjective databases at a bitstream feature level. The paper also includes the subjective results and the model's performance. Our software implementation of the standard has been made publicly available as well and is used for all the analyses presented. Among other previously unpublished details, we show the significant performance improvements of using bitstream-based models over metadata-based ones for video quality analysis, and the robustness of combining classical models with machine-learning-based approaches for estimating user QoE.
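The model-performance figures mentioned above are typically reported as correlation and error against subjective scores. The sketch below shows that computation on made-up numbers; it does not use the P.1203 software itself, and all score values are placeholders.

```python
# Minimal sketch: Pearson correlation and RMSE between predicted and subjective MOS.
# The values below are invented for illustration only.
import numpy as np
from scipy.stats import pearsonr

predicted_mos = np.array([3.8, 2.1, 4.5, 3.0, 1.9])
subjective_mos = np.array([3.6, 2.4, 4.4, 3.3, 2.0])

r, _ = pearsonr(predicted_mos, subjective_mos)
rmse = np.sqrt(np.mean((predicted_mos - subjective_mos) ** 2))
print(f"PCC={r:.3f}  RMSE={rmse:.3f}")
```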

95 citations


Proceedings ArticleDOI
Liyang Sun, Fanyi Duanmu, Yong Liu, Yao Wang, Yinghua Ye, Hang Shi, David Dai
12 Jun 2018
TL;DR: Novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction to achieve a high-level of Quality-of-Experience (QoE) in the challenging 5G wireless network environment.
Abstract: 360° video streaming is a key component of the emerging Virtual Reality (VR) and Augmented Reality (AR) applications. In 360° video streaming, a user may freely navigate through the captured 360° video scene by changing her desired Field-of-View. High-throughput and low-delay data transfers enabled by 5G wireless networks can potentially facilitate an untethered 360° video streaming experience. Meanwhile, the high volatility of 5G wireless links presents unprecedented challenges for smooth 360° video streaming. In this paper, novel multi-path multi-tier 360° video streaming solutions are developed to simultaneously address the dynamics in both network bandwidth and user viewing direction. We systematically investigate various design trade-offs on streaming quality and robustness. Through simulations driven by real 5G network bandwidth traces and user viewing direction traces, we demonstrate that the proposed 360° video streaming solutions can achieve a high level of Quality-of-Experience (QoE) in the challenging 5G wireless network environment.

69 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: It is shown that there exist significant latency-throughput trade-offs, but the behavior is very complex; several factors that affect performance and yield this complex behavior are demonstrated.
Abstract: We study performance characteristics of convolutional neural networks (CNN) for mobile computer vision systems. CNNs have proven to be a powerful and efficient approach to implement such systems. However, the system performance depends largely on the utilization of hardware accelerators, which are able to speed up the execution of the underlying mathematical operations tremendously through massive parallelism. Our contribution is performance characterization of multiple CNN-based models for object recognition and detection with several different hardware platforms and software frameworks, using both local (on-device) and remote (network-side server) computation. The measurements are conducted using real workloads and real processing platforms. On the platform side, we concentrate especially on TensorFlow and TensorRT. Our measurements include embedded processors found on mobile devices and high-performance processors that can be used on the network side of mobile systems. We show that there exist significant latency-throughput trade-offs, but the behavior is very complex. We demonstrate and discuss several factors that affect the performance and yield this complex behavior.
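A rough timing harness in the spirit of the measurements described (latency vs. throughput for CNN inference) is sketched below. The callable model, input shape, and batch sizes are placeholders; the paper's own setups (TensorFlow/TensorRT, embedded vs. server platforms) differ.

```python
# Illustrative benchmark loop for any inference callable; not the paper's harness.
import time
import numpy as np

def benchmark(infer_fn, input_shape, batch_sizes=(1, 8, 32), repeats=20):
    results = {}
    for b in batch_sizes:
        x = np.random.rand(b, *input_shape).astype(np.float32)
        infer_fn(x)                      # warm-up run
        t0 = time.perf_counter()
        for _ in range(repeats):
            infer_fn(x)
        dt = (time.perf_counter() - t0) / repeats
        results[b] = {"latency_ms": dt * 1000, "throughput_img_s": b / dt}
    return results

# Usage with any callable model, e.g. a Keras model or an ONNX Runtime session wrapper:
# print(benchmark(lambda x: model(x, training=False), (224, 224, 3)))
```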

68 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: The paper presents the general angular ranges of the subjects' exploration behavior as well as an analysis of the areas where most of the time was spent, and several methods for evaluating head rotation data are presented and discussed.
Abstract: In this paper, we present a viewing test with 48 subjects watching 20 different entertaining omnidirectional videos on an HTC Vive Head Mounted Display (HMD) in a task-free scenario. While the subjects were watching the contents, we recorded their head movements. The obtained dataset is publicly available, in addition to the links and timestamps of the source contents used. In this study, subjects were also asked to fill in the Simulator Sickness Questionnaire (SSQ) after every viewing session. In this paper, the SSQ results are presented first. Several methods for evaluating head rotation data are then presented and discussed. In the course of the study, the collected dataset is published along with the scripts for evaluating the head rotation data. The paper presents the general angular ranges of the subjects' exploration behavior as well as an analysis of the areas where most of the time was spent. The collected information can also be presented as head-saliency maps. In the case of videos, head-saliency data can be used for training saliency models, as information for evaluating decisions during content creation, or as part of streaming solutions for region-of-interest-specific coding as with the latest tile-based streaming solutions, as discussed also in standardization bodies such as MPEG.
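As a hedged illustration of the head-rotation evaluation mentioned above, the helper below converts an orientation quaternion to yaw/pitch angles, from which angular exploration ranges or head-saliency maps could be accumulated. It uses the standard aerospace Z-Y-X convention; the actual logging format of the dataset (often a Y-up axis convention for HMDs) is an assumption and may require remapping the axes.

```python
# Hypothetical helper: unit quaternion (w, x, y, z) -> yaw/pitch in degrees,
# standard Z-Y-X (aerospace) convention.
import math

def quat_to_yaw_pitch(w, x, y, z):
    yaw = math.degrees(math.atan2(2.0 * (w * z + x * y),
                                  1.0 - 2.0 * (y * y + z * z)))
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, 2.0 * (w * y - z * x)))))
    return yaw, pitch

print(quat_to_yaw_pitch(1.0, 0.0, 0.0, 0.0))  # identity orientation -> (0.0, 0.0)
```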

64 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: A unique dataset containing sensor data collected from patients suffering from depression, which contains motor activity recordings of 23 unipolar and bipolar depressed patients and 32 healthy controls is presented.
Abstract: Wearable sensors measuring different parts of people's activity are a common technology nowadays. In research, data collected using these devices also draws attention. Nevertheless, datasets containing sensor data in the field of medicine are rare. Often, data is non-public and only results are published. This makes it hard for other researchers to reproduce and compare results, or even collaborate. In this paper we present a unique dataset containing sensor data collected from patients suffering from depression. The dataset contains motor activity recordings of 23 unipolar and bipolar depressed patients and 32 healthy controls. For each patient we provide sensor data over several days of continuous measuring as well as some demographic data. The severity of the patients' depressive state was labeled using ratings done by medical experts on the Montgomery-Asberg Depression Rating Scale (MADRS). In this respect, the dataset presented here can be useful for better exploring and understanding the association between depression and motor activity. By making this dataset available, we invite and enable interested researchers to tackle this challenging and important societal problem.
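The following is an illustrative loader for the kind of continuous actigraphy recordings described, computing a per-day activity profile for one participant. The file path and column names ("timestamp", "activity") are assumptions about the dataset layout, not its documented schema.

```python
# Illustrative only: per-day activity statistics from a continuous actigraphy CSV.
import pandas as pd

def daily_activity(path):
    df = pd.read_csv(path, parse_dates=["timestamp"])
    df["date"] = df["timestamp"].dt.date
    return df.groupby("date")["activity"].agg(["mean", "std", "max"])

# Usage (hypothetical file):
# print(daily_activity("condition/condition_1.csv").head())
```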

63 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents the first characterization of the prefetch aggressiveness tradeoff, and provides insights into how to best design tomorrow's delivery systems for 360° videos, allowing content providers to reduce bandwidth costs and improve users' playback experiences.
Abstract: With 360° video, only a limited fraction of the full view is displayed at each point in time. This has prompted the design of streaming delivery techniques that allow alternative playback qualities to be delivered for each candidate viewing direction. However, while prefetching based on the user's expected viewing direction is best done close to playback deadlines, large buffers are needed to protect against shortfalls in future available bandwidth. This results in conflicting goals and an important prefetch aggressiveness tradeoff problem regarding how far ahead in time from the current play-point prefetching should be done. This paper presents the first characterization of this tradeoff. The main contributions include an empirical characterization of head movement behavior based on data from viewing sessions of four different categories of 360° video, an optimization-based comparison of the prefetch aggressiveness tradeoffs seen for these video categories, and a data-driven discussion of further optimizations, which include a novel system design that allows both tradeoff objectives to be targeted simultaneously. By qualitatively and quantitatively analyzing the above tradeoffs, we provide insights into how to best design tomorrow's delivery systems for 360° videos, allowing content providers to reduce bandwidth costs and improve users' playback experiences.

50 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: In this paper, the authors proposed a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions.
Abstract: The number of bandwidth-hungry applications and services is constantly growing. HTTP adaptive streaming of audio-visual content accounts for the majority of today's internet traffic. Although internet bandwidth also increases constantly, audio-visual compression technology remains indispensable, and we currently face the challenge of dealing with multiple video codecs. This paper proposes a multi-codec DASH dataset comprising AVC, HEVC, VP9, and AV1 in order to enable interoperability testing and streaming experiments for the efficient usage of these codecs under various conditions. We adopt state-of-the-art encoding and packaging options and also provide basic quality metrics along with the DASH segments. Additionally, we briefly introduce a multi-codec DASH scheme and possible usage scenarios. Finally, we provide a preliminary evaluation of the encoding efficiency in the context of HTTP adaptive streaming services and applications.
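A small sketch of how such a multi-codec dataset could be inspected is shown below: it lists codec, bandwidth, and resolution of every representation in a DASH MPD. The manifest path is a placeholder; attribute names follow the MPEG-DASH MPD schema.

```python
# Illustrative MPD inspection using only the standard library.
import xml.etree.ElementTree as ET

MPD_NS = "{urn:mpeg:dash:schema:mpd:2011}"

def list_representations(mpd_path):
    root = ET.parse(mpd_path).getroot()
    for aset in root.iter(MPD_NS + "AdaptationSet"):
        for rep in aset.iter(MPD_NS + "Representation"):
            codec = rep.get("codecs") or aset.get("codecs")
            print(rep.get("id"), codec, rep.get("bandwidth"),
                  rep.get("width"), rep.get("height"))

# Usage (hypothetical path):
# list_representations("av1/manifest.mpd")
```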

46 citations


Proceedings ArticleDOI
12 Jun 2018
TL;DR: PERCEIVE is a two-stage method for predicting the perceived quality of adaptive VR videos streamed through mobile networks: it first predicts VR video playout performance from network Quality of Service (QoS) indicators and then uses the predicted playout metrics to model and estimate end-user perceived quality.
Abstract: The demand for Virtual Reality (VR) video streaming to mobile devices is booming, as VR becomes accessible to the general public. However, the variability of conditions of mobile networks affects the perception of this type of high-bandwidth-demanding service in unexpected ways. In this situation, there is a need for novel performance assessment models fit to the new VR applications. In this paper, we present PERCEIVE, a two-stage method for predicting the perceived quality of adaptive VR videos when streamed through mobile networks. By means of machine learning techniques, our approach is able to first predict adaptive VR video playout performance, using network Quality of Service (QoS) indicators as predictors. In a second stage, it employs the predicted VR video playout performance metrics to model and estimate end-user perceived quality. The evaluation of PERCEIVE has been performed in a real-world environment, in which VR videos are streamed while subjected to LTE/4G network conditions. The accuracy of PERCEIVE has been assessed by means of the residual error between predicted and measured values. Our approach predicts the different performance metrics of the VR playout with an average prediction error lower than 3.7% and estimates the perceived quality with a prediction error lower than 4% for over 90% of all the tested cases. Moreover, it allows us to pinpoint the QoS conditions that affect adaptive VR streaming services the most.
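A schematic two-stage regression in the same spirit is sketched below: stage 1 maps network QoS features to playout metrics, stage 2 maps those metrics to a quality score. This is not PERCEIVE itself; feature names and the random data are placeholders.

```python
# Schematic two-stage prediction pipeline (illustrative data and features).
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
qos = rng.random((200, 4))       # e.g. RSRP, RSRQ, throughput, RTT (assumed features)
playout = rng.random((200, 3))   # e.g. bitrate, stall ratio, quality switches (assumed)
mos = rng.random(200) * 4 + 1    # synthetic 1..5 opinion scores

stage1 = RandomForestRegressor(n_estimators=50).fit(qos, playout)   # QoS -> playout
stage2 = RandomForestRegressor(n_estimators=50).fit(playout, mos)   # playout -> quality

predicted_mos = stage2.predict(stage1.predict(qos))
print(predicted_mos[:5])
```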

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This demo paper presents efforts towards Social VR services based on photo-realistic video recordings, and focuses on the communication between multiple people (max 3) and the integration of new media formats to represent users as 3D point clouds.
Abstract: Virtual Reality (VR) and 360-degree video are set to become part of the future social environment, enriching and enhancing the way we share experiences and collaborate remotely. While Social VR applications are gaining momentum, most Social VR services focus on animated avatars. In this demo, we present our efforts towards Social VR services based on photo-realistic video recordings. In this demo paper, we focus on two parts: the communication between multiple people (at most three) and the integration of new media formats to represent users as 3D point clouds. We enhance a green screen (chroma key) like cut-out of the person with depth data, allowing point cloud based rendering in the client. Further, the paper presents a user study with 54 people evaluating a three-person communication use case and a technical analysis of moving towards 3D representations of users. This demo consists of two shared virtual environments to communicate and interact with others, i.e. i) a 360-degree virtual space with users being represented as 2D video streams (with the background removed) and ii) a 3D space with users being represented as point clouds (based on color and depth video data).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: In this paper, a traffic profiling solution is proposed to passively estimate parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers by observing IP packet arrivals and detecting the state of an HAS client's play-back buffer in real time.
Abstract: Accurate cross-layer information is very useful to optimize mobile networks for specific applications. However, providing application-layer information to lower protocol layers has become very difficult due to the wide adoption of end-to-end encryption and due to the absence of cross-layer signaling standards. As an alternative, this paper presents a traffic profiling solution to passively estimate parameters of HTTP Adaptive Streaming (HAS) applications at the lower layers. By observing IP packet arrivals, our machine learning system identifies video flows and detects the state of an HAS client's play-back buffer in real time. Our experiments with YouTube's mobile client show that Random Forests achieve very high accuracy even with a strong variation of link quality. Since this high performance is achieved at IP level with a small, generic feature set, our approach requires no Deep Packet Inspection (DPI), comes at low complexity, and does not interfere with end-to-end encryption. Traffic profiling is, thus, a powerful new tool for monitoring and managing even encrypted HAS traffic in mobile networks.
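A schematic version of the pipeline described above is sketched below: simple features are derived from IP packet arrivals in a time window and a Random Forest classifies the player's buffer state. Features, labels, and data are illustrative placeholders, not the paper's feature set.

```python
# Schematic traffic-profiling pipeline with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def window_features(arrival_times, packet_sizes):
    """Per-window features: byte volume, packet count, mean/std inter-arrival time."""
    iat = np.diff(arrival_times) if len(arrival_times) > 1 else np.array([0.0])
    return [packet_sizes.sum(), len(arrival_times), iat.mean(), iat.std()]

rng = np.random.default_rng(2)
X = np.array([window_features(np.sort(rng.random(50)), rng.integers(60, 1500, 50))
              for _ in range(300)])
y = rng.integers(0, 2, 300)   # 0 = buffer filling, 1 = steady state (synthetic labels)

clf = RandomForestClassifier(n_estimators=100).fit(X, y)
print(clf.predict(X[:5]))
```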

Proceedings ArticleDOI
12 Jun 2018
TL;DR: Four datasets extracted from gynecologic, laparoscopic interventions are published with the intent of encouraging research in the field of post-surgical automatic media analysis, designed with the following use cases in mind: medical image retrieval based on a query image, and detection of instrument counts, surgical actions and anatomical structures.
Abstract: Modern imaging technology enables medical practitioners to perform minimally invasive surgery (MIS), i.e. a variety of medical interventions inflicting minimal trauma upon patients, hence greatly improving their recoveries. Not only patients but also surgeons can benefit from this technology, as recorded media can be utilized for speeding up tedious and time-consuming tasks such as treatment planning or case documentation. In order to improve the predominantly manually conducted process of analyzing said media, with this work we publish four datasets extracted from gynecologic, laparoscopic interventions with the intent of encouraging research in the field of post-surgical automatic media analysis. These datasets are designed with the following use cases in mind: medical image retrieval based on a query image, detection of instrument counts, surgical actions and anatomical structures, as well as distinguishing on which anatomical structure a certain action is performed. Furthermore, we provide suggestions for evaluation metrics and first baseline experiments.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The newly released Canadian French Emotional (CaFE) speech dataset is introduced and details about its design and content are given.
Abstract: Until recently, there was no emotional speech dataset available in Canadian French. This was a limiting factor for research activities not only in Canada, but also elsewhere. This paper introduces the newly released Canadian French Emotional (CaFE) speech dataset and gives details about its design and content. This dataset contains six different sentences, pronounced by six male and six female actors, in six basic emotions plus one neutral emotion. The six basic emotions are acted in two different intensities. The audio is digitally recorded at high-resolution (192 kHz sampling rate, 24 bits per sample). This new dataset is freely available under a Creative Commons license (CC BY-NC-SA 4.0).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The primary goal of the dataset is to provide the wide range of video content required for validating DASH Quality of Experience (QoE) delivery over networks, ranging from constrained cellular and satellite systems to future high speed architectures such as the proposed 5G mmwave technology.
Abstract: In this paper we present a Multi-Profile Ultra High Definition (UHD) DASH dataset composed of both AVC (H.264) and HEVC (H.265) video content, generated from three well-known open-source 4K video clips. The representation rates and resolutions of our dataset range from 40Mbps in 4K down to 235kbps in 320x240, and are comparable to rates utilised by on-demand services such as Netflix, YouTube and Amazon Prime. We provide our dataset for both real-time testbed evaluation and trace-based simulation. The real-time testbed content provides a means of evaluating DASH adaptation techniques on physical hardware, while our trace-based content offers simulation over frameworks such as ns-2 and ns-3. We also provide the original pre-DASH MP4 files and our associated DASH generation scripts, so as to provide researchers with a mechanism to create their own DASH profile content locally. This improves the reproducibility of results and removes re-buffering issues caused by delay/jitter/losses in the Internet. The primary goal of our dataset is to provide the wide range of video content required for validating DASH Quality of Experience (QoE) delivery over networks, ranging from constrained cellular and satellite systems to future high-speed architectures such as the proposed 5G mmWave technology.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The Valid.IoT Framework is proposed as an attachable IoT framework component that can be linked to a variety of platforms to generate QoI vectors and interpolated sensory data with plausibility and quality estimations.
Abstract: Heterogeneous sensor device networks with diverse maintainers and information collected via social media as well as crowdsourcing tend to be elements of uncertainty in IoT and Smart City networks. Often, there is no ground truth available that can be used to check the plausibility and concordance of the new information. This paper proposes the Valid.IoT Framework as an attachable IoT framework component that can be linked to generate QoI vectors and Interpolated sensory data with plausibility and quality estimations to a variety of platforms. The framework utilises extended infrastructure knowledge and infrastructure-aware interpolation algorithms to validate crowdsourced and device generated sensor information through sensor fusion.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This demo captures and presents the user's EEG, heart rate, EDA and head motion during the use of an AT VR application; the prototype is composed of a sensor system (wearable sensors for acquiring biological signals) and a presentation system (a virtual wheelchair simulator that interfaces to a typical LCD display).
Abstract: The key aim of various assistive technology (AT) systems is to augment an individual's functioning whilst supporting an enhanced quality of life (QoL). In recent times, we have seen the emergence of Virtual Reality (VR) based assistive technology systems made possible by the availability of commercially available Head Mounted Displays (HMDs). The use of VR for AT aims to support levels of interaction and immersion not previously possible with more traditional AT solutions. Crucial to the success of these technologies is understanding, from the user perspective, the influencing factors that affect the user Quality of Experience (QoE). In addition to the typical QoE metrics, other factors to consider are human behavior such as mental and emotional state, posture and gestures. In terms of trying to objectively quantify such factors, there is a wide range of wearable sensors that are able to monitor physiological signals and provide reliable data. In this demo, we capture and present the user's EEG, heart rate, EDA and head motion during the use of an AT VR application. The prototype is composed of a sensor system, made up of wearable sensors for acquiring biological signals, and a presentation system, the virtual wheelchair simulator that interfaces to a typical LCD display.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper presents a practical implementation integrated in the dash.js reference player and provides substantial comparisons against the state-of-the-art methods using trace-driven and real-world experiments.
Abstract: In streaming media, it is imperative to deliver a good viewer experience to preserve customer loyalty. Prior research has shown that this is rather difficult when shared Internet resources struggle to meet the demand from streaming clients that are largely designed to behave in their own self-interest. To date, several schemes for adaptive streaming have been proposed to address this challenge with varying success. In this paper, we take a different approach and develop a game-theoretic solution. We present a practical implementation integrated in the dash.js reference player and provide substantial comparisons against the state-of-the-art methods using trace-driven and real-world experiments. Our approach outperforms its competitors in the average viewer experience by 38.5% and in video stability by 62%.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: An MPEG DASH-SRD player for Android and the Samsung Gear VR is built, featuring FoV-based quality decision and a replacement strategy to allow the tiles' buffers to build up while keeping their state up-to-date with the current FoV as much as bandwidth allows.
Abstract: Streaming Virtual Reality (VR), even under the mere form of 360° videos, is much more complex than for regular videos because, to lower the required rates, the transmission decisions must take the user's head position into account. The way the user exploits her/his freedom is therefore crucial for the network load. In turn, the way the user moves depends on the video content itself. VR is however a whole new medium, for which the film-making language does not exist yet; its "grammar" is only now being invented. We present a strongly inter-disciplinary approach to improve the streaming of 360° videos: designing high-level content manipulations (film editing) to limit and even control the user's motion in order to consume less bandwidth while maintaining the user's experience. We build an MPEG DASH-SRD player for Android and the Samsung Gear VR, featuring FoV-based quality decisions and a replacement strategy that allows the tiles' buffers to build up while keeping their state up-to-date with the current FoV as much as bandwidth allows. The editing strategies we design have been integrated within the player, and the streaming module has been extended to benefit from the editing. Two sets of user experiments enabled us to show that editing indeed impacts head velocity (reduction of up to 30%), consumed bandwidth (reduction of up to 25%) and subjective assessment. Attention-driving tools from other communities can hence be designed in order to improve streaming. We believe this innovative work opens up the path to a whole new field of possibilities in defining degrees of freedom to be wielded for VR streaming optimization.
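For intuition about the FoV-based quality decision mentioned above, here is an illustrative per-tile quality assignment: tiles whose centres fall inside the current field of view get the high quality, the rest the low one. The tile grid (equal longitude bands) and FoV width are assumptions, not the player's actual logic.

```python
# Illustrative FoV-based tile quality decision for an equirectangular tiling.
def tile_qualities(fov_center_deg, fov_width_deg=100.0, n_tiles=8):
    qualities = []
    for i in range(n_tiles):
        center = -180.0 + (i + 0.5) * 360.0 / n_tiles
        # Shortest angular distance between tile centre and FoV centre.
        dist = abs((center - fov_center_deg + 180.0) % 360.0 - 180.0)
        qualities.append("high" if dist <= fov_width_deg / 2.0 else "low")
    return qualities

print(tile_qualities(fov_center_deg=30.0))
```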

Proceedings ArticleDOI
12 Jun 2018
TL;DR: A public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months is provided, providing the basis for experience-based video analytics.
Abstract: Cataract surgery is one of the most frequently performed microscopic surgeries in the field of ophthalmology. The goal behind this kind of surgery is to replace the human eye lens with an artificial one, an intervention that is often required due to aging. The entire surgery is performed under microscopy, but co-mounted cameras allow recording and archiving the procedure. Currently, the recorded videos are used in a postoperative manner for documentation and training. An additional benefit of recording cataract videos is that they enable video analytics (i.e., manual and/or automatic video content analysis) to investigate medically relevant research questions (e.g., the cause of complications). This, however, necessitates a medical multimedia information system trained and evaluated on existing data, which is currently not publicly available. In this work we provide a public video dataset of 101 cataract surgeries that were performed by four different surgeons over a period of 9 months. These surgeons are grouped into moderately experienced and highly experienced surgeons (assistant vs. senior physicians), providing the basis for experience-based video analytics. All videos have been annotated with quasi-standardized operation phases by a senior ophthalmic surgeon.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work proposes Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences, and presents how Cardea can be integrated into privacy-protecting camera apps and online social media and networking sites.
Abstract: The growing popularity of mobile and wearable devices with built-in cameras and of social media sites is now threatening people's visual privacy. Motivated by recent user studies showing that people's visual privacy concerns are closely related to context, we propose Cardea, a context-aware visual privacy protection mechanism that protects people's visual privacy in photos according to their privacy preferences. We define four context elements in a photo, including location, scene, others' presence, and hand gestures. Users can specify their context-dependent privacy preferences based on the above four elements. Cardea will offer fine-grained visual privacy protection service to those who request protection using their identifiable information. We present how Cardea can be integrated into: a) privacy-protecting camera apps, where captured photos will be processed before being saved locally; and b) online social media and networking sites, where uploaded photos will first be examined to protect individuals' visual privacy before they become visible to others. Our evaluation results on an implemented prototype demonstrate that Cardea is effective with 86% overall accuracy and is welcomed by users, showing a promising future for context-aware visual privacy protection for photo taking and sharing.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work aims to enable the web with computer vision by bringing hundreds of OpenCV functions to the open web platform, leveraging OpenCV's efficiency, completeness, API maturity, and its community's collective knowledge.
Abstract: The Web is the world's most ubiquitous compute platform and the foundation of the digital economy. Ever since its birth in the early 1990s, web capabilities have been increasing in both quantity and quality. However, in spite of all such progress, computer vision is not mainstream on the web yet. The reasons are historical and include lack of sufficient performance of JavaScript, lack of camera support in the standard web APIs, and lack of comprehensive computer-vision libraries. These problems are about to get solved, resulting in the potential of an immersive and perceptual web with transformational effects including in online shopping, education, and entertainment, among others. This work aims to enable the web with computer vision by bringing hundreds of OpenCV functions to the open web platform. OpenCV is the most popular computer-vision library, with a comprehensive set of vision functions and a large developer community. OpenCV is implemented in C++ and, up until now, was not available in web browsers without the help of unpopular native plugins. This work leverages OpenCV's efficiency, completeness, API maturity, and its community's collective knowledge. It is provided in a format that is easy for JavaScript engines to highly optimize and has an API that is easy for web programmers to adopt and use to develop applications. In addition, OpenCV parallel implementations that target SIMD units and multiprocessors can be ported to equivalent web primitives, providing better performance for real-time and interactive use cases.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: VideoNOC is presented, a prototype of a flexible and scalable platform to infer objective video QoE metrics (e.g., bitrate, rebuffering) for MNOs, and provides valuable insights on a number of design choices by content providers.
Abstract: Video streaming traffic is rapidly growing in mobile networks. Mobile Network Operators (MNOs) are expected to keep up with this growing demand, while maintaining a high video Quality of Experience (QoE). This makes it critical for MNOs to have a solid understanding of users' video QoE with a goal to help with network planning, provisioning and traffic management. However, designing a system to measure video QoE has several challenges: i) large scale of video traffic data and diversity of video streaming services, ii) cross-layer constraints due to complex cellular network architecture, and iii) extracting QoE metrics from network traffic. In this paper, we present VideoNOC, a prototype of a flexible and scalable platform to infer objective video QoE metrics (e.g., bitrate, rebuffering) for MNOs. We describe the design and architecture of VideoNOC, and outline the methodology to generate a novel data source for fine-grained video QoE monitoring. We then demonstrate some of the use cases of such a monitoring system. VideoNOC reveals video demand across the entire network, provides valuable insights on a number of design choices by content providers (e.g., OS-dependent performance, video player parameters like buffer size, range of encoding bitrates, etc.) and helps analyze the impact of network conditions on video QoE (e.g., mobility and high demand).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: The MMTF-14K multi-faceted dataset is primarily designed for the evaluation of video-based recommender systems, but it also supports the exploration of other multimedia tasks such as popularity prediction, genre classification and auto-tagging.
Abstract: In this paper we propose a new dataset, i.e., the MMTF-14K multi-faceted dataset. It is primarily designed for the evaluation of video-based recommender systems, but it also supports the exploration of other multimedia tasks such as popularity prediction, genre classification and auto-tagging (aka tag prediction). The data consists of 13,623 Hollywood-type movie trailers, ranked by 138,492 users, generating a total of almost 12.5 million ratings. To address a broader community, metadata, audio and visual descriptors are also pre-computed and provided along with several baseline benchmarking results for uni-modal and multi-modal recommendation systems. This creates a rich collection of data for benchmarking and supports future development of this field.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper studies a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras, and introduces several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC) and for the implementation of VP switching.
Abstract: Full immersion inside a Virtual Reality (VR) scene requires six Degrees of Freedom (6DoF) applications where the user is allowed to perform translational and rotational movements within the virtual space. The implementation of 6DoF applications is however still an open question. In this paper we study a multi-viewpoint (MVP) 360-degree video streaming system, where a scene is simultaneously captured by multiple omnidirectional video cameras. The user can only switch positions to predefined viewpoints (VPs). We focus on the new challenges that are introduced by adaptive MVP 360-degree video streaming. We introduce several options for video encoding with existing technologies, such as High Efficiency Video Coding (HEVC), and for the implementation of VP switching. We model three video-segment download strategies for an adaptive streaming client as Mixed Integer Linear Programming (MILP) problems: an omniscient download scheduler; one where the client proactively downloads all VPs to guarantee fast VP switches; and one where the client reacts to the user's navigation pattern. We recorded one MVP 360-degree video with three VPs, implemented a mobile MVP 360-degree video player, and recorded the viewing patterns of multiple users navigating the content. We solved the adaptive streaming optimization problems on this video considering the collected navigation traces. The results emphasize the gains obtained by using tiles in terms of objective quality of the delivered content. They also emphasize the importance of performing further study on VP switching prediction to reduce the bandwidth consumption and to measure the impact of VP switching delay on the subjective Quality of Experience (QoE).

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper proposes an approach for modeling sensory effects as first-class entities, enabling multimedia applications to synchronize sensorial media to interactive audiovisual content in a high-level specification, making complete descriptions of mulsemedia applications possible with multimedia models and languages.
Abstract: Multimedia applications are usually composed of audiovisual content. Traditional multimedia conceptual models, and consequently declarative multimedia authoring languages, do not support the definition of multiple sensory effects. Multiple sensorial media (mulsemedia) applications consider the use of sensory effects that can stimulate touch, smell and taste, in addition to hearing and sight. Therefore, mulsemedia applications have usually been developed using general-purpose programming languages. In order to fill this gap, this paper proposes an approach for modeling sensory effects as first-class entities, enabling multimedia applications to synchronize sensorial media to interactive audiovisual content in a high-level specification. Thus, complete descriptions of mulsemedia applications become possible with multimedia models and languages. In order to validate our ideas, an interactive mulsemedia application example is presented and specified with NCL (Nested Context Language) and Lua. Lua components are used for translating sensory effect high-level attributes to MPEG-V SEM (Sensory Effect Metadata) files. A sensory effect simulator was developed to receive SEM files and simulate mulsemedia application rendering.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This work builds on the FOVE's Unity API to design a gaze-adaptive streaming system using one low- and one high-resolution segment from which the foveal region is cropped with per-frame filters, which is permitted in none of the current proposals.
Abstract: While Virtual Reality (VR) represents a revolution in the user experience, current VR systems are flawed in different aspects. The difficulty of focusing naturally in current headsets incurs visual discomfort and cognitive overload, while high-end headsets require tethered, powerful hardware for scene synthesis. One of the major solutions envisioned to address these problems is foveated rendering. We consider the problem of streaming stored 360° videos to a VR headset equipped with eye-tracking and foveated rendering capabilities. Our long-term research goal is to build high-performing foveated streaming systems that allow the playback buffer to build up and absorb network variations, which none of the current proposals permit. We present our foveated streaming prototype based on the FOVE, one of the first commercially available headsets with an integrated eye-tracker. We build on the FOVE's Unity API to design a gaze-adaptive streaming system using one low- and one high-resolution segment from which the foveal region is cropped with per-frame filters. The low- and high-resolution frames are then merged at the client to approach the natural focusing process.

Proceedings ArticleDOI
12 Jun 2018
TL;DR: A new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC), reducing the network bandwidth as well as the computational complexity on the decoder side.
Abstract: The computing power and bandwidth available to current VR systems are limited compared to what high-quality VR requires. To overcome these limits, this study proposes a new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC). The proposed SHVC and HEVC encoders generate a bitstream whose tiles can be transmitted independently; therefore, the bitstream generated by the proposed encoder can be extracted in units of tiles. In accordance with what is discussed in the standard, the proposed extractor extracts the bitstream of the tiles corresponding to the viewport. The SHVC video bitstream extracted by the proposed method consists of (i) an SHVC base layer (BL) which represents the entire 360-degree area and (ii) an SHVC enhancement layer (EL) for selective streaming of viewport (region of interest (ROI)) tiles. When the proposed HEVC encoder is used, low- and high-resolution sequences are separately encoded as the BL and EL of SHVC. By streaming the BL (low resolution) and selected EL (high resolution) tiles within the ROI instead of streaming the whole high-quality 360-degree video, the proposed method can reduce the network bandwidth as well as the computational complexity on the decoder side. Experimental results show more than 47% bandwidth reduction.
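A back-of-the-envelope sketch of why streaming a low-resolution base layer plus only the viewport's enhancement-layer tiles saves bandwidth follows. The bitrates and tile counts are invented for illustration; the paper reports more than 47% reduction for its own content and configuration.

```python
# Illustrative bandwidth arithmetic with assumed bitrates and tile counts.
bl_kbps = 4000            # whole 360-degree scene at low resolution (assumed)
el_kbps_full = 16000      # whole scene at high resolution (assumed)
tiles_total, tiles_in_viewport = 12, 4

full_quality = el_kbps_full
viewport_dependent = bl_kbps + el_kbps_full * tiles_in_viewport / tiles_total
print(f"{(1 - viewport_dependent / full_quality) * 100:.0f}% bandwidth reduction")  # ~42% here
```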

Proceedings ArticleDOI
12 Jun 2018
TL;DR: This paper provides a data set with a larger number of LTE network traces than any publicly available data set, along with an Android app to gather further traces and R scripts to clean, sort, and analyze the data.
Abstract: Mobile networks, especially LTE networks, are used more and more for high-bandwidth services like multimedia or video streams. The quality of the data connection plays a major role in the perceived quality of a service. Videos may be presented in low quality or experience many stalling events when the connection is too slow to buffer the next frames for playback. So far, no publicly available data set exists that contains a larger number of LTE network traces and can be used for deeper analysis. In this data set, we provide 546 traces of 5 minutes each with a sample rate of 100 ms. Thereof, 377 traces are pure LTE data. We furthermore provide an Android app to gather further traces as well as R scripts to clean, sort, and analyze the data.