
Showing papers presented at "ACM SIGMM Conference on Multimedia Systems in 2017"


Proceedings ArticleDOI
20 Jun 2017
TL;DR: KVASIR is a dataset of images from inside the gastrointestinal (GI) tract, annotated by experienced endoscopists, covering anatomical landmarks, clinically significant findings, and two categories of images related to endoscopic polyp removal; it supports research on both single- and multi-disease computer-aided detection.
Abstract: Automatic detection of diseases by use of computers is an important, but still unexplored, field of research. Such innovations may improve medical practice and refine health care systems all over the world. However, datasets containing medical images are hardly available, making reproducibility and comparison of approaches almost impossible. In this paper, we present KVASIR, a dataset containing images from inside the gastrointestinal (GI) tract. The collection of images is classified into three important anatomical landmarks and three clinically significant findings. In addition, it contains two categories of images related to endoscopic polyp removal. Sorting and annotation of the dataset is performed by medical doctors (experienced endoscopists). In this respect, KVASIR is important for research on both single- and multi-disease computer-aided detection. By providing it, we invite and enable multimedia researchers to enter the medical domain of detection and retrieval.
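
For readers who want to experiment, a dataset organized as one folder per class can be loaded in a few lines; the directory name and layout below are assumptions for illustration, not KVASIR's documented structure:

```python
# Minimal sketch: load a class-per-folder image collection with torchvision.
import torch
from torchvision import datasets, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # endoscopic frames vary in resolution
    transforms.ToTensor(),
])

# Hypothetical path; one subfolder per class, e.g. "polyps/", "z-line/"
dataset = datasets.ImageFolder("kvasir-dataset/", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

print(dataset.classes)             # class names inferred from folder names
images, labels = next(iter(loader))
print(images.shape, labels.shape)  # e.g. [32, 3, 224, 224] and [32]
```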

351 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: A dataset of head movements of users watching 360-degree videos on a Head-Mounted Display (HMD) is introduced and some examples of statistics that can be extracted from the collected data, for a content-dependent analysis of users' navigation patterns.
Abstract: While Virtual Reality applications are increasingly attracting the attention of developers and business analysts, the behaviour of users watching 360-degree (i.e., omnidirectional) videos has not been thoroughly studied yet. This paper introduces a dataset of head movements of users watching 360-degree videos on a Head-Mounted Display (HMD). The dataset includes data collected from 59 users watching five 70-second-long 360-degree videos on the Razer OSVR HDK2 HMD. The selected videos span a wide range of 360-degree content for which different viewer involvement, and thus different navigation patterns, could be expected. We describe the open-source software developed to produce the dataset and present the test material and viewing conditions considered during the data acquisition. Finally, we show some examples of statistics that can be extracted from the collected data, for a content-dependent analysis of users' navigation patterns. The source code of the software used to collect the data has been made publicly available, together with the entire dataset, to enable the community to extend the dataset.
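
As one illustration of the navigation statistics such a head-movement trace supports, the sketch below converts logged unit quaternions into yaw/pitch viewing angles. The axis conventions and the example "fraction of time near the equator" statistic are assumptions for illustration, not the dataset's documented schema:

```python
import numpy as np

def quat_rotate(q, v):
    # Rotate vector v by unit quaternion q = (w, x, y, z).
    w, x, y, z = q
    u = np.array([x, y, z], dtype=float)
    return (2.0 * np.dot(u, v) * u
            + (w * w - np.dot(u, u)) * v
            + 2.0 * w * np.cross(u, v))

FORWARD = np.array([0.0, 0.0, -1.0])  # assumed "straight ahead" axis

def yaw_pitch_deg(q):
    d = quat_rotate(q, FORWARD)
    yaw = np.degrees(np.arctan2(d[0], -d[2]))                 # longitude
    pitch = np.degrees(np.arcsin(np.clip(d[1], -1.0, 1.0)))   # latitude
    return yaw, pitch

# e.g. fraction of samples within +/-30 degrees of the equator
samples = [np.array([1.0, 0.0, 0.0, 0.0])]  # identity orientation, for illustration
pitches = np.array([yaw_pitch_deg(q)[1] for q in samples])
print((np.abs(pitches) <= 30).mean())
```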

238 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: The parameters and characteristics of a dataset for omnidirectional video are proposed and instantiated by example to evaluate various aspects of such an ecosystem, namely bitrate overhead, bandwidth requirements, and quality aspects in terms of viewport PSNR.
Abstract: Real-time entertainment services such as streaming audiovisual content deployed over the open, unmanaged Internet now account for more than 70% of traffic during peak periods. More and more such bandwidth-hungry applications and services are being proposed, including immersive media services such as virtual reality and, specifically, omnidirectional/360-degree video. The adaptive streaming of omnidirectional video over HTTP imposes an important challenge on today's video delivery infrastructures, which calls for dedicated, thoroughly designed techniques for content generation, delivery, and consumption.

This paper describes the usage of tiles --- as specified within modern video codecs such as HEVC/H.265 and VP9 --- enabling bandwidth-efficient adaptive streaming of omnidirectional video over HTTP, and we define various streaming strategies. To this end, the parameters and characteristics of a dataset for omnidirectional video are proposed and instantiated by example to evaluate various aspects of such an ecosystem, namely bitrate overhead, bandwidth requirements, and quality aspects in terms of viewport PSNR. The results indicate bitrate savings from 40% (in a realistic scenario with recorded head movements from real users) up to 65% (in an ideal scenario with a centered/fixed viewport) and serve as a baseline and guidelines for advanced techniques, including the outline of a research roadmap for the near future.
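
To make the tiling trade-off concrete, here is a minimal sketch of viewport-driven tile selection over an assumed equirectangular tile grid; the grid size, field of view, per-tile bitrates, and the coarse corner-sampling heuristic are illustrative assumptions, not the paper's streaming strategies:

```python
# Sketch: stream viewport tiles at high quality, the rest at low quality.
COLS, ROWS = 6, 4                      # assumed equirectangular tile grid
FOV_YAW, FOV_PITCH = 100.0, 100.0      # assumed HMD field of view (degrees)

def visible_tiles(yaw_deg, pitch_deg):
    """Tiles a viewport centred at (yaw, pitch) may overlap (corner sampling)."""
    tiles = set()
    for dy in (-FOV_YAW / 2, 0, FOV_YAW / 2):
        for dp in (-FOV_PITCH / 2, 0, FOV_PITCH / 2):
            y = (yaw_deg + dy) % 360
            p = max(-89.9, min(89.9, pitch_deg + dp))
            tiles.add((int((p + 90) / (180 / ROWS)),   # row
                       int(y / (360 / COLS))))          # column
    return tiles

HI, LO = 1.0, 0.25                     # relative per-tile bitrates (assumed)
vis = visible_tiles(0, 0)
tiled = len(vis) * HI + (COLS * ROWS - len(vis)) * LO
full = COLS * ROWS * HI                # monolithic full-quality baseline
print(f"tiled/full bitrate ratio: {tiled / full:.2f}")
```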

194 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: This paper presents datasets of both content data (such as image saliency maps and motion maps derived from 360° videos) and sensor data ( such as viewer head positions and orientations derived from HMD sensors) that can be used to optimize existing 360° video streaming applications and novel applications (like crowd-driven camera movements).
Abstract: 360° videos and Head-Mounted Displays (HMDs) are getting increasingly popular. However, streaming 360° videos to HMDs is challenging. This is because only video content in viewers' Field-of-Views (FoVs) is rendered, and thus sending complete 360° videos wastes resources, including network bandwidth, storage space, and processing power. Optimizing 360° video streaming to HMDs is, however, highly data and viewer dependent, and thus calls for real datasets. However, to the best of our knowledge, such datasets are not available in the literature. In this paper, we present our datasets of both content data (such as image saliency maps and motion maps derived from 360° videos) and sensor data (such as viewer head positions and orientations derived from HMD sensors). We put extra effort into aligning the content and sensor data using the timestamps in the raw log files. The resulting datasets can be used by researchers, engineers, and hobbyists to optimize both existing 360° video streaming applications (like rate-distortion optimization) and novel applications (like crowd-driven camera movements). We believe that our dataset will stimulate more research activities along this exciting new research direction.
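
The timestamp alignment the authors describe boils down to matching each content-side sample with the sensor-side sample nearest in time. A minimal sketch, assuming both logs are sorted timestamp arrays (the field names are invented for illustration):

```python
import numpy as np

def align_nearest(sensor_t, sensor_vals, content_t):
    """For each content timestamp, pick the sensor sample nearest in time.
    Both timestamp arrays must be sorted ascending."""
    idx = np.searchsorted(sensor_t, content_t)
    idx = np.clip(idx, 1, len(sensor_t) - 1)
    left, right = sensor_t[idx - 1], sensor_t[idx]
    idx -= (content_t - left) < (right - content_t)  # step back if left is closer
    return sensor_vals[idx]

sensor_t = np.array([0.00, 0.11, 0.21, 0.33])   # e.g. HMD orientation log (s)
yaw = np.array([0.0, 2.0, 5.0, 9.0])
content_t = np.array([0.10, 0.30])              # e.g. saliency-map frame times
print(align_nearest(sensor_t, yaw, content_t))  # -> [2. 9.]
```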

193 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: This paper presents a head tracking dataset composed of 48 users watching 18 sphere videos from 5 categories, and shows that people share certain common patterns in VR spherical video streaming, which are different from conventional video streaming.
Abstract: With Virtual Reality (VR) devices and content getting increasingly popular, understanding user behaviors in virtual environments is important not only for VR product design but also for user experience improvement. In VR applications, head movement is one of the most important user behaviors, as it can reflect a user's visual attention, preference, and even unique motion pattern. However, to the best of our knowledge, no dataset containing this information is publicly available. In this paper, we present a head tracking dataset composed of 48 users (24 males and 24 females) watching 18 sphere videos from 5 categories. We carefully record how users watch the videos, how their heads move in each session, which directions they focus on, and what content they can remember after each session. Based on this dataset, we show that people share certain common patterns in VR spherical video streaming, which are different from those in conventional video streaming. We believe the dataset can serve as a good resource for exploring user behavior patterns in VR applications.

191 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: A dataset of sixty different 360-degree images, each watched by at least 40 observers, is presented, along with guidelines and tools regarding the procedure to evaluate and compare saliency in omnidirectional images.
Abstract: Understanding how observers watch visual stimuli like images and videos has immensely helped the multimedia encoding, transmission, quality assessment, and rendering communities to learn which regions are important to an observer and to provide him/her an optimal quality of experience. The problem is even more pronounced in the case of 360-degree stimuli, considering that most or part of the content might not be seen by the observers at all, while other regions may be extraordinarily important. Attention studies in this area have, however, been missing, mainly due to the lack of a dataset and guidelines to evaluate and compare visual attention/saliency in such scenarios. In this work, we present a dataset of sixty different 360-degree images, each watched by at least 40 observers. Additionally, we provide guidelines and tools to the community regarding the procedure to evaluate and compare saliency in omnidirectional images. Some basic image- and observer-agnostic viewing characteristics, like the variation of exploration strategies with time and expertise, as well as the effect of eye movement within the viewport, are explored. The dataset and tools are made available for free use by the community and are expected to promote reproducible research for all future work on computational modeling of attention in 360-degree scenarios.
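
One subtlety when comparing saliency maps on equirectangular 360-degree images is that pixel area shrinks toward the poles, so metrics should weight pixels by solid angle. A sketch of a latitude-weighted correlation coefficient under that assumption (not necessarily the metric implemented in the paper's tools):

```python
import numpy as np

def weighted_cc(s1, s2):
    """Pearson correlation between two equirectangular saliency maps,
    weighting each row by cos(latitude) ~ per-pixel solid angle."""
    h, w = s1.shape
    lat = (np.arange(h) + 0.5) / h * np.pi - np.pi / 2   # -pi/2 .. pi/2
    wgt = np.repeat(np.cos(lat)[:, None], w, axis=1)
    wgt /= wgt.sum()
    m1, m2 = (s1 * wgt).sum(), (s2 * wgt).sum()
    cov = (wgt * (s1 - m1) * (s2 - m2)).sum()
    v1 = (wgt * (s1 - m1) ** 2).sum()
    v2 = (wgt * (s2 - m2) ** 2).sum()
    return cov / np.sqrt(v1 * v2 + 1e-12)

a = np.random.rand(64, 128)
print(weighted_cc(a, a))   # -> ~1.0 for identical maps
```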

167 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: OpenFace is an open-source face recognition system that approaches state-of-the-art accuracy; integrated with inter-frame tracking, it yields RTFace, a face-denaturing mechanism for live video streams that selectively blurs faces according to specified policies at full frame rates.
Abstract: We present OpenFace, our new open-source face recognition system that approaches state-of-the-art accuracy. Integrating OpenFace with inter-frame tracking, we build RTFace, a mechanism for denaturing video streams that selectively blurs faces according to specified policies at full frame rates. This enables privacy management for live video analytics while providing a secure approach for handling retrospective policy exceptions. Finally, we present a scalable, privacy-aware architecture for large camera networks using RTFace.

103 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: This paper reviews standard approaches toward 360-degree video encoding and compares them to a new, as yet unpublished, approach by Oculus, referred to as the offset cubic projection, which can produce better or similar visual quality while using less than 50% of the pixels under reasonable assumptions about user behavior.
Abstract: 360 degree video is a new generation of video streaming technology that promises greater immersiveness than standard video streams. This level of immersiveness is similar to that produced by virtual reality devices -- users can control the field of view using head movements rather than needing to manipulate external devices. Although 360 degree video could revolutionize streaming technology, large-scale adoption is hindered by a number of factors. 360 degree video streams have larger bandwidth requirements, require faster responsiveness to user inputs, and users may be more sensitive to lower-quality streams.

In this paper, we review standard approaches toward 360 degree video encoding and compare these to a new, as yet unpublished, approach by Oculus which we refer to as the offset cubic projection. Compared to the standard cubic encoding, the offset cube encodes a distorted version of the spherical surface, devoting more information (i.e., pixels) to the view in a chosen direction. We estimate that the offset cube representation can produce better or similar visual quality while using less than 50% of the pixels under reasonable assumptions about user behavior, resulting in 5.6% to 16.4% average savings in video bitrate. During 360 degree video streaming, Oculus uses a combination of quality level adaptation and view orientation adaptation. We estimate that this combination of streaming adaptation in two dimensions can cause over 57% extra segments to be downloaded compared to an ideal downloading strategy, wasting 20% of the total downloading bandwidth.

103 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: Nerthus, a dataset containing videos from inside the gastrointestinal (GI) tract showing different degrees of bowel cleansing, is presented, inviting multimedia researchers to contribute to the medical field by building systems that automatically evaluate the quality of bowel cleansing for colonoscopy.
Abstract: Bowel preparation (cleansing) is considered to be a key precondition for successful colonoscopy (endoscopic examination of the bowel). The degree of bowel cleansing directly affects the possibility to detect diseases and may influence decisions on screening and follow-up examination intervals. An accurate assessment of bowel preparation quality is therefore important. Despite the use of reliable and validated bowel preparation scales, the grading may vary from one doctor to another. An objective and automated assessment of bowel cleansing would help reduce such inequalities and optimize the use of medical resources. This would also be a valuable feature for automatic endoscopy reporting in the future. In this paper, we present Nerthus, a dataset containing videos from inside the gastrointestinal (GI) tract, showing different degrees of bowel cleansing. By providing this dataset, we invite multimedia researchers to contribute to the medical field by building systems that automatically evaluate the quality of bowel cleansing for colonoscopy. Such innovations would likely contribute to improving the medical field of GI endoscopy.

66 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: BUFFEST, a novel classification framework that can be used to classify and predict streaming clients' buffer conditions from both HTTP and HTTPS traffic, is presented; results are encouraging and show that BUFFEST can distinguish streaming clients with low buffer conditions from clients with significant buffer margin during a session even when HTTPS is used.
Abstract: Stalls during video playback are perhaps the most important indicator of a client's viewing experience. To provide the best possible service, a proactive network operator may therefore want to know the buffer conditions of streaming clients and use this information to help avoid stalls due to empty buffers. However, estimation of clients' buffer conditions is complicated by most streaming services being rate-adaptive, and many of them also encrypted. Rate adaptation reduces the correlation between network throughput and client buffer conditions. Usage of HTTPS prevents operators from observing information related to video chunk requests, such as indications of rate adaptation or other HTTP-level information.

This paper presents BUFFEST, a novel classification framework that can be used to classify and predict streaming clients' buffer conditions from both HTTP and HTTPS traffic. To illustrate the tradeoffs between prediction accuracy and the available information used by classifiers, we design and evaluate classifiers of different complexity. At the core of BUFFEST is an event-based buffer emulator module for detailed analysis of clients' buffer levels throughout a streaming session, as well as for automated training and evaluation of online packet-level classifiers. We then present example results using simple threshold-based classifiers and machine learning classifiers that only use TCP/IP packet-level information. Our results are encouraging and show that BUFFEST can distinguish streaming clients with low buffer conditions from clients with significant buffer margin during a session even when HTTPS is used.
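
The abstract does not spell out the emulator's mechanics, but the basic bookkeeping an event-based buffer emulator performs can be sketched as follows, assuming playback starts at the first chunk's completion and the buffer drains in real time:

```python
def emulate_buffer(completion_times, chunk_dur_s):
    """Buffer grows by one chunk at each download completion and drains
    while playing; draining below zero counts as a stall."""
    levels, buf, prev, stalls = [], 0.0, None, 0
    for t in completion_times:
        if prev is not None:
            drained = t - prev
            if drained > buf:
                stalls += 1            # buffer ran dry before this chunk arrived
            buf = max(0.0, buf - drained)
        buf += chunk_dur_s
        levels.append((t, buf))
        prev = t
    return levels, stalls

levels, stalls = emulate_buffer([1.0, 2.1, 3.0, 7.5], chunk_dur_s=2.0)
print(levels, stalls)   # the 4.5 s gap before t=7.5 causes one stall
```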

57 citations


Proceedings ArticleDOI
20 Jun 2017
TL;DR: Results from a European Horizon 2020 research project on the impact of multisensorial media (mulsemedia) on educational learner experience show significant improvements for mulsemedia-enhanced teaching.
Abstract: In recent years, emerging immersive technologies (e.g. Virtual/Augmented Reality, multisensorial media) bring brand-new multi-dimensional effects such as 3D vision, immersion, vibration, smell, airflow, etc. to gaming, video entertainment, and other aspects of human life. This paper reports results from a European Horizon 2020 research project on the impact of multisensorial media (mulsemedia) on educational learner experience. A mulsemedia-enhanced test-bed was developed to deliver video content enhanced with haptic, olfaction, and airflow effects. The results of the quality ratings and questionnaires show significant improvements for mulsemedia-enhanced teaching.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: Measurements show a substantial improvement in cache hit rates in conjunction with SABR, indicating a rich design space for jointly optimized SDN-assisted caching architectures for video streaming applications.
Abstract: State-of-the-art Software Defined Wide Area Networks (SD-WANs) provide the foundation for flexible and highly resilient networking. In this work we design, implement, and evaluate a novel architecture (denoted SABR) that leverages the benefits of SDN to provide network-assisted Adaptive Bitrate Streaming. With clients retaining full control of their streaming algorithms, we clearly show that through this network assistance, both the clients and the content providers benefit significantly in terms of QoE and content origin offloading. SABR utilizes information on available bandwidths per link and network cache contents to guide video streaming clients with the goal of improving the viewer's QoE. In addition, SABR uses SDN capabilities to dynamically program flows to optimize the utilization of CDN caches.

Backed by our study of SDN-assisted streaming, we discuss the change in the requirements for network-to-player APIs that enables flexible video streaming. We illustrate the difficulty of the problem and the impact of SDN-assisted streaming on QoE metrics using various well-established player algorithms. We evaluate SABR together with state-of-the-art DASH quality adaptation algorithms through a series of experiments performed on a real-world, SDN-enabled testbed network with minimal modifications to an existing DASH client. Our measurements show a substantial improvement in cache hit rates in conjunction with SABR, indicating a rich design space for jointly optimized SDN-assisted caching architectures for video streaming applications.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: Under limited bandwidth conditions, this work demonstrates how the framework can improve the quality watched by the user compared to a non-tiled solution where all of the video is streamed at the same quality.
Abstract: The demand for 360° Virtual Reality (VR) videos is expected to grow in the near future, thanks to the diffusion of VR headsets. VR streaming is however challenged by the high bandwidth requirements of 360° videos. To save bandwidth, we spatially tile the video using the H.265 standard and stream only tiles in view at the highest quality. The video is also temporally segmented, so that each temporal segment is composed of several spatial tiles. In order to minimize quality transitions when the user moves, an algorithm is developed to predict where the user is likely going to watch in the near future. Consequently, predicted tiles are also streamed at the highest quality. Finally, server push in HTTP/2 is used to deliver the tiled video. Only one request is sent from the client; all the tiles of a segment are automatically pushed from the server. This approach results in better bandwidth utilization and video quality compared to traditional streaming over HTTP/1.1, where each tile has to be requested independently by the client. We showcase the benefits of our framework using a prototype developed on a Samsung Galaxy S7 and a Gear VR, which supports both tiled and non-tiled videos and streaming over HTTP/1.1 and HTTP/2. Under limited bandwidth conditions, we demonstrate how our framework can improve the quality watched by the user compared to a non-tiled solution where all of the video is streamed at the same quality. This result represents a major improvement for the efficient streaming of VR videos.
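
The paper's prediction algorithm is not detailed in the abstract; a common baseline for this task is linear extrapolation of recent head orientation, sketched below (the sample format and prediction horizon are assumptions):

```python
def predict_yaw(samples, horizon_s):
    """Linearly extrapolate yaw from the two most recent samples.
    samples: [(time_s, yaw_deg)] sorted by time."""
    (t0, y0), (t1, y1) = samples[-2], samples[-1]
    dy = (y1 - y0 + 180) % 360 - 180        # shortest angular difference
    rate = dy / (t1 - t0)                   # deg/s
    return (y1 + rate * horizon_s) % 360

# e.g. the user crossed the 0-degree wrap-around moving at +40 deg/s
print(predict_yaw([(0.0, 350.0), (0.5, 10.0)], horizon_s=1.0))  # -> 50.0
```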

Proceedings ArticleDOI
20 Jun 2017
TL;DR: The expected latency reduction from near-future technological development is studied, showing that its potential impact on end-to-end latency is bigger than that of service replication and server placement optimization.
Abstract: Cloud gaming is a relatively new paradigm in which the game is rendered in the cloud and is streamed to an end-user device through a thin client. Latency is a key challenge for cloud gaming. In order to optimize the end-to-end latency, it is first necessary to understand how the end-to-end latency builds up from the mobile device to the cloud gaming server. In this paper we dissect the delays occurring in the mobile device and measure access delays in various networks and network conditions. We also perform a Europe-wide latency measurement study to find the optimal server locations and see how the number of server locations affects the network delay. The results are compared to limits found for perceivable delays in recent human-computer interaction studies. We show that the limits can be achieved only with the latest mobile devices with specific control methods. In addition, we study the expected latency reduction from near-future technological development and show that its potential impact on end-to-end latency is bigger than that of service replication and server placement optimization.
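
A toy decomposition of where cloud-gaming latency accumulates makes the conclusion intuitive: server replication shrinks only the network terms, while device-side advances attack the capture and display terms. The numbers below are placeholders, not the paper's measurements:

```python
# Illustrative end-to-end latency budget for cloud gaming (placeholder values).
components_ms = {
    "input capture (device)": 15,
    "uplink (access network)": 20,
    "server processing + rendering": 25,
    "encode": 10,
    "downlink (access network)": 20,
    "decode + display (device)": 25,
}
total = sum(components_ms.values())
network = (components_ms["uplink (access network)"]
           + components_ms["downlink (access network)"])
print(f"end-to-end: {total} ms, of which network: {network} ms "
      f"({100 * network / total:.0f}%)")  # replication only shrinks this part
```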

Proceedings ArticleDOI
20 Jun 2017
TL;DR: To the best of the authors' knowledge, DroneFace is the only dataset including facial images taken from controlled distances and heights within an unconstrained environment, and it can be valuable for future study of integrating face recognition techniques onto drones.
Abstract: In this paper, we present DroneFace, an open dataset for testing how well face recognition can work on drones. Because of their high mobility, drones, i.e. unmanned aerial vehicles (UAVs), are appropriate for surveillance, daily patrol or seeking lost people on the streets, and thus need the capability of tracking human targets' faces from the air. In this context, drones' distances and heights from the targets influence the accuracy of face recognition. In order to test whether a face recognition technique is suitable for drones, we establish DroneFace, composed of facial images taken from various combinations of distances and heights, for evaluating how a face recognition technique works in recognizing designated faces from the air. Face recognition is one of the most successful applications of image analysis and understanding, and many face recognition databases exist for various purposes. To the best of our knowledge, DroneFace is the only dataset including facial images taken from controlled distances and heights within an unconstrained environment, and it can be valuable for future study of integrating face recognition techniques onto drones.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: AdViSE, the Adaptive Video Streaming Evaluation framework for the automated testing of adaptive media players is introduced and the real-time capabilities of the framework and offline analysis including several QoE metrics with respect to a newly introduced bandwidth index are demonstrated.
Abstract: Today we can observe a plethora of adaptive video streaming services and media players which support interoperable formats like DASH and HLS. Most of the players and their rate adaptation algorithms work as a black box. We have developed a system for easy and rapid testing of media players under various network scenarios. In this paper, we introduce AdViSE, the Adaptive Video Streaming Evaluation framework for the automated testing of adaptive media players. The presented framework is used for the comparison and testing of media players in the context of adaptive video streaming over HTTP in web/HTML5 environments.

The demonstration showcases a series of experiments with different media players under given context conditions (e.g., network shaping, delivery format). We will also demonstrate the real-time capabilities of the framework and offline analysis including several QoE metrics with respect to a newly introduced bandwidth index.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: Hyperion, a Wearable Augmented Reality (WAR) system based on Google Glass for accessing text information in the ambient environment, is developed; system experiments show that Hyperion improves users' ability to be aware of text information around them.
Abstract: We develop Hyperion, a Wearable Augmented Reality (WAR) system based on Google Glass for accessing text information in the ambient environment. Hyperion is able to retrieve text content from users' current view and deliver the content to them in different ways according to their context. We design four work modalities for different situations that mobile users encounter in their daily activities. In addition, user interaction interfaces are provided to adapt to different application scenarios. Although Google Glass may be constrained by its poor computational capabilities and its limited battery capacity, we utilize code-level offloading to companion mobile devices to improve the runtime performance and the sustainability of WAR applications. System experiments show that Hyperion improves users' ability to be aware of text information around them. Our prototype indicates the promising potential of converging WAR technology and wearable devices such as Google Glass to improve people's daily activities.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: It is shown that, with careful design, a thin client is capable of offloading much of the AR processing to a cloud server, with the results being streamed back; experiments show substantial energy savings, low latency, and excellent image quality even at relatively low bit-rates.
Abstract: Combining advanced sensors and powerful processing capabilities, smartphone-based augmented reality (AR) is becoming increasingly prolific. The increase in prominence of these resource-hungry AR applications poses significant challenges to energy-constrained environments such as mobile phones.

To that end, we present a platform for offloading AR applications to powerful cloud servers. We implement this system using a thin-client design and explore its performance using the real-world application Pokemon Go as a case study. We show that with careful design a thin client is capable of offloading much of the AR processing to a cloud server, with the results being streamed back. Our initial experiments show substantial energy savings, low latency and excellent image quality even at relatively low bit-rates.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: The experimental evaluation proves that the multimedia system presented has detection and localisation accuracy at least as good as existing systems for polyp detection, is capable of detecting a wider range of diseases, can analyze video in real-time, and has a low resource consumption for scalability.
Abstract: Analysis of medical videos for detection of abnormalities and diseases requires not only high precision and recall, but also real-time processing for live feedback and scalability for massive screening of entire populations. Existing work in this field does not provide the necessary combination of retrieval accuracy and performance.

In this paper, a multimedia system is presented whose aim is to tackle automatic analysis of videos from the human gastrointestinal (GI) tract. The system includes the whole pipeline from data collection, through processing and analysis, to visualization. The system combines filters using machine learning, image recognition, and extraction of global and local image features. Furthermore, it is built in a modular way so that it can easily be extended. At the same time, it is developed for efficient processing in order to provide real-time feedback to the doctors. Our experimental evaluation proves that our system has detection and localisation accuracy at least as good as existing systems for polyp detection, is capable of detecting a wider range of diseases, can analyze video in real-time, and has a low resource consumption for scalability.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: This paper presents SAP as a DASH video traffic management solution that reduces playback stalls and seeks to maintain a consistent QoE for cellular users, even those with diverse channel conditions, by leveraging both network and client state information to optimize the pacing of individual video flows.
Abstract: The dramatic growth of cellular video traffic represents a practical challenge for cellular network operators in providing a consistent streaming Quality of Experience (QoE) to their users. Satisfying this objective has so far proved elusive, due to the inherent system complexities that degrade streaming performance, such as variability in both video bitrate and network conditions. In this paper, we present SAP, a DASH video traffic management solution that reduces playback stalls and seeks to maintain a consistent QoE for cellular users, even those with diverse channel conditions. SAP achieves this by leveraging both network and client state information to optimize the pacing of individual video flows. We extensively evaluate SAP performance using real video content and clients, operating over a simulated LTE network. We implement state-of-the-art client adaptation and traffic management strategies for direct comparison. Our results, using a heavily loaded base station, show that SAP reduces the number of stalls and the average stall duration per session by up to 95%. Additionally, SAP ensures that clients with good channel conditions do not dominate available wireless resources, evidenced by a reduction of up to 40% in the standard deviation of the QoE metric.
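
The abstract does not disclose SAP's pacing rule. Purely as an illustration of per-flow pacing driven by client state, one could imagine a policy like the following, where every name and constant is an assumption rather than SAP's actual algorithm:

```python
def pacing_rate_bps(video_bitrate_bps, est_buffer_s,
                    target_buffer_s=20.0, headroom=1.2):
    """Pace a DASH flow near its media bitrate, boosting clients whose
    estimated buffer is low and throttling those with ample margin, so
    good-channel clients cannot monopolise the air interface."""
    urgency = min(2.0, max(0.5, target_buffer_s / max(est_buffer_s, 1.0)))
    return video_bitrate_bps * headroom * urgency

print(pacing_rate_bps(2_000_000, est_buffer_s=5.0))   # low buffer -> faster pace
print(pacing_rate_bps(2_000_000, est_buffer_s=40.0))  # big margin -> throttled
```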

Proceedings ArticleDOI
20 Jun 2017
TL;DR: This work proposes to automatically generate device fingerprints based on webpages embedded in surveillance devices, using natural language processing and machine learning to build a classification model, and achieves real-time, non-intrusive web crawling by leveraging network scanning technology.
Abstract: Surveillance devices with IP addresses are accessible on the Internet and play a crucial role in monitoring physical worlds. Discovering surveillance devices is a prerequisite for ensuring high availability, reliability, and security of these devices. However, today's device search depends on keywords of packet header fields, and keyword collection is done manually, which requires enormous human effort and induces inevitable human errors. The difficulty of keeping keywords complete and updated has severely impeded accurate and large-scale device discovery. To address this problem, we propose to automatically generate device fingerprints based on webpages embedded in surveillance devices. We use natural language processing to extract the content of webpages and machine learning to build a classification model. We achieve real-time and non-intrusive web crawling by leveraging network scanning technology. We implement a prototype of our proposed discovery system and evaluate its effectiveness through real-world experiments. The experimental results show that the automatically generated fingerprints yield very high accuracy of 99% precision and 96% recall. We also deploy the prototype system on Amazon EC2 and search for surveillance devices in the whole IPv4 space (nearly 4 billion addresses). The number of devices we found is almost 1.6 million, about twice as many as those found using commercial search engines.
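
The text-classification core of such a pipeline is standard. A minimal sketch with TF-IDF features and logistic regression; the toy corpus and labels are invented for illustration, and the paper's actual feature extraction and model may differ:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled corpus of webpage text (real training data would come from
# crawled device pages); 1 = surveillance device, 0 = other web server.
pages = [
    "network camera live view stream ptz settings",
    "ip camera login snapshot video configuration",
    "welcome to my personal blog archive posts",
    "online shop cart checkout products sale",
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(pages, labels)
print(clf.predict(["dvr channel live stream camera setup"]))  # -> [1]
```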

Proceedings ArticleDOI
20 Jun 2017
TL;DR: This paper develops several base and fusion recommender systems that generate in real-time a short list of channels for users to consider whenever they want to switch channels, and can be easily adopted by IPTV systems with low data and computation overheads.
Abstract: Compared with traditional television services, Internet Protocol TV (IPTV) can provide far more TV channels to end users. However, it may also make it confusing, even painful, for users to find channels of interest among so many options. In this paper, using a large IPTV trace, we analyze user channel-switching behaviors to understand when, why, and how they switch channels. Based on this user behavior analysis, we develop several base and fusion recommender systems that generate, in real-time, a short list of channels for users to consider whenever they want to switch channels. Evaluation on the IPTV trace demonstrates that our recommender systems can achieve up to a 45 percent hit ratio with only three candidate channels. Our recommender systems only need access to user channel-watching sequences, and can be easily adopted by IPTV systems with low data and computation overheads.
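
Since the systems only need channel-watching sequences, a natural baseline is a first-order transition model: count which channel tends to follow which, and recommend the top-k successors. A sketch of that baseline and its hit-ratio evaluation (not the paper's fusion recommenders):

```python
from collections import Counter, defaultdict

def train(sequences):
    """Count channel-to-channel switch transitions."""
    trans = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            trans[a][b] += 1
    return trans

def recommend(trans, current, k=3):
    return [ch for ch, _ in trans[current].most_common(k)]

def hit_ratio(trans, sequences, k=3):
    hits = total = 0
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            total += 1
            hits += b in recommend(trans, a, k)
    return hits / total

sessions = [["news", "sports", "movie"], ["news", "sports", "kids"]]
model = train(sessions)
print(recommend(model, "news"), hit_ratio(model, sessions))
```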

Proceedings ArticleDOI
20 Jun 2017
TL;DR: The engineering process for a prototype of a (near) web-scale multimedia service using the Spark framework running on the AWS cloud service is described, and experimental results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection are presented, making this the largest high-dimensional feature vector collection reported in the literature.
Abstract: Computing power has now become abundant with multi-core machines, grids and clouds, but it remains a challenge to harness the available power and move towards gracefully handling web-scale datasets. Several researchers have used automatically distributed computing frameworks, notably Hadoop and Spark, for processing multimedia material, but mostly using small collections on small clusters. In this paper, we describe the engineering process for a prototype of a (near) web-scale multimedia service using the Spark framework running on the AWS cloud service. We present experimental results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection, making this the largest high-dimensional feature vector collection reported in the literature. The design of the prototype and performance results demonstrate both the flexibility and scalability of the Spark framework for implementing multimedia services.
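
As a flavour of how such a service distributes distance computations, here is a minimal PySpark sketch of brute-force nearest-neighbour search over high-dimensional vectors; the random 128-d vectors stand in for SIFT descriptors, and the paper's actual prototype is certainly more sophisticated:

```python
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="knn-sketch")
rng = np.random.default_rng(0)

# Toy stand-in for SIFT descriptors: (id, 128-d float32 vector) pairs.
data = [(i, rng.random(128).astype("float32")) for i in range(10_000)]
vecs = sc.parallelize(data, numSlices=16)

query = rng.random(128).astype("float32")
qb = sc.broadcast(query)             # ship the query to every executor once

k = 10
neighbours = (vecs
    .map(lambda iv: (float(np.linalg.norm(iv[1] - qb.value)), iv[0]))
    .takeOrdered(k))                 # k smallest distances across partitions
print(neighbours)
sc.stop()
```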

Proceedings ArticleDOI
20 Jun 2017
TL;DR: In this paper, the authors present a geo-communication dataset from extensive profiling of 4 major US mobile carriers in Illinois, from the rural location of Hoopeston to the central referral hospital center at Urbana.
Abstract: Use of telecommunication technologies for remote, continuous monitoring of patients can enhance the effectiveness of emergency ambulance care during transport from rural areas to a regional center hospital. However, communication along the various routes in rural areas may span wide bandwidth ranges, from 2G to 4G; some regions may have only lower satellite bandwidth available. Bandwidth fluctuation, together with real-time communication of various clinical multimedia, poses a major challenge during rural patient ambulance transport.

The availability of a pre-transport, route-dependent communication bandwidth database is an important resource for remote monitoring and clinical multimedia transmission in rural ambulance transport. Here, we present a geo-communication dataset from extensive profiling of 4 major US mobile carriers in Illinois, from the rural location of Hoopeston to the central referral hospital center at Urbana. In collaboration with Carle Foundation Hospital, we developed a profiler and collected various geographical and communication traces for realistic emergency rural ambulance transport scenarios. Our dataset supports our ongoing work on "physiology-aware DASH", which is particularly useful for adaptive remote monitoring of critically ill patients in emergency rural ambulance transport. It provides insights on ensuring higher Quality of Service (QoS) for the most critical clinical multimedia in response to changes in patients' physiological states and bandwidth conditions. Our dataset is available online for the research community.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: A time evaluation of the integration between a distributed mulsemedia platform called PlaySEM and an interactive application whereby users interact by gestures is presented, in order to discover how long this process takes.
Abstract: Mulsemedia applications have become increasingly popular, and there have been many efforts to increase the Quality of Experience (QoE) they deliver to users. From the users' perspective, it is crucial that systems produce high levels of enjoyment and utility. Thus, many experimental tools have been developed and applied to different purposes such as entertainment, health, and culture. Despite that, little attention is paid to the evaluation of mulsemedia tools and platforms. In this paper, we present a time evaluation of the integration between a distributed mulsemedia platform called PlaySEM and an interactive application in which users interact by gestures, in order to discover how long this process takes. We describe the test scenario and our approach for measuring this integration. Then, we discuss the results and point out aspects that have implications for future similar solutions. The results showed values in the range of 27ms to 67ms on average spent throughout the process before the effective activation of the sensory effect devices on a wired network.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: Two tools are developed that allow domain scientists to generate a set of connected, 360° video paths for traversing between dimensional keyframes in the dataset and a corresponding navigational interface is a video selection and playback tool that can be paired with a low-cost HMD to enable an interactive, non-linear, storytelling experience.
Abstract: Immersive, stereoscopic visualization enables scientists to better analyze structural and physical phenomena compared to traditional display mediums. Unfortunately, current head-mounted displays (HMDs) with the high rendering quality necessary for these complex datasets are prohibitively expensive, especially in educational settings where their high cost makes it impractical to buy several devices. To address this problem, we develop two tools: (1) An authoring tool allows domain scientists to generate a set of connected, 360° video paths for traversing between dimensional keyframes in the dataset. (2) A corresponding navigational interface is a video selection and playback tool that can be paired with a low-cost HMD to enable an interactive, non-linear, storytelling experience. We demonstrate the authoring tool's utility by conducting several case studies and assess the navigational interface with a usability study. Results show the potential of our approach in effectively expanding the accessibility of high-quality, immersive visualization to a wider audience using affordable HMDs.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: The CWI-ADE2016 Dataset is a collection of more than 40 million Bluetooth Low Energy packets and of 14 million accelerometer and temperature samples generated by wristbands that people wore in a nightclub that provides a full picture of the performance of the real life deployment of a sensing infrastructure.
Abstract: The CWI-ADE2016 Dataset is a collection of more than 40 million Bluetooth Low Energy (BLE) packets and 14 million accelerometer and temperature samples generated by wristbands that people wore in a nightclub. The data was gathered during Amsterdam Dance Event 2016 in an exclusive club experience curated around the human senses, which leveraged technology as a bridge between the club and the guests. Each guest was handed a custom-made wristband with a BLE-enabled device that broadcast movement, temperature, and other sensor readings. A network of Raspberry Pi receivers deployed for the occasion captured broadcast packets from the wristbands and any other BLE device in the environment. This data provides a full picture of the performance of a real-life deployment of a sensing infrastructure and gives insights for designing sensing platforms, understanding network and crowd behaviour, or studying opportunistic sensing. This paper describes an analysis of this dataset and some examples of usage.

Proceedings ArticleDOI
20 Jun 2017
TL;DR: The proposed method is able to simulate the haptic-enabled deformation of the 3D fusion surface and provides a novel haptic interaction for virtual reality and 3D tele-immersive applications.
Abstract: In recent years, much research has focused on haptic interaction with streaming data such as RGBD video / point cloud streams captured by commodity depth sensors. Most previous methods use partial streaming data from depth sensors and only investigate haptic rendering of rigid surfaces without complex physics simulation. Many virtual reality and tele-immersive applications, such as medical training and art design, require the complete scene and physics simulation. In this paper, we propose a stable haptic rendering method capable of interacting with a streaming deformable surface in real-time. Our method applies KinectFusion for real-time reconstruction of complete, rather than partial, real-world object surfaces. During reconstruction, it simultaneously uses a hierarchical shape matching (HSM) method to simulate surface deformation during haptic-enabled interaction. We demonstrate how to combine fusion and the physics simulation of deformation, and propose a continuous collision detection method based on the Truncated Signed Distance Function (TSDF). Furthermore, we propose a fast TSDF warping method to update the deformation in the TSDF, and a proxy-finding method to find the proxy position. The proposed method is able to simulate haptic-enabled deformation of the 3D fusion surface, and therefore provides a novel haptic interaction for virtual reality and 3D tele-immersive applications. Experimental results show that the proposed approach provides stable haptic rendering and fast simulation of 3D deformable surfaces.
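
Collision detection against a TSDF relies on the signed distance being negative inside the surface. A minimal sketch of the trilinear TSDF lookup such a test builds on; the volume layout and probe-radius convention are assumptions, not the paper's implementation:

```python
import numpy as np

def tsdf_sample(tsdf, p):
    """Trilinearly interpolate a TSDF volume (negative = inside surface)
    at continuous voxel coordinates p = (x, y, z)."""
    i = np.floor(p).astype(int)
    f = p - i
    c = tsdf[i[0]:i[0] + 2, i[1]:i[1] + 2, i[2]:i[2] + 2]  # 2x2x2 corners
    cx = c[0] * (1 - f[0]) + c[1] * f[0]    # collapse x -> shape (2, 2)
    cy = cx[0] * (1 - f[1]) + cx[1] * f[1]  # collapse y -> shape (2,)
    return cy[0] * (1 - f[2]) + cy[1] * f[2]

def haptic_collides(tsdf, p, radius_vox=0.0):
    # The probe touches the surface once the interpolated signed
    # distance drops below the probe radius.
    return tsdf_sample(tsdf, p) <= radius_vox

vol = np.ones((8, 8, 8), dtype=np.float32)
vol[:, :4, :] = -1.0                        # half-space "inside", for illustration
print(haptic_collides(vol, np.array([3.5, 3.2, 3.5])))  # -> True
```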

Proceedings ArticleDOI
20 Jun 2017
TL;DR: This work focuses on studying the potential of proactively caching content of this particular category using a YouTube trace containing over 4 million music video user sessions, and proposes a novel trace-based evaluation methodology for music-specific proactive in-network caching.
Abstract: The preferred channel for listening to music is shifting towards the Internet, and especially to mobile networks, where overall traffic is predicted to grow by 45% annually until 2021. However, the resulting increase in network traffic challenges mobile operators. As a result, methods are being researched to decrease costly transit traffic and the traffic load inside operator networks using in-network and client-side caching. In addition to traditional reactive caching, recent works show that proactive caching increases cache efficiency. Thus, in this work, a mobile network using proactive caching is assumed. As music represents the most popular content category on YouTube, this work focuses on studying the potential of proactively caching content of this particular category, using a YouTube trace containing over 4 million music video user sessions. The contribution of this work is threefold: First, music content-specific user behavior is derived and audio features of the content are analyzed. Second, using these audio features, genre and mood classifiers are compared in order to guide the design of new proactive caching policies. Third, a novel trace-based evaluation methodology for music-specific proactive in-network caching is proposed and used to evaluate novel proactive caching policies that serve either an aggregate of users or individual clients.
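
Trace-based evaluation of caching policies typically replays the request log against a cache simulator. A minimal sketch in which a proactive policy is modelled as warming an LRU cache with predicted items; the prediction step itself (e.g., from genre or mood classifiers) is out of scope here:

```python
from collections import OrderedDict

def run_cache(trace, capacity, prefetch=()):
    """LRU cache over a request trace; `prefetch` optionally warms the
    cache with items a proactive policy predicts will be requested."""
    cache = OrderedDict((item, True) for item in list(prefetch)[:capacity])
    hits = 0
    for item in trace:
        if item in cache:
            hits += 1
            cache.move_to_end(item)
        else:
            cache[item] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(trace)

trace = ["a", "b", "a", "c", "a", "d", "b"]
print(run_cache(trace, capacity=2))                       # reactive only
print(run_cache(trace, capacity=2, prefetch=["a", "b"]))  # proactive warm-up
```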

Proceedings ArticleDOI
20 Jun 2017
TL;DR: A real-time smartphone application is presented that can not only recognize easily confused herbs using a Convolutional Neural Network (CNN), but also provide relevant information about the detected herbs.
Abstract: Chinese herbal medicine (CHM) plays an important role in treatment within traditional Chinese medicine (TCM). Traditionally, CHM is used to restore the balance of the body for sick people and to maintain health for common people. However, a lack of knowledge about the herbs may cause their misuse. In this demo, we present a real-time smartphone application which can not only recognize easily confused herbs using a Convolutional Neural Network (CNN), but also provide relevant information about the detected herbs. Our Chinese herb recognition system is implemented on a cloud server and can be used by the client user via smartphone. The recognition system is evaluated using 5-fold cross-validation, and the accuracy is around 96%, which is adequate for real-world use.