
Showing papers presented at "ACM SIGMM Conference on Multimedia Systems in 2013"


Proceedings ArticleDOI
28 Feb 2013
TL;DR: This dataset paper presents and makes available real-world measurements of the throughput that was achieved at the application layer when adaptive HTTP streaming was performed over 3G networks using mobile devices.
Abstract: In this dataset paper, we present and make available real-world measurements of the throughput that was achieved at the application layer when adaptive HTTP streaming was performed over 3G networks using mobile devices. For the streaming sessions, we used popular commute routes in and around Oslo (Norway), traveling with different types of public transportation (metro, tram, train, bus and ferry). We also have a few logs using a car. Each log provides a timestamp, GPS coordinates and the measured number of bytes downloaded for approximately every second of the route. The dataset can be used in several ways, but the most obvious application is to emulate the same network bandwidth behavior (at specific geographical positions) for repeated experiments.
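A log of this shape can be turned into a bandwidth trace for emulation. The sketch below is hypothetical: the field layout (timestamp, latitude, longitude, bytes downloaded) follows the abstract's description, not the dataset's actual file format.

```python
# Hypothetical sketch: convert a per-second 3G throughput log into a
# bandwidth trace (bits/s) that a network emulator could replay.
# The 'timestamp lat lon bytes' layout is an assumption from the
# dataset description, not the real file format.

def parse_log(lines):
    """Parse 'timestamp lat lon bytes' records into (t, bytes) samples."""
    samples = []
    for line in lines:
        ts, lat, lon, nbytes = line.split()
        samples.append((float(ts), int(nbytes)))
    return samples

def bandwidth_trace(samples):
    """Convert per-interval byte counts into bits-per-second values."""
    trace = []
    for (t0, _), (t1, b1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            trace.append(b1 * 8 / dt)  # bits per second over the interval
    return trace

log = [
    "0.0 59.91 10.75 0",
    "1.0 59.91 10.75 125000",   # 125 kB in 1 s -> 1 Mbit/s
    "2.0 59.91 10.76 250000",   # 250 kB in 1 s -> 2 Mbit/s
]
trace = bandwidth_trace(parse_log(log))
```

Each trace value could then be fed to a shaper (e.g. tc/netem on Linux) once per second to reproduce the measured conditions.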

292 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: The proposed GamingAnywhere can be employed for setting up cloud gaming testbeds, which, it is believed, will stimulate more research innovations on cloud gaming systems.
Abstract: Cloud gaming is a promising application of the rapidly expanding cloud computing infrastructure. Existing cloud gaming systems, however, are closed-source with proprietary protocols, which raises the bar for setting up testbeds for experiencing cloud games. In this paper, we present a complete cloud gaming system, called GamingAnywhere, which is to the best of our knowledge the first open cloud gaming system. In addition to its openness, we design GamingAnywhere for high extensibility, portability, and reconfigurability. We implement GamingAnywhere on Windows, Linux, and OS X, while its client can be readily ported to other OSes, including iOS and Android. We conduct extensive experiments to evaluate the performance of GamingAnywhere, and compare it against two well-known cloud gaming systems: OnLive and StreamMyGame. Our experimental results indicate that GamingAnywhere is efficient and provides high responsiveness and video quality. For example, GamingAnywhere yields a per-frame processing delay of 34 ms, which is 3+ and 10+ times shorter than OnLive and StreamMyGame, respectively. Our experiments also reveal that all these performance gains are achieved without incurring higher network loads. The proposed GamingAnywhere can be employed for setting up cloud gaming testbeds, which, we believe, will stimulate more research innovations on cloud gaming systems.

237 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper devises a multipath communication model for the Real-time Transport Protocol (RTP); presents a minimal set of required protocol extensions; develops algorithms for scheduling RTP traffic across multiple paths at the sender and a corresponding de-jittering algorithm at the receiver side; and evaluates the proposal in varying scenarios using media traffic across different emulated mobile access network setups.
Abstract: The Internet infrastructure often supports multiple routes between two communicating hosts and, today, mobile hosts in particular usually offer multiple network interfaces, so that disjoint paths between the hosts can be constructed. Having a number of (partly or fully) disjoint paths available may allow applications to distribute their traffic, aggregate capacity of different paths, choose the most suitable subset of paths, and support failover if a path fails. Exploiting multipath characteristics has been explored for TCP, but the requirements for real-time traffic differ notably. In this paper, we devise a multipath communication model for the Real-time Transport Protocol (RTP); present a minimal set of required protocol extensions; develop algorithms for scheduling RTP traffic across multiple paths at the sender and a corresponding de-jittering algorithm at the receiver side; and evaluate our proposal in varying scenarios using media traffic across different emulated mobile access network setups.
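One common form of sender-side multipath scheduling assigns each packet to the path with the earliest estimated arrival time, accounting for each path's one-way delay and queued backlog. The sketch below illustrates that general idea, not the paper's actual algorithm; the path parameters are made up.

```python
# Minimal sketch of a sender-side multipath scheduler: each packet goes
# to the path with the earliest estimated arrival time, given the path's
# one-way delay and the time its send queue drains. Path names and
# parameters are illustrative, not taken from the paper.

def schedule(packets, paths):
    """packets: list of (send_time_s, size_bytes).
    paths: {name: {'delay': s, 'rate': bytes/s, 'busy_until': s}}."""
    assignment = []
    for t_send, size in packets:
        best = min(
            paths,
            key=lambda p: max(t_send, paths[p]['busy_until'])
                          + size / paths[p]['rate'] + paths[p]['delay'])
        p = paths[best]
        start = max(t_send, p['busy_until'])
        p['busy_until'] = start + size / p['rate']  # queue the packet
        assignment.append((t_send, best))
    return assignment

paths = {
    'wifi': {'delay': 0.02, 'rate': 50_000, 'busy_until': 0.0},
    '3g':   {'delay': 0.085, 'rate': 100_000, 'busy_until': 0.0},
}
packets = [(0.0, 1200)] * 5  # five 1200-byte RTP packets sent back to back
plan = schedule(packets, paths)
```

Here the low-delay path absorbs packets until its queue makes the high-delay path competitive, after which traffic spills over, which is the capacity-aggregation effect the abstract describes.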

93 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: Bagadus, a prototype of a sports analytics application using soccer as a case study, which integrates a sensor system, a soccer analytics annotations system and a video processing system using a video camera array is presented.
Abstract: Sports analytics is a growing area of interest, both from a computer system view to manage the technical challenges and from a sport performance view to aid the development of athletes. In this paper, we present Bagadus, a prototype of a sports analytics application using soccer as a case study. Bagadus integrates a sensor system, a soccer analytics annotations system and a video processing system using a video camera array. A prototype is currently installed at Alfheim Stadium in Norway, and in this paper, we describe how the system can follow and zoom in on particular player(s). Next, the system can play out events from the games using stitched panorama video or camera switching mode and create video summaries based on queries to the sensor system. Furthermore, we evaluate the system from a systems point of view, benchmarking different approaches, algorithms and tradeoffs.

72 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper presents a distributed dataset for the recently published MPEG-DASH standard which is mirrored at different sites across Europe, namely Klagenfurt, Paris, and Prague, and can be used for real-world evaluations enabling the simulation of switching between different content delivery networks.
Abstract: The delivery of multimedia content over HTTP and on top of existing Internet infrastructures is becoming the preferred method within heterogeneous environments. The basic design principle is an intelligent client which selects suitable media representations by issuing HTTP requests for individual segments based on the user's context and current conditions. Typically, this client behavior differs between implementations of the same kind, and appropriate datasets are needed for their objective evaluation. This paper presents a distributed dataset for the recently published MPEG-DASH standard which is mirrored at different sites across Europe, namely Klagenfurt, Paris, and Prague. A client implementation may choose to request segments from these sites and dynamically switch to a different location, e.g., in case the one currently used causes any issues. Thus, this distributed DASH dataset can be used for real-world evaluations enabling the simulation of switching between different content delivery networks.
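The mirror-switching behavior the dataset enables can be sketched as a client that walks a segment list and fails over to the next mirror when a request errors. The mirror URLs and the injected `fetch` function below are hypothetical placeholders, not the dataset's real endpoints.

```python
# Hedged sketch of CDN switching with a distributed DASH dataset: the
# client rotates to the next mirror when a segment request fails.
# Mirror URLs and the fetch callable are made-up placeholders.

MIRRORS = [
    "http://dash.example-klagenfurt.test",
    "http://dash.example-paris.test",
    "http://dash.example-prague.test",
]

def download_session(segments, fetch, mirrors=MIRRORS):
    """Return (segment, mirror_index) pairs, switching mirrors on failure."""
    current = 0
    session = []
    for seg in segments:
        for _ in range(len(mirrors)):
            try:
                fetch(mirrors[current] + "/" + seg)
                session.append((seg, current))
                break
            except IOError:
                current = (current + 1) % len(mirrors)  # switch CDN
        else:
            raise IOError("all mirrors failed for " + seg)
    return session

# Simulated fetch: the first mirror goes down after serving two segments.
calls = {"n": 0}
def fake_fetch(url):
    calls["n"] += 1
    if url.startswith(MIRRORS[0]) and calls["n"] > 2:
        raise IOError("mirror down")

session = download_session(["seg1.m4s", "seg2.m4s", "seg3.m4s"], fake_fetch)
```

A real client would additionally fold the switch decision into its rate-adaptation logic rather than reacting only to hard failures.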

61 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: The authors conclude that the absence of contextual audio reduces considerably the acceptable temporal boundary between the scent and video, which is a step towards the definition of synchronization specifications for multimedia applications based on olfactory and video media.
Abstract: As a step towards enhancing users' perceived multimedia quality levels beyond the level offered by classic audiovisual systems, the authors present the results of an experimental study which looked at users' perception of inter-stream synchronization between olfactory data (scent) and video (without relevant audio). The impact on users' quality of experience (by considering enjoyment, relevance and reality) comparing synchronous with asynchronous presentation of olfactory and video media is analyzed and discussed. The aim is to empirically define the temporal boundaries within which users perceive olfactory data and video to be synchronized. The key analysis compares the user detection and perception of synchronization error. State-of-the-art works have investigated temporal boundaries for olfactory data with audiovisual media, but no prior work documents the integration of olfactory data and video (with no related audio). The results of this work show that the temporal boundaries for olfactory data and video only are significantly different from those for olfactory data, video and audio. The authors conclude that the absence of contextual audio considerably reduces the acceptable temporal boundary between the scent and video. The results also indicate that olfaction before video is more noticeable to users than olfaction after video, and that users are more tolerant of olfactory data after video than of olfactory data before video. In addition, the results show the presence of two main synchronization regions. This work is a step towards the definition of synchronization specifications for multimedia applications based on olfactory and video media.

53 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: This work presents a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'.
Abstract: The increasing amount of digital multimedia content available is inspiring potential new types of user interaction with video data. Users want to easily find the content by searching and browsing. For this reason, techniques are needed that allow automatic categorisation, searching the content and linking to related information. In this work, we present a dataset that contains comprehensive semi-professional user-generated (SPUG) content, including audiovisual content, user-contributed metadata, automatic speech recognition transcripts, automatic shot boundary files, and social information for multiple 'social levels'. We describe the principal characteristics of this dataset and present results that have been achieved on different tasks.

44 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: SABRE (Smooth Adaptive Bit RatE), a scheme that can be implemented by the client to mitigate the buffer bloat effect of HTTP adaptive streaming, is developed and implemented in the VLC player.
Abstract: HTTP adaptive video streaming is an emerging technology that aims to deliver video quality to clients in a manner that accommodates available bandwidth and its fluctuations. In this scheme, a video stream is split at the server into small video files encoded at multiple bitrates. The video is composed at the client by downloading these files over HTTP and TCP. Although there are some efforts to standardize media representation for this technology, adaptation techniques remain an open area for development. Recently, an alarm was raised by a study about the interaction between TCP congestion control algorithms and large buffers on the Internet. Queuing delays when these buffers are full can reach several hundreds of milliseconds, a phenomenon that was dubbed buffer bloat. In this paper we use measurements on a testbed to demonstrate and quantify the buffer bloat effect of HTTP adaptive streaming. We show that in a typical residential setting a single video stream can easily cause queuing delays of up to one second or even more, seriously degrading the performance of other applications sharing the home network. We develop SABRE (Smooth Adaptive Bit RatE), a scheme that can be implemented by the client to mitigate this problem. We implemented SABRE in the VLC player. Using our testbed, we show that our technique can reduce buffer occupancy and significantly diminish the buffer bloat effect without affecting the experience of the video viewer.
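The intuition behind client-side mitigation can be captured in a few lines: a greedy client lets TCP fill the bottleneck buffer, while a client that paces its reads to just above the media bitrate keeps the queue short. This is an illustrative back-of-the-envelope model, not the actual SABRE algorithm; the link and buffer numbers are made up.

```python
# Illustrative model (not the actual SABRE algorithm): pacing the
# application's read rate to just above the media bitrate keeps TCP's
# congestion window, and hence the bottleneck queue, small.

def paced_read_rate(media_bitrate, headroom=1.2):
    """Target application-layer read rate in bits/s (20% headroom)."""
    return media_bitrate * headroom

def queue_delay(link_rate, arrival_rate, buffer_bits):
    """Steady-state queuing delay at a bottleneck with a large buffer:
    the buffer fills whenever arrivals exceed the link rate, and a full
    buffer drains at the link rate."""
    if arrival_rate <= link_rate:
        return 0.0
    return buffer_bits / link_rate

link = 8_000_000          # 8 Mbit/s residential downlink (assumed)
buffer_bits = 8_000_000   # 1 MB bottleneck buffer (oversized, assumed)
video = 2_000_000         # 2 Mbit/s representation

greedy_delay = queue_delay(link, link * 1.5, buffer_bits)      # TCP fills pipe
paced_delay = queue_delay(link, paced_read_rate(video), buffer_bits)
```

With these numbers the greedy client induces a full one-second queue, matching the magnitude the abstract reports, while the paced client keeps the queue empty in steady state.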

41 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper presents an effort to create a standard dataset consisting of videos simultaneously recorded using mobile devices in an unconstrained manner by multiple users attending performance events; the dataset is useful as a common benchmark for a variety of research topics on mobile videos, including video analytics, video quality enhancement, and automatic video mashups.
Abstract: Proliferation of mobile devices with video recording capability has led to a tremendous growth in the amount of user-generated mobile videos. Researchers have embarked on developing new interesting applications and enhancement algorithms for mobile video. There is, however, no standard dataset with videos that could represent characteristics of mobile videos captured in realistic scenarios. In this paper, we present our effort to create one such dataset, consisting of videos simultaneously recorded using mobile devices in an unconstrained manner by multiple users attending performance events. Each video is accompanied by concurrent readings from accelerometer and compass sensors. At the time of writing, the dataset contains 473 video clips, with a total length of 30 hours 41 minutes and total size of 122.8 GB. We believe this dataset is useful as a common benchmark dataset for a variety of different research topics on mobile videos, including video analytics, video quality enhancement, and automatic video mashups.

32 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: A prototype that can capture, compress, transmit and render triangle mesh geometry in real-time over the Internet, and which has been successfully integrated into a larger tele-immersive environment that includes beyond-state-of-the-art 3D reconstruction and rendering modules.
Abstract: 3D tele-immersion enables participants in remote locations to share an activity in real-time. It offers users natural interactivity and immersive experiences, but it challenges current networking solutions. Work in the past has mainly focused on the efficient delivery of image-based 3D videos and on the realistic rendering and reconstruction of geometry-based 3D objects. The contribution of this paper is a complete media pipeline that allows for geometry-based 3D tele-immersion. Unlike previous approaches that stream videos or video plus depth estimates, our streaming module can transmit the live-reconstructed 3D representations (triangle meshes). Based on a set of comparative experiments, this paper details the architecture and describes a novel component that can efficiently stream geometry in real-time. This component includes both a novel fast local compression algorithm and a rateless packet protection scheme geared towards the requirements imposed by real-time transmission of live-captured mesh geometry. Tests on a large dataset show an encoding and decoding speed-up of over 10 times at similar compression and quality rates, when compared to the high-end MPEG-4 SC3DMC mesh encoder. The implemented rateless code ensures complete packet loss protection of the triangle mesh object and avoids delay introduced by retransmissions. This approach is compared to a streaming mechanism over TCP and outperforms it at packet loss rates over 2% and/or latencies over 9 ms in terms of end-to-end transmission delay. As reported in this paper, the component has been successfully integrated into a larger tele-immersive environment that includes beyond-state-of-the-art 3D reconstruction and rendering modules. This resulted in a prototype that can capture, compress, transmit and render triangle mesh geometry in real-time over the Internet.

29 citations


Proceedings ArticleDOI
28 Feb 2013
TL;DR: The results show that GreenCache's staggered load-proportional blinking policy results in 3X less buffering time by the client compared to an activation blinking policy, which simply activates and deactivates servers over long periods as power fluctuates, for realistic power variations from renewable energy sources.
Abstract: The growth of smartphones combined with advances in mobile networking have revolutionized the way people consume multimedia data. In particular, users in developing countries primarily rely on smartphones since they often do not have access to more powerful (and more expensive) computing devices. Unfortunately, cellular networks in developing countries have historically had low reliability, due to grid instability and lack of infrastructure. The situation has led network operators to experiment with running cellular towers "off the grid" using intermittent renewable energy sources. In parallel, network operators are also experimenting with co-locating server caches close to cell towers to reduce access latency and back-haul bandwidth. In this paper, we study techniques for optimizing multimedia caches for intermittent renewable energy sources. Specifically, we examine how to apply a blinking abstraction proposed in prior work, which rapidly transitions servers between an active and inactive state, to improve the performance of a multimedia cache powered by renewables, called GreenCache. Our results show that GreenCache's staggered load-proportional blinking policy, which coordinates when servers are active over brief intervals, results in 3X less buffering (or pause) time by the client compared to an activation blinking policy, which simply activates and deactivates servers over long periods as power fluctuates, for realistic power variations from renewable energy sources.
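The staggered blinking idea can be sketched as a schedule: when the power budget supports only a fraction of the servers at a time, servers take turns being active in offset intervals so that every cached object is reachable within one blink cycle. This is a hedged illustration of the general scheme, not GreenCache's actual policy; the numbers are made up.

```python
# Hedged sketch of staggered blinking: given a duty cycle set by the
# available (renewable) power, stagger each server's active window so
# the cache as a whole stays reachable. Numbers are illustrative.

def blink_schedule(n_servers, duty_cycle, interval_ms):
    """Return per-server (on_start_ms, on_end_ms) within one blink cycle."""
    on_time = duty_cycle * interval_ms
    offset = (interval_ms - on_time) / max(n_servers - 1, 1)
    return [(round(i * offset), round(i * offset + on_time))
            for i in range(n_servers)]

# 4 servers, power for a 25% duty cycle, 1000 ms blink interval:
sched = blink_schedule(4, 0.25, 1000)
```

With these parameters exactly one server is active at any instant, so a client waits at most one cycle for any cached object, instead of waiting out the long deactivation periods of an activation policy.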

Proceedings ArticleDOI
28 Feb 2013
TL;DR: An adaptive priority-based context-aware framework for efficiently streaming 3D textures to mobile devices with limited energy budget over wireless networks is designed and implemented and results show that using the proposed adaptations significantly improves the gameplay quality per unit of energy consumed to download 3DTextures in mobile games.
Abstract: Advances in computing hardware and novel multimedia applications have spurred the development of handheld mobile devices such as smartphones and PDAs. Amongst the most used applications on handheld devices are mobile 3D graphics such as 3D games and 3D virtual environments. With this significant increase of mobile applications and games, one of the challenges is how to efficiently transmit the bulky 3D information to resource-constrained mobile devices. Despite the many attractive features, 3D graphics impose significant demands on the limited battery capacity of mobile devices. Thus the development of efficient approaches to decrease the amount of streamed data with the aim of increasing the battery lifetime has become a key research topic. In this paper, we design and implement an adaptive priority-based context-aware framework for efficiently streaming 3D textures to mobile devices with limited energy budget over wireless networks. Our results show that using our proposed adaptations significantly improves the gameplay quality per unit of energy consumed to download 3D textures in mobile games.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper develops models for power consumption in each one of the major phases of the streaming process: capturing, encoding, and transmission, and validates them with extensive experiments, focusing primarily on H.264 video encoding.
Abstract: Power consumption of video streaming systems has become a major concern, especially in battery-powered devices, such as video sensors. Power is usually dissipated in each one of the major phases of the streaming process: capturing, encoding, and transmission. This paper develops models for power consumption in each of these phases and validates them with extensive experiments, focusing primarily on H.264 video encoding. For comparative purposes, we also study MJPEG and MPEG-4 video codecs. In addition, we analyze the impacts of the main H.264 video compression parameters on power consumption and bitrate. These parameters include quantization parameter, number of reference frames, motion estimation (ME) range, and ME algorithm.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: The challenges defined as part of SED 2012, the data collection process, the dataset and its basic statistics, the ground truth creation and the suggested evaluation methodology are discussed.
Abstract: This paper presents the 2012 Social Event Detection dataset (SED2012). The dataset constitutes a challenging benchmark for methods that detect social events in large collections of multimedia items. More specifically, the dataset comprises more than 160 thousand Flickr photos and their accompanying metadata, as well as a list of 149 manually selected and annotated target events, each of which is defined as a set of relevant photos. This paper discusses the challenges defined as part of SED 2012, the data collection process, the dataset and its basic statistics, the ground truth creation and the suggested evaluation methodology.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This work presents a fashion-focused Creative Commons dataset, which is designed to contain a mix of general images as well as a large component of images that are focused on fashion (i.e., relevant to particular clothing items or fashion accessories).
Abstract: In this work, we present a fashion-focused Creative Commons dataset, which is designed to contain a mix of general images as well as a large component of images that are focused on fashion (i.e., relevant to particular clothing items or fashion accessories). The dataset contains 4810 images and related metadata. Furthermore, a ground truth on image tags is presented. Ground truth generation for large-scale datasets is a necessary but expensive task. Traditional expert-based approaches have become an expensive and non-scalable solution. For this reason, we turn to crowdsourcing techniques in order to collect ground truth labels; in particular we make use of the commercial crowdsourcing platform, Amazon Mechanical Turk (AMT). Two different groups of annotators (i.e., trusted annotators known to the authors and crowdsourcing workers on AMT) participated in the ground truth creation. Annotation agreement between the two groups is analyzed. Applications of the dataset in different contexts are discussed. This dataset contributes to research areas such as crowdsourcing for multimedia, multimedia content analysis, and design of systems that can elicit fashion preferences from users.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper describes a demonstration of live streaming of video and subtitle data using the MPEG-DASH technology and its synchronous playback in a web browser and shows that it is feasible with upcoming browsers, paving the way for richer web video applications.
Abstract: Video streaming has become a very popular application on the web, with rich player interfaces and subtitle integration. Additionally, live streaming solutions are deployed based on HTTP Streaming technologies. However, the integration of video and subtitles in live streaming solutions still poses some problems. This paper describes a demonstration of live streaming of video and subtitle data using the MPEG-DASH technology and its synchronous playback in a web browser. It presents the formats, architecture and technical choices made for this demonstration and shows that it is feasible with upcoming browsers, paving the way for richer web video applications.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This work presents a set of logs from a very popular P2P live streaming application, the SopCast, and describes the crawling methodology, and presents a brief SopCast characterization.
Abstract: P2P-TV applications have attracted a lot of attention from the research community in recent years. Such systems generate a large amount of data, which impacts network performance. As a natural consequence, characterizing these systems has become an important task for developing better multimedia systems. However, crawling data from P2P live streaming systems is particularly challenging because most of these applications use private protocols. In this work, we present a set of logs from a very popular P2P live streaming application, SopCast. We describe our crawling methodology and present a brief SopCast characterization. We believe that our logs and the characterization can be used as a starting point for the development of new live streaming systems.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper adapts common interaction methods of medical experts to mobile computing and provides a tool for experts to annotate videos by drawing on the video and recording speech annotations, focusing on endoscopic surgery.
Abstract: Video annotation is a tedious task, but especially in the medical domain, expert knowledge for the interpretation of videos is of high value. Typically, medical doctors do not have time for extensive annotation, but are used to manual notes, speech recordings, and pointing. In this demo paper we present an application for the annotation of medical videos, focusing on endoscopic surgery. We adapt common interaction methods of medical experts to mobile computing and provide a tool for experts to annotate videos by drawing on the video and recording speech annotations.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper presents the MusiClef data set, a multimodal data set of professionally annotated music that includes editorial metadata about songs, albums, and artists, as well as MusicBrainz identifiers to facilitate linking to other data sets.
Abstract: This paper presents the MusiClef data set, a multimodal data set of professionally annotated music. It includes editorial metadata about songs, albums, and artists, as well as MusicBrainz identifiers to facilitate linking to other data sets. In addition, several state-of-the-art audio features are provided. Different sets of annotations and music context data -- collaboratively generated user tags, web pages about artists and albums, and the annotation labels provided by music experts -- are included too. Versions of this data set were used in the MusiClef evaluation campaigns in 2011 and 2012 for auto-tagging tasks. We report on the motivation for the data set, on its composition, on related sets, and on the evaluation campaigns in which versions of the set were already used. These campaigns likewise represent one use case, i.e. music auto-tagging, of the data set. The complete data set is publicly available for download at http://www.cp.jku.at/musiclef.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: The feasibility of an end-to-end format-agnostic capture, production, delivery and rendering system to support both increased realism and personalization in the media industry is investigated.
Abstract: The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopic, large screen) and personalization (selection and control of content, availability on many devices). We investigate the feasibility of an end-to-end format-agnostic approach to support both these trends. In this paper, different aspects of a format-agnostic capture, production, delivery and rendering system are discussed. At the capture stage, the concept of layered scene representation is introduced, including panoramic video and 3D audio capture. At the analysis stage, a virtual director component is discussed that allows for automatic execution of cinematographic principles, using feature tracking and saliency detection. At the delivery stage, resolution-independent audiovisual transport mechanisms for both managed and unmanaged networks are treated. In the rendering stage, a rendering process that includes the manipulation of audiovisual content to match the connected display and loudspeaker properties is introduced. Different parts of the complete system are revisited demonstrating the requirements and the potential of this advanced concept.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper proposes a novel approach for improving the quality of experience (QoE) of real-time video conferencing systems by increasing MED to within JND in order to have more room for smoothing network delay spikes as well as recovering lost packets, without incurring noticeable degradation in interactivity.
Abstract: This paper proposes a novel approach for improving the quality of experience (QoE) of real-time video conferencing systems. In these systems, QoE is affected by signal quality as well as interactivity, both depending on the packet loss rate, delay jitter, and mouth-to-ear delay (MED), which measures the sender-receiver delay on audio signals (and will be the same as that of video signals when video and audio are synchronized). We notice in the current Internet that increasing MED as well as reducing packet rate can help reduce the delay-aware loss rate in congested connections. Between the two methods, the former plays a more important role and applies well to a variety of network conditions for improving audiovisual signal quality, although overly increasing the MED will degrade interactivity. Based on a psychophysical concept called just-noticeable difference (JND), we find the extent to which MED can be increased without humans perceiving the difference from the original conversation. The approach can be applied to improve existing video conferencing systems. Starting from the operating point of an existing system, we increase its MED to within JND in order to have more room for smoothing network delay spikes as well as recovering lost packets, without incurring noticeable degradation in interactivity. We demonstrate the idea on Skype and Windows Live Messenger by designing a traffic interceptor to extend their buffering time and to perform packet scheduling/recovery. Our experimental results show significant improvements in QoE, with much better signal quality while maintaining similar interactivity.
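The core trade the abstract describes can be sketched numerically: raising MED up to the JND bound yields buffering headroom that absorbs delay spikes which would otherwise cause late losses. The JND and delay values below are placeholders, not the thresholds measured in the paper.

```python
# Illustrative sketch: raise mouth-to-ear delay (MED) up to the
# just-noticeable-difference (JND) bound and spend the gained headroom
# on absorbing delay spikes. All numbers are made-up placeholders.

def extra_buffer_ms(current_med_ms, jnd_ms):
    """Headroom gained by raising MED to the JND bound (never negative)."""
    return max(jnd_ms - current_med_ms, 0)

def survives_spike(spike_ms, playout_buffer_ms, headroom_ms):
    """A delay spike is absorbed if it fits in the buffer plus headroom."""
    return spike_ms <= playout_buffer_ms + headroom_ms

headroom = extra_buffer_ms(current_med_ms=250, jnd_ms=400)
ok = survives_spike(spike_ms=200, playout_buffer_ms=60, headroom_ms=headroom)
```

Because the extra delay stays within JND, the conversation feels equally interactive while packets arriving during the spike still make their (later) playout deadlines.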

Proceedings ArticleDOI
28 Feb 2013
TL;DR: A related-list reordering approach is proposed that modifies the order of the videos shown on the related list based on the content in the cache, leading to a 2 to 5 times increase in cache hit rate and a reduction in server load or back-end bandwidth usage.
Abstract: In this paper, we take advantage of the user behavior of requesting videos from the related list provided by YouTube, and in particular from the top of this list, to improve the performance of YouTube's caches. We propose a related-list reordering approach which modifies the order of the videos shown on the related list based on the content in the cache. The main goal of our reordering approach is to push the contents already in the cache to the top of the related list and push non-cached contents towards the bottom, which increases the likelihood that the already cached content will be chosen by the viewer. We analyze the benefits of our approach by an investigation that is based on two traces collected from a university campus. Our analysis shows that the proposed reordering approach for the related list would lead to a 2 to 5 times increase in cache hit rate compared to an approach without reordering. The increase in hit rate would lead to a 5.12% to 18.19% reduction in server load or back-end bandwidth usage. This increase in hit rate and reduction in back-end bandwidth reduces the latency in streaming the video requested by the viewer and has the potential to improve the overall performance of YouTube's content distribution system. An analysis of YouTube's recommendation system reveals that related lists are created from a small pool of videos, which increases the potential for caching content from related lists and reordering based on the content in the cache.
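The reordering step itself amounts to a stable partition of the related list: cached videos move to the top, everything else keeps its relative order below them. A minimal sketch, with made-up video identifiers:

```python
# Minimal sketch of cache-aware related-list reordering: stable-partition
# the list so cached videos appear first, preserving relative order
# within each group. Video identifiers are made up.

def reorder_related(related, cache):
    """Return the related list with cached videos promoted to the top."""
    cached = [v for v in related if v in cache]
    uncached = [v for v in related if v not in cache]
    return cached + uncached

related = ["a", "b", "c", "d", "e"]
cache = {"c", "e"}
reordered = reorder_related(related, cache)
```

Keeping the partition stable matters: within each group the recommender's original ranking, and hence its relevance ordering, is preserved.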

Proceedings ArticleDOI
28 Feb 2013
TL;DR: A novel system for automatically detecting salient image regions in stereoscopic videos that considers information based on three dimensions: salient colors in individual frames, salient information derived from camera and object motion, and depth saliency.
Abstract: We present a novel system for automatically detecting salient image regions in stereoscopic videos. Our proposed algorithm considers information based on three dimensions: salient colors in individual frames, salient information derived from camera and object motion, and depth saliency. These three components are dynamically combined into one final saliency map based on the reliability of the individual saliency detectors. Such a combination allows using more efficient algorithms even if the quality of one detector degrades. For example, we use a computationally efficient stereo correspondence algorithm that might cause noisy disparity maps for certain scenarios. In this case, however, a more reliable saliency detection algorithm such as the image saliency is preferred. To evaluate the quality of the saliency detection, we created modified versions of stereoscopic videos with the non-salient regions blurred. When users rated the quality of these videos, the results showed that most users do not detect the blurred regions and that the automatic saliency detection is very reliable.
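The dynamic combination step can be sketched as a per-pixel weighted sum of the three maps, with weights proportional to each detector's reliability so that a noisy disparity map contributes less. The reliability scores below are illustrative placeholders, not values from the paper.

```python
# Sketch of reliability-weighted saliency fusion: color, motion and
# depth maps are combined per pixel with weights proportional to each
# detector's reliability. Reliability values are illustrative.

def combine_saliency(maps, reliability):
    """maps/reliability: parallel lists; maps are equal-length pixel lists."""
    total = sum(reliability)
    weights = [r / total for r in reliability]
    npix = len(maps[0])
    return [sum(w * m[i] for w, m in zip(weights, maps))
            for i in range(npix)]

color  = [0.9, 0.1, 0.0]
motion = [0.3, 0.9, 0.0]
depth  = [0.0, 0.0, 1.0]   # noisy disparity -> given low reliability
fused = combine_saliency([color, motion, depth], [0.5, 0.4, 0.1])
```

Down-weighting rather than discarding the noisy channel keeps the system graceful: depth still contributes where the other cues are silent, but cannot dominate the fused map.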

Proceedings ArticleDOI
28 Feb 2013
TL;DR: A 3D spatial logic and algorithms for interpretation of spatial relationships among objects in 3D space are proposed and developed for LVDBMS (Live Video DataBase Management System), a generic platform for live video computing.
Abstract: Interpretation of spatial relations between objects is essential to many applications such as robotics, video surveillance, spatial reasoning, and scene understanding. Current models for spatial logic are two-dimensional. With advances in sensing technology, inexpensive depth sensors have become widely available and 3D scene reconstruction can be applied in various application scenarios. In this paper, we propose a 3D spatial logic and algorithms for interpretation of spatial relationships among objects in 3D space. More specifically, these techniques are developed for LVDBMS (Live Video DataBase Management System), a generic platform for live video computing. We extend the original directional relationships into 3D directional relationships, and introduce a simple yet effective way to build 3D object models based on depth sensors. A highly accurate and efficient algorithm is also proposed to compute the spatial relationships between two objects by sampling the entire space from the reference object. Experimental results based on a real indoor scene and an RGB-D dataset are given to demonstrate the effectiveness of our techniques.
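A sampling-based estimate of a 3D directional relation, in the spirit of the algorithm described above, can be sketched as follows: sample point pairs from the two objects and vote on the axis with the largest offset. This is a simplified illustration under assumed point-cloud object models, not the LVDBMS algorithm itself.

```python
import random

def directional_relation(ref_points, target_points, samples=200):
    """Estimate the dominant 3D directional relation of target w.r.t. a
    reference object by sampling point pairs and voting on the axis
    (left/right, above/below, front/behind) with the largest offset."""
    votes = {}
    for _ in range(samples):
        rx, ry, rz = random.choice(ref_points)
        tx, ty, tz = random.choice(target_points)
        dx, dy, dz = tx - rx, ty - ry, tz - rz
        axis, _ = max((("right" if dx > 0 else "left", abs(dx)),
                       ("above" if dy > 0 else "below", abs(dy)),
                       ("front" if dz > 0 else "behind", abs(dz))),
                      key=lambda p: p[1])
        votes[axis] = votes.get(axis, 0) + 1
    return max(votes, key=votes.get)
```

With denser point clouds the vote distribution also gives a confidence for each relation, not just the dominant one.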

Proceedings ArticleDOI
28 Feb 2013
TL;DR: An Android based framework is presented to capture the relevant wireless network behavior, geo-coordinates and packet traces for popular streaming applications on Android certified devices to determine the quality of service and network characteristics of different streaming methodologies.
Abstract: The proliferation of smart devices makes them a major traffic generator in mobile networks. These devices provide the ability to receive media content in nearly every situation. Although high-quality video streaming is becoming increasingly popular in mobile scenarios, the performance and bottlenecks of mobile applications over wireless networks, especially during the transmission of media streams, are still poorly understood. In order to tackle this challenge, we present an Android based framework to capture the relevant wireless network behavior, geo-coordinates and packet traces for popular streaming applications on Android certified devices. A dataset has been obtained through measurement trials performed in a 3G network for both HTTP and peer-to-peer video streaming applications. The trials also comprise an additional WiFi measurement for comparison purposes. The presented dataset enables future research to determine the quality of service and network characteristics of different streaming methodologies, which are affected by the typical conditions encountered in wireless networks, such as hand-over effects, signal fading, and connection losses. We hope that both the presented dataset and the framework will prove useful to the traffic measurement and multimedia research communities.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: This paper focuses on the preview streaming architecture and framework, and presents the investigation into how such a system would best handle network congestion effectively, and proposes a keyview-aware method that trades off mesh quality and camera speed appropriately depending on how close the current view is to the keyviews.
Abstract: Publishers of 3D models online typically provide two ways to preview a model before the model is downloaded and viewed by the user: (i) by showing a set of thumbnail images of the 3D model taken from representative views (or keyviews); (ii) by showing a video of the 3D model as viewed from a moving virtual camera along a path determined by the content provider. We propose a third approach called preview streaming for mesh-based 3D objects: by streaming and showing parts of the mesh surfaces visible along the virtual camera path. This paper focuses on the preview streaming architecture and framework, and presents our investigation into how such a system can best handle network congestion. We study three basic methods: (a) stop-and-wait, where the camera pauses until sufficient data is buffered; (b) reduce-speed, where the camera slows down in accordance with the reduced network bandwidth; and (c) reduce-quality, where the camera continues to move at the same speed but fewer vertices are sent and displayed, leading to lower mesh quality. We further propose a keyview-aware method that trades off mesh quality and camera speed appropriately depending on how close the current view is to the keyviews. A user study reveals that our keyview-aware method is preferred over the basic methods.
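The keyview-aware trade-off described above can be sketched as a small controller: near a keyview, preserve mesh quality and slow the camera; far from keyviews, keep camera speed and lower quality. The thresholds and scaling are illustrative assumptions, not the paper's actual policy.

```python
def adapt_preview(bandwidth, required_rate, dist_to_keyview, near_thresh=0.2):
    """Choose camera-speed and mesh-quality scale factors under congestion.
    Near a keyview (small normalized distance): keep full quality, slow
    the camera (reduce-speed). Far from keyviews: keep camera speed,
    lower mesh quality (reduce-quality)."""
    if bandwidth >= required_rate:
        return {"speed": 1.0, "quality": 1.0}  # no congestion
    ratio = bandwidth / required_rate
    if dist_to_keyview < near_thresh:
        return {"speed": ratio, "quality": 1.0}
    return {"speed": 1.0, "quality": ratio}
```

A stop-and-wait fallback would correspond to `speed = 0.0` whenever the buffer underruns, regardless of keyview distance.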

Proceedings ArticleDOI
28 Feb 2013
TL;DR: An evolutionary 3DTI session optimization approach using an Open Session Management (OSM) architecture that uses a global view of participants and overlay network conditions to optimize prioritized QoS parameters is proposed.
Abstract: Different 3D tele-immersive (3DTI) activities pose different requirements for application and network level quality of service (QoS) to ensure a strong quality of experience (QoE) for participants. Some applications put heavy weight on audio quality, some consider higher quality for upper body video streams, and some seek very low end-to-end interactivity delay. In addition, a variation in streaming content may arise due to the participants' change of interests (e.g., view change). Therefore, there is a need for an adaptive multi-stream, multi-site 3DTI session management strategy, which is unobtrusive, and optimizes QoS parameters in the 3DTI content distribution based on the user activity and content variation. To address this next generation session management problem, we revisit the design space of the multi-stream and multi-site 3DTI session layer. We propose an evolutionary 3DTI session optimization approach using an Open Session Management (OSM) architecture that uses a global view of participants and overlay network conditions to optimize prioritized QoS parameters. Experimental results with PlanetLab traces show that the optimization process is computationally unobtrusive, and the optimized TI sessions meet participants' expectations up to 50% better than current solutions in the 3DTI space.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: Adaptive Layer Distribution is proposed as a novel scalable media delivery technique that optimises the tradeoff between the streaming bandwidth and error resiliency and provides a parameterised mechanism for dynamic adaptation of the scalable video, while providing increased resilience to the highest quality layers.
Abstract: Bandwidth constriction and datagram loss are prominent issues that affect the perceived quality of streaming video over lossy networks, such as wireless. The use of layered video coding seems attractive as a means to alleviate these issues, but its adoption has been held back in large part by the inherent priority assigned to the critical lower layers and the consequences for quality that result from their loss. The proposed use of forward error correction (FEC) as a solution only further burdens bandwidth availability and can negate the perceived benefits of increased stream quality. In this paper, we propose Adaptive Layer Distribution (ALD) as a novel scalable media delivery technique that optimises the tradeoff between streaming bandwidth and error resiliency. ALD is based on the principle of layer distribution, in which the critical stream data is spread amongst all datagrams, thus lessening the impact on quality due to network losses. Additionally, ALD provides a parameterised mechanism for dynamic adaptation of the scalable video, while providing increased resilience to the highest quality layers. Our experimental results show that ALD improves the perceived quality and also reduces the bandwidth demand by up to 36% in comparison to the well-known Multiple Description Coding (MDC) technique.
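The layer-distribution principle described above can be illustrated by slicing every layer across all datagrams, so that losing one datagram removes a small slice of each layer instead of an entire critical base layer. This is a minimal sketch under assumed byte-string layers, not the ALD packetization format.

```python
def distribute_layers(layers, n_packets):
    """Spread each video layer's data evenly across all packets.
    The loss of any single packet then degrades every layer slightly,
    rather than dropping a whole (critical) lower layer."""
    packets = [[] for _ in range(n_packets)]
    for layer_id, data in enumerate(layers):
        chunk = max(1, -(-len(data) // n_packets))  # ceiling division
        for i in range(n_packets):
            part = data[i * chunk:(i + 1) * chunk]
            if part:
                packets[i].append((layer_id, part))
    return packets

# Hypothetical example: a 4-byte base layer and a 2-byte enhancement
# layer spread over 2 packets; each packet carries half of each layer.
pkts = distribute_layers([b"AAAA", b"BB"], 2)
```
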

Proceedings ArticleDOI
28 Feb 2013
TL;DR: The ViSOR (Video Surveillance Online Repository) repository is described, designed with the aim of establishing an open platform for collecting, annotating, retrieving, and sharing surveillance videos, as well as evaluating the performance of automatic surveillance systems.
Abstract: This paper describes the ViSOR (Video Surveillance Online Repository) repository, designed with the aim of establishing an open platform for collecting, annotating, retrieving, and sharing surveillance videos, as well as evaluating the performance of automatic surveillance systems. The repository is free, and researchers can collaborate by sharing their own videos or datasets. Most of the included videos are annotated. Annotations are based on a reference ontology which has been defined by integrating hundreds of concepts, some of them coming from the LSCOM and MediaMill ontologies. A new annotation classification schema is also provided, aimed at identifying the spatial, temporal and domain detail level used. The web interface allows video browsing, querying by annotated concepts or by keywords, compressed video previewing, media downloading and uploading. Finally, ViSOR includes a performance evaluation desk which can be used to compare different annotations.

Proceedings ArticleDOI
28 Feb 2013
TL;DR: A graphical search interface for multi-faceted hierarchical metadata and a video navigation system implementing this 'Revolving Cube Show' interface, enabling users to search flexibly and intuitively by using simple operations to combine attributes.
Abstract: We describe the graphical search interface we have developed for multi-faceted hierarchical metadata and a video navigation system implementing this 'Revolving Cube Show' interface. This interface can handle discrete, continuous, and hierarchical attributes, enabling users to search flexibly and intuitively by using simple operations to combine attributes. Testing using data on 5495 Japanese TV programs and five attributes (channel, time zone, airtime, genre, and performer) showed that users were able to easily combine the various attributes to create dynamic search hierarchies.