
Showing papers in "IEEE MultiMedia in 2015"


Journal ArticleDOI
TL;DR: The authors introduce the principal concepts for multimedia big data computing, discuss the scientific problems and fundamental challenges, and present methodologies and approaches from the perspectives of the multimedia life cycle and the multimedia big data computing life cycle.
Abstract: With the proliferation of social media, which is largely fostered by the boom of the Internet and mobile ecosystems, a huge amount of multimedia data has been generated, forming the multimedia big data. However, because multimedia data is unstructured and multimodal in nature, with real-time and Quality of Experience requirements, multimedia big data computing has not only created unprecedented opportunities but also imposed fundamental challenges in storage, processing, and analysis. Here, the authors introduce the principal concepts for multimedia big data computing, discuss what the scientific problems and fundamental challenges are, and present methodologies and approaches from the perspectives of the multimedia life cycle and multimedia big data computing life cycle. They also speculate on the research opportunities and directions for multimedia big data computing.

77 citations


Journal ArticleDOI
TL;DR: To obtain a detailed understanding of YouTube video characteristics, a customized Web spider was employed to crawl over a million YouTube videos; the study reveals that new categories of features have emerged within the YouTube service provision.
Abstract: Given the impact of YouTube on Internet services and social networks, a healthy quantity of research has been conducted over the past few years. The majority of studies on traffic capture and evaluation were carried out prior to Google's acquisition of YouTube in 2007. Since then, there have been some changes made to the user policy and service infrastructure, including limits placed on video duration, file size, and resolution. This article depicts the latest YouTube traffic profiles and delivers updated and valuable information for future researchers. To obtain a detailed understanding of YouTube video characteristics, a customized Web spider was employed to crawl over a million YouTube videos. The study demonstrates consistency with previous research for major video streams while revealing that new categories of features have emerged within the YouTube service provision. Compared with traditional video repositories, YouTube exhibits many unique characteristics that could introduce novel challenges and opportunities for optimizing the performance of short video-sharing services.

63 citations
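A crawler of this kind is conceptually simple: start from seed video IDs, fetch each watch page, extract metadata fields, and enqueue related-video links breadth-first. The sketch below illustrates only the idea, not the authors' spider; the URL template and extraction patterns are hypothetical placeholders, not the actual page structure that was crawled.

# Minimal breadth-first video-metadata spider (illustrative sketch only).
# The URL template and regex patterns below are hypothetical placeholders.
import re
from collections import deque

import requests

SEED_IDS = ["abc123"]                              # hypothetical starting IDs
PAGE_URL = "https://video.example.com/watch?v={}"  # placeholder URL template
MAX_VIDEOS = 1000

def crawl(seed_ids):
    queue, seen, records = deque(seed_ids), set(seed_ids), []
    while queue and len(records) < MAX_VIDEOS:
        vid = queue.popleft()
        html = requests.get(PAGE_URL.format(vid), timeout=10).text
        # Extract metadata fields of interest (duration, category, views...).
        duration = re.search(r'"duration":\s*(\d+)', html)
        records.append({"id": vid,
                        "duration_s": int(duration.group(1)) if duration else None})
        # Enqueue related videos reachable from this page.
        for rel in re.findall(r'watch\?v=([\w-]{6,})', html):
            if rel not in seen:
                seen.add(rel)
                queue.append(rel)
    return records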


Journal ArticleDOI
TL;DR: FoodLog is a multimedia food-recording tool that offers a novel method for recording daily food intake primarily for healthcare purposes and its novel use of image-processing techniques presents significant potential for the development of new healthcare monitoring apps.
Abstract: FoodLog is a multimedia food-recording tool that offers a novel method for recording daily food intake primarily for healthcare purposes. Its novel use of image-processing techniques presents significant potential for the development of new healthcare monitoring apps.

63 citations


Journal ArticleDOI
TL;DR: A novel model is proposed with two major advantages: the saliency-guided feature learning can learn features in an unsupervised manner, and the deep framework recasts IQA as a classification problem, analogous to human qualitative evaluation.
Abstract: Image quality assessment (IQA) has thrived for decades, and researchers continue to explore how the human brain perceives visual stimuli. Psychological evidence shows that humans prefer qualitative descriptions when evaluating image quality, yet most research still concentrates on numerical descriptions. Furthermore, handcrafted features are widely used in this community, which constrains the models' flexibility. A novel model is proposed with two major advantages: the saliency-guided feature learning can learn features in an unsupervised manner, and the deep framework recasts IQA as a classification problem, analogous to human qualitative evaluation. Experiments validate the proposed model's effectiveness.

37 citations


Journal ArticleDOI
TL;DR: The proposed solution uses the scalable extensions of HEVC (SHVC) to efficiently compress, store, and deliver UHD video in a large-scale streaming system; with scalability, backward compatibility is maintained and the quality of existing HD services is guaranteed.
Abstract: Compared to the widely deployed high-definition (HD) format, ultra-high definition (UHD) defines video parameters associated with higher spatial resolutions, higher frame rates, higher sample bit depths, and a wider color gamut. UHD promises to significantly enhance the user experience with pictures that offer the "look out the window" effect. However, this promise comes at the cost of increased bandwidth required to deliver UHD. This article explores on-demand UHD video streaming using the latest video compression and delivery technologies. The proposed solution uses the scalable extensions of HEVC (SHVC) to efficiently compress, store, and deliver UHD video in a large-scale streaming system. With scalability, backward compatibility is maintained and the quality of the existing HD services is guaranteed.

25 citations
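Because SHVC encodes UHD content as an HD-compatible base layer plus enhancement layers, a streaming server can pick the layer subset that fits each client's bandwidth. A minimal sketch of that selection logic follows; the layer names and bitrates are invented for illustration, not taken from the article.

# Layer selection for scalable (SHVC-style) streaming: send the base layer
# plus as many enhancement layers as the client's bandwidth allows.
# Bitrates are illustrative, not from the article.
LAYERS = [
    ("base_HD_1080p", 5_000),   # kbps; legacy clients decode only this layer
    ("enh_UHD_2160p", 15_000),  # spatial enhancement to UHD resolution
    ("enh_UHD_HFR", 10_000),    # higher-frame-rate enhancement
]

def select_layers(bandwidth_kbps):
    chosen, total = [], 0
    for name, rate in LAYERS:   # layers must be taken in dependency order
        if total + rate > bandwidth_kbps:
            break
        chosen.append(name)
        total += rate
    return chosen               # always includes the backward-compatible base

print(select_layers(8_000))    # ['base_HD_1080p'] -> legacy HD service preserved
print(select_layers(35_000))   # all three layers -> full UHD experience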


Journal ArticleDOI
TL;DR: The functional architecture of a system that exploits the Green Metadata standard for energy-efficient media consumption is described, referring to this functional architecture to explain power reductions in various system components.
Abstract: Based on compelling evidence from responses to an April 2013 call for proposals (CFP) on energy-efficient video consumption, MPEG initiated standardization of Green Metadata for energy-efficient video consumption. This article describes how metadata enables large power reductions when QoE is maintained and even larger reductions when QoE is allowed to vary. When QoE is maintained, metadata enables average power reductions of 12, 12, and 26 percent during encoding, decoding, and display, respectively. In addition, the authors measured up to 80 percent power savings at lowered, but acceptable, QoE levels. This article describes the functional architecture of a system that exploits the Green Metadata standard for energy-efficient media consumption, referring to this functional architecture to explain power reductions in various system components.

25 citations
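To see what such component-level reductions mean for a playback device as a whole, a back-of-the-envelope calculation helps. In the sketch below, only the percentage reductions (12 percent decoding, 26 percent display, QoE maintained) come from the article; the absolute wattages are invented for illustration.

# Back-of-envelope system power saving from the reductions reported above.
# The absolute wattages are invented; the percentages are from the article.
components = {          # name: (baseline watts, reduction with Green Metadata)
    "decoder": (2.0, 0.12),
    "display": (4.0, 0.26),
    "other":   (1.0, 0.00),    # radio, audio, etc. -- unaffected here
}
baseline = sum(w for w, _ in components.values())
saved = sum(w * r for w, r in components.values())
print(f"{saved:.2f} W saved of {baseline:.1f} W ({100 * saved / baseline:.1f}%)")
# -> 1.28 W saved of 7.0 W (18.3%)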


Journal ArticleDOI
TL;DR: The author contemplates the future of social media and politics and considers whether social media boost democracy in authoritarian regimes and how people are using social media for political participation.
Abstract: The confluence of social media with political action is a complex field raising important questions. Is social media a realm for democratic deliberation? Can we ascertain public opinion from social media outlets? How are people using social media for political participation? Can social media boost democracy in authoritarian regimes? Here, the author considers these questions and contemplates the future of social media and politics.

25 citations


Journal ArticleDOI
TL;DR: A new audio-driven approach for temporal alignment and management of shared audiovisual streams is introduced; the article presents the theoretical framework and demonstrates the methodology in real-world scenarios.
Abstract: This work stems from the particularities of contemporary social media storytelling, where multiple users and publishing channels capture and share public events, experiences, and places. Multichannel presentation and visualization mechanisms are pursued along with novel audiovisual mixing (such as time-delay-compensation enhancement, perceptual mixing, quality-based content selection, linking to context-aware metadata, and propagating multimedia semantics), thus promoting multimodal social media editing, processing, and authoring. While the exploitation of multiple time-based media (audio and video) describing the same event may lead to significant content enhancement, difficulties regarding the detection and temporal synchronization of multimedia events must be overcome. In many cases, one can identify events based only on audio features, thus performing an initial cost-effective annotation of the multimedia content. This article introduces a new audio-driven approach for temporal alignment and management of shared audiovisual streams. The article presents the theoretical framework and demonstrates the methodology in real-world scenarios. This article is part of a special issue on social multimedia and storytelling.

24 citations


Journal ArticleDOI
TL;DR: The authors present a prototype sonic interactive surface that delivers tapping sounds in real time, triggered by users' taps on a real or an imagined "virtual" surface, and propose a multidimensional measurement approach for evaluating user experiences of multimodal interactive systems.
Abstract: The audio feedback resulting from object interaction provides information about the material of the surface and about one's own motor behavior. With the current developments in interactive sonification, it's now possible to digitally change this audio feedback, making the use of interactive sonification a compelling approach to shape tactile surface interactions. Here, the authors present a prototype for a sonic interactive surface, capable of delivering surface tapping sounds in real time when triggered by users' taps on a real surface or on an imagined "virtual" surface. In this system, the delivered audio feedback can be varied so that the tapping sounds correspond to different applied strengths during tapping. The authors also propose a multidimensional measurement approach to evaluate user experiences of multimodal interactive systems. They evaluated their system by looking at the effect of the altered tapping sounds on emotional action-related responses, the users' interactions with the surface, and perceived surface hardness. Results show the influence of the sonification of tapping at all levels: emotional, behavioral, and perceptual. These results have implications for the design of interactive sonification displays and tangible auditory interfaces aiming to change perceived and subsequent motor behavior as well as perceived material properties.

23 citations


Journal ArticleDOI
TL;DR: This article describes the use of sonification as acoustic feedback (AF) in on-water rowing training with elite athletes and elucidates the audio-motor relationship on the theoretical basis of an ecological dynamics approach to understanding expertise and skill acquisition in sport.
Abstract: Feedback systems used in elite sport mainly provide visual information. Another approach displays information audibly via sonification to create coherency between action and reaction. This article describes the use of sonification as acoustic feedback (AF) in on-water rowing training with elite athletes. On the theoretical basis of an ecological dynamics approach, the audio-motor relationship was elucidated for understanding expertise and skill acquisition in sport. Results from athlete surveys conducted between 2009 and 2013 are presented to determine whether AF reflects specific sections within the rowing motion in a way that is comprehensible to the athletes and whether the information is provided in a form useful for technique training. The final aim was to provide criteria and recommendations for the development of sonification-based applications within a movement context for sports and rehabilitation.

23 citations
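The core idea of parameter-mapping sonification, turning a continuously measured motion parameter into audible pitch, can be sketched in a few lines. The mapping range and the sample data below are invented; the article does not describe the on-water system at this level of detail.

import math

# Toy parameter-mapping sonification in the spirit of the approach above:
# map a rowing-stroke parameter (here, boat velocity) onto pitch so that
# athletes can hear the motion. Ranges and data are invented.
def velocity_to_frequency(v, v_min=2.0, v_max=6.0, f_min=220.0, f_max=880.0):
    t = min(max((v - v_min) / (v_max - v_min), 0.0), 1.0)
    return f_min * (f_max / f_min) ** t      # exponential (musical) pitch scale

velocities = [3.1, 3.8, 4.9, 5.4, 4.6, 3.5]  # m/s over one stroke (invented)
print([round(velocity_to_frequency(v)) for v in velocities])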


Journal ArticleDOI
TL;DR: A framework for real-time analysis and visualization of ballet dance movements performed within a Cave Virtual Reality Environment (CAVE) using a Markovian empirical transition matrix and a spherical self-organizing map is presented.
Abstract: This article presents a framework for real-time analysis and visualization of ballet dance movements performed within a Cave Virtual Reality Environment (CAVE). A Kinect sensor captures and extracts dance-based movement features, from which a topology-preserved "posture space" is constructed using a spherical self-organizing map (SSOM). Recordings of dance movements are parsed into gestural elements by projection onto the SSOM to form unique trajectories in posture space. Dependencies between postures in a trajectory are modeled using a Markovian empirical transition matrix, which is then used to recognize attempted movements. This allows for quantitative assessment and feedback of a student's performance, delivered using concurrent, localized visualizations together with a performance score based on incremental dynamic time warping (IDTW).
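The recognition step, scoring a trajectory of SSOM posture indices against an empirical transition matrix, reduces to counting and normalizing transitions. A minimal sketch follows; the posture indices, training trajectories, and smoothing constant are invented for illustration.

import numpy as np

# Build an empirical transition matrix from training trajectories of SSOM
# posture indices, then score a new trajectory by its log-likelihood.
# Additive smoothing (alpha) is an assumption, not from the article.
def fit_transitions(trajectories, n_postures, alpha=1e-3):
    counts = np.full((n_postures, n_postures), alpha)
    for traj in trajectories:
        for a, b in zip(traj[:-1], traj[1:]):
            counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)  # row-stochastic

def log_likelihood(traj, T):
    return sum(np.log(T[a, b]) for a, b in zip(traj[:-1], traj[1:]))

# Recognize an attempted movement as the model with the highest likelihood.
models = {"plie": fit_transitions([[0, 1, 2, 1, 0]], 4),
          "releve": fit_transitions([[0, 3, 3, 0]], 4)}
query = [0, 1, 2, 1]
print(max(models, key=lambda m: log_likelihood(query, models[m])))  # -> 'plie'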

Journal ArticleDOI
TL;DR: The authors' CitySensing system semantically annotates social media streams before fusing them with privacy-preserving aggregates of CDR to feed a public installation where visual storytelling assumes a leading role in allowing the audience to perceive emerging patterns and observe their dynamics.
Abstract: The authors' CitySensing system captures place-based social media streams and anonymous Call Data Records (CDR) during city-scale events. It semantically annotates the social media streams before fusing them with privacy-preserving aggregates of CDR. The result is analyzed and repurposed to feed a public installation where visual storytelling assumes a leading role in allowing the audience to perceive emerging patterns and to observe their dynamics. The authors demonstrate the efficacy of the visualization on real-world data from two editions of Milan Design Week. This article is part of a special issue on social multimedia and storytelling.

Journal ArticleDOI
TL;DR: The authors present a system capable of rhythmic walking interactions using auditory display; the feedback follows detected footsteps or suggests a tempo that is either constant or adapts to the walker, and ecological sounds are considered more natural than sinusoidal tones.
Abstract: The authors present a system capable of rhythmic walking interactions using auditory display. The feedback is based on footstep sounds and either follows detected footsteps or suggests a tempo, which is either constant or adapts to the walker. The auditory display contains simple sinusoidal tones or ecological, physically based synthetic walking sounds. In the tempo-following experiment, the authors investigate the different interaction modes (step versus constant or adaptive tempo) and auditory feedback (sinusoidal tones versus ecological walking sounds) with respect to their effect on walking tempo. They calculate the mean square error (MSE) between the performed and target tempo and the stability of the performed tempo. The results indicate that the MSE with ecological sounds is better than or comparable to that with the sinusoidal tones, and ecological sounds are considered more natural. Allowing deviations from the cues in the adaptive conditions results in a tempo that's still stable but closer to the natural walking pace of the subjects. These results have implications for the design of interactive entertainment or rehabilitation.
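The two evaluation statistics named above are straightforward to compute from footstep timestamps. A minimal sketch, with invented sample data, follows.

# Tempo-following metrics used above: mean square error between performed
# and target tempo, plus stability (variance of the performed tempo).
# The step timestamps below are invented sample data.
def tempo_metrics(step_times_s, target_bpm):
    intervals = [b - a for a, b in zip(step_times_s, step_times_s[1:])]
    tempos = [60.0 / iv for iv in intervals]          # instantaneous BPM
    mse = sum((t - target_bpm) ** 2 for t in tempos) / len(tempos)
    mean = sum(tempos) / len(tempos)
    stability = sum((t - mean) ** 2 for t in tempos) / len(tempos)
    return mse, stability

steps = [0.00, 0.52, 1.01, 1.53, 2.06]   # detected footstep timestamps (s)
print(tempo_metrics(steps, target_bpm=120))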

Journal ArticleDOI
TL;DR: The Variable Markov Oracle (VMO) data structure for multivariate time series indexing is introduced and a probabilistic interpretation of the VMO query-matching algorithm is proposed to find an analogy to the inference problem in a hidden Markov model (HMM).
Abstract: This article introduces the Variable Markov Oracle (VMO) data structure for multivariate time series indexing. VMO can identify repetitive fragments and find sequential similarities between observations. VMO can also be viewed as a combination of online clustering algorithms with variable-order Markov constraints. The authors use VMO for gesture query-by-content and gesture following. A probabilistic interpretation of the VMO query-matching algorithm is proposed to find an analogy to the inference problem in a hidden Markov model (HMM). This probabilistic interpretation extends VMO to be not only a data structure but also a model for time series. Query-by-content experiments were conducted on a gesture database that was recorded using a Kinect 3D camera, showing state-of-the-art performance. The query-by-content experiments' results are compared to previous works using HMM and dynamic time warping. Gesture following is described in the context of an interactive dance environment that aims to integrate human movements with computer-generated graphics to create an augmented reality performance.
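The dynamic time warping (DTW) baseline against which the VMO query-by-content results are compared is easy to sketch. The following is a textbook DTW over multivariate gesture frames (synthetic data stands in for Kinect joint coordinates); it is not the VMO algorithm itself.

import numpy as np

# Textbook DTW distance between two multivariate sequences; the gesture
# database below is synthetic, standing in for Kinect recordings.
def dtw_distance(query, target):
    q, t = len(query), len(target)
    D = np.full((q + 1, t + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, q + 1):
        for j in range(1, t + 1):
            cost = np.linalg.norm(query[i - 1] - target[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[q, t]

# Retrieve the database gesture closest to the query.
rng = np.random.default_rng(0)
database = {f"gesture_{k}": rng.normal(size=(30, 3)) for k in range(5)}
query = database["gesture_2"] + rng.normal(scale=0.05, size=(30, 3))
print(min(database, key=lambda k: dtw_distance(query, database[k])))  # gesture_2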

Journal ArticleDOI
TL;DR: A brief overview of the keynotes, presentations, panels, summit, grand challenge, and workshops of the First IEEE International Conference on Multimedia Big Data (BigMM 2015) is provided.
Abstract: The motivation for organizing the First IEEE International Conference on Multimedia Big Data (BigMM 2015) was the proliferation of multimedia data and ever-growing requests for multimedia applications, which have made multimedia the "biggest big data" and an important source of insights and information. This conference report provides a brief overview of the keynotes, presentations, panels, summit, grand challenge, and workshops.


Journal ArticleDOI
TL;DR: This work proposes an alternative approach that provides high-level event-based abstractions that hide or minimize the complexity of dealing with interleaved media content as part of hypermedia applications.
Abstract: Unsolicited interleaved media content is common in digital TV systems, particularly in broadcast TV, in which advertisements are inserted into programs transmitted sequentially. Although certain existing middleware systems address this problem, their solutions have limitations. This work proposes an alternative approach that provides high-level event-based abstractions that hide or minimize the complexity of dealing with interleaved media content as part of hypermedia applications. This article discusses how multimedia languages and players can handle multiple time bases in supporting intermedia synchronization with interleaved media content. The proposal has been incorporated in the Nested Context Language (NCL) and in its player, the main component of the Ginga digital TV middleware. This approach can also be adapted to be used in other languages and engines.
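At the heart of the multiple-time-base problem is a piecewise mapping between a program's own media time and the transmission timeline with ads spliced in. The toy sketch below illustrates such a mapping; the break positions and durations are invented, and this is not the NCL/Ginga API.

# Toy model of multiple time bases for interleaved content: map the main
# program's media time onto the broadcast timeline, skipping spliced-in ads.
# Ad break positions/durations are invented for illustration.
AD_BREAKS = [(300.0, 60.0), (900.0, 90.0)]   # (program time of break, ad length)

def broadcast_time(program_t):
    offset = sum(length for start, length in AD_BREAKS if program_t >= start)
    return program_t + offset

def program_time(broadcast_t):
    t = broadcast_t
    for start, length in AD_BREAKS:          # breaks sorted by program time
        if t >= start + length:
            t -= length
        elif t > start:
            return start                     # inside an ad: program clock pauses
    return t

print(broadcast_time(450.0))   # 510.0: program second 450 airs after the first ad
print(program_time(510.0))     # 450.0: the inverse mapping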

Journal ArticleDOI
TL;DR: This automatic viewpoint sequence recommendation method selects optimal sets of context-dependent, high-quality viewpoints from multiview videos to enhance the viewing experience.
Abstract: Multiview videos are highly flexible in their ability to enhance the viewing experience. However, the increasing number of cameras burdens even experts who must quickly select suitable viewpoints. Therefore, the authors propose an automatic viewpoint sequence recommendation method. Unlike existing methods, their approach focuses on context dependency using viewpoint evaluation and transition processes performed by two types of agents: camera agents evaluate the view quality, and a producer agent selects the optimal set of viewpoints based on the scene and production context. The authors optimized the process parameters for different contexts of scene and production, and confirmed the effectiveness of their method by comparing context-dependent and context-independent video sequences with selections made by humans.
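A much-simplified version of the two-agent split, camera agents scoring views and a producer agent trading quality against excessive switching, can be sketched as a greedy selection. All scores and the switch penalty below are invented.

# Greedy sketch of the camera-agent / producer-agent split described above:
# camera agents supply per-step view-quality scores; the producer agent
# picks a viewpoint per time step, penalizing switches to avoid jitter.
# Scores and the penalty weight are invented for illustration.
def recommend_sequence(quality, switch_penalty=0.3):
    # quality[t][c] = view-quality score of camera c at time step t
    sequence, current = [], None
    for scores in quality:
        def gain(c):
            penalty = switch_penalty if current is not None and c != current else 0.0
            return scores[c] - penalty
        best = max(range(len(scores)), key=gain)
        sequence.append(best)
        current = best
    return sequence

quality = [[0.9, 0.5, 0.4],
           [0.6, 0.7, 0.4],   # cam 1 is better, but not by enough to switch
           [0.2, 0.9, 0.5]]
print(recommend_sequence(quality))   # [0, 0, 1]

A real producer agent would optimize the whole sequence (for example, with dynamic programming over viewpoint transitions) rather than choosing greedily per step.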

Journal ArticleDOI
TL;DR: The project team's interdisciplinary approach to developing and disseminating engaging, interactive educational apps that demonstrate what happens to personal information on the Internet, with a particular focus on multimedia, and their approach to explaining the underlying social and technical principles in accessible terms are described.
Abstract: As part of the Teaching Privacy project, researchers at the International Computer Science Institute and the University of California, Berkeley, are developing learning tools to empower K-12 students and college undergraduates in making informed choices about privacy. Teaching Privacy in part grew out of empirical research on the privacy implications of multimedia technology; this research generated a great deal of interest from teachers, who often want to provide students with guidance on online privacy but feel they are not sufficiently versed in the technical details. These interactions inspired the project, which focuses on working with teachers through outreach, curriculum-building, and professional development. This article describes the project team's interdisciplinary approach to developing and disseminating engaging, interactive educational apps that demonstrate what happens to personal information on the Internet, with a particular focus on multimedia, and their approach to explaining the underlying social and technical principles in accessible terms.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed auditory feedback is significantly more efficient and effective than visual-only feedback or conventional auditory feedback in terms of time and accuracy.
Abstract: With the increasing amount of new motion-sensing hardware, improving the usability of gesture interfaces is becoming more important. However, people don't accept gesture interfaces as comfortably as traditional interfaces because they lack the tactile or haptic feedback of traditional input methods, such as mice or keyboards. However, having tactile feedback in gesture interfaces isn't possible unless users are wearing a device with actuators. Therefore, auditory feedback is an appropriate and unique alternative for assisting visual feedback in gesture interfaces. The authors propose various types of novel auditory feedback methods and explore their effects as secondary feedback on complementing visual feedback. They performed a user study for a menu selection task, and experimental results show that the proposed auditory feedback is significantly more efficient and effective than visual-only feedback or conventional auditory feedback in terms of time and accuracy.

Journal ArticleDOI
TL;DR: This special issue looks at some of the upcoming research on how interaction with sound can be used in a variety of areas and applications.
Abstract: Today's computing technology is radically different from that of 10 years ago. Devices such as smartphones, tablets, and even wearable devices are found wherever we are. Researchers and developers can take advantage of this new era by knowing that the public has personal access to highly interactive multimedia devices. Interfaces involving sound are already in the hands of millions of people. For information display, sound promises an alternative to squeezing information through small screens that then force us to attend to them, thus making us lose awareness of our immediate environment. This special issue looks at some of the upcoming research on how such interaction with sound can be used in a variety of areas and applications.

Journal ArticleDOI
TL;DR: A data-driven approach for semantic scene understanding, without pixelwise annotation or classifier training, that parses a target image in two steps, making the two steps mutually conditional and bootstrapped under the probabilistic Expectation-Maximization (EM) formulation.
Abstract: This article investigates a data-driven approach for semantic scene understanding, without pixelwise annotation or classifier training. The proposed framework parses a target image in two steps: first, retrieving its exemplars (that is, references) from an image database, where all images are unsegmented but annotated with tags; second, recovering its pixel labels by propagating semantics from the references. The authors present a novel framework making the two steps mutually conditional and bootstrapped under the probabilistic Expectation-Maximization (EM) formulation. In the first step, the system selects the references by jointly matching the appearances as well as the semantics (that is, the assigned labels) with the target. They process the second step via a combinatorial graphical representation, in which the vertices are superpixels extracted from the target and its selected references. Then they derive the potentials of assigning labels to one vertex of the target, which depend upon the graph edges that connect the vertex to its spatial neighbors of the target and to similar vertices of the references. The proposed framework can be applied naturally to perform image annotation on new test images. In the experiments, the authors validated their approach on two public databases, and demonstrated superior performance over the state-of-the-art methods in both semantic segmentation and image annotation tasks.
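Stripped of the reference retrieval and the EM coupling, the label-recovery step is graph-based label propagation: unlabeled target superpixels absorb label distributions from similar reference superpixels and from their neighbors. The toy sketch below illustrates this; the affinities, labels, and iteration count are invented.

import numpy as np

# Toy label propagation in the spirit of step two above. The actual paper
# couples this with reference retrieval inside an EM loop.
def propagate(W, labels, n_iter=20):
    # W: (n, n) nonnegative affinity matrix over all superpixels;
    # labels: (n, k) one-hot rows for reference superpixels, zeros for targets.
    P = W / W.sum(axis=1, keepdims=True)
    F = labels.copy().astype(float)
    clamped = labels.sum(axis=1) > 0            # keep reference labels fixed
    for _ in range(n_iter):
        F = P @ F
        F[clamped] = labels[clamped]
    return F.argmax(axis=1)                     # hard labels per superpixel

# 2 reference superpixels (labels 0 and 1) and 2 unlabeled target superpixels.
W = np.array([[0, .1, .8, .1],
              [.1, 0, .1, .9],
              [.8, .1, 0, .2],
              [.1, .9, .2, 0]], dtype=float)
labels = np.array([[1, 0], [0, 1], [0, 0], [0, 0]], dtype=float)
print(propagate(W, labels))   # -> [0 1 0 1]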

Journal ArticleDOI
TL;DR: Experimental results suggest that GaitEcho, a wearable auditory biofeedback device using an instrumented ankle-foot orthosis (AFO), offers similarly adequate functionality for both blind and sighted participants.
Abstract: GaitEcho is a wearable auditory biofeedback device that uses an instrumented ankle-foot orthosis for gait rehabilitation. The authors investigated its feasibility for rehabilitating sighted and blind individuals by employing a reference-tracking task for an ankle-joint exercise. Experimental results suggest that GaitEcho offers similarly adequate functionality (angle controllability, timing controllability, and task difficulty) for both blind and sighted participants in conducting ankle-joint exercises. Furthermore, blind participants reported higher understandability and enjoyment than sighted participants, suggesting a positive emotional effect of auditory biofeedback for blind users.

Journal ArticleDOI
TL;DR: The author reviews the evolution of visual documentation and considers where we're headed, introducing the Visual Web.
Abstract: A major disruption is taking place in terms of how photographs are captured and the role they play in modern society, offering the tantalizing possibility of creating visual connections and conversations beyond anything we've yet seen or imagined. The author reviews the evolution of visual documentation and considers where we're headed, introducing the Visual Web. This department is part of a special issue on social multimedia and storytelling.

Journal ArticleDOI
TL;DR: This article reviews the methods and tools studied over the past few years to help children with Autism Spectrum Disorder initiate conversations, improve "play" skills, and practice social conventions.
Abstract: Multimedia-based instruction (MBI) has been playing an increasingly important role in interventions to help children with Autism Spectrum Disorder (ASD) initiate conversations, improve "play" skills, and practice social conventions. However, it's important for MBI apps to integrate and enhance established evidence-based approaches. With the support of the National Science Foundation, researchers at the University of Kentucky have started a program to integrate novel multimedia technologies into evidence-based interventions. This summary of the methods and tools they've studied over the past few years reviews some of the important lessons learned.

Journal ArticleDOI
TL;DR: An online multimedia storytelling ecosystem is introduced, comprising purpose-built user applications, a collaborative story-authoring engine, social context integration, and socially aware media services; it enables online collaborative story coauthoring and provides an ideal platform to study the synergy between social networks and networked media in enhancing the user experience of storytelling.
Abstract: User-generated audiovisual content is becoming the most popular medium for information sharing and social storytelling around live events. In this article, the authors introduce an online multimedia storytelling ecosystem comprising purpose-built user applications, a collaborative story-authoring engine, social context integration, and socially aware media services. As their event-based user experiments illustrate, the system enables online collaborative story coauthoring and provides an ideal platform to study the synergy between social networks and networked media in enhancing the user experience of storytelling. This article is part of a special issue on social multimedia and storytelling.

Journal ArticleDOI
TL;DR: The authors propose an unsupervised, multistaged approach that explores commonly available, real-world metadata for the detection and linking of social events across sharing platforms.
Abstract: A large part of media shared on online platforms such as Flickr and YouTube is captured at various social events (such as music festivals, exhibitions, and sport events). While it is quite easy to share personal impressions online, it is much more challenging to identify content that is related to the same social event across different platforms. Here, the authors focus on the detection of social events in a data collection from Flickr and YouTube. They propose an unsupervised, multistaged approach that explores commonly available, real-world metadata for the detection and linking of social events across sharing platforms. The proposed methodology and the performed experiments allow for a thorough evaluation of the usefulness of available metadata in the context of social event detection in both single- and cross-platform scenarios. This article is part of a special issue on social multimedia and storytelling.
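As a toy illustration of metadata-based event grouping (not the authors' multistaged pipeline), items from different platforms can be linked when their capture metadata is close in time and space. The thresholds and sample records below are invented.

from math import radians, sin, cos, asin, sqrt

# Toy cross-platform grouping by shared metadata: two items belong to the
# same candidate social event if captured close in time and space.
# Thresholds are invented; the paper uses a multistaged, unsupervised pipeline.
def haversine_km(p, q):
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

def cluster(items, max_hours=6, max_km=1.0):
    events = []                       # each event: list of items
    for it in sorted(items, key=lambda x: x["t"]):
        for ev in events:
            ref = ev[-1]
            if (abs(it["t"] - ref["t"]) <= max_hours * 3600
                    and haversine_km(it["geo"], ref["geo"]) <= max_km):
                ev.append(it)
                break
        else:
            events.append([it])
    return events

items = [{"src": "flickr", "t": 0, "geo": (45.464, 9.190)},
         {"src": "youtube", "t": 1800, "geo": (45.465, 9.191)},
         {"src": "flickr", "t": 90000, "geo": (48.857, 2.352)}]
print(len(cluster(items)))   # -> 2 events (invented Milan vs. Paris examples)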

Journal ArticleDOI
TL;DR: A user-centered design approach for creating an audio interface in the context of climate science to realize a domain-specific sonification platform and to identify climate metaphors to build a metaphoric sound identity for the sonification.
Abstract: This article presents a user-centered design approach for creating an audio interface in the context of climate science. The author's team used contextual inquiry to gather information about scientists' workflows and focus groups to assess data about the scientists' specific use of language. The goal was to realize a domain-specific sonification platform and to identify climate metaphors to build a metaphoric sound identity for the sonification. In a separate set of experiments, participants were asked to pair sound stimuli with climate terms extracted from the initial interviews and to evaluate the sound samples aesthetically. They were also asked to choose sound textures (from a given set of sounds) that best express the specific climate parameter and to rate the relevance of the sound to the metaphor. The author's team assessed correlations between climate terminology and sound stimuli for the sonification tool to improve the sound design. Results show that the climate scientists tended to prefer natural sounds.

Journal ArticleDOI
TL;DR: The authors describe the European project ForgetIT, which investigates introducing a form of digital or managed forgetting into information management environments, based on the idea of making more conscious decisions about which content is really important and should be preserved safely, and which content we can (and should) forget.
Abstract: Each year makes it easier to accumulate large numbers of photos and videos in the social and personal digital space. Their long-term existence is mostly driven by chance rather than by clear guidelines or rules for archiving them. Thus, unfortunately, cases of both unintended exposure and unintended disappearance of personal photos happen much too often. This article mainly focuses on this question: What should we remember and thus archive, and what can we forget? The authors describe the European project ForgetIT (www.forgetIT-project.eu), which is investigating the introduction of a form of digital or managed forgetting into information management environments. The project focuses on the idea of making more conscious decisions about which content is really important, and thus should be preserved safely, and which content we can (and should) forget.