
Showing papers in "IEEE MultiMedia" in 2013


Journal Article
TL;DR: A fully distributed immersive teleconferencing system that assumes there is only one user at each station during the conference, which would allow users to conduct meetings in their own offices for greatest convenience.
Abstract: The Viewport immersive teleconferencing system reconstructs sparse 3D representations for each user and applies virtual seating to maintain the same seating geometry as face-to-face meetings. In this article, we propose a fully distributed immersive teleconferencing system that assumes there is only one user at each station during the conference. Such a system would allow users to conduct meetings in their own offices for greatest convenience. Compared with group conferencing systems, fully distributed systems let us render the light and sound fields from a single user's viewpoint at each site, which demands less of the system hardware.

81 citations
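The Viewport abstract describes virtual seating that preserves face-to-face seating geometry. A minimal sketch, assuming participants sit evenly spaced around a virtual round table (the layout, radius, and function names are hypothetical and not Viewport's actual scheme), computes each seat and the bearing at which one user should see every remote participant.

```python
import math

def seat_positions(n, radius=1.0):
    """Place n participants evenly around a virtual round table (hypothetical layout)."""
    return [(radius * math.cos(2 * math.pi * i / n),
             radius * math.sin(2 * math.pi * i / n)) for i in range(n)]

def view_bearings(seats, viewer):
    """Bearing in degrees from the viewer's seat toward every other seat,
    measured relative to the direction that faces the table center."""
    vx, vy = seats[viewer]
    facing = math.atan2(-vy, -vx)  # the viewer looks toward the center
    bearings = {}
    for i, (x, y) in enumerate(seats):
        if i == viewer:
            continue
        angle = math.atan2(y - vy, x - vx) - facing
        # wrap the difference into (-180, 180] degrees
        bearings[i] = math.degrees(math.atan2(math.sin(angle), math.cos(angle)))
    return bearings
```

With four participants, viewer 0 sees the others at roughly -45, 0, and 45 degrees, so each remote user can be rendered at the position they would occupy around a real table.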


Journal Article
TL;DR: It is shown that using a field lens and a square aperture significantly reduces the vignetting problem associated with a relay system and achieves over 95 percent fill factor.
Abstract: We demonstrated a 3D holoscopic video system for 3DTV applications. We showed that using a field lens and a square aperture significantly reduces the vignetting problem associated with a relay system and achieves a fill factor of over 95 percent. The main problem for such a relay system is nonlinear distortion during 3D image capture, which can seriously affect the reconstruction process for a 3D display. This distortion mainly comprises lens radial distortion (intrinsic) and microlens array perspective distortion (extrinsic); correcting it is left to future work. Our results also show that the SS coding approach performs better than the standard HEVC scheme. Furthermore, we show that search and retrieval performance depends on the quality of the depth map and that multimodal fusion boosts retrieval performance.

80 citations
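The holoscopic abstract names lens radial distortion as the intrinsic part of the capture problem. A common way to model it is the two-coefficient polynomial radial model (coefficients k1 and k2 on normalized image coordinates); this model and the function names are assumptions here, since the article leaves correction to future work. The sketch applies the model and inverts it by fixed-point iteration.

```python
def distort(x, y, k1, k2):
    """Apply polynomial radial distortion to normalized coordinates (x, y)."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

def undistort(xd, yd, k1, k2, iterations=20):
    """Invert the radial model by fixed-point iteration from the distorted point."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        scale = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / scale, yd / scale
    return x, y
```

The iteration converges quickly for mild distortion; strong distortion would call for a Newton step or a precomputed lookup table instead.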


Journal Article
TL;DR: A finger-writing system that recognizes characters written in the air without the need for an extra handheld device is presented, which adaptively merges depth, skin, and background models for the hand segmentation to overcome the limitations of the individual models.
Abstract: With the introduction of Microsoft Kinect, there has been considerable interest in creating attractive and feasible applications in related research fields. Kinect simultaneously captures depth and color information and provides reliable real-time 3D full-body human-pose reconstruction that essentially turns the human body into a controller. This article presents a finger-writing system that recognizes characters written in the air without the need for an extra handheld device. The system adaptively merges depth, skin, and background models for hand segmentation to overcome the limitations of the individual models, such as hand-face overlap and depth-color nonsynchronization. The writing fingertip is detected by a new real-time dual-mode switching method. Top-five recognition accuracy exceeds 90 percent for Chinese characters, English characters, and numbers.

79 citations
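The finger-writing abstract merges depth, skin, and background models so that no single cue decides alone. A minimal sketch of one possible fusion rule, assuming each cue is already a binary mask and using a hypothetical weighted per-pixel vote (the weights and threshold are illustrative, not the authors' adaptive scheme):

```python
from typing import List

Mask = List[List[int]]  # 1 = candidate hand pixel, 0 = not

def fuse_masks(depth: Mask, skin: Mask, foreground: Mask,
               weights=(0.5, 0.3, 0.2), threshold=0.5) -> Mask:
    """Weighted per-pixel vote of three binary cues (hypothetical fusion rule).

    depth:      pixels within the expected hand depth range
    skin:       pixels classified as skin colored
    foreground: pixels that differ from the background model
    """
    wd, ws, wf = weights
    rows, cols = len(depth), len(depth[0])
    fused = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            score = wd * depth[r][c] + ws * skin[r][c] + wf * foreground[r][c]
            fused[r][c] = 1 if score >= threshold else 0
    return fused
```

A skin-only pixel (for example, on the face behind the hand) scores 0.3 and is rejected, which illustrates how fusion can help with hand-face overlap. An adaptive variant could lower the depth weight when depth and color frames fall out of sync.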


Journal Article
TL;DR: A novel partial-duplicate image-retrieval scheme based on saliency-guided visual matching, in which duplicates are localized simultaneously with retrieval.
Abstract: This article proposes a novel partial-duplicate image-retrieval scheme based on saliency-guided visual matching, where the localization of duplicates is done simultaneously. The image is abstracted by visually salient and rich regions (VSRRs), which are visually salient and contain rich visual content. Furthermore, to refine the retrieval, a relative saliency ordering constraint is constructed that captures the robust relative saliency layout of the VSRRs. The authors propose an efficient algorithm to embed this constraint into the index system to speed up retrieval. Comparison experiments with state-of-the-art methods on five databases show the efficiency and effectiveness of the proposed approach.

56 citations
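The relative saliency ordering constraint rewards candidates whose matched regions keep the same saliency layout as the query. A minimal stand-in, assuming each matched region pair comes with one saliency value per image, scores the fraction of region pairs whose saliency order agrees (a Kendall-style agreement; this is not the authors' exact constraint or its index embedding):

```python
from itertools import combinations

def ordering_consistency(query_saliency, candidate_saliency):
    """Fraction of matched-region pairs whose saliency order agrees in both images.

    query_saliency[i] and candidate_saliency[i] are the saliency values of the
    i-th matched region pair (hypothetical inputs; ties count as disagreement).
    """
    pairs = list(combinations(range(len(query_saliency)), 2))
    if not pairs:
        return 1.0
    agree = 0
    for i, j in pairs:
        dq = query_saliency[i] - query_saliency[j]
        dc = candidate_saliency[i] - candidate_saliency[j]
        if dq * dc > 0:  # same sign means same relative order
            agree += 1
    return agree / len(pairs)
```

A candidate whose regions are matched but rearranged in saliency scores low, which is the kind of false match such a constraint is meant to filter out.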


Journal Article
TL;DR: The MPEG Media Transport (MMT) standard is being developed with specifications for encapsulation, delivery, and signaling, which enable fine-grained content access with uniquely identifiable names for optimized content delivery.
Abstract: Content-centric networking promises more efficient distribution of data through in-network caching and content propagation. This networking paradigm poses a number of challenges and new opportunities for more efficient multimedia delivery. The MPEG Media Transport (MMT) standard is being developed to address these needs with specifications for encapsulation, delivery, and signaling, which enable fine-grained content access with uniquely identifiable names for optimized content delivery.

54 citations
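The MMT abstract ties fine-grained, uniquely named content to in-network caching. A minimal sketch, assuming a hypothetical hierarchical naming scheme (not MMT's actual identifier syntax), shows how a network node could cache media units by name with least-recently-used eviction:

```python
from collections import OrderedDict

def content_name(provider: str, asset: str, unit: int) -> str:
    """Build a hierarchical content name (hypothetical scheme, not MMT syntax)."""
    return f"/{provider}/{asset}/unit/{unit}"

class NameCache:
    """LRU in-network cache keyed by content name."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.store = OrderedDict()

    def get(self, name: str):
        if name not in self.store:
            return None
        self.store.move_to_end(name)  # mark as most recently used
        return self.store[name]

    def put(self, name: str, data: bytes) -> None:
        self.store[name] = data
        self.store.move_to_end(name)
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used entry
```

Because each unit has its own name, any node holding a copy can answer a request, which is the property content-centric delivery relies on.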


Journal Article
TL;DR: This work shows how two compression blocks for video coding, a modified frequency transform and a modified entropy coding scheme (called chaotic arithmetic coding, or CAC), can be used for video encryption.
Abstract: Algorithmic parameterization and hardware architectures can ensure secure transmission of multimedia data in resource-constrained environments such as wireless video surveillance networks, telemedicine frameworks for remote health care support in rural areas, and Internet video streaming. Joint multimedia compression and encryption techniques can significantly reduce the computational requirements of video processing systems. The authors present an approach to reduce the computational cost of multimedia encryption while also preserving the properties of compressed video. A hardware-amenable design of the proposed algorithms makes them suitable for real-time embedded multimedia systems. This approach alleviates the need for additional encryption hardware in resource-constrained scenarios and can otherwise be used to augment existing encryption methods for content delivery on the Internet or in other applications. This work shows how two compression blocks for video coding, a modified frequency transform (called a secure wavelet transform, or SWT) and a modified entropy coding scheme (called chaotic arithmetic coding, or CAC), can be used for video encryption. Experimental results are shown for selective encryption using the proposed schemes.

47 citations
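The abstract pairs chaotic dynamics with selective encryption. As an illustration only (this is neither SWT nor CAC, and it is not a secure cipher), the sketch below uses the logistic map, with the initial value and parameter acting as a hypothetical key, to generate key bits that flip the signs of selected transform coefficients:

```python
def logistic_keystream(x0: float, r: float, n: int, burn_in: int = 100):
    """Generate n pseudo-random bits from the logistic map x <- r*x*(1-x).

    Toy illustration: x0 and r act as the key; this is not a secure cipher.
    """
    x = x0
    for _ in range(burn_in):  # discard the transient
        x = r * x * (1 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1 - x)
        bits.append(1 if x >= 0.5 else 0)
    return bits

def flip_signs(coeffs, key_bits):
    """Selective encryption: flip the sign of each coefficient whose key bit is 1.

    Applying the same operation with the same key bits decrypts.
    """
    return [-c if b else c for c, b in zip(coeffs, key_bits)]
```

Sign flipping leaves coefficient magnitudes unchanged, which hints at why selective schemes can coexist with compression; a real design would also need to resist known-plaintext attacks.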


Journal Article
TL;DR: Experiments show that the proposed methods automatically build image phylogeny trees from partial information about the near duplicates, improving the efficiency and effectiveness of the whole process, and represent a step forward in determining causal relationships between digital images over time.
Abstract: Similar to organisms that evolve in biology, a document can change slightly over time, and each new version may, in turn, generate other versions. Multimedia phylogeny investigates the history and evolutionary process of digital objects, including finding the causal and ancestral relationships between documents, the source of modifications, and the order and transformations that originally created the set of near duplicates. Multimedia phylogeny has direct applications in security, forensics, and information retrieval. This article explores the phylogeny problem for near-duplicate images in large-scale scenarios and presents solutions that extend straightforwardly to other media such as videos. Experiments with approximately 2 million test cases (with synthetic and real data) show that the proposed methods automatically build image phylogeny trees from partial information about the near duplicates, improving the efficiency and effectiveness of the whole process, and represent a step forward in determining causal relationships between digital images over time.

41 citations
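Building a phylogeny tree usually starts from an asymmetric dissimilarity matrix, where entry (i, j) estimates how costly it is to derive image j from image i. The sketch below follows the Oriented Kruskal heuristic from the image-phylogeny literature; it is shown as a representative approach, and the article's large-scale methods may differ.

```python
def build_phylogeny_tree(dissimilarity):
    """Oriented Kruskal: greedily pick the cheapest parent->child edges.

    dissimilarity[i][j] is the (asymmetric) cost of deriving image j from image i.
    Returns parent[j] for each node (None for the root).
    """
    n = len(dissimilarity)
    parent = [None] * n
    component = list(range(n))  # union-find forest over tree fragments

    def find(a):
        while component[a] != a:
            component[a] = component[component[a]]  # path halving
            a = component[a]
        return a

    edges = sorted((dissimilarity[i][j], i, j)
                   for i in range(n) for j in range(n) if i != j)
    added = 0
    for _, i, j in edges:
        if added == n - 1:
            break
        if parent[j] is not None:
            continue  # each image has at most one direct ancestor
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # the edge would close a cycle
        parent[j] = i
        component[rj] = ri
        added += 1
    return parent
```

Because j must still be parentless when the edge i to j is accepted, j is always the root of its fragment, so every merge keeps each fragment a properly rooted tree.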


Journal Article
TL;DR: The proposed method is based on the idea that the problem of human gait recognition can be transformed from a spatio-temporal problem into the spatial domain, specifically the 2D image domain, by representing a sample of a human gait as a still image.
Abstract: This article proposes a new method of recognizing human gait. The proposed method is based on the idea that the problem of human gait recognition can be transformed from a spatio-temporal problem into the spatial domain, specifically the 2D image domain. This is done by representing a sample of a human gait as a still image. Doing so keeps all the recorded information while enabling the use of proven content-based image retrieval (CBIR) techniques for recognition. The proposed method uses Microsoft Kinect human-computer interaction technology for data acquisition. To validate the proposed approach, the authors conducted a study with 50 participants.

36 citations
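The gait abstract turns a spatio-temporal recording into a still image. One hypothetical encoding, assuming Kinect-style skeleton frames of (x, y, z) joint positions, puts each joint coordinate on a row and each frame in a column, scaled with a single global min-max so the mapping can be inverted up to quantization (the article's actual image layout is not specified here):

```python
def gait_to_image(frames):
    """Turn a skeleton sequence into a 2D grayscale image (hypothetical encoding).

    frames: list of frames; each frame is a list of (x, y, z) joint positions.
    Row 3*j + axis holds one joint coordinate over time, scaled to 0..255.
    Returns (image, (lo, hi)) so the scaling can be undone later.
    """
    values = [c for frame in frames for joint in frame for c in joint]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    image = []
    for j in range(len(frames[0])):
        for axis in range(3):
            image.append([round(255 * (frame[j][axis] - lo) / span)
                          for frame in frames])
    return image, (lo, hi)
```

Once gait is an image, standard CBIR features and distances can be applied to it directly, which is the point the abstract makes.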


Journal Article
TL;DR: A new feature representation, named Nested-SIFT, is proposed, which utilizes the nesting relationship between SIFT features to group local features to improve the effectiveness of feature representation and the efficiency of feature matching.
Abstract: To improve the effectiveness of feature representation and the efficiency of feature matching, we propose a new feature representation, named Nested-SIFT, which utilizes the nesting relationship between SIFT features to group local features. A Nested-SIFT group consists of a bounding feature and several member features covered by the bounding feature. To obtain a compact representation, the SimHash strategy is used to compress the member features in a Nested-SIFT group into a binary code, and the similarity between two Nested-SIFT groups is efficiently computed using the binary codes. Extensive experimental results demonstrate the effectiveness and efficiency of our proposed Nested-SIFT approach.

36 citations
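The Nested-SIFT abstract compresses a group of member features into one binary code with SimHash. A minimal sketch of one standard formulation, assuming random-hyperplane signs as the per-feature hash (the article's exact variant may differ): each member votes +1 or -1 on every bit, and the group bit is set when the votes are positive.

```python
import random

def random_hyperplanes(n_bits, dim, seed=0):
    """Gaussian random hyperplanes, one per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash_group(member_features, hyperplanes):
    """SimHash a set of descriptors into one binary code (list of 0/1 bits)."""
    code = []
    for plane in hyperplanes:
        votes = 0
        for f in member_features:
            proj = sum(p * x for p, x in zip(plane, f))
            votes += 1 if proj >= 0 else -1
        code.append(1 if votes > 0 else 0)
    return code

def hamming_similarity(a, b):
    """Fraction of matching bits between two equal-length codes."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

Comparing two groups then costs one bit comparison per code bit instead of matching every member descriptor pair.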


Journal Article
TL;DR: A compact image signature is proposed that aggregates tensors of visual descriptors; preprocessing the descriptors makes aggregation efficient, and projecting and quantizing the signatures makes them compact.
Abstract: The main challenge in Web-scale image retrieval is achieving good accuracy while keeping computation time and memory footprint low. This article proposes a compact image signature that aggregates tensors of visual descriptors. Efficient aggregation is achieved by preprocessing the descriptors. Compactness is achieved by projection and quantization of the signatures. The authors compare the proposed method to other efficient signatures on a dataset of 1 million images and show the soundness of the approach.

35 citations
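Aggregating tensors of descriptors can be read as summing second-order statistics per visual word. A minimal sketch, assuming a given codebook of centers and upper-triangle flattening (both illustrative; the projection and quantization steps that make the real signature compact are omitted), accumulates the outer product of each descriptor's residual to its nearest center:

```python
def nearest(centers, x):
    """Index of the codebook center closest to x (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centers[k])))

def aggregate_tensors(descriptors, centers):
    """Sum outer products of residuals per visual word, flatten, and concatenate."""
    dim = len(centers[0])
    tensors = [[[0.0] * dim for _ in range(dim)] for _ in centers]
    for x in descriptors:
        k = nearest(centers, x)
        r = [a - b for a, b in zip(x, centers[k])]
        for i in range(dim):
            for j in range(dim):
                tensors[k][i][j] += r[i] * r[j]
    signature = []
    for t in tensors:
        for i in range(dim):
            signature.extend(t[i][i:])  # tensors are symmetric: keep upper triangle
    return signature
```

The raw signature has K * d * (d + 1) / 2 entries for K centers and d-dimensional descriptors, which is why a projection and quantization stage is needed at Web scale.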


Journal Article
TL;DR: An overview of the USAC architecture is provided, and its performance relative to the best state-of-the-art speech and audio codecs is summarized.
Abstract: The MPEG Audio Subgroup has a rich history of accomplishments in creating music coding technology. At higher bit rates, MPEG technology can represent arbitrary sounds, including the human voice, with excellent quality. MPEG-1 and MPEG-2 Audio coders use perceptually shaped quantization noise as the primary tool for achieving compression. The MPEG Unified Speech and Audio Coding (USAC) standard is a single technology capable of compressing speech, speech mixed with music, or music signals with quality that is always at least as good as the better of two state-of-the-art reference codecs, one optimized for speech and mixed content (AMR-WB+) and the other optimized for music and general audio (HE-AACv2). This article provides an overview of the USAC architecture and summarizes its performance relative to the best state-of-the-art speech and audio codecs.
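Perceptually shaped quantization noise means choosing a quantizer step per frequency band so that the noise stays under that band's masking threshold. A simplified sketch, assuming the masking thresholds (as noise powers) are already supplied by a psychoacoustic model and using the standard approximation that a uniform quantizer with step size s adds noise power of about s**2 / 12:

```python
import math

def band_step_sizes(masking_thresholds):
    """Largest uniform-quantizer step per band whose noise power (step**2 / 12)
    equals the band's masking threshold (given as a noise power)."""
    return [math.sqrt(12.0 * t) for t in masking_thresholds]

def quantize_band(coeffs, step):
    """Uniform mid-tread quantization and reconstruction of one band."""
    indices = [round(c / step) for c in coeffs]
    return indices, [i * step for i in indices]
```

Bands with high masking thresholds get coarse steps and therefore few bits, which is where the compression comes from; real coders also iterate between step sizes and the bit budget.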

Journal Article
John R. Smith
TL;DR: Video search needs effective and efficient techniques for video summarization to enable rapid triage and finding relevant video content.
Abstract: Video search needs effective and efficient techniques for video summarization to enable rapid triage and finding relevant video content.

Journal Article
TL;DR: Challenges in delivering multimedia content over 4G networks are outlined for several application scenarios; they must be addressed to meet the increasing demand for video applications in cellular and wireless traffic.
Abstract: Wireless network traffic is dominated by video and requires new ways to maximize the user experience and optimize networks to prevent saturation. The exploding number of subscribers in cellular networks has exponentially increased the volume and variety of multimedia content flowing across the network. This article details some challenges in delivering multimedia content over 4G networks for several application scenarios. To meet the increasing demand for video applications in cellular and wireless traffic, these challenges must be addressed efficiently.

Journal Article
TL;DR: A system that guides listeners through orchestral performances in real time by presenting time-relevant annotations in a manner similar to that of a personal museum guide has been developed and adopted by the Philadelphia Orchestra.
Abstract: Many people enjoy the symphony, but those without prior training often find it difficult to relate to the music. The authors have developed a system that guides listeners through orchestral performances in real time by presenting time-relevant annotations in a manner similar to that of a personal museum guide. These annotations are authored in partnership with musical experts before a performance to provide appropriate contextual information for a given concert program. Using acoustic features of the music, the system aligns the live performance with a previously time-stamped recording. The aligned position is transmitted to an application on users' handheld devices, and the application presents the annotations through an intuitive and unobtrusive interface. To assess its utility, the system underwent user beta testing alongside orchestra concert broadcasts. It has since been adopted by the Philadelphia Orchestra for use during live concerts in its 2012-2013 subscription season and beyond.
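Aligning a performance with a time-stamped recording is classically done with dynamic time warping (DTW). The sketch below is offline DTW on one-dimensional features for readability; it assumes real systems use richer features such as chroma vectors, and a live system like the one described would need an online or incremental variant rather than this batch version.

```python
def dtw_path(live, reference):
    """Offline dynamic time warping between two 1D feature sequences.

    Returns the optimal alignment as a list of (live_index, reference_index) pairs.
    """
    n, m = len(live), len(reference)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(live[i - 1] - reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # backtrack from the end to recover the cheapest path
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((cost[i - 1][j - 1], i - 1, j - 1),
                   (cost[i - 1][j], i - 1, j),
                   (cost[i][j - 1], i, j - 1))
        i, j = step[1], step[2]
    path.reverse()
    return path
```

Each aligned reference index maps to a known timestamp in the recording, which is what lets the annotations be triggered at the right moment.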

Journal Article
TL;DR: This article proposes the adoption of a content-aware approach into the network infrastructure, thus making it capable of identifying, processing, and manipulating media streams and objects in real time to maximize quality of service (QoS) and experience (QoE).
Abstract: Increasingly popular multimedia services are expected to play a dominant role in the future of the Internet. In this context, it is essential that content-aware networking (CAN) architectures explicitly address the efficient delivery and processing of multimedia content. This article proposes the adoption of a content-aware approach into the network infrastructure, thus making it capable of identifying, processing, and manipulating media streams and objects in real time to maximize quality of service (QoS) and experience (QoE). Our proposal builds on scalable media coding technologies within such a content-aware networking environment. The discussion draws on four representative use cases for media delivery (unicast, multicast, peer-to-peer, and adaptive HTTP streaming) and reviews CAN challenges, specifically flow processing, caching/buffering, and QoS/QoE management.
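Scalable media coding lets a content-aware node adapt a stream simply by dropping enhancement layers. A minimal sketch of one plausible adaptation rule, with hypothetical per-layer bitrates: forward the base layer first, then enhancement layers in order, stopping at the first one that no longer fits, because each enhancement layer depends on the layers below it.

```python
def select_layers(layer_bitrates_kbps, available_kbps):
    """Number of scalable layers to forward within the available bandwidth.

    Layers are ordered base first; returns 0 if even the base layer does not fit.
    """
    total = 0
    count = 0
    for rate in layer_bitrates_kbps:
        if total + rate > available_kbps:
            break  # higher layers depend on this one, so stop here
        total += rate
        count += 1
    return count
```

For example, with layers of 500, 700, and 1500 kbps and a 1300 kbps link, the node forwards the base layer and the first enhancement layer.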

Journal Article
TL;DR: A soft-threshold learning algorithm is used to estimate the optimal decision thresholds for detectors, and a multiscale sequence matching method is employed to precisely locate copies using a 2D Hough transform and multigranularity similarity evaluation.
Abstract: For video copy detection, no single audio-visual feature, or single detector based on several features, can work well for all transformations. This article proposes a novel video copy-detection and localization approach with scalable cascading of complementary detectors and multiscale sequence matching. In this cascade framework, a soft-threshold learning algorithm is used to estimate the optimal decision thresholds for detectors, and a multiscale sequence matching method is employed to precisely locate copies using a 2D Hough transform and multigranularity similarity evaluation. Excellent performance on the TRECVID-CBCD 2011 benchmark dataset shows the effectiveness and efficiency of the proposed approach.
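A Hough transform localizes a copy by letting frame-level matches vote for a consistent temporal mapping. A minimal sketch, assuming the mapping t_ref = scale * t_query + offset with a small hypothetical set of candidate scales and 1-second offset bins (the article's parameterization and multigranularity evaluation are not reproduced):

```python
from collections import Counter

def hough_align(matches, scales=(0.5, 1.0, 2.0), offset_bin=1.0):
    """Vote in a 2D (scale, offset) space for the mapping implied by matches.

    matches: list of (t_query, t_ref) pairs in seconds from a frame matcher.
    Returns ((scale, offset), votes) for the strongest cell.
    """
    votes = Counter()
    for tq, tr in matches:
        for s in scales:
            offset = round((tr - s * tq) / offset_bin) * offset_bin
            votes[(s, offset)] += 1
    best, count = votes.most_common(1)[0]
    return best, count
```

Isolated false matches scatter their votes across many cells, so the peak cell still recovers where, and at what speed, the copied segment sits in the reference video.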

Journal Article
TL;DR: This article reviews three depth-sensing approaches for 3DTV and provides a comparative analysis of their characteristics.
Abstract: In the context of 3D video systems, depth information can be used to render a scene from additional viewpoints. Although there have been many recent advances in this area, including the introduction of the Microsoft Kinect sensor, the robust acquisition of such information continues to be a challenge. This article reviews three depth-sensing approaches for 3DTV. The authors discuss how each approach acquires depth information and provide a comparative analysis of their characteristics.
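Two widely used depth-sensing principles can be summarized by short formulas: passive stereo recovers depth as Z = f * B / d from focal length f (pixels), baseline B, and disparity d, and continuous-wave time of flight recovers it from the phase shift of modulated light. The sketch below states both; we do not claim these correspond exactly to the three approaches the article reviews.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second

def stereo_depth(focal_px, baseline_m, disparity_px):
    """Depth from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

def tof_depth(phase_shift_rad, modulation_hz):
    """Depth from continuous-wave time of flight: Z = c * phi / (4 * pi * f_mod).

    Only valid within the unambiguous range c / (2 * f_mod).
    """
    return SPEED_OF_LIGHT * phase_shift_rad / (4 * math.pi * modulation_hz)
```

The formulas expose the trade-offs a comparison must weigh: stereo precision falls off as disparity shrinks with distance, while time of flight trades unambiguous range against modulation frequency.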

Journal Article
TL;DR: The author provides a model for musical interface design and discusses it in terms of a large online database of digital musical instruments he has created.
Abstract: This article discusses the proliferation of new musical instruments and interfaces for computer-based music performance (digital musical instruments). It discusses the notion of a musical instrument schema and how preexisting musical practice can be used to provide design guidelines for this developing field. In so doing, it teases out notions of control and creation and discusses a number of theoretical positions for those notions in musical performance. The author provides a model for musical interface design and discusses it in terms of a large online database of digital musical instruments he has created.

Journal Article
TL;DR: The main concepts, parts, and achievements of the JPSearch framework are discussed and its use is demonstrated through a set of substantial case studies.
Abstract: Triggered by the rise of social networks, community-based image sharing platforms are emerging at an increasing rate. Currently, almost every repository offers a different interaction interface and metadata description format. Unfortunately, this prevents unified and efficient access to these repositories. Consequently, data exchange between systems is often cumbersome. In this context, ISO/IEC JTC1 SC29 WG1 (more commonly known as JPEG) initiated the JPSearch framework standardization, which aims to foster the interaction with and among image repositories. The standard focuses on three main cornerstones supporting repository synchronization, search and access, and image collection creation and maintenance. This article discusses the main concepts, parts, and achievements of the JPSearch framework and demonstrates its use through a set of substantial case studies.

Journal Article
TL;DR: The authors describe a selection of relevant deployment scenarios, from content licensing to authorization-based content access control, including a specific case for mobile scenarios.
Abstract: Standards-based middleware architectures for content management are suitable for a range of business scenarios. In this context, the authors review the MPEG-M standard and the MIPAMS standards-based architecture. They describe a selection of relevant deployment scenarios, from content licensing to authorization-based content access control, including a specific case for mobile scenarios. They illustrate each scenario with real MIPAMS implementations developed in several research projects and under industry contracts.

Journal Article
TL;DR: A graph-model-based technique, called a demonstration graph, is proposed to automatically construct and coordinate both behavioral and cognitive models for IHCs so that they can accomplish complex tasks in a simple, universal way.
Abstract: In this article, we propose a graph-model-based technique, called a demonstration graph, to automatically construct and coordinate both behavioral and cognitive models for IHCs to accomplish complex tasks in a simple, universal way. Our technique is inspired by insights from psychology, neuroscience, and human ethology that human decision making relies largely on past, similar experiences.5,6 Our work is further motivated by Alan Turing's belief that building an intelligent system necessitates imitating human mental processing.7 Thus, we use the Learning-from-Demonstrations (LfD) method,8 borrowed from the robotics domain, to make a character mimic successful human demonstrations to accomplish a well-defined task.
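The demonstration-graph idea merges successful human demonstrations into one graph that a character can then follow. A minimal sketch, assuming each demonstration is a list of (state, action, next_state) steps with hashable, hypothetical state labels (this illustrates the general LfD-graph idea, not the article's construction of behavioral and cognitive models):

```python
from collections import defaultdict, deque

def build_demo_graph(demonstrations):
    """Merge demonstrations (lists of (state, action, next_state)) into a graph.

    Later demonstrations overwrite an earlier outcome for the same state and action.
    """
    graph = defaultdict(dict)
    for demo in demonstrations:
        for state, action, nxt in demo:
            graph[state][action] = nxt
    return graph

def plan(graph, start, goal):
    """Breadth-first search for the shortest action sequence seen in demonstrations."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, nxt in graph.get(state, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, actions + [action]))
    return None
```

Because the graph combines several demonstrations, the character can reuse a shortcut that no single demonstrator took from start to finish.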

Journal Article
TL;DR: To provide users with a high-quality experience, interactive telepresence system platforms must accommodate multiple performance profiles for diverse, shared cyberphysical activities.
Abstract: To provide users with a high-quality experience, interactive telepresence system platforms must accommodate multiple performance profiles for diverse, shared cyberphysical activities.

Journal Article
TL;DR: As an alternative to traditional hardware-based ultra-high definition (UHD) multimedia systems, the proposed software-based approach offers a better cost-benefit ratio and might help facilitate large-scale deployment.
Abstract: As an alternative to traditional hardware-based ultra-high definition (UHD) multimedia systems, the proposed software-based approach offers a better cost-benefit ratio and might help facilitate large-scale deployment.

Journal Article
John R. Smith
TL;DR: The way forward is for the multimedia field to create appropriate lesson plans or, more generally, develop curriculum-based approaches to multimedia machine learning, says EIC John R. Smith.
Abstract: Machine learning has become an indispensable tool for the multimedia community. Given large amounts of data, computers using machine learning are able to create rich representations and accomplish impressive discrimination tasks. Yet the way machines learn still differs significantly from how humans learn. EIC John R. Smith explains that the way forward is for the multimedia field to create appropriate lesson plans or, more generally, develop curriculum-based approaches to multimedia machine learning.

Journal Article
TL;DR: Social multimedia signal processing aims to transform the noise-like phenomena in social media into signals useful for building novel, socially aware multimedia applications, targeted advertising techniques, and new marketing methods.
Abstract: Social media gives ordinary people the power to be content creators and information disseminators. This information is embedded in multimedia shared across social networks and contains valuable indications about various facets of human life: what captures our attention, our sharing biases, and the digital traces we leave behind. Social multimedia signal processing aims to transform the noise-like phenomena in social media into signals useful for building novel, socially aware multimedia applications. With a fresh way to look at the existence of multimedia in online social networks, we can also explore new marketing methods and targeted advertising techniques.

Journal Article
TL;DR: A novel sparse projection method addresses the efficiency challenge by learning a discriminative compact representation that drastically reduces transmission costs; with less than 10 percent nonzero elements in the projection matrix, it also reduces computational and storage costs.
Abstract: Retrieving relevant videos from a large corpus on mobile devices is a major challenge. This article addresses two key issues for mobile search on user-generated videos. The first is the lack of a good relevance measure for learning semantically rich representations, due to the unconstrained nature of online videos. The second is the limited resources of mobile devices and the stringent bandwidth and delay requirements between the device and the video server. The authors propose a knowledge-embedded sparse projection learning approach. To alleviate the need for expensive annotation in hash learning, they investigate various approaches to pseudo-label mining, including explicit semantic analysis that leverages Wikipedia. In addition, they propose a novel sparse projection method to address the efficiency challenge by learning a discriminative compact representation that drastically reduces transmission costs. With less than 10 percent nonzero elements in the projection matrix, it also reduces computational and storage costs. The experimental results on 100,000 videos show that the proposed algorithm yields performance competitive with the prior state-of-the-art hashing methods, which are not suitable for mobile devices and rely solely on costly manual annotations. The average query time for 100,000 videos was only 0.592 seconds.
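A sparse projection keeps hashing cheap on the device: each code bit is the sign of a dot product that touches only a few input dimensions. The sketch below uses random sparse rows as a hypothetical placeholder for the learned, knowledge-embedded projection, which is not reproduced here; rows are stored as (index, weight) pairs so that cost scales with the number of nonzeros.

```python
import random

def random_sparse_projection(n_bits, dim, density=0.05, seed=0):
    """Hypothetical stand-in for a learned sparse projection matrix.

    Each row keeps about density * dim nonzero weights as (index, weight) pairs.
    """
    rng = random.Random(seed)
    nnz = max(1, int(density * dim))
    return [[(i, rng.gauss(0.0, 1.0)) for i in rng.sample(range(dim), nnz)]
            for _ in range(n_bits)]

def hash_code(projection, x):
    """Binary code: bit k is the sign of the sparse dot product of row k with x."""
    return [1 if sum(w * x[i] for i, w in row) >= 0 else 0 for row in projection]

def hamming(a, b):
    """Number of differing bits between two codes."""
    return sum(p != q for p, q in zip(a, b))
```

Only the short binary code travels to the server, which is how compact codes reduce transmission cost, and the server ranks videos by Hamming distance.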

Journal Article
TL;DR: The authors optimize the design of a hash-code collision and counting scheme to enable fast search of visual features of MPEG CDVS and explore a new indexing scheme.
Abstract: Visual search over large image repositories in real time is one of the key challenges for applications such as mobile visual query-by-capture, augmented reality, and biometrics-based identification. Search accuracy and response speed are two important performance factors. This article focuses on one of the important elements of this technology that enables large-scale visual search: indexing (or hashing). Indexing is the process of organizing a database of searchable elements into an efficiently searchable configuration. The searchable elements in our case are compact features extracted from images. This article explores a new indexing scheme. The authors optimize the design of a hash-code collision and counting scheme to enable fast search of visual features of MPEG CDVS.
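A hash-code collision and counting scheme can be pictured as several hash tables, each keyed on a different subset of bits of a binary descriptor, with database images ranked by how often the query collides with them. A minimal sketch, assuming binary descriptors and hand-picked bit subsets (a generic multi-table scheme, not the optimized CDVS design the article describes):

```python
from collections import Counter, defaultdict

def build_tables(database, bit_subsets):
    """Index binary descriptors in several hash tables, one per bit subset.

    database: dict image_id -> list of binary descriptors (tuples of 0/1).
    bit_subsets: list of bit-position tuples; each defines one table's key.
    """
    tables = [defaultdict(set) for _ in bit_subsets]
    for image_id, descriptors in database.items():
        for d in descriptors:
            for table, bits in zip(tables, bit_subsets):
                table[tuple(d[b] for b in bits)].add(image_id)
    return tables

def rank_by_collisions(query_descriptors, tables, bit_subsets, top_k=5):
    """Count, for each database image, how often the query's keys collide with it."""
    counts = Counter()
    for d in query_descriptors:
        for table, bits in zip(tables, bit_subsets):
            for image_id in table.get(tuple(d[b] for b in bits), ()):
                counts[image_id] += 1
    return counts.most_common(top_k)
```

Lookups cost one dictionary access per table and descriptor, so search time depends mostly on bucket sizes rather than on the total number of indexed images.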

Journal Article
TL;DR: The authors give a brief overview of the psychology of face perception and then describe some of the applications of computer vision and pattern recognition applied to face recognition in media production.
Abstract: Facial expressions play an important role in day-to-day communication as well as media production. This article surveys automatic facial analysis and modeling methods using computer vision techniques and their applications for media production. The authors give a brief overview of the psychology of face perception and then describe some of the applications of computer vision and pattern recognition applied to face recognition in media production. This article also covers the automatic generation of face models, which are used in movie and TV productions for special effects in order to manipulate people's faces or combine real actors with computer graphics.

Journal Article
TL;DR: A categorization of related theories and algorithms is provided, with a mathematical formulation, analysis, and discussion for each category, along with recommended research directions to improve the efficiency, effectiveness, and overall utility of Web image search reranking technology.
Abstract: This article reviews recent advancements in developing approaches to Web image search reranking. The authors provide a categorization of related theories and algorithms and include a mathematical formulation, analysis, and discussion per category. They highlight the limitations of the existing approaches and make recommendations on what they believe to be the most critical research directions to improve the efficiency, effectiveness, and overall utility of Web image search reranking technology.
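One common family of reranking methods treats the top of the initial text-based list as pseudo-relevant and promotes images that look like it. A minimal sketch of that single category, assuming precomputed visual similarities in [0, 1] and a hypothetical blending weight alpha (other categories covered by surveys of this kind, such as graph-based or learning-to-rank methods, are not shown):

```python
def rerank(initial_scores, similarity, top_k=3, alpha=0.5):
    """Pseudo-relevance-feedback reranking (one illustrative category).

    initial_scores: text-based relevance score per image.
    similarity[i][j]: visual similarity between images i and j in [0, 1].
    Returns image indices sorted by the blended score.
    """
    n = len(initial_scores)
    pseudo_relevant = sorted(range(n), key=lambda i: -initial_scores[i])[:top_k]
    blended = []
    for i in range(n):
        visual = sum(similarity[i][j] for j in pseudo_relevant) / len(pseudo_relevant)
        blended.append(alpha * initial_scores[i] + (1 - alpha) * visual)
    return sorted(range(n), key=lambda i: -blended[i])
```

An image that ranked well on text alone but looks unlike the other top results drops down the list, which is the core intuition behind visual-consistency reranking.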

Journal Article
TL;DR: This article studies the challenges and opportunities of million-scale near-duplicate video retrieval using three well-known and potentially scalable methods, and concludes that although these approaches can perform efficiently, they suffer from a significant performance drop.
Abstract: This article studies the challenges and opportunities of million-scale near-duplicate video retrieval using three well-known and potentially scalable methods. It concludes that although these approaches can perform efficiently, they suffer from a significant performance drop because the efficient features are neither discriminative nor robust enough for versatile cases.