Showing papers in "IEEE MultiMedia in 2012"


Journal ArticleDOI
Zhengyou Zhang
TL;DR: While the Kinect sensor incorporates several kinds of advanced sensing hardware, this article focuses on the vision aspect of the sensor and its impact beyond the gaming industry.
Abstract: Recent advances in 3D depth cameras such as Microsoft Kinect sensors (www.xbox.com/en-US/kinect) have created many opportunities for multimedia computing. The Kinect sensor lets the computer directly sense the third dimension (depth) of the players and the environment. It also understands when users talk, knows who they are when they walk up to it, and can interpret their movements and translate them into a format that developers can use to build new experiences. While the Kinect sensor incorporates several kinds of advanced sensing hardware, this article focuses on the vision aspect of the Kinect sensor and its impact beyond the gaming industry.

2,294 citations


Journal ArticleDOI
TL;DR: Two large facial-expression databases depicting challenging real-world conditions were constructed using a semi-automatic approach via a recommender system based on subtitles.
Abstract: Two large facial-expression databases depicting challenging real-world conditions were constructed using a semi-automatic approach via a recommender system based on subtitles.

552 citations


Journal ArticleDOI
TL;DR: This article surveys forensic face-recognition approaches and the challenges they face in improving matching and retrieval results as well as processing low-quality images.
Abstract: This article surveys forensic face-recognition approaches and the challenges they face in improving matching and retrieval results as well as processing low-quality images.

136 citations


Journal ArticleDOI
TL;DR: The proposed copyright protection scheme combines the discrete cosine transform (DCT) and singular value decomposition (SVD) using a control parameter to avoid the false-positive problem, and can improve the image quality through GA-based evolution.
Abstract: The proposed copyright protection scheme combines the discrete cosine transform (DCT) and singular value decomposition (SVD) using a control parameter to avoid the false-positive problem. In this article, we propose an efficient copyright protection scheme for e-government document images. First, we apply the DCT to the host image and scan the DCT coefficients along the zigzag space-filling curve (SFC). The zigzag-ordered coefficients are then mapped into four rectangular areas with different frequencies. Next, we apply SVD to each area, and the host image is modified by the left singular vectors and the singular values of the DCT-transformed watermark to embed the watermark image. The left singular vectors and singular values are used as a control parameter to avoid the false-positive problem. For each area, a genetic algorithm (GA) determines the optimal value of the scaling factor using the mean of the watermark's singular values (SVs). The scaling factor is represented by chromosomes, and several GA optimization operators are used. After each modified DCT coefficient is remapped back to its original position, the inverse DCT produces the watermarked image. Our experimental results show that GA-based evolution can improve the image quality and that this approach is robust under several kinds of attacks.

88 citations
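
The zigzag scan and four-area split described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a JPEG-style zigzag order over a square coefficient block and an equal-length split into four bands, neither of which the abstract specifies. The function names `zigzag_indices` and `split_into_bands` are hypothetical, and the DCT, SVD, and GA steps are omitted.

```python
def zigzag_indices(n):
    """Return (row, col) pairs of an n x n block in JPEG-style zigzag order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        diag = [(r, s - r) for r in rows]
        # Even diagonals run bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def split_into_bands(block, bands=4):
    """Scan a square block in zigzag order and cut it into equal bands.

    Assumption: the number of coefficients is divisible by `bands`.
    Band 0 holds the lowest frequencies, the last band the highest.
    """
    n = len(block)
    scanned = [block[r][c] for r, c in zigzag_indices(n)]
    size = len(scanned) // bands
    return [scanned[i * size:(i + 1) * size] for i in range(bands)]


if __name__ == "__main__":
    # Stand-in for a 4 x 4 block of DCT coefficients.
    block = [[r * 4 + c for c in range(4)] for r in range(4)]
    for i, band in enumerate(split_into_bands(block)):
        print("band", i, band)
```

In a full pipeline each band would be reshaped into a rectangle before the SVD step; how the article performs that reshaping is not stated in the abstract.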


Journal ArticleDOI
TL;DR: A digital image scrambling method based on a 2D cellular automaton, specifically the well-known Game of Life, produces an effective image encryption technique.
Abstract: A digital image scrambling method based on a 2D cellular automaton, specifically the well-known Game of Life, produces an effective image encryption technique.

68 citations
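
One plausible way to turn the Game of Life into a pixel scrambler is sketched below. It is an assumption-laden illustration rather than the article's method: the seed grid is drawn from a keyed pseudo-random generator, the grid wraps around at the edges, and pixels are read out in the order in which their cells first become alive. The names `life_step`, `life_permutation`, `scramble`, and `unscramble` are hypothetical.

```python
import random


def life_step(grid):
    """Advance a wrap-around (toroidal) Game of Life grid by one generation."""
    h, w = len(grid), len(grid[0])
    nxt = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            n = sum(grid[(r + dr) % h][(c + dc) % w]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr, dc) != (0, 0))
            nxt[r][c] = 1 if n == 3 or (grid[r][c] and n == 2) else 0
    return nxt


def life_permutation(h, w, seed, generations=8):
    """Order all cells by the generation in which they are first alive."""
    rng = random.Random(seed)  # the seed acts as the key (assumption)
    grid = [[rng.randint(0, 1) for _ in range(w)] for _ in range(h)]
    order, seen = [], set()
    for _ in range(generations):
        for r in range(h):
            for c in range(w):
                if grid[r][c] and (r, c) not in seen:
                    seen.add((r, c))
                    order.append((r, c))
        grid = life_step(grid)
    # Cells that were never alive go last, in raster order.
    order.extend((r, c) for r in range(h) for c in range(w)
                 if (r, c) not in seen)
    return order


def scramble(image, seed, generations=8):
    """Read pixels in Game-of-Life order and lay them out row by row."""
    h, w = len(image), len(image[0])
    flat = [image[r][c] for r, c in life_permutation(h, w, seed, generations)]
    return [flat[i * w:(i + 1) * w] for i in range(h)]


def unscramble(scrambled, seed, generations=8):
    """Invert scramble() by replaying the same permutation."""
    h, w = len(scrambled), len(scrambled[0])
    order = life_permutation(h, w, seed, generations)
    flat = [v for row in scrambled for v in row]
    image = [[0] * w for _ in range(h)]
    for value, (r, c) in zip(flat, order):
        image[r][c] = value
    return image
```

Python's `random` module is used here only to obtain a reproducible seed grid; it is not a cryptographic generator, so this sketch shows the scrambling idea and makes no security claim.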


Journal ArticleDOI
TL;DR: The multimedia field is contributing to the evolution of these forms by summarizing long video sequences, discovering patterns in related content, or developing techniques for generating stories.
Abstract: In general, multimedia is nothing without a story. The original storytellers - think Homer's The Iliad - memorized long stories, with plenty of emotional content, and delivered them in a highly entertaining (and possibly interactive) manner. Music, movies, and video games are perhaps today's most popular forms of story-telling. The multimedia field is contributing to the evolution of these forms by summarizing long video sequences, discovering patterns in related content, or developing techniques for generating stories. The best question we can ask is, how can multimedia systems create better user experiences?

67 citations


Journal ArticleDOI
TL;DR: This article took an unsupervised approach in designing appropriate similarity measures to explicitly address the challenge arising from low-quality tattoo image matching and plans to extend the Tattoo-ID system to different application domains.
Abstract: In this article, we took an unsupervised approach in designing appropriate similarity measures to explicitly address the challenge arising from low-quality tattoo image matching. In the future, we plan to improve the matching algorithm by exploring both supervised and semisupervised learning algorithms. Besides tattoos, other types of soft forensic evidence can be collected and managed in the form of images, such as shoe prints and gang graffiti images. Although Tattoo-ID focuses on tattoo image matching and retrieval, the underlying techniques developed in the Tattoo-ID system can be adapted to other forensic image databases. We also plan to extend the Tattoo-ID system to different application domains.

67 citations


Journal ArticleDOI
TL;DR: Concerns regarding how to authenticate multimedia data have led to research in forgery detection, but studies in audio are generally still limited compared to those in image and video.
Abstract: Recent developments in the audio-authentication field include basic, preliminary audio analysis and advanced audio-authentication techniques that exploit audio recording conditions and compressed audio features. Multimedia is involved in every aspect of our lives, and many of us rely on various websites for information on events taking place all over the world. We form opinions based on the content of camera footage, phone conversations, and other recordings. Although some sources give us authentic information, others contain forged content. Concerns regarding how to authenticate multimedia data have led to research in forgery detection, but studies in audio are generally still limited compared to those in image and video.

55 citations


Journal ArticleDOI
TL;DR: A proposed real-time video watermarking scheme is transparent and robust to geometric distortions, including rotation with cropping, scaling, aspect ratio change, frame dropping, and swapping.
Abstract: A proposed real-time video watermarking scheme is transparent and robust to geometric distortions, including rotation with cropping, scaling, aspect ratio change, frame dropping, and swapping.

49 citations


Journal ArticleDOI
TL;DR: QA's evolution from text to multimedia and the challenges in the field are identified by surveying recent research in multimedia question answering.
Abstract: By surveying recent research in multimedia question answering, this article explores QA's evolution from text to multimedia and identifies the challenges in the field.

48 citations


Journal ArticleDOI
TL;DR: A multiscale scale-invariant feature transform (SIFT) descriptor can help improve the ability to discriminate between images when using copy detection to identify illegal image copies.
Abstract: A multiscale scale-invariant feature transform (SIFT) descriptor can help improve our ability to discriminate between images when using copy detection to identify illegal image copies.

Journal ArticleDOI
TL;DR: An automated system for recognizing human skin disease conditions analyzes skin texture images using texture recognition techniques based on gray-level co-occurrence and wavelet decomposition matrices.
Abstract: An automated system for recognizing human skin disease conditions analyzes skin texture images using texture recognition techniques based on gray-level co-occurrence and wavelet decomposition matrices.
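
For readers unfamiliar with gray-level co-occurrence matrices, the sketch below shows how one is built and how a single texture feature (contrast) is derived from it. It is a generic illustration of the technique, not the system described in the abstract; the function names `glcm` and `contrast`, the default pixel offset, and the choice of feature are assumptions.

```python
def glcm(image, levels, dr=0, dc=1):
    """Count how often gray level i has gray level j at offset (dr, dc).

    `image` is a list of rows of integers in the range 0..levels-1.
    The default offset pairs each pixel with its right-hand neighbor.
    """
    h, w = len(image), len(image[0])
    matrix = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """Contrast feature: mean squared gray-level difference over all pairs."""
    total = sum(sum(row) for row in matrix)
    weighted = sum((i - j) ** 2 * count
                   for i, row in enumerate(matrix)
                   for j, count in enumerate(row))
    return weighted / total


if __name__ == "__main__":
    patch = [[0, 0, 1],
             [1, 1, 0]]
    m = glcm(patch, levels=2)
    print(m, contrast(m))
```

A smooth texture concentrates counts near the diagonal of the matrix and yields low contrast, while a busy texture spreads counts away from it.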

Journal ArticleDOI
TL;DR: The Workflow Recognition (WR) large-scale dataset is a collection of video sequences from the real industrial manufacturing environment of a major automobile manufacturer.
Abstract: Unlike any previous effort, the Workflow Recognition (WR) large-scale dataset is a collection of video sequences from the real industrial manufacturing environment of a major automobile manufacturer.

Journal ArticleDOI
TL;DR: The MediaEval Multimedia Benchmark leveraged community cooperation and crowdsourcing to develop a large Internet video dataset for its Genre Tagging and Rich Speech Retrieval tasks.
Abstract: The MediaEval Multimedia Benchmark leveraged community cooperation and crowdsourcing to develop a large Internet video dataset for its Genre Tagging and Rich Speech Retrieval tasks.

Journal ArticleDOI
TL;DR: A real-time posterity logging system detects and tracks multiple targets in video streams, grabbing face images and retaining only the best quality for each detected target.
Abstract: A real-time posterity logging system detects and tracks multiple targets in video streams, grabbing face images and retaining only the best quality for each detected target.

Journal ArticleDOI
TL;DR: This special issue provides an overview of current research on multimedia in forensics, security, and intelligence, with six high-quality contributions covering various approaches in the field, ranging from the visual recognition of faces and tattoos to the discovery of near duplicates and content tampering.
Abstract: With the proliferation of multimedia data, it has become necessary to secure this content from illegal use, efficiently detect and reconstruct illegal activities from it, and use it as a source of intelligence. Serious challenges arise from the sheer data volume, however. The multimedia research community has developed many exciting solutions for dealing with video footage, images, audio, and other multimedia content over recent years, including knowledge extraction, automatic categorization, and indexing. Although this work forms an excellent foundation for protecting and analyzing multimedia content, challenges remain in the complexity of the targeted material, the lack of structure and metadata, and other application-specific constraints. This special issue provides an overview of current research following this mission. The articles originally appeared at the ACM Multimedia 2010 Workshop on Multimedia in Forensics, Security, and Intelligence (MiFor). The six high-quality contributions cover various approaches in the field, ranging from the visual recognition of faces and tattoos to the discovery of near duplicates and content tampering.

Journal ArticleDOI
TL;DR: The ImageCLEF Wikipedia image retrieval task aimed to support ad-hoc image retrieval evaluation using large-scale collections of Wikipedia images and their user-generated annotations.
Abstract: The ImageCLEF Wikipedia image retrieval task aimed to support ad-hoc image retrieval evaluation using large-scale collections of Wikipedia images and their user-generated annotations.

Journal ArticleDOI
TL;DR: A combined face and eye detector system based on multiresolution local ternary patterns and local phase quantization descriptors can achieve noticeable performance improvements by extracting features locally.
Abstract: A combined face and eye detector system based on multiresolution local ternary patterns and local phase quantization descriptors can achieve noticeable performance improvements by extracting features locally.

Journal ArticleDOI
TL;DR: This article summarizes the authors' initial work on live monitoring of raw UGC and events as they unfold and highlights six of the research projects carried out by the NExT Center along the lines of crawling, analyzing, and visualizing live UGC data.
Abstract: The Web has revolutionized the way we create, disseminate, and consume information. Users have changed from passive recipients of information to active content consumers and creators, and the nature of information has also changed from static text to dynamic multimedia. With the widespread use of social networks, live user-generated content (UGC) has begun to dominate the Internet. Such UGC covers a range of media, from text (tweets, forums, and Facebook messages) to images (Instagram and Flickr), videos (YouTube), location check-ins (Foursquare), and community question-and-answer forums (Yahoo! Answers and WikiAnswers). The NUS-Tsinghua Center for Extreme Search (or NExT Center) is a collaboration between the National University of Singapore (NUS) and Tsinghua University that focuses on the novel, challenging task of analyzing and organizing UGC to make it available for general access. This article summarizes the authors' initial work on live monitoring of raw UGC and events as they unfold. It highlights six of the research projects carried out by the NExT Center along the lines of crawling, analyzing, and visualizing live UGC data.

Journal ArticleDOI
TL;DR: Using higher-level visual elements to represent images, the authors developed a sparsity-constrained bilinear model (SBLM) and combined a set of SBLMs in a boosting-like procedure to enhance performance.
Abstract: Using higher-level visual elements to represent images, the authors have developed a sparsity-constrained bilinear model (SBLM) and have combined a set of SBLMs in a boosting-like procedure to enhance performance.

Journal ArticleDOI
TL;DR: A panel discussion explored this intriguing question: Where is the user in multimedia retrieval? It also suggested methods to bring the data and the user back together.
Abstract: What started as a field with an emphasis on optimally serving users' interactive information needs has now become dominated by methods that focus on improving the mean average precision (MAP) of a clearly defined task disconnected from its application. With the pervasiveness of the Internet and all the sensors available to derive contextual user information, it is time to bring the data and the user back together. As a field, we must consider understanding the subjective and descriptive nature of users and understanding data as equally interesting research topics that are both worthy of publication. At the 2012 ACM Second Annual International Conference on Multimedia Retrieval (ICMR) in Hong Kong, a panel took place with Marcel Worring as the moderator and the other authors of this article as the panelists. This panel discussion explored this intriguing question: Where is the user in multimedia retrieval?
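
The mean average precision (MAP) metric mentioned in the abstract above can be computed as in the following sketch. It uses the standard textbook definition; the function names and the toy data are illustrative assumptions and do not come from the article.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
        (["x", "y"], {"y"}),                 # AP = (1/2) / 1
    ]
    print(mean_average_precision(runs))
```

The metric depends only on a ranked list and a fixed set of relevance judgments, which is the sense in which the panel describes it as disconnected from the user and the application.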

Journal ArticleDOI
TL;DR: To explain the importance that meeting browsers have gained over time, the paper summarizes findings of user studies, discusses features of meeting browser prototypes developed in AMI/IM2, and outlines the main evaluation protocol proposed (BET).
Abstract: This paper surveys the work carried out within two large consortia, AMI and IM2, on improving access to records of human meetings using multimodal interfaces called meeting browsers. Their design has emerged as an important goal, with both theoretical interest and practical applications. Meeting browsers are assistance tools that help humans navigate through multimedia records (audio, video, documents, and metadata) in order to obtain a general idea about what happened in a meeting or to find specific pieces of information, for discovery or verification. To explain the importance that meeting browsers have gained over time, the paper summarizes findings of user studies, discusses features of meeting browser prototypes developed in AMI/IM2, and outlines the main evaluation protocol proposed (BET). Reference scores are provided for future benchmarking. These achievements in meeting browsing are the result of an iterative software process, from user studies to prototypes and then to products.

Journal ArticleDOI
John R. Smith
TL;DR: Better characterizing the overall size and shape of the semantic space for multimedia will help define what is on the other side and ensure that the authors make progress on bridging the gap.
Abstract: "Bridging the semantic gap" is an expression often used to describe work on multimedia content understanding. At best, research today is bridging a semantic gap, of which there are many. Better characterizing the overall size and shape of the semantic space for multimedia will help define what is on the other side and ensure that we make progress on bridging the gap.

Journal ArticleDOI
TL;DR: Micro grid technology is playing a central role in the semantic evolution of small- and medium-sized services; this article presents the main requirements and architecture of micro grids for media and semantic computing.
Abstract: Micro grid technology is playing a central role in the semantic evolution of small- and medium-sized services. This article presents the main requirements and architecture of micro grids for media and semantic computing.

Journal ArticleDOI
TL;DR: How recent trends in parallel computing have influenced the design of modern video coding standards is discussed, including how the High Efficiency Video Coding (HEVC) standard is looking at ways to implement the co-exploration between algorithm and architecture (CEAA) approach.
Abstract: Video coding has always been a computationally intensive process. Although dramatic improvements in coding efficiency have been realized in recent years, the algorithms have become increasingly complex and there is a broader recognition that it is necessary to realize the capabilities of multicore processors. This article discusses how recent trends in parallel computing have influenced the design of modern video coding standards. Specifically, the authors discuss how the High Efficiency Video Coding (HEVC) standard, which is being jointly developed by ISO/IEC JTC1/SC29 WG11 (MPEG) and ITU-T SG16/Q.6 (VCEG), is looking at ways to implement the co-exploration between algorithm and architecture (CEAA) approach.

Journal ArticleDOI
TL;DR: A case-based computer-aided diagnosis system assists physicians and other medical personnel in the interpretation of optical biopsies obtained through confocal laser endomicroscopy and shows promising results on inferring semantic metadata from low-level features.
Abstract: A case-based computer-aided diagnosis system assists physicians and other medical personnel in the interpretation of optical biopsies obtained through confocal laser endomicroscopy (CLE). Extraction in CLE images shows promising results on inferring semantic metadata from low-level features. To ensure interoperability with potential third-party applications, the system provides an interface compliant with the recent standards ISO/IEC 15938-12:2008 (MPEG Query Format) and ISO/IEC 24800 (JPEG Search).

Journal ArticleDOI
TL;DR: This article surveys immersive communication environments and focuses on high-level issues and applications in entertainment, business and society, and simulated learning and education.
Abstract: This article surveys immersive communication environments and focuses on high-level issues and applications. The authors classify the immersive computing environments into three major categories: entertainment, business and society, and simulated learning and education.

Journal ArticleDOI
TL;DR: UltraViolet defines an ecosystem for online delivery and consumption of premium video content that coordinates authorized use among multiple digital rights management systems.
Abstract: Online media delivery is a big business, but there's a lack of interoperability between various retailers and associated devices. UltraViolet defines an ecosystem for online delivery and consumption of premium video content that coordinates authorized use among multiple digital rights management (DRM) systems. This initiative is supported by Hollywood studios and a number of important players in the media delivery industry.

Journal ArticleDOI
TL;DR: This special issue hopes to address challenges in large-scale media research by covering identification of use cases and task design, dataset development, and basic research over existing datasets.
Abstract: The widespread adoption of smartphones equipped with high-quality image-capturing capabilities, coupled with the prevalent use of social networks, has resulted in an explosive growth of social media content. People now routinely capture the scenes around them and instantly share the multimedia content with their friends over a variety of social networks. The social network functions also ensure that much of this content comes with some form of social annotations. This environment sets the stage for advances in large-scale media research. This special issue hopes to address these challenges. The articles in this issue cover identification of use cases and task design, dataset development, and basic research over existing datasets.

Journal ArticleDOI
TL;DR: A functional mapping between the Java Call Control (JCC) and Session Initiation Protocol (SIP) can help developers more easily create, deploy, and manage advanced telecom services.
Abstract: A functional mapping between the Java Call Control (JCC) and Session Initiation Protocol (SIP) can help developers more easily create, deploy, and manage advanced telecom services.