Showing papers in "IEEE MultiMedia in 2013"

Immersive 3D Holoscopic Video System

TL;DR: A fully distributed immersive teleconferencing system that assumes there is only one user at each station during the conference, which would allow users to conduct meetings in their own offices for greatest convenience.

Abstract: The Viewport immersive teleconferencing system reconstructs sparse 3D representations for each user and applies virtual seating to maintain the same seating geometry as face-to-face meetings. In this article, we propose a fully distributed immersive teleconferencing system that assumes there is only one user at each station during the conference. Such a system would allow users to conduct meetings in their own offices for greatest convenience. Compared with group conferencing systems, fully distributed systems let us render the light and sound fields from a single user's viewpoint at each site, which demands less of the system hardware.

81 citations

Journal Article•DOI•

Amar Aggoun¹, Emmanuel Tsekleves¹, Mohammad Rafiq Swash¹, Dimitrios Zarpalas, Anastasios Dimou, Petros Daras, Paulo Nunes², Luís Ducla Soares²•Institutions (2)

Brunel University London¹, ISCTE – University Institute of Lisbon²

A New Writing Experience: Finger Writing in the Air Using a Kinect Sensor

TL;DR: It is shown that using a field lens and a square aperture significantly reduces the vignetting problem associated with a relay system and achieves over 95 percent fill factor.

Abstract: We demonstrated a 3D holoscopic video system for 3DTV application. We showed that using a field lens and a square aperture significantly reduces the vignetting problem associated with a relay system and achieves over 95 percent fill factor. The main problem for such a relay system is the nonlinear distortion during the 3D image capturing, which can seriously affect the reconstruction process for a 3D display. The nonlinear distortion mainly includes lens radial distortion (intrinsic) and microlens array perspective distortion (extrinsic). Correcting this distortion is left to future work. Our results also show that the SS coding approach performs better than the standard HEVC scheme. Furthermore, we show that search and retrieval performance relies on the depth map's quality and that the multimodal fusion boosts the retrieval performance.

80 citations

Journal Article•DOI•

Xin Zhang¹, Ye Zhichao¹, Lianwen Jin¹, Ziyong Feng¹, Shaojie Xu¹•Institutions (1)

South China University of Technology¹

22 Nov 2013-IEEE MultiMedia

TL;DR: A finger-writing system that recognizes characters written in the air without the need for an extra handheld device is presented, which adaptively merges depth, skin, and background models for the hand segmentation to overcome the limitations of the individual models.

Abstract: With the introduction of Microsoft Kinect, there has been considerable interest in creating various attractive and feasible applications in related research fields. Kinect simultaneously captures the depth and color information and provides real-time reliable 3D full-body human-pose reconstruction that essentially turns the human body into a controller. This article presents a finger-writing system that recognizes characters written in the air without the need for an extra handheld device. This application adaptively merges depth, skin, and background models for the hand segmentation to overcome the limitations of the individual models, such as hand-face overlapping problems and the depth-color nonsynchronization. The writing fingertip is detected by a new real-time dual-mode switching method. The recognition accuracy rate is greater than 90 percent for the first five candidates of Chinese characters, English characters, and numbers.

79 citations

Journal Article•DOI•

Partial-Duplicate Image Retrieval via Saliency-Guided Visual Matching

Liang Li¹, Shuqiang Jiang¹, Zheng-Jun Zha², Zhipeng Wu³, Qingming Huang¹•Institutions (3)

Chinese Academy of Sciences¹, National University of Singapore², University of Tokyo³

MMT: An Emerging MPEG Standard for Multimedia Delivery over the Internet

TL;DR: A novel partial-duplicate image-retrieval scheme based on saliency-guided visual matching, where the localization of duplicates is done simultaneously so as to speed up retrieval.

Abstract: This article proposes a novel partial-duplicate image-retrieval scheme based on saliency-guided visual matching, where the localization of duplicates is done simultaneously. The image is abstracted by visually salient and rich regions (VSRRs), which are of visual saliency and contain rich visual content. Furthermore, to refine the retrieval, a relative saliency ordering constraint is constructed that captures the robust relative saliency layout of the VSRRs. The authors propose an efficient algorithm to embed this constraint into the index system so as to speed up retrieval. Comparison experiments with state-of-the-art methods on five databases show the efficiency and effectiveness of the proposed approach.

56 citations

Journal Article•DOI•

Young-Kwon Lim¹, Kyung-Mo Park¹, Jin Young Lee², S. Aoki, G. Fernando•Institutions (2)

Samsung¹, Electronics and Telecommunications Research Institute²

Securing Multimedia Content Using Joint Compression and Encryption

TL;DR: The MPEG Media Transport (MMT) standard is being developed with specifications for encapsulation, delivery, and signaling, which enable fine-grained content access with uniquely identifiable names for optimized content delivery.

Abstract: Content-centric networking promises more efficient distribution of data through in-network caching and the propagation of content through the network. This networking paradigm poses a number of challenges and new opportunities for more efficient multimedia delivery. The MPEG Media Transport (MMT) standard is being developed to address these needs with specifications for encapsulation, delivery, and signaling, which enable fine-grained content access with uniquely identifiable names for optimized content delivery.

54 citations

Journal Article•DOI•

Amit Pande¹, Prasant Mohapatra¹, Joseph Zambreno²•Institutions (2)

University of California, Davis¹, Iowa State University²

Large-Scale Image Phylogeny: Tracing Image Ancestral Relationships

TL;DR: This work shows how two compression blocks for video coding--a modified frequency transform and a modified entropy coding scheme (called a chaotic arithmetic coding or CAC)--can be used for video encryption.

Abstract: Algorithmic parameterization and hardware architectures can ensure secure transmission of multimedia data in resource-constrained environments such as wireless video surveillance networks, telemedicine frameworks for distant health care support in rural areas, and Internet video streaming. Joint multimedia compression and encryption techniques can significantly reduce the computational requirements of video processing systems. The authors present an approach to reduce the computational cost of multimedia encryption while also preserving the properties of compressed video. A hardware-amenable design of the proposed algorithms makes them suitable for real-time embedded multimedia systems. This approach alleviates the need for additional hardware for encryption in resource-constrained scenarios and can be otherwise used to augment existing encryption methods used for content delivery on the Internet or in other applications. This work shows how two compression blocks for video coding--a modified frequency transform (called a secure wavelet transform or SWT) and a modified entropy coding scheme (called a chaotic arithmetic coding or CAC)--can be used for video encryption. Experimental results are shown for selective encryption using the proposed schemes.

47 citations

Journal Article•DOI•

Zanoni Dias¹, Siome Goldenstein¹, Anderson Rocha¹•Institutions (1)

State University of Campinas¹

Walking in Colors: Human Gait Recognition Using Kinect and CBIR

TL;DR: Experiments show that the proposed methods automatically build image phylogeny trees from partial information about the near duplicates, improving the efficiency and effectiveness of the whole process, and represent a step forward in determining causal relationships between digital images over time.

Abstract: Similar to organisms that evolve in biology, a document can change slightly over time, and each new version may, in turn, generate other versions. Multimedia phylogeny investigates the history and evolutionary process of digital objects and includes finding the causal and ancestral document relationships, source of modifications, and the order and transformations that originally created the set of near duplicates. Multimedia phylogeny has direct applications in security, forensics, and information retrieval. This article explores the phylogeny problem for near-duplicate images in large-scale scenarios and presents solutions that extend straightforwardly to other media such as videos. Experiments with approximately 2 million test cases (with synthetic and real data) show that the proposed methods automatically build image phylogeny trees from partial information about the near duplicates, improving the efficiency and effectiveness of the whole process, and represent a step forward in determining causal relationships between digital images over time.

41 citations

Journal Article•DOI•

Miloš Milovanović¹, Miroslav Minović¹, Dusan Starcevic¹•Institutions (1)

University of Belgrade¹

27 Mar 2013-IEEE MultiMedia

TL;DR: The proposed method is based on the idea that the problem of human gait recognition can be transformed from a spatio-temporal problem into the spatial domain, specifically the 2D image domain, by representing a sample of a human gait as a still image.

Abstract: This article proposes a new method of recognizing human gait. The proposed method is based on the idea that the problem of human gait recognition can be transformed from a spatio-temporal problem into the spatial domain, specifically the 2D image domain. This is done by representing a sample of a human gait as a still image. By doing so, all the recorded information is kept while enabling the use of proven content-based image retrieval (CBIR) techniques for recognition. The proposed method uses Microsoft Kinect human-computer interaction technology for data acquisition. To prove the validity of the proposed approach, the authors conducted a study with 50 participants.

36 citations

Journal Article•DOI•

Nested-SIFT for Efficient Image Matching and Retrieval

Pengfei Xu¹, Lei Zhang², Kuiyuan Yang², Hongxun Yao¹•Institutions (2)

Harbin Institute of Technology¹, Microsoft²

Web-Scale Image Retrieval Using Compact Tensor Aggregation of Visual Descriptors

TL;DR: A new feature representation, named Nested-SIFT, is proposed, which utilizes the nesting relationship between SIFT features to group local features to improve the effectiveness of feature representation and the efficiency of feature matching.

Abstract: To improve the effectiveness of feature representation and the efficiency of feature matching, we propose a new feature representation, named Nested-SIFT, which utilizes the nesting relationship between SIFT features to group local features. A Nested-SIFT group consists of a bounding feature and several member features covered by the bounding feature. To obtain a compact representation, SimHash strategy is used to compress member features in a Nested-SIFT group into a binary code, and the similarity between two Nested-SIFT groups is efficiently computed by using the binary codes. Extensive experimental results demonstrate the effectiveness and efficiency of our proposed Nested-SIFT approach.

36 citations

Journal Article•DOI•

Romain Negrel, David Picard, Philippe-Henri Gosselin

MPEG Unified Speech and Audio Coding

TL;DR: A compact image signature is proposed that aggregates tensors of visual descriptors; preprocessing the descriptors makes the aggregation efficient, and projection and quantization make the signatures compact.

Abstract: The main issues for Web-scale image retrieval are achieving good accuracy while retaining low computational time and memory footprint. This article proposes a compact image signature by aggregating tensors of visual descriptors. Efficient aggregation is achieved by preprocessing the descriptors. Compactness is achieved by projection and quantization of the signatures. The authors compare the proposed method to other efficient signatures on a dataset of 1 million images and show the soundness of the approach.

35 citations

Journal Article•DOI•

Schuyler Quackenbush

TL;DR: An overview of the USAC architecture is provided and the performance relative to the best state-of-the-art speech and audio codecs is summarized.

Abstract: The MPEG Audio Subgroup has a rich history of accomplishments in creating music coding technology. At higher bit rates, MPEG technology can represent arbitrary sounds, including the human voice, with excellent quality. MPEG-1 and MPEG-2 Audio coders use perceptually shaped quantization noise as the primary tool for achieving compression. The MPEG-4 High-Efficiency Advanced Audio Coding (AAC) standard is a single technology capable of compressing speech, speech mixed with music, or music signals with quality that is always at least as good as the best of two state-of-the-art reference codecs, one optimized for speech and mixed content (AMR-WB+) and the other optimized for music and general audio (HE-AACv2). This article provides an overview of the USAC architecture and summarizes the performance relative to the best state-of-the-art speech and audio codecs.

Journal Article•

Herding Cats

John R. Smith¹•Institutions (1)

IBM¹

Video Delivery Challenges and Opportunities in 4G Networks

TL;DR: Video search needs effective and efficient techniques for video summarization to enable rapid triage and finding relevant video contents.

Abstract: Video search needs effective and efficient techniques for video summarization to enable rapid triage and finding relevant video contents.

Journal Article•DOI•

Amit Pande¹, Vishal Ahuja¹, Rajarajan Sivaraj¹, Eilwoo Baik¹, Prasant Mohapatra¹•Institutions (1)

University of California, Davis¹

Orchestral Performance Companion: Using Real-Time Audio to Score Alignment

TL;DR: Challenges in delivery of multimedia content over 4G networks for several application scenarios are outlined to augment the increasing demand for video applications in cellular and wireless traffic.

Abstract: Wireless network traffic is dominated by video and requires new ways to maximize the user experience and optimize networks to prevent saturation. The exploding number of subscribers in cellular networks has exponentially increased the volume and variety of multimedia content flowing across the network. This article details some challenges in delivery of multimedia content over 4G networks for several application scenarios. To augment the increasing demand for video applications in cellular and wireless traffic, these challenges must be efficiently addressed.

Journal Article•DOI•

Matthew Prockup¹, David Grunberg¹, A. Hrybyk¹, Youngmoo E. Kim¹•Institutions (1)

Drexel University¹

Scalable Media Coding Enabling Content-Aware Networking

TL;DR: A system that guides listeners through orchestral performances in real time by presenting time-relevant annotations in a manner similar to that of a personal museum guide has been developed and adopted by the Philadelphia Orchestra.

Abstract: Many people enjoy the symphony, but those without prior training often find it difficult to relate to the music. The authors have developed a system that guides listeners through orchestral performances in real time by presenting time-relevant annotations in a manner similar to that of a personal museum guide. These annotations are authored in partnership with musical experts prior to a performance to provide appropriate contextual information for a given concert program. Using acoustic features of the music, they align the live performance with that of a previously time-stamped recording. The aligned position is transmitted to an application on the users' handheld devices, which present the annotations using an intuitive and unobtrusive interface. To assess its utility, the system underwent a user beta testing stage accompanying orchestra concert broadcasts. It has since been adopted by the Philadelphia Orchestra for use during live concerts in its 2012-2013 subscription season and beyond.

Journal Article•DOI•

Michael Grafl¹, Christian Timmerer¹, Hermann Hellwagner¹, George Xilouris, Georgios Gardikis, Daniele Renzi, Stefano Battista, Eugen Borcoci, Daniel Negru²•Institutions (2)

Alpen-Adria-Universität Klagenfurt¹, University of Bordeaux²

Video Copy-Detection and Localization with a Scalable Cascading Framework

TL;DR: This article proposes the adoption of a content-aware approach into the network infrastructure, thus making it capable of identifying, processing, and manipulating media streams and objects in real time to maximize quality of service (QoS) and experience (QoE).

Abstract: Increasingly popular multimedia services are expected to play a dominant role in the future of the Internet. In this context, it is essential that content-aware networking (CAN) architectures explicitly address the efficient delivery and processing of multimedia content. This article proposes the adoption of a content-aware approach into the network infrastructure, thus making it capable of identifying, processing, and manipulating media streams and objects in real time to maximize quality of service (QoS) and experience (QoE). Our proposal is built on the exploitation of scalable media coding technologies within such a content-aware networking environment. This discussion is based on four representative use cases for media delivery (unicast, multicast, peer-to-peer, and adaptive HTTP streaming) and reviews CAN challenges, specifically flow processing, caching/buffering, and QoS/QoE management.

Journal Article•DOI•

Yonghong Tian¹, Tiejun Huang¹, Menglin Jiang¹, Wen Gao¹•Institutions (1)

Peking University¹

Depth Sensing for 3DTV: A Survey

TL;DR: A soft-threshold learning algorithm is utilized to estimate the optimal decision thresholds for detectors, and a multiscale sequence matching method is employed to precisely locate copies using a 2D Hough transform and multigranularities similarity evaluation.

Abstract: For video copy detection, no single audio-visual feature, or single detector based on several features, can work well for all transformations. This article proposes a novel video copy-detection and localization approach with scalable cascading of complementary detectors and multiscale sequence matching. In this cascade framework, a soft-threshold learning algorithm is utilized to estimate the optimal decision thresholds for detectors, and a multiscale sequence matching method is employed to precisely locate copies using a 2D Hough transform and multigranularities similarity evaluation. Excellent performance on the TRECVID-CBCD 2011 benchmark dataset shows the effectiveness and efficiency of the proposed approach.

Journal Article•DOI•

Sebastian Schwarz¹, Roger Olsson¹, Mårten Sjöström¹•Institutions (1)

Mid Sweden University¹

22 Nov 2013-IEEE MultiMedia

TL;DR: This article reviews three depth-sensing approaches for 3DTV and discusses several approaches for acquiring depth information and provides a comparative analysis of their characteristics.

Abstract: In the context of 3D video systems, depth information could be used to render a scene from additional viewpoints. Although there have been many recent advances in this area, including the introduction of the Microsoft Kinect sensor, the robust acquisition of such information continues to be a challenge. This article reviews three depth-sensing approaches for 3DTV. The authors discuss several approaches for acquiring depth information and provide a comparative analysis of their characteristics.

Journal Article•DOI•

New Musical Instrument Design Considerations

Garth Paine¹•Institutions (1)

Arizona State University¹

22 Nov 2013-IEEE MultiMedia

TL;DR: The author provides a model for musical interface design and discusses it in terms of a large online database of digital musical instruments he has created.

Abstract: This article discusses the proliferation of new musical instruments and interfaces for computer-based music performance (digital musical instruments). It discusses the notion of a musical instrument schema and how preexisting musical practice can be used to provide design guidelines for this developing field. In so doing, it teases out notions of control and creation and discusses a number of theoretical positions for those notions in musical performance. The author provides a model for musical interface design and discusses it in terms of a large online database of digital musical instruments he has created.

Journal Article•DOI•

JPEG's JPSearch Standard: Harmonizing Image Management and Search

Mario Döller¹, Ruben Tous, Frederik Temmermans², Kyoungro Yoon³, Je-Ho Park⁴, Youngseop Kim⁴, Florian Stegmaier⁵, Jaime Delgado•Institutions (5)

University of Applied Sciences Kufstein¹, VU University Amsterdam², Konkuk University³, Dankook University⁴, University of Passau⁵

Standards-Based Architectures for Content Management

TL;DR: The main concepts, parts, and achievements of the JPSearch framework are discussed and its use is demonstrated through a set of substantial case studies.

Abstract: Triggered by the rise of social networks, community-based image sharing platforms are emerging at an increasing rate. Currently, almost every repository offers a different interaction interface and metadata description format. Unfortunately, this prevents unified and efficient access to these repositories. Consequently, data exchange between systems is often cumbersome. In this context, ISO/IEC JTC1 SC29 WG1 (more commonly known as JPEG) initiated the JPSearch framework standardization, which aims to foster the interaction with and among image repositories. The standard focuses on three main cornerstones supporting repository synchronization, search and access, and image collection creation and maintenance. This article discusses the main concepts, parts, and achievements of the JPSearch framework and demonstrates its use through a set of substantial case studies.

Journal Article•DOI•

Silvia Llorente, Eva Rodríguez, Jaime Delgado, Victor Torres-Padrosa

Character Behavior Planning and Visual Simulation in Virtual 3D Space

TL;DR: The authors describe a selection of relevant deployment scenarios, from content licensing to authorization-based content access control, including a specific case for mobile scenarios.

Abstract: Standards-based middleware architectures for content management are suitable for a range of business scenarios. In this context, the authors review the MPEG-M standard and the MIPAMS standards-based architecture. They describe a selection of relevant deployment scenarios, from content licensing to authorization-based content access control, including a specific case for mobile scenarios. They illustrate each of the scenarios with real MIPAMS implementations developed in several research projects and under contracts within the industry.

Journal Article•DOI•

Mingliang Xu¹, Zhigeng Pan², Mingmin Zhang³, Pei Lv³, Pengyu Zhu³, Yangdong Ye¹, Wei Song¹•Institutions (3)

Zhengzhou University¹, Hangzhou Normal University², Zhejiang University³

Classification and Analysis of 3D Teleimmersive Activities

TL;DR: A graph-model-based technique, called a demonstration graph, is proposed to construct and coordinate both behavioral and cognitive models automatically for IHCs to accomplish complex tasks in a simple, universal way.

Abstract: In this article, we propose a graph-model-based technique, called a demonstration graph, to construct and coordinate both behavioral and cognitive models automatically for IHCs to accomplish complex tasks in a simple, universal way. Our technique is inspired by the insight from psychology, neuroscience, and human ethology that humans' decision making largely relies on their past, similar experiences. Our work is further supported by Alan Turing, who believed that building an intelligent system necessitates imitating human mental processing. Thus, we use the Learning-from-Demonstrations (LfD) method, borrowed from the robotics domain, to make a character mimic successful human demonstrations to accomplish a well-defined task.

Journal Article•DOI•

Ahsan Arefin¹, Zixia Huang¹, Raoul Rivas¹, Shu Shi¹, Pengye Xia¹, Klara Nahrstedt¹, Wanmin Wu², Gregorij Kurillo³, Ruzena Bajcsy³•Institutions (3)

University of Illinois at Urbana–Champaign¹, University of California, San Diego², University of California, Berkeley³

A Software-Based Solution for Distributing and Displaying 3D UHD Films

TL;DR: To provide users with a high-quality experience, interactive telepresence system platforms must accommodate multiple performance profiles for diverse, shared cyberphysical activities.

Abstract: To provide users with a high-quality experience, interactive telepresence system platforms must accommodate multiple performance profiles for diverse, shared cyberphysical activities.

Journal Article•DOI•

Lucenildo Aquino Junior, Ruan Delgado Gomes, Manoel Silva Neto, Alexandre Duarte, Rostand Costa, Guido Lemos de Souza Filho

TL;DR: As an alternative to traditional hardware-based ultra-high definition (UHD) multimedia systems, the proposed software-based approach offers a better cost-benefit ratio and might help facilitate large-scale deployment.

Abstract: As an alternative to traditional hardware-based ultra-high definition (UHD) multimedia systems, the proposed software-based approach offers a better cost-benefit ratio and might help facilitate large-scale deployment.

Journal Article•DOI•

Lessons in Learning

John R. Smith¹•Institutions (1)

IBM¹

Social Multimedia Signals: Sense, Process, and Put Them to Work

TL;DR: The way forward is for the multimedia field to create appropriate lesson plans or more generally develop curriculum-based approaches to multimedia machine learning, says EIC John R. Smith.

Abstract: Machine learning has become an indispensable tool for the multimedia community. Given large amounts of data, computers using machine learning are able to create rich representations and accomplish impressive discrimination tasks. Yet, the way machines learn still differs significantly from how humans learn. EIC John R. Smith explains that the way forward is for the multimedia field to create appropriate lesson plans or more generally develop curriculum-based approaches to multimedia machine learning.

Journal Article•DOI•

Suman Deb Roy¹, Gilad Lotan, Wenjun Zeng¹•Institutions (1)

University of Missouri¹

Scalable Mobile Video Retrieval with Sparse Projection Learning and Pseudo Label Mining

TL;DR: Social multimedia signal processing aims to transform the noise-like phenomena in social media into signals useful for building novel, socially aware multimedia applications and targeted advertising techniques as well as exploring new marketing methods.

Abstract: Social media gives ordinary people the power to be content creators and information disseminators. This information is embedded in multimedia shared across social networks, containing valuable indications about various facets of human life: about what captures our attention, our sharing biases, and the digital traces we abdicate. Social multimedia signal processing aims to transform the noise-like phenomena in social media into signals useful for building novel, socially aware multimedia applications and targeted advertising techniques as well as exploring new marketing methods. With a fresh way to look at the existence of multimedia in online social networks, we can also explore new marketing methods and targeted advertising techniques.

Journal Article•DOI•

Guan-Long Wu¹, Yin-Hsi Kuo¹, Tzu-Hsuan Chiu¹, Winston H. Hsu¹, Lexing Xie²•Institutions (2)

National Taiwan University¹, Australian National University²

Large Visual Repository Search with Hash Collision Design Optimization

TL;DR: A novel sparse projection method addresses the efficiency challenge by learning a discriminative compact representation that drastically reduces transmission costs; with less than 10 percent nonzero elements in the projection matrix, it also reduces computational and storage costs.

Abstract: Retrieving relevant videos from a large corpus on mobile devices is a vital challenge. This article addresses two key issues for mobile search on user-generated videos. The first is the lack of good relevance measurement for learning semantically rich representations, due to the unconstrained nature of online videos. The second is the limited resources on mobile devices, stringent bandwidth, and delay requirement between the device and video server. The authors propose a knowledge-embedded sparse projection learning approach. To alleviate the need for expensive annotation in hash learning, they investigate varying approaches for pseudo label mining, where explicit semantic analysis leverages Wikipedia. In addition, they propose a novel sparse projection method to address the efficiency challenge by learning a discriminative compact representation that drastically reduces transmission costs. With less than 10 percent nonzero elements in the projection matrix, it also reduces computational and storage costs. The experimental results on 100,000 videos show that the proposed algorithm yields performance competitive with the prior state-of-the-art hashing methods, which are not applicable for mobiles and solely rely on costly manual annotations. The average query time for 100,000 videos was only 0.592 seconds.

Journal Article•DOI•

Xin Xin¹, Abhishek Nagar², Gaurav Srivastava², Zhu Li², Felix Carlos Fernandes², Aggelos K. Katsaggelos¹•Institutions (2)

Northwestern University¹, Samsung²

Applications of Face Analysis and Modeling in Media Production

TL;DR: The authors optimize the design of a hash-code collision and counting scheme to enable fast search of visual features of MPEG CDVS and explore a new indexing scheme.

Abstract: Visual search over large image repositories in real time is one of the key challenges for applications such as mobile visual query-by-capture, augmented reality, and biometrics-based identification. Search accuracy and response speed are two important performance factors. This article focuses on one of the important elements of this technology that enables large-scale visual search: indexing (or hashing). Indexing is the process of organizing a database of searchable elements into an efficiently searchable configuration. The searchable elements in our case are compact features extracted from images. This article explores a new indexing scheme. The authors optimize the design of a hash-code collision and counting scheme to enable fast search of visual features of MPEG CDVS.

Journal Article•DOI•

Darren Cosker¹, Peter Eisert², Oliver Grau³, Peter J. B. Hancock⁴, Jonathan McKinnell, Eng-Jon Ong⁵•Institutions (5)

University of Bath¹, Humboldt University of Berlin², Intel³, University of Stirling⁴, University of Surrey⁵

Learning to Rerank Web Images

TL;DR: The authors give a brief overview of the psychology of face perception and then describe some of the applications of computer vision and pattern recognition applied to face recognition in media production.

Abstract: Facial expressions play an important role in day-by-day communication as well as media production. This article surveys automatic facial analysis and modeling methods using computer vision techniques and their applications for media production. The authors give a brief overview of the psychology of face perception and then describe some of the applications of computer vision and pattern recognition applied to face recognition in media production. This article also covers the automatic generation of face models, which are used in movie and TV productions for special effects in order to manipulate people's faces or combine real actors with computer graphics.

Journal Article•DOI•

Linjun Yang¹, Alan Hanjalic²•Institutions (2)

Microsoft¹, Delft University of Technology²

Large-Scale Near-Duplicate Web Video Retrieval: Challenges and Approaches

TL;DR: A categorization of related theories and algorithms is provided, with a mathematical formulation, analysis, and discussion per category, along with recommendations on the research directions most critical to improving the efficiency, effectiveness, and overall utility of Web image search reranking technology.

Abstract: This article reviews recent advancements in developing approaches to Web image search reranking. The authors provide a categorization of related theories and algorithms and include a mathematical formulation, analysis, and discussion per category. They highlight the limitations of the existing approaches and make recommendations on what they believe to be the most critical research directions to improve the efficiency, effectiveness, and overall utility of Web image search reranking technology.

Journal Article•DOI•

Yang Cai¹, Linjun Yang²•Institutions (2)

Carnegie Mellon University¹, Microsoft²