
Showing papers in "IEEE MultiMedia in 2014"


Journal ArticleDOI
TL;DR: In this paper, the authors propose a joint parsing system consisting of three modules: video parsing, text parsing, and joint inference, which produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events), and causal information (causalities between events and fluents).
Abstract: This article proposes a multimedia analysis framework to process video and text jointly for understanding events and answering user queries. The framework produces a parse graph that represents the compositional structures of spatial information (objects and scenes), temporal information (actions and events), and causal information (causalities between events and fluents) in the video and text. The knowledge representation of the framework is based on a spatial-temporal-causal AND-OR graph (S/T/C-AOG), which jointly models possible hierarchical compositions of objects, scenes, and events as well as their interactions and mutual contexts, and specifies the prior probabilistic distribution of the parse graphs. The authors present a probabilistic generative model for joint parsing that captures the relations between the input video/text, their corresponding parse graphs, and the joint parse graph. Based on the probabilistic model, the authors propose a joint parsing system consisting of three modules: video parsing, text parsing, and joint inference. Video parsing and text parsing produce two parse graphs from the input video and text, respectively. The joint inference module produces a joint parse graph by performing matching, deduction, and revision on the video and text parse graphs. The proposed framework has the following objectives: to provide deep semantic parsing of video and text that goes beyond the traditional bag-of-words approaches; to perform parsing and reasoning across the spatial, temporal, and causal dimensions based on the joint S/T/C-AOG representation; and to show that deep joint parsing facilitates subsequent applications such as generating narrative text descriptions and answering queries in the forms of who, what, when, where, and why. The authors empirically evaluated the system based on comparison against ground-truth as well as accuracy of query answering and obtained satisfactory results.

126 citations


Journal ArticleDOI
Ling-Yu Duan1, Jie Lin1, Jie Chen1, Tiejun Huang1, Wen Gao1 
TL;DR: Major progress is reviewed in standardizing technologies that will enable efficient and interoperable design of visual search applications, and the location-search- and recognition-oriented data collection and benchmark under the MPEG CDVS evaluation framework is presented.
Abstract: To ensure application interoperability in visual object search technologies, the MPEG Working Group has made great efforts in standardizing visual search technologies. Moreover, extraction and transmission of compact descriptors are valuable for next-generation, mobile, visual search applications. This article reviews the significant progress of MPEG Compact Descriptors for Visual Search (CDVS) in standardizing technologies that will enable efficient and interoperable design of visual search applications. In addition, the article presents the location-search- and recognition-oriented data collection and benchmark developed under the MPEG CDVS evaluation framework.

89 citations


Journal ArticleDOI
TL;DR: The recent research progress in view-based 3D object retrieval is introduced by reviewing advances and identifying challenges in this field.
Abstract: View-based 3D object retrieval is an emerging research topic that has numerous geographic-related applications in many fields, such as computer-aided design (CAD) and virtual city navigation. This article briefly introduces the recent research progress in view-based 3D object retrieval by reviewing advances and identifying challenges in this field.

76 citations


Journal ArticleDOI
Jianbo Jiao1, Ronggang Wang1, Wenmin Wang1, Dong Shengfu1, Zhenyu Wang1, Wen Gao1 
TL;DR: A local stereo matching method that employs a new combined cost approach and a secondary disparity refinement mechanism; experiments show it is the best cost-volume filtering-based local method and validate the proposed method's effectiveness.
Abstract: Recent local stereo matching methods have achieved performance comparable with global methods. However, the final disparity map still contains significant outliers. In this article, the authors propose a local stereo matching method that employs a new combined cost approach and a secondary disparity refinement mechanism. They formulate the combined cost using a modified color census transform and truncated absolute differences of color and gradients. They also use a symmetric guided filter for cost aggregation. Unlike in traditional stereo matching, they propose a novel secondary disparity refinement to further remove the remaining outliers. Experimental results on the Middlebury benchmark show that their method ranks fifth out of 153 submitted methods, and it's the best cost-volume filtering-based local method. Experiments on real-world sequences and depth-based applications also validate the proposed method's effectiveness.
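The combined-cost idea can be sketched in miniature: a census-transform Hamming distance blended with a truncated absolute difference (TAD). This is a hedged illustration, not the authors' implementation; the 3x3 window, the weight `lam`, and the truncation threshold `tau_ad` are illustrative choices.

```python
def census(patch):
    """3x3 census transform: one bit per neighbor, set if neighbor < center."""
    center = patch[1][1]
    bits = 0
    for i in range(3):
        for j in range(3):
            if (i, j) != (1, 1):
                bits = (bits << 1) | (1 if patch[i][j] < center else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two census codes."""
    return bin(a ^ b).count("1")

def combined_cost(patch_l, patch_r, lam=0.3, tau_ad=30):
    """Blend the census Hamming distance with a truncated absolute
    difference of the center pixels; lam and tau_ad are illustrative."""
    cost_census = hamming(census(patch_l), census(patch_r))
    cost_ad = min(abs(patch_l[1][1] - patch_r[1][1]), tau_ad)
    return lam * cost_census + (1 - lam) * cost_ad
```

Identical patches cost zero, and a uniform brightness shift leaves the census term at zero while only the truncated intensity term grows, which is one reason census-based costs are robust to radiometric differences between the two views.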

65 citations


Journal ArticleDOI
TL;DR: The Video Browser Showdown evaluates the performance of exploratory tools for interactive content search in videos in direct competition and in front of an audience to push research on user-centric video search tools.
Abstract: The Video Browser Showdown is an international competition in the field of interactive video search and retrieval. It is held annually as a special session at the International Conference on Multimedia Modeling (MMM). The Video Browser Showdown evaluates the performance of exploratory tools for interactive content search in videos in direct competition and in front of an audience. Its goal is to push research on user-centric video search tools including video navigation, content browsing, content interaction, and video content visualization. This article summarizes the first three VBS competitions (2012-2014).

61 citations


Journal ArticleDOI
TL;DR: An overview of SHVC, the scalable extension of H.265/HEVC, which adopts a scalable coding architecture with only high-level syntax changes relative to its base codec, which allows SHVC to be deployed with significantly reduced implementation cost.
Abstract: This article presents an overview of SHVC, the scalable extension of H.265/HEVC. SHVC adopts a scalable coding architecture with only high-level syntax changes relative to its base codec, which allows SHVC to be deployed with significantly reduced implementation cost. SHVC supports a rich set of scalability features. It also addresses the increasing market demand for higher quality and higher value video content delivery by providing a set of desired scalability features with high coding efficiency.

59 citations


Journal ArticleDOI
TL;DR: The progress of standardization of biometric template protection schemes is reviewed, an umbrella term for a class of techniques used to mitigate the security and privacy threats inherent in biometric recognition.
Abstract: Whether it is providing fingerprints at airport immigration desks, tagging friends on social networking sites, or logging into a smartphone, biometrics provide a fast, convenient, and unobtrusive means for access control or identity verification. Biometric template protection is an umbrella term for a class of techniques used to mitigate the security and privacy threats inherent in biometric recognition. During the past decade and a half, template protection has gained traction in academia and industry, becoming the subject of publications, patents and conferences. This article reviews the progress of standardization of biometric template protection schemes.

48 citations


Journal ArticleDOI
TL;DR: The authors have made two contributions to the design of a BOF-based on-device MVLR system, including a memory-efficient approximate nearest-neighbor search algorithm that combines residual vector quantization (RVQ) and tree-structured RVQ (TSRVQ).
Abstract: Existing mobile visual location recognition (MVLR) applications typically rely on bag-of-features (BOF) representation, which shows superior performance in retrieval accuracy. However, although the BOF framework is promising, it is not compact enough for on-device MVLR. The authors have made two contributions to the design of a BOF-based on-device MVLR system. First, to generate BOF descriptors, they propose a memory-efficient approximate nearest-neighbor search algorithm by combining residual vector quantization (RVQ) and tree-structured RVQ (TSRVQ). Second, they implemented a GPS-based and heading-aware RankBoost algorithm to reduce the dimensionality of the BOF descriptors. The authors evaluate the effectiveness of the proposed algorithms on an HTC mobile phone. Their work applies to on-device MVLR in city-scale workspaces.

48 citations


Journal ArticleDOI
TL;DR: The authors propose a method of projected residual vector quantization for ANN search that considers the projection errors in the quantization process and design three simple and effective optimization strategies to improve the performance of the PRVQ algorithm.
Abstract: This article proposes Projected Residual Vector Quantization (PRVQ) to address the problem of large-scale approximate nearest neighbor (ANN) search in a high-dimensional space. Many quantization-based ANN search algorithms have been proposed in the past few years. However, most of the existing methods discard the projection errors generated in the dimension-reduction process, which inevitably decreases the search accuracy. In view of this, the proposed PRVQ accounts for the projection errors in the quantization process. The authors also design three simple and effective optimization strategies to improve the performance of the PRVQ algorithm, and they have integrated it into a mobile landmark recognition system to prove its effectiveness.
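The residual quantization at the heart of PRVQ (and of RVQ generally) can be sketched as follows: each stage quantizes the residual left by the previous stage, and the reconstruction is the sum of the chosen codewords. The tiny 2D codebooks below are hypothetical; in practice they are learned, for example by k-means, and PRVQ additionally folds the projection errors into the residual being quantized.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))

def rvq_encode(codebooks, v):
    """Encode v as one codeword index per stage, quantizing the residual."""
    codes, residual = [], list(v)
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Reconstruction = sum of the selected codewords across stages."""
    rec = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, codes):
        rec = [r + c for r, c in zip(rec, cb[idx])]
    return rec

# Hypothetical codebooks: a coarse first stage, a fine second stage.
stage1 = [[0.0, 0.0], [10.0, 10.0]]
stage2 = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codes = rvq_encode([stage1, stage2], [11.0, 10.2])
```

Adding stages shrinks the residual, so reconstruction accuracy grows with the bit budget while each stage's codebook stays small, which is what makes the scheme memory-efficient on a device.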

46 citations


Journal ArticleDOI
TL;DR: Computer vision promises to be an extraordinary enabling technology for augmenting visitor experiences, bridging the affective gap by understanding the visitor's individual cognitive needs and interests and his or her situational affective state.
Abstract: Museum visitor experiences differ from person to person, ranging from cognitive to affective. Progress in information technology has provided the opportunity to improve both the quantity and personalization of cultural information, privileging the cognitive experience over the affective. Computer vision promises to be an extraordinary enabling technology for augmenting visitor experiences, bridging the affective gap by understanding the visitor's individual cognitive needs and interests and his or her situational affective state.

41 citations


Journal ArticleDOI
TL;DR: The authors propose a general-purpose, no-reference image quality assessment (NR-IQA) method with the goal of developing a model that does not require prior knowledge about nondistorted reference images or the types of distortions, and which can achieve better prediction performance than other state-of-the-art approaches.
Abstract: With the rapid increase of digital imaging and communication technology usage, there's now great demand for fast and practical image quality assessment (IQA) algorithms that can predict an image's quality as consistently as humans. The authors propose a general-purpose, no-reference image quality assessment (NR-IQA) method with the goal of developing a model that does not require prior knowledge about nondistorted reference images or the types of distortions. The key is to obtain effective image representations by learning quality-aware filters (QAFs). Unlike other regression models, they also use a random forest to train the mapping from the feature space to the quality score. Extensive experiments conducted on the LIVE and CSIQ datasets demonstrate that the proposed NR-IQA metric, QAF, can achieve better prediction performance than other state-of-the-art approaches in terms of both prediction accuracy and generalization capability.
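The general pipeline (filter responses pooled into a feature vector, then a learned regressor) might be sketched like this; the two hand-written gradient filters and the 1-nearest-neighbor regressor are simplified stand-ins for the learned quality-aware filters and the random forest used in the article.

```python
def filter_response(img, kernel):
    """Sum of absolute responses of a 2x2 kernel slid over a 2D image."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(h - 1):
        for x in range(w - 1):
            r = sum(img[y + dy][x + dx] * kernel[dy][dx]
                    for dy in (0, 1) for dx in (0, 1))
            total += abs(r)
    return total

def features(img):
    """Pool responses of toy horizontal/vertical gradient filters."""
    kernels = [[[1, -1], [1, -1]], [[1, 1], [-1, -1]]]
    return [filter_response(img, k) for k in kernels]

def predict_score(img, train):
    """1-NN regression over (feature vector, quality score) pairs."""
    f = features(img)
    return min(train, key=lambda t: sum((a - b) ** 2
                                        for a, b in zip(t[0], f)))[1]
```

In this toy setup, a flat (detail-free) image produces near-zero filter responses and is mapped to the score of the nearest low-quality training example, mimicking how pooled filter statistics separate degraded images from sharp ones.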

Journal ArticleDOI
TL;DR: The authors first describe methods that compress visual word histograms, which require a codebook for decoding compressed signatures, and then describe methods that use residuals to achieve the same accuracy with much smaller codebooks and compressed-domain matching.
Abstract: Mobile visual search systems compare images against a database for object recognition. If query data is transmitted over a slow network or processed on a congested server, the latency increases substantially. This article shows how on-device database matching guarantees fast recognition regardless of external conditions. The database signatures must be compact because of limited memory, capable of fast comparisons, and discriminative for robust recognition. The authors first describe methods that compress visual word histograms, which require a codebook and decoding compressed signatures. They then describe methods that use residuals to achieve the same accuracy with much smaller codebooks and compressed domain matching.
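As a toy illustration of the first family of methods (compressing visual word histograms into compact, fast-to-compare signatures), one could binarize a histogram into a bit vector and match entirely in the compressed domain with Hamming distance. The thresholding rule and the tiny database below are hypothetical.

```python
def binarize(histogram, threshold=0):
    """Pack 'word count > threshold' decisions into an integer bit-signature."""
    sig = 0
    for count in histogram:
        sig = (sig << 1) | (1 if count > threshold else 0)
    return sig

def hamming(a, b):
    """Bitwise distance between two signatures (no decoding needed)."""
    return bin(a ^ b).count("1")

query = [3, 0, 1, 0, 5, 0]  # visual-word counts for the query image
db = {"landmark_a": [2, 0, 2, 0, 4, 1],
      "landmark_b": [0, 6, 0, 3, 0, 0]}
best = min(db, key=lambda k: hamming(binarize(query), binarize(db[k])))
```

The signature is a single machine word here, so comparisons reduce to an XOR and a popcount, which is what makes on-device matching fast and memory-light.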

Journal ArticleDOI
TL;DR: A multiscreen, social TV system integrated with social sense via a second screen as a novel paradigm for content consumption and the feasibility and effectiveness of the proposed approach in transforming the TV viewing experience are described.
Abstract: The increasing popularity of social interactions and geotagged, user-generated content has transformed the television viewing experience from laid-back video watching behavior into a "lean-forward," socially engaged experience. This article describes a multiscreen, social TV system integrated with social sense via a second screen as a novel paradigm for content consumption. This new application is built upon the authors' cloud-centric media platform, which provides on-demand virtual machines for content platform services, including media distribution, storage, and processing. The media platform is also integrated with a Big Data social platform that crawls and mines social data related to the media content. Specifically, this new social TV approach consists of three key subsystems: interactive TV, social sense, and multiscreen orchestration. Interactive TV implements a cloud-based, social TV system, offering rich social features; social sense discovers the geolocation-aware public perception and knowledge related to the media content; and multiscreen orchestration provides an intuitive and user-friendly human-computer interface to combine the two other subsystems, fusing the TV viewing experience with social perception. The authors have built a proof-of-concept demo over a private cloud at the Nanyang Technological University (NTU), Singapore. Feature verification and performance comparisons demonstrate the feasibility and effectiveness of the proposed approach in transforming the TV viewing experience.

Journal Article
TL;DR: The 21st century has witnessed significant advances in storage, processing, sensing, and communication technologies, resulting in the popularization of strong data-dependent approaches and the rise of scientism in almost all disciplines where data can be collected.
Abstract: Humans have always been interested in understanding themselves and their environment. Understanding their relationship with the environment is important for surviving and thriving in the present situation and for planning for the future. The 21st century has witnessed significant advances in storage, processing, sensing, and communication technologies. These advances have popularized strong data-dependent approaches, leading to the rise of scientism in almost all disciplines where data can be collected. As data has become widely available, understanding physical reality at different levels in different applications has become both possible and desirable.

Journal ArticleDOI
TL;DR: The compression formats described in this article can be used to support emerging auto-stereoscopic displays and free-viewpoint video functionalities.
Abstract: This article reviews the most recent extensions to the Advanced Video Coding (AVC) and High Efficiency Video Coding (HEVC) coding standards, which integrate depth video to support advanced multiview and 3D video functionalities. All the extensions provide single-view compatibility, while some extensions add depth support on top of conforming stereoscopic bitstreams. To achieve the highest gains in coding efficiency, depth information is utilized in coding the texture views. The compression formats described in this article can be used to support emerging auto-stereoscopic displays and free-viewpoint video functionalities.

Journal ArticleDOI
TL;DR: Unlike previous gaze estimation methods that use explicit offline calibration with a fixed number of calibration points or implicit calibration, the authors' approach constantly improves person-specific eye parameters through online calibration, which enables the system to adapt gradually to a new user.
Abstract: Gaze-tracking technology is highly valuable in many interactive and diagnostic applications. For many gaze estimation systems, calibration is an unavoidable procedure necessary to determine certain person-specific parameters, either explicitly or implicitly. Recently, several offline implicit calibration methods have been proposed to ease the calibration burden. However, the calibration procedure is still cumbersome, and gaze estimation accuracy needs further improvement. In this article, the authors present a novel 3D gaze estimation system with online calibration. The proposed system is based on a new 3D model-based gaze estimation method using a single consumer depth camera sensor (a Kinect). Unlike previous gaze estimation methods using explicit offline calibration with a fixed number of calibration points or implicit calibration, their approach constantly improves person-specific eye parameters through online calibration, which enables the system to adapt gradually to a new user. The experimental results and the human-computer interaction (HCI) application show that the proposed system can work in real time with superior gaze estimation accuracy and minimal calibration burden.

Journal ArticleDOI
TL;DR: The state-of-the-art clothing analysis techniques (clothing modeling, recognition, and parsing) that can be applied in many real applications, such as clothing retrieval and recommendation are surveyed.
Abstract: Driven by the huge profit potential in the fashion industry, intelligent fashion analysis based on techniques for clothing and makeover analysis is receiving much attention in the multimedia and computer vision literature. This article surveys the state-of-the-art clothing analysis techniques (clothing modeling, recognition, and parsing) that can be applied in many real applications, such as clothing retrieval and recommendation. The authors then introduce several makeover-related research directions, such as facial attractiveness prediction, facial makeup synthesis, and hair segmentation. Lastly, they discuss promising future directions for clothing and makeover analysis.

Journal ArticleDOI
TL;DR: Haptics is presented as a new component of the filmmaker's toolkit and a taxonomy of haptic effects is proposed and new effects coupled with classical cinematographic motions are introduced to enhance the video-viewing experience.
Abstract: Haptics, the technology which brings tactile or force-feedback to users, has a great potential for enhancing movies and could lead to new immersive experiences. This article introduces haptic cinematography, which presents haptics as a new component of the filmmaker's toolkit. The authors propose a taxonomy of haptic effects and introduce new effects coupled with classical cinematographic motions to enhance the video-viewing experience. They propose two models to render haptic effects based on camera motions: the first model makes the audience feel the motion of the camera, and the second provides haptic metaphors related to the semantics of the camera effect. Results from a user study suggest that these new effects improve the quality of experience. Filmmakers can use this new way of creating haptic effects to propose new immersive audio-visual experiences.

Journal ArticleDOI
TL;DR: This work proposes a probabilistic topic model called Multimodal Spatio-Temporal Theme Modeling (mmSTTM), which considers both textual and visual contexts to learn general, local, and temporal themes, which span a low-dimensional theme space.
Abstract: Here, we discuss mining and summarizing landmarks' general themes as well as their local and temporal themes. General themes occur extensively across various landmarks and include accommodations and other standard features. A local theme is a specific theme that exists only at a certain landmark, such as a unique physical characteristic. A temporal theme corresponds to a location-time-representative pattern that relates only to a certain landmark during a certain period, such as fleet week at the Golden Gate Bridge or red maple leaves in Kiyomizu-dera. Local themes are useful in landmark analysis for their discriminative and representative attributes. However, the ability to discover landmark diversity at different moments makes temporal themes equally important in landmark studies. Time-dependent diversity shows complete viewing angles over time and complements local themes in landmark understanding. Furthermore, it provides more comprehensive and structured information for landmark history browsing and tourist decision making. We propose a probabilistic topic model called Multimodal Spatio-Temporal Theme Modeling (mmSTTM). The model considers both textual and visual contexts to learn general, local, and temporal themes, which span a low-dimensional theme space. The model also assigns all textual and visual keywords to each theme, along with a probability for each; a keyword with a high weight is meaningful for the theme, while low-weighted keywords are considered noise.

Journal ArticleDOI
TL;DR: The authors present a novel multimodel fusion scheme to effectively fuse the multimodel results and generate the final ranked retrieval results.
Abstract: A multimedia semantic retrieval system based on hidden coherent feature groups (HCFGs) can support multimedia semantic retrieval on mobile applications. The system can capture the correlation between features and partition the original feature set into HCFGs, which have strong intragroup correlation while maintaining low intergroup correlation. The authors present a novel multimodel fusion scheme to effectively fuse the multimodel results and generate the final ranked retrieval results. In addition, to incorporate user interaction for effective retrieval, the proposed system also features a user feedback mechanism that helps refine the retrieval results.

Journal ArticleDOI
TL;DR: This article presents a 3D feature learning framework that combines different modality data effectively to promote the discriminability of unimodal features.
Abstract: Three-dimensional shapes contain different kinds of information that jointly characterize the shape. Traditional methods, however, perform recognition or retrieval using only one type. This article presents a 3D feature learning framework that combines different modality data effectively to promote the discriminability of unimodal features. Two independent deep belief networks (DBNs) are employed to learn high-level features from low-level features, and a restricted Boltzmann machine (RBM) is trained for mining the deep correlations between the different modalities. Experiments demonstrate that the proposed method can achieve better performance.

Journal ArticleDOI
TL;DR: A feature-based watermarking algorithm is proposed that embeds a binary image as a watermark in the DCT domain to guarantee visual quality and resist synchronization damage to images.
Abstract: Prompted by the crucial issue of copyright, digital watermarking plays a key role in protecting integrity and providing authorization for multimedia. Efficient watermarking techniques require visual imperceptibility and robustness against various attacks. In this article, the authors propose a feature-based watermarking algorithm that embeds a binary image as a watermark in the DCT domain to guarantee visual quality. In particular, the technique embeds marker bits to locate the original block of a cropped image and thereby resist synchronization damage. As simulation results show, compared to related methods, the proposed approach better resists several major image attacks, including cropping, shifting, blurring, noise, sharpening, and JPEG lossy compression. Moreover, this watermarking method achieves blind extraction, in which the original image isn't required for watermark extraction, and the embedded image can be restored with high visual quality.
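One common DCT-domain embedding trick, shown here only as a hedged sketch and not as the authors' algorithm, encodes one bit per block in the order relation of two mid-frequency coefficients. The coefficient positions and margin are illustrative, and the DCT itself, the feature selection, and the marker bits for cropping resistance are omitted.

```python
P1, P2 = (2, 1), (1, 2)  # two mid-frequency coefficient positions (hypothetical)

def embed_bit(block, bit, margin=2.0):
    """Reorder the two chosen DCT coefficients so their relation carries
    the bit: P1 > P2 encodes 1, P1 < P2 encodes 0. A minimum margin
    keeps the relation from flipping under mild distortion."""
    a, b = block[P1[0]][P1[1]], block[P2[0]][P2[1]]
    hi, lo = max(a, b), min(a, b)
    if hi - lo < margin:
        hi = lo + margin
    if bit:
        block[P1[0]][P1[1]], block[P2[0]][P2[1]] = hi, lo
    else:
        block[P1[0]][P1[1]], block[P2[0]][P2[1]] = lo, hi
    return block

def extract_bit(block):
    """Blind extraction: only the marked block is needed, not the original."""
    return 1 if block[P1[0]][P1[1]] > block[P2[0]][P2[1]] else 0
```

Because extraction only compares two coefficients of the received block, no reference image is required, which is the essence of the blind-extraction property the abstract mentions.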

Journal ArticleDOI
TL;DR: The authors propose a data-driven approach to explore the use of friendship locality, social proximity, and content proximity for geographically nearby users and extensively evaluates the proposed method using a large-scale real dataset to achieve 15 percent relative improvement over state-of-the-art approaches.
Abstract: Location information in social media is becoming increasingly vital in applications such as election prediction, epidemic forecasting, and emergency detection. However, only a tiny proportion of users proactively share their residence locations (which can be used to approximate the locations of most user-generated content) in their profiles, and inferring the residence location of the remaining users is nontrivial. In this article, the authors propose a framework for residence location inference in social media by jointly considering social, visual, and textual information. They first propose a data-driven approach to explore the use of friendship locality, social proximity, and content proximity for geographically nearby users. Based on these observations, they then propose a location propagation algorithm to effectively infer residence location for social media users. They extensively evaluate the proposed method using a large-scale real dataset and achieve a 15 percent relative improvement over state-of-the-art approaches.
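The friendship-locality observation suggests a simple propagation baseline: an unlabeled user iteratively adopts the majority location among already-located friends. This toy sketch uses friendship alone, whereas the article's algorithm also weighs social and content proximity.

```python
from collections import Counter

def propagate_locations(friends, known, rounds=3):
    """friends: user -> list of friends; known: user -> self-reported city.
    Unlabeled users adopt the majority location among located friends."""
    locations = dict(known)
    for _ in range(rounds):
        for user, fs in friends.items():
            if user in known:
                continue  # never overwrite self-reported locations
            votes = Counter(locations[f] for f in fs if f in locations)
            if votes:
                locations[user] = votes.most_common(1)[0][0]
    return locations

# Hypothetical graph: "ann" and "cam" have no profile location.
friends = {"ann": ["bob", "cam"], "bob": ["ann"],
           "cam": ["ann", "dan"], "dan": ["cam"]}
known = {"bob": "Beijing", "dan": "Beijing"}
```

Users with no located friends in round one can still be reached in later rounds once their neighbors acquire labels, which is why the propagation is run for several iterations.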

Journal ArticleDOI
TL;DR: The authors present two mobile systems, MMedia2U and CAPTAIN, that take the concept of context-aware multimedia management beyond photo organization and annotation.
Abstract: Context-aware and semantic-based technologies have been successfully employed to improve multimedia management in mobile environments. Large sets of context-tagged images on the Web are a concrete example of this achievement. The authors present two mobile systems, MMedia2U and CAPTAIN, that take the concept of context-aware multimedia management beyond photo organization and annotation. CAPTAIN is a tool that helps generate logbooks using context-tagged images and tracking data. Crewmembers used this tool to manage multimedia content and publish it to a blog during a sea expedition. MMedia2U is a mobile photo recommender system that exploits the user's context and context-tagged images to improve photo recommendation.

Journal ArticleDOI
TL;DR: The authors develop a method that improves face-clustering accuracy by incorporating the social context information inherent among characters in a movie by presenting a fusion scheme that eliminates ambiguities and bridges information from two fields.
Abstract: Clustering faces in movies is a challenging task because faces in a feature-length film are relatively uncontrolled and vary widely in appearance. Such variations make it difficult to appropriately measure the similarity between faces under significantly different settings. In this article, the authors develop a method that improves face-clustering accuracy by incorporating the social context information inherent among characters in a movie. In particular, they study the relation of social network construction and face clustering and present a fusion scheme that eliminates ambiguities and bridges information from two fields. Experiments on real-world data show superior clustering performance compared with state-of-the-art methods. Furthermore, their method can help incrementally build a character's social network that is similar to a manually labeled example.

Journal ArticleDOI
TL;DR: The authors first present a linear projection view to formulate subspace learning and then develop a novel framework, called Latent Subspace Projection Pursuit (LSPP), to estimate the intrinsic dimension, remove corruptions, and recover the subspace structure of observed datasets.
Abstract: This article develops a novel subspace learning algorithm for visual tracking. Specifically, the authors first present a linear projection view to formulate subspace learning and then develop a novel framework, called Latent Subspace Projection Pursuit (LSPP), to estimate the intrinsic dimension, remove corruptions, and recover the subspace structure of observed datasets. The authors evaluate the performance of their proposed method on various synthetic and real-world datasets, and the experimental results demonstrate that LSPP can achieve significant improvements in performance and reduced computational complexity for visual tracking.

Journal ArticleDOI
TL;DR: The authors propose a bit-level context-adaptive correlation model to exploit high-order statistical correlation for wavelet-domain distributed video coding (DVC) and introduce SI binning to classify the SI based on its quality.
Abstract: The authors propose a bit-level context-adaptive correlation model to exploit high-order statistical correlation for wavelet-domain distributed video coding (DVC). The magnitude and sign of each coefficient are coded separately in a bit-plane fashion. The contexts for the magnitude bit planes are designed based on the side information (SI), the local neighborhood, and the parent coefficient. The sign bit plane takes the sign of the SI as the context. The authors also introduce SI binning to classify the SI based on its quality. The SI's class is then included in the contexts for both magnitude coding and sign coding. Experimental results show that the proposed scheme provides significant coding gain over existing DVC systems.

Journal ArticleDOI
TL;DR: This article introduces a novel paradigm, describes a set of derived query-processing strategies and compares them along three dimensions: push versus pull, whether or not a communication infrastructure is utilized, and whether metadata dissemination is separated from blob dissemination.
Abstract: In this article, the authors study querying binary large objects (blobs), such as video and voice clips, in a network of vehicles communicating wirelessly. They introduce a novel paradigm, describe a set of derived query-processing strategies, and compare them along three dimensions: push versus pull, whether or not a communication infrastructure is utilized, and whether metadata dissemination is separated from blob dissemination. They analyze these strategies theoretically and experimentally in terms of answer throughput and communication overhead.

Journal ArticleDOI
John R. Smith1
TL;DR: The visual semantic concept basis is larger than the number of unique words, and more effort is needed to build out the set of visual semantic concepts.
Abstract: Visual scenes require complex description and modeling that involves more than a list of words. The visual semantic concept basis is larger than the number of unique words. More effort is needed to build out the set of visual semantic concepts.