Showing papers by "Rongrong Ji" published in 2013

Proceedings Article•DOI•

Large-scale visual sentiment ontology and detectors using adjective noun pairs

Damian Borth¹, Rongrong Ji², Tao Chen², Thomas M. Breuel¹, Shih-Fu Chang²•Institutions (2)

Kaiserslautern University of Technology¹, Columbia University²

21 Oct 2013

TL;DR: This work presents a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP) and proposes SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image.

Abstract: We address the challenge of sentiment analysis from visual content. In contrast to existing methods which infer sentiment or emotion directly from visual low-level features, we propose a novel approach based on understanding of the visual concepts that are strongly related to sentiments. Our key contribution is two-fold: first, we present a method built upon psychological theories and web mining to automatically construct a large-scale Visual Sentiment Ontology (VSO) consisting of more than 3,000 Adjective Noun Pairs (ANP). Second, we propose SentiBank, a novel visual concept detector library that can be used to detect the presence of 1,200 ANPs in an image. The VSO and SentiBank are distinct from existing work and will open a gate towards various applications enabled by automatic sentiment analysis. Experiments on detecting sentiment of image tweets demonstrate significant improvement in detection accuracy when comparing the proposed SentiBank based predictors with the text-based approaches. The effort also leads to a large publicly available resource consisting of a visual sentiment ontology, a large detector library, and the training/testing benchmark for visual sentiment analysis.

692 citations

Proceedings Article•DOI•

SentiBank: large-scale ontology and classifiers for detecting sentiment and emotions in visual content

Damian Borth¹, Tao Chen², Rongrong Ji², Shih-Fu Chang²•Institutions (2)

Kaiserslautern University of Technology¹, Columbia University²

21 Oct 2013

TL;DR: A novel system which combines sound structures from psychology and the folksonomy extracted from social multimedia to develop a large visual sentiment ontology consisting of 1,200 concepts and associated classifiers called SentiBank, believed to offer a powerful mid-level semantic representation enabling high-level sentiment analysis of social multimedia.

Abstract: A picture is worth one thousand words, but what words should be used to describe the sentiment and emotions conveyed in the increasingly popular social multimedia? We demonstrate a novel system which combines sound structures from psychology and the folksonomy extracted from social multimedia to develop a large visual sentiment ontology consisting of 1,200 concepts and associated classifiers called SentiBank. Each concept, defined as an Adjective Noun Pair (ANP), is made of an adjective strongly indicating emotions and a noun corresponding to objects or scenes that have a reasonable prospect of automatic detection. We believe such large-scale visual classifiers offer a powerful mid-level semantic representation enabling high-level sentiment analysis of social multimedia. We demonstrate novel applications made possible by SentiBank including live sentiment prediction of social media and visualization of visual content in a rich intuitive semantic space.

180 citations

Journal Article•DOI•

Learning to Distribute Vocabulary Indexing for Scalable Visual Search

Rongrong Ji¹, Ling-Yu Duan¹, Jie Chen¹, Lexing Xie², Hongxun Yao³, Wen Gao¹•Institutions (3)

Peking University¹, Australian National University², Harbin Institute of Technology³

01 Jan 2013-IEEE Transactions on Multimedia

TL;DR: This paper proposes to parallelize the near duplicate visual search architecture to index millions of images over multiple servers, including the distribution of both visual vocabulary and the corresponding indexing structure, and validates the distributed vocabulary indexing scheme in a real world location search system over 10 million landmark images.

Abstract: In recent years, there has been an ever-increasing research focus on the Bag-of-Words based near-duplicate visual search paradigm with inverted indexing. One fundamental yet unexploited challenge is how to maintain the large indexing structures within a single server subject to its memory constraint, which is extremely hard to scale up to millions or even billions of images. In this paper, we propose to parallelize the near-duplicate visual search architecture to index millions of images over multiple servers, including the distribution of both the visual vocabulary and the corresponding indexing structure. We optimize the distribution of vocabulary indexing from a machine learning perspective, which provides a “memory light” search paradigm that leverages the computational power across multiple servers to reduce the search latency. In particular, our solution addresses two essential issues: “What to distribute” and “How to distribute”. “What to distribute” is addressed by a “lossy” vocabulary Boosting, which discards both frequent and indiscriminative words prior to distribution. “How to distribute” is addressed by learning an optimal distribution function, which maximizes the uniformity of assigning the words of a given query to multiple servers. We validate the distributed vocabulary indexing scheme in a real-world location search system over 10 million landmark images. Compared to the state-of-the-art alternatives of single-server search [5], [6], [16] and distributed search [23], our scheme yields a speedup of about 200% at comparable precision by distributing only 5% of the words. We also report excellent robustness even when some of the servers crash.
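
To make the "How to distribute" criterion concrete, the sketch below shards visual words across servers and measures how evenly the words of one query are spread. It is only an illustration: the modulo assignment is a stand-in assumption (the paper learns its distribution function), and the names `assign_server`, `query_load`, and `load_imbalance` are hypothetical.

```python
from collections import Counter

def assign_server(word_id: int, n_servers: int) -> int:
    """Stand-in distribution function (assumption): modulo sharding of word ids."""
    return word_id % n_servers

def query_load(query_words, n_servers):
    """Count how many of the query's visual words land on each server."""
    counts = Counter(assign_server(w, n_servers) for w in query_words)
    return [counts.get(s, 0) for s in range(n_servers)]

def load_imbalance(loads):
    """Maximum load divided by mean load; 1.0 means a perfectly uniform split."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean > 0 else 0.0

# Hypothetical query made of eight visual word ids, spread over four servers.
loads = query_load([3, 8, 15, 22, 41, 56, 73, 90], n_servers=4)
imbalance = load_imbalance(loads)
```

A learned distribution function would aim to keep this imbalance close to 1.0 across typical queries, so that no single server dominates the search latency.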

104 citations

Proceedings Article•DOI•

Visual Reranking through Weakly Supervised Multi-graph Learning

Cheng Deng¹, Rongrong Ji², Wei Liu³, Dacheng Tao⁴, Xinbo Gao¹•Institutions (4)

Xidian University¹, Xiamen University², IBM³, University of Technology, Sydney⁴

01 Dec 2013

TL;DR: A novel image reranking approach is proposed by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs.

Abstract: Visual reranking has been widely deployed to refine the quality of conventional content-based image retrieval engines. The current trend lies in employing a crowd of retrieved results stemming from multiple feature modalities to boost the overall performance of visual reranking. However, a major challenge pertaining to current reranking methods is how to take full advantage of the complementary property of distinct feature modalities. Given a query image and one feature modality, a regular visual reranking framework treats the top-ranked images as pseudo-positive instances, which are inevitably noisy, make it difficult to reveal this complementary property, and thus lead to inferior ranking performance. This paper proposes a novel image reranking approach by introducing a Co-Regularized Multi-Graph Learning (Co-RMGL) framework, in which the intra-graph and inter-graph constraints are simultaneously imposed to encode affinities in a single graph and consistency across different graphs. Moreover, weakly supervised learning driven by image attributes is performed to denoise the pseudo-labeled instances, thereby highlighting the unique strength of each individual feature modality. Meanwhile, such learning can yield a few anchors in graphs that vitally enable the alignment and fusion of multiple graphs. As a result, an edge weight matrix learned from the fused graph automatically gives the ordering of the initially retrieved results. We evaluate our approach on four benchmark image retrieval datasets, demonstrating a significant performance gain over the state of the art.

89 citations

Proceedings Article•

Salient object detection via low-rank and structured sparse matrix decomposition

Houwen Peng¹, Bing Li¹, Rongrong Ji², Weiming Hu¹, Weihua Xiong¹, Congyan Lang³•Institutions (3)

Chinese Academy of Sciences¹, Xiamen University², Beijing Jiaotong University³

14 Jul 2013

TL;DR: In this model, a tree-structured sparsity-inducing norm regularization is first introduced to provide a hierarchical description of the image structure and ensure the completeness of the extracted salient object, and high-level priors are integrated to guide the matrix decomposition and enhance the saliency detection.

Abstract: Salient object detection provides an alternative solution to various image semantic understanding tasks such as object recognition, adaptive compression and image retrieval. Recently, low-rank matrix recovery (LR) theory has been introduced into saliency detection and has achieved impressive results. However, the existing LR-based models neglect the underlying structure of images, which inevitably degrades their performance. In this paper, we propose a Low-rank and Structured sparse Matrix Decomposition (LSMD) model for salient object detection. In the model, a tree-structured sparsity-inducing norm regularization is first introduced to provide a hierarchical description of the image structure and ensure the completeness of the extracted salient object. The similarity of saliency values within the salient object is then guaranteed by the l∞-norm. Finally, high-level priors are integrated to guide the matrix decomposition and enhance the saliency detection. Experimental results on the largest public benchmark database show that our model outperforms existing LR-based approaches and other state-of-the-art methods, which verifies the effectiveness and robustness of the structure cues in our model.
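
For orientation, a decomposition of this kind is often written in the generic form below. This is a sketch under assumed notation, not a formula quoted from the listing: F is taken to be the feature matrix, L the low-rank part, S the structured sparse part, λ a trade-off weight, and G_j^i the j-th group at level i of the tree with weight v_j^i.

```latex
\min_{L,\,S}\; \|L\|_{*} \;+\; \lambda \sum_{i=1}^{d} \sum_{j=1}^{n_i} v_j^{i}\, \bigl\| S_{G_j^{i}} \bigr\|_{\infty}
\quad \text{s.t.} \quad F = L + S
```

The nuclear norm encourages a low-rank component, while the l∞-norm applied per tree group encourages elements of the same group to share similar magnitudes, which matches the abstract's remark about similar saliency values within the salient object.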

61 citations

Proceedings Article•DOI•

Label Propagation from ImageNet to 3D Point Clouds

Yan Wang¹, Rongrong Ji¹, Shih-Fu Chang¹•Institutions (1)

Columbia University¹

23 Jun 2013

TL;DR: This paper utilizes the existing massive 2D semantic labeled datasets from decade-long community efforts, and a novel “cross-domain” label propagation approach, which effectively addresses the cross-domain issue and does not require any training data from the target scenes, with good scalability towards large scale applications.

Abstract: Recent years have witnessed a growing interest in understanding the semantics of point clouds in a wide variety of applications. However, point cloud labeling remains an open problem, due to the difficulty in acquiring sufficient 3D point labels for training effective classifiers. In this paper, we overcome this challenge by utilizing the existing massive 2D semantic labeled datasets from decade-long community efforts, such as ImageNet and LabelMe, and a novel “cross-domain” label propagation approach. Our proposed method consists of two major novel components: Exemplar SVM based label propagation, which effectively addresses the cross-domain issue, and a graphical model based contextual refinement incorporating 3D constraints. Most importantly, the entire process does not require any training data from the target scenes, and it scales well towards large scale applications. We evaluate our approach on the well-known Cornell Point Cloud Dataset, achieving much greater efficiency and comparable accuracy even without any 3D training data. Our approach shows further major gains in accuracy when the training data from the target scenes is used, outperforming state-of-the-art approaches with far better efficiency.

44 citations

Journal Article•DOI•

Nonlinear scrambling-based reversible watermarking for 2D-vector maps

Liujuan Cao¹, Chaoguang Men¹, Rongrong Ji²•Institutions (2)

Harbin Engineering University¹, Columbia University²

01 Mar 2013-The Visual Computer

TL;DR: Comprehensive experimental results validate that the proposed reversible watermarking scheme can effectively prevent high-precision vector data from being illegally used while simultaneously maintaining the basic shape of each polyline.

Abstract: The reversible watermarking technique is suitable for vector maps due to its reversibility after watermark extraction. In this paper, a novel reversible watermarking scheme based on the idea of nonlinear scrambling is proposed. It begins with feature point extraction. To prevent high-precision vector data from being illegally used by unauthorized users, the algorithm nonlinearly scrambles the relative position of feature points. Then, based on the proposed reversible embedding, both scrambled feature points and non-feature points are taken as cover data, the coordinates of which are modified to embed both watermark data and feature point identification data. Finally, combined with the scrambling secret key, the original vector data can be exactly recovered with watermark extraction. Comprehensive experimental results validate that the scheme can effectively prevent high-precision vector data from being illegally used while simultaneously maintaining the basic shape of each polyline.

37 citations

Journal Article•DOI•

Remote Dynamic Three-Dimensional Scene Reconstruction

You Yang¹, Qiong Liu¹, Rongrong Ji², Yue Gao³•Institutions (3)

Huazhong University of Science and Technology¹, Xiamen University², National University of Singapore³

07 May 2013-PLOS ONE

TL;DR: This paper proposes a precise and robust scheme for dynamic 3D scene reconstruction by using the compressed color video stream and their inaccurate motion vectors, which ensures the depth maps can be compensated in both video-rate and high resolution at the terminal side towards reducing the system consumption on both the compression and transmission.

Abstract: Remote dynamic three-dimensional (3D) scene reconstruction renders the motion structure of a 3D scene remotely by means of both the color video and the corresponding depth maps. It has shown great potential for telepresence applications like remote monitoring and remote medical imaging. In this setting, video rate and high resolution are two crucial characteristics for building a good depth map, but they contradict each other during depth sensor capture. Therefore, recent works prefer to transmit only the high-resolution color video to the terminal side, and the scene depth is subsequently reconstructed by estimating the motion vectors from the video, typically using propagation-based methods to achieve video-rate depth reconstruction. However, in most remote transmission systems, only the compressed color video stream is available. As a result, color video restored from the streams has quality losses, and thus the extracted motion vectors are inaccurate for depth reconstruction. In this paper, we propose a precise and robust scheme for dynamic 3D scene reconstruction that uses the compressed color video stream and its inaccurate motion vectors. Our method rectifies the inaccurate motion vectors by analyzing and compensating for their quality losses, motion vector absence in spatial prediction, and dislocation in near-boundary regions. This rectification ensures that the depth maps can be compensated at both video rate and high resolution at the terminal side, reducing the system consumption on both compression and transmission. Our experiments validate that the proposed scheme is robust for depth map and dynamic scene reconstruction over long propagation distances, even with a high compression ratio, outperforming the benchmark approaches with quality gains of at least 3.3950 dB for remote applications.

27 citations

Proceedings Article•

Semi-supervised learning with manifold fitted graphs

Tongtao Zhang¹, Rongrong Ji¹, Wei Liu², Dacheng Tao³, Gang Hua⁴•Institutions (4)

Columbia University¹, IBM², University of Technology, Sydney³, Stevens Institute of Technology⁴

03 Aug 2013

TL;DR: Extensive experiments carried out on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.

Abstract: In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming at capturing the locally sparse manifold structure into neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with the locality constraint, therefore achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point so that close-by points lying on the same local manifold are automatically chosen to connect and meanwhile the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our proposed optimization model M-Fitted Graph since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we leverage graph-based semi-supervised learning as the testbed. Extensive experiments carried out on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.

26 citations

Journal Article•DOI•

Mining spatiotemporal video patterns towards robust action retrieval

Liujuan Cao¹, Rongrong Ji², Yue Gao³, Wei Liu², Qi Tian⁴•Institutions (4)

Harbin Engineering University¹, Columbia University², Tsinghua University³, University of Texas at San Antonio⁴

01 Apr 2013-Neurocomputing

TL;DR: This paper introduces an attention shift scheme to detect and partition the focused human actions from YouTube videos, and leverages a boosting based feature selection to output the final action descriptors, which incorporates the ranking distortion of the conjunctive queries into the boosting objective.

16 citations

Journal Article•DOI•

Background subtraction driven seeds selection for moving objects segmentation and matting

Bineng Zhong¹, Yan Chen¹, Yewang Chen¹, Rongrong Ji², Ying Chen³, Duansheng Chen¹, Hanzi Wang⁴•Institutions (4)

Huaqiao University¹, Columbia University², Beijing Electronic Science and Technology Institute³, Xiamen University⁴

01 Mar 2013-Neurocomputing

TL;DR: This paper proposes a new automatic way to integrate a background subtraction (BGS) and an alpha matting technique via a heuristic seeds selection scheme, and demonstrates the efficiency and effectiveness of this method.

Journal Article•DOI•

Image retrieval with query-adaptive hashing

Dong Liu¹, Shuicheng Yan², Rongrong Ji¹, Xian-Sheng Hua³, Hong-Jiang Zhang⁴•Institutions (4)

Harbin Institute of Technology¹, National University of Singapore², Microsoft³, Advanced Technology Center⁴

19 Feb 2013-ACM Transactions on Multimedia Computing, Communications, and Applications

TL;DR: Extensive experiments over three benchmark image datasets demonstrate the superiority of the proposed query-adaptive hashing method over the state-of-the-art ones in terms of retrieval accuracy.

Abstract: Hashing-based approximate nearest-neighbor search can enable scalable content-based image retrieval. The existing semantic-preserving hashing methods leverage the labeled data to learn a fixed set of semantic-aware hash functions. However, a fixed hash function set cannot encode all semantic information well simultaneously, and it ignores the specific user's search intention conveyed by the query. In this article, we propose a query-adaptive hashing method which is able to generate the most appropriate binary codes for different queries. Specifically, a set of semantic-biased discriminant projection matrices is first learnt for each of the semantic concepts, through which a semantic-adaptable hash function set is learnt via a joint sparsity variable selection model. At query time, we further use the sparsity representation procedure to select the most appropriate hash function subset that is informative to the semantic information conveyed by the query. Extensive experiments over three benchmark image datasets demonstrate the superiority of our proposed query-adaptive hashing method over the state-of-the-art ones in terms of retrieval accuracy.
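
The query-time step can be pictured as ranking database items by Hamming distance computed only over the hash bits selected for the current query. The sketch below assumes the selection has already been made: `selected_bits` is simply given, whereas the paper derives the subset through its sparse representation procedure. The function names and toy codes are hypothetical.

```python
def hamming_on_subset(code_a, code_b, selected_bits):
    """Hamming distance restricted to the selected bit positions."""
    return sum(code_a[i] != code_b[i] for i in selected_bits)

def rank_database(query_code, database_codes, selected_bits):
    """Return database indices sorted by restricted Hamming distance (ties by index)."""
    dists = [
        (hamming_on_subset(query_code, code, selected_bits), idx)
        for idx, code in enumerate(database_codes)
    ]
    return [idx for _, idx in sorted(dists)]

# Toy 6-bit binary codes (made-up values for illustration only).
database = [
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0],
]
query = [0, 1, 1, 0, 0, 1]

# Ranking with a query-specific subset of hash functions (bits 0, 1, 2).
adaptive_ranking = rank_database(query, database, selected_bits=[0, 1, 2])
# Ranking with the full, fixed set of hash functions for comparison.
fixed_ranking = rank_database(query, database, selected_bits=range(6))
```

In this toy example the two rankings differ in the order of the second and third items, which is the kind of effect a query-specific bit subset is meant to produce.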

Journal Article•DOI•

A Bayesian framework for dense depth estimation based on spatial-temporal correlation

Qiong Liu¹, You Yang², Yue Gao³, Rongrong Ji⁴, Li Yu¹•Institutions (4)

Huazhong University of Science and Technology¹, Tsinghua University², National University of Singapore³, Columbia University⁴

01 Mar 2013-Neurocomputing

TL;DR: A Bayesian framework is proposed to generate accurate and temporally consistent dense depth videos in an efficient way, achieving accurate depth videos with up to 68.14% higher efficiency than traditional methods.

Proceedings Article•DOI•

Spectral-spatial classification of hyperspectral imagery based on Random Forests

Liu Wei¹, Shaozi Li¹, Miaohui Zhang¹, Yundong Wu², Songzhi Su¹, Rongrong Ji¹•Institutions (2)

Xiamen University¹, Jimei University²

17 Aug 2013

TL;DR: Results on two hyperspectral images show that the proposed framework combining spectral information with spatial context can greatly improve the final result with respect to pixel-wise classification with Random Forests.

Abstract: The high dimensionality of hyperspectral images is usually coupled with limited available reference data, which degrades the performance of supervised classification techniques such as random forests (RF). The commonly used pixel-wise classification lacks information about the spatial structures of the image. To improve classification performance, spectral and spatial information need to be combined. This paper proposes a novel scheme for accurate spectral-spatial classification of hyperspectral images. It is based on random forests, followed by majority voting within the superpixels obtained by oversegmentation through a graph-based technique. The scheme combines the result of a pixel-wise RF classification and the segmentation map obtained by oversegmentation. Our experimental results on two hyperspectral images show that the proposed framework combining spectral information with spatial context can greatly improve the final result with respect to pixel-wise classification with Random Forests.
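
The majority-voting step described above can be sketched as follows, assuming the pixel-wise labels (for example from a random forest) and the superpixel ids from oversegmentation are already available as flat lists. The function name `majority_vote` and the toy labels are hypothetical, and the classifier and segmentation themselves are not shown.

```python
from collections import Counter, defaultdict

def majority_vote(pixel_labels, segment_ids):
    """Replace each pixel's label with the most frequent label in its superpixel."""
    votes = defaultdict(Counter)
    for label, seg in zip(pixel_labels, segment_ids):
        votes[seg][label] += 1
    # On a tie, the label that was encountered first within the segment wins.
    winner = {seg: counts.most_common(1)[0][0] for seg, counts in votes.items()}
    return [winner[seg] for seg in segment_ids]

# Toy example: six pixels, two superpixels, one noisy label in each.
pixel_labels = ["grass", "grass", "road", "road", "road", "grass"]
segment_ids = [0, 0, 0, 1, 1, 1]
smoothed = majority_vote(pixel_labels, segment_ids)
```

The isolated "road" pixel in the first superpixel and the isolated "grass" pixel in the second are overruled by their neighbors, which is how spatial context corrects pixel-wise errors in this kind of scheme.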

Journal Article•DOI•

Phrasal Paraphrase Based Question Reformulation for Archived Question Retrieval

Yu Zhang¹, Wei-Nan Zhang¹, Ke Lu², Rongrong Ji³, Fanglin Wang⁴, Ting Liu¹•Institutions (4)

Harbin Institute of Technology¹, Chinese Academy of Sciences², Xiamen University³, National University of Singapore⁴

21 Jun 2013-PLOS ONE

TL;DR: This paper presents a question reformulation scheme that enhances the question retrieval model by fully exploiting paraphrases at the phrase level, a granularity that complements existing paraphrasing research.

Abstract: The lexical gap in cQA search, resulting from the variability of languages, has been recognized as an important and widespread phenomenon. To address the problem, this paper presents a question reformulation scheme that enhances the question retrieval model by fully exploiting paraphrases at the phrase level. This granularity complements existing paraphrasing research, which works at either the fine-grained lexical level or the coarse-grained sentence level. Given a question in natural language, our scheme first detects the involved key-phrases by jointly integrating the corpus-dependent knowledge and question-aware cues. Next, it automatically extracts the paraphrases for each identified key-phrase utilizing multiple online translation engines, and then selects the most relevant reformulations from a large group of question rewrites, which is formed by full permutation and combination of the generated paraphrases. Extensive evaluations on a real-world data set demonstrate that our model is able to characterize complex questions and achieves promising performance compared to the state-of-the-art methods.

Journal Article•DOI•

Learning Compact Visual Descriptors for Low Bit Rate Mobile Landmark Search

Ling-Yu Duan¹, Jie Chen¹, Rongrong Ji¹, Tiejun Huang¹, Wen Gao¹•Institutions (1)

Peking University¹

21 Jun 2013-AI Magazine

TL;DR: This article introduces the work on low bit rate mobile landmark search, in which a compact yet discriminative landmark image descriptor is extracted by using location context such as GPS, crowd-sourced hotspot WLAN, and cell tower locations.

Abstract: With the ever-growing computational power of mobile devices, mobile visual search has undergone an evolution in techniques and applications. A significant trend is low bit rate visual search, where compact visual descriptors are extracted directly on a mobile device and delivered as queries rather than raw images to reduce the query transmission latency. In this article, we introduce our work on low bit rate mobile landmark search, in which a compact yet discriminative landmark image descriptor is extracted by using location context such as GPS, crowd-sourced hotspot WLAN, and cell tower locations. The compactness originates from the bag-of-words image representation, with offline learning from geotagged photos from online photo sharing websites including Flickr and Panoramio. The learning process involves segmenting the landmark photo collection by discrete geographical regions using a Gaussian mixture model, and then boosting a ranking-sensitive vocabulary within each region, with an “entropy” based descriptor compactness feedback to refine both phases iteratively. In online search, when entering a geographical region, the codebook in a mobile device is downstream adapted to generate extremely compact descriptors with promising discriminative ability. We have deployed landmark search apps to both HTC and iPhone mobile phones, working over a database of million-scale images in typical areas such as Beijing, New York, and Barcelona. Our descriptor outperforms alternative compact descriptors (Chen et al. 2009; Chen et al. 2010; Chandrasekhar et al. 2009a; Chandrasekhar et al. 2009b) by significant margins. Beyond landmark search, this article will summarize the MPEG standardization progress of compact descriptor for visual search (CDVS) (Yuri et al. 2010; Yuri et al. 2011) towards application interoperability.

Journal Article•DOI•

Visual attention modeling based on short-term environmental adaption

Xiaoshuai Sun¹, Hongxun Yao¹, Rongrong Ji¹•Institutions (1)

Harbin Institute of Technology¹

01 Feb 2013-Journal of Visual Communication and Image Representation

TL;DR: A novel principle for modeling the visual attention mechanism, named short-term environmental adaption, is proposed; it adaptively extracts sparse features and treats saliency as the features' conditional self-information, which is more accurate in saliency measurement and sparser with respect to visual signal representation.

Proceedings Article•DOI•

A new camera self-calibration method based on CSA

Li-Chuan Geng¹, Shaozi Li¹, Songzhi Su¹, Donglin Cao¹, Yun-Qi Lei¹, Rongrong Ji¹•Institutions (1)

Xiamen University¹

01 Nov 2013

TL;DR: This paper proposes an artificial immune system (AIS) based method that converges quickly to the globally optimal solution, and demonstrates the performance of the proposed method with synthetic and real data.

Abstract: A large number of computer vision applications rely on camera calibration. Camera self-calibration, which depends only on the relationship between corresponding points of a pair of images, draws much attention for its simplicity. Almost all camera self-calibration methods rely on the solution of the Kruppa equations, which are difficult to solve directly. State-of-the-art self-calibration algorithms usually convert the solution of these equations into a non-linear optimization problem, but traditional optimization methods tend to converge to local extrema. An artificial immune system (AIS) has the ability to converge quickly to a global extremum. To address this problem, we propose an AIS-based method that converges quickly to the globally optimal solution. We demonstrate the performance of the proposed method with synthetic and real data.

Proceedings Article•DOI•

Stereotime: a wireless 2D and 3D switchable video communication system

You Yang¹, Qiong Liu¹, Yue Gao², Binbin Xiong¹, Li Yu¹, Huanbo Luan², Rongrong Ji³, Qi Tian⁴•Institutions (4)

Huazhong University of Science and Technology¹, National University of Singapore², Xiamen University³, University of Texas at San Antonio⁴

21 Oct 2013

TL;DR: This work presents a wireless 2D and 3D switchable video communication system, named Stereotime, to handle these challenges, and shows its functionality and compatibility on 3D mobile devices in a WiFi network environment.

Abstract: Mobile 3D video communication, especially with 2D and 3D compatibility, is a new paradigm for both video communication and 3D video processing. Current techniques face challenges on mobile devices when bundled constraints such as computation resources and compatibility must be considered. In this work, we present a wireless 2D and 3D switchable video communication system, named Stereotime, to handle these challenges. The methods of Zig-Zag fast object segmentation, depth cues detection and merging, and texture-adaptive view generation are used for 3D scene reconstruction. We show its functionality and compatibility on 3D mobile devices in a WiFi network environment.

Journal Article•DOI•

Bidirectional-isomorphic manifold learning at image semantic understanding & representation

Xianming Liu¹, Hongxun Yao¹, Rongrong Ji¹, Pengfei Xu¹, Xiaoshuai Sun¹•Institutions (1)

Harbin Institute of Technology¹

01 May 2013-Multimedia Tools and Applications

TL;DR: A Bidirectional-Isomorphic Manifold learning strategy is presented to optimize both the visual feature space and the textual space, in order to achieve more accurate comprehension of image semantics and relationships; promising results show that the model attains a significant improvement over state-of-the-art algorithms.

Abstract: Using relevant textual information to improve visual content understanding and representation is an effective way to deeply understand web image content. However, the description of images is usually imprecise at the semantic level, which is caused by noisy and redundant information in both the textual aspect (such as surrounding text in HTML pages) and the visual aspect (such as intra-class diversity). This paper approaches the problem through association analysis of image content and presents a Bidirectional-Isomorphic Manifold learning strategy to optimize both the visual feature space and the textual space, in order to achieve more accurate comprehension of image semantics and relationships. To achieve this optimization between two different models, Bidirectional-Isomorphic Manifold Learning utilizes a novel algorithm to unify adjustments in both models into one topological structure, which is called the reversed Manifold mapping. We also demonstrate its correctness and convergence from a mathematical perspective. Image annotation and keyword correlation analysis are applied. Two groups of experiments are conducted: the first group is carried out on the Corel 5000 image database to validate our method's effectiveness by comparing it with the state-of-the-art Generalized Manifold Ranking Based Image Retrieval and SVM, while the second group is carried out on a web-downloaded Flickr dataset with over 6,000 images to verify the proposed method's effectiveness in real-world applications. The promising results show that our model attains a significant improvement over state-of-the-art algorithms.

Proceedings Article

Query-dependent visual dictionary adaptation for image reranking

Wang Jialong¹, Cheng Deng¹, Wei Liu², Rongrong Ji³, Xiangyu Chen⁴, Xinbo Gao¹

Xidian University¹, IBM², Xiamen University³, Agency for Science, Technology and Research⁴

21 Oct 2013

TL;DR: A query-dependent image reranking approach is proposed that leverages higher-level attribute detection among the top returned images to adapt the dictionary built over the visual features in a query-specific fashion.

Abstract: Although text-based image search engines are popular for ranking images of a user's interest, state-of-the-art ranking performance is still far from satisfactory. One major issue comes from the visual similarity metric used in the ranking operation, which depends solely on visual features. To tackle this issue, one feasible method is to incorporate semantic concepts, also known as image attributes, into image ranking. However, the optimal combination of visual features and image attributes remains unknown. In this paper, we propose a query-dependent image reranking approach that leverages higher-level attribute detection among the top returned images to adapt the dictionary built over the visual features in a query-specific fashion. We start by learning, offline, transposition probabilities between visual codewords and attributes, then use these probabilities to adapt the dictionary online, and finally produce a query-dependent and semantics-induced metric for image ranking. Extensive evaluations on several benchmark image datasets demonstrate the effectiveness and efficiency of the proposed approach in comparison with state-of-the-art methods.

Proceedings Article

On the interoperability of local descriptors compression

Jie Chen¹, Ling-Yu Duan¹, Jie Lin¹, Rongrong Ji¹, Tiejun Huang¹, Wen Gao¹

Peking University¹

26 May 2013

TL;DR: This paper proposes to combine feature transform and multi-stage vector quantization to implement the interoperability of compact local descriptors, and reports superior performance over state-of-the-art methods on a set of benchmark datasets.

Abstract: There are a number of component technologies that are useful for visual search, including the format of visual descriptors, the descriptor extraction process, and indexing and matching algorithms. As a minimum, the format of descriptors as well as parts of their extraction process should be defined to ensure interoperability. In this paper, we study the problem of interoperability among compressed local descriptors at different bit rates; that is, allowing effective and efficient comparison of compact descriptors, which is fundamentally important to mobile visual search applications. We propose to combine feature transform and multi-stage vector quantization to implement the interoperability of compact local descriptors. First, an orthogonal transform (e.g., principal component analysis, PCA) is employed to eliminate the correlation between local feature dimensions, which improves the performance of compressed-domain descriptor matching through well-aligned distance computation over the sorted important features in the transform space. Second, multi-stage vector quantization (MSVQ) is applied to generate compact codes for local descriptors. At light quantization tables, MSVQ takes advantage of the transform-domain features to properly allocate different budgets to each group of transformed feature dimensions. The interoperability between compressed descriptors at different bit rates can be achieved by fast matching of the descriptors in the orthogonal feature space. In other words, decoding descriptors into the original feature space (SIFT space) is unnecessary, as the distance can be calculated from pre-computed lookup tables. In particular, such efficient matching in the transform domain is significant for large-scale visual search. Over a set of benchmark datasets, we report superior performance over state-of-the-art methods.
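
The multi-stage quantization described in the abstract can be illustrated with a minimal sketch of multi-stage (residual) vector quantization. This is not the paper's implementation: the codebooks in `STAGES`, the function names, and the 2-D toy vectors are hypothetical, the orthogonal transform is assumed to have been applied already, and the lookup-table distance computation is omitted. The sketch shows only that each stage quantizes the residual left by the previous stage, so a code with fewer stages is a coarser version of a code with more stages.

```python
# Hypothetical toy codebooks: stage 1 is coarse, stage 2 refines the residual.
STAGES = [
    [[0, 0], [4, 0], [0, 4], [4, 4]],
    [[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]],
]


def nearest(codebook, v):
    """Return the index of the codeword closest to v (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - x) ** 2 for c, x in zip(codebook[i], v)),
    )


def msvq_encode(v, stages):
    """Quantize v stage by stage; each stage encodes the remaining residual."""
    residual = list(v)
    codes = []
    for codebook in stages:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def msvq_decode(codes, stages):
    """Reconstruct by summing the selected codeword of each available stage."""
    out = [0.0] * len(stages[0][0])
    for idx, codebook in zip(codes, stages):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codes = msvq_encode([4.9, 0.2], STAGES)
full = msvq_decode(codes, STAGES)        # uses both stages
coarse = msvq_decode(codes[:1], STAGES)  # uses only the first stage
```

In this toy setting, truncating the code list to its first entry still decodes to a valid, coarser approximation, which is one simple way to picture descriptors at different bit rates sharing a common representation.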

Proceedings Article

Decomposed human localization in personal photo albums

Bing Shuai¹, Songzhi Su¹, Shaozi Li¹, Yun Cheng², Rongrong Ji¹

Xiamen University¹, Hunan University²

01 Nov 2013

TL;DR: Experimental results demonstrate that the decomposition-based model works very well at localizing deformable persons, boosting the average precision by 10% compared to state-of-the-art person detectors, and that the Similar Pose Feature (SPF) makes it feasible to project persons with similar poses into the same clusters, facilitating a novel pose-based photo album browsing functionality.

Abstract: Recent years have seen tremendous progress in human detection, but usually only upright poses are considered. In this paper, we relax this constraint to localize highly deformable persons, as commonly seen in personal photo albums. Human localization under arbitrary poses is extremely challenging because the large pose variance defeats traditional part-based template detectors. To tackle this issue, we propose a decomposition-based human localization model that works in three steps: a stable upper body is first detected, then a set of larger bounding boxes is extended from it, and finally the most appropriate instance is selected by a discriminative Whole Person Model. Experimental results demonstrate that our decomposition-based model works very well at localizing deformable persons, boosting the average precision by 10% compared to state-of-the-art person detectors. In addition, the Similar Pose Feature (SPF) makes it feasible to project persons with similar poses into the same clusters, facilitating a novel pose-based photo album browsing functionality.

Journal Article

Learning from mobile contexts to minimize the mobile location search latency

Ling-Yu Duan¹, Rongrong Ji², Jie Chen¹, Hongxun Yao², Tiejun Huang¹, Wen Gao²

Peking University¹, Harbin Institute of Technology²

01 Apr 2013-Signal Processing: Image Communication

TL;DR: This work combines location-related side information from mobile devices to adaptively supervise the compact visual descriptor design in a flexible manner, which is well suited to searching for locations or landmarks over a bandwidth-constrained wireless link.

Abstract: We propose to learn an extremely compact visual descriptor from mobile contexts for low bit rate mobile location search. Our scheme combines location-related side information from mobile devices to adaptively supervise the compact visual descriptor design in a flexible manner, which is well suited to searching for locations or landmarks over a bandwidth-constrained wireless link. Along with the proposed compact descriptor learning, a large-scale, context-aware mobile visual search benchmark dataset, PKUBench, is also introduced, which serves as the first comprehensive benchmark for the quantitative evaluation of how cheaply available mobile contexts can help mobile visual search systems. Our proposed contextual-learning-based compact descriptor is shown to outperform existing works in terms of compression rate and retrieval effectiveness.

Journal Article

Weakly supervised codebook learning by iterative label propagation with graph quantization

Liujuan Cao¹, Rongrong Ji², Wei Liu³, Hongxun Yao², Qi Tian⁴

Harbin Engineering University¹, Harbin Institute of Technology², Columbia University³, University of Texas at San Antonio⁴

01 Aug 2013-Signal Processing

TL;DR: A weakly supervised codebook learning framework that integrates image labels to supervise codebook building in two steps: the Label Propagation step propagates image labels to local patches by multiple instance learning and instance selection, and the Graph Quantization step integrates patch labels to build the codebook using Mean Shift.
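
Mean Shift, which the Graph Quantization step is said to use, can be sketched in one dimension as follows. This is a generic flat-kernel mean shift on hypothetical toy values, not the paper's graph-based formulation; the function name `mean_shift_mode`, the `bandwidth` parameter, and the data are assumptions made for illustration.

```python
def mean_shift_mode(points, start, bandwidth, tol=1e-6, max_iter=100):
    """Move `start` uphill to a density mode using a flat (uniform) kernel.

    At each iteration the estimate is replaced by the mean of all points
    lying within `bandwidth` of it, until the shift is smaller than `tol`.
    """
    x = start
    for _ in range(max_iter):
        window = [p for p in points if abs(p - x) <= bandwidth]
        if not window:  # no neighbours: nothing to shift towards
            break
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


# Hypothetical 1-D feature values forming two groups.
points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2]
mode_a = mean_shift_mode(points, start=1.2, bandwidth=1.0)
mode_b = mean_shift_mode(points, start=4.5, bandwidth=1.0)
```

Starting points that converge to the same mode belong to the same cluster, so the number of clusters does not have to be fixed in advance; in a codebook setting, each mode would play the role of one codeword.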

Proceedings Article

Localizing web videos from heterogeneous images

Xianming Liu¹, Yue Gao², Rongrong Ji³, Shiyu Chang¹, Thomas S. Huang¹

University of Illinois at Urbana–Champaign¹, National University of Singapore², Xiamen University³

01 Jan 2013

TL;DR: This paper tackles geo-localization of web videos from a novel perspective, by "transferring" large-scale web images with geographical tags to web videos through carefully designed associations between visual content similarities.

Abstract: While geo-localization of web images has been widely studied, limited effort has been devoted to that of web videos. Nevertheless, an accurate location inference approach specific to web videos is of fundamental importance, as they occupy an increasing proportion of the web corpus. The key challenge comes from the lack of sufficient labels for model training. In this paper, we tackle this problem from a novel perspective, by "transferring" large-scale web images with geographical tags to web videos through carefully designed associations between visual content similarities. A group of experiments is conducted on a collected web image and video data set, where superior performance gains are reported over several alternatives.

Proceedings Article

Seeing actions through scene context

Hong-Bo Zhang¹, Songzhi Su¹, Shaozi Li¹, Duansheng Chen², Bineng Zhong², Rongrong Ji¹

Xiamen University¹, Huaqiao University²

01 Nov 2013

TL;DR: This paper models the scene as a mid-level “hidden layer” to bridge action descriptors and action categories via a scene topic model, in which hybrid visual descriptors, including spatiotemporal action features and scene descriptors, are first extracted from the video sequence.

Abstract: Human actions do not occur in isolation; the surrounding scene offers hints about them. In this paper, we investigate the possibility of boosting action recognition performance by exploiting the associated scene context. To this end, we model the scene as a mid-level “hidden layer” to bridge action descriptors and action categories. This is achieved via a scene topic model, in which hybrid visual descriptors, including spatiotemporal action features and scene descriptors, are first extracted from the video sequence. Then, we learn a joint probability distribution between scene and action with a Naive Bayesian Nearest Neighbor algorithm, which is adopted to jointly infer the action categories online by combining off-the-shelf action recognition algorithms. We demonstrate the merits of our approach by comparing it to state-of-the-art methods on several action recognition benchmarks.
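
The basic Naive Bayes Nearest Neighbor decision rule can be sketched as below, assuming the standard formulation in which each query descriptor contributes its squared distance to the nearest stored descriptor of each class, and the class with the smallest total wins. The paper's joint scene-action model is not reproduced here; the function name `nbnn_classify`, the class labels, and the 2-D toy descriptors are hypothetical.

```python
def nbnn_classify(query_descriptors, class_descriptors):
    """Return (best_label, totals) under the basic NBNN rule.

    class_descriptors maps a label to the list of descriptors stored for
    that class. For each class, the score is the sum over all query
    descriptors of the squared distance to the nearest stored descriptor.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    totals = {}
    for label, pool in class_descriptors.items():
        totals[label] = sum(
            min(sq_dist(d, p) for p in pool) for d in query_descriptors
        )
    return min(totals, key=totals.get), totals


# Hypothetical toy descriptors for two action classes.
training = {
    "run": [[0, 0], [1, 0]],
    "jump": [[5, 5], [6, 5]],
}
label, totals = nbnn_classify([[0.5, 0], [1, 1]], training)
```

Because the rule compares descriptors to classes rather than to individual training videos, it needs no training phase beyond storing the descriptors, which is part of why it combines easily with other recognizers.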

Proceedings Article

Saliency detection by adaptive clustering

Hai Cao¹, Shaozi Li¹, Songzhi Su¹, Yun Cheng², Rongrong Ji¹

Xiamen University¹, Hunan University of Humanities, Science and Technology²

01 Nov 2013

TL;DR: A clustering-based method to detect refined regions with comparative performance is proposed, and an adaptive algorithm called f-means is developed for coarse-grained classification with an unknown number of clusters.

Abstract: Saliency detection plays an important role in image segmentation, content-aware resizing and object recognition. Most recent approaches obtain promising performance, which is useful for post-processing. We propose a clustering-based method to detect refined regions with comparative performance. For coarse-grained classification with an unknown number of clusters, an adaptive algorithm called f-means is developed in this paper. Pixels are clustered by f-means based on color and spatial features, and then the centroids are used to compute their saliency values. Experiments show that our algorithm generates finer maps, which outperform state-of-the-art approaches on the MSRA dataset. Relying on the saliency map, we also obtain superior results in foreground extraction, image resizing and thumbnail generation.
