
Showing papers by Guo-Jun Qi published in 2008


Proceedings ArticleDOI
23 Jun 2008
TL;DR: This work proposes an integrated multi-label multi-instance learning (MLMIL) approach based on hidden conditional random fields (HCRFs), which simultaneously captures both the connections between semantic labels and regions and the correlations among the labels in a single formulation.
Abstract: In the real world, an image is usually associated with multiple labels that are characterized by different regions in the image. Image classification is therefore naturally posed as both a multi-label and a multi-instance learning problem. Unlike existing research, which has considered these two problems separately, we propose an integrated multi-label multi-instance learning (MLMIL) approach based on hidden conditional random fields (HCRFs), which simultaneously captures both the connections between semantic labels and regions and the correlations among the labels in a single formulation. We apply this MLMIL framework to image classification and report superior performance compared to key existing approaches on the MSR Cambridge (MSRC) and Corel data sets.
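To make the formulation concrete, here is a hypothetical sketch of how an MLMIL-style model might score a joint label assignment: a unary term connects hidden region-to-concept assignments to region features, and a pairwise term encodes label correlations. All parameter names here are illustrative, not taken from the paper.

```python
import numpy as np

def mlmil_score(regions, labels, W_region, W_corr):
    """Score a candidate multi-label assignment for one image.

    regions:  (n_regions, d) array of region features (the instances)
    labels:   (n_concepts,) binary vector of image-level labels
    W_region: (n_concepts, d) weights linking region features to concepts
    W_corr:   (n_concepts, n_concepts) label co-occurrence weights
    """
    # Unary term: each active concept is supported by its best-matching
    # region (a max over hidden region-to-concept assignments).
    region_scores = regions @ W_region.T          # (n_regions, n_concepts)
    unary = np.sum(labels * region_scores.max(axis=0))

    # Pairwise term: reward or penalize label co-occurrences.
    pairwise = labels @ W_corr @ labels
    return unary + pairwise
```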

245 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: This paper proposes a two-dimensional active learning scheme that considers not only the sample dimension but also the label dimension, and shows that the traditional active learning formulation is a special case of 2DAL when there is only one label.
Abstract: In this paper, we propose a two-dimensional active learning scheme and show its application to image classification. Traditional active learning methods select samples only along the sample dimension. While this is the right strategy for binary classification, it is sub-optimal for multi-label classification. In multi-label classification, we argue that, for each selected sample, only the more informative labels need to be annotated, while the others can be inferred by exploiting the correlations among the labels. The reason is that, due to the inherent label correlations, different labels contribute differently to minimizing the classification error. To this end, we propose to select sample-label pairs, rather than only samples, to minimize a multi-label Bayesian classification error bound. This new active learning strategy considers not only the sample dimension but also the label dimension, and we call it Two-Dimensional Active Learning (2DAL). We also show that the traditional active learning formulation is a special case of 2DAL when there is only one label. Extensive experiments conducted on two real-world applications show that 2DAL significantly outperforms the best existing approaches, which do not take label correlations into account.
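As a rough illustration of selecting along both dimensions, the following sketch scores every un-annotated (sample, label) pair by prediction uncertainty weighted by how strongly that label correlates with the others. This heuristic is a stand-in for the Bayesian error-bound criterion derived in the paper; all inputs (`probs`, `label_corr`, `annotated`) are assumptions for the sketch.

```python
import numpy as np

def select_pairs(probs, label_corr, annotated, k):
    """Pick the k most valuable un-annotated (sample, label) pairs.

    probs:      (n_samples, n_labels) predicted P(y=1 | x)
    label_corr: (n_labels, n_labels) label correlation matrix
    annotated:  boolean mask of already-annotated (sample, label) pairs
    Returns a list of (sample_index, label_index) tuples.
    """
    uncertainty = 1.0 - np.abs(2.0 * probs - 1.0)   # peaks at p = 0.5
    # A label correlated with many others is more informative to query,
    # since its value helps infer the remaining labels.
    informativeness = np.abs(label_corr).sum(axis=1)
    score = uncertainty * informativeness            # broadcast over labels
    score[annotated] = -np.inf                       # skip known pairs
    flat = np.argsort(score, axis=None)[::-1][:k]
    return [np.unravel_index(f, score.shape) for f in flat]
```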

144 citations


Journal ArticleDOI
TL;DR: This article proposes another paradigm for video annotation that simultaneously annotates the concepts and models the correlations among them in a single step via the proposed Correlative Multilabel (CML) method, which benefits from the compensation of complementary information between different labels.
Abstract: Automatic video annotation is an important ingredient for semantic-level video browsing, search, and navigation, and has attracted much attention in recent years. This research has evolved through two paradigms. In the first paradigm, each concept is annotated individually by a pre-trained binary classifier. This method, however, ignores the rich correlations among video concepts and has achieved only limited success. The methods in the second paradigm, which evolved from the first, add an extra step on top of the individual classifiers to fuse the multiple concept detections. However, the performance of these methods can be degraded by errors propagated from the first step to the second fusion step. In this article, another paradigm of video annotation is proposed to address these problems. It simultaneously annotates the concepts and models the correlations among them in a single step via the proposed Correlative Multilabel (CML) method, which benefits from the compensation of complementary information between different labels. Furthermore, since video clips are composed of temporally ordered frame sequences, we extend the proposed method to exploit the rich temporal information in the videos. Specifically, a temporal kernel is incorporated into the CML method based on the discriminative information between Hidden Markov Models (HMMs) learned from the videos. We compare the proposed approach against state-of-the-art approaches from the first and second paradigms on the widely used TRECVID data set and show that the proposed method achieves superior performance.
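The one-step formulation can be pictured as a structured linear model over a "combination feature vector" phi(x, y) that stacks per-concept feature blocks with blocks encoding pairwise label configurations (this construction is also what the correlative multi-label patent listed below describes). The block layout in this sketch is illustrative rather than the paper's exact definition.

```python
import numpy as np
from itertools import combinations

def combination_feature(x, y):
    """x: (d,) low-level feature vector; y: (K,) labels in {-1, +1}.
    Returns phi(x, y) of length K*d + K*(K-1)//2 * 4."""
    K = len(y)
    blocks = [y[k] * x for k in range(K)]           # per-concept features
    for j, k in combinations(range(K), 2):          # label co-occurrence
        # One indicator per joint configuration of (y_j, y_k).
        pair = np.zeros(4)
        pair[(y[j] > 0) * 2 + (y[k] > 0)] = 1.0
        blocks.append(pair)
    return np.concatenate(blocks)

# Annotation then amounts to maximizing w . phi(x, y) over label vectors y,
# so one weight vector jointly scores all labels and their correlations.
```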

68 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper proposes a scalable framework for annotation-based video search, as well as a novel approach to enable large-scale semantic concept annotation, that is, online multi-label active learning, scalable to both the video sample dimension and concept label dimension.
Abstract: Existing video search engines have not taken advantage of video content analysis and semantic understanding. Video search in academia uses semantic annotation to approach content-based indexing, and we argue this is a promising direction toward real content-based video search. However, due to the complexity of both video data and semantic concepts, existing techniques for automatic video annotation still cannot handle large-scale video sets and large-scale concept sets, in terms of both annotation accuracy and computation cost. To address this problem, in this paper we propose a scalable framework for annotation-based video search, together with a novel approach to enable large-scale semantic concept annotation: online multi-label active learning. This framework is scalable in both the video sample dimension and the concept label dimension. Large-scale unlabeled video samples are assumed to arrive consecutively in batches, with an initial pre-labeled training set from which a preliminary multi-label classifier is built. For each arriving batch, a multi-label active learning engine is applied, which automatically selects a set of unlabeled sample-label pairs for manual annotation. An online learner then updates the original classifier by taking the newly labeled sample-label pairs into account. This process repeats until all data have arrived. During the process, new labels, even those without any pre-labeled training samples, can be incorporated at any time. Experiments on the TRECVID dataset demonstrate the effectiveness and efficiency of the proposed framework.
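Here is a runnable toy version of this batch loop; all components (uncertainty-based pair selection, a simulated human oracle, an online logistic update) are simplified stand-ins for the paper's selection engine and online learner.

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels, dim, k = 5, 16, 8
W = np.zeros((n_labels, dim))                      # per-label linear models
true_W = rng.normal(size=(n_labels, dim))          # hidden "ground truth"

def oracle(x, j):                                  # simulated human annotator
    return 1.0 if true_W[j] @ x > 0 else -1.0

for _ in range(20):                                # batches arrive over time
    batch = rng.normal(size=(50, dim))
    probs = 1.0 / (1.0 + np.exp(-(batch @ W.T)))   # preliminary predictions
    score = 1.0 - np.abs(2.0 * probs - 1.0)        # uncertainty per pair
    top = np.argsort(score, axis=None)[::-1][:k]   # pick k sample-label pairs
    for flat in top:
        i, j = np.unravel_index(flat, score.shape)
        y = oracle(batch[i], j)                    # manual annotation
        p = 1.0 / (1.0 + np.exp(-(W[j] @ batch[i])))
        W[j] += 0.1 * ((y + 1) / 2 - p) * batch[i] # online logistic update
```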

46 citations


Journal ArticleDOI
TL;DR: KLNP improves on the recently proposed linear neighborhood propagation (LNP) by tackling the limitation of its locally linear assumption on the distribution of semantics, combining the consistency assumption with the local linear embedding method in a nonlinear kernel-mapped space.
Abstract: The insufficiency of labeled training data for representing the distribution of an entire dataset is a major obstacle in the automatic semantic annotation of large-scale video databases. Semi-supervised learning algorithms, which attempt to learn from both labeled and unlabeled data, are a promising way to solve this problem. In this paper, a novel graph-based semi-supervised learning method named kernel linear neighborhood propagation (KLNP) is proposed and applied to video annotation. This approach combines the consistency assumption, the basic assumption in semi-supervised learning, with the local linear embedding (LLE) method in a nonlinear kernel-mapped space. KLNP improves on the recently proposed linear neighborhood propagation (LNP) by tackling the limitation of its locally linear assumption on the distribution of semantics. Experiments conducted on the TRECVID data set demonstrate that this approach outperforms other popular graph-based semi-supervised learning methods for video semantic annotation.
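A minimal sketch of this pipeline under stated assumptions: LLE-style reconstruction weights are solved in an RBF-kernel-induced space, and labels are then propagated through the resulting graph. The regularization constant and the closed-form propagation step are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def klnp(X, Y, n_neighbors=5, gamma=1.0, alpha=0.9):
    """X: (n, d) features; Y: (n, c) labels, zero rows for unlabeled points."""
    n = len(X)
    sq = ((X[:, None] - X[None]) ** 2).sum(-1)     # pairwise squared dists
    K = np.exp(-gamma * sq)                        # RBF kernel matrix
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:n_neighbors + 1]
        # Local Gram matrix in the kernel-mapped space:
        # G_jk = <phi(x_j) - phi(x_i), phi(x_k) - phi(x_i)>
        G = (K[np.ix_(nbrs, nbrs)] - K[i, nbrs][None]
             - K[i, nbrs][:, None] + K[i, i])
        w = np.linalg.solve(G + 1e-3 * np.eye(n_neighbors),
                            np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()                   # reconstruction weights
    # Propagate labels to convergence (closed form of the iteration).
    return np.linalg.solve(np.eye(n) - alpha * W, (1 - alpha) * Y)
```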

44 citations


Proceedings ArticleDOI
26 Oct 2008
TL;DR: This paper proposes an integrated graph-based semi-supervised learning framework to utilize these two types of representations simultaneously, and explores an effective and computationally efficient strategy to convert the multiple-instance representation into a single-instance one.
Abstract: Recently, many learning methods based on multiple-instance (local) or single-instance (global) representations of images have been proposed for image annotation. Their performance on image annotation, however, is mixed: for certain concepts the single-instance representations are more suitable, while for others the multiple-instance representations are better. In this paper, we therefore explore a unified learning framework that combines the multiple-instance and single-instance representations for image annotation. More specifically, we propose an integrated graph-based semi-supervised learning framework that utilizes these two types of representations simultaneously, and explore an effective and computationally efficient strategy to convert the multiple-instance representation into a single-instance one. Experiments conducted on the Corel image dataset show the effectiveness and efficiency of the proposed integrated framework.
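One plausible reading of such a conversion step, sketched below with assumed details: each bag of region features is embedded as a fixed-length vector of its maximal similarities to a set of prototype instances, and the affinity graphs built from the two representations are then blended for label propagation. This is illustrative, not necessarily the paper's exact strategy.

```python
import numpy as np

def bag_to_vector(bag, prototypes, gamma=1.0):
    """bag: (n_regions, d) region features; prototypes: (m, d).
    Returns an m-dim vector: max similarity of any region to each prototype."""
    sq = ((bag[:, None] - prototypes[None]) ** 2).sum(-1)
    return np.exp(-gamma * sq).max(axis=0)

def combined_affinity(global_feats, bag_vecs, gamma=1.0, beta=0.5):
    """Blend affinity graphs from the single- and multiple-instance views."""
    def rbf(Z):
        sq = ((Z[:, None] - Z[None]) ** 2).sum(-1)
        return np.exp(-gamma * sq)
    return beta * rbf(np.asarray(global_feats)) + \
           (1 - beta) * rbf(np.asarray(bag_vecs))
```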

25 citations


Patent
Guo-Jun Qi, Xian-Sheng Hua, Yong Rui, Hong-Jiang Zhang, Shipeng Li
13 Feb 2008
TL;DR: In this patent, a classifier annotates an image by implementing a labeling function that maps an input feature space and a label space to a combination feature vector, which models both the features of the individual concepts and the correlations among the concepts.
Abstract: Correlative multi-label image annotation may entail annotating an image by indicating respective labels for respective concepts. In an example embodiment, a classifier is to annotate an image by implementing a labeling function that maps an input feature space and a label space to a combination feature vector. The combination feature vector models both features of individual ones of the concepts and correlations among the concepts.

22 citations


Patent
25 Sep 2008
TL;DR: In this paper, a preliminary classifier is constructed from a pre-labeled training set included with an initial batch of annotated data samples, and a first batch of sample-label pairs is selected using a sample-label pair selection module.
Abstract: Online multi-label active annotation may include building a preliminary classifier from a pre-labeled training set included with an initial batch of annotated data samples, and selecting a first batch of sample-label pairs from the initial batch of annotated data samples. The sample-label pairs may be selected using a sample-label pair selection module. The first batch of sample-label pairs may be provided to online participants, who manually annotate them based on the preliminary classifier. The preliminary classifier may be updated to form a first updated classifier based on an outcome of providing the first batch of sample-label pairs to the online participants.

21 citations


01 Jan 2008
TL;DR: This paper describes the MSRA experiments for TRECVID 2008, which investigated the benefit of global and local low-level features using a variety of learning-based methods, including supervised and semi-supervised learning algorithms.
Abstract: This paper describes the MSRA experiments for TRECVID 2008. We participated in the high-level feature extraction and automatic search tasks. For high-level feature extraction, we investigated the benefit of global and local low-level features using a variety of learning-based methods, including supervised and semi-supervised learning algorithms. For automatic search, we focused on text and visual baselines, query-independent learning, and various reranking methods.

18 citations


Proceedings ArticleDOI
23 Jun 2008
TL;DR: A new distance measure is proposed that integrates joint appearance-spatial image features and is computed as an upper bound of an information-theoretic discrimination, and can be computed efficiently in a recursive formulation that scales well to image size.
Abstract: The goal of image categorization is to classify a collection of unlabeled images into a set of predefined classes to support semantic-level image retrieval. The distance measures used in most existing approaches either ignore the spatial structures of images or use them only in a separate step; as a result, they have achieved only limited success. To address these difficulties, in this paper we propose a new distance measure that integrates joint appearance-spatial image features. The distance is computed as an upper bound of an information-theoretic discrimination, and can be evaluated efficiently in a recursive formulation that scales well with image size. In addition, the upper-bound approximation can be further tightened via adaptation learning from a universal reference model. Extensive experiments on two widely used data sets show that the proposed approach significantly outperforms the state-of-the-art approaches.
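A loose illustration of this kind of distance, with assumed details: each image is summarized as a grid of local Gaussian appearance models, and per-node Gaussian KL divergences are accumulated in a single scan that folds in each node's already-visited neighbors. This toy recursion stands in for the paper's information-theoretic upper bound rather than reproducing it.

```python
import numpy as np

def gaussian_kl(mu0, var0, mu1, var1):
    """KL divergence between two diagonal Gaussians."""
    return 0.5 * np.sum(np.log(var1 / var0)
                        + (var0 + (mu0 - mu1) ** 2) / var1 - 1.0)

def grid_distance(model_a, model_b, rho=0.5):
    """model_*: dict mapping (row, col) -> (mu, var) over the same grid.
    Spatial context enters by mixing each node's divergence with that of
    its upper and left neighbors (weight rho) in one scan-order pass."""
    acc, total = {}, 0.0
    for (r, c) in sorted(model_a):                 # row-major scan order
        d = gaussian_kl(*model_a[(r, c)], *model_b[(r, c)])
        ctx = [acc[n] for n in ((r - 1, c), (r, c - 1)) if n in acc]
        acc[(r, c)] = d + rho * np.mean(ctx) if ctx else d
        total += acc[(r, c)]
    return total
```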

10 citations


Patent
24 Sep 2008
TL;DR: In this article, a kernelized spatial-contextual image classification is described, which consists of generating a first spatial-contextual model to represent a first image, the first spatial-contextual model having a plurality of interconnected nodes arranged in a first pattern of connections with each node connected to at least one other node.
Abstract: Kernelized spatial-contextual image classification is disclosed. One embodiment comprises generating a first spatial-contextual model to represent a first image, the first spatial-contextual model having a plurality of interconnected nodes arranged in a first pattern of connections with each node connected to at least one other node; generating a second spatial-contextual model to represent a second image using the first pattern of connections; and estimating the distance between corresponding nodes in the first spatial-contextual model and the second spatial-contextual model based on a relationship with adjacent connected nodes, to determine a distance between the first image and the second image.


Proceedings ArticleDOI
26 Aug 2008
TL;DR: The proposed approach takes a query-document pair as a sample and extracts a set of query-independent textual and visual features from each pair; it is suitable for a real-world video search system since the learned relevance relation is independent of any query.
Abstract: Most existing learning-based methods for query-by-example treat the query examples as "positive" and build a model for each query. These methods, referred to as query-dependent, have achieved only limited success, as they can hardly be applied to real-world applications in which an arbitrary query may be given. To address this problem, we propose to learn a query-independent model by exploiting the relevance information that exists in query-document pairs. The proposed approach takes a query-document pair as a sample and extracts a set of query-independent textual and visual features from each pair. It is general and suitable for a real-world video search system, since the learned relevance relation is independent of any query. We conducted extensive experiments on the TRECVID 2005-2007 corpus and show superior performance (+37% in Mean Average Precision) over query-dependent learning approaches.
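A minimal sketch of the query-independent idea under stated assumptions: each training sample is a (query, document) pair described only by relational features, so a single model serves any future query. The two toy features below (term overlap and visual-feature similarity) are illustrative stand-ins for the paper's textual and visual feature set.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Query:
    terms: set            # query keywords
    visual: np.ndarray    # feature of the visual query example

@dataclass
class Doc:
    terms: set            # document/ASR keywords
    visual: np.ndarray    # feature of the document keyframe

def pair_features(q: Query, d: Doc) -> np.ndarray:
    """Query-independent relational features for one query-document pair."""
    overlap = len(q.terms & d.terms) / max(len(q.terms), 1)
    vis_sim = float(q.visual @ d.visual /
                    (np.linalg.norm(q.visual) * np.linalg.norm(d.visual) + 1e-9))
    return np.array([overlap, vis_sim])

# Stack pair_features over judged pairs from many past queries and fit any
# standard binary classifier on relevant / non-relevant labels; at search
# time, rank documents for an unseen query by the classifier's score.
```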