
Showing papers by "Ivor W. Tsang published in 2009"


Proceedings ArticleDOI
14 Jun 2009
TL;DR: A new data-dependent regularizer based on a smoothness assumption is introduced into Least-Squares SVM (LS-SVM); it enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain.
Abstract: We propose a multiple source domain adaptation method, referred to as Domain Adaptation Machine (DAM), to learn a robust decision function (referred to as target classifier) for label prediction of patterns from the target domain by leveraging a set of pre-computed classifiers (referred to as auxiliary/source classifiers) independently learned with the labeled patterns from multiple source domains. We introduce a new data-dependent regularizer based on smoothness assumption into Least-Squares SVM (LS-SVM), which enforces that the target classifier shares similar decision values with the auxiliary classifiers from relevant source domains on the unlabeled patterns of the target domain. In addition, we employ a sparsity regularizer to learn a sparse target classifier. Comprehensive experiments on the challenging TRECVID 2005 corpus demonstrate that DAM outperforms the existing multiple source domain adaptation methods for video concept detection in terms of effectiveness and efficiency.
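One plausible form of such a smoothness-based regularizer (the relevance weights γ_s and all notation below are assumptions, not taken from the paper) penalizes disagreement between the target decision values and each relevant auxiliary classifier on the n_u unlabeled target patterns:

```latex
\Omega_D(f^T) \;=\; \frac{1}{2}\sum_{s=1}^{S}\gamma_s\sum_{i=1}^{n_u}\bigl(f^T(x_i)-f^s(x_i)\bigr)^2
```

A larger weight γ_s on a more relevant source domain pulls the target classifier more strongly toward that auxiliary classifier's predictions.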

361 citations


Proceedings ArticleDOI
20 Jun 2009
TL;DR: This work proposes a new cross-domain kernel learning method, referred to as Domain Transfer SVM, that simultaneously learns a kernel function and a robust SVM classifier by minimizing both the structural risk functional of SVM and the distribution mismatch of labeled and unlabeled samples between the auxiliary and target domains.
Abstract: Cross-domain learning methods have shown promising results by leveraging labeled patterns from auxiliary domains to learn a robust classifier for the target domain, which has only a limited number of labeled samples. To cope with the considerable change in feature distribution between different domains in video concept detection, we propose a new cross-domain kernel learning method. Our method, referred to as Domain Transfer SVM (DTSVM), simultaneously learns a kernel function and a robust SVM classifier by minimizing both the structural risk functional of the SVM and the distribution mismatch between the labeled and unlabeled samples of the auxiliary and target domains. Comprehensive experiments on the challenging TRECVID corpus demonstrate that DTSVM outperforms existing cross-domain learning and multiple kernel learning methods.
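As a sketch of the two criteria being balanced (the trade-off parameter θ and the joint form are assumptions; the MMD definition itself is standard), the distribution mismatch is the maximum mean discrepancy between the auxiliary set A and target set T under the feature map φ_k induced by the learned kernel k, minimized jointly with the SVM structural risk J_k:

```latex
\mathrm{MMD}_k^2(\mathcal{A},\mathcal{T})
 \;=\; \Bigl\|\frac{1}{n_A}\sum_{i=1}^{n_A}\varphi_k\bigl(x_i^{\mathcal{A}}\bigr)
 \;-\; \frac{1}{n_T}\sum_{j=1}^{n_T}\varphi_k\bigl(x_j^{\mathcal{T}}\bigr)\Bigr\|_{\mathcal{H}}^2,
 \qquad
 \min_{k}\;\; \mathrm{MMD}_k^2(\mathcal{A},\mathcal{T}) \;+\; \theta\, J_k
```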

295 citations


Journal ArticleDOI
TL;DR: Experiments on a number of synthetic and real-world data sets demonstrate that the proposed approach is more accurate, much faster, and can handle data sets that are hundreds of times larger than the largest data set reported in the MMC literature.
Abstract: Motivated by the success of large margin methods in supervised learning, maximum margin clustering (MMC) is a recent approach that aims at extending large margin methods to unsupervised learning. However, its optimization problem is nonconvex and existing MMC methods all rely on reformulating and relaxing the nonconvex optimization problem as semidefinite programs (SDP). Though SDP is convex and standard solvers are available, they are computationally very expensive and only small data sets can be handled. To make MMC more practical, we avoid SDP relaxations and propose in this paper an efficient approach that performs alternating optimization directly on the original nonconvex problem. A key step to avoid premature convergence in the resultant iterative procedure is to change the loss function from the hinge loss to the Laplacian/square loss so that overconfident predictions are penalized. Experiments on a number of synthetic and real-world data sets demonstrate that the proposed approach is more accurate, much faster (hundreds to tens of thousands of times faster), and can handle data sets that are hundreds of times larger than the largest data set reported in the MMC literature.
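A minimal sketch of the alternating scheme with the square loss (the initialization, balance heuristic, and parameter names below are illustrative assumptions, not the paper's code):

```python
import numpy as np

def mmc_alternating(X, n_iter=50, balance=0.1, lam=1.0, seed=0):
    """Alternate between (1) fitting a regularized least-squares
    classifier to the current cluster labels and (2) relabeling points
    by the classifier's sign, keeping each cluster above a minimum size."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    y = rng.choice([-1.0, 1.0], size=n)              # random initial labels
    k = max(1, int(balance * n))                     # minimum cluster size
    for _ in range(n_iter):
        w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        f = X @ w
        y_new = np.where(f >= 0, 1.0, -1.0)
        order = np.argsort(f)
        y_new[order[:k]] = -1.0                      # enforce class balance
        y_new[order[-k:]] = 1.0
        if np.array_equal(y_new, y):                 # converged
            break
        y = y_new
    return y
```

The square loss penalizes overconfident predictions (|f| much larger than 1), which, as the abstract notes, helps the iteration avoid the premature convergence seen with the hinge loss.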

155 citations


Proceedings Article
15 Apr 2009
TL;DR: A novel convex optimization method, LG-MMC, maximizes the margin between opposite clusters via “Label Generation”; it is much more scalable than existing convex approaches, and its relaxation is tighter than state-of-the-art convex MMCs.
Abstract: The maximum margin principle has been successfully applied to many supervised and semi-supervised problems in machine learning. Recently, this principle was extended to clustering, referred to as Maximum Margin Clustering (MMC), and has achieved promising performance in recent studies. To avoid the problem of local minima, MMC can be solved globally via convex semi-definite programming (SDP) relaxation. Although many efficient approaches have been proposed to alleviate the computational burden of SDP, convex MMCs still do not scale to medium-sized data sets. In this paper, we propose a novel convex optimization method, LG-MMC, which maximizes the margin between opposite clusters via “Label Generation”. It can be shown that LG-MMC is much more scalable than existing convex approaches. Moreover, we show that our convex relaxation is tighter than state-of-the-art convex MMCs. Experiments on seventeen UCI datasets and the MNIST dataset show significant improvement over existing MMC algorithms.
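One way to picture a single “label generation” step (the spectral heuristic and variable names here are assumptions, not necessarily the paper's exact rule): given the kernel matrix K and current dual weights alpha, a violated labeling y should make y' diag(alpha) K diag(alpha) y large, and the sign pattern of the leading eigenvector is a common surrogate for that combinatorial maximization.

```python
import numpy as np

def generate_label_candidate(K, alpha):
    """Return a candidate +/-1 labeling from the sign pattern of the
    leading eigenvector of diag(alpha) @ K @ diag(alpha)."""
    A = (alpha[:, None] * K) * alpha[None, :]
    vals, vecs = np.linalg.eigh(A)                   # symmetric eigenproblem
    return np.where(vecs[:, -1] >= 0, 1.0, -1.0)    # leading eigenvector sign
```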

139 citations


Proceedings Article
11 Jul 2009
TL;DR: Transfer Component Analysis (TCA) as discussed by the authors tries to learn some transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using Maximum Mean Discrepancy (MMD).
Abstract: Domain adaptation solves a learning problem in a target domain by utilizing the training data in a different but related source domain. Intuitively, discovering a good feature representation across domains is crucial. In this paper, we propose to find such a representation through a new learning method, transfer component analysis (TCA), for domain adaptation. TCA tries to learn some transfer components across domains in a Reproducing Kernel Hilbert Space (RKHS) using Maximum Mean Discrepancy (MMD). In the subspace spanned by these transfer components, data distributions in different domains are close to each other. As a result, with the new representations in this subspace, we can apply standard machine learning methods to train classifiers or regression models in the source domain for use in the target domain. The main contribution of our work is that we propose a novel feature representation in which to perform domain adaptation via a new parametric kernel using feature extraction methods, which can dramatically minimize the distance between domain distributions by projecting data onto the learned transfer components. Furthermore, our approach can handle large datasets and naturally leads to out-of-sample generalization. The effectiveness and efficiency of our approach are verified by experiments on two real-world applications: cross-domain indoor WiFi localization and cross-domain text classification.
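The published TCA formulation reduces to an eigenproblem; a minimal sketch with a linear kernel follows (the kernel choice and parameter values are assumptions):

```python
import numpy as np
from scipy.linalg import eig

def tca(Xs, Xt, m=2, mu=1.0):
    """Transfer components are the leading eigenvectors of
    (K L K + mu*I)^{-1} K H K, where L encodes the empirical MMD between
    source and target samples and H is the centering matrix."""
    X = np.vstack([Xs, Xt])
    ns, nt = len(Xs), len(Xt)
    n = ns + nt
    K = X @ X.T                                     # linear kernel (a choice)
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    L = np.outer(e, e)                              # MMD coefficient matrix
    H = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    M = np.linalg.solve(K @ L @ K + mu * np.eye(n), K @ H @ K)
    vals, vecs = eig(M)
    W = vecs[:, np.argsort(-vals.real)[:m]].real    # top-m transfer components
    Z = K @ W                                       # embedded source + target
    return Z[:ns], Z[ns:]
```

In the returned subspace, standard classifiers trained on the labeled source embedding can be applied directly to the target embedding.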

104 citations


Book ChapterDOI
27 Aug 2009
TL;DR: Two convex optimization methods are proposed that maximize the margin of concepts via key instance generation at the instance level and the bag level, respectively; they can effectively locate ROIs and achieve performance competitive with state-of-the-art algorithms on benchmark data sets.
Abstract: In content-based image retrieval (CBIR) and image screening, it is often desirable to locate the regions of interest (ROI) in the images automatically. This can be accomplished with multi-instance learning techniques by treating each image as a bag of instances (regions). Many SVM-based methods are successful in predicting the bag labels; however, few of them can locate the ROIs. Moreover, they are often based on either local search or an EM-style strategy, and may easily get stuck in local minima. In this paper, we propose two convex optimization methods which maximize the margin of concepts via key instance generation at the instance level and the bag level, respectively. Our formulation can be solved efficiently with a cutting plane algorithm. Experiments show that the proposed methods can effectively locate ROIs, and they also achieve performance competitive with state-of-the-art algorithms on benchmark data sets.
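The cutting-plane loop the abstract refers to has a simple generic shape; the sketch below is that generic skeleton, with both callbacks as problem-specific placeholders assumed here, not the paper's concrete subproblems:

```python
def cutting_plane(solve_restricted, find_most_violated, max_iter=100, tol=1e-4):
    """Keep a small working set of key-instance assignments ('cuts'),
    solve the problem restricted to that set, then add the most violated
    assignment until no violation exceeds tol."""
    cuts = []
    model = solve_restricted(cuts)
    for _ in range(max_iter):
        cut, violation = find_most_violated(model)
        if violation <= tol:                 # no significant violation left
            break
        cuts.append(cut)
        model = solve_restricted(cuts)
    return model
```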

100 citations


Proceedings ArticleDOI
28 Jun 2009
TL;DR: This work proposes a domain adaptation method that parameterizes a shared concept space by a linear transformation under which it explicitly minimizes the distribution difference between the source domain with sufficient labeled data and target domains with only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain.
Abstract: One common predictive modeling challenge in text mining problems is that the training data and the operational (testing) data are drawn from different underlying distributions. This poses a great difficulty for many statistical learning methods. However, when the distributions in the source domain and the target domain are not identical but related, there may exist a shared concept space that preserves the relation. Consequently, a good feature representation can encode this concept space and minimize the distribution gap. To formalize this intuition, we propose a domain adaptation method that parameterizes this concept space by a linear transformation under which we explicitly minimize the distribution difference between the source domain with sufficient labeled data and target domains with only unlabeled data, while at the same time minimizing the empirical loss on the labeled data in the source domain. Another characteristic of our method is its capability for considering multiple classes and their interactions simultaneously. We have conducted extensive experiments on two common text mining problems, namely information extraction and document classification, to demonstrate the effectiveness of our proposed method.
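Schematically (the symbols below are assumptions, not the paper's notation), the method learns a linear map W that trades off the source-domain empirical loss against the distribution gap measured after projection, for some empirical distribution-distance D:

```latex
\min_{W}\;\; \sum_{i\in\mathcal{D}_s^{\,\ell}} \ell\bigl(y_i,\, f(W^{\top}x_i)\bigr)
\;+\; \lambda\, D\bigl(p_s(W^{\top}x),\; p_t(W^{\top}x)\bigr)
```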

73 citations


Proceedings Article
11 Jul 2009
TL;DR: A new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), minimizes the normalized cut criterion in spectral clustering while controlling the mismatch between the cluster assignment matrix and the low-dimensional embedded representation of the data.
Abstract: In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering while controlling the mismatch between the cluster assignment matrix and the low-dimensional embedded representation of the data. SEC is based on the observation that the cluster assignment matrix of high-dimensional data can be represented by a low-dimensional linear mapping of the data. We also establish connections between SEC and other clustering methods, such as spectral clustering, clustering with local and global regularization, K-means, and discriminative K-means. Experiments on many real-world data sets show that SEC significantly outperforms existing spectral clustering methods as well as K-means-related clustering methods.
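A sketch of an objective consistent with this description (the trade-off parameters μ and γ are assumed notation): the normalized-cut term tr(Fᵀ L F) over the relaxed assignment matrix F is augmented with a penalty tying F to a linear mapping of the data X plus a bias b:

```latex
\min_{F,\,W,\,b}\;\; \operatorname{tr}\!\bigl(F^{\top} L F\bigr)
\;+\; \mu\Bigl(\bigl\|X^{\top}W + \mathbf{1}b^{\top} - F\bigr\|_F^2
\;+\; \gamma\,\|W\|_F^2\Bigr)
```

where L is the normalized graph Laplacian; the second term is the assignment/embedding mismatch that SEC controls.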

72 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper proposes an efficient approach to NPK learning from side information, referred to as SimpleNPKL, which can efficiently learn non-parametric kernels from large sets of pairwise constraints; SimpleNPKL with linear loss has a closed-form solution that can be computed simply by the Lanczos algorithm.
Abstract: Previous studies of Non-Parametric Kernel (NPK) learning usually reduce to solving some Semi-Definite Programming (SDP) problem with a standard SDP solver. However, the time complexity of standard interior-point SDP solvers can be as high as O(n^6.5). Such an intensive computational cost prevents NPK learning from being applied to real applications, even for data sets of moderate size. In this paper, we propose an efficient approach to NPK learning from side information, referred to as SimpleNPKL, which can efficiently learn non-parametric kernels from large sets of pairwise constraints. In particular, we show that the proposed SimpleNPKL with linear loss has a closed-form solution that can be computed simply by the Lanczos algorithm. Moreover, we show that SimpleNPKL with squared hinge loss can be reformulated as a saddle-point optimization task, which can be solved by a fast iterative algorithm. In contrast to previous approaches, our empirical results show that our new technique achieves the same accuracy but is significantly more efficient and scalable.
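A hedged sketch of the linear-loss closed form (the rank r and the absence of any extra scaling are assumptions): the learned kernel is essentially a low-rank reconstruction from the top eigenpairs of the symmetric pairwise-constraint matrix, which eigsh computes by Lanczos iteration:

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def simple_npkl_linear(S, r=20):
    """S is a symmetric matrix encoding side information
    (S[i, j] > 0 for must-link pairs, < 0 for cannot-link).
    Return a PSD kernel built from its top-r eigenpairs."""
    vals, vecs = eigsh(S, k=r, which='LA')   # Lanczos: largest eigenvalues
    vals = np.clip(vals, 0.0, None)          # keep the kernel PSD
    return (vecs * vals) @ vecs.T
```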

35 citations


Proceedings ArticleDOI
19 Oct 2009
TL;DR: A (quasi) real-time textual query based personal photo retrieval system is presented that leverages millions of web images and their associated rich textual descriptions (captions, categories, etc.) to improve photo retrieval performance.
Abstract: The rapid popularization of digital cameras and mobile phone cameras has led to an explosive growth of consumer photo collections. In this paper, we present a (quasi) real-time textual query based personal photo retrieval system by leveraging millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., "pool"), our system exploits the inverted file method to automatically find the positive web images that are related to the textual query "pool" as well as the negative web images which are irrelevant to the textual query. Based on these automatically retrieved relevant and irrelevant web images, we employ two simple but effective classification methods, k Nearest Neighbor (kNN) and decision stumps, to rank personal consumer photos. To further improve the photo retrieval performance, we propose three new relevance feedback methods via cross-domain learning. These methods effectively utilize both the web images and the consumer images. In particular, our proposed cross-domain learning methods can learn robust classifiers with only a very limited amount of labeled consumer photos from the user by leveraging the pre-learned decision stumps at interactive response time. Extensive experiments on both consumer and professional stock photo datasets demonstrate the effectiveness and efficiency of our system, which is also inherently not limited by any predefined lexicon.
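A toy sketch of the decision-stump ranking stage (thresholds, the accuracy weighting, and all names are illustrative assumptions, not the paper's exact classifier): stumps are fit on the automatically gathered positive/negative web images, and consumer photos are ranked by the weighted vote.

```python
import numpy as np

def train_stumps(X_web, y_web):
    """One stump per feature and polarity: median threshold, with each
    stump later weighted by its training accuracy on the web images."""
    stumps = []
    for d in range(X_web.shape[1]):
        thr = np.median(X_web[:, d])
        for sign in (1.0, -1.0):
            pred = np.where(sign * (X_web[:, d] - thr) >= 0, 1.0, -1.0)
            stumps.append((d, thr, sign, np.mean(pred == y_web)))
    return stumps

def rank_photos(stumps, X_photos):
    """Rank consumer photos by the accuracy-weighted vote of the stumps."""
    scores = np.zeros(len(X_photos))
    for d, thr, sign, acc in stumps:
        scores += acc * np.where(sign * (X_photos[:, d] - thr) >= 0, 1.0, -1.0)
    return np.argsort(-scores)               # best-matching photos first
```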

33 citations


Journal ArticleDOI
TL;DR: A nonlinear generalization of the popular maximum-likelihood linear regression (MLLR) adaptation algorithm using kernel methods, called maximum penalized likelihood kernel regression adaptation (MPLKR), applies kernel regression with appropriate regularization to determine the affine model transform in a kernel-induced high-dimensional feature space.
Abstract: This paper proposes a nonlinear generalization of the popular maximum-likelihood linear regression (MLLR) adaptation algorithm using kernel methods. The proposed method, called maximum penalized likelihood kernel regression adaptation (MPLKR), applies kernel regression with appropriate regularization to determine the affine model transform in a kernel-induced high-dimensional feature space. Although this is not the first attempt at applying kernel methods to conventional linear adaptation algorithms, unlike most other kernelized adaptation methods such as kernel eigenvoice or kernel eigen-MLLR, MPLKR has the advantage that it is a convex optimization problem whose solution is always guaranteed to be globally optimal. In fact, the adapted Gaussian means can be obtained analytically by simply solving a system of linear equations. From the Bayesian perspective, MPLKR can also be considered the kernel version of maximum a posteriori linear regression (MAPLR) adaptation. Supervised and unsupervised speaker adaptation using MPLKR were evaluated on the Resource Management and Wall Street Journal 5K tasks, respectively, achieving word error rate reductions of 23.6% and 15.5% over the speaker-independent model.
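The "system of linear equations" the abstract mentions has the familiar penalized kernel regression shape; a minimal sketch follows (the mapping from regression coefficients back to adapted Gaussian means is paper-specific and omitted here):

```python
import numpy as np

def penalized_kernel_regression(K, Y, lam=1e-2):
    """Solve (K + lam*I) alpha = Y for the coefficients alpha;
    convexity guarantees this solution is globally optimal."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), Y)
```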

Proceedings Article
01 Dec 2009
TL;DR: It is shown that this problem is equivalent to a sequential convex optimization problem by applying the constrained concave-convex procedure (CCCP); an efficient algorithm with a theoretical guarantee is proposed and computational issues are investigated.
Abstract: We propose a semi-supervised framework incorporating feature mapping with multiclass classification. By learning multiple classification tasks simultaneously, this framework can learn the latent feature space effectively for both labeled and unlabeled data. The knowledge in the transformed space can be transferred not only between the labeled and unlabeled data, but also across multiple classes, so as to improve the classification performance given a small amount of labeled data. We show that this problem is equivalent to a sequential convex optimization problem by applying the constrained concave-convex procedure (CCCP). An efficient algorithm with a theoretical guarantee is proposed, and computational issues are investigated. Extensive experiments have been conducted to demonstrate the effectiveness of our proposed framework.
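For reference, the generic CCCP step (this is the standard procedure, not the paper's specific instantiation): write the nonconvex objective as a difference of convex functions and repeatedly linearize the concave part, so each subproblem is convex and the iterates form the sequential convex optimization the abstract describes:

```latex
f(x) \;=\; u(x) - v(x), \qquad
x^{(t+1)} \;=\; \arg\min_{x}\;\; u(x) \;-\; \nabla v\bigl(x^{(t)}\bigr)^{\top} x
```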

Proceedings ArticleDOI
19 Oct 2009
TL;DR: This demonstration presents a (quasi) real-time textual query based image retrieval system (T-IRS) for consumer photos by leveraging millions of web images and their associated rich textual descriptions (captions, categories, etc.).
Abstract: In this demonstration, we present a (quasi) real-time textual query based image retrieval system (T-IRS) for consumer photos by leveraging millions of web images and their associated rich textual descriptions (captions, categories, etc.). After a user provides a textual query (e.g., "boat"), our system automatically finds the positive web images that are related to the textual query "boat" as well as the negative web images which are irrelevant to the textual query. Based on these automatically retrieved positive and negative web images, we employ a decision stump ensemble classifier to rank personal consumer photos. To further improve the photo retrieval performance, we also develop a novel relevance feedback method, referred to as Cross-Domain Regularized Regression (CDRR), which effectively utilizes both the web images and the consumer images. Our system is inherently not limited by any predefined lexicon.
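A plausible shape for the CDRR objective, inferred from the description above (all symbols and weights are assumptions): fit the few user-labeled consumer photos in the feedback set L, regularize the learned function f, and keep f close to the web-based classifier's scores on the unlabeled consumer photos U:

```latex
\min_{f}\;\; \sum_{i\in\mathcal{L}} \bigl(f(x_i) - y_i\bigr)^2
\;+\; \lambda\,\Omega(f)
\;+\; \theta \sum_{j\in\mathcal{U}} \bigl(f(x_j) - f_{\mathrm{web}}(x_j)\bigr)^2
```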