Proceedings Article

Cross-Language Text Classification Using Structural Correspondence Learning

11 Jul 2010, pp. 1118–1127
TL;DR: A new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation, is presented, using unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences.
Abstract: We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
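
For intuition, the pipeline the abstract describes can be sketched in a few lines, assuming a bag-of-words matrix over unlabeled documents from both languages and scikit-learn-style linear learners; the helper name cl_scl_projection, the pivot-selection step, and all parameter defaults are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the CL-SCL idea:
# 1. a translation oracle pairs a source pivot word with its target translation,
# 2. one linear predictor per pivot is trained on unlabeled documents,
# 3. an SVD of the stacked weight vectors yields a cross-lingual projection.
import numpy as np
from sklearn.linear_model import SGDClassifier

def cl_scl_projection(X_unlabeled, pivot_columns, k=100):
    """X_unlabeled: (n_docs, n_features) term matrix spanning both vocabularies.
    pivot_columns: feature indices of pivot words (source word and its translation).
    Returns theta: (k, n_features) projection into the induced subspace."""
    weights = []
    for j in pivot_columns:
        y = (X_unlabeled[:, j] > 0).astype(int)  # does the document contain the pivot?
        X_masked = X_unlabeled.copy()
        X_masked[:, j] = 0                       # hide the pivot from its own predictor
        clf = SGDClassifier(loss="modified_huber").fit(X_masked, y)
        weights.append(clf.coef_.ravel())
    W = np.column_stack(weights)                 # (n_features, n_pivots)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k].T

# A classifier trained on projected source documents (theta @ x) can then be
# applied to projected target documents, because the pivot pairs map both
# vocabularies onto shared dimensions.
```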
Citations
Journal ArticleDOI
TL;DR: This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning; the surveyed solutions are independent of data size and can be applied to big data environments.
Abstract: Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.

2,900 citations

Posted Content
TL;DR: An overview of domain adaptation and transfer learning with a specific view on visual applications, covering both shallow and deep methods as well as those that go beyond image categorization, such as object detection, image segmentation, video analysis, and learning visual attributes.
Abstract: The aim of this paper is to give an overview of domain adaptation and transfer learning with a specific view on visual applications. After a general motivation, we first position domain adaptation in the larger transfer learning problem. Second, we try to address and analyze briefly the state-of-the-art methods for different types of scenarios, first describing the historical shallow methods, addressing both the homogeneous and the heterogeneous domain adaptation methods. Third, we discuss the effect of the success of deep convolutional architectures which led to new type of domain adaptation methods that integrate the adaptation within the deep architecture. Fourth, we overview the methods that go beyond image categorization, such as object detection or image segmentation, video analyses or learning visual attributes. Finally, we conclude the paper with a section where we relate domain adaptation to other machine learning solutions.

454 citations

Journal ArticleDOI
TL;DR: This paper proposes Heterogeneous Feature Augmentation (HFA), a novel SVM-based method for heterogeneous domain adaptation, together with a semi-supervised extension (SHFA) that simultaneously learns the target classifier and infers the labels of unlabeled target samples; experiments show that SHFA and HFA outperform the existing HDA methods.
Abstract: In this paper, we study the heterogeneous domain adaptation (HDA) problem, in which the data from the source domain and the target domain are represented by heterogeneous features with different dimensions. By introducing two different projection matrices, we first transform the data from two domains into a common subspace such that the similarity between samples across different domains can be measured. We then propose a new feature mapping function for each domain, which augments the transformed samples with their original features and zeros. Existing supervised learning methods ( e.g., SVM and SVR) can be readily employed by incorporating our newly proposed augmented feature representations for supervised HDA. As a showcase, we propose a novel method called Heterogeneous Feature Augmentation (HFA) based on SVM. We show that the proposed formulation can be equivalently derived as a standard Multiple Kernel Learning (MKL) problem, which is convex and thus the global solution can be guaranteed. To additionally utilize the unlabeled data in the target domain, we further propose the semi-supervised HFA (SHFA) which can simultaneously learn the target classifier as well as infer the labels of unlabeled target samples. Comprehensive experiments on three different applications clearly demonstrate that our SHFA and HFA outperform the existing HDA methods.
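
As a rough illustration of the augmented feature mapping the abstract describes (a sketch under assumed notation; P and Q stand in for the projections HFA learns, and the function names are hypothetical):

```python
# Hedged sketch of HFA-style feature augmentation: both domains are mapped
# into a common subspace and padded with their original features and zeros.
import numpy as np

def augment_source(x_s, P, d_t):
    """Source sample x_s of dimension d_s -> [P @ x_s ; x_s ; zeros(d_t)]."""
    return np.concatenate([P @ x_s, x_s, np.zeros(d_t)])

def augment_target(x_t, Q, d_s):
    """Target sample x_t of dimension d_t -> [Q @ x_t ; zeros(d_s) ; x_t]."""
    return np.concatenate([Q @ x_t, np.zeros(d_s), x_t])

# Both augmented vectors live in the same (d_c + d_s + d_t)-dimensional space,
# so a single SVM can be trained on source and target data jointly; the paper
# then optimizes P and Q via an equivalent convex MKL formulation.
```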

435 citations


Cites methods from "Cross-Language Text Classification ..."

  • ...Sentiment Classification: We use the Cross-Lingual Sentiment (CLS) dataset [36], which is an extended version of the Multi-Domain Sentiment Dataset [2] widely used for domain adaptation....

Posted Content
TL;DR: A new learning method is proposed for heterogeneous domain adaptation (HDA), in which the data from the source domain and the target domain are represented by heterogeneous features with different dimensions; comprehensive experiments demonstrate that the proposed HFA method outperforms the existing HDA methods.
Abstract: We propose a new learning method for heterogeneous domain adaptation (HDA), in which the data from the source domain and the target domain are represented by heterogeneous features with different dimensions. Using two different projection matrices, we first transform the data from two domains into a common subspace in order to measure the similarity between the data from two domains. We then propose two new feature mapping functions to augment the transformed data with their original features and zeros. The existing learning methods (e.g., SVM and SVR) can be readily incorporated with our newly proposed augmented feature representations to effectively utilize the data from both domains for HDA. Using the hinge loss function in SVM as an example, we introduce the detailed objective function in our method called Heterogeneous Feature Augmentation (HFA) for a linear case and also describe its kernelization in order to efficiently cope with the data with very high dimensions. Moreover, we also develop an alternating optimization algorithm to effectively solve the nontrivial optimization problem in our HFA method. Comprehensive experiments on two benchmark datasets clearly demonstrate that HFA outperforms the existing HDA methods.

315 citations


Cites background or methods from "Cross-Language Text Classification ..."

  • ...Based on structural correspondence learning (Blitzer et al., 2006), two methods (Prettenhofer & Stein, 2010; Wei & Pal, 2010) were recently proposed to extract the so-called pivot features from the source and target domains, which is specifically designed for the cross-language text classification…

  • ...The pioneer works (Dai et al., 2009; Prettenhofer & Stein, 2010; Wei & Pal, 2010; Yang et al., 2009; Zhu et al., 2011) are limited to some specific HDA tasks, because they required additional information to transfer the source knowledge to the target domain....

Proceedings Article
16 Jun 2013
TL;DR: A novel multi-view learning model is proposed that integrates all features and learns a weight for every feature with respect to each cluster individually, via new joint structured sparsity-inducing norms.
Abstract: Combining information from various data sources has become an important research topic in machine learning with many scientific applications. Most previous studies employ kernels or graphs to integrate different types of features, which routinely assume one weight for one type of features. However, for many problems, the importance of features in one source to an individual cluster of data can be varied, which makes the previous approaches ineffective. In this paper, we propose a novel multi-view learning model to integrate all features and learn the weight for every feature with respect to each cluster individually via new joint structured sparsity-inducing norms. The proposed multi-view learning framework allows us not only to perform clustering tasks, but also to deal with classification tasks by an extension when the labeling knowledge is available. A new efficient algorithm is derived to solve the formulated objective with rigorous theoretical proof on its convergence. We applied our new data fusion method to five broadly used multi-view data sets for both clustering and classification. In all experimental results, our method clearly outperforms other related state-of-the-art methods.

281 citations


Cites methods from "Cross-Language Text Classification ..."

  • ...With the advances of machine translation techniques, one can easily get different translations for one document (Prettenhofer & Stein, 2010), and the translation in each language can be considered as a view....

References
Journal ArticleDOI
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the p ≫ n case.
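
For orientation, the naive elastic net criterion is commonly written as follows (standard notation, assumed rather than copied from the paper):

$$\hat{\beta} = \operatorname*{argmin}_{\beta} \; \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$$

The $\ell_1$ term induces sparsity, while the $\ell_2$ term produces the grouping effect among strongly correlated predictors.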

16,538 citations


"Cross-Language Text Classification ..." refers background or methods in this paper

  • ...An alternative view of cross-language structural correspondence learning is provided by the framework of structural learning (Ando and Zhang, 2005a)....

  • ...SCL is related to the structural learning paradigm introduced by Ando and Zhang (2005a)....

  • ...Following Ando and Zhang (2005a) and Quattoni et al. (2007) we choose w for the target task to be $w^* = \theta^T v^*$, where $v^*$ is defined as follows: $$v^* = \operatorname*{argmin}_{v \in \mathbb{R}^k} \sum_{(x,y) \in D_S} L\big(y, (\theta^T v)^T x\big) + \frac{\lambda}{2}\|v\|^2 \qquad (3)$$ Since $(\theta^T v)^T = v^T \theta$ it follows that this view of CL-SCL corresponds to the induction of a new…

  • ...Here we propose a different approach to crosslanguage text classification which adopts ideas from the field of multi-task learning (Ando and Zhang, 2005a)....

  • ...Ando and Zhang (2005b) present a semi-supervised learning method based on this paradigm, which generates related tasks from unlabeled data....

01 Jan 2002
TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered, and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,980 citations

Proceedings ArticleDOI
06 Jul 2002
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.

6,626 citations


"Cross-Language Text Classification ..." refers result in this paper

  • ...The average accuracy is about 82%, which is consistent with prior work on monolingual sentiment analysis (Pang et al., 2002; Blitzer et al., 2007)....

Proceedings Article
01 Jun 2007
TL;DR: This work extends to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline.
Abstract: Automatic sentiment classification has been extensively studied and applied in recent years. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is impractical. We investigate domain adaptation for sentiment classifiers, focusing on online reviews for different types of products. First, we extend to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline. Second, we identify a measure of domain similarity that correlates well with the potential for adaptation of a classifier from one domain to another. This measure could for instance be used to select a small set of domains to annotate whose trained classifiers would transfer well to many other domains.

2,239 citations


"Cross-Language Text Classification ..." refers background or result in this paper

  • ...The average accuracy is about 82%, which is consistent with prior work on monolingual sentiment analysis (Pang et al., 2002; Blitzer et al., 2007)....

  • ...Following Blitzer et al. (2007) a review with >3 (<3) stars is labeled as positive (negative); other reviews are discarded....

  • ...The corpus is extended with English product reviews provided by Blitzer et al. (2007)....

Journal ArticleDOI
TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
Abstract: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $\Omega(1/\epsilon^2)$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with $1/\lambda$, where $\lambda$ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $\tilde{O}(d/(\lambda\epsilon))$, where $d$ is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.
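
The single-example update the abstract analyzes can be sketched as below (a minimal PEGASOS-style variant with the optional projection step; the function name and defaults are illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.01, T=100_000, seed=0):
    """X: (n, d) training examples, y: (n,) labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, T + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)              # the 1/(lambda * t) learning rate schedule
        w *= (1.0 - eta * lam)             # shrink: sub-gradient of the regularizer
        if y[i] * (w @ X[i]) < 1.0:        # hinge-loss margin violated
            w += eta * y[i] * X[i]
        norm = np.linalg.norm(w)           # optional projection onto the ball
        if norm > 1.0 / np.sqrt(lam):      # of radius 1/sqrt(lambda)
            w *= 1.0 / (np.sqrt(lam) * norm)
    return w
```

Since each iteration touches a single example, the run-time is independent of the training-set size, which is what makes the method attractive for large text classification problems.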

2,037 citations


"Cross-Language Text Classification ..." refers methods in this paper

  • ...In particular, the learning rate schedule from PEGASOS is adopted (Shalev-Shwartz et al., 2007), and the modified Huber loss, introduced by Zhang (2004), is chosen as loss function L. SGD receives two hyperparameters as input: the number of iterations T, and the regularization parameter λ....
