Open Access Proceedings Article

Cross-Language Text Classification Using Structural Correspondence Learning

TLDR
A new approach to cross-language text classification is presented that builds on structural correspondence learning, a recently proposed theory for domain adaptation; it uses unlabeled documents, along with a simple word translation oracle, to induce task-specific, cross-lingual word correspondences.
Abstract
We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
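As a rough illustration of the pipeline the abstract describes, the sketch below trains one linear pivot predictor per translation pair on unlabeled documents from both languages, stacks the predictor weights, and uses an SVD to obtain a shared projection in which a source-language classifier is trained. The shared vocabulary layout, pivot-masking step, classifiers, and hyperparameters are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier, LogisticRegression

def cl_scl(X_src_unlab, X_tgt_unlab, pivot_pairs, X_src_lab, y_src, k=50):
    """Illustrative cross-language SCL sketch (not the authors' implementation).

    Bag-of-words matrices are assumed dense, over a shared vocabulary in which
    source- and target-language words occupy disjoint index ranges.
    pivot_pairs: (source_word_index, target_word_index) tuples from a
    word translation oracle.
    """
    X_unlab = np.vstack([X_src_unlab, X_tgt_unlab])
    W = []
    for i_src, i_tgt in pivot_pairs:
        # Pivot label: does the document contain either member of the translation pair?
        y_pivot = ((X_unlab[:, i_src] + X_unlab[:, i_tgt]) > 0).astype(int)
        if y_pivot.min() == y_pivot.max():
            continue  # skip degenerate pivots that never (or always) occur
        # Mask the pivot columns so the predictor must rely on co-occurring words.
        X_masked = X_unlab.copy()
        X_masked[:, [i_src, i_tgt]] = 0.0
        clf = SGDClassifier(loss="modified_huber", alpha=1e-4, max_iter=20)
        clf.fit(X_masked, y_pivot)
        W.append(clf.coef_.ravel())
    # The SVD of the stacked pivot-predictor weights yields the cross-lingual projection theta.
    _, _, Vt = np.linalg.svd(np.vstack(W), full_matrices=False)
    theta = Vt[:k]                                   # shape: (k, vocab_size)
    # Train the task classifier on labeled source documents in the induced space.
    task_clf = LogisticRegression(max_iter=1000)
    task_clf.fit(X_src_lab @ theta.T, y_src)
    return theta, task_clf
```

At prediction time, a target-language document is projected with the same theta before being passed to the task classifier, which is what lets source-language training data transfer across languages.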


Citations
Proceedings Article

Back to the Roots of Genres: Text Classification by Language Function

TL;DR: The linguistically motivated text classification task of language function analysis (LFA) is introduced; it focuses on one well-defined aspect of genres, namely determining whether a text is predominantly expressive, appellative, or informative.
Posted Content

Bridging the domain gap in cross-lingual document classification

TL;DR: It is shown that addressing the domain gap is crucial in cross-lingual understanding (XLU), and state-of-the-art cross-lingual methods are combined with recently proposed methods for weakly supervised learning, such as unsupervised pre-training and unsupervised data augmentation, to simultaneously close both the language gap and the domain gap.
Proceedings ArticleDOI

A Multi-lingual Annotated Dataset for Aspect-Oriented Opinion Mining

TL;DR: Trip-MAML is a multi-lingual dataset for aspect-oriented opinion mining that enables researchers to address the problem in languages other than English and to experiment with the application of cross-lingual learning methods to the task.
Book ChapterDOI

Graph-based semi-supervised learning for cross-lingual sentiment classification

TL;DR: This work proposes a new model that uses sentiment information from unlabeled as well as labeled data in a graph-based semi-supervised learning approach, so as to incorporate the intrinsic structure of unlabeled data from the target language into the learning process.
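The entry above describes a graph-based semi-supervised learner; the snippet below is only a generic illustration of that family, using scikit-learn's LabelSpreading on made-up toy data rather than the paper's own model.

```python
import numpy as np
from sklearn.semi_supervised import LabelSpreading

# Toy setup: a few labeled "source" vectors plus unlabeled "target" vectors;
# unlabeled points carry the label -1 by scikit-learn convention.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (20, 5)), rng.normal(2, 1, (20, 5))])
y_lab = np.array([0] * 20 + [1] * 20)
X_unlab = np.vstack([rng.normal(-2, 1, (50, 5)), rng.normal(2, 1, (50, 5))])

X = np.vstack([X_lab, X_unlab])
y = np.concatenate([y_lab, -np.ones(len(X_unlab), dtype=int)])

# Labels propagate over a similarity graph built from all points, so the
# intrinsic structure of the unlabeled data shapes the decision boundary.
model = LabelSpreading(kernel="rbf", gamma=0.5)
model.fit(X, y)
print(model.transduction_[len(X_lab):][:10])  # inferred labels for unlabeled points
```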
Journal ArticleDOI

Cross-species Data Classification by Domain Adaptation via Discriminative Heterogeneous Maximum Mean Discrepancy

TL;DR: This work proposes a heterogeneous domain adaptation approach based on the maximum mean discrepancy (MMD), which measures the probability divergence in an embedded low-dimensional common subspace; it aims to find new representations of the samples in a common subspace by minimizing the domain probability divergence while preserving the known discriminative information.
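The entry above builds on the maximum mean discrepancy; the sketch below shows only the basic (biased) MMD estimate with an RBF kernel on toy samples, not the paper's discriminative heterogeneous variant or its learned subspace.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel k(a, b) = exp(-gamma * ||a - b||^2).
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y."""
    Kxx = rbf_kernel(X, X, gamma)
    Kyy = rbf_kernel(Y, Y, gamma)
    Kxy = rbf_kernel(X, Y, gamma)
    return Kxx.mean() - 2.0 * Kxy.mean() + Kyy.mean()

# Samples from different Gaussians yield a noticeably larger MMD than
# samples drawn from the same distribution.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 5))
Y = rng.normal(1.0, 1.0, size=(200, 5))
print(mmd2(X, X[:100]), mmd2(X, Y))
```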
References
Journal ArticleDOI

Regularization and variable selection via the elastic net

TL;DR: It is shown that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like the LARS algorithm does for the lasso.
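For context, here is a minimal comparison of the lasso and the elastic net on correlated predictors, using scikit-learn's coordinate-descent solvers rather than the LARS-EN algorithm the paper proposes; the data and penalty settings are arbitrary.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Correlated predictors: the lasso tends to pick one column of a correlated
# group, while the elastic net spreads weight across the group yet stays sparse.
rng = np.random.default_rng(0)
z = rng.normal(size=(200, 1))
X = np.hstack([z + 0.01 * rng.normal(size=(200, 3)),  # three nearly identical columns
               rng.normal(size=(200, 7))])            # seven irrelevant columns
y = (3.0 * z).ravel() + 0.1 * rng.normal(size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)  # mixes L1 and L2 penalties
print("lasso coefficients:      ", np.round(lasso.coef_, 2))
print("elastic net coefficients:", np.round(enet.coef_, 2))
```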

Thumbs up? Sentiment Classification using Machine Learning Techniques

TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Proceedings ArticleDOI

Thumbs up? Sentiment Classification using Machine Learning Techniques

TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
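A toy illustration of the bag-of-words setup the two entries above refer to, with Naive Bayes and a linear SVM from scikit-learn; the example reviews and feature settings are made up and not the movie-review data used in the paper (maximum entropy classification would correspond to logistic regression in the same pipeline).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny stand-in corpus of positive (1) and negative (0) reviews.
reviews = ["a moving, superbly acted film",
           "wonderful story and great pacing",
           "dull, predictable, and far too long",
           "a complete waste of time"]
labels = [1, 1, 0, 0]

for model in (MultinomialNB(), LinearSVC()):
    # Binary unigram presence features, as commonly used for this task.
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 1), binary=True), model)
    clf.fit(reviews, labels)
    print(type(model).__name__, clf.predict(["great film", "waste of talent"]))
```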
Proceedings Article

Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification

TL;DR: This work extends to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline.
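The sentiment-adapted SCL described above is commonly associated with selecting pivot features by mutual information with the source labels rather than by frequency alone; the sketch below illustrates such a selection step under that assumption, with hypothetical names and thresholds.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def select_pivots_by_mi(X_src, y_src, X_tgt_unlab, min_count=10, n_pivots=100):
    """Pick pivot features that are frequent in both domains and informative for the task."""
    # Candidate pivots must occur often enough in both source and target data.
    frequent = (np.asarray(X_src.sum(axis=0)).ravel() >= min_count) & \
               (np.asarray(X_tgt_unlab.sum(axis=0)).ravel() >= min_count)
    # Rank candidates by mutual information with the source task labels.
    mi = mutual_info_classif(X_src, y_src, discrete_features=True)
    mi[~frequent] = -np.inf
    return np.argsort(mi)[::-1][:n_pivots]
```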
Journal ArticleDOI

Pegasos: primal estimated sub-gradient solver for SVM

TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
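The Pegasos update itself is compact enough to sketch: at each step it samples one example, takes a sub-gradient step on the regularized hinge loss with step size 1/(λt), and optionally projects onto the ball of radius 1/√λ. The code below is a plain NumPy rendering of that loop, not the authors' implementation; data layout and hyperparameters are illustrative.

```python
import numpy as np

def pegasos_train(X, y, lam=0.01, n_iters=10000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal SVM objective.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)
        eta = 1.0 / (lam * t)            # step size 1/(lambda * t)
        if y[i] * (w @ X[i]) < 1.0:      # hinge loss active: margin violated
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                            # only the regularizer contributes
            w = (1.0 - eta * lam) * w
        # Optional projection onto the ball of radius 1/sqrt(lambda), as in the paper.
        norm = np.linalg.norm(w)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```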