Open Access · Proceedings Article
Cross-Language Text Classification Using Structural Correspondence Learning
Peter Prettenhofer, Benno Stein
pp. 1118–1127
Abstract
We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling.
We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
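As a rough illustration of the mechanism the abstract describes (not the authors' implementation), the sketch below trains one linear "pivot predictor" per pivot feature on unlabeled documents and takes the top left-singular vectors of the stacked weight vectors as a low-dimensional cross-lingual projection. The joint-vocabulary matrix layout, the masking step, and the SGD schedule are all assumptions for the sketch:

```python
import numpy as np

def cl_scl_projection(X_unlabeled, pivot_cols, dim=2, seed=0):
    """Sketch of the structural-correspondence core: for each pivot (a
    source word paired with its oracle translation), train a linear
    predictor of pivot occurrence from the remaining features on
    unlabeled documents, then SVD the stacked weight vectors to obtain
    a cross-lingual feature projection.

    X_unlabeled: (n_docs, n_feats) binary term matrix over a joint
                 source+target vocabulary (hypothetical layout).
    pivot_cols:  column indices of the pivot features.
    """
    rng = np.random.default_rng(seed)
    n, d = X_unlabeled.shape
    W = np.zeros((d, len(pivot_cols)))
    for j, p in enumerate(pivot_cols):
        y = np.where(X_unlabeled[:, p] > 0, 1.0, -1.0)
        X = X_unlabeled.copy()
        X[:, p] = 0.0                      # mask the pivot itself
        w = np.zeros(d)
        for t in range(1, 201):            # plain SGD on the hinge loss
            i = rng.integers(n)
            if y[i] * (X[i] @ w) < 1:
                w += (1.0 / t) * y[i] * X[i]
        W[:, j] = w
    # top left-singular vectors span the correspondence subspace theta
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :dim].T                    # (dim, d) projection

```

Labeled source-language documents would then be projected through `theta` before training the final classifier, which is what lets the model transfer to the target language.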
Citations
Proceedings Article · DOI
Genetic Programming for Domain Adaptation in Product Reviews
TL;DR: This work models the features in each sentence with a variable-length tree called a Genetic Program, and outperforms baseline multi-domain models by 5–20% in accuracy.
Proceedings Article · DOI
Transition-based Adversarial Network for Cross-lingual Aspect Extraction
Wenya Wang, Sinno Jialin Pan, et al.
TL;DR: A novel deep model to transfer knowledge from a source language with labeled training data to a target language without any annotations is developed and achieves state-of-the-art performance on English, French and Spanish restaurant review datasets.
Journal Article · DOI
Cross-Language Latent Relational Search between Japanese and English Languages Using a Web Corpus
TL;DR: To perform cross-language latent relational search at high speed, a multilingual indexing method is proposed for storing entities and the lexical patterns that represent the semantic relations extracted from Web corpora; a hybrid lexical-pattern clustering algorithm captures the semantic similarity between lexical patterns across languages.
Proceedings Article · DOI
Cross-Domain Labeled LDA for Cross-Domain Text Classification
TL;DR: This work proposes a novel group alignment that aligns semantics at the group level, embeds it into a cross-domain topic model, and thereby obtains a Cross-Domain Labeled LDA (CDL-LDA).
Proceedings Article · DOI
Segmentation-Free Word Embedding for Unsegmented Languages
TL;DR: This paper proposes segmentation-free word embedding, which does not require word segmentation as a preprocessing step: it computes word co-occurrence statistics over all possible segmentation candidates based on frequent character n-grams, instead of over sentences segmented by conventional word segmenters.
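A minimal illustration of the idea (the function and its parameters are hypothetical, not taken from the paper): enumerate all character n-grams as candidate "words", keep the frequent ones, and count co-occurrences by character-position window, with no word segmenter involved:

```python
from collections import Counter
from itertools import combinations

def ngram_cooccurrence(text, n_min=1, n_max=3, window=5, min_count=2):
    """Sketch of segmentation-free co-occurrence counting: every
    character n-gram is a segmentation candidate; frequent ones are
    kept, and pairs whose start positions fall within a character
    window are counted.  All parameters are illustrative."""
    # enumerate candidate n-grams with their start offsets
    grams = [(i, text[i:i + n]) for n in range(n_min, n_max + 1)
             for i in range(len(text) - n + 1)]
    freq = Counter(g for _, g in grams)
    kept = [(i, g) for i, g in grams if freq[g] >= min_count]
    cooc = Counter()
    for (i, g1), (j, g2) in combinations(kept, 2):
        if 0 < abs(i - j) <= window:
            cooc[(g1, g2)] += 1
    return cooc
```

The resulting counts could feed any co-occurrence-based embedding model; a real implementation would of course use a far more efficient windowed scan than this quadratic pairing.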
References
Journal Article · DOI
Regularization and variable selection via the elastic net
Hui Zou, Trevor Hastie
TL;DR: It is shown that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.
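The elastic net combines the l1 and l2 penalties in one objective. The toy coordinate-descent solver below is not the paper's LARS-EN algorithm (names and the 1/(2n) scaling are illustrative); it only shows how the soft-threshold step yields lasso-like sparsity while the l2 term shrinks through the denominator:

```python
import numpy as np

def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=100):
    """Toy coordinate descent for
        (1/2n)||y - Xb||^2 + lam*(alpha*||b||_1 + (1-alpha)/2*||b||_2^2),
    where alpha=1 recovers the lasso and alpha=0 ridge regression."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ b + X[:, j] * b[j]       # partial residual
            rho = X[:, j] @ r / n
            z = col_sq[j] + lam * (1 - alpha)    # l2 part shrinks here
            # soft-thresholding handles the l1 part (sparsity)
            b[j] = np.sign(rho) * max(abs(rho) - lam * alpha, 0) / z
    return b
```

With a large `lam` and `alpha=1` every coefficient falls below the threshold and the solution is exactly zero, which is the sparsity behavior the TL;DR refers to.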
Proceedings Article · DOI
Thumbs up? Sentiment Classification using Machine Learning Techniques
TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Proceedings Article
Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification
TL;DR: This work extends to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline.
Journal Article · DOI
Pegasos: primal estimated sub-gradient solver for SVM
TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
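The Pegasos update is simple enough to state in a few lines: at step t, sample one example and take a sub-gradient step with learning rate 1/(lam*t) on the regularized hinge-loss objective. The sketch below follows that schedule (the optional projection onto the ball of radius 1/sqrt(lam) is omitted, and parameter names are illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.01, n_iter=1000, seed=0):
    """Sketch of the Pegasos primal sub-gradient SVM solver for
    min_w  lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)                # sample one example
        eta = 1.0 / (lam * t)              # decaying learning rate
        if y[i] * (X[i] @ w) < 1:          # margin violated: hinge active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
        else:                              # only the l2 term contributes
            w = (1 - eta * lam) * w
    return w
```

Note that `(1 - eta*lam)` equals `(t-1)/t`, so each step partly shrinks the old iterate, which is where the method's strong convergence guarantees come from.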