Open Access Proceedings Article

Cross-Language Text Classification Using Structural Correspondence Learning

TLDR
This paper presents a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents and a simple word translation oracle to induce task-specific, cross-lingual word correspondences.
Abstract
We present a new approach to cross-language text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce task-specific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of inter-language correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
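The idea of inducing cross-lingual word correspondences from unlabeled documents can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`train_pivot_predictor`, `cl_scl_projection`), the logistic loss, the plain gradient-descent training, and the random toy data are all assumptions made for the example. Each pivot is assumed to be a pair of vocabulary indices (a source word and its translation returned by the oracle), and the documents are assumed to be bag-of-words vectors over a joint source and target vocabulary.

```python
import numpy as np

def train_pivot_predictor(X, y, epochs=20, lr=0.1, l2=1e-3):
    """Fit a linear classifier (logistic loss + L2, an assumed choice)
    by batch gradient descent. Labels y are in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        margins = y * (X @ w)
        # Gradient of mean log(1 + exp(-margin)) plus the L2 penalty.
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0) + l2 * w
        w -= lr * grad
    return w

def cl_scl_projection(X_unlabeled, pivots, k):
    """Return a (k, vocab) projection built from pivot predictors.

    X_unlabeled: (n_docs, vocab) unlabeled documents from both languages.
    pivots: list of (source_index, target_index) word pairs (hypothetical
            output of a word translation oracle).
    """
    weight_vectors = []
    for s, t in pivots:
        # Label: does the document contain either word of the pivot pair?
        y = np.where((X_unlabeled[:, s] > 0) | (X_unlabeled[:, t] > 0), 1.0, -1.0)
        # Mask the pivot words so they must be predicted from the other words.
        X_masked = X_unlabeled.copy()
        X_masked[:, [s, t]] = 0.0
        weight_vectors.append(train_pivot_predictor(X_masked, y))
    W = np.stack(weight_vectors, axis=1)          # (vocab, n_pivots)
    U, _, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, :k].T                             # top-k left singular vectors

# Toy data: 40 random documents over an 8-word joint vocabulary.
rng = np.random.default_rng(0)
X_toy = (rng.random((40, 8)) < 0.3).astype(float)
toy_pivots = [(0, 4), (1, 5)]                     # assumed oracle output
theta = cl_scl_projection(X_toy, toy_pivots, k=2)
```

Under these assumptions, a classifier trained on the projected source-language documents (`X_source @ theta.T`) could then be applied to target-language documents projected the same way.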

Citations
Proceedings Article

Deep Pivot-Based Modeling for Cross-language Cross-domain Transfer with Minimal Guidance

TL;DR: This work proposes a framework that builds on pivot-based learning, structure-aware deep neural networks, and bilingual word embeddings, with the goal of training a model on labeled data from one (language, domain) pair so that it can be effectively applied to another (language, domain) pair.
Posted Content

Transfer Learning for Cross-Dataset Recognition: A Survey

TL;DR: A taxonomy of cross-dataset scenarios and problems is proposed according to the properties of the data that define how different datasets diverge, and the recent advances on each specific problem under the different scenarios are reviewed.
Proceedings Article

Modeling Review Argumentation for Robust Sentiment Analysis

TL;DR: It is claimed that even a shallow model of the argumentation of a text allows for an effective and more robust classification, while providing intuitive explanations of the classification results.
Posted Content

Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

TL;DR: The authors presented an extensive literature survey on the use of typological information in the development of NLP techniques and showed that to date, using information in existing typological databases has resulted in consistent but modest improvements in system performance, due to both intrinsic limitations of databases and under-employment of the typological features included in them.
Journal Article

Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification.

TL;DR: In this paper, a distributional correspondence indexing (DCI) method is proposed for domain adaptation in sentiment classification, which derives term representations in a vector space common to both domains, where each dimension reflects its distributional correspondence to a pivot, i.e., a highly predictive term that behaves similarly across domains.
References
Journal Article

Regularization and variable selection via the elastic net

TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lasso.

Thumbs up? Sentiment Classification using Machine Learning Techniques

TL;DR: In this paper, the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, was considered, and three machine learning methods (Naive Bayes, maximum entropy classification, and support vector machines) were employed.
Proceedings Article

Thumbs up? Sentiment Classification using Machine Learning Techniques

TL;DR: This work considers the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative, and concludes by examining factors that make the sentiment classification problem more challenging.
Proceedings Article

Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification

TL;DR: This work extends to sentiment classification the recently-proposed structural correspondence learning (SCL) algorithm, reducing the relative error due to adaptation between domains by an average of 30% over the original SCL algorithm and 46% over a supervised baseline.
Journal Article

Pegasos: primal estimated sub-gradient solver for SVM

TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines, which is particularly well suited for large text classification problems, and demonstrates an order-of-magnitude speedup over previous SVM learning methods.
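
A stochastic sub-gradient solver of the kind summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the reference implementation: the function name `pegasos`, the default hyperparameters, the single-example updates without a bias term, and the toy data are all chosen for the example.

```python
import numpy as np

def pegasos(X, y, lam=0.01, n_iters=2000, seed=0):
    """Stochastic sub-gradient descent on the L2-regularized hinge loss.

    X: (n_samples, n_features) array; y: labels in {-1, +1}.
    No bias term is learned in this sketch.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for t in range(1, n_iters + 1):
        i = rng.integers(X.shape[0])      # pick one training example at random
        eta = 1.0 / (lam * t)             # decreasing step size
        violated = y[i] * (X[i] @ w) < 1.0
        w = (1.0 - eta * lam) * w         # shrink (gradient of the regularizer)
        if violated:
            w = w + eta * y[i] * X[i]     # hinge-loss sub-gradient step
        # Optional projection onto the ball of radius 1/sqrt(lam).
        norm = np.linalg.norm(w)
        if norm > 0.0:
            w = w * min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w

# Toy, linearly separable data (made up for illustration).
X_toy = np.array([[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])
w_toy = pegasos(X_toy, y_toy)
predictions = np.sign(X_toy @ w_toy)
```

Each iteration touches a single example, so the per-step cost depends on the number of features rather than on the size of the training set.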