Open Access · Proceedings Article (DOI)

Multinomial Adversarial Networks for Multi-Domain Text Classification

Xilun Chen, +1 more · Vol. 1, pp. 1226–1240
TL;DR: A multinomial adversarial network (MAN) is proposed to tackle the real-world problem of multi-domain text classification (MDTC), in which labeled data may exist for multiple domains but in insufficient amounts to train effective classifiers for one or more of them.
Abstract
Many text classification tasks are known to be highly domain-dependent. Unfortunately, the availability of training data can vary drastically across domains. Worse still, for some domains there may not be any annotated data at all. In this work, we propose a multinomial adversarial network (MAN) to tackle this real-world problem of multi-domain text classification (MDTC) in which labeled data may exist for multiple domains, but in insufficient amounts to train effective classifiers for one or more of the domains. We provide theoretical justifications for the MAN framework, proving that different instances of MANs are essentially minimizers of various f-divergence metrics (Ali and Silvey, 1966) among multiple probability distributions. MANs are thus a theoretically sound generalization of traditional adversarial networks that discriminate over two distributions. More specifically, for the MDTC task, MAN learns features that are invariant across multiple domains by resorting to its ability to reduce the divergence among the feature distributions of each domain. We present experimental results showing that MANs significantly outperform the prior art on the MDTC task. We also show that MANs achieve state-of-the-art performance for domains with no labeled data.
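To make the setup concrete, here is a minimal PyTorch sketch of the multinomial adversarial idea; it is illustrative, not the authors' released code. The sizes, the λ weight, and the use of pre-pooled input features are assumptions, and the paper's domain-specific feature extractors are omitted. A shared encoder feeds both a text classifier and a multinomial (N-domain) discriminator; the discriminator learns to identify the domain of shared features, while the encoder is trained to defeat it (in the spirit of the paper's MAN-NLL objective), driving the per-domain feature distributions together:

```python
# Illustrative MAN-style sketch (assumed names and sizes, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_DOMAINS, NUM_CLASSES, IN_DIM, HID = 4, 2, 100, 64

shared_encoder = nn.Sequential(nn.Linear(IN_DIM, HID), nn.ReLU())  # F_s
classifier = nn.Linear(HID, NUM_CLASSES)                           # C
discriminator = nn.Linear(HID, NUM_DOMAINS)                        # D: multinomial

opt_main = torch.optim.Adam(
    list(shared_encoder.parameters()) + list(classifier.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

def train_step(x, y, d, lam=0.1):
    """x: pooled text features, y: class labels, d: domain ids."""
    # 1) Train D to recognize the domain of shared features (NLL over domains).
    opt_disc.zero_grad()
    d_loss = F.cross_entropy(discriminator(shared_encoder(x).detach()), d)
    d_loss.backward()
    opt_disc.step()

    # 2) Train F_s and C: classify correctly while *maximizing* D's loss,
    #    which pushes the per-domain feature distributions together.
    opt_main.zero_grad()
    feats = shared_encoder(x)
    loss = (F.cross_entropy(classifier(feats), y)
            - lam * F.cross_entropy(discriminator(feats), d))
    loss.backward()
    opt_main.step()
```

Because D's output is a softmax over all N domains rather than a binary real/fake decision, the same alternating updates apply however many domains there are, which is what the paper's f-divergence analysis generalizes.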


Citations
Posted Content

WINOGRANDE: An Adversarial Winograd Schema Challenge at Scale

TL;DR: The authors introduce WinoGrande, a large-scale dataset of 44k problems inspired by the original Winograd Schema Challenge (WSC) design, but adjusted to improve both the scale and the hardness of the dataset.
Proceedings Article (DOI)

Multi-Source Domain Adaptation with Mixture of Experts

TL;DR: This article proposes a mixture-of-experts approach for unsupervised domain adaptation from multiple sources, which explicitly captures the relationship between a target example and different source domains. Their approach is, however, limited to sentiment analysis and part-of-speech tagging.
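A minimal sketch of the mixture-of-experts idea described in this TL;DR; the module names, gating network, and dimensions are illustrative assumptions, not the paper's code. Each source domain gets its own expert, and a gate scores how related an example is to each source:

```python
# Illustrative mixture-of-experts over source domains (assumed architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSourceExperts(nn.Module):
    def __init__(self, num_sources, feat_dim, num_classes):
        super().__init__()
        # One expert classifier per source domain.
        self.experts = nn.ModuleList(
            [nn.Linear(feat_dim, num_classes) for _ in range(num_sources)])
        # Gate: scores the example-to-source-domain relationship.
        self.gate = nn.Linear(feat_dim, num_sources)

    def forward(self, x):
        alpha = F.softmax(self.gate(x), dim=-1)               # (B, S) weights
        preds = torch.stack([e(x) for e in self.experts], 1)  # (B, S, C)
        return (alpha.unsqueeze(-1) * preds).sum(dim=1)       # weighted vote
```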
Proceedings Article (DOI)

Multi-Source Cross-Lingual Model Transfer: Learning What to Share

TL;DR: This model leverages adversarial networks to learn language-invariant features and mixture-of-experts models to dynamically exploit the similarity between the target language and each individual source language, further boosting target-language performance.
Posted Content

Adversarial Training in Affective Computing and Sentiment Analysis: Recent Advances and Perspectives

TL;DR: This paper provides a comprehensive overview of the application of adversarial training to affective computing and sentiment analysis, and highlights a range of potential future research directions.
Proceedings Article (DOI)

Domain-agnostic Question-Answering with Adversarial Training

TL;DR: An adversarial training framework is utilized for domain generalization in the question answering (QA) task, in which two models constantly compete so that the QA model learns domain-invariant features.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
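For quick reference, the Adam update from the paper, where g_t is the gradient at step t and the paper's suggested defaults are β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸:

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\,g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\,g_t^2, \\
\hat{m}_t &= m_t / (1-\beta_1^t), &
\hat{v}_t &= v_t / (1-\beta_2^t), \\
\theta_t &= \theta_{t-1} - \alpha\,\hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
\end{aligned}
```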
Proceedings Article

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
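The transform normalizes each activation over a mini-batch of size m and then rescales it with learned parameters γ and β:

```latex
\begin{aligned}
\mu_B &= \tfrac{1}{m}\textstyle\sum_{i=1}^{m} x_i, &
\sigma_B^2 &= \tfrac{1}{m}\textstyle\sum_{i=1}^{m} (x_i - \mu_B)^2, \\
\hat{x}_i &= \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, &
y_i &= \gamma \hat{x}_i + \beta
\end{aligned}
```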
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
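Of the two architectures, the skip-gram variant trains each word to predict its context; its objective is usually written as the average log-probability of context words within a window of size c, with the conditional probability given by a softmax over the vocabulary (this formulation follows the authors' companion work on skip-gram):

```latex
\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp({v'_{w_O}}^{\top} v_{w_I})}{\sum_{w=1}^{W} \exp({v'_w}^{\top} v_{w_I})}
```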
Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.
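The soft search is realized as an attention mechanism: an alignment score between the previous decoder state s_{i-1} and each encoder annotation h_j is normalized into weights, which mix the annotations into a per-step context vector c_i:

```latex
e_{ij} = a(s_{i-1}, h_j), \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
c_i = \sum_{j=1}^{T_x} \alpha_{ij}\, h_j
```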

Automatic differentiation in PyTorch

TL;DR: This paper describes the automatic differentiation module of PyTorch, a library designed to enable rapid research on machine learning models; it differentiates purely imperative programs, with a focus on extensibility and low overhead.
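A minimal example of the imperative style this enables, using the standard torch API: operations on tensors with requires_grad=True are recorded as they run, and reverse-mode differentiation is invoked on demand.

```python
# PyTorch records imperative tensor operations and differentiates them on demand.
import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # y = x1^2 + x2^2, built as ordinary imperative code
y.backward()         # reverse-mode automatic differentiation
print(x.grad)        # tensor([4., 6.]) == dy/dx = 2x
```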