Open Access Posted Content

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information

TL;DR
The authors propose a densely-connected co-attentive recurrent neural network (DRCN) in which each layer uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers.
Abstract
Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. These tasks require understanding the logical and semantic relationship between two sentences, which remains challenging. Although attention mechanisms are useful for capturing the semantic relationship and properly aligning the elements of two sentences, previous attention-based methods simply use a summation operation, which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers. This preserves the original and co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the ever-increasing size of the feature vectors caused by dense concatenation, we also propose using an autoencoder after each dense concatenation. We evaluate the proposed architecture on highly competitive benchmark datasets for sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance on most of the tasks.
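For intuition, here is a minimal sketch of one densely-connected co-attentive block, assuming PyTorch; the module names, dimension choices, and the bidirectional LSTM are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenseCoAttentiveBlock(nn.Module):
        """One layer: RNN encoding, co-attention, dense concatenation, compression."""
        def __init__(self, input_dim, hidden_dim, bottleneck_dim):
            super().__init__()
            self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
            # Autoencoder-style projection that curbs the growth of the
            # densely concatenated feature vector.
            self.compress = nn.Linear(input_dim + 4 * hidden_dim, bottleneck_dim)

        def forward(self, p, q):
            # p: (batch, len_p, input_dim), q: (batch, len_q, input_dim)
            h_p, _ = self.rnn(p)  # (batch, len_p, 2*hidden_dim)
            h_q, _ = self.rnn(q)
            # Co-attention: align every position of one sentence with the other.
            scores = torch.bmm(h_p, h_q.transpose(1, 2))    # (batch, len_p, len_q)
            a_p = torch.bmm(F.softmax(scores, dim=2), h_q)  # attended features for p
            a_q = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), h_p)
            # Dense connection: keep the layer input alongside the hidden and
            # co-attentive features instead of summing them away.
            p_out = self.compress(torch.cat([p, h_p, a_p], dim=-1))
            q_out = self.compress(torch.cat([q, h_q, a_q], dim=-1))
            return p_out, q_out

Because each block's input is carried into its output before compression, stacking blocks propagates the word-embedding and earlier attentive features upward, while the linear bottleneck keeps the concatenated vector from growing without bound.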


Citations
Proceedings ArticleDOI

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models

TL;DR: The authors propose knowledge-grounded dialogue generation built on pre-trained language models, together with an unsupervised approach that jointly optimizes knowledge selection and response generation using unlabeled dialogues.
Posted Content

Named Entity Recognition for Social Media Texts with Semantic Augmentation.

TL;DR: A neural approach to NER for social media texts that takes both local and augmented semantics into account, with an attentive semantic augmentation module and a gate module proposed to encode and aggregate such information.
Posted Content

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models.

TL;DR: Empirical results indicate that the proposed model, which defines response generation with a pre-trained language model equipped with a knowledge selection module and jointly optimizes knowledge selection and response generation on unlabeled dialogues in an unsupervised manner, significantly outperforms state-of-the-art methods in both automatic evaluation and human judgment.
Posted Content

Simple but effective techniques to reduce biases.

TL;DR: This work introduces an additional lightweight bias-only model that learns dataset biases and uses its predictions to adjust the loss of the base model, reducing the biases the base model learns.
Journal ArticleDOI

DRr-Net: Dynamic Re-Read Network for Sentence Semantic Matching

TL;DR: A Dynamic Re-read Network (DRr-Net) for sentence semantic matching that pays close attention to a small region of the sentences at each step and re-reads the important words for better sentence semantic understanding.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting models won 1st place on the ILSVRC 2015 classification task.
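In code terms, the idea is that stacked layers learn a residual mapping F(x) and output F(x) + x through an identity shortcut. A hypothetical minimal block, assuming PyTorch (the published architecture also uses batch normalization and projection shortcuts, omitted here for brevity):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            out = torch.relu(self.conv1(x))
            out = self.conv2(out)
            # Identity shortcut: the stacked convolutions only learn F(x).
            return torch.relu(out + x)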
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
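For reference, the GloVe objective is a weighted least-squares regression on the log co-occurrence counts, where X_ij is the number of times word j occurs in the context of word i and f is a saturating weighting function:

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2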
Proceedings ArticleDOI

Densely Connected Convolutional Networks

TL;DR: DenseNet, as proposed in this paper, connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
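As a quick illustration of that connectivity pattern, here is a hypothetical minimal dense block, assuming PyTorch; the published DenseNet additionally uses batch normalization, bottleneck layers, and transition layers, omitted here:

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        def __init__(self, in_channels, growth_rate, num_layers):
            super().__init__()
            # Layer i consumes the input channels plus all i earlier outputs.
            self.layers = nn.ModuleList([
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1)
                for i in range(num_layers)
            ])

        def forward(self, x):
            features = [x]
            for conv in self.layers:
                # Each layer sees the concatenation of all preceding feature maps.
                features.append(torch.relu(conv(torch.cat(features, dim=1))))
            return torch.cat(features, dim=1)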
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
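For a center word w with observed context word c, the negative-sampling objective from this paper replaces the full softmax with a sum of k logistic terms over noise words w_i drawn from a noise distribution P_n(w):

    \log \sigma\left( {v'_c}^\top v_w \right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\left( -{v'_{w_i}}^\top v_w \right) \right]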
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships, together with extensions that improve both the quality of the vectors and the training speed.