Open Access Posted Content

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information

TL;DR
The authors propose a densely-connected co-attentive recurrent neural network (DRCN) in which each layer uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers.
Abstract
Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. These tasks require understanding the logical and semantic relationship between two sentences, which remains challenging. Although attention mechanisms are useful for capturing the semantic relationship and properly aligning the elements of two sentences, previous attention-based methods simply use a summation operation, which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers. This preserves the original and co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the ever-increasing size of the feature vectors caused by dense concatenation, we also propose using an autoencoder after each dense concatenation. We evaluate the proposed architecture on highly competitive benchmark datasets for sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance on most of the tasks.
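For intuition, here is a minimal sketch of one densely-connected co-attentive block, assuming PyTorch; the module names, dimension choices, and the bidirectional LSTM are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenseCoAttentiveBlock(nn.Module):
        """One layer: RNN encoding, co-attention, dense concatenation, compression."""
        def __init__(self, input_dim, hidden_dim, bottleneck_dim):
            super().__init__()
            self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
            # Autoencoder-style projection that curbs the growth of the
            # densely concatenated feature vector.
            self.compress = nn.Linear(input_dim + 4 * hidden_dim, bottleneck_dim)

        def forward(self, p, q):
            # p: (batch, len_p, input_dim), q: (batch, len_q, input_dim)
            h_p, _ = self.rnn(p)  # (batch, len_p, 2*hidden_dim)
            h_q, _ = self.rnn(q)
            # Co-attention: align every position of one sentence with the other.
            scores = torch.bmm(h_p, h_q.transpose(1, 2))    # (batch, len_p, len_q)
            a_p = torch.bmm(F.softmax(scores, dim=2), h_q)  # attended features for p
            a_q = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), h_p)
            # Dense connection: keep the layer input alongside the hidden and
            # co-attentive features instead of summing them away.
            p_out = self.compress(torch.cat([p, h_p, a_p], dim=-1))
            q_out = self.compress(torch.cat([q, h_q, a_q], dim=-1))
            return p_out, q_out

Because each block's input is carried into its output before compression, stacking blocks propagates the word-embedding and earlier attentive features upward, while the linear bottleneck keeps the concatenated vector from growing without bound.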


Citations
Proceedings ArticleDOI

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models

TL;DR: The authors propose knowledge-grounded dialogue generation built on pre-trained language models, together with an unsupervised approach that jointly optimizes knowledge selection and response generation using unlabeled dialogues.
Posted Content

Named Entity Recognition for Social Media Texts with Semantic Augmentation.

TL;DR: A neural approach to NER for social media texts that takes both local and augmented semantics into account, with an attentive semantic augmentation module and a gate module proposed to encode and aggregate such information.
Posted Content

Knowledge-Grounded Dialogue Generation with Pre-trained Language Models.

TL;DR: Empirical results indicate that the proposed model, which defines response generation with a pre-trained language model equipped with a knowledge selection module and jointly optimizes knowledge selection and response generation on unlabeled dialogues in an unsupervised manner, significantly outperforms state-of-the-art methods in both automatic evaluation and human judgment.
Posted Content

Simple but effective techniques to reduce biases.

TL;DR: This work introduces an additional lightweight bias-only model that learns dataset biases and uses its predictions to adjust the loss of the base model, reducing the biases the base model learns.
Journal ArticleDOI

DRr-Net: Dynamic Re-Read Network for Sentence Semantic Matching

TL;DR: A Dynamic Re-read Network (DRr-Net) for sentence semantic matching that pays close attention to a small region of the sentences at each step and re-reads the important words for better sentence semantic understanding.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework that eases the training of networks substantially deeper than those used previously; the resulting models won 1st place on the ILSVRC 2015 classification task.
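In code terms, the idea is that stacked layers learn a residual mapping F(x) and output F(x) + x through an identity shortcut. A hypothetical minimal block, assuming PyTorch (the published architecture also uses batch normalization and projection shortcuts, omitted here for brevity):

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
            self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

        def forward(self, x):
            out = torch.relu(self.conv1(x))
            out = self.conv2(out)
            # Identity shortcut: the stacked convolutions only learn F(x).
            return torch.relu(out + x)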
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
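For reference, the GloVe objective is a weighted least-squares regression on the log co-occurrence counts, where X_ij is the number of times word j occurs in the context of word i and f is a saturating weighting function:

    J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2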
Proceedings ArticleDOI

Densely Connected Convolutional Networks

TL;DR: DenseNet, as proposed in this paper, connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
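As a quick illustration of that connectivity pattern, here is a hypothetical minimal dense block, assuming PyTorch; the published DenseNet additionally uses batch normalization, bottleneck layers, and transition layers, omitted here:

    import torch
    import torch.nn as nn

    class DenseBlock(nn.Module):
        def __init__(self, in_channels, growth_rate, num_layers):
            super().__init__()
            # Layer i consumes the input channels plus all i earlier outputs.
            self.layers = nn.ModuleList([
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1)
                for i in range(num_layers)
            ])

        def forward(self, x):
            features = [x]
            for conv in self.layers:
                # Each layer sees the concatenation of all preceding feature maps.
                features.append(torch.relu(conv(torch.cat(features, dim=1))))
            return torch.cat(features, dim=1)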
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
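For a center word w with observed context word c, the negative-sampling objective from this paper replaces the full softmax with a sum of k logistic terms over noise words w_i drawn from a noise distribution P_n(w):

    \log \sigma\left( {v'_c}^\top v_w \right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\left( -{v'_{w_i}}^\top v_w \right) \right]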
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships, together with extensions that improve both the quality of the vectors and the training speed.