Open Access · Posted Content

Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information

TLDR
The authors propose a densely-connected co-attentive recurrent neural network (DRCN), in which each layer uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers.
Abstract
Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. These tasks require understanding the logical and semantic relationship between two sentences, which remains challenging. Although the attention mechanism is useful for capturing the semantic relationship and properly aligning the elements of two sentences, previous attention-based methods simply use a summation operation, which does not sufficiently retain the original features. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses the concatenated information of attentive features as well as the hidden features of all preceding recurrent layers. This preserves the original and co-attentive feature information from the bottommost word-embedding layer to the uppermost recurrent layer. To alleviate the ever-increasing size of the feature vectors caused by dense concatenation, we also propose using an autoencoder after the concatenation. We evaluate the proposed architecture on highly competitive benchmark datasets for sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performance on most of the tasks.
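The following is a minimal, hypothetical sketch (assuming PyTorch) of the idea described in the abstract: each layer concatenates its input features, its recurrent hidden states, and co-attentive features rather than summing them, and an autoencoder bottleneck keeps the concatenated dimension from growing without bound. Class names, layer sizes, and the use of a single shared LSTM for both sentences are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of one densely-connected co-attentive recurrent layer
# plus an autoencoder bottleneck; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseCoAttentiveLayer(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # Bidirectional RNN over the densely concatenated input features.
        self.rnn = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)

    def forward(self, p, q):
        # p, q: [batch, len, input_dim] feature stacks for the two sentences.
        hp, _ = self.rnn(p)                              # [batch, len_p, 2*hidden_dim]
        hq, _ = self.rnn(q)                              # [batch, len_q, 2*hidden_dim]
        # Co-attention: soft-align each word in p with the words in q and vice versa.
        scores = torch.bmm(hp, hq.transpose(1, 2))       # [batch, len_p, len_q]
        ap = torch.bmm(F.softmax(scores, dim=2), hq)     # attentive features for p
        aq = torch.bmm(F.softmax(scores, dim=1).transpose(1, 2), hp)  # for q
        # Dense connection: concatenate the layer input, hidden states, and
        # attentive features instead of summing them.
        return torch.cat([p, hp, ap], dim=-1), torch.cat([q, hq, aq], dim=-1)


class BottleneckAE(nn.Module):
    """Autoencoder that caps the ever-growing size of the concatenated features."""
    def __init__(self, in_dim, bottleneck_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, bottleneck_dim)
        self.dec = nn.Linear(bottleneck_dim, in_dim)     # reconstruction head for the AE loss

    def forward(self, x):
        z = torch.relu(self.enc(x))
        return z, self.dec(z)
```

Stacking several such layers, with the autoencoder applied between them, keeps the concatenated feature dimension bounded while still passing the original embeddings and all attentive features upward, which is the property the abstract emphasizes.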


Citations
Proceedings ArticleDOI

FASTMATCH: Accelerating the Inference of BERT-based Text Matching

TL;DR: A novel BERT-based text matching model in which the representations and the interactions are decoupled; it achieves up to a 100X speed-up over BERT and RoBERTa in the online matching phase while retaining more than 98.7% of their performance.
Proceedings ArticleDOI

Multi-layer Attention Neural Network for Sentence Semantic Matching

TL;DR: This paper proposes a multi-layer attention neural network that applies multi-level attention between two sentences, using two layers of soft-alignment attention: one after the embedding layer and the other after the encoding layer.
Posted Content

Adversarial Examples with Difficult Common Words for Paraphrase Identification

TL;DR: A novel algorithm is proposed to generate a new type of adversarial examples for studying the robustness of deep paraphrase identification models, and it is shown that adversarial training with the generated adversarial examples can improve model robustness.
Dissertation

From lexical towards contextualized meaning representation

TL;DR: This thesis proposes syntax-aware token embeddings (SATokE) that capture specific linguistic information, encoding the structure of the sentence from a dependency point of view in their representations, and empirically demonstrates the superiority of these token representations over popular distributional word representations and other token embeddings proposed in the literature.
Posted Content

Multi-Perspective Inferrer: Reasoning Sentences Relationship from Holistic Perspective.

TL;DR: This paper proposes the Multi-Perspective Inferrer (MPI), a novel NLI model that reasons about sentence relationships from multiple perspectives associated with the three NLI relations; it is architecture-free and compatible with the powerful BERT.
References
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously, and won 1st place on the ILSVRC 2015 classification task.
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings ArticleDOI

Densely Connected Convolutional Networks

TL;DR: DenseNet, as described in this paper, connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
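The dense connectivity that inspired the main paper can be sketched in a few lines. The block below is an illustrative, simplified example (again assuming PyTorch), in which each convolution receives the concatenation of all preceding feature maps; it omits the batch normalization and transition layers of the full DenseNet design.

```python
# Simplified sketch of DenseNet-style dense connectivity; layer sizes are illustrative.
import torch
import torch.nn as nn


class DenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Conv2d(in_channels + i * growth_rate, growth_rate, kernel_size=3, padding=1)
            for i in range(num_layers)
        ])

    def forward(self, x):
        features = [x]
        for conv in self.layers:
            # Each layer sees the concatenation of all preceding feature maps.
            out = torch.relu(conv(torch.cat(features, dim=1)))
            features.append(out)
        return torch.cat(features, dim=1)
```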
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Posted Content

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.