Position-aware Attention and Supervised Data Improve Slot Filling

doi:10.18653/V1/D17-1004

Open Access Proceedings Article

Yuhao Zhang, +4 more

- pp 35-45

TLDR

An effective new model is proposed that combines an LSTM sequence model with a form of entity position-aware attention better suited to relation extraction. The paper also builds TACRED, a large supervised relation extraction dataset obtained via crowdsourcing and targeted towards TAC KBP relations.

Abstract:

Organized relational knowledge in the form of “knowledge graphs” is important for many applications. However, the ability to populate knowledge bases with facts automatically extracted from documents has improved frustratingly slowly. This paper simultaneously addresses two issues that have held back prior work. We first propose an effective new model, which combines an LSTM sequence model with a form of entity position-aware attention that is better suited to relation extraction. Then we build TACRED, a large (119,474 examples) supervised relation extraction dataset obtained via crowdsourcing and targeted towards TAC KBP relations. The combination of better supervised data and a more appropriate high-capacity model enables much better relation extraction performance. When the model trained on this new dataset replaces the previous relation extraction component of the best TAC KBP 2015 slot filling system, its F1 score increases markedly from 22.2% to 26.7%.
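One way to picture the entity position-aware attention mentioned in the abstract is the minimal sketch below. It assumes that each token's attention score combines its LSTM hidden state, a sentence summary vector, and embeddings of its signed offsets from the subject and object spans. The function names, dimensions, and random weights are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def relative_positions(n, start, end):
    """Signed offset of each token from the entity span [start, end]; 0 inside the span."""
    return [i - start if i < start else (i - end if i > end else 0) for i in range(n)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def position_aware_attention(H, q, subj_pos, obj_pos, params, pos_emb):
    """Return (attention weights, weighted sum of hidden states).

    H: hidden states, one vector per token (assumed to come from an LSTM).
    q: summary vector, here assumed to be the final hidden state.
    subj_pos / obj_pos: signed offsets from the subject / object span.
    pos_emb: hypothetical lookup table from offset to embedding vector.
    """
    W_h, W_q, W_s, W_o, v = params
    Wq_q = matvec(W_q, q)
    scores = []
    for h, ps, po in zip(H, subj_pos, obj_pos):
        parts = [matvec(W_h, h), Wq_q, matvec(W_s, pos_emb[ps]), matvec(W_o, pos_emb[po])]
        hidden = [math.tanh(sum(col)) for col in zip(*parts)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))
    weights = softmax(scores)
    z = [sum(a * h[k] for a, h in zip(weights, H)) for k in range(len(H[0]))]
    return weights, z


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sentence of 6 tokens: subject spans tokens 0-1, object is token 4.
random.seed(0)
n, d_h, d_p, d_a = 6, 4, 3, 5
H = rand_matrix(n, d_h)          # stand-in for LSTM outputs
q = H[-1]
subj_pos = relative_positions(n, 0, 1)
obj_pos = relative_positions(n, 4, 4)
pos_emb = {off: rand_matrix(1, d_p)[0] for off in range(-n, n + 1)}
params = (rand_matrix(d_a, d_h), rand_matrix(d_a, d_h),
          rand_matrix(d_a, d_p), rand_matrix(d_a, d_p), rand_matrix(1, d_a)[0])
weights, z = position_aware_attention(H, q, subj_pos, obj_pos, params, pos_emb)
```

In a trained model the weights and position embeddings would be learned jointly with the LSTM, and `z` would feed a relation classifier; here they are random so the sketch only demonstrates the data flow.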

Citations

Proceedings Article

Know What You Don't Know: Unanswerable Questions for SQuAD

Pranav Rajpurkar, +2 more

TL;DR: SQuADRUn is a new dataset that combines the existing Stanford Question Answering Dataset with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.

Proceedings Article

ERNIE: Enhanced Language Representation with Informative Entities

Zhengyan Zhang, +5 more

TL;DR: This paper uses both large-scale textual corpora and knowledge graphs (KGs) to train an enhanced language representation model (ERNIE) that can take full advantage of lexical, syntactic, and knowledge information simultaneously; the model is comparable with the state-of-the-art BERT on other common NLP tasks.

Journal Article

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, +5 more

- 12 Mar 2020 -

Transactions of the Association for Comp...

TL;DR: The approach extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.
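The contiguous span masking described above can be sketched as follows. This is a minimal illustration, not SpanBERT's code: the 15% masking budget, the geometric span-length distribution with `p=0.2`, and the length cap of 10 are assumptions made for the example, and `sample_span_mask` is a hypothetical helper name.

```python
import random


def sample_span_mask(n_tokens, mask_ratio=0.15, p=0.2, max_len=10, rng=None):
    """Pick token positions to mask as contiguous spans until the budget is met."""
    if n_tokens <= 0:
        return []
    rng = rng or random.Random()
    budget = max(1, round(n_tokens * mask_ratio))
    masked = set()
    while len(masked) < budget:
        # Geometric span length: keep extending with probability (1 - p).
        length = 1
        while length < max_len and rng.random() > p:
            length += 1
        # Never exceed the remaining budget or the sequence length.
        length = min(length, budget - len(masked), n_tokens)
        start = rng.randrange(0, n_tokens - length + 1)
        masked.update(range(start, start + length))
    return sorted(masked)


positions = sample_span_mask(100, rng=random.Random(0))
```

Each sampled span replaces a run of adjacent tokens rather than isolated ones; spans that happen to overlap simply merge, so the total number of masked positions still equals the budget.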

Posted Content

SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, +5 more

- 24 Jul 2019 -

arXiv: Computation and Language

TL;DR: SpanBERT extends BERT by masking contiguous random spans, rather than random tokens, and training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.

Proceedings Article

Graph Convolution over Pruned Dependency Trees Improves Relation Extraction.

Yuhao Zhang, +2 more

TL;DR: The paper proposes an extension of graph convolutional networks tailored for relation extraction that pools information over arbitrary dependency structures efficiently in parallel, and applies a novel pruning strategy to the input trees, keeping words immediately around the shortest path between the two entities among which a relation might hold.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Journal Article

Dropout: a simple way to prevent neural networks from overfitting

Nitish Srivastava, +4 more

- 01 Jan 2014 -

Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.
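As a rough illustration of the idea, the sketch below implements the commonly used "inverted" variant of dropout, which rescales the surviving activations during training so that nothing needs to change at evaluation time. This variant, the function name `dropout`, and the default rate are assumptions for the example and are not taken from the entry above.

```python
import random


def dropout(values, p_drop=0.5, training=True, rng=None):
    """Zero each activation with probability p_drop and rescale the survivors."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(values)  # evaluation mode: activations pass through unchanged
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    # Dividing by `keep` preserves the expected value of each activation.
    return [v / keep if rng.random() < keep else 0.0 for v in values]


activations = [1.0] * 8
train_out = dropout(activations, p_drop=0.5, rng=random.Random(0))
eval_out = dropout(activations, p_drop=0.5, training=False)
```

With all-ones input and `p_drop=0.5`, every training output is either `0.0` (dropped) or `2.0` (kept and rescaled), while the evaluation output equals the input.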

Proceedings Article

Glove: Global Vectors for Word Representation

Jeffrey Pennington, +2 more

TL;DR: A new global log-bilinear regression model is proposed that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.

Proceedings Article

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of the basic encoder-decoder architecture, and the paper proposes to extend it by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Posted Content

Neural Machine Translation by Jointly Learning to Align and Translate

Dzmitry Bahdanau, +2 more

- 01 Sep 2014 -

arXiv: Computation and Language

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

Related Papers (5)

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Relation Classification via Convolutional Deep Neural Network

Distant supervision for relation extraction without labeled data

Glove: Global Vectors for Word Representation

Modeling relations and their mentions without labeled text