Proceedings ArticleDOI

Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms

01 Jul 2017 - Vol. 1, pp. 1789-1798
TL;DR: This work systematically investigates the proposed model under the supervision of different attention strategies and shows that the approach advances the state of the art, achieving the best F1 score on the ACE 2005 dataset.
Abstract: This paper tackles the task of event detection (ED), which involves identifying and categorizing events. We argue that arguments provide significant clues to this task, but existing detection approaches either ignore them completely or exploit them only indirectly. In this work, we propose to exploit argument information explicitly for ED via supervised attention mechanisms. Specifically, we systematically investigate the proposed model under the supervision of different attention strategies. Experimental results show that our approach advances the state of the art and achieves the best F1 score on the ACE 2005 dataset.
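To make the idea concrete, here is a minimal sketch of supervised attention for event detection. It is an illustration under stated assumptions, not the authors' released code: the module, the bilinear scorer, the dimension names and the squared-error supervision are all choices made for the example. Attention weights over the context words are computed from the candidate trigger representation, and an auxiliary loss pushes that distribution toward a "gold" attention distribution derived from annotated argument words.

```python
# Sketch only: supervised attention for event detection (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedAttentionED(nn.Module):
    def __init__(self, hidden_dim, num_event_types):
        super().__init__()
        self.score = nn.Bilinear(hidden_dim, hidden_dim, 1)        # attention scorer
        self.classifier = nn.Linear(2 * hidden_dim, num_event_types)

    def forward(self, context, trigger, gold_attention=None):
        # context: (batch, seq_len, hidden_dim); trigger: (batch, hidden_dim)
        seq_len = context.size(1)
        trig = trigger.unsqueeze(1).expand(-1, seq_len, -1).contiguous()
        scores = self.score(context.contiguous(), trig).squeeze(-1)   # (batch, seq_len)
        attn = F.softmax(scores, dim=-1)                              # attention distribution
        pooled = torch.bmm(attn.unsqueeze(1), context).squeeze(1)     # attention-weighted context
        logits = self.classifier(torch.cat([trigger, pooled], dim=-1))

        attn_loss = None
        if gold_attention is not None:
            # supervise the attention weights against a gold distribution
            attn_loss = F.mse_loss(attn, gold_attention)
        return logits, attn_loss
```

In training, the event-type classification loss would be combined with `attn_loss`, weighted by a hyperparameter; the bilinear scorer and the squared-error supervision are illustrative choices, not the paper's exact formulation.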


Citations
Proceedings Article
26 Apr 2018
TL;DR: This work investigates a convolutional neural network based on dependency trees to perform event detection and proposes a novel pooling method that relies on entity mentions to aggregate the convolution vectors.
Abstract: Current neural network models for event detection have only considered the sequential representation of sentences. Syntactic representations have not been explored in this area, although they provide an effective mechanism to directly link words to their informative context for event detection. In this work, we investigate a convolutional neural network based on dependency trees to perform event detection. We propose a novel pooling method that relies on entity mentions to aggregate the convolution vectors. Extensive experiments demonstrate the benefits of the dependency-based convolutional neural networks and the entity mention-based pooling method for event detection. We achieve state-of-the-art performance on widely used datasets with both perfect and predicted entity mentions.
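The entity-mention-based pooling can be pictured with a short, hypothetical sketch (not the authors' code, and simplified: the convolution here runs over the token sequence rather than over the dependency tree): convolution features are max-pooled only over positions covered by entity mentions instead of over the whole sentence.

```python
# Illustrative sketch of entity-mention-based pooling (assumed simplification).
import torch
import torch.nn as nn

class EntityMentionPoolCNN(nn.Module):
    def __init__(self, emb_dim, num_filters, num_classes, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel, padding=kernel // 2)
        self.out = nn.Linear(num_filters, num_classes)

    def forward(self, embeddings, entity_mask):
        # embeddings: (batch, seq_len, emb_dim)
        # entity_mask: (batch, seq_len), 1.0 at entity-mention tokens, 0.0 elsewhere
        conv = torch.relu(self.conv(embeddings.transpose(1, 2)))   # (batch, filters, seq_len)
        mask = entity_mask.unsqueeze(1)                            # broadcast over filters
        masked = conv.masked_fill(mask == 0, float('-inf'))        # ignore non-entity positions
        pooled, _ = masked.max(dim=-1)                             # max-pool over entity tokens only
        return self.out(pooled)
```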

311 citations


Cites background from "Exploiting Argument Information to ..."

  • ...Syntactic dependency graphs represent sentences as directed trees with head-modifier dependency arcs between related words (McDonald and Pereira 2006; Koo, Carreras, and Collins 2008)....


Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper proposes a Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments, introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information.
Abstract: Event extraction is of practical utility in natural language processing. In the real world, it is common for multiple events to appear in the same sentence, and extracting them is more difficult than extracting a single event. Previous work that models the associations between events with sequential methods suffers from low efficiency in capturing very long-range dependencies. In this paper, we propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information. The experimental results demonstrate that our proposed framework achieves competitive results compared with state-of-the-art methods.
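The graph-convolution idea at the core of this framework can be sketched in a few lines. This is an illustration of a single graph convolution layer over a dependency graph augmented with shortcut arcs, not the JMEE implementation; the mean aggregation and naming are assumptions.

```python
# Minimal graph convolution layer over a dependency graph (illustrative sketch).
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (batch, seq_len, dim)
        # adj: (batch, seq_len, seq_len) adjacency built from dependency arcs,
        #      their reverses, self-loops, and any added shortcut arcs.
        degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        agg = torch.bmm(adj, h) / degree        # average over graph neighbours
        return torch.relu(self.linear(agg))     # updated token representations
```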

295 citations

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This work proposes an event extraction model that overcomes the roles overlap problem by separating argument prediction by role, and a method to automatically generate labeled data by editing prototypes and screening out generated samples by ranking their quality.
Abstract: Traditional approaches to the task of ACE event extraction usually depend on manually annotated data, which is often laborious to create and limited in size. Therefore, in addition to the difficulty of event extraction itself, insufficient training data hinders the learning process as well. To promote event extraction, we first propose an event extraction model that overcomes the roles overlap problem by separating the argument prediction in terms of roles. Moreover, to address the problem of insufficient training data, we propose a method to automatically generate labeled data by editing prototypes and screening out generated samples by ranking their quality. Experiments on the ACE 2005 dataset demonstrate that our extraction model surpasses most existing extraction methods, and incorporating our generation method yields a further significant improvement, producing new state-of-the-art results on the event extraction task, including pushing the F1 score of trigger classification to 81.1% and the F1 score of argument classification to 58.9%.
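One way to read "separating the argument prediction in terms of roles" is as an independent tagger per role, so that the same span can be assigned several roles at once. The sketch below is an assumption about that general idea (per-role sigmoid start/end scorers), not the authors' model.

```python
# Hypothetical sketch of role-separated argument prediction.
import torch
import torch.nn as nn

class PerRoleArgumentTagger(nn.Module):
    def __init__(self, hidden_dim, num_roles):
        super().__init__()
        # one independent start/end scorer per role
        self.start = nn.Linear(hidden_dim, num_roles)
        self.end = nn.Linear(hidden_dim, num_roles)

    def forward(self, token_repr):
        # token_repr: (batch, seq_len, hidden_dim)
        # Sigmoids instead of one shared softmax: roles do not compete, so the
        # same tokens may be predicted as arguments of several roles at once.
        start_prob = torch.sigmoid(self.start(token_repr))   # (batch, seq_len, num_roles)
        end_prob = torch.sigmoid(self.end(token_repr))
        return start_prob, end_prob
```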

228 citations


Cites background or methods from "Exploiting Argument Information to ..."

  • ...While Liu et al. (2016, 2017) manage to mine additional events from the frames in FrameNet....



  • ...ANN-FN (Liu et al., 2016) improves extraction with additional events automatically detected from FrameNet, while ANNAugATT (Liu et al., 2017) exploits argument information via supervised attention mechanisms to improve performance further....


Journal ArticleDOI
TL;DR: A language-independent neural network is developed to capture both sequence and chunk information from specific contexts and use them to train an event detector for multiple languages without any manually encoded features.
Abstract: Event detection remains a challenge because of the difficulty of encoding word semantics in various contexts. Previous approaches have depended heavily on language-specific knowledge and pre-existing natural language processing tools. However, not all languages have such resources and tools available, compared with English. A more promising approach is to automatically learn effective features from data, without relying on language-specific resources. In this study, we develop a language-independent neural network to capture both sequence and chunk information from specific contexts and use them to train an event detector for multiple languages without any manually encoded features. Experiments show that our approach can achieve robust, efficient and accurate results for various languages. In the ACE 2005 English event detection task, our approach achieved a 73.4% F-score, an average absolute improvement of 3.0% over the state of the art. Additionally, our experimental results are competitive for Chinese and Spanish.
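A plausible reading of "both sequence and chunk information" is a hybrid of a recurrent encoder (sequence) and a convolutional encoder (local chunks) whose features are concatenated per token. The sketch below is a guess at that general shape, not the authors' implementation; the module names and sizes are assumptions.

```python
# Illustrative hybrid of BiLSTM (sequence) and CNN (chunk) features for word-level detection.
import torch
import torch.nn as nn

class HybridEventDetector(nn.Module):
    def __init__(self, emb_dim, hidden, num_classes, kernel=3):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)
        self.out = nn.Linear(2 * hidden + hidden, num_classes)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, emb_dim)
        seq_feat, _ = self.lstm(embeddings)                                    # sequence information
        chunk_feat = torch.relu(self.conv(embeddings.transpose(1, 2))).transpose(1, 2)  # local chunks
        return self.out(torch.cat([seq_feat, chunk_feat], dim=-1))             # per-token trigger logits
```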

178 citations

Posted Content
TL;DR: An introductory summary of the attention mechanism in different NLP problems, aiming to provide basic knowledge of this widely used method, discuss its variants for different tasks, explore its association with other machine learning techniques, and examine methods for evaluating its performance.
Abstract: First derived from human intuition and later adapted to machine translation for automatic token alignment, the attention mechanism, a simple method for encoding sequence data based on an importance score assigned to each element, has been widely applied to, and has attained significant improvement in, various tasks in natural language processing, including sentiment classification, text summarization, question answering, dependency parsing, etc. In this paper, we survey recent works and conduct an introductory summary of the attention mechanism in different NLP problems, aiming to provide our readers with basic knowledge of this widely used method, discuss its different variants for different tasks, explore its association with other techniques in machine learning, and examine methods for evaluating its performance.
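The "importance score per element, then weighted sum" idea the survey describes fits in a few lines. This is a generic dot-product attention sketch for illustration, not code from the survey.

```python
# Bare-bones dot-product attention over a token sequence.
import torch
import torch.nn.functional as F

def attend(query, keys, values):
    """query: (batch, dim); keys, values: (batch, seq_len, dim)."""
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)   # importance score per element
    weights = F.softmax(scores, dim=-1)                         # normalise into a distribution
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # weighted sum of values
    return context, weights
```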

135 citations


Cites methods from "Exploiting Argument Information to ..."

  • ...paring the attention distribution with the gold alignment data, and quantified using alignment error rate (AER). Similarly, Liu et al. proposed a method to manually construct "gold attention vectors" (Liu et al. 2017) by first identifying labelled key words within a sentence and then conducting post-processing procedures such as smoothing and normalization, given abundant well-annotated data. For example, for the s...

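The snippet above sketches the construction of a "gold attention vector": mark annotated key words, smooth, then normalise. Below is a minimal illustrative NumPy version; the window size, additive floor and smoothing scheme are assumptions for the example, not the paper's exact procedure.

```python
# Illustrative construction of a gold attention vector (assumed smoothing scheme).
import numpy as np

def gold_attention(seq_len, key_positions, window=1, epsilon=0.05):
    vec = np.zeros(seq_len, dtype=float)
    vec[list(key_positions)] = 1.0                  # annotated key words
    if window > 0:                                  # spread mass to neighbouring tokens
        kernel = np.ones(2 * window + 1)
        vec = np.convolve(vec, kernel, mode="same")
    vec += epsilon                                  # avoid zero probability elsewhere
    return vec / vec.sum()                          # normalise to a distribution

# e.g. a 10-token sentence whose argument words sit at positions 3 and 7
print(gold_attention(10, [3, 7]))
```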

References
Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
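For reference, training the skip-gram variant described above is a one-liner with the gensim library. This is a hedged usage sketch assuming gensim >= 4.0 parameter names and a toy corpus; it is not tied to the paper's original C implementation.

```python
# Usage sketch: skip-gram word vectors with gensim (toy corpus, illustrative parameters).
from gensim.models import Word2Vec

sentences = [["event", "detection", "needs", "good", "word", "vectors"],
             ["arguments", "provide", "clues", "for", "event", "detection"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1 -> skip-gram
print(model.wv.most_similar("event", topn=3))
```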

20,077 citations

Proceedings ArticleDOI
Yoon Kim
25 Aug 2014
TL;DR: The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, including sentiment analysis and question classification, and a simple modification to the architecture is proposed to allow the use of both task-specific and static vectors.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
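The architecture described above, parallel filter widths, max-over-time pooling, dropout and a linear output layer, can be sketched as follows. Hyperparameters here are illustrative defaults, not the paper's exact settings.

```python
# Minimal CNN-for-sentence-classification sketch (illustrative hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, num_filters=100,
                 widths=(3, 4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, w) for w in widths])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(widths), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len), with seq_len >= max(widths)
        x = self.embed(token_ids).transpose(1, 2)                  # (batch, emb_dim, seq_len)
        feats = [F.relu(conv(x)).max(dim=-1).values for conv in self.convs]  # max-over-time pooling
        return self.fc(self.dropout(torch.cat(feats, dim=-1)))
```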

9,776 citations


"Exploiting Argument Information to ..." refers background in this paper

  • ...Regularization is implemented by a dropout (Kim, 2014; Hinton et al., 2012) and L2 norm....


Posted Content
Yoon Kim
TL;DR: In this article, CNNs are trained on top of pre-trained word vectors for sentence-level classification tasks and a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

7,826 citations

Posted Content
TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.
Abstract: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
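The mechanism described above reduces to a few lines of code. This is a generic "inverted dropout" sketch, which rescales the surviving activations during training so nothing needs to change at test time; the keep probability of 0.5 matches the "half of the feature detectors" described in the abstract.

```python
# Inverted-dropout sketch: randomly zero units during training, rescale the rest.
import numpy as np

def dropout(activations, keep_prob=0.5, train=True):
    if not train:
        return activations                          # no change at test time
    mask = (np.random.rand(*activations.shape) < keep_prob)
    return activations * mask / keep_prob           # rescale surviving units

hidden = np.random.randn(4, 8)
print(dropout(hidden))                              # roughly half the units are zeroed each call
```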

6,899 citations


"Exploiting Argument Information to ..." refers background in this paper

  • ...Regularization is implemented by a dropout (Kim, 2014; Hinton et al., 2012) and L2 norm....


Journal ArticleDOI
TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows the model to take advantage of longer contexts.
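The feedforward neural language model described above embeds the previous n-1 words, concatenates the embeddings, passes them through a nonlinear hidden layer, and predicts the next word with a softmax over the vocabulary. The sketch below illustrates that shape; sizes and the context length are assumptions, not the paper's settings.

```python
# Sketch of a feedforward neural language model (illustrative sizes).
import torch
import torch.nn as nn

class NeuralLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, context=3, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)       # shared distributed word representations
        self.hidden = nn.Linear(context * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context) indices of the previous words
        e = self.embed(context_ids).flatten(start_dim=1)     # concatenate the context embeddings
        return self.out(torch.tanh(self.hidden(e)))          # logits over the next word
```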

6,832 citations