Proceedings ArticleDOI

Exploiting Argument Information to Improve Event Detection via Supervised Attention Mechanisms

01 Jul 2017 - Vol. 1, pp. 1789-1798
TL;DR: This work systematically investigates the proposed model under the supervision of different attention strategies and shows that the approach advances the state of the art, achieving the best F1 score on the ACE 2005 dataset.
Abstract: This paper tackles the task of event detection (ED), which involves identifying and categorizing events. We argue that arguments provide significant clues to this task, but existing detection approaches either ignore them completely or exploit them only indirectly. In this work, we propose to exploit argument information explicitly for ED via supervised attention mechanisms. Specifically, we systematically investigate the proposed model under the supervision of different attention strategies. Experimental results show that our approach advances the state of the art and achieves the best F1 score on the ACE 2005 dataset.
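To make the idea concrete, here is a minimal sketch of supervised attention for event detection. It is an illustration under stated assumptions, not the authors' released code: the module, the bilinear scorer, the dimension names and the squared-error supervision are all choices made for the example. Attention weights over the context words are computed from the candidate trigger representation, and an auxiliary loss pushes that distribution toward a "gold" attention distribution derived from annotated argument words.

```python
# Sketch only: supervised attention for event detection (assumed design, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SupervisedAttentionED(nn.Module):
    def __init__(self, hidden_dim, num_event_types):
        super().__init__()
        self.score = nn.Bilinear(hidden_dim, hidden_dim, 1)        # attention scorer
        self.classifier = nn.Linear(2 * hidden_dim, num_event_types)

    def forward(self, context, trigger, gold_attention=None):
        # context: (batch, seq_len, hidden_dim); trigger: (batch, hidden_dim)
        seq_len = context.size(1)
        trig = trigger.unsqueeze(1).expand(-1, seq_len, -1).contiguous()
        scores = self.score(context.contiguous(), trig).squeeze(-1)   # (batch, seq_len)
        attn = F.softmax(scores, dim=-1)                              # attention distribution
        pooled = torch.bmm(attn.unsqueeze(1), context).squeeze(1)     # attention-weighted context
        logits = self.classifier(torch.cat([trigger, pooled], dim=-1))

        attn_loss = None
        if gold_attention is not None:
            # supervise the attention weights against a gold distribution
            attn_loss = F.mse_loss(attn, gold_attention)
        return logits, attn_loss
```

In training, the event-type classification loss would be combined with `attn_loss`, weighted by a hyperparameter; the bilinear scorer and the squared-error supervision are illustrative choices, not the paper's exact formulation.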


Citations
Proceedings Article
26 Apr 2018
TL;DR: This work investigates a convolutional neural network based on dependency trees to perform event detection and proposes a novel pooling method that relies on entity mentions to aggregate the convolution vectors.
Abstract: Current neural network models for event detection have only considered the sequential representation of sentences. Syntactic representations have not been explored in this area, although they provide an effective mechanism to directly link words to their informative context for event detection. In this work, we investigate a convolutional neural network based on dependency trees to perform event detection. We propose a novel pooling method that relies on entity mentions to aggregate the convolution vectors. Extensive experiments demonstrate the benefits of the dependency-based convolutional neural networks and the entity mention-based pooling method for event detection. We achieve state-of-the-art performance on widely used datasets with both perfect and predicted entity mentions.
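The entity-mention-based pooling can be pictured with a short, hypothetical sketch (not the authors' code, and simplified: the convolution here runs over the token sequence rather than over the dependency tree): convolution features are max-pooled only over positions covered by entity mentions instead of over the whole sentence.

```python
# Illustrative sketch of entity-mention-based pooling (assumed simplification).
import torch
import torch.nn as nn

class EntityMentionPoolCNN(nn.Module):
    def __init__(self, emb_dim, num_filters, num_classes, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(emb_dim, num_filters, kernel, padding=kernel // 2)
        self.out = nn.Linear(num_filters, num_classes)

    def forward(self, embeddings, entity_mask):
        # embeddings: (batch, seq_len, emb_dim)
        # entity_mask: (batch, seq_len), 1.0 at entity-mention tokens, 0.0 elsewhere
        conv = torch.relu(self.conv(embeddings.transpose(1, 2)))   # (batch, filters, seq_len)
        mask = entity_mask.unsqueeze(1)                            # broadcast over filters
        masked = conv.masked_fill(mask == 0, float('-inf'))        # ignore non-entity positions
        pooled, _ = masked.max(dim=-1)                             # max-pool over entity tokens only
        return self.out(pooled)
```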

311 citations


Cites background from "Exploiting Argument Information to ..."

  • ...Syntactic dependency graphs represent sentences as directed trees with head-modifier dependency arcs between related words (McDonald and Pereira 2006; Koo, Carreras, and Collins 2008)....


Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper proposes a Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments, introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information.
Abstract: Event extraction is of practical utility in natural language processing. In the real world, it is common for multiple events to appear in the same sentence, and extracting them is more difficult than extracting a single event. Previous work that models the associations between events with sequential methods suffers from low efficiency in capturing very long-range dependencies. In this paper, we propose a novel Jointly Multiple Events Extraction (JMEE) framework to jointly extract multiple event triggers and arguments by introducing syntactic shortcut arcs to enhance information flow and attention-based graph convolution networks to model graph information. The experimental results demonstrate that our proposed framework achieves competitive results compared with state-of-the-art methods.
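The graph-convolution idea at the core of this framework can be sketched in a few lines. This is an illustration of a single graph convolution layer over a dependency graph augmented with shortcut arcs, not the JMEE implementation; the mean aggregation and naming are assumptions.

```python
# Minimal graph convolution layer over a dependency graph (illustrative sketch).
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # h: (batch, seq_len, dim)
        # adj: (batch, seq_len, seq_len) adjacency built from dependency arcs,
        #      their reverses, self-loops, and any added shortcut arcs.
        degree = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        agg = torch.bmm(adj, h) / degree        # average over graph neighbours
        return torch.relu(self.linear(agg))     # updated token representations
```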

295 citations

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This work proposes an event extraction model that overcomes the roles overlap problem by separating argument prediction by role, and a method to automatically generate labeled data by editing prototypes and screening out generated samples by ranking their quality.
Abstract: Traditional approaches to the task of ACE event extraction usually depend on manually annotated data, which is often laborious to create and limited in size. Therefore, in addition to the difficulty of event extraction itself, insufficient training data hinders the learning process as well. To promote event extraction, we first propose an event extraction model that overcomes the roles overlap problem by separating the argument prediction in terms of roles. Moreover, to address the problem of insufficient training data, we propose a method to automatically generate labeled data by editing prototypes and screening out generated samples by ranking their quality. Experiments on the ACE 2005 dataset demonstrate that our extraction model surpasses most existing extraction methods, and incorporating our generation method yields a further significant improvement, producing new state-of-the-art results on the event extraction task, including pushing the F1 score of trigger classification to 81.1% and the F1 score of argument classification to 58.9%.
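One way to read "separating the argument prediction in terms of roles" is as an independent tagger per role, so that the same span can be assigned several roles at once. The sketch below is an assumption about that general idea (per-role sigmoid start/end scorers), not the authors' model.

```python
# Hypothetical sketch of role-separated argument prediction.
import torch
import torch.nn as nn

class PerRoleArgumentTagger(nn.Module):
    def __init__(self, hidden_dim, num_roles):
        super().__init__()
        # one independent start/end scorer per role
        self.start = nn.Linear(hidden_dim, num_roles)
        self.end = nn.Linear(hidden_dim, num_roles)

    def forward(self, token_repr):
        # token_repr: (batch, seq_len, hidden_dim)
        # Sigmoids instead of one shared softmax: roles do not compete, so the
        # same tokens may be predicted as arguments of several roles at once.
        start_prob = torch.sigmoid(self.start(token_repr))   # (batch, seq_len, num_roles)
        end_prob = torch.sigmoid(self.end(token_repr))
        return start_prob, end_prob
```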

228 citations


Cites background or methods from "Exploiting Argument Information to ..."

  • ...While Liu et al. (2016, 2017) manage to mine additional events from the frames in FrameNet....



  • ...ANN-FN (Liu et al., 2016) improves extraction with additional events automatically detected from FrameNet, while ANNAugATT (Liu et al., 2017) exploits argument information via supervised attention mechanisms to improve performance further....


Journal ArticleDOI
TL;DR: A language-independent neural network is developed to capture both sequence and chunk information from specific contexts and use them to train an event detector for multiple languages without any manually encoded features.
Abstract: Event detection remains a challenge because of the difficulty of encoding word semantics in various contexts. Previous approaches have depended heavily on language-specific knowledge and pre-existing natural language processing tools. However, not all languages have such resources and tools available, compared with English. A more promising approach is to automatically learn effective features from data, without relying on language-specific resources. In this study, we develop a language-independent neural network to capture both sequence and chunk information from specific contexts and use them to train an event detector for multiple languages without any manually encoded features. Experiments show that our approach can achieve robust, efficient and accurate results for various languages. In the ACE 2005 English event detection task, our approach achieved a 73.4% F-score, an average absolute improvement of 3.0% over the state of the art. Additionally, our experimental results are competitive for Chinese and Spanish.
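A plausible reading of "both sequence and chunk information" is a hybrid of a recurrent encoder (sequence) and a convolutional encoder (local chunks) whose features are concatenated per token. The sketch below is a guess at that general shape, not the authors' implementation; the module names and sizes are assumptions.

```python
# Illustrative hybrid of BiLSTM (sequence) and CNN (chunk) features for word-level detection.
import torch
import torch.nn as nn

class HybridEventDetector(nn.Module):
    def __init__(self, emb_dim, hidden, num_classes, kernel=3):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(emb_dim, hidden, kernel, padding=kernel // 2)
        self.out = nn.Linear(2 * hidden + hidden, num_classes)

    def forward(self, embeddings):
        # embeddings: (batch, seq_len, emb_dim)
        seq_feat, _ = self.lstm(embeddings)                                    # sequence information
        chunk_feat = torch.relu(self.conv(embeddings.transpose(1, 2))).transpose(1, 2)  # local chunks
        return self.out(torch.cat([seq_feat, chunk_feat], dim=-1))             # per-token trigger logits
```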

178 citations

Posted Content
TL;DR: An introductory summary of the attention mechanism in different NLP problems, aiming to provide basic knowledge of this widely used method, discuss its variants for different tasks, explore its association with other machine learning techniques, and examine methods for evaluating its performance.
Abstract: First derived from human intuition and later adapted to machine translation for automatic token alignment, the attention mechanism, a simple method for encoding sequence data based on an importance score assigned to each element, has been widely applied to, and has attained significant improvement in, various tasks in natural language processing, including sentiment classification, text summarization, question answering, dependency parsing, etc. In this paper, we survey recent works and conduct an introductory summary of the attention mechanism in different NLP problems, aiming to provide our readers with basic knowledge of this widely used method, discuss its different variants for different tasks, explore its association with other techniques in machine learning, and examine methods for evaluating its performance.
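The "importance score per element, then weighted sum" idea the survey describes fits in a few lines. This is a generic dot-product attention sketch for illustration, not code from the survey.

```python
# Bare-bones dot-product attention over a token sequence.
import torch
import torch.nn.functional as F

def attend(query, keys, values):
    """query: (batch, dim); keys, values: (batch, seq_len, dim)."""
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)   # importance score per element
    weights = F.softmax(scores, dim=-1)                         # normalise into a distribution
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # weighted sum of values
    return context, weights
```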

135 citations


Cites methods from "Exploiting Argument Information to ..."

  • ...paring the attention distribution with the gold alignment data, and quantified using alignment error rate (AER). Similarly, Liu et al. proposed a method to manually construct "gold attention vectors" (Liu et al. 2017) by first identifying labelled key words within a sentence and then conducting post-processing procedures such as smoothing and normalization, given abundant well-annotated data. For example, for the s...

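The snippet above sketches the construction of a "gold attention vector": mark annotated key words, smooth, then normalise. Below is a minimal illustrative NumPy version; the window size, additive floor and smoothing scheme are assumptions for the example, not the paper's exact procedure.

```python
# Illustrative construction of a gold attention vector (assumed smoothing scheme).
import numpy as np

def gold_attention(seq_len, key_positions, window=1, epsilon=0.05):
    vec = np.zeros(seq_len, dtype=float)
    vec[list(key_positions)] = 1.0                  # annotated key words
    if window > 0:                                  # spread mass to neighbouring tokens
        kernel = np.ones(2 * window + 1)
        vec = np.convolve(vec, kernel, mode="same")
    vec += epsilon                                  # avoid zero probability elsewhere
    return vec / vec.sum()                          # normalise to a distribution

# e.g. a 10-token sentence whose argument words sit at positions 3 and 7
print(gold_attention(10, [3, 7]))
```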

References
Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
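For reference, training the skip-gram variant described above is a one-liner with the gensim library. This is a hedged usage sketch assuming gensim >= 4.0 parameter names and a toy corpus; it is not tied to the paper's original C implementation.

```python
# Usage sketch: skip-gram word vectors with gensim (toy corpus, illustrative parameters).
from gensim.models import Word2Vec

sentences = [["event", "detection", "needs", "good", "word", "vectors"],
             ["arguments", "provide", "clues", "for", "event", "detection"]]

model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)  # sg=1 -> skip-gram
print(model.wv.most_similar("event", topn=3))
```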

20,077 citations

Proceedings ArticleDOI
Yoon Kim
25 Aug 2014
TL;DR: The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, including sentiment analysis and question classification, and a simple modification to the architecture is proposed to allow the use of both task-specific and static vectors.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.
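The architecture described above, parallel filter widths, max-over-time pooling, dropout and a linear output layer, can be sketched as follows. Hyperparameters here are illustrative defaults, not the paper's exact settings.

```python
# Minimal CNN-for-sentence-classification sketch (illustrative hyperparameters).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, num_filters=100,
                 widths=(3, 4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, w) for w in widths])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(widths), num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len), with seq_len >= max(widths)
        x = self.embed(token_ids).transpose(1, 2)                  # (batch, emb_dim, seq_len)
        feats = [F.relu(conv(x)).max(dim=-1).values for conv in self.convs]  # max-over-time pooling
        return self.fc(self.dropout(torch.cat(feats, dim=-1)))
```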

9,776 citations


"Exploiting Argument Information to ..." refers background in this paper

  • ...Regularization is implemented by a dropout (Kim, 2014; Hinton et al., 2012) and L2 norm....


Posted Content
Yoon Kim
TL;DR: In this article, CNNs are trained on top of pre-trained word vectors for sentence-level classification tasks and a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks.
Abstract: We report on a series of experiments with convolutional neural networks (CNN) trained on top of pre-trained word vectors for sentence-level classification tasks. We show that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks. Learning task-specific vectors through fine-tuning offers further gains in performance. We additionally propose a simple modification to the architecture to allow for the use of both task-specific and static vectors. The CNN models discussed herein improve upon the state of the art on 4 out of 7 tasks, which include sentiment analysis and question classification.

7,826 citations

Posted Content
TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.
Abstract: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
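The mechanism described above reduces to a few lines of code. This is a generic "inverted dropout" sketch, which rescales the surviving activations during training so nothing needs to change at test time; the keep probability of 0.5 matches the "half of the feature detectors" described in the abstract.

```python
# Inverted-dropout sketch: randomly zero units during training, rescale the rest.
import numpy as np

def dropout(activations, keep_prob=0.5, train=True):
    if not train:
        return activations                          # no change at test time
    mask = (np.random.rand(*activations.shape) < keep_prob)
    return activations * mask / keep_prob           # rescale surviving units

hidden = np.random.randn(4, 8)
print(dropout(hidden))                              # roughly half the units are zeroed each call
```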

6,899 citations


"Exploiting Argument Information to ..." refers background in this paper

  • ...Regularization is implemented by a dropout (Kim, 2014; Hinton et al., 2012) and L2 norm....


Journal ArticleDOI
TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that the proposed approach allows the model to take advantage of longer contexts.
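The feedforward neural language model described above embeds the previous n-1 words, concatenates the embeddings, passes them through a nonlinear hidden layer, and predicts the next word with a softmax over the vocabulary. The sketch below illustrates that shape; sizes and the context length are assumptions, not the paper's settings.

```python
# Sketch of a feedforward neural language model (illustrative sizes).
import torch
import torch.nn as nn

class NeuralLM(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, context=3, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)       # shared distributed word representations
        self.hidden = nn.Linear(context * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):
        # context_ids: (batch, context) indices of the previous words
        e = self.embed(context_ids).flatten(start_dim=1)     # concatenate the context embeddings
        return self.out(torch.tanh(self.hidden(e)))          # logits over the next word
```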

6,832 citations