Book Chapter DOI

Convolution Neural Network for Relation Extraction

TL;DR: This paper proposes a novel convolutional network that incorporates lexical features for relation extraction, and compares the convolutional neural network (CNN) with state-of-the-art tree kernel approaches, including the Typed Dependency Path Kernel, the Shortest Dependency Path Kernel, and the Context-Sensitive Tree Kernel.
Abstract: Deep neural networks (DNNs) have been applied to many natural language processing tasks. Instead of relying on hand-crafted features, a DNN learns features automatically, which lets it fit different domains well. In this paper, we propose a novel convolutional network, incorporating lexical features, applied to relation extraction. Many current deep neural networks embed words through a word table, which neglects the semantic relationships among words; we therefore introduce a new coding method that encodes input words with a synonym dictionary to integrate semantic knowledge into the neural network. We compared our convolutional neural network (CNN) for relation extraction with state-of-the-art tree kernel approaches, including the Typed Dependency Path Kernel, the Shortest Dependency Path Kernel, and the Context-Sensitive Tree Kernel, achieving a 9% improvement and competitive performance on the ACE 2005 data set. We also compared synonym coding with one-hot coding, where our approach gained a 1.6% improvement. Moreover, we tried other coding methods, such as hypernym coding, and discuss the results.
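
The synonym-coding idea can be illustrated with a short sketch. The following Python snippet is an assumption-laden illustration, not the authors' implementation: it uses WordNet (via NLTK) as a stand-in for the synonym dictionary and maps each word to the id of a synonym set instead of a per-word table index.

```python
# Rough sketch of synonym coding (illustrative only, not the authors' code):
# map each word to the id of a synonym set rather than to a per-word index,
# so synonymous words can share an input code. WordNet via NLTK stands in for
# the paper's synonym dictionary; taking the first synset is a crude heuristic.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def synonym_code(word, code_table):
    """Return an integer code for the word's synonym set (fallback: the word itself)."""
    synsets = wn.synsets(word)
    key = synsets[0].name() if synsets else word.lower()
    return code_table.setdefault(key, len(code_table))

code_table = {}
sentence = ["the", "firm", "acquired", "a", "small", "company"]
codes = [synonym_code(w, code_table) for w in sentence]
print(list(zip(sentence, codes)))
```
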
Citations
Proceedings Article DOI
01 Jun 2015
TL;DR: This work introduces a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources.
Abstract: Up to now, relation extraction systems have made extensive use of features generated by linguistic analysis modules. Errors in these features lead to errors of relation detection and classification. In this work, we depart from these traditional approaches with complicated feature engineering by introducing a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources. Our model takes advantage of multiple window sizes for filters and pre-trained word embeddings as an initializer on a non-static architecture to improve the performance. We emphasize the relation extraction problem with an unbalanced corpus. The experimental results show that our system significantly outperforms not only the best baseline systems for relation extraction but also the state-of-the-art systems for relation classification.
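
As a rough PyTorch sketch of the architecture described above (word embeddings plus two position embeddings for the distances to the entity heads, parallel convolutions with several window sizes, and max-over-time pooling); the dimensions and window sizes below are illustrative, not the values reported in the paper:

```python
# Minimal sketch of a multi-window CNN for relation classification:
# word embeddings + two position embeddings (distance to each entity head),
# parallel convolutions with several window sizes, max-over-time pooling.
# All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class MultiWindowCNN(nn.Module):
    def __init__(self, vocab_size, n_positions, n_classes,
                 word_dim=100, pos_dim=5, n_filters=150, windows=(2, 3, 4, 5)):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb1 = nn.Embedding(n_positions, pos_dim)  # distance to entity 1 head
        self.pos_emb2 = nn.Embedding(n_positions, pos_dim)  # distance to entity 2 head
        in_dim = word_dim + 2 * pos_dim
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, n_filters, kernel_size=w, padding=w - 1) for w in windows
        )
        self.out = nn.Linear(n_filters * len(windows), n_classes)

    def forward(self, words, dist1, dist2):
        # words, dist1, dist2: (batch, seq_len) integer tensors
        x = torch.cat([self.word_emb(words), self.pos_emb1(dist1), self.pos_emb2(dist2)], dim=-1)
        x = x.transpose(1, 2)                                    # (batch, channels, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))                # unnormalized class scores

model = MultiWindowCNN(vocab_size=10000, n_positions=200, n_classes=19)
scores = model(torch.randint(0, 10000, (2, 30)),
               torch.randint(0, 200, (2, 30)),
               torch.randint(0, 200, (2, 30)))
```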

483 citations


Cites background or methods from "Convolution Neural Network for Relation Extraction"

  • ...For relation classification and extraction, there are two very recent works on CNNs for relation classification (Liu et al., 2013) and (Zeng et al....

  • ...…mention into the representation, for each word x_i, its relative distances to the two entity heads i−i_1 and i−i_2 are also mapped into real-valued vectors d_i1 and d_i2 respectively using a position embedding table D (initialized randomly) (Collobert et al., 2011; Liu et al., 2013; Zeng et al., 2014)....

  • ...This demonstrates the advantages of the models with multiple window sizes over the single window size models in Liu et al. (2013) and Zeng et al. (2014)....

  • ...…embeddings, once initialized by some “universal” embeddings, are allowed to vary during the optimization process to reach an… [Footnote 2: The title of the paper (Liu et al., 2013) on relation extraction is misleading since the authors actually do relation classification, according to the experimental…]...

Proceedings Article DOI
01 Nov 2020
TL;DR: This article proposes an entity-masked contrastive pre-training framework for relation extraction that builds a deeper understanding of both textual context and type information while avoiding rote memorization of entities or the use of superficial cues in mentions.
Abstract: Neural models have achieved remarkable success on relation extraction (RE) benchmarks. However, there is no clear understanding of what information in text leads existing RE models to their decisions, or of how to further improve the performance of these models. To this end, we empirically study the effect of two main information sources in text: textual context and entity mentions (names). We find that (i) while context is the main source to support the predictions, RE models also heavily rely on the information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks. Based on these analyses, we propose an entity-masked contrastive pre-training framework for RE to gain a deeper understanding of both textual context and type information while avoiding rote memorization of entities or use of superficial cues in mentions. We carry out extensive experiments to support our views, and show that our framework can improve the effectiveness and robustness of neural models in different RE scenarios. All the code and datasets are released at https://github.com/thunlp/RE-Context-or-Names.
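
The entity-masking step can be sketched in a few lines; the marker format below is an assumption made for illustration, and the contrastive pre-training objective itself lives in the authors' released code rather than here:

```python
# Illustrative sketch of entity masking for relation extraction: entity mentions
# are replaced by their type labels so a model must rely on context and type
# information rather than on memorized names. Marker format is an assumption.
def mask_entities(tokens, head_span, tail_span, head_type, tail_type):
    """Replace the head/tail mention spans (start, end) with single type markers."""
    spans = sorted([(head_span, f"[{head_type}]"), (tail_span, f"[{tail_type}]")],
                   key=lambda s: s[0][0], reverse=True)
    out = list(tokens)
    for (start, end), marker in spans:
        out[start:end] = [marker]
    return out

tokens = "Steve Jobs co-founded Apple in 1976 .".split()
print(mask_entities(tokens, (0, 2), (3, 4), "PERSON", "ORGANIZATION"))
# ['[PERSON]', 'co-founded', '[ORGANIZATION]', 'in', '1976', '.']
```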

112 citations

Posted Content
TL;DR: This review compares the contributions and pitfalls of the various DL models that have been used for Relation Extraction to help guide the path ahead.
Abstract: Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.

84 citations


Cites background or methods from "Convolution Neural Network for Relation Extraction"

  • ...This work was one of the last works in supervised domain for relation extraction which built upon the works of Liu et al. (2013) and Zeng et al. (2014). The model completely gets rid of exterior lexical features to enrich the representation of the input sentence and lets the CNN learn the required features itself....

Proceedings Article DOI
Ziran Li, Ning Ding, Zhiyuan Liu, Hai-Tao Zheng, Ying Shen
01 Jul 2019
TL;DR: A multi-grained lattice framework for Chinese relation extraction is proposed, which incorporates word-level information into character sequence inputs so that segmentation errors can be avoided, and which models multiple senses of polysemous words with the help of external linguistic knowledge to alleviate polysemy ambiguity.
Abstract: Chinese relation extraction is conducted using neural networks with either character-based or word-based inputs, and most existing methods typically suffer from segmentation errors and ambiguity of polysemy. To address the issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided. (2) We also model multiple senses of polysemous words with the help of external linguistic knowledge, so as to alleviate polysemy ambiguity. Experiments on three real-world datasets in distinct domains show consistent and significant superiority and robustness of our model, as compared with other baselines. We will release the source code of this paper in the future.
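
A minimal sketch of the lattice-construction step described above: every lexicon word matching a span of the character sequence is attached to the lattice, so no single segmentation is committed to. The lexicon and the span-enumeration code below are illustrative, not the authors' implementation:

```python
# Sketch of building a word lattice over a character sequence: every lexicon word
# that matches a character span is recorded, so the model can use word-level
# information without committing to one segmentation. Lexicon is illustrative.
def build_lattice(chars, lexicon, max_len=4):
    """Return (start, end, word) triples for every lexicon word found in chars."""
    matches = []
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                matches.append((i, j, word))
    return matches

chars = list("南京市长江大桥")  # classic segmentation-ambiguity example
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
for start, end, word in build_lattice(chars, lexicon):
    print(start, end, word)
```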

65 citations


Cites background from "Convolution Neural Network for Relation Extraction"

  • ...Recent developments in deep learning have heightened the interest for neural relation extractions (NRE), which attempt to use neural networks to automatically learn semantic features (Liu et al., 2013; Zeng et al., 2014, 2015; Lin et al., 2016; Zhou et al., 2016; Jiang et al., 2016)....

  • ...As a pioneer, (Liu et al., 2013) proposed a simple CNN RE model and it is regarded as one seminal work that uses a neural network to automatically learn features....

Proceedings Article DOI
01 Jul 2020
TL;DR: Inspired by the mechanism of human long-term memory formation, EMAR is introduced, and it is shown that EMAR avoids catastrophically forgetting old relations and outperforms state-of-the-art continual learning models.
Abstract: Continual relation learning aims to continually train a model on new data to learn incessantly emerging novel relations while avoiding catastrophically forgetting old relations. Some pioneering work has proved that storing a handful of historical relation examples in episodic memory and replaying them in subsequent training is an effective solution for such a challenging problem. However, these memory-based methods usually suffer from overfitting the few memorized examples of old relations, which may gradually cause inevitable confusion among existing relations. Inspired by the mechanism in human long-term memory formation, we introduce episodic memory activation and reconsolidation (EMAR) to continual relation learning. Every time neural models are activated to learn both new and memorized data, EMAR utilizes relation prototypes for memory reconsolidation exercise to keep a stable understanding of old relations. The experimental results show that EMAR could get rid of catastrophically forgetting old relations and outperform the state-of-the-art continual learning models.
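
The prototype mechanism at the heart of EMAR can be sketched as follows; the encoder and memory contents are placeholders, so this only shows the data flow (prototypes as averaged encodings of memorized examples), not the training or reconsolidation procedure:

```python
# Sketch of the relation-prototype idea (illustrative; the "encoder" below is a
# random stand-in for a trained neural sentence encoder): the few memorized
# examples of each old relation are encoded and averaged into a prototype, and
# instances are matched to the nearest prototype.
import numpy as np

rng = np.random.default_rng(0)

def encode(example):
    # Placeholder for a neural encoder; returns a fixed-size sentence vector.
    return rng.standard_normal(64)

memory = {  # episodic memory: a handful of stored examples per old relation
    "founded_by": ["Apple was founded by Steve Jobs .",
                   "Microsoft was founded by Bill Gates ."],
    "capital_of": ["Paris is the capital of France ."],
}

prototypes = {rel: np.mean([encode(x) for x in examples], axis=0)
              for rel, examples in memory.items()}

def nearest_relation(example):
    vec = encode(example)
    return max(prototypes, key=lambda rel: float(vec @ prototypes[rel]))

print(nearest_relation("Berlin is the capital of Germany ."))
```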

63 citations


Cites methods from "Convolution Neural Network for Relation Extraction"

  • ...The conventional RE work, including both supervised RE models (Zelenko et al., 2003; Zhou et al., 2005; Gormley et al., 2015; Socher et al., 2012; Liu et al., 2013; Zeng et al., 2014; Nguyen and Grishman, 2015; dos Santos et al., 2015; Xu et al., 2015; Liu et al., 2015; Miwa and Bansal, 2016) and distantly supervised models (Bunescu and Mooney, 2007; Mintz et al....

References
Journal Article DOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
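
Since scikit-learn's SVC is built on LIBSVM, the issues listed above (multiclass classification, probability estimates, parameter selection) can be demonstrated briefly; the dataset and parameter grid here are only illustrative:

```python
# LIBSVM in practice: scikit-learn's SVC wraps LIBSVM, so multiclass
# classification, probability estimates, and parameter selection can be shown
# in a few lines on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    SVC(kernel="rbf", probability=True),        # probability=True enables Platt scaling
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 0.01]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)                      # selected hyperparameters
print(search.predict_proba(X[:3]))              # per-class probability estimates
```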

40,826 citations

Journal Article DOI
01 Sep 2000, Language
TL;DR: This book presents the WordNet lexical database (nouns, modifiers, and a semantic network of English verbs) together with applications of WordNet such as building semantic concordances.
Abstract: Part 1, The lexical database: nouns in WordNet, George A. Miller; modifiers in WordNet, Katherine J. Miller; a semantic network of English verbs, Christiane Fellbaum; design and implementation of the WordNet lexical database and searching software, Randee I. Tengi. Part 2: automated discovery of WordNet relations, Marti A. Hearst; representing verb alterations in WordNet, Karen T. Kohl et al.; the formalization of WordNet by methods of relational concept analysis, Uta E. Priss. Part 3, Applications of WordNet: building semantic concordances, Shari Landes et al.; performance and confidence in a semantic annotation task, Christiane Fellbaum et al.; WordNet and class-based probabilities, Philip Resnik; combining local context and WordNet similarity for word sense identification, Claudia Leacock and Martin Chodorow; using WordNet for text retrieval, Ellen M. Voorhees; lexical chains as representations of context for the detection and correction of malapropisms, Graeme Hirst and David St-Onge; temporal indexing through lexical chaining, Reem Al-Halimi and Rick Kazman; COLOR-X - using knowledge from WordNet for conceptual modelling, J.F.M. Burg and R.P. van de Riet; knowledge processing on an extended WordNet, Sanda M. Harabagiu and Dan I. Moldovan; appendix - obtaining and using WordNet.
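
WordNet is the kind of lexical resource the main paper's synonym and hypernym coding can draw on; a short NLTK example of the relevant lookups (the values noted in comments are typical results for "car"):

```python
# Looking up WordNet synonyms and hypernyms with NLTK: the lexical knowledge
# that synonym/hypernym coding schemes build on.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

synsets = wn.synsets("car")
first = synsets[0]                              # e.g. Synset('car.n.01')
print(first.lemma_names())                      # synonyms in this sense, e.g. car, auto, automobile, ...
print([h.name() for h in first.hypernyms()])    # more general concept, e.g. motor_vehicle.n.01
```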

13,049 citations

Journal Article DOI
TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it allows the model to take advantage of longer contexts.
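
A minimal PyTorch sketch of the model this abstract describes (shared distributed word representations, concatenated context, one hidden layer, and a softmax over the vocabulary); all sizes are illustrative:

```python
# Minimal sketch of a neural probabilistic language model:
# context words -> shared embedding -> concatenate -> hidden layer -> softmax
# over the vocabulary (the softmax is applied inside the cross-entropy loss).
import torch
import torch.nn as nn

class NPLM(nn.Module):
    def __init__(self, vocab_size=10000, context=4, emb_dim=64, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)        # distributed word representations
        self.hidden = nn.Linear(context * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                         # (batch, context)
        x = self.emb(context_ids).flatten(start_dim=1)      # concatenate context embeddings
        return self.out(torch.tanh(self.hidden(x)))         # logits over the next word

model = NPLM()
logits = model(torch.randint(0, 10000, (8, 4)))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10000, (8,)))
```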

6,832 citations

Proceedings Article DOI
05 Jul 2008
TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
Abstract: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
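
The weight-sharing idea can be sketched as one convolutional sentence encoder feeding several task-specific heads; the tag-set sizes and task list below are illustrative, not those of the original system:

```python
# Sketch of the shared-encoder, multitask idea: one convolutional sentence
# encoder whose weights are shared across tasks, with per-task output heads.
import torch
import torch.nn as nn

class SharedConvEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=50, channels=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)

    def forward(self, token_ids):                           # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)             # (batch, emb_dim, seq_len)
        return torch.relu(self.conv(x)).transpose(1, 2)     # (batch, seq_len, channels)

encoder = SharedConvEncoder()
heads = nn.ModuleDict({                                     # one classifier per task
    "pos": nn.Linear(128, 45),
    "chunk": nn.Linear(128, 23),
    "ner": nn.Linear(128, 9),
})

tokens = torch.randint(0, 10000, (4, 20))
features = encoder(tokens)                                  # shared features
per_task_logits = {task: head(features) for task, head in heads.items()}
```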

5,759 citations