Book Chapter DOI

Convolution Neural Network for Relation Extraction

TL;DR: This paper proposes a novel convolutional network that incorporates lexical features for relation extraction, and compares the convolutional neural network (CNN) with state-of-the-art tree kernel approaches, including the Typed Dependency Path Kernel, the Shortest Dependency Path Kernel, and the Context-Sensitive Tree Kernel.
Abstract: Deep neural networks (DNNs) have been applied to many natural language processing tasks. Instead of relying on hand-crafted features, a DNN learns features automatically, which lets it fit different domains well. In this paper, we propose a novel convolutional network, incorporating lexical features, applied to relation extraction. Many current deep neural networks embed words through a word table, which neglects the semantic relationships among words; we therefore introduce a new coding method that encodes input words with a synonym dictionary to integrate semantic knowledge into the neural network. We compared our convolutional neural network (CNN) for relation extraction with state-of-the-art tree kernel approaches, including the Typed Dependency Path Kernel, the Shortest Dependency Path Kernel, and the Context-Sensitive Tree Kernel, achieving a 9% improvement and competitive performance on the ACE 2005 data set. We also compared synonym coding with one-hot coding, where our approach gained a 1.6% improvement. Moreover, we tried other coding methods, such as hypernym coding, and discuss the results.
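
The synonym-coding idea can be illustrated with a short sketch. The following Python snippet is an assumption-laden illustration, not the authors' implementation: it uses WordNet (via NLTK) as a stand-in for the synonym dictionary and maps each word to the id of a synonym set instead of a per-word table index.

```python
# Rough sketch of synonym coding (illustrative only, not the authors' code):
# map each word to the id of a synonym set rather than to a per-word index,
# so synonymous words can share an input code. WordNet via NLTK stands in for
# the paper's synonym dictionary; taking the first synset is a crude heuristic.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def synonym_code(word, code_table):
    """Return an integer code for the word's synonym set (fallback: the word itself)."""
    synsets = wn.synsets(word)
    key = synsets[0].name() if synsets else word.lower()
    return code_table.setdefault(key, len(code_table))

code_table = {}
sentence = ["the", "firm", "acquired", "a", "small", "company"]
codes = [synonym_code(w, code_table) for w in sentence]
print(list(zip(sentence, codes)))
```
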
Citations
Proceedings Article DOI
01 Jun 2015
TL;DR: This work introduces a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources.
Abstract: Up to now, relation extraction systems have made extensive use of features generated by linguistic analysis modules. Errors in these features lead to errors of relation detection and classification. In this work, we depart from these traditional approaches with complicated feature engineering by introducing a convolutional neural network for relation extraction that automatically learns features from sentences and minimizes the dependence on external toolkits and resources. Our model takes advantage of multiple window sizes for filters and pre-trained word embeddings as an initializer on a non-static architecture to improve the performance. We emphasize the relation extraction problem with an unbalanced corpus. The experimental results show that our system significantly outperforms not only the best baseline systems for relation extraction but also the state-of-the-art systems for relation classification.
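
As a rough PyTorch sketch of the architecture described above (word embeddings plus two position embeddings for the distances to the entity heads, parallel convolutions with several window sizes, and max-over-time pooling); the dimensions and window sizes below are illustrative, not the values reported in the paper:

```python
# Minimal sketch of a multi-window CNN for relation classification:
# word embeddings + two position embeddings (distance to each entity head),
# parallel convolutions with several window sizes, max-over-time pooling.
# All sizes here are illustrative assumptions.
import torch
import torch.nn as nn

class MultiWindowCNN(nn.Module):
    def __init__(self, vocab_size, n_positions, n_classes,
                 word_dim=100, pos_dim=5, n_filters=150, windows=(2, 3, 4, 5)):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb1 = nn.Embedding(n_positions, pos_dim)  # distance to entity 1 head
        self.pos_emb2 = nn.Embedding(n_positions, pos_dim)  # distance to entity 2 head
        in_dim = word_dim + 2 * pos_dim
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, n_filters, kernel_size=w, padding=w - 1) for w in windows
        )
        self.out = nn.Linear(n_filters * len(windows), n_classes)

    def forward(self, words, dist1, dist2):
        # words, dist1, dist2: (batch, seq_len) integer tensors
        x = torch.cat([self.word_emb(words), self.pos_emb1(dist1), self.pos_emb2(dist2)], dim=-1)
        x = x.transpose(1, 2)                                    # (batch, channels, seq_len)
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.out(torch.cat(pooled, dim=1))                # unnormalized class scores

model = MultiWindowCNN(vocab_size=10000, n_positions=200, n_classes=19)
scores = model(torch.randint(0, 10000, (2, 30)),
               torch.randint(0, 200, (2, 30)),
               torch.randint(0, 200, (2, 30)))
```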

483 citations


Cites background or methods from "Convolution Neural Network for Relation Extraction"

  • ...For relation classification and extraction, there are two very recent works on CNNs for relation classification (Liu et al., 2013) and (Zeng et al....

  • ...…mention into the representation, for each word x_i, its relative distances to the two entity heads i−i_1 and i−i_2 are also mapped into real-valued vectors d_i1 and d_i2 respectively using a position embedding table D (initialized randomly) (Collobert et al., 2011; Liu et al., 2013; Zeng et al., 2014)....

  • ...This demonstrates the advantages of the models with multiple window sizes over the single window size models in Liu et al. (2013) and Zeng et al. (2014)....

  • ...…embeddings, once initialized by some “universal” embeddings, are allowed to vary during the optimization process to reach an… [Footnote 2: The title of the paper (Liu et al., 2013) on relation extraction is misleading since the authors actually do relation classification, according to the experimental…]...

Proceedings Article DOI
01 Nov 2020
TL;DR: This article proposes an entity-masked contrastive pre-training framework for relation extraction that builds a deeper understanding of both textual context and type information while avoiding rote memorization of entities or the use of superficial cues in mentions.
Abstract: Neural models have achieved remarkable success on relation extraction (RE) benchmarks. However, there is no clear understanding of what information in text leads existing RE models to their decisions, or of how to further improve the performance of these models. To this end, we empirically study the effect of two main information sources in text: textual context and entity mentions (names). We find that (i) while context is the main source to support the predictions, RE models also heavily rely on the information from entity mentions, most of which is type information, and (ii) existing datasets may leak shallow heuristics via entity mentions and thus contribute to the high performance on RE benchmarks. Based on these analyses, we propose an entity-masked contrastive pre-training framework for RE to gain a deeper understanding of both textual context and type information while avoiding rote memorization of entities or use of superficial cues in mentions. We carry out extensive experiments to support our views, and show that our framework can improve the effectiveness and robustness of neural models in different RE scenarios. All the code and datasets are released at https://github.com/thunlp/RE-Context-or-Names.
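
The entity-masking step can be sketched in a few lines; the marker format below is an assumption made for illustration, and the contrastive pre-training objective itself lives in the authors' released code rather than here:

```python
# Illustrative sketch of entity masking for relation extraction: entity mentions
# are replaced by their type labels so a model must rely on context and type
# information rather than on memorized names. Marker format is an assumption.
def mask_entities(tokens, head_span, tail_span, head_type, tail_type):
    """Replace the head/tail mention spans (start, end) with single type markers."""
    spans = sorted([(head_span, f"[{head_type}]"), (tail_span, f"[{tail_type}]")],
                   key=lambda s: s[0][0], reverse=True)
    out = list(tokens)
    for (start, end), marker in spans:
        out[start:end] = [marker]
    return out

tokens = "Steve Jobs co-founded Apple in 1976 .".split()
print(mask_entities(tokens, (0, 2), (3, 4), "PERSON", "ORGANIZATION"))
# ['[PERSON]', 'co-founded', '[ORGANIZATION]', 'in', '1976', '.']
```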

112 citations

Posted Content
TL;DR: This review compares the contributions and pitfalls of the various DL models that have been used for Relation Extraction to help guide the path ahead.
Abstract: Relation Extraction is an important sub-task of Information Extraction which has the potential of employing deep learning (DL) models with the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.

84 citations


Cites background or methods from "Convolution Neural Network for Relation Extraction"

  • ...This work was one of the last works in supervised domain for relation extraction which built upon the works of Liu et al. (2013) and Zeng et al. (2014). The model completely gets rid of exterior lexical features to enrich the representation of the input sentence and lets the CNN learn the required features itself....

Proceedings Article DOI
Ziran Li, Ning Ding, Zhiyuan Liu, Hai-Tao Zheng, Ying Shen
01 Jul 2019
TL;DR: A multi-grained lattice framework for Chinese relation extraction is proposed, which incorporates word-level information into character sequence inputs so that segmentation errors can be avoided, and which models multiple senses of polysemous words with the help of external linguistic knowledge to alleviate polysemy ambiguity.
Abstract: Chinese relation extraction is conducted using neural networks with either character-based or word-based inputs, and most existing methods typically suffer from segmentation errors and ambiguity of polysemy. To address the issues, we propose a multi-grained lattice framework (MG lattice) for Chinese relation extraction to take advantage of multi-grained language information and external linguistic knowledge. In this framework, (1) we incorporate word-level information into character sequence inputs so that segmentation errors can be avoided. (2) We also model multiple senses of polysemous words with the help of external linguistic knowledge, so as to alleviate polysemy ambiguity. Experiments on three real-world datasets in distinct domains show consistent and significant superiority and robustness of our model, as compared with other baselines. We will release the source code of this paper in the future.
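
A minimal sketch of the lattice-construction step described above: every lexicon word matching a span of the character sequence is attached to the lattice, so no single segmentation is committed to. The lexicon and the span-enumeration code below are illustrative, not the authors' implementation:

```python
# Sketch of building a word lattice over a character sequence: every lexicon word
# that matches a character span is recorded, so the model can use word-level
# information without committing to one segmentation. Lexicon is illustrative.
def build_lattice(chars, lexicon, max_len=4):
    """Return (start, end, word) triples for every lexicon word found in chars."""
    matches = []
    for i in range(len(chars)):
        for j in range(i + 1, min(i + max_len, len(chars)) + 1):
            word = "".join(chars[i:j])
            if word in lexicon:
                matches.append((i, j, word))
    return matches

chars = list("南京市长江大桥")  # classic segmentation-ambiguity example
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
for start, end, word in build_lattice(chars, lexicon):
    print(start, end, word)
```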

65 citations


Cites background from "Convolution Neural Network for Relation Extraction"

  • ...Recent developments in deep learning have heightened the interest for neural relation extractions (NRE), which attempt to use neural networks to automatically learn semantic features (Liu et al., 2013; Zeng et al., 2014, 2015; Lin et al., 2016; Zhou et al., 2016; Jiang et al., 2016)....

  • ...As a pioneer, (Liu et al., 2013) proposed a simple CNN RE model and it is regarded as one seminal work that uses a neural network to automatically learn features....

Proceedings Article DOI
01 Jul 2020
TL;DR: Inspired by the mechanism of human long-term memory formation, EMAR is introduced, and it is shown that EMAR avoids catastrophically forgetting old relations and outperforms state-of-the-art continual learning models.
Abstract: Continual relation learning aims to continually train a model on new data to learn incessantly emerging novel relations while avoiding catastrophically forgetting old relations. Some pioneering work has proved that storing a handful of historical relation examples in episodic memory and replaying them in subsequent training is an effective solution for such a challenging problem. However, these memory-based methods usually suffer from overfitting the few memorized examples of old relations, which may gradually cause inevitable confusion among existing relations. Inspired by the mechanism in human long-term memory formation, we introduce episodic memory activation and reconsolidation (EMAR) to continual relation learning. Every time neural models are activated to learn both new and memorized data, EMAR utilizes relation prototypes for memory reconsolidation exercise to keep a stable understanding of old relations. The experimental results show that EMAR could get rid of catastrophically forgetting old relations and outperform the state-of-the-art continual learning models.
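
The prototype mechanism at the heart of EMAR can be sketched as follows; the encoder and memory contents are placeholders, so this only shows the data flow (prototypes as averaged encodings of memorized examples), not the training or reconsolidation procedure:

```python
# Sketch of the relation-prototype idea (illustrative; the "encoder" below is a
# random stand-in for a trained neural sentence encoder): the few memorized
# examples of each old relation are encoded and averaged into a prototype, and
# instances are matched to the nearest prototype.
import numpy as np

rng = np.random.default_rng(0)

def encode(example):
    # Placeholder for a neural encoder; returns a fixed-size sentence vector.
    return rng.standard_normal(64)

memory = {  # episodic memory: a handful of stored examples per old relation
    "founded_by": ["Apple was founded by Steve Jobs .",
                   "Microsoft was founded by Bill Gates ."],
    "capital_of": ["Paris is the capital of France ."],
}

prototypes = {rel: np.mean([encode(x) for x in examples], axis=0)
              for rel, examples in memory.items()}

def nearest_relation(example):
    vec = encode(example)
    return max(prototypes, key=lambda rel: float(vec @ prototypes[rel]))

print(nearest_relation("Berlin is the capital of Germany ."))
```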

63 citations


Cites methods from "Convolution Neural Network for Relation Extraction"

  • ...The conventional RE work, including both supervised RE models (Zelenko et al., 2003; Zhou et al., 2005; Gormley et al., 2015; Socher et al., 2012; Liu et al., 2013; Zeng et al., 2014; Nguyen and Grishman, 2015; dos Santos et al., 2015; Xu et al., 2015; Liu et al., 2015; Miwa and Bansal, 2016) and distantly supervised models (Bunescu and Mooney, 2007; Mintz et al....

References
Journal Article DOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
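
Since scikit-learn's SVC is built on LIBSVM, the issues listed above (multiclass classification, probability estimates, parameter selection) can be demonstrated briefly; the dataset and parameter grid here are only illustrative:

```python
# LIBSVM in practice: scikit-learn's SVC wraps LIBSVM, so multiclass
# classification, probability estimates, and parameter selection can be shown
# in a few lines on a toy dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(
    SVC(kernel="rbf", probability=True),        # probability=True enables Platt scaling
    param_grid={"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 0.01]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_)                      # selected hyperparameters
print(search.predict_proba(X[:3]))              # per-class probability estimates
```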

40,826 citations

Journal Article DOI
01 Sep 2000, Language
TL;DR: This book presents the WordNet lexical database (nouns, modifiers, and a semantic network of English verbs) together with applications of WordNet such as building semantic concordances.
Abstract: Part 1, The lexical database: nouns in WordNet, George A. Miller; modifiers in WordNet, Katherine J. Miller; a semantic network of English verbs, Christiane Fellbaum; design and implementation of the WordNet lexical database and searching software, Randee I. Tengi. Part 2: automated discovery of WordNet relations, Marti A. Hearst; representing verb alterations in WordNet, Karen T. Kohl et al.; the formalization of WordNet by methods of relational concept analysis, Uta E. Priss. Part 3, Applications of WordNet: building semantic concordances, Shari Landes et al.; performance and confidence in a semantic annotation task, Christiane Fellbaum et al.; WordNet and class-based probabilities, Philip Resnik; combining local context and WordNet similarity for word sense identification, Claudia Leacock and Martin Chodorow; using WordNet for text retrieval, Ellen M. Voorhees; lexical chains as representations of context for the detection and correction of malapropisms, Graeme Hirst and David St-Onge; temporal indexing through lexical chaining, Reem Al-Halimi and Rick Kazman; COLOR-X - using knowledge from WordNet for conceptual modelling, J.F.M. Burg and R.P. van de Riet; knowledge processing on an extended WordNet, Sanda M. Harabagiu and Dan I. Moldovan; appendix - obtaining and using WordNet.
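
WordNet is the kind of lexical resource the main paper's synonym and hypernym coding can draw on; a short NLTK example of the relevant lookups (the values noted in comments are typical results for "car"):

```python
# Looking up WordNet synonyms and hypernyms with NLTK: the lexical knowledge
# that synonym/hypernym coding schemes build on.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

synsets = wn.synsets("car")
first = synsets[0]                              # e.g. Synset('car.n.01')
print(first.lemma_names())                      # synonyms in this sense, e.g. car, auto, automobile, ...
print([h.name() for h in first.hypernyms()])    # more general concept, e.g. motor_vehicle.n.01
```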

13,049 citations

Journal Article DOI
TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words in a language. This is intrinsically difficult because of the curse of dimensionality: a word sequence on which the model will be tested is likely to be different from all the word sequences seen during training. Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set. We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences. The model learns simultaneously (1) a distributed representation for each word along with (2) the probability function for word sequences, expressed in terms of these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar (in the sense of having a nearby representation) to words forming an already seen sentence. Training such large models (with millions of parameters) within a reasonable time is itself a significant challenge. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach significantly improves on state-of-the-art n-gram models, and that it allows the model to take advantage of longer contexts.
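
A minimal PyTorch sketch of the model this abstract describes (shared distributed word representations, concatenated context, one hidden layer, and a softmax over the vocabulary); all sizes are illustrative:

```python
# Minimal sketch of a neural probabilistic language model:
# context words -> shared embedding -> concatenate -> hidden layer -> softmax
# over the vocabulary (the softmax is applied inside the cross-entropy loss).
import torch
import torch.nn as nn

class NPLM(nn.Module):
    def __init__(self, vocab_size=10000, context=4, emb_dim=64, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)        # distributed word representations
        self.hidden = nn.Linear(context * emb_dim, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, context_ids):                         # (batch, context)
        x = self.emb(context_ids).flatten(start_dim=1)      # concatenate context embeddings
        return self.out(torch.tanh(self.hidden(x)))         # logits over the next word

model = NPLM()
logits = model(torch.randint(0, 10000, (8, 4)))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 10000, (8,)))
```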

6,832 citations

Proceedings Article DOI
05 Jul 2008
TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
Abstract: We describe a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense (grammatically and semantically) using a language model. The entire network is trained jointly on all these tasks using weight-sharing, an instance of multitask learning. All the tasks use labeled data except the language model which is learnt from unlabeled text and represents a novel form of semi-supervised learning for the shared tasks. We show how both multitask learning and semi-supervised learning improve the generalization of the shared tasks, resulting in state-of-the-art performance.
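
The weight-sharing idea can be sketched as one convolutional sentence encoder feeding several task-specific heads; the tag-set sizes and task list below are illustrative, not those of the original system:

```python
# Sketch of the shared-encoder, multitask idea: one convolutional sentence
# encoder whose weights are shared across tasks, with per-task output heads.
import torch
import torch.nn as nn

class SharedConvEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=50, channels=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, channels, kernel_size=3, padding=1)

    def forward(self, token_ids):                           # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)             # (batch, emb_dim, seq_len)
        return torch.relu(self.conv(x)).transpose(1, 2)     # (batch, seq_len, channels)

encoder = SharedConvEncoder()
heads = nn.ModuleDict({                                     # one classifier per task
    "pos": nn.Linear(128, 45),
    "chunk": nn.Linear(128, 23),
    "ner": nn.Linear(128, 9),
})

tokens = torch.randint(0, 10000, (4, 20))
features = encoder(tokens)                                  # shared features
per_task_logits = {task: head(features) for task, head in heads.items()}
```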

5,759 citations