Distant supervision for relation extraction without labeled data

doi:10.3115/1690219.1690287

Home
/
Papers
/
Distant supervision for relation extraction without labeled data

Proceedings Article•DOI•

Distant supervision for relation extraction without labeled data

Mike D. Mintz¹, Steven Bills¹, Rion Snow¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

02 Aug 2009-pp 1003-1011

TL;DR: This work investigates an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size.

read less

Abstract: Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relations, to provide distant supervision. For each pair of entities that appears in some Freebase relation, we find all sentences containing those entities in a large unlabeled corpus and extract textual features to train a relation classifier. Our algorithm combines the advantages of supervised IE (combining 400,000 noisy pattern features in a probabilistic classifier) and unsupervised IE (extracting large numbers of relations from large corpora of any domain). Our model is able to extract 10,000 instances of 102 relations at a precision of 67.6%. We also analyze feature performance, showing that syntactic parse features are particularly helpful for relations that are ambiguous or lexically distant in their expression.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Active Learning Literature Survey

[...]

Burr Settles

01 Jan 2009

TL;DR: This report provides a general introduction to active learning and a survey of the literature, including a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date.

...read moreread less

Abstract: The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. An active learner may pose queries, usually in the form of unlabeled data instances to be labeled by an oracle (e.g., a human annotator). Active learning is well-motivated in many modern machine learning problems, where unlabeled data may be abundant or easily obtained, but labels are difficult, time-consuming, or expensive to obtain. This report provides a general introduction to active learning and a survey of the literature. This includes a discussion of the scenarios in which queries can be formulated, and an overview of the query strategy frameworks proposed in the literature to date. An analysis of the empirical and theoretical evidence for successful active learning, a summary of problem setting variants and practical issues, and a discussion of related topics in machine learning research are also presented.

...read moreread less

5,227 citations

Cites background from "Distant supervision for relation ex..."

..., 2008) and also to evaluate learning algorithms on data for which no gold-standard labelings exist (Mintz et al., 2009; Carlson et al., 2010)....
[...]
...Such approaches have been used to produce gold-standard quality training sets (Snow et al., 2008) and also to evaluate learning algorithms on data for which no gold-standard labelings exist (Mintz et al., 2009; Carlson et al., 2010)....
[...]

Proceedings Article•

Knowledge graph embedding by translating on hyperplanes

[...]

Zhen Wang¹, Jianwen Zhang², Jianlin Feng¹, Zheng Chen²•Institutions (2)

Sun Yat-sen University¹, Microsoft²

27 Jul 2014

TL;DR: This paper proposes TransH which models a relation as a hyperplane together with a translation operation on it and can well preserve the above mapping properties of relations with almost the same model complexity of TransE.

...read moreread less

Abstract: We deal with embedding a large scale knowledge graph composed of entities and relations into a continuous vector space. TransE is a promising method proposed recently, which is very efficient while achieving state-of-the-art predictive performance. We discuss some mapping properties of relations which should be considered in embedding, such as reflexive, one-to-many, many-to-one, and many-to-many. We note that TransE does not do well in dealing with these properties. Some complex models are capable of preserving these mapping properties but sacrifice efficiency in the process. To make a good trade-off between model capacity and efficiency, in this paper we propose TransH which models a relation as a hyperplane together with a translation operation on it. In this way, we can well preserve the above mapping properties of relations with almost the same model complexity of TransE. Additionally, as a practical knowledge graph is often far from completed, how to construct negative examples to reduce false negative labels in training is very important. Utilizing the one-to-many/many-to-one mapping property of a relation, we propose a simple trick to reduce the possibility of false negative labeling. We conduct extensive experiments on link prediction, triplet classification and fact extraction on benchmark datasets like WordNet and Freebase. Experiments show TransH delivers significant improvements over TransE on predictive accuracy with comparable capability to scale up.

...read moreread less

2,835 citations

Cites methods from "Distant supervision for relation ex..."

...Most existing extracting methods (Mintz et al. 2009; Riedel, Yao, and McCallum 2010; Hoffmann et al. 2011; Surdeanu et al. 2012) distantly collect evidences from an external text corpus for a candidate fact, ignoring the capability of the knowledge graph itself to reason the new fact....
[...]

Proceedings Article•

Learning entity and relation embeddings for knowledge graph completion

[...]

Yankai Lin¹, Zhiyuan Liu¹, Maosong Sun¹, Yang Liu², Xuan Zhu² - Show less +1 more•Institutions (2)

Tsinghua University¹, Samsung²

25 Jan 2015

TL;DR: TransR is proposed to build entity and relation embeddings in separate entity space and relation spaces to build translations between projected entities and to evaluate the models on three tasks including link prediction, triple classification and relational fact extraction.

...read moreread less

Abstract: Knowledge graph completion aims to perform link prediction between entities. In this paper, we consider the approach of knowledge graph embeddings. Recently, models such as TransE and TransH build entity and relation embeddings by regarding a relation as translation from head entity to tail entity. We note that these models simply put both entities and relations within the same semantic space. In fact, an entity may have multiple aspects and various relations may focus on different aspects of entities, which makes a common space insufficient for modeling. In this paper, we propose TransR to build entity and relation embeddings in separate entity space and relation spaces. Afterwards, we learn embeddings by first projecting entities from entity space to corresponding relation space and then building translations between projected entities. In experiments, we evaluate our models on three tasks including link prediction, triple classification and relational fact extraction. Experimental results show significant and consistent improvements compared to state-of-the-art baselines including TransE and TransH. The source code of this paper can be obtained from https://github.com/mrlyk423/relation_extraction.

...read moreread less

2,823 citations

Cites methods from "Distant supervision for relation ex..."

...In experiments we will also compare with RESCAL, a collective matrix factorization model presented in (Nickel, Tresp, and Kriegel 2011; 2012)....
[...]

Journal Article•DOI•

Knowledge Graph Embedding: A Survey of Approaches and Applications

[...]

Quan Wang¹, Zhendong Mao¹, Bin Wang¹, Li Guo¹•Institutions (1)

Chinese Academy of Sciences¹

01 Dec 2017-IEEE Transactions on Knowledge and Data Engineering

TL;DR: This article provides a systematic review of existing techniques of Knowledge graph embedding, including not only the state-of-the-arts but also those with latest trends, based on the type of information used in the embedding task.

...read moreread less

Abstract: Knowledge graph (KG) embedding is to embed components of a KG including entities and relations into continuous vector spaces, so as to simplify the manipulation while preserving the inherent structure of the KG. It can benefit a variety of downstream tasks such as KG completion and relation extraction, and hence has quickly gained massive attention. In this article, we provide a systematic review of existing techniques, including not only the state-of-the-arts but also those with latest trends. Particularly, we make the review based on the type of information used in the embedding task. Techniques that conduct embedding using only facts observed in the KG are first introduced. We describe the overall framework, specific model design, typical training procedures, as well as pros and cons of such techniques. After that, we discuss techniques that further incorporate additional information besides facts. We focus specifically on the use of entity types, relation paths, textual descriptions, and logical rules. Finally, we briefly introduce how KG embedding can be applied to and benefit a wide variety of downstream tasks such as KG completion, relation extraction, question answering, and so forth.

...read moreread less

1,905 citations

Cites background from "Distant supervision for relation ex..."

...Much research has tried to leverage KGs for this task, but usually as distant supervision to automatically generate labeled data [9], [125], [126], [127]....
[...]

Proceedings Article•DOI•

Knowledge vault: a web-scale approach to probabilistic knowledge fusion

[...]

Xin Dong¹, Evgeniy Gabrilovich¹, Geremy Heitz¹, Wilko Horn¹, Ni Lao¹, Kevin Murphy¹, Thomas Strohmann¹, Shaohua Sun¹, Wei Zhang¹ - Show less +5 more•Institutions (1)

Google¹

24 Aug 2014

TL;DR: The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories that computes calibrated probabilities of fact correctness.

...read moreread less

Abstract: Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.

...read moreread less

1,657 citations

Cites methods from "Distant supervision for relation ex..."

...Next, we train relation extractors using distant supervision [29]....
[...]
...The features that we use are similar to those described in [29]....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Freebase: a collaboratively created graph database for structuring human knowledge

[...]

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, Jamie Taylor - Show less +1 more

09 Jun 2008

TL;DR: MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.

...read moreread less

Abstract: Freebase is a practical, scalable tuple database used to structure general human knowledge. The data in Freebase is collaboratively created, structured, and maintained. Freebase currently contains more than 125,000,000 tuples, more than 4000 types, and more than 7000 properties. Public read/write access to Freebase is allowed through an HTTP-based graph-query API using the Metaweb Query Language (MQL) as a data query and manipulation language. MQL provides an easy-to-use object-oriented interface to the tuple data in Freebase and is designed to facilitate the creation of collaborative, Web-based data-oriented applications.

...read moreread less

4,813 citations

"Distant supervision for relation ex..." refers methods in this paper

...Our algorithm uses Freebase (Bollacker et al., 2008), a large semantic database, to provide distant supervision for relation extraction....
[...]

Proceedings Article•DOI•

Automatic acquisition of hyponyms from large text corpora

[...]

Marti A. Hearst¹•Institutions (1)

University of California, Berkeley¹

23 Aug 1992

TL;DR: A set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest are identified.

...read moreread less

Abstract: We describe a method for the automatic acquisition of the hyponymy lexical relation from unrestricted text. Two goals motivate the approach: (i) avoidance of the need for pre-encoded knowledge and (ii) applicability across a wide range of text. We identify a set of lexico-syntactic patterns that are easily recognizable, that occur frequently and across text genre boundaries, and that indisputably indicate the lexical relation of interest. We describe a method for discovering these patterns and suggest that other lexical relations will also be acquirable in this way. A subset of the acquisition algorithm is implemented and the results are used to augment and critique the structure of a large hand-built thesaurus. Extensions and applications to areas such as information retrieval are suggested.

...read moreread less

3,550 citations

"Distant supervision for relation ex..." refers background in this paper

...…a wide variety of lexical, syntactic, and semantic features, and use supervised classifiers to label the relation mention holding between a given pair of entities in a test set sentence, optionally combining relation men- tions (Zhou et al., 2005; Zhou et al., 2007; Surdeanu and Ciaramita, 2007)....
[...]

Proceedings Article•DOI•

Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling

[...]

Jenny Rose Finkel¹, Trond Grenager¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

25 Jun 2005

TL;DR: By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference.

...read moreread less

Abstract: Most current statistical natural language processing models use only local features so as to permit dynamic programming in inference, but this makes them unable to fully account for the long distance structure that is prevalent in language use. We show how to solve this dilemma with Gibbs sampling, a simple Monte Carlo method used to perform approximate inference in factored probabilistic models. By using simulated annealing in place of Viterbi decoding in sequence models such as HMMs, CMMs, and CRFs, it is possible to incorporate non-local structure while preserving tractable inference. We use this technique to augment an existing CRF-based information extraction system with long-distance dependency models, enforcing label consistency and extraction template consistency constraints. This technique results in an error reduction of up to 9% over state-of-the-art systems on two established information extraction tasks.

...read moreread less

3,209 citations

"Distant supervision for relation ex..." refers methods in this paper

...We perform named entity tagging using the Stanford four-class named entity tagger (Finkel et al., 2005)....
[...]

Proceedings Article•DOI•

Cheap and Fast -- But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks

[...]

Rion Snow¹, Brendan O'Connor, Dan Jurafsky¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

25 Oct 2008

TL;DR: This work explores the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web, and proposes a technique for bias correction that significantly improves annotation quality on two tasks.

...read moreread less

Abstract: Human linguistic annotation is crucial for many natural language processing tasks but can be expensive and time-consuming. We explore the use of Amazon's Mechanical Turk system, a significantly cheaper and faster method for collecting annotations from a broad base of paid non-expert contributors over the Web. We investigate five tasks: affect recognition, word similarity, recognizing textual entailment, event temporal ordering, and word sense disambiguation. For all five, we show high agreement between Mechanical Turk non-expert annotations and existing gold standard labels provided by expert labelers. For the task of affect recognition, we also show that using non-expert labels for training machine learning algorithms can be as effective as using gold standard annotations from experts. We propose a technique for bias correction that significantly improves annotation quality on two tasks. We conclude that many large labeling tasks can be effectively designed and carried out in this method at a fraction of the usual expense.

...read moreread less

2,237 citations

"Distant supervision for relation ex..." refers methods in this paper

...Human evaluation was performed by evaluators on Amazon’s Mechanical Turk service, shown to be effective for natural language annotation in Snow et al. (2008)....
[...]

Proceedings Article•

Open information extraction from the web

[...]

Michele Banko¹, Michael Cafarella¹, Stephen Soderland¹, Matt Broadhead¹, Oren Etzioni¹ - Show less +1 more•Institutions (1)

University of Washington¹

06 Jan 2007

TL;DR: Open Information Extraction (OIE) as mentioned in this paper is a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input.

...read moreread less

Abstract: Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.

...read moreread less

1,574 citations