Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.

Papers published on a yearly basis (chart; years shown: 1970, 1972, and 1976 through 2023)

Papers

Proceedings Article

Mining associations in text in the presence of background knowledge

Ronen Feldman¹, Haym Hirsh²•Institutions (2)

Bar-Ilan University¹, Rutgers University²

02 Aug 1996

TL;DR: FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process.

Abstract: This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) amongst keywords labeling the items in a collection of textual documents. In addition, FACT is able to use background knowledge about the keywords labeling the documents in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery query language, FACT presents the user with a simple-to-use graphical interface to the query language, with the language providing a well-defined semantics for the discovery actions performed by a user through the interface.

131 citations
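
The keyword co-occurrence idea in the FACT abstract above can be illustrated with a minimal Python sketch. It is not the FACT system or its query language: the function name `mine_associations`, the thresholds, the `allowed_left` constraint, and the toy documents are all assumptions made for illustration. Background knowledge is modeled here, very loosely, as a category of keywords that restricts which keywords may appear on the left-hand side of a rule.

```python
from collections import Counter
from itertools import combinations


def mine_associations(docs, min_support=2, min_confidence=0.5, allowed_left=None):
    """Find pairwise keyword associations lhs -> rhs.

    docs: list of keyword sets, one set per document (hypothetical input).
    allowed_left: optional set of keywords permitted on the left-hand side,
    standing in for a background-knowledge constraint.
    Returns a sorted list of (lhs, rhs, support, confidence) tuples.
    """
    single = Counter()  # number of documents containing each keyword
    pair = Counter()    # number of documents containing each keyword pair
    for keywords in docs:
        single.update(keywords)
        pair.update(combinations(sorted(keywords), 2))

    rules = []
    for (a, b), support in pair.items():
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Apply the constraint before computing confidence.
            if allowed_left is not None and lhs not in allowed_left:
                continue
            confidence = support / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy collection: each document is labeled with a set of keywords.
docs = [
    {"iran", "oil", "usa"},
    {"iran", "oil"},
    {"usa", "trade"},
    {"iran", "usa"},
]
# Hypothetical background knowledge: which keywords are countries.
background = {"country": {"iran", "usa"}}

rules = mine_associations(
    docs, min_support=2, min_confidence=0.6, allowed_left=background["country"]
)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support}  confidence={confidence:.2f}")
```

Filtering on the constraint inside the search loop, rather than after all rules are generated, loosely mirrors the abstract's point that background-knowledge constraints can be exploited during the search for results.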

Journal Article

Protein family classification and functional annotation

Cathy H. Wu¹, Hongzhan Huang¹, Lai-Su L. Yeh¹, Winona C. Barker¹•Institutions (1)

Georgetown University Medical Center¹

01 Feb 2003-Computational Biology and Chemistry

TL;DR: This paper describes an approach to protein functional annotation with case studies, examines common identification errors, and illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.

131 citations

Book

Handbook of Research on Text and Web Mining Technologies

Min Song, Yi-Fang Brook Wu

30 Sep 2008

TL;DR: This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.

Abstract: The massive daily overflow of electronic data to information seekers creates the need for better ways to digest and organize this information to make it understandable and useful. Text mining, a variation of data mining, extracts desired information from large, unstructured text collections stored in electronic forms. The Handbook of Research on Text and Web Mining Technologies is the first comprehensive reference to the state of research in the field of text mining, serving a pivotal role in educating practitioners in the field. This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.

130 citations

Journal Article

Automatic knowledge extraction from documents

James Fan¹, Aditya Kalyanpur¹, David C. Gondek¹, David A. Ferrucci¹•Institutions (1)

IBM¹

01 May 2012 - IBM Journal of Research and Development

TL;DR: This paper describes in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge.

Abstract: Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watson™. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is that shallow syntactic knowledge and its implied semantics can be easily acquired and can be used in many areas of a question-answering system. We take a two-stage approach to extract the syntactic knowledge and implied semantics. First, shallow knowledge from large collections of documents is automatically extracted. Second, additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge. In this paper, we describe in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics. We also briefly discuss the various ways extracted knowledge is used throughout the IBM DeepQA system.

130 citations
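
The two-stage idea in the abstract above (extract shallow knowledge first, then infer additional semantics from aggregate statistics) can be sketched in Python. This is only an illustration under stated assumptions, not the DeepQA implementation: the (subject, verb, object) tuples are hypothetical and stand in for the output of a first-stage extractor, and the helper names `aggregate_frames` and `object_distribution` are invented here.

```python
from collections import Counter, defaultdict


def aggregate_frames(frames):
    """Stage 2a: count how often each object appears with each verb.

    frames: iterable of (subject, verb, object) tuples, assumed to come
    from some earlier shallow-extraction step over a corpus.
    """
    counts = defaultdict(Counter)
    for _subject, verb, obj in frames:
        counts[verb][obj] += 1
    return counts


def object_distribution(counts, verb):
    """Stage 2b: relative frequency of each object given the verb."""
    verb_counts = counts.get(verb, Counter())
    total = sum(verb_counts.values())
    return {obj: n / total for obj, n in verb_counts.items()}


# Hypothetical first-stage output.
frames = [
    ("Einstein", "won", "Nobel Prize"),
    ("Curie", "won", "Nobel Prize"),
    ("Federer", "won", "Wimbledon"),
    ("Curie", "discovered", "radium"),
]

counts = aggregate_frames(frames)
dist = object_distribution(counts, "won")
for obj, p in sorted(dist.items()):
    print(f"P(object={obj!r} | verb='won') = {p:.2f}")
```

The point of the sketch is that no single extracted tuple carries much meaning, while the aggregated counts start to suggest what kinds of things typically fill a given slot.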

Journal Article

Anonymity preserving pattern discovery

Maurizio Atzori¹, Francesco Bonchi², Fosca Giannotti², Dino Pedreschi¹•Institutions (2)

University of Pisa¹, Istituto di Scienza e Tecnologie dell'Informazione²

01 Jul 2008

TL;DR: By shifting the concept of k-anonymity from the source data to the extracted patterns, this paper formally characterizes the notion of a threat to anonymity in the context of pattern discovery, and provides a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns.

Abstract: It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case of the minimum support threshold in frequent pattern mining. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context of pattern discovery, and provide a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that allows the disclosure of the extracted knowledge while protecting the anonymity of the individuals in the source database. Moreover, in order to handle the cases where the threats to anonymity cannot be avoided, we study how to eliminate such threats by means of pattern (not data!) distortion performed in a controlled way.

130 citations
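
To make the claim in the abstract above concrete, here is a small Python sketch of how published itemset supports might leak a group smaller than k. It is a simplified reading, not the authors' algorithm: the function names, the threshold `k`, and the toy supports are assumptions. The idea is that from the supports of itemsets one can derive, by inclusion-exclusion, how many records contain some items and lack others, and a derived count strictly between 0 and k points at fewer than k individuals.

```python
from itertools import combinations


def derived_support(supports, include, exclude):
    """Count records containing every item in `include` and none in `exclude`.

    Uses inclusion-exclusion over the published supports. Assumes `supports`
    contains every itemset between `include` and `include | exclude`, which
    holds for a downward-closed set of frequent itemsets.
    """
    total = 0
    exclude = sorted(exclude)
    for r in range(len(exclude) + 1):
        for extra in combinations(exclude, r):
            itemset = frozenset(include) | frozenset(extra)
            total += (-1) ** r * supports[itemset]
    return total


def find_threats(supports, k):
    """Return (include, exclude, count) triples whose derived count is in (0, k)."""
    threats = []
    itemsets = list(supports)
    for small in itemsets:
        for large in itemsets:
            if small < large:  # proper subset
                n = derived_support(supports, small, large - small)
                if 0 < n < k:
                    threats.append((set(small), set(large - small), n))
    return threats


# Toy published supports: every itemset looks safely frequent on its own.
supports = {
    frozenset({"a"}): 10,
    frozenset({"b"}): 9,
    frozenset({"a", "b"}): 9,
}

threats = find_threats(supports, k=3)
for include, exclude, n in threats:
    print(f"records with {sorted(include)} but not {sorted(exclude)}: {n}")
```

In this toy case all three supports are well above k = 3, yet subtracting them shows that exactly one record contains "a" without "b", which is the kind of inference the abstract warns about.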

Performance metrics

Papers: 20,644
Citations: 453,302

No. of papers in the topic in previous years
Year	Papers
2023	120
2022	285
2021	506
2020	660
2019	740
2018	683
