Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.

Papers published on a yearly basis (chart omitted; years shown: 1970, 1972, and 1976 to 2023)

Papers

Journal Article

Two-Stage Topic Extraction Model for Bibliometric Data Analysis Based on Word Embeddings and Clustering

Aytuğ Onan¹

Izmir Kâtip Çelebi University¹

07 Oct 2019-IEEE Access

TL;DR: The empirical analysis reveals that the ensemble word embedding scheme yields better predictive performance than the baseline word vectors for topic extraction, and that the ensemble clustering framework outperforms the baseline clustering methods.

Abstract: Topic extraction is an essential task in bibliometric data analysis, data mining and knowledge discovery, which seeks to identify significant topics from text collections. Conventional topic extraction schemes require human intervention and comprehensive pre-processing to represent text collections appropriately. In this paper, we present a two-stage framework for topic extraction from scientific literature, in which word embedding schemes are used in conjunction with cluster analysis. To extract significant topics from text collections, we propose an improved word embedding scheme, which incorporates word vectors obtained by the word2vec, POS2vec, word-position2vec and LDA2vec schemes. In the clustering phase, we present an improved clustering ensemble framework, which combines conventional clustering methods (i.e., k-means, k-modes, k-means++, self-organizing maps and the DIANA algorithm) by means of iterative voting consensus. In the empirical analysis, we analyze a corpus containing 160,424 abstracts of articles from various disciplines, including agricultural engineering, economics, engineering and computer science. The performance of the proposed scheme is compared to conventional baseline clustering methods (such as k-means, k-modes, and k-means++), LDA-based topic modelling and conventional word embedding schemes. The empirical analysis reveals that the ensemble word embedding scheme yields better predictive performance than the baseline word vectors for topic extraction, and that the ensemble clustering framework outperforms the baseline clustering methods. The results obtained by the proposed framework show an improvement in the Jaccard coefficient, the Fowlkes-Mallows measure and the F1 score.
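The abstract names three evaluation measures without defining them. As a rough illustration only, the sketch below computes the standard pair-counting versions of the Jaccard coefficient, the Fowlkes-Mallows index and a pairwise F1 score for two clusterings of the same items. The function names and the toy labels are assumptions made for this example, and the paper may define its measures (F1 in particular) differently.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_true, labels_pred):
    """Count item pairs grouped together in both, only one, or neither labeling.

    Returns (tp, fp, fn): pairs together in both labelings, together only in
    the predicted labeling, and together only in the reference labeling.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
    return tp, fp, fn


def jaccard(tp, fp, fn):
    """Pair-counting Jaccard coefficient: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def fowlkes_mallows(tp, fp, fn):
    """Geometric mean of pairwise precision and recall."""
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


def pairwise_f1(tp, fp, fn):
    """Harmonic mean of pairwise precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (hypothetical): six documents, two reference topics.
reference = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]
tp, fp, fn = pair_counts(reference, predicted)
scores = {
    "jaccard": jaccard(tp, fp, fn),
    "fowlkes_mallows": fowlkes_mallows(tp, fp, fn),
    "pairwise_f1": pairwise_f1(tp, fp, fn),
}
```

All three scores depend only on the pair counts, so they are unaffected by how the cluster labels happen to be numbered.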

97 citations

Book Chapter

Efficient mining of association rules based on formal concept analysis

Lotfi Lakhal, Gerd Stumme¹

University of Kassel¹

01 Jan 2005

TL;DR: This survey first introduces some basic ideas of the connection between association rule mining and Formal Concept Analysis (FCA) by way of a specific algorithm, Titanic, and shows how FCA helps to reduce the number of resulting rules without loss of information, before giving a general overview of the history and state of the art of applying FCA to association rule mining.

Abstract: Association rules are a popular knowledge discovery technique for warehouse basket analysis. They indicate which items of the warehouse are frequently bought together. The problem of association rule mining was first stated in 1993. Five years later, several research groups discovered that this problem has a strong connection to Formal Concept Analysis (FCA). In this survey, we first introduce some basic ideas of this connection by way of a specific algorithm, Titanic, and show how FCA helps to reduce the number of resulting rules without loss of information, before giving a general overview of the history and state of the art of applying FCA to association rule mining.
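To make the link between association rules and FCA more concrete, here is a minimal sketch of rule support, rule confidence, and the closure of an itemset (the intersection of all baskets that contain it). It is not the Titanic algorithm described in the survey. The baskets and function names are invented for illustration.

```python
def support(itemset, baskets):
    """Fraction of baskets that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    antecedent = frozenset(antecedent)
    antecedent_support = support(antecedent, baskets)
    if antecedent_support == 0:
        return 0.0
    return support(antecedent | frozenset(consequent), baskets) / antecedent_support


def closure(itemset, baskets):
    """Largest itemset shared by all baskets that contain the given itemset."""
    itemset = frozenset(itemset)
    covering = [basket for basket in baskets if itemset <= basket]
    if not covering:
        # No basket contains the itemset: by convention its closure is every item.
        return frozenset().union(*baskets)
    return frozenset.intersection(*covering)


# Hypothetical toy data: four baskets.
baskets = [
    frozenset({"bread", "butter"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "milk"}),
    frozenset({"milk"}),
]

bread_support = support({"bread"}, baskets)                # 3 of 4 baskets
bread_to_butter = confidence({"bread"}, {"butter"}, baskets)
butter_closure = closure({"butter"}, baskets)              # items always bought with butter
butter_to_bread = confidence({"butter"}, {"bread"}, baskets)
```

In this toy data the closure of {butter} is {bread, butter}, so the rule butter → bread holds in every basket that contains butter. Rules of this exact kind can be derived from closed itemsets alone, which is one way to read the abstract's claim that FCA reduces the number of rules without loss of information.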

97 citations

Journal Article

Principles of human-computer collaboration for knowledge discovery in science

Raúl E. Valdés-Pérez¹

Carnegie Mellon University¹

01 Feb 1999-Artificial Intelligence

TL;DR: This work describes discovery in science as the generation of novel, interesting, plausible, and intelligible knowledge about the objects of study, and analyzes four current machine discovery programs in chemistry, medicine, mathematics, and linguistics according to how their design, or the circumstances of their application, heighten the chances of finding knowledge that has all four properties.

97 citations

Book Chapter

Inductive Logic Programming for Knowledge Discovery in Databases

Stefan Wrobel¹

Otto-von-Guericke University Magdeburg¹

01 Jan 2001

TL;DR: This chapter provides a tutorial-style introduction to relational data mining, beginning with a detailed explanation of why and where one might be interested in relational analysis, followed by the basics of Inductive Logic Programming (ILP), the scientific field where relational methods are primarily studied.

Abstract: Relational data mining algorithms and systems are capable of directly dealing with multiple tables or relations as they are found in today’s relational databases. This reduces the need for manual preprocessing and allows problems to be treated that cannot be handled easily with standard single-table methods. This paper provides a tutorial-style introduction to the topic, beginning with a detailed explanation of why and where one might be interested in relational analysis. We then present the basics of Inductive Logic Programming (ILP), the scientific field where relational methods are primarily studied. After illustrating the workings of MiDOS, a relational method for subgroup discovery, in more detail, we show how to use relational methods in one of the current data mining systems.

96 citations

Book Chapter

The Microsoft Academic Knowledge Graph: A Linked Data Source with 8 Billion Triples of Scholarly Data

Michael Färber¹

Karlsruhe Institute of Technology¹

26 Oct 2019

TL;DR: The Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study, is presented.

Abstract: In this paper, we present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is licensed under the Open Data Commons Attribution License (ODC-By). By providing the data as RDF dump files as well as a data source in the Linked Open Data cloud with resolvable URIs and links to other data sources, we bring a vast amount of scholarly data to the Web of Data. Furthermore, we provide entity embeddings for all 210 million represented publications. We facilitate a number of use case scenarios, particularly in the field of digital libraries, such as (1) entity-centric exploration of papers, researchers, affiliations, etc.; (2) data integration tasks using RDF as a common data model and links to other data sources; and (3) data analysis and knowledge discovery of scholarly data.
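For readers unfamiliar with the triple data model mentioned in the abstract, the following sketch shows how subject-predicate-object triples support entity-centric exploration through simple pattern matching. Every identifier in it (the `ex:` names, the predicates, the helper functions) is hypothetical and does not reflect the actual MAKG schema or URIs. A real deployment would use an RDF store and SPARQL rather than a Python list.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy graph; these names are NOT taken from the MAKG.
TRIPLES = [
    ("ex:paper1", "rdf:type", "ex:Paper"),
    ("ex:paper1", "ex:creator", "ex:author1"),
    ("ex:paper2", "rdf:type", "ex:Paper"),
    ("ex:paper2", "ex:creator", "ex:author1"),
    ("ex:author1", "ex:memberOf", "ex:institution1"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def papers_by_institution(triples: Iterable[Triple], institution: str) -> list:
    """Join two patterns: institution -> its authors -> their papers."""
    triples = list(triples)
    authors = {s for s, _, _ in match(triples, p="ex:memberOf", o=institution)}
    return sorted(s for s, _, o in match(triples, p="ex:creator") if o in authors)


papers = papers_by_institution(TRIPLES, "ex:institution1")
```

The two-step join in `papers_by_institution` is the kind of traversal that a SPARQL query with two triple patterns would express declaratively.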

96 citations

Performance

Metrics

Papers: 20,644
Citations: 453,302

No. of papers in the topic in previous years
Year	Papers
2023	120
2022	285
2021	506
2020	660
2019	740
2018	683
