scispace - formally typeset
Search or ask a question
Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over the lifetime, 20251 publications have been published within this topic receiving 413401 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: The empirical analysis reveals that ensemble word embedding scheme yields better predictive performance compared to the baseline word vectors for topic extraction, and ensemble clustering framework outperforms the baseline clustering methods.
Abstract: Topic extraction is an essential task in bibliometric data analysis, data mining and knowledge discovery, which seeks to identify significant topics from text collections. The conventional topic extraction schemes require human intervention and involve also comprehensive pre-processing tasks to represent text collections in an appropriate way. In this paper, we present a two-stage framework for topic extraction from scientific literature. The presented scheme employs a two-staged procedure, where word embedding schemes have been utilized in conjunction with cluster analysis. To extract significant topics from text collections, we propose an improved word embedding scheme, which incorporates word vectors obtained by word2vec, POS2vec, word-position2vec and LDA2vec schemes. In the clustering phase, an improved clustering ensemble framework, which incorporates conventional clustering methods (i.e., k-means, k-modes, k-means++, self-organizing maps and DIANA algorithm) by means of the iterative voting consensus, has been presented. In the empirical analysis, we analyze a corpus containing 160,424 abstracts of articles from various disciplines, including agricultural engineering, economics, engineering and computer science. In the experimental analysis, performance of the proposed scheme has been compared to conventional baseline clustering methods (such as, k-means, k-modes, and k-means++), LDA-based topic modelling and conventional word embedding schemes. The empirical analysis reveals that ensemble word embedding scheme yields better predictive performance compared to the baseline word vectors for topic extraction. Ensemble clustering framework outperforms the baseline clustering methods. The results obtained by the proposed framework show an improvement in Jaccard coefficient, Folkes & Mallows measure and F1 score.

97 citations

Book ChapterDOI
01 Jan 2005
TL;DR: This survey will first introduce some basic ideas of this connection along a specific algorithm, Titanic, and show how FCA helps in reducing the number of resulting rules without loss of information, before giving a general overview over the history and state of the art of applying FCA for association rule mining.
Abstract: Association rules are a popular knowledge discovery technique for warehouse basket analysis. They indicate which items of the warehouse are frequently bought together. The problem of association rule mining has first been stated in 1993. Five years later, several research groups discovered that this problem has a strong connection to Formal Concept Analysis (FCA). In this survey, we will first introduce some basic ideas of this connection along a specific algorithm, Titanic, and show how FCA helps in reducing the number of resulting rules without loss of information, before giving a general overview over the history and state of the art of applying FCA for association rule mining.

97 citations

Journal ArticleDOI
TL;DR: This work describes discovery in science as the generation of novel, interesting, plausible, and intelligible knowledge about the objects of study, and analyzes four current machine discovery programs in chemistry, medicine, mathematics, and linguistics according to how their design, or the circumstances of their application, heighten the chances of finding knowledge that has all four properties.

97 citations

Book ChapterDOI
01 Jan 2001
TL;DR: A tutorial-style introduction to relational analysis is provided, beginning with a detailed explanation of why and where one might be interested in relational analysis, and the basics of Inductive Logic Programming (ILP), the scientific field where relational methods are primarily studied.
Abstract: Relational data mining algorithms and systems are capable of directly dealing with multiple tables or relations as they are found in today’s relational databases. This reduces the need for manual preprocessing and allows problems to be treated that cannot be handled easily with standard single-table methods. This paper provides a tutorial-style introduction to the topic, beginning with a detailed explanation of why and where one might be interested in relational analysis. We then present the basics of Inductive Logic Programming (ILP), the scientific field where relational methods are primarily studied. After illustrating the workings of MiDOS, a relational methods for subgroup discovery, in more detail, we show how to use relational methods in one of the current data mining systems.

96 citations

Book ChapterDOI
26 Oct 2019
TL;DR: The Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study, is presented.
Abstract: In this paper, we present the Microsoft Academic Knowledge Graph (MAKG), a large RDF data set with over eight billion triples with information about scientific publications and related entities, such as authors, institutions, journals, and fields of study. The data set is licensed under the Open Data Commons Attribution License (ODC-By). By providing the data as RDF dump files as well as a data source in the Linked Open Data cloud with resolvable URIs and links to other data sources, we bring a vast amount of scholarly data to the Web of Data. Furthermore, we provide entity embeddings for all 210 million represented publications. We facilitate a number of use case scenarios, particularly in the field of digital libraries, such as (1) entity-centric exploration of papers, researchers, affiliations, etc.; (2) data integration tasks using RDF as a common data model and links to other data sources; and (3) data analysis and knowledge discovery of scholarly data.

96 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
90% related
Support vector machine
73.6K papers, 1.7M citations
90% related
Artificial neural network
207K papers, 4.5M citations
87% related
Fuzzy logic
151.2K papers, 2.3M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023120
2022285
2021506
2020660
2019740
2018683