Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published on this topic, receiving 413,401 citations.


Papers
Book
31 Aug 1998
TL;DR: This book covers data mining and knowledge discovery, with chapters on rough sets, fuzzy sets, Bayesian methods, evolutionary computing, machine learning, neural networks, clustering, and preprocessing.
Abstract: Foreword. Preface. 1. Data Mining and Knowledge Discovery. 2. Rough Sets. 3. Fuzzy Sets. 4. Bayesian Methods. 5. Evolutionary Computing. 6. Machine Learning. 7. Neural Networks. 8. Clustering. 9. Preprocessing. Index.

552 citations

Journal Article
TL;DR: The concept of data mining as a querying process and the first steps toward efficient development of knowledge discovery applications are discussed.
Abstract: Database mining is not simply another buzzword for statistical data analysis or inductive learning. Database mining sets new challenges to database technology: new concepts and methods are needed for query languages, basic operations, and query processing strategies. The most important new component is the ad hoc nature of knowledge and data discovery (KDD) queries and the need for efficient query compilation into a multitude of existing and new data analysis methods. Hence, database mining builds upon the existing body of work in statistics and machine learning but provides completely new functionalities. The current generation of database systems is designed mainly to support business applications. The success of Structured Query Language (SQL) has capitalized on a small number of primitives sufficient to support a vast majority of such applications. Unfortunately, these primitives are not sufficient to capture the emerging family of new applications dealing with knowledge discovery. Most current KDD systems offer isolated discovery features using tree inducers, neural nets, and rule discovery algorithms. Such systems cannot be embedded into a large application and typically offer just one knowledge dis[...]

547 citations

Journal Article
TL;DR: In this paper, a self-supervised learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text.
Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text, by applying heuristics to the corpus of text. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple to indicate a likelihood that the retained tuple is an actual instance of a relationship between a plurality of objects comprising the retained tuple. The retained tuples comprise an extraction graph that can be queried for information.

545 citations
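
The stages named in the open information extraction abstract above (a classifier that retains trustworthy tuples, a redundancy-based assessor, and a queryable extraction graph) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `threshold` and `p_single` parameters, and the noisy-or redundancy model are all hypothetical choices, since the abstract does not say how the assessor computes its probability.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (arg1, relation, arg2)


def retain_trustworthy(
    candidates: Iterable[Triple],
    score: Callable[[Triple], float],
    threshold: float = 0.5,  # assumed cut-off, not from the source
) -> List[Triple]:
    """Keep candidate tuples whose classifier probability meets the threshold."""
    return [t for t in candidates if score(t) >= threshold]


def redundancy_probability(count: int, p_single: float = 0.6) -> float:
    """Assumed noisy-or model: chance that at least one of `count`
    independent extractions of the same tuple is correct."""
    return 1.0 - (1.0 - p_single) ** count


def build_extraction_graph(retained: Iterable[Triple]) -> Dict[Triple, float]:
    """Map each distinct retained tuple to a redundancy-based probability."""
    counts = Counter(retained)
    return {t: redundancy_probability(n) for t, n in counts.items()}


def query(
    graph: Dict[Triple, float],
    arg1: Optional[str] = None,
    relation: Optional[str] = None,
    arg2: Optional[str] = None,
) -> List[Tuple[Triple, float]]:
    """Return tuples matching every field that is not None, most probable first."""
    wanted = (arg1, relation, arg2)
    hits = [
        (t, p)
        for t, p in graph.items()
        if all(w is None or w == field for w, field in zip(wanted, t))
    ]
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

In this sketch a tuple extracted several times ends up with a higher probability than one extracted once, which is the intuition behind a redundancy-based assessor; a real system would learn `score` from self-supervised training data rather than take it as a given function.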

Book
01 Jan 2011
TL;DR: This coherently written multi-author monograph provides a thorough introduction and systematic overview of the area and will become a valuable source of reference for R&D professionals active in relational data mining.
Abstract: As the first book devoted to relational data mining, this coherently written multi-author monograph provides a thorough introduction and systematic overview of the area. The first part introduces the reader to the basics and principles of classical knowledge discovery in databases and inductive logic programming; subsequent chapters by leading experts assess the techniques in relational data mining in a principled and comprehensive way; finally, three chapters deal with advanced applications in various fields and refer the reader to resources for relational data mining. This book will become a valuable source of reference for R&D professionals active in relational data mining. Students as well as IT professionals and ambitious practitioners interested in learning about relational data mining will appreciate the book as a useful text and gentle introduction to this exciting new field.

530 citations

Journal Article
TL;DR: The growing self-organizing map (GSOM) is presented in detail and the effect of a spread factor, which can be used to measure and control the spread of the GSOM, is investigated.
Abstract: The growing self-organizing map (GSOM) algorithm is presented in detail and the effect of a spread factor, which can be used to measure and control the spread of the GSOM, is investigated. The spread factor is independent of the dimensionality of the data and as such can be used as a controlling measure for generating maps with different dimensionality, which can then be compared and analyzed with better accuracy. The spread factor is also presented as a method of achieving hierarchical clustering of a data set with the GSOM. Such hierarchical clustering allows the data analyst to identify significant and interesting clusters at a higher level of the hierarchy, and continue with finer clustering of the interesting clusters only. Therefore, only a small map is created in the beginning with a low spread factor, which can be generated for even a very large data set. Further analysis is conducted on selected sections of the data and of smaller volume. Therefore, this method facilitates the analysis of even very large data sets.

529 citations
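
How a spread factor can control map growth independently of data dimensionality, as the GSOM abstract above describes, can be illustrated with a small sketch. It assumes the commonly cited GSOM growth threshold GT = -D * ln(SF), where D is the data dimensionality and SF lies strictly between 0 and 1; the abstract itself does not state this formula, and the function names here are hypothetical.

```python
import math


def growth_threshold(dimensions: int, spread_factor: float) -> float:
    """Assumed GSOM growth threshold: GT = -D * ln(SF), with 0 < SF < 1."""
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    if not 0.0 < spread_factor < 1.0:
        raise ValueError("spread_factor must lie strictly between 0 and 1")
    return -dimensions * math.log(spread_factor)


def should_grow(accumulated_error: float, dimensions: int, spread_factor: float) -> bool:
    """A node triggers growth once its accumulated error exceeds the threshold."""
    return accumulated_error > growth_threshold(dimensions, spread_factor)
```

Under this assumption a low spread factor gives a high threshold, so few nodes are added and the map stays small; raising the spread factor on a selected region lowers the threshold and produces the finer, hierarchical clustering the abstract mentions. Scaling by D is what lets the same SF value be reused across data sets of different dimensionality.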


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683