Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Journal Article
TL;DR: Through the use of the accelerator, three representative heuristic fuzzy-rough feature selection algorithms have been enhanced and it is shown that these modified algorithms are much faster than their original counterparts.

125 citations

Journal Article
TL;DR: The approach is based on concepts, which are extracted from texts to be used as characteristics in the mining process, and statistical techniques are applied on concepts in order to find interesting patterns in concept distributions or associations.
Abstract: This paper presents an approach for knowledge discovery in texts extracted from the Web. Instead of analyzing words or attribute values, the approach is based on concepts, which are extracted from texts to be used as characteristics in the mining process. Statistical techniques are applied on concepts in order to find interesting patterns in concept distributions or associations. In this way, users can perform discovery at a high level, since concepts describe real-world events, objects, thoughts, etc. For identifying concepts in texts, a categorization algorithm is used, associated with a previous classification task for concept definitions. Two experiments are presented: one for political analysis and the other for competitive intelligence. At the end, the approach is discussed, examining its problems and advantages in the Web context.
Keywords: Knowledge discovery, data mining, information extraction, categorization, text mining.
1. INTRODUCTION The Web is a large and growing collection of texts. This amount of text is becoming a valuable resource of information and knowledge. As Garofalakis and partners comment, "

125 citations

Book Chapter
01 Sep 1999
TL;DR: It is argued that in data mining the major requirement of security control mechanism is not to ensure precise and bias-free statistics, but rather to preserve the high-level descriptions of knowledge constructed by artificial data mining tools.
Abstract: The recent proliferation of data mining tools for the analysis of large volumes of data has paid little attention to individual privacy issues. Here, we introduce methods aimed at finding a balance between the individuals' right to privacy and the data-miners' need to find general patterns in huge volumes of detailed records. In particular, we focus on the data-mining task of classification with decision trees. We base our security-control mechanism on noise-addition techniques used in statistical databases because (1) the multidimensional matrix model of statistical databases and the multidimensional cubes of On-Line Analytical Processing (OLAP) are essentially the same, and (2) noise-addition techniques are very robust. The main drawback of noise-addition techniques in the context of statistical databases is the low statistical quality of released statistics. We argue that in data mining the major requirement of a security-control mechanism (in addition to protecting privacy) is not to ensure precise and bias-free statistics, but rather to preserve the high-level descriptions of knowledge constructed by artificial data mining tools.

125 citations

Journal Article
TL;DR: A workflow and a few empirical case studies for Chinese word segmentation rules of the Conditional Random Fields model are presented, and the potential of leveraging natural language processing and knowledge graph technologies for geoscience is shown.

125 citations

Journal Article
TL;DR: This bibliography subsumes an earlier bibliography and shows that the value of investigating temporal, spatial and spatio-temporal data has been growing in both interest and applicability.
Abstract: Data mining and knowledge discovery have become important issues for research over the past decade. This has been caused not only by the growth in the size of datasets but also by the availability of otherwise unavailable datasets over the Internet and the increased value that organisations now place on the knowledge that can be gained from data analysis. It is therefore not surprising that the increased interest in temporal and spatial data has also led to an increased interest in mining such data. This bibliography subsumes an earlier bibliography and shows that the value of investigating temporal, spatial and spatio-temporal data has been growing in both interest and applicability.

125 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683