Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Journal ArticleDOI
TL;DR: A document discovery tool based on Conceptual Clustering by Formal Concept Analysis that allows users to navigate e-mail using a visual lattice metaphor rather than a tree to aid knowledge discovery in document collections.
Abstract: This paper discusses a document discovery tool based on Conceptual Clustering by Formal Concept Analysis. The program allows users to navigate e-mail using a visual lattice metaphor rather than a tree. It implements a virtual file structure over e-mail where files and entire directories can appear in multiple positions. The content and shape of the lattice formed by the conceptual ontology can assist in e-mail discovery. The system described provides more flexibility in retrieving stored e-mails than what is normally available in e-mail clients. The paper discusses how conceptual ontologies can leverage traditional document retrieval systems and aid knowledge discovery in document collections.

91 citations

Proceedings ArticleDOI
01 May 2001
TL;DR: A set of new algorithms that solve the Distributed Association Rule Mining problem using far less communication and continue to be efficient even when the data is skewed or the partition sizes are imbalanced are presented.
Abstract: Mining for associations between items in large transactional databases is a central problem in the field of knowledge discovery. When the database is partitioned among several share-nothing machines, the problem can be addressed using distributed data mining algorithms. One such algorithm, called CD, was proposed by Agrawal and Shafer in [1] and was later enhanced by the FDM algorithm of Cheung, Han et al. [5]. The main problem with these algorithms is that they do not scale well with the number of partitions. They are thus impractical for use in modern distributed environments such as peer-to-peer systems, in which hundreds or thousands of computers may interact. In this paper we present a set of new algorithms that solve the Distributed Association Rule Mining problem using far less communication. In addition to being very efficient, the new algorithms are also extremely robust. Unlike existing algorithms, they continue to be efficient even when the data is skewed or the partition sizes are imbalanced. We present both experimental and theoretical results concerning the behavior of these algorithms and explain how they can be implemented in different settings.

91 citations

Proceedings Article
08 Aug 1983
TL;DR: A formal, but pragmatic, method of recording and organizing human expertise into a knowledge-based system is presented and experience gained from testing and from expert feedback is described.
Abstract: A formal, but pragmatic, method of recording and organizing human expertise into a knowledge-based system is presented. Practical considerations and methods which increase system validity while minimizing demands on human domain specialists are explored. The methodology concentrates on domain definition (background knowledge, references, situations, and procedures), on fundamental knowledge formulation (elementary rules, beliefs, and expectations), and on basal knowledge consolidation (review and correction cycles). Experience gained from testing and from expert feedback is described.

91 citations

Book ChapterDOI
TL;DR: This paper describes how to assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them.
Abstract: e-Science experiments are those performed using computer-based resources such as database searches, simulations or other applications. Like their laboratory-based counterparts, the data associated with an e-Science experiment are of reduced value if other scientists are not able to identify the origin, or provenance, of those data. Provenance is the term given to metadata about experiment processes, the derivation paths of data, and the sources and quality of experimental components, which includes the scientists themselves, related literature, etc. Consequently, provenance metadata are valuable resources for e-Scientists to repeat experiments, track versions of data and experiment runs, verify experiment results, and as a source of experimental insight. One specific kind of in silico experiment is a workflow. In this paper we describe how we can assemble a Semantic Web of workflow provenance logs that allows a bioinformatician to browse and navigate between experimental components by generating hyperlinks based on semantic annotations associated with them. By associating well-formalized semantics with workflow logs we take a step towards integration of process provenance information and improved knowledge discovery.

91 citations

01 Jan 1999
TL;DR: This Chapter discusses selected rough set based solutions to two main knowledge discovery problems, namely the description problem and the classification (prediction) problem.
Abstract: The amount of electronic data available is growing very fast and this explosive growth in databases has generated a need for new techniques and tools that can intelligently and automatically extract implicit, previously unknown, hidden and potentially useful information and knowledge from these data. These tools and techniques are the subject of the field of Knowledge Discovery in Databases. In this Chapter we discuss selected rough set based solutions to two main knowledge discovery problems, namely the description problem and the classification (prediction) problem.

91 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
90% related
Support vector machine
73.6K papers, 1.7M citations
90% related
Artificial neural network
207K papers, 4.5M citations
87% related
Fuzzy logic
151.2K papers, 2.3M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683