Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have appeared within this topic, receiving 413,401 citations.


Papers
Journal Article
TL;DR: The main idea in this paper is to describe key papers and provide some guidelines to help medical practitioners to explore previous works and identify interesting areas for future research.
Abstract: Data mining is a powerful method for extracting knowledge from data. Raw data poses various challenges that make traditional methods unsuitable for knowledge extraction. Data mining is supposed to handle various data types in all formats. The relevance of this paper is underscored by the fact that data mining is a subject of research in many different areas. In this paper, we review previous work on knowledge extraction from medical data. The main idea is to describe key papers and provide some guidelines to help medical practitioners. Medical data mining is a multidisciplinary field that draws on both medicine and data mining. Previous work should therefore be classified so as to cover the requirements of users from the various fields. For this reason, we have studied papers, published between 1999 and 2013, that aim to extract knowledge from structural medical data. We clarify medical data mining and its main goals. Each paper is studied in terms of six medical tasks: screening, diagnosis, treatment, prognosis, monitoring and management. Within each task, five data mining approaches are considered: classification, regression, clustering, association and hybrid. Each task closes with a brief summary and discussion. A standard framework based on CRISP-DM is also adapted to manage all activities. By way of discussion, current issues and future trends are noted. The volume of work published in this area is substantial, and it is impossible to discuss all of it in a single paper. We hope this paper will make it possible to explore previous work and identify interesting areas for future research.

220 citations

Book Chapter
TL;DR: It is shown how concept map-based knowledge models can be used to organize repositories of information in a way that makes them easily browsable, and how concept maps can improve searching algorithms for the Web.
Abstract: Information visualization has been a research topic for many years, leading to a mature field where guidelines and practices are well established. Knowledge visualization, in contrast, is a relatively new area of research that has received more attention recently due to the interest from the business community in Knowledge Management. In this paper we present the CmapTools software as an example of how concept maps, a knowledge visualization tool, can be combined with recent technology to provide integration between knowledge and information visualizations. We show how concept map-based knowledge models can be used to organize repositories of information in a way that makes them easily browsable, and how concept maps can improve searching algorithms for the Web. We also report on how information can be used to complement knowledge models and, based on the searching algorithms, improve the process of constructing concept maps.

220 citations

Proceedings Article
18 Aug 1985
TL;DR: A program called CHECK verifies the consistency and completeness of expert system knowledge bases that use the Lockheed Expert System (LES) framework; it combines logical principles with specific information about the knowledge representation formalism of LES.
Abstract: In this paper we describe a program that verifies the consistency and completeness of expert system knowledge bases which utilize the Lockheed Expert System (LES) framework. The algorithms described here are not specific to LES and can be applied to most rule-based systems. The program, called CHECK, combines logical principles as well as specific information about the knowledge representation formalism of LES. The program checks for redundant rules, conflicting rules, subsumed rules, missing rules, circular rules, unreachable clauses, and deadend clauses. It also generates a dependency chart which shows the dependencies among the rules and between the rules and the goals. CHECK can help the knowledge engineer to detect many programming errors even before the knowledge base testing phase. It also helps detect gaps in the knowledge base which the knowledge engineer and the expert might have overlooked. A wide variety of knowledge bases have been analyzed using CHECK.
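
Three of the checks named in the abstract above (redundant, conflicting, and subsumed rules) can be illustrated with a minimal sketch. This is not the CHECK program or the LES formalism: it assumes a hypothetical, simplified representation in which each rule is a set of condition literals plus one conclusion literal, and negation is written as a "not " prefix. The names `Rule`, `negate`, and `check_rules` are invented for illustration.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Rule(NamedTuple):
    # Hypothetical simplified rule: IF all conditions THEN conclusion.
    name: str
    conditions: FrozenSet[str]
    conclusion: str


def negate(literal: str) -> str:
    """Toggle the assumed 'not ' prefix that marks a negated literal."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def check_rules(rules: List[Rule]) -> List[Tuple[str, str, str]]:
    """Compare every pair of rules and report (kind, rule, other_rule)."""
    findings = []
    for i, a in enumerate(rules):
        for b in rules[i + 1:]:
            if a.conditions == b.conditions:
                if a.conclusion == b.conclusion:
                    # Same conditions, same conclusion.
                    findings.append(("redundant", a.name, b.name))
                elif a.conclusion == negate(b.conclusion):
                    # Same conditions, contradictory conclusions.
                    findings.append(("conflicting", a.name, b.name))
            elif a.conclusion == b.conclusion:
                # Same conclusion; the rule with extra conditions is subsumed
                # by the more general one.
                if b.conditions < a.conditions:
                    findings.append(("subsumed", a.name, b.name))
                elif a.conditions < b.conditions:
                    findings.append(("subsumed", b.name, a.name))
    return findings


rules = [
    Rule("r1", frozenset({"a", "b"}), "c"),
    Rule("r2", frozenset({"a", "b"}), "c"),
    Rule("r3", frozenset({"a", "b"}), "not c"),
    Rule("r4", frozenset({"a"}), "c"),
]
for finding in check_rules(rules):
    print(finding)
```

The pairwise comparison is quadratic in the number of rules, which is acceptable for a sketch; the remaining checks listed in the abstract (missing rules, circular rules, unreachable and deadend clauses) need information about goals and the dependency graph and are not covered here.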

220 citations

Dissertation
01 Jan 2004
TL;DR: Experimental results demonstrate that discovered patterns in extracted text can be used to effectively improve the underlying IE method, and an approach to using rules mined from extracted data to improve the accuracy of information extraction is presented.
Abstract: The popularity of the Web and the large number of documents available in electronic form has motivated the search for hidden knowledge in text collections. Consequently, there is growing research interest in the general topic of text mining. In this dissertation, we develop a text-mining system by integrating methods from Information Extraction (IE) and Data Mining (Knowledge Discovery from Databases or KDD). By utilizing existing IE and KDD techniques, text-mining systems can be developed relatively rapidly and evaluated on existing text corpora for testing IE systems. We present a general text-mining framework called DISCOTEX which employs an IE module for transforming natural-language documents into structured data and a KDD module for discovering prediction rules from the extracted data. When discovering patterns in extracted text, strict matching of strings is inadequate because textual database entries generally exhibit variations due to typographical errors, misspellings, abbreviations, and other sources. We introduce the notion of discovering “soft-matching” rules from text and present two new learning algorithms. TEXTRISE is an inductive method for learning soft-matching prediction rules that integrates rule-based and instance-based learning methods. Simple, interpretable rules are discovered using rule induction, while a nearest-neighbor algorithm provides soft matching. SOFTAPRIORI is a text-mining algorithm for discovering association rules from texts that uses a similarity measure to allow flexible matching to variable database items. We present experimental results on inducing prediction and association rules from natural-language texts demonstrating that TEXTRISE and SOFTAPRIORI learn more accurate rules than previous methods for these tasks. We also present an approach to using rules mined from extracted data to improve the accuracy of information extraction. Experimental results demonstrate that such discovered patterns can be used to effectively improve the underlying IE method.
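
The idea of soft matching described in the abstract above can be sketched as follows. This is not TEXTRISE or SOFTAPRIORI: the similarity measure here is the ratio from Python's standard `difflib.SequenceMatcher`, chosen only as a convenient stand-in, and the 0.8 threshold, the function names, and the sample records are assumptions made for illustration.

```python
from difflib import SequenceMatcher
from typing import List


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two strings as the same item if their similarity ratio is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def exact_support(item: str, records: List[List[str]]) -> int:
    """Count records containing the item under strict (case-insensitive) matching."""
    return sum(any(item.lower() == e.lower() for e in rec) for rec in records)


def soft_support(item: str, records: List[List[str]], threshold: float = 0.8) -> int:
    """Count records containing at least one entry similar to the item."""
    return sum(any(similar(item, e, threshold) for e in rec) for rec in records)


# Hypothetical extracted records; the second one contains a misspelling.
records = [["Microsoft"], ["Microsft"], ["Oracle"]]
print(exact_support("microsoft", records))
print(soft_support("microsoft", records))
```

Under strict matching the misspelled entry is missed, while the soft count picks it up, which is the kind of variation (typographical errors, misspellings, abbreviations) the abstract cites as the motivation for soft-matching rules.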

219 citations

Book Chapter
23 Sep 1998
TL;DR: This paper describes the Term Extraction module of the Document Explorer system, and provides experimental evaluation performed on a set of 52,000 documents published by Reuters in the years 1995–1996.
Abstract: Knowledge Discovery in Databases (KDD) focuses on the computerized exploration of large amounts of data and on the discovery of interesting patterns within them. While most work on KDD has been concerned with structured databases, there has been little work on handling the huge amount of information that is available only in unstructured textual form. Previous work in text mining focused on the word or the tag level. This paper presents an approach to performing text mining at the term level. The mining process starts by preprocessing the document collection and extracting terms from the documents. Each document is then represented by a set of terms and annotations characterizing the document. Terms and additional higher-level entities are then organized in a hierarchical taxonomy. In this paper we describe the Term Extraction module of the Document Explorer system, and provide an experimental evaluation performed on a set of 52,000 documents published by Reuters in the years 1995–1996.
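
The first two steps described in the abstract above (extracting terms, then representing each document as a set of terms) can be sketched in a few lines. This is not the Document Explorer Term Extraction module, whose method is not given here: the sketch assumes a deliberately naive rule in which a candidate term is any pair of adjacent non-stopwords, kept only if it occurs in at least `min_docs` documents. The stopword list, function names, and sample documents are all hypothetical.

```python
import re
from collections import Counter
from typing import List, Set

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "of", "a", "in", "and", "to", "by", "on", "for", "after"}


def candidate_terms(text: str) -> Set[str]:
    """Return adjacent word pairs in which neither word is a stopword."""
    words = re.findall(r"[a-z]+", text.lower())
    return {
        f"{w1} {w2}"
        for w1, w2 in zip(words, words[1:])
        if w1 not in STOPWORDS and w2 not in STOPWORDS
    }


def extract_terms(docs: List[str], min_docs: int = 2) -> List[Set[str]]:
    """Represent each document by the candidate terms shared by >= min_docs documents."""
    per_doc = [candidate_terms(d) for d in docs]
    doc_freq = Counter()
    for terms in per_doc:
        doc_freq.update(terms)  # each term counts once per document
    vocabulary = {t for t, n in doc_freq.items() if n >= min_docs}
    return [terms & vocabulary for terms in per_doc]


docs = [
    "The central bank raised interest rates.",
    "Interest rates fell after the central bank meeting.",
    "Oil prices rose.",
]
for term_set in extract_terms(docs):
    print(sorted(term_set))
```

Working at the term level means that multi-word units such as "central bank" are treated as single items, rather than as the separate words "central" and "bank"; organizing the resulting terms into a hierarchical taxonomy is a further step not shown here.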

219 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  120
2022  285
2021  506
2020  660
2019  740
2018  683