SciSpace (formerly Typeset)
Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published on this topic, receiving 413,401 citations.


Papers
Proceedings Article
01 Jul 1998
TL;DR: Discusses the technical design issues faced in developing Open Knowledge Base Connectivity (OKBC), highlights how OKBC improves upon the Generic Frame Protocol (GFP), and reports on practical experience using it.
Abstract: The technology for building large knowledge bases (KBs) is yet to witness a breakthrough so that a KB can be constructed by the assembly of prefabricated knowledge components. Knowledge components include both pieces of domain knowledge (for example, theories of economics or fault diagnosis) and KB tools (for example, editors and theorem provers). Most of the current KB development tools can only manipulate knowledge residing in the knowledge representation system (KRS) for which the tools were originally developed. Open Knowledge Base Connectivity (OKBC) is an application programming interface for accessing KRSs, and was developed to enable the construction of reusable KB tools. OKBC improves upon its predecessor, the Generic Frame Protocol (GFP), in several significant ways. OKBC can be used with a much larger range of systems because its knowledge model supports an assertional view of a KRS. OKBC provides an explicit treatment of entities that are not frames, and it has a much better way of controlling inference and specifying default values. OKBC can be used on practically any platform because it supports network transparency and has implementations for multiple programming languages. In this paper, we discuss technical design issues faced in the development of OKBC, highlight how OKBC improves upon GFP, and report on practical experiences in using it.

354 citations

Journal Article
TL;DR: The objective of this research is to provide a conceptual framework that identifies the major research areas of data mining and knowledge discovery (DMKD) for students and beginners, and to describe the longitudinal changes in DMKD research activities.
Abstract: As our abilities to collect and store various types of datasets continue to increase, the demand for advanced techniques and tools to understand and make use of these large datasets keeps growing. No single existing field is capable of satisfying these needs. Data Mining and Knowledge Discovery (DMKD), which utilizes methods, techniques, and tools from diverse disciplines, emerged in the last decade to solve this problem. It brings together knowledge and theories from several fields, including databases, machine learning, optimization, statistics, and data visualization, and has been applied to various real-life applications. Even though data mining has made significant progress during the past fifteen years, most research effort has been devoted to developing effective and efficient algorithms that can extract knowledge from data, and not enough attention has been paid to the philosophical foundations of data mining. The objective of this research is to provide a conceptual framework that identifies the major research areas of DMKD for students and beginners, and to describe the longitudinal changes in DMKD research activities. Using textual documents collected from premier DMKD journals, conference proceedings, syllabi, and dissertations, this study is intended to address the following questions: What are the major subjects of this field? What is the central theme? What are the connections among these subjects? What are the longitudinal changes in DMKD research? To answer these questions, this research uses a combination of grounded theory and document clustering. The result will represent previous and current DMKD research activities in the form of a framework. The resulting framework should allow people to comprehend the entire domain of DMKD research and assist in identifying areas in need of more research effort.

353 citations

Proceedings Article
21 Aug 2011
TL;DR: Proposes algorithms that construct outlier causality trees from the temporal and spatial properties of detected outliers; frequent substructures of these trees reveal not only recurring interactions among spatio-temporal outliers but also potential flaws in the design of existing traffic networks.
Abstract: The detection of outliers in spatio-temporal traffic data is an important research problem in the data mining and knowledge discovery community. However, to the best of our knowledge, the discovery of relationships, especially causal interactions, among detected traffic outliers has not been investigated before. In this paper we propose algorithms that construct outlier causality trees based on the temporal and spatial properties of detected outliers. Frequent substructures of these causality trees reveal not only recurring interactions among spatio-temporal outliers but also potential flaws in the design of existing traffic networks. The effectiveness and strength of our algorithms are validated by experiments on a very large volume of real taxi trajectories in an urban road network.

350 citations
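To illustrate the general idea of linking detected outliers into causality trees, here is a simplified Python sketch. It is not the algorithm from the paper: it assumes each outlier is tagged with a region and a discrete time slot, and that an outlier can only be caused by an outlier in the same or an adjacent region during the preceding time slot. The `Outlier` class, the `adjacency` map, and the tie-breaking rule (attach to the candidate parent with the smallest identifier) are hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class Outlier:
    oid: str     # hypothetical identifier of a detected outlier
    region: str  # region of the road network where it was detected
    t: int       # discrete time slot of detection


def build_causality_forest(
    outliers: List[Outlier], adjacency: Dict[str, Set[str]]
) -> Dict[str, Optional[str]]:
    """Map each outlier id to its parent id (None for roots).

    Simplifying assumption: an outlier at slot t can only be caused by an
    outlier at slot t - 1 in the same region or a spatially adjacent one.
    """
    by_slot: Dict[int, List[Outlier]] = defaultdict(list)
    for o in outliers:
        by_slot[o.t].append(o)

    parent: Dict[str, Optional[str]] = {}
    for o in sorted(outliers, key=lambda x: (x.t, x.oid)):
        nearby = adjacency.get(o.region, set()) | {o.region}
        candidates = sorted(
            (p for p in by_slot.get(o.t - 1, []) if p.region in nearby),
            key=lambda p: p.oid,
        )
        # Hypothetical tie-break: attach to the candidate with the smallest id.
        parent[o.oid] = candidates[0].oid if candidates else None
    return parent


def children_of(parent: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert the parent map into child lists describing the causality trees."""
    kids: Dict[str, List[str]] = defaultdict(list)
    for child, par in parent.items():
        if par is not None:
            kids[par].append(child)
    return dict(kids)


# Toy example: three regions on a line, A adjacent to B, B adjacent to C.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
outliers = [
    Outlier("o1", "A", 0),
    Outlier("o5", "C", 0),
    Outlier("o2", "B", 1),
    Outlier("o3", "C", 2),
    Outlier("o4", "A", 2),
]
parent = build_causality_forest(outliers, adjacency)
print(parent)
print(children_of(parent))
```

Every outlier without a candidate predecessor becomes a root, so the result is a forest rather than a single tree (here `o1` roots a tree containing `o2`, `o3`, and `o4`, while `o5` stands alone). Mining frequent substructures across many such trees, which the paper uses to surface recurring interactions, is not shown.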

Journal Article
TL;DR: A review is presented of the available literature on the various measures devised for evaluating and ranking the patterns discovered by the data mining process, together with an assessment of their strengths and weaknesses with respect to the level of user integration within the discovery process.
Abstract: It is a well-known fact that the data mining process can generate many hundreds and often thousands of patterns from data. The task for the data miner then becomes one of distinguishing the most useful patterns from those that are trivial or already well known to the organization. It is therefore necessary to filter these patterns using some measure of each pattern's actual worth. This article presents a review of the available literature on the various measures devised for evaluating and ranking the patterns discovered by the data mining process. These so-called interestingness measures are generally divided into two categories: objective measures, based on the statistical strength or properties of the discovered patterns, and subjective measures, derived from the user's beliefs or expectations about their particular problem domain. We evaluate the strengths and weaknesses of the various interestingness measures with respect to the level of user integration within the discovery process.

344 citations
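As an illustration of the objective category of interestingness measures, the sketch below computes three classic statistics for an association rule A → B: support (how often A and B occur together), confidence (an estimate of P(B | A)), and lift (confidence divided by the baseline support of B, where 1.0 suggests independence). It assumes transactions are Python sets of item names; the function names and the toy baskets are illustrative and are not taken from the reviewed article. Subjective measures, which depend on the user's beliefs, are not covered.

```python
from typing import List, Set, Tuple

Itemset = Set[str]
Rule = Tuple[Itemset, Itemset]  # (antecedent A, consequent B) for the rule A -> B


def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions containing every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(a: Itemset, b: Itemset, transactions: List[Itemset]) -> float:
    """Estimate of P(B | A): support(A union B) / support(A)."""
    s_a = support(a, transactions)
    return support(a | b, transactions) / s_a if s_a else 0.0


def lift(a: Itemset, b: Itemset, transactions: List[Itemset]) -> float:
    """Confidence normalised by support(B); 1.0 means A and B look independent."""
    s_b = support(b, transactions)
    return confidence(a, b, transactions) / s_b if s_b else 0.0


def rank_by_lift(rules: List[Rule], transactions: List[Itemset]) -> List[Rule]:
    """Order candidate rules from most to least interesting according to lift."""
    return sorted(rules, key=lambda r: lift(r[0], r[1], transactions), reverse=True)


# Toy transaction data (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rules = [({"bread"}, {"milk"}), ({"butter"}, {"bread"})]

for a, b in rank_by_lift(rules, baskets):
    print(sorted(a), "->", sorted(b),
          f"support={support(a | b, baskets):.2f}",
          f"confidence={confidence(a, b, baskets):.2f}",
          f"lift={lift(a, b, baskets):.2f}")
```

In the toy data, bread → milk has confidence 2/3 but lift 8/9, below 1.0, because milk is already common on its own; butter → bread ranks first with lift 4/3. This gap between raw confidence and lift is one reason objective measures are usually compared rather than used in isolation.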


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (90% related)
Support vector machine: 73.6K papers, 1.7M citations (90% related)
Artificial neural network: 207K papers, 4.5M citations (87% related)
Fuzzy logic: 151.2K papers, 2.3M citations (86% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Performance Metrics
Number of papers published on this topic in previous years

Year   Papers
2023   120
2022   285
2021   506
2020   660
2019   740
2018   683