
Knowledge extraction

About: Knowledge extraction is a research topic. Over the lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
01 Jan 1999
TL;DR: An overview of the evolution of Protégé is given, examining the methodological assumptions underlying the original Protégé system and discussing the ways in which the methodology has changed over time.
Abstract: It has been 13 years since the first version of Protégé was run. The original tool was a small application, aimed mainly at building knowledge-acquisition tools for a few very specialized programs (it grew out of the ONCOCIN project and the subsequent attempts to build expert systems for protocol-based therapy planning). The most recent version, Protégé-2000, incorporates the Open Knowledge Base Connectivity (OKBC) knowledge model, is written to run across a wide variety of platforms, supports customized user-interface extensions, and has been used by over 300 individuals and research groups, most of whom are only peripherally interested in medical informatics. Researchers not directly involved in the project might well wonder how Protégé evolved, what are the reasons for the repeated reimplementations, and how to tell the various versions apart. In this paper, we give an overview of the evolution of Protégé, examining the methodological assumptions underlying the original Protégé system and discussing the ways in which the methodology has changed over time. We conclude with an overview of the latest version of Protégé, Protégé-2000.

1. MOTIVATION AND A TIMELINE
The Protégé applications (hereafter ‘Protégé’) are a set of tools that have been evolving for over a decade, from a simple program which helped construct specialized knowledge-bases to a set of general-purpose knowledge-base creation and maintenance tools. While Protégé began as a small application designed for a medical domain (protocol-based therapy planning), it has grown and evolved to become a much more general-purpose set of tools for building knowledge-based systems. The original goal of Protégé was to reduce the knowledge-acquisition bottleneck (Hayes-Roth et al., 1983) by minimizing the role of the knowledge engineer in constructing knowledge-bases. In order to do this, Musen (1988, 1989b) posited that knowledge acquisition proceeds in well-defined stages and that knowledge acquired in one stage could be used to generate and customize knowledge-acquisition tools for subsequent stages. In (Musen, 1988), Protégé was defined as an application that takes advantage of this structured information to simplify the knowledge-acquisition process. The original Protégé was described this way (Musen, 1988): Protégé is neither an expert system itself nor a program that builds expert systems directly. Instead, Protégé is a tool that helps users build other tools that are custom-tailored to assist with knowledge-acquisition for expert systems in specific application areas. The original Protégé demonstrated the viability of this approach, and of the use of task-specific knowledge to generate and customize knowledge-acquisition tools. But as with many first-

295 citations
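
To make the notion of a frame-style knowledge model (the kind standardized by OKBC and adopted in Protégé-2000) concrete, here is a minimal, hypothetical Python sketch: classes declare named slots, and instances fill those slots. The class, slot, and value names are invented for illustration; this is not Protégé's actual API.

```python
# Illustrative frame-based knowledge model in the spirit of OKBC/Protégé:
# classes with named slots, and instances whose slot values are checked
# against the class definition. All names here are hypothetical.

class Frame:
    """A class definition: a name plus the slots its instances may fill."""
    def __init__(self, name, slots):
        self.name = name
        self.slots = set(slots)

class Instance:
    """An individual that fills the slots declared by its frame."""
    def __init__(self, frame, **slot_values):
        unknown = set(slot_values) - frame.slots
        if unknown:
            raise ValueError(f"Unknown slots for {frame.name}: {unknown}")
        self.frame = frame
        self.slot_values = slot_values

# A protocol-based therapy plan, echoing Protégé's original medical domain.
protocol = Frame("TherapyProtocol", slots={"drug", "dose_mg", "interval_days"})
plan = Instance(protocol, drug="cisplatin", dose_mg=50, interval_days=21)
print(plan.frame.name, plan.slot_values)
```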

Proceedings Article
01 Jul 1998
TL;DR: This work shows how information extraction can be cast as a standard machine learning problem, argues for the suitability of relational learning in solving it, and describes the implementation of a general-purpose relational learner for information extraction, SRV.
Abstract: Because the World Wide Web consists primarily of text, information extraction is central to any effort that would use the Web as a resource for knowledge discovery. We show how information extraction can be cast as a standard machine learning problem, and argue for the suitability of relational learning in solving it. The implementation of a general-purpose relational learner for information extraction, SRV, is described. In contrast with earlier learning systems for information extraction, SRV makes no assumptions about document structure and the kinds of information available for use in learning extraction patterns. Instead, structural and other information is supplied as input in the form of an extensible token-oriented feature set. We demonstrate the effectiveness of this approach by adapting SRV for use in learning extraction rules for a domain consisting of university course and research project pages sampled from the Web. Making SRV Web-ready only involves adding several simple HTML-specific features to its basic feature set.

294 citations
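
To illustrate the idea of casting extraction as a standard learning problem over an extensible, token-oriented feature set, here is a rough Python sketch. It substitutes a decision tree for SRV's relational rule learner, and the features, labels, and toy page text are invented; it shows only the framing, not SRV itself.

```python
# Sketch: per-token classification over an extensible token-oriented
# feature set (toy data; not SRV's actual learner or features).
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

def token_features(tokens, i):
    """Token-oriented features; easy to extend with e.g. HTML-specific ones."""
    tok = tokens[i]
    return {
        "capitalized": tok[:1].isupper(),
        "numeric": tok.isdigit(),
        "length": len(tok),
        "prev_token": tokens[i - 1].lower() if i > 0 else "<start>",
    }

# Toy training page: label each token 1 if it is part of a course title.
tokens = "Course : Machine Learning Instructor : Jane Doe".split()
labels = [0, 0, 1, 1, 0, 0, 0, 0]

vec = DictVectorizer(sparse=False)
X = vec.fit_transform([token_features(tokens, i) for i in range(len(tokens))])
clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

# Apply the learned extractor to an unseen page fragment.
test = "Course : Information Extraction".split()
X_test = vec.transform([token_features(test, i) for i in range(len(test))])
print(list(zip(test, clf.predict(X_test))))
```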

Book ChapterDOI
Bamshad Mobasher, Honghua Dai, Tao Luo, Yuqing Sun, Jiang Zhu
04 Sep 2000
TL;DR: This paper presents a framework for Web usage mining, distinguishing between the offline tasks of data preparation and mining, and the online process of customizing Web pages based on a user's active session, and describes effective techniques based on clustering to obtain a uniform representation for both site usage and site content profiles.
Abstract: Recent proposals have suggested Web usage mining as an enabling mechanism to overcome the problems associated with more traditional Web personalization techniques such as collaborative or content-based filtering. These problems include lack of scalability, reliance on subjective user ratings or static profiles, and the inability to capture a richer set of semantic relationships among objects (in content-based systems). Yet, usage-based personalization can be problematic when little usage data is available pertaining to some objects or when the site content changes regularly. For more effective personalization, both usage and content attributes of a site must be integrated into a Web mining framework and used by the recommendation engine in a uniform manner. In this paper we present such a framework, distinguishing between the offline tasks of data preparation and mining, and the online process of customizing Web pages based on a user's active session. We describe effective techniques based on clustering to obtain a uniform representation for both site usage and site content profiles, and we show how these profiles can be used to perform real-time personalization.

293 citations
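
A minimal sketch of the usage side of such a framework, under assumed data: offline, sessions are encoded as page-view vectors and clustered into usage profiles (k-means is used here as one plausible clustering choice); online, the active session is matched to the nearest profile and unvisited pages are ranked by profile weight. Page names, sessions, and parameters are illustrative only.

```python
# Offline: cluster page-view session vectors into usage profiles.
# Online: match the active session to a profile and recommend unseen pages.
import numpy as np
from sklearn.cluster import KMeans

pages = ["home", "courses", "ml", "db", "contact"]
# Rows are past sessions, columns are pages (1 if the page was viewed).
sessions = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [1, 0, 0, 1, 1],
])

# Offline: each cluster centroid is a usage profile (page weights in [0, 1]).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sessions)
profiles = km.cluster_centers_

# Online: match the active session to its profile, rank pages not yet seen.
active = np.array([1, 1, 0, 0, 0])
profile = profiles[km.predict(active.reshape(1, -1))[0]]
scores = {p: w for p, w, seen in zip(pages, profile, active) if not seen}
print(sorted(scores, key=scores.get, reverse=True))  # e.g. ['ml', ...]
```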

Journal ArticleDOI
TL;DR: It is suggested that data, information, and knowledge could serve as both the input and output of a visualization process, raising questions about their exact role in visualization.
Abstract: In visualization, we use the terms data, information and knowledge extensively, often in an interrelated context. In many cases, they indicate different levels of abstraction, understanding, or truthfulness. For example, "visualization is concerned with exploring data and information," "the primary objective in data visualization is to gain insight into an information space," and "information visualization" is for "data mining and knowledge discovery." In other cases, these three terms indicate data types, for instance, as adjectives in noun phrases, such as data visualization, information visualization, and knowledge visualization. These examples suggest that data, information, and knowledge could serve as both the input and output of a visualization process, raising questions about their exact role in visualization.

293 citations

Journal ArticleDOI

292 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (90% related)
Support vector machine: 73.6K papers, 1.7M citations (90% related)
Artificial neural network: 207K papers, 4.5M citations (87% related)
Fuzzy logic: 151.2K papers, 2.3M citations (86% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683