
Showing papers on "Knowledge extraction published in 1992"


Journal ArticleDOI
TL;DR: After a decade of fundamental interdisciplinary research in machine learning, the spadework in this field has been done; the 1990s should see the widespread exploitation of knowledge discovery as an aid to assembling knowledge bases.
Abstract: After a decade of fundamental interdisciplinary research in machine learning, the spadework in this field has been done; the 1990s should see the widespread exploitation of knowledge discovery as an aid to assembling knowledge bases. The contributors to the AAAI Press book Knowledge Discovery in Databases were excited at the potential benefits of this research. The editors hope that some of this excitement will communicate itself to AI Magazine readers of this article.

1,332 citations


Proceedings Article
23 Aug 1992
TL;DR: An attribute-oriented induction method has been developed for knowledge discovery in databases that integrates a machine learning paradigm with set-oriented database operations and extracts generalized data from actual data in databases.
Abstract: Knowledge discovery in databases, or data mining, is an important issue in the development of data- and knowledge-base systems. An attribute-oriented induction method has been developed for knowledge discovery in databases. The method integrates a machine learning paradigm, especially learning-from-examples techniques, with set-oriented database operations and extracts generalized data from actual data in databases. An attribute-oriented concept tree ascension technique is applied in generalization, which substantially reduces the computational complexity of database learning processes. Different kinds of knowledge rules, including characteristic rules, discrimination rules, quantitative rules, and data evolution regularities can be discovered efficiently using the attribute-oriented approach. In addition to learning in relational databases, the approach can be applied to knowledge discovery in nested relational and deductive databases. Learning can also be performed with databases containing noisy data and exceptional cases using database statistics. Furthermore, the rules discovered can be used to query database knowledge, answer cooperative queries and facilitate semantic query optimization. Based upon these principles, a prototyped database learning system, DBLEARN, has been constructed for experimentation.

432 citations
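The attribute-oriented concept tree ascension in the abstract above can be sketched in a few lines: attribute values are replaced by their ancestors in a concept hierarchy until duplicate tuples merge and the table shrinks below a threshold. This is an illustrative reconstruction, not DBLEARN code; the hierarchy, data, threshold, and function names are invented for the example.

```python
# Illustrative concept hierarchy: each value maps to its parent concept.
HIERARCHY = {
    "Vancouver": "British Columbia", "Victoria": "British Columbia",
    "Toronto": "Ontario", "Ottawa": "Ontario",
    "British Columbia": "Canada", "Ontario": "Canada",
}

def generalize(value, levels):
    """Climb the concept tree the given number of levels."""
    for _ in range(levels):
        value = HIERARCHY.get(value, value)
    return value

def induce(rows, attr, threshold=2):
    """Ascend the concept tree on one attribute, merging duplicate
    tuples, until at most `threshold` distinct tuples remain."""
    for level in range(len(HIERARCHY)):
        generalized = {tuple(generalize(v, level) if i == attr else v
                             for i, v in enumerate(row)) for row in rows}
        if len(generalized) <= threshold:
            return sorted(generalized)
    return sorted(generalized)

data = [("Vancouver", "grad"), ("Victoria", "grad"),
        ("Toronto", "grad"), ("Ottawa", "grad")]
print(induce(data, 0))  # cities merge into provinces, tuples collapse
```

The merged tuples are the "generalized data" from which characteristic rules can then be read off.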


Journal ArticleDOI
01 Feb 1992
TL;DR: In this paper, the authors consider expert systems encoded as first-order theories, present techniques for resolving inconsistencies in such knowledge bases, and provide algorithms for implementing these techniques.
Abstract: Consider the construction of an expert system by encoding the knowledge of different experts. Suppose the knowledge provided by each expert is encoded into a knowledge base. Then the process of combining the knowledge of these different experts is an important and nontrivial problem. We study this problem here when the expert systems are considered to be first-order theories. We present techniques for resolving inconsistencies in such knowledge bases. We also provide algorithms for implementing these techniques.

224 citations
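The core difficulty the abstract describes — the union of individually consistent knowledge bases need not be consistent — shows up already at the propositional level. The toy sketch below (invented names; one arbitrary resolution strategy of dropping contested facts) stands in for the paper's first-order techniques, which it does not reproduce.

```python
def combine(knowledge_bases):
    """Merge expert knowledge bases given as sets of signed literals,
    e.g. ("penguin_flies", False).  Conflicts between experts are
    detected and, in this toy strategy, resolved by dropping the
    contested facts from the merged result."""
    merged, conflicts = {}, set()
    for kb in knowledge_bases:
        for atom, truth in kb:
            if atom in merged and merged[atom] != truth:
                conflicts.add(atom)
            merged[atom] = truth
    consistent = {a: t for a, t in merged.items() if a not in conflicts}
    return consistent, conflicts

expert1 = {("bird_flies", True), ("penguin_flies", False)}
expert2 = {("penguin_flies", True)}
facts, clashes = combine([expert1, expert2])
print(clashes)  # the experts disagree about penguins
```

Dropping conflicting facts is only one of many possible resolution policies; the paper's techniques operate on full first-order theories rather than ground literals.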


Journal ArticleDOI
TL;DR: The authors develop back-propagation learning for acyclic, event-driven networks in general and derive a specific algorithm for learning in EMYCIN-derived expert networks, which offers automation of the knowledge acquisition task for certainty factors, often the most difficult part of knowledge extraction.
Abstract: Expert networks are event-driven, acyclic networks of neural objects derived from expert systems. The neural objects process information through a nonlinear combining function that is different from, and more complex than, typical neural network node processors. The authors develop back-propagation learning for acyclic, event-driven networks in general and derive a specific algorithm for learning in EMYCIN-derived expert networks. The algorithm combines back-propagation learning with other features of expert networks, including calculation of gradients of the nonlinear combining functions and the hypercube nature of the knowledge space. It offers automation of the knowledge acquisition task for certainty factors, often the most difficult part of knowledge extraction. Results of testing the learning algorithm with a medium-scale (97-node) expert network are presented.

139 citations
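For EMYCIN-derived networks, the nonlinear combining function at each node is the standard certainty-factor combination rule, and back-propagation needs its gradient. The sketch below follows the usual MYCIN/EMYCIN formula; it is an illustration of the combining function only, not the paper's learning algorithm, and the gradient is worked out for the positive branch only.

```python
def cf_combine(a, b):
    """EMYCIN-style combination of two certainty factors in [-1, 1]."""
    if a >= 0 and b >= 0:
        return a + b * (1 - a)
    if a < 0 and b < 0:
        return a + b * (1 + a)
    return (a + b) / (1 - min(abs(a), abs(b)))

def cf_combine_grad(a, b):
    """Partial derivatives of cf_combine with respect to a and b,
    the kind of gradient back-propagation must push through an
    expert-network node (positive branch only in this sketch)."""
    if a >= 0 and b >= 0:
        return (1 - b, 1 - a)
    raise NotImplementedError("remaining branches omitted in this sketch")

print(cf_combine(0.6, 0.5))  # two supporting pieces of evidence reinforce
```

Note that the combining function is piecewise and nonlinear, which is why the gradient calculation is a named feature of the learning algorithm.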


Journal ArticleDOI
TL;DR: This article discusses the use of examples in the design of databases, and gives an overview of the complexity results and algorithms that have been developed for this problem.
Abstract: We consider the problem of discovering the functional and inclusion dependencies that a given database instance satisfies. This technique is used in a database design tool that uses example databases to give feedback to the designer. If the examples show deficiencies in the design, the designer can directly modify the examples. The tool then infers new dependencies and the database schema can be modified, if necessary. The discovery of the functional and inclusion dependencies can also be used in analyzing an existing database. The problem of inferring functional dependencies has several connections to other topics in knowledge discovery and machine learning. In this article we discuss the use of examples in the design of databases, and give an overview of the complexity results and algorithms that have been developed for this problem. © 1992 John Wiley & Sons, Inc.

113 citations
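The check underlying dependency inference is simple to state: a functional dependency X → Y holds in an example instance iff no two rows agree on X while differing on Y. A naive sketch of that check, with an invented example table (real algorithms prune the exponential search space, which is where the complexity results discussed in the article come in):

```python
def fd_holds(rows, lhs, rhs):
    """A functional dependency lhs -> rhs holds in an example instance
    iff no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def discover_fds(rows, attributes):
    """Naively test all single-attribute dependencies.  Illustrative
    only: the full lattice of attribute sets is exponential."""
    return [(a, b) for a in attributes for b in attributes
            if a != b and fd_holds(rows, [a], [b])]

rows = [
    {"emp": 1, "dept": "toys",  "mgr": "ann"},
    {"emp": 2, "dept": "toys",  "mgr": "ann"},
    {"emp": 3, "dept": "books", "mgr": "bob"},
]
print(discover_fds(rows, ["emp", "dept", "mgr"]))
```

In the design-tool setting, the designer edits the example rows and the inferred dependency set changes accordingly.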


Proceedings ArticleDOI
23 Aug 1992
TL;DR: KANT is described, the first system to combine principled source language design, semi-automated knowledge acquisition, and knowledge compilation techniques to produce fast, high-quality translation to multiple languages.
Abstract: Knowledge-based interlingual machine translation systems produce semantically accurate translations, but typically require massive knowledge acquisition. Ongoing research and development at the Center for Machine Translation has focussed on reducing this requirement to produce large-scale practical applications of knowledge-based MT. This paper describes KANT, the first system to combine principled source language design, semi-automated knowledge acquisition, and knowledge compilation techniques to produce fast, high-quality translation to multiple languages.

105 citations


Journal ArticleDOI
TL;DR: The architecture of INLEN, an intelligent multistrategy assistant for knowledge discovery from facts, is described, and its performance is illustrated by applying it to a database of scientific publications.
Abstract: The architecture of an intelligent multistrategy assistant for knowledge discovery from facts, INLEN, is described and illustrated by an exploratory application. INLEN integrates a database, a knowledge base, and machine learning methods within a uniform user-oriented framework. A variety of machine learning programs are incorporated into the system to serve as high-level knowledge generation operators (KGOs). These operators can generate diverse kinds of knowledge about the properties and regularities existing in the data. For example, they can hypothesize general rules from facts, optimize the rules according to problem-dependent criteria, determine differences and similarities among groups of facts, propose new variables, create conceptual classifications, determine equations governing numeric variables and the conditions under which the equations apply, derive statistical properties and use them for qualitative evaluations, and so on. The initial implementation of the system, INLEN 1b, is described, and its performance is illustrated by applying it to a database of scientific publications.

82 citations


Journal ArticleDOI
TL;DR: This article introduces patterns to identify what is interesting in data and gives some examples of patterns for difference‐, change‐, and trend‐detection, and summarizes what must be specified to define a pattern.
Abstract: In this article we describe some goals and problems of KDD. Approaches are presented which have been implemented in the Statistics Interpreter Explora, a prototype assistant system for discovering interesting findings in recurrent datasets. We introduce patterns to identify what is interesting in data and give some examples of patterns for difference-, change-, and trend-detection. Then we summarize what must be specified to define a pattern. Besides some descriptive parts, this includes a procedural verification method. Object-oriented programming techniques can simplify the specializations of general patterns. We identify search as a constituent principle of discovery and introduce object structures as a basis to induce a graph structure on the search space. We mention several strategies for graph search and describe approaches for dealing with the aggregation, redundancy, and overlapping problems. Then we address the presentation of findings in natural language and graphical form, focusing on the methods to design good graphical presentations by knowledge-based techniques. Finally, we discuss the paradigm of an adaptive discovery assistant, including the problem of how to reuse the discovered knowledge for further discovery. © 1992 John Wiley & Sons, Inc.

80 citations


Journal ArticleDOI
TL;DR: Electronic Fraud Detection (EFD) assists Investigative Consultants in the Managed Care & Employee Benefits Security Unit of the Travelers Insurance Companies in the detection and preinvestigative analysis of healthcare provider fraud.
Abstract: EFD (Electronic Fraud Detection) assists Investigative Consultants in the Managed Care & Employee Benefits Security Unit of the Travelers Insurance Companies in the detection and preinvestigative analysis of healthcare provider fraud. The task EFD performs, scanning a large population of health insurance claims in search of likely fraud, has never been done manually. Furthermore, the available database has few positive examples. Thus, neither existing knowledge engineering techniques nor statistical methods are sufficient for designing the identification process. To overcome these problems, EFD uses knowledge discovery techniques on two levels. First, EFD integrates expert knowledge with statistical information assessment to identify cases of unusual provider behavior. The heart of EFD is 27 behavioral heuristics, knowledge-based ways of viewing and measuring provider behavior. Rules operate on them to identify providers whose behavior merits a closer look by the Investigative Consultants. Second, machine learning is used to develop new rules and improve the identification process. Pilot operations involved analysis of nearly 22,000 providers in six metropolitan areas. The pilot is implemented in SAS Institute's SAS® System, AICorp's Knowledge Base Management System (KBMS®), and Borland International's Turbo Prolog®. © 1992 John Wiley & Sons, Inc.

75 citations
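At its simplest, a behavioral heuristic of the kind described above measures some aspect of provider behavior and asks whether a provider is statistically unusual for the population. The sketch below is a toy stand-in with an invented metric and threshold, not one of EFD's 27 heuristics:

```python
from statistics import mean, stdev

def flag_unusual(metric_by_provider, z_cut=2.0):
    """Return providers whose behavioral metric lies at least z_cut
    sample standard deviations from the population mean."""
    values = list(metric_by_provider.values())
    mu, sigma = mean(values), stdev(values)
    return sorted(p for p, v in metric_by_provider.items()
                  if sigma > 0 and abs(v - mu) / sigma >= z_cut)

# e.g. claims filed per patient, per provider (invented numbers)
claims_rate = {"P01": 10, "P02": 11, "P03": 9, "P04": 10, "P05": 10,
               "P06": 11, "P07": 9, "P08": 10, "P09": 10, "P10": 40}
print(flag_unusual(claims_rate))  # only the extreme provider is flagged
```

In EFD, rules then operate over many such measurements, so a provider is referred to the Investigative Consultants on a pattern of unusual behavior rather than a single score.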


Journal ArticleDOI
TL;DR: The Knowledge Discovery Workbench, an interactive system for database exploration, is described, and its capabilities in data clustering, summarization, classification, and discovery of changes are illustrated.
Abstract: We describe the Knowledge Discovery Workbench, an interactive system for database exploration. We then illustrate KDW capabilities in data clustering, summarization, classification, and discovery of changes. We also examine extracting dependencies from data and using them to order the multitude of data patterns. © 1992 John Wiley & Sons, Inc.

66 citations


Proceedings ArticleDOI
01 Jul 1992
TL;DR: The authors describe ATKET (automatic test knowledge extraction tool), which synthesizes test knowledge using structural and behavioral information available in the very high-speed IC description language (VHDL) description of a design.
Abstract: The authors describe ATKET (automatic test knowledge extraction tool), which synthesizes test knowledge using structural and behavioral information available in the very high-speed IC description language (VHDL) description of a design. A VHDL analyzer produces an intermediate representation of the information contained in a VHDL design. ATKET interfaces to this intermediate representation to access structural and behavioral information in the design and stores it in suitable data structures. A convenient representation called the module operation tree (MOT) is used to capture the behavior of modules in the design. Information stored in the MOT along with structural information describing connections between modules in the design is used to generate test knowledge. Results obtained from ATKET for a circuit which was difficult to test are presented.


Book ChapterDOI
03 Jan 1992
TL;DR: Generalised Directive Models and their instantiation in the ACKnowledge Knowledge Engineering Workbench are described, together with a context-sensitive rewrite grammar that captures a large class of inference layer models.
Abstract: In this paper we describe Generalised Directive Models and their instantiation in the ACKnowledge Knowledge Engineering Workbench. We have developed a context sensitive rewrite grammar that allows us to capture a large class of inference layer models. We use the grammar to progressively refine the model of problem solving for an application. It is also used as the basis of the scheduling of KA activities and the selection of KA tools.

Journal ArticleDOI
01 Nov 1992
TL;DR: The method promotes several general ideas for the automation of knowledge acquisition, such as understanding-based knowledge extension, knowledge acquisition through multistrategy learning, consistency-driven concept formation and refinement, closed-loop learning, and synergistic cooperation between a human expert and a learning system.
Abstract: A method for the automation of knowledge acquisition that is viewed as a process of incremental extension, updating, and improvement of an incomplete and possibly partially incorrect knowledge base of an expert system is presented. The knowledge base is an approximate representation of objects and inference processes in the expertise domain. Its gradual development is guided by the general goal of improving this representation to consistently integrate new input information received from the human expert. The knowledge acquisition method is presented as part of a methodology for the automation of the entire process of building expert systems, and is implemented in the system NeoDISCIPLE. The method promotes several general ideas for the automation of knowledge acquisition, such as understanding-based knowledge extension, knowledge acquisition through multistrategy learning, consistency-driven concept formation and refinement, closed-loop learning, and synergistic cooperation between a human expert and a learning system.

Journal ArticleDOI
TL;DR: The correlation matrix memory (CMM), a linear system with a single layer of input-output connections that is used as the neural network system's classifier, is described, and the limitations of the learning system are discussed.
Abstract: A paradigm for diagnostic neural network systems that emphasizes informative data representation and encoding and uses generic preprocessing techniques to extract knowledge from database records is discussed. The proposed diagnostic system differs from other approaches to automatic knowledge extraction in the following ways: by emphasizing the importance of intelligent encoding and preprocessing of raw data, rather than classifications; by demonstrating the importance of making a clear distinction between diagnostic and classification tasks; and by providing a generic, uniform representation for data records comprising interdependent, heterogeneous features. The correlation matrix memory (CMM), a linear system with a single layer of input-output connections that is used as the neural network system's classifier, is described. The limitations of the learning system are discussed.
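A correlation matrix memory of the kind the abstract mentions is a single weight matrix formed by superimposing outer products of training pairs; recall is a matrix-vector product followed by a threshold. A minimal sketch follows; the hard threshold at the number of active input bits is one common recall rule, assumed here rather than taken from the paper, and the patterns are invented.

```python
def cmm_train(pairs, n_in, n_out):
    """Superimpose binary input/output associations as a sum of
    outer products, giving a single n_out-by-n_in weight matrix."""
    M = [[0] * n_in for _ in range(n_out)]
    for x, y in pairs:
        for i in range(n_out):
            for j in range(n_in):
                M[i][j] += y[i] * x[j]
    return M

def cmm_recall(M, x):
    """Recall: matrix-vector product, thresholded at the number of
    active input bits (a common hard-threshold rule)."""
    t = sum(x)
    return [1 if sum(w * v for w, v in zip(row, x)) >= t else 0
            for row in M]

x1, y1 = [1, 0, 1, 0], [1, 0]
x2, y2 = [0, 1, 0, 1], [0, 1]
M = cmm_train([(x1, y1), (x2, y2)], n_in=4, n_out=2)
print(cmm_recall(M, x1))  # recovers the first stored output pattern
```

Because the memory is a single linear layer, its capacity and separability limits are exactly the kind of limitations the abstract says are discussed.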

Journal ArticleDOI
TL;DR: It is argued that these attitudes are somewhat short-sighted and that the techniques of the two communities are complementary; a specific visualization system is discussed, along with the obstacles that must be overcome in integrating it into an AKD system.
Abstract: Although the fields of data visualization and automated knowledge discovery (AKD) share many goals, workers in each field have been reluctant to adopt the tools and methods of the other field. Many AKD researchers discourage the use of visualization tools because they believe that dependence on human steering will impede the development of numerical or analytical descriptions of complex data. Many visualization researchers are concerned that their present platforms are being pushed to the limits of their performance by the most advanced visualization techniques and are therefore unwilling to incur the perceived overhead of having a database system mediate access to the data. We argue that these attitudes are somewhat short-sighted and that the techniques of these two communities are complementary. We discuss a specific visualization system that we have developed and describe the obstacles that must be overcome in integrating it into an AKD system. © 1992 John Wiley & Sons, Inc.

Proceedings Article
01 Jan 1992
TL;DR: It is suggested that standards for representing medical logic preserve this separation between factual medical knowledge and knowledge of how the medical facts should be applied to a particular clinical situation to engender knowledge reuse.
Abstract: Knowledge Data Systems is building a medical expert system for monitoring clinical events. This system uses the Arden syntax as a knowledge representation. Having encoded many different types of rules in the Arden syntax, we have noticed a number of shortcomings of the syntax. Many of these shortcomings originate from Arden's procedural orientation, from its failure to separate factual medical knowledge from knowledge of how the medical facts should be applied to a particular clinical situation. The absence of this separation leads to redundancy of knowledge and to difficulties in knowledge reuse. We suggest that standards for representing medical logic preserve this separation to engender knowledge reuse. We propose a general framework for representing medical logic which supports both knowledge sharing and reuse.

Book ChapterDOI
08 Jul 1992
TL;DR: The METEXA system is presented, which emphasizes the use of radiological domain knowledge to determine the semantics of utterances; the Conceptual Graph Theory by John Sowa has been chosen as the knowledge representation formalism.
Abstract: The telegraphic language found in radiological reports can be well understood by a natural language system using the underlying domain knowledge. We present the METEXA system, which emphasizes the use of radiological domain knowledge to determine the semantics of utterances. Syntactic and semantic analysis, lexical semantics and the structure of the domain model are described in some detail. A resolution-based inference engine answers relevant questions concerning the contents of the reports. As knowledge representation formalism the Conceptual Graph Theory by John Sowa has been chosen.

Journal ArticleDOI
TL;DR: This article presents research on the design of knowledge-based document retrieval systems that adopted a semantic network structure to represent subject knowledge and classification scheme knowledge and modeled experts' search strategies and user modeling capability as procedural knowledge.
Abstract: This article presents research on the design of knowledge-based document retrieval systems. We adopted a semantic network structure to represent subject knowledge and classification scheme knowledge and modeled experts' search strategies and user modeling capability as procedural knowledge. These functionalities were incorporated into a prototype knowledge-based retrieval system, Metacat. Our system, the design of which was based on the blackboard architecture, was able to create a user profile, identify task requirements, suggest heuristics-based search strategies, perform semantic-based search assistance, and assist online query refinement.

Journal ArticleDOI
TL;DR: The concept of “communication knowledge”, distinct from the domain knowledge, is emphasized, and three aspects of communication knowledge are identified and research related to them presented.
Abstract: Knowledge systems can be regarded as agents communicating between domain experts and end users. We emphasize the concept of “communication knowledge”, distinct from the domain knowledge. Three aspects of communication knowledge are identified and research related to them presented. These are domain-related knowledge, discourse knowledge and mediating knowledge. This frame of reference is applied in the contexts of knowledge acquisition, user interface management in knowledge systems, text generation in expert critiquing systems and tutoring systems. We discuss the implications of the proposed framework in terms of implemented systems and finally suggest a future research agenda emanating from the analyses.

Journal ArticleDOI
TL;DR: For the control of bioprocesses, a knowledge-based system has been developed which can use a priori knowledge that cannot be readily incorporated into a mathematical model but can be represented by language-based rules, such as "IF/THEN" conditions.
Abstract: For the control of bioprocesses, a priori knowledge exists which cannot be readily incorporated as a mathematical model but which can be represented by language-based rules, such as "IF/THEN" conditions. This knowledge is usually implemented via a process operator for the automation of the fermentation. A knowledge-based system has been developed which can use this kind of knowledge and which is complemented by algorithmic systems, such as statistical analyses and mathematical modeling, for the supervision and control of a bioprocess. In this article, a description of this system is presented. Furthermore, the following features of the system, which are especially important for the development of real-time, on-line, knowledge-based systems, are discussed in detail: the representation of time-dependent knowledge; processing of imprecise, uncertain, and incomplete knowledge; the combination of shallow reasoning with model-based reasoning; informing the bioprocess operator about the inferences and decisions; the demands of the diversity of the knowledge handling; performance; and maintenance and extension of the system.
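The "IF/THEN" rules the abstract describes reduce, in the simplest shallow-reasoning form, to conditions evaluated against the current process state that recommend operator actions. The sketch below is a toy one-pass supervisor; the variable names, thresholds, and actions are invented for the example, not taken from the paper's system.

```python
def evaluate_rules(rules, state):
    """Fire IF/THEN rules against the current process state and
    collect the recommended actions (one pass of a toy supervisor)."""
    return [action for condition, action in rules if condition(state)]

# Illustrative language-based rules; names and thresholds are invented.
RULES = [
    (lambda s: s["pO2"] < 20.0, "increase stirrer speed"),
    (lambda s: s["pH"] < 6.5, "add base"),
    (lambda s: s["temp"] > 38.0, "increase cooling"),
]

print(evaluate_rules(RULES, {"pO2": 15.0, "pH": 6.8, "temp": 37.0}))
```

A real-time system of the kind described additionally needs time-dependent conditions, degrees of certainty rather than crisp thresholds, and coupling to model-based reasoning, which is exactly the feature list the abstract enumerates.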

Book ChapterDOI
02 Jan 1992
TL;DR: The necessity for knowledge representation (KR)—the describing or writing down of the knowledge in machine-usable form—underlies and shapes the whole KA process and the development of expert system software.
Abstract: The enterprise of artificial intelligence (AI) has given rise to a new class of software systems. These software systems, commonly called expert systems, or knowledge-based systems, are distinguished in that they contain, and can apply, knowledge of some particular skill or expertise in the execution of a task. These systems embody, in some form, humanlike expertise. The construction of such software therefore requires that we somehow get hold of the knowledge and transfer it into the computer, representing it in a form usable by the machine. This total process has come to be called knowledge acquisition (KA). The necessity for knowledge representation (KR)—the describing or writing down of the knowledge in machine-usable form—underlies and shapes the whole KA process and the development of expert system software.


Proceedings Article
01 Jan 1992
TL;DR: The approach to visual interfaces outlined in this paper treats user inputs such as mouse clicks, menu selections, etc. not as invocations of methods that can be executed without regarding the dialogue context, but instead as dialogue acts expressing discourse goals of the user.
Abstract: The approach to visual interfaces outlined in this paper treats user inputs such as mouse clicks, menu selections, etc. not as invocations of methods that can be executed without regarding the dialogue context, but instead as dialogue acts expressing discourse goals of the user. The system can respond in a more flexible way by taking into account the pragmatic and semantic aspects of the user’s input. Our prototypical information system MERIT provides a flexible visual interface to a database covering European research projects and project consortia. The system supports the user by offering a set of sample retrieval strategies – called cases – which are modifiable to meet situative requirements. Then the system is able to offer situation-dependent query forms. MERIT generates graphical presentations which support – in accordance with the current case and actual dialogue situation – a survey or detail oriented reception of the retrieved data.

Journal ArticleDOI
01 Oct 1992
TL;DR: A system supporting knowledge engineering is presented that combines hypermedia, knowledge acquisition, expert system shell and database systems to provide an integrated end-user environment that supports a wide variety of forms and representations of knowledge.
Abstract: A system supporting knowledge engineering is presented that combines hypermedia, knowledge acquisition, expert system shell and database systems to provide an integrated end-user environment. The integration is achieved through inter-application communication protocols operating between four programs designed independently for stand-alone use. The system supports a wide variety of forms and representations of knowledge from the highly informal to the highly computational. It is extremely non-modal and allows knowledge acquisition and knowledge consultation to be combined in a variety of ways. In particular, knowledge captured with the acquisition tool may later be re-analyzed, possibly resulting in changed advice.

Book ChapterDOI
09 Jun 1992
TL;DR: Complex real-world domains are characterized by large amounts of data, by interactions among those data, and by knowledge that must often be related to concrete problems, so a knowledge acquisition process must adequately relate the acquired knowledge to the given problem.
Abstract: Complex real-world domains are characterized by large amounts of data, by the interactions among those data, and by the fact that knowledge must often be related to concrete problems. Therefore, the available descriptions of real-world domains do not easily lend themselves to an adequate representation. The knowledge which is relevant for solving a given problem must be extracted from such descriptions with the help of the knowledge acquisition process. Such a process must adequately relate the acquired knowledge to the given problem.

Proceedings ArticleDOI
15 Jun 1992
TL;DR: The authors present a knowledge-based approach to encouraging the reuse of existing simulation and modeling programs by treating each existing program as an operator in a planning system, and creating a knowledge base describing the goals each program achieves, the pre- and post-conditions of running the program, and its I/O behavior.
Abstract: The authors present a knowledge-based approach to encouraging the reuse of existing simulation and modeling programs. To get around the problems of poor interfaces and minimal documentation, they are: treating each existing program as an operator in a planning system; creating a knowledge base describing the goals each program achieves, the pre- and post-conditions of running the program, and its I/O behavior; and developing several tools that make use of this knowledge to automate the development of new interfaces, to assist the creation of scripts that achieve high-level user goals, and to allow potential reusers to view a graphical representation of the component's role in solving high-level domain problems.
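Treating each existing program as a planning operator, as the abstract describes, means recording what a program needs before it runs and what running it achieves, then letting a planner chain programs into a script. The sketch below is a deliberately tiny forward-chaining planner over invented example programs; it illustrates the idea only, not the authors' knowledge base or tools.

```python
from dataclasses import dataclass

@dataclass
class ProgramOperator:
    """An existing program described as a planning operator: the facts
    it needs before running, and the facts running it achieves."""
    name: str
    preconditions: frozenset
    effects: frozenset

def plan(operators, start, goal):
    """Tiny forward-chaining planner: apply any applicable operator
    until the goal facts hold; return the script of program runs."""
    state, script, progress = set(start), [], True
    while not goal <= state and progress:
        progress = False
        for op in operators:
            if op.preconditions <= state and not op.effects <= state:
                state |= op.effects
                script.append(op.name)
                progress = True
    return script if goal <= state else None

# Invented example: chaining three legacy simulation codes.
ops = [
    ProgramOperator("mesher", frozenset({"geometry"}), frozenset({"mesh"})),
    ProgramOperator("solver", frozenset({"mesh"}), frozenset({"field"})),
    ProgramOperator("plotter", frozenset({"field"}), frozenset({"plot"})),
]
print(plan(ops, {"geometry"}, {"plot"}))  # script of programs to run
```

The returned script is exactly the kind of high-level-goal-achieving script the abstract says the tools help create.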

Proceedings ArticleDOI
20 Sep 1992
TL;DR: The problem of managing design knowledge as a crucial component in a large-scale software development project is examined, and an implemented design knowledge tool instantiating the framework is presented that gives software developers access to knowledge about a particular error handling mechanism.
Abstract: The problem of managing design knowledge as a crucial component in a large-scale software development project is examined. The authors explore this design knowledge problem in more detail, describe both technical and nontechnical challenges, discuss the maintenance of such knowledge, and briefly explore the issue of acquisition. A framework is described for providing knowledge-based assistance to software developers. This framework is integrated with and extends an existing design process and exploits that process to address the problem of knowledge maintenance. Then, an implemented design knowledge tool is presented instantiating the framework that gives software developers access to knowledge about a particular error handling mechanism. The organization of the knowledge, the design of the interface, and the status of the implementation are discussed.

Journal ArticleDOI
TL;DR: This paper presents a four-layer model for working with legal knowledge in expert systems; it distinguishes five sources of knowledge, four containing basic legal knowledge found in published and unpublished sources and a fifth consisting of legal metaknowledge.
Abstract: This paper presents a four-layer model for working with legal knowledge in expert systems. It distinguishes five sources of knowledge. Four contain basic legal knowledge found in published and unpublished sources. The fifth consists of legal metaknowledge. In the model the four basic legal knowledge sources are placed at the lowest level. The metaknowledge is placed at levels above the other four knowledge sources. The assumption is that the knowledge is represented only once. The use of metaknowledge at various levels should make it possible to use the appropriate knowledge for the problem presented to the system. The knowledge has to be represented as closely to the original format as possible for this purpose. Suitable representation formalisms for the various types of knowledge in the five knowledge sources are discussed. It is not possible to indicate a 'best' representation formalism for each knowledge source.