
Showing papers on "Knowledge extraction" published in 1993


Journal ArticleDOI
TL;DR: The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented and an algorithm for classification obtained by combining the basic rule discovery operations is given.
Abstract: The authors' perspective of database mining as the confluence of machine learning techniques and the performance emphasis of database technology is presented. Three classes of database mining problems involving classification, associations, and sequences are described. It is argued that these problems can be uniformly viewed as requiring discovery of rules embedded in massive amounts of data. A model and some basic operations for the process of rule discovery are described. It is shown how the database mining problems considered map to this model, and how they can be solved by using the basic operations proposed. An example is given of an algorithm for classification obtained by combining the basic rule discovery operations. This algorithm is efficient in discovering classification rules and has accuracy comparable to ID3, one of the best current classifiers.

1,539 citations
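
The rule-discovery framing above lends itself to a compact illustration. Below is a minimal, hedged sketch of coverage-based classification-rule induction over a toy attribute/value table; the data, attribute names, and the greedy single-condition strategy are illustrative assumptions, not the algorithm or operations proposed in the paper.

```python
# Illustrative sketch only: a tiny separate-and-conquer rule inducer over
# attribute/value data, in the spirit of discovering classification rules
# embedded in data.  It is not the algorithm from the paper.
from collections import Counter

# Hypothetical training records: (attributes, class label)
records = [
    ({"age": "young", "income": "low"},  "no"),
    ({"age": "young", "income": "high"}, "yes"),
    ({"age": "old",   "income": "low"},  "no"),
    ({"age": "old",   "income": "high"}, "yes"),
    ({"age": "old",   "income": "high"}, "yes"),
]

def best_condition(rows, target):
    """Pick the attribute=value test whose covered rows are purest for `target`."""
    best, best_score = None, -1.0
    for attrs, _ in rows:
        for a, v in attrs.items():
            covered = [lab for r_attrs, lab in rows if r_attrs.get(a) == v]
            score = covered.count(target) / len(covered)
            if score > best_score:
                best, best_score = (a, v), score
    return best

def induce_rules(rows):
    """Greedily grow one-condition rules per class until all rows are covered."""
    rules, remaining = [], list(rows)
    while remaining:
        target = Counter(lab for _, lab in remaining).most_common(1)[0][0]
        a, v = best_condition(remaining, target)
        rules.append(((a, v), target))
        remaining = [(attrs, lab) for attrs, lab in remaining if attrs.get(a) != v]
    return rules

for (a, v), label in induce_rules(records):
    print(f"IF {a} = {v} THEN class = {label}")
```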


Journal ArticleDOI
02 Jan 1993
TL;DR: Aquinas, an expanded version of the Expertise Transfer System (ETS), is a knowledge-acquisition workbench that combines ideas from psychology and knowledge-based systems research to support knowledge-acquisition tasks.
Abstract: Acquiring knowledge from a human expert is a major problem when building a knowledge-based system. Aquinas, an expanded version of the Expertise Transfer System (ETS), is a knowledge-acquisition workbench that combines ideas from psychology and knowledge-based systems research to support knowledge-acquisition tasks. These tasks include eliciting distinctions, decomposing problems, combining uncertain information, incremental testing, integration of data types, automatic expansion and refinement of the knowledge base, use of multiple sources of knowledge and providing process guidance. Aquinas interviews experts and helps them analyse, test, and refine the knowledge base. Expertise from multiple experts or other knowledge sources can be represented and used separately or combined. Results from user consultations are derived from information propagated through hierarchies. Aquinas delivers knowledge by creating knowledge bases for several different expert-system shells. Help is given to the expert by a dialog manager that embodies knowledge-acquisition heuristics. Aquinas contains many techniques and tools for knowledge acquisition; the techniques combine to make it a powerful testbed for rapidly prototyping portions of many kinds of complex knowledge-based systems.

281 citations


Journal ArticleDOI
TL;DR: A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems and is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench.
Abstract: Knowledge-discovery systems face challenging problems from real-world databases, which tend to be dynamic, incomplete, redundant, noisy, sparse, and very large. These problems are addressed and some techniques for handling them are described. A model of an idealized knowledge-discovery system is presented as a reference for studying and designing new systems. This model is used in the comparison of three systems: CoverStory, EXPLORA, and the Knowledge Discovery Workbench. The deficiencies of existing systems relative to the model reveal several open problems for future research.

278 citations


Journal ArticleDOI
TL;DR: Knowledge acquisition tools can be associated with knowledge-based application problems and problem-solving methods, as mentioned in this paper, which provides a framework for analysing and comparing tools and techniques and for focusing the task of building knowledge-based systems on the knowledge acquisition process.

233 citations


01 Jan 1993
TL;DR: The study shows that knowledge discovery has wide applications in spatial databases, and relatively efficient algorithms can be developed for discovery of general knowledge in large spatial databases.
Abstract: Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate knowledge discovery in spatial databases and develop a generalization-based knowledge discovery mechanism which integrates attribute-oriented induction on nonspatial data and spatial merge and generalization on spatial data. The study shows that knowledge discovery has wide applications in spatial databases, and relatively efficient algorithms can be developed for discovery of general knowledge in large spatial databases.

148 citations
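
As a rough illustration of the nonspatial half of the mechanism described above, the sketch below climbs a hand-written concept hierarchy and merges duplicate tuples (attribute-oriented generalization). The crop values, the hierarchy, and the region identifiers are invented, and the spatial merge and generalization step is not reproduced.

```python
# Minimal sketch of attribute-oriented generalization on nonspatial data,
# assuming a hand-written concept hierarchy; illustrative only.
from collections import Counter

# Hypothetical concept hierarchy: crop -> higher-level land-use concept.
hierarchy = {
    "wheat": "grain", "barley": "grain", "corn": "grain",
    "apple": "fruit", "cherry": "fruit",
}

# (region_id, crop) tuples standing in for nonspatial attributes of map regions.
tuples = [("r1", "wheat"), ("r2", "barley"), ("r3", "apple"),
          ("r4", "corn"), ("r5", "cherry"), ("r6", "wheat")]

def generalize(rows, level_map):
    """Replace each low-level value by its parent concept and count duplicates."""
    return Counter(level_map.get(value, value) for _, value in rows)

for concept, count in generalize(tuples, hierarchy).items():
    print(f"{count} regions generalize to land use '{concept}'")
```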


Journal ArticleDOI
TL;DR: The knowledge acquisition bottleneck impeding the development of expert systems is being alleviated by the development of computer-based knowledge acquisition tools, which work directly with experts to elicit knowledge and structure it appropriately to operate as a decision support tool within an expert system.
Abstract: The knowledge acquisition bottleneck impeding the development of expert systems is being alleviated by the development of computer-based knowledge acquisition tools. These work directly with experts to elicit knowledge, and structure it appropriately to operate as a decision support tool within an expert system. However, the elicitation of expert knowledge and its effective transfer to a useful knowledge-based system is complex and involves diverse activities. The complete development of a decision support system using knowledge acquisition tools is illustrated. The example is simple enough to be completely analyzed but exhibits enough real-world characteristics to give significant insights into the processes and problems of knowledge engineering.

135 citations


Patent
29 Jan 1993
TL;DR: In this paper, a method and system for retrieving images using a coupled knowledge-base/database is provided, the method comprising the steps of modeling structural knowledge by identifying classes and attributes of classes, determining relationships among the classes and operations for each class, and deriving a schema for the coupled database from the structural knowledge.

Abstract: A method and system for retrieving images using a coupled knowledge-base/database is provided, the method comprising the steps of: modeling structural knowledge by identifying classes and attributes of classes, determining relationships among the classes, and determining operations for each class; modeling heuristic and general procedural knowledge by acquiring heuristic rules for each class dependent on the application domain and specifying the data processing procedures required by the heuristic rules acquired; modeling control knowledge by specifying intra-class-hierarchy searching paths and representing the specified search paths in triggers for each class; and deriving a schema for the coupled database from the structural knowledge. The knowledge-based system for retrieving images provided includes a coupled knowledge-base/database and comprises a knowledge-base storing expert knowledge information including structural knowledge, general procedural knowledge, heuristic knowledge, and control knowledge; a database storing patient information; a knowledge-base/database interface for coupling the database to the knowledge-base; reasoning means to search the classes for the selecting rules; retrieving means for retrieving the examination data; a user interface; and a control interface for coupling the user interface to the knowledge-base.

129 citations


Journal ArticleDOI
TL;DR: A rule refinement strategy is presented, partly implemented in a Prolog program, that operationalizes “interestingness” into performance, simplicity, novelty, and significance and yielded 10 “genuinely interesting” rules.
Abstract: Rule induction can achieve orders of magnitude reduction in the volume of data descriptions. For example, we applied a commercial tool (IXL™) to a 1,819-record tropical storm database, yielding 161 rules. However, the human comprehension goals of Knowledge Discovery in Databases may require still more orders of magnitude. We present a rule refinement strategy, partly implemented in a Prolog program, that operationalizes "interestingness" into performance, simplicity, novelty, and significance. Applying the strategy to the induced rulebase yielded 10 "genuinely interesting" rules.

105 citations
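
A hedged sketch of how "interestingness" might be operationalized as simple filters over induced rules appears below. The rule fields (support, confidence), the thresholds, and the crude novelty proxy are assumptions for illustration, not the Prolog-based refinement strategy of the paper.

```python
# Hedged sketch: one way to operationalize "interestingness" as simple filters
# over induced rules.  Thresholds and rule fields are assumptions.

# Hypothetical induced rules: conditions, predicted class, support count, confidence.
rules = [
    {"conds": ["wind>40"],            "then": "severe", "support": 120, "conf": 0.91},
    {"conds": ["wind>40", "month=9"], "then": "severe", "support": 6,   "conf": 0.95},
    {"conds": ["pressure<980"],       "then": "severe", "support": 80,  "conf": 0.55},
]

def interesting(rule, known_conclusions,
                min_conf=0.8,      # performance
                max_conds=2,       # simplicity
                min_support=20):   # significance
    novel = rule["then"] not in known_conclusions   # crude novelty proxy
    return (rule["conf"] >= min_conf and
            len(rule["conds"]) <= max_conds and
            rule["support"] >= min_support and
            novel)

already_known = set()   # e.g. conclusions the domain expert already uses
survivors = [r for r in rules if interesting(r, already_known)]
print(f"{len(survivors)} of {len(rules)} rules retained")
```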


Proceedings ArticleDOI
01 May 1993
TL;DR: A tool is built that serves as a living design memory for a large software development organization that delivers knowledge to developers effectively and is embedded in organizational practice to ensure that the knowledge it contains evolves as necessary.
Abstract: We identify an important type of software design knowledge that we call community specific folklore and show problems with current approaches to managing it. We built a tool that serves as a living design memory for a large software development organization. The tool delivers knowledge to developers effectively and is embedded in organizational practice to ensure that the knowledge it contains evolves as necessary. This work illustrates important lessons in building knowledge management systems, integrating novel technology into organizational practice, and managing research-development partnerships.

82 citations


Journal ArticleDOI
Inderpal Bhandari, M.J. Halliday, E. D. Tarver, D. Brown, Jarir K. Chaar, Ram Chillarege
TL;DR: It is shown that analysis of defect data can readily lead a project team to improve their process during development and be used to capture the semantics of defects in a fashion which is useful for process correction.
Abstract: We present a case study of the use of a software process improvement method which is based on the analysis of defect data. The first step of the method is the classification of software defects using attributes which relate defects to specific process activities. Such classification captures the semantics of the defects in a fashion which is useful for process correction. The second step utilizes a machine-assisted approach to data exploration which allows a project team to discover such knowledge from defect data as is useful for process correction. We show that such analysis of defect data can readily lead a project team to improve their process during development.

81 citations
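
The classification-then-exploration flow described above can be illustrated with a very small tally. In the sketch below the defect attributes and the 50% dominance trigger are invented; it is not the authors' classification scheme or their machine-assisted exploration tool.

```python
# Rough sketch: classify defects with attributes tying them to process
# activities, then tally the classifications to suggest where the process
# may need correction.  Attribute values are invented for illustration.
from collections import Counter

defects = [
    {"type": "interface",  "found_in": "function test"},
    {"type": "interface",  "found_in": "function test"},
    {"type": "assignment", "found_in": "unit test"},
    {"type": "interface",  "found_in": "system test"},
]

by_type = Counter(d["type"] for d in defects)
print("Defect types:", dict(by_type))

# A simple trigger for process feedback: an unusually high share of one type.
dominant, count = by_type.most_common(1)[0]
if count / len(defects) > 0.5:
    print(f"'{dominant}' defects dominate; review the related design activities")
```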


Journal ArticleDOI
TL;DR: This work describes a system that supports the data archaeologist with a natural, object-oriented representation of an application domain; a powerful query language and database translation routines; and an easy-to-use and flexible user interface that supports interactive exploration.
Abstract: Corporate databases increasingly are being viewed as potentially rich sources of new and valuable knowledge. Various approaches to “discovering” or “mining” such knowledge have been proposed. Here we identify an important and previously ignored discovery task, which we call data archaeology. Data archaeology is a skilled human task, in which the knowledge sought depends on the goals of the analyst, cannot be specified in advance, and emerges only through an iterative process of data segmentation and analysis. We describe a system that supports the data archaeologist with a natural, object-oriented representation of an application domain, a powerful query language and database translation routines, and an easy-to-use and flexible user interface that supports interactive exploration. A formal knowledge representation system provides the core technology that facilitates database integration, querying, and the reuse of queries and query results.

11 Jul 1993
TL;DR: The evaluation of a methodology for discovering strong probabilistic rules in data revealed that the strong rules essentially confirm the expert's experiences whereas weak rules are often difficult to interpret, suggesting the use of rule strength as the primary criterion for the selection of potentially useful predictive rules.
Abstract: An application of a methodology for discovering strong probabilistic rules in data is presented. The methodology is based on an extended model of rough sets, the variable precision rough set model, incorporated in the DATALOGIC/R knowledge discovery tool from Reduct Systems Inc. It has been applied to analyze monthly stock market data collected over a ten-year period. The objective of the analysis was to identify dominant relationships among fluctuations of market indicators and stock prices. For the purpose of comparison, both precise and imprecise, strong and weak rules were discovered and evaluated by a domain expert, a stock broker. The evaluation revealed that the strong rules (supported by many cases) essentially confirm the expert's experiences whereas weak rules are often difficult to interpret. This suggests the use of rule strength as the primary criterion for the selection of potentially useful predictive rules.
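
The suggestion to rank rules by strength can be illustrated with simple support and confidence counts over cases, as in the hedged sketch below. The market attributes, candidate rules, and counting scheme are assumptions; this is not the variable precision rough set computation performed by DATALOGIC/R.

```python
# Sketch of ranking discovered rules by strength (number of supporting cases).
# Data and the simple support/confidence counts are illustrative only.

# Hypothetical monthly observations: (indicator pattern, stock moved up?)
cases = [
    ({"rate_change": "down", "index": "up"},   True),
    ({"rate_change": "down", "index": "up"},   True),
    ({"rate_change": "down", "index": "flat"}, True),
    ({"rate_change": "up",   "index": "up"},   False),
    ({"rate_change": "down", "index": "up"},   False),
]

def rule_stats(condition, cases):
    """Return (strength, confidence) of: IF condition THEN stock up."""
    covered = [up for attrs, up in cases
               if all(attrs.get(k) == v for k, v in condition.items())]
    strength = len(covered)
    confidence = sum(covered) / strength if strength else 0.0
    return strength, confidence

candidate_rules = [{"rate_change": "down"}, {"rate_change": "down", "index": "up"}]
ranked = sorted(candidate_rules, key=lambda c: rule_stats(c, cases)[0], reverse=True)
for cond in ranked:
    s, conf = rule_stats(cond, cases)
    print(f"IF {cond} THEN up   strength={s}  confidence={conf:.2f}")
```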

Book ChapterDOI
01 Jan 1993
TL;DR: The system enhances the reasoning capabilities of classical expert systems with the ability to generalise and the handling of incomplete cases and uses neural nets with unsupervised learning algorithms to extract regularities out of case data.
Abstract: In this work we present the integration of neural networks with a rule-based expert system. The system realizes the automatic acquisition of knowledge out of a set of examples. It enhances the reasoning capabilities of classical expert systems with the ability to generalise and the handling of incomplete cases. It uses neural nets with unsupervised learning algorithms to extract regularities out of case data. A symbolic rule generator transforms these regularities into Prolog rules. The generated rules and the trained neural nets are embedded into the expert system as knowledge bases. In the system's diagnosis phase it is possible to use these knowledge bases together with human experts' knowledge bases in order to diagnose an unknown case. Furthermore, the system is able to diagnose and to complete inconsistent data using the trained neural nets, exploiting their ability to generalise.
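
The pipeline of an unsupervised step that finds regularities followed by a symbolic step that turns them into rules can be sketched roughly as below. The hand-rolled two-means clustering stands in for the neural nets, and the symptom data and rule template are invented; the original system generates Prolog rules from trained networks.

```python
# Very small sketch: an unsupervised step finds regularities (here, crude
# prototypes) in case data, and a symbolic step turns each prototype into an
# IF-THEN rule.  Illustrative only.

cases = [  # hypothetical symptom vectors: (fever, cough)
    (39.5, 1), (39.0, 1), (38.8, 1),      # cluster A
    (36.6, 0), (36.9, 0), (37.0, 0),      # cluster B
]

def two_means(points, iters=10):
    """Toy 2-means clustering in pure Python (assumes both clusters stay non-empty)."""
    c0, c1 = points[0], points[-1]
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d0 = sum((a - b) ** 2 for a, b in zip(p, c0))
            d1 = sum((a - b) ** 2 for a, b in zip(p, c1))
            groups[0 if d0 <= d1 else 1].append(p)
        c0 = tuple(sum(vals) / len(vals) for vals in zip(*groups[0]))
        c1 = tuple(sum(vals) / len(vals) for vals in zip(*groups[1]))
    return [c0, c1]

def prototype_to_rule(proto, name):
    fever, cough = proto
    cond = f"fever {'>' if fever > 38.0 else '<='} 38.0 and cough = {round(cough)}"
    return f"IF {cond} THEN case_class = {name}"

for i, proto in enumerate(two_means(cases)):
    print(prototype_to_rule(proto, f"class_{i}"))
```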

Journal ArticleDOI
TL;DR: Forty-Niner (49er) is a general-purpose database mining system that conducts large-scale search for patterns in many subsets of data, performing a more costly search for equations only when the data indicate a functional relationship.
Abstract: Large databases can be a source of useful knowledge. Yet this knowledge is implicit in the data. It must be mined and expressed in a concise, useful form of statistical patterns, equations, rules, conceptual hierarchies, and the like. Automation of knowledge discovery is important because databases are growing in size and number, and standard data analysis techniques are not designed for exploration of huge hypotheses spaces. We concentrate on discovery of regularities, defining a regularity by a pattern and the range in which that pattern holds. We argue that two types of patterns are particularly important: contingency tables and equations, and we present Forty-Niner (49er), a general-purpose database mining system which conducts large-scale search for those patterns in many subsets of data, conducting a more costly search for equations only when data indicate a functional relationship. 49er can refine the initial regularities to yield stronger and more general regularities and more useful concepts. 49er combines several searches, each contributing to a different aspect of a regularity. Correspondence between the components of search and the structure of regularities makes the system easy to understand, use, and expand. Finally, we discuss 49er's performance in four categories of tests: (1) open exploration of new databases; (2) reproduction of human findings (limited because databases which have been extensively explored are very rare); (3) hide-and-seek testing on artificially created data, to evaluate 49er on large scale against known results; (4) exploration of randomly generated databases.
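
The two pattern types emphasized above, contingency tables and equations, can be illustrated with a minimal sketch: compute a chi-square score on a coarse contingency table and fit a linear equation only when the score suggests a relationship. The data, the 3.84 cutoff, and the linear model are illustrative choices, not 49er's actual search.

```python
# Hedged sketch: a contingency-table test followed by an equation fit only
# when the table suggests a near-functional relationship.  Illustrative only.

data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]  # hypothetical (x, y)

def chi_square(pairs):
    """Chi-square of independence on a coarse 2x2 contingency table."""
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    xmid, ymid = sorted(xs)[len(xs) // 2], sorted(ys)[len(ys) // 2]
    table = [[0, 0], [0, 0]]
    for x, y in pairs:
        table[int(x >= xmid)][int(y >= ymid)] += 1
    n, chi = len(pairs), 0.0
    for i in range(2):
        for j in range(2):
            expected = sum(table[i]) * sum(row[j] for row in table) / n
            if expected:
                chi += (table[i][j] - expected) ** 2 / expected
    return chi

def fit_line(pairs):
    """Ordinary least squares y = a*x + b."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs); sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs); sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

if chi_square(data) > 3.84:          # rough cue that x and y are related
    a, b = fit_line(data)
    print(f"equation pattern: y = {a:.2f} * x + {b:.2f}")
else:
    print("no functional relationship indicated; keep the contingency table")
```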


Journal ArticleDOI
TL;DR: A method is presented in which experts in the field are confronted with a carefully chosen sequence of questions about specific relationships between the problems of the domain; the questioning procedure is directly applicable in other contexts, viz., in the determination of any system of sets that is closed under either union or intersection.

Journal ArticleDOI
TL;DR: A method is proposed that uses neural networks as the basis for the automation of knowledge acquisition and can be applied to noisy, real-world domains.
Abstract: A major bottleneck in developing knowledge-based systems is the acquisition of knowledge. Machine learning is an area concerned with the automation of this process of knowledge acquisition. Neural networks generally represent their knowledge at the lower level, while knowledge-based systems use higher-level knowledge representations. The method we propose here provides a technique that automatically allows us to extract conjunctive rules from the lower-level representation used by neural networks. The strength of neural networks in dealing with noise has enabled us to produce correct rules in a noisy domain. Thus we propose a method that uses neural networks as the basis for the automation of knowledge acquisition and can be applied to noisy, real-world domains. © 1993 John Wiley & Sons, Inc.
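
One simple way to read conjunctive rules off a trained network is to look for input subsets whose combined weights exceed the unit's threshold, as in the hedged sketch below. The weights are assumed to come from prior training, and the subset search is a generic extraction technique, not necessarily the method of this paper.

```python
# Hedged sketch of reading conjunctive rules off a trained unit.  The weights
# are assumed to come from a previously trained single-unit network over
# binary inputs (all non-negative here, so supersets of a firing subset also
# fire).  Names and values are invented.
from itertools import combinations

inputs = ["fever", "cough", "headache"]
weights = {"fever": 2.0, "cough": 1.5, "headache": 0.3}   # assumed trained weights
bias = -3.0                                               # unit fires if sum + bias > 0

def extract_conjunctive_rules(inputs, weights, bias):
    """Return minimal input subsets that activate the unit when all are true."""
    rules = []
    for size in range(1, len(inputs) + 1):
        for subset in combinations(inputs, size):
            if any(set(r) <= set(subset) for r in rules):
                continue                      # a smaller rule already covers this
            if sum(weights[i] for i in subset) + bias > 0:
                rules.append(subset)
    return rules

for rule in extract_conjunctive_rules(inputs, weights, bias):
    print("IF " + " AND ".join(rule) + " THEN diagnosis = flu")
```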

Journal ArticleDOI
TL;DR: A tool for characterizing the exceptions in databases and evolving knowledge as a database evolves is developed, which includes using a database query to discover new rules.
Abstract: A concept for knowledge discovery and evolution in databases is described. The key issues include: using a database query to discover new rules; using not only positive examples (answer to a query), but also negative examples to discover new rules; and harmonizing existing rules with the new rules. A tool for characterizing the exceptions in databases and evolving knowledge as a database evolves is developed.
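
A minimal sketch of the query-driven idea, using the answer set as positive examples and the remaining tuples as negative examples, is given below. The table contents, the attribute names, and the way exceptions are recorded are invented for illustration.

```python
# Sketch: treat the answer to a query as positive examples and the rest of the
# table as negative examples, propose a rule from attribute values the
# positives share, and record covered negatives as exceptions.  Illustrative.

table = [
    {"name": "a", "dept": "cs",  "grant": "yes"},
    {"name": "b", "dept": "cs",  "grant": "yes"},
    {"name": "c", "dept": "cs",  "grant": "no"},    # will show up as an exception
    {"name": "d", "dept": "bio", "grant": "no"},
]

positives = [r for r in table if r["grant"] == "yes"]   # "answer to a query"
negatives = [r for r in table if r["grant"] == "no"]

def common_values(rows, skip=("name", "grant")):
    """Attribute=value pairs shared by every positive example."""
    shared = {k: v for k, v in rows[0].items() if k not in skip}
    for r in rows[1:]:
        shared = {k: v for k, v in shared.items() if r.get(k) == v}
    return shared

rule = common_values(positives)
exceptions = [r["name"] for r in negatives
              if all(r.get(k) == v for k, v in rule.items())]

print(f"rule: IF {rule} THEN grant = yes")
print(f"exceptions: {exceptions}")
```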

Book ChapterDOI
08 Jun 1993
TL;DR: It is demonstrated that, in analogy with IS-A relations, part-of relations form hierarchies (DAGs) which constitute an important conceptual aid in understanding complex systems.
Abstract: The incorporation of semantics into conceptual models has for long been a goal of the data/knowledge modelling communities. Equally, conceptual models strive for a high degree of intuitiveness in order to be better understood by their human users. This paper aims to go one step in this direction by introducing the part-of relation as a special case of aggregation. To do so we investigate the semantic constraints accompanying this specialization and suggest different ways of incorporating part-of semantics into data/knowledge models. Further, it is demonstrated that, in analogy with IS-A relations, part-of relations form hierarchies (DAGs) which constitute an important conceptual aid in understanding complex systems. Finally, we investigate the conditions under which the part-of relation exhibits transitive behavior which can be exploited for automated inferences facilitated by the transitivity property.
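
A part-of hierarchy stored as a DAG, together with a transitive query of the kind the paper discusses, can be sketched as follows; the component names are invented and the sketch assumes a part-of variant for which transitivity holds.

```python
# Small sketch of a part-of hierarchy as a DAG with a transitive query.
# Assumes the modelled part-of variant is transitive; names are invented.

part_of = {                       # child -> set of direct wholes
    "piston": {"engine"},
    "engine": {"car"},
    "wheel":  {"car"},
    "valve":  {"engine"},
}

def is_part_of(part, whole, graph):
    """Transitive part-of query over the DAG (depth-first)."""
    stack, seen = [part], set()
    while stack:
        node = stack.pop()
        if node == whole:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False

print(is_part_of("piston", "car", part_of))    # True via engine
print(is_part_of("wheel", "engine", part_of))  # False
```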

Book ChapterDOI
12 Oct 1993
TL;DR: In this book, the rough sets model is used as a departure point to study formal reasoning with uncertain information, machine learning, knowledge discovery, and representation and reasoning about imprecise knowledge.
Abstract: The primary methodological framework to study classification problems with imprecise or incomplete information in this book is the theory of rough sets. The theory was originally introduced by Pawlak[1]. The uniqueness as well as the complementary character of rough set theory to other approaches for dealing with imprecise, noisy, or incomplete information such as fuzzy set theory[4], or theory of evidence[5] was recognized by mathematicians and researchers working on mathematical foundations of Computer Science. Currently, there are over 800 publications in this area, including two books and an annual workshop. The rough sets model is used as a departure point to study formal reasoning with uncertain information[6–8], machine learning, knowledge discovery[9–13, 20], and representation and reasoning about imprecise knowledge[6]. The theory of rough sets has been applied in numerous domains such as, for example, analysis of clinical data and medical diagnosis[14], information retrieval[15], control algorithm acquisition and process control[16], analysis of complex chemical compounds[17], structural engineering[18], market analysis[12], and others[9].
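
Since the passage above presupposes the basic rough set construction, a worked micro-example of lower and upper approximations with respect to an indiscernibility relation may help; the toy objects and attribute values are invented.

```python
# Worked micro-example of the central rough set construction: lower and upper
# approximations of a target set under an indiscernibility relation.

objects = {
    1: ("high", "yes"), 2: ("high", "yes"),   # indiscernible pair
    3: ("high", "no"),  4: ("low",  "no"),
}
target = {1, 3, 4}        # the concept we try to describe

# Equivalence classes of the indiscernibility relation (same attribute values).
classes = {}
for obj, attrs in objects.items():
    classes.setdefault(attrs, set()).add(obj)

lower = set().union(*[c for c in classes.values() if c <= target])
upper = set().union(*[c for c in classes.values() if c & target])

print("lower approximation:", lower)   # {3, 4}: certainly in the concept
print("upper approximation:", upper)   # {1, 2, 3, 4}: possibly in the concept
print("boundary region:", upper - lower)
```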

Journal ArticleDOI
Ole J. Mengshoel
TL;DR: The KVAT knowledge validation tool, which tests core knowledge bases (CKBs) using cases, is discussed and its strengths are found to be a high level of abstraction, user friendliness, and generality.
Abstract: The KVAT knowledge validation tool, which tests core knowledge bases (CKBs) using cases, is discussed. KVAT relieves experts and knowledge engineers from manually validating the CKBs in a prototype knowledge-base system (KBS). KVAT addresses some of the most central problems met when KBSs are validated: what to validate, what to validate with, what to validate against, when to validate, and how to validate. A help desk prototype developed using KVAT is described to illustrate how the tool works. The authors have found KVAT's strengths to be a high level of abstraction, user friendliness, and generality.
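
Case-based validation of a knowledge base can be sketched roughly as below: run each case's inputs through the rules and compare the derived conclusion with the conclusion recorded by the expert. The toy rule base, the cases, and the first-match inference loop are assumptions; KVAT's actual interfaces and validation strategy are not reproduced here.

```python
# Hedged sketch of validating a knowledge base against expert-supplied cases.
# Rule base, cases, and first-match inference are invented for illustration.

# A toy rule base: list of (condition-dict, conclusion).
rules = [
    ({"printer_on": False},                   "turn printer on"),
    ({"printer_on": True, "cable_ok": False}, "replace cable"),
    ({"printer_on": True, "cable_ok": True},  "reinstall driver"),
]

# Validation cases: inputs plus the conclusion the expert expects.
cases = [
    ({"printer_on": False, "cable_ok": True},  "turn printer on"),
    ({"printer_on": True,  "cable_ok": False}, "replace cable"),
    ({"printer_on": True,  "cable_ok": True},  "replace cable"),   # expected failure
]

def conclude(facts, rules):
    for cond, conclusion in rules:
        if all(facts.get(k) == v for k, v in cond.items()):
            return conclusion
    return None

failures = [(facts, expected, conclude(facts, rules))
            for facts, expected in cases
            if conclude(facts, rules) != expected]

print(f"{len(cases) - len(failures)}/{len(cases)} cases passed")
for facts, expected, got in failures:
    print(f"FAIL: {facts} expected '{expected}' got '{got}'")
```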

Journal ArticleDOI
David A. Bell
TL;DR: The method investigated seeks to exploit information available from conventional database systems, namely, the integrity assertions or data dependency information contained in the database that allows ranking arguments in terms of their strengths.
Abstract: The problem of making decisions among propositions based on both uncertain data items and arguments which are not certain is addressed. The primary knowledge discovery issue addressed is a classification problem: which classification does the available evidence support? The method investigated seeks to exploit information available from conventional database systems, namely, the integrity assertions or data dependency information contained in the database. This information allows ranking arguments in terms of their strengths. As a step in the process of discovering classification knowledge, using a database as a secondary knowledge discovery exercise, latent knowledge pertinent to arguments of relevance to the purpose at hand is explicated. This is called evidence. Information is requested via user prompts from an evidential reasoner. It is fed as evidence to the reasoner. An object-oriented structure for managing evidence is used to model the conclusion space and to reflect the evidence structure. The implementation of the evidence structure and an example of its use are outlined.

Book ChapterDOI
12 Oct 1993
TL;DR: This study shows that attribute-oriented induction combined with the rough set technique provides an efficient and effective mechanism for knowledge discovery in database systems.
Abstract: In this paper we present an attribute-oriented rough set approach for knowledge discovery in databases. The method integrates the machine learning paradigm, especially learning-from-examples techniques, with rough-set techniques. An attribute-oriented concept tree ascension technique is first applied in generalization, which substantially reduces the computational complexity of database learning processes. Then the cause-effect relationship among the attributes in the database is analyzed using rough set techniques and the unimportant or irrelevant attributes are eliminated. Thus concise and strong rules with little or no redundant information can be learned efficiently. Our study shows that attribute-oriented induction combined with the rough set technique provides an efficient and effective mechanism for knowledge discovery in database systems.
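
The attribute-elimination step described above can be illustrated with a crude dispensability check: an attribute may be dropped if removing it never makes two differently decided tuples indiscernible. The sketch below uses invented generalized tuples and is only a stand-in for the rough-set reduct computation.

```python
# Illustrative sketch: after generalization, drop an attribute if removing it
# does not make two differently decided tuples indiscernible (a crude stand-in
# for the rough-set dispensability test).  Data and names are invented.

tuples = [  # generalized tuples: (attributes, decision)
    ({"degree": "science", "region": "west"}, "approve"),
    ({"degree": "science", "region": "east"}, "approve"),
    ({"degree": "arts",    "region": "west"}, "reject"),
    ({"degree": "arts",    "region": "east"}, "reject"),
]

def consistent_without(attr, rows):
    """True if dropping `attr` keeps every remaining tuple's decision unambiguous."""
    seen = {}
    for attrs, decision in rows:
        key = tuple(sorted((k, v) for k, v in attrs.items() if k != attr))
        if seen.setdefault(key, decision) != decision:
            return False
    return True

attributes = {"degree", "region"}
dispensable = {a for a in attributes if consistent_without(a, tuples)}
print("attributes that can be dropped:", dispensable)     # {'region'}
```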

Journal ArticleDOI
TL;DR: This work presents EDM as a means of representing design information in a modular way, and proposes a system architecture for the integration of knowledge and compares it with the PDES/STEP international effort.

Proceedings ArticleDOI
24 Mar 1993
TL;DR: A process for constructing an information workspace with low-cost access to reusable analysis and design knowledge has been developed and validated. It is concluded that a process for consolidating and reusing design knowledge is powerful by itself and that, in the authors' engineering domains, the reuse of analyses and designs is more useful than the reuse of software.
Abstract: A process for constructing an information workspace with low-cost access to reusable analysis and design knowledge has been developed and validated. The intent is to support the design and evolution of product families. The process involves three related efforts: techniques to consolidate critical analysis and design information (domain analysis); organization of the information in structured form (technology books); and methods and tools to reuse information. It is concluded that a process for consolidating and reusing design knowledge is powerful by itself, and in the authors' engineering domains, the reuse of analyses and designs is more useful than the reuse of software.

Book ChapterDOI
26 Apr 1993
TL;DR: The postulates upon which MRD research has been based over the past fifteen years are considered, the validity of these postulates are discussed, and the results of this work are evaluated.
Abstract: Machine-readable versions of everyday dictionaries have been seen as a likely source of information for use in natural language processing because they contain an enormous amount of lexical and semantic knowledge. However, after 15 years of research, the results appear to be disappointing. No comprehensive evaluation of machine-readable dictionaries (MRDs) as a knowledge source has been made to date, although this is necessary to determine what, if anything, can be gained from MRD research. To this end, this paper will first consider the postulates upon which MRD research has been based over the past fifteen years, discuss the validity of these postulates, and evaluate the results of this work. We will then propose possible future directions and applications that may exploit these years of effort, in the light of current directions in not only NLP research, but also fields such as lexicography and electronic publishing.

Journal ArticleDOI
TL;DR: The findings show that structured analysis techniques are useful in planning for knowledge acquisition, that the participation of end users is important, and that a designated primary expert is helpful when multiple experts are involved.

Journal ArticleDOI
TL;DR: The Nielsen Opportunity Explorer™ product can be used by sales and trade marketing personnel within consumer packaged goods manufacturers to understand how their products are performing in the market place and find opportunities to sell more product, more profitably, to the retailers.

Abstract: The Nielsen Opportunity Explorer™ product can be used by sales and trade marketing personnel within consumer packaged goods manufacturers to understand how their products are performing in the market place and find opportunities to sell more product, more profitably, to the retailers. Opportunity Explorer uses data collected at the point-of-sale terminals and by auditors of A. C. Nielsen. Opportunity Explorer uses a knowledge-base of market research expertise to analyze large databases and generate interactive reports using knowledge discovery templates, converting a large space of data into concise, inter-linked information frames. Each information frame addresses specific business issues, and leads the user to seek related information by means of dynamically created hyperlinks.

Proceedings Article
01 Jun 1993
TL;DR: An approach is advocated where reasoning from situation-specific knowledge, captured as a collection of previously solved cases, is combined with generalised domain knowledge in the form of a densely connected semantic network, which enables continuous, sustained learning by updating the case base after each new problem has been solved.

Abstract: Among the most important challenges for contemporary AI research are the development of methods for improved robustness, adaptability, and overall interactiveness of systems. Interactiveness, the ability to perform and react in tight co-operation with a user and/or other parts of the environment, can be viewed as subsuming the other two. There are various approaches to addressing these problems, spanning from minor improvements of existing methods and theories, through new and different methodologies, up to completely different paradigms. As an example of the latter, the very foundation of knowledge-based systems, based on a designer's explicit representation of real world knowledge in computer software structures, has recently been questioned by prominent members of the KBS community. In the present paper, some foundational issues of the knowledge-based paradigm are reviewed, and the main arguments of the critiquing position are discussed. Some of the deficiencies of current approaches pointed out by the critics are acknowledged, but it is argued that the deficiencies cannot be solved by escaping from the knowledge-based paradigm. However, an alternative to the mainstream, generalisation-based approach is needed. An approach is advocated where reasoning from situation-specific knowledge, captured as a collection of previously solved cases, is combined with generalised domain knowledge in the form of a densely connected semantic network. The approach enables continuous, sustained learning by updating the case base after each new problem has been solved. The paper shows how an example of this approach, the CREEK system, can provide an answer within the knowledge-based paradigm to the problems pointed out by the critics.

01 Jan 1993
TL;DR: The mechanisms for knowledge discovery in object-oriented and active database systems are overviewed, with an emphasis on the techniques for generalization of complex data objects, methods, class hierarchies and dynamically evolving data, and on the integration of knowledge discovery mechanisms with production control processes.
Abstract: Knowledge discovery in databases (or data mining), which extracts interesting knowledge from large databases, represents an important direction in the development of data- and knowledge-base systems. With fruitful research results on knowledge discovery in relational databases and the emerging trend in the development of object-oriented and active database systems, it is natural to investigate knowledge discovery in object-oriented and active databases. This paper overviews the mechanisms for knowledge discovery in object-oriented and active database systems, with an emphasis on the techniques for generalization of complex data objects, methods, class hierarchies and dynamically evolving data, and on the integration of knowledge discovery mechanisms with production control processes.