Showing papers on "Knowledge extraction published in 1999"

PDF

Open Access

Book•

Knowledge Engineering and Management: The CommonKADS Methodology

[...]

17 Dec 1999

TL;DR: The CommonKADS methodology, developed over the last decade by an industry-university consortium led by the authors, is used and makes as much use as possible of the new UML notation standard.

...read moreread less

Abstract: The book covers in an integrated fashion the complete route from corporate knowledge management, through knowledge analysis andengineering, to the design and implementation of knowledge-intensiveinformation systems. The disciplines of knowledge engineering and knowledge management are closely tied. Knowledge engineering deals with the development of information systems in which knowledge and reasoning play pivotal roles. Knowledge management, a newly developed field at the intersection of computer science and management, deals with knowledge as a key resource in modern organizations. Managing knowledge within an organization is inconceivable without the use of advanced information systems; the design and implementation of such systems pose great organization as well as technical challenges. The book covers in an integrated fashion the complete route from corporate knowledge management, through knowledge analysis and engineering, to the design and implementation of knowledge-intensive information systems. The CommonKADS methodology, developed over the last decade by an industry-university consortium led by the authors, is used throughout the book. CommonKADS makes as much use as possible of the new UML notation standard. Beyond information systems applications, all software engineering and computer systems projects in which knowledge plays an important role stand to benefit from the CommonKADS methodology.

...read moreread less

1,720 citations

Journal Article•DOI•

Efficient mining of association rules using closed itemset lattices

[...]

Nicolas Pasquier¹, Yves Bastide¹, Rafik Taouil¹, Lotfi Lakhal¹•Institutions (1)

Blaise Pascal University¹

01 Mar 1999-Information Systems

TL;DR: Experiments showed that Close is very efficient for mining dense and/or correlated data such as census style data, and performs reasonably well for market basket style data.

...read moreread less

845 citations

Proceedings Article•

Constructing Biological Knowledge Bases by Extracting Information from Text Sources

[...]

Mark Craven¹, Johan Kumlien•Institutions (1)

Carnegie Mellon University¹

06 Aug 1999

TL;DR: A research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases, is begun, to use machine-learning methods to induce routines for extracting facts from text.

...read moreread less

Abstract: Recently, there has been much effort in making databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underutilized source of biological information. We have begun a research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases. Our approach to this task is to use machine-learning methods to induce routines for extracting facts from text. We describe two learning methods that we have applied to this task--a statistical text classification method, and a relational learning method--and our initial experiments in learning such information-extraction routines. We also present an approach to decreasing the cost of learning information-extraction routines by learning from "weakly" labeled training data.

...read moreread less

660 citations

Journal Article•DOI•

Data is more than knowledge: implications of the reversed knowledge hierarchy for knowledge management and organizational memory

[...]

Ilkka Tuomi¹•Institutions (1)

Nokia¹

05 Jan 1999

TL;DR: The reversed hierarchy of knowledge is shown to lead to a different approach in developing information systems that support knowledge management and organizational memory, and this difference may have major implications for organizational flexibility and renewal.

...read moreread less

Abstract: In knowledge management literature it is often pointed out that it is important to distinguish between data, information and knowledge. The generally accepted view sees data as simple facts that become information as data is combined into meaningful structures, which subsequently become knowledge as meaningful information is put into a context and when it can be used to make predictions. This view sees data as a prerequisite for information, and information as a prerequisite for knowledge. I explore the conceptual hierarchy of data, information and knowledge, showing that data emerges only after we have information, and that information emerges only after we already have knowledge. The reversed hierarchy of knowledge is shown to lead to a different approach in developing information systems that support knowledge management and organizational memory. It is also argued that this difference may have major implications for organizational flexibility and renewal.

...read moreread less

583 citations

Book•

Knowledge Management Handbook

[...]

Jay Liebowitz

01 Feb 1999

TL;DR: The Knowledge Management Handbook provides an essential reference, integrating perspectives from researchers and practitioners on knowledge management, and outlines a sound foundation of the methodologies, techniques, and practices in the field.

...read moreread less

Abstract: From the Publisher: The Knowledge Management Handbook provides an essential reference, integrating perspectives from researchers and practitioners on knowledge management. With many prominent individuals and organizations contributing to the work, this book outlines a sound foundation of the methodologies, techniques, and practices in the field. Advanced topics include knowledge discovery, data warehousing, data mining, web-based technology, and intelligent agents.

...read moreread less

579 citations

Journal Article•

Knowledge Discovery through Co-Word Analysis.

[...]

Qin He

22 Jun 1999-Library Trends

TL;DR: Co-word analysis is a content analysis technique that uses patterns of co-occurrence of pairs of items in a corpus of texts to identify the relationships between ideas within the subject areas presented in these texts.

...read moreread less

Abstract: IN THE LAST HALF CENTURY, AS THE SCIENCE I,ITERATURE has increased dramatically, scientists found it increasingly difficult to locate needed data, and it is increasingly difficult for policymakers to understand the complex interrelationship of science in order to achieve effective research planning. Some quantitative techniques have been developed to ameliorate these problems; co-word analysis is one of these techniques. Based on the co-occurrence frequency of pairs of words or phrases, co-word analysis is used to discover linkages among subjects in a research field and thus to trace the development of science. Within the last two decades, this technique, implemented by several research groups, has proved to be a powerful tool for knowledge discovery in databases. This article reviews the development of co-word analysis, summarizes the advantages and disadvantages of this method, and discusses several research issues. INTRODUCTION Since World War 11, the scope and volume of scientific research have increased dramatically. This is well reflected in the growth of the literature. In the 1960s, the amount of scientific literature was estimated to be doubling approximately every ten years (Price, 1963). Three decades later, in the 199Os, along with developments in information technolo

...read moreread less

520 citations

Journal Article•DOI•

A survey of data mining and knowledge discovery software tools

[...]

Michael Goebel¹, Le Gruenwald²•Institutions (2)

University of Auckland¹, University of Oklahoma²

01 Jun 1999-Sigkdd Explorations

TL;DR: An overview of common knowledge discovery tasks and approaches to solve these tasks is provided, and a feature classification scheme that can be used to study knowledge and data mining software is proposed.

...read moreread less

Abstract: Knowledge discovery in databases is a rapidly growing field, whose development is driven by strong research interests as well as urgent practical, social, and economical needs. While the last few years knowledge discovery tools have been used mainly in research environments, sophisticated software products are now rapidly emerging. In this paper, we provide an overview of common knowledge discovery tasks and approaches to solve these tasks. We propose a feature classification scheme that can be used to study knowledge and data mining software. This scheme is based on the software's general characteristics, database connectivity, and data mining characteristics. We then apply our feature classification scheme to investigate 43 software products, which are either research prototypes or commercially available. Finally, we specify features that we consider important for knowledge discovery software to possess in order to accommodate its users effectively, as well as issues that are either not addressed or insufficiently solved yet.

...read moreread less

427 citations

Book Chapter•DOI•

Rule Evaluation Measures: A Unifying View

[...]

Nada Lavrač, Peter A. Flach¹, Blaz Zupan²•Institutions (2)

University of Bristol¹, University of Ljubljana²

24 Jun 1999

TL;DR: This paper develops a unifying view on some of the existing measures for predictive and descriptive induction by means of contingency tables, and demonstrates that many rule evaluation measures developed for predictive knowledge discovery can be adapted to descriptive knowledge discovery tasks.

...read moreread less

Abstract: Numerous measures are used for performance evaluation in machine learning. In predictive knowledge discovery, the most frequently used measure is classification accuracy. With new tasks being addressed in knowledge discovery, new measures appear. In descriptive knowledge discovery, where induced rules are not primarily intended for classification, new measures used are novelty in clausal and subgroup discovery, and support and confidence in association rule learning. Additional measures are needed as many descriptive knowledge discovery tasks involve the induction of a large set of redundant rules and the problem is the ranking and filtering of the induced rule set. In this paper we develop a unifying view on some of the existing measures for predictive and descriptive induction. We provide a common terminology and notation by means of contingency tables. We demonstrate how to trade off these measures, by using what we call weighted relative accuracy. The paper furthermore demonstrates that many rule evaluation measures developed for predictive knowledge discovery can be adapted to descriptive knowledge discovery tasks.

...read moreread less

422 citations

Proceedings Article•

Overview of Knowledge Sharing and Reuse Components: Ontologies and Problem-Solving Methods

[...]

A. Gomez Perez¹, V.R. Benjamins²•Institutions (2)

Technical University of Madrid¹, University of Amsterdam²

01 Aug 1999

TL;DR: An overview of approaches for ontologies and problem-solving methods is given, which can be viewed as complementary entities that can be used to configure new knowledge systems from existing, reusable components.

...read moreread less

Abstract: Ontologies and problem-solving methods are promising candidates for reuse in Knowledge Engineering. Ontologies define domain knowledge at a generic level, while problem-solving methods specify generic reasoning knowledge. Both type of components can be viewed as complementary entities that can be used to configure new knowledge systems from existing, reusable components. In this paper, we give an overview of approaches for ontologies and problem-solving methods.

...read moreread less

418 citations

Journal Article•DOI•

Discovery of frequent DATALOG patterns

[...]

Luc Dehaspe¹, Hannu Toivonen²•Institutions (2)

Katholieke Universiteit Leuven¹, University of Helsinki²

01 Mar 1999-Data Mining and Knowledge Discovery

TL;DR: WARMR is presented, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem.

...read moreread less

Abstract: Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present WARMR, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem. The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and in particular novel settings not supported by special purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special purpose algorithms. Second, the unified representation gives insight to the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation a number of dimensions appear that relink diverged settings. We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.

...read moreread less

330 citations

Knowledge modeling at the millennium : The design and evolution of Protégé-2000

[...]

William Grosso, Henrik Eriksson, Ray W. Fergerson, John H. Gennari, Samson W. Tu, Mark A. Musen - Show less +2 more

01 Jan 1999

TL;DR: An overview of the evolution of Protégé is given, examining the methodological assumptions underlying the original ProtÉgé system and discussing the ways in which the methodology has changed over time.

...read moreread less

Abstract: It has been 13 years since the first version of Protégé was run. The original tool was a small application, aimed mainly at building knowledge-acquisition tools for a few very specialized programs (it grew out of the ONCOCIN project and the subsequent attempts to build expert systems for protocol-based therapy planning). The most recent version, Protégé-2000, incorporates the Open Knowledge Base Connectivity (OKBC) knowledge model, is written to run across a wide variety of platforms, supports customized user-interface extensions, and has been used by over 300 individuals and research groups, most of whom are only peripherally interested in medical informatics. Researchers not directly involved in the project might well wonder how Protégé evolved, what are the reasons for the repeated reimplementations, and how to tell the various versions apart. In this paper, we give an overview of the evolution of Protégé, examining the methodological assumptions underlying the original Protégé system and discussing the ways in which the methodology has changed over time. We conclude with an overview of the latest version of Protégé, Protégé-2000. 1. MOTIVATION AND A TIMELINE The Protégé applications (hereafter ‘Protégé’) are a set of tools that have been evolving for over a decade, from a simple program which helped construct specialized knowledge-bases to a set of general purpose knowledge-base creation and maintenance tools. While Protégé began as a small application designed for a medical domain (protocol-based therapy planning), it has grown and evolved to become a much more general-purpose set of tools for building knowledge-based systems. The original goal of Protégé was to reduce the knowledge-acquisition bottleneck (Hayes-Roth et al, 1983) by minimizing the role of the knowledge-engineer in constructing knowledge-bases. In order to do this, Musen (1988, 1989b) posited that knowledge-acquisition proceeds in welldefined stages and that knowledge acquired in one stage could be used to generate and customize knowledge-acquisition tools for subsequent stages. In (Musen, 1988), Protégé was defined as an application that takes advantage of this structured information to simplify the knowledgeacquisition process. The original Protégé was described this way (Musen, 1988): Protégé is neither an expert system itself nor a program that builds expert systems directly. Instead, Protégé is a tool that helps users build other tools that are custom-tailored to assist with knowledgeacquisition for expert systems in specific application areas. The original Protégé demonstrated the viability of this approach, and of the use of task-specific knowledge to generate and customize knowledge-acquisition tools. But as with many first-

...read moreread less

Journal Article•DOI•

Knowledge discovery

[...]

Toshinori Munakata¹•Institutions (1)

Cleveland State University¹

01 Nov 1999-Communications of The ACM

Proceedings Article•

Finding frequent substructures in chemical compounds

[...]

Luc Dehaspe, Hannu Toivonen, Ross D. King

01 Jan 1999

TL;DR: This paper applies data mining to the problem of predicting chemical carcinogenicity, and presents a knowledge discovery method for structured data, where patterns reflect the one- to-many and many-to-many relationships of several tables.

...read moreread less

Abstract: The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach to the problem is descriptive rather than based on classification; the goal being to find common substructures and properties in chemical compounds, and in this way to contribute to scientific insight. This approach contrasts with previous machine learning research on this problem, which has mainly concentrated on predicting the toxicity of unknown chemicals. Our contribution to the field of data mining is the ability to discover useful frequent patterns that are beyond the complexity of association rules or their known variants. This is vital to the problem, which requires the discovery of patterns that are out of the reach of simple transformations to frequent itemsets. We present a knowledge discovery method for structured data, where patterns reflect the one-to-many and many-to-many relationships of several tables. Background knowledge, represented in a uniform manner in some of the tables, has an essential role here, unlike in most data mining settings for the discovery of frequent patterns.

...read moreread less

Book•

Neuro-Fuzzy Pattern Recognition: Methods in Soft Computing

[...]

Sankar K. Pal, Sushmita Mitra

01 Sep 1999

TL;DR: The authors consolidate a wealth of information previously scattered in disparate articles, journals, and edited volumes, explaining both the theory of neuro-fuzzy computing and the latest methodologies for performing different pattern recognition tasks in the neuro- fuzzy network.

...read moreread less

Abstract: From the Publisher: The authors consolidate a wealth of information previously scattered in disparate articles, journals, and edited volumes, explaining both the theory of neuro-fuzzy computing and the latest methodologies for performing different pattern recognition tasks in the neuro-fuzzy network - classification, feature evaluation, rule generation, knowledge extraction, and hybridization. Special emphasis is given to the integration of neuro-fuzzy methods with rough sets and genetic algorithms (GAs) to ensure more efficient recognition systems.

...read moreread less

Journal Article•DOI•

Mathematical Programming for Data Mining: Formulations and Challenges

[...]

Paul S. Bradley¹, Usama M. Fayyad¹, Olvi L. Mangasarian²•Institutions (2)

Microsoft¹, University of Wisconsin-Madison²

01 Mar 1999-Informs Journal on Computing

TL;DR: The aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities, and include formulations of the basic categories of data mining methods as optimization problems.

...read moreread less

Abstract: This article is intended to serve as an overview of a rapidly emerging research and applications area. In addition to providing a general overview, motivating the importance of data mining problems within the area of knowledge discovery in databases, our aim is to list some of the pressing research challenges, and outline opportunities for contributions by the optimization research communities. Towards these goals, we include formulations of the basic categories of data mining methods as optimization problems. We also provide examples of successful mathematical programming approaches to some data mining problems.

...read moreread less

Journal Article•

The Role of Classification in Knowledge Representation and Discovery.

[...]

Barbara H. Kwasnik

01 Jun 1999-Library Trends

TL;DR: The strengths and limitations of four classificatory approaches are described in terms of their ability to reflect, discover, and create new knowledge.

...read moreread less

Abstract: THELINK BETWEEN CLASSIFICATION AND KNOWLEDGE is explored. Classification schemes have properties that enable the representation of entities and relationships in structures that reflect knowledge of the domain being classified. The strengths and limitations of four classificatory approaches are described in terms of their ability to reflect, discover, and create new knowledge. These approaches are hierarchies, trees, paradigms, and faceted analysis. Examples are provided of the way in which knowledge and the classification process affect each other.

...read moreread less

Journal Article•DOI•

Constructing knowledge from multivariate spatiotemporal data: integrating geographical visualization with knowledge discovery in database methods

[...]

Alan M. MacEachren, Monica Wachowicz¹, Robert M. Edsall¹, Daniel Haug¹, Raymon Masters¹ - Show less +1 more•Institutions (1)

Pennsylvania State University¹

01 Jun 1999-International Journal of Geographical Information Science

TL;DR: An approach to design of an integrated GVis-KDD environment directed to exploration and discovery in the context of spatiotemporal environmental data, which emphasizes a matching of GVis and KDD meta-operations.

...read moreread less

Abstract: We present an approach to the process of constructing knowledge through structured exploration of large spatiotemporal data sets. First, we introduce our problem context and define both Geographic Visualization (GVis) and Knowledge Discovery in Databases (KDD), the source domains for methods being integrated. Next, we review and compare recent GVis and KDD developments and consider the potential for their integration, emphasizing that an iterative process with user interaction is a central focus for uncovering interest and meaningful patterns through each. We then introduce an approach to design of an integrated GVis-KDD environment directed to exploration and discovery in the context of spatiotemporal environmental data. The approach emphasizes a matching of GVis and KDD meta-operations. Following description of the GVis and KDD methods that are linked in our prototype system, we present a demonstration of the prototype applied to a typical spatiotemporal dataset. We conclude by outlining, briefly, resea...

...read moreread less

Patent•

Method and apparatus for knowledge discovery in databases

[...]

John Duncan Bankier, Charles Allan Beck, Andrew Craig Brind, David John Brown, Kristy Irene Brown, John Dominic Burns, Peter Docherty, John Michael Gilchrist, Timothy Simon Jones, Gordon McIntyre, Alan Ryman, William Wallace - Show less +8 more

26 Aug 1999

TL;DR: In this paper, the authors present a computer-based method and apparatus for knowledge discovery from databases, which involves the user creation of a project plan comprising a plurality of operational components adapted to cooperatively extract desired information from a database.

...read moreread less

Abstract: A computer-based method and apparatus for knowledge discovery from databases. The disclosed method involves the user creation of a project plan comprising a plurality of operational components adapted to cooperatively extract desired information from a database. In one embodiment, the project plan is created within a graphical user interface and consists of objects representing the various functional components of the overall plan interconnected by links representing the flow of data from the data source to a data sink. Data visualization components may be inserted essentially anywhere in the project plan. One or more data links in the project plan may be designated as caching links which maintain copies of the data flowing across them, such that the cached data is available to other components in the project plan. In one embodiment, compression technology is applied to reduce the overall size of the database.

...read moreread less

Journal Article•DOI•

Finding interesting patterns using user expectations

[...]

Bing Liu¹, Wynne Hsu¹, Lai-Fun Mun¹, Hing-Yan Lee•Institutions (1)

National University of Singapore¹

01 Nov 1999-IEEE Transactions on Knowledge and Data Engineering

TL;DR: In this paper, a technique to prevent the user from being overwhelmed by the large number of patterns, techniques are needed to rank them according to their interestingness, called the user-expectation method.

...read moreread less

Abstract: One of the major problems in the field of knowledge discovery (or data mining) is the interestingness problem. Past research and applications have found that, in practice, it is all too easy to discover a huge number of patterns in a database. Most of these patterns are actually useless or uninteresting to the user. But due to the huge number of patterns, it is difficult for the user to comprehend them and to identify those interesting to him/her. To prevent the user from being overwhelmed by the large number of patterns, techniques are needed to rank them according to their interestingness. In this paper, we propose such a technique, called the user-expectation method. In this technique, the user is first asked to provide his/her expected patterns according to his/her past knowledge or intuitive feelings. Given these expectations, the system uses a fuzzy matching technique to match the discovered patterns against the user's expectations, and then rank the discovered patterns according to the matching results. A variety of rankings can be performed for different purposes, such as to confirm the user's knowledge and to identify unexpected patterns, which are by definition interesting. The proposed technique is general and interactive.

...read moreread less

Proceedings Article•DOI•

Discovering association rules based on image content

[...]

Carlos Ordonez¹, Edward Omiecinski•Institutions (1)

Georgia Institute of Technology¹

19 Mar 1999

TL;DR: This paper presents a data mining algorithm to find association rules in 2-dimensional color images to explore the feasibility of this approach and shows that there is promise in image mining based on content.

...read moreread less

Abstract: Our focus for data mining in the paper is concerned with knowledge discovery in image databases. We present a data mining algorithm to find association rules in 2-dimensional color images. The algorithm has four major steps: feature extraction, object identification, auxiliary image creation and object mining. Our emphasis is on data mining of image content without the use of auxiliary domain knowledge. The purpose of our experiments is to explore the feasibility of this approach. A synthetic image set containing geometric shapes was generated to test our initial algorithm implementation. Our experimental results show that there is promise in image mining based on content. We compare these results against the rules obtained from manually identifying the shapes. We analyze the reasons for discrepancies. We also suggest directions for future work.

...read moreread less

Journal Article•DOI•

Unexpectedness as a measure of interestingness in knowledge discovery

[...]

Balaji Padmanabhan¹, Alexander Tuzhilin²•Institutions (2)

University of Pennsylvania¹, New York University²

01 Dec 1999

TL;DR: In this article, the authors focus on generating unexpected patterns with respect to managerial intuition by eliciting managers' beliefs about the domain and using these beliefs to seed the search for unexpected patterns in data, which should lead to the development of decision support systems that provide managers with more relevant patterns from data and aid in effective decision making.

...read moreread less

Abstract: Organizations are taking advantage of “data-mining” techniques to leverage the vast amounts of data captured as they process routine transactions. Data mining is the process of discovering hidden structure or patterns in data. However, several of the pattern discovery methods in data-mining systems have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not leverage to a full extent valuable prior domain knowledge that managers have. This research addresses these drawbacks by developing ways to generate interesting patterns by incorporating managers' prior knowledge in the process of searching for patterns in data. Specifically, we focus on providing methods that generate unexpected patterns with respect to managerial intuition by eliciting managers' beliefs about the domain and using these beliefs to seed the search for unexpected patterns in data. Our approach should lead to the development of decision-support systems that provide managers with more relevant patterns from data and aid in effective decision making.

...read moreread less

Book•

Reusable Components for Knowledge Modelling: Case Studies in Parametric Design Problem Solving

[...]

Enrico Motta

01 Dec 1999

TL;DR: An in-depth analysis of knowledge modelling technology is provided, illustrating the main tenets of this paradigm, surveying the state-of-the-art and then presenting in detail the application of this technology to parametric design problems.

...read moreread less

Abstract: From the Publisher: Knowledge modelling technologies, such as ontologies and problem solving methods, are gaining popularity, as they enable the development of cost-effective and robust knowledge systems through a reuse-centred process model. In addition, there is also much interest in modelling technology as a means to support the acquisition, organization, formalization and distribution of the knowledge assets of a company. This book provides an in-depth analysis of knowledge modelling technology, illustrating the main tenets of this paradigm, surveying the state-of-the-art and then presenting in detail the application of this technology to parametric design problems. This book will be of interest to readers who wish to learn about the state of the art in knowledge technologies. More specifically, the book offers a theoretical perspective and practical solutions to students, researchers and practitioners working in the areas of knowledge-based systems, knowledge management and design.

...read moreread less

Journal Article•DOI•

Mining very large databases

[...]

Venkatesh Ganti¹, Johannes Gehrke¹, Raghu Ramakrishnan¹•Institutions (1)

University of Wisconsin-Madison¹

01 Aug 1999-IEEE Computer

TL;DR: A broad range of algorithms are described that address three classical data mining problems: market basket analysis, clustering, and classification that are scalable to very large data sets.

...read moreread less

Abstract: Established companies have had decades to accumulate masses of data about their customers, suppliers, products and services, and employees. Data mining, also known as knowledge discovery in databases, gives organizations the tools to sift through these vast data stores to find the trends, patterns, and correlations that can guide strategic decision making. Traditionally, algorithms for data analysis assume that the input data contains relatively few records. Current databases however, are much too large to be held in main memory. To be efficient, the data mining techniques applied to very large databases must be highly scalable. An algorithm is said to be scalable if (given a fixed amount of main memory), its runtime increases linearly with the number of records in the input database. Recent work has focused on scaling data mining algorithms to very large data sets. The authors describe a broad range of algorithms that address three classical data mining problems: market basket analysis, clustering, and classification.

...read moreread less

Journal Article•DOI•

Statistics and data mining: intersecting disciplines

[...]

David J. Hand¹•Institutions (1)

Imperial College London¹

01 Jun 1999-Sigkdd Explorations

TL;DR: The nature of the two disciplines, statistics and data mining, is examined, with emphasis on their similarities and differences.

...read moreread less

Abstract: Statistics and data mining have much in common, but they also have differences. The nature of the two disciplines is examined, with emphasis on their similarities and differences.

...read moreread less

Journal Article•DOI•

Towards a knowledge technology for knowledge management

[...]

Nick Milton¹, Nigel Shadbolt¹, Hugh Cottam¹, Mark Hammersley¹•Institutions (1)

University of Nottingham¹

01 Sep 1999-International Journal of Human-computer Studies \/ International Journal of Man-machine Studies

TL;DR: It is believed that the evidence presented here shows that knowledge engineering has much to offer KM and can be the basis on which to move towards a Knowledge Technology.

...read moreread less

Abstract: Knowledge Management (KM) is crucial to organizational survival, yet is a difficult task requiring large expenditure of resources. Information Technology solutions, such as email, document management and intranets, are proving very useful in certain areas. However, many important problems still exist, providing opportunities for new techniques and tools more oriented towards knowledge. We refer to this as Knowledge Technology. A framework has been developed which has allowed opportunities for Knowledge Technology to be identified in support of five key KM activities: personalization, creation/innovation, codification, discovery and capture/monitor. In developing Knowledge Technology for these areas, methods from knowledge engineering are being explored. Our main work in this area has involved the application and evaluation of existing knowledge for a large intranet system. This, and other case studies, have provided important lessons and insights which have led to ongoing research in ontologies, generic models and process modelling methods. We believe that the evidence presented here shows that knowledge engineering has much to offer KM and can be the basis on which to move towards a Knowledge Technology.

...read moreread less

Book Chapter•DOI•

Data Mining and Knowledge Discovery with Emergent Self-Organizing Feature Maps for Multivariate Time Series

[...]

Alfred Ultsch¹•Institutions (1)

University of Marburg¹

01 Jan 1999

TL;DR: All these steps were combined into a system for neuronal data mining which has been applied successfully for knowledge discovery in multivariate time series.

...read moreread less

Abstract: Publisher Summary Data mining aims to discover hitherto unknown knowledge in large datasets. The most important step thereby is the transition from sub-symbolic to symbolic knowledge. Self-organizing feature maps (SOFM), when used appropriately, can exhibit emergent phenomena. SOFM with only a few neurons limit this ability, therefore emergent feature maps need to have thousands of neurons. The structures of emergent feature maps can be visualized using u-matrix methods. U-matrices lead to the construction of self-organizing classifiers possessing the ability to classify new data points. This sub-symbolic knowledge can be converted to a symbolic form which is understandable for humans. All these steps were combined into a system for neuronal data mining. This system has been applied successfully for knowledge discovery in multivariate time series.

...read moreread less

Book•DOI•

Methodologies for Knowledge Discovery and Data Mining

[...]

Ning Zhong, Lizhu Zhou

01 Jan 1999

TL;DR: This paper provides a survey of various data mining techniques for advanced database applications, including association rule generation, clustering and classification, on high dimensional data spaces with large volumes of data.

...read moreread less

Abstract: This paper provides a survey of various data mining techniques for advanced database applications. These include association rule generation, clustering and classification. With the recent increase in large online repositories of information, such techniques have great importance. The focus is on high dimensional data spaces with large volumes of data. The paper discusses past research on the topic and also studies the corresponding algorithms and applications.

...read moreread less

Proceedings Article•

Enterprise expert and knowledge discovery

[...]

David Mattox, Mark T. Maybury, Daryl Morey¹•Institutions (1)

Mitre Corporation¹

22 Aug 1999

TL;DR: Using information retrieval, information extraction, and collaborative filtering techniques, these systems are able to enhance corporate knowledge management by overcoming traditional problems of knowledge acquisition and maintenance and associated (human and financial) costs.

...read moreread less

Abstract: In this paper we describe two systems designed to connect users to distributed, continuously changing experts and their knowledge. Using information retrieval, information extraction, and collaborative filtering techniques, these systems are able to enhance corporate knowledge management by overcoming traditional problems of knowledge acquisition and maintenance and associated (human and financial) costs. We describe the purpose of these two systems, how they work, and current deployment in a global corporate environment to enable end users to directly discover experts and their knowledge.

...read moreread less

Journal Article•DOI•

Meta analysis of classification algorithms for pattern recognition

[...]

So Young Sohn¹•Institutions (1)

Yonsei University¹

01 Nov 1999-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A statistical meta-model is developed which compares the classification performances of several algorithms in terms of data characteristics and is expected to aid decision making processes of finding the best classification tool in the sense of providing the minimum classification error among alternatives.

...read moreread less

Abstract: Various classification algorithms became available due to a surge of interdisciplinary research interests in the areas of data mining and knowledge discovery. We develop a statistical meta-model which compares the classification performances of several algorithms in terms of data characteristics. This empirical model is expected to aid decision making processes of finding the best classification tool in the sense of providing the minimum classification error among alternatives.

...read moreread less

Book Chapter•DOI•

Data Swapping: Balancing Privacy against Precision in Mining for Logic Rules

[...]

Vladimir Estivill-Castro¹, Ljiljana Brankovic¹•Institutions (1)

University of Newcastle¹

01 Sep 1999

TL;DR: It is argued that in data mining the major requirement of security control mechanism is not to ensure precise and bias-free statistics, but rather to preserve the high-level descriptions of knowledge constructed by artificial data mining tools.

...read moreread less

Abstract: The recent proliferation of data mining tools for the analysis of large volumes of data has paid little attention to individual privacy issues. Here, we introduce methods aimed at finding a balance between the individuals' right to privacy and the data-miners' need to find general patterns in huge volumes of detailed records. In particular, we focus on the data-mining task of classification with decision trees. We base our security-control mechanism on noise-addition techniques used in statistical databases because (1) the multidimensional matrix model of statistical databases and the multidimensional cubes of On-Line Analytical Processing (OLAP) are essentially the same, and (2) noise-addition techniques are very robust. The main drawback of noise addition techniques in the context of statistical databases is low statistical quality of released statistics. We argue that in data mining the major requirement of security control mechanism (in addition to protect privacy) is not to ensure precise and bias-free statistics, but rather to preserve the high-level descriptions of knowledge constructed by artificial data mining tools.

...read moreread less

Collapse