
Showing papers in "Knowledge and Information Systems in 2002"


Journal ArticleDOI
TL;DR: A novel deviation (or outlier) detection approach, termed FindOut, based on wavelet transform is introduced, which can successfully identify outliers from large datasets.
Abstract: Finding the rare instances or the outliers is important in many KDD (knowledge discovery and data-mining) applications, such as detecting credit card fraud or finding irregularities in gene expressions. Signal-processing techniques have been introduced to transform images for enhancement, filtering, restoration, analysis, and reconstruction. In this paper, we present a new method in which we apply signal-processing techniques to solve important problems in data mining. In particular, we introduce a novel deviation (or outlier) detection approach, termed FindOut, based on wavelet transform. The main idea in FindOut is to remove the clusters from the original data and then identify the outliers. Although previous research showed that such techniques may not be effective because of the nature of the clustering, FindOut can successfully identify outliers from large datasets. Experimental results on very large datasets are presented which show the efficiency and effectiveness of the proposed approach.
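
The core idea (remove the dense cluster regions, then report whatever is left as outliers) can be illustrated in a few lines. Below is a minimal sketch, assuming 2-D data, a grid-based density estimate, and a single Haar-style low-pass step as a stand-in for the wavelet transform; it is not the authors' FindOut implementation, and the grid size and threshold are illustrative parameters.

```python
# A minimal sketch of the cluster-removal idea behind wavelet-based outlier
# detection (illustrative only; not the authors' FindOut implementation).
import numpy as np

def wavelet_outliers(points, grid_size=64, density_threshold=2.0):
    """Flag points that fall outside dense (cluster) regions of a 2-D dataset."""
    points = np.asarray(points, dtype=float)
    mins, maxs = points.min(axis=0), points.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    # Quantize points onto a grid and build a density histogram.
    cells = np.minimum(((points - mins) / span * grid_size).astype(int), grid_size - 1)
    grid = np.zeros((grid_size, grid_size))
    for gx, gy in cells:
        grid[gx, gy] += 1
    # Haar low-pass step: average 2x2 blocks to smooth the density surface
    # (a stand-in for the wavelet transform used in the paper).
    low = grid.reshape(grid_size // 2, 2, grid_size // 2, 2).mean(axis=(1, 3))
    smooth = np.kron(low, np.ones((2, 2)))          # upsample back to grid size
    dense = smooth >= density_threshold              # cells that belong to clusters
    # A point is an outlier if its cell is not part of any dense (cluster) region.
    return np.array([not dense[gx, gy] for gx, gy in cells])

# Example: two Gaussian clusters plus a few scattered outliers.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (500, 2)),
                  rng.normal(5, 0.5, (500, 2)),
                  rng.uniform(-3, 8, (10, 2))])
print(wavelet_outliers(data).sum(), "points flagged as outliers")
```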

213 citations


Journal ArticleDOI
TL;DR: A knowledge management framework that integrates multiple information technologies to collect, analyze, and manage information and knowledge for supporting decision making in HA/DR and can be applied to other similar real-time decision-making environments, such as crisis management and emergency medical assistance.
Abstract: The major challenge in current humanitarian assistance/disaster relief (HA/DR) efforts is that diverse information and knowledge are widely distributed and owned by different organizations. These resources are not efficiently organized and utilized during HA/DR operations. We present a knowledge management framework that integrates multiple information technologies to collect, analyze, and manage information and knowledge for supporting decision making in HA/DR. The framework will help identify information needs, maintain awareness of the disaster situation, and provide decision-makers with useful relief recommendations based on past experience. A comprehensive, consistent and authoritative knowledge base within the framework will facilitate knowledge sharing and reuse. This framework can also be applied to other similar real-time decision-making environments, such as crisis management and emergency medical assistance.

163 citations


Journal ArticleDOI
TL;DR: This paper presents an approach for dynamically adapting views according to schema changes arising on source relations and provides means to support schema evolution of the data warehouse independently of the data sources.
Abstract: In this paper, we address the issues related to the evolution and maintenance of data warehousing systems, when underlying data sources change their schema capabilities. These changes can invalidate views at the data warehousing system. We present an approach for dynamically adapting views according to schema changes arising on source relations. This type of maintenance concerns both the schema and the data of the data warehouse. The main issue is to avoid the view recomputation from scratch especially when views are defined from multiple sources. The data of the data warehouse is used primarily in organizational decision-making and may be strategic. Therefore, the schema of the data warehouse can evolve for modeling new requirements resulting from analysis or data-mining processing. Our approach provides means to support schema evolution of the data warehouse independently of the data sources.
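
To make the kind of adaptation concrete, the sketch below shows a deliberately simplified materialized view that reacts to two source schema events (a renamed and a dropped column) without recomputation from the sources. The class, event names, and data are illustrative assumptions, not the paper's actual view-adaptation algorithm.

```python
# A hedged, simplified sketch of adapting a materialized view to source schema
# changes without full recomputation (names and event types are illustrative).
class MaterializedView:
    def __init__(self, columns, rows):
        self.columns = list(columns)      # e.g. ["customer", "region", "total"]
        self.rows = [list(r) for r in rows]

    def on_rename_column(self, old, new):
        """A renamed source column only requires updating the view definition."""
        self.columns = [new if c == old else c for c in self.columns]

    def on_drop_column(self, name):
        """A dropped source column is projected out of both schema and data,
        avoiding recomputation from the remaining sources."""
        if name in self.columns:
            i = self.columns.index(name)
            self.columns.pop(i)
            for r in self.rows:
                r.pop(i)

view = MaterializedView(["customer", "region", "total"],
                        [["acme", "eu", 120], ["zenit", "us", 80]])
view.on_rename_column("region", "sales_region")
view.on_drop_column("total")
print(view.columns, view.rows)   # ['customer', 'sales_region'] [['acme', 'eu'], ['zenit', 'us']]
```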

87 citations


Journal ArticleDOI
TL;DR: The state of the art of agent-mediated electronic commerce (e-commerce) is surveyed, especially in business-to-consumer (B2C) and business-to-business (B2B) e-commerce, and future directions of agent-mediated e-commerce are discussed.
Abstract: This paper surveys the state of the art of agent-mediated electronic commerce (e-commerce), especially in business-to-consumer (B2C) e-commerce and business-to-business (B2B) e-commerce. From the consumer buying behaviour perspective, the roles of agents in B2C e-commerce are: product brokering, merchant brokering, and negotiation. The applications of agents in B2B e-commerce are mainly in supply chain management. Mobile agents, evolutionary agents, and data-mining agents are some special techniques which can be applied in agent-mediated e-commerce. In addition, some technologies for implementation are briefly reviewed. Finally, we conclude the paper by discussing future directions of agent-mediated e-commerce.

55 citations


Journal ArticleDOI
TL;DR: This paper attempts to synthesize the guidelines and empirical data related to the formatting of screen layouts into a well-defined model and suggests that esthetic characteristics of this model are important to prospective viewers.
Abstract: Gestalt psychologists promulgated the principles of visual organization in the early twentieth century. These principles have been discussed and re-emphasized, and their importance and relevance to user interface design are understood. However, a limited number of systems represent and make adequate use of this knowledge in the form of a design tool that supports certain aspects of the user interface design process. The graphic design rules that these systems use are extremely rudimentary and often vastly oversimplified. Most of them have no concept of design basics such as visual balance or rhythm. In this paper, we attempt to synthesize the guidelines and empirical data related to the formatting of screen layouts into a well-defined model. Fourteen esthetic characteristics have been selected for the purpose. The results of our exercise suggest that these characteristics are important to prospective viewers.

38 citations


Journal ArticleDOI
TL;DR: The conceptual foundations of the integration of data visualization and query processing for knowledge discovery are presented, and a set of query functions for the validation of self-organizing maps in data mining are proposed.
Abstract: In data mining, the usefulness of a data pattern depends on the user of the database and does not solely depend on the statistical strength of the pattern. Based on the premise that heuristic search in combinatorial spaces built on computer and human cognitive theories is useful for effective knowledge discovery, this study investigates how the use of self-organizing maps as a tool of data visualization in data mining plays a significant role in human-computer interactive knowledge discovery. This article presents the conceptual foundations of the integration of data visualization and query processing for knowledge discovery, and proposes a set of query functions for the validation of self-organizing maps in data mining.
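
The coupling of visualization and querying can be illustrated with a small self-organizing map: train the map, then treat a map node as a query handle that returns the records projected onto it. The sketch below is a minimal numpy-only SOM with illustrative parameters; it is not the paper's proposed query functions.

```python
# A minimal self-organizing map sketch showing the visualization/query coupling:
# train a small map, then "query" a map node to retrieve the records it attracts.
import numpy as np

def train_som(data, rows=5, cols=5, epochs=20, lr=0.5, sigma=1.5, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.random((rows, cols, data.shape[1]))
    coords = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(epochs):
        decay = np.exp(-t / epochs)
        for x in rng.permutation(data):
            d = np.linalg.norm(weights - x, axis=2)
            bmu = np.unravel_index(np.argmin(d), d.shape)          # best matching unit
            dist = np.linalg.norm(coords - np.array(bmu), axis=2)
            h = np.exp(-(dist ** 2) / (2 * (sigma * decay) ** 2))  # neighbourhood
            weights += (lr * decay) * h[..., None] * (x - weights)
    return weights

def query_node(weights, data, node):
    """Return the indices of records whose best matching unit is `node`."""
    d = np.linalg.norm(weights[None, ...] - data[:, None, None, :], axis=3)
    bmus = [np.unravel_index(np.argmin(d[i]), d[i].shape) for i in range(len(data))]
    return [i for i, b in enumerate(bmus) if b == node]

data = np.random.default_rng(1).random((200, 4))
som = train_som(data)
print(query_node(som, data, (0, 0))[:10])
```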

26 citations


Journal ArticleDOI
TL;DR: Issues of fast distributed data mining are investigated, assuming that merging the distributed databases into a single one would either be too costly (distributed case) or the individual fragments would be non-uniform, so that mining only one fragment would bias the result (fragmented case).
Abstract: Many successful data-mining techniques and systems have been developed. These techniques usually apply to centralized databases with less restricted requirements on learning and response time. Not so much effort has yet been put into mining distributed databases and real-time issues. In this paper, we investigate issues of fast distributed data mining. We assume that merging the distributed databases into a single one would either be too costly (distributed case) or the individual fragments would be non-uniform, so that mining only one fragment would bias the result (fragmented case). The goal is to classify the objects O of the database into one of several mutually exclusive classes C_i. Our approach to making mining fast and feasible is as follows. From each data site or fragment db_k, only a single rule r_ik is generated for each class C_i. A small subset {r_i1, ..., r_ih} of these individual rules is selected to form a rule set R_i for each class C_i. These rule subsets adequately represent the hidden knowledge of the entire database. Various selection criteria to form R_i are discussed, both theoretically and experimentally.
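
The selection step can be sketched as follows: each site contributes one rule per class, and a small subset is chosen by how much extra coverage each rule adds on a validation sample. The greedy coverage criterion below is one plausible choice, not necessarily one of the criteria evaluated in the paper, and the toy rules and data are invented.

```python
# A hedged sketch of rule-subset selection for distributed mining: pick up to h
# of the per-site rules for a class, greedily, by added coverage on a sample.
def select_rule_subset(rules, validation, target_class, h=3):
    """rules: list of predicates obj -> bool (one per site for this class);
    validation: list of (obj, cls) pairs; returns up to h rules."""
    positives = [obj for obj, cls in validation if cls == target_class]
    chosen, covered = [], set()
    while len(chosen) < min(h, len(rules)):
        best, best_gain = None, 0
        for r in rules:
            if r in chosen:
                continue
            gain = sum(1 for i, obj in enumerate(positives)
                       if i not in covered and r(obj))
            if gain > best_gain:
                best, best_gain = r, gain
        if best is None:           # no remaining rule adds coverage
            break
        chosen.append(best)
        covered |= {i for i, obj in enumerate(positives) if best(obj)}
    return chosen

# Toy usage: three sites, objects are dicts describing customers.
site_rules = [lambda o: o["age"] > 60,
              lambda o: o["income"] < 20000,
              lambda o: o["age"] > 60 and o["income"] < 20000]
sample = [({"age": 70, "income": 15000}, "risk"),
          ({"age": 30, "income": 80000}, "no_risk")]
print(len(select_rule_subset(site_rules, sample, "risk", h=2)))
```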

26 citations


Journal ArticleDOI
TL;DR: This paper presents a new approach to automatically extract classification knowledge from numerical data by means of premise learning using a genetic algorithm to search for premise structure in combination with parameters of membership functions of input fuzzy sets to yield optimal conditions of classification rules.
Abstract: A key issue in building fuzzy classification systems is the specification of rule conditions, which determine the structure of a knowledge base. This paper presents a new approach to automatically extract classification knowledge from numerical data by means of premise learning. A genetic algorithm is employed to search for premise structure in combination with parameters of membership functions of input fuzzy sets to yield optimal conditions of classification rules. The major advantage of our work is that a parsimonious knowledge base with a low number of rules can be achieved. The practical applicability of the proposed method is examined by computer simulations on two well-known benchmark problems of Iris Data and Cancer Data classification.
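
A compact way to picture the approach is a chromosome that encodes, per class and feature, a triangular membership function plus a flag indicating whether that feature appears in the rule premise, with fitness measured as classification accuracy. The sketch below uses a toy two-class dataset and a very simple GA; the encoding and settings are illustrative assumptions, not the paper's exact scheme.

```python
# A compact sketch of learning fuzzy rule premises with a genetic algorithm
# (illustrative encoding, data, and GA settings).
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (50, 2)), rng.normal([5, 5], 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
n_classes, n_feat = 2, 2

def triangular(x, c, w):
    return np.maximum(1.0 - np.abs(x - c) / np.maximum(w, 1e-6), 0.0)

def classify(chrom, X):
    # chrom shape: (n_classes, n_feat, 3) -> centre, width, use-flag
    scores = np.ones((len(X), n_classes))
    for k in range(n_classes):
        for j in range(n_feat):
            c, w, use = chrom[k, j]
            if use > 0.5:                           # feature is part of the premise
                scores[:, k] = np.minimum(scores[:, k], triangular(X[:, j], c, w))
    return scores.argmax(axis=1)

def fitness(chrom):
    return (classify(chrom, X) == y).mean()

pop = rng.uniform(0, 7, (30, n_classes, n_feat, 3))
for gen in range(60):
    fit = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(-fit)[:10]]            # truncation selection
    children = parents[rng.integers(0, 10, 20)].copy()
    children += rng.normal(0, 0.3, children.shape)  # Gaussian mutation
    pop = np.concatenate([parents, children])
print("best training accuracy:", max(fitness(c) for c in pop))
```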

24 citations


Journal ArticleDOI
TL;DR: This work proposes a pattern decomposition (PD) algorithm that can significantly reduce the size of the dataset on each pass, making it more efficient to mine all frequent patterns in a large dataset.
Abstract: Efficient algorithms to mine frequent patterns are crucial to many tasks in data mining. Since the Apriori algorithm was proposed in 1994, there have been several methods proposed to improve its performance. However, most still adopt its candidate set generation-and-test approach. In addition, many methods do not generate all frequent patterns, making them inadequate to derive association rules. We propose a pattern decomposition (PD) algorithm that can significantly reduce the size of the dataset on each pass, making it more efficient to mine all frequent patterns in a large dataset. The proposed algorithm avoids the costly process of candidate set generation and saves time by reducing the size of the dataset. Our empirical evaluation shows that the algorithm outperforms Apriori by an order of magnitude and is faster than the FP-tree algorithm.
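
The dataset-shrinking idea can be sketched in isolation: after a counting pass, items found infrequent are removed from every transaction and identical reduced transactions are merged with a multiplicity count, so later passes scan less data. This is only the shrinking step under those assumptions, not the full pattern decomposition algorithm from the paper.

```python
# A rough sketch of the per-pass dataset reduction (not the full PD algorithm).
from collections import Counter

def reduce_dataset(transactions, min_support):
    """transactions: list of (itemset frozenset, count); returns a reduced list."""
    support = Counter()
    for items, cnt in transactions:
        for it in items:
            support[it] += cnt
    frequent = {it for it, s in support.items() if s >= min_support}
    merged = Counter()
    for items, cnt in transactions:
        merged[frozenset(items & frequent)] += cnt   # drop infrequent items, merge duplicates
    merged.pop(frozenset(), None)                    # drop now-empty transactions
    return list(merged.items())

db = [(frozenset("abc"), 1), (frozenset("abd"), 1),
      (frozenset("ab"), 1), (frozenset("ce"), 1)]
print(reduce_dataset(db, min_support=2))
```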

19 citations


Journal ArticleDOI
TL;DR: This work proposes a novel approach in which knowledge about attributes relevant to the class is extracted as association rules from the training data, and new attributes and values are generated from the association rules among the originally given attributes.
Abstract: A decision tree is considered to be appropriate (1) if the tree can classify the unseen data accurately, and (2) if the size of the tree is small. One of the approaches to induce such a good decision tree is to add new attributes and their values to enhance the expressiveness of the training data at the data pre-processing stage. There are many existing methods for attribute extraction and construction, but constructing new attributes is still an art. These methods are very time consuming, and some of them need a priori knowledge of the data domain. They are not suitable for data mining dealing with large volumes of data. We propose a novel approach in which knowledge about attributes relevant to the class is extracted as association rules from the training data. The new attributes and the values are generated from the association rules among the originally given attributes. We elaborate on the method and investigate its features. The effectiveness of our approach is demonstrated through some experiments.
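
A simplified version of the idea: mine strong class association rules from the training records, then turn each rule's antecedent into a new boolean attribute appended to every record before tree induction. The naive pairwise rule scan, thresholds, and toy data below are assumptions for illustration, far simpler than the paper's procedure.

```python
# A hedged sketch: construct new boolean attributes from simple class association rules.
from itertools import combinations

def construct_attributes(records, labels, min_conf=0.8, min_sup=3):
    """records: list of dicts attribute -> value; returns (rule antecedents, extended records)."""
    candidates = {pair
                  for rec in records
                  for pair in combinations(sorted(rec.items()), 2)}
    new_attrs = []
    for cond in candidates:
        matching = [i for i, rec in enumerate(records)
                    if all(rec.get(a) == v for a, v in cond)]
        if len(matching) < min_sup:
            continue
        majority = max(set(labels[i] for i in matching),
                       key=lambda c: sum(labels[i] == c for i in matching))
        conf = sum(labels[i] == majority for i in matching) / len(matching)
        if conf >= min_conf:                 # strong rule: cond -> majority class
            new_attrs.append(cond)
    extended = []
    for rec in records:
        ext = dict(rec)
        for cond in new_attrs:
            name = "&".join(f"{a}={v}" for a, v in cond)
            ext[name] = all(rec.get(a) == v for a, v in cond)
        extended.append(ext)
    return new_attrs, extended

recs = [{"outlook": "sunny", "wind": "weak"}, {"outlook": "sunny", "wind": "weak"},
        {"outlook": "rain", "wind": "strong"}, {"outlook": "sunny", "wind": "weak"}]
labs = ["play", "play", "stay", "play"]
attrs, ext = construct_attributes(recs, labs, min_sup=3)
print(attrs)
```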

15 citations


Journal ArticleDOI
TL;DR: Evaluations of MEDEX performance for both the onset and cessation of winter and summer winds are presented, and it is demonstrated that MEDEX has forecasting skill competitive with the US Navy's regional forecasting center in Rota, Spain.
Abstract: We present a fuzzy expert system, MEDEX, for forecasting gale-force winds in the Mediterranean basin. The most successful local wind forecasting in this region is achieved by an expert human forecaster with access to numerical weather prediction products. That forecaster's knowledge is expressed as a set of 'rules-of-thumb'. Fuzzy set methodologies have proved well suited for encoding the forecaster's knowledge, and for accommodating the uncertainty inherent in the specification of rules, as well as in subjective and objective input. MEDEX uses fuzzy set theory in two ways: as a fuzzy rule base in the expert system, and for fuzzy pattern matching to select dominant wind circulation patterns as one input to the expert system. The system was developed, tuned, and verified over a two-year period, during which the weather conditions from 539 days were individually analyzed. Evaluations of MEDEX performance for both the onset and cessation of winter and summer winds are presented, and demonstrate that MEDEX has forecasting skill competitive with the US Navy's regional forecasting center in Rota, Spain.
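
As a flavour of how a forecaster's rule of thumb can be encoded with fuzzy sets, the fragment below evaluates one invented rule with trapezoidal membership functions. The variables, membership functions, and rule are illustrative only; they are not MEDEX's actual rule base.

```python
# An illustrative fuzzy rule fragment (invented variables and rule, not MEDEX's).
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

def gale_onset_likelihood(pressure_gradient_hpa, ridge_strength):
    strong_gradient = trapezoid(pressure_gradient_hpa, 4, 8, 30, 40)
    strong_ridge = trapezoid(ridge_strength, 0.3, 0.6, 1.0, 1.2)
    # Rule of thumb: IF gradient is strong AND upstream ridge is strong
    # THEN gale onset is likely (AND modelled as min).
    return min(strong_gradient, strong_ridge)

print(gale_onset_likelihood(10.0, 0.8))   # -> 1.0 (both conditions fully satisfied)
```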

Journal ArticleDOI
TL;DR: An architecture and an implementation of a hierarchy reasoner is presented that integrates a class hierarchy, a part hierarchy, and a containment hierarchy into one structure and shows that transitive closure reasoning for combined class/part/containment hierarchies in near constant time is possible for a fixed hardware configuration.
Abstract: Class hierarchies form the backbone of many implemented knowledge representation and reasoning systems. They are used for inheritance, classification and transitive closure reasoning. Part hierarchies are also important in artificial intelligence. Other hierarchies, e.g. containment hierarchies, have received less attention in artificial intelligence. This paper presents an architecture and an implementation of a hierarchy reasoner that integrates a class hierarchy, a part hierarchy, and a containment hierarchy into one structure. In order to make an implemented reasoner useful, it needs to operate at least at speeds comparable to human reasoning. As real-world hierarchies are always large, special techniques need to be used to achieve this. We have developed a set of parallel algorithms and a data representation called maximally reduced tree cover for that purpose. The maximally reduced tree cover is an improvement of a materialized transitive closure representation which has appeared in the literature. Our experiments with a medical vocabulary show that transitive closure reasoning for combined class/part/containment hierarchies in near constant time is possible for a fixed hardware configuration.
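
The constant-time flavour of tree-cover reasoning comes from interval labelling: each node gets a preorder number and the maximum preorder number in its subtree, so an is-descendant test is a single interval check. The sketch below shows only the spanning-tree case with an invented toy hierarchy; the paper's maximally reduced tree cover also handles the extra non-tree edges of a DAG and the combination of class, part, and containment links.

```python
# A sketch of interval labelling for constant-time subsumption tests on a tree.
def label_tree(children, root):
    pre, max_pre, counter = {}, {}, [0]
    def dfs(node):
        pre[node] = counter[0]; counter[0] += 1
        last = pre[node]
        for child in children.get(node, []):
            last = max(last, dfs(child))
        max_pre[node] = last
        return last
    dfs(root)
    return pre, max_pre

def is_descendant(pre, max_pre, anc, desc):
    return pre[anc] <= pre[desc] <= max_pre[anc]

tree = {"entity": ["organ", "cell"], "organ": ["heart", "liver"], "cell": []}
pre, max_pre = label_tree(tree, "entity")
print(is_descendant(pre, max_pre, "organ", "heart"))   # True
print(is_descendant(pre, max_pre, "cell", "heart"))    # False
```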

Journal ArticleDOI
TL;DR: An induction technique is presented that discovers a set of classification rules, from a set of examples, using second-order relations as a representational model, and its performance is compared to two state-of-the-art classification systems.
Abstract: This paper presents an induction technique that discovers a set of classification rules, from a set of examples, using second-order relations as a representational model. Second-order relations are database relations in which tuples have sets of atomic values as components. Using sets of values, which are interpreted as disjunctions, provides compact representations that facilitate efficient management and enhance comprehensibility. The second-order relational framework is based on theoretical foundations that link relational database theory, machine learning, and logic synthesis. The rule induction technique can be viewed as a second-order relation compression problem in which the original relation, representing training data, is transformed into a second-order relation with fewer tuples by merging tuples in ways that preserve consistency with the training data. This problem is closely related to two-level Boolean function minimization in logic synthesis. We describe a rule-mining system, SORCER, and compare its performance to two state-of-the-art classification systems: C4.5 and CBA. Experimental results based on the average of error rates over 26 data sets show that SORCER, using a simple compression scheme, outperforms C4.5 and is competitive with CBA. Using a slightly more sophisticated compression scheme, SORCER outperforms both C4.5 and CBA.
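
The compression view can be sketched directly: tuples of a second-order relation have set-valued components read as disjunctions, and two tuples of the same class may be merged by unioning their components, provided the merged tuple covers no training example of another class. The greedy pairwise merging and toy data below are simple stand-ins, not SORCER's actual compression schemes.

```python
# A hedged sketch of consistency-preserving tuple merging for rule induction.
def covers(tuple_sets, example):
    return all(example[i] in s for i, s in enumerate(tuple_sets))

def merge_tuples(examples, labels, target_class):
    """examples: list of attribute-value tuples; returns merged second-order tuples."""
    negatives = [e for e, c in zip(examples, labels) if c != target_class]
    rules = [[{v} for v in e] for e, c in zip(examples, labels) if c == target_class]
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                candidate = [a | b for a, b in zip(rules[i], rules[j])]
                if not any(covers(candidate, n) for n in negatives):   # consistency check
                    rules[i] = candidate
                    del rules[j]
                    merged_any = True
                    break
            if merged_any:
                break
    return rules

X = [("red", "round"), ("yellow", "round"), ("red", "long"), ("green", "round")]
y = ["fruit", "fruit", "vegetable", "vegetable"]
print(merge_tuples(X, y, "fruit"))
```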

Journal ArticleDOI
TL;DR: A dynamic recognition system founded on two types of learning, in which the static aspect of learning is handled by classifiers or systems of classifiers, while the dynamic aspect is handled by learning the sequencing of the various states with a fuzzy Petri net.
Abstract: When dealing with evolving natural objects, modeling dynamic classes is the main issue for a pattern recognition system. This problem can be avoided by making the pattern recognition system itself dynamic, so that it can enter various states according to the evolution of the classes. We propose a dynamic recognition system founded on two types of learning. The static aspect of learning is handled by classifiers or systems of classifiers, while the dynamic aspect is handled by learning the sequencing of the various states with a fuzzy Petri net. The method is successfully applied to a synthetic data set.

Journal ArticleDOI
TL;DR: An agent framework is presented and applied to Internet information gathering and a three-tier multi-agent and JAVA-implemented system, which coordinates information-gathering activities using KQML for inter-agent communication, is developed on the basis of the proposed architectural modules.
Abstract: Pragmatic applications and studies of agent-based software engineering have evolved over the last decade. In order to explore how an agent is organized and applied, in this paper an agent framework is presented and applied to Internet information gathering. Agent systems are classified from micro and macro perspectives, and agent applications are characterized by four feature dimensions: behavior (user), knowledge (task), safety (time), and cooperation (social). An agent itself can be modeled according to the information, behavior, and organization aspects of the agent's functional modules as proposed in this paper. A three-tier, Java-implemented multi-agent system, which coordinates information-gathering activities using KQML for inter-agent communication, is developed on the basis of the proposed architectural modules. Finally, we explore possible areas for future study.
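
Since the implemented system relies on KQML for inter-agent communication, a small composition example may help; the agent names, ontology, and content below are invented for illustration and are not taken from the paper's system.

```python
# Composing a KQML performative for inter-agent communication (illustrative fields).
def kqml_message(performative, **fields):
    body = " ".join(f":{k.replace('_', '-')} {v}" for k, v in fields.items())
    return f"({performative} {body})"

msg = kqml_message("ask-one",
                   sender="gatherer-agent",
                   receiver="broker-agent",
                   reply_with="q1",
                   language="KIF",
                   ontology="web-documents",
                   content='"(documents-about \'data-mining)"')
print(msg)
```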

Journal ArticleDOI
TL;DR: Validation results based on benchmark optimization problems show that the proposed inductive–deductive learning approach is capable of handling different fitness landscapes as well as distributing nondominated solutions uniformly along the final trade-offs in multi-objective optimization, even if there exist many local optima in a high-dimensional search space or the global optimum is outside the predefined search region.
Abstract: Conventional evolutionary algorithms operate in a fixed search space with limiting parameter range, which is often predefined via a priori knowledge or trial and error in order to 'guess' a suitable region comprising the global optimal solution. This requirement is hard, if not impossible, to fulfil in many real-world optimization problems since there is often no clue of where the desired solutions are located in these problems. Thus, this paper proposes an inductive-deductive learning approach for single- and multi-objective evolutionary optimization. The method is capable of directing evolution towards more promising search regions even if these regions are outside the initial predefined space. For problems where the global optimum is included in the initial search space, it is capable of shrinking the search space dynamically for better resolution in genetic representation to facilitate the evolutionary search towards more accurate optimal solutions. Validation results based on benchmark optimization problems show that the proposed inductive-deductive learning is capable of handling different fitness landscapes as well as distributing nondominated solutions uniformly along the final trade-offs in multi-objective optimization, even if there exist many local optima in a high-dimensional search space or the global optimum is outside the predefined search region.
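
The adapt-the-search-space idea can be illustrated on a single-objective toy problem whose optimum lies outside the initial bounds: after each generation the variable box is re-centred and re-sized around the better individuals, so the population can migrate out of the initial region or shrink once the optimum is inside. The update rule below is an illustrative stand-in, not the paper's inductive-deductive procedure.

```python
# A hedged sketch of dynamically adapting the search box in an evolutionary run.
import numpy as np

def sphere(x):                      # optimum at (10, 10), outside the initial box
    return np.sum((x - 10.0) ** 2, axis=1)

rng = np.random.default_rng(0)
low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # initial (wrong) box
for gen in range(80):
    pop = rng.uniform(low, high, (40, 2))
    fit = sphere(pop)
    elite = pop[np.argsort(fit)[:8]]                 # best 20 %
    centre = elite.mean(axis=0)
    spread = elite.std(axis=0) * 3.0 + 1e-3
    # Re-centre the box on the elite and let it grow or shrink with their spread.
    low, high = centre - spread, centre + spread
print("final box centre:", (low + high) / 2)         # drifts towards (10, 10)
```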

Journal ArticleDOI
TL;DR: It is shown how to represent conditional ignorance and informational relevance in the symbolic entropy theory that has been developed in the previous work and some theorems of qualitative reasoning with uncertain knowledge are presented.
Abstract: This paper is devoted to qualitative reasoning under ignorance. We show how to represent conditional ignorance and informational relevance in the symbolic entropy theory that we have developed in our previous work. This theory allows us to represent uncertainty, in the ignorance form, as in common-sense reasoning, by using the linguistic expressions of the interval [Certain, Completely uncertain]. We recall this theory, then we introduce the notions of conditional ignorance and of informational relevance. Finally we present some theorems of qualitative reasoning with uncertain knowledge. Particularly, we show how to extract the best relevant information in order to treat some problems under ignorance.

Journal ArticleDOI
TL;DR: Two algorithms to minimize M in each period are proposed, and experiments show that one performs similarly to a theoretical a posteriori algorithm and significantly outperforms the online extensions of two state-of-the-art nearest neighbor search methods.
Abstract: Many data centers have archived a tremendous amount of data and begun to publish them on the Web. Due to limited resources and large amount of service requests, data centers usually do not directly support high-cost queries. On the other hand, users are often overwhelmed by the huge data volume and cannot afford to download the whole data sets and search them locally. To support high-dimensional nearest neighbor searches in this environment, the paper develops a multi-level approximation scheme. The coarsest-level approximations are stored locally and searched first. The result is then refined gradually via accesses to remote data centers. Data centers need only deliver data items or their precomputed finer-level approximations by their identifiers. The searching process is usually long in this environment, since it involves remote sites. This paper describes an online search process: the system periodically reports a data item and a positive integer M. The reported item is guaranteed to be one of the M nearest neighbors of the query one. The paper proposes two algorithms to minimize M in each period. Experiments show that one of them performs similarly to a theoretical a posteriori algorithm and significantly outperforms the online extensions of two state-of-the-art nearest neighbor search methods.
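
The guarantee on M follows from distance bounds: coarse approximations give a lower and an upper bound on each item's true distance to the query, and the best candidate so far is provably among the M items whose lower bound does not exceed that candidate's upper bound. The sketch below uses a uniform scalar quantizer as a stand-in for the paper's multi-level approximation scheme; the data and quantization step are illustrative.

```python
# A hedged sketch of the bound-based "one of the M nearest" guarantee.
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1000, 16))
query = rng.random(16)

cell = 0.25                                     # coarse quantization step
approx = np.floor(data / cell) * cell           # "local" coarse approximations
# Each true coordinate lies within cell/2 of the cell centre, so bound the distance.
centres = approx + cell / 2
lower = np.linalg.norm(np.maximum(np.abs(query - centres) - cell / 2, 0), axis=1)
upper = np.linalg.norm(np.abs(query - centres) + cell / 2, axis=1)

best = int(np.argmin(upper))                    # current answer to report
M = int(np.sum(lower <= upper[best]))           # guarantee: best is among the M nearest
true_rank = int(np.sum(np.linalg.norm(data - query, axis=1)
                       < np.linalg.norm(data[best] - query)) + 1)
print(f"reported item is within the top {M}; its true rank is {true_rank}")
```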

Journal ArticleDOI
Sung I. Yong, Won Lee
TL;DR: This paper proposes an effective way to represent the different types of video information in the conventional relational database model and introduces two browsing methods in order to assist a user.
Abstract: Various types of information can be mixed in a continuous video stream without any clear boundary. For this reason, there seems to be no simple solution to support content-based queries in a video database. The meaning of a video scene can be interpreted by multiple levels of abstraction and its description can be varied among different users. Therefore, it is important for a user to be able to describe a scene flexibly while the description given by different users should be maintained consistently. This paper proposes an effective way to represent the different types of video information in the conventional relational database model. Flexibly defined attributes and their values are organized as tree-structured dictionaries while the description of video data is stored in a fixed database schema. In order to assist a user, two browsing methods are introduced. The dictionary browser simplifies the annotation process as well as the querying process of a user while the result browser can help a user analyze the results of a query in terms of various combinations of query conditions.
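
One way to picture a fixed relational schema holding flexible descriptions is a self-referencing dictionary table (the tree-structured dictionary) plus an annotation table tying a video interval to a dictionary entry. The table and column names below are illustrative assumptions, not the paper's schema.

```python
# A hedged sketch of a fixed relational schema for tree-structured video annotation.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE video      (video_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE dict_node  (node_id  INTEGER PRIMARY KEY,
                         parent_id INTEGER REFERENCES dict_node(node_id),
                         name TEXT);
CREATE TABLE annotation (video_id INTEGER REFERENCES video(video_id),
                         start_frame INTEGER, end_frame INTEGER,
                         node_id INTEGER REFERENCES dict_node(node_id));
""")
con.execute("INSERT INTO video VALUES (1, 'news-2002-01-07')")
con.executemany("INSERT INTO dict_node VALUES (?, ?, ?)",
                [(1, None, "event"), (2, 1, "interview"), (3, 1, "weather report")])
con.execute("INSERT INTO annotation VALUES (1, 1200, 2400, 2)")
# Content-based query: find intervals annotated with any child of 'event'.
rows = con.execute("""
    SELECT v.title, a.start_frame, a.end_frame, d.name
    FROM annotation a JOIN video v ON v.video_id = a.video_id
                      JOIN dict_node d ON d.node_id = a.node_id
    WHERE d.parent_id = (SELECT node_id FROM dict_node WHERE name = 'event')
""").fetchall()
print(rows)   # [('news-2002-01-07', 1200, 2400, 'interview')]
```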