
Showing papers on "Knowledge extraction published in 2008"


Journal ArticleDOI
15 Oct 2008
TL;DR: KEEL as discussed by the authors is a software tool to assess evolutionary algorithms for data mining problems of various kinds including regression, classification and unsupervised learning; it includes evolutionary learning algorithms based on different approaches: Pittsburgh, Michigan and IRL.
Abstract: This paper introduces KEEL, a software tool to assess evolutionary algorithms for Data Mining problems of various kinds, including regression, classification and unsupervised learning. It includes evolutionary learning algorithms based on different approaches (Pittsburgh, Michigan and IRL), as well as the integration of evolutionary learning techniques with different pre-processing techniques, allowing it to perform a complete analysis of any learning model in comparison to existing software tools. Moreover, KEEL has been designed with a twofold goal: research and education.

1,297 citations


Proceedings ArticleDOI
01 Jan 2008
TL;DR: In this paper, the performance of a variety of similarity measures for categorical data is evaluated in the context of a specific data mining task, outlier detection; while no single measure dominates for all types of problems, some measures achieve consistently high performance.
Abstract: Measuring similarity or distance between two entities is a key step for several data mining and knowledge discovery tasks. The notion of similarity for continuous data is relatively well understood, but for categorical data the similarity computation is not straightforward. Several data-driven similarity measures have been proposed in the literature to compute the similarity between two categorical data instances, but their relative performance has not been evaluated. In this paper we study the performance of a variety of similarity measures in the context of a specific data mining task: outlier detection. Results on a variety of data sets show that while no one measure dominates the others for all types of problems, some measures achieve consistently high performance.
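As a concrete illustration of the kind of measure being compared (not code from the paper; the function names and the kNN-style scoring are my own simplification), here is a minimal Python sketch of the simplest data-driven baseline, overlap similarity, used to rank categorical records by an outlier score:

```python
# Minimal sketch (not the paper's code): overlap similarity between two
# categorical records, and a kNN-style outlier score built on top of it.
from typing import Sequence, List

def overlap_similarity(x: Sequence[str], y: Sequence[str]) -> float:
    """Fraction of attributes on which the two records agree."""
    assert len(x) == len(y)
    return sum(a == b for a, b in zip(x, y)) / len(x)

def knn_outlier_scores(data: List[Sequence[str]], k: int = 3) -> List[float]:
    """Score each record by 1 - (mean similarity to its k most similar
    other records); higher scores suggest outliers."""
    scores = []
    for i, xi in enumerate(data):
        sims = sorted((overlap_similarity(xi, xj)
                       for j, xj in enumerate(data) if j != i), reverse=True)
        scores.append(1.0 - sum(sims[:k]) / k)
    return scores

records = [("red", "suv", "usa"), ("red", "suv", "usa"),
           ("red", "sedan", "usa"), ("blue", "truck", "japan")]
print(knn_outlier_scores(records, k=2))  # the last record scores highest
```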

554 citations


Journal ArticleDOI
TL;DR: In this paper, a self-supervised learner employs a parser and heuristics to determine criteria that will be used by an extraction classifier (or other ranking model) for evaluating the trustworthiness of candidate tuples that have been extracted from the corpus of text.
Abstract: To implement open information extraction, a new extraction paradigm has been developed in which a system makes a single data-driven pass over a corpus of text, extracting a large set of relational tuples without requiring any human input. Using training data, a Self-Supervised Learner employs a parser and heuristics to determine criteria that an extraction classifier (or other ranking model) then uses to evaluate the trustworthiness of candidate tuples extracted from the corpus. The classifier retains tuples with a sufficiently high probability of being trustworthy. A redundancy-based assessor assigns a probability to each retained tuple indicating the likelihood that it is an actual instance of a relationship among the objects that make up the tuple. The retained tuples form an extraction graph that can be queried for information.
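The following Python sketch is only meant to show the three-stage shape described above (candidate extraction, trustworthiness filtering, redundancy-based assessment); every function name, the crude triple pattern, and the probability formula are illustrative assumptions, not the system's actual components:

```python
# Illustrative sketch only (hypothetical names, not the system described above):
# extract candidate tuples, filter them with a trustworthiness test, then assign
# a redundancy-based probability to the retained tuples.
import re
from collections import Counter

def extract_candidates(sentence: str):
    """Crude stand-in for the extraction stage: every consecutive word triple
    becomes a candidate (arg1, rel, arg2) tuple."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [(tokens[i], tokens[i + 1], tokens[i + 2])
            for i in range(len(tokens) - 2)]

def trustworthy(tuple_, min_len: int = 3) -> bool:
    """Toy filter: keep tuples whose relation word is long enough."""
    return len(tuple_[1]) >= min_len

def redundancy_probability(count: int, p: float = 0.5) -> float:
    """Simple stand-in for a redundancy-based assessor: the more sentences
    support a tuple, the more likely it is a real instance."""
    return 1.0 - (1.0 - p) ** count

corpus = ["Edison invented the phonograph", "Edison invented the phonograph again"]
counts = Counter(t for s in corpus for t in extract_candidates(s) if trustworthy(t))
for t, c in counts.most_common(3):
    print(t, round(redundancy_probability(c), 3))
```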

545 citations


Journal ArticleDOI
TL;DR: A method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence string matching algorithm is presented.
Abstract: We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
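For the string-matching ingredient named above, a minimal Python sketch of a length-normalized Longest Common Subsequence over word tokens follows; the normalization shown is one common choice, and the corpus-based word-similarity component of the method is omitted:

```python
# A minimal sketch of one ingredient named above: the Longest Common
# Subsequence between two token lists, length-normalized to [0, 1].
def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for the LCS length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ai in enumerate(a, 1):
        for j, bj in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if ai == bj else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def normalized_lcs(s1: str, s2: str) -> float:
    """Normalize the LCS length by both sentence lengths."""
    t1, t2 = s1.lower().split(), s2.lower().split()
    if not t1 or not t2:
        return 0.0
    l = lcs_length(t1, t2)
    return (l / len(t1)) * (l / len(t2))   # one common normalization choice

print(normalized_lcs("the cat sat on the mat", "a cat sat on a mat"))
```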

519 citations


01 Jan 2008
TL;DR: Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches.
Abstract: Data Clustering: Algorithms and Applications (Chapman & Hall/CRC Data Mining and Knowledge Discovery Series), ISBN 1466558210, 2013-08-21. Research on the problem of clustering tends to be fragmented across the pattern recognition, database, data mining, and machine learning communities. Addressing this problem in a unified way, Data Clustering: Algorithms and Applications provides complete coverage of the entire area of clustering, from basic methods to more refined and complex data clustering approaches. It pays special attention t...

273 citations


Journal ArticleDOI
01 Aug 2008
TL;DR: This paper proposes an ID mechanism based on mining SQL queries stored in database audit log files that is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently than expected.
Abstract: A considerable effort has been recently devoted to the development of Database Management Systems (DBMS) which guarantee high assurance and security. An important component of any strong security solution is represented by Intrusion Detection (ID) techniques, able to detect anomalous behavior of applications and users. To date, however, there have been few ID mechanisms proposed which are specifically tailored to function within the DBMS. In this paper, we propose such a mechanism. Our approach is based on mining SQL queries stored in database audit log files. The result of the mining process is used to form profiles that can model normal database access behavior and identify intruders. We consider two different scenarios while addressing the problem. In the first case, we assume that the database has a Role Based Access Control (RBAC) model in place. Under an RBAC system, permissions are associated with roles, grouping several users, rather than with single users. Our ID system is able to determine role intruders, that is, individuals who, while holding a specific role, behave differently than expected. An important advantage of providing an ID technique specifically tailored to RBAC databases is that it can help in protecting against insider threats. Furthermore, the existence of roles makes our approach usable even for databases with a large user population. In the second scenario, we assume that there are no roles associated with users of the database. In this case, we look directly at the behavior of the users. We employ clustering algorithms to form concise profiles representing normal user behavior. For detection, we either use these clustered profiles as the roles or employ outlier detection techniques to identify behavior that deviates from the profiles. Our preliminary experimental evaluation on both real and synthetic database traces shows that our methods work well in practical situations.
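As a rough illustration of profile-based detection over audit logs (the fingerprint fields, regexes, and thresholds below are my own simplifications, not the paper's actual query representation), a minimal Python sketch might look like this:

```python
# A minimal sketch (not the paper's actual representation): summarize each
# audit-log query by a coarse fingerprint (command, tables touched, crude
# count of selected columns), build a per-role profile of observed
# fingerprints, and flag queries whose fingerprint is new for that role.
import re
from collections import defaultdict

def fingerprint(sql: str):
    cmd = sql.strip().split()[0].upper()
    tables = tuple(sorted(re.findall(r"(?:FROM|JOIN|INTO|UPDATE)\s+(\w+)", sql, re.I)))
    if cmd == "SELECT":
        ncols = re.split(r"\bFROM\b", sql, flags=re.I)[0].count(",") + 1
    else:
        ncols = 0
    return (cmd, tables, ncols)

profiles = defaultdict(set)              # role -> set of observed fingerprints

def train(log):                          # log: iterable of (role, sql) pairs
    for role, sql in log:
        profiles[role].add(fingerprint(sql))

def is_anomalous(role: str, sql: str) -> bool:
    return fingerprint(sql) not in profiles[role]

train([("clerk", "SELECT name, balance FROM accounts WHERE id = 1"),
       ("clerk", "SELECT name, balance FROM accounts WHERE id = 7")])
print(is_anomalous("clerk", "SELECT name, balance FROM accounts WHERE id = 9"))  # False
print(is_anomalous("clerk", "SELECT * FROM salaries"))                           # True
```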

205 citations


Journal ArticleDOI
TL;DR: Empirical evaluation showed that the proposed algorithm could generate a better discretization scheme that improved the classification accuracy as well as the execution time, number of generated rules, and training time of C5.0.

202 citations


Book
06 Feb 2008
TL;DR: The GeoPKDD project (Geographic Privacy-Aware Knowledge Discovery and Delivery), a research project funded by the EU Commission and involving 40 researchers from 7 countries, is presented in this book as an example of the emerging research area at the crossroads of mobility, data mining, and privacy.
Abstract: The technologies of mobile communications and ubiquitous computing pervade our society, and wireless networks sense the movement of people and vehicles, generating large volumes of mobility data. This is a scenario of great opportunities and risks: on one side, mining this data can produce useful knowledge, supporting sustainable mobility and intelligent transportation systems; on the other side, individual privacy is at risk, as the mobility data contain sensitive personal information. A new multidisciplinary research area is emerging at this crossroads of mobility, data mining, and privacy. This book assesses this research frontier from a computer science perspective, investigating the various scientific and technological issues, open problems, and roadmap. The editors manage a research project called GeoPKDD, Geographic Privacy-Aware Knowledge Discovery and Delivery, funded by the EU Commission and involving 40 researchers from 7 countries, and this book tightly integrates and relates their findings in 13 chapters covering all related subjects, including the concepts of movement data and knowledge discovery from movement data; privacy-aware geographic knowledge discovery; wireless network and next-generation mobile technologies; trajectory data models, systems and warehouses; privacy and security aspects of technologies and related regulations; querying, mining and reasoning on spatiotemporal data; and visual analytics methods for movement data. This book will benefit researchers and practitioners in the related areas of computer science, geography, social science, statistics, law, telecommunications and transportation engineering.

198 citations


Journal ArticleDOI
TL;DR: In this article, a rough set approach is proposed to discover classification rules through a process of knowledge induction which selects decision rules with a minimal set of features for classification of real-valued data.

191 citations


Journal ArticleDOI
TL;DR: A case study is presented that uses text mining to identify clusters and trends of related research topics from three major journals in the management information systems field, and it is proposed that this type of analysis could be valuable for researchers in any field.
Abstract: Text mining is a semi-automated process of extracting knowledge from a large amount of unstructured data. Given that the amount of unstructured data being generated and stored is increasing rapidly, the need for automated means to process it is also increasing. In this study, we present, discuss and evaluate the techniques used to perform text mining on collections of textual information. A case study is presented using text mining to identify clusters and trends of related research topics from three major journals in the management information systems field. Based on the findings of this case study, it is proposed that this type of analysis could potentially be valuable for researchers in any field.

191 citations


Book ChapterDOI
01 Jan 2008
TL;DR: A new SI technique is described for partitioning any dataset, which may come with a variety of attributes or features, into an optimal number of groups through one run of optimization.
Abstract: Clustering aims at representing large datasets by a fewer number of prototypes or clusters. It brings simplicity in modeling data and thus plays a central role in the process of knowledge discovery and data mining. Data mining tasks, in these days, require fast and accurate partitioning of huge datasets, which may come with a variety of attributes or features. This, in turn, imposes severe computational requirements on the relevant clustering techniques. A family of bio-inspired algorithms, well-known as Swarm Intelligence (SI) has recently emerged that meets these requirements and has successfully been applied to a number of real world clustering problems. This chapter explores the role of SI in clustering different kinds of datasets. It finally describes a new SI technique for partitioning any dataset into an optimal number of groups through one run of optimization. Computer simulations undertaken in this research have also been provided to demonstrate the effectiveness of the proposed algorithm.

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter reviews and summarizes existing criteria and metrics in evaluating privacy preserving techniques and provides a comprehensive view on a set of metrics related to existing privacy preserving algorithms so that researchers can gain insights on how to design more effective measurement and PPDM algorithms.
Abstract: The aim of privacy preserving data mining (PPDM) algorithms is to extract relevant knowledge from large amounts of data while protecting at the same time sensitive information. An important aspect in the design of such algorithms is the identification of suitable evaluation criteria and the development of related benchmarks. Recent research in the area has devoted much effort to determine a trade-off between the right to privacy and the need of knowledge discovery. It is often the case that no privacy preserving algorithm exists that outperforms all the others on all possible criteria. Therefore, it is crucial to provide a comprehensive view on a set of metrics related to existing privacy preserving algorithms so that we can gain insights on how to design more effective measurement and PPDM algorithms. In this chapter, we review and summarize existing criteria and metrics in evaluating privacy preserving techniques.

Journal ArticleDOI
TL;DR: The importance of business insiders in the process of knowledge development to make DM more relevant to business is discussed, and a blog‐based model of knowledge sharing system to support the DM process for effective BI is proposed.
Abstract: Purpose – Data mining (DM) has been considered to be a tool of business intelligence (BI) for knowledge discovery. Recent discussions in this field state that DM does not contribute to business on a large scale. The purpose of this paper is to discuss the importance of business insiders in the process of knowledge development to make DM more relevant to business. Design/methodology/approach – This paper proposes a blog‐based model of knowledge sharing system to support the DM process for effective BI. Findings – Through an illustrative case study, the paper has demonstrated the usefulness of the model of knowledge sharing system for DM in the dynamic transformation of explicit and tacit knowledge for BI. DM can be an effective BI tool only when business insiders are involved and organizational knowledge sharing is implemented. Practical implications – The structure of blog‐based knowledge sharing systems for the DM process can be practically applied to enterprises for BI. Originality/value – The paper suggests th...

Proceedings Article
01 Jan 2008
TL;DR: In this article, a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short, is proposed, which constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others.
Abstract: Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.

Journal ArticleDOI
TL;DR: This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM, which constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others.
Abstract: Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.
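To make the vertically partitioned case concrete: with a linear kernel, the global Gram matrix equals the sum of each party's Gram matrix computed on its own feature columns, so a model can in principle be trained from those local matrices without pooling raw attributes. The sketch below shows only that algebraic idea with scikit-learn's precomputed-kernel SVM; the paper's actual protocol adds cryptographic protections (for example, secure summation) that are omitted here, and all variable names are mine:

```python
# A minimal sketch of the vertically-partitioned idea with a linear kernel:
# each party holds a disjoint subset of the features for the same records,
# the global Gram matrix is the sum of the local ones, and the SVM is trained
# on that sum without pooling raw attributes. (Real protocols replace the
# plain sum with a secure-summation step; that is omitted here.)
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))                     # full data, for simulation only
y = np.where(X[:, 0] + X[:, 3] > 0, 1, -1)

party_A, party_B = X[:, :3], X[:, 3:]            # each party sees only its columns
K_A = party_A @ party_A.T                        # local linear Gram matrices
K_B = party_B @ party_B.T
K_global = K_A + K_B                             # equals X @ X.T for a linear kernel

clf = SVC(kernel="precomputed").fit(K_global, y)
print("train accuracy:", clf.score(K_global, y))
```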

Patent
17 Nov 2008
TL;DR: In this paper, a method of encoding knowledge is disclosed, which can be used to automatically detect problems in software application deployments, including accessing a source of knowledge describing a problem known to occur in deployments of a particular software application, and which identifies a plurality of conditions associated with the problem.
Abstract: A method of encoding knowledge is disclosed, which can be used to automatically detect problems in software application deployments. The method includes accessing a source of knowledge describing a problem known to occur in deployments of a particular software application, and which identifies a plurality of conditions associated with the problem. An encoded representation of the knowledge source is generated according to a predefined knowledge encoding methodology. The encoded representation is adapted to be applied automatically by a computer to analyze data representing a current state of a monitored deployment of the software application to detect whether the conditions and the problem exist therein. In various implementations, the encoded representation of the knowledge can include queries for deployment information, information concerning the relative importance of the conditions to a detection of the problem, and/or logical constructs for computing a confidence value in the existence of the problem and for determining whether to report the problem if some of the conditions are not true. The knowledge source can comprise a text document (such as a knowledge base article), a flowchart of a diagnostic troubleshooting method, and the like. Also disclosed are methods of at least partially automating the encoding process.
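Purely as a hypothetical illustration of what an encoded representation with weighted conditions and a confidence rule could look like (none of the field names or thresholds below come from the patent), consider this Python sketch:

```python
# Hypothetical illustration only (field names are mine, not the patent's):
# one way to encode "conditions with relative importance plus a confidence
# rule" so a monitor can apply it automatically to deployment state.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Condition:
    name: str
    query: Callable[[dict], bool]   # evaluated against the deployment state
    weight: float                   # relative importance to the diagnosis

@dataclass
class EncodedProblem:
    description: str
    conditions: List[Condition]
    report_threshold: float = 0.6   # report even if some conditions are false

    def confidence(self, state: dict) -> float:
        total = sum(c.weight for c in self.conditions)
        met = sum(c.weight for c in self.conditions if c.query(state))
        return met / total if total else 0.0

    def should_report(self, state: dict) -> bool:
        return self.confidence(state) >= self.report_threshold

problem = EncodedProblem(
    description="App pool recycles under memory pressure",
    conditions=[
        Condition("low_memory", lambda s: s.get("free_mem_mb", 1e9) < 200, weight=2.0),
        Condition("recent_recycle", lambda s: s.get("recycles_last_hour", 0) > 3, weight=1.0),
    ],
)
print(problem.should_report({"free_mem_mb": 120, "recycles_last_hour": 1}))  # True (2/3 >= 0.6)
```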

Journal ArticleDOI
TL;DR: The promising experimental performance on goal/corner event detection and sports/commercials/building concepts extraction from soccer videos and TRECVID news collections demonstrates the effectiveness of the proposed framework and indicates the great potential of extending the proposed multimedia data mining framework to a wide range of different application domains.
Abstract: In this paper, a subspace-based multimedia data mining framework is proposed for video semantic analysis, specifically video event/concept detection, by addressing two basic issues, i.e., semantic gap and rare event/concept detection. The proposed framework achieves full automation via multimodal content analysis and intelligent integration of distance-based and rule-based data mining techniques. The content analysis process facilitates the comprehensive video analysis by extracting low-level and middle-level features from audio/visual channels. The integrated data mining techniques effectively address these two basic issues by alleviating the class imbalance issue along the process and by reconstructing and refining the feature dimension automatically. The promising experimental performance on goal/corner event detection and sports/commercials/building concepts extraction from soccer videos and TRECVID news collections demonstrates the effectiveness of the proposed framework. Furthermore, its unique domain-free characteristic indicates the great potential of extending the proposed multimedia data mining framework to a wide range of different application domains.

Book
30 Sep 2008
TL;DR: This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.
Abstract: The massive daily overflow of electronic data to information seekers creates the need for better ways to digest and organize this information to make it understandable and useful. Text mining, a variation of data mining, extracts desired information from large, unstructured text collections stored in electronic forms. The Handbook of Research on Text and Web Mining Technologies is the first comprehensive reference to the state of research in the field of text mining, serving a pivotal role in educating practitioners in the field. This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.

Journal ArticleDOI
01 Jul 2008
TL;DR: By shifting the concept of k-anonymity from the source data to the extracted patterns, this paper formally characterize the notion of a threat to anonymity in the context of pattern discovery, and provides a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns.
Abstract: It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case of the minimum support threshold in frequent pattern mining. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context of pattern discovery, and provide a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that allows the disclosure of the extracted knowledge while protecting the anonymity of the individuals in the source database. Moreover, in order to handle the cases where the threats to anonymity cannot be avoided, we study how to eliminate such threats by means of pattern (not data!) distortion performed in a controlled way.
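A deliberately simplified version of the threat being formalized: if an itemset I is a proper subset of J and supp(I) - supp(J) is positive but below k, the two published supports together single out fewer than k individuals (those supporting I but not J). The sketch below checks only such pairwise cases; the paper's inference channels are more general, and the data are made up:

```python
# A simplified sketch of the anonymity threat discussed above: if itemset I is
# a proper subset of itemset J and supp(I) - supp(J) is positive but smaller
# than k, the patterns jointly single out a group of fewer than k individuals.
# Only pairwise subset relations are checked here.
from itertools import combinations

def pairwise_threats(supports: dict, k: int):
    """supports maps frozenset itemsets to absolute support counts."""
    threats = []
    for I, J in combinations(supports, 2):
        small, big = (I, J) if len(I) < len(J) else (J, I)
        if small < big:                      # proper subset
            diff = supports[small] - supports[big]
            if 0 < diff < k:
                threats.append((set(small), set(big), diff))
    return threats

supports = {frozenset({"a"}): 50,
            frozenset({"a", "b"}): 47,       # 3 people have a but not b
            frozenset({"c"}): 30,
            frozenset({"c", "d"}): 10}
print(pairwise_threats(supports, k=5))       # flags ({'a'}, {'a', 'b'}, 3)
```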

Book
26 Sep 2008
TL;DR: In the book, the author underlines the importance of approximation spaces in searching for relevant patterns and other granules on different levels of modeling for compound concept approximations.
Abstract: The book "Rough-Granular Computing in Knowledge Discovery and Data Mining" written by Professor Jaroslaw Stepaniuk is dedicated to methods based on a combination of the following three closely related and rapidly growing areas: granular computing, rough sets, and knowledge discovery and data mining (KDD). In the book, the KDD foundations based on the rough set approach and granular computing are discussed together with illustrative applications. In searching for relevant patterns or in inducing (constructing) classifiers in KDD, different kinds of granules are modeled. In this modeling process, granules called approximation spaces play a special role. Approximation spaces are defined by neighborhoods of objects and measures between sets of objects. In the book, the author underlines the importance of approximation spaces in searching for relevant patterns and other granules on different levels of modeling for compound concept approximations. Calculi on such granules are used for modeling computations on granules in searching for target (sub)optimal granules and their interactions on different levels of hierarchical modeling. The methods based on the combination of granular computing and the rough and fuzzy set approaches allow for an efficient construction of high-quality approximations of compound concepts.
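For readers unfamiliar with the rough-set vocabulary used above, a minimal Python sketch of the basic constructions (indiscernibility blocks and the lower/upper approximations of a target set) follows; the toy table and attribute names are mine:

```python
# A minimal sketch of the rough-set building blocks the book builds on:
# an indiscernibility relation over selected attributes, and the lower and
# upper approximations of a target set of objects.
def partition(table, attrs):
    """Group object ids by their values on the chosen attributes."""
    blocks = {}
    for obj, row in table.items():
        blocks.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return list(blocks.values())

def approximations(table, attrs, target):
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:
            lower |= block          # blocks certainly inside the target
        if block & target:
            upper |= block          # blocks possibly inside the target
    return lower, upper

table = {1: {"temp": "high", "cough": "yes"},
         2: {"temp": "high", "cough": "yes"},
         3: {"temp": "low",  "cough": "no"},
         4: {"temp": "low",  "cough": "yes"}}
flu = {1, 3}
print(approximations(table, ["temp", "cough"], flu))   # lower={3}, upper={1, 2, 3}
```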

Book
15 Jan 2008
TL;DR: Rough Ethology: Towards a Biologically-Inspired Study of Collective Behavior in Intelligent Systems with Approximation Spaces and Information Granulation.
Abstract: Table of contents. Regular Papers: Flow Graphs and Data Mining; The Rough Set Exploration System; Rough Validity, Confidence, and Coverage of Rules in Approximation Spaces; Knowledge Extraction from Intelligent Electronic Devices; Processing of Musical Data Employing Rough Sets and Artificial Neural Networks; Computational Intelligence in Bioinformatics; Rough Ethology: Towards a Biologically-Inspired Study of Collective Behavior in Intelligent Systems with Approximation Spaces; Approximation Spaces and Information Granulation; The Rough Set Database System: An Overview; Rough Sets and Bayes Factor; Formal Concept Analysis and Rough Set Theory from the Perspective of Finite Topological Approximations. Dissertations and Monographs: Time Complexity of Decision Trees.

Journal ArticleDOI
TL;DR: The knowledge access control model proposed in this study can facilitate VE knowledge management and sharing across enterprises, enhance knowledge-sharing security and flexibility, and regulate knowledge sharing to expeditiously reflect changes in the business environment.

Journal IssueDOI
TL;DR: It is shown that a mutual reinforcement relationship between ranking and Web-snippet clustering does exist, and the better the ranking of the underlying search engines, the more relevant the results from which SnakeT distills the hierarchy of labeled folders, and hence the more useful this hierarchy is to the user.
Abstract: We propose a (meta-)search engine, called SnakeT (SNippet Aggregation for Knowledge ExtracTion), which queries more than 18 commodity search engines and offers two complementary views on their returned results. One is the classical flat-ranked list, the other consists of a hierarchical organization of these results into folders created on-the-fly at query time and labeled with intelligible sentences that capture the themes of the results contained in them. Users can browse this hierarchy with various goals: knowledge extraction, query refinement and personalization of search results. In this novel form of personalization, the user is requested to interact with the hierarchy by selecting the folders whose labels (themes) best fit her query needs. SnakeT then personalizes on-the-fly the original ranked list by filtering out those results that do not belong to the selected folders. Consequently, this form of personalization is carried out by the users themselves and thus results fully adaptive, privacy preserving, scalable and non-intrusive for the underlying search engines. We have extensively tested SnakeT and compared it against the best available Web-snippet clustering engines. SnakeT is efficient and effective, and shows that a mutual reinforcement relationship between ranking and Web-snippet clustering does exist. In fact, the better the ranking of the underlying search engines, the more relevant the results from which SnakeT distills the hierarchy of labeled folders, and hence the more useful this hierarchy is to the user. Vice versa, the more intelligible the folder hierarchy, the more effective the personalization offered by SnakeT on the ranking of the query results. Copyright © 2007 John Wiley & Sons, Ltd. This work was done while the second author was a PhD student at the Dipartimento di Informatica, University of Pisa. The work contains the complete description and a full set of experiments on the software system SnakeT, which was partially published in the Proceedings of the 14th International World Wide Web Conference, Chiba, Japan, 2005

Journal ArticleDOI
01 Mar 2008
TL;DR: A mathematical programming model is proposed that addresses speed and scalability issues in data mining and knowledge discovery; the model is applied to credit classification problems, and the theoretical relationship between the proposed MCQP model and SVM is discussed.
Abstract: Speed and scalability are two essential issues in data mining and knowledge discovery. This paper proposes a mathematical programming model that addresses these two issues and applies the model to credit classification problems. The proposed Multi-criteria Convex Quadratic Programming (MCQP) model is highly efficient (computing time complexity O(n^1.5) to O(n^2)) and scalable to massive problems (of size O(10^9)) because it only needs to solve linear equations to find the global optimal solution. Kernel functions were introduced to the model to solve nonlinear problems. In addition, the theoretical relationship between the proposed MCQP model and SVM is discussed.
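For reference only, the convex quadratic program that the abstract compares against is the standard soft-margin SVM primal, stated below in LaTeX; the authors' MCQP model differs in that it optimizes weighted multiple criteria rather than this single objective, and no claim is made here about its exact form:

```latex
\begin{aligned}
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^2 \;+\; C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \;\ge\; 1 - \xi_i, \qquad i = 1,\dots,n, \\
& \xi_i \;\ge\; 0, \qquad i = 1,\dots,n.
\end{aligned}
```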

Proceedings ArticleDOI
24 Aug 2008
TL;DR: Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy, and the proposed algorithm identifies biomarkers that play more significant roles than others in AD diagnosis.
Abstract: Effective diagnosis of Alzheimer's disease (AD) is of primary importance in biomedical research. Recent studies have demonstrated that neuroimaging parameters are sensitive and consistent measures of AD. In addition, genetic and demographic information have also been successfully used for detecting the onset and progression of AD. The research so far has mainly focused on studying one type of data source only. It is expected that the integration of heterogeneous data (neuroimages, demographic, and genetic measures) will improve the prediction accuracy and enhance knowledge discovery from the data, such as the detection of biomarkers. In this paper, we propose to integrate heterogeneous data for AD prediction based on a kernel method. We further extend the kernel framework for selecting features (biomarkers) from heterogeneous data sources. The proposed method is applied to a collection of MRI data from 59 normal healthy controls and 59 AD patients. The MRI data are pre-processed using tensor factorization. In this study, we treat the complementary voxel-based data and region of interest (ROI) data from MRI as two data sources, and attempt to integrate the complementary information by the proposed method. Experimental results show that the integration of multiple data sources leads to a considerable improvement in the prediction accuracy. Results also show that the proposed algorithm identifies biomarkers that play more significant roles than others in AD diagnosis.
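A minimal sketch of the general kernel-combination idea described above: compute one Gram matrix per data source and train a single precomputed-kernel SVM on their weighted sum. The data, weights, and kernel choices below are placeholders, and the paper's biomarker-selection extension is not shown:

```python
# A minimal sketch of combining heterogeneous data sources through kernels:
# build one Gram matrix per source and train a single classifier on their
# weighted sum. All data here are random stand-ins for illustration only.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
n = 60
voxels = rng.normal(size=(n, 100))        # stand-in for voxel-based measures
rois = rng.normal(size=(n, 20))           # stand-in for ROI measures
genetics = rng.integers(0, 3, size=(n, 5)).astype(float)
y = rng.integers(0, 2, size=n)            # random labels, illustration only

kernels = [rbf_kernel(voxels), rbf_kernel(rois), rbf_kernel(genetics)]
weights = [0.5, 0.3, 0.2]                 # could be tuned by cross-validation
K = sum(w * k for w, k in zip(weights, kernels))

clf = SVC(kernel="precomputed").fit(K, y)
print("train accuracy:", clf.score(K, y))
```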

Journal ArticleDOI
TL;DR: Relations of attribute reduction between object-oriented and property-oriented formal concept lattices are discussed, and it is shown that attribute reducts and attribute characteristics in the two concept lattices are the same, based on new approaches to attribute reduction by means of irreducible elements.
Abstract: As one of the basic problems of knowledge discovery and data analysis, knowledge reduction can make the discovery of implicit knowledge in data easier and the representation simpler. In this paper, relations of attribute reduction between object-oriented and property-oriented formal concept lattices are discussed, and it is shown, based on new approaches to attribute reduction by means of irreducible elements, that attribute reducts and attribute characteristics in the two concept lattices are the same. This turns out to be meaningful and effective in dealing with knowledge reduction, as attribute reducts and attribute characteristics in the object-oriented and property-oriented formal concept lattices can be obtained by investigating only one of the two concept lattices.
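As background for the lattice machinery discussed above, a minimal Python sketch of a formal context and its two derivation operators (from which formal concepts, and hence attribute reducts, are defined) follows; the toy context is mine, and attribute reduction itself is not implemented:

```python
# A minimal sketch of the formal-context machinery behind concept lattices:
# the two derivation operators (objects -> shared attributes, attributes ->
# common objects) from which formal concepts are built.
context = {                      # object -> set of attributes it has
    "o1": {"a", "b"},
    "o2": {"a", "c"},
    "o3": {"b", "c"},
}
attributes = set().union(*context.values())

def intent(objects):
    """Attributes shared by all the given objects."""
    objects = set(objects)
    return {m for m in attributes if all(m in context[g] for g in objects)}

def extent(attrs):
    """Objects that have all the given attributes."""
    attrs = set(attrs)
    return {g for g, row in context.items() if attrs <= row}

# A pair (A, B) with extent(B) == A and intent(A) == B is a formal concept:
A = extent({"a"})
print(A, intent(A), extent(intent(A)) == A)   # {'o1', 'o2'} {'a'} True
```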

Journal ArticleDOI
TL;DR: A grey-based rough set approach to deal with supplier selection in supply chain management takes advantage of the mathematical analysis power of grey system theory while at the same time utilizing the data mining and knowledge discovery power of rough set theory.
Abstract: In this paper, we propose a grey-based rough set approach to deal with supplier selection in supply chain management. The proposed approach takes advantage of the mathematical analysis power of grey system theory while at the same time utilizing the data mining and knowledge discovery power of rough set theory. It is suitable for decision-making under highly uncertain environments. We also provide a viewpoint on the attribute values in the rough set decision table under the condition that all alternatives are described by linguistic variables that can be expressed in grey numbers. The most suitable supplier can be determined by grey relational analysis based on grey numbers. A case of supplier selection was used to validate the proposed approach.
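To illustrate the grey-system half of the approach, here is a minimal numpy sketch of classical grey relational analysis (normalization, deviation from an ideal reference, grey relational coefficients and grades); the supplier scores and the distinguishing coefficient are made-up placeholders, and the rough-set half of the method is not shown:

```python
# A minimal sketch of grey relational analysis (GRA): normalize criteria,
# measure each alternative's deviation from an ideal reference, and rank by
# the mean grey relational coefficient. Numbers are made up.
import numpy as np

scores = np.array([[0.7, 0.9, 0.6],     # supplier A: price, quality, delivery
                   [0.9, 0.6, 0.8],     # supplier B
                   [0.8, 0.8, 0.7]])    # supplier C
rho = 0.5                               # distinguishing coefficient

# benefit-type normalization to [0, 1]
norm = (scores - scores.min(axis=0)) / (scores.max(axis=0) - scores.min(axis=0))
reference = norm.max(axis=0)            # ideal alternative per criterion
delta = np.abs(reference - norm)        # deviation sequences
coef = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
grades = coef.mean(axis=1)              # grey relational grade per supplier
print(dict(zip("ABC", grades.round(3))))  # higher grade = better supplier
```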

Journal ArticleDOI
TL;DR: In this paper, strong consistence and weak consistence of decision formal context are defined respectively, the judgment theorems of consistent sets are examined, and approaches to reduction are given.
Abstract: The theory of concept lattices is an efficient tool for knowledge representation and knowledge discovery, and is applied to many fields successfully. One focus of knowledge discovery is knowledge reduction. Based on the reduction theory of classical formal context, this paper proposes the definition of decision formal context and its reduction theory, which extends the reduction theory of concept lattices. In this paper, strong consistence and weak consistence of decision formal context are defined respectively. For strongly consistent decision formal context, the judgment theorems of consistent sets are examined, and approaches to reduction are given. For weakly consistent decision formal context, implication mapping is defined, and its reduction is studied. Finally, the relation between reducts of weakly consistent decision formal context and reducts of implication mapping is discussed.

Journal IssueDOI
TL;DR: A new methodology is described that makes the design process of interpretable knowledge bases easier by considering both expert knowledge and knowledge extracted from data, with accuracy comparable to that achieved by other methodologies.
Abstract: This work describes a new methodology for making the design process of interpretable knowledge bases easier. It considers both expert knowledge and knowledge extracted from data. The combination of both kinds of knowledge is likely to yield robust, compact systems with a good trade-off between accuracy and interpretability. Fuzzy logic offers an integration framework where both types of knowledge are represented using the same formalism. However, as two knowledge bases may convey contradictions and/or redundancies, the integration process must be carried out carefully. Results obtained on four well-known benchmark classification problems show that our methodology leads to highly interpretable knowledge bases with good accuracy, comparable to that achieved by other methodologies. © 2008 Wiley Periodicals, Inc.

Journal ArticleDOI
TL;DR: This article suggests four kernels (predicate, walk, dependency and hybrid) to adequately encapsulate the information required for relation prediction based on the sentential structures involving two entities, and views the dependency structure of a sentence as a graph, which allows the system to isolate the essential part of the complex syntactic structure by finding the shortest path between entities.
Abstract: Motivation: Automatic knowledge discovery and efficient information access such as named entity recognition and relation extraction between entities have recently become critical issues in the biomedical literature. However, the inherent difficulty of the relation extraction task, mainly caused by the diversity of natural language, is further compounded in the biomedical domain because biomedical sentences are commonly long and complex. In addition, relation extraction often involves modeling long range dependencies, discontiguous word patterns and semantic relations for which the pattern-based methodology is not directly applicable. Results: In this article, we shift the focus of biomedical relation extraction from the problem of pattern extraction to the problem of kernel construction. We suggest four kernels: predicate, walk, dependency and hybrid kernels to adequately encapsulate information required for a relation prediction based on the sentential structures involved in two entities. For this purpose, we view the dependency structure of a sentence as a graph, which allows the system to deal with an essential one from the complex syntactic structure by finding the shortest path between entities. The kernels we suggest are augmented gradually from the flat features descriptions to the structural descriptions of the shortest paths. As a result, we obtain a very promising result, a 77.5 F-score with the walk kernel on the Language Learning in Logic (LLL) 05 genic interaction shared task. Availability: The used algorithms are free for use for academic research and are available from our Web site http://mllab.sogang.ac.kr/~shkim/LLL05.tar.gz. Contact: shkim@lex.yonsei.ac.kr
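To make the graph view concrete, a minimal Python sketch follows that recovers the shortest path between two entity tokens in a hand-written toy dependency graph via breadth-first search; the paper's kernels are then computed over features of such paths, which is not reproduced here:

```python
# A minimal sketch of the graph view described above: treat a sentence's
# dependency parse as an undirected graph and recover the shortest path
# between two entity tokens with BFS. The toy parse below is hand-written,
# not the output of a real parser.
from collections import deque

def shortest_path(edges, start, goal):
    graph = {}
    for u, v in edges:                       # undirected adjacency
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# "GerE stimulates transcription of cotD" -- hand-written dependency edges
deps = [("stimulates", "GerE"), ("stimulates", "transcription"),
        ("transcription", "of"), ("of", "cotD")]
print(shortest_path(deps, "GerE", "cotD"))
# ['GerE', 'stimulates', 'transcription', 'of', 'cotD']
```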