
Showing papers in "Knowledge Based Systems in 2008"


Journal ArticleDOI
TL;DR: Comparative results show that, compared with other approaches for dealing with incomplete data, the approaches presented in this paper better reflect the actual states of incomplete data in soft sets.
Abstract: In view of the particularity of the value domains of mapping functions in soft sets, this paper presents data analysis approaches for soft sets under incomplete information. For standard soft sets, the decision value of an object with incomplete information is calculated as a weighted average of all possible choice values of the object, where the weight of each possible choice value is determined by the distribution of the other objects. For fuzzy soft sets, incomplete data are predicted based on the method of average probability. Comparative results show that, compared with other approaches for dealing with incomplete data, the approaches presented in this paper better reflect the actual states of incomplete data in soft sets. Finally, an example is provided to illustrate the practicability and validity of the data analysis approach for soft sets under incomplete information.

403 citations
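As an illustration, the snippet below sketches one plausible reading of the weighted-average idea described above, assuming a 0/1 tabular representation of a standard soft set with None marking unknown entries: an unknown entry contributes its expected value, estimated from the distribution of the known values of the other objects on the same parameter. Table and helper names are illustrative, not from the paper.

```python
# Sketch of weighted-average choice values for an incomplete standard soft set.
# Assumption: rows are objects, columns are parameters, entries are 1, 0 or None (unknown).

def choice_values(table):
    n_params = len(table[0])
    # Probability that a parameter holds, estimated from the objects whose value is known.
    p_one = []
    for j in range(n_params):
        known = [row[j] for row in table if row[j] is not None]
        p_one.append(sum(known) / len(known) if known else 0.0)

    values = []
    for row in table:
        # Known entries contribute directly; unknown entries contribute their expected value.
        values.append(sum(p_one[j] if v is None else v for j, v in enumerate(row)))
    return values

if __name__ == "__main__":
    soft_set = [
        [1, 0, 1, None],
        [1, 1, None, 0],
        [0, 1, 1, 1],
    ]
    print(choice_values(soft_set))
```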


Journal ArticleDOI
Guiwu Wei1
TL;DR: An optimization model based on the maximizing deviation method is established, by which the attribute weights can be determined; another optimization model is established for the special situation where the information about attribute weights is completely unknown.
Abstract: With respect to multiple attribute decision making problems with intuitionistic fuzzy information, some operational laws of intuitionistic fuzzy numbers, together with their score and accuracy functions, are introduced. An optimization model based on the maximizing deviation method is established, by which the attribute weights can be determined. For the special situation where the information about attribute weights is completely unknown, we establish another optimization model; by solving this model, we obtain a simple and exact formula that can be used to determine the attribute weights. We utilize the intuitionistic fuzzy weighted averaging (IFWA) operator to aggregate the intuitionistic fuzzy information corresponding to each alternative, and then rank the alternatives and select the most desirable one(s) according to the score function and accuracy function. Finally, an illustrative example is given to verify the developed approach and to demonstrate its practicality and effectiveness.

290 citations
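The score and accuracy functions and the IFWA operator mentioned above have standard closed forms; the sketch below applies them to rank alternatives once attribute weights are given. The weights and attribute values are illustrative placeholders.

```python
# Intuitionistic fuzzy number: pair (mu, nu) with mu + nu <= 1.
# Standard definitions: score s = mu - nu, accuracy h = mu + nu,
# IFWA_w(a_1..a_n) = (1 - prod (1 - mu_j)^w_j, prod nu_j^w_j).

from math import prod

def ifwa(values, weights):
    mu = 1.0 - prod((1.0 - m) ** w for (m, _), w in zip(values, weights))
    nu = prod(n ** w for (_, n), w in zip(values, weights))
    return mu, nu

def score(a):
    return a[0] - a[1]

def accuracy(a):
    return a[0] + a[1]

if __name__ == "__main__":
    weights = [0.3, 0.5, 0.2]                      # attribute weights (illustrative)
    alternatives = {                               # attribute values per alternative
        "A1": [(0.6, 0.3), (0.5, 0.4), (0.7, 0.2)],
        "A2": [(0.4, 0.4), (0.8, 0.1), (0.5, 0.3)],
    }
    agg = {k: ifwa(v, weights) for k, v in alternatives.items()}
    # Rank by score, breaking ties by accuracy.
    ranking = sorted(agg, key=lambda k: (score(agg[k]), accuracy(agg[k])), reverse=True)
    print(ranking, agg)
```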


Journal ArticleDOI
TL;DR: A practical method is proposed to implement multi-word extraction from documents based on syntactical structure, and two strategies, general concept representation and subtopic representation, are presented to represent documents using the extracted multi-words, in order to investigate the effectiveness of multi-word text representation on the performance of text classification.
Abstract: One of the main themes that support text mining is text representation; that is, its task is to look for appropriate terms to transform documents into numerical vectors. Recently, much effort has been invested in this topic to enrich text representation using the vector space model (VSM) in order to improve the performance of text mining techniques such as text classification and text clustering. The main concern in this paper is to investigate the effectiveness of using multi-words for text representation on the performance of text classification. Firstly, a practical method is proposed to implement multi-word extraction from documents based on syntactical structure. Secondly, two strategies, general concept representation and subtopic representation, are presented to represent the documents using the extracted multi-words. In particular, the dynamic k-mismatch is proposed to determine the presence of a long multi-word which is a subtopic of the content of a document. Finally, we carried out a series of experiments on classifying the Reuters-21578 documents using the representations with multi-words. We used the representation based on individual words as the baseline, which has the largest feature-set dimension for representation without linguistic preprocessing. Moreover, the linear kernel and the non-linear polynomial kernel in support vector machines (SVM) are examined comparatively for classification to investigate the effect of kernel type on performance. Index terms with low information gain (IG) are removed from the feature set at different percentages to observe the robustness of each classification method. Our experiments demonstrate that, in multi-word representation, subtopic representation outperforms general concept representation, and the linear kernel outperforms the non-linear kernel of SVM in classifying the Reuters data. The effect of applying different representation strategies is greater than the effect of applying different SVM kernels on classification performance. Furthermore, the representation using individual words outperforms any representation using multi-words. This is consistent with the prevailing opinions concerning the role of linguistic preprocessing of documents' features when using SVM for text classification.

239 citations
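For context, the sketch below reproduces only the classification setup compared in the paper (linear vs. polynomial SVM kernels on a vector-space text representation) using scikit-learn; it does not implement the multi-word extraction itself, and the toy corpus and labels are placeholders for the Reuters-21578 data.

```python
# Compare linear vs. polynomial SVM kernels on a TF-IDF text representation (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

docs = ["grain exports rise", "oil prices fall", "wheat harvest grows",
        "crude oil supply drops", "corn and grain futures climb", "petroleum output cut"]
labels = [0, 1, 0, 1, 0, 1]   # 0 = grain, 1 = oil (toy stand-in for Reuters-21578 topics)

X = TfidfVectorizer().fit_transform(docs)

for kernel in ("linear", "poly"):
    clf = SVC(kernel=kernel, degree=2)
    acc = cross_val_score(clf, X, labels, cv=3).mean()
    print(kernel, round(acc, 3))
```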


Journal ArticleDOI
TL;DR: A greedy attribute reduction algorithm is constructed by generalizing Pawlak's rough set model, where objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations.
Abstract: Feature subset selection presents a common challenge for applications where data with tens or hundreds of features are available. Existing feature selection algorithms are mainly designed for dealing with numerical or categorical attributes. However, data usually come in a mixed format in real-world applications. In this paper, we generalize Pawlak's rough set model into a δ neighborhood rough set model and a k-nearest-neighbor rough set model, where objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations. The induced information granules are then used to approximate the decision with lower and upper approximations. We compute the lower approximations of the decision to measure the significance of attributes. Based on the proposed models, we give the definition of significance of mixed features and construct a greedy attribute reduction algorithm. We compare the proposed algorithm with others in terms of the number of selected features and classification performance. Experiments show the proposed technique is effective.

214 citations
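A compact sketch of the greedy idea under a simplified δ-neighborhood reading: a sample belongs to the positive region (lower approximation) of the decision if every sample within δ on the selected numerical features, and equal on the selected categorical features, shares its class; features are added greedily while this dependency increases. The Chebyshev neighborhood, δ value, and function names are assumptions, and numerical features are assumed scaled to [0, 1].

```python
import numpy as np

def positive_region_fraction(X_num, X_cat, y, num_idx, cat_idx, delta=0.2):
    """Fraction of samples whose delta-neighborhood (on the chosen features) is pure in class."""
    n = len(y)
    pos = 0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        if num_idx:
            d = np.abs(X_num[:, num_idx] - X_num[i, num_idx]).max(axis=1)  # Chebyshev distance
            mask &= d <= delta
        if cat_idx:
            mask &= (X_cat[:, cat_idx] == X_cat[i, cat_idx]).all(axis=1)
        if np.all(y[mask] == y[i]):
            pos += 1
    return pos / n

def greedy_reduct(X_num, X_cat, y, delta=0.2):
    """Greedy forward selection of mixed features by dependency (positive-region) increase."""
    num_sel, cat_sel, best = [], [], 0.0
    candidates = [("num", j) for j in range(X_num.shape[1])] + \
                 [("cat", j) for j in range(X_cat.shape[1])]
    improved = True
    while improved and candidates:
        improved = False
        scores = []
        for kind, j in candidates:
            ni = num_sel + [j] if kind == "num" else num_sel
            ci = cat_sel + [j] if kind == "cat" else cat_sel
            scores.append(positive_region_fraction(X_num, X_cat, y, ni, ci, delta))
        k = int(np.argmax(scores))
        if scores[k] > best:
            best = scores[k]
            kind, j = candidates.pop(k)
            (num_sel if kind == "num" else cat_sel).append(j)
            improved = True
    return num_sel, cat_sel, best
```

Given numerical features scaled to [0, 1] and label-encoded categorical features, greedy_reduct returns the indices of the selected numerical and categorical attributes together with the achieved dependency.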


Journal ArticleDOI
TL;DR: A data mining method combining attribute-oriented induction, information gain, and decision trees is put forward, which is suitable for preprocessing financial data and constructing decision tree models for financial distress prediction.
Abstract: Data mining techniques are capable of extracting valuable knowledge from large and changing databases. This paper puts forward a data mining method combining attribute-oriented induction, information gain, and decision trees, which is suitable for preprocessing financial data and constructing decision tree models for financial distress prediction. On the basis of financial ratio attributes and one class attribute, and adopting an entropy-based discretization method, a data mining model for listed companies' financial distress prediction is designed. An empirical experiment with 35 financial ratios and 135 pairs of listed companies as initial samples obtained satisfactory results, which testify to the feasibility and validity of the proposed data mining method for listed companies' financial distress prediction.

159 citations
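The entropy and information-gain machinery the decision-tree step relies on has a standard form; the sketch below computes it for an already-discretized attribute. The toy financial ratio and labels are illustrative, not the paper's data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the samples on a (discretized) attribute."""
    n = len(labels)
    base = entropy(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

if __name__ == "__main__":
    # Toy example: a discretized financial ratio vs. a distress label.
    ratio_bin = ["low", "low", "high", "high", "low", "high"]
    distress  = [1, 1, 0, 0, 1, 0]
    print(information_gain(ratio_bin, distress))   # 1.0: the split separates classes perfectly
```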


Journal ArticleDOI
TL;DR: Based on a review of the literature on knowledge management in enterprise system implementation projects, two major areas of concern are identified regarding the management of knowledge in this specific type of project: managing tacit knowledge, and issues regarding the process-based nature of organizational knowledge viewed through the lens of organizational memory.
Abstract: Special attention to critical success factors in the implementation of Enterprise Resource Planning systems is evident from the bulk of literature on this issue. In order to implement these systems, which are aimed at improving the sharing of enterprise-wide information and knowledge, organizations must have the capability of effective knowledge sharing to start with. Based on a review of the literature on knowledge management in enterprise system implementation projects, this paper identifies two major areas of concern regarding the management of knowledge in this specific type of project: managing tacit knowledge, and issues regarding the process-based nature of organizational knowledge viewed through the lens of organizational memory. The more capable an organization is in handling these issues, the more likely it is that the implementation will result in competitive advantage for the organization. The competitive advantage arises from the organization's capabilities in internalizing and integrating the adopted processes with the existing knowledge paradigms and harmonizing the new system and the organizational culture towards getting the most out of the implementation effort.

135 citations


Journal ArticleDOI
TL;DR: Empirical results indicate that ROCBR outperforms ECBR, MCBR, ICBR, MDA, and Logit significantly in financial distress prediction of Chinese listed companies 1 year prior to distress, if irrelevant information among features has been handled effectively.
Abstract: This paper addresses a new method of financial distress prediction using case-based reasoning (CBR) with financial ratios derived from financial statements. The aim of the work presented here is threefold. First, we give a brief review of financial distress prediction in terms of the earliest applied models, models that generate If-Then rules, the most widely applied models historically, the most hotly researched models recently, and the most promising models. Second, we make use of ranking-order information of the distance between the target case and each historical case on each feature to generate similarities between pairwise cases. The similarity between two cases on each feature is calculated from the corresponding ranking-order information of distance, followed by a weighted integration to generate the final similarity between the two cases. The CBR system that employs the new similarity measure within the frame of k-nearest neighbors (k-NN) is named ranking-order case-based reasoning (ROCBR). Third, we apply ROCBR to financial distress prediction and analyze the obtained results for Chinese listed companies, comparing them with those provided by three other well-known CBR models that use Euclidean distance, Manhattan distance, and an inductive approach as the heart of retrieval. The three compared CBR models are called ECBR, MCBR, and ICBR, respectively. The two well-known statistical models of logistic regression (Logit) and multivariate discriminant analysis (MDA) are also employed for comparison. The financial distress dataset used in the experiments comes from the Shanghai Stock Exchange and the Shenzhen Stock Exchange. Empirical results indicate that ROCBR outperforms ECBR, MCBR, ICBR, MDA, and Logit significantly in financial distress prediction of Chinese listed companies 1 year prior to distress, provided that irrelevant information among features has been handled effectively.

131 citations
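A sketch of the ranking-order similarity idea as described: on each feature, historical cases are ranked by their distance to the target case, the rank is mapped to a per-feature similarity, and the weighted per-feature similarities are combined before k-NN retrieval. The rank-to-similarity mapping, weights, and data below are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def ranking_order_similarity(target, cases, weights):
    """Per-case similarity built from ranking-order information of per-feature distances."""
    n_cases, n_feat = cases.shape
    sim = np.zeros(n_cases)
    for j in range(n_feat):
        d = np.abs(cases[:, j] - target[j])
        rank = np.argsort(np.argsort(d))                 # rank 0 = closest case on this feature
        feat_sim = 1.0 - rank / max(n_cases - 1, 1)      # illustrative rank-to-similarity map
        sim += weights[j] * feat_sim
    return sim

def knn_predict(target, cases, labels, weights, k=3):
    sim = ranking_order_similarity(target, cases, weights)
    top = np.argsort(-sim)[:k]
    return int(round(np.mean(labels[top]))), top

if __name__ == "__main__":
    cases = np.array([[0.2, 1.5], [0.3, 1.2], [0.9, 0.4], [0.8, 0.5]])  # toy financial ratios
    labels = np.array([0, 0, 1, 1])                                      # 1 = distressed
    weights = np.array([0.6, 0.4])
    print(knn_predict(np.array([0.85, 0.45]), cases, labels, weights))
```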


Journal ArticleDOI
TL;DR: A post-aggregation strategy with an overall rank loss function is proposed to arrive at candidate materials that show the most stable and the highest ranks in a list of given candidates.
Abstract: Towards the end of a design process, designers may face a number of candidate materials with different attributes that are difficult to distinguish with the aid of available databases. In such situations, material selection of sensitive components is perhaps one of the most challenging problems in the design of structural elements in industries such as aerospace. The selection process is often realized as a team-work task to enhance the reliability of the chosen material. During group decision making, however, separations in design preferences can be encountered. Furthermore, there may be uncertainties in each designer's mind with regard to expressing his/her preferences over design criteria. This paper, using a revised Simos' method with the ELECTRE III optimization model, is an attempt to provide a decision aid framework that accounts for both of these effects. A post-aggregation strategy with an overall rank loss function is proposed to arrive at candidate materials that show the most stable and the highest ranks in a list of given candidates. To show the applicability of the approach, a sample case study on the material selection of a thermally loaded conductor cover sheet is conducted and validated against an available database. It is also illustrated that, using ELECTRE III, a non-compensatory aspect of material selection can be assumed while attaining a reasonable sensitivity to weight fluctuations.

130 citations


Journal ArticleDOI
TL;DR: It is proved that frequent itemsets that include the representative item and have the same support as the representative item can be identified directly by connecting the representative item with all the combinations of items in its subsume index, so the cost of processing this kind of itemset is lowered and efficiency is improved.
Abstract: Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a BitTable structure. BitTableFI is a recently proposed efficient BitTable-based algorithm which exploits the BitTable both horizontally and vertically. Although it makes use of efficient bitwise operations, BitTableFI may still suffer from the high cost of candidate generation and testing. To address this problem, a new algorithm, Index-BitTableFI, is proposed. Index-BitTableFI also uses the BitTable horizontally and vertically. To make use of the BitTable horizontally, an index array and the corresponding computing method are proposed. By computing the subsume index, those itemsets that co-occur with the representative item can be identified quickly using breadth-first search in a single pass. Then, for the resulting itemsets generated through the index array, a depth-first search strategy is used to generate all other frequent itemsets. Thus, a hybrid search is implemented and the search space is reduced greatly. The advantages of the proposed method are as follows. On the one hand, redundant operations on tidset intersection and frequency checking can largely be avoided; on the other hand, it is proved that frequent itemsets that include the representative item and have the same support as the representative item can be identified directly by connecting the representative item with all the combinations of items in its subsume index. Thus, the cost of processing this kind of itemset is lowered and efficiency is improved. Experimental results show that the proposed algorithm is efficient, especially for dense datasets.

124 citations
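To make the BitTable idea concrete, the short sketch below encodes each item's tid-list as a bitmap (one bit per transaction), so the support of an itemset is the popcount of the bitwise AND of its items' bitmaps; the index-array and subsume-index machinery of Index-BitTableFI itself is not reproduced here, and the tiny transaction database is illustrative.

```python
# Bitmap (BitTable-style) support counting: one bit per transaction, AND + popcount per itemset.

def build_bitmaps(transactions):
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(itemset, bitmaps, n_transactions):
    bits = (1 << n_transactions) - 1
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bits.bit_count()          # Python 3.10+; use bin(bits).count("1") otherwise

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "c"}, {"b", "c", "d"}, {"a", "b", "c"}]
    bm = build_bitmaps(db)
    print(support({"a", "c"}, bm, len(db)))   # 3
```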


Journal ArticleDOI
Zeshui Xu1
TL;DR: The concept of dynamic weighted averaging (DWA) operator is defined, and some methods are introduced to obtain the weights associated with the DWA operator to solve the MP-MADM problems where all the attribute values provided at different periods are expressed in interval numbers.
Abstract: Multiple attribute decision making (MADM) is an important part of modern decision science. It has been extensively applied to various areas such as society, economics, military and management, and has been receiving more and more attention over the last decades. To date, however, most research has focused on single-period multi-attribute decision making, in which all the original decision information is given at the same period, and a number of methods have been proposed to solve this kind of problem. This paper is devoted to investigating the multi-period multi-attribute decision making (MP-MADM) problems where the decision information (including attribute weights and attribute values) is provided by decision maker(s) at different periods. We define the concept of the dynamic weighted averaging (DWA) operator, and introduce some methods, such as the arithmetic series based method, the geometric series based method and the normal distribution based method, to obtain the weights associated with the DWA operator. Based on the DWA operator, we develop an approach to MP-MADM. Moreover, we extend the DWA operator and the developed approach to solve MP-MADM problems where all the attribute values provided at different periods are expressed as interval numbers, and use a possibility-degree formula to rank and select the given alternatives.

124 citations
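A minimal sketch of a dynamic weighted averaging (DWA) aggregation across periods. The time-weight vector here is generated with a simple arithmetic-series scheme (weights proportional to the period index, so recent periods weigh more), which is only one of the weighting methods the paper discusses and may differ from its exact series; the values are illustrative.

```python
def arithmetic_series_weights(p):
    """Increasing time weights w_t proportional to t, normalised to sum to 1."""
    total = p * (p + 1) / 2
    return [t / total for t in range(1, p + 1)]

def dwa(values_over_periods, time_weights):
    """Dynamic weighted average of an attribute value collected at p periods."""
    return sum(w * v for w, v in zip(time_weights, values_over_periods))

if __name__ == "__main__":
    periods = 4
    w = arithmetic_series_weights(periods)          # [0.1, 0.2, 0.3, 0.4]
    print(dwa([0.55, 0.60, 0.70, 0.80], w))         # recent periods dominate the aggregate
```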


Journal ArticleDOI
TL;DR: This paper proposes a personalization strategy that overcomes drawbacks in recommender systems by applying inference techniques borrowed from the Semantic Web, and illustrates its use in AVATAR, a tool that selects appealing audiovisual programs from among the myriad available in Digital TV.
Abstract: Recommender systems arose with the goal of helping users search in overloaded information domains (like e-commerce, e-learning or Digital TV). These tools automatically select items (commercial products, educational courses, TV programs, etc.) that may be appealing to each user taking into account his/her personal preferences. The personalization strategies used to compare these preferences with the available items suffer from well-known deficiencies that reduce the quality of the recommendations. Most of the limitations arise from using syntactic matching techniques because they miss a lot of useful knowledge during the recommendation process. In this paper, we propose a personalization strategy that overcomes these drawbacks by applying inference techniques borrowed from the Semantic Web. Our approach reasons about the semantics of items and user preferences to discover complex associations between them. These semantic associations provide additional knowledge about the user preferences, and permit the recommender system to compare them with the available items in a more effective way. The proposed strategy is flexible enough to be applied in many recommender systems, regardless of their application domain. Here, we illustrate its use in AVATAR, a tool that selects appealing audiovisual programs from among the myriad available in Digital TV.

Journal ArticleDOI
TL;DR: A conversational agent, or "chatbot", has been developed to allow the learner to negotiate over the representations held about them using natural language, to support the metacognitive goals of self-assessment and reflection, which are increasingly seen as key to learning and are being incorporated into UK educational policy.
Abstract: This paper describes a system which incorporates natural language technologies, database manipulation and educational theories in order to offer learners a Negotiated Learner Model, for integration into an Intelligent Tutoring System. The system presents the learner with their learner model, offering them the opportunity to compare their own beliefs regarding their capabilities with those inferred by the system. A conversational agent, or "chatbot", has been developed to allow the learner to negotiate over the representations held about them using natural language. The system aims to support the metacognitive goals of self-assessment and reflection, which are increasingly seen as key to learning and are being incorporated into UK educational policy. The paper describes the design of the system and reports a user trial, in which the chatbot was found to support users in increasing the accuracy of their self-assessments and in reducing the number of discrepancies between system and user beliefs in the learner model. Some lessons learned in the development are highlighted, and future research and experimentation directions are outlined.

Journal ArticleDOI
TL;DR: Experimental results show that the models using MBPNN outperform the basic BPNN, and the application of LSA in this system can lead to dramatic dimensionality reduction while achieving good classification results.
Abstract: New text categorization models using a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce the dimensionality as well as improve performance. The basic BPNN learning algorithm has the drawback of slow training speed, so we modify it to accelerate training; categorization accuracy is improved as a consequence. The traditional word-matching based text categorization system uses the vector space model (VSM) to represent documents. However, it needs a high-dimensional space to represent the document and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers the important associative relationships between terms. We test our categorization models on the 20-newsgroups data set; experimental results show that the models using MBPNN outperform the basic BPNN, and that the application of LSA in our system can lead to dramatic dimensionality reduction while achieving good classification results.
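The LSA step described above can be reproduced with a truncated SVD of the TF-IDF term-document matrix; the sketch below pairs it with a standard multilayer perceptron as a stand-in for the (modified) BPNN, using scikit-learn and a placeholder corpus rather than the 20-newsgroups data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD          # LSA = truncated SVD of the term matrix
from sklearn.neural_network import MLPClassifier        # stand-in for the (M)BPNN classifier
from sklearn.pipeline import make_pipeline

docs = ["the rocket launch succeeded", "new graphics card released",
        "space probe reaches orbit", "faster processors announced",
        "astronauts return from station", "chip maker ships new cpu"]
labels = ["space", "hardware", "space", "hardware", "space", "hardware"]

model = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=3),                        # drastic dimensionality reduction
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(docs, labels)
print(model.predict(["shuttle docking scheduled"]))
```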

Journal ArticleDOI
TL;DR: A method for measuring the similarity of FCA concepts is presented, which is a refinement of a previous proposal of the author; the refinement consists in determining the similarity of concept descriptors (attributes) by using the information content approach, rather than relying on human domain expertise.
Abstract: Formal Concept Analysis (FCA) is proving useful in supporting difficult activities that are becoming fundamental to the development of the Semantic Web. Assessing concept similarity is one such activity, since it allows the identification of different concepts that are semantically close. In this paper, a method for measuring the similarity of FCA concepts is presented, which is a refinement of a previous proposal of the author. The refinement consists in determining the similarity of concept descriptors (attributes) by using the information content approach, rather than relying on human domain expertise. The information content approach that has been adopted achieves a higher correlation with human judgement than other proposals in the literature for evaluating concept similarity in a taxonomy.

Journal ArticleDOI
TL;DR: Relations of attribute reduction between object-oriented and property-oriented formal concept lattices are discussed, and it is shown that, based on new approaches to attribute reduction by means of irreducible elements, the attribute reducts and attribute characteristics in the two concept lattices are the same.
Abstract: As one of the basic problems of knowledge discovery and data analysis, knowledge reduction can make the discovery of implicit knowledge in data easier and its representation simpler. In this paper, relations of attribute reduction between object-oriented and property-oriented formal concept lattices are discussed. Based on new approaches to attribute reduction by means of irreducible elements, it is shown that the attribute reducts and attribute characteristics in the two concept lattices are the same. This turns out to be meaningful and effective in dealing with knowledge reduction, as the attribute reducts and attribute characteristics in the object-oriented and property-oriented formal concept lattices can be obtained by investigating only one of the two lattices.

Journal ArticleDOI
TL;DR: Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering, to identify the user model that best identifies fraud cases.
Abstract: This paper investigates the usefulness of applying different learning approaches to a problem of telecommunications fraud detection. Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering. One aim of the study is to identify the user model that best identifies fraud cases. The second task is to explore different views of the same problem and see what can be learned from the application of each different technique. All data come from real defrauded user accounts in a telecommunications network. The models are compared in terms of their performance, and each technique's outcome is evaluated with appropriate measures.
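As a schematic of the two learning views compared (a supervised multilayer perceptron vs. unsupervised hierarchical agglomerative clustering) applied to per-account usage features, the sketch below uses scikit-learn with synthetic placeholder data; real user models and features would replace the random ones.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import AgglomerativeClustering
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic per-account features, e.g. [daily call count, avg duration, intl. call ratio].
normal = rng.normal([20, 3.0, 0.05], [5, 1.0, 0.03], size=(80, 3))
fraud  = rng.normal([90, 1.0, 0.60], [20, 0.5, 0.15], size=(20, 3))
X = np.vstack([normal, fraud])
y = np.array([0] * 80 + [1] * 20)

# Supervised view: multilayer perceptron.
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
print("MLP accuracy:", cross_val_score(mlp, X, y, cv=5).mean())

# Unsupervised view: hierarchical agglomerative clustering into two groups.
clusters = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```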

Journal ArticleDOI
TL;DR: Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool, and that RVM is more suitable than SVM for spam classification in applications that require low complexity.
Abstract: The growth in the number of email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, four machine learning algorithms, namely Naive Bayes (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation of them on benchmark spam filtering corpora is presented. The experiments are performed with different training set sizes and extracted feature sizes. Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool. Generally, the performance of the SVM and RVM classifiers is clearly superior to that of the NB classifier. Compared with SVM, RVM is shown to provide similar classification results with fewer relevance vectors and much faster testing time. Despite the slower learning procedure, RVM is more suitable than SVM for spam classification in applications that require low complexity.
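A small scikit-learn sketch of the kind of comparison reported (Naive Bayes vs. SVM on a bag-of-words spam corpus, with a held-out test split); RVM is omitted because it is not available in scikit-learn, and the corpus here is a toy placeholder for the benchmark corpora.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

mails = ["win a free prize now", "cheap meds online", "claim your reward today",
         "free lottery winner", "meeting moved to friday", "please review the report",
         "lunch tomorrow?", "project deadline next week"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = spam, 0 = ham

X = CountVectorizer().fit_transform(mails)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          stratify=labels, random_state=0)

for name, clf in [("NB", MultinomialNB()), ("SVM", LinearSVC())]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```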

Journal ArticleDOI
TL;DR: A framework for sports prediction is proposed that uses Bayesian inference and rule-based reasoning together with an in-game time-series approach; this enables the framework to reflect the tides/flows of a sports match, making predictions more realistic and somewhat more accurate.
Abstract: We propose a framework for sports prediction using Bayesian inference and rule-based reasoning, together with an in-game time-series approach. The framework is novel in three ways. First, it consists of two major components, a rule-based reasoner and a Bayesian network component, which cooperate in predicting the results of sports matches. This design is motivated by the observation that sports matches are highly stochastic, but at the same time the strategies of a team can be approximated by crisp logic rules. Furthermore, because of the rule-based component, our framework can give reasonably good predictions even when statistical data are scanty: it can be used to predict the results of matches between teams which have had few previous encounters, a situation that machine learning techniques have great difficulty handling. Second, our framework is able to consider many factors, such as current score, morale, fatigue, skills, etc., when it predicts the results of sports matches; most previous work considered only one factor, usually the score. Third, in contrast to most previous work on sports result prediction, we use a knowledge-based in-game time-series approach to predict sports matches. This approach enables our framework to reflect the tides/flows of a sports match, making our predictions more realistic and somewhat more accurate. We have implemented a football results predictor called FRES (Football Result Expert System) based on this framework, and show that it gives reasonable and stable predictions.

Journal ArticleDOI
TL;DR: This work proposes a novel hybrid recommendation approach to address the well-known cold-start problem in Collaborative Filtering that makes use of Cross-Level Association RulEs (CLARE) to integrate content information about domain items into collaborative filters.
Abstract: We propose a novel hybrid recommendation approach to address the well-known cold-start problem in Collaborative Filtering (CF). Our approach makes use of Cross-Level Association RulEs (CLARE) to integrate content information about domain items into collaborative filters. We first introduce a preference model comprising both user-item and item-item relationships in recommender systems, and present a motivating example of our work based on the model. We then describe how CLARE generates cold-start recommendations. We empirically evaluated the effectiveness of CLARE, which shows superior performance to related work in addressing the cold-start problem.

Journal ArticleDOI
TL;DR: In this system, the multimedia content description interface (MPEG-7) image feature descriptors, consisting of color, texture and shape descriptors, are employed to represent low-level image features, and a bi-coded chromosome genetic algorithm is used to perform weight optimization and descriptor subset selection simultaneously.
Abstract: Machine learning techniques for feature selection, which include the optimization of feature descriptor weights and the selection of an optimal feature descriptor subset, are desirable to enhance the performance of image annotation systems. In our system, the multimedia content description interface (MPEG-7) image feature descriptors, consisting of color descriptors, texture descriptors and shape descriptors, are employed to represent low-level image features. We use a real-coded chromosome genetic algorithm, with k-nearest neighbor (k-NN) classification accuracy as the fitness function, to optimize the weights of the MPEG-7 image feature descriptors. A binary-coded genetic algorithm, whose fitness function combines k-NN classification accuracy with the size of the feature descriptor subset, is used to select the optimal MPEG-7 feature descriptor subset. Furthermore, a bi-coded chromosome genetic algorithm, with the same fitness function as the binary-coded one, is used to perform weight optimization and descriptor subset selection simultaneously. Experimental results over 2000 classified Corel images show that with the real-coded genetic algorithm, the binary-coded one and the bi-coded one, the accuracy of the image annotation system is improved by 7%, 9% and 13.6%, respectively, compared to the method without machine learning. Furthermore, 2 of the 25 MPEG-7 feature descriptors are selected with the binary-coded genetic algorithm and four with the bi-coded one, which may improve the efficiency of the system significantly.
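A stripped-down sketch of the binary-coded variant described: a chromosome is a bitmask over feature descriptors, the fitness combines k-NN cross-validated accuracy with a penalty on subset size, and selection, crossover and mutation evolve the population. Population size, rates, the penalty weight, and the iris data used as a stand-in for MPEG-7 descriptors are all illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)          # placeholder for MPEG-7 descriptor features
rng = np.random.default_rng(0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.sum()          # favour accuracy, lightly penalise subset size

def evolve(n_feat, pop_size=12, generations=15, p_mut=0.1):
    pop = rng.integers(0, 2, size=(pop_size, n_feat))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(-scores)[: pop_size // 2]]      # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut                    # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)], scores.max()

best_mask, best_score = evolve(X.shape[1])
print("selected descriptors:", np.flatnonzero(best_mask), "fitness:", round(best_score, 3))
```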

Journal ArticleDOI
TL;DR: A new framework is suggested for mining weighted frequent patterns in which weight constraints are pushed deeply into sequential pattern mining; experiments show that WSpan detects fewer but important weighted sequential patterns in large sequence databases even with a low minimum threshold.
Abstract: Sequential pattern mining is an essential research topic with broad applications, which discovers the set of frequent subsequences satisfying a support threshold in a sequence database. The major problems in mining sequential patterns are that a huge set of sequential patterns is generated and the computation time is high. Although efficient algorithms have been developed to tackle these problems, their performance degrades dramatically when mining long sequential patterns in dense databases or when using low minimum supports. In addition, the algorithms may reduce the number of patterns, but unimportant patterns are still found among the results. It would be better if the unimportant patterns could be pruned first, resulting in fewer but more important patterns after mining. In this paper, we suggest a new framework for mining weighted frequent patterns in which weight constraints are pushed deeply into sequential pattern mining. Previous sequential pattern mining algorithms treat sequential patterns uniformly, while real sequential patterns differ in importance. In our approach, the weights of items are given according to their priority or importance. During the mining process, we consider not only supports but also the weights of patterns. Based on the framework, we present a weighted sequential pattern mining algorithm (WSpan). To our knowledge, this is the first work to mine weighted sequential patterns. The experimental results show that WSpan detects fewer but important weighted sequential patterns in large sequence databases, even with a low minimum threshold.

Journal ArticleDOI
TL;DR: Experimental results performed on five real-world datasets demonstrate the effectiveness of the proposed 2dSVD, an extension of standard SVD that captures explicitly the two-dimensional nature of MTS samples.
Abstract: Multivariate time series (MTS) are used in very broad areas such as multimedia, medicine, finance and speech recognition. A new approach for MTS classification using two-dimensional singular value decomposition (2dSVD) is proposed. 2dSVD is an extension of standard SVD that captures explicitly the two-dimensional nature of MTS samples. The eigenvectors of the row-row and column-column covariance matrices of MTS samples are computed for feature extraction. After the feature matrix is obtained for each MTS sample, a one-nearest-neighbor classifier is used for MTS classification. Experimental results on five real-world datasets demonstrate the effectiveness of our proposed approach.
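A compact numpy sketch of the 2dSVD feature extraction described: average the row-row and column-column covariance matrices over all MTS samples, take their leading eigenvectors U and V, project each sample as U^T X V, and classify the flattened feature matrices with a one-nearest-neighbor rule. The feature sizes and synthetic data are illustrative, not from the paper's benchmarks.

```python
import numpy as np

def twod_svd_projectors(samples, k_rows, k_cols):
    """Leading eigenvectors of the averaged row-row and column-column covariance matrices."""
    Xbar = sum(samples) / len(samples)
    F = sum((X - Xbar) @ (X - Xbar).T for X in samples) / len(samples)   # row-row (time x time)
    G = sum((X - Xbar).T @ (X - Xbar) for X in samples) / len(samples)   # column-column (var x var)
    _, U = np.linalg.eigh(F)
    _, V = np.linalg.eigh(G)
    return U[:, -k_rows:], V[:, -k_cols:]            # eigh returns ascending eigenvalues

def features(X, U, V):
    return (U.T @ X @ V).ravel()

def one_nn_predict(test_X, train_samples, train_labels, U, V):
    train_feats = np.array([features(X, U, V) for X in train_samples])
    d = np.linalg.norm(train_feats - features(test_X, U, V), axis=1)
    return train_labels[int(np.argmin(d))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy MTS samples: 20 time steps x 4 variables, two classes with different means.
    train = [rng.normal(c, 1.0, size=(20, 4)) for c in (0, 0, 2, 2)]
    labels = np.array([0, 0, 1, 1])
    U, V = twod_svd_projectors(train, k_rows=5, k_cols=2)
    test = rng.normal(2, 1.0, size=(20, 4))
    print(one_nn_predict(test, train, labels, U, V))   # expected: 1
```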

Journal ArticleDOI
TL;DR: An ontology-based platform is proposed for acquainting users with the most relevant peers (e.g., colleagues and classmates) according to their context; two kinds of context with semantic information derived from ontologies are modeled: personal context, and consensual context integrated from several personal contexts.
Abstract: To efficiently support real-time collaboration between people (agents), we propose an ontology-based platform for acquainting users with the most relevant peers (e.g., colleagues and classmates) according to their context. To this end, we model two kinds of context with semantic information derived from ontologies: (i) personal context, and (ii) consensual context, integrated from several personal contexts. More importantly, we formulate measurement criteria to compare them. Consequently, groups can be dynamically organized with respect to the similarities among several aspects of personal context. In particular, users can engage in complex collaborations related to multiple semantics. For experimentation, we implemented a social browsing system based on context synchronization.

Journal ArticleDOI
TL;DR: Research shows that 80% of offender characteristics are predicted correctly on average in new single-victim homicides, and when confidence levels are taken into account this accuracy increases to 95.6%.
Abstract: The increased availability of information technologies has enabled law enforcement agencies to compile databases with detailed information about major felonies. Machine learning techniques can utilize these databases to produce decision-aid tools to support police investigations. This paper presents a methodology for obtaining a Bayesian network (BN) model of offender behavior from a database of cleared homicides. The BN can infer the characteristics of an unknown offender from the crime scene evidence, and help narrow the list of suspects in an unsolved homicide. Our research shows that 80% of offender characteristics are predicted correctly on average in new single-victim homicides, and when confidence levels are taken into account this accuracy increases to 95.6%.

Journal ArticleDOI
TL;DR: Multi-attribute decision making problems are studied in which the information about the attribute values takes the form of uncertain linguistic variables; an optimization model is established to determine the attribute weights, and a method based on possibility degree is given to rank the alternatives.
Abstract: Multi-attribute decision making problems are studied in which the information about the attribute values takes the form of uncertain linguistic variables. The concept of deviation degree between uncertain linguistic variables is defined, and the ideal point of an uncertain linguistic decision matrix is also defined. A formula of possibility degree for the comparison between uncertain linguistic variables is proposed. Based on the deviation degree and the ideal point of uncertain linguistic variables, an optimization model is established; by solving the model, a simple and exact formula is derived to determine the attribute weights for the case where the information about the attribute weights is completely unknown. For the case where the information about the attribute weights is partly known, another optimization model is established to determine the weights, and the given uncertain linguistic decision information is then aggregated. A method based on possibility degree is given to rank the alternatives. Finally, an illustrative example is given.
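One widely used possibility-degree formula for comparing two interval values a = [a_L, a_U] and b = [b_L, b_U] is p(a >= b) = min{max((a_U - b_L) / ((a_U - a_L) + (b_U - b_L)), 0), 1}. The sketch below applies it pairwise to rank alternatives whose aggregated values are intervals; whether this is the exact formula used in the paper is not stated in the abstract, so treat it as a representative choice, with illustrative data.

```python
def possibility_degree(a, b):
    """p(a >= b) for interval numbers a = (aL, aU), b = (bL, bU)."""
    aL, aU = a
    bL, bU = b
    la, lb = aU - aL, bU - bL
    if la + lb == 0:                       # both intervals degenerate to points
        return 1.0 if aL >= bL else 0.0
    return min(max((aU - bL) / (la + lb), 0.0), 1.0)

if __name__ == "__main__":
    # Aggregated uncertain linguistic values mapped to index intervals (illustrative).
    alts = {"x1": (3.2, 4.1), "x2": (2.8, 3.6), "x3": (3.9, 4.4)}
    # Rank by the sum of pairwise possibility degrees p(xi >= xj), i != j.
    score = {i: sum(possibility_degree(alts[i], alts[j]) for j in alts if j != i) for i in alts}
    print(sorted(score, key=score.get, reverse=True))
```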

Journal ArticleDOI
TL;DR: A novel k-motif-based algorithm is introduced that removes the dependence on a predefined pattern length and provides a way to generate the original patterns by summarizing the discovered motifs.
Abstract: Finding previously unknown patterns in a time series has received much attention in recent years. Of the associated algorithms, the k-motif algorithm is one of the most effective and efficient, and it is also widely used as a time series preprocessing routine for many other data mining tasks. However, the k-motif algorithm depends on predefining the parameter w, the length of the pattern. This paper introduces a novel k-motif-based algorithm that removes this dependence and, moreover, provides a way to generate the original patterns by summarizing the discovered motifs.
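For background, the sketch below shows a brute-force version of the underlying motif definition the paper builds on: slide a window of length w over the series, count the non-trivial (non-overlapping) matches of each subsequence within a distance threshold R, and report the subsequence with the most matches. The paper's contribution, removing the need to fix w in advance and summarizing motifs back into patterns, is not reproduced; w, R, and the data are illustrative.

```python
import numpy as np

def one_motif(series, w, R):
    """Brute-force 1-motif: subsequence of length w with the most non-trivial matches within R."""
    subs = np.array([series[i:i + w] for i in range(len(series) - w + 1)])
    best_idx, best_count = 0, -1
    for i, s in enumerate(subs):
        d = np.linalg.norm(subs - s, axis=1)
        # Non-trivial matches: within R but not overlapping the candidate itself.
        matches = sum(1 for j, dj in enumerate(d) if dj <= R and abs(j - i) >= w)
        if matches > best_count:
            best_idx, best_count = i, matches
    return best_idx, best_count

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 120) + rng.normal(0, 0.1, 120)   # slowly rising baseline
    pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
    x[10:30] = pattern + rng.normal(0, 0.1, 20)                  # plant the same shape twice
    x[70:90] = pattern + rng.normal(0, 0.1, 20)
    print(one_motif(x, w=20, R=1.5))   # expected: start index 10 (or 70), one non-trivial match
```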

Journal ArticleDOI
TL;DR: The proposed algorithm, A-NSGAII, was shown to produce acceptable and robust solutions in the tested applications, where state-of-the-art algorithms and circuit designers failed.
Abstract: The increasing complexity of circuit design needs to be managed with appropriate optimization algorithms and accurate statistical descriptions of design models in order to meet the design specifications while guaranteeing "zero defects". In Design for Yield, open problems are the design of effective optimization algorithms and the statistical analysis for yield design, which require time-consuming techniques. New methods have to balance accuracy, robustness and computational effort. Typical analog integrated circuit optimization problems are computationally hard and require the handling of multiple, conflicting, and non-commensurate objectives with strong nonlinear interdependence. This paper tackles the problem with evolutionary algorithms that produce tradeoff solutions on the Pareto front. In this research work, Integrated Circuit (IC) design has been formulated as a constrained multi-objective optimization problem defined in a mixed integer/discrete/continuous domain. The following real-life circuits were selected as a test bed: an RF Low Noise Amplifier, a LeapFrog Filter, and an Ultra Wideband LNA. The proposed algorithm, A-NSGAII, was shown to produce acceptable and robust solutions in the tested applications where state-of-the-art algorithms and circuit designers failed. The results show significant improvement in all the chosen IC design problems.

Journal ArticleDOI
TL;DR: The KDSS provides not only queries over a company's various financial data but also enterprise performance assessment based on knowledge reasoning; it integrates a database, a knowledge base, an inference engine, and a model base.
Abstract: This paper presents a knowledge-based decision support system (KDSS) for measuring enterprise performance. The system provides not only queries over a company's various financial data but also enterprise performance assessment based on knowledge reasoning. Additionally, an artificial neural network is adopted to predict future total sales. The system integrates a database, a knowledge base, an inference engine, and a model base. It can offer a wide range of different queries, and all rules in the knowledge base are explained in detail to illustrate the reasoning process. Meanwhile, in order to reduce subjective judgment in performance measurement, a group assessment is used to assess the scores of each dimension for measuring enterprise performance. Finally, the result of the enterprise performance evaluation is presented and some suggestions are given to managers for making decisions.

Journal ArticleDOI
TL;DR: An access sequence miner is developed to mine popular surfing 2-sequences with their conditional probabilities from the proxy log and store them in a rule table; according to the buffer contents and the rule table, a prediction-based buffer manager takes appropriate actions such as document caching, document prefetching, and even cache/prefetch buffer size adjustment to achieve better buffer utilization.
Abstract: On the Internet, proxy servers play a key role between users and web sites, reducing the response time of user requests and saving network bandwidth. Basically, an efficient buffer manager should be built into a proxy server to cache frequently accessed documents in the buffer, thereby achieving better response times. In this paper, we develop an access sequence miner to mine popular surfing 2-sequences with their conditional probabilities from the proxy log and store them in a rule table. Then, according to the buffer contents and the rule table, a prediction-based buffer manager takes appropriate actions such as document caching, document prefetching, and even cache/prefetch buffer size adjustment to achieve better buffer utilization. Through simulation, we found that our approach performs much better than the others in quantitative measures such as the hit ratio and byte hit ratio of accessed documents.
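A minimal sketch of the access-sequence-mining step described: scan per-user sessions from the log, count consecutive document pairs (2-sequences), and keep the popular ones together with their conditional probabilities P(next | current), which a prefetching policy could then consult. The session format, support threshold, and names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def mine_2_sequences(sessions, min_support=2):
    """Popular consecutive document pairs with conditional probabilities P(next | current)."""
    pair_count = Counter()
    first_count = Counter()
    for docs in sessions:
        for a, b in zip(docs, docs[1:]):
            pair_count[(a, b)] += 1
            first_count[a] += 1
    rules = defaultdict(dict)
    for (a, b), c in pair_count.items():
        if c >= min_support:
            rules[a][b] = c / first_count[a]      # conditional probability of b following a
    return dict(rules)

if __name__ == "__main__":
    sessions = [["index", "news", "sports"], ["index", "news", "weather"],
                ["index", "mail"], ["news", "sports"]]
    print(mine_2_sequences(sessions))   # e.g. {'index': {'news': 0.67}, 'news': {'sports': 0.67}}
```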

Journal ArticleDOI
Gun Ho Lee1
TL;DR: This study presents an integrated audit approach combining rule-based and case-based reasoning, which includes two stages: a screening stage based on rule-based reasoning and an auditing stage based on case-based reasoning.
Abstract: Banks currently have a great interest in internal audits to reduce risk, to protect themselves from insolvency, and to take quick action on financial incidents. This study presents an integrated audit approach combining rule-based and case-based reasoning, which includes two stages of reasoning: a screening stage based on rule-based reasoning and an auditing stage based on case-based reasoning. Rule-based reasoning uses induction rules to determine whether a new problem should be inspected further. Case-based reasoning performs similarity-based matching to find the case in the case base most similar to the new problem. The presented method is applied to internal audit data from a bank.