
Showing papers in "Knowledge Based Systems in 2008"


Journal ArticleDOI
TL;DR: Comparative results show that, compared with other approaches for dealing with incomplete data, the approaches presented in this paper better reflect the actual states of incomplete data in soft sets.
Abstract: In view of the particularity of the value domains of mapping functions in soft sets, this paper presents data analysis approaches for soft sets under incomplete information. For standard soft sets, the decision value of an object with incomplete information is calculated as a weighted average of all possible choice values of the object, where the weight of each possible choice value is determined by the distribution of the other objects. For fuzzy soft sets, incomplete data are predicted based on the method of average probability. Comparative results show that, compared with other approaches for dealing with incomplete data, the approaches presented in this paper better reflect the actual states of incomplete data in soft sets. Finally, an example is provided to illustrate the practicability and validity of the data analysis approach for soft sets under incomplete information.

403 citations
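As an illustration, the snippet below sketches one plausible reading of the weighted-average idea described above, assuming a 0/1 tabular representation of a standard soft set with None marking unknown entries: an unknown entry contributes its expected value, estimated from the distribution of the known values of the other objects on the same parameter. Table and helper names are illustrative, not from the paper.

```python
# Sketch of weighted-average choice values for an incomplete standard soft set.
# Assumption: rows are objects, columns are parameters, entries are 1, 0 or None (unknown).

def choice_values(table):
    n_params = len(table[0])
    # Probability that a parameter holds, estimated from the objects whose value is known.
    p_one = []
    for j in range(n_params):
        known = [row[j] for row in table if row[j] is not None]
        p_one.append(sum(known) / len(known) if known else 0.0)

    values = []
    for row in table:
        # Known entries contribute directly; unknown entries contribute their expected value.
        values.append(sum(p_one[j] if v is None else v for j, v in enumerate(row)))
    return values

if __name__ == "__main__":
    soft_set = [
        [1, 0, 1, None],
        [1, 1, None, 0],
        [0, 1, 1, 1],
    ]
    print(choice_values(soft_set))
```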


Journal ArticleDOI
Guiwu Wei1
TL;DR: An optimization model based on the maximizing deviation method is established, by which the attribute weights can be determined; another optimization model is established for the special situation where the information about attribute weights is completely unknown.
Abstract: With respect to multiple attribute decision making problems with intuitionistic fuzzy information, some operational laws of intuitionistic fuzzy numbers, together with their score and accuracy functions, are introduced. An optimization model based on the maximizing deviation method is established, by which the attribute weights can be determined. For the special situation where the information about attribute weights is completely unknown, we establish another optimization model; by solving this model, we obtain a simple and exact formula that can be used to determine the attribute weights. We utilize the intuitionistic fuzzy weighted averaging (IFWA) operator to aggregate the intuitionistic fuzzy information corresponding to each alternative, and then rank the alternatives and select the most desirable one(s) according to the score function and accuracy function. Finally, an illustrative example is given to verify the developed approach and to demonstrate its practicality and effectiveness.

290 citations
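The score and accuracy functions and the IFWA operator mentioned above have standard closed forms; the sketch below applies them to rank alternatives once attribute weights are given. The weights and attribute values are illustrative placeholders.

```python
# Intuitionistic fuzzy number: pair (mu, nu) with mu + nu <= 1.
# Standard definitions: score s = mu - nu, accuracy h = mu + nu,
# IFWA_w(a_1..a_n) = (1 - prod (1 - mu_j)^w_j, prod nu_j^w_j).

from math import prod

def ifwa(values, weights):
    mu = 1.0 - prod((1.0 - m) ** w for (m, _), w in zip(values, weights))
    nu = prod(n ** w for (_, n), w in zip(values, weights))
    return mu, nu

def score(a):
    return a[0] - a[1]

def accuracy(a):
    return a[0] + a[1]

if __name__ == "__main__":
    weights = [0.3, 0.5, 0.2]                      # attribute weights (illustrative)
    alternatives = {                               # attribute values per alternative
        "A1": [(0.6, 0.3), (0.5, 0.4), (0.7, 0.2)],
        "A2": [(0.4, 0.4), (0.8, 0.1), (0.5, 0.3)],
    }
    agg = {k: ifwa(v, weights) for k, v in alternatives.items()}
    # Rank by score, breaking ties by accuracy.
    ranking = sorted(agg, key=lambda k: (score(agg[k]), accuracy(agg[k])), reverse=True)
    print(ranking, agg)
```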


Journal ArticleDOI
TL;DR: A practical method is proposed to implement multi-word extraction from documents based on syntactical structure, and two strategies, general concept representation and subtopic representation, are presented to represent documents using the extracted multi-words, in order to investigate the effectiveness of multi-word text representation on the performance of text classification.
Abstract: One of the main themes that support text mining is text representation; that is, its task is to look for appropriate terms to transform documents into numerical vectors. Recently, much effort has been invested in this topic to enrich text representation using the vector space model (VSM) in order to improve the performance of text mining techniques such as text classification and text clustering. The main concern in this paper is to investigate the effectiveness of using multi-words for text representation on the performance of text classification. Firstly, a practical method is proposed to implement multi-word extraction from documents based on syntactical structure. Secondly, two strategies, general concept representation and subtopic representation, are presented to represent the documents using the extracted multi-words. In particular, the dynamic k-mismatch is proposed to determine the presence of a long multi-word which is a subtopic of the content of a document. Finally, we carried out a series of experiments on classifying the Reuters-21578 documents using the representations with multi-words. We used the representation based on individual words as the baseline, which has the largest feature-set dimension for representation without linguistic preprocessing. Moreover, the linear kernel and the non-linear polynomial kernel in support vector machines (SVM) are examined comparatively for classification to investigate the effect of kernel type on performance. Index terms with low information gain (IG) are removed from the feature set at different percentages to observe the robustness of each classification method. Our experiments demonstrate that, in multi-word representation, subtopic representation outperforms general concept representation, and the linear kernel outperforms the non-linear kernel of SVM in classifying the Reuters data. The effect of applying different representation strategies is greater than the effect of applying different SVM kernels on classification performance. Furthermore, the representation using individual words outperforms any representation using multi-words. This is consistent with the prevailing opinions concerning the role of linguistic preprocessing of documents' features when using SVM for text classification.

239 citations
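For context, the sketch below reproduces only the classification setup compared in the paper (linear vs. polynomial SVM kernels on a vector-space text representation) using scikit-learn; it does not implement the multi-word extraction itself, and the toy corpus and labels are placeholders for the Reuters-21578 data.

```python
# Compare linear vs. polynomial SVM kernels on a TF-IDF text representation (scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

docs = ["grain exports rise", "oil prices fall", "wheat harvest grows",
        "crude oil supply drops", "corn and grain futures climb", "petroleum output cut"]
labels = [0, 1, 0, 1, 0, 1]   # 0 = grain, 1 = oil (toy stand-in for Reuters-21578 topics)

X = TfidfVectorizer().fit_transform(docs)

for kernel in ("linear", "poly"):
    clf = SVC(kernel=kernel, degree=2)
    acc = cross_val_score(clf, X, labels, cv=3).mean()
    print(kernel, round(acc, 3))
```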


Journal ArticleDOI
TL;DR: A greedy attribute reduction algorithm is constructed by generalizing Pawlak's rough set model, where objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations.
Abstract: Feature subset selection presents a common challenge for applications where data with tens or hundreds of features are available. Existing feature selection algorithms are mainly designed for dealing with numerical or categorical attributes. However, data usually come in a mixed format in real-world applications. In this paper, we generalize Pawlak's rough set model into a δ neighborhood rough set model and a k-nearest-neighbor rough set model, where objects with numerical attributes are granulated with δ neighborhood relations or k-nearest-neighbor relations, while objects with categorical features are granulated with equivalence relations. The induced information granules are then used to approximate the decision with lower and upper approximations. We compute the lower approximations of the decision to measure the significance of attributes. Based on the proposed models, we give the definition of significance of mixed features and construct a greedy attribute reduction algorithm. We compare the proposed algorithm with others in terms of the number of selected features and classification performance. Experiments show the proposed technique is effective.

214 citations
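A compact sketch of the greedy idea under a simplified δ-neighborhood reading: a sample belongs to the positive region (lower approximation) of the decision if every sample within δ on the selected numerical features, and equal on the selected categorical features, shares its class; features are added greedily while this dependency increases. The Chebyshev neighborhood, δ value, and function names are assumptions, and numerical features are assumed scaled to [0, 1].

```python
import numpy as np

def positive_region_fraction(X_num, X_cat, y, num_idx, cat_idx, delta=0.2):
    """Fraction of samples whose delta-neighborhood (on the chosen features) is pure in class."""
    n = len(y)
    pos = 0
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        if num_idx:
            d = np.abs(X_num[:, num_idx] - X_num[i, num_idx]).max(axis=1)  # Chebyshev distance
            mask &= d <= delta
        if cat_idx:
            mask &= (X_cat[:, cat_idx] == X_cat[i, cat_idx]).all(axis=1)
        if np.all(y[mask] == y[i]):
            pos += 1
    return pos / n

def greedy_reduct(X_num, X_cat, y, delta=0.2):
    """Greedy forward selection of mixed features by dependency (positive-region) increase."""
    num_sel, cat_sel, best = [], [], 0.0
    candidates = [("num", j) for j in range(X_num.shape[1])] + \
                 [("cat", j) for j in range(X_cat.shape[1])]
    improved = True
    while improved and candidates:
        improved = False
        scores = []
        for kind, j in candidates:
            ni = num_sel + [j] if kind == "num" else num_sel
            ci = cat_sel + [j] if kind == "cat" else cat_sel
            scores.append(positive_region_fraction(X_num, X_cat, y, ni, ci, delta))
        k = int(np.argmax(scores))
        if scores[k] > best:
            best = scores[k]
            kind, j = candidates.pop(k)
            (num_sel if kind == "num" else cat_sel).append(j)
            improved = True
    return num_sel, cat_sel, best
```

Given numerical features scaled to [0, 1] and label-encoded categorical features, greedy_reduct returns the indices of the selected numerical and categorical attributes together with the achieved dependency.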


Journal ArticleDOI
TL;DR: A data mining method combining attribute-oriented induction, information gain, and decision trees is put forward, which is suitable for preprocessing financial data and constructing decision tree models for financial distress prediction.
Abstract: Data mining techniques are capable of extracting valuable knowledge from large and changing databases. This paper puts forward a data mining method combining attribute-oriented induction, information gain, and decision trees, which is suitable for preprocessing financial data and constructing decision tree models for financial distress prediction. On the basis of financial ratio attributes and one class attribute, and adopting an entropy-based discretization method, a data mining model for listed companies' financial distress prediction is designed. An empirical experiment with 35 financial ratios and 135 pairs of listed companies as initial samples obtained satisfactory results, which testify to the feasibility and validity of the proposed data mining method for listed companies' financial distress prediction.

159 citations
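The entropy and information-gain machinery the decision-tree step relies on has a standard form; the sketch below computes it for an already-discretized attribute. The toy financial ratio and labels are illustrative, not the paper's data.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the samples on a (discretized) attribute."""
    n = len(labels)
    base = entropy(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for x, l in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

if __name__ == "__main__":
    # Toy example: a discretized financial ratio vs. a distress label.
    ratio_bin = ["low", "low", "high", "high", "low", "high"]
    distress  = [1, 1, 0, 0, 1, 0]
    print(information_gain(ratio_bin, distress))   # 1.0: the split separates classes perfectly
```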


Journal ArticleDOI
TL;DR: Based on a review of the literature on knowledge management in enterprise system implementation projects, two major areas of concern are identified regarding the management of knowledge in this specific type of project: managing tacit knowledge, and issues regarding the process-based nature of organizational knowledge viewed through the lens of organizational memory.
Abstract: Special attention to critical success factors in the implementation of Enterprise Resource Planning systems is evident from the bulk of literature on this issue. In order to implement these systems, which are aimed at improving the sharing of enterprise-wide information and knowledge, organizations must have the capability of effective knowledge sharing to start with. Based on a review of the literature on knowledge management in enterprise system implementation projects, this paper identifies two major areas of concern regarding the management of knowledge in this specific type of project: managing tacit knowledge, and issues regarding the process-based nature of organizational knowledge viewed through the lens of organizational memory. The more capable an organization is in handling these issues, the more likely it is that the implementation will result in competitive advantage for the organization. The competitive advantage arises from the organization's capabilities in internalizing and integrating the adopted processes with the existing knowledge paradigms and harmonizing the new system and the organizational culture towards getting the most out of the implementation effort.

135 citations


Journal ArticleDOI
TL;DR: Empirical results indicate that ROCBR outperforms ECBR, MCBR, ICBR, MDA, and Logit significantly in financial distress prediction of Chinese listed companies 1 year prior to distress, if irrelevant information among features has been handled effectively.
Abstract: This paper addresses a new method of financial distress prediction using case-based reasoning (CBR) with financial ratios derived from financial statements. The aim of the work presented here is threefold. First, we give a brief review of financial distress prediction in terms of the earliest applied models, models that generate If-Then rules, the most widely applied models historically, the most hotly researched models recently, and the most promising models. Second, we make use of ranking-order information of the distance between the target case and each historical case on each feature to generate similarities between pairwise cases. The similarity between two cases on each feature is calculated from the corresponding ranking-order information of distance, followed by a weighted integration to generate the final similarity between the two cases. The CBR system that employs the new similarity measure within the frame of k-nearest neighbors (k-NN) is named ranking-order case-based reasoning (ROCBR). Third, we apply ROCBR to financial distress prediction and analyze the obtained results for Chinese listed companies, comparing them with those provided by three other well-known CBR models that use Euclidean distance, Manhattan distance, and an inductive approach as the heart of retrieval. The three compared CBR models are called ECBR, MCBR, and ICBR, respectively. The two well-known statistical models of logistic regression (Logit) and multivariate discriminant analysis (MDA) are also employed for comparison. The financial distress dataset used in the experiments comes from the Shanghai Stock Exchange and the Shenzhen Stock Exchange. Empirical results indicate that ROCBR outperforms ECBR, MCBR, ICBR, MDA, and Logit significantly in financial distress prediction of Chinese listed companies 1 year prior to distress, provided that irrelevant information among features has been handled effectively.

131 citations
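A sketch of the ranking-order similarity idea as described: on each feature, historical cases are ranked by their distance to the target case, the rank is mapped to a per-feature similarity, and the weighted per-feature similarities are combined before k-NN retrieval. The rank-to-similarity mapping, weights, and data below are illustrative assumptions, not the paper's exact formulas.

```python
import numpy as np

def ranking_order_similarity(target, cases, weights):
    """Per-case similarity built from ranking-order information of per-feature distances."""
    n_cases, n_feat = cases.shape
    sim = np.zeros(n_cases)
    for j in range(n_feat):
        d = np.abs(cases[:, j] - target[j])
        rank = np.argsort(np.argsort(d))                 # rank 0 = closest case on this feature
        feat_sim = 1.0 - rank / max(n_cases - 1, 1)      # illustrative rank-to-similarity map
        sim += weights[j] * feat_sim
    return sim

def knn_predict(target, cases, labels, weights, k=3):
    sim = ranking_order_similarity(target, cases, weights)
    top = np.argsort(-sim)[:k]
    return int(round(np.mean(labels[top]))), top

if __name__ == "__main__":
    cases = np.array([[0.2, 1.5], [0.3, 1.2], [0.9, 0.4], [0.8, 0.5]])  # toy financial ratios
    labels = np.array([0, 0, 1, 1])                                      # 1 = distressed
    weights = np.array([0.6, 0.4])
    print(knn_predict(np.array([0.85, 0.45]), cases, labels, weights))
```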


Journal ArticleDOI
TL;DR: A post-aggregation strategy with an overall rank loss function is proposed to arrive at candidate materials that show the most stable and the highest ranks in a list of given candidates.
Abstract: Towards the end of a design process, designers may face a number of candidate materials with different attributes that are difficult to distinguish with the aid of available databases. In such situations, material selection of sensitive components is perhaps one of the most challenging problems in the design of structural elements in industries such as aerospace. The selection process is often realized as a team-work task to enhance the reliability of the chosen material. During group decision making, however, separations in design preferences can be encountered. Furthermore, there may be uncertainties in each designer's mind with regard to expressing his/her preferences over design criteria. This paper, using a revised Simos' method with the ELECTRE III optimization model, is an attempt to provide a decision aid framework that accounts for both of these effects. A post-aggregation strategy with an overall rank loss function is proposed to arrive at candidate materials that show the most stable and the highest ranks in a list of given candidates. To show the applicability of the approach, a sample case study on the material selection of a thermally loaded conductor cover sheet is conducted and validated against an available database. It is also illustrated that, using ELECTRE III, a non-compensatory aspect of material selection can be assumed while attaining a reasonable sensitivity to weight fluctuations.

130 citations


Journal ArticleDOI
TL;DR: It is proved that frequent itemsets that include the representative item and have the same support as the representative item can be identified directly by connecting the representative item with all the combinations of items in its subsume index, so the cost of processing this kind of itemset is lowered and efficiency is improved.
Abstract: Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a BitTable structure. BitTableFI is a recently proposed efficient BitTable-based algorithm which exploits the BitTable both horizontally and vertically. Although it makes use of efficient bitwise operations, BitTableFI may still suffer from the high cost of candidate generation and testing. To address this problem, a new algorithm, Index-BitTableFI, is proposed. Index-BitTableFI also uses the BitTable horizontally and vertically. To make use of the BitTable horizontally, an index array and the corresponding computing method are proposed. By computing the subsume index, those itemsets that co-occur with the representative item can be identified quickly using breadth-first search in a single pass. Then, for the resulting itemsets generated through the index array, a depth-first search strategy is used to generate all other frequent itemsets. Thus, a hybrid search is implemented and the search space is reduced greatly. The advantages of the proposed method are as follows. On the one hand, redundant operations on tidset intersection and frequency checking can largely be avoided; on the other hand, it is proved that frequent itemsets that include the representative item and have the same support as the representative item can be identified directly by connecting the representative item with all the combinations of items in its subsume index. Thus, the cost of processing this kind of itemset is lowered and efficiency is improved. Experimental results show that the proposed algorithm is efficient, especially for dense datasets.

124 citations
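To make the BitTable idea concrete, the short sketch below encodes each item's tid-list as a bitmap (one bit per transaction), so the support of an itemset is the popcount of the bitwise AND of its items' bitmaps; the index-array and subsume-index machinery of Index-BitTableFI itself is not reproduced here, and the tiny transaction database is illustrative.

```python
# Bitmap (BitTable-style) support counting: one bit per transaction, AND + popcount per itemset.

def build_bitmaps(transactions):
    bitmaps = {}
    for tid, items in enumerate(transactions):
        for item in items:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps

def support(itemset, bitmaps, n_transactions):
    bits = (1 << n_transactions) - 1
    for item in itemset:
        bits &= bitmaps.get(item, 0)
    return bits.bit_count()          # Python 3.10+; use bin(bits).count("1") otherwise

if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "c"}, {"b", "c", "d"}, {"a", "b", "c"}]
    bm = build_bitmaps(db)
    print(support({"a", "c"}, bm, len(db)))   # 3
```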


Journal ArticleDOI
Zeshui Xu1
TL;DR: The concept of dynamic weighted averaging (DWA) operator is defined, and some methods are introduced to obtain the weights associated with the DWA operator to solve the MP-MADM problems where all the attribute values provided at different periods are expressed in interval numbers.
Abstract: Multiple attribute decision making (MADM) is an important part of modern decision science. It has been extensively applied to various areas such as society, economics, military and management, and has been receiving more and more attention over the last decades. To date, however, most research has focused on single-period multi-attribute decision making, in which all the original decision information is given at the same period, and a number of methods have been proposed to solve this kind of problem. This paper is devoted to investigating the multi-period multi-attribute decision making (MP-MADM) problems where the decision information (including attribute weights and attribute values) is provided by decision maker(s) at different periods. We define the concept of the dynamic weighted averaging (DWA) operator, and introduce some methods, such as the arithmetic series based method, the geometric series based method and the normal distribution based method, to obtain the weights associated with the DWA operator. Based on the DWA operator, we develop an approach to MP-MADM. Moreover, we extend the DWA operator and the developed approach to solve MP-MADM problems where all the attribute values provided at different periods are expressed as interval numbers, and use a possibility-degree formula to rank and select the given alternatives.

124 citations
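A minimal sketch of a dynamic weighted averaging (DWA) aggregation across periods. The time-weight vector here is generated with a simple arithmetic-series scheme (weights proportional to the period index, so recent periods weigh more), which is only one of the weighting methods the paper discusses and may differ from its exact series; the values are illustrative.

```python
def arithmetic_series_weights(p):
    """Increasing time weights w_t proportional to t, normalised to sum to 1."""
    total = p * (p + 1) / 2
    return [t / total for t in range(1, p + 1)]

def dwa(values_over_periods, time_weights):
    """Dynamic weighted average of an attribute value collected at p periods."""
    return sum(w * v for w, v in zip(time_weights, values_over_periods))

if __name__ == "__main__":
    periods = 4
    w = arithmetic_series_weights(periods)          # [0.1, 0.2, 0.3, 0.4]
    print(dwa([0.55, 0.60, 0.70, 0.80], w))         # recent periods dominate the aggregate
```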


Journal ArticleDOI
TL;DR: This paper proposes a personalization strategy that overcomes drawbacks in recommender systems by applying inference techniques borrowed from the Semantic Web, and illustrates its use in AVATAR, a tool that selects appealing audiovisual programs from among the myriad available in Digital TV.
Abstract: Recommender systems arose with the goal of helping users search in overloaded information domains (like e-commerce, e-learning or Digital TV). These tools automatically select items (commercial products, educational courses, TV programs, etc.) that may be appealing to each user taking into account his/her personal preferences. The personalization strategies used to compare these preferences with the available items suffer from well-known deficiencies that reduce the quality of the recommendations. Most of the limitations arise from using syntactic matching techniques because they miss a lot of useful knowledge during the recommendation process. In this paper, we propose a personalization strategy that overcomes these drawbacks by applying inference techniques borrowed from the Semantic Web. Our approach reasons about the semantics of items and user preferences to discover complex associations between them. These semantic associations provide additional knowledge about the user preferences, and permit the recommender system to compare them with the available items in a more effective way. The proposed strategy is flexible enough to be applied in many recommender systems, regardless of their application domain. Here, we illustrate its use in AVATAR, a tool that selects appealing audiovisual programs from among the myriad available in Digital TV.

Journal ArticleDOI
TL;DR: A conversational agent, or "chatbot", has been developed to allow the learner to negotiate over the representations held about them using natural language, to support the metacognitive goals of self-assessment and reflection, which are increasingly seen as key to learning and are being incorporated into UK educational policy.
Abstract: This paper describes a system which incorporates natural language technologies, database manipulation and educational theories in order to offer learners a Negotiated Learner Model, for integration into an Intelligent Tutoring System. The system presents the learner with their learner model, offering them the opportunity to compare their own beliefs regarding their capabilities with those inferred by the system. A conversational agent, or "chatbot", has been developed to allow the learner to negotiate over the representations held about them using natural language. The system aims to support the metacognitive goals of self-assessment and reflection, which are increasingly seen as key to learning and are being incorporated into UK educational policy. The paper describes the design of the system and reports a user trial, in which the chatbot was found to support users in increasing the accuracy of their self-assessments and in reducing the number of discrepancies between system and user beliefs in the learner model. Some lessons learned in the development are highlighted, and future research and experimentation directions are outlined.

Journal ArticleDOI
TL;DR: Experimental results show that the models using MBPNN outperform the basic BPNN, and the application of LSA in this system can lead to dramatic dimensionality reduction while achieving good classification results.
Abstract: New text categorization models using a back-propagation neural network (BPNN) and a modified back-propagation neural network (MBPNN) are proposed. An efficient feature selection method is used to reduce the dimensionality as well as improve performance. The basic BPNN learning algorithm has the drawback of slow training speed, so we modify it to accelerate training; categorization accuracy is improved as a consequence. The traditional word-matching based text categorization system uses the vector space model (VSM) to represent documents. However, it needs a high-dimensional space to represent the document and does not take into account the semantic relationships between terms, which can lead to poor classification accuracy. Latent semantic analysis (LSA) can overcome these problems by using statistically derived conceptual indices instead of individual words. It constructs a conceptual vector space in which each term or document is represented as a vector. It not only greatly reduces the dimensionality but also discovers the important associative relationships between terms. We test our categorization models on the 20-newsgroups data set; experimental results show that the models using MBPNN outperform the basic BPNN, and that the application of LSA in our system can lead to dramatic dimensionality reduction while achieving good classification results.
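The LSA step described above can be reproduced with a truncated SVD of the TF-IDF term-document matrix; the sketch below pairs it with a standard multilayer perceptron as a stand-in for the (modified) BPNN, using scikit-learn and a placeholder corpus rather than the 20-newsgroups data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD          # LSA = truncated SVD of the term matrix
from sklearn.neural_network import MLPClassifier        # stand-in for the (M)BPNN classifier
from sklearn.pipeline import make_pipeline

docs = ["the rocket launch succeeded", "new graphics card released",
        "space probe reaches orbit", "faster processors announced",
        "astronauts return from station", "chip maker ships new cpu"]
labels = ["space", "hardware", "space", "hardware", "space", "hardware"]

model = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=3),                        # drastic dimensionality reduction
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(docs, labels)
print(model.predict(["shuttle docking scheduled"]))
```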

Journal ArticleDOI
TL;DR: A method for measuring the similarity of FCA concepts is presented, which is a refinement of a previous proposal of the author; the refinement consists in determining the similarity of concept descriptors (attributes) by using the information content approach, rather than relying on human domain expertise.
Abstract: Formal Concept Analysis (FCA) is proving useful in supporting difficult activities that are becoming fundamental to the development of the Semantic Web. Assessing concept similarity is one such activity, since it allows the identification of different concepts that are semantically close. In this paper, a method for measuring the similarity of FCA concepts is presented, which is a refinement of a previous proposal of the author. The refinement consists in determining the similarity of concept descriptors (attributes) by using the information content approach, rather than relying on human domain expertise. The information content approach that has been adopted achieves a higher correlation with human judgement than other proposals in the literature for evaluating concept similarity in a taxonomy.

Journal ArticleDOI
TL;DR: Relations of attribute reduction between object-oriented and property-oriented formal concept lattices are discussed, and it is shown that, based on new approaches to attribute reduction by means of irreducible elements, the attribute reducts and attribute characteristics in the two concept lattices are the same.
Abstract: As one of the basic problems of knowledge discovery and data analysis, knowledge reduction can make the discovery of implicit knowledge in data easier and its representation simpler. In this paper, relations of attribute reduction between object-oriented and property-oriented formal concept lattices are discussed. Based on new approaches to attribute reduction by means of irreducible elements, it is shown that the attribute reducts and attribute characteristics in the two concept lattices are the same. This turns out to be meaningful and effective in dealing with knowledge reduction, as the attribute reducts and attribute characteristics in the object-oriented and property-oriented formal concept lattices can be obtained by investigating only one of the two lattices.

Journal ArticleDOI
TL;DR: Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering, to identify the user model that best identifies fraud cases.
Abstract: This paper investigates the usefulness of applying different learning approaches to a problem of telecommunications fraud detection. Five different user models are compared by means of both supervised and unsupervised learning techniques, namely the multilayer perceptron and hierarchical agglomerative clustering. One aim of the study is to identify the user model that best identifies fraud cases. The second task is to explore different views of the same problem and see what can be learned from the application of each different technique. All data come from real defrauded user accounts in a telecommunications network. The models are compared in terms of their performance, and each technique's outcome is evaluated with appropriate measures.
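As a schematic of the two learning views compared (a supervised multilayer perceptron vs. unsupervised hierarchical agglomerative clustering) applied to per-account usage features, the sketch below uses scikit-learn with synthetic placeholder data; real user models and features would replace the random ones.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.cluster import AgglomerativeClustering
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Synthetic per-account features, e.g. [daily call count, avg duration, intl. call ratio].
normal = rng.normal([20, 3.0, 0.05], [5, 1.0, 0.03], size=(80, 3))
fraud  = rng.normal([90, 1.0, 0.60], [20, 0.5, 0.15], size=(20, 3))
X = np.vstack([normal, fraud])
y = np.array([0] * 80 + [1] * 20)

# Supervised view: multilayer perceptron.
mlp = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
print("MLP accuracy:", cross_val_score(mlp, X, y, cv=5).mean())

# Unsupervised view: hierarchical agglomerative clustering into two groups.
clusters = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```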

Journal ArticleDOI
TL;DR: Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool, and that RVM is more suitable than SVM for spam classification in applications that require low complexity.
Abstract: The growth in the number of email users has resulted in a dramatic increase in spam emails during the past few years. In this paper, four machine learning algorithms, namely Naive Bayes (NB), neural network (NN), support vector machine (SVM) and relevance vector machine (RVM), are proposed for spam classification. An empirical evaluation of them on benchmark spam filtering corpora is presented. The experiments are performed with different training set sizes and extracted feature sizes. Experimental results show that the NN classifier is unsuitable for use alone as a spam rejection tool. Generally, the performance of the SVM and RVM classifiers is clearly superior to that of the NB classifier. Compared with SVM, RVM is shown to provide similar classification results with fewer relevance vectors and much faster testing time. Despite the slower learning procedure, RVM is more suitable than SVM for spam classification in applications that require low complexity.
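A small scikit-learn sketch of the kind of comparison reported (Naive Bayes vs. SVM on a bag-of-words spam corpus, with a held-out test split); RVM is omitted because it is not available in scikit-learn, and the corpus here is a toy placeholder for the benchmark corpora.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

mails = ["win a free prize now", "cheap meds online", "claim your reward today",
         "free lottery winner", "meeting moved to friday", "please review the report",
         "lunch tomorrow?", "project deadline next week"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = spam, 0 = ham

X = CountVectorizer().fit_transform(mails)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25,
                                          stratify=labels, random_state=0)

for name, clf in [("NB", MultinomialNB()), ("SVM", LinearSVC())]:
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```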

Journal ArticleDOI
TL;DR: A framework for sports prediction is proposed that uses Bayesian inference and rule-based reasoning together with an in-game time-series approach; this enables the framework to reflect the tides/flows of a sports match, making predictions more realistic and somewhat more accurate.
Abstract: We propose a framework for sports prediction using Bayesian inference and rule-based reasoning, together with an in-game time-series approach. The framework is novel in three ways. First, it consists of two major components, a rule-based reasoner and a Bayesian network component, which cooperate in predicting the results of sports matches. This design is motivated by the observation that sports matches are highly stochastic, but at the same time the strategies of a team can be approximated by crisp logic rules. Furthermore, because of the rule-based component, our framework can give reasonably good predictions even when statistical data are scanty: it can be used to predict the results of matches between teams which have had few previous encounters, a situation that machine learning techniques have great difficulty handling. Second, our framework is able to consider many factors, such as current score, morale, fatigue, skills, etc., when it predicts the results of sports matches; most previous work considered only one factor, usually the score. Third, in contrast to most previous work on sports result prediction, we use a knowledge-based in-game time-series approach to predict sports matches. This approach enables our framework to reflect the tides/flows of a sports match, making our predictions more realistic and somewhat more accurate. We have implemented a football results predictor called FRES (Football Result Expert System) based on this framework, and show that it gives reasonable and stable predictions.

Journal ArticleDOI
TL;DR: This work proposes a novel hybrid recommendation approach to address the well-known cold-start problem in Collaborative Filtering that makes use of Cross-Level Association RulEs (CLARE) to integrate content information about domain items into collaborative filters.
Abstract: We propose a novel hybrid recommendation approach to address the well-known cold-start problem in Collaborative Filtering (CF). Our approach makes use of Cross-Level Association RulEs (CLARE) to integrate content information about domain items into collaborative filters. We first introduce a preference model comprising both user-item and item-item relationships in recommender systems, and present a motivating example of our work based on the model. We then describe how CLARE generates cold-start recommendations. We empirically evaluated the effectiveness of CLARE, which shows superior performance to related work in addressing the cold-start problem.

Journal ArticleDOI
TL;DR: In this system, the multimedia content description interface (MPEG-7) image feature descriptors, consisting of color, texture and shape descriptors, are employed to represent low-level image features, and a bi-coded chromosome genetic algorithm is used to perform weight optimization and descriptor subset selection simultaneously.
Abstract: Machine learning techniques for feature selection, which include the optimization of feature descriptor weights and the selection of an optimal feature descriptor subset, are desirable to enhance the performance of image annotation systems. In our system, the multimedia content description interface (MPEG-7) image feature descriptors, consisting of color descriptors, texture descriptors and shape descriptors, are employed to represent low-level image features. We use a real-coded chromosome genetic algorithm, with k-nearest neighbor (k-NN) classification accuracy as the fitness function, to optimize the weights of the MPEG-7 image feature descriptors. A binary-coded genetic algorithm, whose fitness function combines k-NN classification accuracy with the size of the feature descriptor subset, is used to select the optimal MPEG-7 feature descriptor subset. Furthermore, a bi-coded chromosome genetic algorithm, with the same fitness function as the binary-coded one, is used to perform weight optimization and descriptor subset selection simultaneously. Experimental results over 2000 classified Corel images show that with the real-coded genetic algorithm, the binary-coded one and the bi-coded one, the accuracy of the image annotation system is improved by 7%, 9% and 13.6%, respectively, compared to the method without machine learning. Furthermore, 2 of the 25 MPEG-7 feature descriptors are selected with the binary-coded genetic algorithm and four with the bi-coded one, which may improve the efficiency of the system significantly.
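A stripped-down sketch of the binary-coded variant described: a chromosome is a bitmask over feature descriptors, the fitness combines k-NN cross-validated accuracy with a penalty on subset size, and selection, crossover and mutation evolve the population. Population size, rates, the penalty weight, and the iris data used as a stand-in for MPEG-7 descriptors are all illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)          # placeholder for MPEG-7 descriptor features
rng = np.random.default_rng(0)

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - 0.01 * mask.sum()          # favour accuracy, lightly penalise subset size

def evolve(n_feat, pop_size=12, generations=15, p_mut=0.1):
    pop = rng.integers(0, 2, size=(pop_size, n_feat))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(-scores)[: pop_size // 2]]      # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_feat)                        # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_feat) < p_mut                    # bit-flip mutation
            child[flip] ^= 1
            children.append(child)
        pop = np.vstack([parents, children])
    scores = np.array([fitness(ind) for ind in pop])
    return pop[np.argmax(scores)], scores.max()

best_mask, best_score = evolve(X.shape[1])
print("selected descriptors:", np.flatnonzero(best_mask), "fitness:", round(best_score, 3))
```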

Journal ArticleDOI
TL;DR: A new framework is suggested for mining weighted frequent patterns in which weight constraints are pushed deeply into sequential pattern mining; experiments show that WSpan detects fewer but important weighted sequential patterns in large sequence databases even with a low minimum threshold.
Abstract: Sequential pattern mining is an essential research topic with broad applications, which discovers the set of frequent subsequences satisfying a support threshold in a sequence database. The major problems in mining sequential patterns are that a huge set of sequential patterns is generated and the computation time is high. Although efficient algorithms have been developed to tackle these problems, their performance degrades dramatically when mining long sequential patterns in dense databases or when using low minimum supports. In addition, the algorithms may reduce the number of patterns, but unimportant patterns are still found among the results. It would be better if the unimportant patterns could be pruned first, resulting in fewer but more important patterns after mining. In this paper, we suggest a new framework for mining weighted frequent patterns in which weight constraints are pushed deeply into sequential pattern mining. Previous sequential pattern mining algorithms treat sequential patterns uniformly, while real sequential patterns differ in importance. In our approach, the weights of items are given according to their priority or importance. During the mining process, we consider not only supports but also the weights of patterns. Based on the framework, we present a weighted sequential pattern mining algorithm (WSpan). To our knowledge, this is the first work to mine weighted sequential patterns. The experimental results show that WSpan detects fewer but important weighted sequential patterns in large sequence databases, even with a low minimum threshold.

Journal ArticleDOI
TL;DR: Experimental results performed on five real-world datasets demonstrate the effectiveness of the proposed 2dSVD, an extension of standard SVD that captures explicitly the two-dimensional nature of MTS samples.
Abstract: Multivariate time series (MTS) are used in very broad areas such as multimedia, medicine, finance and speech recognition. A new approach for MTS classification using two-dimensional singular value decomposition (2dSVD) is proposed. 2dSVD is an extension of standard SVD that captures explicitly the two-dimensional nature of MTS samples. The eigenvectors of the row-row and column-column covariance matrices of MTS samples are computed for feature extraction. After the feature matrix is obtained for each MTS sample, a one-nearest-neighbor classifier is used for MTS classification. Experimental results on five real-world datasets demonstrate the effectiveness of our proposed approach.
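A compact numpy sketch of the 2dSVD feature extraction described: average the row-row and column-column covariance matrices over all MTS samples, take their leading eigenvectors U and V, project each sample as U^T X V, and classify the flattened feature matrices with a one-nearest-neighbor rule. The feature sizes and synthetic data are illustrative, not from the paper's benchmarks.

```python
import numpy as np

def twod_svd_projectors(samples, k_rows, k_cols):
    """Leading eigenvectors of the averaged row-row and column-column covariance matrices."""
    Xbar = sum(samples) / len(samples)
    F = sum((X - Xbar) @ (X - Xbar).T for X in samples) / len(samples)   # row-row (time x time)
    G = sum((X - Xbar).T @ (X - Xbar) for X in samples) / len(samples)   # column-column (var x var)
    _, U = np.linalg.eigh(F)
    _, V = np.linalg.eigh(G)
    return U[:, -k_rows:], V[:, -k_cols:]            # eigh returns ascending eigenvalues

def features(X, U, V):
    return (U.T @ X @ V).ravel()

def one_nn_predict(test_X, train_samples, train_labels, U, V):
    train_feats = np.array([features(X, U, V) for X in train_samples])
    d = np.linalg.norm(train_feats - features(test_X, U, V), axis=1)
    return train_labels[int(np.argmin(d))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy MTS samples: 20 time steps x 4 variables, two classes with different means.
    train = [rng.normal(c, 1.0, size=(20, 4)) for c in (0, 0, 2, 2)]
    labels = np.array([0, 0, 1, 1])
    U, V = twod_svd_projectors(train, k_rows=5, k_cols=2)
    test = rng.normal(2, 1.0, size=(20, 4))
    print(one_nn_predict(test, train, labels, U, V))   # expected: 1
```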

Journal ArticleDOI
TL;DR: An ontology-based platform is proposed for acquainting users with the most relevant peers (e.g., colleagues and classmates) according to their context; two kinds of context with semantic information derived from ontologies are modeled: personal context, and consensual context integrated from several personal contexts.
Abstract: To efficiently support real-time collaboration between people (agents), we propose an ontology-based platform for acquainting users with the most relevant peers (e.g., colleagues and classmates) according to their context. To this end, we model two kinds of context with semantic information derived from ontologies: (i) personal context, and (ii) consensual context, integrated from several personal contexts. More importantly, we formulate measurement criteria to compare them. Consequently, groups can be dynamically organized with respect to the similarities among several aspects of personal context. In particular, users can engage in complex collaborations related to multiple semantics. For experimentation, we implemented a social browsing system based on context synchronization.

Journal ArticleDOI
TL;DR: Research shows that 80% of offender characteristics are predicted correctly on average in new single-victim homicides, and when confidence levels are taken into account this accuracy increases to 95.6%.
Abstract: The increased availability of information technologies has enabled law enforcement agencies to compile databases with detailed information about major felonies. Machine learning techniques can utilize these databases to produce decision-aid tools to support police investigations. This paper presents a methodology for obtaining a Bayesian network (BN) model of offender behavior from a database of cleared homicides. The BN can infer the characteristics of an unknown offender from the crime scene evidence, and help narrow the list of suspects in an unsolved homicide. Our research shows that 80% of offender characteristics are predicted correctly on average in new single-victim homicides, and when confidence levels are taken into account this accuracy increases to 95.6%.

Journal ArticleDOI
TL;DR: Multi-attribute decision making problems are studied in which the information about the attribute values takes the form of uncertain linguistic variables; an optimization model is established to determine the attribute weights, and a method based on possibility degree is given to rank the alternatives.
Abstract: Multi-attribute decision making problems are studied in which the information about the attribute values takes the form of uncertain linguistic variables. The concept of deviation degree between uncertain linguistic variables is defined, and the ideal point of an uncertain linguistic decision matrix is also defined. A formula of possibility degree for the comparison between uncertain linguistic variables is proposed. Based on the deviation degree and the ideal point of uncertain linguistic variables, an optimization model is established; by solving the model, a simple and exact formula is derived to determine the attribute weights for the case where the information about the attribute weights is completely unknown. For the case where the information about the attribute weights is partly known, another optimization model is established to determine the weights, and the given uncertain linguistic decision information is then aggregated. A method based on possibility degree is given to rank the alternatives. Finally, an illustrative example is given.
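One widely used possibility-degree formula for comparing two interval values a = [a_L, a_U] and b = [b_L, b_U] is p(a >= b) = min{max((a_U - b_L) / ((a_U - a_L) + (b_U - b_L)), 0), 1}. The sketch below applies it pairwise to rank alternatives whose aggregated values are intervals; whether this is the exact formula used in the paper is not stated in the abstract, so treat it as a representative choice, with illustrative data.

```python
def possibility_degree(a, b):
    """p(a >= b) for interval numbers a = (aL, aU), b = (bL, bU)."""
    aL, aU = a
    bL, bU = b
    la, lb = aU - aL, bU - bL
    if la + lb == 0:                       # both intervals degenerate to points
        return 1.0 if aL >= bL else 0.0
    return min(max((aU - bL) / (la + lb), 0.0), 1.0)

if __name__ == "__main__":
    # Aggregated uncertain linguistic values mapped to index intervals (illustrative).
    alts = {"x1": (3.2, 4.1), "x2": (2.8, 3.6), "x3": (3.9, 4.4)}
    # Rank by the sum of pairwise possibility degrees p(xi >= xj), i != j.
    score = {i: sum(possibility_degree(alts[i], alts[j]) for j in alts if j != i) for i in alts}
    print(sorted(score, key=score.get, reverse=True))
```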

Journal ArticleDOI
TL;DR: A novel k-motif-based algorithm is introduced that removes the dependence on a predefined pattern length and provides a way to generate the original patterns by summarizing the discovered motifs.
Abstract: Finding previously unknown patterns in a time series has received much attention in recent years. Of the associated algorithms, the k-motif algorithm is one of the most effective and efficient, and it is also widely used as a time series preprocessing routine for many other data mining tasks. However, the k-motif algorithm depends on predefining the parameter w, the length of the pattern. This paper introduces a novel k-motif-based algorithm that removes this dependence and, moreover, provides a way to generate the original patterns by summarizing the discovered motifs.
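For background, the sketch below shows a brute-force version of the underlying motif definition the paper builds on: slide a window of length w over the series, count the non-trivial (non-overlapping) matches of each subsequence within a distance threshold R, and report the subsequence with the most matches. The paper's contribution, removing the need to fix w in advance and summarizing motifs back into patterns, is not reproduced; w, R, and the data are illustrative.

```python
import numpy as np

def one_motif(series, w, R):
    """Brute-force 1-motif: subsequence of length w with the most non-trivial matches within R."""
    subs = np.array([series[i:i + w] for i in range(len(series) - w + 1)])
    best_idx, best_count = 0, -1
    for i, s in enumerate(subs):
        d = np.linalg.norm(subs - s, axis=1)
        # Non-trivial matches: within R but not overlapping the candidate itself.
        matches = sum(1 for j, dj in enumerate(d) if dj <= R and abs(j - i) >= w)
        if matches > best_count:
            best_idx, best_count = i, matches
    return best_idx, best_count

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 10.0, 120) + rng.normal(0, 0.1, 120)   # slowly rising baseline
    pattern = np.sin(np.linspace(0, 2 * np.pi, 20))
    x[10:30] = pattern + rng.normal(0, 0.1, 20)                  # plant the same shape twice
    x[70:90] = pattern + rng.normal(0, 0.1, 20)
    print(one_motif(x, w=20, R=1.5))   # expected: start index 10 (or 70), one non-trivial match
```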

Journal ArticleDOI
TL;DR: The proposed algorithm, A-NSGAII, was shown to produce acceptable and robust solutions in the tested applications, where state-of-the-art algorithms and circuit designers failed.
Abstract: The increasing complexity of circuit design needs to be managed with appropriate optimization algorithms and accurate statistical descriptions of design models in order to meet the design specifications while guaranteeing "zero defects". In Design for Yield, open problems are the design of effective optimization algorithms and the statistical analysis for yield design, which require time-consuming techniques. New methods have to balance accuracy, robustness and computational effort. Typical analog integrated circuit optimization problems are computationally hard and require the handling of multiple, conflicting, and non-commensurate objectives with strong nonlinear interdependence. This paper tackles the problem with evolutionary algorithms that produce tradeoff solutions on the Pareto front. In this research work, Integrated Circuit (IC) design has been formulated as a constrained multi-objective optimization problem defined in a mixed integer/discrete/continuous domain. The following real-life circuits were selected as a test bed: an RF Low Noise Amplifier, a LeapFrog Filter, and an Ultra Wideband LNA. The proposed algorithm, A-NSGAII, was shown to produce acceptable and robust solutions in the tested applications where state-of-the-art algorithms and circuit designers failed. The results show significant improvement in all the chosen IC design problems.

Journal ArticleDOI
TL;DR: The KDSS provides not only queries over a company's various financial data but also enterprise performance assessment based on knowledge reasoning; it integrates a database, a knowledge base, an inference engine, and a model base.
Abstract: This paper presents a knowledge-based decision support system (KDSS) for measuring enterprise performance. The system provides not only queries over a company's various financial data but also enterprise performance assessment based on knowledge reasoning. Additionally, an artificial neural network is adopted to predict future total sales. The system integrates a database, a knowledge base, an inference engine, and a model base. It can offer a wide range of different queries, and all rules in the knowledge base are explained in detail to illustrate the reasoning process. Meanwhile, in order to reduce subjective judgment in performance measurement, a group assessment is used to assess the scores of each dimension for measuring enterprise performance. Finally, the result of the enterprise performance evaluation is presented and some suggestions are given to managers for making decisions.

Journal ArticleDOI
TL;DR: An access sequence miner is developed to mine popular surfing 2-sequences with their conditional probabilities from the proxy log and store them in a rule table; according to the buffer contents and the rule table, a prediction-based buffer manager takes appropriate actions such as document caching, document prefetching, and even cache/prefetch buffer size adjustment to achieve better buffer utilization.
Abstract: On the Internet, proxy servers play a key role between users and web sites, reducing the response time of user requests and saving network bandwidth. Basically, an efficient buffer manager should be built into a proxy server to cache frequently accessed documents in the buffer, thereby achieving better response times. In this paper, we develop an access sequence miner to mine popular surfing 2-sequences with their conditional probabilities from the proxy log and store them in a rule table. Then, according to the buffer contents and the rule table, a prediction-based buffer manager takes appropriate actions such as document caching, document prefetching, and even cache/prefetch buffer size adjustment to achieve better buffer utilization. Through simulation, we found that our approach performs much better than the others in quantitative measures such as the hit ratio and byte hit ratio of accessed documents.
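A minimal sketch of the access-sequence-mining step described: scan per-user sessions from the log, count consecutive document pairs (2-sequences), and keep the popular ones together with their conditional probabilities P(next | current), which a prefetching policy could then consult. The session format, support threshold, and names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def mine_2_sequences(sessions, min_support=2):
    """Popular consecutive document pairs with conditional probabilities P(next | current)."""
    pair_count = Counter()
    first_count = Counter()
    for docs in sessions:
        for a, b in zip(docs, docs[1:]):
            pair_count[(a, b)] += 1
            first_count[a] += 1
    rules = defaultdict(dict)
    for (a, b), c in pair_count.items():
        if c >= min_support:
            rules[a][b] = c / first_count[a]      # conditional probability of b following a
    return dict(rules)

if __name__ == "__main__":
    sessions = [["index", "news", "sports"], ["index", "news", "weather"],
                ["index", "mail"], ["news", "sports"]]
    print(mine_2_sequences(sessions))   # e.g. {'index': {'news': 0.67}, 'news': {'sports': 0.67}}
```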

Journal ArticleDOI
Gun Ho Lee1
TL;DR: This study presents an integrated audit approach combining rule-based and case-based reasoning, which includes two stages: a screening stage based on rule-based reasoning and an auditing stage based on case-based reasoning.
Abstract: Banks currently have a great interest in internal audits to reduce risk, to protect themselves from insolvency, and to take quick action on financial incidents. This study presents an integrated audit approach combining rule-based and case-based reasoning, which includes two stages of reasoning: a screening stage based on rule-based reasoning and an auditing stage based on case-based reasoning. Rule-based reasoning uses induction rules to determine whether a new problem should be inspected further. Case-based reasoning performs similarity-based matching to find the case in the case base most similar to the new problem. The presented method is applied to internal audit data from a bank.