
Showing papers in "Knowledge Based Systems in 2012"


Journal ArticleDOI
TL;DR: It is found in this article that the RMSE value of the Fruit Fly Optimization Algorithm-optimized General Regression Neural Network model shows very good convergence, and that the model also has very good classification and prediction capability.
Abstract: Optimization problems are commonly researched and discussed by scholars from all kinds of fields. If a problem cannot be optimized, a great deal of manpower and capital is usually wasted, and in the worst case the effort ends in failure. Therefore, this article proposes an optimization algorithm that is much simpler and more robust than the complicated optimization methods proposed by past scholars: the Fruit Fly Optimization Algorithm. Throughout the process of finding the maximal and minimal values of a function, the algorithm is tested repeatedly, and the population size and characteristics are also investigated. Moreover, financial distress data of Taiwanese enterprises is collected, and the Fruit Fly Optimization Algorithm-optimized General Regression Neural Network, the General Regression Neural Network and Multiple Regression are adopted to construct financial distress models. The article finds that the RMSE value of the Fruit Fly Optimization Algorithm-optimized General Regression Neural Network model shows very good convergence, and that the model also has very good classification and prediction capability.
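
As a rough illustration of how simple the procedure is, here is a minimal Python sketch of the 2-D fruit fly search loop described above; the population size, step radius and the quadratic test function are illustrative choices, not values from the article.

```python
import random

def foa_minimize(f, iterations=100, pop_size=20, radius=1.0):
    # The swarm starts at a random location on the plane.
    x_axis, y_axis = random.uniform(-10, 10), random.uniform(-10, 10)
    best_s, best_val = None, float("inf")
    for _ in range(iterations):
        flies = []
        for _ in range(pop_size):
            # Each fly makes a random flight around the swarm location.
            x = x_axis + random.uniform(-radius, radius)
            y = y_axis + random.uniform(-radius, radius)
            dist = (x * x + y * y) ** 0.5 or 1e-12
            s = 1.0 / dist              # smell concentration judgment value
            flies.append((f(s), s, x, y))
        val, s, x, y = min(flies)       # fly with the best smell (fitness)
        if val < best_val:
            best_val, best_s = val, s
            x_axis, y_axis = x, y       # the swarm flies toward that spot
    return best_s, best_val

# Example: minimize (s - 3)^2, whose optimum lies at s = 3.
print(foa_minimize(lambda s: (s - 3) ** 2))
```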

1,232 citations


Journal ArticleDOI
Guiwu Wei
TL;DR: This paper develops some prioritized aggregation operators for aggregating hesitant fuzzy information, and applies them to develop some models for hesitant fuzzy multiple attribute decision making (MADM) problems in which the attributes are in different priority levels.
Abstract: In this paper, we investigate hesitant fuzzy multiple attribute decision making (MADM) problems in which the attributes are in different priority levels. Motivated by the idea of prioritized aggregation operators [R.R. Yager, Prioritized aggregation operators, International Journal of Approximate Reasoning 48 (2008) 263-274], we develop some prioritized aggregation operators for aggregating hesitant fuzzy information, and then apply them to develop some models for hesitant fuzzy MADM problems in which the attributes are in different priority levels. Finally, a practical example about talent introduction is given to verify the developed approaches and to demonstrate their practicality and effectiveness.
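
For context, here is a minimal sketch of Yager's crisp prioritized average, the mechanism the paper lifts to hesitant fuzzy arguments: weights are induced from the satisfaction of higher-priority attributes rather than fixed in advance. The scores are illustrative and assumed normalized to [0, 1].

```python
def prioritized_average(scores):
    """scores: satisfaction values ordered from highest to lowest priority."""
    t = [1.0]                          # T_1 = 1
    for s in scores[:-1]:
        t.append(t[-1] * s)            # T_j = product of higher-priority scores
    total = sum(t)
    return sum((tj / total) * s for tj, s in zip(t, scores))

# Safety (priority 1) dominates: a poor safety score drags the aggregate
# down even though the lower-priority criteria score well.
print(prioritized_average([0.3, 0.9, 0.9]))   # ~0.52, far below the mean 0.7
```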

494 citations


Journal ArticleDOI
TL;DR: A new similarity measure perfected using optimization based on neural learning is presented, which exceeds the best results obtained with current metrics and achieves important improvements in the measures of accuracy, precision and recall when applied to new user cold start situations.
Abstract: The new user cold start issue represents a serious problem in recommender systems as it can lead to the loss of new users who decide to stop using the system due to the lack of accuracy in the recommendations received in that first stage, in which they have not yet cast a significant number of votes with which to feed the recommender system's collaborative filtering core. For this reason it is particularly important to design new similarity metrics which provide greater precision in the results offered to users who have cast few votes. This paper presents a new similarity measure perfected using optimization based on neural learning, which exceeds the best results obtained with current metrics. The metric has been tested on the Netflix and MovieLens databases, obtaining important improvements in the measures of accuracy, precision and recall when applied to new user cold start situations. The paper includes the mathematical formalization describing how to obtain the main quality measures of a recommender system using leave-one-out cross validation.
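
A minimal sketch of the leave-one-out procedure the formalization refers to, with a toy item-mean predictor standing in for the similarity-based collaborative filter; the `predict` interface and the data are assumptions for illustration.

```python
def leave_one_out_mae(ratings, predict):
    """ratings: dict mapping (user, item) -> rating."""
    errors = []
    for (u, i), r in ratings.items():
        rest = {k: v for k, v in ratings.items() if k != (u, i)}
        p = predict(u, i, rest)        # prediction without the held-out vote
        if p is not None:
            errors.append(abs(p - r))
    return sum(errors) / len(errors) if errors else None

# Toy predictor: the item's mean rating among the remaining votes.
def item_mean(u, i, rest):
    vals = [v for (uu, ii), v in rest.items() if ii == i]
    return sum(vals) / len(vals) if vals else None

data = {("a", "x"): 4, ("b", "x"): 5, ("a", "y"): 2, ("b", "y"): 3}
print(leave_one_out_mae(data, item_mean))   # 1.0 on this toy data
```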

444 citations


Journal ArticleDOI
TL;DR: Experiments show that over-sampling the minority class consistently outperforms under-sampling the majority class when data sets are strongly imbalanced, whereas there are no significant differences for databases with a low imbalance.
Abstract: The present paper investigates the influence of both the imbalance ratio and the classifier on the performance of several resampling strategies for dealing with imbalanced data sets. The study focuses on evaluating how learning is affected when different resampling algorithms transform the originally imbalanced data into artificially balanced class distributions. Experiments over 17 real data sets using eight different classifiers, four resampling algorithms and four performance evaluation measures show that over-sampling the minority class consistently outperforms under-sampling the majority class when data sets are strongly imbalanced, whereas there are no significant differences for databases with a low imbalance. Results also indicate that the classifier has very little influence on the effectiveness of the resampling strategies.
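
For concreteness, a minimal sketch of random over-sampling and random under-sampling, the two baseline families being compared; the paper's actual resampling algorithms may be more sophisticated than these.

```python
import random

def random_oversample(majority, minority):
    # Duplicate random minority examples until the classes are level.
    extra = [random.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def random_undersample(majority, minority):
    # Discard random majority examples until the classes are level.
    return random.sample(majority, len(minority)) + minority

maj = [("maj", i) for i in range(100)]
mino = [("min", i) for i in range(10)]
print(len(random_oversample(maj, mino)), len(random_undersample(maj, mino)))  # 200 20
```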

283 citations


Journal ArticleDOI
TL;DR: This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification that is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution.
Abstract: High dimensionality of the feature space is one of the most important concerns in text classification problems due to processing time and accuracy considerations. Selection of distinctive features is therefore essential for text classification. This study proposes a novel filter based probabilistic feature selection method, namely distinguishing feature selector (DFS), for text classification. The proposed method is compared with well-known filter approaches including chi square, information gain, Gini index and deviation from Poisson distribution. The comparison is carried out for different datasets, classification algorithms, and success measures. Experimental results explicitly indicate that DFS offers a competitive performance with respect to the abovementioned approaches in terms of classification accuracy, dimension reduction rate and processing time.
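
A sketch of the DFS score computed from document counts, assuming the commonly cited form DFS(t) = sum_i P(C_i|t) / (P(not-t|C_i) + P(t|not-C_i) + 1); check the paper before relying on this exact formula.

```python
def dfs_score(n_t_c, n_c, n_total):
    """n_t_c[i]: class-i docs containing term t; n_c[i]: docs in class i."""
    n_t = sum(n_t_c)
    score = 0.0
    for i in range(len(n_c)):
        p_c_given_t = n_t_c[i] / n_t if n_t else 0.0
        p_not_t_given_c = (n_c[i] - n_t_c[i]) / n_c[i]
        n_other = n_total - n_c[i]
        p_t_given_not_c = (n_t - n_t_c[i]) / n_other if n_other else 0.0
        score += p_c_given_t / (p_not_t_given_c + p_t_given_not_c + 1.0)
    return score

# A term concentrated in one class outscores a term spread evenly.
print(dfs_score([45, 2], [50, 50], 100))   # ~0.86
print(dfs_score([25, 25], [50, 50], 100))  # 0.5
```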

278 citations


Journal ArticleDOI
TL;DR: An approach for multi-criteria decision making under an intuitionistic fuzzy environment is developed, and an example is given to show the behavior of the proposed operators.
Abstract: The Archimedean t-conorm and t-norm are generalizations of many other t-conorms and t-norms, such as the Algebraic, Einstein, Hamacher and Frank t-conorms and t-norms, and some of them have been applied to the intuitionistic fuzzy set, which is characterized by three functions (the membership function, the non-membership function and the hesitancy function) and thus describes uncertainty and fuzziness more objectively. Recently, Beliakov et al. [3] constructed some operations on intuitionistic fuzzy sets based on the Archimedean t-conorm and t-norm, from which an aggregation principle is proposed for intuitionistic fuzzy information. In this paper, we propose some other operations on intuitionistic fuzzy sets, study their properties and relationships and, on this basis, study the properties of the aggregation principle proposed by Beliakov et al. [3], and give some specific intuitionistic fuzzy aggregation operators, which can be considered as extensions of the known ones. Finally, we develop an approach for multi-criteria decision making under an intuitionistic fuzzy environment, and give an example to show the behavior of the proposed operators.
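
For illustration, the algebraic and Einstein instances of t-norm-based addition of intuitionistic fuzzy numbers (mu, nu), two standard special cases of the Archimedean construction the paper builds on; the example values are arbitrary.

```python
def add_algebraic(a, b):
    (m1, n1), (m2, n2) = a, b
    # Algebraic sum for memberships, algebraic product for non-memberships.
    return (m1 + m2 - m1 * m2, n1 * n2)

def add_einstein(a, b):
    (m1, n1), (m2, n2) = a, b
    mu = (m1 + m2) / (1 + m1 * m2)                 # Einstein sum
    nu = (n1 * n2) / (1 + (1 - n1) * (1 - n2))     # Einstein product
    return (mu, nu)

a, b = (0.6, 0.3), (0.5, 0.4)
print(add_algebraic(a, b))   # (0.8, 0.12)
print(add_einstein(a, b))    # the two t-conorm choices differ slightly
```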

251 citations


Journal ArticleDOI
TL;DR: Although ANN outperforms SVM when balanced learning is absent, the performance of the two classifiers becomes very comparable when both balanced learning and optimized decision making are employed, and this fully validates the effectiveness of the proposed method for the successful classification of clustered microcalcifications.
Abstract: Classification of microcalcification clusters from mammograms plays essential roles in computer-aided diagnosis for early detection of breast cancer, where support vector machine (SVM) and artificial neural network (ANN) are two commonly used techniques. Although some studies suggest that SVM performs better than ANN, the average accuracy achieved is only around 80% in terms of the area under the receiver operating characteristic curve, Az. This performance may become much worse when the training samples are imbalanced. As a result, a new strategy, namely balanced learning with optimized decision making, is proposed to enable effective learning from imbalanced samples, which is further employed to evaluate the performance of ANN and SVM in this context. When the proposed learning strategy is applied to individual classifiers, the results on the DDSM database have demonstrated that the performance of both ANN and SVM has been significantly improved. Although ANN outperforms SVM when balanced learning is absent, the performance of the two classifiers becomes very comparable when both balanced learning and optimized decision making are employed. Consequently, an average improvement of more than 10% in both the F1 score and Az is achieved for the two classifiers. This has fully validated the effectiveness of our proposed method for the successful classification of clustered microcalcifications.

232 citations


Journal ArticleDOI
TL;DR: A novel integrated index called the Glaucoma Risk Index (GRI), made up of HOS and DWT features, is proposed to diagnose the unknown class using a single feature; it is hoped that this GRI will aid clinicians in making a faster glaucoma diagnosis during the mass screening of normal/glaucoma images.
Abstract: Eye images provide an insight into important parts of the visual system, and also indicate the health of the entire human body. Glaucoma is one of the most common causes of blindness. It is a disease in which fluid pressure in the eye increases gradually, damaging the optic nerve and causing vision loss. Robust mass screening may help to extend the symptom-free life of affected patients. The retinal optic nerve fiber layer can be assessed using optical coherence tomography, scanning laser polarimetry (SLP) and Heidelberg Retina Tomography (HRT) scanning methods. These methods are expensive, and hence a novel low-cost automated glaucoma diagnosis system using digital fundus images is proposed. The paper discusses a system for the automated identification of normal and glaucoma classes using Higher Order Spectra (HOS) and Discrete Wavelet Transform (DWT) features. The extracted features are fed to a Support Vector Machine (SVM) classifier with linear, polynomial (order 1, 2 and 3) and Radial Basis Function (RBF) kernels to select the best kernel function for automated decision making. In this work, the SVM classifier with a polynomial kernel of order 2 was able to identify the glaucoma and normal images automatically with an accuracy of 95%, and sensitivity and specificity of 93.33% and 96.67%, respectively. Finally, we propose a novel integrated index called the Glaucoma Risk Index (GRI), which is made up of HOS and DWT features, to diagnose the unknown class using a single feature. We hope that this GRI will aid clinicians in making a faster glaucoma diagnosis during the mass screening of normal/glaucoma images.
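
A sketch of the kernel-selection step using scikit-learn; the random feature matrix stands in for the HOS and DWT features, so the numbers it prints say nothing about the paper's results.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))        # stand-in for the HOS + DWT feature matrix
y = rng.integers(0, 2, size=60)      # 0 = normal, 1 = glaucoma

for name, clf in [("linear", SVC(kernel="linear")),
                  ("poly-1", SVC(kernel="poly", degree=1)),
                  ("poly-2", SVC(kernel="poly", degree=2)),
                  ("poly-3", SVC(kernel="poly", degree=3)),
                  ("rbf", SVC(kernel="rbf"))]:
    # 5-fold cross-validated accuracy for each candidate kernel.
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```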

214 citations


Journal ArticleDOI
TL;DR: This study investigates group decision making under an interval-valued intuitionistic fuzzy environment in which the attributes and experts are in different priority levels, and the proposed operators can capture the prioritization phenomenon among the aggregated arguments.
Abstract: This study investigates group decision making under an interval-valued intuitionistic fuzzy environment in which the attributes and experts are in different priority levels. We first propose some interval-valued intuitionistic fuzzy aggregation operators, such as the interval-valued intuitionistic fuzzy prioritized weighted average (IVIFPWA) operator and the interval-valued intuitionistic fuzzy prioritized weighted geometric (IVIFPWG) operator. These proposed operators can capture the prioritization phenomenon among the aggregated arguments. Then, some of their desirable properties are investigated in detail. Furthermore, an approach to multi-criteria group decision making based on the proposed operators is given under an interval-valued intuitionistic fuzzy environment. Finally, a practical example about talent introduction is provided to illustrate the developed method.

204 citations


Journal ArticleDOI
TL;DR: RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring, and obtain better results than five single classifiers and four popular ensemble classifiers.
Abstract: The decision tree (DT) is one of the most popular classification algorithms in data mining and machine learning. However, the performance of DT-based credit scoring models is often relatively poor compared with other techniques. This is mainly due to two reasons: DT is easily affected by (1) noisy data and (2) redundant attributes of data in the credit scoring setting. In this study, we propose two dual-strategy ensembles of trees: RS-Bagging DT and Bagging-RS DT, which are based on two ensemble strategies, bagging and the random subspace method, to reduce the influence of noisy data and redundant attributes and to achieve relatively high classification accuracy. Two real-world credit datasets are selected to demonstrate the effectiveness and feasibility of the proposed methods. Experimental results reveal that a single DT achieves the lowest average accuracy among five single classifiers, the other four being Logistic Regression Analysis (LRA), Linear Discriminant Analysis (LDA), the Multi-layer Perceptron (MLP) and the Radial Basis Function Network (RBFN). Moreover, RS-Bagging DT and Bagging-RS DT obtain better results than the five single classifiers and four popular ensemble classifiers, i.e., Bagging DT, Random Subspace DT, Random Forest and Rotation Forest. The results show that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring.
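
One plausible way to compose the two strategies with scikit-learn (a sketch, not the authors' implementation): the outer ensemble bootstraps samples while the inner one draws random feature subspaces for each tree; swapping the nesting gives the Bagging-RS flavor. Parameter values are illustrative.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Inner layer: random subspace -- every tree sees all samples but only a
# random half of the features. (On scikit-learn < 1.2 the parameter is
# named base_estimator instead of estimator.)
random_subspace_trees = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=5,
    bootstrap=False,
    max_features=0.5,
    random_state=0,
)
# Outer layer: bagging -- each inner ensemble trains on a bootstrap sample.
rs_bagging_dt = BaggingClassifier(
    estimator=random_subspace_trees,
    n_estimators=10,
    bootstrap=True,
    random_state=0,
)
# rs_bagging_dt.fit(X_train, y_train)  # used like any scikit-learn classifier
```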

202 citations


Journal ArticleDOI
TL;DR: A research model to examine the direct and indirect effects of knowledge management systems quality, KMS self-efficacy, organizational climate and attitude on the intention to share knowledge in the new product development process suggests that attitude is the key factor influencing intention to engage in knowledge sharing.
Abstract: Firms can obtain competitive advantages from their employees' knowledge sharing behaviors. This paper presents a research model to examine the direct and indirect effects of knowledge management systems (KMS) quality, KMS self-efficacy, organizational climate and attitude on the intention to share knowledge in the new product development process. The hypotheses are tested on data collected from 134 major electronic manufacturing firms in Taiwan, using partial least squares regression. The results of the empirical study suggest that attitude is the key factor influencing the intention to engage in knowledge sharing. The more a factor (such as KMS self-efficacy and organizational climate) positively contributes to attitude, the more the factor contributes to knowledge sharing. The findings provide useful insights into how organizations should encourage employees' collaborative behaviors or activities so as to reinforce KMS self-efficacy and create a favorable organizational climate, which will in turn enhance attitude and the intention to engage in knowledge sharing, leading to benefits for the organization as a whole.

Journal ArticleDOI
TL;DR: This work aims to design an incremental CF recommender based on the Regularized Matrix Factorization (RMF), and first simplifies the training rule of RMF to propose the SI-RMF, which provides a simple mathematical form for further investigation.
Abstract: Matrix-Factorization (MF) based models have become popular when building Collaborative Filtering (CF) recommenders, due to their high accuracy and scalability. However, most current MF-based models are batch models that are incapable of being incrementally updated, while in real-world applications users expect quick responses from the system once they have given feedback. In this work, we aim to design an incremental CF recommender based on the Regularized Matrix Factorization (RMF). To achieve this objective, we first simplify the training rule of RMF to propose the SI-RMF, which provides a simple mathematical form for further investigation, whereby we design two incremental RMF models, namely the Incremental RMF (IRMF) and the Incremental RMF with linear biases (IRMF-B). The experiments on two large, real datasets suggest positive results, which confirm the efficiency of our strategy.
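
A minimal sketch of the stochastic-gradient update at the core of RMF-style models, which is what makes incremental variants attractive: a new rating touches only one user factor and one item factor. The learning rate, regularization and dimensionality below are illustrative.

```python
import numpy as np

def rmf_update(p_u, q_i, r, lr=0.01, reg=0.02):
    err = r - p_u @ q_i                         # error on the incoming rating
    p_new = p_u + lr * (err * q_i - reg * p_u)  # user-factor gradient step
    q_new = q_i + lr * (err * p_u - reg * q_i)  # item-factor gradient step
    return p_new, q_new

rng = np.random.default_rng(0)
p, q = rng.normal(0, 0.1, 8), rng.normal(0, 0.1, 8)
for _ in range(200):                            # replay one observed rating
    p, q = rmf_update(p, q, r=4.0)
print(float(p @ q))                             # prediction approaches 4.0
```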

Journal ArticleDOI
TL;DR: Experimental results show that the proposed model outperforms the single BPN model without EMD preprocessing and the traditional autoregressive integrated moving average (ARIMA) models.
Abstract: Due to the fluctuation and complexity of the tourism industry, it is difficult to capture its non-stationary property and accurately describe its moving tendency. In this study, a novel forecasting model based on empirical mode decomposition (EMD) and neural networks is proposed to predict tourism demand (i.e. the number of arrivals). The proposed approach first uses EMD, which can adaptively decompose the complicated raw data into a finite set of intrinsic mode functions (IMFs) and a residue, which have simpler frequency components and higher correlations. The IMF components and residue are then modeled and forecast using a back-propagation neural network (BPN), and the final forecasting value is obtained as the sum of these prediction results. In order to evaluate the performance of the proposed approach, the majority of international visitors to Taiwan are used as illustrative examples. Experimental results show that the proposed model outperforms the single BPN model without EMD preprocessing and the traditional autoregressive integrated moving average (ARIMA) models.
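
A sketch of the decompose-forecast-recombine pipeline, assuming the third-party PyEMD package for the EMD step and scikit-learn's MLPRegressor as a stand-in for the BPN; the synthetic series and lag length are illustrative, not the Taiwan arrivals data.

```python
import numpy as np
from PyEMD import EMD                      # pip install EMD-signal
from sklearn.neural_network import MLPRegressor

t = np.arange(200, dtype=float)
rng = np.random.default_rng(0)
series = 100 + 10 * np.sin(t / 6) + 0.3 * t + rng.normal(0, 1, 200)

components = EMD()(series)                 # IMFs (the last row acts as residue)
lag = 12
forecast = 0.0
for comp in components:
    # Train one small network per component on its own lagged values.
    X = np.array([comp[i:i + lag] for i in range(len(comp) - lag)])
    y = comp[lag:]
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    net.fit(X, y)
    forecast += net.predict(comp[-lag:].reshape(1, -1))[0]
print(round(forecast, 1))                  # sum of the component forecasts
```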

Journal ArticleDOI
TL;DR: This article investigates the group decision making problems in which all the information provided by the decision makers is expressed as IT2 fuzzy decision matrices, and the information about attribute weights is partially known and may take various forms.
Abstract: Interval type-2 fuzzy sets (IT2 FSs) are a very useful means to depict the decision information in the process of decision making. In this article, we investigate the group decision making problems in which all the information provided by the decision makers (DMs) is expressed as IT2 fuzzy decision matrices, and the information about attribute weights is partially known and may take various forms. We first use the IT2 fuzzy weighted arithmetic averaging operator to aggregate all individual IT2 fuzzy decision matrices provided by the DMs into the collective IT2 fuzzy decision matrix, then we utilize the ranking-value measure to calculate the ranking value of each attribute value and construct the ranking-value matrix of the collective IT2 fuzzy decision matrix. Based on the ranking-value matrix and the given attribute weight information, we establish some optimization models to determine the weights of attributes. Furthermore, we utilize the obtained attribute weights and the IT2 fuzzy weighted arithmetic average operator to fuse the IT2 fuzzy information in the collective IT2 fuzzy decision matrix to get the overall IT2 fuzzy values of alternatives, by which the ranking of all the given alternatives can be found. Finally, we give an illustrative example.

Journal ArticleDOI
TL;DR: A topic oriented community detection approach which combines both social objects clustering and link analysis, and achieves better performance when the topics are at least as important to the analysis as the links.
Abstract: Community detection is an important issue in social network analysis. Most existing methods detect communities by analyzing the linkage of the network. The drawback is that each community identified by those methods can only reflect the strength of connections, but it cannot reflect semantics such as the interesting topics shared by people. To address this problem, we propose a topic oriented community detection approach which combines both social objects clustering and link analysis. We first use a subspace clustering algorithm to group all the social objects into topics. Then we divide the members that are involved in those social objects into topical clusters, each corresponding to a distinct topic. In order to differentiate the strength of connections, we perform a link analysis on each topical cluster to detect the topical communities. Experiments on real data sets have shown that our approach was able to identify more meaningful communities. The quantitative evaluation indicated that our approach achieves better performance when the topics are at least as important to the analysis as the links.

Journal ArticleDOI
TL;DR: An algorithm is proposed that iteratively alternates between filter ranking construction and wrapper feature subset selection (FSS), showing an impressive reduction in the number of wrapper evaluations without degrading the quality of the obtained subset.
Abstract: This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. Recently the literature has contained numerous references to the use of hybrid selection algorithms: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Though they work well, these methods still have problems: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account interaction between the variables already included in the selected subset and the remaining ones. Here we propose a new approach whose main goal is to drastically reduce the number of wrapper evaluations while maintaining good performance (e.g. accuracy and size of the obtained subset). To do this we propose an algorithm that iteratively alternates between filter ranking construction and wrapper feature subset selection (FSS). Thus, the FSS only uses the first block of ranked attributes, and the ranking method uses the currently selected subset in order to build a new ranking in which this knowledge is considered. The algorithm terminates when no new attribute is selected in the last call to the FSS algorithm. The main advantage of this approach is that only a few blocks of variables are analyzed, and so the number of wrapper evaluations decreases drastically. The proposed method is tested over eleven high-dimensional datasets (2400-46,000 variables) using different classifiers. The results show an impressive reduction in the number of wrapper evaluations without degrading the quality of the obtained subset.
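
The alternating loop can be sketched as follows, with `rank` and `wrapper_select` as stand-ins for the paper's ranking criterion and incremental wrapper search; the block size and the toy demo are illustrative.

```python
def alternating_fss(attributes, rank, wrapper_select, block=50):
    selected = []
    while True:
        # Re-rank the remaining attributes conditioned on what is selected.
        remaining = [a for a in attributes if a not in selected]
        ranking = rank(remaining, selected)
        # The wrapper search only inspects the first block of the ranking.
        added = wrapper_select(selected, ranking[:block])
        if not added:               # stop: the last wrapper call added nothing
            return selected
        selected.extend(added)

# Toy demo: rank by attribute index; the "wrapper" keeps up to three even
# attributes per call until six are selected.
attrs = list(range(200))
rank = lambda rem, sel: sorted(rem)
wrap = lambda sel, cand: ([a for a in cand if a % 2 == 0][:3]
                          if len(sel) < 6 else [])
print(alternating_fss(attrs, rank, wrap))   # [0, 2, 4, 6, 8, 10]
```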

Journal ArticleDOI
Jinchao Ji, Wei Pang, Chunguang Zhou, Xiao Han, Zhe Wang
TL;DR: A new measure based on co-occurrence of values to evaluate the dissimilarity between data objects and prototypes of clusters is employed, which takes into account the significance of different attributes towards the clustering process.
Abstract: In many applications, data objects are described by both numeric and categorical features. The k-prototype algorithm is one of the most important algorithms for clustering this type of data. However, this method performs a hard partition, which may lead to misclassification for data objects near the boundaries of regions, and its dissimilarity measure only uses a user-given parameter for adjusting the significance of the attributes. In this paper, we first combine the mean and fuzzy centroid to represent the prototype of a cluster, and employ a new measure based on co-occurrence of values to evaluate the dissimilarity between data objects and prototypes of clusters. This measure also takes into account the significance of different attributes towards the clustering process. Then we present our algorithm for clustering mixed data. Finally, the performance of the proposed method is demonstrated by a series of experiments on four real world datasets in comparison with that of traditional clustering algorithms.

Journal ArticleDOI
TL;DR: A hybrid intelligent model for stock exchange index prediction using a combination of data preprocessing methods, genetic algorithms and Levenberg-Marquardt (LM) algorithm for learning feed forward neural networks is proposed.
Abstract: Artificial intelligence (AI) models, which computerize human reasoning, have found a challenging test bed for various paradigms in many areas, including financial time series prediction. Extensive research has resulted in numerous financial applications using AI models. Since stock investment is a major investment activity, a lack of accurate information and comprehensive knowledge can result in certain loss of investment. Hence, stock market prediction has always been a subject of interest for most investors and professional analysts. Stock market prediction is a challenging problem because uncertainties are always involved in the market movements. This paper proposes a hybrid intelligent model for stock exchange index prediction. The proposed model is a combination of data preprocessing methods, genetic algorithms and the Levenberg-Marquardt (LM) algorithm for learning feed-forward neural networks. It uses a genetic algorithm to evolve the initial weights of the neural network, which are then tuned with the LM algorithm. We also use data preprocessing methods such as data transformation and input variable selection to improve the accuracy of the model. The capability of the proposed method is tested by applying it to predict stock exchange indices used in the literature. The results show that the proposed approach is able to cope with the fluctuations of stock market values and also yields good prediction accuracy. It can therefore be used to model complex relationships between inputs and outputs or to find data patterns while performing financial prediction.

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the rough decision entropy measure and the interval approximation roughness measure are effective and valid for evaluating the uncertainty of interval-valued decision systems.
Abstract: Uncertainty measures can supply new points of view for analyzing data and help us to disclose the substantive characteristics of data sets. Some uncertainty measures for single-valued information systems or single-valued decision systems have been developed. However, there are few studies on the uncertainty measurement for interval-valued information systems or interval-valued decision systems. This paper addresses the uncertainty measurement problem in interval-valued decision systems. An extended conditional entropy is proposed in interval-valued decision systems based on the possible degree between interval values. Consequently, a concept called rough decision entropy is introduced to evaluate the uncertainty of an interval-valued decision system. Besides, the original approximation accuracy measure proposed by Pawlak is extended to deal with interval-valued decision systems and the concept of interval approximation roughness is presented. Experimental results demonstrate that the rough decision entropy measure and the interval approximation roughness measure are effective and valid for evaluating the uncertainty of interval-valued decision systems. Experimental results also indicate that the rough decision entropy measure outperforms the interval approximation roughness measure.

Journal ArticleDOI
TL;DR: A new automatic identification system has been designed to identify insect specimen images at the order level with good stability and accuracy, and results from tests using the support vector machine further improved accuracy.
Abstract: A new automatic identification system has been designed to identify insect specimen images at the order level. Several relative features were designed according to methods of digital image processing and pattern recognition and the theory of taxonomy. Artificial neural networks (ANNs) and a support vector machine (SVM) are used as pattern recognition methods for the identification tests. In tests on nine common orders and sub-orders with an artificial neural network, the system performed with good stability, and accuracy reached 93%. Results from tests using the support vector machine further improved accuracy. We also ran tests on eight and nine orders with different features, and based on these results we compare the advantages and disadvantages of our system and offer some advice for future research on insect image recognition.

Journal ArticleDOI
TL;DR: An application of the new approach in a multiple attribute group decision making problem concerning the evaluation of university faculty for tenure and promotion is developed.
Abstract: We introduce a wide range of linguistic generalized power aggregation operators. First, we present the generalized power average (GPA) operator and the generalized power ordered weighted average (GPOWA) operator. Then we extend the GPA operator and the GPOWA operator to linguistic environment and propose the linguistic generalized power average (LGPA) operator, the weighted linguistic generalized power average (WLGPA) operator and the linguistic generalized power ordered weighted average (LGPOWA) operator, which are aggregation functions that use linguistic information and generalized mean in the power average (PA) operator. We give their particular cases such as the linguistic power ordered weighted average (LPOWA) operator, the linguistic power ordered weighted geometric average (LPOWGA) operator, the linguistic power ordered weighted harmonic average (LPOWHA) operator and the linguistic power ordered weighted quadratic average (LPOWQA) operator. Finally, we develop an application of the new approach in a multiple attribute group decision making problem concerning the evaluation of university faculty for tenure and promotion.
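
For context, a minimal sketch of the crisp power average (PA) underlying all of these operators: arguments that are better supported by the rest of the collection receive larger weights. The support function Sup(a, b) = 1 - |a - b| is one common choice for [0, 1]-valued data, not necessarily the paper's.

```python
def power_average(values):
    # T(a_i): total support a_i receives from the other arguments.
    t = []
    for i, a in enumerate(values):
        t.append(sum(1.0 - abs(a - b) for j, b in enumerate(values) if j != i))
    weights = [1.0 + ti for ti in t]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# The outlier 0.1 is weakly supported by the rest of the collection, so it
# is down-weighted relative to the plain arithmetic mean (0.5875).
print(power_average([0.8, 0.7, 0.75, 0.1]))   # ~0.64
```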

Journal ArticleDOI
TL;DR: Compared with other published football forecast models, pi-football not only appears to be exceptionally accurate, but it can also be used to 'beat the bookies'.
Abstract: A Bayesian network is a graphical probabilistic model that represents the conditional dependencies among uncertain variables, which can be both objective and subjective. We present a Bayesian network model for forecasting Association Football matches in which the subjective variables represent the factors that are important for prediction but which historical data fail to capture. The model (pi-football) was used to generate forecasts about the outcomes of English Premier League (EPL) matches during season 2010/11 (but is easily extended to any football league). Forecasts were published online prior to the start of each match. We show that: (a) using an appropriate measure of forecast accuracy, the subjective information improved the model such that posterior forecasts were on a par with bookmakers' performance; (b) using a standard profitability measure with discrepancy levels of at least 5%, the model generates profit under maximum, mean, and common bookmakers' odds, even allowing for the bookmakers' built-in profit margin. Hence, compared with other published football forecast models, pi-football not only appears to be exceptionally accurate, but it can also be used to 'beat the bookies'.
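
The profitability test can be illustrated with a small sketch of a discrepancy-based betting rule, here interpreting the threshold as a relative difference between the model's probability and the bookmaker's implied probability; the paper's exact rule may differ.

```python
def should_bet(model_prob, decimal_odds, discrepancy=0.05):
    implied = 1.0 / decimal_odds   # bookmaker's implied probability (margin included)
    return (model_prob - implied) / implied >= discrepancy

print(should_bet(0.50, decimal_odds=2.30))   # implied ~0.435 -> True, place bet
print(should_bet(0.45, decimal_odds=2.10))   # implied ~0.476 -> False, skip
```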

Journal ArticleDOI
TL;DR: An optimization model to determine attribute weights for MADM problems with incomplete weight information of criteria under an IVIFSs environment is presented, and an extended technique for order preference by similarity to ideal solution (TOPSIS) is suggested to rank all the alternatives.
Abstract: Many authors have investigated multiattribute decision making (MADM) problems under an interval-valued intuitionistic fuzzy sets (IVIFSs) environment. This paper presents an optimization model to determine attribute weights for MADM problems with incomplete weight information of criteria under an IVIFSs environment. In this method, a series of mathematical programming models based on cross-entropy are constructed and eventually transformed into a single mathematical programming model to determine the weights of attributes. In addition, an extended technique for order preference by similarity to ideal solution (TOPSIS) is suggested to rank all the alternatives. Furthermore, an illustrative example is provided to compare the proposed approach with existing methods. Finally, the paper concludes with suggestions for future research.
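
For orientation, a minimal crisp TOPSIS sketch; the paper's version operates on interval-valued intuitionistic fuzzy values with cross-entropy-derived weights, but the ranking logic is the same: prefer alternatives close to the ideal solution and far from the anti-ideal one. All criteria are assumed to be benefit criteria here, and the matrix is illustrative.

```python
import numpy as np

def topsis(matrix, weights):
    m = np.asarray(matrix, dtype=float)
    m = m / np.linalg.norm(m, axis=0)            # vector-normalize each criterion
    v = m * np.asarray(weights)
    ideal, anti = v.max(axis=0), v.min(axis=0)   # best / worst per criterion
    d_pos = np.linalg.norm(v - ideal, axis=1)
    d_neg = np.linalg.norm(v - anti, axis=1)
    return d_neg / (d_pos + d_neg)               # relative closeness: higher is better

scores = topsis([[7, 9, 8], [8, 7, 8], [9, 6, 9]], weights=[0.5, 0.3, 0.2])
print(scores, scores.argmax())                   # best-ranked alternative
```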

Journal ArticleDOI
TL;DR: This study builds an intellectual structure by examining a total of 10,974 publications in the knowledge management (KM) field from 1995 to 2010 and presents a longitudinal analysis of the development of the KM related studies.
Abstract: Visualizing the entire domain of knowledge and tracking the latest developments of an important discipline are challenging tasks for researchers. This study builds an intellectual structure by examining a total of 10,974 publications in the knowledge management (KM) field from 1995 to 2010. Document co-citation analysis, pathfinder network and strategic diagram techniques are applied to provide a dynamic view of the evolution of knowledge management research trends. This study provides a systematic and objective means of exploring the development of the KM discipline. The paper not only draws its findings from a large data set but also presents a longitudinal analysis of the development of KM-related studies. The results of this study reflect that the coverage of key KM papers has expanded into a broad spectrum of disciplines. A discussion of the future of KM research is also provided.

Journal ArticleDOI
TL;DR: This paper proposes a method belonging to the family of nested generalized exemplars that accomplishes learning by storing objects in Euclidean n-space; it outperforms other classic and recent models in accuracy and requires storing fewer generalized exemplars.
Abstract: In supervised classification, we often encounter many real-world problems in which the data do not have an equitable distribution among the different classes of the problem. In such cases, we are dealing with so-called imbalanced data sets. One of the most widely used techniques to deal with this problem consists of preprocessing the data prior to the learning process. This paper proposes a method belonging to the family of nested generalized exemplars that accomplishes learning by storing objects in Euclidean n-space. Classification of new data is performed by computing their distance to the nearest generalized exemplar. The method is optimized by the selection of the most suitable generalized exemplars based on evolutionary algorithms. An experimental analysis is carried out over a wide range of highly imbalanced data sets, using the statistical tests suggested in the specialized literature. The results obtained show that our evolutionary proposal outperforms other classic and recent models in accuracy and requires storing fewer generalized exemplars.
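
A minimal sketch of nearest-generalized-exemplar classification: each exemplar is an axis-parallel hyperrectangle, and the distance from a query point is zero inside the rectangle and grows with the per-dimension overshoot. The rectangles below are illustrative, not learned ones.

```python
def rect_distance(point, lower, upper):
    d = 0.0
    for x, lo, hi in zip(point, lower, upper):
        gap = max(lo - x, 0.0, x - hi)      # 0 whenever x lies within [lo, hi]
        d += gap * gap
    return d ** 0.5

def classify(point, exemplars):
    """exemplars: list of (lower, upper, label) hyperrectangles."""
    return min(exemplars, key=lambda e: rect_distance(point, e[0], e[1]))[2]

rects = [((0, 0), (1, 1), "minority"), ((2, 0), (5, 4), "majority")]
print(classify((1.2, 0.5), rects))          # "minority": that exemplar is nearer
```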

Journal ArticleDOI
Qiong Bao, Da Ruan, Yongjun Shen, Elke Hermans, Davy Janssens
TL;DR: This study proposes an improved hierarchical fuzzy TOPSIS model to combine the multilayer SPIs into one overall index by incorporating experts' knowledge, and implies the feasibility of applying this model to a wide range of performance evaluation and decision-making activities in other fields as well.
Abstract: With ever increasing public awareness of the complicated road safety phenomenon, much more detailed aspects of crash and injury causation, rather than only crash data, are extensively investigated in current road safety research. Safety performance indicators (SPIs), which are causally related to the number of crashes or to the injury consequences of a crash, are being rapidly developed and increasingly used. To measure the multi-dimensional concept of road safety, which cannot be captured by a single indicator, the exploration of a composite road safety performance index is vital for rational decision-making about road safety. In doing so, a proper decision support system is required. In this study, we propose an improved hierarchical fuzzy TOPSIS model to combine the multilayer SPIs into one overall index by incorporating experts' knowledge. Using the number of road fatalities per million inhabitants as a relevant reference, the proposed model provides a promising intelligent decision support system for evaluating the road safety performance of a given set of European countries. It effectively handles experts' linguistic expressions and takes the layered hierarchy of the indicators into account. The comparison of its results with those from the original hierarchical fuzzy TOPSIS model further verifies the robustness of the proposed model, and implies the feasibility of applying it to a wide range of performance evaluation and decision-making activities in other fields as well.

Journal ArticleDOI
TL;DR: This work defines the κ-path edge centrality, a measure of centrality introduced to compute the importance of edges, and proposes an efficient algorithm running in O(κm), where m is the number of edges in the graph, which is feasible for large-scale network analysis.
Abstract: The problem of assigning centrality values to nodes and edges in graphs has been widely investigated in recent years. Recently, a novel measure of node centrality has been proposed, called the κ-path centrality index, which is based on the propagation of messages inside a network along paths consisting of at most κ edges. On the other hand, the importance of computing the centrality of edges was put into evidence as early as the 1970s by Anthonisse and, subsequently, by Girvan and Newman. In this work we generalize the concept of κ-path centrality by defining the κ-path edge centrality, a measure of centrality introduced to compute the importance of edges. We provide an efficient algorithm, running in O(κm), where m is the number of edges in the graph. Thus, our technique is feasible for large-scale network analysis. Finally, the performance of our algorithm is analyzed, discussing the results obtained against large online social network datasets.
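
A sketch of the simulation idea behind κ-path edge centrality: run many random walks of length at most κ and count how often each edge carries the message. The paper's algorithm constrains walks to simple paths and specifies an exact normalization, so treat this as an approximation of the concept rather than the authors' procedure.

```python
import random
from collections import defaultdict

def k_path_edge_centrality(adj, k=4, walks=10000):
    counts = defaultdict(int)
    nodes = list(adj)
    for _ in range(walks):
        u = random.choice(nodes)              # message starts at a random node
        for _ in range(k):                    # ...and travels at most k edges
            if not adj[u]:
                break
            v = random.choice(adj[u])
            counts[frozenset((u, v))] += 1    # this edge carried the message
            u = v
    return {tuple(sorted(e)): c / walks for e, c in counts.items()}

path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(k_path_edge_centrality(path_graph))     # the middle edge scores highest
```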

Journal ArticleDOI
TL;DR: Some fundamental properties of the multigranulation rough set model are considered, and it is shown that both the collection of lower definable sets and that of upper definable sets can form a lattice, but such lattices are not distributive, not complemented and pseudo-complemented in the general case.
Abstract: The original rough set model, i.e., Pawlak's single-granulation rough set model, has been extended to a multigranulation rough set model, where two kinds of multigranulation approximations, i.e., the optimistic and pessimistic approximations, were introduced. In this paper, we consider some fundamental properties of the multigranulation rough set model, and show that (i) both the collection of lower definable sets and that of upper definable sets in the optimistic multigranulation rough set model can form a lattice; such lattices are not distributive, not complemented and pseudo-complemented in the general case, and the collection of definable sets in the optimistic multigranulation rough set model does not even form a lattice in general conditions; (ii) the collection of (lower, upper) definable sets in the optimistic multigranulation rough set model forms a topology on the universe if and only if the optimistic multigranulation rough set model is equivalent to Pawlak's single-granulation rough set model; (iii) in the context of the pessimistic multigranulation rough set model, the collections of three different kinds of definable sets coincide with each other, and they determine a clopen topology on the universe; furthermore, they form a Boolean algebra under the usual set-theoretic operations.
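
The optimistic/pessimistic distinction driving these results is easy to state in code: the optimistic lower approximation requires x's equivalence class to fit inside X under some granulation, the pessimistic one under every granulation. The universe and partitions below are toy data for illustration.

```python
def blocks_of(x, partitions):
    """The block containing x in each partition (granulation)."""
    return [next(b for b in p if x in b) for p in partitions]

def optimistic_lower(X, universe, partitions):
    return {x for x in universe if any(b <= X for b in blocks_of(x, partitions))}

def pessimistic_lower(X, universe, partitions):
    return {x for x in universe if all(b <= X for b in blocks_of(x, partitions))}

U = {1, 2, 3, 4}
P1 = [{1, 2}, {3, 4}]        # granulation induced by one equivalence relation
P2 = [{1}, {2, 3}, {4}]      # granulation induced by another
X = {1, 2, 3}
print(optimistic_lower(X, U, [P1, P2]))    # {1, 2, 3}
print(pessimistic_lower(X, U, [P1, P2]))   # {1, 2}
```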

Journal ArticleDOI
TL;DR: Compared with several representative reducts, the proposed reduction method in incomplete decision systems can provide a mathematical quantitative measure of knowledge uncertainty; it is efficient, and outperforms other available approaches for feature selection from incomplete and complete data sets.
Abstract: Feature selection in large, incomplete decision systems is a challenging problem. To avoid the exponential computation of exhaustive feature selection methods, many heuristic feature selection algorithms have been presented in rough set theory. However, these algorithms are still computationally time-consuming. It is therefore necessary to investigate effective and efficient heuristic algorithms. In this paper, rough entropy-based uncertainty measures are introduced to evaluate the roughness and accuracy of knowledge. Moreover, some of their properties are derived and the relationships among these measures are established. Furthermore, compared with several representative reducts, the proposed reduction method in incomplete decision systems can provide a mathematical quantitative measure of knowledge uncertainty. Then, a heuristic algorithm with low computational complexity is constructed to improve the computational efficiency of feature selection in incomplete decision systems. Experimental results show that the proposed method is indeed efficient, and outperforms other available approaches for feature selection from incomplete and complete data sets.

Journal ArticleDOI
TL;DR: An algorithm named CMRules is proposed for mining a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered, and its performance is compared with an adaptation of an algorithm from the literature that is named CMDeo.
Abstract: Sequential rule mining is an important data mining task used in a wide range of applications. However, current algorithms for discovering sequential rules common to several sequences use very restrictive definitions of sequential rules, which make them unable to recognize that similar rules can describe the same phenomenon. This can have many undesirable effects, such as (1) similar rules that are rated differently, (2) rules that are not found because they are considered uninteresting when taken individually, and (3) rules that are too specific, which makes them less likely to be used for making predictions. In this paper, we address these problems by proposing a more general form of sequential rules such that items in the antecedent and in the consequent of each rule are unordered. We propose an algorithm named CMRules for mining this form of rules. The algorithm proceeds by first finding association rules to prune the search space for items that occur jointly in many sequences. Then it eliminates association rules that do not meet the minimum confidence and support thresholds according to the sequential ordering. We evaluate the performance of CMRules in three different ways. First, we provide an analysis of its time complexity. Second, we compare its performance (in terms of execution time, memory usage and scalability) with an adaptation of an algorithm from the literature that we name CMDeo. For this comparison, we use three real-life public datasets, which have different characteristics and represent three kinds of data. In many cases, results show that CMRules is faster and has better scalability for low support thresholds than CMDeo. Lastly, we report a successful application of the algorithm in a tutoring agent.
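
A minimal sketch of the rule notion used here: a rule X => Y holds in a sequence when all items of X appear somewhere before all items of Y, with X and Y themselves unordered; support and confidence then count sequences. The mining and pruning machinery of CMRules itself is omitted, and the toy database is illustrative.

```python
def rule_holds(seq, antecedent, consequent):
    """seq: list of itemsets (sets) in temporal order."""
    for cut in range(1, len(seq)):
        before = set().union(*seq[:cut])
        after = set().union(*seq[cut:])
        if antecedent <= before and consequent <= after:
            return True
    return False

def support_confidence(db, antecedent, consequent):
    holds = sum(rule_holds(s, antecedent, consequent) for s in db)
    has_x = sum(antecedent <= set().union(*s) for s in db)
    return holds / len(db), (holds / has_x if has_x else 0.0)

db = [[{"a"}, {"b", "c"}, {"d"}], [{"a", "b"}, {"d"}], [{"d"}, {"a"}]]
print(support_confidence(db, {"a"}, {"d"}))   # (0.666..., 0.666...)
```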