
Showing papers in "Knowledge Based Systems in 2011"


Journal ArticleDOI
Harun Uğuz1
TL;DR: A two-stage feature selection and feature extraction approach is used to improve the performance of text categorization, and the proposed model achieves high categorization effectiveness as measured by precision, recall and F-measure.
Abstract: Text categorization is widely used when organizing documents in a digital form. Due to the increasing number of documents in digital form, automated text categorization has become more promising in the last ten years. A major problem of text categorization is its large number of features, most of which are irrelevant noise that can mislead the classifier. Therefore, feature selection is often used in text categorization to reduce the dimensionality of the feature space and to improve performance. In this study, a two-stage feature selection and feature extraction approach is used to improve the performance of text categorization. In the first stage, each term within the document is ranked according to its importance for classification using the information gain (IG) method. In the second stage, the genetic algorithm (GA) and principal component analysis (PCA) feature selection and feature extraction methods are applied separately to the terms ranked in decreasing order of importance, and a dimension reduction is carried out. Thereby, during text categorization, terms of less importance are ignored, and feature selection and extraction methods are applied to the terms of highest importance; thus, the computational time and complexity of categorization are reduced. To evaluate the effectiveness of dimension reduction methods in the proposed model, experiments are conducted using the k-nearest neighbour (KNN) and C4.5 decision tree algorithms on the Reuters-21578 and Classic3 dataset collections. The experimental results show that the proposed model is able to achieve high categorization effectiveness as measured by precision, recall and F-measure.
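
The two-stage pipeline described above (rank terms by an importance score, keep only the top-ranked terms, then apply feature extraction such as PCA before the classifier) can be sketched as follows. This is only an illustrative sketch using scikit-learn: the synthetic data stand in for a document-term matrix, mutual information stands in for information gain, and the cut-off values are assumptions rather than the paper's settings.

# Minimal sketch of the two-stage idea: rank terms by an information-gain-style
# score, keep the top-ranked ones, then apply PCA before a KNN classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=500, n_informative=30,
                           random_state=0)            # stand-in for a term matrix

# Stage 1: keep the 100 highest-scoring terms.
scores = mutual_info_classif(X, y, random_state=0)
top = np.argsort(scores)[::-1][:100]

# Stage 2: PCA feature extraction on the retained terms.
X_reduced = PCA(n_components=20, random_state=0).fit_transform(X[:, top])

print(cross_val_score(KNeighborsClassifier(n_neighbors=5), X_reduced, y).mean())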

431 citations


Journal ArticleDOI
Zeshui Xu1
TL;DR: This paper develops a series of operators for aggregating IFNs, establishes various properties of these power aggregation operators, and applies them to develop some approaches to multiple attribute group decision making with Atanassov's intuitionistic fuzzy information.
Abstract: Intuitionistic fuzzy numbers (IFNs) are very suitable to be used for depicting uncertain or fuzzy information. Motivated by the idea of power aggregation [R.R. Yager, The power average operator, IEEE Transactions on Systems, Man, and Cybernetics-Part A 31 (2001) 724-731], in this paper, we develop a series of operators for aggregating IFNs, establish various properties of these power aggregation operators, and then apply them to develop some approaches to multiple attribute group decision making with Atanassov's intuitionistic fuzzy information. Moreover, we extend these aggregation operators and decision making approaches to interval-valued Atanassov's intuitionistic fuzzy environments.
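
For reference, the power average that motivates these operators (the Yager paper cited above) weights each argument by the support it receives from the others:

\mathrm{PA}(a_1,\ldots,a_n) = \frac{\sum_{i=1}^{n} \bigl(1 + T(a_i)\bigr)\, a_i}{\sum_{i=1}^{n} \bigl(1 + T(a_i)\bigr)}, \qquad T(a_i) = \sum_{\substack{j=1 \\ j \neq i}}^{n} \mathrm{Sup}(a_i, a_j),

where Sup(a_i, a_j) denotes the support for a_i from a_j. The power aggregation operators developed in the paper adapt this weighting scheme to operations on intuitionistic fuzzy numbers.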

411 citations


Journal ArticleDOI
TL;DR: The authors' experiment indicates that RBF optimized by AFSA is an easy-to-use algorithm with considerable accuracy, and of all the combinations tried, BIAS6+MA5+ASY4 was the optimum group with the least errors.
Abstract: Stock index forecasting is a hot issue in the financial arena. As the movements of stock indices are non-linear and subject to many internal and external factors, they pose a great challenge to researchers who try to predict them. In this paper, we select a radial basis function neural network (RBFNN) to train data and forecast the stock indices of the Shanghai Stock Exchange. We introduce the artificial fish swarm algorithm (AFSA) to optimize RBF. To increase forecasting efficiency, a K-means clustering algorithm is optimized by AFSA in the learning process of RBF. To verify the usefulness of our algorithm, we compared the forecasting results of RBF optimized by AFSA, genetic algorithms (GA) and particle swarm optimization (PSO), as well as forecasting results of ARIMA, BP and support vector machine (SVM). Our experiment indicates that RBF optimized by AFSA is an easy-to-use algorithm with considerable accuracy. Of all the combinations we tried in this paper, BIAS6+MA5+ASY4 was the optimum group with the least errors.
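
As a rough sketch of the RBF forecasting pipeline (hidden centres from K-means clustering, output weights from a linear least-squares fit), the following minimal example may help; the synthetic series, lag order, number of centres and RBF width are illustrative assumptions, and the AFSA/GA/PSO optimisation of the clustering step is not reproduced.

# Sketch of an RBF network trained in the usual two-step way: K-means gives the
# hidden centres (the stage the paper optimises with AFSA) and least squares
# gives the output weights.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
t = np.arange(400)
series = np.sin(t / 15.0) + 0.05 * rng.standard_normal(400)   # stand-in for an index

lags = 5
X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
y = series[lags:]

centres = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X).cluster_centers_
width = 1.0                                                    # assumed RBF width

def hidden(A):                                                 # Gaussian activations
    d = np.linalg.norm(A[:, None, :] - centres[None, :, :], axis=2)
    return np.exp(-(d ** 2) / (2 * width ** 2))

H = hidden(X)
w, *_ = np.linalg.lstsq(H, y, rcond=None)                      # output weights
print(np.mean((H @ w - y) ** 2))                               # in-sample error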

350 citations


Journal ArticleDOI
TL;DR: A method for determining the weights of decision makers under a group decision environment, in which each individual's decision information is expressed by a matrix of interval numbers, and the relative closeness to the ideal solution is defined based on Euclidean distance.
Abstract: In this paper, we develop a method for determining the weights of decision makers in a group decision environment, in which each individual's decision information is expressed by a matrix of interval numbers. We define the positive and negative ideal solutions of the group decision, each expressed by a matrix. The positive ideal solution is the average matrix of the group decision, and the negative ideal solution is the matrix with maximum separation from the positive ideal solution. The separation measure of each individual decision from the ideal solutions and the relative closeness to the ideal solution are defined based on Euclidean distance. The weights of the decision makers are then determined in accordance with the values of their relative closeness. Finally, we give an example of an integrated assessment of air quality in Guangzhou during the 16th Asian Games to illustrate in detail the calculation process of the developed approach.
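
Schematically, the weighting scheme described above can be read as a TOPSIS-style relative closeness; the exact Euclidean separation between interval-number matrices is defined in the paper, so the formulas below are only a sketch of the construction:

C_k = \frac{d(R_k, R^{-})}{d(R_k, R^{+}) + d(R_k, R^{-})}, \qquad \lambda_k = \frac{C_k}{\sum_{j=1}^{m} C_j}, \quad k = 1, \ldots, m,

where R_k is the decision matrix of the k-th decision maker, R^{+} and R^{-} are the positive and negative ideal solutions, d is the Euclidean separation measure, and \lambda_k is the resulting weight of decision maker k.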

294 citations


Journal ArticleDOI
TL;DR: The Choquet integral and Dempster-Shafer theory of evidence are applied to aggregate intuitionistic fuzzy information, and some new types of aggregation operators are developed, including the induced generalized intuitionistic fuzzy Choquet integral operators and the induced generalized intuitionistic fuzzy Dempster-Shafer operators.
Abstract: We study the induced generalized aggregation operators under intuitionistic fuzzy environments. The Choquet integral and Dempster-Shafer theory of evidence are applied to aggregate intuitionistic fuzzy information, and some new types of aggregation operators are developed, including the induced generalized intuitionistic fuzzy Choquet integral operators and induced generalized intuitionistic fuzzy Dempster-Shafer operators. Then we investigate their various properties and some of their special cases. Additionally, we apply the developed operators to financial decision making under intuitionistic fuzzy environments. Some extensions to interval-valued intuitionistic fuzzy situations are also pointed out.

293 citations


Journal ArticleDOI
TL;DR: This paper proposes a new hybrid wind speed forecasting method based on a back-propagation (BP) neural network and the idea of eliminating seasonal effects from actual wind speed datasets using seasonal exponential adjustment that can forecast the daily average wind speed one year ahead with lower mean absolute errors.
Abstract: Wind energy, which is intermittent by nature, can have a significant impact on power grid security, power system operation, and market economics, especially in areas with a high level of wind power penetration. Wind speed forecasting has been a vital part of wind farm planning and the operational planning of power grids with the aim of reducing greenhouse gas emissions. Improving the accuracy of wind speed forecasting algorithms has significant technological and economic impacts on these activities, and significant research efforts have addressed this aim recently. However, there is no single best forecasting algorithm that can be applied to any wind farm due to the fact that wind speed patterns can be very different between wind farms and are usually influenced by many factors that are location-specific and difficult to control. In this paper, we propose a new hybrid wind speed forecasting method based on a back-propagation (BP) neural network and the idea of eliminating seasonal effects from actual wind speed datasets using seasonal exponential adjustment. This method can forecast the daily average wind speed one year ahead with lower mean absolute errors compared to figures obtained without adjustment, as demonstrated by a case study conducted using a wind speed dataset collected from the Minqin area in China from 2001 to 2006.
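
The deseasonalise / forecast / re-seasonalise idea behind the hybrid method can be sketched as below. Simple multiplicative seasonal indices and a naive persistence forecast stand in for the paper's seasonal exponential adjustment and BP neural network; the synthetic series, period and horizon are assumptions for illustration only.

# Sketch of the deseasonalise / forecast / re-seasonalise idea used by hybrid
# seasonal-adjustment forecasters.
import numpy as np

rng = np.random.default_rng(0)
period = 12                                    # assumed seasonal period
t = np.arange(10 * period)
series = 5 + 2 * np.sin(2 * np.pi * t / period) + rng.normal(0, 0.3, t.size)

# 1) Seasonal indices: mean at each position in the cycle relative to the overall mean.
indices = np.array([series[t % period == k].mean() for k in range(period)])
indices /= series.mean()

# 2) Remove the seasonal effect before fitting any forecaster (BP network, etc.).
deseasonalised = series / indices[t % period]

# 3) Forecast the adjusted series (a naive persistence forecast stands in here),
#    then restore the seasonal effect.
horizon = np.arange(t.size, t.size + period)
forecast = deseasonalised[-period:].mean() * indices[horizon % period]
print(forecast.round(2))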

285 citations


Journal ArticleDOI
TL;DR: This paper analyzes ontology-based approaches for IC computation and proposes several improvements aimed to better capture the semantic evidence modelled in the ontology for the particular concept.
Abstract: The information content (IC) of a concept provides an estimation of its degree of generality/concreteness, a dimension which enables a better understanding of concept's semantics. As a result, IC has been successfully applied to the automatic assessment of the semantic similarity between concepts. In the past, IC has been estimated as the probability of appearance of concepts in corpora. However, the applicability and scalability of this method are hampered due to corpora dependency and data sparseness. More recently, some authors proposed IC-based measures using taxonomical features extracted from an ontology for a particular concept, obtaining promising results. In this paper, we analyse these ontology-based approaches for IC computation and propose several improvements aimed to better capture the semantic evidence modelled in the ontology for the particular concept. Our approach has been evaluated and compared with related works (both corpora and ontology-based ones) when applied to the task of semantic similarity estimation. Results obtained for a widely used benchmark show that our method enables similarity estimations which are better correlated with human judgements than related works.
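
As context for the ontology-based IC approaches analysed here, a widely used intrinsic formulation estimates a concept's IC from its number of hyponyms in the taxonomy; the improvements proposed in the paper refine estimates of this kind with further taxonomical evidence:

\mathrm{IC}(c) = 1 - \frac{\log\bigl(\mathrm{hypo}(c) + 1\bigr)}{\log\bigl(\mathrm{max\_nodes}\bigr)},

where hypo(c) is the number of hyponyms (descendants) of concept c and max_nodes is the total number of concepts in the taxonomy, so leaf concepts obtain the maximum IC of 1.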

256 citations


Journal ArticleDOI
TL;DR: A hybrid feature selection strategy based on a genetic algorithm and support vector machine (GA-SVM) forms a wrapper that searches for the best combination of bands with higher classification accuracy, while band grouping reduces the computational cost of the genetic algorithm.
Abstract: With the development and popularization of remote-sensing imaging technology, hyperspectral image classification is finding more and more applications, such as target detection and land cover investigation. Selecting a minimal and effective subset from that mass of bands is a challenging issue of urgent importance. This paper proposes a hybrid feature selection strategy based on a genetic algorithm and support vector machine (GA-SVM), which forms a wrapper to search for the best combination of bands with higher classification accuracy. In addition, band grouping based on conditional mutual information between adjacent bands is utilized to counteract the high correlation between bands and to further reduce the computational cost of the genetic algorithm. During the post-processing phase, the branch and bound algorithm is employed to filter out irrelevant band groups. Experimental results on two benchmark data sets show that the proposed approach is very competitive and effective.
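
The wrapper idea (a binary chromosome selects bands, and its fitness is the cross-validated SVM accuracy on the selected bands) can be sketched as below. The data are a synthetic stand-in for hyperspectral bands, and a tiny random bit-flip loop stands in for the genetic algorithm; the band-grouping and branch-and-bound steps are not reproduced.

# Sketch of the GA-SVM wrapper fitness: a boolean mask over bands is scored by
# cross-validated SVM accuracy on the selected bands.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=60, n_informative=10,
                           random_state=0)          # stand-in for band data

def fitness(mask):
    if not mask.any():
        return 0.0
    return cross_val_score(SVC(kernel='rbf'), X[:, mask], y, cv=3).mean()

# A very small random-mutation loop stands in for the genetic algorithm.
rng = np.random.default_rng(0)
best = rng.random(X.shape[1]) < 0.5
best_fit = fitness(best)
for _ in range(30):
    cand = best ^ (rng.random(X.shape[1]) < 0.05)    # flip a few bits
    f = fitness(cand)
    if f > best_fit:
        best, best_fit = cand, f
print(best.sum(), best_fit)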

252 citations


Journal ArticleDOI
TL;DR: This paper considers the situation with intuitionistic fuzzy information and develops an intuitionistic fuzzy ordered weighted distance (IFOWD) operator, which is very suitable for dealing with situations where the input data are represented as intuitionistic fuzzy information.
Abstract: The ordered weighted distance is a new decision-making technique that has proved useful for the treatment of input data in the form of exact numbers. In this paper, we consider the situation with intuitionistic fuzzy information and develop an intuitionistic fuzzy ordered weighted distance (IFOWD) operator. The IFOWD operator is very suitable for dealing with situations where the input data are represented as intuitionistic fuzzy information, and it includes a wide range of distance measures and aggregation operators. We study some of its main properties and different families of IFOWD operators. Finally, we develop an application of the new approach to group decision-making under an intuitionistic fuzzy environment and illustrate it with a numerical example.

212 citations


Journal ArticleDOI
TL;DR: This study proposes a hybrid forecasting model for nonlinear time series by combining ARIMA with genetic programming (GP) to improve upon both the ANN and the ARIMA forecasting models.
Abstract: The autoregressive integrated moving average (ARIMA), which is a conventional statistical method, is employed in many fields to construct models for forecasting time series. Although ARIMA can be adopted to obtain a highly accurate linear forecasting model, it cannot accurately forecast nonlinear time series. An artificial neural network (ANN) can be utilized to construct a more accurate forecasting model than ARIMA for nonlinear time series, but explaining the meaning of its hidden layers is difficult and, moreover, it does not yield a mathematical equation. This study proposes a hybrid forecasting model for nonlinear time series by combining ARIMA with genetic programming (GP) to improve upon both the ANN and the ARIMA forecasting models. Finally, some real data sets are adopted to demonstrate the effectiveness of the proposed forecasting model.
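
The hybrid idea (a linear ARIMA model plus a nonlinear model of its residuals) can be sketched as follows; a regression tree stands in for the genetic programming component, and the synthetic series, model orders and lag count are illustrative assumptions.

# Sketch of the hybrid idea: ARIMA captures the linear part of the series and a
# nonlinear learner models the residuals from their own lags.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
t = np.arange(300)
series = 0.6 * np.sin(t / 8.0) + 0.1 * rng.standard_normal(300)

arima = ARIMA(series, order=(2, 0, 1)).fit()
resid = arima.resid

# Nonlinear model of the residuals from their own lags.
lags = 3
Xr = np.column_stack([resid[i:len(resid) - lags + i] for i in range(lags)])
yr = resid[lags:]
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(Xr, yr)

hybrid_fit = arima.fittedvalues[lags:] + tree.predict(Xr)
print(np.mean((series[lags:] - hybrid_fit) ** 2))     # in-sample error of the hybrid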

204 citations


Journal ArticleDOI
TL;DR: A metric to measure similarity between users, applicable in the collaborative filtering processes carried out in recommender systems, is presented, showing significant improvements in prediction quality, recommendation quality and performance.
Abstract: This paper presents a metric to measure similarity between users, which is applicable in collaborative filtering processes carried out in recommender systems. The proposed metric is formulated via a simple linear combination of values and weights. Values are calculated for each pair of users between which the similarity is obtained, whilst weights are only calculated once, making use of a prior stage in which a genetic algorithm extracts weightings from the recommender system which depend on the specific nature of the data from each recommender system. The results obtained present significant improvements in prediction quality, recommendation quality and performance.

Journal ArticleDOI
TL;DR: This paper defines the notion of regular and maximal association rules between two sets of parameters, together with their support, confidence, maximal support and maximal confidence, using soft set theory, and shows that the soft regular and soft maximal association rules are identical to the regular and maximal association rules.
Abstract: In this paper, we present an alternative approach for mining regular association rules and maximal association rules from transactional datasets using soft set theory. The approach starts with a transformation of a transactional dataset into a Boolean-valued information system. Since the "standard" soft set deals with such an information system, a transactional dataset can be represented as a soft set. Using the concept of parameter co-occurrence in a transaction, we define the notion of regular and maximal association rules between two sets of parameters, together with their support, confidence, maximal support and maximal confidence, using soft set theory. The results show that the soft regular and soft maximal association rules are identical to the regular and maximal association rules.

Journal ArticleDOI
Peide Liu, Fang Jin, Xin Zhang, Yu Su, Minghe Wang1 
TL;DR: With respect to risk decision making problems with interval probability, in which the attribute values take the form of uncertain linguistic variables, a multi-attribute decision making method based on prospect theory is proposed.
Abstract: With respect to risk decision making problems with interval probability, in which the attribute values take the form of uncertain linguistic variables, a multi-attribute decision making method based on prospect theory is proposed. To begin with, the uncertain linguistic variables are transformed into trapezoidal fuzzy numbers, and the prospect value function of the trapezoidal fuzzy number, based on the decision-making reference point of each attribute, and the weight function of interval probability are constructed. Then the prospect value of each attribute for every alternative is calculated through the prospect value function and the weight function of interval probability, the weighted prospect value of each alternative is obtained by the weighted average method according to the attribute weights, and all the alternatives are ranked according to the expected values of the weighted prospect values. Finally, an illustrative example is given to show the decision-making steps, the influence on the decision of different value function parameters and different decision-making reference points, and the feasibility of the method.

Journal ArticleDOI
TL;DR: A proposed model for content-based image retrieval (CBIR) that depends only on extracting the most relevant features according to a feature selection technique, providing precise image retrieval in a short time.
Abstract: This paper presents a proposed model for content-based image retrieval (CBIR) which depends only on extracting the most relevant features according to a feature selection technique. The suggested feature selection technique aims at selecting the optimal features that not only maximize the detection rate but also simplify the computation of the image retrieval process. The proposed model is divided into three main techniques: the first is concerned with feature extraction from the image database, the second performs feature discrimination, and the third is concerned with feature selection from the original features. In the first technique, the 3D color histogram and the Gabor filter algorithm are used to extract the color and texture features, respectively. The second technique depends on a genetic algorithm (GA) for replacing numerical features with nominal features that represent intervals of numerical domains with discrete values; the GA is utilized to obtain the optimal boundaries of these intervals and consequently to reduce the complexity of the feature space. In the third technique, feature selection performs two successive functions, called preliminary and deep reduction, for extracting the most relevant features from the original feature set. Indeed, the main contribution of the proposed model is providing precise image retrieval in a short time.

Journal ArticleDOI
Guiwu Wei1
TL;DR: Dynamic hybrid multiple attribute decision making problems, in which the decision information is expressed in real numbers, interval numbers or linguistic labels, are investigated, and the concept of fuzzy membership grade and clustering is adopted to aggregate the grey relational degrees of all the evaluated periods.
Abstract: In this paper, dynamic hybrid multiple attribute decision making problems, in which the decision information provided by decision makers at different periods is expressed in real numbers, interval numbers or linguistic labels (linguistic labels can be described by triangular fuzzy numbers), are investigated. The method first utilizes three different grey relational analysis (GRA) models (real-valued GRA, interval-valued GRA and fuzzy-valued GRA) to calculate the individual grey relational degree of each alternative to the positive and negative ideal alternatives, based on the decision information expressed in real numbers, interval numbers and linguistic labels, respectively, provided by each decision maker at each period, and then adopts the concept of fuzzy membership grade and clustering to aggregate the grey relational degrees of all the evaluated periods. Finally, an illustrative example is given to verify the developed approach and to demonstrate its practicality and effectiveness.
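
For the real-valued case, grey relational analysis compares each alternative's sequence x_i with the ideal sequence x_0 through the standard grey relational coefficient and degree shown below; the interval-valued and fuzzy-valued variants used in the paper replace the absolute difference with suitable distance measures.

\xi_i(k) = \frac{\min_i \min_k \lvert x_0(k) - x_i(k)\rvert + \rho \,\max_i \max_k \lvert x_0(k) - x_i(k)\rvert}{\lvert x_0(k) - x_i(k)\rvert + \rho \,\max_i \max_k \lvert x_0(k) - x_i(k)\rvert}, \qquad r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k),

where \rho \in (0, 1] is the distinguishing coefficient (commonly taken as 0.5).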

Journal ArticleDOI
Huiling Chen1, Bo Yang1, Gang Wang1, Jie Liu1, Xin Xu1, Su-Jing Wang1, Dayou Liu1 
TL;DR: A novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor (FKNN) method, where the neighborhood size k and the fuzzy strength parameter m are adaptively specified by the continuous particle swarm optimization (PSO) approach, that might serve as a new candidate of powerful early warning systems for bankruptcy prediction with excellent performance.
Abstract: Bankruptcy prediction is one of the most important issues in financial decision-making. Constructing effective corporate bankruptcy prediction models in time is essential to make companies or banks prevent bankruptcy. This study proposes a novel bankruptcy prediction model based on an adaptive fuzzy k-nearest neighbor (FKNN) method, where the neighborhood size k and the fuzzy strength parameter m are adaptively specified by the continuous particle swarm optimization (PSO) approach. In addition to performing the parameter optimization for FKNN, PSO is also utilized to choose the most discriminative subset of features for prediction. Adaptive control parameters including time-varying acceleration coefficients (TVAC) and time-varying inertia weight (TVIW) are employed to efficiently control the local and global search ability of PSO algorithm. Moreover, both the continuous and binary PSO are implemented in parallel on a multi-core platform. The proposed bankruptcy prediction model, named PTVPSO-FKNN, is compared with five other state-of-the-art classifiers on two real-life cases. The obtained results clearly confirm the superiority of the proposed model in terms of classification accuracy, Type I error, Type II error and area under the receiver operating characteristic curve (AUC) criterion. The proposed model also demonstrates its ability to identify the most discriminative financial ratios. Additionally, the proposed model has reduced a large amount of computational time owing to its parallel implementation. Promisingly, PTVPSO-FKNN might serve as a new candidate of powerful early warning systems for bankruptcy prediction with excellent performance.

Journal ArticleDOI
TL;DR: The presented framework of knowledge reduction is for general decision formal contexts, and based on the proposed reduction method, knowledge hidden in a decision formal context can compactly be unravelled in the form of implication rules.
Abstract: This study deals with the problem of knowledge reduction in decision formal contexts. From the perspective of rule acquisition, a new framework of knowledge reduction for decision formal contexts is formulated and a corresponding reduction method is also developed by using the discernibility matrix and Boolean function. The presented framework of knowledge reduction is for general decision formal contexts, and based on the proposed reduction method, knowledge hidden in a decision formal context can compactly be unravelled in the form of implication rules.

Journal ArticleDOI
TL;DR: The results suggest that external variables that affect perceived usefulness, perceived ease of use, and intention to use, need to be considered as important factors in the process of designing, implementing, and operating e-learning systems.
Abstract: This study examines the factors that influence employees' adoption and use of e-learning systems and tests the applicability of the technology acceptance model (TAM) in the organizational context. We examined the relationship of employees' perceptions of their behavioral intention to use e-learning systems in terms of four determinants (individual, organizational, task characteristics, and subjective norm), and further explored the effects of management and organizational support on the subjective norm. Data were collected via 357 valid questionnaires from four industries in Taiwan. The findings indicate that organizational support and management support significantly affected perceived usefulness and intention to use. Individuals' experience with computers and computer self-efficacy had significantly positive effects on perceived ease of use. Task equivocality significantly influenced perceived usefulness. Organizational and management support significantly impacted the subjective norm, perceived usefulness, perceived ease of use, and intention to use. Additionally, the results suggest that external variables that affect perceived usefulness, perceived ease of use, and intention to use need to be considered as important factors in the process of designing, implementing, and operating e-learning systems. The results provide a more comprehensive insight into individual, organizational, and task characteristics in predicting e-learning acceptance behavior in organizational contexts, rarely tested in previous studies. By considering these identified factors, practitioners can take corresponding measures to predict or promote employees' acceptance of e-learning systems more effectively and efficiently. Furthermore, by explaining employees' acceptance behavior, the findings of this research help to develop more user-friendly e-learning systems and provide insight into the best way to promote e-learning systems for employees.

Journal ArticleDOI
TL;DR: This work proposes an alternative method, based on naive Bayes, which removes the requirement for the variables to be normally distributed, but retains the essential structure and other underlying assumptions of the method when dealing with data sets in which non-normal distributions are observed.
Abstract: Many algorithms have been proposed for the machine learning task of classification. One of the simplest methods, the naive Bayes classifier, has often been found to give good performance despite the fact that its underlying assumptions (of independence and a normal distribution of the variables) are perhaps violated. In previous work, we applied naive Bayes and other standard algorithms to a breast cancer database from Nottingham City Hospital in which the variables are highly non-normal and found that the algorithm performed well when predicting a class that had been derived from the same data. However, when we then applied naive Bayes to predict an alternative clinical variable, it performed much worse than other techniques. This motivated us to propose an alternative method, based on naive Bayes, which removes the requirement for the variables to be normally distributed, but retains the essential structure and other underlying assumptions of the method. We tested our novel algorithm on our breast cancer data and on three UCI datasets which also exhibited strong violations of normality. We found our algorithm outperformed naive Bayes in all four cases and outperformed multinomial logistic regression (MLR) in two cases. We conclude that our method offers a competitive alternative to MLR and naive Bayes when dealing with data sets in which non-normal distributions are observed.
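
The abstract does not spell out the exact modification, but one common way to keep the naive Bayes structure while dropping the normality assumption is to replace each per-class, per-feature Gaussian with a kernel density estimate, as in the hedged sketch below; the Wisconsin breast cancer data from scikit-learn and a fixed bandwidth are stand-ins for illustration only, not the paper's construction.

# Hedged sketch: naive Bayes with per-class, per-feature kernel density estimates
# instead of Gaussians (independence assumption retained).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KernelDensity

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

classes = np.unique(y_tr)
priors = {c: np.mean(y_tr == c) for c in classes}
kdes = {c: [KernelDensity(bandwidth=1.0).fit(X_tr[y_tr == c, j:j + 1])
            for j in range(X_tr.shape[1])] for c in classes}

def predict(x):
    # Log prior plus sum of per-feature log densities, per class.
    scores = {c: np.log(priors[c]) +
                 sum(kdes[c][j].score_samples(x[j:j + 1].reshape(1, 1))[0]
                     for j in range(x.size))
              for c in classes}
    return max(scores, key=scores.get)

pred = np.array([predict(x) for x in X_te])
print((pred == y_te).mean())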

Journal ArticleDOI
TL;DR: This work compares and evaluates how the length of available trust paths and the aggregation methods affect prediction accuracy, and then proposes the best strategy to maximize prediction accuracy.
Abstract: Trust plays a critical role in determining social interactions in both online and offline networks, and reduces information overload, uncertainty and risk from unreliable users. In a social network, even if two users are not directly connected, one user can still trust the other if there exists at least one path between the two users through friendship networks. This is the result of trust propagation based on the transitivity property of trust, which is "A trusts B and B trusts C, so A will trust C". It is important to provide a trust inference model that finds reliable trust paths from a source user to an unknown target user, and that systematically combines multiple trust paths leading to a target user. We propose strategies for estimating the level of trust based on Reinforcement Learning, which is particularly well suited to predicting a long-term goal (i.e. the indirect trust value of a long-distance user) from short-term rewards (i.e. direct trust values between directly connected users). In other words, we compare and evaluate how the length of available trust paths and the aggregation methods affect prediction accuracy, and then propose the best strategy to maximize prediction accuracy.

Journal ArticleDOI
TL;DR: It is shown that in an incomplete information system, smaller upper approximations can be obtained by neighborhood-system-based rough sets than by the methods of Leung et al., and a new knowledge operation is discussed in the neighborhood system, from which more knowledge can be derived from the initial neighborhood system.
Abstract: The neighborhood system formalizes the ancient intuition of infinitesimals, which led to the invention of calculus, topology and non-standard analysis. In this paper, the neighborhood system is studied from the viewpoint of knowledge engineering, and each neighborhood is considered as a basic unit carrying knowledge. Using the knowledge in the neighborhood system, rough approximations and their corresponding properties are discussed. It is shown that in an incomplete information system, smaller upper approximations can be obtained by neighborhood-system-based rough sets than by the methods in [Y. Leung, D.Y. Li, Maximal consistent block technique for rule acquisition in incomplete information systems, Information Sciences 115 (2003) 85-106] and [Y. Leung, W.Z. Wu, W.X. Zhang, Knowledge acquisition in incomplete information systems: a rough set approach, European Journal of Operational Research 168 (2006) 164-180]. Furthermore, a new knowledge operation is discussed in the neighborhood system, from which more knowledge can be derived from the initial neighborhood system. By such operations, the regions of the lower and upper approximations are further expanded and narrowed, respectively. Some numerical examples are employed to substantiate the conceptual arguments.

Journal ArticleDOI
TL;DR: Experiments show that given a suitable number of virtual sample replicates, the generalization ability of the classifiers on the new training sets can be better than that on the original training sets.
Abstract: Traditional machine learning algorithms do not achieve satisfactory generalization ability on noisy, imbalanced, and small-sample training sets. In this work, a novel virtual sample generation (VSG) method based on the Gaussian distribution is proposed. First, the method determines the mean and the standard error of the Gaussian distribution. Then, virtual samples are generated from this Gaussian distribution. Finally, a new training set is constructed by adding the virtual samples to the original training set. This work shows that training on the new training set is equivalent to a form of regularization for small sample problems, or to cost-sensitive learning for imbalanced sample problems. Experiments show that, given a suitable number of virtual sample replicates, the generalization ability of classifiers on the new training sets can be better than that on the original training sets.
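
The core of the VSG procedure described above can be sketched in a few lines: per class, estimate the mean and spread of each feature, draw virtual samples from the fitted Gaussian, and append them to the original training set. The synthetic data and the number of replicates below are illustrative assumptions.

# Sketch of Gaussian virtual sample generation for a small training set.
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=40, n_features=10, random_state=0)  # small training set
rng = np.random.default_rng(0)

X_parts, y_parts = [X], [y]
for c in np.unique(y):
    Xc = X[y == c]
    mu, sigma = Xc.mean(axis=0), Xc.std(axis=0)        # Gaussian parameters per feature
    virtual = rng.normal(mu, sigma, size=(3 * len(Xc), Xc.shape[1]))  # 3 virtual replicates
    X_parts.append(virtual)
    y_parts.append(np.full(len(virtual), c))

X_aug, y_aug = np.vstack(X_parts), np.concatenate(y_parts)
print(X.shape, "->", X_aug.shape)                       # original vs augmented training set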

Journal ArticleDOI
TL;DR: A novel quantified SWOT analytical method based on the multiple criteria group decision-making (MCGDM) concept, in which the priorities of SWOT factors and groups are derived by multiple decision makers with nonhomogeneous uncertain preference information (NUPI), such as interval multiplicative preference relations, interval fuzzy preference relations, and uncertain linguistic preference relations.
Abstract: SWOT analysis is an important support tool for decision-making and is commonly used to systematically analyze organizations' internal and external environments. However, one of its deficiencies lies in the measurement and evaluation of the priorities of factors and strategies. This paper presents a novel quantified SWOT analytical method based on the multiple criteria group decision-making (MCGDM) concept, in which the priorities of SWOT factors and groups are derived by multiple decision makers (DMs) with nonhomogeneous uncertain preference information (NUPI), such as interval multiplicative preference relations, interval fuzzy preference relations, and uncertain linguistic preference relations. In this method, SWOT analysis provides a basic frame within which to analyze decision situations; in turn, MCGDM methods assist in carrying out SWOT more analytically and in elaborating the results of the analyses so that SWOT factors and groups can be prioritized with respect to the entire SWOT. The unification and aggregation of the NUPI and the derivation of priorities for SWOT groups and factors are investigated in detail. Finally, an example is given to validate the procedure of the proposed method.

Journal ArticleDOI
TL;DR: In the proposed framework of TiWS pattern mining, the weight of each sequence in a sequence database is first obtained from the time-intervals of elements in the sequence, and subsequently TiWS patterns are found considering the weight.
Abstract: Sequential pattern mining, including weighted sequential pattern mining, has been attracting much attention since it is one of the essential data mining tasks with broad applications. Weighted sequential pattern mining aims to find more interesting sequential patterns by considering the different significance of each data element in a sequence database. In conventional weighted sequential pattern mining, pre-assigned weights of data elements, derived from their quantitative information and their importance in real-world application domains, are usually used to express importance. In general sequential pattern mining, the generation order of data elements is considered to find sequential patterns. However, their generation times and time-intervals are also important in real-world application domains. Therefore, time-interval information of data elements can be helpful in finding more interesting sequential patterns. This paper presents a new framework for finding time-interval weighted sequential (TiWS) patterns in a sequence database, and time-interval weighted support (TiW-support) for finding the TiWS patterns. In addition, a new method for mining TiWS patterns in a sequence database is also presented. In the proposed framework of TiWS pattern mining, the weight of each sequence in the sequence database is first obtained from the time-intervals of elements in the sequence, and subsequently TiWS patterns are found considering the weight. A series of evaluation results shows that TiWS pattern mining is efficient and helpful in finding more interesting sequential patterns.

Journal ArticleDOI
TL;DR: An approach for ontology extraction on top of an RDB that incorporates concept hierarchy as background knowledge; it is more efficient than current approaches and can be applied in fields such as eGovernment and eCommerce.
Abstract: The relational database (RDB) has been widely used as the back-end database of information systems. Containing a wealth of high-quality information, an RDB provides the conceptual model and metadata needed in ontology construction. However, most existing ontology building approaches convert the RDB schema without considering the knowledge residing in the database. This paper proposes an approach for ontology extraction on top of an RDB that incorporates concept hierarchy as background knowledge. Incorporating background knowledge in the building process of a Web Ontology Language (OWL) ontology gives two main advantages: (1) it accelerates the building process, thereby minimizing the conversion cost; (2) the background knowledge guides the extraction of the knowledge residing in the database. An experimental simulation using a gold standard shows that the Taxonomic F-measure (TF) evaluation reaches 90% while Relation Overlap (RO) is 83.33%. In terms of processing time, this approach is more efficient than current approaches. In addition, our approach can be applied in fields such as eGovernment, eCommerce and so on.

Journal ArticleDOI
TL;DR: The experiment results show that the SVMFW model can reduce unnecessary information, satisfactorily detect FFS, and provide directions for properly allocating audit resources in limited audits.
Abstract: Detecting fraudulent financial statements (FFS) is critical in order to protect the global financial market. In recent years, FFS have begun to appear and continue to grow rapidly, which has shocked the confidence of investors and threatened the economics of entire countries. While auditors are the last line of defense to detect FFS, many auditors lack the experience and expertise to deal with the related risks. This study introduces a support vector machine-based fraud warning (SVMFW) model to reduce these risks. The model integrates sequential forward selection (SFS), support vector machine (SVM), and a classification and regression tree (CART). SFS is employed to overcome information overload problems, and the SVM technique is then used to assess the likelihood of FFS. To select the parameters of SVM models, particle swarm optimization (PSO) is applied. Finally, CART is employed to enable auditors to increase substantive testing during their audit procedures by adopting reliable, easy-to-grasp decision rules. The experiment results show that the SVMFW model can reduce unnecessary information, satisfactorily detect FFS, and provide directions for properly allocating audit resources in limited audits. The model is a promising alternative for detecting FFS caused by top management, and it can assist in both taxation and the banking system.
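
The first two stages of the SVMFW pipeline (sequential forward selection of financial ratios, followed by an SVM on the selected ratios) can be sketched with scikit-learn as below; the PSO parameter search and the CART rule-extraction stage are omitted, and the synthetic data and parameter values are assumptions for illustration.

# Sketch: sequential forward selection wrapped around an SVM classifier.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=25, n_informative=8,
                           random_state=0)            # stand-in for financial ratios

svm = SVC(kernel='rbf', C=1.0, gamma='scale')
sfs = SequentialFeatureSelector(svm, n_features_to_select=8, direction='forward', cv=3)
X_sel = sfs.fit_transform(X, y)

print(sfs.get_support().nonzero()[0])                 # indices of the selected ratios
print(cross_val_score(svm, X_sel, y, cv=5).mean())    # accuracy on the selected subset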

Journal ArticleDOI
TL;DR: An agent-based conceptual and computational model of consumer decision-making based on culture, personality and human needs that serves as a model for individual behavior in models that investigate system-level resulting behavior is proposed.
Abstract: Simulating consumer decision-making processes involves different disciplines such as sociology, social psychology, marketing, and computer science. In this paper, we propose an agent-based conceptual and computational model of consumer decision-making based on culture, personality and human needs. It serves as a model for individual behavior in models that investigate the resulting system-level behavior. Theoretical concepts operationalized in the model are the Power Distance dimension of Hofstede's model of national culture; Extroversion, Agreeableness and Openness from Costa and McCrae's five-factor model of personality; and social status and social responsibility needs. These factors are used to formulate the utility function, the processing and updating of the agent state, and the need recognition and action estimation modules of the consumer decision process. The model was validated against data on culture, personality, wealth and car purchasing from eleven European countries. It produces believable results for the differences in consumer purchasing across these eleven countries.

Journal ArticleDOI
TL;DR: This paper defines a structure called the power set tree (PS-tree), which is an ordered tree representing the power set, in which each possible reduct is mapped to a node of the tree.
Abstract: Feature selection is viewed as an important preprocessing step for pattern recognition, machine learning and data mining. Traditional hill-climbing search approaches to feature selection have difficulty finding optimal reducts, and current stochastic search strategies, such as GA, ACO and PSO, provide a more robust solution but at the expense of increased computational effort. It is therefore necessary to investigate fast and effective search algorithms. Rough set theory provides a mathematical tool to discover data dependencies and to reduce the number of features contained in a dataset by purely structural methods. In this paper, we define a structure called the power set tree (PS-tree), which is an ordered tree representing the power set, in which each possible reduct is mapped to a node of the tree. We then present a rough set approach to feature selection based on the PS-tree. Two kinds of pruning rules for the PS-tree are given, and two novel feature selection algorithms based on the PS-tree are presented. Experimental results demonstrate that our algorithms are effective and efficient.

Journal ArticleDOI
TL;DR: A discrete model to support the consensus reaching process for MAGDM problems is presented and a convergent algorithm is presented to autocratically guide experts to reach a predefined consensus level.
Abstract: In multiple attribute group decision making (MAGDM), it is preferable that the set of experts reach a high degree of consensus amongst their opinions before applying a selection process. In this paper, we present a discrete model to support the consensus reaching process for MAGDM problems. Firstly, a consensus scheme for a set of arguments is provided, where the basic idea is to tighten the range of opinions amongst experts. Based on the well-defined scheme, a convergent algorithm is presented to autocratically guide experts to reach a predefined consensus level. In the selection process, the maximizing deviation method is applied to determine the attribute weights. Then, the choice of the best alternative(s) from the group decision matrix is obtained by the simple additive weighting method. Finally, one example is presented to show the application and effectiveness of the proposed model.

Journal ArticleDOI
TL;DR: A comprehensive review of the recent development of KBSs, methods and tools supporting rapid product development, and of how product knowledge is identified, captured, represented and reused during the processes of One-of-a-Kind product development.
Abstract: In recent years, product knowledge has played an increasingly significant role in the new product development process, especially in the development of One-of-a-Kind products. Although knowledge-based systems (KBSs) have been proposed to support product development activities and new knowledge modelling methodologies have been developed, they are still far from complete. This area has become attractive to many researchers and, as a result, many new knowledge-based systems, methods and tools have been developed. However, to the best of our knowledge, knowledge-based systems for product development have not been systematically reviewed, compared and summarized. This paper provides a comprehensive review of the recent development of KBSs, methods and tools supporting rapid product development. The relevant technologies for modelling, managing and representing knowledge are investigated and reviewed systematically to better understand their characteristics. The focus is placed on knowledge-based systems that support product development, and on how product knowledge is identified, captured, represented and reused during the processes of One-of-a-Kind product development. The limitations and future trends of KBSs are presented in terms of how they can help One-of-a-Kind Production (OKP) companies.