
Showing papers in "Knowledge Based Systems in 2015"


Journal ArticleDOI
TL;DR: The MFO algorithm is compared with other well-known nature-inspired algorithms on 29 benchmark and 7 real engineering problems and the statistical results show that this algorithm is able to provide very promising and competitive results.
Abstract: In this paper a novel nature-inspired optimization paradigm is proposed called the Moth-Flame Optimization (MFO) algorithm. The main inspiration of this optimizer is the navigation method of moths in nature called transverse orientation. Moths fly at night by maintaining a fixed angle with respect to the moon, a very effective mechanism for travelling in a straight line over long distances. However, these insects can become trapped in a useless, deadly spiral path around artificial lights. This paper mathematically models this behaviour to perform optimization. The MFO algorithm is compared with other well-known nature-inspired algorithms on 29 benchmark and 7 real engineering problems. The statistical results on the benchmark functions show that this algorithm is able to provide very promising and competitive results. Additionally, the results on the real problems demonstrate the merits of this algorithm in solving challenging problems with constrained and unknown search spaces. The paper also considers the application of the proposed algorithm to marine propeller design to further investigate its effectiveness in practice. Note that the source code of the MFO algorithm is publicly available at http://www.alimirjalili.com/MFO.html.
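The spiral motion described above can be sketched as follows. This is a minimal illustration assuming the commonly published form of the moth position update, with shape constant `b` and path coefficient `t`; the full algorithm also maintains a sorted flame list and adaptively shrinks the range of `t`, which is omitted here:

```python
import math

def spiral_move(moth, flame, b=1.0, t=-0.5):
    """One logarithmic-spiral step of a moth around a flame.

    Per dimension: D * exp(b*t) * cos(2*pi*t) + flame, where D is the
    distance to the flame and t in [-1, 1] controls how close the next
    position lands to the flame (t = -1 is closest).
    """
    return [abs(f - m) * math.exp(b * t) * math.cos(2 * math.pi * t) + f
            for m, f in zip(moth, flame)]
```

With `t = -0.5` the cosine term is negative, so the moth lands between its old position and the flame, which is how the spiral converges.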

2,892 citations


Journal ArticleDOI
TL;DR: A rigorous survey on sentiment analysis is presented, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis.
Abstract: With the advent of Web 2.0, people became more eager to express and share their opinions on the web regarding both day-to-day activities and global issues. The evolution of social media has also contributed immensely to these activities, providing a transparent platform for sharing views across the world. These electronic Word of Mouth (eWOM) statements expressed on the web are prevalent in the business and service industries, enabling customers to share their points of view. In the last one and a half decades, research communities, academia, the public, and service industries have been working rigorously on sentiment analysis, also known as opinion mining, to extract and analyze public mood and views. In this regard, this paper presents a rigorous survey on sentiment analysis, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis. Several sub-tasks need to be performed for sentiment analysis, which in turn can be accomplished using various approaches and techniques. This survey, covering published literature during 2002-2015, is organized on the basis of the sub-tasks to be performed, the machine learning and natural language processing techniques used, and the applications of sentiment analysis. The paper also presents open issues, along with a summary table of one hundred and sixty-one articles.

1,011 citations


Journal ArticleDOI
TL;DR: This paper systematically examines computational intelligence-based transfer learning techniques and clusters related technique developments into four main categories, providing state-of-the-art knowledge that will directly support researchers and practitioners in understanding the developments in computational intelligence-based transfer learning research and applications.
Abstract: Transfer learning aims to provide a framework to utilize previously-acquired knowledge to solve new but similar problems much more quickly and effectively. In contrast to classical machine learning methods, transfer learning methods exploit the knowledge accumulated from data in auxiliary domains to facilitate predictive modeling consisting of different data patterns in the current domain. To improve the performance of existing transfer learning methods and handle the knowledge transfer process in real-world systems, computational intelligence has recently been applied in transfer learning. This paper systematically examines computational intelligence-based transfer learning techniques and clusters related technique developments into four main categories: (a) neural network-based transfer learning; (b) Bayes-based transfer learning; (c) fuzzy transfer learning, and (d) applications of computational intelligence-based transfer learning. By providing state-of-the-art knowledge, this survey will directly support researchers and practice-based professionals in understanding the developments in computational intelligence-based transfer learning research and applications.

662 citations


Journal ArticleDOI
TL;DR: Experimental results show that the proposed method performs significantly better than other well-known metaheuristic algorithms in terms of avoiding local minima and finding the global minimum.
Abstract: Evolutionary Algorithms (EAs) are well-known in many fields of science. EAs are usually applied to problems where common mathematical methods are unable to provide a good solution or where finding the exact solution requires an unreasonable amount of time. Many EA methods have been proposed and developed, most of which imitate natural behaviour such as the movement of swarming animals. In this paper, inspired by the natural phenomenon of growth, a new metaheuristic algorithm is presented that uses a mathematical concept called the fractal. Using the diffusion property regularly seen in random fractals, the particles in the new algorithm explore the search space more efficiently. To verify the performance of the approach, both constrained and unconstrained standard benchmark functions are employed: classic unimodal and multimodal functions, as well as some modern hard functions, serve as unconstrained benchmarks, while well-known engineering design optimization problems commonly used in the literature serve as constrained benchmarks. Numerical results and comparisons with other state-of-the-art stochastic algorithms are also provided. Considering both convergence and accuracy simultaneously, the experimental results show that the proposed method performs significantly better than other well-known metaheuristic algorithms in terms of avoiding local minima and finding the global minimum.
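The diffusion property that the abstract attributes to random fractals is, in methods of this family, often realized as a Gaussian walk around a promising point. A minimal sketch under that assumption (function and parameter names are illustrative, not the paper's):

```python
import random

def diffuse(point, best, epsilon=1.0):
    """One Gaussian-walk diffusion step.

    Each coordinate is resampled from a normal distribution centred on
    the current point, with a step size proportional to its distance
    from the best point found so far (plus a tiny floor so the sigma
    is never exactly zero).
    """
    return [random.gauss(p, epsilon * abs(b - p) + 1e-12)
            for p, b in zip(point, best)]
```

Points far from the incumbent best make large exploratory jumps, while points near it make small refining moves, which is the exploration/exploitation balance the abstract alludes to.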

447 citations


Journal ArticleDOI
TL;DR: A novel feature representation approach, namely the cluster center and nearest neighbor (CANN) approach, is proposed; the resulting classifier performs better than or comparably to k-NN and support vector machines trained and tested on the original feature representation in terms of classification accuracy, detection rates, and false alarms.
Abstract: The aim of an intrusion detection system (IDS) is to detect various types of malicious network traffic and computer usage that cannot be detected by a conventional firewall. Many IDSs have been developed based on machine learning techniques. In particular, advanced detection approaches created by combining or integrating multiple learning techniques have shown better detection performance than single learning techniques. The feature representation method is an important factor in correct classification; however, very few studies have focused on how to extract more representative features for normal connections and effective detection of attacks. This paper proposes a novel feature representation approach, namely the cluster center and nearest neighbor (CANN) approach. In this approach, two distances are measured and summed: the first is the distance between each data sample and its cluster center, and the second is the distance between the sample and its nearest neighbor in the same cluster. This new, one-dimensional, distance-based feature is then used to represent each data sample for intrusion detection by a k-Nearest Neighbor (k-NN) classifier. The experimental results on the KDD-Cup 99 dataset show that the CANN classifier not only performs better than or comparably to k-NN and support vector machines trained and tested on the original feature representation in terms of classification accuracy, detection rates, and false alarms, but also provides high computational efficiency for classifier training and testing (i.e., detection).
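The one-dimensional CANN feature described above is straightforward to sketch. The cluster center and same-cluster nearest neighbor would come from a prior clustering step (e.g. k-means) and a neighbor search, both omitted here:

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cann_feature(sample, cluster_center, nearest_neighbor):
    """CANN's one-dimensional feature: distance from the sample to its
    cluster center plus distance to its nearest neighbor in the same
    cluster. This single number then replaces the full feature vector
    for a downstream k-NN classifier."""
    return euclid(sample, cluster_center) + euclid(sample, nearest_neighbor)
```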

423 citations


Journal ArticleDOI
TL;DR: An application example concerning traditional Chinese medical diagnosis is given to illustrate the applicability and validity of the proposed correlation coefficients of HFLTSs in the process of qualitative decision making.
Abstract: The hesitant fuzzy linguistic term set (HFLTS) is a new and flexible tool for representing hesitant qualitative information in decision making. Correlation measures and correlation coefficients have been applied widely in many research domains and practical fields. This paper focuses on the correlation measures and correlation coefficients of HFLTSs. To start the investigation, the definition of HFLTS is improved and the concept of the hesitant fuzzy linguistic element (HFLE) is introduced. Motivated by the idea of traditional correlation coefficients of fuzzy sets, intuitionistic fuzzy sets and hesitant fuzzy sets, several different types of correlation coefficients for HFLTSs are proposed. The prominent properties of these correlation coefficients are then investigated. In addition, considering that different HFLEs may have different weights, weighted correlation coefficients and ordered weighted correlation coefficients are further investigated. Finally, an application example concerning traditional Chinese medical diagnosis is given to illustrate the applicability and validity of the proposed correlation coefficients of HFLTSs in the process of qualitative decision making.
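As a rough illustration of the underlying idea, here is the classical hesitant-fuzzy correlation coefficient that constructions like the paper's typically build on: the cross informational energy divided by the geometric mean of the individual energies. This is a sketch assuming hesitant elements extended to equal lengths; the HFLTS variants operate on linguistic term indices rather than membership values:

```python
import math

def hf_corr(A, B):
    """Correlation coefficient between two hesitant fuzzy sets, each
    given as a list of membership-value lists (one list per element,
    paired lists assumed to have equal length).

    Returns a value in [0, 1]; identical sets give exactly 1.
    """
    corr = sum(sum(a * b for a, b in zip(ha, hb)) / len(ha)
               for ha, hb in zip(A, B))
    ea = sum(sum(a * a for a in ha) / len(ha) for ha in A)
    eb = sum(sum(b * b for b in hb) / len(hb) for hb in B)
    return corr / math.sqrt(ea * eb)
```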

366 citations


Journal ArticleDOI
TL;DR: Ranges for the various entropies used to differentiate normal, interictal, and ictal EEG signals are proposed, and the entropies are ranked by their ability to discriminate the three classes when classifying the different stages of epilepsy.
Abstract: Epilepsy can be detected using EEG signals. The entropy indicates the complexity of the EEG signal. Various entropies are used to diagnose epilepsy. Unique ranges for various entropies are proposed. Epilepsy is a neurological disorder of the brain that is difficult to diagnose visually using electroencephalogram (EEG) signals. Hence, automated detection of epilepsy from EEG signals would be a useful tool in the medical field. Automating epilepsy detection using signal processing techniques such as the wavelet transform and entropies may optimise the performance of such a system. Many algorithms have been developed to diagnose the presence of seizures in EEG signals. Entropy is a nonlinear parameter that reflects the complexity of the EEG signal, and many entropies have been used to differentiate normal, interictal and ictal EEG signals. This paper discusses the various entropies used for automated diagnosis of epilepsy from EEG signals. We present unique ranges for these entropies and rank them by their ability to discriminate the three classes. These entropies can be used to classify the different stages of epilepsy and can also be used for other biomedical applications.
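As a minimal illustration of an entropy feature of the kind surveyed, here is Shannon entropy computed over a signal's amplitude histogram. The bin count is an illustrative choice, and the paper covers many other entropies (approximate, sample, spectral, and so on) beyond this simplest one:

```python
import math
from collections import Counter

def shannon_entropy(signal, bins=16):
    """Shannon entropy (in bits) of a signal's amplitude distribution.

    The amplitude range is split into equal-width bins; a flat
    distribution gives high entropy (complex signal), a concentrated
    one gives low entropy (predictable signal).
    """
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / bins or 1.0          # guard: constant signal
    counts = Counter(min(int((x - lo) / width), bins - 1) for x in signal)
    n = len(signal)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```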

361 citations


Journal ArticleDOI
TL;DR: This analysis shows the conceptual evolution of the journal and some of its citation-based bibliometric performance indicators, such as the evolution of its impact factor, its h-index, and its most cited authors/documents.
Abstract: In commemoration of the 25th anniversary of KnoSys, we present a bibliometric analysis of the scientific content of the journal during the period 1991-2014. This analysis shows the conceptual evolution of the journal and some of its citation-based bibliometric performance indicators, such as the evolution of its impact factor, its h-index, and its most cited authors/documents.

283 citations


Journal ArticleDOI
TL;DR: The origins and importance of feature selection are discussed and recent contributions in a range of applications are outlined, from DNA microarray analysis to face recognition.
Abstract: The explosion of big data has posed important challenges to researchers. Feature selection is paramount when dealing with high-dimensional datasets. We review the state-of-the-art and recent contributions in feature selection. The emerging challenges in feature selection are identified and discussed. In an era of growing data complexity and volume and the advent of big data, feature selection has a key role to play in helping reduce high-dimensionality in machine learning problems. We discuss the origins and importance of feature selection and outline recent contributions in a range of applications, from DNA microarray analysis to face recognition. Recent years have witnessed the creation of vast datasets, and it seems clear that these will only continue to grow in size and number. This new big data scenario offers both opportunities and challenges to feature selection researchers, as there is a growing need for scalable yet efficient feature selection methods, given that existing methods are likely to prove inadequate.

255 citations


Journal ArticleDOI
TL;DR: This paper proposes a similarity measure for neighborhood-based collaborative filtering, which uses all ratings made by a pair of users and finds the importance of each pair of rated items by exploiting the Bhattacharyya similarity.
Abstract: Collaborative filtering (CF) is the most successful approach for personalized product or service recommendations. Neighborhood-based collaborative filtering is an important class of CF that is simple, intuitive and efficient, and is widely used in commercial recommender systems. Typically, neighborhood-based CF uses a similarity measure to find users similar to an active user, or products similar to those she has rated. Traditional similarity measures utilize ratings of only co-rated items when computing the similarity between a pair of users, and are therefore not suitable for sparse data. In this paper, we propose a similarity measure for neighborhood-based CF that uses all ratings made by a pair of users. The proposed measure finds the importance of each pair of rated items by exploiting the Bhattacharyya similarity. To show the effectiveness of the measure, we compared the performance of neighborhood-based CF using state-of-the-art similarity measures with that of CF based on the proposed measure. Recommendation results on a set of real data show that CF based on the proposed measure outperforms CF based on existing measures in various evaluation metrics.
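The Bhattacharyya coefficient at the heart of the proposed measure can be sketched on two users' rating distributions. This is a simplified illustration of the coefficient itself; the paper applies it per pair of rated items and combines it with a local similarity term:

```python
import math

def bhattacharyya(ratings_u, ratings_v, levels=(1, 2, 3, 4, 5)):
    """Bhattacharyya coefficient between two users' rating
    distributions: sum of sqrt(p_i * q_i) over rating levels.

    It is 1 for identical distributions and 0 for disjoint ones, and
    is usable even when the users share no co-rated items.
    """
    def dist(ratings):
        n = len(ratings)
        return [sum(1 for r in ratings if r == lev) / n for lev in levels]
    p, q = dist(ratings_u), dist(ratings_v)
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
```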

215 citations


Journal ArticleDOI
TL;DR: The aim of this research is to provide insights into the evolution of multi-granular fuzzy linguistic modeling approaches in recent years, discuss their drawbacks and advantages, and present some possible approaches that could improve current multi-granular linguistic methodologies.
Abstract: Multi-granular fuzzy linguistic modeling allows the use of several linguistic term sets in fuzzy linguistic modeling. This is quite useful when the problem involves several people with different knowledge levels, since they can describe each item with different precision and may need more than one linguistic term set. Multi-granular fuzzy linguistic modeling has been frequently used in the group decision making field due to its capability of allowing each expert to express his/her preferences using his/her own linguistic term set. The aim of this research is to provide insights into the evolution of multi-granular fuzzy linguistic modeling approaches in recent years and to discuss their drawbacks and advantages. A systematic literature review is conducted to achieve this goal. Additionally, some possible approaches that could improve the current multi-granular linguistic methodologies are presented.

Journal ArticleDOI
TL;DR: A new ensemble creation method called RB-Boost combines Random Balance with AdaBoost.M2, enforcing random class proportions in addition to instance re-weighting.
Abstract: The class proportions for each ensemble member are chosen randomly. Member training data are sub-sampled and over-sampled through SMOTE. RB-Boost combines Random Balance with AdaBoost.M2. Experiments with 86 data sets demonstrate the advantage of Random Balance. In Machine Learning, a data set is imbalanced when the class proportions are highly skewed. Imbalanced data sets arise routinely in many application domains and pose a challenge to traditional classifiers. We propose a new approach to building ensembles of classifiers for two-class imbalanced data sets, called Random Balance. Each member of the Random Balance ensemble is trained with data sampled from the training set and augmented by artificial instances obtained using SMOTE. The novelty of the approach is that the proportions of the classes for each ensemble member are chosen randomly. The intuition behind the method is that the proposed diversity heuristic will ensure that the ensemble contains classifiers that are specialized for different operating points in ROC space, thereby leading to larger AUC compared to other ensembles of classifiers. Experiments have been carried out to test the Random Balance approach by itself, and also in combination with standard ensemble methods. As a result, we propose a new ensemble creation method called RB-Boost, which combines Random Balance with AdaBoost.M2. This combination involves enforcing random class proportions in addition to instance re-weighting. Experiments with 86 imbalanced data sets from two well-known repositories demonstrate the advantage of the Random Balance approach.
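The core idea of Random Balance — choosing the class proportion at random for each ensemble member — can be sketched as follows. This is a simplified version that resamples with replacement instead of generating SMOTE instances for the over-sampled class:

```python
import random

def random_balance_sample(pos, neg):
    """Draw a training set of the original size with a randomly chosen
    class proportion (at least 2 instances per class).

    Sketch only: the actual Random Balance method tops up whichever
    class must grow with SMOTE-generated artificial instances, while
    here both classes are simply resampled with replacement.
    """
    n = len(pos) + len(neg)
    n_pos = random.randint(2, n - 2)
    sample_pos = [random.choice(pos) for _ in range(n_pos)]
    sample_neg = [random.choice(neg) for _ in range(n - n_pos)]
    return sample_pos + sample_neg
```

Training one classifier per call yields members specialized for different operating points on the ROC curve, which is the diversity heuristic the abstract describes.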

Journal ArticleDOI
TL;DR: This paper considers the task of forecasting future electricity load from a time series of previous electricity loads, recorded every 5 min, with a two-step approach that identifies a set of candidate features based on the data characteristics and then selects a subset of them using correlation and instance-based feature selection methods, applied in a systematic way.
Abstract: Appropriate feature (variable) selection is crucial for accurate forecasting. In this paper we consider the task of forecasting future electricity load from a time series of previous electricity loads, recorded every 5 min. We propose a two-step approach that identifies a set of candidate features based on the data characteristics and then selects a subset of them using correlation and instance-based feature selection methods, applied in a systematic way. We evaluate the performance of four feature selection methods - one traditional (autocorrelation) and three advanced machine learning methods (mutual information, RReliefF and correlation-based) - in conjunction with state-of-the-art prediction algorithms (neural networks, linear regression and model tree rules), using two years of Australian electricity load data. Our results show that all feature selection methods were able to identify small subsets of highly relevant features. The two best prediction models utilized instance-based and autocorrelation-based feature selectors and an efficient neural network prediction algorithm. They were more accurate than advanced exponential smoothing prediction models, a typical industry model and other baselines used for comparison.
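A minimal sketch of the autocorrelation-based selection used here as the traditional baseline: score candidate lags by absolute sample autocorrelation and keep the strongest ones as input features (the lag count `k` and the scoring by absolute value are illustrative choices, not the paper's exact procedure):

```python
def autocorr(series, lag):
    """Sample autocorrelation of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return cov / var

def top_lags(series, max_lag, k=3):
    """Return the k lags with the strongest absolute autocorrelation;
    the values at those lags would serve as forecasting features."""
    lags = range(1, max_lag + 1)
    return sorted(lags, key=lambda lag: -abs(autocorr(series, lag)))[:k]
```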

Journal ArticleDOI
TL;DR: A novel supervised filter-based feature selection method using ACO is proposed; it integrates graph clustering with a modified ant colony search process and produces consistently better classification accuracies.
Abstract: A novel supervised filter-based feature selection method using ACO is proposed. The method integrates graph clustering with a modified ant colony search process. Each feature subset is evaluated using a novel measure without using any learning model. The size of the final feature set is determined automatically. The method is compared to state-of-the-art filter- and wrapper-based methods. Feature selection is an important preprocessing step in machine learning and pattern recognition. The ultimate goal of feature selection is to select a feature subset from the original feature set that increases the performance of learning algorithms. In this paper, a novel feature selection method based on graph clustering and ant colony optimization is proposed for classification problems. The proposed algorithm works in three steps. In the first step, the entire feature set is represented as a graph. In the second step, the features are divided into several clusters using a community detection algorithm, and finally, in the third step, a novel search strategy based on ant colony optimization is developed to select the final subset of features. Moreover, the subset selected by each ant is evaluated using a supervised filter-based measure called the novel separability index. Thus the proposed method does not need any learning model and can be classified as a filter-based feature selection method. The proposed method integrates the community detection algorithm with a modified ant-colony-based search process for the feature selection problem. Furthermore, the sizes of the subsets constructed by each ant, as well as the size of the final feature subset, are determined automatically. The performance of the proposed method has been compared to those of state-of-the-art filter- and wrapper-based feature selection methods on ten benchmark classification problems. The results show that our method produces consistently better classification accuracies.

Journal ArticleDOI
TL;DR: This systematic and comprehensive review provides insight for researchers on interval type-2 fuzzy MCDM by showing its current state and potential areas of future focus.
Abstract: A literature review of IT2FS-based MCDM approaches is presented. The systematic classification covers 35 MCDM approaches, both single and hybrid. It provides insight for researchers on IT2FS-based MCDM approaches. Multi-criteria decision making (MCDM) is a discipline of operations research that has been widely studied by researchers and practitioners. It deals with evaluating and ranking alternatives from best to worst under conflicting criteria with respect to decision maker(s) preferences. Since many real-world systems include uncertainty and vagueness in information, MCDM uses fuzzy sets. In recent years, as an extension of the traditional fuzzy set concept, type-2 fuzzy sets have been preferred for their capability of handling more uncertainty and, hence, producing more accurate and robust results; MCDM approaches based on interval type-2 fuzzy sets (IT2FSs) have been published in various subjects. This paper reviews 82 different papers using various MCDM approaches based on IT2FSs, classified into 35 categories. All papers, covering both single and hybrid approaches, are discussed, pointing out their real applications or empirical results and limitations. Furthermore, the papers are statistically analyzed to show new trends within the context of IT2FSs. This systematic and comprehensive review provides insight for researchers on interval type-2 fuzzy MCDM by showing its current state and potential areas of future focus.

Journal ArticleDOI
TL;DR: A new distance measure for IT2FSs is proposed, which is a sound alternative to the existing interval type-2 fuzzy distance measures, and a decision model integrating the VIKOR method and prospect theory is developed.
Abstract: The interval type-2 fuzzy set (IT2FS) offers an interesting avenue for handling high-order information and uncertainty in decision support systems (DSS) when dealing with both extrinsic and intrinsic aspects of uncertainty. Recently, multiple attribute decision making (MADM) problems with interval type-2 fuzzy information have received increasing attention from both researchers and practitioners, and as a result a number of interval type-2 fuzzy MADM methods have been developed. In this paper, we extend the VIKOR (VlseKriterijumska Optimizacija I Kompromisno Resenje, in Serbian) method based on prospect theory to accommodate interval type-2 fuzzy circumstances. First, we propose a new distance measure for IT2FSs, which is a sound alternative to the existing interval type-2 fuzzy distance measures. Then, a decision model integrating the VIKOR method and prospect theory is proposed. A case study concerning high-tech risk evaluation is provided to illustrate the applicability of the proposed method. In addition, a comparative analysis with the interval type-2 fuzzy TOPSIS method is also presented.

Journal ArticleDOI
TL;DR: A comprehensive study is conducted to examine the effect of performing filter and wrapper based feature selection methods on financial distress prediction and finds that on average performing the genetic algorithm and logistic regression for feature selection can provide prediction improvements over the credit and bankruptcy datasets respectively.
Abstract: Financial distress prediction is always important for financial institutions in order for them to assess the financial health of enterprises and individuals. Bankruptcy prediction and credit scoring are two important issues in financial distress prediction, where various statistical and machine learning techniques have been employed to develop financial prediction models. Since there are no generally agreed upon financial ratios as input features for model development, many studies consider feature selection as a pre-processing step in data mining before constructing the models. However, most works have focused only on applying specific feature selection methods to either the bankruptcy prediction or the credit scoring problem domain. In this work, a comprehensive study is conducted to examine the effect of performing filter- and wrapper-based feature selection on financial distress prediction. In addition, the effect of feature selection on the prediction models obtained using various classification techniques is also investigated. In the experiments, two bankruptcy and two credit datasets are used, along with three filter- and two wrapper-based feature selection methods combined with six different prediction models. Our experimental results show that there is no single best combination of feature selection method and classification technique across the four datasets. Moreover, depending on the chosen techniques, performing feature selection does not always improve prediction performance. However, on average, the genetic algorithm and logistic regression for feature selection provide prediction improvements over the credit and bankruptcy datasets, respectively.

Journal ArticleDOI
TL;DR: This paper presents a new fuzzy time series model combined with ant colony optimization (ACO) and auto-regression and shows that the proposed model outperforms other existing models.
Abstract: This paper presents a new fuzzy time series model combined with ant colony optimization (ACO) and auto-regression. The ACO is adopted to obtain a suitable partition of the universe of discourse to promote the forecasting performance. Furthermore, the auto-regression method is adopted instead of the traditional high-order method to make better use of historical information, which is proved to be more practical. To calculate coefficients of different orders, autocorrelation is used to calculate the initial values and then the Levenberg–Marquardt (LM) algorithm is employed to optimize these coefficients. Actual trading data of Taiwan capitalization weighted stock index is used as benchmark data. Computational results show that the proposed model outperforms other existing models.

Journal ArticleDOI
TL;DR: A new format-preserving encryption (FPE) scheme is constructed, which can be used to encrypt all types of character strings stored in a database and is highly efficient and provably secure under the existing security model.
Abstract: With the advent of cloud computing, individuals and organizations have become interested in moving their databases from local to remote cloud servers. However, data owners and cloud service providers are not in the same trusted domain in practice. For the protection of data privacy, sensitive data usually have to be encrypted before outsourcing, which makes effective database utilization a very challenging task. To address this challenge, in this paper, we propose L-EncDB, a novel lightweight encryption mechanism for databases, which (i) keeps the database structure and (ii) supports efficient SQL-based queries. To achieve this goal, a new format-preserving encryption (FPE) scheme is constructed in this paper, which can be used to encrypt all types of character strings stored in a database. Extensive analysis demonstrates that the proposed L-EncDB scheme is highly efficient and provably secure under the existing security model.

Journal ArticleDOI
TL;DR: A new method for diagnosis of CAD using tunable-Q wavelet transform (TQWT) based features extracted from heart rate signals is presented and a novel CAD Risk index is developed using significant features to discriminate the two classes using a single number.
Abstract: Coronary artery disease (CAD) is the narrowing of the coronary arteries, leading to an inadequate supply of nutrients and oxygen to the heart muscles. Over time, the condition can weaken the heart muscles and may lead to heart failure, arrhythmias and even sudden cardiac death. Hence, early diagnosis of CAD can save lives and prevent the risk of stroke. The electrocardiogram (ECG) depicts the state of the heart and can be used to detect CAD. Small changes in the ECG signal indicate a particular disease, but it is very difficult to decipher these minute changes, as the signal is prone to artifacts and noise. Hence, we detect the R peaks from the ECG and use heart rate signals for our analysis. Manual inspection of heart rate signals is time consuming, taxing and prone to errors due to fatigue, so a decision support system independent of human intervention can yield accurate, repeatable results. In this paper, we present a new method for the diagnosis of CAD using tunable-Q wavelet transform (TQWT) based features extracted from heart rate signals. The heart rate signals are decomposed into various sub-bands using TQWT for better diagnostic feature extraction. The nonlinear feature called centered correntropy (CC) is computed on the decomposed detail sub-band. Then principal component analysis (PCA) is performed on these CC features to reduce their number. These clinically significant features are fed to a least squares support vector machine (LS-SVM) with different kernel functions for automated diagnosis. The experimental results demonstrate better classification accuracy, sensitivity, specificity and Matthews correlation coefficient using the Morlet wavelet kernel function with optimized kernel and regularization parameters. We have also developed a novel CAD Risk index that uses the significant features to discriminate the two classes with a single number.
Our proposed methodology is well suited to classifying normal and CAD heart rate signals and can aid clinicians when screening CAD patients.
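Centered correntropy, the nonlinear feature used here, is commonly defined with a Gaussian kernel as the mean kernel similarity of paired samples minus its average over all cross pairs. The sketch below follows that common definition and is an illustration only; the paper computes the feature on TQWT detail sub-band coefficients, and the kernel width is an assumed parameter:

```python
import math

def centered_correntropy(x, y, sigma=1.0):
    """Centered correntropy of two equal-length sample sequences with
    a Gaussian kernel: mean kernel similarity of paired samples minus
    the mean over all cross pairs (which centers the estimate)."""
    def k(d):
        return math.exp(-d * d / (2 * sigma * sigma))
    n = len(x)
    paired = sum(k(a - b) for a, b in zip(x, y)) / n
    cross = sum(k(a - b) for a in x for b in y) / (n * n)
    return paired - cross
```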

Journal ArticleDOI
TL;DR: A comprehensive survey on decision making with IFPRs is presented with the aim of providing a clear perspective on the originality, the consistency, the prioritization, and the consensus of IFPRs.
Abstract: Intuitionistic fuzzy preference relations (IFPRs) have attracted more and more scholars' attentions in recent years due to their efficiency in representing experts' imprecise cognitions. With IFPRs, people can express their opinions over different pairs of alternatives from positive, negative and hesitative points of view. This paper presents a comprehensive survey on decision making with IFPRs with the aim of providing a clear perspective on the originality, the consistency, the prioritization, and the consensus of IFPRs. Finally, some directions for future research are pointed out.

Journal ArticleDOI
TL;DR: The standard FOA is improved by introducing a novel parameter integrated with chaos, and overall research findings show that FOA with the Chebyshev map is superior in terms of reliability of global optimality and algorithm success rate.
Abstract: Development of a new method named the chaotic fruit fly optimization algorithm (CFOA). The fruit fly algorithm (FOA) is integrated with ten different chaos maps. The novel algorithm is tested on ten different well-known benchmark problems. CFOA is compared with FOA, FOA with Levy distribution, and similar chaotic methods. Experiments show the superiority of CFOA in terms of the obtained statistical results. Fruit fly optimization algorithm (FOA) is a recently presented metaheuristic technique inspired by the behavior of fruit flies. This paper improves the standard FOA by introducing a novel parameter integrated with chaos. The performance of the developed chaotic fruit fly algorithm (CFOA) is investigated in detail on ten well-known benchmark problems using fourteen different chaotic maps. Moreover, we performed comparison studies with the basic FOA, FOA with Levy flight distribution, and other recently published chaotic algorithms. Statistical results on every optimization task indicate that the chaotic fruit fly algorithm (CFOA) has a very fast convergence rate. In addition, CFOA is compared with recently developed chaos-enhanced algorithms such as the chaotic bat algorithm, chaotic accelerated particle swarm optimization, the chaotic firefly algorithm, the chaotic artificial bee colony algorithm, and chaotic cuckoo search. Overall, the research findings show that FOA with the Chebyshev map is superior in terms of reliability of global optimality and algorithm success rate.
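As a rough illustration of how a chaotic map can replace a random parameter inside a metaheuristic such as CFOA, the sketch below iterates the Chebyshev map; the helper name, initial value and map degree are assumptions, not the paper's configuration:

```python
import numpy as np

def chebyshev_sequence(x0=0.7, degree=4, n=100):
    """Iterate the Chebyshev chaotic map x_{k+1} = cos(degree * arccos(x_k)).
    Values stay in [-1, 1]; rescale with (x + 1) / 2 to feed a parameter
    that expects the unit interval."""
    seq = np.empty(n)
    x = x0
    for k in range(n):
        x = np.cos(degree * np.arccos(x))
        seq[k] = x
    return seq
```

The appeal over a uniform random draw is that the chaotic sequence is deterministic yet non-repeating, which is what such chaos-enhanced variants exploit.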

Journal ArticleDOI
TL;DR: The weighted averaging operator and the ordered weighted averaging operator for linguistic distribution assessments with interval symbolic proportions are presented, and the transformation functions among multi-granular unbalanced linguistic distribution assessments with interval symbolic proportions are developed.
Abstract: We define the linguistic distribution assessments and their operations. We propose the transformations in the multi-granular unbalanced context. We discuss the application of this proposal in the MAGDM. Linguistic distribution assessments with exact symbolic proportions have been recently presented. Due to various subjective and objective conditions, it is often difficult for decision makers to provide exact symbolic proportions in linguistic distribution assessments. In some situations, decision makers will express their preferences in multi-granular unbalanced linguistic contexts. Therefore, in this study, we propose the concept of linguistic distribution assessments with interval symbolic proportions under multi-granular unbalanced linguistic contexts. First, the weighted averaging operator and the ordered weighted averaging operator for the linguistic distribution assessments with interval symbolic proportions are presented. Then, we develop the transformation functions among the multi-granular unbalanced linguistic distribution assessments with interval symbolic proportions. Finally, we present the application of the proposed linguistic distribution assessments in multiple attribute group decision making.
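For the simpler case of exact (non-interval) symbolic proportions, the weighted averaging operator over linguistic distribution assessments reduces to a convex combination of proportion vectors. A minimal sketch, assuming each assessment is a vector of proportions over the same linguistic term set (the function name is hypothetical):

```python
import numpy as np

def weighted_distribution_average(assessments, weights):
    """Weighted averaging of linguistic distribution assessments with exact
    symbolic proportions. Each assessment is a proportion vector over the
    same term set (entries sum to 1); the result is again a distribution."""
    A = np.asarray(assessments, dtype=float)  # shape (m, number_of_terms)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                           # normalise the expert weights
    return w @ A                              # convex combination per term
```

The interval-proportion case treated in the paper works analogously on the lower and upper bounds of each proportion.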

Journal ArticleDOI
TL;DR: A novel correlation coefficient formulation to measure the relationship between two HFSs and the weighted correlation coefficient is proposed to make it more applicable and implemented in medical diagnosis and cluster analysis.
Abstract: Hesitant fuzzy set (HFS) is now attracting more and more scholars' attention due to its efficiency in comprehensively representing uncertain and vague information. Considering that the correlation coefficient is one of the most widely used indices in data analysis, in this paper, after pointing out the weakness of the existing correlation coefficients between HFSs, we propose a novel correlation coefficient formulation to measure the relationship between two HFSs. As a starting point, some new concepts, such as the mean of a hesitant fuzzy element (HFE), the hesitant degree of a HFE, the mean of a HFS, the variance of a HFS and the correlation between two HFSs, are defined. Based on these concepts, a novel correlation coefficient formulation between two HFSs is developed. Afterwards, the upper and lower bounds of the correlation coefficient are defined, and a theorem is given to determine these two bounds. It is stated that the correlation coefficient between two HFSs should also be hesitant, and thus the upper and lower bounds can further help to identify the correlation coefficient between HFSs. The significant characteristic of the introduced correlation coefficient is that it lies in the interval [-1, 1], in accordance with the classical correlation coefficient in statistics, whereas all the previous correlation coefficients between HFSs in the literature lie within the unit interval [0, 1]. The weighted correlation coefficient is also proposed to make it more applicable. In order to show the efficiency of the proposed correlation coefficients, they are implemented in medical diagnosis and cluster analysis. Some numerical examples are given to support the findings and also illustrate the applicability and efficiency of the proposed correlation coefficient between HFSs.
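To make the [-1, 1] property concrete, here is an illustrative Pearson-style coefficient in which each hesitant fuzzy element is summarised by its mean; this is a simplified stand-in, not the paper's exact formulation, and the function name is hypothetical:

```python
import numpy as np

def hfs_correlation(H1, H2):
    """Illustrative correlation coefficient between two hesitant fuzzy sets.
    H1, H2: lists of HFEs (each a list of membership values in [0, 1]) over
    the same universe. Each HFE is reduced to its mean, then a Pearson-style
    coefficient is computed, so the result lies in [-1, 1]."""
    m1 = np.array([np.mean(h) for h in H1])
    m2 = np.array([np.mean(h) for h in H2])
    c1, c2 = m1 - m1.mean(), m2 - m2.mean()   # center the mean sequences
    denom = np.sqrt((c1**2).sum() * (c2**2).sum())
    return float((c1 * c2).sum() / denom) if denom > 0 else 0.0
```

Oppositely ordered membership profiles yield a negative value, which [0, 1]-valued coefficients cannot express.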

Journal ArticleDOI
TL;DR: A novel preference learning algorithm is designed to learn a confidence for each uncertain examination record with the help of transaction records and is called adaptive Bayesian personalized ranking (ABPR), which has the merits of uncertainty reduction on examination records and accurate pairwise preference learning on implicit feedbacks.
Abstract: Implicit feedbacks have recently received much attention in recommendation communities due to their close relationship with real industry problem settings. However, most works only exploit users’ homogeneous implicit feedbacks such as users’ transaction records from “bought” activities, and ignore the other type of implicit feedbacks like examination records from “browsed” activities. The latter are usually more abundant though they are associated with high uncertainty w.r.t. users’ true preferences. In this paper, we study a new recommendation problem called heterogeneous implicit feedbacks (HIF), where the fundamental challenge is the uncertainty of the examination records. As a response, we design a novel preference learning algorithm to learn a confidence for each uncertain examination record with the help of transaction records. Specifically, we generalize Bayesian personalized ranking (BPR), a seminal pairwise learning algorithm for homogeneous implicit feedbacks, and learn the confidence adaptively, which is thus called adaptive Bayesian personalized ranking (ABPR). ABPR has the merits of uncertainty reduction on examination records and accurate pairwise preference learning on implicit feedbacks. Experimental results on two public data sets show that ABPR is able to leverage uncertain examination records effectively, and can achieve better recommendation performance than the state-of-the-art algorithm on various ranking-oriented evaluation metrics.
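The pairwise update at the heart of BPR, which ABPR generalizes, can be sketched as a single SGD step; the `conf` factor mimics ABPR's per-record confidence, and the hyperparameter values are illustrative assumptions, not those of the paper:

```python
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01, conf=1.0):
    """One SGD step of Bayesian personalized ranking on a triple (u, i, j):
    user u is assumed to prefer item i over item j. Ascends ln sigmoid(x_uij)
    with L2 regularisation; `conf` scales the gradient (ABPR-style)."""
    u_f, i_f, j_f = U[u].copy(), V[i].copy(), V[j].copy()
    x_uij = u_f @ (i_f - j_f)              # current preference margin
    g = conf / (1.0 + np.exp(x_uij))       # conf * sigmoid(-x_uij)
    U[u] += lr * (g * (i_f - j_f) - reg * u_f)
    V[i] += lr * (g * u_f - reg * i_f)
    V[j] += lr * (-g * u_f - reg * j_f)
```

Repeated steps push the score of the preferred item above that of the non-preferred one; in ABPR, transaction records would receive high `conf` and uncertain examination records a learned, lower `conf`.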

Journal ArticleDOI
TL;DR: A novel intelligent fault diagnosis method with a multivariable ensemble-based incremental support vector machine (MEISVM) is proposed, which proves capable of detecting multiple faults, including complex compound faults and different severity degrees of the same fault.
Abstract: Since roller bearings are the key components in rotating machinery, detecting incipient failures occurring in bearings is essential to assure machinery operational safety. With a view to designing an intelligent system that can effectively correlate multiple monitored variables with the corresponding defect types, a novel intelligent fault diagnosis method with a multivariable ensemble-based incremental support vector machine (MEISVM) is proposed, which is validated on a benchmark roller bearing experiment in comparison with other methods. Moreover, the proposed method is applied to the intelligent fault diagnosis of locomotive roller bearings, which proves its capability of detecting multiple faults, including complex compound faults and different severity degrees of the same fault. Both experimental and engineering test results illustrate that the proposed method is effective in the intelligent fault diagnosis of roller bearings from vibration signals.

Journal ArticleDOI
TL;DR: This work develops a multiview clustering method through which users are iteratively clustered from the views of both rating patterns and social trust relationships, which can effectively improve both the accuracy and coverage of recommendations, including in the cold-start situation.
Abstract: Although demonstrated to be efficient and scalable to large-scale data sets, clustering-based recommender systems suffer from relatively low accuracy and coverage. To address these issues, we develop a multiview clustering method through which users are iteratively clustered from the views of both rating patterns and social trust relationships. To accommodate users who appear in two different clusters simultaneously, we employ a support vector regression model to determine a prediction for a given item, based on user-, item- and prediction-related features. To accommodate (cold) users who cannot be clustered due to insufficient data, we propose a probabilistic method to derive a prediction from the views of both ratings and trust relationships. Experimental results on three real-world data sets demonstrate that our approach can effectively improve both the accuracy and coverage of recommendations, including in the cold-start situation, moving clustering-based recommender systems closer towards practical use.
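The abstract does not spell out the probabilistic combination of the two views; as a loose illustration of the general idea, one could weight each view's prediction by the amount of evidence behind it (function name and weighting scheme are assumptions, not the paper's method):

```python
def blend_prediction(pred_rating, n_ratings, pred_trust, n_trusts):
    """Combine a rating-view prediction and a trust-view prediction,
    weighting each by its evidence count. Returns None when neither
    view has any evidence (the fully cold case)."""
    total = n_ratings + n_trusts
    if total == 0:
        return None
    return (n_ratings * pred_rating + n_trusts * pred_trust) / total
```

For example, a user with three ratings and one trust link would lean three times as heavily on the rating view.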

Journal ArticleDOI
TL;DR: In this paper, a new algorithm called MLSMOTE (Multilabel Synthetic Minority Over-sampling Technique) is proposed to produce synthetic instances for imbalanced MLDs.
Abstract: Learning from imbalanced data is a problem which arises in many real-world scenarios, as does the need to build classifiers able to predict more than one class label simultaneously (multilabel classification). Dealing with imbalance by means of resampling methods is an approach that has been deeply studied lately, primarily in the context of traditional (non-multilabel) classification. In this paper the process of synthetic instance generation for multilabel datasets (MLDs) is studied and MLSMOTE (Multilabel Synthetic Minority Over-sampling Technique), a new algorithm aimed at producing synthetic instances for imbalanced MLDs, is proposed. An extensive review of how imbalance in the multilabel context has been tackled in the past is provided, along with a thorough experimental study aimed to verify the benefits of the proposed algorithm. Several multilabel classification algorithms and other multilabel oversampling methods are considered, as well as ensemble-based algorithms for imbalanced multilabel classification. The empirical analysis shows that MLSMOTE is able to improve the classification results produced by existing proposals.
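The interpolation step that SMOTE-family methods share can be sketched in a few lines; MLSMOTE additionally derives a label set for the synthetic instance from the neighbors, which is omitted here, and the function name is hypothetical:

```python
import numpy as np

def smote_instance(sample, neighbors, rng=None):
    """Generate one synthetic instance SMOTE-style: pick a random nearest
    neighbor of a minority sample and interpolate between the two feature
    vectors at a random point along the connecting segment."""
    if rng is None:
        rng = np.random.default_rng()
    neighbor = neighbors[rng.integers(len(neighbors))]
    gap = rng.random()                       # interpolation factor in [0, 1)
    return sample + gap * (neighbor - sample)
```

Because the synthetic point lies on the segment between two minority instances, it stays inside the minority region rather than being a mere duplicate.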

Journal ArticleDOI
TL;DR: A comparative analysis of these multi-classifiers in terms of their advantages, disadvantages and computational complexity is performed.
Abstract: Least Squares Twin Support Vector Machine (LSTSVM) is a binary classifier and the extension of it to multiclass is still an ongoing research issue. In this paper, we extended the formulation of binary LSTSVM classifier to multi-class by using the concepts such as "One-versus-All", "One-versus-One", "All-versus-One" and Directed Acyclic Graph (DAG). This paper performs a comparative analysis of these multi-classifiers in terms of their advantages, disadvantages and computational complexity. The performance of all the four proposed classifiers has been validated on twelve benchmark datasets by using predictive accuracy and training-testing time. All the proposed multi-classifiers have shown better performance as compared to the typical multi-classifiers based on 'Support Vector Machine' and 'Twin Support Vector Machine'. Friedman's statistic and Nemenyi post hoc tests are also used to test significance of predictive accuracy differences between classifiers.
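The "One-versus-All" decomposition mentioned above is generic: train one binary discriminant per class against the rest and predict by the most confident one. A minimal sketch using regularised least squares for the binary subproblems (in the least-squares spirit of LSTSVM, but not the paper's actual formulation; function names are hypothetical):

```python
import numpy as np

def ova_fit(X, y, n_classes, reg=1e-3):
    """One-versus-All training: one regularised least-squares discriminant
    per class, with targets +1 for the class and -1 for the rest."""
    Xb = np.hstack([X, np.ones((len(X), 1))])    # append a bias column
    W = np.zeros((n_classes, Xb.shape[1]))
    for c in range(n_classes):
        t = np.where(y == c, 1.0, -1.0)          # class c versus the rest
        A = Xb.T @ Xb + reg * np.eye(Xb.shape[1])
        W[c] = np.linalg.solve(A, Xb.T @ t)
    return W

def ova_predict(X, W):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.argmax(Xb @ W.T, axis=1)           # most confident discriminant
```

"One-versus-One" instead trains a classifier per pair of classes and votes, which trades more (smaller) subproblems for balanced training sets.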

Journal ArticleDOI
TL;DR: This paper reviews the progression of Evolutionary Fuzzy Systems by analyzing their taxonomy and components, presents a discussion on the most recent and difficult Data Mining tasks to be addressed, and outlines the latest trends in their development.
Abstract: Evolutionary Fuzzy Systems are a successful hybridization between fuzzy systems and Evolutionary Algorithms. They integrate both the management of imprecision/uncertainty and the inherent interpretability of Fuzzy Rule Based Systems with the learning and adaptation capabilities of evolutionary optimization. Over the years, many different approaches in Evolutionary Fuzzy Systems have been developed for improving the behavior of fuzzy systems, either acting on the Fuzzy Rule Based Systems' elements or by defining new approaches for the evolutionary components. All these efforts have enabled Evolutionary Fuzzy Systems to be successfully applied in several areas of Data Mining and engineering. Accordingly, a wide range of applications have also taken advantage of these types of systems. However, with new advances in computation, novel problems and challenges are raised every day. All these issues motivate researchers to make an effort to devise new ways of addressing them with Evolutionary Fuzzy Systems. In this paper, we review the progression of Evolutionary Fuzzy Systems by analyzing their taxonomy and components. We also stress those problems and applications already tackled by this type of approach. We present a discussion on the most recent and difficult Data Mining tasks to be addressed, and the latest trends in the development of Evolutionary Fuzzy Systems.