
Showing papers in "Expert Systems With Applications in 2012"


Journal ArticleDOI
TL;DR: A state-of-the-art literature survey is conducted to taxonomize the research on TOPSIS applications and methodologies and suggests a framework for future attempts in this area for academic researchers and practitioners.
Abstract: Multi-Criteria Decision Aid (MCDA) or Multi-Criteria Decision Making (MCDM) methods have received much attention from researchers and practitioners in evaluating, assessing and ranking alternatives across diverse industries. Among numerous MCDA/MCDM methods developed to solve real-world decision problems, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) continues to work satisfactorily across different application areas. In this paper, we conduct a state-of-the-art literature survey to taxonomize the research on TOPSIS applications and methodologies. The classification scheme for this review contains 266 scholarly papers from 103 journals since the year 2000, separated into nine application areas: (1) Supply Chain Management and Logistics, (2) Design, Engineering and Manufacturing Systems, (3) Business and Marketing Management, (4) Health, Safety and Environment Management, (5) Human Resources Management, (6) Energy Management, (7) Chemical Engineering, (8) Water Resources Management and (9) Other topics. Scholarly papers in the TOPSIS discipline are further interpreted based on (1) publication year, (2) publication journal, (3) authors' nationality and (4) other methods combined or compared with TOPSIS. We end our review paper with recommendations for future research in TOPSIS decision-making that is both forward-looking and practically oriented. This paper provides useful insights into the TOPSIS method and suggests a framework for future attempts in this area for academic researchers and practitioners.

1,571 citations
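To make the surveyed method concrete, the following is a minimal sketch of the classical TOPSIS procedure; the decision matrix, weights and benefit/cost designations are hypothetical and are not drawn from any of the surveyed papers.

```python
# A minimal sketch of the classical TOPSIS procedure (illustrative only; the
# decision matrix, criteria weights and benefit/cost flags are hypothetical).
import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) against criteria (columns)."""
    m = np.asarray(matrix, dtype=float)
    # 1. Vector-normalize each criterion column.
    norm = m / np.sqrt((m ** 2).sum(axis=0))
    # 2. Apply criterion weights.
    v = norm * np.asarray(weights, dtype=float)
    # 3. Ideal and anti-ideal solutions (max for benefit, min for cost criteria).
    benefit = np.asarray(benefit, dtype=bool)
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    # 4. Euclidean distances to both reference points.
    d_plus = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((v - anti) ** 2).sum(axis=1))
    # 5. Closeness coefficient; higher means closer to the ideal solution.
    return d_minus / (d_plus + d_minus)

scores = topsis([[250, 16, 12], [200, 16, 8], [300, 32, 16]],
                weights=[0.4, 0.35, 0.25],
                benefit=[False, True, True])   # first criterion is a cost
print(scores.argsort()[::-1])                  # ranking of the three alternatives
```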


Journal ArticleDOI
TL;DR: In this study, a comprehensive and up-to-date set of thirty-seven time domain and frequency domain features is examined, and it is shown that most time domain features are superfluous and redundant.

Abstract: Feature extraction is a significant method to extract the useful information hidden in a surface electromyography (EMG) signal and to remove the unwanted parts and interferences. To be successful in classifying the EMG signal, the selection of a feature vector ought to be carefully considered. However, numerous studies of EMG signal classification have used feature sets that contain a number of redundant features. In this study, a comprehensive and up-to-date set of thirty-seven time domain and frequency domain features is examined with respect to its properties. The results, verified by scatter plots of features, statistical analysis and a classifier, indicate that most time domain features are superfluous and redundant. They can be grouped according to mathematical properties and information into four main types: energy and complexity, frequency, prediction model, and time-dependence. On the other hand, all frequency domain features are calculated based on statistical parameters of the EMG power spectral density, and from the class-separability viewpoint their performance is not suitable for an EMG recognition system. Recommendations of features that avoid the use of redundant features in EMG signal classification applications are also given in this study.

1,151 citations
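As an illustration of the kind of features such studies compare, the sketch below computes a handful of standard EMG descriptors (mean absolute value, RMS, waveform length, zero crossings, and mean/median frequency from a crude power spectrum); it is not the paper's 37-feature set, and the sampling rate and threshold are assumed values.

```python
# Sketch of a few common EMG features of the kind compared in such studies.
import numpy as np

def time_domain_features(x, zc_threshold=0.01):
    x = np.asarray(x, dtype=float)
    mav = np.mean(np.abs(x))                         # mean absolute value
    rms = np.sqrt(np.mean(x ** 2))                   # root mean square
    wl = np.sum(np.abs(np.diff(x)))                  # waveform length
    # zero crossings with a small amplitude threshold to suppress noise
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(x[:-1] - x[1:]) > zc_threshold))
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": int(zc)}

def frequency_domain_features(x, fs=1000.0):
    x = np.asarray(x, dtype=float)
    psd = np.abs(np.fft.rfft(x)) ** 2                # crude periodogram
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mnf = np.sum(freqs * psd) / np.sum(psd)          # mean frequency
    cumulative = np.cumsum(psd)
    mdf = freqs[np.searchsorted(cumulative, cumulative[-1] / 2)]  # median frequency
    return {"MNF": mnf, "MDF": mdf}

signal = np.random.randn(2000)                       # stand-in for one EMG window
print(time_domain_features(signal), frequency_domain_features(signal))
```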


Journal ArticleDOI
TL;DR: A hybrid fuzzy multi-criteria decision making model that combines DEMATEL, ANP and TOPSIS methods in a fuzzy context is proposed to assist in evaluating green suppliers, and a case study of green supplier evaluation is presented for a specific company, namely Ford Otosan.

Abstract: Highlights: This study proposes a hybrid fuzzy multi-criteria decision making model that can assist in evaluating green suppliers. The proposed model integrates DEMATEL, ANP and TOPSIS methods in a fuzzy context. Ford Otosan is selected as the case company for the evaluation of green supplier alternatives. The supplied case study provides additional insights for research and practical applications. It is well known that "green" principles and strategies have become vital for companies as public awareness of their environmental impacts has increased. A company's environmental performance is related not only to its own environmental efforts, but also to its suppliers' environmental performance and image. For industries, environmentally responsible manufacturing, return flows, and related processes require a green supply chain (GSC) and suppliers with environmental/green competencies. In recent years, how to determine suitable, green suppliers in the supply chain has become a key strategic consideration. Therefore, this paper examines GSC management (GSCM) and GSCM capability dimensions to propose an evaluation framework for green suppliers. However, supplier selection is by nature a complex multi-criteria problem involving both quantitative and qualitative factors which may be in conflict and may also be uncertain. The identified components are integrated into a novel hybrid fuzzy multiple criteria decision making (MCDM) model that combines the fuzzy Decision Making Trial and Evaluation Laboratory (DEMATEL) method, the Analytical Network Process (ANP), and the Technique for Order Performance by Similarity to Ideal Solution (TOPSIS) in a fuzzy context. A case study of green supplier evaluation is presented for a specific company, namely Ford Otosan.

820 citations


Journal ArticleDOI
TL;DR: This study investigates the quantitative relationship between knowledge sharing, innovation and performance and develops a research model positing that knowledge sharing not only has a direct positive relationship with performance but also influences innovation, which in turn contributes to firm performance.

Abstract: Highlights: Exploring the effect knowledge sharing (KS) has on innovation and firm performance. Confirming the mediating role of innovation between KS and performance. Finding that explicit KS impacts innovation speed more than financial performance. Finding that tacit KS impacts innovation quality more than operational performance. This study investigates the quantitative relationship between knowledge sharing, innovation and performance. Based on the literature review, we develop a research model positing that knowledge sharing not only has a direct positive relationship with performance but also influences innovation, which in turn contributes to firm performance. This model is empirically tested using data collected from 89 high technology firms in Jiangsu Province of China. It is found that both explicit and tacit knowledge sharing practices facilitate innovation and performance. Explicit knowledge sharing has more significant effects on innovation speed and financial performance, while tacit knowledge sharing has more significant effects on innovation quality and operational performance.

812 citations


Journal ArticleDOI
TL;DR: This research provides information about trends in recommender systems research by examining the publication years of the articles, and provides practitioners and researchers with insight and future direction on recommender system research.
Abstract: Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. Although academic research on recommender systems has increased significantly over the past 10 years, there are deficiencies in the comprehensive literature review and classification of that research. For that reason, we reviewed 210 articles on recommender systems from 46 journals published between 2001 and 2010, and then classified those by the year of publication, the journals in which they appeared, their application fields, and their data mining techniques. The 210 articles are categorized into eight application fields (books, documents, images, movie, music, shopping, TV programs, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). Our research provides information about trends in recommender systems research by examining the publication years of the articles, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this paper helps anyone who is interested in recommender systems research with insight for future research direction.

604 citations


Journal ArticleDOI
TL;DR: It is suggested that different social science methodologies, such as psychology, cognitive science and human behavior might implement DMT, as an alternative to the methodologies already on offer, and the direction of any future developments in DMT methodologies and applications is discussed.
Abstract: In order to determine how data mining techniques (DMT) and their applications have developed during the past decade, this paper reviews data mining techniques and their applications and development through a survey of the literature and the classification of articles from 2000 to 2011. Keyword indices and article abstracts were used to identify 216 articles concerning DMT applications from 159 academic journals (retrieved from five online databases). This paper surveys and classifies DMT with respect to the following three areas: knowledge types, analysis types, and architecture types, together with their applications in different research and practical domains. A discussion deals with the direction of any future developments in DMT methodologies and applications: (1) DMT is finding increasing applications in expertise orientation, and the development of applications for DMT is a problem-oriented domain. (2) It is suggested that different social science methodologies, such as psychology, cognitive science and human behavior, might implement DMT as an alternative to the methodologies already on offer. (3) The ability to continually change and acquire new understanding is a driving force for the application of DMT, and this will allow many new future applications.

563 citations


Journal ArticleDOI
TL;DR: This study presents an integrated approach for selecting the appropriate supplier in the supply chain, addressing the carbon emission issue, using fuzzy-AHP and fuzzy multi-objective linear programming.
Abstract: Environmental sustainability of a supply chain depends on the purchasing strategy of the supply chain members. Most of the earlier models have focused on cost, quality, lead time, etc., but have not given enough importance to carbon emission in supplier evaluation. Recently, there has been growing pressure on supply chain members to reduce the carbon emission of their supply chain. This study presents an integrated approach for selecting the appropriate supplier in the supply chain, addressing the carbon emission issue, using fuzzy-AHP and fuzzy multi-objective linear programming. Fuzzy AHP (FAHP) is applied first to analyze the weights of the multiple factors. The considered factors are cost, quality rejection percentage, late delivery percentage, greenhouse gas emission and demand. These weights of the multiple factors are used in fuzzy multi-objective linear programming for supplier selection and quota allocation. An illustration with a data set from a realistic situation is presented to demonstrate the effectiveness of the proposed model. The proposed approach can handle realistic situations in which there is vagueness in the input information.

552 citations


Journal ArticleDOI
TL;DR: The results from this empirical study indicate that the random forest and gradient boosting classifiers perform very well in a credit scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets.
Abstract: In this paper, we set out to compare several techniques that can be used in the analysis of imbalanced credit scoring data sets. In a credit scoring context, imbalanced data sets frequently occur as the number of defaulting loans in a portfolio is usually much lower than the number of observations that do not default. As well as using traditional classification techniques such as logistic regression, neural networks and decision trees, this paper will also explore the suitability of gradient boosting, least square support vector machines and random forests for loan default prediction. Five real-world credit scoring data sets are used to build classifiers and test their performance. In our experiments, we progressively increase class imbalance in each of these data sets by randomly under-sampling the minority class of defaulters, so as to identify to what extent the predictive power of the respective techniques is adversely affected. The performance criterion chosen to measure this effect is the area under the receiver operating characteristic curve (AUC); Friedman's statistic and Nemenyi post hoc tests are used to test for significance of AUC differences between techniques. The results from this empirical study indicate that the random forest and gradient boosting classifiers perform very well in a credit scoring context and are able to cope comparatively well with pronounced class imbalances in these data sets. We also found that, when faced with a large class imbalance, the C4.5 decision tree algorithm, quadratic discriminant analysis and k-nearest neighbours perform significantly worse than the best performing classifiers.

528 citations
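The sketch below illustrates the general shape of such an experiment, scoring a few scikit-learn classifiers by AUC on a synthetically imbalanced binary problem; it uses generated data rather than the paper's five real-world credit scoring data sets, and it omits the progressive under-sampling and the Friedman/Nemenyi testing.

```python
# Sketch of the kind of comparison described above: several classifiers scored
# by AUC on a deliberately imbalanced binary problem (synthetic data only).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# ~5% "defaulters" to mimic a pronounced class imbalance
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```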


Journal ArticleDOI
TL;DR: A fuzzy approach, allowing experts to use linguistic variables for determining S, O, and D, is considered for FMEA by applying fuzzy 'technique for order preference by similarity to ideal solution' (TOPSIS) integrated with fuzzy 'analytical hierarchy process' (AHP).
Abstract: Failure mode and effects analysis (FMEA) is a widely used engineering technique for designing, identifying and eliminating known and/or potential failures, problems, errors and so on from a system, design, process, and/or service before they reach the customer (Stamatis, 1995). In a typical FMEA, for each failure mode, three risk factors, severity (S), occurrence (O), and detectability (D), are evaluated and a risk priority number (RPN) is obtained by multiplying these factors. Significant efforts have been made in the FMEA literature to overcome the shortcomings of the crisp RPN calculation. In this study, a fuzzy approach, allowing experts to use linguistic variables for determining S, O, and D, is considered for FMEA by applying the fuzzy 'technique for order preference by similarity to ideal solution' (TOPSIS) integrated with the fuzzy 'analytical hierarchy process' (AHP). A hypothetical case study demonstrates the applicability of the model in FMEA under a fuzzy environment.

483 citations
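For reference, the crisp RPN calculation that the fuzzy TOPSIS/AHP approach is intended to improve upon is simply RPN = S × O × D; a tiny sketch with hypothetical failure modes follows.

```python
# The crisp risk priority number that the fuzzy approach aims to improve upon:
# RPN = S * O * D on 1-10 scales. The failure modes below are hypothetical.
failure_modes = {
    "seal leakage":    {"S": 8, "O": 4, "D": 6},
    "sensor drift":    {"S": 5, "O": 7, "D": 3},
    "connector break": {"S": 9, "O": 2, "D": 5},
}
rpn = {name: f["S"] * f["O"] * f["D"] for name, f in failure_modes.items()}
for name, value in sorted(rpn.items(), key=lambda kv: -kv[1]):
    print(f"{name}: RPN = {value}")   # highest RPN = highest priority
```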


Journal ArticleDOI
TL;DR: Three findings appear to be consistently supported by the experimental results: Multiple-Output strategies are the best performing approaches, deseasonalization leads to uniformly improved forecast accuracy, and input selection is more effective when performed in conjunction with deseasonalization.
Abstract: Multi-step ahead forecasting is still an open challenge in time series forecasting. Several approaches that deal with this complex problem have been proposed in the literature but an extensive comparison on a large number of tasks is still missing. This paper aims to fill this gap by reviewing existing strategies for multi-step ahead forecasting and comparing them in theoretical and practical terms. To attain such an objective, we performed a large scale comparison of these different strategies using a large experimental benchmark (namely the 111 series from the NN5 forecasting competition). In addition, we considered the effects of deseasonalization, input variable selection, and forecast combination on these strategies and on multi-step ahead forecasting at large. The following three findings appear to be consistently supported by the experimental results: Multiple-Output strategies are the best performing approaches, deseasonalization leads to uniformly improved forecast accuracy, and input selection is more effective when performed in conjunction with deseasonalization.

456 citations
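The sketch below contrasts two of the strategies discussed, the recursive and the direct approach, using a plain linear autoregression on a synthetic series; it illustrates only the strategy definitions, not the paper's experimental setup.

```python
# Recursive (one model, fed back its own predictions) versus direct (one model
# per horizon) multi-step forecasting, on a synthetic series.
import numpy as np
from sklearn.linear_model import LinearRegression

def make_lagged(series, n_lags, target_shift=1):
    X, y = [], []
    for t in range(n_lags, len(series) - target_shift + 1):
        X.append(series[t - n_lags:t])
        y.append(series[t + target_shift - 1])
    return np.array(X), np.array(y)

series = np.sin(np.arange(300) * 0.1) + 0.1 * np.random.randn(300)
n_lags, horizon = 12, 5

# Recursive strategy: one one-step-ahead model, iterated.
X1, y1 = make_lagged(series, n_lags, target_shift=1)
one_step = LinearRegression().fit(X1, y1)
window = list(series[-n_lags:])
recursive = []
for _ in range(horizon):
    pred = one_step.predict([window[-n_lags:]])[0]
    recursive.append(pred)
    window.append(pred)             # feed the prediction back in

# Direct strategy: a separate model for each horizon h = 1..H.
direct = []
for h in range(1, horizon + 1):
    Xh, yh = make_lagged(series, n_lags, target_shift=h)
    model_h = LinearRegression().fit(Xh, yh)
    direct.append(model_h.predict([series[-n_lags:]])[0])

print("recursive:", np.round(recursive, 3))
print("direct:   ", np.round(direct, 3))
```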


Journal ArticleDOI
TL;DR: This paper investigates for the first time the use of Permutation Entropy (PE) as a feature for automated epileptic seizure detection using the fact that the EEG during epileptic seizures is characterized by lower PE than normal EEG.
Abstract: The electroencephalogram (EEG) has proven a valuable tool in the study and detection of epilepsy. This paper investigates for the first time the use of Permutation Entropy (PE) as a feature for automated epileptic seizure detection. A Support Vector Machine (SVM) is used to classify segments of normal and epileptic EEG based on PE values. The proposed system utilizes the fact that the EEG during epileptic seizures is characterized by lower PE than normal EEG. It is shown that an average sensitivity of 94.38% and an average specificity of 93.23% are obtained by using PE as a feature to characterize epileptic and seizure-free EEG, while 100% sensitivity and specificity were also obtained in single-trial classifications.
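A minimal sketch of the permutation entropy feature itself is given below; the order and delay parameters are illustrative and the SVM classification stage is omitted.

```python
# Minimal permutation entropy (ordinal-pattern entropy) sketch of the kind
# used here as an EEG feature; parameters are illustrative.
import math
from itertools import permutations
import numpy as np

def permutation_entropy(x, order=3, delay=1, normalize=True):
    x = np.asarray(x, dtype=float)
    patterns = {p: 0 for p in permutations(range(order))}
    n = len(x) - (order - 1) * delay
    for i in range(n):
        window = x[i:i + order * delay:delay]
        patterns[tuple(np.argsort(window))] += 1   # count each ordinal pattern
    probs = np.array([c for c in patterns.values() if c > 0], dtype=float) / n
    pe = -np.sum(probs * np.log2(probs))
    return pe / math.log2(math.factorial(order)) if normalize else pe

rng = np.random.default_rng(0)
noisy = rng.standard_normal(1000)        # irregular signal -> high PE
regular = np.sin(np.arange(1000) * 0.2)  # regular signal  -> lower PE
print(permutation_entropy(noisy), permutation_entropy(regular))
```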

Journal ArticleDOI
TL;DR: This paper survey and classify most of the ontology-based approaches developed in order to evaluate their advantages and limitations and compare their expected performance both from theoretical and practical points of view, and presents a new ontological-based measure relying on the exploitation of taxonomical features.
Abstract: Estimation of the semantic likeness between words is of great importance in many applications dealing with textual data, such as natural language processing, knowledge acquisition and information retrieval. Semantic similarity measures exploit knowledge sources as the base to perform the estimations. In recent years, ontologies have grown in interest thanks to global initiatives such as the Semantic Web, offering a structured knowledge representation. Thanks to the possibilities that ontologies enable regarding the semantic interpretation of terms, many ontology-based similarity measures have been developed. According to the principle on which those measures base the similarity assessment and the way in which ontologies are exploited or complemented with other sources, several families of measures can be identified. In this paper, we survey and classify most of the ontology-based approaches developed in order to evaluate their advantages and limitations and to compare their expected performance from both theoretical and practical points of view. We also present a new ontology-based measure relying on the exploitation of taxonomical features. The evaluation and comparison of our approach's results against those reported by related works under a common framework suggest that our measure provides high accuracy without some of the limitations observed in other works.
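As an example of a taxonomy-based measure of the general family surveyed here (not the new measure proposed in the paper), the sketch below computes a Wu and Palmer style similarity over a small hand-made taxonomy.

```python
# Classic taxonomy-based similarity (Wu & Palmer style) over a toy,
# hypothetical taxonomy; not the measure proposed in the paper.
parent = {                       # child -> parent edges
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
}

def path_to_root(concept):
    path = [concept]
    while concept in parent:
        concept = parent[concept]
        path.append(concept)
    return path                  # e.g. dog -> canine -> mammal -> animal

def depth(concept):
    return len(path_to_root(concept))          # root has depth 1

def wu_palmer(a, b):
    ancestors_a = set(path_to_root(a))
    # least common subsumer: deepest ancestor of b that also subsumes a
    lcs = max((c for c in path_to_root(b) if c in ancestors_a), key=depth)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))

print(wu_palmer("dog", "wolf"))      # share 'canine' -> relatively similar
print(wu_palmer("dog", "sparrow"))   # share only 'animal' -> less similar
```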

Journal ArticleDOI
TL;DR: A hybrid multi criteria decision making approach that can assist in evaluating a set of hospital web site alternatives is used and the applicability of the e-sq framework is shown in explaining the complexity of aspects observed in the implementation of healthcare services via internet.
Abstract: Highlights: The aim of this study is to use a hybrid multi-criteria decision making approach that can assist in evaluating a set of hospital web site alternatives. This study combines the fuzzy AHP and fuzzy TOPSIS methods to measure electronic service quality performance. Tangibles, responsiveness, reliability, information quality, assurance and empathy are determined as the main criteria for evaluating web based healthcare service quality. The proposed approach is used to evaluate the performance of some leading hospitals' web sites in Turkey. The electronic service quality instrument developed in this study can be used to monitor and improve the quality of service delivered to customers via the internet. The service sector is under pressure to deliver continuing performance and quality improvement while being customer-focused. In recent years, the concept of web based or electronic service quality (e-sq) has emerged. With the birth of electronic commerce, it has become important to be able to monitor and enhance e-sq. Therefore, this study examines the e-sq concept and determines the key components of e-sq. The e-sq framework is developed with the aid of the service quality (SERVQUAL) methodology as the theoretical instrument. Finally, the proposed e-sq framework is illustrated with a web service performance example from the healthcare sector in Turkey, using a combined multiple criteria decision making (MCDM) methodology containing the fuzzy analytic hierarchy process (AHP) and the fuzzy technique for order performance by similarity to ideal solution (TOPSIS). The work presented in this paper shows the applicability of the e-sq framework in explaining the complexity of aspects observed in the implementation of healthcare services via the internet.

Journal ArticleDOI
TL;DR: The main computational, morphometric and image processing methods that have been used in recent years to analyze images of plants are reviewed, introducing readers to relevant botanical concepts along the way.
Abstract: Plants are of fundamental importance to life on Earth. The shapes of leaves, petals and whole plants are of great significance to plant science, as they can help to distinguish between different species, to measure plant health, and even to model climate change. The growing interest in biodiversity and the increasing availability of digital images combine to make this topic timely. The global shortage of expert taxonomists further increases the demand for software tools that can recognize and characterize plants from images. A robust automated species identification system would allow people with only limited botanical training and expertise to carry out valuable field work. We review the main computational, morphometric and image processing methods that have been used in recent years to analyze images of plants, introducing readers to relevant botanical concepts along the way. We discuss the measurement of leaf outlines, flower shape, vein structures and leaf textures, and describe a wide range of analytical methods in use. We also discuss a number of systems that apply this research, including prototypes of hand-held digital field guides and various robotic systems used in agriculture. We conclude with a discussion of ongoing work and outstanding problems in the area.

Journal ArticleDOI
TL;DR: This paper proposes a new senti-lexicon for the sentiment analysis of restaurant reviews together with an improved Naive Bayes algorithm, and shows that when this algorithm is used with unigrams+bigrams as features, the gap between the positive accuracy and the negative accuracy is narrowed.

Abstract: The existing senti-lexicons do not sufficiently accommodate the sentiment words used in restaurant reviews. Therefore, this paper proposes a new senti-lexicon for the sentiment analysis of restaurant reviews. When classifying a review document as a positive or a negative sentiment using a supervised learning algorithm, the positive classification accuracy tends to be up to approximately 10% higher than the negative classification accuracy. This creates the problem of decreasing the average accuracy when the accuracies of the two classes are expressed as an average value. In order to mitigate this problem, an improved Naive Bayes algorithm is proposed. The experiment showed that when this algorithm was used with unigrams+bigrams as features, the gap between the positive accuracy and the negative accuracy was narrowed to 3.6% compared to when the original Naive Bayes was used, and a 28.5% gap was narrowed compared to when SVM was used. Additionally, the use of this algorithm based on the senti-lexicon showed an accuracy improved by a maximum of 10.2% in recall and a maximum of 26.2% in precision compared to when SVM was used, and by a maximum of 5.6% in recall and a maximum of 1.9% in precision compared to when Naive Bayes was used.
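For orientation, the sketch below shows a standard multinomial Naive Bayes classifier over unigram+bigram counts on toy restaurant reviews; the paper's senti-lexicon and its improved Naive Bayes variant are not reproduced here.

```python
# Standard multinomial Naive Bayes over unigrams+bigrams on hypothetical
# restaurant reviews (baseline only; not the paper's improved algorithm).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_reviews = ["the soup was delicious and the staff friendly",
                 "great food, will come back",
                 "terribly slow service and cold food",
                 "the waiter was rude and the place dirty"]
train_labels = ["pos", "pos", "neg", "neg"]

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),   # unigrams + bigrams
    MultinomialNB(),
)
model.fit(train_reviews, train_labels)
print(model.predict(["friendly staff but slow service"]))
```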

Journal ArticleDOI
TL;DR: By combining a clustering method, an ant colony algorithm and a support vector machine, an efficient and reliable classifier is developed to judge whether a network visit is normal or not.

Abstract: The efficiency of intrusion detection depends mainly on the dimension of the data features. Using a gradual feature removal method, 19 critical features are chosen to represent the various network visits. By combining a clustering method, an ant colony algorithm and a support vector machine (SVM), an efficient and reliable classifier is developed to judge whether a network visit is normal or not. Moreover, the accuracy achieves 98.6249% in 10-fold cross validation and the average Matthews correlation coefficient (MCC) achieves 0.861161.

Journal ArticleDOI
TL;DR: Recent research on and implementations of ACO are reviewed, and a modified ACO model is proposed, applied to the network routing problem and compared with existing traditional routing algorithms.

Abstract: Ant Colony Optimization (ACO) is a Swarm Intelligence technique inspired by the foraging behaviour of real ant colonies. Ants deposit pheromone on the ground in order to mark routes from the nest to food that should be followed by other members of the colony. ACO exploits this mechanism as an optimization method for solving discrete optimization problems in various engineering domains. Since the early nineties, when the first Ant Colony Optimization algorithm was proposed, ACO has attracted the attention of increasing numbers of researchers and many successful applications are now available. Moreover, a substantial corpus of theoretical results is becoming available that provides useful guidelines to researchers and practitioners in further applications of ACO. This paper reviews various recent research on and implementations of ACO, and proposes a modified ACO model which is applied to the network routing problem and compared with existing traditional routing algorithms.
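The following is a compact, generic Ant System sketch on a small, hypothetical travelling salesman instance, included only to illustrate the pheromone-laying and evaporation mechanism described above; it is not the modified routing model proposed in the paper.

```python
# Generic Ant System on a tiny symmetric TSP instance (hypothetical distances).
import numpy as np

rng = np.random.default_rng(1)
dist = np.array([[0, 2, 9, 10, 7],
                 [2, 0, 6, 4, 3],
                 [9, 6, 0, 8, 5],
                 [10, 4, 8, 0, 6],
                 [7, 3, 5, 6, 0]], dtype=float)
n = len(dist)
tau = np.ones((n, n))                    # pheromone trails
alpha, beta, rho, n_ants, n_iters = 1.0, 2.0, 0.5, 10, 50
eta = 1.0 / (dist + np.eye(n))           # heuristic visibility (avoid /0 on diagonal)

best_tour, best_len = None, float("inf")
for _ in range(n_iters):
    tours = []
    for _ in range(n_ants):
        tour = [rng.integers(n)]
        while len(tour) < n:
            i = tour[-1]
            mask = np.ones(n, dtype=bool)
            mask[tour] = False           # cities not yet visited
            weights = (tau[i, mask] ** alpha) * (eta[i, mask] ** beta)
            nxt = rng.choice(np.arange(n)[mask], p=weights / weights.sum())
            tour.append(int(nxt))
        length = sum(dist[tour[k], tour[(k + 1) % n]] for k in range(n))
        tours.append((tour, length))
        if length < best_len:
            best_tour, best_len = tour, length
    tau *= (1 - rho)                     # pheromone evaporation
    for tour, length in tours:           # deposit proportional to tour quality
        for k in range(n):
            a, b = tour[k], tour[(k + 1) % n]
            tau[a, b] += 1.0 / length
            tau[b, a] += 1.0 / length

print(best_tour, best_len)
```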

Journal ArticleDOI
TL;DR: The Hidden Naive Bayes (HNB) model can be applied to intrusion detection problems that suffer from dimensionality, highly correlated features and high network data stream volumes and significantly improves the accuracy of detecting denial-of-services (DoS) attacks.
Abstract: With increasing Internet connectivity and traffic volume, recent intrusion incidents have reemphasized the importance of network intrusion detection systems for combating increasingly sophisticated network attacks. Techniques such as pattern recognition and the data mining of network events are often used by intrusion detection systems to classify the network events as either normal events or attack events. Our research study claims that the Hidden Naive Bayes (HNB) model can be applied to intrusion detection problems that suffer from dimensionality, highly correlated features and high network data stream volumes. HNB is a data mining model that relaxes the Naive Bayes method's conditional independence assumption. Our experimental results show that the HNB model exhibits a superior overall performance in terms of accuracy, error rate and misclassification cost compared with the traditional Naive Bayes model, leading extended Naive Bayes models and the Knowledge Discovery and Data Mining (KDD) Cup 1999 winner. Our model performed better than other leading state-of-the art models, such as SVM, in predictive accuracy. The results also indicate that our model significantly improves the accuracy of detecting denial-of-services (DoS) attacks.

Journal ArticleDOI
TL;DR: A fuzzy FMEA based on fuzzy set theory and VIKOR method is proposed for prioritization of failure modes, specifically intended to address some limitations of the traditional FMEa.
Abstract: Failure mode and effects analysis (FMEA) is a widely used risk assessment tool for defining, identifying, and eliminating potential failures or problems in products, processes, designs, and services. In traditional FMEA, the risk priorities of failure modes are determined by using risk priority numbers (RPNs), which can be obtained by multiplying the scores of risk factors like occurrence (O), severity (S), and detection (D). However, the crisp RPN method has been criticized to have several deficiencies. In this paper, linguistic variables, expressed in trapezoidal or triangular fuzzy numbers, are used to assess the ratings and weights for the risk factors O, S, and D. For selecting the most serious failure modes, the extended VIKOR method is used to determine risk priorities of the failure modes that have been identified. As a result, a fuzzy FMEA based on fuzzy set theory and the VIKOR method is proposed for prioritization of failure modes, specifically intended to address some limitations of the traditional FMEA. A case study, which assesses the risk of the general anesthesia process, is presented to demonstrate the application of the proposed model under a fuzzy environment.
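A crisp VIKOR sketch for ranking failure modes is shown below as background; the paper's fuzzy, linguistic-variable treatment is not reproduced, and the ratings, weights and orientation convention (column maxima taken as the reference point, so that a smaller Q marks a more serious failure mode) are assumptions made only for illustration.

```python
# Crisp VIKOR sketch for prioritizing failure modes (illustrative assumptions).
import numpy as np

ratings = np.array([[7, 8, 4],      # failure mode A: O, S, D ratings (1-10)
                    [5, 6, 9],      # failure mode B
                    [9, 4, 6]],     # failure mode C
                   dtype=float)
weights = np.array([0.4, 0.4, 0.2])
v = 0.5                              # weight of the "group utility" strategy

f_best = ratings.max(axis=0)         # highest ratings taken as reference point
f_worst = ratings.min(axis=0)
norm = weights * (f_best - ratings) / (f_best - f_worst)
S = norm.sum(axis=1)                 # group utility
R = norm.max(axis=1)                 # individual regret
Q = (v * (S - S.min()) / (S.max() - S.min())
     + (1 - v) * (R - R.min()) / (R.max() - R.min()))

for name, q in sorted(zip("ABC", Q), key=lambda kv: kv[1]):
    print(f"failure mode {name}: Q = {q:.3f}")   # smallest Q = highest priority
```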

Journal ArticleDOI
TL;DR: A model for measuring the success of e-learning systems in universities is presented; based on the opinions of 33 experts and an assessment of their suggestions, the research indicators were finalized and the final model (MELSS Model) was obtained.

Abstract: In the internet era, universities and higher education institutions increasingly tend to provide e-learning. For suitable planning and to better enjoy the benefits of this educational approach, a model for measuring the success of e-learning systems is essential. So in this paper, we survey and present a model for measuring the success of e-learning systems in universities. For this purpose, a conceptual model was first designed according to the literature review. Then, based on the opinions of 33 experts and an assessment of their suggestions, the research indicators were finalized. After that, to examine the relationships between components and finalize the proposed model, a case study was carried out in 5 universities: Amir Kabir University, Tehran University, Shahid Beheshti University, Iran University of Science & Technology and Khaje Nasir Toosi University of Technology. Finally, by analyzing questionnaires completed by 369 instructors, students and alumni who were e-learning system users, the final model (MELSS Model) was obtained.

Journal ArticleDOI
TL;DR: To the authors' knowledge, this model is the first effort to consider supplier selection, order allocation, and CLSC network configuration simultaneously; the mathematical programming model is validated through numerical analysis.

Abstract: Reverse logistics consists of all operations related to the reuse of products. External suppliers are among the important members of reverse logistics and closed loop supply chain (CLSC) networks. However, in CLSC network configuration models, suppliers are assessed based on purchasing cost, and other factors such as on-time delivery are ignored. In this research, a general closed loop supply chain network is examined that includes manufacturer, disassembly, refurbishing, and disposal sites and is managed by the manufacturer. We propose an integrated model with two phases. In the first phase, a framework for supplier selection criteria in reverse logistics is proposed, and a fuzzy method is designed to evaluate suppliers based on qualitative criteria. The output of this stage is the weight of each supplier with respect to each part. In the second phase, we propose a multi-objective mixed-integer linear programming model to determine which suppliers and refurbishing sites should be selected (strategic decisions), and to find the optimal number of parts and products in the CLSC network (tactical decisions). The objective functions maximize profit and the weights of suppliers, and one of them minimizes defect rates. To our knowledge, this model is the first effort to consider supplier selection, order allocation, and CLSC network configuration simultaneously. The mathematical programming model is validated through numerical analysis.

Journal ArticleDOI
TL;DR: An improved KNN algorithm is proposed, which builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization; it can reduce the text similarity computation substantially and outperforms the state-of-the-art KNN, Naive Bayes and Support Vector Machine classifiers.

Abstract: Text categorization is a significant tool to manage and organize surging text data. Many text categorization algorithms have been explored in the previous literature, such as KNN, Naive Bayes and Support Vector Machine. KNN text categorization is an effective but less efficient classification method. In this paper, we propose an improved KNN algorithm for text categorization, which builds the classification model by combining a constrained one-pass clustering algorithm with KNN text categorization. Empirical results on three benchmark corpora show that our algorithm can reduce the text similarity computation substantially and outperform the state-of-the-art KNN, Naive Bayes and Support Vector Machine classifiers. In addition, the classification model constructed by the proposed algorithm can be updated incrementally, and it has great scalability in many real-world applications.
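As a baseline for comparison, the sketch below implements plain KNN text categorization over TF-IDF vectors with scikit-learn; the constrained one-pass clustering step of the proposed algorithm is not reproduced, and the toy documents are hypothetical.

```python
# Baseline k-NN text categorization over TF-IDF vectors (hypothetical corpus).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

docs = ["stock markets fell sharply on inflation fears",
        "the central bank raised interest rates again",
        "the striker scored twice in the cup final",
        "the home team won after extra time"]
labels = ["finance", "finance", "sport", "sport"]

knn = make_pipeline(TfidfVectorizer(),
                    KNeighborsClassifier(n_neighbors=3, metric="cosine"))
knn.fit(docs, labels)
print(knn.predict(["rates and bond markets moved after the bank decision"]))
```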

Journal ArticleDOI
TL;DR: The lightweight IDS has been developed by using a wrapper based feature selection algorithm that maximizes the specificity and sensitivity of the IDS as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features.
Abstract: The objective of this paper is to construct a lightweight Intrusion Detection System (IDS) aimed at detecting anomalies in networks. The crucial part of building a lightweight IDS depends on the preprocessing of network data, the identification of important features, and the design of an efficient learning algorithm that classifies normal and anomalous patterns. Therefore, in this work the design of the IDS is investigated from these three perspectives. The goals of this paper are (i) removing redundant instances so that the learning algorithm is not biased, (ii) identifying a suitable subset of features by employing a wrapper based feature selection algorithm, and (iii) realizing the proposed IDS with a neurotree to achieve better detection accuracy. The lightweight IDS has been developed by using a wrapper based feature selection algorithm that maximizes the specificity and sensitivity of the IDS, as well as by employing a neural ensemble decision tree iterative procedure to evolve optimal features. An extensive experimental evaluation of the proposed approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree and Representative Tree models, is conducted to perform the detection of anomalous network patterns.

Journal ArticleDOI
TL;DR: The aim of this work is to find the best way for describing a given texture using a local binary pattern (LBP) based approach and to compare several texture descriptors, it is shown that the proposed approach coupled with random subspace ensemble outperforms other recent state-of-the-art approaches.
Abstract: The aim of this work is to find the best way of describing a given texture using a local binary pattern (LBP) based approach. First, several different approaches are compared; then the best fusion approach is tested on different datasets and compared with several approaches proposed in the literature (for fair comparisons, when possible we have used code shared by the original authors). Our experiments show that a fusion approach based on the uniform local quinary pattern (LQP) and a rotation invariant local quinary pattern, where a bin selection based on variance is performed and the Neighborhood Preserving Embedding (NPE) feature transform is applied, yields a method that performs well on all tested datasets. As the classifier, we have tested a stand-alone support vector machine (SVM) and a random subspace ensemble of SVMs. We compare several texture descriptors and show that our proposed approach coupled with the random subspace ensemble outperforms other recent state-of-the-art approaches. This conclusion is based on extensive experiments conducted in several domains using six benchmark databases.
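For readers unfamiliar with the descriptor family, the sketch below computes the basic 3x3 local binary pattern histogram that underlies these LBP/LQP variants; the uniform, quinary and rotation-invariant extensions compared in the paper are not reproduced.

```python
# Basic 3x3 LBP histogram descriptor for an image patch.
import numpy as np

def lbp_histogram(image):
    img = np.asarray(image, dtype=float)
    center = img[1:-1, 1:-1]
    # 8 neighbours, clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes |= ((neighbour >= center).astype(np.uint8) << bit)
    # the texture descriptor is the histogram of the 256 possible codes
    hist, _ = np.histogram(codes, bins=256, range=(0, 256), density=True)
    return hist

patch = np.random.randint(0, 256, size=(32, 32))
print(lbp_histogram(patch).shape)        # (256,) descriptor for the patch
```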

Journal ArticleDOI
TL;DR: This work analyzes the performance of data level proposals against algorithm level proposals focusing on cost-sensitive models, and versus a hybrid procedure that combines those two approaches, showing that no single approach can be highlighted over the rest.

Abstract: Class imbalance is among the most persistent complications which may confront the traditional supervised learning task in real-world applications. The problem occurs, in the binary case, when the number of instances in one class significantly outnumbers the number of instances in the other class. This situation is a handicap when trying to identify the minority class, as the learning algorithms are not usually adapted to such characteristics. The approaches to deal with the problem of imbalanced datasets fall into two major categories: data sampling and algorithmic modification. Cost-sensitive learning solutions incorporating both the data and algorithm level approaches assume higher misclassification costs for samples in the minority class and seek to minimize high cost errors. Nevertheless, there is no fully exhaustive comparison between those models which can help us to determine the most appropriate one under different scenarios. The main objective of this work is to analyze the performance of data level proposals against algorithm level proposals focusing on cost-sensitive models, and versus a hybrid procedure that combines those two approaches. We will show, by means of a statistical comparative analysis, that we cannot highlight a unique approach among the rest. This will lead to a discussion about the data intrinsic characteristics of the imbalanced classification problem, which will help to follow new paths that can lead to the improvement of current models, mainly focusing on class overlap and dataset shift in imbalanced classification.

Journal ArticleDOI
TL;DR: This paper presents two novel methods for the segmentation of images based on the Fractional-Order Darwinian Particle Swarm Optimization (FODPSO) and Darwinian Particle Swarm Optimization (DPSO) for determining the n-1 optimal thresholds for n-level thresholding of a given image.

Abstract: Image segmentation has been widely used in document image analysis for the extraction of printed characters, in map processing in order to find lines, legends, and characters, in topological feature extraction for geographical information, and in the quality inspection of materials where defective parts must be delineated, among many other applications. In image analysis, the efficient segmentation of images into meaningful objects is important for classification and object recognition. This paper presents two novel methods for the segmentation of images based on the Fractional-Order Darwinian Particle Swarm Optimization (FODPSO) and Darwinian Particle Swarm Optimization (DPSO) for determining the n-1 optimal thresholds for n-level thresholding of a given image. The efficiency of the proposed methods is compared with other well-known thresholding segmentation methods. Experimental results show that the proposed methods perform better than the other methods when considering a number of different measures.

Journal ArticleDOI
TL;DR: An investigation into accurately discriminating between individual and combined fingers movements using surface EMG signals, so that different finger postures of a prosthetic hand can be controlled in response.
Abstract: A fundamental component of many modern prostheses is the myoelectric control system, which uses the electromyogram (EMG) signals from an individual's muscles to control the prosthesis movements. Despite the extensive research focus on the myoelectric control of arm and gross hand movements, more dexterous individual and combined fingers control has not received the same attention. The main contribution of this paper is an investigation into accurately discriminating between individual and combined fingers movements using surface EMG signals, so that different finger postures of a prosthetic hand can be controlled in response. For this purpose, two EMG electrodes located on the human forearm are utilized to collect the EMG data from eight participants. Various feature sets are extracted and projected in a manner that ensures maximum separation between the finger movements and then fed to two different classifiers. The second contribution is the use of a Bayesian data fusion postprocessing approach to maximize the probability of correct classification of the EMG data belonging to different movements. Practical results and statistical significance tests prove the feasibility of the proposed approach with an average classification accuracy of ~90% across different subjects proving the significance of the proposed fusion scheme in finger movement classification.

Journal ArticleDOI
TL;DR: This work automatically classified five types of ECG beats of the MIT-BIH arrhythmia database using a feed forward neural network and a Least Square-Support Vector Machine, and obtained the highest accuracy with the first approach, which uses principal components of segmented ECG beats.

Abstract: The electrocardiogram (ECG) is the P, QRS, T waveform indicating the electrical activity of the heart. The subtle changes in amplitude and duration of the ECG cannot be deciphered precisely by the naked eye, hence imposing the need for a computer-assisted diagnosis tool. In this paper we have automatically classified five types of ECG beats of the MIT-BIH arrhythmia database. The five types of beats are Normal (N), Right Bundle Branch Block (RBBB), Left Bundle Branch Block (LBBB), Atrial Premature Contraction (APC) and Ventricular Premature Contraction (VPC). In this work, we have compared the performances of three approaches. The first approach uses principal components of segmented ECG beats, the second approach uses principal components of error signals of a linear prediction model, whereas the third approach uses principal components of Discrete Wavelet Transform (DWT) coefficients as features. These features from the three approaches were independently classified using a feed forward neural network (NN) and a Least Square-Support Vector Machine (LS-SVM). We obtained the highest accuracy using the first approach, based on principal components of segmented ECG beats, with an average sensitivity of 99.90%, specificity of 99.10%, PPV of 99.61% and classification accuracy of 98.11%. The system developed is clinically ready to deploy for mass screening programs.

Journal ArticleDOI
TL;DR: The experimental results show that the new features with the six modelling techniques are more effective than the existing ones for customer churn prediction in the telecommunication service field.
Abstract: This paper presents a new set of features for land-line customer churn prediction, including 2 six-month Henley segmentations, precise 4-month call details, line information, bill and payment information, account information, demographic profiles, service orders, complaint information, etc. Seven prediction techniques (Logistic Regressions, Linear Classifications, Naive Bayes, Decision Trees, Multilayer Perceptron Neural Networks, Support Vector Machines and the Evolutionary Data Mining Algorithm) are then applied to customer churn prediction based on the new features. Finally, comparative experiments were carried out to evaluate the new feature set and the seven modelling techniques for customer churn prediction. The experimental results show that the new features with the six modelling techniques are more effective than the existing ones for customer churn prediction in the telecommunication service field.

Journal ArticleDOI
TL;DR: A new method based on the firefly algorithm, called the FF-LBG algorithm, is proposed to construct the codebook for vector quantization; the reconstructed images have higher quality than those generated by the LBG, PSO and QPSO algorithms, but show no significant superiority over the HBMO algorithm.

Abstract: Vector quantization (VQ) is a powerful technique in digital image compression applications. Traditional, widely used methods such as the Linde-Buzo-Gray (LBG) algorithm always generate locally optimal codebooks. Recently, particle swarm optimization (PSO) was adapted to obtain near-global optimal codebooks for vector quantization. An alternative method, called quantum particle swarm optimization (QPSO), had been developed to improve the results of the original PSO algorithm. The honey bee mating optimization (HBMO) algorithm was also used to develop a vector quantization algorithm. In this paper, we propose a new method based on the firefly (FF) algorithm to construct the codebook for vector quantization. The proposed method uses the LBG method to initialize the FF algorithm and is called the FF-LBG algorithm. The FF-LBG algorithm is compared with four other methods: the LBG, particle swarm optimization, quantum particle swarm optimization and honey bee mating optimization algorithms. Experimental results show that the proposed FF-LBG algorithm is faster than the other four methods. Furthermore, the reconstructed images have higher quality than those generated by the LBG, PSO and QPSO algorithms, but show no significant superiority over the HBMO algorithm.
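As background, the sketch below shows a minimal LBG (generalized Lloyd) codebook training loop of the kind used here to initialize the firefly search; the firefly refinement itself is not reproduced, and random vectors stand in for 4x4 image blocks.

```python
# Minimal LBG (generalized Lloyd) codebook training loop; random vectors stand
# in for 4x4 image blocks, and the firefly refinement is not reproduced.
import numpy as np

def lbg(vectors, codebook_size=8, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(vectors, dtype=float)
    # initialize the codebook with randomly chosen training vectors
    codebook = data[rng.choice(len(data), codebook_size, replace=False)].copy()
    for _ in range(n_iters):
        # assign each training vector to its nearest codeword
        d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = d.argmin(axis=1)
        # move each codeword to the centroid of its assigned vectors
        for k in range(codebook_size):
            members = data[nearest == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    distortion = d.min(axis=1).mean()
    return codebook, distortion

blocks = np.random.rand(500, 16)          # stand-in for 4x4 image blocks
codebook, distortion = lbg(blocks)
print(codebook.shape, round(distortion, 4))
```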