
Showing papers in "Expert Systems With Applications" in 2016


Journal ArticleDOI
TL;DR: A novel multi-objective algorithm called Multi-Objective Grey Wolf Optimizer (MOGWO) is proposed in order to optimize problems with multiple objectives for the first time.
Abstract: Due to the novelty of the Grey Wolf Optimizer (GWO), there is no study in the literature to design a multi-objective version of this algorithm. This paper proposes a Multi-Objective Grey Wolf Optimizer (MOGWO) in order to optimize problems with multiple objectives for the first time. A fixed-sized external archive is integrated into the GWO for saving and retrieving the Pareto optimal solutions. This archive is then employed to define the social hierarchy and simulate the hunting behavior of grey wolves in multi-objective search spaces. The proposed method is tested on 10 multi-objective benchmark problems and compared with two well-known meta-heuristics: Multi-Objective Evolutionary Algorithm Based on Decomposition (MOEA/D) and Multi-Objective Particle Swarm Optimization (MOPSO). The qualitative and quantitative results show that the proposed algorithm is able to provide very competitive results and outperforms other algorithms. Note that the source code of MOGWO is publicly available at http://www.alimirjalili.com/GWO.html. A novel multi-objective algorithm called the Multi-Objective Grey Wolf Optimizer is proposed. MOGWO is benchmarked on 10 challenging multi-objective test problems. The quantitative results show the superior convergence and coverage of MOGWO. The coverage ability of MOGWO is confirmed by the qualitative results as well.
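
The key mechanism the abstract describes is steering the wolves by leaders drawn from the external archive. Below is a minimal, illustrative sketch of grid-based leader selection (a roulette wheel biased toward sparse hypercubes), an assumption about the mechanics in the spirit of archive-based multi-objective optimizers; it is not the authors' released code (see the linked page for that).

```python
import numpy as np

def select_leaders(archive_fitness, n_leaders=3, n_bins=10, rng=None):
    """Pick alpha/beta/delta leaders from less-crowded archive regions."""
    rng = rng or np.random.default_rng()
    f = np.asarray(archive_fitness, float)        # (n_solutions, n_objectives)
    # Assign each archived solution to a hypercube (grid cell) per objective.
    grid = np.floor((f - f.min(0)) / (np.ptp(f, 0) + 1e-12) * n_bins).astype(int)
    cells, inverse = np.unique(grid, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    counts = np.bincount(inverse)                 # crowding of each cell
    p_cell = (1.0 / counts) / (1.0 / counts).sum()  # favor sparse cells
    leaders = []
    for _ in range(n_leaders):
        cell = rng.choice(len(cells), p=p_cell)
        members = np.flatnonzero(inverse == cell)
        leaders.append(int(rng.choice(members)))
    return leaders                                # indices into the archive
```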

967 citations


Journal ArticleDOI
TL;DR: A deep convolutional neural network is proposed to perform efficient and effective HAR using smartphone sensors by exploiting the inherent characteristics of activities and 1D time-series signals, at the same time providing a way to automatically and data-adaptively extract robust features from raw data.
Abstract: This paper proposes a deep convolutional neural network for HAR using smartphone sensors. Experiments show that the proposed method derives relevant and more complex features. The method achieved an almost perfect classification on moving activities. It outperforms other state-of-the-art data mining techniques in HAR. Human activities are inherently translation invariant and hierarchical. Human activity recognition (HAR), a field that has garnered a lot of attention in recent years due to its high demand in various application domains, makes use of time-series sensor data to infer activities. In this paper, a deep convolutional neural network (convnet) is proposed to perform efficient and effective HAR using smartphone sensors by exploiting the inherent characteristics of activities and 1D time-series signals, at the same time providing a way to automatically and data-adaptively extract robust features from raw data. Experiments show that convnets indeed derive relevant and more complex features with every additional layer, although the difference in feature complexity decreases with every additional layer. A wider time span of temporal local correlation can be exploited (1×9-1×14) and a low pooling size (1×2-1×3) is shown to be beneficial. Convnets also achieved an almost perfect classification on moving activities, especially very similar ones which were previously perceived to be very difficult to classify. Lastly, convnets outperform other state-of-the-art data mining techniques in HAR for the benchmark dataset collected from 30 volunteer subjects, achieving an overall performance of 94.79% on the test set with raw sensor data, and 95.75% with additional information from the temporal fast Fourier transform of the HAR data set.
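
For readers who want the shape of such a network, here is a minimal 1D convnet sketch in PyTorch. The 1×9 kernels and pooling size 2 follow the ranges the abstract reports as beneficial; the channel counts, input shape and two-block depth are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class HARConvNet(nn.Module):
    def __init__(self, n_channels=9, n_classes=6, seq_len=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(2),                        # small pooling (2-3) helps
            nn.Conv1d(32, 64, kernel_size=9, padding=4), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(64 * (seq_len // 4), n_classes)

    def forward(self, x):                           # x: (batch, channels, time)
        h = self.features(x)
        return self.classifier(h.flatten(1))

model = HARConvNet()
logits = model(torch.randn(8, 9, 128))              # 8 windows of 128 samples
```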

854 citations


Journal ArticleDOI
TL;DR: The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
Abstract: Text classification is a domain with a high-dimensional feature space. Extracting the keywords as the features can be extremely useful in text classification. An empirical analysis of five statistical keyword extraction methods. A comprehensive analysis of classifier and keyword extraction ensembles. For the ACM collection, a classification accuracy of 93.80% with a Bagging ensemble of Random Forest. Automatic keyword extraction is an important research direction in text mining, natural language processing and information retrieval. Keyword extraction enables us to represent text documents in a condensed way. The compact representation of documents can be helpful in several applications, such as automatic indexing, automatic summarization, automatic classification, clustering and filtering. For instance, text classification is a domain with a high-dimensional feature space challenge. Hence, extracting the most important/relevant words about the content of the document and using these keywords as the features can be extremely useful. In this regard, this study examines the predictive performance of five statistical keyword extraction methods (most-frequent-measure based keyword extraction, term frequency-inverse sentence frequency based keyword extraction, co-occurrence statistical information based keyword extraction, eccentricity-based keyword extraction and the TextRank algorithm) on classification algorithms and ensemble methods for scientific text document classification (categorization). In the study, a comprehensive comparison of base learning algorithms (Naive Bayes, support vector machines, logistic regression and Random Forest) with five widely utilized ensemble methods (AdaBoost, Bagging, Dagging, Random Subspace and Majority Voting) is conducted. To the best of our knowledge, this is the first empirical analysis which evaluates the effectiveness of statistical keyword extraction methods in conjunction with ensemble learning algorithms. The classification schemes are compared in terms of classification accuracy, F-measure and area under curve values. To validate the empirical analysis, a two-way ANOVA test is employed. The experimental analysis indicates that the Bagging ensemble of Random Forest with the most-frequent based keyword extraction method yields promising results for text classification. For the ACM document collection, the highest average predictive performance (93.80%) is obtained with the utilization of the most-frequent based keyword extraction method with the Bagging ensemble of the Random Forest algorithm. In general, Bagging and Random Subspace ensembles of Random Forest yield promising results. The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
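
The best configuration reported (most-frequent keyword features with a Bagging ensemble of Random Forest) can be approximated in a few lines of scikit-learn; here the keyword-extraction step is crudely reduced to keeping the most frequent terms, and the two-document corpus is a placeholder.

```python
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

docs = ["deep learning for text categorization", "query optimization in databases"]
labels = [0, 1]                                  # placeholder class labels

clf = make_pipeline(
    CountVectorizer(max_features=500),           # crude stand-in for keyword extraction
    BaggingClassifier(RandomForestClassifier(n_estimators=100), n_estimators=10),
)
clf.fit(docs, labels)
```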

445 citations


Journal ArticleDOI
TL;DR: Four different machine learning algorithms such as Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM) have been considered for classification of human sentiments.
Abstract: A large number of sentiment reviews, blogs and comments are present online. These reviews must be classified to obtain meaningful information. Four different supervised machine learning algorithms are used for classification. Unigram, bigram and trigram models and their combinations are used for classification. The classification is done on the IMDb movie review dataset. With the ever-increasing number of social networking and online marketing sites, the reviews and blogs obtained from them act as an important source for further analysis and improved decision making. These reviews are mostly unstructured by nature and thus need processing, like classification or clustering, to provide meaningful information for future uses. These reviews and blogs may be classified into different polarity groups such as positive, negative, and neutral in order to extract information from the input dataset. Supervised machine learning methods help to classify these reviews. In this paper, four different machine learning algorithms, namely Naive Bayes (NB), Maximum Entropy (ME), Stochastic Gradient Descent (SGD), and Support Vector Machine (SVM), have been considered for classification of human sentiments. The accuracies of the different methods are critically examined in order to assess their performance on the basis of parameters such as precision, recall, f-measure, and accuracy.
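
A minimal scikit-learn sketch of the comparison setup: unigram-to-trigram features feeding the four learners (maximum entropy is logistic regression in scikit-learn terms). The tiny corpus is a placeholder for the IMDb data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = ["a wonderful, moving film", "dull plot and wooden acting"] * 10
train_labels = [1, 0] * 10                          # placeholder IMDb polarity labels
test_texts, test_labels = ["wonderful acting"], [1]

classifiers = {
    "NB": MultinomialNB(),
    "ME": LogisticRegression(max_iter=1000),        # maximum entropy model
    "SGD": SGDClassifier(),
    "SVM": LinearSVC(),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(TfidfVectorizer(ngram_range=(1, 3)), clf)  # uni- to trigrams
    pipe.fit(train_texts, train_labels)
    print(name, pipe.score(test_texts, test_labels))
```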

432 citations


Journal ArticleDOI
TL;DR: An overview of the most important primary studies published from 2009 to 2015 is given, covering techniques for preprocessing and clustering of financial data, for forecasting future market movements, and for mining financial text information, among others.
Abstract: We propose a survey of soft computing techniques applied to financial markets. We surveyed several primary studies proposed in the literature. A framework for building an intelligent trading system is proposed. Future directions of this research field are discussed. Financial markets play an important role in the economic and social organization of modern society. In these kinds of markets, information is an invaluable asset. However, with the modernization of financial transactions and information systems, the large amount of information available to a trader can make the analysis of a financial asset prohibitive. In the last decades, many researchers have attempted to develop computational intelligence methods and algorithms to support decision-making in different financial market segments. In the literature, there is a huge number of scientific papers that investigate the use of computational intelligence techniques to solve financial market problems. However, only a few studies have reviewed the literature on this topic. Most of the existing review articles have a limited scope, either by focusing on a specific financial market application or by focusing on a family of machine learning algorithms. This paper presents a review of the application of several computational intelligence methods in several financial applications. This paper gives an overview of the most important primary studies published from 2009 to 2015, which cover techniques for preprocessing and clustering of financial data, for forecasting future market movements, for mining financial text information, among others. The main contributions of this paper are: (i) a comprehensive review of the literature of this field, (ii) the definition of a systematic procedure for guiding the task of building an intelligent trading system and (iii) a discussion about the main challenges and open problems in this scientific field.

399 citations


Journal ArticleDOI
TL;DR: This review identifies the popularly used algorithms within the domain of bio-inspired algorithms and discusses their principles, developments and scope of application, which would pave the path for future studies to choose algorithms based on fitment.
Abstract: Review of applications of algorithms in bio-inspired computing. Brief description of algorithms without mathematical notations. Brief description of the scope of applications of the algorithms. Identification of algorithms whose applications may be explored. Identification of algorithms on which theory development may be explored. With the explosion of data generation, getting optimal solutions to data-driven problems is increasingly becoming a challenge, if not impossible. It is increasingly being recognised that applications of intelligent bio-inspired algorithms are necessary for addressing highly complex problems to provide working solutions in time, especially with dynamic problem definitions, fluctuations in constraints, incomplete or imperfect information and limited computation capacity. More and more such intelligent algorithms are thus being explored for solving different complex problems. While some studies are exploring the application of these algorithms in a novel context, other studies are incrementally improving the algorithms themselves. However, the fast growth in the domain leaves researchers unaware of the progress across different approaches, and hence awareness across algorithms is decreasing, due to which the literature on bio-inspired computing is skewed towards only a few algorithms (like neural networks, genetic algorithms, particle swarm and ant colony optimization). To address this concern, we identify the popularly used algorithms within the domain of bio-inspired algorithms and discuss their principles, developments and scope of application. Specifically, we have discussed neural networks, the genetic algorithm, particle swarm, ant colony optimization, artificial bee colony, bacterial foraging, cuckoo search, firefly, leaping frog, the bat algorithm, flower pollination and the artificial plant optimization algorithm. Further objectives which could be addressed by these twelve algorithms have also been identified and discussed. This review would pave the path for future studies to choose algorithms based on fitment. We have also identified other bio-inspired algorithms where there is ample scope for theory development and applications, due to the absence of significant literature.

397 citations


Journal ArticleDOI
TL;DR: A CAD scheme for detection of breast cancer has been developed using an unsupervised deep belief network path followed by a supervised back-propagation path (DBN-NN), indicating promising results over previously-published studies.
Abstract: We present a CAD scheme using a DBN unsupervised path followed by an NN supervised path. Our two-phase method 'DBN-NN' gives higher classification accuracy than using one phase. The overall accuracy of DBN-NN reaches 99.68% with 100% sensitivity and 99.47% specificity. DBN-NN was tested on the Wisconsin Breast Cancer Dataset (WBCD). DBN-NN results show classifier performance improvements over previous studies. Over the last decade, the ever increasing world-wide demand for early detection of breast cancer at many screening sites and hospitals has resulted in the need for new research avenues. According to the World Health Organization (WHO), early detection of cancer greatly increases the chances of taking the right decision on a successful treatment plan. Computer-Aided Diagnosis (CAD) systems are applied widely in the detection and differential diagnosis of many different kinds of abnormalities. Therefore, improving the accuracy of a CAD system has become one of the major research areas. In this paper, a CAD scheme for detection of breast cancer has been developed using an unsupervised deep belief network path followed by a supervised back-propagation path. The architecture is a back-propagation neural network with the Levenberg-Marquardt learning function, whose weights are initialized from the deep belief network path (DBN-NN). Our technique was tested on the Wisconsin Breast Cancer Dataset (WBCD). The combined classifier gives an accuracy of 99.68%, indicating promising results over previously-published studies. The proposed system provides an effective classification model for breast cancer. In addition, we examined the architecture at several train-test partitions.
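
A hedged scikit-learn analogue of the two-phase pipeline: an unsupervised RBM layer standing in for the DBN pretraining path, feeding a supervised back-propagation network. scikit-learn offers no Levenberg-Marquardt trainer, so its default 'adam' solver is substituted, and the bundled Wisconsin diagnostic data stands in for the WBCD.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import BernoulliRBM, MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)       # stand-in for the WBCD data

model = make_pipeline(
    MinMaxScaler(),                              # RBMs expect inputs in [0, 1]
    BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20),  # unsupervised phase
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500),         # supervised phase
)
print(model.fit(X, y).score(X, y))
```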

378 citations


Journal ArticleDOI
TL;DR: A novel feature-based emotion recognition model is proposed for EEG-based Brain-Computer Interfaces which combines statistical-based feature selection methods and SVM emotion classifiers and incorporates additional features which are relevant for signal pre-processing and recognition classification tasks.
Abstract: A feature-based emotion recognition model is proposed for EEG-based BCI. The approach combines statistical feature selection methods and SVM emotion classifiers. The model is based on the Valence/Arousal dimensions for emotion classification. Our combined approach outperformed other recognition methods. Current computational techniques for emotion recognition have been successful at associating emotional changes with EEG signals, so emotions can be identified and classified from EEG signals if appropriate stimuli are applied. However, automatic recognition is usually restricted to a small number of emotion classes, mainly due to signal features and noise, EEG constraints and subject-dependent issues. In order to address these issues, in this paper a novel feature-based emotion recognition model is proposed for EEG-based Brain-Computer Interfaces. Unlike other approaches, our method explores a wider set of emotion types and incorporates additional features which are relevant for signal pre-processing and recognition classification tasks, based on a dimensional model of emotions: Valence and Arousal. It aims to improve the accuracy of the emotion classification task by combining mutual-information-based feature selection methods and kernel classifiers. Experiments using our approach for emotion classification, which combines efficient feature selection methods and efficient kernel-based classifiers, on standard EEG datasets show the promise of the approach when compared with state-of-the-art computational methods.
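
A minimal sketch of the selection-plus-kernel-classifier recipe, assuming mutual information as the statistical criterion and an RBF SVM; the EEG feature extraction itself (band power, etc.) is elided and X is random placeholder data.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # placeholder EEG feature vectors
y = rng.integers(0, 2, size=200)          # e.g., high vs. low valence

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=16),   # keep the most informative features
    SVC(kernel="rbf"),                        # kernel classifier
)
clf.fit(X, y)
```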

368 citations


Journal ArticleDOI
TL;DR: A new mobile technology acceptance model (MTAM), which consists of mobile usefulness (MU) and mobile ease of use (MEU), is proposed to determine SCC adoption; the model confirms the role of MU in MTAM, but MEU needs more attention in practice.
Abstract: The study investigates the factors influencing users' IU to adopt SCC. MPC, MU and MPT have a significant impact on IU. MEU, MPSR and MPFR are non-significant with IU. MPC and MPFR have a significant influence on MU. MPC has a significant influence on MEU, while MPFR is non-significant with MEU. Smartphone credit card (SCC) is an emerging payment method using NFC-enabled smartphones. The proximity payment allows consumers to pay for their products and services by waving their smartphones over an NFC reader. While there are advantages to adopting SCC, the adoption rate has not been encouraging. Interestingly, existing research work on past information technology and system models has so far focused primarily on the organizational context and on adoption specifically for work. Furthermore, past antecedents were mainly constructed using electronic commerce literature, which does not reflect the actual mobile environment. In contrast, SCC is mainly adopted voluntarily by mobile users and for personal purposes. Thus, this leads to difficulty in drawing meaningful conclusions. The study addresses these limitations by proposing a new mobile technology acceptance model (MTAM) which consists of mobile usefulness (MU) and mobile ease of use (MEU) to determine SCC adoption. Anticipating the complexity which exists in the mobile environment, additional mobile constructs, namely mobile perceived security risk (MPSR), mobile perceived trust (MPT), mobile perceived compatibility (MPC) and mobile perceived financial resources (MPFR), were incorporated into the parsimonious MTAM. The integrated model was applied to 459 mobile users through a questionnaire approach and tested using partial least squares structural equation modelling combined with an artificial neural network (PLS-SEM-ANN), which provides a new impact and a possible new research methodology paradigm, as it is able to capture both linear and non-linear relationships. While the model confirms the role of MU in MTAM, MEU needs more attention in practice. The results from the extended model showed that only three of the proposed hypotheses were non-significant in this study and thus warrant further investigation. The study contributes to academia by proposing new mobile constructs that extend MTAM to assess the likelihood of mobile users adopting SCC. The study also offers several important managerial implications which can be generalized to mobile studies in the transportation, hotel, banking, and tourism industries.

333 citations


Journal ArticleDOI
TL;DR: A decision support model for supplier selection based on the analytic hierarchy process (AHP) is proposed, using a case from the automotive industry in Pakistan, a developing country, and sensitivity analysis is performed to check the robustness of the supplier selection decision.
Abstract: AHP applied to decision making for automotive industry supplier selection. Use of AHP in supplier selection gives the decision maker confidence in consistency. Sensitivity analysis to check the robustness of the supplier selection decision. The proposed approach divides complex decision making into a simpler hierarchy. Purpose: The purpose of this paper is to propose a decision support model for supplier selection based on the analytic hierarchy process (AHP), using a case from the automotive industry in Pakistan, a developing country, and further to perform sensitivity analysis to check the robustness of the supplier selection decision. Methodology: The model starts by identifying the main criteria (price, quality, delivery and service) using a literature review and ranking the main criteria based on experts' opinions using AHP. The second stage in the adopted methodology is the identification of sub-criteria and ranking them on the basis of the main criteria. Lastly, sensitivity analysis is performed to check the robustness of the decision using the Expert Choice software. Findings: The suppliers are selected and ranked based on the sub-criteria. Sensitivity analysis suggests the effects of changes in the main criteria on the supplier ranking. The use of AHP in the supplier selection gives the decision maker confidence in the consistency and the robustness throughout the process. Practical implications: The AHP methodology adopted in this study provides managers in the automotive industry in Pakistan with insights into the various factors that need to be considered while selecting suppliers for their organizations. The selected approach also aids them in prioritizing the criteria. Managers can utilize the hierarchical structure of the adopted supplier selection methodology suggested in this study to rank suppliers on the basis of various factors/criteria. Originality/value: This study makes three novel contributions to the supplier selection area. First, AHP is applied to the automotive industry, and the use of AHP in supplier selection gives the decision maker confidence in the consistency. Second, sensitivity analysis enables understanding of the effects of changes in the main criteria on the supplier ranking and helps the decision maker check the robustness throughout the process. Last, we find it important to come up with a simple methodology for managers in the automotive industry so that they can select the best suppliers. Moreover, this approach will also help managers divide a complex decision-making problem into a simpler hierarchy.
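
As a worked illustration of the AHP core the paper applies, the snippet below derives priority weights for the four main criteria from a pairwise comparison matrix and computes the consistency ratio. The matrix entries are invented for illustration, not taken from the case study.

```python
import numpy as np

A = np.array([[1,   3,   5,   7],     # price vs. quality / delivery / service
              [1/3, 1,   3,   5],
              [1/5, 1/3, 1,   3],
              [1/7, 1/5, 1/3, 1]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
weights = np.abs(eigvecs[:, k].real)
weights /= weights.sum()                          # priority vector

n = A.shape[0]
CI = (eigvals[k].real - n) / (n - 1)              # consistency index
CR = CI / 0.90                                    # Saaty's random index RI = 0.90 for n = 4
print(weights, CR)                                # CR < 0.10 => acceptable consistency
```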

298 citations


Journal ArticleDOI
TL;DR: The experimental results of this study show that the developed hybrid method is able to select good features for classification tasks to improve run-time performance and accuracy of the classifier.
Abstract: We developed a hybrid method for feature selection in classification tasks. Our hybrid method combines Artificial Bee Colony with Differential Evolution. We performed experiments over fifteen datasets from the UCI Repository. Our method selects good features without reducing the accuracy of classification. By selecting features with our method, we reduced the time required for classification. "Dimensionality" is one of the major problems which affect the quality of the learning process in most machine learning and data mining tasks. Having high-dimensional datasets for training a classification model may lead to "overfitting" of the learned model to the training data. Overfitting reduces generalization of the model and therefore causes poor classification accuracy on new test instances. Another disadvantage of high dimensionality is the high CPU time required for learning and testing the model. Applying feature selection to the dataset before the learning process is essential to improve the performance of the classification task. In this study, a new hybrid method which combines the artificial bee colony optimization technique with the differential evolution algorithm is proposed for feature selection in classification tasks. The developed hybrid method is evaluated using fifteen datasets from the UCI Repository which are commonly used in classification problems. To make a complete evaluation, the proposed hybrid feature selection method is compared with artificial bee colony optimization and differential evolution based feature selection methods, as well as with the three most popular feature selection techniques, namely information gain, chi-square, and correlation feature selection. In addition to these, the performance of the proposed method is also compared with studies in the literature which use the same datasets. The experimental results of this study show that our developed hybrid method is able to select good features for classification tasks to improve the run-time performance and accuracy of the classifier. The proposed hybrid method may also be applied to other search and optimization problems, as its performance for feature selection is better than pure artificial bee colony optimization and differential evolution.
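
Any such wrapper hybrid needs a fitness function that scores a binary feature mask by classifier performance; the sketch below shows that evaluation step on a UCI dataset, while the ABC/DE update rules themselves are elided as details beyond the abstract.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)                 # a UCI dataset, as in the paper

def fitness(mask):
    """Mean CV accuracy of a k-NN classifier on the selected features."""
    if not mask.any():
        return 0.0                                # empty subsets score worst
    return cross_val_score(KNeighborsClassifier(), X[:, mask], y, cv=5).mean()

rng = np.random.default_rng(0)
population = rng.random((20, X.shape[1])) > 0.5   # random initial feature masks
scores = [fitness(m) for m in population]         # ABC/DE would evolve these masks
```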

Journal ArticleDOI
TL;DR: A literature review of 190 application papers, published between 2004 and 2016, by classifying them on the basis of the area of application, the identified theme, the year of publication, and so forth, shows that FAHP is used primarily in the Manufacturing, Industry and Government sectors.
Abstract: A state-of-the-art survey of FAHP applications is carried out: 190 papers are reviewed. Papers are classified based on their application area, theme, year, country, etc. The review is summarized in tabular formats/charts to help readers extract quick info. Results and findings are made available through an online (free) testbed. The testbed makes fuzzy pairwise comparison matrices (from all papers) available. As a practical, popular methodology for dealing with fuzziness and uncertainty in Multiple Criteria Decision-Making (MCDM), Fuzzy AHP (FAHP) has been applied to a wide range of applications. As there is, at the time of writing, no state-of-the-art survey of FAHP, we carry out a literature review of 190 application papers (i.e., applied research papers), published between 2004 and 2016, by classifying them on the basis of the area of application, the identified theme, the year of publication, and so forth. The identified themes and application areas have been chosen based upon the latest state-of-the-art survey of AHP, conducted by Vaidya, O., & Kumar, S. (2006). Analytic hierarchy process: An overview of applications. European Journal of Operational Research, 169(1), 1-29. To help readers extract quick and meaningful information, the reviewed papers are summarized in various tabular formats and charts. Unlike previous literature surveys, results and findings are made available through an online (and free) testbed, which can serve as a ready reference for those who wish to apply, modify or extend FAHP in various application areas. This online testbed also makes available one or more fuzzy pairwise comparison matrices (FPCMs) from all the reviewed papers (255 matrices in total). In terms of results and findings, this survey shows that: (i) FAHP is used primarily in the Manufacturing, Industry and Government sectors; (ii) Asia is the torchbearer in this field, where FAHP is mostly applied in the theme areas of Selection and Evaluation; (iii) a significant number of research papers (43% of the reviewed literature) combine FAHP with other tools, particularly with TOPSIS, QFD and ANP (AHP's variant); (iv) Chang's extent analysis method, which is used for FPCMs' weight derivation in FAHP, is still the most popular method in spite of a number of criticisms in recent years (considered in 57% of the reviewed literature).

Journal ArticleDOI
TL;DR: Experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed ensemble method can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting.
Abstract: Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a paradigm of a multiobjective, optimization-based weighted voting scheme to assign appropriate weight values to classifiers and each output class based on the predictive performance of classification algorithms, all to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%).
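
The core voting rule can be illustrated compactly: each base classifier carries a per-class weight, and the weighted class scores are summed. The fixed weights below are placeholders for the values the paper tunes with multiobjective differential evolution.

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """predictions: (n_classifiers,) predicted labels for one instance.
    weights: (n_classifiers, n_classes) per-classifier, per-class weights."""
    score = np.zeros(n_classes)
    for c, label in enumerate(predictions):
        score[label] += weights[c, label]         # each vote counts by its weight
    return int(np.argmax(score))

preds = np.array([0, 1, 1, 0, 1])                 # five base learners' votes
w = np.array([[.9, .4], [.6, .8], [.5, .7], [.8, .3], [.4, .9]])
print(weighted_vote(preds, w, n_classes=2))       # -> 1 (weighted majority)
```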

Journal ArticleDOI
TL;DR: An interactive visualization framework that combines recommendation with visualization techniques to support human-recommender interaction is presented and existing interactive recommender systems are analyzed along the dimensions of the framework.
Abstract: We identify shortcomings of current recommender systems. We present an interactive recommender framework to tackle the shortcomings. We analyze existing interactive recommenders along the dimensions of our framework. Based on the analysis, we identify future research challenges and opportunities. Recommender systems have been researched extensively over the past decades. Whereas several algorithms have been developed and deployed in various application domains, recent research efforts are increasingly oriented towards the user experience of recommender systems. This research goes beyond the accuracy of recommendation algorithms and focuses on various human factors that affect acceptance of recommendations, such as user satisfaction, trust, transparency and sense of control. In this paper, we present an interactive visualization framework that combines recommendation with visualization techniques to support human-recommender interaction. Then, we analyze existing interactive recommender systems, including our own work, along the dimensions of our framework. Based on our survey results, we present future research challenges and opportunities.

Journal ArticleDOI
TL;DR: A novel ensemble model for bankruptcy prediction is proposed that utilizes Extreme Gradient Boosting to learn an ensemble of decision trees, together with a new approach for generating synthetic features to improve prediction.
Abstract: We propose a novel ensemble model for bankruptcy prediction. We use Extreme Gradient Boosting as an ensemble of decision trees. We propose a new approach for generating synthetic features to improve prediction. The presented method is evaluated on real-life data of Polish companies. Bankruptcy prediction has been a subject of interest for almost a century and it still ranks high among the hottest topics in economics. The aim of predicting financial distress is to develop a predictive model that combines various econometric measures and allows one to foresee the financial condition of a firm. In this domain, various methods were proposed that were based on statistical hypothesis testing, statistical modeling (e.g., generalized linear models), and recently artificial intelligence (e.g., neural networks, Support Vector Machines, decision trees). In this paper, we propose a novel approach to bankruptcy prediction that utilizes Extreme Gradient Boosting for learning an ensemble of decision trees. Additionally, in order to reflect higher-order statistics in data and impose prior knowledge about data representation, we introduce a new concept that we refer to as synthetic features. A synthetic feature is a combination of econometric measures using arithmetic operations (addition, subtraction, multiplication, division). Each synthetic feature can be seen as a single regression model that is developed in an evolutionary manner. We evaluate our solution using collected data about Polish companies in five tasks corresponding to bankruptcy prediction in the 1st, 2nd, 3rd, 4th, and 5th year. We compare our approach with the reference methods.
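
A small sketch of the synthetic-feature idea under stated assumptions: arithmetic combinations (here ratios and products of column pairs) are appended before fitting an Extreme Gradient Boosting classifier. The evolutionary search over combinations is elided, the data are random placeholders, and the xgboost package is assumed installed.

```python
import itertools
import numpy as np
from xgboost import XGBClassifier

def add_synthetic(X, pairs):
    """Append ratio and product features for the given column pairs."""
    cols = [X]
    for i, j in pairs:
        cols.append((X[:, i] / (X[:, j] + 1e-9)).reshape(-1, 1))  # division
        cols.append((X[:, i] * X[:, j]).reshape(-1, 1))           # multiplication
    return np.hstack(cols)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 6)), rng.integers(0, 2, 500)   # placeholder data
X_aug = add_synthetic(X, itertools.combinations(range(6), 2))
XGBClassifier(n_estimators=200).fit(X_aug, y)
```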

Journal ArticleDOI
TL;DR: This paper enriches the theory and methodology of the cloud computing vendor selection problem and MAGDM analysis, and presents a new subjective/objective integrated MAGDM approach for solving decision problems.
Abstract: Propose criteria of cost together with technology, organization and environment. The approach takes both quantitative and qualitative attributes into account. The decision-making process considers both the weights of attributes and experts. Integrate objective and subjective weighting methods for attributes and experts. Cloud computing technology has become increasingly popular and can deliver a host of benefits. However, there are various kinds of cloud providers in the market, and firms need scientific decision tools to judge which cloud computing vendor should be chosen. Studies on how a firm should select an appropriate cloud vendor have just started. However, existing studies are mainly from the technology and cost perspective, and neglect other influence factors, such as competitive pressure and managerial skills, etc. Hence, this paper proposes a multi-attribute group decision-making (MAGDM) based scientific decision tool to help firms judge which cloud computing vendor is more suitable for their needs by considering more comprehensive influence factors. It is argued that objective attributes, i.e., cost, as well as subjective attributes, such as TOE factors (Technology, Organization, and Environment), should be considered in decision making for cloud computing services, and a new subjective/objective integrated MAGDM approach for solving decision problems is presented. The proposed approach integrates statistical variance (SV), improved techniques for order preference by similarity to an ideal solution (TOPSIS), simple additive weighting (SAW), and Delphi-AHP to determine the integrated weights of the attributes and decision-makers (DMs). The method considers both the objective weights of the attributes and DMs, as well as the subjective preferences of the DMs and their identity differences, thereby making the decision results more accurate and theoretically reasonable. A numerical example is given to illustrate the practicability and usefulness of the approach and its suitability as a decision-making tool for a firm making use of cloud computing services. This paper enriches the theory and methodology of the cloud computing vendor selection problem and MAGDM analysis.
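
Of the integrated SV/TOPSIS/SAW/Delphi-AHP machinery, the TOPSIS step is the easiest to show compactly; the decision matrix, weights and benefit/cost flags below are illustrative only.

```python
import numpy as np

def topsis(D, w, benefit):
    """D: (alternatives x attributes), w: weights, benefit: True for max-attributes."""
    R = D / np.linalg.norm(D, axis=0)              # vector normalization
    V = R * w                                      # weighted normalized matrix
    ideal = np.where(benefit, V.max(0), V.min(0))  # positive ideal solution
    worst = np.where(benefit, V.min(0), V.max(0))  # negative ideal solution
    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - worst, axis=1)
    return d_neg / (d_pos + d_neg)                 # closeness: higher is better

D = np.array([[0.7, 200, 3], [0.9, 260, 4], [0.6, 180, 5]])   # three vendors
scores = topsis(D, w=np.array([0.5, 0.3, 0.2]),
                benefit=np.array([True, False, True]))        # cost attribute minimized
print(scores.argsort()[::-1])                                 # vendor ranking
```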

Journal ArticleDOI
TL;DR: The general picture is depicted, providing a classification of methods related to the criteria interaction phenomenon, and the Decision-Making Trial and Evaluation Laboratory (DEMATEL) and Analytical Network Process (ANP) hybridizations are discussed/reviewed for the first time in the literature.
Abstract: An analysis of DEMATEL approaches for criteria interaction handling within ANP. A classification of the methods related to the criteria interaction phenomenon in MADM. Detailed explanations and numerical examples are given. Bibliometric analysis is provided. The majority of Multiple-Attribute Decision Making (MADM) methods assume that the criteria are independent of each other, which is not a realistic assumption in many real-world problems. Several forms of interactions among criteria might occur in real-life situations, so that more sophisticated/intelligent techniques are required to deal with the particular needs of the problem under consideration. Unfortunately, the criteria interaction concept has received very little attention in the literature. It is still a very important and critical research subject for intelligent decision making within MADM. The present paper aims to take a step forward to fill this gap by depicting the general picture, which provides a classification of methods related to the criteria interaction phenomenon, and by discussing/reviewing the Decision-Making Trial and Evaluation Laboratory (DEMATEL) and Analytical Network Process (ANP) hybridizations for the first time in the literature. DEMATEL and ANP hybridizations have grabbed remarkable attention from the decision analysis community in recent years and appear to be among the most promising approaches to handle criteria interactions in a MADM setting.
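
The DEMATEL core that these hybridizations build on fits in a few lines: normalize the direct-influence matrix to N, compute the total-relation matrix T = N(I - N)^-1, then read prominence (D+R) and net cause/effect (D-R). The 3-criteria influence scores are made up for illustration.

```python
import numpy as np

A = np.array([[0, 3, 2],        # expert-rated direct influence (0-4 scale)
              [1, 0, 3],
              [2, 1, 0]], dtype=float)

N = A / max(A.sum(1).max(), A.sum(0).max())   # normalized direct-influence matrix
T = N @ np.linalg.inv(np.eye(3) - N)          # total-relation matrix
D, R = T.sum(1), T.sum(0)                     # row and column sums
print("prominence:", D + R)                   # how involved each criterion is
print("relation:  ", D - R)                   # net cause (+) or effect (-)
```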

Journal ArticleDOI
TL;DR: In this article, the authors propose hybrid Artificial Neural Network (ANN) models for stock market prediction, which use Harmony Search (HS) and the Genetic Algorithm (GA) for selecting the most relevant technical indicators, such as the simple moving average of the close price, momentum of the close price, etc.
Abstract: Integrating metaheuristics and ANN for improved stock price prediction. Both the topology of the ANN and the number of inputs are optimized. The number of input variables is reduced to almost half. The HS-ANN model has better generalization ability than the GA-ANN model. The proposed methodologies outperformed both in statistical and financial terms. The stock market price is one of the most important indicators of a country's economic growth. That is why determining the exact movements of the stock market price receives considerable attention. However, the complex and uncertain behavior of the stock market makes exact determination impossible, and hence strong forecasting models are deeply desirable for investors' financial decision-making processes. This study aims at evaluating the effectiveness of using technical indicators, such as the simple moving average of the close price, momentum of the close price, etc., in the Turkish stock market. To capture the relationship between the technical indicators and the stock market for the period under investigation, hybrid Artificial Neural Network (ANN) models, which exploit the capabilities of Harmony Search (HS) and the Genetic Algorithm (GA), are used for selecting the most relevant technical indicators. In addition, this study simultaneously searches for the most appropriate number of hidden neurons in the hidden layer, and in this respect the proposed models mitigate the well-known problem of overfitting/underfitting of ANNs. The comparison for each proposed model is done from four viewpoints: loss functions, return-from-investment analysis, buy-and-hold analysis, and graphical analysis. According to the statistical and financial performance of these models, the HS-based ANN model is found to be the dominant model for stock market forecasting.

Journal ArticleDOI
TL;DR: This paper proposes to create a new set of features based on analyzing the periodic behavior of the time of a transaction using the von Mises distribution, and examines how the different sets of features have an impact on the results.
Abstract: A credit card fraud detection evaluation measure. Each example is assumed to have a different financial cost. A transaction aggregation strategy for predicting fraud. Periodic features using the von Mises distribution. Code is open source and available at albahnsen.com/CostSensitiveClassification. Every year, billions of Euros are lost worldwide due to credit card fraud, forcing financial institutions to continuously improve their fraud detection systems. In recent years, several studies have proposed the use of machine learning and data mining techniques to address this problem. However, most studies used some sort of misclassification measure to evaluate the different solutions, and did not take into account the actual financial costs associated with the fraud detection process. Moreover, when constructing a credit card fraud detection model, it is very important to extract the right features from the transactional data. This is usually done by aggregating the transactions in order to observe the spending behavioral patterns of the customers. In this paper we expand the transaction aggregation strategy and propose to create a new set of features based on analyzing the periodic behavior of the time of a transaction using the von Mises distribution. Then, using a real credit card fraud dataset provided by a large European card processing company, we compare state-of-the-art credit card fraud detection models and evaluate how the different sets of features impact the results. By including the proposed periodic features in the methods, the results show an average increase in savings of 13%.
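
A sketch of the periodic-feature idea under stated assumptions: fit a von Mises distribution to a customer's historical transaction times (mapped to angles on a 24-hour clock) and use the log-density of a new transaction's time as a risk feature. Parameter choices and the feature wiring are illustrative; the authors' actual code is at the link above.

```python
import numpy as np
from scipy.stats import vonmises

def time_to_angle(hours):
    """Map hours on a 24h clock to angles in [-pi, pi)."""
    return 2 * np.pi * np.asarray(hours) / 24.0 - np.pi

history = [13.2, 14.0, 12.5, 13.7, 15.1]        # customer's usual purchase hours
kappa, loc, scale = vonmises.fit(time_to_angle(history), fscale=1)

new_tx_hour = 3.4                               # a 3:24 AM transaction
feature = vonmises.logpdf(time_to_angle(new_tx_hour), kappa, loc=loc)
print(feature)                                  # low density -> unusual time of day
```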

Journal ArticleDOI
TL;DR: Experimental results show that TF-IGM outperforms the famous TF-IDF and the state-of-the-art supervised term weighting schemes and some new findings different from previous studies are obtained and analyzed in depth in the paper.
Abstract: A new supervised term weighting scheme called TF-IGM is proposed. It adopts a new statistical model to measure a term's class distinguishing power. It makes full use of the fine-grained term distribution across different classes. It is adaptive to different text datasets by providing options or parameters. It outperforms TF-IDF and state-of-the-art supervised term weighting schemes. Massive textual data management and mining usually rely on automatic text classification technology. Term weighting is a basic problem in text classification and directly affects the classification accuracy. Since the traditional TF-IDF (term frequency & inverse document frequency) is not fully effective for text classification, various alternatives have been proposed by researchers. In this paper we make comparative studies of different term weighting schemes and propose a new term weighting scheme, TF-IGM (term frequency & inverse gravity moment), as well as its variants. TF-IGM incorporates a new statistical model to precisely measure the class distinguishing power of a term. Particularly, it makes full use of the fine-grained term distribution across different classes of text. The effectiveness of TF-IGM is validated by extensive experiments of text classification using SVM (support vector machine) and kNN (k nearest neighbors) classifiers on three commonly used corpora. The experimental results show that TF-IGM outperforms the famous TF-IDF and the state-of-the-art supervised term weighting schemes. In addition, some new findings different from previous studies are obtained and analyzed in depth in the paper.
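
A hedged sketch of the weighting formula as commonly stated for TF-IGM: the inverse gravity moment is the largest class-conditional frequency divided by the rank-weighted sum of all of them, scaled by an adjustable coefficient. Treat the exact form and the default lambda below as assumptions to verify against the paper.

```python
import numpy as np

def tf_igm(tf, class_freqs, lambda_=7.0):
    """tf: term frequency in the document.
    class_freqs: the term's frequency in each class (any order).
    lambda_: tunable coefficient (assumed default)."""
    f = np.sort(np.asarray(class_freqs, float))[::-1]     # sort descending
    ranks = np.arange(1, len(f) + 1)
    igm = f[0] / np.sum(f * ranks)                        # inverse gravity moment
    return tf * (1 + lambda_ * igm)

# A term concentrated in one class gets a much higher weight than a uniform one:
print(tf_igm(3, [90, 2, 1]), tf_igm(3, [30, 32, 31]))
```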

Journal ArticleDOI
TL;DR: A new oversampling method called Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) for imbalanced binary dataset classification that aims to identify hard-to-learn instances by considering minority instances from each sub-cluster that are closer to the borderline.
Abstract: A new oversampling method for imbalanced dataset classification is presented. It clusters the minority class and identifies borderline minority instances. Considering the majority class during minority class clustering improves oversampling. Cluster size after oversampling should be dependent on its misclassification error. Generated synthetic instances improved subsequent classification. In many applications, the dataset for classification may be highly imbalanced, where most of the instances in the training set may belong to one of the classes (majority class), while only a few instances are from the other class (minority class). Conventional classifiers will strongly favor the majority class and ignore the minority instances. In this paper, we present a new oversampling method called Adaptive Semi-Unsupervised Weighted Oversampling (A-SUWO) for imbalanced binary dataset classification. The proposed method clusters the minority instances using a semi-unsupervised hierarchical clustering approach and adaptively determines the size to oversample each sub-cluster using its classification complexity and cross validation. Then, the minority instances are oversampled depending on their Euclidean distance to the majority class. A-SUWO aims to identify hard-to-learn instances by considering minority instances from each sub-cluster that are closer to the borderline. It also avoids generating synthetic minority instances that overlap with the majority class by considering the majority class in the clustering and oversampling stages. Results demonstrate that the proposed method achieves significantly better results in most datasets compared with other sampling methods.

Journal ArticleDOI
Emad Nabil
TL;DR: An enhanced version of the Flower Pollination Algorithm (FPA) is introduced; compared with five well-known optimization algorithms, the proposed algorithm is able to find more accurate solutions than the standard FPA and the other four techniques.
Abstract: An enhanced version of the Flower Pollination Algorithm (FPA) is proposed. Testing is performed using 23 optimization benchmark problems. The proposed algorithm is compared with five well-known optimization algorithms. Experimental results show the superiority of the proposed algorithm. Expert and intelligent systems try to simulate intelligent human experts in solving complex real-world problems. The domain of problems varies from engineering and industry to medicine and education. In most situations, the system is required to take decisions based on multiple inputs, but the search space is usually so huge that it is very hard to use traditional algorithms to take a decision; at this point, metaheuristic algorithms can be used as an alternative tool to find near-optimal solutions. Thus, inventing new metaheuristic techniques and enhancing the current algorithms is necessary. In this paper, we introduce an enhanced variant of the Flower Pollination Algorithm (FPA). We hybridized the standard FPA with the Clonal Selection Algorithm (CSA) and tested the new algorithm by applying it to 23 optimization benchmark problems. The proposed algorithm is compared with five famous optimization algorithms, namely Simulated Annealing, the Genetic Algorithm, the Flower Pollination Algorithm, the Bat Algorithm, and the Firefly Algorithm. The results show that the proposed algorithm is able to find more accurate solutions than the standard FPA and the other four techniques. The superiority of the proposed algorithm makes it a candidate for inclusion in intelligent and expert systems.
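
For context, the global-pollination step of the standard FPA that the enhanced variant builds on moves each flower toward the current best solution along a Lévy flight; the clonal-selection hybridization itself is not reproduced here, and all parameter values are the usual textbook defaults rather than the paper's.

```python
import numpy as np
from math import gamma, sin, pi

def levy(size, beta=1.5, rng=None):
    """Mantegna's algorithm for Levy-stable step lengths."""
    rng = rng or np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0, sigma, size)
    v = rng.normal(0, 1, size)
    return u / np.abs(v) ** (1 / beta)

def global_pollination(flower, best, rng=None):
    """Draw a flower toward the global best along a Levy flight."""
    step = levy(flower.shape, rng=rng)
    return flower + step * (best - flower)

x_new = global_pollination(np.zeros(5), np.ones(5))
```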

Journal ArticleDOI
TL;DR: The results showed that the pressure management and control strategy was the most prevalent one, followed by employing advanced techniques and the establishment of district metered areas; the low sensitivity of the strongest and weakest options could be attributed to the strong consensus in strengthening the best option and neglecting the worst option.
Abstract: A multi-criteria decision analysis method for water loss management is proposed. The method integrates AHP and TOPSIS methods under a fuzzy environment. It is applied to a real water distribution system in a developing country. The prevalent strategies were highly connected to the local conditions. Facing water scarcity, water utilities can no longer tolerate inefficiencies in their water systems. To guarantee sustainable water management, one central task is reducing water losses from the supply systems. There are numerous challenges in managing water losses, manifested in a variety of options, their complexities, multiple evaluation criteria, inherent uncertainties and the conflicting objectives and interests of different stakeholders. This study demonstrates the effectiveness of multi-criteria decision analysis (MCDA) approaches for decision support on this complex topic. The study covers identifying the key options among a set of options that have been proposed within a framework of strategies to reduce water losses in water distribution systems of developing countries. The proposed methodology was initiated by developing a hierarchical structure of the decision problem that consists of four levels: overall objective, main criteria, evaluation criteria and options. Different stakeholders were engaged in the process of structuring and evaluating the decision problem. An integrated methodology that combines fuzzy set theory with the Analytic Hierarchy Process (AHP) and the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) was then employed. This methodology has the potential to transform qualitative data into equivalent quantitative measures. Fuzzy AHP was used to create weights for the main and evaluation criteria, while Fuzzy TOPSIS was used to aid the ranking of options in terms of their potential to meet the overall objective, based on the evaluations and preferences of decision makers. The results showed that the pressure management and control strategy was the most prevalent one, followed by employing advanced techniques and the establishment of district metered areas. Their dominance was highly connected to the local and boundary conditions of the case study. The sensitivity analysis results showed that the strongest and weakest options were less sensitive to changes in the weights of the evaluation criteria, which could be attributed to the strong consensus in strengthening the best option and neglecting the worst option. This study emphasized the successful application of MCDA in dealing with complicated issues in the context of water loss management. It is anticipated that the integration of this developed framework into the planning policies of water utilities in developing countries can help in exercising better control over water losses.

Journal ArticleDOI
TL;DR: Experimental results on well-known benchmark datasets with various classifiers indicate that IGFSS improves the performance of classification in terms of two widely-known metrics, namely Micro-F1 and Macro-F1.
Abstract: An improved global feature selection scheme is proposed for text classification. It is an ensemble method combining the power of two filter-based methods. The new method combines a global and a one-sided local feature selection method. By incorporating these methods, the feature set represents classes almost equally. This method outperforms the individual performances of feature selection methods. Feature selection is known as a good solution to the high dimensionality of the feature space, and the mostly preferred feature selection methods for text classification are filter-based ones. In a common filter-based feature selection scheme, unique scores are assigned to features depending on their discriminative power, and these features are sorted in descending order according to the scores. Then, the last step is to add the top-N features to the feature set, where N is generally an empirically determined number. In this paper, an improved global feature selection scheme (IGFSS), where the last step in a common feature selection scheme is modified in order to obtain a more representative feature set, is proposed. Although a feature set constructed by a common feature selection scheme successfully represents some of the classes, a number of classes may not even be represented. Consequently, IGFSS aims to improve the classification performance of global feature selection methods by creating a feature set representing all classes almost equally. For this purpose, a local feature selection method is used in IGFSS to label features according to their discriminative power on classes, and these labels are used while producing the feature sets. Experimental results on well-known benchmark datasets with various classifiers indicate that IGFSS improves the performance of classification in terms of two widely-known metrics, namely Micro-F1 and Macro-F1.

Journal ArticleDOI
TL;DR: This paper proposes hybrid feature selection approaches based on the Genetic Algorithm that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously.
Abstract: An enhanced genetic algorithm (EGA) is proposed to reduce text dimensionality. The proposed EGA outperformed the traditional genetic algorithm. The EGA is incorporated with six filter feature selection methods to create hybrid feature selection approaches. The proposed hybrid approaches outperformed the single filtering methods. This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). The approach uses a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose the EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and feature importance. Thus, the crossover and mutation operations are performed based on useful information instead of using probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order based on their importance, and dimension reduction is carried out. The EGA operations are applied to the most important features that have the highest ranks. The effectiveness of the proposed approach is evaluated using naive Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of the EGA over the GA, with the EGA achieving better results in terms of dimensionality reduction, time and categorization performance. Furthermore, the six proposed hybrid FS approaches, each consisting of a filter method and the EGA, are applied to various feature subsets. The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations.

Journal ArticleDOI
TL;DR: A new rule-based method to detect phishing attacks in internet banking is presented; the hidden knowledge is extracted from the proposed SVM model by adopting a related method, and the extracted rules are embedded into a browser extension named PhishDetector.
Abstract: We propose two feature sets to determine webpage identity. Our proposed features do not have any dependency on third-party services. We propose a rule-based method by extracting the hidden knowledge from our model. We provide an extension called PhishDetector to detect phishing attacks. Experiments show that PhishDetector detects zero-day phishing with high accuracy. In this paper, we present a new rule-based method to detect phishing attacks in internet banking. Our rule-based method uses two novel feature sets, which have been proposed to determine the webpage identity. Our proposed feature sets include four features to evaluate the page resource identity, and four features to identify the access protocol of page resource elements. We used approximate string matching algorithms to determine the relationship between the content and the URL of a page in our first proposed feature set. Our proposed features are independent of third-party services such as search engine results and/or web browser history. We employed the support vector machine (SVM) algorithm to classify webpages. Our experiments indicate that the proposed model can detect phishing pages in internet banking with a 99.14% true positive rate and only a 0.86% false negative rate. The output of the sensitivity analysis demonstrates the significant impact of our proposed features over traditional features. We extracted the hidden knowledge from the proposed SVM model by adopting a related method. We embedded the extracted rules into a browser extension named PhishDetector to make our proposed method more functional and easy to use. Evaluation of the implemented browser extension indicates that it can detect phishing attacks in internet banking with high accuracy and reliability. PhishDetector can detect zero-day phishing attacks too.
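
One ingredient the abstract names, approximate string matching between page-identity keywords and the URL, can be illustrated as below; difflib stands in for the matching algorithm, and the threshold and feature wiring are assumptions, not the paper's actual rules.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse

def identity_match(url, page_keywords, threshold=0.6):
    """Return 1 if some content keyword approximately matches the domain, else 0."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    domain = host.split(".")[0]                     # e.g., 'examp1e-bank'
    best = max(SequenceMatcher(None, domain, kw.lower()).ratio()
               for kw in page_keywords)
    return int(best >= threshold)                   # mismatch hints at phishing

print(identity_match("https://examp1e-bank.com/login", ["example bank", "login"]))
```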

Journal ArticleDOI
TL;DR: This work considers the case where the camera is mounted on an unmanned aerial vehicle (UAV); the main aim is to obtain a path that reduces battery consumption by minimizing the number of turns.
Abstract: The focus is on the coverage path planning problem with a UAV for 3D terrain reconstruction. The aim is to obtain a path that reduces battery consumption by minimizing turns. Our algorithm deals with both convex and non-convex regions. The algorithm can perform the coverage when complex regions are considered. It can achieve better solutions than a previous result (using fewer turns). Three-dimensional terrain reconstruction from 2D aerial images is a problem of utmost importance due to its wide range of applications: it is relevant in the context of intelligent systems for disaster management (for example, to analyze a flooded area), soil analysis, earthquake crises, civil engineering, urban planning, surveillance, and defense research. It is a two-level problem, the first level being the acquisition of the aerial images and the second the 3D reconstruction. We focus here on the first problem, known as coverage path planning, and we consider the case where the camera is mounted on an unmanned aerial vehicle (UAV). In contrast with the case where ground vehicles are used, coverage path planning for a UAV is a less studied problem. As the areas to cover become complex, there is a clear need for algorithms that provide good enough solutions in affordable times while taking into account certain specificities of the problem at hand. Our algorithm can deal with both convex and non-convex areas, and its main aim is to obtain a path that reduces battery consumption by minimizing the number of turns. We comment on line-sweep calculation and propose improvements for the path generation and polygon decomposition problems, such as coverage alternatives and the interrupted-path concept. Illustrative examples show the potential of our algorithm in two senses: the ability to perform the coverage when complex regions are considered, and the achievement of better solutions than a published result (in terms of the number of turns used).
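To illustrate the turn-minimization intuition, below is a minimal back-and-forth (boustrophedon) sweep generator for a convex polygon: sweeping along the direction in which the region is widest yields fewer sweep lines and hence fewer 180-degree turns. This is a sketch of the general idea only, not the paper's algorithm, and the footprint-spacing parameter is an assumption.

```python
def sweep_segments(poly, spacing):
    """Horizontal boustrophedon sweep over a convex polygon.

    poly    : list of (x, y) vertices in order.
    spacing : camera footprint width between adjacent sweep lines.
    Returns a zigzag list of waypoints; fewer lines -> fewer turns.
    """
    ys = [p[1] for p in poly]
    y, y_max = min(ys) + spacing / 2, max(ys)
    path, flip = [], False
    while y < y_max:
        xs = []
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
            if (y1 <= y < y2) or (y2 <= y < y1):      # edge crosses sweep line
                xs.append(x1 + (y - y1) * (x2 - x1) / (y2 - y1))
        if len(xs) >= 2:
            a, b = min(xs), max(xs)
            path += [(b, y), (a, y)] if flip else [(a, y), (b, y)]
            flip = not flip
        y += spacing
    return path

# Example: a 10 x 4 rectangle; sweeping across its short dimension needs only
# 4 sweep lines, whereas sweeping across the long one would need 10.
print(sweep_segments([(0, 0), (10, 0), (10, 4), (0, 4)], spacing=1.0))
```

Non-convex regions, as in the paper, would first be decomposed into convex cells, with each cell swept in its own best direction.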

Journal ArticleDOI
TL;DR: Three Grey System theory models for short-term traffic speed prediction are studied; they demonstrated better accuracy than the other nonlinear models tested, and the Verhulst model with Fourier error correction achieved the best accuracy.
Abstract: Three Grey System theory models for short-term traffic speed prediction are studied. The grey models demonstrated better accuracy than the other nonlinear models tested. The Verhulst model with Fourier error correction demonstrates the best accuracy. The simpler derivations allow the algorithms to be deployed on portable devices. The well-defined mathematics of the models allows adaptation to multidimensional data. Intelligent transportation systems applications require accurate and robust prediction of traffic parameters such as speed, travel time, and flow. However, traffic exhibits sudden shifts due to various factors such as weather, accidents, driving characteristics, and demand surges, which adversely affect the performance of prediction models. This paper studies possible applications and accuracy levels of three Grey System theory models for short-term traffic speed and travel time prediction: the first-order single-variable Grey model (GM(1,1)), GM(1,1) with Fourier error corrections (EFGM), and the Grey Verhulst model with Fourier error corrections (EFGVM). The Grey models are tested on datasets from California and Virginia and compared with nonlinear time series models. Grey models are found to be simple and adaptive, to deal better with abrupt parameter changes, and not to require many data points for prediction updates. Based on the sample data used, Grey models consistently demonstrate lower prediction errors over all the time series, improving accuracy on average by about 50% in terms of Root Mean Squared Error and Mean Absolute Percent Error.
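For reference, the classical GM(1,1) model fits the whitened equation dx^(1)/dt + a*x^(1) = b to the accumulated series and forecasts by de-accumulating; the sketch below is a textbook NumPy implementation, not code from the paper.

```python
import numpy as np

def gm11_forecast(x0, horizon=1):
    """Textbook GM(1,1) grey forecast of a short positive series.

    x0      : 1-D array of observations (all positive).
    horizon : number of steps to forecast beyond the series.
    """
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    x1 = np.cumsum(x0)                                # accumulated series
    z1 = 0.5 * (x1[1:] + x1[:-1])                     # mean-generated background
    B = np.column_stack([-z1, np.ones(n - 1)])
    a, b = np.linalg.lstsq(B, x0[1:], rcond=None)[0]  # developing/grey coefficients

    k = np.arange(n + horizon)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    x0_hat = np.diff(x1_hat, prepend=x1_hat[0])       # de-accumulate
    x0_hat[0] = x0[0]
    return x0_hat[-horizon:]

# Example: one-step-ahead speed forecast from a handful of observations.
print(gm11_forecast([62.0, 61.5, 60.8, 59.9, 58.7], horizon=1))
```

The EFGM and EFGVM variants studied in the paper additionally fit a Fourier series to the residuals of the base grey forecast and subtract it out, which is where the error correction comes from.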

Journal ArticleDOI
TL;DR: A new time series forecasting model which integrates VMD and a general regression neural network (GRNN) is presented; experiments demonstrate the superiority of the VMD-GRNN method over three competing prediction approaches.
Abstract: The empirical mode decomposition (EMD) has been successfully applied to adaptively decompose economic and financial time series for forecasting purposes. Recently, the variational mode decomposition (VMD) has been proposed as an alternative to EMD that easily separates tones of similar frequencies in data where EMD fails. The purpose of this study is to present a new time series forecasting model that integrates VMD and a general regression neural network (GRNN). The performance of the proposed model is evaluated by comparing the forecasting results of VMD-GRNN with three competing prediction models: the EMD-GRNN model, feedforward neural networks (FFNN), and the autoregressive moving average (ARMA) process. West Texas Intermediate (WTI) crude oil prices, the Canadian/US exchange rate (CANUS), US industrial production (IP), and the Chicago Board Options Exchange NASDAQ 100 Volatility Index (VIX) time series are used for the experiments. Based on the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE), the forecasting results demonstrate the superiority of the VMD-based method over the three competing prediction approaches. The results suggest that VMD is an effective and promising technique for the analysis and prediction of economic and financial time series.
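The GRNN itself is essentially Nadaraya-Watson kernel regression over stored training pairs, so the hybrid scheme reduces to "decompose, forecast each mode, sum". A minimal sketch follows; the VMD step is left abstract (a VMD routine from a third-party library would produce the modes; stand-in arrays are used here instead), and the lag count and bandwidth are illustrative assumptions.

```python
import numpy as np

def grnn_predict(X_train, y_train, x_query, sigma=0.5):
    """GRNN output = kernel-weighted average of training targets."""
    d2 = np.sum((X_train - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return np.dot(w, y_train) / (np.sum(w) + 1e-12)

def forecast_mode(mode, lags=4, sigma=0.5):
    """One-step-ahead forecast of a single decomposed mode from lagged values."""
    X = np.array([mode[i:i + lags] for i in range(len(mode) - lags)])
    y = mode[lags:]
    return grnn_predict(X, y, mode[-lags:], sigma)

# Hybrid scheme: decompose the series into modes (e.g., with a VMD routine),
# forecast each mode separately with a GRNN, and sum the mode forecasts.
modes = [np.sin(np.linspace(0, 8, 200)),            # stand-in oscillatory mode
         0.1 * np.linspace(0, 8, 200)]              # stand-in trend mode
print(sum(forecast_mode(np.asarray(m)) for m in modes))
```

Forecasting each mode separately is attractive because the modes are individually smoother and narrower-band than the raw series, which is the rationale the abstract gives for decomposition-based forecasting.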

Journal ArticleDOI
TL;DR: The present study focuses on developing a robust automated seizure-classification system that holds up under low levels of supervised training; it yields ceiling-level classification performance for all dataset combinations in less than 0.028 s.
Abstract: A robust method is proposed for the efficient detection of seizures in EEG. The dual-tree complex wavelet transform is used for feature extraction. A general regression neural network is employed to classify the extracted features. The proposed technique gives ceiling-level performance. The model can be used for fast and accurate diagnosis of epilepsy. Identifying seizure patterns in complex electroencephalography (EEG) through visual inspection is often challenging, time-consuming, and prone to errors. These problems have motivated the development of various automated seizure detection systems that can aid neurophysiologists in the accurate diagnosis of epilepsy. The present study focuses on the development of a robust automated system for classification under low levels of supervised training. EEG data from two different repositories are considered for analysis and validation of the proposed system. The signals are decomposed into time-frequency sub-bands down to the sixth level using the dual-tree complex wavelet transform (DTCWT). All detail coefficients and the last approximation coefficients are used to calculate features, namely energy, standard deviation, root mean square, Shannon entropy, mean value, and maximum peak. These feature sets are passed through a general regression neural network (GRNN) for classification with a K-fold cross-validation scheme under varying train-to-test ratios. The current model yields ceiling-level classification performance (accuracy, sensitivity, and specificity) for all combinations of datasets (ictal vs. non-ictal) in less than 0.028 s. The proposed scheme not only maximizes the hit rate and correct rejection rate but will also aid neurophysiologists in the fast and accurate diagnosis of seizure onset.
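A sketch of the sub-band feature extraction stage follows, using the open-source dtcwt Python package (its Transform1d API is written here as recalled from the package documentation, so treat the calls as an approximation rather than the authors' code); the feature list mirrors the six statistics named in the abstract.

```python
import numpy as np
import dtcwt  # open-source DTCWT package (API assumed from its docs)

def subband_features(signal, levels=6):
    """Per-sub-band features: energy, std, RMS, Shannon entropy, mean, max peak."""
    pyramid = dtcwt.Transform1d().forward(np.asarray(signal, float), nlevels=levels)
    bands = [np.abs(h).ravel() for h in pyramid.highpasses]   # detail sub-bands
    bands.append(np.asarray(pyramid.lowpass).ravel())         # last approximation

    feats = []
    for b in bands:
        p = b ** 2 / (np.sum(b ** 2) + 1e-12)                 # normalized power
        feats += [np.sum(b ** 2),                             # energy
                  np.std(b),                                  # standard deviation
                  np.sqrt(np.mean(b ** 2)),                   # root mean square
                  -np.sum(p * np.log2(p + 1e-12)),            # Shannon entropy
                  np.mean(b),                                 # mean value
                  np.max(b)]                                  # maximum peak
    return np.array(feats)  # 7 sub-bands x 6 features = 42-dim vector
```

Feature vectors of this kind would then be passed to the GRNN classifier under the K-fold cross-validation scheme the abstract describes.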