Showing papers in "Expert Systems With Applications in 2009"


Journal ArticleDOI
TL;DR: Experimental results show that the proposed algorithm takes a significantly reduced time in computation with comparable performance against the partitioning around medoids.
Abstract: This paper proposes a new algorithm for K-medoids clustering which runs like the K-means algorithm and tests several methods for selecting initial medoids. The proposed algorithm calculates the distance matrix once and uses it for finding new medoids at every iterative step. To evaluate the proposed algorithm, we use some real and artificial data sets and compare with the results of other algorithms in terms of the adjusted Rand index. Experimental results show that the proposed algorithm takes a significantly reduced time in computation with comparable performance against the partitioning around medoids.

1,629 citations
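The K-means-style update described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `k_medoids`, the use of Euclidean distance, and the caller-supplied initial medoid indices are all assumptions, since the abstract does not say which initialization method works best.

```python
from math import dist


def k_medoids(points, initial_medoids, max_iter=100):
    """K-means-like K-medoids sketch (hypothetical helper, Euclidean distance assumed)."""
    n = len(points)
    # The distance matrix is computed once and reused at every iteration.
    d = [[dist(p, q) for q in points] for p in points]

    def assign(medoids):
        # Each point joins the cluster of its nearest medoid.
        return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
                for i in range(n)]

    medoids = list(initial_medoids)
    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(old)
                continue
            # The new medoid minimizes the total distance to its cluster members.
            new_medoids.append(
                min(members, key=lambda m: sum(d[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels = k_medoids(points, initial_medoids=[0, 1])
print(medoids, labels)
```

Even with both initial medoids placed in the same group, the toy example separates the two groups after one update, which is the behaviour the abstract attributes to reusing the precomputed distances.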


Journal ArticleDOI
TL;DR: In this study, TOPSIS method combined with intuitionistic fuzzy set is proposed to select appropriate supplier in group decision making environment and Intuitionistic fuzzy weighted averaging (IFWA) operator is utilized to aggregate individual opinions of decision makers for rating the importance of criteria and alternatives.
Abstract: Supplier selection, the process of finding the right suppliers who are able to provide the buyer with the right quality products and/or services at the right price, at the right time and in the right quantities, is one of the most critical activities for establishing an effective supply chain. On the other hand, it is a hard problem since supplier selection is typically a multi criteria group decision-making problem involving several conflicting criteria on which decision maker's knowledge is usually vague and imprecise. In this study, TOPSIS method combined with intuitionistic fuzzy set is proposed to select appropriate supplier in group decision making environment. Intuitionistic fuzzy weighted averaging (IFWA) operator is utilized to aggregate individual opinions of decision makers for rating the importance of criteria and alternatives. Finally, a numerical example for supplier selection is given to illustrate application of intuitionistic fuzzy TOPSIS method.

1,278 citations
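As a rough illustration of the aggregation step, the sketch below applies the standard IFWA operator to intuitionistic fuzzy numbers given as (membership, non-membership) pairs. The function name `ifwa`, the sample ratings, and the decision-maker weights are hypothetical; the paper's own data are not reproduced here.

```python
from math import prod


def ifwa(values, weights):
    """Intuitionistic fuzzy weighted averaging of (mu, nu) pairs.

    Uses the standard form: mu = 1 - prod((1 - mu_k) ** w_k), nu = prod(nu_k ** w_k).
    Weights are assumed to be non-negative and to sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mu = 1.0 - prod((1.0 - m) ** w for (m, _), w in zip(values, weights))
    nu = prod(n ** w for (_, n), w in zip(values, weights))
    hesitation = 1.0 - mu - nu  # degree of indeterminacy
    return mu, nu, hesitation


# Two decision makers rate one supplier on one criterion (made-up numbers).
print(ifwa([(0.8, 0.1), (0.6, 0.3)], [0.5, 0.5]))
```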


Journal ArticleDOI
TL;DR: Findings of this paper indicate that the research area of customer retention received most research attention and classification and association models are the two commonly used models for data mining in CRM.
Abstract: Despite the importance of data mining techniques to customer relationship management (CRM), there is a lack of a comprehensive literature review and a classification scheme for it. This is the first identifiable academic literature review of the application of data mining techniques to CRM. It provides an academic database of literature between the period of 2000-2006 covering 24 journals and proposes a classification scheme to classify the articles. Nine hundred articles were identified and reviewed for their direct relevance to applying data mining techniques to CRM. Eighty-seven articles were subsequently selected, reviewed and classified. Each of the 87 selected papers was categorized on four CRM dimensions (Customer Identification, Customer Attraction, Customer Retention and Customer Development) and seven data mining functions (Association, Classification, Clustering, Forecasting, Regression, Sequence Discovery and Visualization). Papers were further classified into nine sub-categories of CRM elements under different data mining techniques based on the major focus of each paper. The review and classification process was independently verified. Findings of this paper indicate that the research area of customer retention received most research attention. Of these, most are related to one-to-one marketing and loyalty programs respectively. On the other hand, classification and association models are the two commonly used models for data mining in CRM. Our analysis provides a roadmap to guide future research and facilitate knowledge accumulation and creation concerning the application of data mining techniques in CRM.

1,135 citations


Journal ArticleDOI
TL;DR: This chapter reviews 55 related studies in the period between 2000 and 2007 focusing on developing single, hybrid, and ensemble classifiers and discusses current achievements and limitations in developing intrusion detection systems by machine learning.
Abstract: The popularity of using Internet contains some risks of network attacks. Intrusion detection is one major research problem in network security, whose aim is to identify unusual access or attacks to secure internal networks. In literature, intrusion detection systems have been approached by various machine learning techniques. However, there is no a review paper to examine and understand the current status of using machine learning techniques to solve the intrusion detection problems. This chapter reviews 55 related studies in the period between 2000 and 2007 focusing on developing single, hybrid, and ensemble classifiers. Related studies are compared by their classifier design, datasets used, and other experimental setups. Current achievements and limitations in developing intrusion detection systems by machine learning are present and discussed. A number of future research directions are also provided.

872 citations


Journal ArticleDOI
TL;DR: With the proposed model, manufacturers can have a better understanding of the capabilities that a green supplier must possess and can evaluate and select the most suitable green supplier for cooperation.
Abstract: With growing worldwide awareness of environmental protection, green production has become an important issue for almost every manufacturer and will determine the sustainability of a manufacturer in the long term. A performance evaluation system for green suppliers thus is necessary to determine the suitability of suppliers to cooperate with the firm. While the works on the evaluation and/or selection of suppliers are abundant, those that concern environmental issues are rather limited. Therefore, in this study, a model for evaluating green suppliers is proposed. The Delphi method is applied first to differentiate the criteria for evaluating traditional suppliers and green suppliers. A hierarchy is constructed next to help evaluate the importance of the selected criteria and the performance of green suppliers. Since experts may not identify the importance of factors clearly, the results of questionnaires may be biased. To consider the vagueness of experts' opinions, the fuzzy extended analytic hierarchy process is exploited. With the proposed model, manufacturers can have a better understanding of the capabilities that a green supplier must possess and can evaluate and select the most suitable green supplier for cooperation.

735 citations


Journal ArticleDOI
TL;DR: A comprehensive review of articles that involve a comparative study of feed forward neural networks and statistical techniques used for prediction and classification problems in various areas of applications is carried out.
Abstract: Neural networks are being used in areas of prediction and classification, the areas where statistical methods have traditionally been used. Both the traditional statistical methods and neural networks are looked upon as competing model-building techniques in literature. This paper carries out a comprehensive review of articles that involve a comparative study of feed forward neural networks and statistical techniques used for prediction and classification problems in various areas of applications. Tabular presentations highlighting the important features of these articles are also provided. This study aims to give useful insight into the capabilities of neural networks and statistical methods used in different kinds of applications.

731 citations


Journal ArticleDOI
TL;DR: The results show that the highest classification accuracy (99.51%) is obtained for the SVM model that contains five features, and this is very promising compared to the previously reported results.
Abstract: Breast cancer is the second largest cause of cancer deaths among women. At the same time, it is also among the most curable cancer types if it can be diagnosed early. Research efforts have reported with increasing confirmation that the support vector machines (SVM) have greater accurate diagnosis ability. In this paper, breast cancer diagnosis based on a SVM-based method combined with feature selection has been proposed. Experiments have been conducted on different training-test partitions of the Wisconsin breast cancer dataset (WBCD), which is commonly used among researchers who use machine learning methods for breast cancer diagnosis. The performance of the method is evaluated using classification accuracy, sensitivity, specificity, positive and negative predictive values, receiver operating characteristic (ROC) curves and confusion matrix. The results show that the highest classification accuracy (99.51%) is obtained for the SVM model that contains five features, and this is very promising compared to the previously reported results.

723 citations


Journal ArticleDOI
TL;DR: This paper surveys more than 100 related published articles that focus on neural and neuro-fuzzy techniques derived and applied to forecast stock markets to show that soft computing techniques are widely accepted to studying and evaluating stock market behavior.
Abstract: The key to successful stock market forecasting is achieving best results with minimum required input data. Given stock market model uncertainty, soft computing techniques are viable candidates to capture stock market nonlinear relations returning significant forecasting results with not necessarily prior knowledge of input data statistical distributions. This paper surveys more than 100 related published articles that focus on neural and neuro-fuzzy techniques derived and applied to forecast stock markets. Classifications are made in terms of input data, forecasting methodology, performance evaluation and performance measures used. Through the surveyed papers, it is shown that soft computing techniques are widely accepted to studying and evaluating stock market behavior.

714 citations


Journal ArticleDOI
TL;DR: Among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default, and its regression intercept is close to zero, and regression coefficient to one.
Abstract: This research examines the case of customers' default payments in Taiwan and compares the predictive accuracy of probability of default among six data mining methods. From the perspective of risk management, the predictive accuracy of the estimated probability of default is more valuable than the binary result of classification (credible or not credible clients). Because the real probability of default is unknown, this study presents the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y), and the predictive probability of default as the independent variable (X), the simple linear regression result (Y=A+BX) shows that the forecasting model produced by artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero, and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default.

713 citations
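The calibration check in this abstract, regressing the real probability Y on the predicted probability X and inspecting A, B, and the coefficient of determination, can be sketched with ordinary least squares. The helper name `fit_line` and the sample probabilities are assumptions; the Sorting Smoothing Method itself is not implemented here.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b, r_squared).

    Assumes x and y each contain at least two distinct values.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up predicted (x) and estimated real (y) default probabilities.
predicted = [0.1, 0.2, 0.4, 0.8]
estimated_real = [0.15, 0.20, 0.30, 0.50]
print(fit_line(predicted, estimated_real))
```

A well-calibrated model would give an intercept near zero and a slope near one; the made-up second series above deliberately does not.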


Journal ArticleDOI
TL;DR: This study found that self-efficiency was the strongest antecedent of perceived ease-of-use, which directly and indirectly affected behavioral intention through perceived usefulness in mobile banking.
Abstract: With the improvement of mobile technologies and devices, banking users are able to conduct banking services anyplace and anytime. Recently, many banks around the world have provided mobile access to financial information. Understanding what factors contribute to users' intention to use mobile banking is therefore an important research issue. The purpose of this research is to examine and validate determinants of users' intention to use mobile banking. This research used structural equation modeling (SEM) to test the causalities in the proposed model. The results indicated strong support for the validity of the proposed model, which explained 72.2% of the variance in behavioral intention to use mobile banking. This study found that self-efficiency was the strongest antecedent of perceived ease-of-use, which directly and indirectly affected behavioral intention through perceived usefulness in mobile banking. Structural assurances are the strongest antecedent of trust, which could increase behavioral intention to use mobile banking. This research verified the effect of perceived usefulness, trust and perceived ease-of-use on behavioral intention in mobile banking. The results have several implications for mobile banking managers.

711 citations


Journal ArticleDOI
TL;DR: This work proposes a new fuzzy TOPSIS for evaluating alternatives by integrating subjective and objective weights, and adopts end-user ratings as an objective weight based on Shannon's entropy theory.
Abstract: Multiple criteria decision making (MCDM) is widely used in ranking one or more alternatives from a set of available alternatives with respect to multiple criteria. Inspired by MCDM to systematically evaluate alternatives under various criteria, we propose a new fuzzy TOPSIS for evaluating alternatives by integrating subjective and objective weights. Most MCDM approaches consider only the decision maker's subjective weights. However, the end-user attitude can be a key factor. We propose a novel approach that involves the end-user in the whole decision making process. In this proposed approach, the subjective weights assigned by decision makers (DM) are normalized into a comparable scale. In addition, we also adopt end-user ratings as an objective weight based on Shannon's entropy theory. A closeness coefficient is defined to determine the ranking order of alternatives by calculating the distances to both ideal and negative-ideal solutions. A case study is performed showing how the proposed method can be used for a software outsourcing problem. With our method, we provide decision makers more information to make more subtle decisions.
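To make the idea of entropy-based objective weights concrete, here is a sketch of the widely used Shannon entropy weighting scheme applied to a crisp ratings matrix. It is an assumption that the paper's fuzzy formulation reduces to this form; the function name `entropy_weights` and the sample ratings are hypothetical.

```python
from math import log


def entropy_weights(ratings):
    """Shannon entropy weights for criteria (columns) over alternatives (rows).

    Assumes positive ratings and at least two alternatives. Criteria whose
    ratings vary more across alternatives receive larger weights.
    """
    m = len(ratings)
    n_criteria = len(ratings[0])
    divergences = []
    for j in range(n_criteria):
        column = [row[j] for row in ratings]
        total = sum(column)
        probs = [x / total for x in column]
        # Normalized entropy lies in [0, 1]; 1 means all ratings are equal.
        entropy = -sum(p * log(p) for p in probs if p > 0) / log(m)
        divergences.append(1.0 - entropy)
    spread = sum(divergences)
    if spread == 0:
        # Every criterion is uninformative, so fall back to equal weights.
        return [1.0 / n_criteria] * n_criteria
    return [dv / spread for dv in divergences]


# End-user ratings of three alternatives on two criteria (made-up numbers).
print(entropy_weights([[7, 2], [7, 5], [7, 9]]))
```

In the toy data the first criterion is rated identically for all alternatives, so nearly all the weight goes to the second one.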

Journal ArticleDOI
TL;DR: An evaluation model based on the analytic hierarchy process (AHP) and the technique for order performance by similarity to ideal solution (TOPSIS) to help the actors in defence industries for the selection of optimal weapon in a fuzzy environment where the vagueness and subjectivity are handled with linguistic values parameterized by triangular fuzzy numbers.
Abstract: The weapon selection problem is a strategic issue and has a significant impact on the efficiency of defense systems. On the other hand, selecting the optimal weapon among many alternatives is a multi-criteria decision-making (MCDM) problem. This paper develops an evaluation model based on the analytic hierarchy process (AHP) and the technique for order performance by similarity to ideal solution (TOPSIS), to help the actors in defence industries for the selection of optimal weapon in a fuzzy environment where the vagueness and subjectivity are handled with linguistic values parameterized by triangular fuzzy numbers. The AHP is used to analyze the structure of the weapon selection problem and to determine weights of the criteria, and fuzzy TOPSIS method is used to obtain final ranking. A real world application is conducted to illustrate the utilization of the model for the weapon selection problem. The application could be interpreted as demonstrating the effectiveness and feasibility of the proposed model.

Journal ArticleDOI
Hasan Ocak
TL;DR: It was shown that epileptic EEG had significant nonlinearity whereas normal EEG behaved similar to Gaussian linear stochastic process.
Abstract: In this study, a new scheme was presented for detecting epileptic seizures from electroencephalogram (EEG) data recorded from normal subjects and epileptic patients. The new scheme was based on approximate entropy (ApEn) and discrete wavelet transform (DWT) analysis of EEG signals. Seizure detection was accomplished in two stages. In the first stage, EEG signals were decomposed into approximation and detail coefficients using DWT. In the second stage, ApEn values of the approximation and detail coefficients were computed. Significant differences were found between the ApEn values of the epileptic and the normal EEG, allowing us to detect seizures with over 96% accuracy. Without DWT as a preprocessing step, it was shown that the detection rate was reduced to 73%. The analysis results showed that during seizure activity EEG had lower ApEn values compared to normal EEG. This suggested that epileptic EEG was more predictable or less complex than the normal EEG. The data was further analyzed with surrogate data analysis methods to test for evidence of nonlinearities. It was shown that epileptic EEG had significant nonlinearity whereas normal EEG behaved similarly to a Gaussian linear stochastic process.
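For readers unfamiliar with approximate entropy, the sketch below implements the standard ApEn(m, r) definition on a plain list of samples. The embedding dimension `m`, the tolerance `r`, and the test signals are assumptions chosen for illustration; the abstract does not state the parameters used, and the DWT stage is omitted.

```python
from math import log


def _phi(series, m, r):
    # Average log-frequency with which length-m templates repeat within tolerance r.
    n = len(series) - m + 1
    templates = [series[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Chebyshev distance; the self-match keeps the count at least 1.
        matches = sum(
            1 for b in templates
            if max(abs(x - y) for x, y in zip(a, b)) <= r)
        total += log(matches / n)
    return total / n


def approximate_entropy(series, m=2, r=0.2):
    """Standard ApEn: phi(m) - phi(m + 1). Lower values mean a more regular signal."""
    return _phi(series, m, r) - _phi(series, m + 1, r)


# A strictly periodic signal versus a chaotic logistic-map signal.
periodic = [0.0, 1.0] * 20
chaotic = [0.3]
for _ in range(199):
    chaotic.append(4.0 * chaotic[-1] * (1.0 - chaotic[-1]))

print(approximate_entropy(periodic), approximate_entropy(chaotic))
```

The periodic signal yields a value near zero while the chaotic one is clearly larger, mirroring the abstract's reading of lower ApEn as "more predictable or less complex".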

Journal ArticleDOI
TL;DR: The OL-SVR model is compared with three well-known prediction models including Gaussian maximum likelihood (GML), Holt exponential smoothing, and artificial neural net models and suggests that GML, which relies heavily on the recurring characteristics of day-to-day traffic, performs slightly better than other models under typical traffic conditions, as demonstrated by previous studies.
Abstract: Most literature on short-term traffic flow forecasting focused mainly on normal, or non-incident, conditions and, hence, limited their applicability when traffic flow forecasting is most needed, i.e., incident and atypical conditions. Accurate prediction of short-term traffic flow under atypical conditions, such as vehicular crashes, inclement weather, work zone, and holidays, is crucial to effective and proactive traffic management systems in the context of intelligent transportation systems (ITS) and, more specifically, dynamic traffic assignment (DTA). To this end, this paper presents an application of a supervised statistical learning technique called Online Support Vector machine for Regression, or OL-SVR, for the prediction of short-term freeway traffic flow under both typical and atypical conditions. The OL-SVR model is compared with three well-known prediction models including Gaussian maximum likelihood (GML), Holt exponential smoothing, and artificial neural net models. The resultant performance comparisons suggest that GML, which relies heavily on the recurring characteristics of day-to-day traffic, performs slightly better than other models under typical traffic conditions, as demonstrated by previous studies. Yet OL-SVR is the best performer under non-recurring atypical traffic conditions. It appears that for deployed ITS systems that gear toward timely response to real-world atypical and incident situations, OL-SVR may be a better tool than GML.

Journal ArticleDOI
TL;DR: The goal of this paper is to review the works that were published in journals, suggest a new classification framework of context-aware systems, and explore each feature of classification framework using a keyword index and article title search.
Abstract: Nowadays, numerous journals and conferences have published articles related to context-aware systems, indicating many researchers' interest. Therefore, the goal of this paper is to review the works that were published in journals, suggest a new classification framework of context-aware systems, and explore each feature of classification framework. This paper is based on a literature review of context-aware systems from 2000 to 2007 using a keyword index and article title search. The classification framework is developed based on the architecture of context-aware systems, which consists of the following five layers: concept and research layer, network layer, middleware layer, application layer and user infrastructure layer. The articles are categorized based on the classification framework. This paper allows researchers to extract several lessons learned that are important for the implementation of context-aware systems.

Journal ArticleDOI
TL;DR: This research compared three supervised machine learning algorithms of Naive Bayes, SVM and the character based N-gram model for sentiment classification of the reviews on travel blogs for seven popular travel destinations in the US and Europe.
Abstract: The rapid growth in Internet applications in tourism has led to an enormous amount of personal reviews for travel-related information on the Web. These reviews can appear in different forms like BBS, blogs, Wiki or forum websites. More importantly, the information in these reviews is valuable to both travelers and practitioners for various understanding and planning processes. An intrinsic problem of the overwhelming information on the Internet, however, is information overloading as users are simply unable to read all the available information. Query functions in search engines like Yahoo and Google can help users find some of the reviews that they need about specific destinations. The returned pages from these search engines are still beyond the visual capacity of humans. In this research, sentiment classification techniques were incorporated into the domain of mining reviews from travel blogs. Specifically, we compared three supervised machine learning algorithms of Naive Bayes, SVM and the character based N-gram model for sentiment classification of the reviews on travel blogs for seven popular travel destinations in the US and Europe. Empirical findings indicated that the SVM and N-gram approaches outperformed the Naive Bayes approach, and that when training datasets had a large number of reviews, all three approaches reached accuracies of at least 80%.

Journal ArticleDOI
TL;DR: Cluster-based under-sampling approaches for selecting the representative data as training data to improve the classification accuracy for minority class are proposed and the experimental results show that these approaches outperform the other under-sampling techniques in the previous studies.
Abstract: For classification problems, the training data significantly influence classification accuracy. However, data in real-world applications often have an imbalanced class distribution; that is, most of the data are in the majority class and few are in the minority class. In this case, if all the data are used as training data, the classifier tends to predict that most incoming data belong to the majority class. Hence, it is important to select suitable training data for classification in the imbalanced class distribution problem. In this paper, we propose cluster-based under-sampling approaches for selecting the representative data as training data to improve the classification accuracy for minority class and investigate the effect of under-sampling methods in the imbalanced class distribution environment. The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques in the previous studies.

Journal ArticleDOI
TL;DR: Fuzzy risk priority numbers (FRPNs) are proposed for prioritization of failure modes, defined as fuzzy weighted geometric means of the fuzzy ratings for O, S and D, and can be computed using alpha-level sets and linear programming models.
Abstract: Failure mode and effects analysis (FMEA) has been extensively used for examining potential failures in products, processes, designs and services. An important issue of FMEA is the determination of risk priorities of the failure modes that have been identified. The traditional FMEA determines the risk priorities of failure modes using the so-called risk priority numbers (RPNs), which require the risk factors like the occurrence (O), severity (S) and detection (D) of each failure mode to be precisely evaluated. This may not be realistic in real applications. In this paper we treat the risk factors O, S and D as fuzzy variables and evaluate them using fuzzy linguistic terms and fuzzy ratings. As a result, fuzzy risk priority numbers (FRPNs) are proposed for prioritization of failure modes. The FRPNs are defined as fuzzy weighted geometric means of the fuzzy ratings for O, S and D, and can be computed using alpha-level sets and linear programming models. For ranking purpose, the FRPNs are defuzzified using centroid defuzzification method, in which a new centroid defuzzification formula based on alpha-level sets is derived. A numerical example is provided to illustrate the potential applications of the proposed fuzzy FMEA and the detailed computational process of the FRPNs.
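The contrast between the traditional RPN and a weighted geometric mean over alpha-level sets can be sketched as below. This is a simplified illustration under stated assumptions: ratings are positive triangular fuzzy numbers, and the weights are crisp, which avoids the linear programming models the paper uses for fuzzy weights. All function names and numbers are hypothetical.

```python
def rpn(occurrence, severity, detection):
    """Traditional crisp risk priority number: the product O * S * D."""
    return occurrence * severity * detection


def alpha_cut(triangular, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at level alpha."""
    low, mode, high = triangular
    return low + alpha * (mode - low), high - alpha * (high - mode)


def frpn_alpha_cut(ratings, weights, alpha):
    """Alpha-level interval of a weighted geometric mean of fuzzy ratings.

    Assumes positive ratings and crisp weights. The geometric mean increases
    in every rating, so the interval bounds come from the interval endpoints.
    """
    total = sum(weights)
    lower = upper = 1.0
    for triangular, weight in zip(ratings, weights):
        lo, hi = alpha_cut(triangular, alpha)
        lower *= lo ** (weight / total)
        upper *= hi ** (weight / total)
    return lower, upper


# Fuzzy ratings for O, S and D of one failure mode (made-up numbers).
ratings = [(1, 2, 3), (3, 4, 5), (7, 8, 9)]
print(rpn(2, 4, 8))
print(frpn_alpha_cut(ratings, weights=[1, 1, 1], alpha=0.0))
print(frpn_alpha_cut(ratings, weights=[1, 1, 1], alpha=1.0))
```

At alpha = 1 the interval collapses to the geometric mean of the modal values, and it widens as alpha decreases.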

Journal ArticleDOI
TL;DR: A least squares version of the recently proposed twin support vector machine (TSVM) for binary classification has comparable classification accuracy to that of TSVM but with considerably lesser computational time.
Abstract: In this paper we formulate a least squares version of the recently proposed twin support vector machine (TSVM) for binary classification. This formulation leads to extremely simple and fast algorithm for generating binary classifiers based on two non-parallel hyperplanes. Here we attempt to solve two modified primal problems of TSVM, instead of two dual problems usually solved. We show that the solution of the two modified primal problems reduces to solving just two systems of linear equations as opposed to solving two quadratic programming problems along with two systems of linear equations in TSVM. Classification using nonlinear kernel also leads to systems of linear equations. Our experiments on publicly available datasets indicate that the proposed least squares TSVM has comparable classification accuracy to that of TSVM but with considerably lesser computational time. Since linear least squares TSVM can easily handle large datasets, we further went on to investigate its efficiency for text categorization applications. Computational results demonstrate the effectiveness of the proposed method over linear proximal SVM on all the text corpuses considered.

Journal ArticleDOI
TL;DR: Two feature evaluation metrics for the Naive Bayesian classifier applied on multi-class text datasets are presented: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM).
Abstract: As an important preprocessing technology in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm characteristics. Because the Naive Bayesian classifier is very simple, efficient and highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied on multi-class text datasets: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM). Experiments of text classification with Naive Bayesian classifiers were carried out on two multi-class text collections. As the results indicate, CDM and MOR achieve clearly better selection results than other feature selection approaches.

Journal ArticleDOI
TL;DR: Assessment of electroencephalography activities during a monotonous driving session showed stable delta and theta activities over time, a slight decrease of alpha activity, and a significant decrease of beta activity, which can be used for future development of fatigue countermeasure devices.
Abstract: Fatigue is a constant occupational hazard for drivers and it greatly reduces efficiency and performance when one persists in continuing the current activity. Studies have investigated various physiological associations with fatigue to try to identify fatigue indicators. The current study assessed the four electroencephalography (EEG) activities, delta (δ), theta (θ), alpha (α) and beta (β), during a monotonous driving session in 52 subjects (36 males and 16 females). Performance of four algorithms, which were: algorithm (i) (θ+α)/β, algorithm (ii) α/β, algorithm (iii) (θ+α)/(α+β), and algorithm (iv) θ/β, were also assessed as possible indicators for fatigue detection. Results showed stable delta and theta activities over time, a slight decrease of alpha activity, and a significant decrease of beta activity (p<0.05). All four algorithms showed an increase in the ratio of slow wave to fast wave EEG activities over time. Algorithm (i) (θ+α)/β showed a larger increase. The results have implications for detecting fatigue. Impact on industry: The results of this research have the implication for detecting fatigue and can be used for future development of fatigue countermeasure devices.
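The four ratio indices assessed in this study, (theta+alpha)/beta, alpha/beta, (theta+alpha)/(alpha+beta) and theta/beta, are simple to compute once band powers are available. The sketch below assumes the band powers have already been estimated from the EEG spectrum; the function name `fatigue_ratios` and the sample values are hypothetical.

```python
def fatigue_ratios(theta, alpha, beta):
    """Slow-wave to fast-wave EEG ratios from band powers (positive floats assumed)."""
    return {
        "i": (theta + alpha) / beta,
        "ii": alpha / beta,
        "iii": (theta + alpha) / (alpha + beta),
        "iv": theta / beta,
    }


# Band powers early and late in a driving session (made-up numbers):
# beta drops markedly, alpha slightly, theta stays stable.
early = fatigue_ratios(theta=2.0, alpha=3.0, beta=5.0)
late = fatigue_ratios(theta=2.0, alpha=2.8, beta=3.5)
for name in early:
    print(name, round(early[name], 3), round(late[name], 3))
```

With these made-up powers every index rises over the session, which matches the direction of change the abstract reports.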

Journal ArticleDOI
TL;DR: A methodology which uses SAS base software 9.1.3 for diagnosing of the heart disease using a neural networks ensemble method, which creates new models by combining the posterior probabilities or the predicted values from multiple predecessor models.
Abstract: In the last decades, researchers have proposed several tools and various methodologies for developing effective medical decision support systems. Moreover, new methodologies and new tools continue to be developed and presented day by day. Diagnosing heart disease is one of the important issues, and many researchers have investigated developing intelligent medical decision support systems to improve the ability of physicians. In this paper, we introduce a methodology which uses SAS base software 9.1.3 for diagnosing heart disease. A neural networks ensemble method is at the centre of the proposed system. This ensemble-based method creates new models by combining the posterior probabilities or the predicted values from multiple predecessor models, so more effective models can be created. We performed experiments with the proposed tool. We obtained 89.01% classification accuracy from the experiments made on the data taken from Cleveland heart disease database. We also obtained 80.95% and 95.91% sensitivity and specificity values, respectively, in heart disease diagnosis.
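One common way to combine posterior probabilities from several models is to average them, and the sketch below pairs that with the sensitivity and specificity measures the abstract reports. Averaging is an assumption about the combination rule, the 0.5 decision threshold is arbitrary, and the probabilities are made up; this is not the SAS-based system itself.

```python
def ensemble_posterior(model_probs):
    """Average per-patient disease probabilities across models.

    model_probs holds one list of probabilities per model, all the same length.
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes both classes occur in y_true (labels are 0 or 1).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Two hypothetical models scoring four patients.
probs = ensemble_posterior([[0.9, 0.2, 0.6, 0.1], [0.7, 0.4, 0.2, 0.3]])
predictions = [1 if p >= 0.5 else 0 for p in probs]
print(predictions, sensitivity_specificity([1, 0, 1, 0], predictions))
```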

Journal ArticleDOI
TL;DR: In this paper, a two step methodology is structured to evaluate hazardous waste transportation firms containing the methods of fuzzy-AHP and TOPSIS.
Abstract: Hazardous wastes are likely to cause danger to human health and/or the environment, so their safe transportation is important. Consequently, selection of the right and most appropriate transportation firm is an important problem for hazardous waste generators. In this paper, a two step methodology is structured to evaluate hazardous waste transportation firms containing the methods of fuzzy-AHP and TOPSIS. A numerical example is presented to clarify the methodology.

Journal ArticleDOI
TL;DR: A systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets is provided in this paper, where the authors used 74 studies in 11 journals and several conference proceedings.
Abstract: This paper provides a systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets. The review uses 74 software fault prediction papers in 11 journals and several conference proceedings. According to the review results, the usage percentage of public datasets increased significantly and the usage percentage of machine learning algorithms increased slightly since 2005. In addition, method-level metrics are still the most dominant metrics in the fault prediction research area and machine learning algorithms are still the most popular methods for fault prediction. Researchers working in the software fault prediction area should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics is below acceptable levels, and they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.

Journal ArticleDOI
TL;DR: A fuzzy model to evaluate the performance of the firms by using financial ratios and at the same time, taking subjective judgments of decision makers into consideration is developed.
Abstract: In today's competitive environment, evaluating firms' performance properly is an important issue not only for investors and creditors but also for firms in the same sector. Determining the competitiveness of firms and evaluating their financial performance is also crucial for the sector's development. The aim of this study is to develop a fuzzy model that evaluates the performance of firms by using financial ratios while also taking the subjective judgments of decision makers into consideration. The proposed approach is based on the Fuzzy Analytic Hierarchy Process (FAHP) and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) methods. FAHP is used by decision makers to determine the weights of the criteria, and the rankings of the firms are then determined by TOPSIS. The proposed method is used to evaluate the performance of fifteen Turkish cement firms listed on the Istanbul Stock Exchange by using their financial tables. The firms are then ranked according to their results.
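
The ranking step described here, TOPSIS applied after criterion weights have been fixed, can be sketched as below. This is only an illustrative outline of classical (crisp) TOPSIS with vector normalisation; the weights are assumed to have been produced elsewhere (for example by FAHP), and the firms, ratios, and weights shown are hypothetical rather than taken from the study.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative.

    matrix[i][j] is the value of alternative i on criterion j,
    weights[j] is the criterion weight (assumed given, e.g. from FAHP),
    benefit[j] is True when larger values are better, False for cost criteria.
    Assumes the alternatives are not all identical (otherwise both distances are 0).
    """
    n_alt, n_crit = len(matrix), len(weights)
    # Vector normalisation of each criterion column, then weighting.
    norms = [math.sqrt(sum(matrix[i][j] ** 2 for i in range(n_alt)))
             for j in range(n_crit)]
    v = [[weights[j] * matrix[i][j] / norms[j] for j in range(n_crit)]
         for i in range(n_alt)]
    columns = list(zip(*v))
    # Positive-ideal and negative-ideal solutions per criterion.
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)   # distance to the ideal solution
        d_neg = math.dist(row, anti)    # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical firms scored on [profitability ratio, debt ratio].
matrix = [
    [0.30, 0.20],  # firm A: best on both criteria
    [0.10, 0.60],  # firm B: worst on both criteria
    [0.20, 0.40],  # firm C: in between
]
weights = [0.6, 0.4]        # assumed weights, not from the paper
benefit = [True, False]     # profitability is a benefit, debt is a cost

scores = topsis(matrix, weights, benefit)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

An alternative that coincides with the ideal solution gets a score of 1 and one that coincides with the anti-ideal solution gets 0, so sorting by score in descending order gives the ranking.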

Journal ArticleDOI
TL;DR: A comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches concludes that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.
Abstract: In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches. Instead of considering Spam filtering as a standard classification problem, we highlight the importance of considering specific characteristics of the problem, especially concept drift, in designing new filters. Two particularly important aspects not widely recognized in the literature are discussed: the difficulties in updating a classifier based on the bag-of-words representation and a major difference between two early naive Bayes models. Overall, we conclude that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

Journal ArticleDOI
TL;DR: It is found that there is no need to under-sample so that there are as many churners in your training set as non churners, and under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC.
Abstract: Customer churn is often a rare event in service industries, but of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how we can better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric compared to accuracy. Lift is very much related to accuracy, but has the advantage of being well used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press], we find that there is no need to under-sample so that there are as many churners in your training set as non churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, perform significantly better compared to random forests, and are therefore advised. They should, however, always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique.
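
Two ideas from this abstract lend themselves to a short sketch: AUC as a threshold-free metric, and random under-sampling of the majority class. The code below is an illustrative outline only; the function names, the `ratio` parameter, and the toy data are hypothetical, and the pairwise AUC computation is the standard rank-based definition rather than anything specific to the study.

```python
import random


def auc(y_true, scores):
    """AUC as the share of (churner, non-churner) pairs ranked correctly.

    Ties count as half a correct pair. No classification threshold is involved.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def under_sample(rows, labels, ratio, seed=0):
    """Keep every churner and a random subset of non-churners.

    ratio is the number of non-churners kept per churner (hypothetical
    parameter); ratio=1 would give a fully balanced training set.
    """
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg_idx), int(ratio * len(pos_idx)))
    keep = sorted(pos_idx + rng.sample(neg_idx, n_keep))
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy scores: one churner is ranked below one non-churner.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_auc = auc(y_true, scores)

# Toy imbalanced set: 2 churners, 8 non-churners, keep 2 non-churners per churner.
rows = list(range(10))
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
sampled_rows, sampled_labels = under_sample(rows, labels, ratio=2)
print(toy_auc, sampled_labels)
```

Setting `ratio` above 1 corresponds to the finding reported in the abstract that the training set does not need to contain as many churners as non-churners.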

Journal ArticleDOI
TL;DR: This research demonstrated that AR can be used to reduce the dimension of the feature space and that the proposed AR+NN model can be used to obtain fast automatic diagnostic systems for other diseases.
Abstract: This paper presents an automatic diagnosis system for detecting breast cancer based on association rules (AR) and a neural network (NN). In this study, AR is used to reduce the dimension of the breast cancer database and NN is used for intelligent classification. The performance of the proposed AR+NN system is compared with that of the NN model. The dimension of the input feature space is reduced from nine to four by using AR. In the test stage, 3-fold cross-validation was applied to the Wisconsin breast cancer database to evaluate the performance of the proposed system. The correct classification rate of the proposed system is 95.6%. This research demonstrated that AR can be used to reduce the dimension of the feature space and that the proposed AR+NN model can be used to obtain fast automatic diagnostic systems for other diseases.
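
The 3-fold cross-validation mentioned in the test stage can be sketched as a plain index split. This is a generic illustration, not the authors' procedure: the helper name is hypothetical, the sample count is a toy value, and the folds are taken in order without shuffling or stratification, which a real evaluation would probably add.

```python
def k_fold_indices(n_samples, k=3):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Toy example with 10 samples: each sample is tested exactly once.
folds = list(k_fold_indices(10, k=3))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

A classifier would be trained on `train_idx` and scored on `test_idx` in each round, and the reported classification rate would be the average over the three rounds.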

Journal ArticleDOI
TL;DR: The proposed FMCDM evaluation model of banking performance using the BSC framework can be a useful and effective assessment tool and highlights the critical aspects of evaluation criteria as well as the gaps to improve banking performance for achieving aspired/desired level.
Abstract: This paper proposes a Fuzzy Multiple Criteria Decision Making (FMCDM) approach for banking performance evaluation. Drawing on the four perspectives of a Balanced Scorecard (BSC), this research first summarized the evaluation indexes synthesized from the literature relating to banking performance. Then, to screen these indexes, 23 indexes fit for banking performance evaluation were selected through expert questionnaires. Furthermore, the relative weights of the chosen evaluation indexes were calculated by the Fuzzy Analytic Hierarchy Process (FAHP). The three MCDM analytical tools SAW, TOPSIS, and VIKOR were then each adopted to rank banking performance and improve the gaps, with three banks as an empirical example. The analysis results highlight the critical aspects of the evaluation criteria as well as the gaps to be closed to improve banking performance and achieve the aspired/desired level. They show that the proposed FMCDM evaluation model of banking performance using the BSC framework can be a useful and effective assessment tool.

Journal ArticleDOI
TL;DR: This survey discusses related issues and main approaches to these problems, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction.
Abstract: The sentiment detection of texts has witnessed booming interest in recent years, due to the increased availability of online reviews in digital form and the ensuing need to organize them. Until now, four different problems have predominated in this research community, namely, subjectivity classification, word sentiment classification, document sentiment classification and opinion extraction. In fact, there are inherent relations between them. Subjectivity classification can prevent the sentiment classifier from considering irrelevant or even potentially misleading text. Document sentiment classification and opinion extraction have often involved word sentiment classification techniques. This survey discusses related issues and main approaches to these problems.