Showing papers in "Expert Systems With Applications in 2009"

A multi-criteria intuitionistic fuzzy group decision making for supplier selection with TOPSIS method

TL;DR: Experimental results show that the proposed algorithm takes a significantly reduced time in computation with comparable performance against the partitioning around medoids.

Abstract: This paper proposes a new algorithm for K-medoids clustering which runs like the K-means algorithm and tests several methods for selecting initial medoids. The proposed algorithm calculates the distance matrix once and uses it for finding new medoids at every iterative step. To evaluate the proposed algorithm, we use some real and artificial data sets and compare with the results of other algorithms in terms of the adjusted Rand index. Experimental results show that the proposed algorithm takes a significantly reduced time in computation with comparable performance against the partitioning around medoids.
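As a rough illustration of the idea in this abstract (compute the distance matrix once, then alternate K-means-style assignment and medoid updates), here is a minimal Python sketch. It is not the authors' implementation: the function name `k_medoids`, the `init` parameter, the random fallback initialisation, and the use of Euclidean distance are all assumptions made for the example.

```python
import numpy as np

def k_medoids(X, k, init=None, max_iter=100, seed=0):
    """K-means-style K-medoids sketch (hypothetical helper, not the paper's code)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, computed once and reused in every iteration.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    if init is None:
        # Assumption: random initial medoids; the paper tests several init methods.
        rng = np.random.default_rng(seed)
        medoids = rng.choice(len(X), size=k, replace=False)
    else:
        medoids = np.array(init)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest medoid.
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # keep the old medoid if a cluster becomes empty
            # Update step: the member with the smallest total distance
            # to the other members becomes the new medoid.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[j] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
```

Calling `k_medoids(X, k=2, init=[0, 1])` on two well-separated groups of points returns the indices of the chosen medoids and a cluster label per point. Only the lookup into `D` is repeated per iteration, which is where the time saving described in the abstract would come from.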

1,629 citations

Journal Article•DOI•

Fatih Emre Boran¹, Serkan Genç¹, Mustafa Kurt¹, Diyar Akay¹•Institutions (1)

Gazi University¹

01 Oct 2009-Expert Systems With Applications

TL;DR: In this study, TOPSIS method combined with intuitionistic fuzzy set is proposed to select appropriate supplier in group decision making environment and Intuitionistic fuzzy weighted averaging (IFWA) operator is utilized to aggregate individual opinions of decision makers for rating the importance of criteria and alternatives.

Abstract: Supplier selection, the process of finding the right suppliers who are able to provide the buyer with the right quality products and/or services at the right price, at the right time and in the right quantities, is one of the most critical activities for establishing an effective supply chain. On the other hand, it is a hard problem since supplier selection is typically a multi-criteria group decision-making problem involving several conflicting criteria on which decision makers' knowledge is usually vague and imprecise. In this study, the TOPSIS method combined with intuitionistic fuzzy sets is proposed to select an appropriate supplier in a group decision-making environment. The intuitionistic fuzzy weighted averaging (IFWA) operator is utilized to aggregate individual opinions of decision makers for rating the importance of criteria and alternatives. Finally, a numerical example for supplier selection is given to illustrate the application of the intuitionistic fuzzy TOPSIS method.

1,278 citations

Journal Article•DOI•

Review: Application of data mining techniques in customer relationship management: A literature review and classification

Eric W.T. Ngai¹, Li Xiu², D. C. K. Chau¹•Institutions (2)

Hong Kong Polytechnic University¹, Tsinghua University²

Review: Intrusion detection by machine learning: A review

TL;DR: Findings of this paper indicate that the research area of customer retention received most research attention and classification and association models are the two commonly used models for data mining in CRM.

Abstract: Despite the importance of data mining techniques to customer relationship management (CRM), there is a lack of a comprehensive literature review and a classification scheme for it. This is the first identifiable academic literature review of the application of data mining techniques to CRM. It provides an academic database of literature between the period of 2000-2006 covering 24 journals and proposes a classification scheme to classify the articles. Nine hundred articles were identified and reviewed for their direct relevance to applying data mining techniques to CRM. Eighty-seven articles were subsequently selected, reviewed and classified. Each of the 87 selected papers was categorized on four CRM dimensions (Customer Identification, Customer Attraction, Customer Retention and Customer Development) and seven data mining functions (Association, Classification, Clustering, Forecasting, Regression, Sequence Discovery and Visualization). Papers were further classified into nine sub-categories of CRM elements under different data mining techniques based on the major focus of each paper. The review and classification process was independently verified. Findings of this paper indicate that the research area of customer retention received most research attention. Of these, most are related to one-to-one marketing and loyalty programs respectively. On the other hand, classification and association models are the two commonly used models for data mining in CRM. Our analysis provides a roadmap to guide future research and facilitate knowledge accumulation and creation concerning the application of data mining techniques in CRM.

1,135 citations

Journal Article•DOI•

Chih-Fong Tsai¹, Yu-Feng Hsu², Chia-Ying Lin³, Wei-Yang Lin³•Institutions (3)

National Central University¹, National Sun Yat-sen University², National Chung Cheng University³

01 Dec 2009-Expert Systems With Applications

TL;DR: This chapter reviews 55 related studies in the period between 2000 and 2007 focusing on developing single, hybrid, and ensemble classifiers and discusses current achievements and limitations in developing intrusion detection systems by machine learning.

Abstract: The growing use of the Internet carries the risk of network attacks. Intrusion detection is a major research problem in network security; its aim is to identify unusual access or attacks in order to secure internal networks. In the literature, intrusion detection systems have been approached with various machine learning techniques. However, there is no review paper that examines the current status of using machine learning techniques to solve intrusion detection problems. This chapter reviews 55 related studies in the period between 2000 and 2007 focusing on developing single, hybrid, and ensemble classifiers. Related studies are compared by their classifier design, datasets used, and other experimental setups. Current achievements and limitations in developing intrusion detection systems by machine learning are presented and discussed. A number of future research directions are also provided.

872 citations

Journal Article•DOI•

A green supplier selection model for high-tech industry

Amy H. I. Lee¹, He-Yau Kang², Chang-Fu Hsu³, Hsiao-Chu Hung¹•Institutions (3)

Chung Hua University¹, National Chin-Yi University of Technology², National Chiao Tung University³

Review: Neural networks and statistical techniques: A review of applications

TL;DR: With the proposed model, manufacturers can have a better understanding of the capabilities that a green supplier must possess and can evaluate and select the most suitable green supplier for cooperation.

Abstract: With growing worldwide awareness of environmental protection, green production has become an important issue for almost every manufacturer and will determine the sustainability of a manufacturer in the long term. A performance evaluation system for green suppliers thus is necessary to determine the suitability of suppliers to cooperate with the firm. While the works on the evaluation and/or selection of suppliers are abundant, those that concern environmental issues are rather limited. Therefore, in this study, a model for evaluating green suppliers is proposed. The Delphi method is applied first to differentiate the criteria for evaluating traditional suppliers and green suppliers. A hierarchy is constructed next to help evaluate the importance of the selected criteria and the performance of green suppliers. Since experts may not identify the importance of factors clearly, the results of questionnaires may be biased. To consider the vagueness of experts' opinions, the fuzzy extended analytic hierarchy process is exploited. With the proposed model, manufacturers can have a better understanding of the capabilities that a green supplier must possess and can evaluate and select the most suitable green supplier for cooperation.

735 citations

Journal Article•DOI•

Mukta Paliwal¹, Usha A. Kumar¹•Institutions (1)

Indian Institutes of Technology¹

01 Jan 2009-Expert Systems With Applications

TL;DR: A comprehensive review of articles that involve a comparative study of feed forward neural networks and statistical techniques used for prediction and classification problems in various areas of applications is carried out.

Abstract: Neural networks are being used in areas of prediction and classification, the areas where statistical methods have traditionally been used. Both the traditional statistical methods and neural networks are looked upon as competing model-building techniques in literature. This paper carries out a comprehensive review of articles that involve a comparative study of feed forward neural networks and statistical techniques used for prediction and classification problems in various areas of applications. Tabular presentations highlighting the important features of these articles are also provided. This study aims to give useful insight into the capabilities of neural networks and statistical methods used in different kinds of applications.

731 citations

Journal Article•DOI•

Support vector machines combined with feature selection for breast cancer diagnosis

Mehmet Fatih Akay¹•Institutions (1)

Çukurova University¹

Surveying stock market forecasting techniques - Part II: Soft computing methods

TL;DR: The results show that the highest classification accuracy (99.51%) is obtained for the SVM model that contains five features, and this is very promising compared to the previously reported results.

Abstract: Breast cancer is the second largest cause of cancer deaths among women. At the same time, it is also among the most curable cancer types if it can be diagnosed early. Research has increasingly confirmed that support vector machines (SVM) offer greater diagnostic accuracy. In this paper, breast cancer diagnosis based on an SVM-based method combined with feature selection is proposed. Experiments have been conducted on different training-test partitions of the Wisconsin breast cancer dataset (WBCD), which is commonly used among researchers who use machine learning methods for breast cancer diagnosis. The performance of the method is evaluated using classification accuracy, sensitivity, specificity, positive and negative predictive values, receiver operating characteristic (ROC) curves and the confusion matrix. The results show that the highest classification accuracy (99.51%) is obtained for the SVM model that contains five features, and this is very promising compared to the previously reported results.

723 citations

Journal Article•DOI•

George S. Atsalakis¹, Kimon P. Valavanis²•Institutions (2)

Technical University of Crete¹, University of South Florida²

The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients

TL;DR: This paper surveys more than 100 related published articles that focus on neural and neuro-fuzzy techniques derived and applied to forecast stock markets, showing that soft computing techniques are widely accepted for studying and evaluating stock market behavior.

Abstract: The key to successful stock market forecasting is achieving the best results with the minimum required input data. Given stock market model uncertainty, soft computing techniques are viable candidates for capturing nonlinear stock market relations and returning significant forecasting results without necessarily requiring prior knowledge of the statistical distributions of the input data. This paper surveys more than 100 related published articles that focus on neural and neuro-fuzzy techniques derived and applied to forecast stock markets. Classifications are made in terms of input data, forecasting methodology, performance evaluation and performance measures used. Through the surveyed papers, it is shown that soft computing techniques are widely accepted for studying and evaluating stock market behavior.

714 citations

Journal Article•DOI•

I-Cheng Yeh¹, Che-hui Lien²•Institutions (2)

Chung Hua University¹, Thompson Rivers University²

Determinants of behavioral intention to mobile banking

TL;DR: Among the six data mining techniques, artificial neural network is the only one that can accurately estimate the real probability of default, and its regression intercept is close to zero, and regression coefficient to one.

Abstract: This research examines customers' default payments in Taiwan and compares the predictive accuracy of the probability of default among six data mining methods. From the perspective of risk management, the predictive accuracy of the estimated probability of default is more valuable than the binary classification result (credible or not credible clients). Because the real probability of default is unknown, this study presents the novel "Sorting Smoothing Method" to estimate the real probability of default. With the real probability of default as the response variable (Y) and the predicted probability of default as the independent variable (X), the simple linear regression result (Y = A + BX) shows that the forecasting model produced by the artificial neural network has the highest coefficient of determination; its regression intercept (A) is close to zero and its regression coefficient (B) is close to one. Therefore, among the six data mining techniques, the artificial neural network is the only one that can accurately estimate the real probability of default.
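The regression check described in this abstract (fit Y = A + BX and look for A near zero, B near one, and a high coefficient of determination) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `calibration_fit` is a hypothetical helper name, and the "real" probabilities are taken as given, since the Sorting Smoothing Method used in the paper to estimate them is not reproduced here.

```python
import numpy as np

def calibration_fit(predicted, actual):
    """Fit actual = A + B * predicted by least squares; return (A, B, R^2)."""
    x = np.asarray(predicted, dtype=float)
    y = np.asarray(actual, dtype=float)
    B, A = np.polyfit(x, y, 1)          # polyfit returns slope first, then intercept
    residuals = y - (A + B * x)
    ss_res = (residuals ** 2).sum()     # residual sum of squares
    ss_tot = ((y - y.mean()) ** 2).sum()  # total sum of squares
    return A, B, 1.0 - ss_res / ss_tot
```

A model whose predicted probabilities match the estimated real ones exactly would give A = 0, B = 1 and R² = 1; the further a model's fit departs from these values, the less its scores can be read directly as default probabilities.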

713 citations

Journal Article•DOI•

Ja-Chul Gu¹, Sang Chul Lee, Yung-Ho Suh²•Institutions (2)

KAIST¹, Kyung Hee University²

01 Nov 2009-Expert Systems With Applications

TL;DR: This study found that self-efficiency was the strongest antecedent of perceived ease-of-use, which directly and indirectly affected behavioral intention through perceived usefulness in mobile banking.

Abstract: With the improvement of mobile technologies and devices, banking users are able to conduct banking services anywhere and at any time. Recently, many banks around the world have provided mobile access to financial information. Understanding what factors contribute to users' intention to use mobile banking is therefore an important research issue. The purpose of this research is to examine and validate determinants of users' intention to use mobile banking. This research used structural equation modeling (SEM) to test the causalities in the proposed model. The results indicated strong support for the validity of the proposed model, which explained 72.2% of the variance in behavioral intention to use mobile banking. This study found that self-efficiency was the strongest antecedent of perceived ease-of-use, which directly and indirectly affected behavioral intention through perceived usefulness in mobile banking. Structural assurances are the strongest antecedent of trust, which could increase behavioral intention to use mobile banking. This research verified the effect of perceived usefulness, trust and perceived ease-of-use on behavioral intention in mobile banking. The results have several implications for mobile banking managers.

711 citations

Journal Article•DOI•

Developing a fuzzy TOPSIS approach based on subjective weights and objective weights

Tien-Chin Wang¹, Hsien-Da Lee¹•Institutions (1)

I-Shou University¹

01 Jul 2009-Expert Systems With Applications

TL;DR: This work proposes a new fuzzy TOPSIS for evaluating alternatives by integrating subjective and objective weights, and adopts end-user ratings as an objective weight based on Shannon's entropy theory.

Abstract: Multiple criteria decision making (MCDM) is widely used in ranking one or more alternatives from a set of available alternatives with respect to multiple criteria. Inspired by MCDM to systematically evaluate alternatives under various criteria, we propose a new fuzzy TOPSIS for evaluating alternatives by integrating subjective and objective weights. Most MCDM approaches consider only decision makers' subjective weights. However, the end-user attitude can be a key factor. We propose a novel approach that involves the end-user in the whole decision-making process. In this proposed approach, the subjective weights assigned by decision makers (DM) are normalized into a comparable scale. In addition, we also adopt end-user ratings as an objective weight based on Shannon's entropy theory. A closeness coefficient is defined to determine the ranking order of alternatives by calculating the distances to both ideal and negative-ideal solutions. A case study is performed showing how the proposed method can be used for a software outsourcing problem. With our method, we provide decision makers with more information to make more subtle decisions.
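Two ingredients named in this abstract, Shannon-entropy objective weights and the TOPSIS closeness coefficient, can be illustrated with a small Python sketch. It is deliberately simplified and makes several assumptions that are not in the abstract: ratings are crisp positive numbers rather than fuzzy numbers, every criterion is a benefit criterion, only the entropy weights are used (the paper also integrates subjective weights), and the function names are hypothetical.

```python
import numpy as np

def entropy_weights(X):
    """Shannon-entropy weights for a (alternatives x criteria) matrix of positive ratings."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    P = X / X.sum(axis=0)                      # column-wise proportions
    E = -(P * np.log(P)).sum(axis=0) / np.log(m)  # entropy per criterion, in [0, 1]
    D = 1.0 - E                                # degree of diversification
    return D / D.sum()                         # assumes at least one non-constant column

def closeness_coefficients(X, weights):
    """Crisp TOPSIS closeness coefficient for each alternative (benefit criteria only)."""
    X = np.asarray(X, dtype=float)
    V = weights * X / np.linalg.norm(X, axis=0)   # weighted, vector-normalised ratings
    ideal, anti_ideal = V.max(axis=0), V.min(axis=0)
    d_plus = np.linalg.norm(V - ideal, axis=1)    # distance to the ideal solution
    d_minus = np.linalg.norm(V - anti_ideal, axis=1)  # distance to the negative-ideal
    return d_minus / (d_plus + d_minus)
```

A criterion on which all alternatives score the same carries no information, so its entropy weight comes out at essentially zero. Alternatives are then ranked by descending closeness coefficient, with 1 meaning the alternative coincides with the ideal solution.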

Journal Article•DOI•

Weapon selection using the AHP and TOPSIS methods under fuzzy environment

Metin Dağdeviren¹, Serkan Yavuz, Nevzat Kılınç•Institutions (1)

Gazi University¹

Automatic detection of epileptic seizures in EEG using discrete wavelet transform and approximate entropy

TL;DR: An evaluation model based on the analytic hierarchy process (AHP) and the technique for order performance by similarity to ideal solution (TOPSIS) to help the actors in defence industries for the selection of optimal weapon in a fuzzy environment where the vagueness and subjectivity are handled with linguistic values parameterized by triangular fuzzy numbers.

Abstract: The weapon selection problem is a strategic issue and has a significant impact on the efficiency of defense systems. On the other hand, selecting the optimal weapon among many alternatives is a multi-criteria decision-making (MCDM) problem. This paper develops an evaluation model based on the analytic hierarchy process (AHP) and the technique for order performance by similarity to ideal solution (TOPSIS), to help the actors in defence industries for the selection of optimal weapon in a fuzzy environment where the vagueness and subjectivity are handled with linguistic values parameterized by triangular fuzzy numbers. The AHP is used to analyze the structure of the weapon selection problem and to determine weights of the criteria, and fuzzy TOPSIS method is used to obtain final ranking. A real world application is conducted to illustrate the utilization of the model for the weapon selection problem. The application could be interpreted as demonstrating the effectiveness and feasibility of the proposed model.

Journal Article•DOI•

Hasan Ocak¹•Institutions (1)

Kocaeli University¹

Online-SVR for short-term traffic flow prediction under typical and atypical traffic conditions

TL;DR: It was shown that epileptic EEG had significant nonlinearity whereas normal EEG behaved similarly to a Gaussian linear stochastic process.

Abstract: In this study, a new scheme was presented for detecting epileptic seizures from electroencephalogram (EEG) data recorded from normal subjects and epileptic patients. The new scheme was based on approximate entropy (ApEn) and discrete wavelet transform (DWT) analysis of EEG signals. Seizure detection was accomplished in two stages. In the first stage, EEG signals were decomposed into approximation and detail coefficients using DWT. In the second stage, ApEn values of the approximation and detail coefficients were computed. Significant differences were found between the ApEn values of the epileptic and the normal EEG, allowing seizures to be detected with over 96% accuracy. Without DWT as a preprocessing step, the detection rate was reduced to 73%. The analysis results showed that during seizure activity EEG had lower ApEn values compared to normal EEG. This suggested that epileptic EEG was more predictable or less complex than the normal EEG. The data were further analyzed with surrogate data analysis methods to test for evidence of nonlinearities. It was shown that epileptic EEG had significant nonlinearity whereas normal EEG behaved similarly to a Gaussian linear stochastic process.
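To make the notion of approximate entropy concrete, the following Python sketch computes ApEn for a one-dimensional signal using the standard definition (difference of the average log match frequency for templates of length m and m + 1). It is an illustration only: the DWT decomposition stage is omitted, and the defaults m = 2 and r = 0.2 times the standard deviation are common conventions assumed here, not values taken from the abstract.

```python
import numpy as np

def approximate_entropy(x, m=2, r=None):
    """Approximate entropy of a 1-D signal (m and r defaults are assumptions)."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()  # tolerance as a fraction of the signal's spread

    def phi(length):
        n = len(x) - length + 1
        # All overlapping templates of the given length.
        templates = np.array([x[i:i + length] for i in range(n)])
        # Chebyshev distance between every pair of templates.
        dist = np.abs(templates[:, None, :] - templates[None, :, :]).max(axis=2)
        # Fraction of templates within r of each template (self-match included).
        counts = (dist <= r).mean(axis=1)
        return np.log(counts).mean()

    return phi(m) - phi(m + 1)
```

Lower values indicate a more regular, more predictable signal, which is the sense in which the abstract describes seizure EEG as less complex than normal EEG. The pairwise distance matrix uses memory quadratic in the signal length, so this form suits short segments only.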

Journal Article•DOI•

Manoel Mendonca de Castro-Neto¹, Young-Seon Jeong², Myong-Kee Jeong², Lee D. Han¹•Institutions (2)

University of Tennessee¹, Rutgers University²

Pohang University of Science and Technology¹

TL;DR: The OL-SVR model is compared with three well-known prediction models including Gaussian maximum likelihood (GML), Holt exponential smoothing, and artificial neural net models and suggests that GML, which relies heavily on the recurring characteristics of day-to-day traffic, performs slightly better than other models under typical traffic conditions, as demonstrated by previous studies.

Abstract: Most literature on short-term traffic flow forecasting focused mainly on normal, or non-incident, conditions and, hence, limited their applicability when traffic flow forecasting is most needed, i.e., incident and atypical conditions. Accurate prediction of short-term traffic flow under atypical conditions, such as vehicular crashes, inclement weather, work zone, and holidays, is crucial to effective and proactive traffic management systems in the context of intelligent transportation systems (ITS) and, more specifically, dynamic traffic assignment (DTA). To this end, this paper presents an application of a supervised statistical learning technique called Online Support Vector machine for Regression, or OL-SVR, for the prediction of short-term freeway traffic flow under both typical and atypical conditions. The OL-SVR model is compared with three well-known prediction models including Gaussian maximum likelihood (GML), Holt exponential smoothing, and artificial neural net models. The resultant performance comparisons suggest that GML, which relies heavily on the recurring characteristics of day-to-day traffic, performs slightly better than other models under typical traffic conditions, as demonstrated by previous studies. Yet OL-SVR is the best performer under non-recurring atypical traffic conditions. It appears that for deployed ITS systems that gear toward timely response to real-world atypical and incident situations, OL-SVR may be a better tool than GML.

Journal Article•DOI•

Context-aware systems

Jong-Yi Hong¹, Euiho Suh¹, Sungjin Kim¹•Institutions (1)

Sentiment classification of online reviews to travel destinations by supervised machine learning approaches

TL;DR: The goal of this paper is to review the works that were published in journals, suggest a new classification framework of context-aware systems, and explore each feature of classification framework using a keyword index and article title search.

Abstract: Nowadays, numerous journals and conferences have published articles related to context-aware systems, indicating many researchers' interest. Therefore, the goal of this paper is to review the works that were published in journals, suggest a new classification framework of context-aware systems, and explore each feature of classification framework. This paper is based on a literature review of context-aware systems from 2000 to 2007 using a keyword index and article title search. The classification framework is developed based on the architecture of context-aware systems, which consists of the following five layers: concept and research layer, network layer, middleware layer, application layer and user infrastructure layer. The articles are categorized based on the classification framework. This paper allows researchers to extract several lessons learned that are important for the implementation of context-aware systems.

Journal Article•DOI•

Qiang Ye¹, Ziqiong Zhang², Rob Law¹•Institutions (2)

Hong Kong Polytechnic University¹, Harbin Institute of Technology²

Cluster-based under-sampling approaches for imbalanced data distributions

TL;DR: This research compared three supervised machine learning algorithms of Naive Bayes, SVM and the character based N-gram model for sentiment classification of the reviews on travel blogs for seven popular travel destinations in the US and Europe.

Abstract: The rapid growth in Internet applications in tourism has led to an enormous amount of personal reviews for travel-related information on the Web. These reviews can appear in different forms like BBS, blogs, Wiki or forum websites. More importantly, the information in these reviews is valuable to both travelers and practitioners for various understanding and planning processes. An intrinsic problem of the overwhelming information on the Internet, however, is information overload, as users are simply unable to read all the available information. Query functions in search engines like Yahoo and Google can help users find some of the reviews that they need about specific destinations. The returned pages from these search engines are still beyond the visual capacity of humans. In this research, sentiment classification techniques were incorporated into the domain of mining reviews from travel blogs. Specifically, we compared three supervised machine learning algorithms, Naive Bayes, SVM and the character-based N-gram model, for sentiment classification of the reviews on travel blogs for seven popular travel destinations in the US and Europe. Empirical findings indicated that the SVM and N-gram approaches outperformed the Naive Bayes approach, and that when training datasets had a large number of reviews, all three approaches reached accuracies of at least 80%.

Journal Article•DOI•

Show-Jane Yen¹, Yue-Shi Lee¹•Institutions (1)

Ming Chuan University¹

Risk evaluation in failure mode and effects analysis using fuzzy weighted geometric mean

TL;DR: Cluster-based under-sampling approaches for selecting the representative data as training data to improve the classification accuracy for the minority class are proposed, and the experimental results show that these approaches outperform the other under-sampling techniques in previous studies.

Abstract: For classification problems, the training data significantly influence the classification accuracy. However, the data in real-world applications often have an imbalanced class distribution, that is, most of the data are in the majority class and little data are in the minority class. In this case, if all the data are used as training data, the classifier tends to predict that most of the incoming data belong to the majority class. Hence, it is important to select suitable training data for classification in the imbalanced class distribution problem. In this paper, we propose cluster-based under-sampling approaches for selecting the representative data as training data to improve the classification accuracy for the minority class and investigate the effect of under-sampling methods in the imbalanced class distribution environment. The experimental results show that our cluster-based under-sampling approaches outperform the other under-sampling techniques in previous studies.

Journal Article•DOI•

Ying-Ming Wang¹, Kwai-Sang Chin¹, Gary Ka Kwai Poon¹, Jian-Bo Yang²•Institutions (2)

City University of Hong Kong¹, University of Manchester²

Least squares twin support vector machines for pattern classification

TL;DR: Fuzzy risk priority numbers (FRPNs) are proposed for prioritization of failure modes, defined as fuzzy weighted geometric means of the fuzzy ratings for O, S and D, and can be computed using alpha-level sets and linear programming models.

Abstract: Failure mode and effects analysis (FMEA) has been extensively used for examining potential failures in products, processes, designs and services. An important issue of FMEA is the determination of risk priorities of the failure modes that have been identified. The traditional FMEA determines the risk priorities of failure modes using the so-called risk priority numbers (RPNs), which require the risk factors like the occurrence (O), severity (S) and detection (D) of each failure mode to be precisely evaluated. This may not be realistic in real applications. In this paper we treat the risk factors O, S and D as fuzzy variables and evaluate them using fuzzy linguistic terms and fuzzy ratings. As a result, fuzzy risk priority numbers (FRPNs) are proposed for prioritization of failure modes. The FRPNs are defined as fuzzy weighted geometric means of the fuzzy ratings for O, S and D, and can be computed using alpha-level sets and linear programming models. For ranking purpose, the FRPNs are defuzzified using centroid defuzzification method, in which a new centroid defuzzification formula based on alpha-level sets is derived. A numerical example is provided to illustrate the potential applications of the proposed fuzzy FMEA and the detailed computational process of the FRPNs.

Journal Article•DOI•

M. Arun Kumar¹, M. Gopal¹•Institutions (1)

Indian Institute of Technology Delhi¹

Feature selection for text classification with Naïve Bayes

TL;DR: A least squares version of the recently proposed twin support vector machine (TSVM) for binary classification has comparable classification accuracy to that of TSVM but with considerably less computational time.

Abstract: In this paper we formulate a least squares version of the recently proposed twin support vector machine (TSVM) for binary classification. This formulation leads to an extremely simple and fast algorithm for generating binary classifiers based on two non-parallel hyperplanes. Here we attempt to solve two modified primal problems of TSVM, instead of the two dual problems usually solved. We show that the solution of the two modified primal problems reduces to solving just two systems of linear equations, as opposed to solving two quadratic programming problems along with two systems of linear equations in TSVM. Classification using a nonlinear kernel also leads to systems of linear equations. Our experiments on publicly available datasets indicate that the proposed least squares TSVM has comparable classification accuracy to that of TSVM but with considerably less computational time. Since linear least squares TSVM can easily handle large datasets, we further investigated its efficiency for text categorization applications. Computational results demonstrate the effectiveness of the proposed method over linear proximal SVM on all the text corpora considered.

Journal Article•DOI•

Jingnian Chen¹, Houkuan Huang¹, Shengfeng Tian¹, Youli Qu¹•Institutions (1)

Beijing Jiaotong University¹

Using EEG spectral components to assess algorithms for detecting fatigue

TL;DR: Two feature evaluation metrics for the Naive Bayesian classifier applied on multi-class text datasets are presented: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM).

Abstract: As an important preprocessing technology in text classification, feature selection can improve the scalability, efficiency and accuracy of a text classifier. In general, a good feature selection method should consider domain and algorithm characteristics. Because the Naive Bayesian classifier is simple, efficient, and highly sensitive to feature selection, research on feature selection designed specifically for it is significant. This paper presents two feature evaluation metrics for the Naive Bayesian classifier applied on multi-class text datasets: Multi-class Odds Ratio (MOR), and Class Discriminating Measure (CDM). Experiments of text classification with Naive Bayesian classifiers were carried out on two multi-class text collections. As the results indicate, CDM and MOR achieve clearly better selection performance than other feature selection approaches.

Journal Article•DOI•

Budi Thomas Jap¹, Sara Lal¹, Peter Fischer, Evangelos Bekiaris•Institutions (1)

University of Technology, Sydney¹

Effective diagnosis of heart disease through neural networks ensembles

TL;DR: Assessment of electroencephalography activities during a monotonous driving session showed stable delta and theta activities over time, a slight decrease of alpha activity, and a significant decrease of beta activity, which can be used for future development of fatigue countermeasure devices.

Abstract: Fatigue is a constant occupational hazard for drivers and it greatly reduces efficiency and performance when one persists in continuing the current activity. Studies have investigated various physiological associations with fatigue to try to identify fatigue indicators. The current study assessed the four electroencephalography (EEG) activities, delta (δ), theta (θ), alpha (α) and beta (β), during a monotonous driving session in 52 subjects (36 males and 16 females). The performance of four algorithms, namely algorithm (i) (θ+α)/β, algorithm (ii) α/β, algorithm (iii) (θ+α)/(α+β), and algorithm (iv) θ/β, was also assessed as possible indicators for fatigue detection. Results showed stable delta and theta activities over time, a slight decrease of alpha activity, and a significant decrease of beta activity (p<0.05). All four algorithms showed an increase in the ratio of slow wave to fast wave EEG activities over time. Algorithm (i) (θ+α)/β showed a larger increase. The results have implications for detecting fatigue. Impact on industry: The results of this research have implications for detecting fatigue and can be used for future development of fatigue countermeasure devices.
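As a small illustration, the four slow-wave to fast-wave indices can be computed directly from per-band power values. This is only a sketch: it assumes the ratios are (θ+α)/β, α/β, (θ+α)/(α+β) and θ/β, and the function name and the band-power numbers are hypothetical, not measurements from the study.

```python
def fatigue_ratios(theta, alpha, beta):
    """Return the four slow-wave/fast-wave ratios from band powers."""
    return {
        "i": (theta + alpha) / beta,             # (theta + alpha) / beta
        "ii": alpha / beta,                      # alpha / beta
        "iii": (theta + alpha) / (alpha + beta), # (theta + alpha) / (alpha + beta)
        "iv": theta / beta,                      # theta / beta
    }

# Hypothetical band powers: theta stable, alpha slightly lower, beta clearly lower.
start = fatigue_ratios(theta=4.0, alpha=2.0, beta=2.0)
end = fatigue_ratios(theta=4.0, alpha=1.9, beta=1.0)
for name in start:
    print(name, start[name], end[name])
```

With these made-up numbers every ratio rises as beta power drops, which mirrors the direction of change the abstract reports.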

Journal Article•DOI•

Resul Das¹, Ibrahim Turkoglu¹, Abdulkadir Sengur¹•Institutions (1)

Fırat University¹

Evaluation of hazardous waste transportation firms by using a two step fuzzy-AHP and TOPSIS methodology

TL;DR: A methodology that uses SAS base software 9.1.3 to diagnose heart disease with a neural network ensemble method, which creates new models by combining the posterior probabilities or the predicted values from multiple predecessor models.

Abstract: In recent decades, researchers have proposed several tools and methodologies for developing effective medical decision support systems, and new methodologies and tools continue to be developed. Diagnosing heart disease is an important issue, and many researchers have worked to develop intelligent medical decision support systems to improve the ability of physicians. In this paper, we introduce a methodology that uses SAS base software 9.1.3 for diagnosing heart disease. A neural network ensemble method is at the centre of the proposed system. This ensemble-based method creates new models by combining the posterior probabilities or the predicted values from multiple predecessor models, so more effective models can be created. We performed experiments with the proposed tool and obtained 89.01% classification accuracy on data taken from the Cleveland heart disease database. We also obtained sensitivity and specificity values of 80.95% and 95.91%, respectively, in heart disease diagnosis.
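Combining posterior probabilities from several models, and then scoring the result by sensitivity and specificity, can be sketched as follows. The plain average, the 0.5 decision threshold, the function names and the toy numbers are all assumptions for illustration; the abstract does not say how the SAS-based system weights its predecessor models.

```python
import numpy as np

def ensemble_posteriors(posteriors):
    """Average P(disease) across models; rows are models, columns are patients."""
    return np.mean(np.asarray(posteriors, dtype=float), axis=0)

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical posteriors from three predecessor models for four patients.
posteriors = [
    [0.9, 0.4, 0.2, 0.6],
    [0.8, 0.6, 0.1, 0.3],
    [0.7, 0.7, 0.3, 0.3],
]
combined = ensemble_posteriors(posteriors)
y_pred = (combined >= 0.5).astype(int)  # assumed decision threshold
y_true = [1, 1, 0, 1]                   # made-up ground truth
print(y_pred, sensitivity_specificity(y_true, y_pred))
```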

Journal Article•DOI•

Alev Taskin Gumus¹•Institutions (1)

Yıldız Technical University¹

A systematic review of software fault prediction studies

TL;DR: In this paper, a two step methodology is structured to evaluate hazardous waste transportation firms containing the methods of fuzzy-AHP and TOPSIS.

Abstract: Hazardous wastes are likely to cause danger to human health and/or the environment, so their safe transportation is important. Consequently, selecting the most appropriate transportation firm is an important problem for hazardous waste generators. In this paper, a two-step methodology is structured to evaluate hazardous waste transportation firms using the fuzzy-AHP and TOPSIS methods. A numerical example is presented to clarify the methodology.

Journal Article•DOI•

Cagatay Catal, Banu Diri¹•Institutions (1)

Yıldız Technical University¹

Performance evaluation of Turkish cement firms with fuzzy analytic hierarchy process and TOPSIS methods

TL;DR: A systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets is provided in this paper, where the authors used 74 studies in 11 journals and several conference proceedings.

Abstract: This paper provides a systematic review of previous software fault prediction studies with a specific focus on metrics, methods, and datasets. The review uses 74 software fault prediction papers in 11 journals and several conference proceedings. According to the review results, the usage percentage of public datasets increased significantly and the usage percentage of machine learning algorithms increased slightly since 2005. In addition, method-level metrics are still the most dominant metrics in the fault prediction research area and machine learning algorithms are still the most popular methods for fault prediction. Researchers working in the software fault prediction area should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics is below acceptable levels, and they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.

Journal Article•DOI•

İrfan Ertuğrul¹, Nilsen Karakaşoğlu¹•Institutions (1)

Pamukkale University¹

01 Jan 2009-Expert Systems With Applications

TL;DR: A fuzzy model is developed to evaluate the performance of firms by using financial ratios while taking the subjective judgments of decision makers into consideration.

Abstract: In today's competitive environment, evaluating firms' performance properly is an important issue not only for investors and creditors but also for firms in the same sector. Determining the competitiveness of firms and evaluating their financial performance is also crucial for the sector's development. The aim of this study is to develop a fuzzy model that evaluates the performance of firms by using financial ratios while also taking the subjective judgments of decision makers into consideration. The proposed approach is based on the Fuzzy Analytic Hierarchy Process (FAHP) and TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) methods. The FAHP method is used by decision makers to determine the weights of the criteria, and the rankings of the firms are then determined by the TOPSIS method. The proposed method is used to evaluate the performance of fifteen Turkish cement firms listed on the Istanbul Stock Exchange by using their financial tables. The firms are then ranked according to their results.
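The ranking step can be sketched with the classical (crisp) TOPSIS procedure: normalise the decision matrix, apply criterion weights, and score each alternative by its relative closeness to the ideal solution. In this sketch the weights are simply passed in as numbers; in the study they would come from FAHP, which is not reproduced here. The function name, the vector normalisation choice, and the toy matrix are assumptions for illustration.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """Return closeness scores in [0, 1]; higher means closer to the ideal.

    matrix  : alternatives x criteria values
    weights : criterion weights (rescaled to sum to 1)
    benefit : True where larger is better, False where smaller is better
    """
    X = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(benefit, dtype=bool)
    V = (X / np.linalg.norm(X, axis=0)) * w       # vector-normalised, weighted
    ideal = np.where(benefit, V.max(axis=0), V.min(axis=0))
    anti = np.where(benefit, V.min(axis=0), V.max(axis=0))
    d_pos = np.linalg.norm(V - ideal, axis=1)     # distance to ideal solution
    d_neg = np.linalg.norm(V - anti, axis=1)      # distance to anti-ideal solution
    return d_neg / (d_pos + d_neg)

# Hypothetical data: three firms, one benefit ratio and one cost ratio.
scores = topsis([[3.0, 1.0], [2.0, 2.0], [1.0, 3.0]],
                weights=[0.5, 0.5], benefit=[True, False])
ranking = np.argsort(-scores)  # indices of firms from best to worst
print(scores, ranking)
```

A firm that is best on every criterion coincides with the ideal solution and scores 1, while one that is worst on every criterion scores 0, so the closeness value gives a direct ordering of the firms.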

Journal Article•DOI•

Review: A review of machine learning approaches to Spam filtering

Thiago Guzella¹, Walmir Matos Caminhas¹•Institutions (1)

Universidade Federal de Minas Gerais¹

01 Sep 2009-Expert Systems With Applications

TL;DR: A comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches concludes that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

Abstract: In this paper, we present a comprehensive review of recent developments in the application of machine learning algorithms to Spam filtering, focusing on both textual- and image-based approaches. Instead of considering Spam filtering as a standard classification problem, we highlight the importance of considering specific characteristics of the problem, especially concept drift, in designing new filters. Two particularly important aspects not widely recognized in the literature are discussed: the difficulties in updating a classifier based on the bag-of-words representation and a major difference between two early naive Bayes models. Overall, we conclude that while important advancements have been made in the last years, several aspects remain to be explored, especially under more realistic evaluation settings.

Journal Article•DOI•

Handling class imbalance in customer churn prediction

Jonathan Burez¹, D Van den Poel¹•Institutions (1)

Ghent University¹

An expert system for detection of breast cancer based on association rules and neural network

TL;DR: It is found that there is no need to under-sample so that there are as many churners in your training set as non churners, and under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC.

Abstract: Customer churn is often a rare event in service industries, but of great interest and great value. Until recently, however, class imbalance has not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how we can better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric compared to accuracy. Lift is very much related to accuracy, but has the advantage of being well used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press], we find that there is no need to under-sample so that there are as many churners in your training set as non churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, performs significantly better compared to random forests, and is therefore advised. It should, however, always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique.
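Two ideas from this abstract lend themselves to a short sketch: random under-sampling of non-churners to a chosen churner share (which need not be 50%), and AUC as a threshold-free metric. The code below is an illustrative assumption, not the authors' procedure; the function names are hypothetical, and AUC is computed with the standard pairwise (Mann-Whitney) definition.

```python
import random

def undersample(labels, churn_fraction, seed=0):
    """Keep every churner (label 1) and a random subset of non-churners (label 0)
    so that churners make up roughly churn_fraction of the returned indices."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    wanted = round(len(pos) * (1 - churn_fraction) / churn_fraction)
    return sorted(pos + rng.sample(neg, min(len(neg), wanted)))

def auc(labels, scores):
    """Probability that a random churner is scored above a random non-churner;
    ties count as one half. No classification threshold is involved."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical data: 5 churners among 100 customers, resampled to a 25% churn share.
labels = [1] * 5 + [0] * 95
kept = undersample(labels, churn_fraction=0.25)
print(len(kept), sum(labels[i] for i in kept))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Because `auc` only compares the ordering of scores, it is unaffected by where a decision threshold is placed, which is the property the abstract cites when preferring it to accuracy.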

Journal Article•DOI•

Murat Karabatak¹, M. Cevdet Ince¹•Institutions (1)

Fırat University¹