
Proceedings ArticleDOI

Improved Accuracy of Naive Bayes Classifier for Determination of Customer Churn Uses SMOTE and Genetic Algorithms

06 Oct 2020-Vol. 1, Iss: 1, pp 70-75

TL;DR: The purpose of this study is to improve the accuracy of the Naive Bayes classifier for customer classification, using SMOTE to handle the class imbalance problem and a genetic algorithm for attribute selection.

Abstract: With increasing competition in the business world, many companies use data mining techniques to determine the level of customer loyalty. The customer data used in this study is the German credit dataset obtained from UCI. This data has a class imbalance problem because the loyal class contains far more records than the churn class. In addition, some attributes are irrelevant for customer classification, so attribute selection is needed to obtain more accurate classification results. One classification algorithm is Naive Bayes, which has been used as an effective classifier for years because it is easy to build and treats attributes as independent in its structure. The purpose of this study is to improve the accuracy of Naive Bayes for customer classification. SMOTE and a genetic algorithm are applied to this end: SMOTE handles the class imbalance problem, while the genetic algorithm performs attribute selection. Accuracy using Naive Bayes alone is 47.10%, the mean accuracy of Naive Bayes with SMOTE is 78.15%, and the accuracy of Naive Bayes with both SMOTE and the genetic algorithm is 78.46%.
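The pipeline the abstract describes can be sketched roughly as follows: rebalance the classes with SMOTE, then search for a good attribute subset with a genetic algorithm scored by Naive Bayes accuracy. This is a minimal illustration, not the paper's exact configuration; the synthetic data, population size, and GA operators below are all assumptions.

```python
# Sketch: SMOTE + GA attribute selection + Gaussian Naive Bayes.
# Synthetic data stands in for the UCI German credit dataset.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Mean cross-validated Naive Bayes accuracy on the selected attributes.
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=5).mean()

def ga_select(X, y, pop_size=20, generations=10, p_mut=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5              # random initial masks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        elite = pop[np.argsort(scores)[::-1][: pop_size // 2]]
        # Uniform crossover between random elite parents, then bit-flip mutation.
        parents = elite[rng.integers(0, len(elite), (pop_size, 2))]
        cross = rng.random((pop_size, n)) < 0.5
        pop = np.where(cross, parents[:, 0], parents[:, 1])
        pop ^= rng.random((pop_size, n)) < p_mut
    scores = np.array([fitness(ind, X, y) for ind in pop])
    return pop[scores.argmax()]

X, y = make_classification(n_samples=300, n_features=20, weights=[0.85],
                           random_state=0)               # imbalanced stand-in data
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)  # rebalance the classes
best_mask = ga_select(X_res, y_res)                      # GA attribute selection
print(cross_val_score(GaussianNB(), X_res[:, best_mask], y_res, cv=5).mean())
```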

Topics: Naive Bayes classifier (60%), Bayes' theorem (52%)


Citations

Journal ArticleDOI
31 Mar 2021
TL;DR: The main dimensions of logistic service quality for improving service quality are ordering condition, time, and information quality; these can serve as a basis for companies' decisions when prioritizing alternative criteria.
Abstract: Logistics plays a role in smooth transactions between companies because it facilitates the buying and selling of goods and services to fulfill the supply orders of consumer companies. This study aims to analyze the impact of improved Logistic Service Quality (LSQ) on the quality of goods delivery services, using LSQ dimensions from previous research. Sample data were obtained by distributing questionnaires, then processed quantitatively with convergent validity and reliability tests on a sample of 61 respondents. The results show that the main dimensions of logistic service quality for improving service quality are ordering condition, time, and information quality. Each comparison factor is tested for consistency using the Analytical Hierarchy Process (AHP); each of the main criteria has a consistency value of less than 0.1, so the main criteria tested have consistent comparison matrices and can serve as a basis for companies' decisions when prioritizing alternative criteria.
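As a concrete illustration of the AHP consistency check mentioned here: the consistency ratio CR = CI / RI, with CI = (λmax − n) / (n − 1), should stay below 0.1. A small sketch follows; the pairwise comparison matrix is invented for illustration, not taken from the study.

```python
# AHP consistency ratio: CR = CI / RI, CI = (lambda_max - n) / (n - 1).
import numpy as np

RI = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}  # Saaty's random indices

def consistency_ratio(A):
    n = A.shape[0]
    lam_max = np.linalg.eigvals(A).real.max()  # principal eigenvalue
    ci = (lam_max - n) / (n - 1)
    return ci / RI[n]

# Hypothetical pairwise comparisons of three criteria, e.g. ordering
# condition vs. time vs. information quality (values are made up).
A = np.array([[1.0, 2.0, 3.0],
              [1/2, 1.0, 2.0],
              [1/3, 1/2, 1.0]])
print(consistency_ratio(A))  # < 0.1 -> acceptably consistent matrix
```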

3 citations


Journal ArticleDOI
TL;DR: This study aims to find the best predictive model or method for predicting company bankruptcy on the Polish companies bankruptcy dataset, using feature selection and ensemble learning.
Abstract: Company bankruptcy is often a very big problem for companies. The impact of bankruptcy can cause losses to stakeholders such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict the possibility of bankruptcy based on the company's financial data. Therefore, this study aims to find the best predictive model or method for predicting company bankruptcy using the Polish companies bankruptcy dataset. The prediction analysis uses feature selection and ensemble learning. Features are selected by XGBoost feature importance with a weight-value filter of 10. The ensemble learning method used is stacking, composed of base models and a meta learner. The base models are K-nearest neighbor, decision tree, SVM, and random forest, while the meta learner is LightGBM. The stacking model outperforms the base models, with an accuracy of 97%.
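The stacking setup this abstract describes maps directly onto scikit-learn's StackingClassifier. A minimal sketch follows, with assumed hyperparameters, synthetic stand-in data, and without the XGBoost feature-filtering step:

```python
# Stacking: KNN, decision tree, SVM, and random forest base models
# with LightGBM as the meta learner.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

base_models = [
    ("knn", KNeighborsClassifier()),
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("svm", SVC(probability=True, random_state=0)),
    ("rf", RandomForestClassifier(random_state=0)),
]
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LGBMClassifier(random_state=0),
                           cv=5)  # out-of-fold base predictions feed the meta learner

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
stack.fit(X_tr, y_tr)
print(stack.score(X_te, y_te))
```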

Journal ArticleDOI
31 Mar 2021
TL;DR: Combining the Triangular Fuzzy Number and Simple Additive Weighting methods, this study obtains the two best packages to recommend to tourists.
Abstract: Tourists who do not understand the situation at a desired tourist attraction can choose tour and travel services. Tour and travel agencies provide a choice of tour packages in many variations. Determining the right tour and travel package and agency can benefit tourists, both financially and in vacation quality. The data used in this study were obtained from several tour and travel agents. Several variables are used, namely the price of the package, the number of participants, and the number of facilities obtained. The method used in this study combines the Triangular Fuzzy Number (TFN) and Simple Additive Weighting (SAW) methods. The purpose of this study is to help tourists determine the most profitable or best packages. The result is the two best packages recommended for tourists.
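A rough sketch of how TFN weights can feed a SAW ranking of packages is shown below. The packages, criterion values, and fuzzy weights are invented for illustration and do not come from the study:

```python
# TFN-weighted SAW: defuzzify triangular fuzzy weights, normalize the
# decision matrix, then rank packages by weighted sum.
import numpy as np

# Rows: tour packages; columns: price (cost), participants, facilities (benefit).
X = np.array([[5.0e6, 10, 8],
              [3.5e6,  6, 5],
              [4.2e6,  8, 7]])
cost = np.array([True, False, False])  # price is a cost criterion

# Triangular fuzzy weights (l, m, u), defuzzified by the simple mean.
tfn = np.array([[0.3, 0.4, 0.5],
                [0.2, 0.3, 0.4],
                [0.2, 0.3, 0.4]])
w = tfn.mean(axis=1)
w = w / w.sum()

# SAW normalization: min/x for cost criteria, x/max for benefit criteria.
R = np.where(cost, X.min(axis=0) / X, X / X.max(axis=0))
scores = R @ w
print(scores.argsort()[::-1][:2])  # indices of the two best packages
```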

References

Journal ArticleDOI
Abstract: An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
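The core of the over-sampling method this reference describes is interpolation: each synthetic minority example lies on the line segment between a minority point and one of its k nearest minority neighbors. A bare NumPy sketch of that step, not the original authors' code:

```python
# Synthetic minority over-sampling: new points interpolated between a
# minority sample and one of its k nearest minority neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_samples(X_min, n_new, k=5, seed=0):
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1: self included
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), n_new)            # random minority points
    neigh = idx[base, rng.integers(1, k + 1, n_new)]     # skip self at column 0
    gap = rng.random((n_new, 1))                         # interpolation factor
    return X_min[base] + gap * (X_min[neigh] - X_min[base])

# Tiny illustrative minority set.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]])
print(smote_samples(X_min, n_new=4, k=3))
```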

11,512 citations


Journal Article
TL;DR: The aim of this work is to compare the performance of the Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers on the basis of accuracy, precision, and execution time for CKD prediction.
Abstract: Chronic kidney disease (CKD), also known as chronic renal disease, involves conditions that damage the kidneys and decrease their ability to keep the body healthy. Patients may develop complications such as high blood pressure, anemia (low blood count), weak bones, poor nutritional health, and nerve damage. Early detection and treatment can often keep chronic kidney disease from getting worse. Data mining is the term used for knowledge discovery from large databases. Its task is to make use of historical data to discover regular patterns and improve future decisions; it follows from the convergence of several recent trends: the decreasing cost of large data storage devices and the ever-increasing ease of collecting data over networks, the development of robust and efficient machine learning algorithms to process this data, and the decreasing cost of computational power, enabling the use of computationally intensive methods for data analysis. Machine learning has already produced practical applications in areas such as analyzing medical outcomes, detecting fraud, and detecting fake users. Various data mining classification approaches and machine learning algorithms are applied to the prediction of chronic diseases. The objective of this research work is to introduce a new decision support system to predict chronic kidney disease, comparing the performance of the Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classifiers on the basis of accuracy, precision, and execution time. The experimental results show that the KNN classifier performs better than SVM. Keywords: Data Mining, Machine Learning, Chronic Kidney Disease, Classification, K-Nearest Neighbour, Support Vector Machine.
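The comparison this reference performs is straightforward to reproduce in outline: fit both classifiers and record accuracy, precision, and wall-clock time. A minimal sketch with synthetic stand-in data (not the actual CKD dataset):

```python
# SVM vs. KNN on accuracy, precision, and execution time.
import time
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=24, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, clf in [("SVM", SVC()), ("KNN", KNeighborsClassifier())]:
    t0 = time.perf_counter()
    clf.fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    print(name,
          f"acc={accuracy_score(y_te, pred):.3f}",
          f"prec={precision_score(y_te, pred):.3f}",
          f"time={time.perf_counter() - t0:.4f}s")
```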

70 citations


01 Jan 2011
Abstract: Purpose: This paper aims at exploring the theoretical foundations of customer relationship management (CRM) and its relationship to marketing performance from several perspectives. Design/methodology/approach: CRM was derived from a systematic comparative analysis of the relevant relationship marketing literature; additional elements relate to the importance of focusing on main customers, organizational efficiency, and customer knowledge management, and their influence on marketing performance. Findings: The study findings show a positive relationship between CRM and marketing performance, as well as an effect of the dimensions of CRM on marketing performance in financial institutions. Originality/value: The study addresses the question of CRM and its relationship to marketing performance for marketing academics and professionals by investigating the structural relationships among focus on main customers, organizational efficiency, customer knowledge management, and marketing performance.

66 citations


01 Jan 2013
TL;DR: A new set of features, derived from call details and customer profiles and categorized as contract-related, call pattern description, and call pattern changes description features, is proposed with the aim of improving the recognition rates of possible churners.
Abstract: Customer churn in the mobile telephony industry is a continuous problem owing to stiff competition, new technologies, low switching costs, and deregulation by governments, among other factors. To address this issue, players in this industry must develop precise and reliable predictive models to identify possible churners beforehand and then enlist them in intervention programs in a bid to retain as many customers as possible. This paper proposes a new set of features with the aim of improving the recognition rates of possible churners. The features are derived from call details and customer profiles and categorized as contract-related, call pattern description, and call pattern changes description features. The features are evaluated using two probabilistic data mining algorithms, Naive Bayes and Bayesian Network, and their results are compared with those obtained from the C4.5 decision tree, a widely used algorithm in many classification and prediction tasks. Experimental results show improved prediction rates for all the models used.
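In outline, the evaluation this reference runs can be approximated as below. scikit-learn has no Bayesian network classifier, so only Naive Bayes and an entropy-criterion decision tree (a rough C4.5 analogue) appear, and the churn features are simulated; none of this is the original authors' code:

```python
# Naive Bayes vs. an entropy-criterion decision tree on imbalanced
# stand-in churn features, scored by ROC AUC.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Stand-in for contract-related, call-pattern, and pattern-change features;
# weights=[0.9] makes churners the rare class.
X, y = make_classification(n_samples=1000, n_features=15, weights=[0.9],
                           random_state=0)

for name, clf in [("NaiveBayes", GaussianNB()),
                  ("C4.5-like tree", DecisionTreeClassifier(criterion="entropy"))]:
    print(name, cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean())
```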

66 citations