Proceedings Article

Improved Accuracy of Naive Bayes Classifier for Determination of Customer Churn Uses SMOTE and Genetic Algorithms

06 Oct 2020, Vol. 1, Iss. 1, pp. 70-75
TL;DR: The purpose of this study is to improve the accuracy of Naive Bayes for customer classification by using SMOTE to handle the class imbalance problem and a genetic algorithm for attribute selection.
Abstract: With increasing competition in the business world, many companies use data mining techniques to determine the level of customer loyalty. The customer data used in this study is the German Credit dataset obtained from the UCI Machine Learning Repository. These data suffer from class imbalance because the loyal class contains more records than the churn class. In addition, some attributes are irrelevant for customer classification, so attribute selection is needed to obtain more accurate classification results. One classification algorithm is Naive Bayes, which has long been used as an effective classifier because it is easy to build and treats the attributes as independent of one another given the class. The purpose of this study is to improve the accuracy of Naive Bayes for customer classification. Two techniques are applied to improve accuracy: SMOTE handles the class imbalance problem, while a genetic algorithm performs attribute selection. The accuracy of Naive Bayes alone is 47.10%, the mean accuracy of Naive Bayes with SMOTE is 78.15%, and the accuracy of Naive Bayes with both SMOTE and the genetic algorithm is 78.46%.
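The abstract does not give implementation details, so the following is only a minimal Python sketch of the pipeline it describes, under stated assumptions: Gaussian Naive Bayes on numeric attributes, SMOTE with k = 5 neighbours applied to the training split only, and a simple genetic algorithm whose chromosomes are attribute bit masks scored by Naive Bayes validation accuracy. All function names, GA settings (population 20, 15 generations, mutation rate 0.1) and the synthetic data are hypothetical illustrations, not the paper's configuration. Note that standard SMOTE interpolates numeric values; the German Credit dataset also has categorical attributes, for which a variant such as SMOTE-NC would be needed.

```python
import math
import random
from collections import Counter

# --- Gaussian Naive Bayes (assumes every attribute is numeric) ---
def fit_nb(X, y):
    """Return {class: (prior, per-feature means, per-feature variances)}."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # A small constant keeps constant features from dividing by zero.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(y), means, variances)
    return model

def predict_nb(model, x):
    """Class with the highest log posterior, treating attributes as independent."""
    def log_posterior(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, variances))
    return max(model, key=log_posterior)

# --- SMOTE: interpolate between a minority point and a random one of its
# k nearest minority neighbours ---
def smote(X_min, n_new, k=5, rng=random):
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(X_min)
        neighbours = sorted((b for b in X_min if b is not a),
                            key=lambda b: math.dist(a, b))[:k]
        b = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from a to b
        synthetic.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

def balance_with_smote(X, y, minority, rng=random):
    """Oversample `minority` up to the size of the largest class (training data only)."""
    X_min = [x for x, t in zip(X, y) if t == minority]
    new = smote(X_min, max(Counter(y).values()) - len(X_min), rng=rng)
    return X + new, y + [minority] * len(new)

# --- Genetic algorithm over bit masks (True = keep the attribute) ---
def project(X, mask):
    return [[v for v, keep in zip(x, mask) if keep] for x in X]

def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy of Naive Bayes trained on the masked attributes."""
    if not any(mask):
        return 0.0
    model = fit_nb(project(X_tr, mask), y_tr)
    hits = sum(predict_nb(model, x) == t for x, t in zip(project(X_va, mask), y_va))
    return hits / len(y_va)

def ga_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15, p_mut=0.1, rng=random):
    n = len(X_tr[0])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    score = lambda m: fitness(m, X_tr, y_tr, X_va, y_va)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            children.append([not g if rng.random() < p_mut else g for g in child])
        pop = parents + children
    return max(pop, key=score)

# --- Hypothetical imbalanced data: class 1 ("churn") is the minority and
# only the first two attributes carry signal ---
rng = random.Random(0)

def make_row(label):
    shift = 1.5 if label == 1 else 0.0
    return [rng.gauss(shift, 1), rng.gauss(-shift, 1), rng.gauss(0, 1), rng.gauss(0, 1)]

y = [0] * 160 + [1] * 40
X = [make_row(t) for t in y]
order = list(range(len(y)))
rng.shuffle(order)
X_tr, y_tr = [X[i] for i in order[:140]], [y[i] for i in order[:140]]
X_va, y_va = [X[i] for i in order[140:]], [y[i] for i in order[140:]]

X_bal, y_bal = balance_with_smote(X_tr, y_tr, minority=1, rng=rng)
mask = ga_select(X_bal, y_bal, X_va, y_va, rng=rng)
print("class counts after SMOTE:", Counter(y_bal))
print("selected attributes:", mask)
print("validation accuracy:", fitness(mask, X_bal, y_bal, X_va, y_va))
```

Because the same validation split both guides the search and reports the final score, the printed accuracy is optimistic. The paper reports a mean accuracy; a fairer estimate of that kind would come from cross-validation with SMOTE applied inside each training fold only.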


Citations
Journal Article
TL;DR: This study aims to find the best model or method for predicting company bankruptcy on the Polish companies bankruptcy dataset, combining feature selection with ensemble learning.
Abstract: Bankruptcy is often a serious problem for companies. Its impact can cause losses to the company's stakeholders, such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict its likelihood from the company's financial data. This study therefore aims to find the best model or method for predicting company bankruptcy using the Polish companies bankruptcy dataset. The prediction process combines feature selection and ensemble learning. Features are selected using XGBoost feature importance with a weight value filter of 10. The ensemble learning method is stacking, which consists of base models and a meta learner: the base models are K-nearest neighbor, decision tree, SVM, and random forest, and the meta learner is LightGBM. The stacking model outperforms the base models, reaching an accuracy of 97%.

12 citations

Journal Article
31 Mar 2021
TL;DR: The main dimensions of logistic service quality for improving service quality are ordering condition, time, and information quality; these can serve as the basis for companies' decisions when prioritizing alternative criteria.
Abstract: Logistics supports smooth transactions between companies because it facilitates the buying and selling of goods and services that fulfill consumer companies' supply orders. This study analyzes the impact of improved Logistic Service Quality (LSQ) on the quality of goods delivery services, using LSQ dimensions from previous research. Sample data were collected through questionnaires from 61 respondents and processed quantitatively with convergent validity and reliability tests. The results show that the main dimensions of logistic service quality for improving service quality are ordering condition, time, and information quality. Each comparison factor was tested for consistency using the Analytical Hierarchy Process (AHP). Each of the main criteria has a consistency value below 0.1, so the main criteria have consistent comparison matrices and can serve as the basis for companies' decisions when prioritizing alternative criteria.

3 citations
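The "consistency value below 0.1" in the logistics study is presumably Saaty's consistency ratio, CR = CI / RI, where CI = (λmax - n) / (n - 1) and RI is the random index for an n × n matrix. A minimal Python sketch, assuming the geometric-mean approximation for the priority weights (the abstract does not say which weighting method was used) and a hypothetical judgement matrix for the three main criteria:

```python
import math

# Saaty's random consistency index (RI) for matrix sizes 1..10.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp(matrix):
    """Return (priority weights, consistency ratio) for a reciprocal
    pairwise-comparison matrix, using row geometric means as weights."""
    n = len(matrix)
    geo = [math.prod(row) ** (1 / n) for row in matrix]
    weights = [g / sum(geo) for g in geo]
    # lambda_max estimated as the mean of (A w)_i / w_i.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri else 0.0  # 1x1 and 2x2 matrices are always consistent
    return weights, cr

# Hypothetical judgements for ordering condition, time, information quality.
criteria = [[1, 3, 5],
            [1 / 3, 1, 2],
            [1 / 5, 1 / 2, 1]]
weights, cr = ahp(criteria)
print([round(w, 3) for w in weights], round(cr, 4))
print("consistent" if cr < 0.1 else "revise judgements")
```

Saaty's original method takes the weights from the principal eigenvector; the geometric mean is a common approximation that coincides with it when the matrix is fully consistent.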

Proceedings Article
04 Nov 2022
TL;DR: In this article, the authors used the SMOTE and Random Oversampling (ROS) sampling techniques to overcome imbalanced data, combined with the Naive Bayes classification method, to detect early cervical cancer in Indonesia.
Abstract: Imbalanced data is a common problem in classification, where the majority class has many more instances than the minority class. Imbalanced data degrades the performance of machine learning classification methods. This study adopted the SMOTE and Random Oversampling (ROS) sampling techniques to overcome imbalanced data, combined with the Naive Bayes classification method, to detect early cervical cancer in Indonesia. Cervical cancer is a disease in which a group of abnormal cells grows in the cervix (the mouth of the womb). It is one of the most common cancers, ranking second among cancers suffered by Indonesian women. Factors that influence its occurrence include eating behavior, personal hygiene behavior, motivational strength, social support, empowerment of knowledge, abilities, and desires. The data are secondary data covering 72 patients and 20 attributes. A total of 21 patients had cervical cancer and 51 did not; this roughly 30:70 ratio makes the data imbalanced. This classification is expected to reveal which factors influence the occurrence of cervical cancer and which of the two approaches performs best. The results show that SMOTE Naive Bayes has a higher average performance (81.73%) than Random Oversampling Naive Bayes (81.12%). Therefore, SMOTE Naive Bayes outperforms Random Oversampling Naïve Bayes.

3 citations
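The cervical cancer study compares SMOTE with random oversampling (ROS). Unlike SMOTE, which synthesizes new points by interpolation, ROS simply duplicates existing minority rows. A minimal Python sketch of ROS, assuming numeric placeholder rows that stand in for the study's 20 attributes and the reported 21 positive / 51 negative split; the function name and data are hypothetical:

```python
import random
from collections import Counter

def random_oversample(X, y, rng=random):
    """Duplicate randomly chosen rows of each smaller class until every class
    matches the largest one (apply to training data only)."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, t in zip(X, y) if t == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))  # exact copy of an existing row
            y_out.append(label)
    return X_out, y_out

# Hypothetical placeholders: 21 positive and 51 negative patients, 20 attributes each.
rng = random.Random(42)
y = [1] * 21 + [0] * 51
X = [[rng.random() for _ in range(20)] for _ in y]
X_ros, y_ros = random_oversample(X, y, rng=rng)
print(Counter(y_ros))
```

Because ROS repeats rows verbatim, a classifier can overfit to those exact points, which is one common motivation for interpolation-based methods such as SMOTE.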

Journal Article
TL;DR: Using the sentiment labelled dataset (the amazon_labelled file) from the UCI Machine Learning Repository, this study found that the naïve Bayes classifier reached 82% accuracy in Amazon review sentiment analysis, and 83% when chi-square and TF-IDF were applied.
Abstract: The rapid development of the internet has made information flow quickly, which has affected the world of commerce. People who have bought a product often write their opinion on social media or other online sites. Long buyer reviews require automated methods to recognize opinions. Sentiment analysis applies text mining methods, and one of the methods it uses is classification. One classification algorithm is the naïve Bayes classifier, which offers good efficiency and performance. However, it is sensitive to large numbers of features, which lowers its accuracy. Feature selection can improve the accuracy of the naïve Bayes classifier, and one feature selection method is chi-square. Chi-square feature selection keeps the top-K features, with K set to 450. In addition, feature weighting can also improve the accuracy of the naïve Bayes classifier; one feature weighting technique is term frequency inverse document frequency (TF-IDF). This study uses the sentiment labelled dataset (the amazon_labelled file) obtained from the UCI Machine Learning Repository, which contains 500 positive and 500 negative reviews. The accuracy of the naïve Bayes classifier in the Amazon review sentiment analysis was 82%, while its accuracy with chi-square and TF-IDF applied was 83%.

3 citations
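The two preprocessing steps in the sentiment study can be sketched in Python as follows. This is a minimal illustration, not the study's code: it assumes whitespace tokenization, binary term presence in the 2 × 2 chi-square test, and raw term count × log(N / df) for TF-IDF (several TF-IDF variants exist and the abstract does not say which was used). The four reviews are hypothetical stand-ins for amazon_labelled, and K = 2 replaces the study's K = 450 only because the toy vocabulary is tiny. The resulting vectors would then be passed to a naïve Bayes classifier, which is not shown.

```python
import math
from collections import Counter

def chi_square_scores(docs, labels, positive=1):
    """Chi-square statistic of term presence vs. class for a binary task."""
    n = len(docs)
    sets = [set(d) for d in docs]
    n_pos = sum(l == positive for l in labels)
    scores = {}
    for term in set().union(*sets):
        a = sum(term in s and l == positive for s, l in zip(sets, labels))  # term, positive
        b = sum(term in s and l != positive for s, l in zip(sets, labels))  # term, negative
        c = n_pos - a          # no term, positive
        d = n - a - b - c      # no term, negative
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def select_top_k(scores, k):
    """Highest-scoring terms first; ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]

def tfidf_vectors(docs, vocab):
    """Raw term count times log(N / df), restricted to the selected vocabulary."""
    n = len(docs)
    df = {t: sum(t in d for d in docs) for t in vocab}
    return [[Counter(d)[t] * math.log(n / df[t]) if df[t] else 0.0 for t in vocab]
            for d in docs]

# Hypothetical toy reviews (1 = positive, 0 = negative).
docs = [r.split() for r in ["great product works", "great value",
                            "broken product", "terrible broken"]]
labels = [1, 1, 0, 0]
vocab = select_top_k(chi_square_scores(docs, labels), k=2)
print(vocab, tfidf_vectors(docs, vocab))
```

Here "product" appears once in each class and scores zero, so it is dropped, while "great" and "broken" separate the classes perfectly and are kept.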

Journal Article
31 Mar 2021
TL;DR: By combining the Triangular Fuzzy Number (TFN) and Simple Additive Weighting (SAW) methods, this study identifies the two best tour packages to recommend to tourists.
Abstract: Tourists who are unfamiliar with local conditions or with their desired attractions can use tour and travel services, which offer tour packages in many variations. Choosing the right package and agency can benefit tourists in terms of both cost and vacation quality. The data used in this study were obtained from several tour and travel agents. The variables used are the package price, the number of participants, and the number of facilities provided. The method combines the Triangular Fuzzy Number (TFN) and Simple Additive Weighting (SAW) methods. The purpose of this study is to help tourists determine the most profitable or best packages. The study identifies the two best packages to recommend to tourists.

2 citations
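The tour package abstract does not explain how TFN and SAW are combined. One common pattern, assumed here, is to express criterion weights as triangular fuzzy numbers (l, m, u), defuzzify them by the centroid (l + m + u) / 3, and rank alternatives by the SAW weighted sum of normalized ratings (min / x for cost criteria, x / max for benefit criteria). A minimal Python sketch; the package figures, fuzzy weights, and the benefit/cost direction of each criterion are all hypothetical:

```python
# Hypothetical fuzzy importance of each criterion as (l, m, u).
fuzzy_weights = {"price": (0.5, 0.7, 0.9),
                 "participants": (0.1, 0.3, 0.5),
                 "facilities": (0.3, 0.5, 0.7)}
# Assumed direction: "cost" means lower is better, "benefit" means higher is better.
criterion_type = {"price": "cost", "participants": "benefit", "facilities": "benefit"}
packages = {"A": {"price": 3000, "participants": 20, "facilities": 5},
            "B": {"price": 2500, "participants": 15, "facilities": 4},
            "C": {"price": 4000, "participants": 25, "facilities": 7}}

def defuzzify(tfn):
    """Centroid of a triangular fuzzy number."""
    l, m, u = tfn
    return (l + m + u) / 3

def saw_scores(packages, fuzzy_weights, criterion_type):
    crisp = {c: defuzzify(w) for c, w in fuzzy_weights.items()}
    total = sum(crisp.values())
    weights = {c: w / total for c, w in crisp.items()}  # normalize to sum to 1
    scores = {}
    for name, values in packages.items():
        score = 0.0
        for c, w in weights.items():
            column = [p[c] for p in packages.values()]
            if criterion_type[c] == "cost":
                r = min(column) / values[c]
            else:
                r = values[c] / max(column)
            score += w * r
        scores[name] = score
    return scores

scores = saw_scores(packages, fuzzy_weights, criterion_type)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:2], {k: round(v, 3) for k, v in scores.items()})
```

Taking the first two entries of the ranking mirrors the study's "best 2 packages" output, although the actual packages and weights it used are not given in the abstract.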
