Proceedings ArticleDOI

An Improved XGBoost Model Based on Spark for Credit Card Fraud Prediction

TLDR
In this article, an improved XGBoost model based on Spark is proposed to detect credit card fraud. The experimental results show that the proposed model can predict credit card fraud accurately and efficiently and has a good practical effect.
Abstract
Credit card fraud causes huge economic losses for many financial institutions. Given the class imbalance of the datasets and the huge amount of data in the field of credit card fraud, an improved XGBoost model based on Spark is proposed. In this project, the SMOTE algorithm was used to balance the training set, and the XGBoost classifier based on Spark was used as the fraud detection mechanism. Finally, the test sets were classified in parallel. In the model comparison experiment, the proposed model is compared with a logistic regression model, a decision tree model, a random forest model, and the original XGBoost model. The experimental results show that the proposed model is the best on the three metrics of Recall, F1-Score, and AUC, ahead of the second-ranked model by 9.1%, 1.4%, and 1.2% respectively. In the speedup experiment, the speedups on datasets of 70,000, 140,000, and 280,000 samples are 2.06, 3.28, and 3.75 respectively. The experimental results of these two parts show that the proposed model can accurately and efficiently predict credit card fraud and has a good practical effect.
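
The balancing step named in the abstract can be illustrated with a minimal, single-machine sketch of SMOTE-style oversampling. This is not the authors' Spark implementation, which the abstract does not show. The function names `smote` and `balance`, the neighbour count `k`, the label convention (1 = fraud), and the toy data are all assumptions made for illustration. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its nearest minority-class neighbours.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from x to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.
    Assumes a binary problem; labels and defaults are illustrative."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_needed = (len(X) - len(minority)) - len(minority)
    new_points = smote(minority, max(n_needed, 0), k=k, seed=seed)
    return X + new_points, y + [minority_label] * len(new_points)


# Toy training set (hypothetical): six legitimate rows, three fraud rows.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
           [0.2, 0.8], [10.0, 10.0], [11.0, 10.0], [10.0, 11.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = balance(X_train, y_train, k=2)
```

As in the abstract, only the training set is resampled; the test set keeps its original class distribution so that Recall, F1-Score, and AUC are measured on realistic data.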


Citations
Journal ArticleDOI

Credit card fraud detection in the era of disruptive technologies: A systematic review

TL;DR: In this paper, the authors present an in-depth review of cutting-edge research on detecting and predicting fraudulent credit card transactions conducted from 2015 to 2021 inclusive, and highlight the challenges associated with detecting credit card fraud through the use of new technologies such as big data analytics, large-scale machine learning, and cloud computing.
Proceedings ArticleDOI

Air Quality Prediction Based on Air Pollution Emissions in the City Environment Using XGBoost with SMOTE

TL;DR: In this article, an ensemble machine learning approach built on a decision tree and applying a gradient reinforcement framework was used for air quality classification in Indonesia's major cities, where the overall accuracy of SMOTE with XGBoost is 99.60%, the overall recall is 99.6%, and the overall F1-score is 99.96%.
References
Proceedings ArticleDOI

XGBoost: A Scalable Tree Boosting System

TL;DR: XGBoost as discussed by the authors proposes a sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning to achieve state-of-the-art results on many machine learning challenges.
Journal ArticleDOI

To Combat Multi-Class Imbalanced Problems by Means of Over-Sampling Techniques

TL;DR: Theoretical analyses and empirical observations across a wide spectrum of multi-class imbalanced benchmarks indicate that MDO is the method of choice, offering statistically superior MAUC and precision compared to the popular over-sampling techniques.
Proceedings ArticleDOI

Using deep networks for fraud detection in the credit card transactions

TL;DR: A deep autoencoder is proposed to extract the best features from credit card transaction data, followed by a softmax network that determines the class labels; the results reveal the advantages of the proposed method compared with the state of the art.
Proceedings ArticleDOI

Sepsis Prediction in Intensive Care Unit Using Ensemble of XGboost Models

TL;DR: The proposed algorithm is ranked officially as third place in the PhysioNet/Computing in Cardiology Challenge 2019 with an overall utility score of 0.339 on the unseen test dataset.
Proceedings ArticleDOI

Web service based credit card fraud detection by applying machine learning techniques

TL;DR: This study proposes a model that hybridizes DT, SVM, and K-NN models and significantly improves prediction accuracy. Two web service styles, Simple Object Access Protocol (SOAP) and Representational State Transfer (REST), are applied for efficient exchange of data across heterogeneous platforms.