Journal ArticleDOI

Company bankruptcy prediction framework based on the most influential features using XGBoost and stacking ensemble learning

01 Dec 2021-International Journal of Electrical and Computer Engineering (Institute of Advanced Engineering and Science)-Vol. 11, Iss: 6, pp 5549-5557
TL;DR: This study aims to find the best predictive model or method to predict company bankruptcy using the Polish companies bankruptcy dataset, combining the best feature selection with ensemble learning.
Abstract: Company bankruptcy is often a severe problem, and its impact causes losses to company stakeholders such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict its likelihood from the company's financial data. Therefore, this study aims to find the best predictive model or method for company bankruptcy using the Polish companies bankruptcy dataset. The prediction analysis process uses the best feature selection and ensemble learning. The best features are selected using XGBoost feature importance with a weight threshold of 10. The ensemble learning method used is stacking, which is composed of base models and a meta learner. The base models consist of K-nearest neighbors, decision tree, SVM, and random forest, while the meta learner used is LightGBM. The stacking model outperforms the accuracy of every base model, reaching 97% accuracy.
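
The pipeline described above maps naturally onto standard Python tooling. Below is a minimal, hedged sketch of that workflow using scikit-learn, xgboost, and lightgbm; the placeholder data, the reading of the "weight value filter of 10" as an XGBoost split-count threshold, and all hyper-parameters are assumptions rather than the authors' actual configuration.

```python
# Hedged sketch of the described pipeline: XGBoost feature-importance filtering
# followed by a stacking ensemble (KNN, decision tree, SVM, random forest as
# base models; LightGBM as meta learner). Data loading and the exact meaning
# of the "weight value filter of 10" are assumptions.
import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# X, y: financial ratios and bankruptcy labels (placeholder random data here).
X = np.random.rand(1000, 64)
y = np.random.randint(0, 2, 1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 1) Feature selection: keep features whose XGBoost "weight" importance
#    (number of times the feature is used in a split) exceeds the threshold.
selector = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_train, y_train)
weights = selector.get_booster().get_score(importance_type="weight")
keep = [int(name[1:]) for name, w in weights.items() if w > 10]  # assumed threshold of 10

# 2) Stacking ensemble trained on the selected features only.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier()),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LGBMClassifier(),
    cv=5,
)
stack.fit(X_train[:, keep], y_train)
print("test accuracy:", stack.score(X_test[:, keep], y_test))
```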


Citations
Journal ArticleDOI
TL;DR: In this article, a predictive model was constructed based on 30 variables that affect employee attrition, drawn from the 'IBM HR Analytics Employee Attrition & Performance' data, which consists of 1,470 records.
Abstract: Since human resources are a company's most important resource, employee attrition is an important agenda item from the company's point of view. However, employee attrition occurs for various reasons, and it is difficult for the HR manager or the leader of each department to detect these signs in advance. Employee attrition imposes considerable burdens and losses on the organization for reasons such as interruption of ongoing tasks, the cost of re-employment and retraining, and the risk of leaking core technologies and know-how. Therefore, in this study, we propose a model for predicting employee attrition so that measures for talent management, which in the past have been carried out ex post, can be taken in advance. A predictive model was constructed based on 30 variables that affect employee attrition from the 'IBM HR Analytics Employee Attrition & Performance' data, which consists of 1,470 records. To this end, a total of eight predictive models, including logistic regression, random forest, XGBoost, SVM, an artificial neural network model, and an ensemble model, were built and their performance was evaluated. In addition, when the impact of variables on employee attrition was analyzed, variables such as environmental satisfaction, overtime work, and relationship satisfaction were found to be the biggest contributors.
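
A hedged sketch of the general workflow this abstract describes, i.e. fitting several classifiers on the IBM HR Analytics attrition data and ranking variable importance, is given below; the file name, preprocessing, and model settings are illustrative assumptions, not the paper's code.

```python
# Hedged sketch: fit several classifiers on the IBM HR attrition data and
# inspect variable importance. File name, preprocessing, and model settings
# are assumptions, not the paper's code.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

df = pd.read_csv("ibm_hr_attrition.csv")               # assumed local copy of the dataset
y = (df["Attrition"] == "Yes").astype(int)              # binary target
X = pd.get_dummies(df.drop(columns=["Attrition"]))      # one-hot encode categorical variables

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "xgboost": XGBClassifier(eval_metric="logloss"),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")

# Rank variables by random-forest importance as a rough proxy for the paper's
# variable-impact analysis.
rf = models["random forest"].fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns).nlargest(10))
```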

2 citations

Book ChapterDOI
TL;DR: In this paper, a novel framework based on the XGBoost machine learning model, tuned with a modified variant of the social network search algorithm, is proposed for intrusion detection in the Industry 4.0 domain.
Abstract: Industry 4.0 has flourished recently thanks to advances in a number of contemporary fields, such as artificial intelligence and the internet of things. It has significantly improved industrial processes and factory production by relying on communication between devices, production machines, and equipment. The biggest concern in this process is security, as each of the network-connected components is vulnerable to malicious attacks. Intrusion detection is therefore a key aspect and the largest challenge in the Industry 4.0 domain. To address this issue, a novel framework based on the XGBoost machine learning model, tuned with a modified variant of the social network search algorithm, is proposed. The introduced framework and algorithm have been evaluated on the challenging UNSW-NB 15 benchmark intrusion detection dataset, and the experimental findings were compared against the outcomes of other high-performing metaheuristics for the same problem. For comparison purposes, alongside the original version of social network search, the Harris hawks optimization algorithm, firefly algorithm, bat algorithm, and artificial bee colony were also adopted for XGBoost tuning and validated against the same internet of things security benchmark dataset. Experimental findings showed that the best-performing XGBoost model is the one tuned by the introduced modified social network search algorithm, outscoring the others on most of the performance indicators employed for evaluation.
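
As an illustration of the tuning idea only (not the authors' modified social network search algorithm), the sketch below uses a plain random search over XGBoost hyper-parameters on synthetic data standing in for UNSW-NB 15; a population-based metaheuristic would replace the sampling step with guided moves.

```python
# Hedged illustration of metaheuristic-style XGBoost tuning: a plain random
# search stands in for the modified social network search algorithm, and a
# synthetic dataset stands in for UNSW-NB 15.
import random
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

def sample_params():
    # Candidate solution: one point in the hyper-parameter search space.
    return {
        "n_estimators": random.choice([100, 200, 400]),
        "max_depth": random.randint(3, 10),
        "learning_rate": random.uniform(0.01, 0.3),
        "subsample": random.uniform(0.6, 1.0),
    }

best_score, best_params = -1.0, None
for _ in range(20):  # each iteration scores one candidate; a metaheuristic would evolve a population
    params = sample_params()
    score = cross_val_score(XGBClassifier(eval_metric="logloss", **params), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_score, best_params)
```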

1 citation

Journal ArticleDOI
TL;DR: In this paper, a swarm intelligence-based approach to tuning machine learning models is proposed and tested on four real-world Industry 4.0 data sets, namely distributed transformer monitoring, elderly fall prediction, BoT-IoT, and UNSW-NB 15.
Abstract: The progress of Industrial Revolution 4.0 has been supported by recent advances in several domains, and one of the main contributors is the Internet of Things. Smart factories and healthcare have both benefited in terms of improved quality of service and productivity. However, there is always a trade-off, and some of the largest concerns are security, intrusion, and failure detection, due to the high dependence on Internet of Things devices. To overcome these and other challenges, artificial intelligence, especially machine learning algorithms, is employed for fault prediction, intrusion detection, computer-aided diagnostics, and so forth. However, the efficiency of machine learning models heavily depends on feature selection, predetermined hyper-parameter values, and training to deliver the desired result. This paper proposes a swarm intelligence-based approach to tuning machine learning models. A novel version of the firefly algorithm, which overcomes known deficiencies of the original method by employing a diversification-based mechanism, is proposed and applied to both feature selection and hyper-parameter optimization of two machine learning models: XGBoost and the extreme learning machine. The proposed approach has been tested on four real-world Industry 4.0 data sets, namely distributed transformer monitoring, elderly fall prediction, BoT-IoT, and UNSW-NB 15. The achieved results have been compared to the results of eight other cutting-edge metaheuristics, implemented and tested under the same conditions. The experimental outcomes strongly indicate that the proposed approach significantly outperforms all competitor metaheuristics in terms of convergence speed and result quality measured with standard metrics: accuracy, precision, recall, and f1-score.
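
The feature-selection side of such swarm approaches can be illustrated with binary masks scored by a wrapped model. The greatly simplified sketch below uses random bit flips in place of the paper's diversification-enhanced firefly moves, with XGBoost as the wrapped model and synthetic data; it is a stand-in, not a reimplementation.

```python
# Greatly simplified wrapper-style feature selection: binary masks are scored
# with an XGBoost model and the best mask is kept. The real firefly algorithm
# moves a population of candidate masks toward brighter (fitter) ones and adds
# a diversification step; here a single mask is mutated at random.
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1500, n_features=40, n_informative=10, random_state=1)
rng = np.random.default_rng(1)

def fitness(mask):
    # Score a candidate feature subset by cross-validated accuracy.
    if not mask.any():
        return 0.0
    return cross_val_score(XGBClassifier(eval_metric="logloss"), X[:, mask], y, cv=3).mean()

best_mask = rng.random(X.shape[1]) < 0.5
best_fit = fitness(best_mask)
for _ in range(15):
    candidate = best_mask.copy()
    flip = rng.choice(X.shape[1], size=3, replace=False)  # mutate a few bits
    candidate[flip] = ~candidate[flip]
    f = fitness(candidate)
    if f > best_fit:
        best_mask, best_fit = candidate, f

print("selected features:", np.where(best_mask)[0], "cv accuracy:", round(best_fit, 3))
```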

1 citation

Journal ArticleDOI
TL;DR: In this article, an optimized version of the rough-granular approach (RGA) is proposed to improve classification efficiency on bankruptcy data by generating new minority-class samples in specific areas of the feature space, taking additional difficulty factors into consideration.

1 citation

References
Proceedings ArticleDOI
13 Aug 2016
TL;DR: XGBoost, as discussed by the authors, introduces a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
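
A minimal sketch of the sparsity point: XGBoost accepts SciPy sparse matrices directly and treats absent entries as missing values, which is where the sparsity-aware split finding applies. The data below is synthetic, and the parameters are illustrative only.

```python
# Minimal sketch: train XGBoost directly on a sparse matrix, where its
# sparsity-aware split finding handles the missing (zero) entries.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

rng = np.random.default_rng(0)
X = sp.random(5000, 200, density=0.05, format="csr", random_state=0)  # mostly-empty features
y = rng.integers(0, 2, 5000)

dtrain = xgb.DMatrix(X, label=y)  # sparse zeros are treated as missing by default
booster = xgb.train({"objective": "binary:logistic", "max_depth": 6}, dtrain, num_boost_round=50)
print(booster.predict(dtrain)[:5])
```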

14,872 citations

Journal ArticleDOI
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.

5,834 citations


"Company bankruptcy prediction frame..." refers methods in this paper

  • ...Stacking ensemble modeling: The stacking ensemble introduced by Wolpert [51], then formalized by Breiman [52] and theoretically validated by Van der Laan et al....


  • ...The stacking ensemble introduced by Wolpert [51], then formalized by Breiman [52] and theoretically validated by Van der Laan et al. [53], is one of the learning algorithms known as a super learner framework, based on minimizing generalization loss....


Yoav Freund, Robert E. Schapire
01 Jan 1999
TL;DR: This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines. Some examples of recent applications of boosting are also described.
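
A minimal sketch of the boosting idea described above, using scikit-learn's AdaBoostClassifier with depth-1 decision trees as the weak learners; the data is synthetic, and the estimator= keyword assumes scikit-learn 1.2 or newer.

```python
# Minimal AdaBoost sketch: combine many weak learners (depth-1 trees) into a
# strong classifier and compare against a single weak learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)  # a weak learner
ada = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)  # older sklearn: base_estimator=

print("weak learner:", cross_val_score(stump, X, y, cv=5).mean())
print("AdaBoost:    ", cross_val_score(ada, X, y, cv=5).mean())
```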

3,212 citations


"Company bankruptcy prediction frame..." refers background in this paper

  • ...Boosted models can produce good accuracy even though the base classifier has only slightly better accuracy than random classification, so that the base classifier is considered a weak learner [50]....


Journal ArticleDOI
TL;DR: The results demonstrate that the accuracy and generalization performance of SVM are better than those of BPN as the training set size gets smaller, and several superior points of the SVM algorithm compared with BPN are investigated.
Abstract: This study investigates the efficacy of applying support vector machines (SVM) to the bankruptcy prediction problem. Although it is a well-known fact that the back-propagation neural network (BPN) performs well in pattern recognition tasks, the method has some limitations in that finding an appropriate model structure and optimal solution is something of an art. Furthermore, as much of the training set as possible must be loaded into the network in order to search for its weights. On the other hand, since SVM captures geometric characteristics of the feature space without deriving network weights from the training data, it is capable of extracting the optimal solution with a small training set. In this study, we show that the proposed SVM classifier outperforms BPN on the problem of corporate bankruptcy prediction. The results demonstrate that the accuracy and generalization performance of SVM are better than those of BPN as the training set size gets smaller. We also examine the variability in performance with respect to various parameter values in SVM. In addition, we investigate and summarize several points on which the SVM algorithm is superior to BPN.
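
A hedged sketch of the kind of comparison this abstract describes: an RBF-kernel SVM versus a back-propagation network (scikit-learn's MLPClassifier as a stand-in for BPN) evaluated at progressively larger training-set sizes, on synthetic data rather than the study's financial ratios.

```python
# Hedged sketch: compare SVM and a back-propagation-style network as the
# training set shrinks; synthetic data stands in for the bankruptcy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for n in (100, 300, 1000):  # progressively larger training sets
    svm = SVC(kernel="rbf").fit(X_train[:n], y_train[:n])
    bpn = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000).fit(X_train[:n], y_train[:n])
    print(f"n={n}: SVM={svm.score(X_test, y_test):.3f}  BPN={bpn.score(X_test, y_test):.3f}")
```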

728 citations

Journal ArticleDOI
01 Jun 1994
TL;DR: The study indicates that neural networks perform significantly better than discriminant analysis at predicting firm bankruptcies, and implications for the accounting professional, neural networks researcher and decision support system builders are highlighted.
Abstract: Prediction of firm bankruptcies has been extensively studied in accounting, as all stakeholders in a firm have a vested interest in monitoring its financial performance. This paper presents an exploratory study which compares the predictive capabilities of neural networks and classical multivariate discriminant analysis for firm bankruptcy. The predictive accuracy of the two techniques is presented within a comprehensive, statistically sound framework, indicating the value added to the forecasting problem by each technique. The study indicates that neural networks perform significantly better than discriminant analysis at predicting firm bankruptcies. Implications of our results for the accounting professional, the neural networks researcher, and decision support system builders are highlighted.

717 citations


"Company bankruptcy prediction frame..." refers background in this paper

  • ...The combination of SVM and ANN integrated with dropout and an auto-encoder proved to produce better accuracy than logistic regression, genetic algorithm, and inductive learning [39]....


  • ...This concept resulted in significantly better performance than the ANN and the weak learners, as measured by AUC [43]....


  • ...Nowadays, machine learning techniques [6] and artificial intelligence [7] computation have been widely used by researchers to solve bankruptcy prediction problems, such as support vector machines (SVM) [8]-[16], decision trees [17]-[23], artificial neural networks (ANN) [24]-[31], and studies surveyed with the systematic literature review technique [32]-[37]....


  • ...A hybrid approach based on the synthetic minority over-sampling technique (SMOTE) combined with ensemble learning methods, i.e. boosting, bagging, Naive Bayes, ANN, random forest, rotation forest, and diverse ensemble creation by oppositional relabeling of artificial training examples (DECORATE), is proven to efficiently improve performance parameters such as accuracy, AUC, type 1 and type 2 errors, and G-mean on the collected data set of Spanish companies [40]....


  • ...Reducing the class imbalance of bankruptcy data sets using over-sampling or SMOTE techniques, then using ANN as a predictive model....
