Journal ArticleDOI

Company bankruptcy prediction framework based on the most influential features using XGBoost and stacking ensemble learning

01 Dec 2021-International Journal of Electrical and Computer Engineering (Institute of Advanced Engineering and Science)-Vol. 11, Iss: 6, pp 5549-5557
TL;DR: This study aims to find the best predictive model or method to predict company bankruptcy using the Polish companies bankruptcy dataset, combining the best feature selection with ensemble learning.
Abstract: Company bankruptcy is often a severe problem, and its impact causes losses to company stakeholders such as owners, investors, employees, and consumers. One way to prevent bankruptcy is to predict its likelihood from the company's financial data. Therefore, this study aims to find the best predictive model or method for company bankruptcy using the Polish companies bankruptcy dataset. The prediction analysis process uses the best feature selection and ensemble learning. The best features are selected using XGBoost feature importance with a weight threshold of 10. The ensemble learning method used is stacking, which is composed of base models and a meta learner. The base models consist of K-nearest neighbors, decision tree, SVM, and random forest, while the meta learner used is LightGBM. The stacking model outperforms the accuracy of every base model, reaching 97% accuracy.
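
The pipeline described above maps naturally onto standard Python tooling. Below is a minimal, hedged sketch of that workflow using scikit-learn, xgboost, and lightgbm; the placeholder data, the reading of the "weight value filter of 10" as an XGBoost split-count threshold, and all hyper-parameters are assumptions rather than the authors' actual configuration.

```python
# Hedged sketch of the described pipeline: XGBoost feature-importance filtering
# followed by a stacking ensemble (KNN, decision tree, SVM, random forest as
# base models; LightGBM as meta learner). Data loading and the exact meaning
# of the "weight value filter of 10" are assumptions.
import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# X, y: financial ratios and bankruptcy labels (placeholder random data here).
X = np.random.rand(1000, 64)
y = np.random.randint(0, 2, 1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 1) Feature selection: keep features whose XGBoost "weight" importance
#    (number of times the feature is used in a split) exceeds the threshold.
selector = XGBClassifier(n_estimators=200, eval_metric="logloss").fit(X_train, y_train)
weights = selector.get_booster().get_score(importance_type="weight")
keep = [int(name[1:]) for name, w in weights.items() if w > 10]  # assumed threshold of 10

# 2) Stacking ensemble trained on the selected features only.
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("dt", DecisionTreeClassifier()),
        ("svm", SVC(probability=True)),
        ("rf", RandomForestClassifier(n_estimators=200)),
    ],
    final_estimator=LGBMClassifier(),
    cv=5,
)
stack.fit(X_train[:, keep], y_train)
print("test accuracy:", stack.score(X_test[:, keep], y_test))
```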


Citations
Journal ArticleDOI
TL;DR: In this article, a predictive model was constructed based on 30 variables that affect employee attrition, drawn from the 'IBM HR Analytics Employee Attrition & Performance' data, which consists of 1,470 records.
Abstract: Since human resources are a company's most important resource, employee attrition is an important agenda item from the company's point of view. However, employee attrition occurs for various reasons, and it is difficult for the HR manager or the leader of each department to detect these signs in advance. Employee attrition imposes considerable burdens and losses on the organization for reasons such as interruption of ongoing tasks, the cost of re-employment and retraining, and the risk of leaking core technologies and know-how. Therefore, in this study, we propose a model for predicting employee attrition so that measures for talent management, which in the past have been carried out ex post, can be taken in advance. A predictive model was constructed based on 30 variables that affect employee attrition from the 'IBM HR Analytics Employee Attrition & Performance' data, which consists of 1,470 records. To this end, a total of eight predictive models, including logistic regression, random forest, XGBoost, SVM, an artificial neural network model, and an ensemble model, were built and their performance was evaluated. In addition, when the impact of variables on employee attrition was analyzed, variables such as environmental satisfaction, overtime work, and relationship satisfaction were found to be the biggest contributors.
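
A hedged sketch of the general workflow this abstract describes, i.e. fitting several classifiers on the IBM HR Analytics attrition data and ranking variable importance, is given below; the file name, preprocessing, and model settings are illustrative assumptions, not the paper's code.

```python
# Hedged sketch: fit several classifiers on the IBM HR attrition data and
# inspect variable importance. File name, preprocessing, and model settings
# are assumptions, not the paper's code.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

df = pd.read_csv("ibm_hr_attrition.csv")               # assumed local copy of the dataset
y = (df["Attrition"] == "Yes").astype(int)              # binary target
X = pd.get_dummies(df.drop(columns=["Attrition"]))      # one-hot encode categorical variables

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "xgboost": XGBClassifier(eval_metric="logloss"),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {acc:.3f}")

# Rank variables by random-forest importance as a rough proxy for the paper's
# variable-impact analysis.
rf = models["random forest"].fit(X, y)
print(pd.Series(rf.feature_importances_, index=X.columns).nlargest(10))
```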

2 citations

Book ChapterDOI
TL;DR: In this paper, a novel framework based on the XGBoost machine learning model, tuned with a modified variant of the social network search algorithm, is proposed for intrusion detection in the Industry 4.0 domain.
Abstract: Industry 4.0 has flourished recently thanks to advances in a number of contemporary fields, such as artificial intelligence and the internet of things. It has significantly improved industrial processes and factory production by relying on communication between devices, production machines, and equipment. The biggest concern in this process is security, as each of the network-connected components is vulnerable to malicious attacks. Intrusion detection is therefore a key aspect and the largest challenge in the Industry 4.0 domain. To address this issue, a novel framework based on the XGBoost machine learning model, tuned with a modified variant of the social network search algorithm, is proposed. The introduced framework and algorithm have been evaluated on the challenging UNSW-NB 15 benchmark intrusion detection dataset, and the experimental findings were compared against the outcomes of other high-performing metaheuristics for the same problem. For comparison purposes, alongside the original version of social network search, the Harris hawks optimization algorithm, firefly algorithm, bat algorithm, and artificial bee colony were also adopted for XGBoost tuning and validated against the same internet of things security benchmark dataset. Experimental findings showed that the best-performing XGBoost model is the one tuned by the introduced modified social network search algorithm, outscoring the others on most of the performance indicators employed for evaluation.
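
As an illustration of the tuning idea only (not the authors' modified social network search algorithm), the sketch below uses a plain random search over XGBoost hyper-parameters on synthetic data standing in for UNSW-NB 15; a population-based metaheuristic would replace the sampling step with guided moves.

```python
# Hedged illustration of metaheuristic-style XGBoost tuning: a plain random
# search stands in for the modified social network search algorithm, and a
# synthetic dataset stands in for UNSW-NB 15.
import random
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

def sample_params():
    # Candidate solution: one point in the hyper-parameter search space.
    return {
        "n_estimators": random.choice([100, 200, 400]),
        "max_depth": random.randint(3, 10),
        "learning_rate": random.uniform(0.01, 0.3),
        "subsample": random.uniform(0.6, 1.0),
    }

best_score, best_params = -1.0, None
for _ in range(20):  # each iteration scores one candidate; a metaheuristic would evolve a population
    params = sample_params()
    score = cross_val_score(XGBClassifier(eval_metric="logloss", **params), X, y, cv=3).mean()
    if score > best_score:
        best_score, best_params = score, params

print(best_score, best_params)
```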

1 citation

Journal ArticleDOI
TL;DR: In this paper, a swarm intelligence-based approach to tuning machine learning models is proposed and tested on four real-world Industry 4.0 data sets, namely distributed transformer monitoring, elderly fall prediction, BoT-IoT, and UNSW-NB 15.
Abstract: The progress of Industrial Revolution 4.0 has been supported by recent advances in several domains, and one of the main contributors is the Internet of Things. Smart factories and healthcare have both benefited in terms of improved quality of service and productivity. However, there is always a trade-off, and some of the largest concerns are security, intrusion, and failure detection, due to the high dependence on Internet of Things devices. To overcome these and other challenges, artificial intelligence, especially machine learning algorithms, is employed for fault prediction, intrusion detection, computer-aided diagnostics, and so forth. However, the efficiency of machine learning models heavily depends on feature selection, predetermined hyper-parameter values, and training to deliver the desired result. This paper proposes a swarm intelligence-based approach to tuning machine learning models. A novel version of the firefly algorithm, which overcomes known deficiencies of the original method by employing a diversification-based mechanism, is proposed and applied to both feature selection and hyper-parameter optimization of two machine learning models: XGBoost and the extreme learning machine. The proposed approach has been tested on four real-world Industry 4.0 data sets, namely distributed transformer monitoring, elderly fall prediction, BoT-IoT, and UNSW-NB 15. The achieved results have been compared to the results of eight other cutting-edge metaheuristics, implemented and tested under the same conditions. The experimental outcomes strongly indicate that the proposed approach significantly outperforms all competitor metaheuristics in terms of convergence speed and result quality measured with standard metrics: accuracy, precision, recall, and f1-score.
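
The feature-selection side of such swarm approaches can be illustrated with binary masks scored by a wrapped model. The greatly simplified sketch below uses random bit flips in place of the paper's diversification-enhanced firefly moves, with XGBoost as the wrapped model and synthetic data; it is a stand-in, not a reimplementation.

```python
# Greatly simplified wrapper-style feature selection: binary masks are scored
# with an XGBoost model and the best mask is kept. The real firefly algorithm
# moves a population of candidate masks toward brighter (fitter) ones and adds
# a diversification step; here a single mask is mutated at random.
import numpy as np
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1500, n_features=40, n_informative=10, random_state=1)
rng = np.random.default_rng(1)

def fitness(mask):
    # Score a candidate feature subset by cross-validated accuracy.
    if not mask.any():
        return 0.0
    return cross_val_score(XGBClassifier(eval_metric="logloss"), X[:, mask], y, cv=3).mean()

best_mask = rng.random(X.shape[1]) < 0.5
best_fit = fitness(best_mask)
for _ in range(15):
    candidate = best_mask.copy()
    flip = rng.choice(X.shape[1], size=3, replace=False)  # mutate a few bits
    candidate[flip] = ~candidate[flip]
    f = fitness(candidate)
    if f > best_fit:
        best_mask, best_fit = candidate, f

print("selected features:", np.where(best_mask)[0], "cv accuracy:", round(best_fit, 3))
```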

1 citation

Journal ArticleDOI
TL;DR: In this article, an optimized version of the rough-granular approach (RGA) is proposed to improve classification efficiency on bankruptcy data by generating new minority-class samples in specific areas of the feature space, taking additional difficulty factors into consideration.

1 citation

References
Proceedings ArticleDOI
13 Aug 2016
TL;DR: XGBoost, as discussed by the authors, introduces a sparsity-aware algorithm for sparse data and a weighted quantile sketch for approximate tree learning, achieving state-of-the-art results on many machine learning challenges.
Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system. By combining these insights, XGBoost scales beyond billions of examples using far fewer resources than existing systems.
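
A minimal sketch of the sparsity point: XGBoost accepts SciPy sparse matrices directly and treats absent entries as missing values, which is where the sparsity-aware split finding applies. The data below is synthetic, and the parameters are illustrative only.

```python
# Minimal sketch: train XGBoost directly on a sparse matrix, where its
# sparsity-aware split finding handles the missing (zero) entries.
import numpy as np
import scipy.sparse as sp
import xgboost as xgb

rng = np.random.default_rng(0)
X = sp.random(5000, 200, density=0.05, format="csr", random_state=0)  # mostly-empty features
y = rng.integers(0, 2, 5000)

dtrain = xgb.DMatrix(X, label=y)  # sparse zeros are treated as missing by default
booster = xgb.train({"objective": "binary:logistic", "max_depth": 6}, dtrain, num_boost_round=50)
print(booster.predict(dtrain)[:5])
```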

14,872 citations

Journal ArticleDOI
TL;DR: The conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate.

5,834 citations


"Company bankruptcy prediction frame..." refers methods in this paper

  • ...Stacking ensemble modeling: The stacking ensemble introduced by Wolpert [51], then formalized by Breiman [52] and theoretically validated by Van der Laan et al....


  • ...The stacking ensemble introduced by Wolpert [51], then formalized by Breiman [52] and theoretically validated by Van der Laan et al. [53], is one of the learning algorithms known as a super learner framework, based on minimizing generalization loss....


Yoav Freund, Robert E. Schapire
01 Jan 1999
TL;DR: This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines.
Abstract: Boosting is a general method for improving the accuracy of any given learning algorithm. This short overview paper introduces the boosting algorithm AdaBoost, and explains the underlying theory of boosting, including an explanation of why boosting often does not suffer from overfitting as well as boosting’s relationship to support-vector machines. Some examples of recent applications of boosting are also described.
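
A minimal sketch of the boosting idea described above, using scikit-learn's AdaBoostClassifier with depth-1 decision trees as the weak learners; the data is synthetic, and the estimator= keyword assumes scikit-learn 1.2 or newer.

```python
# Minimal AdaBoost sketch: combine many weak learners (depth-1 trees) into a
# strong classifier and compare against a single weak learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
stump = DecisionTreeClassifier(max_depth=1)  # a weak learner
ada = AdaBoostClassifier(estimator=stump, n_estimators=200, random_state=0)  # older sklearn: base_estimator=

print("weak learner:", cross_val_score(stump, X, y, cv=5).mean())
print("AdaBoost:    ", cross_val_score(ada, X, y, cv=5).mean())
```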

3,212 citations


"Company bankruptcy prediction frame..." refers background in this paper

  • ...Boosted models can produce good accuracy even though the base classifier has only slightly better accuracy than random classification, so that the base classifier is considered a weak learner [50]....


Journal ArticleDOI
TL;DR: The results demonstrate that the accuracy and generalization performance of SVM are better than those of BPN as the training set size gets smaller, and several superior points of the SVM algorithm compared with BPN are investigated.
Abstract: This study investigates the efficacy of applying support vector machines (SVM) to the bankruptcy prediction problem. Although it is a well-known fact that the back-propagation neural network (BPN) performs well in pattern recognition tasks, the method has some limitations in that finding an appropriate model structure and optimal solution is something of an art. Furthermore, as much of the training set as possible must be loaded into the network in order to search for its weights. On the other hand, since SVM captures geometric characteristics of the feature space without deriving network weights from the training data, it is capable of extracting the optimal solution with a small training set. In this study, we show that the proposed SVM classifier outperforms BPN on the problem of corporate bankruptcy prediction. The results demonstrate that the accuracy and generalization performance of SVM are better than those of BPN as the training set size gets smaller. We also examine the variability in performance with respect to various parameter values in SVM. In addition, we investigate and summarize several points on which the SVM algorithm is superior to BPN.
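
A hedged sketch of the kind of comparison this abstract describes: an RBF-kernel SVM versus a back-propagation network (scikit-learn's MLPClassifier as a stand-in for BPN) evaluated at progressively larger training-set sizes, on synthetic data rather than the study's financial ratios.

```python
# Hedged sketch: compare SVM and a back-propagation-style network as the
# training set shrinks; synthetic data stands in for the bankruptcy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for n in (100, 300, 1000):  # progressively larger training sets
    svm = SVC(kernel="rbf").fit(X_train[:n], y_train[:n])
    bpn = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000).fit(X_train[:n], y_train[:n])
    print(f"n={n}: SVM={svm.score(X_test, y_test):.3f}  BPN={bpn.score(X_test, y_test):.3f}")
```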

728 citations

Journal ArticleDOI
01 Jun 1994
TL;DR: The study indicates that neural networks perform significantly better than discriminant analysis at predicting firm bankruptcies, and implications for the accounting professional, neural networks researcher and decision support system builders are highlighted.
Abstract: Prediction of firm bankruptcies has been extensively studied in accounting, as all stakeholders in a firm have a vested interest in monitoring its financial performance. This paper presents an exploratory study which compares the predictive capabilities of neural networks and classical multivariate discriminant analysis for firm bankruptcy. The predictive accuracy of the two techniques is presented within a comprehensive, statistically sound framework, indicating the value added to the forecasting problem by each technique. The study indicates that neural networks perform significantly better than discriminant analysis at predicting firm bankruptcies. Implications of our results for the accounting professional, the neural networks researcher, and decision support system builders are highlighted.

717 citations


"Company bankruptcy prediction frame..." refers background in this paper

  • ...The combination of SVM and ANN integrated with dropout and an auto-encoder proved to produce better accuracy than logistic regression, genetic algorithm, and inductive learning [39]....


  • ...This concept resulted in significantly better performance than the ANN and the weak learners, as measured by AUC [43]....


  • ...Nowadays, machine learning techniques [6] and artificial intelligence [7] computation have been widely used by researchers to solve bankruptcy prediction problems, such as support vector machines (SVM) [8]-[16], decision trees [17]-[23], artificial neural networks (ANN) [24]-[31], and studies surveyed with the systematic literature review technique [32]-[37]....


  • ...A hybrid approach based on the synthetic minority over-sampling technique (SMOTE) combined with ensemble learning methods, i.e. boosting, bagging, Naive Bayes, ANN, random forest, rotation forest, and diverse ensemble creation by oppositional relabeling of artificial training examples (DECORATE), is proven to efficiently improve performance parameters such as accuracy, AUC, type 1 and type 2 errors, and G-mean on the collected data set of Spanish companies [40]....


  • ...Reducing the class imbalance of bankruptcy data sets using over-sampling or SMOTE techniques, then using ANN as a predictive model....
