Book ChapterDOI

Prediction of Employee Turnover Using Ensemble Learning

01 Jan 2019, pp. 319–327
TL;DR: This work applies ensemble learning, rather than a single classifier algorithm, to find the key features of voluntary employee turnover so that it can be addressed well in advance.
Abstract: Employee turnover is becoming a major problem in IT organizations, telecommunications, and many other industries. Why employees leave the organization is the question arising among many HR managers. Employees are the most important assets of an organization, and hiring new employees always takes more effort and cost than retaining existing ones. This paper focuses on finding the key features of voluntary employee turnover and how it can be addressed well before time. The problem is to predict whether an employee will leave or stay based on some metrics. The proposed work applies ensemble learning to solve the problem, rather than focusing on a single classifier algorithm. Each classification model is assigned a weight based on its individual predicted accuracy. The ensemble model calculates the weighted average of the probabilities from the individual classifiers, and based on this weighted average, an employee can be classified. Accurate prediction will help organizations take the necessary steps toward controlling attrition.
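The weighted-average scheme the abstract describes can be sketched in a few lines of pure Python. The classifier probabilities, validation accuracies, and the 0.5 decision threshold below are illustrative assumptions, not values from the paper:

```python
# Hypothetical sketch of the weighted-average ensemble described above.
# Scores and accuracies are made up for illustration.

def weighted_average_ensemble(probs, accuracies):
    """Combine per-classifier P(leave) estimates using weights
    proportional to each classifier's individual accuracy."""
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    return sum(w * p for w, p in zip(weights, probs))

# Three hypothetical classifiers' predicted P(employee leaves):
probs = [0.70, 0.40, 0.65]
# Their individual validation accuracies (assumed):
accuracies = [0.85, 0.75, 0.80]

p_leave = weighted_average_ensemble(probs, accuracies)
label = "leave" if p_leave >= 0.5 else "stay"   # illustrative threshold
```

Because the weights are normalized, the combined score stays a valid probability whenever the inputs are.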
Citations
Journal ArticleDOI
TL;DR: In this article, the authors proposed a people analytics approach to predict employee attrition that shifts from a big data to a deep data context by focusing on data quality instead of its quantity, and this deep data-driven approach is based on a mixed method to construct a relevant employee attrition model in order to identify key employee features influencing his/her attrition.
Abstract: In the era of data science and big data analytics, people analytics help organizations and their human resources (HR) managers reduce attrition by changing the way of attracting and retaining talent. In this context, employee attrition presents a critical problem and a big risk for organizations, as it affects not only their productivity but also their planning continuity. The salient contributions of this research are as follows. Firstly, we propose a people analytics approach to predict employee attrition that shifts from a big data to a deep data context by focusing on data quality instead of quantity. This deep data-driven approach is based on a mixed method to construct a relevant employee attrition model and to identify the key employee features influencing attrition. In this method, we started thinking ‘big’ by collecting most of the common features from the literature (exploratory research), then thinking ‘deep’ by filtering and selecting the most important features using a survey and feature selection algorithms (a quantitative method). Secondly, this attrition prediction approach is based on machine, deep, and ensemble learning models and is evaluated on large-sized and medium-sized simulated human resources datasets, and then on a real small-sized dataset from a total of 450 responses. Our approach achieves higher accuracy (0.96, 0.98, and 0.99, respectively) for the three datasets when compared to previous solutions. Finally, while rewards and payments are generally considered the most important keys to retention, our findings indicate that ‘business travel’, which is less common in the literature, is the leading motivator for employees and must be considered within HR retention policies.

22 citations

Journal ArticleDOI
03 Jul 2021
TL;DR: In this article, the authors developed an intelligent system that can accurately predict the likelihood of employee turnover, which can cause severe consequences for a company that are hard to reverse.
Abstract: Employee turnover (ET) can cause severe consequences for a company that are hard to repair or rebuild. It is thus crucial to develop an intelligent system that can accurately predict the like...

12 citations

Journal ArticleDOI
TL;DR: In this article, a data mining based employee turnover predictor is developed in which an ORACLE ERP dataset was used for sample training to predict employee turnover with much higher accuracy.
Abstract: Employee turnover is an important issue in present-day organizations. In this paper, a data mining based employee turnover predictor is developed in which an ORACLE ERP dataset was used for sample training to predict employee turnover with much higher accuracy. This paper deploys impactful algorithms and methodologies for the accurate prediction of employee turnover taking place in any organization. First of all, preprocessing is done as a precautionary step before proceeding with the core part of the proposed work. A new Intensive Optimized PCA (Principal Component Analysis) is used for feature selection, and a Random Forest Classifier (RFC) is deployed for classification, to make the prediction more feasible. The main objective of this work is to use the Random Forest Classification methodology to break down the fundamental reasons behind worker turnover, making use of the data mining technique referred to as Intensive Optimized PCA for feature selection. A comparative study of the proposed work against existing methods is made to show its efficiency. The proposed method was found to perform better, with improved ROC, accuracy, precision, recall, and F1 score, when compared to other existing methodologies.

2 citations

Book ChapterDOI
10 Jun 2022
TL;DR: In this paper, a stacked classifier algorithm is used to predict whether or not an employee will leave based on specific indications, and a new model is constructed based on the ensemble technique used for prediction.
Abstract: Employee attrition is becoming a big issue in businesses. The question that many HR managers are asking is why workers quit the company. Hiring new staff, rather than retaining existing personnel, will always take more time and money. The project’s objective is to anticipate employee attrition before the employee departs the organization. The problem is predicting whether or not an employee will leave based on specific indications. Instead of focusing on a single classifier technique, the proposed study tackles the problem using a stacked classifier algorithm. The ensemble model computes the prediction of each classifier, and a new model is constructed on top of these predictions. Accurate forecasting will aid businesses in taking the required actions to reduce attrition.
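Stacking, as this abstract describes it, feeds the base classifiers' outputs into a second-level meta-model. A minimal pure-Python sketch, assuming a tiny logistic-regression meta-model trained by gradient descent on made-up base-classifier scores (none of the numbers come from the chapter):

```python
# Illustrative stacking sketch: base-classifier probabilities become input
# features for a meta-model. All data here is fabricated for demonstration.
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))   # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_meta(base_probs, labels, lr=0.5, epochs=500):
    """Fit logistic-regression meta-model weights on base-model outputs."""
    w = [0.0] * len(base_probs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_meta(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Each row: [P(leave) from classifier A, P(leave) from classifier B]; label 1 = left.
train_X = [[0.9, 0.8], [0.8, 0.7], [0.2, 0.3], [0.1, 0.2]]
train_y = [1, 1, 0, 0]
w, b = train_meta(train_X, train_y)
p_high = predict_meta(w, b, [0.85, 0.75])  # base models agree: likely leaver
p_low = predict_meta(w, b, [0.15, 0.25])   # base models agree: likely stayer
```

Unlike fixed weighted averaging, the meta-model learns how much to trust each base classifier from data.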
References
01 Jan 2007
TL;DR: Random forests, which add an additional layer of randomness to bagging and are robust against overfitting, are proposed; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.
Abstract: Recently there has been a lot of interest in “ensemble learning” — methods that generate many classifiers and aggregate their results. Two well-known methods are boosting (see, e.g., Schapire et al., 1998) and bagging (Breiman, 1996) of classification trees. In boosting, successive trees give extra weight to points incorrectly predicted by earlier predictors. In the end, a weighted vote is taken for prediction. In bagging, successive trees do not depend on earlier trees — each is independently constructed using a bootstrap sample of the data set. In the end, a simple majority vote is taken for prediction. Breiman (2001) proposed random forests, which add an additional layer of randomness to bagging. In addition to constructing each tree using a different bootstrap sample of the data, random forests change how the classification or regression trees are constructed. In standard trees, each node is split using the best split among all variables. In a random forest, each node is split using the best among a subset of predictors randomly chosen at that node. This somewhat counterintuitive strategy turns out to perform very well compared to many other classifiers, including discriminant analysis, support vector machines and neural networks, and is robust against overfitting (Breiman, 2001). In addition, it is very user-friendly in the sense that it has only two parameters (the number of variables in the random subset at each node and the number of trees in the forest), and is usually not very sensitive to their values. The randomForest package provides an R interface to the Fortran programs by Breiman and Cutler (available at http://www.stat.berkeley.edu/users/breiman/). This article provides a brief introduction to the usage and features of the R functions.
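The two sources of randomness described above — a bootstrap sample per tree, and a random feature subset per node — can be isolated in a short Python sketch. The helpers and values below are illustrative, not the randomForest package API:

```python
# Sketch of the two randomization steps in a random forest, as described above.
import random

def bootstrap_sample(rows, rng):
    """Each tree in bagging/random forests is grown on a bootstrap sample:
    n rows drawn from the data WITH replacement."""
    return [rng.choice(rows) for _ in rows]

def candidate_features(n_features, mtry, rng):
    """A standard tree considers every feature at each split; a random forest
    considers only a random subset of size mtry (classically about
    sqrt(n_features) for classification)."""
    return rng.sample(range(n_features), mtry)

rng = random.Random(0)
rows = list(range(100))                  # stand-ins for 100 training rows
sample = bootstrap_sample(rows, rng)     # same size, duplicates allowed
subset = candidate_features(16, 4, rng)  # e.g. 4 of 16 predictors at one node
```

These correspond to the package's two main parameters: the per-node subset size (`mtry`) and the number of trees.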

14,830 citations

Posted Content
TL;DR: It is shown that more efficient sampling designs exist for making valid inferences, such as sampling all available events and a tiny fraction of nonevents, which enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables.
Abstract: We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.

3,170 citations

Journal ArticleDOI
TL;DR: The authors study rare events data, binary dependent variables with dozens to thousands of times fewer events than zeros (nonevents) and recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature.
Abstract: We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
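One simple correction in this line of work is the prior (intercept) correction for choice-based sampling: fit logistic regression on a sample containing all events and a fraction of nonevents, then shift the fitted intercept back to the true population event rate. A sketch with made-up numbers, assuming the standard King–Zeng form of the correction:

```python
# Illustrative prior correction for rare-events logistic regression under
# choice-based sampling. The numbers are fabricated for demonstration.
import math

def corrected_intercept(beta0_sample, tau, ybar):
    """Shift the sample-fitted intercept back to the population scale:
        beta0 = beta0_sample - ln[ ((1 - tau) / tau) * (ybar / (1 - ybar)) ]
    tau  = true population event fraction
    ybar = event fraction in the (oversampled) estimation sample"""
    return beta0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Suppose 1% of employees actually leave, but the model was fit on a
# deliberately balanced 50/50 leaver/stayer sample:
beta0 = corrected_intercept(beta0_sample=0.2, tau=0.01, ybar=0.5)
# The correction pulls the intercept down, lowering every predicted probability.
```

Under this sampling design the slope coefficients are consistent as-is; only the intercept needs the shift, which is what makes the "sample all events, few nonevents" design so cheap.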

2,962 citations

Journal Article
TL;DR: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features, and the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown.
Abstract: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques. Of course, a single chapter cannot be a complete review of all supervised machine learning classification algorithms (also known as induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored.

2,535 citations

Journal ArticleDOI
TL;DR: In this paper, the authors used meta-analytic techniques to review studies of employee turnover and found that almost all of the 26 variables studied relate to turnover, and that study variables such as population, nationality, and industry moderate many of these relationships.
Abstract: Studies of employee turnover are reviewed using meta-analytic techniques. The findings indicate that almost all of the 26 variables studied relate to turnover. The findings also indicate that study variables including population, nationality, and industry moderate relationships between many of the variables and turnover. It is suggested that future research on employee turnover: (1) report study variables, (2) continue model testing rather than simply correlating variables with turnover, and (3) incorporate study variables into future models.

1,692 citations