Book ChapterDOI

Prediction of Employee Turnover Using Ensemble Learning

01 Jan 2019, pp. 319–327
TL;DR: This work applies ensemble learning, rather than a single classifier algorithm, to find the key features of voluntary employee turnover so that it can be addressed well in advance.
Abstract: Employee turnover is becoming a major problem in IT organizations, telecommunications, and many other industries. Why employees leave the organization is the question arising among many HR managers. Employees are the most important assets of an organization, and hiring new employees always takes more effort and cost than retaining existing ones. This paper focuses on finding the key features of voluntary employee turnover and how it can be addressed well before time. The problem is to predict whether an employee will leave or stay based on some metrics. The proposed work applies ensemble learning to solve the problem, rather than focusing on a single classifier algorithm. Each classification model is assigned a weight based on its individual predicted accuracy. The ensemble model calculates the weighted average of the probabilities from the individual classifiers, and based on this weighted average, an employee can be classified. Accurate prediction will help organizations take the necessary steps toward controlling attrition.
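The weighted-average scheme the abstract describes can be sketched in a few lines of pure Python. The classifier probabilities, validation accuracies, and the 0.5 decision threshold below are illustrative assumptions, not values from the paper:

```python
# Hypothetical sketch of the weighted-average ensemble described above.
# Scores and accuracies are made up for illustration.

def weighted_average_ensemble(probs, accuracies):
    """Combine per-classifier P(leave) estimates using weights
    proportional to each classifier's individual accuracy."""
    total = sum(accuracies)
    weights = [a / total for a in accuracies]
    return sum(w * p for w, p in zip(weights, probs))

# Three hypothetical classifiers' predicted P(employee leaves):
probs = [0.70, 0.40, 0.65]
# Their individual validation accuracies (assumed):
accuracies = [0.85, 0.75, 0.80]

p_leave = weighted_average_ensemble(probs, accuracies)
label = "leave" if p_leave >= 0.5 else "stay"   # illustrative threshold
```

Because the weights are normalized, the combined score stays a valid probability whenever the inputs are.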
Citations
Journal ArticleDOI
TL;DR: In this article, the authors proposed a people analytics approach to predict employee attrition that shifts from a big data to a deep data context by focusing on data quality instead of its quantity, and this deep data-driven approach is based on a mixed method to construct a relevant employee attrition model in order to identify key employee features influencing his/her attrition.
Abstract: In the era of data science and big data analytics, people analytics help organizations and their human resources (HR) managers reduce attrition by changing the way of attracting and retaining talent. In this context, employee attrition presents a critical problem and a big risk for organizations, as it affects not only their productivity but also their planning continuity. The salient contributions of this research are as follows. Firstly, we propose a people analytics approach to predict employee attrition that shifts from a big data to a deep data context by focusing on data quality instead of quantity. This deep data-driven approach is based on a mixed method to construct a relevant employee attrition model and to identify the key employee features influencing attrition. In this method, we started thinking ‘big’ by collecting most of the common features from the literature (exploratory research), then thinking ‘deep’ by filtering and selecting the most important features using a survey and feature selection algorithms (a quantitative method). Secondly, this attrition prediction approach is based on machine, deep, and ensemble learning models and is evaluated on large-sized and medium-sized simulated human resources datasets, and then on a real small-sized dataset from a total of 450 responses. Our approach achieves higher accuracy (0.96, 0.98, and 0.99, respectively) for the three datasets when compared to previous solutions. Finally, while rewards and payments are generally considered the most important keys to retention, our findings indicate that ‘business travel’, which is less common in the literature, is the leading motivator for employees and must be considered within HR retention policies.

22 citations

Journal ArticleDOI
03 Jul 2021
TL;DR: In this article, the authors developed an intelligent system that can accurately predict the likelihood of employee turnover, which can cause severe consequences for a company that are hard to reverse.
Abstract: Employee turnover (ET) can cause severe consequences for a company that are hard to repair or rebuild. It is thus crucial to develop an intelligent system that can accurately predict the like...

12 citations

Journal ArticleDOI
TL;DR: In this article, a data mining based employee turnover predictor is developed in which an ORACLE ERP dataset was used for sample training to predict employee turnover with much higher accuracy.
Abstract: Employee turnover is an important issue in present-day organizations. In this paper, a data mining based employee turnover predictor is developed in which an ORACLE ERP dataset was used for sample training to predict employee turnover with much higher accuracy. This paper deploys impactful algorithms and methodologies for the accurate prediction of employee turnover taking place in any organization. First of all, preprocessing is done as a precautionary step before proceeding with the core part of the proposed work. A new Intensive Optimized PCA (Principal Component Analysis) is used for feature selection, and a Random Forest Classifier (RFC) is deployed for classification, to make the prediction more feasible. The main objective of this work is to use the Random Forest Classification methodology to break down the fundamental reasons behind worker turnover, making use of the data mining technique referred to as Intensive Optimized PCA for feature selection. A comparative study of the proposed work against existing methods is made to show its efficiency. The proposed method was found to perform better, with improved ROC, accuracy, precision, recall, and F1 score, when compared to other existing methodologies.

2 citations

Book ChapterDOI
10 Jun 2022
TL;DR: In this paper, a stacked classifier algorithm is used to predict whether or not an employee will leave based on specific indications, and a new model is constructed based on the ensemble technique used for prediction.
Abstract: Employee attrition is becoming a big issue in businesses. The question that many HR managers are asking is why workers quit the company. Hiring new staff, rather than retaining existing personnel, will always take more time and money. The project’s objective is to anticipate employee attrition before the employee departs the organization. The problem is predicting whether or not an employee will leave based on specific indications. Instead of focusing on a single classifier technique, the proposed study tackles the problem using a stacked classifier algorithm. The ensemble model computes the prediction of each classifier, and a new model is constructed on top of these predictions. Accurate forecasting will aid businesses in taking the required actions to reduce attrition.
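Stacking, as this abstract describes it, feeds the base classifiers' outputs into a second-level meta-model. A minimal pure-Python sketch, assuming a tiny logistic-regression meta-model trained by gradient descent on made-up base-classifier scores (none of the numbers come from the chapter):

```python
# Illustrative stacking sketch: base-classifier probabilities become input
# features for a meta-model. All data here is fabricated for demonstration.
import math

def sigmoid(z):
    z = max(-30.0, min(30.0, z))   # clamp to avoid overflow
    return 1.0 / (1.0 + math.exp(-z))

def train_meta(base_probs, labels, lr=0.5, epochs=500):
    """Fit logistic-regression meta-model weights on base-model outputs."""
    w = [0.0] * len(base_probs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(base_probs, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict_meta(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Each row: [P(leave) from classifier A, P(leave) from classifier B]; label 1 = left.
train_X = [[0.9, 0.8], [0.8, 0.7], [0.2, 0.3], [0.1, 0.2]]
train_y = [1, 1, 0, 0]
w, b = train_meta(train_X, train_y)
p_high = predict_meta(w, b, [0.85, 0.75])  # base models agree: likely leaver
p_low = predict_meta(w, b, [0.15, 0.25])   # base models agree: likely stayer
```

Unlike fixed weighted averaging, the meta-model learns how much to trust each base classifier from data.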
References
01 Jan 2007
TL;DR: Random forests, which add an additional layer of randomness to bagging and are robust against overfitting, are proposed; the randomForest package provides an R interface to the Fortran programs by Breiman and Cutler.
Abstract: Recently there has been a lot of interest in “ensemble learning” — methods that generate many classifiers and aggregate their results. Two well-known methods are boosting (see, e.g., Schapire et al., 1998) and bagging (Breiman, 1996) of classification trees. In boosting, successive trees give extra weight to points incorrectly predicted by earlier predictors. In the end, a weighted vote is taken for prediction. In bagging, successive trees do not depend on earlier trees — each is independently constructed using a bootstrap sample of the data set. In the end, a simple majority vote is taken for prediction. Breiman (2001) proposed random forests, which add an additional layer of randomness to bagging. In addition to constructing each tree using a different bootstrap sample of the data, random forests change how the classification or regression trees are constructed. In standard trees, each node is split using the best split among all variables. In a random forest, each node is split using the best among a subset of predictors randomly chosen at that node. This somewhat counterintuitive strategy turns out to perform very well compared to many other classifiers, including discriminant analysis, support vector machines and neural networks, and is robust against overfitting (Breiman, 2001). In addition, it is very user-friendly in the sense that it has only two parameters (the number of variables in the random subset at each node and the number of trees in the forest), and is usually not very sensitive to their values. The randomForest package provides an R interface to the Fortran programs by Breiman and Cutler (available at http://www.stat.berkeley.edu/users/breiman/). This article provides a brief introduction to the usage and features of the R functions.
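The two sources of randomness described above — a bootstrap sample per tree, and a random feature subset per node — can be isolated in a short Python sketch. The helpers and values below are illustrative, not the randomForest package API:

```python
# Sketch of the two randomization steps in a random forest, as described above.
import random

def bootstrap_sample(rows, rng):
    """Each tree in bagging/random forests is grown on a bootstrap sample:
    n rows drawn from the data WITH replacement."""
    return [rng.choice(rows) for _ in rows]

def candidate_features(n_features, mtry, rng):
    """A standard tree considers every feature at each split; a random forest
    considers only a random subset of size mtry (classically about
    sqrt(n_features) for classification)."""
    return rng.sample(range(n_features), mtry)

rng = random.Random(0)
rows = list(range(100))                  # stand-ins for 100 training rows
sample = bootstrap_sample(rows, rng)     # same size, duplicates allowed
subset = candidate_features(16, 4, rng)  # e.g. 4 of 16 predictors at one node
```

These correspond to the package's two main parameters: the per-node subset size (`mtry`) and the number of trees.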

14,830 citations

Posted Content
TL;DR: It is shown that more efficient sampling designs exist for making valid inferences, such as sampling all available events and a tiny fraction of nonevents, which enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables.
Abstract: We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros ("nonevents"). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.

3,170 citations

Journal ArticleDOI
TL;DR: The authors study rare events data, binary dependent variables with dozens to thousands of times fewer events than zeros (nonevents) and recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature.
Abstract: We study rare events data, binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as in international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
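One simple correction in this line of work is the prior (intercept) correction for choice-based sampling: fit logistic regression on a sample containing all events and a fraction of nonevents, then shift the fitted intercept back to the true population event rate. A sketch with made-up numbers, assuming the standard King–Zeng form of the correction:

```python
# Illustrative prior correction for rare-events logistic regression under
# choice-based sampling. The numbers are fabricated for demonstration.
import math

def corrected_intercept(beta0_sample, tau, ybar):
    """Shift the sample-fitted intercept back to the population scale:
        beta0 = beta0_sample - ln[ ((1 - tau) / tau) * (ybar / (1 - ybar)) ]
    tau  = true population event fraction
    ybar = event fraction in the (oversampled) estimation sample"""
    return beta0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Suppose 1% of employees actually leave, but the model was fit on a
# deliberately balanced 50/50 leaver/stayer sample:
beta0 = corrected_intercept(beta0_sample=0.2, tau=0.01, ybar=0.5)
# The correction pulls the intercept down, lowering every predicted probability.
```

Under this sampling design the slope coefficients are consistent as-is; only the intercept needs the shift, which is what makes the "sample all events, few nonevents" design so cheap.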

2,962 citations

Journal Article
TL;DR: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features, and the resulting classifier is then used to assign class labels to the testing instances where the values of the predictor features are known, but the value of the class label is unknown.
Abstract: The goal of supervised learning is to build a concise model of the distribution of class labels in terms of predictor features. The resulting classifier is then used to assign class labels to testing instances where the values of the predictor features are known, but the value of the class label is unknown. This paper describes various supervised machine learning classification techniques. Of course, a single chapter cannot be a complete review of all supervised machine learning classification algorithms (also known as induction classification algorithms), yet we hope that the references cited will cover the major theoretical issues, guiding the researcher in interesting research directions and suggesting possible bias combinations that have yet to be explored.

2,535 citations

Journal ArticleDOI
TL;DR: In this paper, the authors used meta-analytic techniques to review studies of employee turnover and found that almost all of the 26 variables studied relate to turnover, and that study variables such as population, nationality, and industry moderate many of these relationships.
Abstract: Studies of employee turnover are reviewed using meta-analytic techniques. The findings indicate that almost all of the 26 variables studied relate to turnover. The findings also indicate that study variables including population, nationality, and industry moderate relationships between many of the variables and turnover. It is suggested that future research on employee turnover: (1) report study variables, (2) continue model testing rather than simply correlating variables with turnover, and (3) incorporate study variables into future models.

1,692 citations