Skip to content

To understand what factors contributed most to employee turnover and predict the likelihood of a certain employee leaving the company

Notifications You must be signed in to change notification settings

lawilap/HRAnalytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Understanding and Predicting Employee Turnover

HR Analytics

Objective:

  • To understand what factors contributed most to employee turnover.

  • To perform clustering to find any meaningful patterns of employee traits.

  • To create a model that predicts the likelihood if a certain employee will leave the company or not.

  • To create or improve different retention strategies on targeted employees.

The implementation of this model will allow management to create better decision-making actions.

The Problem:

One of the most common problems at work is turnover.

Replacing a worker earning about 50,000 dollars cost the company about 10,000 dollars or 20% of that worker’s yearly income according to the Center of American Progress.

Replacing a high-level employee can cost multiple of that...

Cost include:

  • Cost of off-boarding
  • Cost of hiring (advertising, interviewing, hiring)
  • Cost of onboarding a new person (training, management time)
  • Lost productivity (a new person may take 1-2 years to reach the productivity of an existing person)

Annual Cost of Turnover = (Hiring + Onboarding + Development + Unfilled Time) * (# Employees x Annual Turnover Percentage)

Annual Cost of Turnover = (1,000 + 500) x (15,000 * 24%)

Annual Cost of Turnover) = 1500 x 3600

Annual Cost of Turnover) = 5400000

Example

  1. Jobs (earning under 30k a year): the cost to replace a 10/hour retail employee would be 3,328 dollars.
  2. Jobs (earning 30k-50k a year) - the cost to replace a 40k manager would be 8,000 dollars.
  3. Jobs of executives (earning 100k+ a year) - the cost to replace a 100k CEO is 213,000 dollars.

EDA

Satisfaction VS Evaluation

  • There are 3 distinct clusters for employees who left the company

Cluster 1 (Hard-working and Sad Employee): Satisfaction was below 0.2 and evaluations were greater than 0.75. Which could be a good indication that employees who left the company were good workers but felt horrible at their job.

  • Question: What could be the reason for feeling so horrible when you are highly evaluated? Could it be working too hard? Could this cluster mean employees who are "overworked"?

Cluster 2 (Bad and Sad Employee): Satisfaction between about 0.35~0.45 and evaluations below ~0.58. This could be seen as employees who were badly evaluated and felt bad at work.

  • Question: Could this cluster mean employees who "under-performed"?

Cluster 3 (Hard-working and Happy Employee): Satisfaction between 0.7~1.0 and evaluations were greater than 0.8. Which could mean that employees in this cluster were "ideal". They loved their work and were evaluated highly for their performance.

  • Question: Could this cluser mean that employees left because they found another job opportunity?

K-Means Clustering of Employee Turnover


Cluster 1 (Blue): Hard-working and Sad Employees

Cluster 2 (Red): Bad and Sad Employee

Cluster 3 (Green): Hard-working and Happy Employee

Employee Satisfaction

There is a tri-modal distribution for employees that turnovered

  • Employees who had really low satisfaction levels (0.2 or less) left the company more
  • Employees who had low satisfaction levels (0.3~0.5) left the company more
  • Employees who had really high satisfaction levels (0.7 or more) left the company more

Employee Project Count

Summary:

  • More than half of the employees with 2,6, and 7 projects left the company
  • Majority of the employees who did not leave the company had 3,4, and 5 projects
  • All of the employees with 7 projects left the company
  • There is an increase in employee turnover rate as project count increases

Average Monthly Hours

Summary:

  • A bi-modal distribution for employees that turnovered
  • Employees who had less hours of work (~150hours or less) left the company more
  • Employees who had too many hours of work (~250 or more) left the company
  • Employees who left generally were underworked or overworked.

Imbalanced Datasets

There are many ways of dealing with imbalanced data. I've focused on the following approaches:

  1. Oversampling — SMOTE
  2. Undersampling — RandomUnderSampler

Choose Which Sampling Technique to Use For Model

Apply 10-Fold Cross Validation for Logistic Regression

Train on Original, Upsampled, SMOTE, and Downsampled Data

Objective:Train our Logistic Regression Model to our original, upsampled, and downsampled data to see which performs best.

Resut:

  • Original Sample: F1 Score 45%
  • Upsample: F1 Score 77%
  • SMOTE: F1 Score 61%
  • Downsample: F1 Score 77%

Upsample

You randomly resample the minority class to create new data.

SMOTE

You use the nearest neighbors of the minority observations to create new synthetic data

Downsample

You remove some samples of the majority class

Retention Plan

Since this model is being used for people, we should refrain from soley relying on the output of our model. Instead, we can use it's probability output and design our own system to treat each employee accordingly.

  1. Safe Zone (Green) – Employees within this zone are considered safe.
  2. Low Risk Zone (Yellow) – Employees within this zone are too be taken into consideration of potential turnover. This is more of a long-term track.
  3. Medium Risk Zone (Orange) – Employees within this zone are at risk of turnover. Action should be taken and monitored accordingly.
  4. High Risk Zone (Red) – Employees within this zone are considered to have the highest chance of turnover. Action should be taken immediately.

Conclusion

Binary Classification: Turnover V.S. Non Turnover

Instance Scoring: Likelihood of employee responding to an offer/incentive to save them from leaving.

Need for Application: Save employees from leaving

In our employee retention problem, rather than simply predicting whether an employee will leave the company within a certain time frame, we would much rather have an estimate of the probability that he/she will leave the company. We would rank employees by their probability of leaving, then allocate a limited incentive budget to the highest probability instances.

Consider employee turnover domain where an employee is given treatment by Human Resources because they think the employee will leave the company within a month, but the employee actually does not. This is a false positive. This mistake could be expensive, inconvenient, and time consuming for both the Human Resources and employee, but is a good investment for relational growth.

Compare this with the opposite error, where Human Resources does not give treatment/incentives to the employees and they do leave. This is a false negative. This type of error is more detrimental because the company lost an employee, which could lead to great setbacks and more money to rehire. Depending on these errors, different costs are weighed based on the type of employee being treated. For example, if it’s a high-salary employee then would we need a costlier form of treatment? What if it’s a low-salary employee? The cost for each error is different and should be weighed accordingly.

Solution 1:

  • We can rank employees by their probability of leaving, then allocate a limited incentive budget to the highest probability instances.
  • OR, we can allocate our incentive budget to the instances with the highest expected loss, for which we'll need the probability of turnover.

Solution 2: Develop learning programs for managers. Then use analytics to gauge their performance and measure progress. Some advice:

  • Be a good coach
  • Empower the team and do not micromanage
  • Express interest for team member success
  • Have clear vision / strategy for team
  • Help team with career development

Selection Bias


  • One thing to note about this dataset is the turnover feature. We don't know if the employees that left are interns, contractors, full-time, or part-time. These are important variables to take into consideration when performing a machine learning algorithm to it.

  • Another thing to note down is the type of bias of the evaluation feature. Evaluation is heavily subjective, and can vary tremendously depending on who is the evaluator. If the employee knows the evaluator, then he/she will probably have a higher score.

About

To understand what factors contributed most to employee turnover and predict the likelihood of a certain employee leaving the company

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published