Predicting employee attrition for thousands of companies

About our customer

The client is based in the United States and offers cloud-based software as a service that companies use to manage their human resources.


Insurance, SaaS, HR

Use Case

HR, Employee Retention

The Challenge

1) The factors affecting attrition include both internal factors, such as the attrition rate within a company, and external factors, such as competition in the industry, which can be very difficult to measure.

2) Many factors vary with the size and location of a company as well as with an employee's position in it, which makes a generalized solution very challenging to build.

3) Little or no recorded data exists for some of the most important predictors of employee attrition.

4) The right variables, each with a satisfactory data size, had to be identified and collected from a list of more than 50 variables that we initially thought could be good predictors.

5) The data required a lot of treatment before it could be used.

The Approach

After multiple discussions with the client, we built hypotheses about the predictors that could be important in determining employee attrition risk. Initially we had a list of more than 50 variables that we thought could explain employee termination. We collected the available data and identified the variables with sufficient size and quality to be used. We then tested our hypotheses against the data. Some of them led to data treatment rules: for example, we removed executives, on the assumption that companies would be more interested in identifying the attrition risk of other employees and because executives' reasons for termination would differ from those of other employees. Similarly, we removed employees with salaries below the minimum wage mandated by the US federal government under US labor law. We also merged salary changes that occurred within a very short period, because they could be the result of the negotiations that happen during a single salary hike.
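The treatment rules described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the record fields (`is_executive`, `hourly_wage`), the 30-day merge window, and the function names are all assumptions made for the example, and the wage threshold uses the US federal minimum of 7.25 USD per hour.

```python
from datetime import date, timedelta

# Assumption: US federal minimum hourly wage; adjust to the applicable rule.
MIN_HOURLY_WAGE = 7.25
# Assumption: salary changes within 30 days of the previous one are merged.
MERGE_WINDOW = timedelta(days=30)


def filter_employees(employees):
    """Drop executives and records with an hourly wage below the minimum."""
    return [
        e for e in employees
        if not e["is_executive"] and e["hourly_wage"] >= MIN_HOURLY_WAGE
    ]


def merge_salary_changes(changes, window=MERGE_WINDOW):
    """Collapse salary changes that closely follow the previous change.

    `changes` is a list of (date, new_salary) tuples. Within each cluster
    the earliest date and the latest salary are kept.
    """
    merged = []
    prev_date = None
    for changed_on, salary in sorted(changes):
        if merged and changed_on - prev_date <= window:
            # Same negotiation: keep the cluster's start date, update salary.
            merged[-1] = (merged[-1][0], salary)
        else:
            merged.append((changed_on, salary))
        prev_date = changed_on
    return merged
```

Keeping the earliest date and the final salary of each cluster treats a negotiated raise as a single event; other conventions (for example, keeping the last date) would be equally defensible.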

After the appropriate data treatment and hypothesis testing, we built a machine learning model on a reduced number of predictors. To visualize how decisions are made, we built a surrogate model over the predictions of our original model. We also built partial dependence plots for the important predictors to explain how they impact employee attrition. Finally, we shared the model performance metrics and a risk score distribution plot for terminated and non-terminated employees with the client to show how well the model separates the two groups.
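To make the idea behind a partial dependence plot concrete, the sketch below computes the underlying curve by hand: for each candidate value of one predictor, that value is forced onto every row and the model's predictions are averaged. The `toy_risk_model` function and the feature names `tenure_years` and `months_since_raise` are hypothetical stand-ins; the real model and its predictors are not described in this case study.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the input rows stay untouched
            modified[feature] = value  # override only the feature of interest
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


# Hypothetical stand-in for a trained model: risk falls with tenure and
# rises with the time since the last raise, clamped to [0, 1].
def toy_risk_model(row):
    risk = 0.6 - 0.05 * row["tenure_years"] + 0.2 * row["months_since_raise"] / 12
    return min(1.0, max(0.0, risk))


rows = [
    {"tenure_years": 1, "months_since_raise": 6},
    {"tenure_years": 4, "months_since_raise": 24},
    {"tenure_years": 8, "months_since_raise": 12},
]
curve = partial_dependence(toy_risk_model, rows, "tenure_years", [0, 2, 4, 6])
for value, avg_risk in curve:
    print(f"tenure_years={value}: average risk {avg_risk:.3f}")
```

Plotting the resulting (value, average risk) pairs gives the partial dependence plot for that predictor. In this toy setup the curve slopes downward, which reflects only how the stand-in model was written.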

The Outcome

The client is able to provide employee attrition risk scores to their users as another feature of the product.

Capabilities Enabled

We identified the factors that could be indicative of employee attrition risk. We also developed a machine learning model that provides companies with an employee attrition risk score and enables them to take action to retain their high-performing employees.

Impact Created

We achieved a Gini score of 0.54. The companies using the product are now able to identify and retain their high-performing employees by acting on the attrition factors and risk scores.
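The case study does not say how the Gini score was computed. A common convention for classifiers, assumed here, is Gini = 2 × AUC − 1, where AUC is the probability that a randomly chosen terminated employee receives a higher risk score than a randomly chosen non-terminated one. Under that assumption, a Gini of 0.54 would correspond to an AUC of 0.77. The function names and sample data below are illustrative only.

```python
def auc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. `labels` holds 1 (terminated) or 0 (retained).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini score under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


# Illustrative data: three of the four positive/negative pairs are ranked correctly.
scores = [0.9, 0.8, 0.3, 0.2]
labels = [1, 0, 1, 0]
print(gini(scores, labels))  # 0.5
```

The pairwise loop is quadratic in the number of employees, which is fine for illustration; a rank-based computation would be preferable on large datasets.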