Predictive models have been in use since the advent of statistical tools. Both
marketing and financial institutions use them extensively. This article looks
under the hood at how a predictive model is created. I will explain it using
simple statistical tools such as trend analysis and univariate analysis, plus a
slightly more complex one called binomial logistic regression.
Let's assume a company wants to know how much attrition will happen in the
current quarter, so that it can start the hiring process.
The first step is to find out which factors can lead to the attrition of an
employee. To find these factors, we simply take past data and a comprehensive
list of candidate factors.
Age, Income, Marital Status, Education Background, Work Rating, Number of
leaves, Number of days arrived later than usual, Number of years in company,
Business Unit, Business vertical, Past Work experience
Indicative list of factors for Company
Attrition Model
On each listed factor we do trend analysis and univariate analysis. Trend
analysis gives us a rough relation between each factor and attrition. In our
case we will look at the trend between attrition and age. The outcome can be
something like the result shown below.
Sample Trend Analysis Result
(Attrition of employees under 26 years of age
against number of quarters spent in the company)
Once this is done for all parameters, the researcher will have an idea of
which parameters are relevant for prediction. While selecting the parameters,
one should always ask whether the result shown by the graph makes sense. If it
defies normal experience or observation, the researcher should look into the
data and verify its validity.
Once all the parameters are chosen with the help of univariate and trend
analysis, it's time to create the required model. For this purpose we divide
the available past data into two parts: a randomly selected 2/3 of the data is
used to create the model, and the remaining 1/3 is used to verify it.
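The split itself is simple. The sketch below is one possible way to do it with Python's standard library; the helper name `split_two_thirds`, the fixed seed and the stand-in list of employee IDs are assumptions made for the example.

```python
import random

def split_two_thirds(rows, seed=42):
    """Shuffle a copy of rows and return (training, holdout) in a 2:1 ratio."""
    shuffled = list(rows)                   # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = (2 * len(shuffled)) // 3
    return shuffled[:cut], shuffled[cut:]

# Stand-in for 30 past employee records (hypothetical IDs).
employee_ids = list(range(1, 31))
train, holdout = split_two_thirds(employee_ids)
print(len(train), "rows for building the model,", len(holdout), "rows for testing")
```

Shuffling before cutting matters: if the data is sorted by joining date or business unit, taking the first 2/3 as-is would give a biased training set.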
Prepare the randomly selected 2/3 of the data for binomial logistic
regression. One can use any analytics tool, such as SPSS.
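For readers without a statistics package at hand, the sketch below shows roughly what such a tool does internally. It is not how SPSS implements the fit; I am using plain batch gradient descent as a stand-in. The two features (share of late arrivals and work rating, both scaled to the range 0 to 1), the training rows and the function names are all assumptions invented for the example.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    """Probability of attrition; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit intercept and coefficients by batch gradient descent on log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi       # prediction error for this row
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Made-up training rows: (late_arrival_share, work_rating), both scaled to 0..1.
X = [(0.9, 0.2), (0.8, 0.3), (0.7, 0.4), (0.6, 0.2), (0.7, 0.1), (0.3, 0.8),
     (0.2, 0.8), (0.1, 0.9), (0.3, 0.7), (0.4, 0.9), (0.2, 0.6), (0.8, 0.2)]
y = [1, 1, 1, 1, 1, 1,
     0, 0, 0, 0, 0, 0]   # 1 = employee left, 0 = employee stayed

w = fit_logistic(X, y)
p_high = predict(w, (0.9, 0.1))   # often late, low rating
p_low = predict(w, (0.1, 0.9))    # rarely late, high rating
print("coefficients:", [round(c, 2) for c in w])
print("high-risk profile:", round(p_high, 2), " low-risk profile:", round(p_low, 2))
```

A real tool will also report significance levels for each coefficient, which this sketch leaves out.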
Once we get the parameter coefficients from the regression analysis, we use
the resulting equation to predict the event. We test the model using the 1/3
of the data kept aside for this purpose. A good fit gives you your model;
otherwise one needs to go back to the parameter discovery exercise and repeat
the whole process.
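As a rough illustration of this last step, the sketch below plugs coefficients into the logistic equation and checks the predictions against a held-out set. The coefficient values, the two features and the hold-out rows are hypothetical, and the 0.5 cut-off is an assumption; a real study would pick the cut-off and the fit measure to suit the business need.

```python
import math

# Hypothetical coefficients, as if returned by a regression run:
# intercept, late-arrival share, work rating (features scaled to 0..1).
COEFFS = [-0.5, 4.0, -3.0]

def attrition_probability(features, coeffs=COEFFS):
    """Logistic equation: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk)))."""
    z = coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

def accuracy(rows, threshold=0.5):
    """Share of hold-out rows whose predicted class matches the actual outcome."""
    hits = sum(
        (attrition_probability(x) >= threshold) == bool(actual)
        for x, actual in rows
    )
    return hits / len(rows)

# Made-up hold-out rows: ((late_arrival_share, work_rating), left_company).
holdout = [
    ((0.9, 0.2), 1),
    ((0.8, 0.4), 1),
    ((0.2, 0.9), 0),
    ((0.3, 0.7), 0),
    ((0.6, 0.8), 1),   # predicted to stay, but actually left
    ((0.1, 0.5), 0),
]
print(f"hold-out accuracy: {accuracy(holdout):.2f}")
```

If the accuracy on the held-out data is far below what was seen on the training data, that is the signal to revisit the choice of factors.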
Predictive models have innumerable uses and, if created with care, can help
streamline many complex processes.