December 30, 2008

Churn Modeling - Logistic Regression

When a prediction/classification problem involves a lot of categorical variables, the first thing that comes to mind is Logistic Regression.

One thing I often come across in Logistic Regression models is a low percentage of true positives, that is, positive cases/records correctly classified. Most of the time, the problem lies in the selection of the predictor variables. Many people tend to include as many predictor variables as they can, under the mistaken notion that they will miss something really BIG if they leave certain variables out of the model.

And this is exactly where the idea that statisticians are the best and only candidates for analytics jobs is proved wrong. Someone with an understanding of the domain/business will easily point out the variables that influence the dependent/response variable. I always say to my managers: a Statistician, a Database Expert and an MBA are absolutely required for a successful Analytics Team.

Coming back to the accuracy of Logistic Regression models: while variable selection is the most important factor influencing the accuracy of the model (besides data quality, of course!), I would say that variable transformation, and/or how you treat each predictor variable (continuous or categorical, for example), is the second most important factor.

In a churn prediction model for a telecom company, I was working with Logistic Regression techniques, and one of the predictor variables was "Months in Service". In the initial runs, I specified it as a continuous variable in the model. After a lot of reruns that failed to increase the accuracy of the model, something made me think about the relation between "Probability of Churn" and "Months in Service". Will the probability increase as the months in service increase? Will it decrease? Or will it be a little more complicated, with a lot of customers leaving in the first few months of service, staying on for the next couple of months, then churning again for another block of months, and so on?
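One way to explore that question before refitting is to tabulate the observed churn rate by block of months. The sketch below is illustrative only: the records are made up, and the `band_for` helper and its band edges are hypothetical, not taken from the telecom data.

```python
from collections import defaultdict

# Hypothetical band edges (inclusive upper bound in months, label).
BAND_EDGES = [(6, "01-06"), (12, "07-12"), (24, "13-24")]
LAST_BAND = "25+"

def band_for(months):
    """Return the label of the tenure band that contains `months`."""
    for upper, label in BAND_EDGES:
        if months <= upper:
            return label
    return LAST_BAND

def churn_rate_by_band(records):
    """records: iterable of (months_in_service, churned) pairs, churned is 0 or 1."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for months, flag in records:
        band = band_for(months)
        totals[band] += 1
        churned[band] += flag
    return {band: churned[band] / totals[band] for band in sorted(totals)}

# Made-up records, for illustration only.
sample = [(2, 1), (4, 1), (5, 0), (8, 0), (10, 0), (11, 0),
          (14, 1), (20, 1), (22, 0), (30, 0), (36, 0), (40, 1)]

rates = churn_rate_by_band(sample)
for band, rate in rates.items():
    print(band, round(rate, 2))
```

If the rates rise and fall from band to band instead of moving in one direction, a single linear term for "Months in Service" is unlikely to capture the pattern.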

I reran the model, this time specifying "Months in Service" as a categorical variable, so that each block of months could have its own effect instead of sharing a single slope. And the model accuracy shot up by about 12%!!!
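The difference between the two specifications can be sketched on synthetic data. Everything below is an assumption made for illustration: the data is invented to follow a churn-stay-churn-stay pattern, the band edges in `one_hot_band` are hypothetical, and the fit is a bare-bones gradient descent written with the standard library only (in practice a statistics package would do the fitting). The size of the gap it prints says nothing about real data.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, x):
    """Predicted churn probability; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(rows, labels, lr=0.5, steps=2000):
    """Batch gradient descent on the mean log loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(steps):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            grads[0] += err
            for j, v in enumerate(x):
                grads[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights

def accuracy(weights, rows, labels):
    hits = sum(int((predict(weights, x) >= 0.5) == (y == 1))
               for x, y in zip(rows, labels))
    return hits / len(rows)

def one_hot_band(months):
    """Dummy-code months into hypothetical bands 1-6, 7-18, 19-24.
    Months above 24 form the reference band (all zeros)."""
    return [int(months <= 6), int(7 <= months <= 18), int(19 <= months <= 24)]

# Synthetic pattern: churn in months 1-6 and 19-24, stay otherwise.
months = list(range(1, 37))
churned = [1 if (m <= 6 or 19 <= m <= 24) else 0 for m in months]

# Specification 1: months as one continuous predictor (scaled to 0..1).
linear_rows = [[m / 36.0] for m in months]
acc_linear = accuracy(fit_logistic(linear_rows, churned), linear_rows, churned)

# Specification 2: months as a categorical predictor (dummy variables).
banded_rows = [one_hot_band(m) for m in months]
acc_banded = accuracy(fit_logistic(banded_rows, churned), banded_rows, churned)

print("continuous:", round(acc_linear, 3))
print("categorical:", round(acc_banded, 3))
```

With a single continuous term the predicted probability can only rise or fall steadily with months, so it cannot follow both churn blocks at once. The dummy-coded version gives each block its own coefficient.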

Posted by romakanta at December 30, 2008 5:00 PM
