This is the third post in the following series:
1. Introduction to Logistic Regression
2. Setting up a model
3. Testing and optimising the model
4. Evaluating the model
It is now time to fit the model with my explanatory variables. That means finding the coefficients Bk that best fit the logistic regression equation:
Log Odds = B0 + B1*x1 + ... + Bk*xk
When I feed my dataset into the model and fit it (I use SAS, but there are several other tools out there, such as R, Matlab and Excel), I get these results:
K43 is the game minute and g15 is the pre-game favourite.
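For readers without SAS, here is a minimal Python sketch of what "fitting" means: it searches for the coefficients that maximise the log-likelihood, using plain gradient ascent. The rows, the function names and the step settings are all made up for illustration. This is not my dataset and probably not the routine SAS runs internally, so treat the numbers it produces as meaningless.

```python
import math

# Hypothetical toy rows, NOT the real dataset:
# (game minute, home team was pre-game favourite, away team won)
rows = [
    (10, 1, 0), (20, 1, 1), (35, 1, 0), (50, 1, 1), (65, 1, 1), (80, 1, 1),
    (15, 0, 1), (30, 0, 1), (45, 0, 0), (60, 0, 1), (75, 0, 1), (85, 0, 1),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(coefs, data):
    """Sum of log P(observed outcome) under the logistic model."""
    b0, b1, b2 = coefs
    total = 0.0
    for minute, fav, won in data:
        p = sigmoid(b0 + b1 * minute + b2 * fav)
        total += math.log(p if won else 1.0 - p)
    return total

def fit(data, steps=5000, rate=0.5):
    """Maximise the log-likelihood by plain gradient ascent."""
    n = len(data)
    # Centre and scale the minute so that one step size suits every term.
    mean = sum(r[0] for r in data) / n
    sd = math.sqrt(sum((r[0] - mean) ** 2 for r in data) / n)
    b0 = b1 = b2 = 0.0
    for _ in range(steps):
        g0 = g1 = g2 = 0.0
        for minute, fav, won in data:
            z = (minute - mean) / sd
            err = won - sigmoid(b0 + b1 * z + b2 * fav)
            g0 += err
            g1 += err * z
            g2 += err * fav
        b0 += rate * g0 / n
        b1 += rate * g1 / n
        b2 += rate * g2 / n
    # Convert the coefficients back to the original minute scale.
    return (b0 - b1 * mean / sd, b1 / sd, b2)

coefs = fit(rows)
print("intercept, minute, home favourite:", coefs)
print("log-likelihood:", log_likelihood(coefs, rows))
```

The fitted coefficients should give a higher log-likelihood than starting values of zero, which is all that "best fit" means here.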
There are several methods to see how well your model fits the data (so that you can compare models and choose the best one). One of my favourite evaluation methods is to check the ROC curve:
The concept is that you compare your model's ability to predict the response with a random guess. The diagonal line is equivalent to a random guess, and the blue curve is the lift of my model (which is much better than a random guess, yippee!). The area under the curve is a measure of how big the lift is and therefore how good the fit is. I found a grading scale for judging how good your model is:
.90-1 = excellent (A)
.80-.90 = good (B)
.70-.80 = fair (C)
.60-.70 = poor (D)
.50-.60 = fail (F)
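If you want to compute the area yourself, the idea can be sketched in a few lines of Python. The outcomes and probabilities below are hypothetical, the function names are my own, and I assume that the lower bound of each band in the scale is inclusive.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step through every distinct score.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def grade(area):
    """Map an area onto the rule-of-thumb scale (lower bounds assumed inclusive)."""
    bands = [(0.9, "excellent (A)"), (0.8, "good (B)"),
             (0.7, "fair (C)"), (0.6, "poor (D)")]
    for lower, label in bands:
        if area >= lower:
            return label
    return "fail (F)"  # anything below .60, including worse than random

# Hypothetical outcomes (1 = away win) and model probabilities, for illustration only.
labels = [1, 1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

area = auc(roc_points(labels, scores))
print(area, grade(area))  # 0.8125 good (B)
```

A model that guesses at random traces the diagonal from (0, 0) to (1, 1), which gives an area of 0.5.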
So, the model is “fair” … Is that enough to win money?!
Now we have a model that describes the probability of an away win (when leading by 2 goals) given the game minute and who was the pre-game favourite. Plotted, it looks like:
If you are unsure how to convert the formula into odds, I will give you an example.
Let's say you are in game minute 50 and the home team was the pre-game favourite. Then you get:
Log Odds = 4.3327 + 50*0.0247 – 2.8406 = 2.7271
To get a probability you take 1/(1+exp(-log odds)) = 0.939. To get the odds you take 1/probability = 1/0.939 = 1.065. That is our own estimated odds for this case. By the same principle you can work through every combination of game minute and pre-game favourite and get your own estimated odds!
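The same calculation for several combinations can be sketched in Python. The coefficients are the ones from the example; the function name is my own, and I assume that the pre-game favourite term simply drops out (contributes 0) when the home team was not the favourite, so check that against your own model output.

```python
import math

# Coefficients taken from the worked example.
INTERCEPT = 4.3327
B_MINUTE = 0.0247            # per game minute (K43)
B_HOME_FAVOURITE = -2.8406   # applied when the home team was pre-game favourite (g15)

def estimated_odds(minute, home_favourite):
    """Return (log odds, probability, decimal odds) for an away win."""
    log_odds = INTERCEPT + B_MINUTE * minute
    if home_favourite:
        log_odds += B_HOME_FAVOURITE
    # Assumption: any other favourite level contributes 0 (reference level).
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return log_odds, probability, 1.0 / probability

for minute in (10, 30, 50, 70, 90):
    for home_favourite in (True, False):
        _, p, odds = estimated_odds(minute, home_favourite)
        print(f"minute {minute:2d}, home favourite {home_favourite!s:5}: "
              f"p = {p:.3f}, odds = {odds:.3f}")
```

For minute 50 with the home team as favourite this prints p = 0.939 and odds = 1.065, matching the example.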
In the next post I will continue to evaluate the model!