A couple of years ago I wrote a post (here) regarding the average match length in soccer matches, based on Betfair data. A few days ago I got a replay from David with an explanation on my findings (game length decreases by number of goals). It actually made me curious to pick up this subject once more! In my ordinary data I don’t have the exact additional time (delivered by the fourth referee), so it can actually be of great value to me to predict how long the match will be. So instead of always guessing that there are 2.5 minutes added, I can hopefully differentiate that and get a better guess.
This time I will take it one step further and build a predictive model of the remaining game-time given that we just reached the full ordinary time (90 minutes).
The variables I will try in my model are:
1. Number of total goals (k59)
2. Number of total red cards (k60)
3. Number of total yellow cards (k61)
4. Absolute goal difference (k44_grp)
The short name in parenthesis is the short name that I use in my data, so instead of me renaming my whole database you can look them up in the table above.
First I need to chose a model, I look at the distribution and it seems like a Gamma distribution could be used:
I decide to go with a GLM using the underlying Gamma distribution, and I use the ending game-minute as response. I put all the four variables into my model, and estimate. I get:
All four variables included are significant, and “number of yellow cards” and “absolute goal difference” are the most explaining ones. The estimation is done and the expected game-length is given by:
E[game-length]= 1/exp(- (4,5281 – ‘total goals’*0,0002 + ‘number of red cards’*0,0014 + ‘number of yellow cards’*0,0012 -‘absolute goal difference’*0,0033))
I do a quick sanity check of the model by plotting expected game-length as a function of ‘absolute goal difference’ (locking the values for red cards = 0 and yellow cards = 4, corresponding to their rounded averages). In the same graph I plot the one-way averages (the “real” average game-length for each ‘absolute goal difference’ with red and yellow cards being what they were).
There it is, a reasonable and logical model of game-length – giving me a better tool than assuming 92.5 minutes for all matches!