Model of expected overtime in soccer

A couple of years ago I wrote a post about the average match length in soccer, based on Betfair data. A few days ago I got a reply from David with an explanation of one of my findings (that game length decreases with the number of goals). It made me curious to pick up this subject once more! My regular data doesn’t include the exact added time (announced by the fourth official), so being able to predict how long a match will last is of real value to me. Instead of always guessing that 2.5 minutes are added, I can hopefully differentiate between matches and make a better guess.

This time I will take it one step further and build a predictive model of the remaining game time, given that the match has just reached full regular time (90 minutes).

The variables I will try in my model are:

1. Number of total goals (k59)
2. Number of total red cards (k60)
3. Number of total yellow cards (k61)
4. Absolute goal difference (k44_grp)

The name in parentheses is the variable name I use in my data; rather than renaming my whole database, I refer to the variables by these names, which you can look up in the list above.

First I need to choose a model. Looking at the distribution of game lengths, it seems like a Gamma distribution could be used:
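One quick way to put numbers on "seems like a Gamma distribution" is a method-of-moments fit: a Gamma with shape k and scale θ has mean kθ and variance kθ², so k = mean²/variance and θ = variance/mean. The sketch below assumes the game lengths are available as a plain list of final minutes; the function name is hypothetical and the sample values are made up for illustration, not real match data.

```python
from statistics import fmean, pvariance


def gamma_method_of_moments(lengths):
    """Return (shape k, scale theta) of a Gamma matched to the sample mean and variance."""
    mean = fmean(lengths)
    var = pvariance(lengths, mu=mean)
    if var == 0:
        raise ValueError("need some variation in the game lengths")
    shape = mean * mean / var  # k = mean^2 / variance
    scale = var / mean         # theta = variance / mean
    return shape, scale


# Hypothetical final minutes, for illustration only.
sample = [91, 92, 92, 93, 93, 94, 95, 96]
k, theta = gamma_method_of_moments(sample)
print(f"shape={k:.1f}, scale={theta:.4f}")
```

Matching moments is only a rough check; overlaying the fitted density on the histogram is what actually shows whether the shape fits.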

I decide to go with a GLM with an underlying Gamma distribution, using the final game minute as the response. I put all four variables into the model and estimate it. I get:
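The post does not say which software did the fit. Since the expected-game-length formula has the form 1/exp(−η) = exp(η), this sketch assumes a log link. For a Gamma family with a log link, the IRLS working weights are all equal to 1, so each iteration is just an ordinary least-squares fit of the working response z = η + (y − μ)/μ on the design matrix. The function name and the column order (intercept first, then total goals, red cards, yellow cards, absolute goal difference) are assumptions; a real analysis would rather use a statistics package, which also reports the standard errors behind the significance tests.

```python
import numpy as np


def fit_gamma_glm_log(X, y, n_iter=50, tol=1e-10):
    """Fit a Gamma GLM with log link by iteratively reweighted least squares.

    X: (n, p) design matrix whose FIRST column is the intercept.
    y: (n,) positive responses, e.g. the final game minute.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the intercept-only fit: log of the mean response.
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # Gamma variance mu^2 and log link give weight 1/(mu^2 * (1/mu)^2) = 1,
        # so the weighted least-squares step is plain least squares on z.
        z = eta + (y - mu) / mu
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        if np.max(np.abs(new_beta - beta)) < tol:
            return new_beta
        beta = new_beta
    return beta


# Illustration on synthetic, noise-free data (not real matches).
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    np.ones(n),             # intercept
    rng.integers(0, 7, n),  # total goals
    rng.integers(0, 3, n),  # red cards
    rng.integers(0, 9, n),  # yellow cards
    rng.integers(0, 5, n),  # absolute goal difference
])
true_beta = np.array([4.5281, -0.0002, 0.0014, 0.0012, -0.0033])
y = np.exp(X @ true_beta)
print(fit_gamma_glm_log(X, y))  # recovers true_beta up to rounding
```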

All four variables are significant, and “number of yellow cards” and “absolute goal difference” have the most explanatory power. With the estimation done, the expected game length is given by:

$$E[\text{game length}] = \frac{1}{\exp(-\eta)} = \exp(\eta)$$

$$\eta = 4.5281 - 0.0002\cdot\text{total goals} + 0.0014\cdot\text{red cards} + 0.0012\cdot\text{yellow cards} - 0.0033\cdot\text{absolute goal difference}$$
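To make the formula concrete, here is a small function that evaluates it. Because 1/exp(−η) is simply exp(η), the sketch computes exp(η) directly. The coefficients are the ones in the formula; the function and argument names are hypothetical stand-ins for k59, k60, k61 and k44_grp.

```python
import math


def expected_game_length(total_goals, red_cards, yellow_cards, abs_goal_diff):
    """Expected final game minute from the fitted Gamma GLM (log link)."""
    eta = (4.5281
           - 0.0002 * total_goals     # k59
           + 0.0014 * red_cards       # k60
           + 0.0012 * yellow_cards    # k61
           - 0.0033 * abs_goal_diff)  # k44_grp
    return math.exp(eta)  # identical to 1 / math.exp(-eta)


# A match with no goals and no cards: exp(4.5281), about 92.6 minutes.
print(round(expected_game_length(0, 0, 0, 0), 2))
```

Note that the intercept alone already lands close to the 92.5-minute rule of thumb; the covariates only shift it by fractions of a minute per goal or card.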

I do a quick sanity check of the model by plotting the expected game length as a function of ‘absolute goal difference’, fixing red cards at 0 and yellow cards at 4 (their rounded averages). In the same graph I plot the one-way averages, i.e. the “real” average game length for each ‘absolute goal difference’, with red and yellow cards left at their observed values.
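The one-way averages in that comparison are just group means of the observed game length per absolute goal difference. The sketch below computes them and lines them up against a model curve passed in as a function. The data layout (a list of (absolute goal difference, final minute) pairs) and the function names are assumptions. The post fixes red and yellow cards for the curve but does not say what total goals was set to, so the example curve assumes, purely for illustration, that total goals equals the goal difference.

```python
import math
from collections import defaultdict
from statistics import fmean


def one_way_averages(matches):
    """Mean observed final minute per absolute goal difference.

    matches: iterable of (abs_goal_diff, final_minute) pairs; cards are whatever
    they were in each match, so these are plain one-way averages.
    """
    groups = defaultdict(list)
    for abs_diff, final_minute in matches:
        groups[abs_diff].append(final_minute)
    return {d: fmean(minutes) for d, minutes in sorted(groups.items())}


def sanity_table(matches, model_curve):
    """Rows of (abs_goal_diff, model prediction, observed one-way average)."""
    return [(d, model_curve(d), observed)
            for d, observed in one_way_averages(matches).items()]


def model_curve(abs_diff):
    # Red cards fixed at 0 (term drops out), yellow cards at 4;
    # total goals = abs_diff is an ASSUMPTION for illustration.
    return math.exp(4.5281 - 0.0002 * abs_diff + 0.0012 * 4 - 0.0033 * abs_diff)


# Hypothetical matches, for illustration only.
matches = [(0, 93), (0, 94), (1, 93), (1, 92), (2, 92), (3, 91)]
for d, predicted, observed in sanity_table(matches, model_curve):
    print(f"diff {d}: model {predicted:.2f}, observed {observed:.2f}")
```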

There it is: a reasonable and logical model of game length, giving me a better tool than assuming 92.5 minutes for every match!

Match size potential

One of the most important things for getting rich on betting is NOT creating the best model; instead, you need to balance your model against how much liquidity is available in the market. There is no use calculating the most accurate odds and loading them with your margin, only to find out that there is no one in the market to take your bets.

I know myself that I haven’t really been on top of this issue, mainly because I have been betting with low stakes (and therefore almost always getting fully matched). As my account has grown, my requested bet sizes have become bigger and bigger, and eventually this is an issue I will need to address more intelligently.

I made a graph to see how much of the requested amount I manage to get matched (as a percentage) at different requested amounts:

The “amount asked” is rounded to the nearest multiple of 250 SEK, so 0 means asked amounts below 125 SEK. As expected, there is a quite obvious trend showing the problem of getting full amounts matched at higher stakes.
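The grouping described above can be sketched as follows. Two details are assumptions: that half-way amounts such as 125 SEK round up (consistent with 0 meaning below 125 SEK), and that the match percentage per group is amount-weighted (total matched divided by total asked) rather than an average of per-bet percentages. Bets are assumed to be (asked SEK, matched SEK) pairs, and the function names are hypothetical.

```python
from collections import defaultdict

GROUP_SEK = 250


def stake_group(asked_sek):
    """Round an asked amount to the nearest 250 SEK step; group 0 means < 125 SEK."""
    # Explicit round-half-up: Python's round() would send 0.5 to the even neighbour.
    return int((asked_sek + GROUP_SEK / 2) // GROUP_SEK)


def match_rate_by_group(bets):
    """bets: iterable of (asked_sek, matched_sek). Returns {group: matched share}."""
    asked = defaultdict(float)
    matched = defaultdict(float)
    for asked_sek, matched_sek in bets:
        if asked_sek <= 0:
            continue  # nothing asked, nothing to measure
        g = stake_group(asked_sek)
        asked[g] += asked_sek
        matched[g] += matched_sek
    return {g: matched[g] / asked[g] for g in sorted(asked)}


# Hypothetical bets, for illustration only.
bets = [(100, 100), (100, 80), (260, 200), (500, 300), (740, 400)]
print(match_rate_by_group(bets))
```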

I’ve also added a trend line so that I can make a very simple prediction of my match% as my account (and stakes) grow. I started with a linear trend, but when playing around with the options I found that an exponential trend had the best fit, so I use:

$y = 0.8412\,e^{-0.06x}$, where $y$ is my match% (as a fraction) and $x$ is my stake group (0 = 0 SEK, 1 = 250 SEK, 2 = 500 SEK, …). Extrapolating this to higher stakes gives:
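An exponential trend line is commonly fitted by ordinary least squares on ln(y), i.e. ln y = ln a + b·x; the post does not say which tool produced the fit, so that method is an assumption. The sketch fits a and b this way and then evaluates the post's fitted curve for a few stakes, using group x = stake / 250 SEK. Function names are hypothetical.

```python
import math


def fit_exponential_trend(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y); all ys must be > 0."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (lg - mean_log) for x, lg in zip(xs, logs))
    b = sxy / sxx                        # slope of ln(y) against x
    a = math.exp(mean_log - b * mean_x)  # back-transformed intercept
    return a, b


def predicted_match_share(stake_sek, a=0.8412, b=-0.06, group_size=250):
    """Evaluate the fitted trend for a stake, with group x = stake / 250 SEK."""
    return a * math.exp(b * stake_sek / group_size)


for stake in (1000, 2000, 3000):
    print(stake, round(predicted_match_share(stake), 2))
# Roughly 0.66, 0.52 and 0.41 of the asked amount matched.
```

Fitting on the log scale weights relative errors equally across groups, which is one reason it can look like the "best fit" on a steadily decaying curve; it is not the same as minimizing squared errors in y itself.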

Ouch! I hope that the curve flattens out more at higher stakes in practice.
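One way to see why the extrapolation hurts: if the fitted curve held far beyond the observed stakes (which is exactly what is in doubt), the expected matched amount when asking roughly $250x$ SEK would be

$$M(x) = 250x \cdot 0.8412\, e^{-0.06x}, \qquad M'(x) = 250 \cdot 0.8412\, e^{-0.06x}\,(1 - 0.06x) = 0 \;\Rightarrow\; x^{*} = \frac{1}{0.06} \approx 16.7$$

$$M(x^{*}) = \frac{250}{0.06} \cdot 0.8412 \cdot e^{-1} \approx 1289 \text{ SEK}$$

So under this curve, asking more than about 4170 SEK would actually lower the expected matched amount. This is a back-of-the-envelope consequence of the fitted formula, not something observed in the data.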

One reason to be concerned about this is that I suspect I will get matched more often in the parts of my model with lower expected value, meaning that the ROI of my model will also eventually decline as I reach higher stakes. So far I have no evidence of this, but it seems logical to me.

I will follow up on this: I expect to reach stakes of 2000 and 3000 SEK within a year, and will return to this subject then.