## Model of expected overtime in soccer

A couple of years ago I wrote a post (here) on the average length of soccer matches, based on Betfair data. A few days ago I got a reply from David with an explanation of my finding (game length decreases with the number of goals). It made me curious enough to pick up the subject once more! My ordinary data does not contain the exact additional time (signalled by the fourth official), so being able to predict how long a match will be is of great value to me. Instead of always guessing that 2.5 minutes are added, I can hopefully differentiate and get a better guess.

This time I will take it one step further and build a predictive model of the remaining game-time given that we just reached the full ordinary time (90 minutes).

The variables I will try in my model are:

1. Number of total goals (k59)
2. Number of total red cards (k60)
3. Number of total yellow cards (k61)
4. Absolute goal difference (k44_grp)

The short name in parentheses is the name I use in my data, so instead of renaming my whole database you can look the variables up in the list above.

First I need to choose a model. I look at the distribution, and it seems like a Gamma distribution could be used:

I decide to go with a GLM with an underlying Gamma distribution, using the ending game-minute as the response. I put all four variables into the model and estimate it. I get:

All four variables are significant, and “number of yellow cards” and “absolute goal difference” are the ones with the most explanatory power. With the estimation done, the expected game-length is given by:

E[game-length] = 1/exp(-(4.5281 - 0.0002*‘total goals’ + 0.0014*‘number of red cards’ + 0.0012*‘number of yellow cards’ - 0.0033*‘absolute goal difference’))
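As a minimal sketch, the formula above can be written as a small Python function. The coefficients are copied from the formula; the function and constant names are my own for illustration, and since 1/exp(-x) equals exp(x) the code uses the simpler form.

```python
import math

# Coefficients copied from the formula above (log scale).
# The names are illustrative; the k-codes refer to the variable list.
INTERCEPT = 4.5281
COEF_TOTAL_GOALS = -0.0002     # k59
COEF_RED_CARDS = 0.0014        # k60
COEF_YELLOW_CARDS = 0.0012     # k61
COEF_ABS_GOAL_DIFF = -0.0033   # k44_grp


def expected_game_length(total_goals, red_cards, yellow_cards, abs_goal_diff):
    """Expected ending game-minute according to the fitted formula."""
    linear = (
        INTERCEPT
        + COEF_TOTAL_GOALS * total_goals
        + COEF_RED_CARDS * red_cards
        + COEF_YELLOW_CARDS * yellow_cards
        + COEF_ABS_GOAL_DIFF * abs_goal_diff
    )
    # 1 / exp(-linear) is the same as exp(linear).
    return math.exp(linear)


if __name__ == "__main__":
    # Example inputs are made up: a 1-1 draw with four yellow cards.
    print(round(expected_game_length(2, 0, 4, 0), 2))
```

Because everything sits inside the exponential, each coefficient acts as a multiplicative factor on the expected length: one extra unit of absolute goal difference multiplies it by exp(-0.0033), which is roughly 0.3% shorter.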

I do a quick sanity check of the model by plotting expected game-length as a function of ‘absolute goal difference’ (locking the values for red cards = 0 and yellow cards = 4, corresponding to their rounded averages). In the same graph I plot the one-way averages (the “real” average game-length for each ‘absolute goal difference’ with red and yellow cards being what they were).

There it is, a reasonable and logical model of game-length – giving me a better tool than assuming 92.5 minutes for all matches!

## Match size potential

One of the most important things for getting rich from betting is NOT to create the best model. Instead, you need to balance your model against how much liquidity is available in the market. There is no use in calculating the most accurate odds and loading them with your margin, just to find out that there is no one in the market to take your bets.

I know myself that I haven’t really been on top of this issue, mainly because I have been betting with low stakes (and therefore getting matched almost all the time). As my account has grown, the requested bet size has become bigger and bigger, and it will eventually become an issue that I need to address with more intelligence.

I made a graph to see how much I manage to get matched (as a percentage) for different requested amounts:

The “amount asked” is rounded to the nearest 250 SEK interval, so 0 means asked amounts < 125 SEK. Just as expected, there is a quite obvious trend showing the problem of getting full amounts through at higher stakes.

I’ve also added a trend line so that I can make a very simple prediction of my match% as my account (and stakes) grow. I started with a linear trend, but when playing around with it I found that an exponential trend had the best fit, so I use:

y = 0.8412*exp(-0.06x), where x is my group (0 = 0 SEK, 1 = 250 SEK, 2 = 500 SEK, …). Extrapolating this to higher stakes gives:
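The extrapolation can be sketched in a few lines of Python. The two constants come from the trend formula above; the function names and the choice to map a stake to its group by rounding to the nearest 250 SEK are my own illustration of the grouping described earlier.

```python
import math

# Constants from the fitted exponential trend y = 0.8412 * exp(-0.06 * x).
SCALE = 0.8412
DECAY = 0.06
GROUP_WIDTH_SEK = 250


def stake_group(amount_sek):
    """Group index x: the stake rounded to the nearest 250 SEK, divided by 250.

    Group 0 therefore covers requested amounts below 125 SEK.
    """
    return int(amount_sek / GROUP_WIDTH_SEK + 0.5)


def expected_match_share(amount_sek):
    """Share of the requested amount expected to be matched (0..1)."""
    return SCALE * math.exp(-DECAY * stake_group(amount_sek))


if __name__ == "__main__":
    for stake in (250, 1000, 2000, 3000):
        share = expected_match_share(stake)
        # Expected matched amount in SEK is simply stake * share.
        print(stake, round(share, 3), round(stake * share))
```

Under this trend, roughly 52% of a 2000 SEK request and 41% of a 3000 SEK request would be matched. This is an extrapolation beyond the observed stakes, so it should be read as a rough guess only.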

Ouch! I hope that the curve flattens out more when reaching higher stakes in practice.

One reason to be bothered by this is that I suspect I risk getting matched more in the parts of my model where the expected value is lower, meaning that the ROI of my model will eventually also decline as I reach higher stakes. So far I have no evidence of this, but to my mind it is logical.

I will follow up on this: I expect to reach the 2000 and 3000 SEK stakes within a year and will then return to the subject.

## Betting results 2016-Q3

It is time to sum up the performance of my bot during the last three months (also known as the third quarter…). It has been a really good run: the model has performed above expectation in both ROI and turnover.

Breaking the quarter down on bet type:

We can conclude that the bot struggles to make the Away algorithm profitable; it only just gets it into the green (100.2% ROI). Despite that poor performance, it manages 101.75% ROI on the total, which is above expectation (I aim for around 101%).

At the start of the fourth quarter I will make some adjustments to the away model and hopefully bring it into profit when closing the books for that quarter. I have estimated a new model, with one more explanatory variable than the old model, that seems very promising. I will keep the other models untouched (don’t fix it if it ain’t broken…).

Looking back a couple of years, I now realise that 2016-Q3 is the seventh quarter in a row with positive results:

This is of course very pleasing, and clearly indicates that I have an edge in the market. These kinds of results only make me want to work harder on improving the models. Now I know that it is possible and that I am on the right track (there have been many times over the years when I have been on the verge of giving up).

Finally, looking at the yearly table:

So far 2016 has turned over 3.8 MSEK and earned 41.7 KSEK, at an ROI of 101.09%. Looking back at my old blog (blog.bramhed.se) to see what my expectations were for 2016:

“So what do I hope for 2016? Its not of any value to have financial goals, I will just try to improve the model, bot and risk management as much as possible and hope for the best… But walking into 2016 with much better starting point than in 2015, a reasonable guess would be to turn at least 4 MSEK, and reach a ROI of 1 %. If that’s the case it would mean a profit around 40000 SEK.”
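The relation between turnover, profit and ROI used in these numbers can be sketched as below. I am assuming the convention implied by the figures in this post, where ROI is 100% plus profit divided by turnover; the function names are only for illustration.

```python
def roi_percent(turnover_sek, profit_sek):
    """ROI in percent, where 100 means break-even (assumed convention)."""
    return 100.0 * (turnover_sek + profit_sek) / turnover_sek


def profit_from_roi(turnover_sek, roi_pct):
    """Profit in SEK implied by a turnover and an ROI in percent."""
    return turnover_sek * (roi_pct - 100.0) / 100.0


if __name__ == "__main__":
    # Year-to-date figures from the post, using the rounded amounts.
    print(round(roi_percent(3_800_000, 41_700), 2))
    # The 2016 guess: 4 MSEK turnover at 1 % (i.e. 101 on this scale).
    print(round(profit_from_roi(4_000_000, 101.0)))
```

With the rounded amounts this gives about 101.10%, which agrees with the reported 101.09% up to rounding of the inputs, and the 4 MSEK guess at 1% corresponds to a profit of 40000 SEK.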