Summing up 2016 and prognosis for 2017

It has been a fantastic year for the bot! Looking back at my 2016 prognosis from one year ago, I wrote:

“So what do I hope for 2016? Its not of any value to have financial goals, I will just try to improve the model, bot and risk management as much as possible and hope for the best… But walking into 2016 with much better starting point than in 2015, a reasonable guess would be to turn at least 4 MSEK, and reach a ROI of 1 %. If that’s the case it would mean a profit around 40000 SEK.”

The actual performance for 2008-2016:


I managed to increase both turnover and ROI: instead of the prognosticated 4 MSEK I turned 6.1 MSEK, and instead of 1% yield I got 1.2%. Increasing both of them gave me a profit of 75 865 SEK (instead of the anticipated 40 000 SEK).

Breaking down the bets for 2016 by bet type:


I am happy to see that I am “green all over”, although the Draw and Over/Under models are performing below expectations.

The plan for 2017 is to keep the good models for Home and Away, and make some minor changes to Draw and Over/Under to slightly improve their yield. I aim to get all models to +1.0% yield. My total yield goal is to improve from 1.24% to 1.30%. A bigger wallet for 2017 will also mean bigger stakes and more turnover. Projecting the turnover of the last few months forward indicates that 2017 could turn around 10 MSEK in total. If both these conditions hold, my guess is that the bot will generate around 130 000 SEK. That would be very satisfying!
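The projection above is simple arithmetic; a minimal sketch using only the figures from the text:

```python
# 2017 projection using the targets stated above; nothing else is assumed.
projected_turnover_sek = 10_000_000  # ~10 MSEK projected from recent months
target_yield = 0.013                 # total yield goal of 1.30%

projected_profit = projected_turnover_sek * target_yield
print(f"Projected profit: {projected_profit:,.0f} SEK")  # Projected profit: 130,000 SEK
```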

I have also implemented two new models, one for Home and one for Over/Under. These are non-competitive with the other models (meaning that they won't ever bet on the same market). I will start these models with very small stakes and evaluate during the year. If I can get these models running on full stakes later this year, it would increase turnover a lot…

Finally, happy new year to you all and let us make 2017 a magic betting year!




Top10 supported soccer teams – Looking for edges – Part 2 of 3

This is the second post in the following series:

1. If many “play-for-fun” supporters bet on their favourite team, can laying them pre-game be a winning strategy?
2. What if one of these teams takes a one-goal lead in-play. Should I back or lay?
3. What if their opponent takes the lead by one goal. Should I back or lay?

The data consists of in-play match odds from 2013 onwards, and each bet is calculated from the first recording after the first goal that also has an overround of at most 2%.

I mined through the data and found something interesting when I separated league matches from other matches (cups, Champions League etc). Domestic league matches are marked as “Y” and others as “N” in the table:


Backing the top10 teams in all matches except league matches has been a good, profitable strategy in recent years, and it has worked well both when the top10 team plays at home and away.

The league matches show the opposite trend: backing the top10 team would have been something similar to setting fire to your money. On the other hand, laying them would have been profitable (even including the negative yield for laying away teams in 2016).

In the next post I will dig into the situation where the opponent of the top10 team takes a one-goal lead…


Top10 supported soccer teams – Looking for edges – I/III

I decided to look into the European soccer teams with the biggest supporter bases. Doing a quick Google search on the subject, I found the top10 teams to be:

1. Manchester United
2. Real Madrid
3. FC Barcelona
4. Chelsea
5. Arsenal
6. Liverpool
7. FC Bayern Munich
8. AC Milan
9. Juventus
10. Paris Saint-Germain

The ranking I found is based on followers on social media, TV viewership, shirt sales and sponsorship deals, and I decided to go with that. I have three questions I want to explore:

1. If many “play-for-fun” supporters bet on their favourite team, can laying them pre-game be a winning strategy?
2. What if one of these teams takes a one-goal lead in-play. Should I back or lay?
3. What if their opponent takes the lead by one goal. Should I back or lay?

I will split this subject into three different blog posts, starting with the pre-game question. My hypothesis is that when large amounts of supporter money hit the line, the odds on the big team get eaten, so there might be an edge in laying the big team.
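To make the hypothesis concrete, here is a minimal sketch of the profit/loss of a lay bet on an exchange. The 5% commission rate and the example odds are illustrative assumptions, not figures from my data:

```python
def lay_pnl(lay_stake, odds, outcome_won, commission=0.05):
    """Profit/loss of a lay bet on a betting exchange.

    If the laid selection loses, we keep the backer's stake (minus
    exchange commission); if it wins, we pay out the backer at the
    lay odds. The 5% commission is a typical exchange rate, assumed
    here for illustration.
    """
    if outcome_won:
        return -lay_stake * (odds - 1.0)   # our liability
    return lay_stake * (1.0 - commission)  # winnings after commission

# Laying a heavily backed favourite at odds 1.50 with a 100 SEK stake:
print(lay_pnl(100, 1.50, outcome_won=True))   # -50.0 (favourite wins)
print(lay_pnl(100, 1.50, outcome_won=False))  # 95.0 (favourite loses)
```

The asymmetry is the point: at short odds the liability per win is small relative to the payoff per loss, so even a modest bias from supporter money could make the lay side profitable.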

I use my own recorded data on Betfair with the following notes:

1. The pre-game odds I use are recorded a few minutes before kick-off.
2. If it’s a match between two top10 teams, it is excluded.
3. The data covers 2013 until today.
4. The data is not complete; some matches have not been recorded and I know nothing about them.

The result I get is seen in the table below:


I am looking for a strategy that has been consistent over time, and where I get some exposure. Both the Premier League and the Primera Division have a history of positive yield when laying any of the top10 teams. Maybe this could be an interesting pre-game strategy for 2017? These are big markets and you can probably get large sums matched pre-game.


Tools and infrastructure for analysing big data



It’s been some time since the last update; I have been spending my (very limited) spare time recoding my whole analysis infrastructure. I use a PostgreSQL database with my bot, but the bot just dumps raw data into it. I need to adjust, clean and process that data – something I previously used SAS for. SAS is a great tool for analysing data, but not that great when it comes to structuring millions and millions of rows. So I have rewritten my SAS code as SQL code instead. I now create all my data tables (used in analysis) with SQL directly in the database, which has increased the speed of creating them.
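The idea of building analysis tables inside the database instead of pulling raw rows out can be sketched like this. Here sqlite3 stands in for PostgreSQL, and the table and column names are invented for the illustration:

```python
import sqlite3

# In-memory database stands in for the PostgreSQL instance the bot
# writes to; table and column names are made up for this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_ticks (match_id INT, minute INT, back_odds REAL)")
con.executemany(
    "INSERT INTO raw_ticks VALUES (?, ?, ?)",
    [(1, 0, 2.10), (1, 1, 2.08), (2, 0, 3.50), (2, 1, 3.60)],
)

# Build a cleaned analysis table with SQL inside the database,
# rather than processing raw rows in an external analysis tool.
con.execute("""
    CREATE TABLE match_summary AS
    SELECT match_id,
           MIN(back_odds) AS min_odds,
           MAX(back_odds) AS max_odds,
           COUNT(*)       AS n_ticks
    FROM raw_ticks
    GROUP BY match_id
""")
print(con.execute("SELECT * FROM match_summary ORDER BY match_id").fetchall())
# [(1, 2.08, 2.1, 2), (2, 3.5, 3.6, 2)]
```

The speed gain comes from the aggregation happening where the data already lives, so only the small summary table ever leaves the database.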

For analysis I will use the free edition of SAS (University Edition); it is basically made for training, so it has some limitations. One of the things I stumbled on was connecting my database to SAS and reading directly from it. That feature is NOT included in the free edition, but it can be worked around with a few simple tricks. Download the PC Files Server for SAS, run it in the background and assign a libname with something like:
LIBNAME z PCFILES SERVER=computername port=9621 DSN=raw USER=username PASSWORD=xx;

This unlocks the free edition into being actually useful for big-volume data analysis. As I get more and more data I also have to sharpen the infrastructure so that it stays manageable in size and time.

As a last improvement I am going to write a simulation module in VBA, custom-made for my needs; sometimes it’s just not possible to fit all my needs into off-the-shelf software.

I have now spent many hours creating analysis tables that are exactly the same as before – not as fun as improving models and looking for areas with edge, but with the improved structure I will be in a better position when I start to analyse. Do the hard work and hopefully reap the benefits later 🙂






Model of expected overtime in soccer

A couple of years ago I wrote a post (here) about the average match length in soccer matches, based on Betfair data. A few days ago I got a reply from David with an explanation of my findings (game length decreases with the number of goals). It actually made me curious enough to pick up this subject once more! In my ordinary data I don’t have the exact additional time (announced by the fourth official), so it can actually be of great value to me to predict how long the match will be. So instead of always guessing that 2.5 minutes are added, I can hopefully differentiate that and get a better guess.

This time I will take it one step further and build a predictive model of the remaining game time, given that we have just reached full ordinary time (90 minutes).

The variables I will try in my model are:

1. Number of total goals (k59)
2. Number of total red cards (k60)
3. Number of total yellow cards (k61)
4. Absolute goal difference (k44_grp)

The short name in parentheses is the one I use in my data; instead of renaming my whole database, you can look them up in the table above.

First I need to choose a model. Looking at the distribution, it seems a Gamma distribution could be used:


I decide to go with a GLM with an underlying Gamma distribution, using the final game minute as response. I put all four variables into my model and estimate. I get:


All four variables included are significant, and “number of yellow cards” and “absolute goal difference” are the most explanatory ones. The estimation is done, and the expected game length is given by:

E[game-length] = 1/exp(-(4.5281 - 'total goals'*0.0002 + 'number of red cards'*0.0014 + 'number of yellow cards'*0.0012 - 'absolute goal difference'*0.0033))
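Since 1/exp(-x) = exp(x), the formula can be evaluated directly; a small sketch using the coefficients above (the function name is mine):

```python
import math

def expected_game_length(goals, reds, yellows, abs_goal_diff):
    """Expected full-time match length in minutes, from the Gamma GLM
    above. Coefficients are those reported in the post; note that
    1/exp(-x) simplifies to exp(x)."""
    x = (4.5281
         - 0.0002 * goals
         + 0.0014 * reds
         + 0.0012 * yellows
         - 0.0033 * abs_goal_diff)
    return math.exp(x)

# A roughly average match: no goals, 0 red cards, 4 yellow cards
print(round(expected_game_length(0, 0, 4, 0), 1))  # ~93 minutes
```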

I do a quick sanity check of the model by plotting the expected game length as a function of ‘absolute goal difference’ (fixing red cards = 0 and yellow cards = 4, corresponding to their rounded averages). In the same graph I plot the one-way averages (the “real” average game length for each ‘absolute goal difference’, with red and yellow cards as they were).


There it is: a reasonable and logical model of game length – giving me a better tool than assuming 92.5 minutes for all matches!


Match size potential

One of the most important things in getting rich from betting is NOT creating the best model – instead you need to balance your model against how much liquidity is available in the market. There is no use calculating the most accurate odds and loading them with your margin, just to find out that there is no one in the market to take your bets.


I know that I haven’t really been on top of this issue myself, mainly because I have been betting with low stakes (and therefore almost always getting matched). As my account has grown, the requested bet size has become bigger and bigger, and it will eventually be an issue I need to address with more intelligence.


I made a graph to see how much I manage to get matched (as a percentage) at different requested amounts:


The “amount asked” is rounded to the nearest 250 SEK interval, so 0 means asked amounts below 125 SEK. Just as expected, there is an obvious trend showing the problem of getting the full amount matched at higher stakes.


I’ve also added a trend line so that I can make a very simple prediction of my match percentage as my account (and stakes) grow. Playing around with the trend, I actually found that an exponential fit worked best, so I use:

y = 0.8412*exp(-0.06x), where x is my stake group (0 = 0 SEK, 1 = 250 SEK, 2 = 500 SEK, …). Extrapolating this to higher stakes gives:
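The fitted trend can be evaluated in a few lines; the function name and the stake-to-group rounding are my own framing of the formula above:

```python
import math

def expected_match_pct(stake_sek):
    """Fitted exponential trend for the share of a requested bet that
    gets matched, where x is the stake group rounded to the nearest
    250 SEK (0 = 0 SEK, 1 = 250 SEK, 2 = 500 SEK, ...)."""
    x = round(stake_sek / 250)
    return 0.8412 * math.exp(-0.06 * x)

for stake in (250, 1000, 2000, 3000):
    print(f"{stake} SEK: {expected_match_pct(stake):.0%} matched")
# 250 SEK: 79% matched
# 1000 SEK: 66% matched
# 2000 SEK: 52% matched
# 3000 SEK: 41% matched
```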


Ouch! I hope that the curve flattens out at higher stakes in practice.

One reason to be bothered about this is that I suspect I risk getting matched more in the parts of my model with lower expected value, meaning that the ROI of my model will eventually also decline as I reach higher stakes. So far I have no evidence of this, but to my mind it is logical.

I will follow up on this; I expect to reach the 2000 and 3000 SEK stake levels within a year and will then return to this subject.




Betting results 2016-Q3

It is time to sum up the performance of my bot during the last three months (also known as the third quarter…). It has been a really good run; the model has performed above expectation in both ROI and turnover.

Breaking the quarter down by bet type:


We can conclude that the bot struggles to make the Away algorithm profitable; it just barely gets it into the green (100.2% ROI). Despite that poor performance, the total comes in at 101.75% ROI, which is above expectation (I aim for around 101%).

At the start of this fourth quarter I will make some adjustments to the away model and hopefully bring it into profit by the time I close the books for that quarter. I have estimated a new model, with one more explanatory variable than the old one, that seems very promising. I will keep the other models untouched (don’t fix it if it ain’t broken…).

Looking back a couple of years, I now realise that 2016-Q3 is the seventh quarter in a row with positive results:



This is of course very pleasing, and clearly indicates that I have an edge in the market. Results like these only make me want to work harder and improve the models – now I know that it is possible and that I am on the right track (there have been many times over the years when I was on the verge of giving up).

Finally, looking at the yearly table:


2016 has so far turned 3.8 MSEK and earned 41.7 KSEK at an ROI of 101.09%. Looking back in my old blog to see what my expectations were for 2016:

“So what do I hope for 2016? Its not of any value to have financial goals, I will just try to improve the model, bot and risk management as much as possible and hope for the best… But walking into 2016 with much better starting point than in 2015, a reasonable guess would be to turn at least 4 MSEK, and reach a ROI of 1 %. If that’s the case it would mean a profit around 40000 SEK.”

I am ahead!



Big data challenges


As a bot runner I have the wonderful opportunity to collect a lot of data. As an analyst I cannot say no to data; it is the engine of all my analysis – and a bigger engine is always better! In practice this means that I collect all available data.

What does that mean? Let’s go through some figures:

1. I follow up to 500 matches simultaneously.
2. Each match is recorded at least once a minute; under some circumstances it is recorded once every 5 seconds.
3. Every recording generates one row with 143 columns of different types of data.


I load all of my main data into one big table (as of writing it is about 26 GB, with about 26 million rows).

With my current hardware setup I run into problems now and then. When following many matches (200+), with a large share of them in the 5-second update phase, writing data to the PostgreSQL database is not fast enough. At those times my bot cannot write every 5 seconds; sometimes the update frequency drops to 10–15 seconds instead! This is not good enough and I need to improve the performance.
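A back-of-the-envelope view of the write load implied by the figures above; treating all 500 matches as being in the 5-second phase is a worst case, not a measurement:

```python
# Worst-case write load estimated from the figures in the post.
matches = 500
update_interval_s = 5  # fastest recording phase
columns = 143

rows_per_second = matches / update_interval_s
values_per_second = rows_per_second * columns
print(f"{rows_per_second:.0f} rows/s, {values_per_second:.0f} values/s")
# 100 rows/s, 14300 values/s
```

A hundred single-row inserts per second is exactly the kind of load where an HDD's seek latency hurts and an SSD (or batched writes) helps.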

What are my options?

1. Improving my code – adding more advanced routines to handle and write data in the background of my bot. I am not really a big fan of this, as my coding expertise is at amateur level, but I will try to make some improvements.
2. Rearranging my PostgreSQL database – this I will need to do. I will partly create a new structure, keeping more tables but of smaller size.
3. Upgrading hardware – this I will also do. I don’t think the bottleneck is the processor, but rather the hard disk, so I will swap my HDD for an SSD, which writes data much faster.
4. Moving to a VPS – might be an option in the future. If my bot performs OK, there are several benefits to no longer running it on my basement-located home computer, such as speed and stability.




Implementing the model
This is the fifth post in the following series:

1. Introduction to Logistic Regression
2. Setting up a model
3. Testing and optimising the model
4. Evaluating the model

Let’s assume we have an implementable model. The implementation phase has many times proven to be a real challenge for me; small errors in the implemented model have generated huge mispricings. As just one example, I accidentally used betting stakes of 50% of my capital instead of 5%… a pure miracle that I didn’t empty my bank (it actually became very profitable by pure luck, but I strive to replace luck with skill :)).

To avoid problems and to discover errors, I:

– Lower the betting stakes on the newly implemented model
– Cross-reference the bets generated by the bot with simulated bets from another system
– Implement one model at a time
– Limit the implementation phase to one day every quarter

The cross-reference is done by running my model in both the VB environment and the SAS environment; the VB model executes the bets and the SAS model works as a reference. As soon as the calculations deviate, I get notified.
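The cross-reference idea can be sketched as below. In practice the two calculations run in VB and SAS rather than side by side, and the 1% tolerance and function names here are illustrative choices of mine, not figures from the post:

```python
def cross_check(bot_odds, reference_odds, tolerance=0.01):
    """Flag any market where the live model and the reference model
    disagree by more than `tolerance` (relative deviation), or where
    the reference has no price at all."""
    deviations = []
    for market, bot_price in bot_odds.items():
        ref_price = reference_odds.get(market)
        if ref_price is None or abs(bot_price - ref_price) / ref_price > tolerance:
            deviations.append(market)
    return deviations

# Hypothetical prices from the executing model and the reference model:
bot = {"match_1_home": 2.10, "match_2_away": 3.55}
ref = {"match_1_home": 2.11, "match_2_away": 3.20}
print(cross_check(bot, ref))  # ['match_2_away']
```

The value of the check is that an implementation bug (a dropped variable, a unit error, a 50%-instead-of-5% stake) shows up as a systematic deviation long before it shows up in the bank balance.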

When it comes to betting stakes, I currently run a normal model at 3% of my capital per bet; a newly implemented model runs at 0.3% instead.

Below you see a picture of my bot in action tonight at around 22.00. In a later post I will guide you through the structure and features of my bot. It is really cool!



Evaluating the model

This is the fourth post in the following series:

1. Introduction to Logistic Regression
2. Setting up a model
3. Testing and optimising the model
4. Evaluating the model

In the previous post we created as good a model as we could. We know the model is statistically “fair”, but we are not sure whether that means it is long-term profitable (has an edge in the market). So we need to test it on out-of-sample data!

After creating my dataset I divided it into two parts, one for modelling and one for out-of-sample simulation. I used the 2014 and 2015 data for modelling, and selected 2016 as the simulation set.

I have now implemented the new model in SAS, starting out with the rule: create a back signal IF the offered odds are at least my calculated odds plus 1%. I also suspect that if my model deviates too much from the market price, the flaw is in my model and not in the market. I plot the accumulated ROI by different “edges” (deviations from the market price) together with the number of bets induced by the model in 2016.


I use linear regression to get a feel for how the model behaves, and conclude that a higher edge is probably just a sign that I am missing some vital information in my model. The model is probably OK for small deviations from the market, so I restrict it to edges below 4%. I choose 4% because at that point I should get about 1% ROI (looking at where the regression line crosses 1% ROI in the graph above).
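The restricted back-signal rule can be written down in a few lines; the thresholds come from the text, while the function name and edge definition (offered odds relative to calculated odds) are my own framing:

```python
def back_signal(offered_odds, model_odds, min_edge=0.01, max_edge=0.04):
    """Back only when the offered odds exceed the calculated odds by at
    least 1% but no more than 4%; larger deviations are treated as a
    sign the model is missing information, not that the market is wrong."""
    edge = offered_odds / model_odds - 1.0
    return min_edge <= edge <= max_edge

print(back_signal(2.06, 2.00))  # True  (3% edge, within the band)
print(back_signal(2.20, 2.00))  # False (10% edge, model likely wrong)
print(back_signal(2.01, 2.00))  # False (0.5% edge, below threshold)
```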

Running the model with this restriction to see how the 118 bets are distributed by month:


It is a small sample, but the model implies a 1.1% ROI when simulated on the 2016 (out-of-sample) data – it might be a model worth taking further!

In “real life” I do much more extensive testing; I look for strange things by splitting my simulated data by different variables and try to find all possible areas where the model doesn’t work. It is not as easy as just finding unprofitable segments; they also need to be logical and explainable. Restricting the model based on historical data risks creating an over-adapted model with poor predictive power.

In the next post I will write about the implementation phase.