Published on Dec 19, 2019 in Beating Vegas
One of our modeling goals at Gridiron AI is to pick outcomes of NFL games against the spread at a rate high enough to win money over the long term. This article covers our strategies and results for the 2019 season thus far.
Our models at the time of writing are correctly predicting outcomes against the spread around 60% of the time. Overall, we’re happy with this iteration of our modeling efforts. We’re highly confident it’s better than guessing, but it’s still difficult to determine whether the win rate is high enough, and the variance low enough, to beat Vegas.
Vegas lines are almost always priced in favor of the house: if you lose, you lose 100% of your stake, but if you win, you only gain ~90% of it. In other words, the house takes around $10 for every $100 you bet, even when you win. This can be thought of as a “fee,” and it makes it difficult to achieve a good return over time.
With a ~90% payout, you need to win roughly 53% of bets (1 / 1.9 ≈ 52.6%) just to cover this fee.
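The arithmetic behind the fee can be sketched in a few lines. This is only an illustration: it assumes every win pays exactly 90% of the stake and ignores pushes, and the function names are made up for the example.

```python
def break_even_win_rate(payout_ratio: float) -> float:
    """Win rate at which expected profit per unit staked is zero.

    Solves p * payout_ratio - (1 - p) = 0 for p.
    """
    return 1.0 / (1.0 + payout_ratio)


def expected_return(win_rate: float, payout_ratio: float) -> float:
    """Expected profit per unit staked on a single bet."""
    return win_rate * payout_ratio - (1.0 - win_rate)


# Assumption: a winning bet returns 90% of the stake, as described above.
PAYOUT = 0.90

print(f"break-even win rate: {break_even_win_rate(PAYOUT):.3f}")      # 0.526
print(f"expected return at 60%: {expected_return(0.60, PAYOUT):.3f}")  # 0.140
```

Under these assumptions, a 60% win rate leaves an expected edge of about 14 cents per dollar staked, before variance is taken into account.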
The problem we’ve run into is that there are few outcomes (we might only bet 3-5 games per week) and they have high variance. One week we go 5/5, the next we go 2/4, etc. Our model has been live for all of 2019 and on average we’re at about a 60% win rate, but our betting pool has only barely increased because of fees and high variance.
The problem with high variance is this: if you start with $100 and lose 50% in week 1, then gain 70% in week 2, how much do you think you’re up by? Hint: the answer is not 20%.
$100 * 50% = $50 => $50 * 170% = $85.
That’s right, even though your good week was “better” than your bad week (70% > 50%), you’re still actually down 15%.
This is compounding at work, but not in your favor.
The problem is that you’re making 70% return on $50 instead of the original $100.
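The same calculation, written as a small sketch, makes the asymmetry easy to play with. The `compound` helper is a hypothetical name used only for this example.

```python
from typing import Sequence


def compound(start: float, weekly_returns: Sequence[float]) -> float:
    """Apply each weekly return multiplicatively to a running bankroll."""
    bankroll = start
    for r in weekly_returns:
        bankroll *= 1.0 + r
    return bankroll


# Lose 50% in week 1, then gain 70% in week 2 (the example from the text).
final = compound(100.0, [-0.50, 0.70])
print(f"final bankroll: ${final:.2f}")  # $85.00

# Gain needed after a 50% loss just to get back to the starting point.
recovery = 1.0 / (1.0 - 0.50) - 1.0
print(f"gain needed to recover: {recovery:.0%}")  # 100%
```

A 50% loss needs a 100% gain to undo, which is why one bad week can cost several good ones.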
What keeps happening to us is that we’ll get into positive territory, then have a bad week that sends us below the $100 watermark and it takes 2-3 good weeks to get back to where we were.
High variance also means we have a decent chance of losing everything. All it takes is one skunk week. And with ~5 fairly random outcomes, that could happen any week even if our models are doing well on average.
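A rough way to size this risk is to ask how often every bet in a week loses. The sketch below assumes the bets are independent, that the whole pool is spread across that week's bets, and that there are 5 bets per week over a 17-week regular season; all of these are simplifying assumptions rather than a description of our actual staking.

```python
def prob_all_lose(win_rate: float, n_bets: int) -> float:
    """Chance that every one of n independent bets loses."""
    return (1.0 - win_rate) ** n_bets


def prob_any_wipeout(win_rate: float, n_bets: int, n_weeks: int) -> float:
    """Chance of at least one all-loss week across n_weeks independent weeks."""
    weekly = prob_all_lose(win_rate, n_bets)
    return 1.0 - (1.0 - weekly) ** n_weeks


# Assumed inputs: 5 bets per week, 17 weeks, independent outcomes.
print(f"one week, 60% win rate: {prob_all_lose(0.60, 5):.2%}")
print(f"one week, coin flips:   {prob_all_lose(0.50, 5):.2%}")
print(f"any week in 17, 60%:    {prob_any_wipeout(0.60, 5, 17):.2%}")
```

Even when a single week looks safe, the chance of hitting at least one wipeout week somewhere in a season is much larger than the single-week figure.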
Right now, our models are trained to predict the probability of every point combination for both teams in each game. What’s the probability that the home team scores 0 and the away team scores 0? Home team 2 and away team 0? Home team 0 and away team 2? And so on.
We then look at the spreads and add up the probabilities into X for the point outcomes that would satisfy the spread, and add up the probabilities into Y for the point outcomes that would not satisfy the spread.
Often, the probabilities are fairly close to each other—they might be 50/50, 55/45, etc. There are usually a handful that are 65/35 or better. These are the games we bet on.
Here’s the problem…
We’re asking our model to predict super granular point outcomes. How often has a game finished 0-2 in NFL history? Or 33-4? Maybe never? But for every game we ask the model to predict dozens of point outcomes while also taking into account a host of factors—the starting QBs, defenses, coaches, weather, etc. NFL football is already a sparse dataset (16 games per season for 32 teams and we’ve found that only the last ~12 years or so are relevant for predicting games today). I think one of our biggest issues is that there just isn’t enough signal to do these point distributions accurately.
We might get better results predicting the ending spread since there are fewer outcomes, but I’m still skeptical there’s enough data.
We’re brainstorming new approaches to train our models. Right now our top idea is to have the model predict return on money wagered. Or, simply whether or not the bet is expected to meet a hurdle rate for return (maybe 10% ROI).
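To make the hurdle idea concrete, here is a minimal sketch of what the two candidate targets might look like. It assumes the same ~90% payout as before, and the function names are hypothetical; how a model would actually be trained on these targets is still an open question for us.

```python
def realized_return(covered: bool, payout_ratio: float = 0.90) -> float:
    """Return per unit staked for one settled bet (a possible regression target)."""
    return payout_ratio if covered else -1.0


def min_win_prob_for_hurdle(hurdle: float, payout_ratio: float = 0.90) -> float:
    """Smallest win probability whose expected return reaches the hurdle.

    Solves p * payout_ratio - (1 - p) = hurdle for p.
    """
    return (1.0 + hurdle) / (1.0 + payout_ratio)


def meets_hurdle(win_prob: float, hurdle: float = 0.10, payout_ratio: float = 0.90) -> bool:
    """Binary target: is the expected return at least the hurdle rate?"""
    return win_prob * payout_ratio - (1.0 - win_prob) >= hurdle


# Assumed 10% hurdle and 90% payout.
print(f"win probability needed: {min_win_prob_for_hurdle(0.10):.3f}")  # 0.579
print(meets_hurdle(0.60))  # True
print(meets_hurdle(0.55))  # False
```

Under these assumptions, a 10% ROI hurdle corresponds to needing a win probability of about 58% on any single bet.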
I’ll post an update in this collection once we’ve experimented more and found something worthwhile.
In the meantime, you can check out our betting performance for 2019.