Football (or soccer, for US readers) is an amazing sport. It can’t be the world’s most popular sport by coincidence.
Football brings people together. It’s an excuse to disconnect from our busy lives, because game time is fun time. We order some fast food and eat it while Messi makes magic with the ball (how lucky we are to have been able to enjoy him). And we get to watch so many amazing teams, like 2010’s Barça or even 2023’s Manchester City.
Many will say no other game is its equal. It’s football, and there’s nothing like it. But I’d say that’s wrong.
As outstanding as it is, it still is dominated by math. Like everything else.
Life is full of mathematical models. And football is no exception.
I’ve been a die-hard Barça fan my entire life. Add that to where I currently find myself professionally, and the result is a genuine interest in sports analytics, obviously inclined toward football.
This post is the first I’ll be writing about sports analytics, so I will keep it relatively simple. However, I plan on writing a lot more to learn a lot about how math applies to football (and potentially other sports like handball) — and share the insights with you all.
The number of data scientists being hired for sports analytics roles is growing quickly, and it doesn’t seem to be slowing down anytime soon. Using data in sports makes more sense than ever, especially given that the amount of data being generated is also increasing at a fast pace.
So, this post will be a great intro tool for all aspiring sports analysts or data-related folks interested in sports.
Here, I’ll be using StatsBomb’s free, open data to inspect the 2015–2016 La Liga season, which I picked at random. I invite you to run the same analysis and see if it holds true for other seasons and leagues as well.
So let’s dig in!
Preparing The Data
There’s a wonderful Python module that will allow us to get all the data we need…