Data-Driven Analysis Is Changing The Way We See Football
Soccer fans know that the outcome of a match doesn’t always reflect what transpired on the pitch. Talking heads have long debated whether a team was lucky to win or unlucky to lose, often with little substantial proof to argue against the scoreboard. The expected goals statistic (xG) has changed that, equipping soccer enthusiasts with a metric for “what should've happened.”
xG has risen in both popularity and accuracy over the past decade. MLSsoccer.com publishes an article following each slate of games that compares the actual results with each team’s expected goals. In theory, expected goals and actual goals should align over time. Fans of a team that is scoring more than its expected goals total should be wary, since that surplus tends to shrink, while a team that is underachieving by the measure of expected goals might have reason for hope.
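The idea that goals and expected goals converge over time can be illustrated with a small simulation. This is only a sketch under assumed conditions: every shot is given the same hypothetical xG of 0.10, shots are treated as independent, and the function name `simulate_goals` is invented for illustration.

```python
import random


def simulate_goals(shot_xg, rng):
    """Count goals when each shot scores with probability equal to its xG."""
    return sum(1 for p in shot_xg if rng.random() < p)


rng = random.Random(2017)  # fixed seed so the run is repeatable
for n_shots in (10, 100, 10_000):
    shot_xg = [0.10] * n_shots  # assumption: every shot is a 10% chance
    xg_total = sum(shot_xg)
    goals = simulate_goals(shot_xg, rng)
    print(f"{n_shots:>6} shots: xG {xg_total:8.1f}, goals {goals:5d}, "
          f"goals per xG {goals / xg_total:.2f}")
```

Over a handful of shots the ratio of goals to xG can land far from 1, which is the "lucky" or "unlucky" result fans argue about. Over thousands of shots it should settle close to 1.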
Calculating xG is not an exact science, however. The stat has progressed significantly as an indicator of which team deserved to win and a predictor of how a team will perform moving forward, but it still does not perfectly capture those aspects of the game that dictate the outcome.
Two-legged xG map for Barcelona - PSG.
Two truly incredible matches that add up into kind of a weird muddle. Football! pic.twitter.com/0nSbiEsfFr
— Caley Graphics (@Caley_graphics) March 8, 2017
The Basics of xG
The most fundamental stat in soccer is binary. Some shots go in, others don’t. xG measures how likely it was for a shot to go in. Whether it actually did or not is irrelevant.
How Does xG Accomplish This?
Based on a review of thousands of games, sports analytics firms can determine the success rate of any specific shot type. The essential factors are the distance from goal and the angle at which the shot is taken. For example, a shot from six yards out, directly in front of goal, will have a higher xG value than a shot from the same distance taken at an angle.
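To see why angle matters at a fixed distance, one can compute how much of the goal mouth a shooter is looking at. The sketch below is not how any particular xG model works. It assumes a coordinate system with the goal center at (0, 0) and the standard 8-yard goal width, and the name `shot_geometry` is made up for this example.

```python
import math

GOAL_WIDTH_YARDS = 8.0  # regulation goal width


def shot_geometry(x, y):
    """Return (distance to goal center, angle in degrees between the posts).

    Assumed coordinates: goal center at (0, 0), x runs along the goal
    line, y is the distance out from the goal line, all in yards.
    """
    half = GOAL_WIDTH_YARDS / 2
    distance = math.hypot(x, y)
    # Vectors from the shot location to the left and right posts.
    lx, ly = -half - x, -y
    rx, ry = half - x, -y
    cross = lx * ry - ly * rx
    dot = lx * rx + ly * ry
    angle = abs(math.degrees(math.atan2(cross, dot)))
    return distance, angle


central = shot_geometry(0.0, 6.0)
# Same six-yard distance, but 45 degrees off the center line.
offset = 6.0 / math.sqrt(2)
angled = shot_geometry(offset, offset)
print(f"central: {central[0]:.1f} yd, {central[1]:.1f} deg of goal visible")
print(f"angled:  {angled[0]:.1f} yd, {angled[1]:.1f} deg of goal visible")
```

Both shots are six yards from the center of goal, but the angled one sees a narrower opening between the posts, which is one reason a model would assign it a lower xG.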
Two-legged xG map for Sevilla - Leicester City. This could have gone either way, but in this universe of course it went this way. pic.twitter.com/xU5RERn5XT
— Caley Graphics (@Caley_graphics) March 14, 2017
The Stat's Progress
As tracking of xG has evolved, more factors have been introduced. In addition to distance and angle, the type of pass, the type of shot and the type of play leading up to the shot are all now included in the best models. Data analyst and soccer guru Michael Caley offers a comprehensive look at how shot types are grouped and broken down.
The key consideration is that all of these factors affect the likelihood of scoring. A shot taken on a counterattack means that defenders are more likely to be out of position than during a slow, possession-based buildup. Similarly, a through ball often leaves a defender exposed, meaning shots taken immediately after one are more likely to go in than shots that follow an average pass.
This effectively means that not all shots from 18 yards out are created equal. Is it a volley? A header? Coming from a corner kick? A free kick? You get the idea. xG accounts for all of this variability, grouping each specific shot type and calculating its success rate (somewhere between 0 and 1).
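The grouping idea can be sketched in a few lines. Everything here is illustrative: the shot records are invented, the two grouping keys (shot type and distance band) are a simplification of what real models use, and `conversion_rates` and `match_xg` are hypothetical names.

```python
from collections import defaultdict

# Invented historical shots: (shot_type, distance_band, scored)
HISTORICAL_SHOTS = (
    [("header", "6-12yd", True)] * 1
    + [("header", "6-12yd", False)] * 3
    + [("foot", "6-12yd", True)] * 2
    + [("foot", "6-12yd", False)] * 2
    + [("foot", "18yd+", True)] * 1
    + [("foot", "18yd+", False)] * 9
)


def conversion_rates(shots):
    """Map each (shot_type, distance_band) group to goals / attempts."""
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for shot_type, band, scored in shots:
        key = (shot_type, band)
        attempts[key] += 1
        goals[key] += int(scored)
    return {key: goals[key] / attempts[key] for key in attempts}


def match_xg(shot_groups, rates):
    """Sum the group success rate for every shot a team took.

    Raises KeyError for a group that never appears in the history.
    """
    return sum(rates[group] for group in shot_groups)


rates = conversion_rates(HISTORICAL_SHOTS)
team_shots = [
    ("foot", "6-12yd"),
    ("header", "6-12yd"),
    ("foot", "18yd+"),
    ("foot", "18yd+"),
]
print(f"team xG: {match_xg(team_shots, rates):.2f}")
```

Each shot contributes its group's success rate, a number between 0 and 1, regardless of whether the shot actually went in. A team's xG for a match is the sum of those values.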
Two-legged xG map for Arsenal - Bayern Munich. That is a danger zone cluster for the ages. pic.twitter.com/LtQID8u4w6
— Caley Graphics (@Caley_graphics) March 8, 2017
Where xG Falls Short
If the goal of advanced analytics in soccer is to more accurately assess how teams and players perform, then xG is just the start of a movement. Although xG continues to incorporate the more specific factors that define a shot, it is only triggered by an actual shot attempt.
Caley’s model is based on data from Opta Sports, which also includes missed chances (when a shot should have been taken). However, it only goes so far in accounting for the value of each build-up play. If David Silva plays several excellent passes that set up the opportunity for a cross but Jesus Navas fails to deliver one, no xG value will be attached to the play.
An analyst could argue, though, that if Silva continues to make passes like that, Man City will eventually turn some of them into goals.
Additionally, the xG value from a shot almost always results from some combination of passes, an interception, a tackle, an aerial ball, a drawn foul and several other events on the field. Ideally, one could parcel out value to each action, determining how much each player in the build-up was responsible for the chance. At this point, xG values are not distributed to players based upon their role in creating a scoring chance, so xG is still best used as a team statistic.
Finally, in grouping shot types and calculating their average success rates, xG does not adjust for the quality of the player taking the shot. Barcelona, PSG and Man City consistently post goal totals that exceed their xG totals. Having Messi, Cavani and Aguero up top means that what would typically constitute a low-probability shot is in fact one that’s more likely to end up in the back of the net.
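One simple way to surface this effect is to subtract a team's xG total from its actual goals. The sketch below uses placeholder team names and made-up numbers, not real season data, and `TeamSeason` and `finishing_delta` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    team: str
    goals: int
    xg: float


def finishing_delta(record):
    """Goals scored minus expected goals; positive means outperforming xG."""
    return record.goals - record.xg


# Placeholder figures for illustration only.
records = [
    TeamSeason("Team A", 70, 58.4),
    TeamSeason("Team B", 41, 47.9),
]
for record in sorted(records, key=finishing_delta, reverse=True):
    print(f"{record.team}: {finishing_delta(record):+.1f} goals vs xG")
```

A persistently positive difference across several seasons is harder to attribute to luck and may point to elite finishers that a shot-averaging model does not account for.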
Goals are the currency of soccer, so figuring out how many a team should have scored and, more importantly, projecting how many it will score is a worthwhile endeavor. The challenge for soccer analytics folks is to connect xG to all actions on the field with an increasingly nuanced approach.