Also posted on Citywatch:
More than any other position, goalkeeper has been the symbol of Pep's revolution at City. Joe Hart, one of the mainstays under Pellegrini and Mancini, has been replaced by Claudio Bravo. While there are other possible reasons Hart may have been jettisoned, the popular theory is that Bravo's ability with the ball at his feet was a key component in making the switch. Indeed, Pep's belief in bringing out the ball from the back has been one of the most obvious changes to the way City have played this season. While surely no one is sorry that goal-kicks targeting Bacary Sagna on the half-way line have been largely abandoned (a tactic of Pellegrini's that was one of my personal pet peeves), the question of what value this change in tactics brings is still very relevant. Most of the analysis I've seen (of which this piece by Tom Payne is probably the best example) focuses on the attacking value a passing goalkeeper can bring, but the evidence for this argument mostly focuses on increased possession of the ball. As anyone vaguely familiar with Louis Van Gaal's tenure at United will tell you, possession is by no means synonymous with a high-functioning attack. Instead, to me the value lies entirely in the retention of possession, which certainly impacts both attack and defense, but does not necessarily increase the efficiency of either.
There's little doubt that Pep's reliance on Bravo's ability with the ball at his feet has increased Manchester City's dominance of possession. The team averaged a very respectable 57% possession last year, good for 5th in the Premier League. This season so far that figure has jumped to 65% to lead the league. Bravo is certainly not the sole reason for this (and City's weak early schedule is probably inflating the number somewhat), but he has been a big factor. Per WhoScored's passing data, Bravo's passing success percentage is 79%, whereas Hart's last season was just 53%. Though Bravo has more total passes, he still turned over the ball via a misplaced pass 1.7 fewer times per 90 minutes than Hart did last season. Those differences add up and, over the course of the season, lead to a much larger possession figure.
Given that an emphasis on playing the ball from the back has indeed increased City's ball possession, shouldn't that have an impact on the attacking effectiveness of the team? That depends on how attacking effectiveness is defined. There has been an increase in both Shots and Shots on Target (SOT) per game this season compared to last (17.91 from 16.16 in Shots, 6.18 from 5.53 in SOT). However, if you use a per Time of Possession (TOP) basis, City are actually slightly behind where they were a season ago (105 SOT per 1000 minutes of possession this season, 108 last season). Simply put, the increased possession has given City more chances to attack, but has not really improved the effectiveness of their attack, i.e. the rate they turn possession into shots. Similarly, the Shots and SOT City allow their opponents are also down in total, but are up slightly on a per TOP basis. The result is City is now 8th in the league in SOT/TOP and 9th in SOTA/TOPA, down from 4th and 5th respectively last season.
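As a rough check on the per-possession figures above, here is a minimal sketch of the arithmetic. It assumes a flat 90-minute match and that time of possession is simply the possession share times 90; both are my simplifications for illustration (the function name is hypothetical), so the results land close to, but not exactly on, the 105 and 108 quoted.

```python
MATCH_MINUTES = 90  # assumption: stoppage time is ignored


def sot_per_1000_possession_minutes(sot_per_game, possession_share,
                                    match_minutes=MATCH_MINUTES):
    """Shots on target per 1000 minutes of possession."""
    # Assumed definition of TOP: share of the ball times match length.
    possession_minutes = possession_share * match_minutes
    return 1000 * sot_per_game / possession_minutes


# Per-game SOT and possession share quoted in the post.
this_season = sot_per_1000_possession_minutes(6.18, 0.65)
last_season = sot_per_1000_possession_minutes(5.53, 0.57)

print(f"This season: {this_season:.1f} SOT per 1000 minutes of possession")
print(f"Last season: {last_season:.1f} SOT per 1000 minutes of possession")
```

Even under these simplifications the ordering holds: more shots on target per game, but slightly fewer per minute on the ball.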
If we consider that a focus on playing out from the back has played a role in the increased possession, the above statistics shouldn't be that surprising. The bulk of a goalkeeper's possession is far from his opponent's goal and close to his own. A look at City's pass maps for this season (for example, this one versus Spurs) shows Bravo primarily linking with just the center-backs, and occasionally with Fernandinho when he drops in between. As a result, Bravo has played 68% of his passes to players in the defensive third of the pitch per Statszone, which is more than double the 31% average noted by Sam Jackson for keepers in the top 3 leagues in 2015-16. When your passes start from such a deep position and don't go very far, it's difficult to have that much of an influence on the attack.
The more curious finding is the lack of impact on defensive efficiency. I had assumed prior to looking at the data that City would have given up fewer chances per opponent's TOP this season, but that the chances they gave up would be of higher quality. With so much passing in the defensive third, there are bound to be more errors and opponent recoveries high up the pitch, and chances resulting from counterattacks and errors are converted at much higher rates, which is part of the reason xG metrics value them so highly. The Premier League data don't seem to be showing this, as City's SOTA/TOPA is worse this season and City's opponents have only converted 26% of their SOT, below last year's average of 30%. Part of this may be because Bravo has been doing an excellent job passing so far, beating the average GK passing % in the defensive third. According to Statszone, only 3 of his 43 turnovers have been in the defensive third, so opponents really haven't had that many opportunities from quick turnovers. This may, once again, be skewed slightly by the fact City still have played a pretty weak schedule, and there have certainly been passing errors from other defenders (Stones against Southampton, Gundogan against Barcelona). Overall though, it suggests that it is the rest of the City defense that has failed to make progress (I'm looking at you, Kolarov and Otamendi).
In the summer, a lot of people described Pep as an attacking coach and Mourinho, ever his foil, as a defensive one. However, I think it's more appropriate to think of the pair as proactive and reactive respectively. Pep's focus on ball possession allows his teams to dictate how the game is played no matter the opponent. So far, Bravo's passing from the back has clearly allowed the team to average more possession from which to base attacks, and denied opponents the opportunities to make theirs. Now if the efficiency with which possession is used could be similarly improved, City would be unstoppable.
A blog about statistics, the EPL, and Manchester City. Follow me on twitter at @HawkesTeeter
Friday, November 18, 2016
Wednesday, November 16, 2016
Small Samples vs Bad Samples
I got into a Twitter discussion recently on Chelsea's current hot streak, now up to five straight wins in the Premier League. The argument I responded to was pretty straightforward: a five-game sample is too small to draw any conclusions, and City's earlier five-game winning streak to start the season shows that such things don't last forever. It is certainly true that five games is not a large enough sample to project the rest of the season with any sort of certainty. However, not all small samples are created equal, and the failure to consider obvious factors (such as the strength of schedule) leads to an erroneous conclusion that both samples are equally flawed. Chelsea have been objectively better over this five-game sample than City were over theirs, and by just repeating the mantra of "small sample size" over and over, analysts are doing their audience a disservice.
Let's take an example away from football for a minute: political polling. If you had conducted a poll of 1,000 people in Florida asking who respondents were planning to vote for in the recent Presidential election, that might give you a pretty good idea of the state of the race in Florida at the time. However, who is in that 1,000 person sample matters to the efficacy of the poll. If the poll included only Caucasians, it would not be very accurate in predicting the correct winner given the diversity of the state. If the poll was conducted of all registered voters, that might mean the respondents were less likely to go to the polls than if it was conducted of likely voters, so there would be greater uncertainty in the result. Put simply, it is not only the size of the sample that determines how effective the poll is, but how representative the sample is of the overall population. Size plays a key part in that obviously, but as noted above, other factors also play a role.
Football is very similar. There are plenty of ways any sample of a team's matches could not be representative of the season as a whole. Most obviously, if a team played weak opponents over the stretch, it's not very representative as each team must play every other team (good and bad) twice over the course of the season. Similarly, if they had an abnormally good finishing run in the sample (either G-xG or G/SOT being very high), that would also be unlikely to be replicated over the full season. If the sample contained games where the team went down to ten men, or their opponents did, that could also make it unrepresentative (I discussed this specific scenario in regards to Swansea last year). Games where an abnormally high number of own goals or penalties occurred would also be something to look at. As a result, it is disingenuous to suggest that all five-game samples have equal predictive value (or lack thereof), as that does not take into account any of these factors.
So in the case of City's and Chelsea's streaks, how representative are the samples of the overall population, i.e. the games they will play this season? In City's case, not very representative at all. Their opponents over the stretch had an average points per game of just 1.07, compared to a league average of 1.37. City also were finishing at higher than expected rates and their opponents were finishing at lower than expected rates. The average G/SOT (excluding penalties and own goals) so far this season is 30%; City were at 45% and their opponents at 25% over the period. Chelsea meanwhile played close to an average schedule in their streak (opponents' PPG was 1.29) and though their finishing rates definitely are running hot (Chelsea were at 43% and their opponents at 0%), their SOT numbers are more than good enough to offset that.
To that point, I have compiled the data for each Premier League team's best five-game stretch so far this season in terms of points (when teams have multiple stretches with the same point total, I chose the one with the higher Opponents' PPG). The table is shown below:
Team | Opp PPG | SOT | Opp SOT | SOTD | TOP % | GF | GA | GD | G/SOT | Opp G/SOT | Points | Expected GD | xEGD | Actual - xEGD |
Chelsea | 1.29 | 7.40 | 1.60 | 5.80 | 52.60 | 3.20 | 0.00 | 3.20 | 43.24% | 0.00% | 3.00 | 2.15 | 1.74 | 1.46 |
Southampton | 1.20 | 7.20 | 1.80 | 5.40 | 54.00 | 1.60 | 0.40 | 1.20 | 20.00% | 12.50% | 2.20 | 0.46 | 1.62 | -0.42 |
Everton | 1.15 | 7.40 | 2.20 | 5.20 | 56.40 | 2.00 | 0.60 | 1.40 | 25.00% | 18.18% | 2.60 | 1.06 | 1.50 | -0.10 |
Liverpool | 1.33 | 7.40 | 3.00 | 4.40 | 58.80 | 2.80 | 1.00 | 1.80 | 32.35% | 33.33% | 2.60 | 1.27 | 1.14 | 0.66 |
Man City | 1.07 | 6.40 | 2.60 | 3.80 | 65.00 | 3.00 | 0.80 | 2.20 | 40.00% | 25.00% | 3.00 | 1.46 | 1.08 | 1.12 |
Tottenham | 1.44 | 6.60 | 3.00 | 3.60 | 56.80 | 2.00 | 0.40 | 1.60 | 28.13% | 7.14% | 2.60 | 0.94 | 1.08 | 0.52 |
Man United | 1.35 | 5.20 | 3.40 | 1.80 | 51.60 | 1.60 | 1.20 | 0.40 | 28.00% | 31.25% | 1.80 | 0.15 | 0.54 | -0.14 |
Arsenal | 1.40 | 5.20 | 3.20 | 2.00 | 58.00 | 2.60 | 0.60 | 2.00 | 47.83% | 6.67% | 3.00 | 1.46 | 0.48 | 1.52 |
Watford | 1.20 | 4.80 | 3.20 | 1.60 | 44.60 | 2.00 | 1.40 | 0.60 | 39.13% | 43.75% | 2.00 | 0.46 | 0.42 | 0.18 |
Stoke | 0.89 | 5.20 | 4.00 | 1.20 | 45.80 | 1.80 | 0.60 | 1.20 | 30.77% | 10.00% | 2.20 | 0.39 | 0.36 | 0.84 |
Crystal Palace | 1.07 | 4.60 | 3.60 | 1.00 | 51.00 | 2.20 | 1.20 | 1.00 | 47.83% | 33.33% | 2.20 | 0.10 | 0.30 | 0.70 |
West Ham | 1.05 | 2.80 | 3.60 | -0.80 | 51.40 | 0.80 | 0.80 | 0.00 | 21.43% | 23.53% | 1.60 | -0.36 | -0.18 | 0.18 |
Middlesbrough | 1.56 | 3.20 | 4.00 | -0.80 | 43.60 | 0.80 | 0.60 | 0.20 | 25.00% | 15.00% | 1.20 | -0.12 | -0.24 | 0.44 |
Leicester | 1.44 | 4.20 | 5.00 | -0.80 | 47.40 | 1.40 | 1.40 | 0.00 | 25.00% | 28.00% | 1.40 | 0.05 | -0.30 | 0.30 |
West Brom | 1.15 | 3.40 | 4.60 | -1.20 | 35.60 | 1.20 | 1.00 | 0.20 | 31.25% | 18.18% | 1.40 | -0.25 | -0.36 | 0.56 |
Bournemouth | 1.45 | 4.20 | 5.20 | -1.00 | 50.60 | 2.00 | 1.40 | 0.60 | 45.00% | 26.92% | 2.00 | -0.36 | -0.36 | 0.96 |
Swansea | 1.35 | 4.00 | 5.20 | -1.20 | 50.00 | 0.80 | 1.40 | -0.60 | 15.79% | 26.92% | 0.80 | -1.11 | -0.42 | -0.18 |
Hull | 1.33 | 3.20 | 5.60 | -2.40 | 46.20 | 1.20 | 1.40 | -0.20 | 33.33% | 23.08% | 1.40 | -0.50 | -0.66 | 0.46 |
Sunderland | 1.33 | 2.00 | 6.00 | -4.00 | 39.80 | 0.80 | 1.80 | -1.00 | 25.00% | 30.00% | 0.80 | -1.27 | -1.32 | 0.32 |
Burnley | 1.53 | 2.60 | 8.20 | -5.60 | 33.80 | 1.20 | 1.40 | -0.20 | 41.67% | 12.82% | 1.40 | -0.29 | -1.62 | 1.42 |
The Expected GD column is what the team's Goal Difference would be if they had finished their SOT over the sample at their rate for the season and their opponents had finished their SOT at the rate the team had allowed for the season. The xEGD column is what the team's Goal Difference would be if they and their opponents had finished their SOT at a league average rate of 30%. Chelsea lead in both columns here, suggesting that even with their finishing luck their streak has been the most impressive so far this year. This is not to say they haven't been lucky with finishing (the Actual-xEGD is 2nd largest in the set), but the number of shots on target they are producing has been very good. Also, unlike Southampton and Everton (who also rank highly), their streak hasn't been boosted by an easy slate of opponents.
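To make the xEGD definition concrete, here is a minimal sketch that recomputes it from the per-game SOT columns for three rows of the table. The function name and the choice of rows are mine. For these rows the result matches the table; several other rows differ by a few hundredths, which may be down to penalties or own goals being excluded from the underlying counts, though that is a guess on my part.

```python
LEAGUE_AVG_FINISHING = 0.30  # league-average G/SOT quoted in the post


def xegd(sot, opp_sot, rate=LEAGUE_AVG_FINISHING):
    """Per-game goal difference if both sides finished at the league rate."""
    return rate * (sot - opp_sot)


# (SOT, Opp SOT, actual GD), all per game, copied from the table.
rows = {
    "Chelsea": (7.40, 1.60, 3.20),
    "Southampton": (7.20, 1.80, 1.20),
    "Tottenham": (6.60, 3.00, 1.60),
}

results = {}
for team, (sot, opp_sot, gd) in rows.items():
    expected = xegd(sot, opp_sot)
    # Store (xEGD, Actual - xEGD), rounded to the table's precision.
    results[team] = (round(expected, 2), round(gd - expected, 2))
    print(f"{team}: xEGD {expected:.2f}, Actual - xEGD {gd - expected:+.2f}")
```

A positive Actual - xEGD means the team outscored what league-average finishing on both sides would have produced over the stretch.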
A closer look at this set also confirms why hot streaks tend to happen. The average opponents' PPG for these streaks is down slightly to 1.28, confirming that such streaks do tend to happen against weaker opponents, as in Stoke's most recent five-game run. The Actual-xEGD column shows that on average teams are getting a benefit of 0.5 in goal difference per game based on their finishing being better than league average over these streaks. The number of minutes a team's opponents play with a red card shoots up here too, accounting for 147 of the 210 total minutes a team has played with a man advantage. You can look at Arsenal's five-game win streak and note the 50+ minutes against 10-man Hull and their opponents' G/SOT for good examples of both. The point is that for most of these streaks there are easily identifiable reasons they might be over-performing, but Chelsea aren't showing any of them.
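The two averages quoted above can be checked straight from the table. This is only a sketch: the values are copied by hand from the Opp PPG and Actual - xEGD columns, in table order, and the variable names are mine.

```python
from statistics import mean

# Opp PPG column, in table order (Chelsea first, Burnley last).
opp_ppg = [1.29, 1.20, 1.15, 1.33, 1.07, 1.44, 1.35, 1.40, 1.20, 0.89,
           1.07, 1.05, 1.56, 1.44, 1.15, 1.45, 1.35, 1.33, 1.33, 1.53]

# Actual - xEGD column, in the same order.
actual_minus_xegd = [1.46, -0.42, -0.10, 0.66, 1.12, 0.52, -0.14, 1.52,
                     0.18, 0.84, 0.70, 0.18, 0.44, 0.30, 0.56, 0.96,
                     -0.18, 0.46, 0.32, 1.42]

avg_opp_ppg = mean(opp_ppg)
avg_finishing_boost = mean(actual_minus_xegd)

print(f"Average opponents' PPG over best streaks: {avg_opp_ppg:.2f}")
print(f"Average Actual - xEGD per game: {avg_finishing_boost:.2f}")
```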
Again, this isn't to say Chelsea will definitely win the title, nor that five games is enough to make sweeping conclusions. However, it's clear that Chelsea's recent streak contained a much more representative sample of opponents than did City's and was more impressive even when accounting for finishing. Therefore, I think it likely to be more predictive than City's early winning streak proved to be.