It's pretty much a given that, for any individual player, spring-training stats are meaningless as a projection of what might happen once the season starts. Not that this stops us from poring over them, worrying as a starter's ERA tilts towards double figures, or cheering as a backup infielder hits .400. But as a projection of regular-season performance, they are basically useless. The one exception, discovered by John Dewan, is this:
When we chose only those players doing exceptionally well in spring training, we found that about three-fourths of them performed better than their career average during the upcoming season. Our definition of "exceptionally well" was slugging 100 points higher in spring training than their previous career slugging percentage.
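Dewan's cut-off is easy to state precisely. Here is a minimal sketch that treats "100 points" as .100 of slugging percentage. The function name and the example hitters are hypothetical. The only thing taken from Dewan is the 100-point threshold.

```python
def dewan_flag(spring_slg: float, career_slg: float, margin_points: int = 100) -> bool:
    """Flag a hitter whose spring SLG beats his career SLG by at least margin_points.

    Slugging "points" are thousandths, so 100 points = .100.
    Rounding the gap to whole points avoids float noise right at the boundary.
    """
    gap_points = round((spring_slg - career_slg) * 1000)
    return gap_points >= margin_points

# Hypothetical hitters, not real players.
print(dewan_flag(0.560, 0.450))  # True: a 110-point gap
print(dewan_flag(0.520, 0.450))  # False: only a 70-point gap
```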
But is this because the sample size - many players will get fewer than 60 at-bats in spring - is just too small to be useful? What if we combined the numbers and looked at spring-training performance at the team level, giving us a sample size maybe 30 times as big? Would that offer any insight into how a team might perform during the regular season?
First, here's a quote from a front-office source on the general relevance of spring-training stats and how teams view them.
"We go into the spring with a baseline evaluation of each individual player and his role within the club and organization. During the spring, we try to measure each player against this baseline. The lower that initial valuation or our certainty in regards to that evaluation, the more sensitive we are going to be to that individual’s performance in the spring. The statistics only matter in that they are a reflection of the performance that those performing on field evaluations are seeing."
What I did was collate various numbers for the sixteen National League teams over the course of spring training, and then compare them to the same numbers during regular-season play in April. It seemed likely that, if there was a connection to spring numbers, the first month would show it most clearly, since team performance inevitably varies over the course of an entire season - a lot can happen between April and September. Here's the Google Doc with spring and April team stats.
I then worked out the correlation between the spring and April numbers. Statistically, correlation measures the strength of the linear relationship between two sets of data, and ranges from +1 to -1. A value of +1 means a perfect positive relationship (for instance, if one set of data was 1, 2, 3, 4 and the other 2, 4, 6, 8, that'd have a correlation of +1). A value of -1 means a perfect inverse relationship, i.e. the higher X is, the lower Y is. A value of 0 means no apparent linear relationship. There are limitations. Very different data sets can share the same correlation (Anscombe's quartet, four graphs with near-identical correlations, is the classic example), and correlation does not prove causation. But a lack of correlation would strongly suggest no connection between spring numbers and regular-season ones.
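For readers who want to see what sits behind that number, here is a minimal sketch of the standard Pearson correlation coefficient, checked against the 1, 2, 3, 4 / 2, 4, 6, 8 example above. The function name `pearson_r` is just an illustrative label.

```python
import math

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (n times the covariance).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (n times the variance) for each series.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfect positive relationship
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfect inverse relationship
```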
Pitching
- BA against: correlation 0.131
- OBP against: 0.119
- SLG against: 0.229
- WHIP: 0.136
- K/BB: 0.155
- GO/AO: 0.405
There seems to be some evidence, albeit pretty weak, in most of these categories of a slight link between performance in spring training and performance once Opening Day arrives. Only the GO/AO ratio reaches a level where the relationship starts to look meaningful: if your team gets a lot of ground balls in spring, that trend may well continue into the regular season. Slugging percentage allowed comes next, though at 0.229 it is still only a small correlation. The other categories barely crept over the 0.1 level, which marks the bottom of the "small correlation" range.
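The "small correlation" language matches a widely used rule of thumb, Cohen's conventions. Judged on the absolute value, roughly 0.1 counts as small, 0.3 as medium and 0.5 as large. Here is a minimal sketch applying those cut-offs to the pitching numbers above. The thresholds are an assumed convention, not something the data dictate.

```python
def strength(r: float) -> str:
    """Label |r| using Cohen's conventional cut-offs (an assumed rule of thumb)."""
    size = abs(r)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"

# Spring vs April correlations for pitching, as listed above.
pitching = {
    "BA against": 0.131,
    "OBP against": 0.119,
    "SLG against": 0.229,
    "WHIP": 0.136,
    "K/BB": 0.155,
    "GO/AO": 0.405,
}
for stat, r in pitching.items():
    print(f"{stat:12} {r:+.3f}  {strength(r)}")
```

On this scale, only GO/AO clears the medium mark. Everything else, slugging allowed included, lands in the small band.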
Hitting
- BA: correlation -0.035
- OBP: -0.367
- SLG: -0.135
- GO/AO: -0.365
- SB%: 0.152
This was a surprise. While there was almost no correlation between spring and regular-season batting average, there was a negative one for on-base percentage. This means that the better teams were at getting on base in Florida and Arizona, the worse they tended to be in April. The poster child for this was the Houston Astros: in spring, only one team had a better OBP, but in the games that counted they were abysmal, at .281, over thirty points worse than the 15th-ranked club. The same reverse correlation showed up for hitters' GO/AO, the opposite of what we found for pitchers.
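One way to put these numbers in perspective is the coefficient of determination, r². It is the share of the variation in one series that a straight-line fit on the other accounts for. Squaring also drops the sign, so a negative correlation counts just as much as a positive one of the same size. Here is a minimal sketch for the batting correlations listed above. Even the OBP and GO/AO figures account for only around 13% of the team-to-team variation in April.

```python
# Spring vs April correlations for hitting, as listed above.
hitting = {"BA": -0.035, "OBP": -0.367, "SLG": -0.135, "GO/AO": -0.365, "SB%": 0.152}

def share_explained(r: float) -> float:
    """Coefficient of determination for a simple linear fit: r squared."""
    return r * r

for stat, r in hitting.items():
    direction = "negative" if r < 0 else "positive"
    print(f"{stat:6} r={r:+.3f} ({direction}), r^2={share_explained(r):.1%}")
```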
The sole area that did show a positive, albeit small, relationship was stolen-base percentage. To some extent this may make sense, given teams like the Padres, who said early in spring training that they were going to be more aggressive on the base paths this year. "We’re going to press the limits on the bases this spring so we learn what our limits are," said Dave Roberts, hired as a special adviser to the team. The top team in spring, the Brewers, at 83%, were also the best in April, going 18-1 in stolen-base attempts.
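Stolen-base percentage is simply successful steals divided by attempts. Here is a minimal sketch using the Brewers' April line from the text, 18 steals and 1 caught stealing. That works out to about 94.7%, up from 83% in spring. The function name is illustrative.

```python
def sb_pct(stolen: int, caught: int) -> float:
    """Stolen-base success rate: SB / (SB + CS)."""
    attempts = stolen + caught
    if attempts == 0:
        raise ValueError("no stolen-base attempts")
    return stolen / attempts

print(f"{sb_pct(18, 1):.1%}")  # Brewers in April 2010, per the text
```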
Defense and W-L record
- Fielding %: correlation -0.131
- Defensive Efficiency: -0.214
- Stolen-base % Allowed: -0.171
- W-L %: 0.256
Again, a surprise. Any relationship between the numbers is a negative one: the worse a team did in the field in spring, the better it did in the regular season. Any explanation for why this might be would be welcome! However, as with most of these numbers, the correlation is too low to draw any meaningful connection. There is a slightly stronger, but still marginal, connection between overall record in spring and in April. It would have been closer but for the Nationals turning things around, going 10-20 in the Grapefruit League, then 13-10 in April. That said, I note that the teams based among the closest to their spring-training sites - the Marlins, Padres, Dodgers and D-backs - all had April winning percentages less than 25 points away from their pre-season numbers...
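"Points" of winning percentage here means thousandths, so a .500 team and a .525 team are 25 points apart. Here is a quick sketch of that arithmetic using the Nationals' records from the text. It shows a swing of roughly 232 points, which is why they dragged the correlation down. It assumes ties (which do happen in spring games) are simply left out of the calculation.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage: wins divided by decided games (ties ignored, an assumption)."""
    return wins / (wins + losses)

def points_apart(a: float, b: float) -> int:
    """Difference between two winning percentages, in thousandths ("points")."""
    return round(abs(a - b) * 1000)

spring = win_pct(10, 20)  # Nationals, Grapefruit League
april = win_pct(13, 10)   # Nationals, April
print(f"{spring:.3f} -> {april:.3f}: {points_apart(spring, april)} points")
```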
More or less the same goes for team performance as for individual performance: spring-training numbers offer very limited insight into what to expect from a team's bats, arms or defense. The only area where the 2010 numbers showed any real correlation was the ground-out/fly-out ratio of pitchers - and, let's be honest, that's hardly a number we wait for with bated breath during the regular season. Please bookmark this article and treat it as the sabermetric equivalent of a brown-paper bag, into which you can breathe when the D-backs suck at [insert aspect of game] in the 2011 Cactus League.