I'll admit that sometimes I'm jealous of other franchises, only because I'm a history geek. I wish the Diamondbacks had fifty or a hundred years of stories and data to dig through, instead of the very limited 15 seasons Arizona actually has. I've seen just about every player who has gone through the team, and although my memory isn't the clearest about every moment, I was there for almost all of it. There's little mystery to a team that started its life when I was at the end of grade school; I grew up as the team did.
One thing that is nice, however, is the constant sense of discovery when even minor milestones are reached. Things haven't become ho-hum or rote. It's still pretty cool when the team wins the division, or a pitcher gets 20 wins. The feeling isn't exclusive to new teams, as the Mets are still waiting for a no-hitter of any kind, but I would imagine most of the glow has burned off for older franchises.
This opening weekend the Diamondbacks did something they've only done once before: swept the opening series. The previous opening sweep was in 2000. We shouldn't celebrate too much, the conventional thinking goes, and I'm inclined to agree. There were wonderful things from Opening Weekend, and ugliness to be forgotten, but a series is not a season.
It did pique my interest (by way of BattleMoses' suggestion) in how well teams do after starting the season with a sweep. Join me after the jump to find out; you might be surprised at what you find.
It seems ridiculous that an opening series has any bearing on final standings. It's only three games, games that aren't weighted any more than any other set of games. The idea might seem crazy on the surface, but science(!) wouldn't get anywhere if we didn't attempt to prove or disprove even the most obvious of relationships.
A working theory for why an opening sweep might predict a better-than-usual season is that opening sweeps are not particularly common (I found 107 series since 1980 with 3 or more wins to start the season), and that a particularly good team is more likely to sweep an opponent than a terrible team is. It takes a certain amount of luck to sweep, yes, but it also takes talent and consistency. So it should follow that a team with a long streak to start the season has an even greater chance of ending the season with a good record.
To test these hypotheses, I created a set of teams that opened a season with a win streak of 3 or more. Three is the minimum, because that means the team swept (or did the equivalent of a sweep, if it opened with a 1- or 2-game series). I eliminated 2 teams from 1981, as that season was strange for strike-related reasons. From 1980 onward I was able to build a sample of 107 series that fit the criteria.
I also noted how many wins beyond the baseline of 3 each team had, so a team with a 4-game win streak was +1. I figured that a team with more wins above the baseline would be more likely to end the season with a winning record, for the reasons discussed above. Once I had the data collected, I ran three one-sample t-tests: one for the entire sample, one for win streaks of 3-4, and one for streaks above 4.
For the null hypothesis I set .500 as the neutral winning percentage. Anything above .500 increases the likelihood of reaching the playoffs, while anything below .500 decreases it. The t-test checks whether the sample's mean is significantly greater than .500; if it is, that would suggest (assuming no errors) that starting the season with a win streak increases the likelihood of ending the season above .500. This is not a predictive test.
For the entire set, I found a statistically significant relationship between starting the season with a streak and finishing above .500. I'll present the information in a table:
For the set ALL, a t-score above 1.66 would have been statistically significant, and the t-score generated was well above this. For set STRK3-4, the threshold would have been t > 2.63 (I'm using a one-tailed test at α = .005), so it isn't nearly as significant as ALL, though it would have been with a more forgiving test. For set STRK>4 the threshold would have been t > 2.75, and the set is well beyond that.
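The selection step above can be sketched in Python. This is a minimal illustration, not the author's actual workflow: it assumes each team-season is stored as a hypothetical `TeamSeason` record holding the year, the length of the opening win streak, and the final winning percentage. It mirrors the rules described in the post: streaks of 3 or more, 1980 through 2011, 1981 left out, and the split into the three sets by wins above the baseline.

```python
from dataclasses import dataclass

BASELINE = 3  # a 3-game opening streak is the minimum that counts as a sweep


@dataclass
class TeamSeason:
    # Hypothetical record for one team-season; field names are illustrative.
    team: str
    year: int
    opening_streak: int  # length of the win streak that opened the season
    final_pct: float     # winning percentage at season's end


def wins_above_baseline(season: TeamSeason) -> int:
    """A 4-game opening streak is +1, a 5-game streak is +2, and so on."""
    return season.opening_streak - BASELINE


def build_sets(seasons, first_year=1980, last_year=2011, excluded_years=(1981,)):
    """Split qualifying team-seasons into the ALL, STRK3-4 and STRK>4 sets."""
    qualifying = [
        s for s in seasons
        if s.opening_streak >= BASELINE
        and first_year <= s.year <= last_year
        and s.year not in excluded_years  # drop the strike-shortened 1981 season
    ]
    return {
        "ALL": qualifying,
        "STRK3-4": [s for s in qualifying if wins_above_baseline(s) <= 1],
        "STRK>4": [s for s in qualifying if wins_above_baseline(s) >= 2],
    }
```

Because STRK3-4 and STRK>4 split ALL at +1/+2, every qualifying team-season lands in exactly one of the two smaller sets.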
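The test described above is a one-sample t-test of a set's mean winning percentage against .500. Here is a minimal sketch using only Python's standard library. The function name `one_sample_t` is hypothetical, and using the sample standard deviation (n - 1 denominator) is an assumption about how the spread was estimated.

```python
import math
import statistics

NEUTRAL_PCT = 0.500  # mean winning percentage under the null hypothesis


def one_sample_t(win_pcts, mu0=NEUTRAL_PCT):
    """Return (t, degrees of freedom) comparing the mean of win_pcts with mu0."""
    n = len(win_pcts)
    if n < 2:
        raise ValueError("need at least two team-seasons")
    mean = statistics.mean(win_pcts)
    sd = statistics.stdev(win_pcts)  # sample standard deviation, n - 1 denominator
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1
```

Passing a set's final winning percentages to `one_sample_t` yields a t-score and its degrees of freedom. A positive t means the set averaged better than .500, and only a large enough positive t clears a one-tailed threshold.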
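In symbols, the one-tailed test compares each set's t-score with a critical value from the t distribution at n - 1 degrees of freedom. Here n is the number of team-seasons in the set, x̄ is their mean winning percentage, s is their sample standard deviation, and α is the significance level. With 107 team-seasons in ALL and 29 in STRK>4, STRK3-4 holds the remaining 78, so the degrees of freedom are 106, 77, and 28 respectively.

```latex
H_0:\ \mu = 0.500 \qquad H_1:\ \mu > 0.500

t = \frac{\bar{x} - 0.500}{s/\sqrt{n}}, \qquad \text{reject } H_0 \text{ if } t > t_{1-\alpha,\,n-1}
```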
What the hell does any of that mean?
Okay, the math is over. What it means is that there's reason to believe a streak at the beginning of a season indicates a winning season overall. It's most significant, however, when we isolate win streaks of 5 or more, which are pretty rare. I only had 29 in the sample from 1980-2011, but even with the usual small-sample-size disclaimers, I feel pretty confident that it's mainly good, well-balanced teams that start seasons with streaks longer than 4.
Does this mean the Diamondbacks will have a winning season this year? Who knows? I think this gives me a little more confidence for a good year, but there are reasons to be suspicious of the above data (I'll spare you the details. For the other people reading this, yes, I understand the potential problems the above tests had. This was for fun, only).
Not to jinx anything, but I think I'll feel a lot better if the Diamondbacks win the next two games against the Padres.