

SnakePit Statistics 1.0.2: Pitching

In our previous installment, we looked at the numbers that matter most for hitters, and at good ways to judge offensive performance. Now we turn things around and examine pitching statistics. What do they mean? Which reflect true talent, and which are illusions, dependent on other factors? If you haven't already done so, you might want to read part one first: it lays out some opening principles you'll need to get the most out of what follows.

Ready? Let's take the mound!

PITCHING NUMBERS

W = Wins.
L = Losses.
Let's start by tearing into the holiest of sacred cows: I hope you brought a fork. Before that, though, we should discuss how wins and losses are decided. The basic rule is to look back from the end of the game and see who was pitching when the lead last changed hands. The winning team's pitcher gets the W; the losing team's pitcher gets the L. As the winning team must be batting in order to score, its pitcher of record will be whoever got the last out for them.

A couple of kinks to note. It's possible to get an L after leaving the game: if you leave a man on base and he comes around to score the go-ahead run, it's still "your" run, so you get the L if the lead doesn't change hands thereafter. Also, a starting pitcher must go five innings to get a W, even if he leaves with his team ten runs up. If he fails to complete five frames, the official scorer gives the W to whichever reliever was, in the scorer's opinion, the most effective. [Except at Coors, where the W must be awarded to Troy Tulowitzki]
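To make the "look back from the end of the game" rule concrete, here is a minimal Python sketch. It assumes a hypothetical list of scoring plays in game order. Each play records the batting team's pitcher of record and the pitcher charged with the runs, which covers the "your run" case above. The sketch ignores the five-inning rule for starters and the scorer's discretion, so treat it as an illustration, not the official scoring rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ScoringPlay:
    """One scoring play, in game order (a hypothetical record format)."""
    batting_team: str      # "home" or "away"
    runs: int
    batting_pitcher: str   # batting team's pitcher of record at the time
    charged_pitcher: str   # pitcher charged with the runs (may already have left)


def decisions(plays: List[ScoringPlay]) -> Optional[Tuple[str, str]]:
    """Return (winning pitcher, losing pitcher), or None if nobody leads at the end.

    Simplified: ignores the five-inning rule for starters and scorer discretion.
    """
    score = {"home": 0, "away": 0}
    last_lead_change = None
    for play in plays:
        other = "away" if play.batting_team == "home" else "home"
        was_ahead = score[play.batting_team] > score[other]
        score[play.batting_team] += play.runs
        # The batting team takes the lead on this play (from tied or behind).
        if not was_ahead and score[play.batting_team] > score[other]:
            last_lead_change = play
    if last_lead_change is None or score["home"] == score["away"]:
        return None
    # The last team to take the lead is the team leading at the end.
    return last_lead_change.batting_pitcher, last_lead_change.charged_pitcher


plays = [
    ScoringPlay("away", 2, "Away starter", "Home starter"),
    ScoringPlay("home", 3, "Home starter", "Away starter"),
    ScoringPlay("away", 1, "Away reliever", "Home reliever"),  # ties it 3-3
    ScoringPlay("home", 1, "Home reliever", "Away reliever"),  # go-ahead run
]
print(decisions(plays))  # ('Home reliever', 'Away reliever')
```

The Home starter led at one point, but the lead was later lost, so the decision goes to whoever was involved when the lead last changed hands.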

With that out of the way... I wouldn't say wins are completely meaningless, but they are a poor guide to a pitcher's ability, because they depend on two factors outside his control. First, what happens after he leaves the game: if the bullpen blows a lead, no matter how well you threw, you get a no-decision at best. Second, to get a W, your offense has to score more runs than you allow. Witness the June 2 game last year: Edwin Jackson threw shutout baseball for nine innings against the Dodgers, but as the Diamondbacks couldn't score either, he got a no-decision.

All told, the decision a pitcher gets may be only 50% dependent on his own performance. You can be a mediocre pitcher on a team that scores a lot, and you'll rack up the W's. Just ask current D-backs pitching coach Charles Nagy. In 1998, he had an ERA of 5.22, but he picked up fifteen wins, because the Indians scored 8+ runs in ten of his starts, and Nagy got the W in every one of them. On the other hand, in 2004, Brandon Webb had a 3.59 ERA, much better than Nagy's, but Webb's record was 7-16, since in 14 of his starts we managed to score two runs or fewer. Unsurprisingly, Brandon's record was 1-11 in those games.

If you want proof of the limited relevance of wins, consider this: over 2003-04, a pitcher had a record of 36-16. He was signed to a four-year deal after the latter season, causing David Pinto to write at the time that the franchise in question "are buying [the pitcher's] wins, not his ability." He was dead right: the pitcher went 5-16 with a 7.00 ERA for that team, and they ended up eating $22 million of his contract in June 2006. In case you haven't worked it out, the pitcher in question was Russ Ortiz, and the team was the Diamondbacks. Yeah, relying on wins as a measure of effectiveness can be extremely risky.

SV = Saves. The save was invented in 1960 by baseball writer Jerome Holtzman, largely because of reliever Elroy Face of the Pirates. Face had set an all-time relief record by going 18–1 the previous year; however, in most of those wins he had blown the lead, and got the W only because Pittsburgh regained it. Holtzman decided the game needed a better measure of relief pitching, and came up with the Save. Every closer who has received a big pay-day since the Save's official adoption in 1969 [becoming the first 'new' major stat since RBI in 1920] should be cutting a check to Holtzman or his estate.

The Save can only go to the last pitcher used by the winning team, and you can't get a W and a save in the same game. A pitcher qualifies for a save if he preserves the lead and any of the following three conditions apply:
a) He enters the game with a lead of no more than three runs and pitches for at least one inning.
b) He enters the game with the potential tying run either on base, at bat or on deck.
c) He pitches for at least three innings.
The last is how Wes Littleton of the Rangers still got credited with a save, in the game where Texas beat Baltimore 30-3.
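The three conditions above translate into a simple boolean check. This Python sketch uses hypothetical function and parameter names. It counts innings as outs recorded to avoid fractional innings. It also assumes the official rule's extra requirement, not mentioned above, that the pitcher record at least one out.

```python
def is_save(finished_win: bool, is_winning_pitcher: bool, lead_kept: bool,
            lead_on_entry: int, outs_recorded: int,
            tying_run_threat_on_entry: bool) -> bool:
    """Apply the three save conditions (a simplified sketch, not the rulebook text)."""
    # Must finish a win without getting the W, keep the lead, and record an out.
    if not finished_win or is_winning_pitcher or not lead_kept:
        return False
    if lead_on_entry < 1 or outs_recorded < 1:
        return False
    cond_a = lead_on_entry <= 3 and outs_recorded >= 3  # small lead, one full inning
    cond_b = tying_run_threat_on_entry                  # tying run on base, at bat or on deck
    cond_c = outs_recorded >= 9                         # at least three innings
    return cond_a or cond_b or cond_c


# A blowout save via condition (c): three innings with a huge lead.
print(is_save(True, False, True, lead_on_entry=11, outs_recorded=9,
              tying_run_threat_on_entry=False))  # True
# A four-run lead, one clean inning, tying run never threatening: no save.
print(is_save(True, False, True, lead_on_entry=4, outs_recorded=3,
              tying_run_threat_on_entry=False))  # False
```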

One problem with the save is that it treats all situations equally: protecting a three-run lead with the bases empty earns the same save as entering with the bases loaded in a one-run game. It has also led to teams using their best reliever solely in the ninth, even if the most critical situation arises earlier, which is why one writer said that the closer "is the only example in sports of a statistic creating a job." Much like wins, the number of save chances (and thus saves) a closer gets depends on his team-mates being involved in close contests. When Jose Valverde led the NL in saves in 2007, it was in part because no team in the league was involved in more one-run games than the Diamondbacks.

It has been suggested that teams should be more flexible and use their best arms whenever the need arises. However, the potential for blowback if such an approach doesn't work out is huge, so almost every team takes a more defined approach. As Jeff Passan put it, modern relievers "are bred to believe the bullpen is a class system, from mop-up guy to long man to lefty specialist to set-up man to closer. Every pitcher is given a role, and it becomes a self-fulfilling prophecy." In fairness to that system, a reliever may find it easier to work knowing his role than if it's a case of "anybody, any time."

Let's quickly go through the other pitching numbers shown at baseball-reference.com, most of which should need little or no explanation.

G = Games in which pitcher appeared.
GS = Games started.
GF = Games finished.
CG = Complete games.
In a rain-shortened game, this can be fewer than nine innings, as long as no relief pitcher is used.
SHO = Shutouts. Complete games in which the pitcher does not allow a run, earned or unearned.
IP = Innings pitched.
H = Hits allowed.
R = Runs allowed.
ER = Earned runs allowed.
To decide whether a run is earned, you play "what if?" for the inning: specifically, "what if no errors had occurred?" If the run would have scored anyway, it's earned; if not, it's unearned. For instance, if an error allows a batter to reach and the next guy hits a home-run, the first run across home plate is unearned and the second is earned. If an error earlier in the inning cost the defense an out, every run scored with two outs is unearned, since the inning "should" already have been over.
HR = Home runs allowed.
BB = Walks allowed.
IBB = Intentional walks allowed.
SO = Strikeouts.
HBP = Batters hit by pitches.
BK = Balks.
WP = Wild pitches.
BF = Batters faced.
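The "what if no errors had occurred?" test for earned runs can be sketched by replaying an inning while counting the outs the defense "should" have made. This Python sketch is deliberately simplified and uses a hypothetical record format. It handles only errors that cost an out or put the batter on base, and it ignores extra bases taken on errors, passed balls and the finer points of the scoring rules.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Play:
    """One plate appearance in an inning (hypothetical, simplified record)."""
    batter: str
    outs: int = 0                   # outs actually recorded on the play
    error_cost_out: bool = False    # an error turned a would-be out into a non-out
    reached_on_error: bool = False  # the batter reached base only because of the error
    scored: List[str] = field(default_factory=list)  # runners who crossed the plate


def earned_unearned(plays: List[Play]) -> Tuple[int, int]:
    """Split an inning's runs into (earned, unearned) by replaying it without errors."""
    on_via_error = set()
    reconstructed_outs = 0  # outs there would have been with no errors
    earned = unearned = 0
    for play in plays:
        if play.reached_on_error:
            on_via_error.add(play.batter)
        for runner in play.scored:
            # Unearned if the runner only reached via an error, or if the
            # inning "should" already have been over before this play.
            if runner in on_via_error or reconstructed_outs >= 3:
                unearned += 1
            else:
                earned += 1
        reconstructed_outs += play.outs + (1 if play.error_cost_out else 0)
    return earned, unearned


# Error puts a batter on, next man homers: one unearned run, one earned.
inning = [Play("A", error_cost_out=True, reached_on_error=True),
          Play("B", scored=["A", "B"])]
print(earned_unearned(inning))  # (1, 1)
```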

QS = Quality Starts. To get around the run support issue, the idea of a Quality Start was introduced. This is a game where the pitcher goes six or more innings and allows three earned runs or fewer. Basically, it's a measurement of whether or not you kept your team in the game. Last year, 53% of NL starts qualified, so if a pitcher's rate is better than half, he's likely above average there.
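Here is a sketch of the quality-start test and of a pitcher's QS rate, which you could compare against that 53% NL figure. Innings are given as outs recorded (so 6 IP = 18 outs). The function names and the game log are hypothetical, for illustration only.

```python
from typing import List, Tuple


def is_quality_start(outs_recorded: int, earned_runs: int) -> bool:
    """Six or more innings (18+ outs) with three or fewer earned runs."""
    return outs_recorded >= 18 and earned_runs <= 3


def quality_start_rate(starts: List[Tuple[int, int]]) -> float:
    """Fraction of (outs_recorded, earned_runs) starts that were quality starts."""
    if not starts:
        return 0.0
    return sum(is_quality_start(outs, er) for outs, er in starts) / len(starts)


starts = [(18, 3), (21, 1), (17, 0), (24, 4)]  # hypothetical game log
print(quality_start_rate(starts))  # 0.5
```

Note that the third start (5 2/3 scoreless innings) misses out while the first (six innings, three runs) qualifies. That quirk is a common criticism of the stat.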

PITCHING STATISTICS

ERA = Earned Run Average. This is the average number of earned runs a pitcher allows per nine innings of work. You can calculate it by multiplying earned runs by nine and dividing by innings pitched. It's a pretty good measure of effectiveness, but it is subject to external factors. The parks in which you pitch play a significant role; the defense behind you is important (not every non-out is an error); and if your relievers allow the runners they inherit from you to score, that will inflate your ERA too.
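One wrinkle when doing the calculation by hand: box scores write partial innings in thirds after the decimal point, so "6.2" means six and two-thirds innings, not 6.2. This minimal sketch, with hypothetical helper names, converts that notation to outs first and then applies the nine-innings formula.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings such as "6.2" (six and two-thirds) into outs."""
    whole, _, thirds = ip.partition(".")
    extra = int(thirds) if thirds else 0
    if extra not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + extra


def era(earned_runs: int, ip: str) -> float:
    """Earned runs per nine innings: 9 * ER / IP, computed as 27 * ER / outs."""
    outs = innings_to_outs(ip)
    if outs == 0:
        raise ValueError("ERA is undefined with zero innings pitched")
    return 27 * earned_runs / outs


print(round(era(3, "6.2"), 2))  # 4.05
```

Treating "6.2" as a decimal would give 9 × 3 / 6.2 ≈ 4.35 instead, which overstates the ERA.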

Of pitchers with 50+ innings last year, the median was close to Ian Kennedy's 3.80. The overall league ERA was a little higher, at 4.02, because if you're pitching fewer innings, there's a good chance it's because you suck. An ERA of 2.75 would put you in the ninetieth percentile of qualifying starters, and anything better than 3.00 would be the top quartile. The bottom quartile is at 4.15, but again, that's a self-selecting sample: if you're worse than that, you probably won't get to throw the 162 innings needed to qualify. Perhaps more usefully, here's a chart from a study we did in 2008, looking at the typical ERAs of #1, #2, etc. starters in the NL, and their W-L records:

Starter   ERA range    W-L
#1        0.00-3.33    15.5-7.7
#2        3.33-3.96    12.8-10.0
#3        3.96-4.35    10.6-10.7
#4        4.35-5.15    9.9-12.8
#5        5.15+        7.0-14.8
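If you want to slot a pitcher into that chart automatically, a lookup like the following works. It is a sketch only. The table's ranges share their endpoints, so treating each upper bound as exclusive (a 3.33 ERA counts as a #2) is my assumption.

```python
from typing import Tuple

# (exclusive upper ERA bound, rotation slot, typical W-L) from the 2008 NL study table.
TIERS = [
    (3.33, "#1", "15.5-7.7"),
    (3.96, "#2", "12.8-10.0"),
    (4.35, "#3", "10.6-10.7"),
    (5.15, "#4", "9.9-12.8"),
    (float("inf"), "#5", "7.0-14.8"),
]


def starter_tier(era: float) -> Tuple[str, str]:
    """Return (rotation slot, typical W-L) for an ERA, per the table."""
    for upper, slot, record in TIERS:
        if era < upper:
            return slot, record
    raise ValueError("ERA must be a finite number")


print(starter_tier(3.80))  # ('#2', '12.8-10.0')
```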

ERA+ = Adjusted Earned Run Average. If you read part one (and I trust you did), you'll remember OPS+, which is OPS adjusted for park factors and compared to league average. ERA+ serves the same purpose for pitchers: it takes ERA and adjusts it for where you pitched, and then for the time and league in which you pitched. The latter is important, as some years are more pitcher-friendly than others. For instance, MLB lowered the height of the pitching mound from 15 to 10 inches before the 1969 season, tilting the balance back toward the batter. The NL ERA jumped more than 20% that year, having fallen below three in 1968 for the only time since the dead-ball era.

Obviously, when league average is three, an ERA of three is less impressive than when league average is four, as it was last year. ERA+ takes that into account: 100 is league average, and higher figures indicate a better (lower) adjusted ERA. Daniel Hudson set the franchise record for a pitcher with at least 50 innings pitched last year, posting an ERA+ of 251 and beating Byung-Hyun Kim's 225 from 2002. Among those who pitched a full season for Arizona, the best number is Randy Johnson's 197 ERA+, also in 2002. He's also second, third, fourth and fifth on that list, with his 2001, 1999, 2000 and 2004 campaigns respectively.
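The core idea of ERA+ is a ratio: league ERA divided by the pitcher's ERA, scaled so that 100 is average. This is a simplified sketch under stated assumptions. The park adjustment is modelled as a single multiplier on league ERA, with values above 1.0 meaning a hitter-friendly park. That is a hypothetical convention; baseball-reference.com's exact park handling differs in the details.

```python
def era_plus(era: float, league_era: float, park_factor: float = 1.0) -> float:
    """Simplified ERA+: 100 * park-adjusted league ERA / pitcher's ERA.

    park_factor > 1.0 means a hitter-friendly park (an assumed convention).
    """
    if era <= 0:
        raise ValueError("ERA+ is undefined for an ERA of zero")
    return 100 * league_era * park_factor / era


print(round(era_plus(4.02, 4.02)))  # 100: exactly league average
print(round(era_plus(3.00, 4.02)))  # 134: a 3.00 ERA in a 4.02 league
```

Because the pitcher's ERA sits in the denominator, halving your ERA doubles your ERA+. That is why elite seasons can reach the 200s.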

WHIP = Walks + Hits per Inning Pitched. This is a more direct measurement of pitching effectiveness. On the one hand, errors themselves don't count, but unlike ERA, WHIP keeps accruing for the rest of the inning after one, regardless of what happens. On the other hand, it treats all hits as equal and doesn't include hit batters. Overall, there is a pretty good correlation between WHIP and ERA, as the following diagram shows:

[Figure: WHIP vs. ERA]

Strikeout pitchers can sometimes survive a high WHIP better, because they can get outs without moving runners over; those who pitch to contact are more likely to see RBI groundouts or sacrifice flies. Overall, WHIP has a correlation to player value (WAR, which we'll get to next week) of 0.481; ERA comes in at 0.528 and xFIP (see below) at 0.670. So it's best to look at WHIP as one tool for measuring pitching ability: if ERA and WHIP are both low, the guy's a good pitcher; if they disagree, it's time to look closer. What counts as "low"? The NL WHIP last year was 1.35. A pitcher anywhere close to 1.00 is among the elite, while anything above 1.50 is problematic.
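WHIP is simple arithmetic. This small sketch, with hypothetical names, takes innings as outs recorded to sidestep the ".1/.2" box-score notation. It compares the result against the 1.35 NL figure quoted above, using a made-up 200-inning season.

```python
LEAGUE_WHIP_NL_2010 = 1.35  # figure quoted in the text


def whip(walks: int, hits: int, outs_recorded: int) -> float:
    """(Walks + hits) per inning pitched, with innings = outs / 3."""
    if outs_recorded == 0:
        raise ValueError("WHIP is undefined with zero innings pitched")
    return 3 * (walks + hits) / outs_recorded


w = whip(walks=45, hits=160, outs_recorded=600)  # hypothetical 200-inning season
print(round(w, 3))               # 1.025
print(w < LEAGUE_WHIP_NL_2010)   # True: better than league average
```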

H/9 = Hits per nine innings.
HR/9 = Home-runs per nine innings.
This is worth watching as a possible warning sign for a pitcher at Chase Field. League average is 0.92 per nine IP, but Yusmeiro Petit was at more than double that (1.91) during his time with Arizona. It was his inability to curb that insane home-run rate that punched his ticket out of Arizona.
BB/9 = Walks per nine innings. The 2010 NL average was 3.3.
SO/9 = Strikeouts per nine innings. The 2010 NL average was 7.4. If you're getting up towards nine (one strikeout per inning), you're likely dominating opposing hitters.
SO/BB = Strikeout-to-walk ratio. Also referred to as K:BB, this is one I like to look at, especially for pitchers coming up through the farm system. The league average last year was 2.23, and for top prospects I'd like to see something around three: Jarrod Parker's minor-league number is 2.99, for instance.
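All the "per nine" rates above follow one pattern: scale a counting stat by nine innings (27 outs) over innings pitched. K:BB is a plain ratio. This is a sketch with hypothetical helper names and a made-up stat line.

```python
def per_nine(count: int, outs_recorded: int) -> float:
    """Scale a counting stat (H, HR, BB, SO) to a per-nine-innings rate."""
    if outs_recorded == 0:
        raise ValueError("rate is undefined with zero innings pitched")
    return 27 * count / outs_recorded  # nine innings = 27 outs


def so_bb(strikeouts: int, walks: int) -> float:
    """Strikeout-to-walk ratio (K:BB)."""
    if walks == 0:
        raise ValueError("K:BB is undefined with zero walks")
    return strikeouts / walks


# Hypothetical line: 180 IP (540 outs), 18 HR, 60 BB, 180 SO.
print(per_nine(18, 540))   # 0.9 HR/9
print(per_nine(60, 540))   # 3.0 BB/9
print(per_nine(180, 540))  # 9.0 SO/9
print(so_bb(180, 60))      # 3.0
```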

EXTRA CREDIT

BABIP = Batting Average on Balls in Play. It goes somewhat counter to logic, but once a pitcher has thrown a pitch and the batter has put it in play without it leaving the park, what happens next is largely out of the pitcher's hands. It's in the hands of Lady Luck and his defense, and the former tends to average out over time: about three out of ten balls in play become hits, largely regardless of the pitcher. Compare Randy Johnson (career BABIP .295) with Yusmeiro Petit (career BABIP .301). The further a pitcher's BABIP is from average, the more likely it is to regress toward the norm, taking his other numbers, like ERA, with it.
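The commonly used formula removes home-runs (which never get to the defense) and strikeouts from both sides, and counts sacrifice flies as balls in play: BABIP = (H − HR) / (AB − SO − HR + SF). For a pitcher, AB here means at-bats against him. This is a sketch with a hypothetical function name and a made-up season line.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int,
          sac_flies: int) -> float:
    """(H - HR) / (AB - SO - HR + SF): hits on balls that stayed in the park."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Hypothetical season against: 600 AB, 150 H, 20 HR, 150 SO, 5 SF.
print(round(babip(150, 20, 600, 150, 5), 3))  # 0.299
```

A result near .300, like this one, gives no reason to expect much regression; numbers like Latos's .243 are the ones to watch.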

For instance, Mat Latos's September struggles for the Padres might simply have been regression biting. Through September 7, he had a 2.21 ERA, but a .243 BABIP. Over his final five starts, he had an 8.18 ERA, thanks largely to a .438 BABIP. That brought his season number up to .275; still a little low, so I would expect more regression in 2011, taking his ERA above three, though he'll still be pretty good. For the Diamondbacks, players who might suffer the same this year are Daniel Hudson (BABIP: .217) and Barry Enright (.254), while Joe Saunders (.315) could get a bit of help there.

FIP = Fielding Independent Pitching. This number uses only the things a pitcher truly controls (walks, home-runs and strikeouts) and combines them into a number on a similar scale to ERA. It helps avoid the vagaries resulting from BABIP, and has been shown to be a better predictor of future performance than straight-up ERA. There are some variations in the specifics of the calculation; I refer you here, if you're that interested. You won't find it on baseball-reference.com, so you'll need to go to the other Holy Grail of numbers, Fangraphs. They also list the pitchers with the biggest gap between ERA and FIP: the ones who may well regress going forward.
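As one illustration of those variations, here is a sketch of a widely used version of the formula, which counts hit batters alongside walks: FIP = (13 × HR + 3 × (BB + HBP) − 2 × K) / IP + constant. The constant puts FIP on the same scale as ERA and changes every season. The 3.10 default below is a placeholder assumption, not an official value.

```python
def fip(home_runs: int, walks: int, hbp: int, strikeouts: int,
        outs_recorded: int, constant: float = 3.10) -> float:
    """FIP = (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    The 3.10 default constant is a placeholder; use the season's real value.
    """
    if outs_recorded == 0:
        raise ValueError("FIP is undefined with zero innings pitched")
    innings = outs_recorded / 3
    return (13 * home_runs + 3 * (walks + hbp) - 2 * strikeouts) / innings + constant


# Hypothetical 200-inning season: 20 HR, 60 BB, 5 HBP, 180 SO.
print(round(fip(home_runs=20, walks=60, hbp=5, strikeouts=180, outs_recorded=600), 2))
```

The weights show the priorities: each home-run costs as much as four-plus walks, and each strikeout earns back two-thirds of a walk.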

Clay Buchholz of the Red Sox leads all pitchers: his ERA last year was 1.28 runs better than FIP expected, so don't be surprised if his ERA this year is 3.50 or above. Conversely, Colorado's Jason Hammel was the "unluckiest" pitcher, and might be expected to shave more than a run off his ERA in 2011, if he performs as he did in 2010. That last condition is important, since just as with stocks and shares, past performance is no guarantee of future success, or lack thereof.

xFIP = Expected Fielding Independent Pitching. It's like FIP, but it replaces a pitcher's home-runs with an "expected" number, based on his fly-balls and the league-average rate of home-runs per fly-ball. This is helpful because home-run rates are volatile: a pitcher may allow a home-run on 12% of fly-balls one year, then allow only 7% the next. That is almost impossible to predict, so xFIP attempts to correct for it. As extremes tend to regress towards the norm, it is statistically a better predictor of the future than FIP. However, some pitchers do have a history of lower or higher home-run rates, and that should be taken into account when looking at xFIP.
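In code, the only change from FIP is the home-run term: actual home-runs are swapped for fly-balls multiplied by the league HR/FB rate. Both defaults below (a 10.5% HR/FB rate and a 3.10 constant) are placeholder assumptions; substitute the real figures for the season in question. The function name is hypothetical.

```python
def xfip(fly_balls: int, walks: int, hbp: int, strikeouts: int,
         outs_recorded: int, league_hr_per_fb: float = 0.105,
         constant: float = 3.10) -> float:
    """FIP with actual home-runs replaced by fly-balls * league HR/FB rate.

    Both defaults are placeholder assumptions, not official values.
    """
    if outs_recorded == 0:
        raise ValueError("xFIP is undefined with zero innings pitched")
    expected_hr = fly_balls * league_hr_per_fb
    innings = outs_recorded / 3
    return (13 * expected_hr + 3 * (walks + hbp) - 2 * strikeouts) / innings + constant


# Hypothetical season: 200 fly-balls, 60 BB, 5 HBP, 180 SO in 200 innings.
print(round(xfip(200, 60, 5, 180, 600), 2))
```

Because actual home-runs never enter the function, two pitchers with identical fly-ball, walk and strikeout numbers get the same xFIP, whether one allowed 12% of fly-balls over the fence or 7%.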

GmSc = Game Score. This was invented by Bill James to give a single number measuring how well a starter pitched, rather than having to weigh walks, hits, runs and strikeouts separately. Here's how it's calculated:
1. Start with 50 points.
2. Add 1 point for each out recorded, so 3 points for every complete inning pitched.
3. Add 2 points for each inning completed after the 4th.
4. Add 1 point for each strikeout.
5. Subtract 2 points for each hit allowed.
6. Subtract 4 points for each earned run allowed.
7. Subtract 2 points for each unearned run allowed.
8. Subtract 1 point for each walk.
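The eight steps map directly onto a function. This is a sketch of the original Bill James formula as listed; the function and parameter names are hypothetical. Innings are taken as outs recorded, so step 3 counts only fully completed innings.

```python
def game_score(outs: int, strikeouts: int, hits: int, earned_runs: int,
               unearned_runs: int, walks: int) -> int:
    """Bill James's original Game Score, following the eight steps."""
    score = 50                                  # 1. start with 50
    score += outs                               # 2. +1 per out recorded
    completed_innings = outs // 3
    score += 2 * max(0, completed_innings - 4)  # 3. +2 per inning completed after the 4th
    score += strikeouts                         # 4. +1 per strikeout
    score -= 2 * hits                           # 5. -2 per hit
    score -= 4 * earned_runs                    # 6. -4 per earned run
    score -= 2 * unearned_runs                  # 7. -2 per unearned run
    score -= walks                              # 8. -1 per walk
    return score


# A nine-inning perfect game with 13 strikeouts:
print(game_score(outs=27, strikeouts=13, hits=0, earned_runs=0,
                 unearned_runs=0, walks=0))  # 100
```

A complete game already banks 50 + 27 + 10 = 87 before strikeouts, hits, runs and walks are counted. That is why dominant nine-inning outings cluster around 90 to 100.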

The highest-ever score in a nine-inning game was Kerry Wood's 20-K performance for the Cubs, which notched a 105. Randy Johnson's perfect game was a 100, while his 20-K game was a 97. He and Curt Schilling own the top nine performances in Diamondbacks history: Brandon Webb's one-hitter against the Cardinals in 2006 is tied for tenth, on 90. The best one last season was a tie between Ian Kennedy's seven innings of one-hit, 12-K ball vs. San Diego and Edwin Jackson's imperfect perfecto, both of which scored 85.

Hd = Holds. Perhaps the most useless statistic ever invented: it's not an official MLB stat, and the definition used is not fixed [some sources require a reliever to retire at least one batter, others don't]. Basically, it's saves for the middle innings: if a reliever enters in a save situation and leaves with his team still ahead, he gets a hold. He can even be tagged with a Loss and get a Hold in the same game, if a subsequent reliever allows inherited runners to score. Wonderful...