Today, we shall begin our dive into Sabermetrics. Many readers commented on my first post that they were interested in learning more about Sabermetrics, and several of them were also interested in learning “the basics”. I think it will be good to start from the beginning, with both hitting and pitching stats. It will certainly be useful to have this post to reference in the future.
This week, I will go through the basic offensive stats and give a more detailed explanation of what they actually mean. I won’t go over the counting stats (R, RBI, 2B, 3B, XBH, etc.) but I will touch on the stats that are a form of measurement/analysis (and yes, this includes batting average). Hopefully I can help some of you take the next step from being HR and RBI counters to doing basic hitting analysis!
For those who are less familiar with Sabermetrics, the “Formula”, “What does it actually mean”, and “Is this useful? If so, how?” sections for each of the stats below are geared toward you. I tried to simplify these sections as much as possible to make each stat more understandable and applicable.
For those with a better understanding of Sabermetrics, the “Deeper Dive” sections go into more detail on the math, the function, and the relationship of each stat to other stats. Feel free to skip these sections if they seem too complicated.
Basic Hitting Stats and Metrics
Batting Average (BA)
Formula: H/AB
What does it actually mean: The rate at which a batter gets a hit anytime they come to bat and don’t walk, get hit by a pitch, sacrifice, or reach on interference.
Is this useful? If so, how? Not really. Batting average was long considered the “premier” indicator of batting talent, but deeper analysis has shown it to be pretty trivial. Batting average has two glaring flaws: it does not account for walks and it treats all hits as having the same value. Avoiding outs is a far more important skill in baseball and batting average only captures a portion of this skill. The missing run value of each hit is also important, as a .300 hitter who hits lots of doubles, triples, and home runs is FAR more valuable than a .300 hitter who only hits singles.
Deeper Dive: Batting average is primarily just the end result of a player’s K%, BABIP, and HRs per AB (HR/AB). K% and BABIP are generally much better measures of a player’s talent, and power is captured within slugging/ISO, both of which are better measures when combined with other stats.
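To make the two flaws concrete, here is a minimal Python sketch. The two stat lines are made-up numbers (not real players), and the helper names are just for illustration. Both hitters come out at exactly .300, yet one collects far more total bases and walks.

```python
def hits(line):
    """Total hits from a stat line."""
    return line["1B"] + line["2B"] + line["3B"] + line["HR"]

def total_bases(line):
    """Total bases: 1 per single, 2 per double, 3 per triple, 4 per home run."""
    return line["1B"] + 2 * line["2B"] + 3 * line["3B"] + 4 * line["HR"]

def batting_average(line):
    """BA = H / AB."""
    return hits(line) / line["AB"]

# Hypothetical stat lines, invented for this example.
singles_hitter = {"AB": 500, "1B": 150, "2B": 0, "3B": 0, "HR": 0, "BB": 20}
power_hitter = {"AB": 500, "1B": 90, "2B": 30, "3B": 5, "HR": 25, "BB": 80}

for name, line in [("singles hitter", singles_hitter), ("power hitter", power_hitter)]:
    print(f"{name}: BA={batting_average(line):.3f}, "
          f"total bases={total_bases(line)}, walks={line['BB']}")
```

Batting average can't tell these two apart, which is the whole problem.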
On-Base Percentage (OBP)
Formula: (H+BB+HBP) / (AB+BB+HBP+SF)
What does it actually mean: On-base percentage is essentially the rate at which a batter avoids making outs.
Is this useful? If so, how? On-base percentage is extremely important. So important, in fact, that a book was written basically because of it. Going through the value of OBP could take up a post or three, but the basic reasoning is simple: unlike the other major American sports (NFL, NBA, NHL), where time is the limiting factor in a game, each baseball game is limited to 27 outs per team. In the other sports, efficiency (in other words, a player’s value with respect to time) is one of the most important traits for measuring a player. In baseball, not making outs is “king” (in other words, a player’s value with respect to outs).
Let’s use an exaggerated example to illustrate our case. Let’s say that Team A has a lineup of 9 players with exactly .400 OBP while Team B has a lineup of 9 players with exactly .300 OBP. This means Team A makes 0.6 outs per plate appearance (outs/PA) while Team B makes 0.7 outs/PA. So, in a standard 9-inning game:
Team A: (27 outs) / (0.6 outs/PA) = 45 PA
Team B: (27 outs) / (0.7 outs/PA) = ~39 PA
Six extra base runners in a single game is a pretty large difference. This doesn’t mean that Team A will win every time, as the value of each plate appearance will vary (e.g. lots of home runs vs. lots of singles) and the sequencing/order of the events is also important (5 hitters getting on base in a row is far more valuable than 5 hitters getting on base over 5 innings). But, generally speaking, Team A will have a pretty significant advantage over Team B (a team with .400 OBP is expected to average 7.0 runs/game while a team with .300 OBP is expected to score ~3.6 runs/game).
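The arithmetic above is easy to reproduce. This is a rough sketch that assumes every plate appearance ends in either an out or a base runner, so it ignores outs made on the bases (double plays, caught stealing). The function names are my own.

```python
OUTS_PER_GAME = 27  # outs per team in a standard 9-inning game

def expected_pa(obp, outs=OUTS_PER_GAME):
    """Plate appearances needed to use up `outs` outs at (1 - OBP) outs per PA."""
    return outs / (1 - obp)

def expected_baserunners(obp, outs=OUTS_PER_GAME):
    """Plate appearances that did not end in an out."""
    return expected_pa(obp, outs) - outs

for team, obp in [("Team A", 0.400), ("Team B", 0.300)]:
    print(f"{team}: {expected_pa(obp):.1f} PA, "
          f"{expected_baserunners(obp):.1f} base runners")
```

Try plugging in other OBP values to see how quickly the gap in plate appearances grows.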
Deeper Dive: On-base percentage is pretty much a combination of batting average and BB%, though not exactly. You can generally simplify the denominator to PA (the true denominator excludes sac bunts and catcher’s interference, which PA includes, but those numbers are generally very small), which gets you:
OBP = (H+BB+HBP) / PA
Which you can then break apart:
OBP = (H/PA) + (BB/PA) + (HBP/PA) ~=~ BA + BB% + HBP rate
Generally, HBP rate is very low (0.8% in 2016), so let’s throw it out:
OBP ~=~ BA + BB%
Note that this isn’t exactly perfect, as the first term is H/PA rather than H/AB, but it’s a close approximation of batting average. Using the simplified approach, H/PA = H/(AB+BB), we see that the source of the discrepancy is essentially a player’s walk total, so a player with a low walk total will have an H/PA very close to H/AB whereas a player with a high walk rate will have an H/PA that is smaller than H/AB (with this difference being captured in the BB%). In the event of a player with a 0% walk rate, OBP = BA in this simplified approach.
In the end, this difference isn’t groundbreaking, as this is a simplified equation. The point is to show that BA and BB% are the primary contributors to OBP and that the batting average component will range from 70-100% of a player’s true batting average, depending on the player’s walk rate.
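Here is a small sketch of the simplified breakdown, using PA = AB + BB (HBP and SF ignored, as in the simplification above). The stat lines are invented and the function name is hypothetical. Note that in this setup H/PA divided by H/AB works out to AB/PA, which is just 1 - BB%.

```python
def obp_breakdown(h, ab, bb):
    """Split simplified OBP into its H/PA and BB% pieces, with PA = AB + BB."""
    pa = ab + bb
    ba = h / ab
    h_per_pa = h / pa
    bb_rate = bb / pa
    return {
        "BA": ba,
        "H/PA": h_per_pa,
        "BB%": bb_rate,
        "OBP": h_per_pa + bb_rate,
        "H/PA as share of BA": h_per_pa / ba,
    }

# Hypothetical hitters: same 150 hits in 500 AB, different walk totals.
for walks in (0, 50, 100):
    parts = obp_breakdown(h=150, ab=500, bb=walks)
    print(f"BB={walks}: " + ", ".join(f"{k}={v:.3f}" for k, v in parts.items()))
```

With zero walks the OBP equals the batting average, and the share drops as the walk total climbs.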
BA will come back up in OPS.
Slugging Percentage (SLG)
Formula: (Total Bases) / AB or (1B + 2x2B + 3x3B + 4xHR) / AB
What does it actually mean: Slugging percentage is the number of bases that a batter averages per at bat.
Is this useful? If so, how? SLG is fairly useful as a measure of a player’s power. It is essentially batting average that is modified to value each type of hit per the number of bases acquired in each hit. In this context, it is a better measure of a player’s talent than batting average because it attempts to assign a value to each type of hit (which was one of the two flaws of BA mentioned above). Through data analysis, the actual run values of each hit have been computed and are included in the wOBA stat (1B = 0.89, 2B = 1.27, 3B = 1.62, HR = 2.10). Generally speaking, ISO is a better measure of a player’s power than slugging.
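As a quick illustration of how far the SLG weights sit from the run values quoted above, this sketch expresses each hit type as a multiple of a single under both systems. The wOBA numbers are simply the ones from the paragraph above. The helper name is my own.

```python
# Bases gained: the weights SLG assigns to each hit type.
SLG_WEIGHTS = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}
# Run values quoted in the text for wOBA.
WOBA_WEIGHTS = {"1B": 0.89, "2B": 1.27, "3B": 1.62, "HR": 2.10}

def relative_to_single(weights):
    """Express each weight as a multiple of the single's weight."""
    return {hit: value / weights["1B"] for hit, value in weights.items()}

slg_rel = relative_to_single(SLG_WEIGHTS)
woba_rel = relative_to_single(WOBA_WEIGHTS)

for hit in SLG_WEIGHTS:
    print(f"{hit}: SLG counts it as {slg_rel[hit]:.2f}x a single, "
          f"the quoted run values as {woba_rel[hit]:.2f}x")
```

SLG counts a home run as four singles, while the quoted run values put it at well under three.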
Deeper Dive: To help understand the fractions, here is a breakdown to see how slugging differs from BA:
SLG = (H/AB) + (2B + 2x3B + 3xHR) / (AB) = BA + ISO
I bring up BA, again, because I want you to see that OBP and SLG, two of the most predominant “acceptable” stats in baseball (especially when added together in OPS), are essentially derivatives of batting average, with the point of addressing its two flaws (run value of hits and avoiding outs).
Isolated Power (ISO)
Formula: (Total Bases - Singles) / AB or (2B + 2x3B + 3xHR) / AB or SLG - BA
What does it actually mean: Isolated power is the number of extra bases a batter averages per at bat.
Is this useful? If so, how? ISO is essentially the next level of slugging and is generally the most useful power stat that’s not exit velocity/launch angle. ISO is useful because it takes the batting average and singles portion out of SLG (which is why it’s 1x2B, 2x3B, and 3xHR - it removes “1 base” from doubles, triples, and homers as that “base” is being calculated in batting average) and focuses strictly on extra base hits.
Deeper Dive: The BA component in SLG makes SLG a less desirable metric because slugging can’t differentiate between a player with high BA and low ISO versus a player with low BA and high ISO. Generally speaking, assuming the same OBP, the player with the low BA and higher ISO will be more valuable than the player with a high BA and low ISO.
ISO isn’t perfect, however. For instance, the run values for each hit are not accurate (a triple is not worth twice as much as a double, as ISO’s 1x2B and 2x3B weights imply). Furthermore, comparing two players with identical ISOs isn’t perfect: a player with a .100 BA and .300 SLG and a player with a .300 BA and .500 SLG both have .200 ISOs, but the second player is clearly a superior hitter. Conceptually, the first player essentially gets 1 hit every 10 AB but that hit is always a triple whereas the second player gets 3 hits every 10 AB but could hit two doubles and a single in those three hits.
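The two-player comparison can be checked with a few lines of Python. This sketch uses the 10 AB stat lines described above (one triple versus two doubles and a single) and also confirms that SLG = BA + ISO for both. The function name is just for illustration.

```python
def rates(singles, doubles, triples, hr, ab):
    """Return (BA, SLG, ISO) from hit counts and at bats."""
    ba = (singles + doubles + triples + hr) / ab
    slg = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab
    iso = (doubles + 2 * triples + 3 * hr) / ab
    return ba, slg, iso

# The two conceptual players from the text, per 10 AB.
player_one = rates(singles=0, doubles=0, triples=1, hr=0, ab=10)
player_two = rates(singles=1, doubles=2, triples=0, hr=0, ab=10)

for name, (ba, slg, iso) in [("Player 1", player_one), ("Player 2", player_two)]:
    print(f"{name}: BA={ba:.3f}, SLG={slg:.3f}, ISO={iso:.3f}")
```

Same ISO, very different hitters, which is why ISO is best read alongside an on-base measure.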
On-Base Percentage + Slugging (OPS)
Formula: OBP + SLG or
(H+BB+HBP) / (AB+BB+HBP+SF) + (1B + 2x2B + 3x3B + 4xHR) / AB
What does it actually mean: OPS is literally On-Base Percentage + Slugging Percentage, but unfortunately there is no conceptual or physical meaning to OPS, due to the differing denominators.
Is this useful? If so, how? OPS is somewhat useful, but that is mostly because of how widely accepted and common it is rather than how accurate the stat is. Luckily, OPS is accurate enough to keep its place in the mainstream. The main issue that Sabermetricians have with OPS is that it effectively treats OBP and SLG equally, whereas OBP is generally considered to be about 1.8 times more valuable (e.g., a .400 OBP is equivalent to a .720 SLG).
However, OPS captures the overall goal mentioned above regarding batting average: measure a player’s ability to avoid outs while properly assessing the run value of each hit. And in this case, generally speaking, the higher the better. There are variations to OPS that can make it better (OPS+ or wOBA) and we will tackle those next week.
Still, OPS is a good “gateway” statistic to get people thinking beyond traditional stats. It’s a better measure of a player’s talent than batting average, home runs, or RBIs, for example.
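To see what the 1.8 weighting does in practice, here is a hedged sketch. The weighted sum 1.8 x OBP + SLG is only my illustration of the idea above, not an official stat, and the two OBP/SLG pairs are invented. Both hitters have an identical .800 OPS.

```python
OBP_WEIGHT = 1.8  # relative value of OBP vs. SLG quoted in the text

def ops(obp, slg):
    """Plain OPS: OBP and SLG weighted equally."""
    return obp + slg

def weighted_ops(obp, slg, obp_weight=OBP_WEIGHT):
    """Illustrative weighted sum; not an official statistic."""
    return obp_weight * obp + slg

# Hypothetical (OBP, SLG) pairs with the same OPS.
on_base_type = (0.400, 0.400)
slugger_type = (0.300, 0.500)

for name, (obp, slg) in [("on-base type", on_base_type), ("slugger type", slugger_type)]:
    print(f"{name}: OPS={ops(obp, slg):.3f}, weighted={weighted_ops(obp, slg):.3f}")
```

Once OBP gets the heavier weight, the on-base type pulls ahead even though plain OPS calls them equal.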
Deeper Dive: The true biggest issue with OPS is that OBP is based on PA and SLG is based on AB and you can’t really add two fractions with differing denominators. That is why I simplified OBP above (from PA to AB) to make the next discussions simpler.
The next biggest issue with OPS is that 1 point of SLG is not equivalent to 1 point of OBP. Now we’re double dipping into some of our stats and using incorrect run values. Here is the formula:
OPS = (H+BB+HBP) / (AB+BB+HBP+SF) + (1B + 2x2B + 3x3B + 4xHR) / AB
Now, let’s replace it with the simplified OBP and broken-out SLG formulas above:
OPS ~=~ (H/AB) + (BB/PA) + (H/AB) + (2B + 2x3B + 3xHR) / (AB)
And condense (ignoring dissimilar denominators):
OPS ~=~ 2xBA + BB% + ISO
And now we start to see why this stat gets a little funky. For no real reason, batting average is essentially getting counted twice. And again, ISO pops back up but we’ve already discussed the issues with the run values used in its calculation. However, with these two factors (2x BA, incorrect run values with slugging), the end product is that OPS ends up overrating players that hit for power while underrating players that avoid outs.
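For anyone who wants to play with the condensed formula, this sketch compares the full OPS calculation against 2xBA + BB% + ISO. The stat line is made up, and BB% is taken as BB divided by the OBP denominator, which is my assumption for PA here.

```python
def ops_exact(line):
    """OPS from the full OBP and SLG formulas."""
    h = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    obp = (h + line["BB"] + line["HBP"]) / (line["AB"] + line["BB"] + line["HBP"] + line["SF"])
    slg = (line["1B"] + 2 * line["2B"] + 3 * line["3B"] + 4 * line["HR"]) / line["AB"]
    return obp + slg

def ops_condensed(line):
    """The condensed approximation: 2 x BA + BB% + ISO."""
    h = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    pa = line["AB"] + line["BB"] + line["HBP"] + line["SF"]  # assumed PA
    ba = h / line["AB"]
    bb_rate = line["BB"] / pa
    iso = (line["2B"] + 2 * line["3B"] + 3 * line["HR"]) / line["AB"]
    return 2 * ba + bb_rate + iso

# Hypothetical stat line, invented for this example.
hitter = {"AB": 500, "1B": 90, "2B": 30, "3B": 5, "HR": 25, "BB": 60, "HBP": 5, "SF": 5}

print(f"exact OPS:     {ops_exact(hitter):.3f}")
print(f"condensed OPS: {ops_condensed(hitter):.3f}")
```

The condensed version runs a little high because it swaps H/PA for H/AB and drops HBP, but it lands close enough to show where OPS gets its value from.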
Here is a breakdown of the regression modeling used to derive the weights of OBP vs. SLG.
Batting Average on Balls in Play (BABIP)
Formula: (H-HR) / (AB-K-HR+SF)
What does it actually mean: The batting average for all of the balls a player hits into the field of play (home runs are not considered the field of play).
Is this useful? If so, how? BABIP is a very useful metric - it is a much more accurate measure of a player’s talent for generating good contact than batting average. BABIP is affected by three things: the defense, “luck” (which is really just another word for natural variation), and a player’s hitting talent.
We have a ton of BABIP data from over the years, and we see that BABIP tends to average right around .300. We also know that given large enough samples, the best players typically aren’t able to exceed around .370-.380 (nor do the worst hitters drop below .230 or so). This is important because it allows us to make better assessments of a player’s talent than by looking at their BA.
Many people don’t like when “luck” is brought up, especially when it comes to BABIP. But natural variation is an extremely powerful force, especially when dealing with small samples. This is why sample size is vitally important and why you never see players exceed the ranges above over large samples - because it’s just not possible.
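To get a feel for how much natural variation shrinks with sample size, here is a rough sketch. It assumes each ball in play is an independent coin flip with a fixed "true" BABIP (a binomial model), which is a simplification of mine and not something real hitters strictly follow. The function names and sample sizes are for illustration only.

```python
import math

def babip(h, hr, ab, k, sf):
    """BABIP = (H - HR) / (AB - K - HR + SF)."""
    return (h - hr) / (ab - k - hr + sf)

def babip_std_dev(true_babip, balls_in_play):
    """Standard deviation of observed BABIP under a simple binomial model."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)

# How far a true .300 BABIP hitter can drift by chance alone.
for n in (50, 150, 450, 2000):
    sd = babip_std_dev(0.300, n)
    low, high = 0.300 - 2 * sd, 0.300 + 2 * sd
    print(f"{n:>4} balls in play: roughly {low:.3f} to {high:.3f} (two standard deviations)")
```

Over a few dozen balls in play almost any BABIP is plausible, while over a couple thousand the range gets tight.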
So, there we have it. Congrats, if you’ve read this far, you’ve now passed Sabermetrics Hitting 101! Tune in next week for Sabermetrics Hitting 202!
Edit: Corrected description for ISO.