SnakePit Statistics 1.0.1: Hitting

Juan Miranda seeks to improve his OPS.

If you've been around the site long enough, you'll know there is a blizzard of abbreviations and stats which get thrown around by participants as they try to show one thing or another. It was suggested, and it seems a good idea, that we have a primer for the most common statistics used, so that those just dipping their toes in the water will have some idea what's being talked about. Hence, this series of articles, which will cover the main numbers you'll see. In the first part, we look at hitting stats; the second will cover pitching ones; and the third will tidy up anything left over, such as fielding stats, WAR, and so on.

Questions, comments, etc. are particularly welcome, so if anything is not clear, please ask and our crack team will respond. You will, however, not be tested on this at the end of semester...

We should start with a few statistical terms that may crop up. Average can actually mean one of three things: the mean, the median or the mode. The mean is the sum of the data, divided by the number of items. The median is the data point that's in the middle, when you sort the data. The mode is the most common value. For example, if you had the following data:
    1, 2, 2, 2, 3, 3, 4, 5, 6, 7, 9
The mean would be (1+2+2+2+3+3+4+5+6+7+9) / 11 = 4. The median would be 3, because there are five items higher or equal to three, and five lower than or equal to it. The mode is 2, because there are more twos than any other number. In most cases, when we say "average," we mean the mean. Er, as it were. :-)
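If you'd rather let a computer do the sums, here is a minimal Python sketch of the same example. It uses the standard `statistics` module and nothing beyond the eleven numbers listed above.

```python
from statistics import mean, median, mode

# The eleven data points from the example above
data = [1, 2, 2, 2, 3, 3, 4, 5, 6, 7, 9]

print(mean(data))    # sum of the values divided by how many there are
print(median(data))  # the middle value once the data is sorted
print(mode(data))    # the most common value
```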

Correlation measures the connection between two sets of data, and varies from -1 to +1, and zero means there's no connection detectable. Say, height and weight. They are kinda linked, because taller people will weigh more, but it's not perfect, as there are tall, skinny people and short, fat ones. Correlation for that might be 0.7. A negative correlation means that as one set of data increases, the other decreases. Say, temperature and amount of clothing worn - the warmer it gets, the less people put on.
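For the curious, here is a rough sketch of how a correlation figure gets calculated, using the common Pearson formula. The temperature and clothing numbers are entirely made up for illustration, and the `pearson` helper is our own, not something from a stats site.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two sets move together, relative to their own means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Hypothetical data: temperature (degrees F) and layers of clothing worn
temps = [40, 55, 70, 85, 100]
layers = [4, 3, 3, 2, 1]

r = pearson(temps, layers)
print(round(r, 2))  # strongly negative: warmer days, fewer layers
```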

Percentiles and quartiles. Knowing that .269 is the average batting average is useful, but how good is a .300 average? Well, you can plot batting average last year and find out that 90% of hitters will bat .299 or less. Put another way, .299 marks the ninetieth percentile for BA. If you bat above that, you're in the top ten percent of hitters (for batting average, at least). You may also see the 25th percentile referred to as the bottom quartile and the 75th percentile as the top quartile, as they mark the boundary for the bottom and top quarter of data respectively.

Sample size. The bigger the sample size, the more accurate results will be: over a short span, luck can induce wild variations. If you flip a coin ten times, the odds of getting seven or more heads are 17.2%; but if you flip it 100 times, the odds of 70 or more heads are virtually zero. Same with batting. A .250 batter has a 15% chance of hitting .300 in 100 at-bats, purely by luck; over 500 ABs, however, the odds are less than one in a hundred. Knowing what counts as a meaningful sample size will help you work out whether a number means anything.
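Those odds can be checked with a few lines of Python. This is only a sketch, and it rests on a simplifying assumption: every coin flip or at-bat is independent, with the same chance of success each time (a "true" .250 hitter gets a hit 25% of the time, every time). The `prob_at_least` helper is our own name for a plain binomial calculation.

```python
from math import comb

def prob_at_least(k, n, p):
    """Chance of k or more successes in n independent tries, each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Coin flips: seven or more heads in 10, then 70 or more in 100
print(f"{prob_at_least(7, 10, 0.5):.1%}")
print(f"{prob_at_least(70, 100, 0.5):.4%}")

# A .250 hitter batting .300 or better: 30+ hits in 100 ABs, 150+ hits in 500 ABs
print(f"{prob_at_least(30, 100, 0.25):.1%}")
print(f"{prob_at_least(150, 500, 0.25):.2%}")
```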

baseball-reference.com.  This site is the Mecca for baseball statistics, containing just about everything you could want, broken down in about a billion different ways. For our articles, we'll be mainly using as an example the 2010 Diamondbacks page, which gives you all the numbers for our players. We'll explain the numbers found there, tell you which ones are more important than others, and what values a good player should be putting up. Some will need more explanation than others; some will be a single line or less.

Still with me? Here we go.

BATTING NUMBERS

G = Number of games played. Doesn't matter whether you started, came in later or even if you got to bat. If you're announced, chalk one up. Just ask Robin Yount's brother Larry, a pitcher who injured himself while throwing warm-up tosses in his major-league debut, and never got to play again. He's still listed officially as having appeared in one game.

PA = Plate appearances. A player is credited with one of these, each time he completes a turn batting. Whether he walks, gets a hit, is hit by a pitch, makes an out: it doesn't matter. It's all counted as a plate appearance.

AB = At-bats. These are a restricted version of PAs - some PAs don't count as an at-bat. The most common cases which aren't counted are when the batter walks, is hit by a pitch, puts down a sacrifice bunt which advances a base-runner, or hits a sacrifice fly which scores one. Those are all counted as PAs, but not ABs.

R = Runs. Every time a player crosses home-plate. They aren't much use as a measure of a player's own skill, because they are too dependent on other factors. Virtually the only time a player will score a run on his own is with a homer; otherwise, he depends on something else happening, e.g. hit, wild pitch, etc. to bring him home. And all runs are "equal": if you get plunked and the next guy hits a homer, you get one run, exactly the same as if you tripled then stole home. Which hardly seems fair, does it?

H = Hits.
2B = Doubles.
3B = Triples.
HR = Home Runs.
Largely self-explanatory. Hits are good, extra-base hits [doubles, triples and home-runs] are better. Doubles and triples can be an indicator of a player's speed, but you should use caution, as the park in which you play can heavily affect these, just as they affect home-run numbers. Chase Field, for example, saw 46 triples hit there last year. That's second only to Coors in the NL (50) and more than three times as many as the fifteen hit at Dodger Stadium.

RBI = Runs Batted In. You'll also see RBIs, though technically that's wrong, since RBI is already plural. Anyway, grammar Naziness aside, when a run is scored as the result of a player's action, they are credited with an RBI. If they get a hit, sacrifice fly, or walk which leads to their team scoring, they get an RBI. If they ground out and a run scores, they get credit for that too. You do not get an RBI if you ground into a double-play, or if the run scores as the result of a fielding error.

RBI, along with batting average and HR, form the Triple Crown, a very rare feat earned by a player when he leads his league in all three categories: it hasn't been done since Carl Yastrzemski of the 1967 Red Sox. However, the same goes for RBI as for Runs: if you come up with the bases empty, the only way to get an RBI is with a home-run. But if the bases are loaded, a bloop single could get you twice as many. A player's RBI number is largely determined by how good his team-mates are at getting on base in front of him. So while RBI are nice, exercise caution in using the number as proof of greatness.

SB = Stolen Bases.
CS = Caught Stealing.
It's important to look at both numbers, because few things are worse than getting caught - you do the hard part, by getting on base, then give the opposition an out. The break-even point is very high: overall, you need to succeed about 70% of the time to have a positive impact [it varies from situation to situation: if you're down by one run with no outs in the ninth, it's less]. Washington's Nyjer Morgan stole 34 bases last year, tied for third in the NL. But he was caught 17 times, so succeeded only 67% of the time, and overall, probably hurt the Nats more than he helped. In general, SB must be more than twice CS, at the very least, to be considered positive.
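As a quick illustration, here is the Nyjer Morgan arithmetic in Python. The 70% break-even figure is the rough overall number quoted above, not a precise constant, and `steal_success_rate` is just a name we made up for the division.

```python
def steal_success_rate(sb, cs):
    """Stolen bases as a share of all steal attempts (SB + CS)."""
    attempts = sb + cs
    return sb / attempts if attempts else 0.0

# Rough overall break-even point mentioned above; it shifts with game situation
BREAK_EVEN = 0.70

# Nyjer Morgan, 2010: 34 steals, 17 times caught
morgan = steal_success_rate(34, 17)
print(f"{morgan:.0%}")
print("helping" if morgan >= BREAK_EVEN else "probably hurting")
```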

BB = Bases on Balls.
SO = Strike-outs.
The "do strikeouts matter?" argument is an interesting one, too deep to get into here. The case against can be found in a previous exploration of the topic, but shoewizard also wrote about the counter-argument, suggesting too many strikeout-prone players on one team is dangerous. Overall in the 2010 National League and excluding pitchers, batters struck out a bit more than twice as often as they walked (2.09), with the D-backs' ratio 2.44. On an individual level, two qualifying NL batters (502 or more PAs) last year had more walks than K's: Albert Pujols and Jeff Keppinger; Ronny Cedeno was the sole man with four times as many K's as BB (106:23).

TB = Total Bases.
GDP = Grounded Into Double Plays.
HBP = Hit By Pitch.
SH = Sacrifice Hits.
SF = Sacrifice Flies.
IBB = Intentional Base on Balls.
Just to tidy up, these are the minor categories listed, but you won't find them used very often in statistical argument.

BATTING STATISTICS

BA = Batting Average. Hits divided by at-bats. Simple, huh? The most well-known mark of hitting skill, and in the 2010 NL, for players with 150 PAs or more, it ranged from .336 (Carlos Gonzalez) to .181 (Garret Anderson), with the median Emilio Bonifacio's .261. If you hit .299 you'd be at the ninetieth percentile and .281 puts you in the top quartile. At the other end, .246 marks the bottom quartile, and .217 would put you in the bottom 10%. However, as a standalone figure, it doesn't tell you anything about the player's power, since singles and home-runs are counted the same, and it also omits walks entirely from the equation.

OBP = On-base Percentage. The formula here is a bit more complex: (H + BB + HBP) / (AB + BB + HBP + SF). The range is generally from .200 to .400 - only five NL batters were above .400 last year, led by Joey Votto's .424. The all-time high is the insane .609 by Barry Bonds in 2004. During 2010, the median was .327; Stephen Drew's .352 puts him in the top quartile, and .378 is the ninetieth percentile. While rare, it is possible for a hitter to have a lower OBP than BA: sacrifice flies count in the OBP denominator but not in at-bats, so a hitter with a few sacrifice flies and almost no walks or HBP can end up with an OBP below his BA.
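Here's a small sketch of the OBP formula next to plain batting average. The tiny stat line at the bottom is invented purely to show the odd case where OBP comes out lower than BA; no real player is implied.

```python
def batting_average(h, ab):
    """Hits divided by at-bats."""
    return h / ab

def on_base_percentage(h, bb, hbp, ab, sf):
    """(H + BB + HBP) / (AB + BB + HBP + SF), as given above."""
    return (h + bb + hbp) / (ab + bb + hbp + sf)

# Hypothetical short stretch: 3 hits in 10 at-bats, no walks or HBP, one sacrifice fly
ba = batting_average(h=3, ab=10)
obp = on_base_percentage(h=3, bb=0, hbp=0, ab=10, sf=1)

print(f"BA  {ba:.3f}")
print(f"OBP {obp:.3f}")  # the sac fly adds to the denominator only, so OBP < BA
```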

SLG = Slugging Percentage. It's like batting average, but rather than all hits counting the same, a single counts as one, a double as two, while a triple and home-run are three and four. The total over a season is divided by the at-bats: if you like, it's the average number of bases a batter produces per at-bat. The average is currently round about .400; conveniently, .350 is the bottom quartile, .450 the top quartile, and .500 the ninetieth percentile. In career terms, Albert Pujols' .624 is the leader among active players, and trails only Babe Ruth, Lou Gehrig and Ted Williams all-time.

OPS = On-base plus Slugging Percentage. I trust I need not say how this is calculated. :-) This number became popular after its use in 1984 by John Thorn and Pete Palmer, in their (excellent) book, The Hidden Game of Baseball. It's important, because it is simple, but does a great job of combining all the important numbers - not just batting average, but walks and power - into one. League median last year was Cody Ross's .735. Justin Upton's .799 just missed out on the upper quartile, while Kelly Johnson's .865 put him in the top 10%. If you can crack .900, you're an All-Star; reach 1.000, and MVP beckons. Two did the latter in 2010: Votto and Pujols.
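To pull BA, OBP, SLG and OPS together, here's a sketch that works them all out from one set of counting stats. The season line is hypothetical (nice round numbers, not any real player), and the `BattingLine` class and its field names are our own invention.

```python
from dataclasses import dataclass

@dataclass
class BattingLine:
    ab: int       # at-bats
    h: int        # hits (singles are whatever isn't a 2B, 3B or HR)
    doubles: int
    triples: int
    hr: int
    bb: int       # walks
    hbp: int      # hit by pitch
    sf: int       # sacrifice flies

    @property
    def total_bases(self):
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def ba(self):
        return self.h / self.ab

    @property
    def obp(self):
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self):
        return self.total_bases / self.ab

    @property
    def ops(self):
        return self.obp + self.slg

# Hypothetical season, made up for illustration
line = BattingLine(ab=500, h=150, doubles=30, triples=5, hr=20, bb=50, hbp=5, sf=5)
print(f"{line.ba:.3f}/{line.obp:.3f}/{line.slg:.3f}, OPS {line.ops:.3f}")
```

By the yardsticks above, an OPS in that neighborhood would sit around the top 10% of the 2010 NL.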

OPS+ = Adjusted OPS. Not all parks are equal. And not all seasons are equal. OPS+ is an effort to take those factors into account. I won't even get into the formula, because it doesn't matter. What you should know is that 100 is league average for the time, after adjusting for park factors [we'll get into those in part three, but for now, they measure the extent to which Chase is more hitter friendly than Petco, and so on]. Votto's 174 was best in the league; the ninetieth percentile came in at 131, and the top quartile at 114. The lower quartile started at 79, and the tenth percentile was a lowly 67. Every point above or below 100 is one-half percent better or worse than average.

EXTRA CREDIT

Understanding the above, and using them correctly, will get you through the vast majority of discussions, and make you look really, really smart. If you plan to get even deeper into the Matrix, here are some other terms you might hear.

wOBA = Weighted On-Base Average. One of the problems with OPS is that it values OBP and SLG the same, even though it has been shown that OBP correlates better to runs scored. wOBA attempts to address that by adjusting its formula. The scale is the same as for OBP, and you can find numbers on Fangraphs.com, which uses it a lot in its calculations of player value. 

RC = Runs Created. This tries to measure how many runs a player created for his team - which is, after all, the point of the exercise, rather than walks, hits or any other stat in isolation. It was created by stats guru Bill James, and has been shown to be pretty good at predicting the actual runs a team will score - usually within 5%. Kelly Johnson led the Diamondbacks last year, with 109 RC; Drew and Chris Young were both in the nineties.
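Runs Created comes in several flavors. The sketch below uses only the simplest "basic" version of Bill James' formula, (H + BB) x TB / (AB + BB); the figures published on stats sites generally come from more elaborate versions, so don't expect this to reproduce them exactly. The input numbers are hypothetical.

```python
def runs_created_basic(h, bb, tb, ab):
    """Basic Runs Created: times on base, multiplied by total bases, over opportunities."""
    return (h + bb) * tb / (ab + bb)

# Hypothetical season: 150 hits, 50 walks, 250 total bases in 500 at-bats
example_rc = runs_created_basic(h=150, bb=50, tb=250, ab=500)
print(round(example_rc, 1))
```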

LD% = Line-drive percentage. The percentage of all balls put into play that are line-drives. Line-drives are good, because they are far more likely to become hits than fly-balls or ground-balls. League average last year was 19% - Mark Reynolds' struggles were largely because his number was dead last among qualifying NL batters, at only 13%. That's why his average was so low: he was hitting fly-balls and ground-balls instead. Whether he turns things around in Baltimore will likely depend on whether his LD% gets back to where it was.

BABIP = Batting Average on Balls in Play. We'll discuss this more in pitching, but the basic principle is that, after the ball leaves a hitter's bat and stays in the park, whether it becomes a hit or not is mostly chance, outside the hitter's control. .300 is league average; batters who hit a lot of line drives will see it higher than that, but if a hitter has a high BABIP, this can suggest he has been lucky, with balls finding holes and dropping into gaps. If so, then his numbers might be likely to drop going forward. Conversely, a low BABIP can suggest he has been hitting balls at people, and similarly, that won't last forever.
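The article doesn't spell out the sums, but the commonly used definition of BABIP is (H - HR) / (AB - SO - HR + SF): take homers and strikeouts out of the picture, and see how often what's left becomes a hit. Here is a sketch with a made-up stat line.

```python
def babip(h, hr, ab, so, sf):
    """Batting average on balls in play: (H - HR) / (AB - SO - HR + SF)."""
    return (h - hr) / (ab - so - hr + sf)

# Hypothetical season: 150 hits, 20 homers, 500 at-bats, 100 strikeouts, 5 sac flies
example_babip = babip(h=150, hr=20, ab=500, so=100, sf=5)
print(f"{example_babip:.3f}")  # compare against the roughly .300 league average
```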

Next week, we'll look at pitching numbers.

Comments

I would argue that FanGraphs deserves as prominent of a link as B-R

But that’s just semantics. : )

http://hasthelargehadroncolliderdestroyedtheworldyet.com/

by Dan Strittmatter on Mar 2, 2011 12:47 PM EST

That's ISOlated Power

(2B + 3Bx2 + HRx3) / AB

It’s a measure of how often the player hits extra base hits. Supposed to help focus on the more powerful hitters in the league. It doesn’t take the park factor into consideration.

Well there's your problem!

by JoeStock on Mar 2, 2011 4:52 PM EST

I know what it is

I was just vaguely pointing out that it was missing from the article.

Wear your own fur.

by Marc Fournier on Mar 2, 2011 6:28 PM EST

If numbers make your eyes glaze over,

you can think of ISO as slugging %, minus singles, with each extra-base hit weighted.

Mr. Science Boy

by DbacksSkins on Mar 2, 2011 6:36 PM EST

Or even

While SLG is bases per at bat, ISO is extrabases per at-bat.

"While Mrs. SnakePit watched one of the most highly acclaimed films of the year, I sat through a badly made schlock fest with absolutely no redeeming value. And it was awesome."

by Jim McLennan on Mar 2, 2011 7:34 PM EST

Easier to calculate than this...

SLG – BA

http://hasthelargehadroncolliderdestroyedtheworldyet.com/

by Dan Strittmatter on Mar 2, 2011 7:33 PM EST

+1

nowadays, i consider ISO, BB%, K% to be the holy trinity of batting stats…looking at those three is incredibly useful in my opinion to get an understanding of the fundamental attributes of a hitter, and useful at predicting what the future is like. it’s like the luck-independent/fielding-independent version of hitting, as they are sort of predictors for SLG/OBP/BA respectively.

and then i look at wOBA to get a big picture, and because that’s how WAR is calculated

by blue bulldog on Mar 2, 2011 11:48 PM EST

At the same time

ISO can also be subject to BABIP fluctuations – if line drives or fly balls have an abnormally-high BABIP for a season, then there is likely an inflated ISO.

http://hasthelargehadroncolliderdestroyedtheworldyet.com/

by Dan Strittmatter on Mar 2, 2011 11:55 PM EST

well

ISO is only subject to BABIP fluctuation insofar as line drives or flyballs turn into doubles more often than singles….which i think, has a pretty small effect. an abnormally high BABIP alone does not lead to an inflated ISO

on the other hand, admittedly, ISO is subject to fluctuations in HR/FB rates

by blue bulldog on Mar 3, 2011 4:57 AM EST

That's why I made the distinction

Between FB/LD and GB BABIP fluctuations. : )

And yeah, that’s a big one too. : P

http://hasthelargehadroncolliderdestroyedtheworldyet.com/

by Dan Strittmatter on Mar 3, 2011 1:09 PM EST

I was

gonna make that distinction, too…

I think, in theory, it IS possible to see BABIP decrease as ISO increases, but I may be wrong? If a player simply hits more fly balls?

Mr. Science Boy

by DbacksSkins on Mar 3, 2011 2:41 PM EST

Good stats but...

Those three are very good stats, but I don’t think they are really predictors of SLG/OBP/BA. Well, ISO is for SLG, because it’s almost the same. But players like Carlos Pena and Mark Reynolds have excellent BB%, but will never have a good OBP (because of the poor K%). And plenty of players have poor K rates and excellent batting averages. For example, the two NL leaders last year, Carlos Gonzalez and Joey Votto, both struck out 23% of the time.

by Amit on Mar 3, 2011 12:22 AM EST

well...

it’s kind of funny because your comment about BB% and OBP is also applicable to ISO versus SLG (since OBP is BA adjusted by BB%, and SLG is BA adjusted by ISO)

i guess i actually meant something a lot more complicated than just that ISO/BB%/K% predict SLG/OBP/BA (though i copped out in my earlier comment by using that phrase).

basically, the fact that SLG and OBP are dependent on BA, makes both statistics very volatile to BABIP variations. it also makes it so that you can’t get an accurate measure of different skills. Mark Reynolds 2010 is the PERFECT example of this. you look at his 2010 SLG (.433) and say wow, that’s lower than any year in his career. you look at his 2010 OBP (.320) and see that it’s lower than his 2007 and 2009 seasons. what that hides though, is that his ISO% (an independent measure of his power) was higher than all but 2009, and his BB% (an independent measure of his plate discipline) was a career high.

by blue bulldog on Mar 3, 2011 5:12 AM EST

I see your point

I guess because ISO is a more major part of SLG than BB% is of OBP. That is, ISO adds around .150 to AVG to get SLG, while BB% only adds around .060 to AVG to get OBP.

Getting back to Reynolds – batters have much more influence on their BABIP than pitchers do, since it is related to things like line drive rate, speed, and power. And Reynolds has seen his BABIP steadily drop the last few years, probably because his LD% has sunk all the way down to 13.3% last year. Is that due to bad luck, a decline in skill level, an injury, or just a fluke? That’s the big question for Reynolds, and I think KT did not want to get stuck with an expensive player if it turns out to be a decline in skill level.

by Amit on Mar 3, 2011 12:48 PM EST

It's very minor...

…but to be clear, if an out occurs while the batter is batting and he wasn’t involved, it doesn’t count as a Plate Appearance or an At Bat.

The most common way for this to occur is if a baserunner is picked off or caught trying to steal. The player currently batting gets another shot the next inning, with his ball/strike count reset.

Well there's your problem!

by JoeStock on Mar 2, 2011 4:44 PM EST

True.

Pitchers can even earn a save or a win without technically throwing a pitch by getting an out through a pickoff or CS. (If it ends an inning)

Mr. Science Boy

by DbacksSkins on Mar 2, 2011 6:37 PM EST

nice article...

look forward to reading the upcoming ones…

by Gildo on Mar 2, 2011 6:33 PM EST

i'll be bookmarking this

so i don’t have to google/ask ’skins to explain every other comment to me :p

by jinnah on Mar 2, 2011 6:49 PM EST

I could totally be wrong

but isn’t BB = Base on Balls and not “Bases” since you only get one?

Not that it is really important. The only reason I bring it up is that it was the only thing I could find even remotely wrong with the post! Amazing job as always Jim. Man I love this place!

It's not enough to just live, you gotta live for something.

by Dallas D'Back Fan on Mar 2, 2011 8:09 PM EST

Is BABIP really a matter of "chance"?

I’ve seen this description used often, and to a degree it is correct. But, aren’t batters who are both physically talented AND observant/smart more likely to have a somewhat higher BABIP than batters who ignore opportunities for “easy” base hits (e.g., bunting when the infielders are playing back, or going opposite-field when they are playing a shift?)

Too often, BABIP is invoked as a negating factor- “He can’t possibly maintain last year’s BABIP” or, as my dad always said about the racehorse that hadn’t won in years, “He’s DUE!”

If it really is “chance”, why do we bother keeping track of it?

by TylerO on Mar 2, 2011 8:37 PM EST

You are right, to some extent

Batters have more control than pitchers, and legging out infield hits is a good example – Ichiro comes to mind there. But it’s worth keeping track of for that exact reason. Even if you compare a hitter’s BABIP to his career number, it will give you some idea of if he has been ‘lucky’ or ‘unlucky’ – because over the course of a year, a hitter’s BABIP will tend to head towards his overall number (everything else being equal, of course).

Can hitters consciously “go the opposite way”? I’m frankly amazed anyone can make contact with a 95 mph fastball, never mind consciously decide where to hit it! Someone with that skill set probably wouldn’t have much of a defensive shift against them, I suspect.

"While Mrs. SnakePit watched one of the most highly acclaimed films of the year, I sat through a badly made schlock fest with absolutely no redeeming value. And it was awesome."

by Jim McLennan on Mar 2, 2011 9:20 PM EST

With my bat speed

opposite field (or opposite foul territory!) was my specialty when I was playing…

by TylerO on Mar 2, 2011 9:56 PM EST

Note:

Stats beginners should probably stick to reading this article and avoid the comments.

Mr. Science Boy

by DbacksSkins on Mar 3, 2011 2:43 PM EST

Basic question is basic

What order is the slash line in? As in “he has a line of .blah/.whatever/.unicorn.”

This is not going to be pretty. We're talking violence, strong language, adult content...

by luckycc on Mar 3, 2011 3:47 PM EST

Batting average/On-base Percentage/Slugging Percentage

So, if you really want, then you can add the last two numbers to get their OPS.

"I just don't know about rhinos. They have the same soulless eyes, but not ALL of them are jerks."

by kishi on Mar 3, 2011 3:55 PM EST

And another

I read somewhere that OBP is an approximation of how often a hitter gets on base. By that I mean you can see an OBP of .300 and contextualize that as the hitter gets on base roughly every three times out of ten. It’s easier for me to understand the stats with that kind of context, rather than trying to remember what the upper and lower ends of the stat are.

This is not going to be pretty. We're talking violence, strong language, adult content...

by luckycc on Mar 3, 2011 3:55 PM EST

yeah that's a nice way of thinking about it

SLG is similar. by that i mean you can see a SLG of 1.000 and contextualize that as the hitter obtains one base on average every time he gets an at bat

by blue bulldog on Mar 3, 2011 9:28 PM EST

That…makes a lot of sense. Thanks!

This is not going to be pretty. We're talking violence, strong language, adult content...

by luckycc on Mar 4, 2011 12:46 AM EST
