

My Own Statistical Analysis (Part 1)


It's time to examine the Offense, Defense, and Pitching of the Arizona Diamondbacks.

Photo credit: Joe Camporeale-USA TODAY Sports

Prologue

This being the traditional time of reflection, I thought I would do some analysis of my own. We baseball fans seem to like numbers because they help us quantify what is good and what is bad. There are entire departments within baseball organizations whose sole job is to analyze the numbers. However, numbers alone can never tell the full story of a season, and the numbers we assume will win games are often wrong.

My Resume

Some of you may or may not know that my full-time job is Database Administrator. My job revolves around extracting useful information from raw data. Though I am not a sabermetrician and don't know all the ins and outs of baseball statistics, I do think I'm uniquely qualified to point out pieces of information that might be useful. I can build tables, queries, forms and reports on just about anything, as long as you give me the numbers. I do it for both large and small corporations. Does this make me the next Chief Statistics Officer for the Diamondbacks? I dunno... does it, Tony?

My Methods

The first step in any statistical analysis is to gather your data and begin to understand the relationships between its different pieces. Though there are good resources such as Baseball-Reference.com and Fangraphs.com, I could find only one downloadable database with relatively recent data that I could work with: SeanLahman.com offers the full database as an Access, SQL, or CSV download. I went with the Access version because of its easy-to-use query designer. The database is free of charge and is compiled by many contributors; for those interested and with the skills, I highly recommend it for your own perusal. There is one drawback, however. Since the 2014 year has not yet concluded, the database has no stats for this year. I will add them manually where relevant.
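
For readers who grab the CSV download instead of Access, here is a minimal Python sketch of that first step: pulling one franchise's batting lines. It assumes the Lahman `Batting.csv` column names `playerID`, `yearID`, `teamID`, `G`, `H` and `RBI`, and that the Diamondbacks are stored under the team code `ARI`; check both against the copy you download. The sample rows are hypothetical placeholders, not real stats.

```python
import csv
import io


def load_team_batting(lines, team_id="ARI"):
    """Return batting rows for one franchise from a Lahman-style Batting.csv.

    Assumes Lahman column names (playerID, yearID, teamID, G, H, RBI).
    Numeric fields are converted to int; blank values become 0.
    """
    rows = []
    for row in csv.DictReader(lines):
        if row["teamID"] != team_id:
            continue  # skip every other franchise
        rows.append({
            "playerID": row["playerID"],
            "yearID": int(row["yearID"]),
            "G": int(row["G"] or 0),
            "H": int(row["H"] or 0),
            "RBI": int(row["RBI"] or 0),
        })
    return rows


# Hypothetical sample rows for illustration only
sample = io.StringIO(
    "playerID,yearID,teamID,G,H,RBI\n"
    "playera01,2011,ARI,150,160,80\n"
    "playerb01,2011,SFN,140,120,60\n"
)
print(load_team_batting(sample))  # only the ARI row is kept
```

With the real file, the same function works on an open handle: `with open("Batting.csv", newline="") as f: rows = load_team_batting(f)`.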

The First Stat

My first goal was to identify an offensive metric that correlates with winning and losing seasons. The idea was to form a general overview rather than a game-by-game analysis. My first instinct was to create a statistic showing each player's average number of hits per game. I assumed a highly effective offense would have a significant number of players with a high average. After a cursory look at the data, I decided an average of one hit per game would be a good benchmark. To narrow my scope even further, I decided to focus on just the Arizona Diamondbacks for the time being. Here are the results:

Year  Players >= 1 HPG  Win %
2013  4                 0.50
2012  4                 0.50
2011  1                 0.58
2010  3                 0.40
2009  4                 0.43
2008  3                 0.51
2007  2                 0.56
2006  8                 0.47
2005  3                 0.48
2004  3                 0.31
2003  4                 0.52
2002  4                 0.60
2001  3                 0.57
2000  4                 0.52
1999  5                 0.62
1998  4                 0.40
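
The count behind a table like this could be sketched in Python as follows. This is an illustrative reconstruction, not the actual Access query used here: it assumes rows with Lahman-style `yearID`, `playerID`, `G` and `H` fields, applies the 50-game minimum used for the 2006 list to every season, and uses games played (`G`) as an assumed stand-in for "games with an at-bat". The helper name `players_at_or_above` is hypothetical, and the rows are made up.

```python
from collections import defaultdict


def players_at_or_above(rows, threshold=1.0, min_games=50):
    """Count, per season, players whose hits per game (H / G) reach `threshold`.

    Each row is a dict with yearID, playerID, G and H. Games played (G) is an
    assumed stand-in for "games with an at-bat"; rows for the same player and
    year are summed before dividing.
    """
    totals = defaultdict(lambda: [0, 0])  # (year, player) -> [games, hits]
    for row in rows:
        key = (row["yearID"], row["playerID"])
        totals[key][0] += row["G"]
        totals[key][1] += row["H"]

    counts = defaultdict(int)
    for (year, _player), (games, hits) in totals.items():
        # min_games is checked first, so games is never zero when dividing
        if games >= min_games and hits / games >= threshold:
            counts[year] += 1
    return dict(counts)  # seasons with no qualifying player are absent


# Hypothetical rows for illustration only, not real Diamondbacks stats
rows = [
    {"yearID": 2011, "playerID": "a", "G": 150, "H": 160},  # 1.07 H/G, counts
    {"yearID": 2011, "playerID": "b", "G": 140, "H": 120},  # 0.86 H/G
    {"yearID": 2011, "playerID": "c", "G": 30, "H": 40},    # under 50 games
]
print(players_at_or_above(rows))  # {2011: 1}
```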

I don't know about you, but I certainly didn't expect 2001, 2007, and 2011 to have among the lowest numbers of players with at least one hit per game. I also didn't expect 2006, a losing season, to have 8 players at or above 1 hit per game. This is a classic example of why you can't make assumptions. For those interested, these are the 8 players who averaged over 1 hit per game in 2006 (minimum 50 games with an at-bat):


Player           H/G
Conor Jackson    1.007
Shawn Green      1.026
Luis Gonzalez    1.039
Eric Byrnes      1.049
Orlando Hudson   1.057
Johnny Estrada   1.087
Chad Tracy       1.091
Stephen Drew     1.119

The one player who averaged one hit per game or more in 2011 was Justin Upton.

Broadening The Scope

Whenever I get a result set that doesn't seem to match expectations, it's time to critically analyze the metric itself and make sure it's a good indicator of results. So I needed to broaden my sample size while narrowing my focus on the metric itself. I took all teams since 2000 and counted the number of players on each who averaged at least 1 hit per game. Then I split the team-seasons into winning seasons (a win percentage of .500 or better) and losing seasons (below .500). Here are my results:

Winning Seasons: 4.36 Players >=1 HPG

Losing Seasons: 3.77 Players >=1 HPG

This would seem to indicate that either the Diamondbacks are an anomaly, or my made-up Hits Per Game statistic is really a non-factor in determining wins and losses. Still, it intrigues me that our better years had fewer players with at least a one hit per game average, while our down years seem to have had more.
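
As a quick check on that impression, the same winning/losing split can be applied to the Diamondbacks-only table of players at or above 1 HPG. This Python sketch uses only the values from that table (win % as printed, to two decimals) and counts .500 seasons as winning, matching the >= .500 rule used for the all-teams comparison. `split_average` is a hypothetical helper name.

```python
# (year, players with >= 1 HPG, win %) copied from the Diamondbacks table
SEASONS = [
    (2013, 4, 0.50), (2012, 4, 0.50), (2011, 1, 0.58), (2010, 3, 0.40),
    (2009, 4, 0.43), (2008, 3, 0.51), (2007, 2, 0.56), (2006, 8, 0.47),
    (2005, 3, 0.48), (2004, 3, 0.31), (2003, 4, 0.52), (2002, 4, 0.60),
    (2001, 3, 0.57), (2000, 4, 0.52), (1999, 5, 0.62), (1998, 4, 0.40),
]


def split_average(seasons, cutoff=0.5):
    """Average the player count for winning (win % >= cutoff) and losing seasons."""
    winning = [n for _, n, pct in seasons if pct >= cutoff]
    losing = [n for _, n, pct in seasons if pct < cutoff]
    return sum(winning) / len(winning), sum(losing) / len(losing)


win_avg, lose_avg = split_average(SEASONS)
print(f"Winning: {win_avg:.2f}  Losing: {lose_avg:.2f}")  # Winning: 3.40  Losing: 4.17
```

On these numbers the Diamondbacks' winning seasons average about 3.4 qualifying players and their losing seasons about 4.2, the reverse of the all-teams split.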

My Next Statistic

Since the Hits Per Game average didn't seem to indicate anything worthwhile, I decided to try a different approach. Hits alone don't tell the scoring story; what matters is how many hits are converted into runs, as measured by RBIs. To me, RBIs per hit (RBI/H) is about as direct a link to good offense as I can think of. It isn't a direct indication of "clutch", but it should at least give us a good representation of how good the team is at scoring. Again, I narrowed the results down to just the Diamondbacks:


Year RBI/H Win %
1998 0.459 0.4
1999 0.552 0.62
2000 0.516 0.52
2001 0.519 0.57
2002 0.532 0.6
2003 0.474 0.52
2004 0.415 0.31
2005 0.472 0.48
2006 0.493 0.47
2007 0.509 0.56
2008 0.504 0.51
2009 0.487 0.43
2010 0.506 0.4
2011 0.517 0.58
2012 0.501 0.5
2013 0.441 0.5

AT LAST! A result set that seems to match expectations. Every season with an RBI/H ratio of .505 or higher was a winning season, with the exception of 2010, and no losing season was higher than .506. To distill this down to a couple of numbers, the average RBI/H ratio during winning seasons (above .500) is .515, while during losing seasons (below .500) it's .472. Note that this excludes 2012 and 2013, when we finished at exactly .500.
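
Both averages, and the single exception, can be re-derived straight from the RBI/H table. A minimal Python sketch; the values are copied from that table and the two .500 seasons are dropped, as described above:

```python
# (year, team RBI/H, win %) copied from the Diamondbacks RBI/H table
SEASONS = [
    (1998, 0.459, 0.40), (1999, 0.552, 0.62), (2000, 0.516, 0.52),
    (2001, 0.519, 0.57), (2002, 0.532, 0.60), (2003, 0.474, 0.52),
    (2004, 0.415, 0.31), (2005, 0.472, 0.48), (2006, 0.493, 0.47),
    (2007, 0.509, 0.56), (2008, 0.504, 0.51), (2009, 0.487, 0.43),
    (2010, 0.506, 0.40), (2011, 0.517, 0.58), (2012, 0.501, 0.50),
    (2013, 0.441, 0.50),
]


def mean(values):
    return sum(values) / len(values)


winning = [r for _, r, pct in SEASONS if pct > 0.5]  # .500 seasons excluded
losing = [r for _, r, pct in SEASONS if pct < 0.5]
print(f"{mean(winning):.3f} {mean(losing):.3f}")  # 0.515 0.472

# Seasons at or above the .505 baseline that still finished below .500
exceptions = [year for year, r, pct in SEASONS if r >= 0.505 and pct < 0.5]
print(exceptions)  # [2010]
```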

Just to verify our results with a larger sample size, let's look at all teams since 2000. During winning seasons the RBI/H rate was .505, and only .472 in losing seasons. It seems we really do have a statistic that fits nicely into separating good and bad offensive seasons. I believe the .505 RBI/H rate is a good baseline for distinguishing good offense from bad offense.

So how did the 2014 Diamondbacks fare? Not well: only a .416. That sure explains a lot.

What This Means

Generating good offense takes a combination of hits and RBIs. That may seem like something you knew before reading this article, but consider that we have now backed it up with statistical analysis. Not only that, the numbers suggest it is not the NUMBER of hits but rather the conversion of hits into RBIs that produces a winning offense. It is important, then, to build an RBI-heavy offense and focus less on high-BA types. Every player with a low RBI/H ratio must be offset by someone with a higher one. In 2011, this is what the RBI/H ratio looked like for players with over 50 hits:


Player              RBI/H  RBI  H
Willie Bloomquist   0.280  26   93
Stephen Drew        0.556  45   81
Kelly Johnson       0.544  49   90
Miguel Montero      0.619  86   139
Xavier Nady         0.686  35   51
Gerardo Parra       0.354  46   130
Ryan Roberts        0.542  65   120
Justin Upton        0.515  88   171
Chris Young         0.530  71   134
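
The per-player conversion rate is simply RBI divided by hits. A short Python sketch that recomputes it for the 2011 players and flags anyone under the .505 baseline; the names and numbers are copied from the 2011 table, and `conversion` is a hypothetical helper name:

```python
BASELINE = 0.505  # the RBI/H baseline proposed above

# (player, RBI, H) for 2011 Diamondbacks with over 50 hits, from the 2011 table
PLAYERS_2011 = [
    ("Willie Bloomquist", 26, 93), ("Stephen Drew", 45, 81),
    ("Kelly Johnson", 49, 90), ("Miguel Montero", 86, 139),
    ("Xavier Nady", 35, 51), ("Gerardo Parra", 46, 130),
    ("Ryan Roberts", 65, 120), ("Justin Upton", 88, 171),
    ("Chris Young", 71, 134),
]


def conversion(rbi, hits):
    """RBI per hit (RBI/H), the conversion rate used throughout this post."""
    return rbi / hits


below = [name for name, rbi, h in PLAYERS_2011
         if h > 50 and conversion(rbi, h) < BASELINE]
print(below)  # ['Willie Bloomquist', 'Gerardo Parra']
```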

Only two players, Willie Bloomquist and Gerardo Parra, were below the .505 mark. Let's look at the same data for 2014:


Player             RBI/H  RBI  H
Miguel Montero*    0.605  72   119
Paul Goldschmidt   0.566  69   122
Aaron Hill         0.492  60   122
Didi Gregorius*    0.443  27   61
Martin Prado       0.385  42   109
Mark Trumbo        0.792  61   77
Ender Inciarte*    0.233  27   116
Gerardo Parra*     0.286  30   105
David Peralta*     0.383  36   94
Chris Owings       0.321  26   81
A.J. Pollock       0.300  24   80
Cody Ross          0.294  15   51

Only 3 players had a conversion rate higher than .505. Interestingly, Montero and Parra, the only two players who appear in both tables, had similar RBI/H rates in both years. This also puts a bit of a dent in the theory that we have a solid core of offensive players. We have far too much depth in hitters who don't knock in runs.
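
Both observations can be checked against the 2011 and 2014 tables with a few lines of Python. This sketch recomputes the 2014 rates from RBI and H, lists the players above .505, and compares the players who appear in both tables. Asterisks are dropped from the 2014 names, and the 2011 rates are copied as printed.

```python
BASELINE = 0.505

# (RBI, H) per player, from the 2014 table
PLAYERS_2014 = {
    "Miguel Montero": (72, 119), "Paul Goldschmidt": (69, 122),
    "Aaron Hill": (60, 122), "Didi Gregorius": (27, 61),
    "Martin Prado": (42, 109), "Mark Trumbo": (61, 77),
    "Ender Inciarte": (27, 116), "Gerardo Parra": (30, 105),
    "David Peralta": (36, 94), "Chris Owings": (26, 81),
    "A.J. Pollock": (24, 80), "Cody Ross": (15, 51),
}
# RBI/H per player, from the 2011 table
CONV_2011 = {
    "Willie Bloomquist": 0.280, "Stephen Drew": 0.556, "Kelly Johnson": 0.544,
    "Miguel Montero": 0.619, "Xavier Nady": 0.686, "Gerardo Parra": 0.354,
    "Ryan Roberts": 0.542, "Justin Upton": 0.515, "Chris Young": 0.530,
}

conv_2014 = {name: rbi / h for name, (rbi, h) in PLAYERS_2014.items()}
above = sorted(name for name, c in conv_2014.items() if c > BASELINE)
print(above)  # ['Mark Trumbo', 'Miguel Montero', 'Paul Goldschmidt']

# Dict key views support set intersection: players present in both seasons
shared = sorted(CONV_2011.keys() & conv_2014.keys())
for name in shared:
    print(f"{name}: {CONV_2011[name]:.3f} -> {conv_2014[name]:.3f}")
```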

In Part 2 of my analysis, we will look at defensive metrics of my own devising.

Please feel free to comment and ask questions. If you would like me to try to build a stat for you, I can.