Introducing Pitching Maps

My first real stab at developing my own analysis tool.

Arizona Diamondbacks v Washington Nationals Photo by Mitchell Layton/Getty Images

So, I’m going to warn you now: this is a bit more complicated than most of my posts. It is going to be less focused on the Diamondbacks themselves and more on the analysis I’m trying to do, so it may not be very interesting to many members here, at least not yet.

This is my first “real” attempt at developing my own analytical tool. As such, this is going to be a work in progress and I will post any meaningful updates as I come across them. Today’s post isn’t going to be very conclusive; however, it is going to give an overview of the method and help paint a picture of what this might be able to answer.

Statcast provides a lot of very valuable tools. Among them are the “Gameday” and “Detailed” strike zones. These are very useful filters, as they can narrow down your data to the specific zones where a pitch was thrown. You might recall that I used the “Gameday” zone in the Pitch Framing article a few weeks ago.

For this analysis, I am going to use the “Detailed” strike zone as it gives us more zones to pick and choose from (as we will want to do later on in this project).

Above is a picture of the Detailed Zone. The green box within that picture is the technical strike zone. Zones 1-9 are fully within the strike zone. Zones 11-19 are the “edges” - pitches which are borderline and can be called either way (and will often shift between LHH and RHH). Zones 21-29 are pitches that are clearly outside the strike zone and outside the “edges”.

For this exercise, I went zone-by-zone and calculated the wOBA in each zone for 2018. I made my own chart and color-coded it for clarity.

Just so we’re clear on the colors: the green boxes are clear strikes, the orange-ish boxes are the “edge” zones, and the blue boxes are the clear balls. Inside each box is the league-wide wOBA for that zone. Also, this is from the catcher’s perspective, so right-handed hitters are to the LEFT of this box. Lastly, please note that this chart is not to scale.
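For anyone who wants to try the zone-by-zone calculation outside of the Statcast search page, here is a rough sketch of the aggregation. It assumes you already have pitch-level records tagged with a Detailed zone number; the field names (`zone`, `woba_value`, `woba_denom`) and the sample rows are placeholders for illustration, not real data or a confirmed export format.

```python
from collections import defaultdict

def woba_by_zone(pitches):
    """Return {zone: wOBA} from plate-appearance-ending pitches.

    Each pitch is a dict with assumed (hypothetical) keys:
      "zone"        Detailed zone number (1-9, 11-19, 21-29)
      "woba_value"  wOBA credit for the event that ended the plate appearance
      "woba_denom"  1 if the event counts toward the wOBA denominator, else 0
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for p in pitches:
        num[p["zone"]] += p["woba_value"]
        den[p["zone"]] += p["woba_denom"]
    # Skip zones with no qualifying events to avoid dividing by zero.
    return {z: num[z] / den[z] for z in den if den[z] > 0}

# Made-up rows, only to show the shape of the input.
sample = [
    {"zone": 5, "woba_value": 2.0, "woba_denom": 1},
    {"zone": 5, "woba_value": 0.0, "woba_denom": 1},
    {"zone": 21, "woba_value": 0.7, "woba_denom": 1},
    {"zone": 27, "woba_value": 0.0, "woba_denom": 1},
]
zone_woba = woba_by_zone(sample)
```

The result is one number per zone, which is what goes inside each box of a chart like the one above.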

To me, this chart was fascinating. Some of it should be very obvious and intuitive. For instance, pitches thrown in the center of the zone at middle height have very high wOBAs. These are the easiest pitches for batters to hit. That makes sense.

To validate this data, I looked at a few more years. I wanted to get a full range of 2015 - 2018 for the chart but Statcast kept failing each query (too much data to compute?). However, it worked when I did three years of data. So, here is the same chart as above, but with 2016 and 2017 included.

Overall, the charts are fairly close in magnitude. There are a few zones with some decent-sized changes, but this seems to be fairly repeatable year-to-year. Again, this should be fairly intuitive: batters do best at hitting pitches in the strike zone, especially those in the middle of the plate. Pitches on the edge are very difficult for hitters because they are hard to hit but can still be called strikes. If you apply observational common sense to how to best “pitch” a batter, it would probably come close to representing this chart.

However, some things stood out against what I was expecting. For starters, the upper corners (zones 21 and 23) have the highest wOBAs of any zone. At first, it didn’t make sense to me because there is very rarely any contact made in those zones (e.g., no homers, extra base hits, etc.). Then I did some digging and it dawned on me: these zones have such high wOBAs because they have ridiculously high rates of called balls with very low rates of swings, let alone whiffs. The high wOBAs here are almost exclusively a product of the high number of walks produced by pitches in these two zones. The bottom two corners (27 and 29), by contrast, produced extremely low wOBAs due to drastically higher whiff rates. The main takeaway here: it seems like it’s easier for batters to pick up high pitches out of the zone.

Now that we have this wOBA chart, we can start picking our “buckets” to try and analyze pitchers (and eventually, hitters). I won’t be getting too far into the buckets today (this will require a lot more analysis), but I picked out a simple one to illustrate my point.

I call this the “Hot Zone” bucket and it simply splits the zones between the high wOBAs and the low wOBAs. It looks like this:

This should be pretty self-explanatory. The red zones are where batters seem to be beating pitchers, overall, and the green zones are where pitchers seem to be winning. It should allow for an easy conclusion: teams that pitch more in the red zones should do worse than teams that pitch less in the red zones.

So, I had Statcast look at zones 2, 4, 5, 6, 7, 8, 21, 23, and 24 (the red zones) and sorted by teams that throw there the least. Once again, it appears that the Diamondbacks are on to something:

It looks like we have another area where the Diamondbacks are in front of the pack on a lesser-known aspect of baseball. What’s also interesting are the pitchers leading this front:

Ten Lowest ‘Hot Zone’ Pitch% in MLB, 2018

MLB Rank Pitcher Pitch%
1 T.J. McFarland 19.7
2 Patrick Corbin 24.0
3 Kyle Gibson 24.2
4 Dallas Keuchel 25.8
5 Alex Wood 26.4
6 Martin Perez 26.6
7 Jason Vargas 26.8
8 Gio Gonzalez 26.8
9 Blake Snell 27.3
10 Zack Greinke 27.7

Numbers 1 and 2 on this list are our very own T.J. McFarland and Patrick Corbin (minimum: 1000 pitches). And McFarland is a whopping 4.3 percentage points ahead of Corbin - this is 0.6 points bigger than the 3.7-point gap between Corbin and #10 Greinke (hey look, another Diamondback).
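The “Hot Zone” pitch% behind the team and pitcher rankings is just the share of pitches that land in the nine red zones. Here is a minimal sketch of that calculation, assuming pitch records come as (name, zone) pairs; the function name, the input shape, and the sample rows are made up for illustration, and only the zone list and the idea of a minimum-pitch cutoff come from the post.

```python
from collections import Counter

# The red "Hot Zone" bucket from the chart above.
HOT_ZONES = {2, 4, 5, 6, 7, 8, 21, 23, 24}

def hot_zone_pct(pitches, min_pitches=1):
    """pitches: iterable of (name, zone) pairs, where name is a team or pitcher.

    Returns {name: percent of pitches thrown in HOT_ZONES}, keeping only
    names with at least min_pitches total pitches.
    """
    total = Counter()
    hot = Counter()
    for who, zone in pitches:
        total[who] += 1
        if zone in HOT_ZONES:
            hot[who] += 1
    return {who: 100.0 * hot[who] / n
            for who, n in total.items() if n >= min_pitches}

# Placeholder rows (not real pitch data); "C" falls below the cutoff.
sample = [("A", 5), ("A", 17), ("A", 27), ("A", 13),
          ("B", 5), ("B", 21), ("B", 8), ("B", 19),
          ("C", 5)]
rates = hot_zone_pct(sample, min_pitches=4)
ranked = sorted(rates, key=rates.get)  # lowest Hot Zone share first
```

With real data the cutoff would be something like the 1000-pitch minimum used for the table, and swapping the bucket is just a matter of changing the set of zones.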

Now, this isn’t a perfect measurement. I plotted the team-by-team rate of “Hot Zone” pitch% and their 2018 ERA:

The trend here isn’t super strong, but with an R^2 approaching 6%, it shows that there is a weak relationship to be found here. This is actually a pretty good thing, as I just chose the simplest “bucket” of zones to look at. ERA, in itself, is full of noise (even FIP only has about a 45% correlation to ERA), so tying these directly to ERA may not be my end game.
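For reference, the R^2 of a simple one-variable trend line is the squared correlation between the two columns. Below is a small sketch of that calculation in plain Python; the function name and the sample numbers are placeholders rather than the actual team data, and this is not necessarily how the plot above was produced.

```python
def r_squared(xs, ys):
    """R^2 of a simple linear fit of ys on xs (squared Pearson correlation)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined R^2")
    return (sxy * sxy) / (sxx * syy)

# Made-up values standing in for team Hot Zone pitch% (x) and ERA (y).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
r2 = r_squared(x, y)
```

An R^2 near 0.06 would mean the trend line explains only about 6% of the team-to-team variation in ERA, which is why the relationship reads as weak.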

So that’s what I have right now: a new tool to mess around with on Statcast and a methodology for attempting analysis. If you feel up to it, I would love to see others play around with this! I know that this might be more than many people want to mess around with. Personally, I want to see if I can find some relationships between zones and either hard/quality contact or BABIP. BABIP is one of the hardest things to pin down in baseball, especially for pitchers, and this might provide an interesting means of looking into it. Or maybe it won’t yield much useful information. We shall see!