I attended the recently-concluded inaugural SABR Analytics Conference this year, held right here in Arizona due to the presence of the Cactus League, awesome weather, and SABR's Arizona headquarters. Despite being the first-ever Analytics Conference held by the organization, it was clear that these guys meant serious business in arranging things. The speaker list was a who's-who of baseball figures - three MLB GMs, several top SABR ground-breakers, head figures of Baseball Think Factory, MLB.com, Baseball-Reference, Fangraphs, Baseball Prospectus, and, excitingly enough for SB Nation, Baseball Nation's own Rob Neyer... [::deep breath::] Derrick Hall, Tom Ricketts, Ken Rosenthal, Brandon McCarthy, and representatives from Bloomberg Sports. SABR did some serious work in setting up this conference, and that isn't just me schmoozing up to a group in which I'm now a member (a year of membership was included with registration).
With so much going on, I naturally spent the weekend feverishly taking notes, and a nice consequence of this has been the ability to write up a SABR Analytics Conference diary in the vein of the Bill Simmons draft diaries. Why such a heavy delay from the conference days to the postings? Well, considering how much detail I tend to put into what I write, I felt it necessary to ask SABR President Vince Gennaro if I could put this up on the site - and, of course, he graciously agreed. Throw in the fact that I was on break seeing my parents for the first time since Christmas, and hopefully the reason behind the delay is somewhat clear.
In a perfect world, I'd like these posts to be an opportunity for people to recognize the incredible advancements all around the sport, have a brief glimpse into some of the discussion that went on during the conference, and - to keep things fun - to have some humor. I can honestly say that I've taken in more information that I'll use in my career at this conference in three days than in any semester at ND, and hopefully the tales of my exploits can contribute, even in the smallest way, to the growth of the event. In reality, though, it'll probably just be a series of incoherent notes I wrote down connected by some nerdy jokes. After all, folks, I was just at an analytics conference. Any ounce of humor I exude for at least the rest of Spring Training is bound to be nerdy, or at least a bit unusual.
Note: Select recordings of the presentations are available online at http://sabr.org/analytics
Thursday morning: Registration.
Registration is slated to begin at 9:00 a.m., but the first panel starts at 1:30 p.m. I split the difference and arrive around 11:00, but find the registration process is over in five minutes and the area is generally unpopulated at this point, save for a couple dozen early birds (I imagine some of them were already staying at the hotel the event was held in). I find a small area of seating and settle in for a while, hoping to see some familiar faces to embarrass myself in front of in the coming couple of hours. There are enough people here, though, for me to notice one clear trend: the room has a large percentage of polo shirts and jeans - as I had insisted to my parents would be the case - and I'm wearing a button-down and tie and slacks. My dad expertly chimes in via text: "Those people have jobs." Touché, dad.
I find the first instantly-recognizable figure of the convention to my left, Baseball Nation's own Rob Neyer. Sweet.
Human Interaction 1: Not nearly as bad as I could have made it. From a couple-minute conversation, it's apparent that Neyer is truly down-to-earth and interested in people, and he works around my obvious - obvious to me, at least - nerves about randomly approaching people whose work I admire. If I had to pick someone among the crowd to stumble over my words a bit in front of, it seems as if I made a good choice. The first of several random encounters in the history books, I settle back in my seat and wait once again as people start to filter in.
Having settled back into my seat, I find myself surrounded by baseball discussion. The person sitting across from me is a new Marketing intern with an MLB club here to learn more about sabermetrics. The person to his left is a mathematician from Minnesota - and, naturally, a Twins fan. Another seat over is a fan and new member of SABR who is simply here for good weather and giggles. The conversation drifts from the Hall of Fame to the Twins and their Twins-y off-season (Jason Marquis, anyone?), the D-backs, a little South Bend Silver Hawks, and a lot of me rambling on like I tend to do (it's not just my writing that's wordy). All walks of life were represented, and it made the discussion all the better.
Before I know it, there's just 45 minutes until the first panel starts, and the numbers here have certainly started to increase. I decide to check the Twitterverse for some baseball news, but unfortunately discover that following even one NBA fan during the NBA Trade Deadline can single-handedly overrun a Twitter feed. At least I know that the Cavs turned a backup point guard into a first-round pick... or something.
Half-way through searching for any ounce of news in baseball, I hear someone say "there's free books over there." Oh, great, I love reading books (psyche! - though, to be fair, I doubt they'd be passing out Jane Austen novels). Still, I venture a look up to see what the literature is...
... WHAT?! FREE 2012 BILL JAMES HANDBOOKS?!?!
Poll the audience time: do you know what it's like to be deeply upset by the fact that it is not socially acceptable to sprint over short distances during moderately crowded conventions? I do. Now, sure, in retrospect there were a couple hundred copies laid out and there was little chance - approximately never percent - that they were going to get scooped up during my 50-foot walk to where they were displayed. In my defense, though, I didn't know this at first, and I really wanted a Bill James Handbook.
My diving into the handbook is briefly interrupted by my running into the University of South Carolina Case Competition team. For those who know nothing of the case competition, the basic idea of the competition was to get teams of college students - undergrad or grad school, primarily from business schools - and hand them a baseball operations analysis case on the Monday prior to the convention to run through. They were judged by a selection of pretty awesome baseball people - Tom Garfinkel (San Diego Padres), Lauren Prieb (MLB), Adam Cromie (Washington Nationals), Andrew Miller (Cleveland Indians), Shiraz Rehman (Chicago Cubs), Dave Studenmund (The Hardball Times), and Rob Neyer. After the first "round" of presentations, four teams were selected to move on to the final round, given a wrinkle in the case to re-analyze that night, and re-present their work with the wrinkle included as the final presentation of the conference.
Much as I would have loved to be involved in the competition, I a) had noticed the competition after the registration date, and b) wouldn't have been able to sign up as a one-person team anyway (thus requiring the Notre Dame business school to put effort into helping coordinate and arrange a team for an extracurricular project unrelated to big business... best of luck to anyone doing that). Immensely curious about what the project would have entailed had I registered, I approached the South Carolina team trying to get the scoop, but sadly found out that they were sworn to secrecy. Thankfully, I had mapped out a time to go check out some of the case presentations (they ran alongside some of the larger SABR-organized presentations), so a little patience would eventually ease my curiosity.
Thursday, 1:30 p.m.: First Panel
After some brief flipping-through of the Bill James Handbook, it was time for the first panel of the conference. The lineup is nuts: moderated by Sean Forman, the founder of the heavenly gift known as Baseball-Reference.com (though I would have been very interested to hear his answers as well), and featuring Dave Cameron of FanGraphs, John Dewan of Baseball Info Solutions, and Cory Schwartz of MLB.com. It's like baseball nerd heaven arrived on earth and I wound up on the guest list. The panel was titled the "Changing Face of Baseball Panel", and took a topic-by-topic approach to where advancements are being made, what is still uncertain, and where the new frontiers are in the statistical community.
The panel began with the topic that inspired the most discussion, the present and future states of defensive metrics. A few excerpts from the discussion:
- Dewan: We currently measure about 60% of defense with analytics, with the remaining 40% of the insight needing to be acquired and developed through scouting and seeing how a player performs in the field.
- Schwartz/Cameron: The future of the metrics is in measuring individual components of defense rather than relying on a single number to provide total defensive value. That means measuring things like reaction times, speeds to different directions, reliability, etc., and using these physical skills to provide more precise defensive value figures.
- Dewan: For modern defensive metrics, a three-year sample is typically needed to provide reliable results. Most of what I've heard around the 'net seems to suggest that a two-year sample is sufficient for conclusions to be drawn, but perhaps even that is an insufficient sample.
- The entire panel shared the hard-to-dispute belief that Field f/x is going to be awesome. Whether or not it becomes publicly-available is another question. Fingers crossed.
Next, the panelists were asked to discuss what they felt the next frontier in the game was: if defensive metrics are today's big hot-button issue, what will it be in five years?
- Cameron: Injuries and injury prevention. More specifically, how can things such as Pitch f/x data be used to analyze pitchers and how they throw, and can that data subsequently be used to help pinpoint problems in how pitchers throw and what causes them to get hurt?
- Schwartz: The psychology of the game. How do the pressures of big, new contracts and off-the-field issues affect performance? Can they be quantified or predicted (provided that one has the necessary information)?
As a final topic, the three panelists were asked what they each believe to be the biggest analytical mistakes made in the publications they see.
- Dewan: "Every stat is flawed" - in short, these flaws need to be compensated for and addressed when working with them. Additionally, Dewan noted that team chemistry is so rarely considered in analytic topics, and while chemistry is so difficult to quantify, that doesn't mean that it shouldn't be considered at all.
- Schwartz: "There are no absolutes" - no individual number, whether batting average or fWAR, is able to answer every question.
- Cameron: "An over-reliance on the results as true talent" - just because a player posts a 2.0-win season doesn't mean that he's actually a 2.0-win player. That is the most likely outcome of all possibilities, but the sum of the likelihoods of all other possibilities greatly outweighs the likelihood of that initial figure being exactly correct. Of the three responses to this question, this one is what I find most pertinent to me, as I find that I make this assumption quite a bit. Does this mean there's a better alternative? Probably not. Still, I do believe that people were perhaps too quick to assume, for instance, that Gerardo Parra and Jason Kubel are players of similar caliber given their 2011 seasons.
Thursday, 3:00 p.m.: First Presentation
After another foray into human interaction, this time a pleasant self-introduction to John Dewan of BIS - a company that has proven to be a great stepping-stone for young baseball minds to get into the industry - the conference jumps right into the second segment of the day, with a choice of two lectures for those attending. In one room, Dave Studenmund of The Hardball Times was slated to discuss post-season WPA figures, providing a full accounting for the individual plays and players in post-season history who did the most to further their team's chances at winning a World Series Championship.
However, I chose to go to the other discussion, a presentation from SABR president Vince Gennaro entitled "Top 10 Value Plays or Building a Roster". Now, I certainly don't want to steal Mr. Gennaro's thunder (and make this a 10,000 word write-up) by simply posting the entire presentation here verbatim, but I'll provide a basic outline as to what the talk covered.
Gennaro used $/WAR as his definition of value for the presentation (I want to say it was specifically fWAR, but, having not written it down, don't quote me on this), and looked at a few ways to try to maximize this figure based on history and/or hypothetical situations when looking to fill a roster hole. The basic groups of moves were as follows:
1) Within a category he referred to as "Exploiting Data Biases", Gennaro set out a few examples of where the industry - or, at least, the public domain/blogosphere - may not be seeing hidden value that lies in a player's performance. The most striking to me was his heightened valuation of bulk innings from starting pitchers. Gennaro noted something that, while rather obvious and true upon hearing it, isn't something that I think most people would consider - I certainly hadn't - without being prodded to consider it: if a team's starters go, on average, one-third of an inning deeper into games, they can cut the total bullpen workload by an entire middle reliever.
After all, 1/3 of an inning * 162 games = 54 innings, a workload eclipsed by only three full-time Arizona relievers in 2011 - those being J.J. Putz, David Hernandez, and Micah Owings (though Brad Ziegler also notched 58.1 innings combined between Arizona and Oakland). Further, cutting out those early middle relief innings typically means that you're trimming some of the least-effective relief work done by the bullpen, so it can eliminate some of the innings that are closest to replacement-level.
For the D-backs, those would have been the innings of Zach Kroenke, Kam Mickolio, Juan Gutierrez, Yhency Brazoban, Aaron Heilman, and Ryan Cook, all of whom had ERAs of 5.40 or higher. In fact, if you add all of their results together, you get a combined 6.58 ERA (57 ER) in 78 innings of "work". Now, it's certainly presumptuous to say that you'll always be replacing the worst innings by having starters go deeper, and Arizona is already good about giving their starters long leashes. Nonetheless, there's something to be said for someone who can give you seven innings of a 4.00 ERA beyond simply his ability to chew those innings.
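For anyone who wants to double-check the arithmetic in the last two paragraphs, here's a quick Python sketch. It uses only the figures quoted above; the function names are my own, made up purely for illustration.

```python
GAMES_PER_SEASON = 162  # regular-season games per team


def innings_saved(extra_innings_per_start, games=GAMES_PER_SEASON):
    """Bullpen innings saved over a season if starters average this much deeper."""
    return extra_innings_per_start * games


def era(earned_runs, innings):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings


# One-third of an inning deeper per start, over a full season.
saved = innings_saved(1 / 3)

# Combined line of the six struggling D-backs relievers: 57 ER in 78 innings.
combined_era = era(57, 78)

print(f"Bullpen innings saved: {saved:.0f}")
print(f"Combined ERA: {combined_era:.2f}")
```

Note that this treats all 78 innings as whole innings, which is how the figure is quoted above; partial innings would need converting from the ".1/.2" box-score notation into thirds first.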
2) Gennaro's second subgroup was labeled "Inefficient Pricing of Wins", essentially a catch-all for what the real purpose of Moneyball was supposed to be - market inefficiencies (as opposed to Peter Brand, making a villain of scouts, and pensive Brad Pitt glances). The specific points made in this segment are less important to individually highlight, at least in my mind, because they're rapidly changing. What Mr. Gennaro thinks is a market inefficiency today could be overcompensated for tomorrow. However, one solid point was worth keeping in the memory banks: (since much of the league's pitching is right-handed) when in doubt, buy cheap left-handed hitting (Lyle Overbay, anybody?).
3) The third subgroup of tactics, "Optimizing Timing", had two basic components. The first was something we've seen a lot of this off-season, particularly in the evolution of the pitching markets - wait until mid-February to sign free agents. Somewhere, Jonathan Papelbon is still chuckling at Ryan Madson. What was more surprising to me was Gennaro's work in studying when teams can truly develop a level of certainty as to whether or not they can be contenders for the post-season. While we often see teams waiting until right up to the trade deadline to pull the trigger, Gennaro suggested that teams learn very little from one week into July until the time of the trade deadline about whether or not they're truly likely to be post-season contenders. By making moves earlier than right up against the deadline, their effects (and the value of the rental players and packages offered in return) can be maximized.
4) The final subgroup is another simple one, "Buying Risk at the Right Price". Since baseball is a controlled environment where downside can be limited by benching players, the downsides of highly-risky players can be mitigated while their upsides can be enormous. A fairly basic concept, but certainly an important one.
Thursday, 3:45 p.m.: Second Presentation
Again, it was right from one talk into the next, with another pair of presentations to choose from. Figuring I'd had a solid supply of defensive metric intake already from some serious experts in the field during the first panel, I chose to skip Brian Cartwright's presentation of "Counting Defense: Extending Defensive Efficiency Rating to the Player Level". Instead, I opted for J.C. Bradbury's "Impact of Pitch Counts/Rest on Pitcher Performance", a synopsis of a study done by Bradbury with the help of Baseball-Reference.com founder Sean Forman's heavenly data supply.
Bradbury, a professor at Kennesaw State University, didn't present an exposé on pitcher injury rates due to pitch counts and days of rest - I can't even begin to imagine how someone would try to conduct that study - but instead looked strictly at the effects on subsequent performance. Still, Bradbury's work was perhaps the most informative event of the conference, and certainly was the most informative research presentation I sat in on.
While there were several points of Bradbury's presentation, the two I found most interesting were the results of how a previous game's pitch count - or the pitch count of a trailing sample of five or ten games - affected the ERA of a pitcher in his subsequent outing. Using a median pitch count number of 99, Bradbury found that each additional pitch thrown above (or below) the 99 pitch mark had the effect of raising (or lowering) the pitcher's expected ERA in the subsequent game by 0.007. Additionally, every additional pitch over 99 added to a pitcher's trailing five-game average raised his ERA by 0.014 in the subsequent game, and every additional pitch over 99 added to a pitcher's trailing ten-game average raised his ERA by 0.021, so the effects do build over longer time spans.
Certainly, this seems somewhat insignificant at first glance, but consider the effect that letting a pitcher go for a complete game in a blowout could have on the subsequent outing. If the pitcher needs 25-30 pitches to get through that last inning, that raises his expected ERA by 0.175-0.210 for that next outing. Is a mostly-meaningless inning of work worth that slightly-meaningful ERA change? Do the psychological rewards of completing the game offset the physical detriment associated with the additional pitches? What about going to the extremes, such as with whether or not Edwin Jackson should have been kept in for his 149-pitch no-hitter?
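To make those numbers a little easier to play with, here's a rough Python sketch of the per-pitch figures as I wrote them down. Big assumption on my part: it treats the effect as a straight line at every pitch count, which may not be how Bradbury's actual model works, and the names are mine rather than his.

```python
MEDIAN_PITCHES = 99  # median pitch count used as the baseline

# Expected ERA change in the next outing per pitch above (or below) the median,
# as quoted from the presentation.
ERA_PER_PITCH_LAST_START = 0.007
ERA_PER_PITCH_FIVE_GAME_AVG = 0.014
ERA_PER_PITCH_TEN_GAME_AVG = 0.021


def era_shift(pitches, per_pitch=ERA_PER_PITCH_LAST_START, baseline=MEDIAN_PITCHES):
    """Expected ERA change next outing, assuming a purely linear effect."""
    return (pitches - baseline) * per_pitch


# An extra inning costing 25 to 30 pitches on top of a median outing.
low = era_shift(MEDIAN_PITCHES + 25)
high = era_shift(MEDIAN_PITCHES + 30)

# A 149-pitch outing, if the straight line really holds that far out.
extreme = era_shift(149)

print(f"Extra inning: +{low:.3f} to +{high:.3f}")
print(f"149 pitches:  +{extreme:.3f}")
```

Passing `per_pitch=ERA_PER_PITCH_FIVE_GAME_AVG` (or the ten-game constant) with a trailing average in place of a single-game count gives the longer-span versions of the same estimate.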
Bradbury then broke out the results by age group, with the real distinctions coming between the 34-and-younger age group and the over-34 age group. Over the entire sample, an additional 38 pitches over the 99-pitch median were needed in order to cause a rise in ERA of 0.25 in the subsequent start. However, while the younger group saw that spike after just 33 additional pitches in the previous start, the group over 34 years old could throw an extra 58 pitches in their previous start before seeing their ERA rise by 0.25 in their subsequent outing.
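One way to compare the age groups is to back out the per-pitch rate each of those thresholds implies. This is my own back-of-the-envelope conversion in Python, not something from the presentation, and it assumes the effect is linear between the median and the threshold.

```python
ERA_RISE = 0.25  # the ERA increase used as the benchmark

# Extra pitches above the 99-pitch median needed to produce that rise.
extra_pitches_needed = {
    "all pitchers": 38,
    "34 and younger": 33,
    "over 34": 58,
}

# Implied ERA change per additional pitch, assuming a linear effect.
implied_rate = {
    group: ERA_RISE / pitches for group, pitches in extra_pitches_needed.items()
}

for group, rate in implied_rate.items():
    print(f"{group:>15}: {rate:.4f} ERA per extra pitch")
```

The full-sample rate comes out in the same neighborhood as the 0.007 per-pitch figure quoted earlier, while the over-34 group's implied rate is a bit more than half of the younger group's.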
This, naturally, led me to wonder: is the reason for this high tolerance for pitch-count shocks the fact that the pitchers have aged, or have these pitchers been able to survive past age-34 because they are capable of tolerating pitch-count shocks throughout their career? I approached Bradbury after his talk and discussed the possibility of back-tracking that sample of pitchers through their earlier years to see if that group of pitchers had always shown an ability to handle high pitch-count shocks (as I would suspect), or if it was acquired in their later years. He seemed intrigued by the idea and suggested that it wouldn't be particularly difficult to figure out (he indicated that a little programming with the database he already has could get it done), so I'll follow up on it and see what further results he comes up with.
Thursday, 5:00 p.m.: Bloomberg Sports Presentation
Immediately after this was the presentation by Bloomberg Sports about their new team analytics database service. While I certainly received only a small glimpse of what their product can do, and the typical Joe College like me could never afford the package that MLB clubs are receiving from Bloomberg, what I saw was incredibly impressive. Sadly, I wouldn't know where to begin if I were to attempt to describe the whole thing - even by my lengthy word count standards, it'd be too much - so I'll simply divulge the coolest feature (as I see it).
Bloomberg Sports' tools allow teams to sort through individual pitches thrown/seen (for pitchers/hitters) by nearly any query imaginable, and then have every single pitch linked up to video clips that can be accessed in seconds. It's awesome. They've also created a personalized iPad app for players that allows the players to review their personal track record by individual pitch location, type, and result (with corresponding video) against the team/lineup/pitchers they're facing on any particular day. Add it all together, and you have a tool that has uses at literally every level of baseball operations. As of now, Bloomberg Sports has already established partnerships to provide this software to 24 MLB teams, and it's not hard to see why there's been high demand.
Thursday, 9:00 p.m.: Second Panel
The final discussion of the day was dubbed the "Player Panel". However, it seems that most players in Arizona for Spring Training were slightly more preoccupied with trying to make their respective rosters than discussing sabermetrics, as the "panel" wound up being a two-person affair, with Baseball Nation's Rob Neyer chatting with Oakland A's starter Brandon McCarthy. After Neyer introduced McCarthy and recounted the absurd numbers he posted in 2011 (to the tune of an attendee audibly snoring behind me), they dove into a discussion about McCarthy's use of analytics, the sudden career renaissance he experienced with Oakland, and - of course - the hilarity of the now-defunct (still accessible, just no longer updated) FireJoeMorgan.com. McCarthy was remarkably candid in the interview, and Neyer's questions were to-the-point. Outside of their discussion of analytics and their practicality at the player level, the duo freely discussed McCarthy's early-career home run problems, the psychological repercussions of working in relief, and how McCarthy went through the process of remaking himself.
Starting his big-league career as a highly-touted prospect, McCarthy suggested that he wasn't expecting to run into some of baseball's harsh realities. Relegated to relief work after his trade to Texas, McCarthy ran into troubles with arm injuries while dealing with the transition to short stints, and also simply wanted to be a starter. Further, the fly-ball-heavy style he had employed throughout his career was taking a serious toll on his numbers - as McCarthy put it (paraphrasing), "you don't need advanced metrics to tell you that home runs are bad". Fed up with fly balls, McCarthy said that he simply looked at what Roy Halladay was doing and told himself "I want to do that", believing Halladay's physical abilities to be within his own scope of possibility.
McCarthy thus began altering his arsenal to fit the Halladay sinker/cutter/breaking ball mold, despite the Rangers' pitching coach, Mike Maddux, not supporting the transition (according to McCarthy). Yet, we know how the story ends - McCarthy's ground ball rates were awesome in 2011, and his ability to miss a good number of bats while utterly refusing to issue walks made him one of the best pitchers in the game. McCarthy also expanded on his use of Pitch f/x data, particularly in how he analyzes the movement on his sinker and cutter, looking back on the movement measurements of his pitches to pinpoint problems he could be having in his mechanics, and then using the data to help identify the appropriate way to correct the movement of his pitches.
Thursday, 7:00 p.m.: Networking Reception
With the first day wrapping up, Arizona's own Derrick Hall took a moment to speak to those at the conference and highlighted just how integral Arizona is becoming in the baseball community. Between an NL West Pennant, the Cactus League, the 2011 All-Star Game, 70 to 80-degree weather, the new SABR HQ in the Valley, and the start-up of the Analytics Conference, it's been an awesome year for a city that's becoming something of a Spring Mecca for baseball fans.
Upon the conclusion of Hall's brief speech, it was time for the Networking Reception, a gathering that featured practically the entire collection of college-age attendees, a handful of gracious baseball executives willing to offer some advice, a cash bar, and enough spring rolls to feed a small army - even one that had spent the last seven hours in a non-stop series of baseball lectures and panels. The baseball executives started to filter out first, as I'm sure they had, y'know, jobs to attend to, with a large chunk of the rest of the group following shortly thereafter. There was one common denominator among those at the reception: they enjoyed talking about baseball. You know I'm all for that.
The hubbub at the Networking Reception began to fade around 9-9:30, and after eight hours of baseball, the first day of the conference came to an end. That's right, folks, all that action was merely a third of the conference schedule. If you love baseball, have even the slightest interest in statistics, live in the valley, and have a registration fee's worth of change burning a hole in your pocket, next year's conference is certain to be well worth it. If you're still not convinced, make sure to check out my forthcoming recaps of days two and three - there's plenty more goodness to come.