Evaluating baseball managers - a chat with Chris Jaffe

As we saw in Arizona last year, managing a baseball team is not the most secure of professions. Indeed, the managers of both teams in the 2007 NL Championship Series, lauded for taking their mid-market outfits as far as they did, found themselves out of work during the first half of the 2009 season. It's probably fair to say that managers get a lot of credit for things for which they are not responsible. Yet, equally, they get blamed for things over which they have little or no control. Is it possible to be more rigorous in analyzing a manager's performance?

Chris Jaffe, best known for his work on The Hardball Times, thinks so. His new book, Evaluating Baseball's Managers, takes a look at 89 managers, most of whom spent ten seasons or more as a team's primary manager. He uses various metrics to establish each manager's tendencies - fan of small ball, or prefers to wait for the three-run homer? - and then discusses the approach each used, some of the key issues faced during his tenure, and how well they were handled. It's an approach that breaks new ground in one of the less-explored areas of sabermetric research, and Jaffe was kind enough to answer a few questions on his work for the SnakePit.

Where did the idea come from?
In the summer of 2005 I attended SABR’s annual conference in Toronto.  While there, I saw a presentation by a sabermetric researcher named Phil Birnbaum, who had assembled a database which determined how teams over/underachieved in the course of a single season.   

The parts of his database that really caught my eye were two algorithms he'd created to figure if a player under/overachieved in a given season by looking at how he did in the surrounding seasons.

Say you wanted to figure how Luis Gonzalez's 2001 performance compared with the surrounding seasons. First, take his performances in 1999, 2000, 2002, and 2003, calculate Runs Created for them, and adjust those numbers for park and league average. You use them to figure how he should have done in 2001 by taking a weighted average of those seasons. (By weighted I mean the inner years – 2000 and 2002 in this example – are worth twice as much as the outer ones, 1999 and 2003 here.) Then regress to the mean, adjust for Gonzalez's playing time in 2001, and you find he overachieved by about 40 runs in 2001.
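To make the mechanics concrete, here is a minimal sketch of that hitter calculation in Python. The double weighting of the adjacent seasons and the regression-to-the-mean step follow the description above, but the regression fraction, the function name, and every number below are illustrative placeholders rather than Birnbaum's actual constants or Gonzalez's real adjusted figures.

```python
def expected_runs_created(outer, inner, league_rate, playing_time, regress=0.20):
    """Estimate what a hitter 'should' have produced in the target season.

    outer       -- park/league-adjusted RC rates from two seasons away (e.g. 1999, 2003)
    inner       -- adjusted RC rates from the adjacent seasons (e.g. 2000, 2002), weighted double
    league_rate -- league-average RC rate, used to regress toward the mean
    regress     -- fraction regressed to the mean (placeholder value)
    """
    weighted = (2 * sum(inner) + sum(outer)) / (2 * len(inner) + len(outer))
    regressed = (1 - regress) * weighted + regress * league_rate
    return regressed * playing_time

# Illustrative rates (runs created per plate appearance), not real data
expected = expected_runs_created(outer=[0.19, 0.18], inner=[0.20, 0.19],
                                 league_rate=0.17, playing_time=680)
actual = 0.26 * 680  # hypothetical adjusted Runs Created in the target season
print(f"Over/underachievement: {actual - expected:+.0f} runs")
```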

Phil had a similar formula for pitchers, based on another Bill James stat – Component ERA. James described this stat as the Runs Created against a particular pitcher, and that's pretty much what it is, thus a good one to use alongside Runs Created itself. The algorithms are by no means perfect, but they give a dang good quick'n'dirty idea of how a player over/underachieved. As an added bonus, since Birnbaum's formula uses previous and succeeding seasons, the player's course on the aging curve is accounted for. (Not perfectly accounted for, but acceptably so.)

These formulas struck me as brilliant. I've had the sense that in-depth baseball analysis has missed a lot of what makes managers important. I've seen numerous terrific studies of managers looking at their ability to fill out a lineup card, the feasibility of bunting, or many other similar factors. The ability to handle in-game strategic notions has been the best-analyzed part of the game, in part because it's the easiest to analyze.

That's struck me as a bit off, though. My contention is that managers are primarily managers of men, and only secondarily managers of the game. Aeons ago, Rob Neyer wrote a column at ESPN where he asked a GM or two what they discuss when interviewing would-be managers, and was told communication was the top item of interest. At SABR's 2008 convention, Cleveland GM Mark Shapiro said the most important parts of the job were communication, self-awareness, and prioritization. All of those things deal with the clubhouse, not in-game strategy. I don't think any of these insights are particularly shocking, but it's still worthy of note – the behind-the-scenes stuff matters the most, even though it's the hardest for us to study or quantify.

I think the Birnbaum Database gives us an insight into this, however. It can teach you about both coaching individual players and handling the overall roster.

That said, one key bit of irony exists: Phil called his database the "Luck Database." His presentation at SABR in 2005 used it to determine the luckiest teams ever. I think luck clearly plays a role, but I think it's incorrect to say all variation is luck. For example, if you look at teams run by Earl Weaver or Bobby Cox in the database (Phil has been nice enough to share his full results with me), they constantly overachieve. Alternately, Don Baylor's teams always do terribly. Some results are surprising, but there is a sense that managers matter.

I actually ran a quick test to see if there was any evidence the Birnbaum Database (I call it that because I think it covers more than just luck) tells us anything about managers. I took the entire run of the database – from the 1890s until the 21st century – and divided all games into four categories: 1) games where the team's skipper lasted 2000+ games in his career, 2) those helmed by a manager who lasted 1000-1999 games, 3) 500-999 game managers, and 4) 499 or fewer.

My contention was simple: if the Birnbaum Database indicated managerial skill, it would look different than if it indicated luck. In both cases, the 499s would do worst – they were either terrible or unlucky. However, if it's skill, the 2000s should be best. If it's luck, they should be the most even – closest to the middle – because managing 2000 games is a 13+ season sample size, and 2000 games is just the minimum requirement for making the group. If it's skill, the gap between the 500s and 1000s should be the smallest, since they're the two most average groups of managers. If it's luck, the 2000s/1000s gap should be smallest – again, sample size.

All the above rests on one assumption: there is some sort of correlation between length of tenure and skill of manager.  As assumptions go, that’s one I feel extremely safe in making.  Some real dogs last a long time and some short-career guys should have lasted longer, but on the whole the 2000s are a lot better than the 499s.  The results came back verifying the notion that the Birnbaum Database indicates managerial skill.
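As a rough illustration of that test, here is a hypothetical Python sketch that buckets team-seasons by the eventual career length of the skipper and averages their Birnbaum-style over/underachievement. The data layout and the sample numbers are made up; only the grouping logic mirrors the description above.

```python
from statistics import mean

def tenure_bucket(career_games):
    """Bucket a manager by total games managed over his career."""
    if career_games >= 2000:
        return "2000+"
    if career_games >= 1000:
        return "1000-1999"
    if career_games >= 500:
        return "500-999"
    return "499 or fewer"

def bucket_averages(team_seasons):
    """team_seasons: iterable of (skipper_career_games, runs_vs_expectation) pairs."""
    buckets = {}
    for games, runs in team_seasons:
        buckets.setdefault(tenure_bucket(games), []).append(runs)
    return {label: mean(values) for label, values in buckets.items()}

# Hypothetical team-seasons, not real results from the database
sample = [(2500, 18), (2100, 7), (1500, 4), (1200, -2), (800, 1), (600, -3), (300, -12)]
print(bucket_averages(sample))
```

If the bucket averages climb steadily with tenure (and the 500-999 vs. 1000-1999 gap is the smallest), the pattern points toward skill; if the longer-tenure groups all sit near zero with the 2000+/1000-1999 gap smallest, it points toward luck.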

What tools do you use to separate a manager’s performance from other causes?
Actually, I start from the premise it’s impossible to separate a manager perfectly from his environment.  In fact, I think it’s a fundamentally false notion to do so.  A manager’s value is less about himself as a singular individual and more about how he works in his given environment.  

There are a bunch of numbers in the book – from the Birnbaum Database, from my own creation called the Tendencies Database (which looks at how managers handled specific portions of the game), from various other places – but none of the numbers inform us exclusively about the manager. I think this is one reason why there has been limited sabermetric focus on managers aside from in-game tactics.  It’s tough to get a perfectly clear view.  I can accept that because imperfection is not a synonym for useless.   Heck, the Birnbaum Database itself was first created to look at luck, not managerial skill.  And in fact, when looking only at one season, it does tell us far more about luck than anything else.  

If I can't separate a manager from other influences, I can at least try to limit them. One way I try to do this is with sample sizes. As noted above, one Birnbaum'ed season mostly tells you about luck, so I rarely look at just one season's worth of data in it. I look at career-length info. When you have 2000 or so games for someone like Davey Johnson and it shows he got a lot out of his individual hitters, that's a bit more meaningful. Sample sizes come in especially handy when using the Tendencies Database. I'll only run a manager through it if he lasted at least ten seasons as a primary manager.
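A minimal sketch of that sample-size guardrail, under the assumption that you have per-season Birnbaum-style numbers for each manager; the record layout, names, and figures below are hypothetical, not drawn from the book.

```python
def career_summary(manager, seasons):
    """Aggregate hypothetical per-season results over a full career.

    seasons: list of dicts like {"games": 162, "runs_vs_expectation": 12}
    """
    return {
        "manager": manager,
        "games": sum(s["games"] for s in seasons),
        "runs_vs_expectation": sum(s["runs_vs_expectation"] for s in seasons),
        # Only managers with ten or more seasons as a primary manager get
        # run through Tendencies-style comparisons, per the threshold above
        "eligible_for_tendencies": len(seasons) >= 10,
    }

# Made-up twelve-season career, purely for illustration
print(career_summary("Hypothetical Skipper",
                     [{"games": 162, "runs_vs_expectation": 5}] * 12))
```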

I'll also mostly look at the extreme results at either end. A manager who ranks third best ever with individual hitters or pitchers probably has some luck going for him – but more than that, he must have been doing a heckuva job with those men. Mostly, though, I try to separate the signal from the noise by just being aware of possible complicating factors. I've always been a firm believer that the analysis really starts after you have the numbers, not in coming up with the data.

You say managers are first and foremost managers of men.  Managing the game is only a secondary job function.  Were you able to take this into account in your analysis?  Is it possible to measure objectively the former skill?
It'll never be possible to perfectly analyze managing's "softer skills" side (for lack of a better way of putting it). A certain degree of uncertainty and inexactitude will forever remain. Personally, I'm more interested in the broad picture than the exact details, though. It's more a matter of dealing with the static than eliminating it.

Could you detect any common patterns in the approaches taken by good and/or bad managers?
It's like I tell my students when they ask me if they have a good idea for a paper: any idea can go wrong, and any halfway decent idea can go right. You can find small-ball managers and take'n'rakers. Those who lead with pitching and those who lead with offense. Bullpen backers and heavy users of their rotation. Some use their bench and others rely on their starters. Any particular approach can be used successfully, and they can all be used unsuccessfully.

In his book on managers, Bill James says that the only indispensable quality for a manager to have is the respect of the players.  I’d agree with that.  Heck, I'd better – that quote begins the book, just before the acknowledgments page.  

You said of former Arizona manager Buck Showalter that he "does not like small ball: his batters do not bunt and his runners do not steal," while noting that he had the worst record in one-run games among ten-year managers since 1920.  Do you think these are connected?
It could be, but I'm a bit skeptical. He's 220-244: if you flip a dozen of those 464 games, he's at .500. A lot of times, the info I put in the "Team Characteristics" section of each manager's commentary (which is where the Buck Showalter one-run game thing can be found) is merely an interesting tidbit. One of my favorites is noting that Lou Piniella presided over more 50-double performances than any manager in history. It isn't terribly important, but it's fun.

Arizona got a lot of flak last season for hiring AJ Hinch, a man without previous managerial experience. How much did your studies show that experience makes a difference to managerial success?
It’s always better to have experience, but then again everyone’s got to get started somehow.  The only real problem I can see is if it costs him the respect of players.  If he has enough personal qualities, that shouldn’t be an issue, and it might not be anyway depending on the mix of men they have in the clubhouse. You and all your readers would know more about this than I would for the Arizona squad.

Looking across history, being a manager without experience is more a matter of the era. Way back in the day, managers were often player-managers, and not always particularly old players, either. Fred Clarke was less than 25 when he first filled out a lineup card, for instance. I'm not sure any team would dare try that now. A team can always veer from the norm in hiring managers (in some ways it's an advantage if you think the norm doesn't mean too much), but with it will come some extra baggage that has to be dealt with.

In terms of games won or lost per year, how much difference do you think a good manager now makes over a bad one?
In general, a few/couple games.  Most managers aren’t that wildly different from each other.  That said, it could potentially be more.

Well, sort of.  What I mean is that a manager can be worth more based on how he interacts with the team. In and of himself he isn’t worth more than what I said above, but he can make a substantially larger impact than that.  Get the right man in the right situation, and you can see a Billy Martin-esque improvement.  Alternately, the wrong man in the wrong slot can kill a team.  

Who is the best manager, both all-time and currently?
Easy one: Joe McCarthy. He may not be as closely associated with the job as John McGraw, but there was none finer than McCarthy. In over 20 seasons of managing, he never had a losing season - not even in his partial seasons. That's almost impossible. In comparison, among the dozens of guys with more than five seasons managed, only one other guy always had a winning season - and McCarthy kept it up for over 20 years. He had a plan to win, implemented it as effectively as possible, and kept doing it year after year, with more than one franchise.

Tony LaRussa is the best manager since World War II. He's been consistently among the best managers in the game for almost three decades now, continually getting the most out of his players year after year, team after team. At this point he's managed more games than John McGraw: did anyone ever think they'd see a manager do that?

Evaluating Baseball’s Managers
A History and Analysis of Performance in the Major Leagues, 1876–2008
by Chris Jaffe
ISBN 978-0-7864-3920-1
tables, appendices, bibliography, glossary, index
333pp. softcover (7 x 10) 2010
Price: $39.95 - Buy from publisher

For more information and some excerpts, please check out the book's website.