We’re just a few days out from the annual announcement of the results from the Hall of Fame balloting. These are going to be a tense few days for Trevor Hoffman.
After Hoffman fell just 1 percentage point below the 75 percent threshold for enshrinement last year, he seemed like a shoo-in to go over the top this time around. However, Hoffman’s gains have thus far been marginal. According to Ryan Thibodaux’s Hall of Fame tracker, Hoffman has 77.8 percent affirmative votes on the ballots that have been revealed. That means he needs to get a thumbs-up on 72.6 percent of the ballots that haven’t yet been revealed. It seems like he’s in good shape, but Hoffman is on high alert until the numbers are official.
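For anyone who wants to check the math on the unrevealed ballots, here is a rough sketch. It uses the 154 votes and 77.8 percent the tracker reports for Hoffman. The total number of ballots is not given in this piece, so the 427 below is purely a hypothetical figure, picked because it roughly reproduces the 72.6 percent cited above. The function name is made up for illustration, and the calculation ignores the fact that votes come in whole numbers.

```python
def required_share(yes_so_far, revealed, total_ballots, threshold=0.75):
    """Share of the unrevealed ballots a candidate must win to reach the threshold."""
    remaining = total_ballots - revealed
    if remaining <= 0:
        raise ValueError("no unrevealed ballots left")
    # Votes still needed to reach the threshold across all ballots.
    needed = threshold * total_ballots - yes_so_far
    return max(0.0, needed / remaining)


yes_so_far = 154                      # Hoffman's votes, per the tracker
revealed = round(yes_so_far / 0.778)  # ballots implied by a 77.8 percent share
total_ballots = 427                   # ASSUMPTION: hypothetical electorate size

share = required_share(yes_so_far, revealed, total_ballots)
print(f"Revealed ballots: {revealed}")
print(f"Needed on the rest: {share:.1%}")
```

Plug in a different total and the required share moves with it, which is why the number keeps shifting as more ballots become public.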
If Hoffman does get in, he’ll become the sixth player to earn a spot in Cooperstown based primarily on his work as a reliever. Or maybe he’s the fifth. We know he’ll join a group that includes Rich Gossage, Hoyt Wilhelm, Rollie Fingers and Bruce Sutter. Dennis Eckersley probably belongs in that group as well, but it’s hard to say for sure. After all, from 1975-79, Eckersley ranked second in the AL in pitcher WAR — as a starter. He wouldn’t have made the Hall as a starter though, and it’s somewhat debatable if Eck would have gotten in solely as a reliever. Instead, we consider both.
Still, it was definitely Eck’s influence as a closer that put us on the road to where we are today in terms of voting pure relievers into Cooperstown, a notion with which not everybody is on board. At the same time, the balloting the past two years has told us something else: We still don’t have a great idea what to do with the Hall of Fame candidacies of relievers.
That story is told in part by Hoffman but also by the other great reliever on the ballot, a guy whose tracker data already tells us he’s not getting in this time around: Billy Wagner.
A lack of context
The La Russa-Eck-Duncan revolution was based on the invention of the one-inning closer: a relief ace who was no longer a fireman in the classic sense but a security blanket, the light at the end of the tunnel that is left burning until it’s needed to lead the way through the final leg of a game’s journey. We may well be in the midst of another widespread evolution of bullpen deployment.
If so, that will leave that rarefied era — from roughly halfway through the 1987 season through now, just a bit over three decades — as a relatively short one. In this era, the Eck model has come to dominate how we think about those pitchers we’ve come to know as closers. On the scale of baseball history, it’s a blip on the radar, a blink of the eye. Any hard-core baseball fan who is at least 40 has at least scattered memories of how it was before Eck came along. Eckersley didn’t retire until 1998 and he wasn’t eligible for the Hall until 2004, when he was elected with 83.2 percent of the vote on his first appearance on the ballot. It’s just not that long ago.
When it comes to Hall of Fame voting, most writers have always had the luxury of comparing the records of the names on the ballot to those through history, and to those who have previously been enshrined. However, relievers in general and closers in particular don’t afford that luxury. They just haven’t been around long enough, not in their contemporary form, and we don’t have the historical context with which to measure them.
Plus, our tools for doing these evaluations have never been that good. Or should I say tool: saves.
Until a few years ago, saves were the main criterion we had when looking at closers. Oh, you might look at save conversion percentage or something, or even ERA, but the data point was saves. That was true when it came to things such as free agency and arbitration hearings. But it was a flawed data point, especially when it comes to Hall balloting.
The save, as you probably know, is another fairly recent invention, coming into its initial existence in the mind of Hall of Fame writer Jerome Holtzman at the end of the 1950s. It wasn’t adopted as an official statistic by MLB until 1969. So when we talk about the saves total of, say, Wilhelm, we’re only able to do so because we’ve gone back and calculated those numbers after the fact. Relievers didn’t realize they were achieving such a thing as a “save” until the 1960s, if then.
We plodded along with saves until, really, about 2006, when the publication of “The Book” first brought leverage-based statistics into the light. These days, you hear everyone from the writer in the press box to the ball boy standing next to the dugout refer to “high-leverage situations.” Before “The Book,” the advent of Fangraphs.com shortly thereafter, and the adoption of the full suite of leverage-based statistics at baseball-reference.com, no one could have told you what the hell a high-leverage situation was. Now it’s standard jargon.
Before we had leverage-based metrics, you had to wade pretty far into the weeds to understand the smarter arguments made against the Eck-inspired practice of saving your best bullpen arm for such a narrow band of opportunities, many of which ultimately don’t have that much impact on winning games. The mainstreamers had only saves, and because teams paid big dollars for them, an entire universe of myths emerged around the save statistic that would have been only moderately out of place in a George R.R. Martin novel.
One such myth that still persists is that it is a rarefied human who can excel under the glare of the ninth-inning spotlight, if the game is close and the stakes are high. To be sure, those are clutch, game-turning moments, at least when the pitchers fail. But we see a dozen examples every season of a pitcher previously thought to be unworthy of ninth-inning anointment suddenly morphing into a “proven closer” — like a caterpillar turning into a butterfly. Teams don’t buy (literally) these myths much any more, even though most players and managers still insist — probably correctly — that there is something different about pitching the ninth inning.
But what we’ve learned is that there are more effective relievers capable of getting those precious last outs than there are jobs to be had as closers. That, as much as anything, explains why the dollars spent in this year’s free-agent class have shifted toward relievers who are simply good pitchers, regardless of their previous staff roles. Eventually, if current trends continue, we’ll have to tackle a whole new set of questions when it comes to evaluating the careers of someone like Cleveland Indians super-reliever Andrew Miller.
The old numbers
Saves are a poor evaluative statistic, but when it comes to the cases of pitchers such as Hoffman, Wagner and poor Lee Smith, who fell off the ballot last year, they at least can help identify relievers who were durable and relatively consistent over a long period of time. Those are key traits to look at when it comes to a position that sees more attrition and year-to-year volatility than others.
But saves can be only a starting point because the problem with them is that they are entirely contingent upon opportunity. If a manager sees you as a closer, then you’re a closer. If you’re the best reliever in baseball — Miller is the perfect example — and have a certain amount of versatility, you may not be a closer at all, or only when the situation calls for it. You don’t rack up saves because your impact is maximized by use in earlier tight-game situations when the leverage is high. You’re not wasted on a three-run game when most competent pitchers can trot into a bases-empty situation and get three outs.
These days, if we drop an old measure, like saves, the answer is often to just use WAR. Problem solved, right? Well, not if we want to evaluate relief pitchers. If we used WAR as the sole criterion, virtually no pure relief pitcher would ever find his way to Cooperstown.
Among pitchers in the Hall of Fame, you’ve got three players at the bottom of the WAR leaderboard who are in the Hall for reasons other than major league pitching — Babe Ruth (hitting), Satchel Paige (a long, glorious career in the Negro Leagues) and Hank O’Day (umpiring). If you remove them, the two players holding up the leaderboard are Fingers and Sutter, who together compiled fewer WAR (49.56) than 56 pitchers who are enshrined.
During his 12 years as a closer, Eckersley compiled just 16.8 of his 62.5 career WAR. Even Mariano Rivera, the undisputed best reliever of all time, posted 56.6 WAR, a total that would rank just 49th among Hall of Fame pitchers, right smack between Eppa Rixey and Red Ruffing. (This leaves aside Rivera’s monumental postseason record.) The average Hall pitcher put up 70 WAR. So if even the great Rivera comes up short by that yardstick, and we accept that saves are a poor substitute, how are we supposed to put relievers into context?
Should we put them in at all?
For some, the bottom-line value (as expressed by WAR) of most relief pitchers simply isn’t high enough to justify Hall consideration. The key word to remember in that sentence is “value.” WAR is meant to provide a short-hand metric for measuring the worth of a player in a given season and through a career. It’s a mash-up of quality and quantity, and you really need a good dose of both to shine. Relievers, by the definition of their roles, lack the quantity.
A couple of years ago, analyst Joe Sheehan, writing in his subscription newsletter, framed an off-shoot of this value question by comparing the Hall cases of Hoffman and former big league starter Andy Benes, a teammate of Hoffman’s for a time in San Diego. Here’s the baseball-reference.com snapshot of the two.
It’s a great way to highlight the problems of considering value of relief pitchers. Hoffman’s career ERA was more than a run lower than Benes’, but the latter threw over 1,400 more innings. Those disparate paths resulted in a similar level of career value: 31.4 WAR for Benes and 28.0 for Hoffman. Surprised? You shouldn’t be. We could come up with dozens of such comparisons. But you also see on that page that Hoffman’s average leverage index (1.92) was nearly twice that of Benes (1.00). He was used in many more pressure situations. That’s a fairly typical disparity in leverage between starters and closers.
As a result of those differing roles, despite Benes’ huge edge in opportunity, Hoffman put up 34.2 win probability added (WPA) during his career compared to Benes’ 10.5.
So is that the panacea we’ve been looking for to contextualize and justify the modern closer? Not really.
Again, we’re limited here by opportunity — a pitcher exposed to more high-leverage spots has a much better chance of compiling WPA. That makes it another smart way of looking at durability, more precise than saves, but it doesn’t quite capture what we want. Better would be a measure that looks at how well a pitcher responds to those high-leverage spots. That brings us to one last leverage-based metric: WPA/LI. This adjusts WPA for opportunity, resulting in what is called “context neutral wins.” And by that measure, Hoffman again dominates Benes: 19.3 to 7.6. Even so, Hoffman’s WPA/LI ranks just 61st all time, right behind David Price and just ahead of Mark Buehrle.
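To make the difference between WPA and WPA/LI concrete, here is a minimal sketch. It assumes the common definition in which each play’s win probability change is divided by the leverage index of that situation and the results are summed. The play-by-play numbers are invented for illustration and are not Hoffman’s or Benes’ actual data; the class and function names are hypothetical as well.

```python
from dataclasses import dataclass


@dataclass
class Play:
    wpa: float  # change in win probability credited to the pitcher
    li: float   # leverage index of the situation (1.0 is average)


def total_wpa(plays):
    """Raw win probability added: rewards pitching in big spots."""
    return sum(p.wpa for p in plays)


def total_wpa_li(plays):
    """Context-neutral wins: each play's WPA scaled down by its leverage."""
    # Skip zero-leverage plays to avoid dividing by zero.
    return sum(p.wpa / p.li for p in plays if p.li > 0)


def average_li(plays):
    return sum(p.li for p in plays) / len(plays)


# HYPOTHETICAL plays: the closer works at twice the leverage of the starter.
closer = [Play(0.08, 2.0), Play(0.06, 2.0), Play(-0.10, 2.0), Play(0.04, 2.0)]
starter = [Play(0.04, 1.0), Play(0.03, 1.0), Play(-0.05, 1.0), Play(0.02, 1.0)]

for name, plays in (("closer", closer), ("starter", starter)):
    print(name, round(total_wpa(plays), 3), round(total_wpa_li(plays), 3),
          round(average_li(plays), 2))
```

In this toy case the two pitchers do the same work on a context-neutral basis, yet the closer ends up with twice the raw WPA simply because his outs came in bigger spots. That is the opportunity effect WPA/LI is meant to strip out.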
None of that is the crux of Sheehan’s Benes vs. Hoffman example, in which he’s building a case against Hoffman. To paraphrase Sheehan, his stance is that Benes was used as a starter because he could be, and that’s how teams maxed out his value and his impact on winning games. They did the same for Hoffman, but he maxed out by working in a relief role.
It’s likely that Hoffman couldn’t have done Benes’ job — if he could, he surely would have been asked to try. While we can’t know for sure, it is more likely that given the same opportunity, even a non-elite starter like Benes could have had a successful career as a reliever. That is really the crux of the relievers-Cooperstown conundrum.
On the flip side, the simple argument in favor of relievers/closers looks at the very evolution of the game toward increased specialization. Relievers have become ever more essential to the way baseball is played and their importance grows with each passing season. These days, the best relievers are usually paid like stars, so teams obviously see value in them above that which is captured by WAR. That suggests that to leave elite relievers out of Cooperstown would be to leave incomplete the story told by all those portraits hanging in the Plaque Room.
The difficulty of differentiation
So then, rather than trying to find a metric that puts top relievers on an even footing with starting pitchers, much less position players, if we grant that space in the Hall needs to be reserved for relievers, then we need only to compare them to each other. Given the birth of leverage stats, that’s not as hard to do as it used to be. Take this list of five random relievers, taken from the full list of the 28 pitchers with at least 300 career saves.
It seems impossible to separate these guys by these criteria, but it wasn’t so long ago these numbers would have served as the basis for looking at a reliever’s career. Two of the pitchers in that group are in the Hall of Fame: Gossage (he’s the one with 310 saves) and Sutter (300). The top saves guy on the list, whose ERA correctly suggests he can get into the Hall only by buying a ticket, is Jose Mesa.
The other two pitchers in the group are Tom Henke (311 saves), who had the lowest ERA in the sample group and the second-most saves, and Jeff Montgomery (304 saves). Both were fine pitchers, and both fell off the Hall ballot after their first year of eligibility. Now let’s present the same five pitchers with WAR, WPA and WPA/LI data included.
If the last metric (WPA/LI) is the most telling of our tools for capturing the quality and quantity of a reliever’s career, then the Hall doors would need to swing pretty wide to justify Sutter’s election. However, if they did, that might not be a bad thing. Among the 11 300-save relievers ahead of Sutter in WPA/LI are Rivera, Eckersley, Fingers and Gossage — three of the other four relievers in the Hall, plus Rivera, who will be a no-brainer in next year’s class.
Not included in the 300-save group is the fifth Hall of Fame reliever, Wilhelm, who played from 1952-72 and ended up with 228 saves. That was the big-league record until Fingers surpassed him in 1980. If you include Wilhelm as a 300-save guy, which you might if you’re adjusting his career numbers for era, his WPA (30.8) would rank fifth and his WPA/LI (27.0) would rank second, behind Rivera’s 33.6. His case stands up well to advanced metrics that didn’t exist when Wilhelm played.
Using these tools, we can draw concrete conclusions. We see separation for Gossage as the best pitcher in our random group. We can easily dismiss Mesa and Montgomery. However, Sutter’s case — already decided as a “yes” by the voters — looks a little more murky.
The other story of our random group of five is Henke, who, the table told us, was better than Sutter in saves, WPA and WPA/LI. This is where we come back to our question of historical context. Henke dropped off the ballot in 2001, five years before “The Book” was published. No one created much of a fuss at the time, largely because few had the tools to understand just what kind of career Henke enjoyed. That’s not to say that Henke is a Hall of Famer, but he at least deserves consideration by a future veterans committee, more than he got from the voters in 2001.
Cases for reconsideration
Here are Sutter’s key rankings in our aforementioned group of 28 300-save relievers: ninth in WAR, 17th in WPA and 12th in WPA/LI. Now, let’s present the WPA/LI leaderboard for the 300-save relievers:
We’ve gone over most of these players. Nathan, Papelbon and Rodriguez, who is active, aren’t yet eligible for the ballot. All three will be litmus tests for how our evaluation of relievers evolves over the next few years, especially as non-closers like Miller gain in prominence and in number.
Smith, briefly mentioned above, is a special case. His 478 career saves kept him on the ballot for all 15 years of his eligibility, which expired with last year’s vote. Smith’s support topped out at 50.6 percent in 2012. Saves, it seems, only went so far and weren’t enough to put him over the top, just as they haven’t been for most relievers. Yet, ironically, even beyond the saves, the more advanced measures tell us it’s a disorderly world in which Sutter is a Hall of Famer and Smith is not.
Like Henke, Smith deserves to eventually be re-evaluated by a veterans committee. Doesn’t mean he gets in, but if others, like Rodriguez and Nathan, find their way in, then Smith’s case will be bolstered. Nevertheless, it’s hard to imagine all 11 of these relievers eventually getting in the Hall, either by vote or a future committee. So it appears that Sutter will always be a bit of a statistical outlier.
Back to the present
That brings us to the two players left anonymous in the table above. Both are on this year’s ballot and you probably already know who they are since they were established as our subject back in the introduction. But let’s leave the names out of it for the moment to focus on the results and isolate the key metrics in terms of their rankings among the group of 28 300-save guys:
Clearly Player X has the edge in these metrics, but this is a straight-up comparison of historically elite closers. If we’re widening the Hall doors for the modern reliever, both of these players are clearly well above the thresholds for the position. And if indeed we are including closers as a position group, and they are evaluated primarily among themselves, Player X (Hoffman) and Player Y (Billy Wagner) both are then worthy.
According to Thibodaux’s Tracker, Hoffman has 77.8 percent of ballots received so far, which puts him on pace for enshrinement, and Wagner (10.6 percent) has already been eliminated. Fortunately, Wagner should have enough support to remain on the ballot, as he deserves to.
According to the tracker, Hoffman has already received 154 votes; Wagner is at just 21. What does this tell us? First off, Hoffman’s probable election suggests that we’re making progress when it comes to understanding the context of the elite careers of the best of the modern closers. His election would also be another data point telling us that the writers, as a group, have decided that the best pure relievers deserve spots in Cooperstown.
Wagner’s vote total tells us something else: Too many voters are still leaning too heavily on saves to judge relievers. A few years ago, perhaps you couldn’t blame them. These days, however, our tools for making sense of all this are better than they’ve ever been. It appears that we still have a long way to go in putting these tools to their best use.