-
John Beamer David,
An interesting read. I have often pondered this question but never got round to thinking through some of the drivers.
I have a few thoughts and comments that I'd be keen to hear your perspective on:
If I am reading your leverage regression right, for a closer like Mariano Rivera his predicted LI is:
1.11 - 0.86*(54-39)/63 + 34/63*2.165 = 2.1
In 2006 his LI was 1.83 (according to fangraphs). Perhaps your equation overrates elite closers? Not sure, I'd have to investigate.
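A quick check of that arithmetic, as a sketch only. It assumes the last term is added (with a plus sign the terms come to about 2.1, matching the figure given) and uses neutral variable names, because the thread does not say what each input stands for.

```python
# Terms copied from the comment above; the "+" before the last term is assumed.
intercept = 1.11
second_term = -0.86 * (54 - 39) / 63
third_term = 34 / 63 * 2.165

predicted_li = intercept + second_term + third_term
actual_li = 1.83  # Rivera's 2006 LI per FanGraphs, as quoted in the comment

print(round(predicted_li, 2))              # 2.07
print(round(predicted_li - actual_li, 2))  # 0.24
```

On this reading the formula comes out roughly a quarter of a point of LI above the FanGraphs figure for that one season.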
Anyway, do you have a sense for how your predicted LI correlates to David's LI? I'd be interested to see that number.
The other thing that it would be great to know is what is the standard deviation of a team's "leverage wins" as you define it?
On to the pythag regression.
On managers, if I am reading it correctly, the effect in a season is 0.24 +/- 0.61 ... which would imply that the impact isn't significantly different from zero. Am I reading that right? There could be a selection bias issue at work here, which is that managers who outperform their pythag by luck stick around longer ... not sure if I fully buy that though.
I'm also struggling to interpret the balance coefficient. First, I'm not 100% sure what the range of balance is. I could work this out -- I'm just a little lazy. Why should an increase in balance see a team outperform its pythag? The only explanation is that it scores an optimum number of runs more frequently and therefore wins close games ... yeah, that sounds plausible, I guess.
The leverage parameter makes sense. You'd expect teams who win closer games to have a better pythag ... again, having the range in LI wins would help in understanding the size of the effect ... my bet is that this accounts for nearly all of the effect that your equation explains.
-
John Beamer One other thing I was trying to think through was whether you could somehow infer what the pythag-actual distribution would be based on luck alone and compare that to the st. dev of the observed.
However, as this isn't a binomial I don't think we can do that (unless we run a simulation, where leveraged wins are distributed randomly)
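A rough sketch of what such a simulation could look like. It is a simpler luck-only baseline, not the leveraged-wins version described above: both teams are given the same scoring ability, so any gap between actual and Pythagorean wins is chance. Poisson run scoring, 4.5 runs per game, an exponent of 2, and coin-flip ties are all assumptions made for illustration.

```python
import math
import random
import statistics

def poisson_weights(mean_runs, max_runs=25):
    """Poisson probabilities for 0..max_runs runs (an assumed scoring model)."""
    return [math.exp(-mean_runs) * mean_runs ** k / math.factorial(k)
            for k in range(max_runs + 1)]

def luck_only_residual_sd(n_seasons=1000, games=162, mean_runs=4.5,
                          exponent=2.0, seed=1):
    """SD of (actual wins - Pythagorean wins) for a team with no real edge."""
    rng = random.Random(seed)
    weights = poisson_weights(mean_runs)
    run_values = range(len(weights))
    residuals = []
    for _ in range(n_seasons):
        scored = rng.choices(run_values, weights=weights, k=games)
        allowed = rng.choices(run_values, weights=weights, k=games)
        wins = 0
        for rs, ra in zip(scored, allowed):
            if rs > ra or (rs == ra and rng.random() < 0.5):
                wins += 1  # ties are settled by a coin flip
        rs_pow = sum(scored) ** exponent
        ra_pow = sum(allowed) ** exponent
        pythag_wins = games * rs_pow / (rs_pow + ra_pow)
        residuals.append(wins - pythag_wins)
    return statistics.pstdev(residuals)

print(f"luck-only SD of pythag residual: {luck_only_residual_sd():.1f} wins")
```

The printed figure is the spread you would expect from luck alone under these assumptions, which is the number to set against the observed st. dev.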
-
David Gassko The overall correlation for my LI formula was around .50 (this is all off the top of my head), but I used a very low threshold for appearances (20?), so the equation is pretty solid. I've actually looked at it historically and the numbers match what we know and what we would expect almost exactly.
One standard deviation in "LI Wins" was 2.8 wins. You could figure that out, by the way, by dividing the 95% interval I gave by 2, and then dividing it by the coefficient I reported. In case you wanted to know the standard deviation for any other category...
I'm confused by your question on managers. Overall, yes, the impact is very small, but it's there. Selection bias is certainly a viable explanation, though I'm not sure it's the correct one.
You can figure out random chance in Pythag, but it's a bitch to do. If you'd like, I can forward you an e-mail on how to.
-
studes In just trying to put the leverage equation in a different language, is this correct?
*When a team accumulates more saves than expected by their overall run differential, then they're more likely to beat their pythagorean variance.*
I'm just trying to get past all the leverage language and math to understand it better.
-
studes Replying to my own e-mail, you add a level of detail by calculating "expected saves" while looking at each individual pitcher's run differential and totaling to the team, I think.
-
John Beamer David,
I'd be keen to see that email on calculating random chance in Pythag -- if you could forward that would be awesome.
Thanks for the reply, and good pointer on the short-cut for calculating st dev on inputs -- anyway, it does seem that LI wins is the dominant driver of pythag outperformance.
My point on managers, and I may not understand this correctly, is that a manager adds 0.25 wins per season but the std error around that is 0.61 wins, which, if so, isn't significantly different from zero (not to say the effect doesn't exist though).
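A quick check of those manager numbers. The thread leaves it unclear whether 0.61 is a standard error or the half-width of a 95% interval, so this sketch tries both readings; the 1.96 cutoff (the usual two-sided 5% normal threshold) is an assumption about which test applies.

```python
# Manager effect quoted in the thread: 0.25 wins per season, +/- 0.61.
effect = 0.25
uncertainty = 0.61
Z_CRIT = 1.96  # two-sided 5% cutoff for a normal statistic (assumed test)

# Reading 1: 0.61 is the standard error.
z_if_se = effect / uncertainty

# Reading 2: 0.61 is the half-width of a 95% interval, so SE = 0.61 / 1.96.
z_if_half_width = effect / (uncertainty / Z_CRIT)

for label, z in [("std error", z_if_se), ("95% half-width", z_if_half_width)]:
    verdict = "significant" if abs(z) > Z_CRIT else "not significant"
    print(f"{label}: z = {z:.2f} -> {verdict}")
```

Either way the statistic stays well under the cutoff, so the effect cannot be told apart from zero at that level on these numbers.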
And if I didn't say so at the start, great work.
-
GuyM Another very interesting piece, David.
I assume that the value of lineup balance is that the SD of R/G scored is lower than average for a given scoring level. Is that what you see in your data? And if so, it would be interesting to also look at the SD of pitchers' RA/G. It should be the case that an unbalanced staff helps a team overperform (and perhaps your LI variable might be picking up some of that?). Basically, you want consistent hitters and inconsistent pitchers.
On managers, I do think selective sampling is the likely explanation. Chris Jaffe did a big analysis of managers looking at their pythag over/under-performance (a methodology I think Phil Birnbaum developed). There were a few threads on this at BTF. I seem to recall that the relationship between a manager's career win% and his pythag over-performance was EXACTLY what you'd expect from pure luck (managers that won a lot will be luckier). And of course, the best way to survive as a manager is to win.
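To illustrate the selective sampling argument, here is a toy simulation in which no manager has any skill at all. Everything in it is hypothetical: the 4-win SD for a season's pythag residual, the rule that a manager is fired after a season more than 4 wins below pythag, and the 10-season cap. It is not a reconstruction of the Jaffe or Birnbaum work.

```python
import random
import statistics

def simulate_managers(n_managers=5000, max_seasons=10, luck_sd=4.0,
                      fire_below=-4.0, seed=7):
    """Every manager has zero true skill; each season's residual is pure luck.

    A manager is fired after any season whose residual falls below `fire_below`.
    Returns a list of (tenure, mean residual per season) tuples.
    """
    rng = random.Random(seed)
    careers = []
    for _ in range(n_managers):
        residuals = []
        for _ in range(max_seasons):
            luck = rng.gauss(0.0, luck_sd)
            residuals.append(luck)
            if luck < fire_below:
                break  # one unlucky season ends the career
        careers.append((len(residuals), statistics.fmean(residuals)))
    return careers

careers = simulate_managers()
long_tenure = [avg for tenure, avg in careers if tenure >= 5]
short_tenure = [avg for tenure, avg in careers if tenure < 5]
print(f"long-tenure mean residual:  {statistics.fmean(long_tenure):+.2f}")
print(f"short-tenure mean residual: {statistics.fmean(short_tenure):+.2f}")
```

Managers who last show a positive average residual and those who wash out show a negative one, even though skill is zero by construction. That is the pattern selection alone would produce.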
I'm not sure how we would figure out if managers have this skill. Maybe look to see if over/under in first half of career predicts 2nd half?
-
DougWalters I would say that, according to my research, over or under-performing your Pythagorean estimate in any year is NOT random. What I've found is that when a team is on an improving trend over the course of a decade or so, they will consistently over-perform their Pythagorean estimate. When a team is on a declining trend, they will under-perform against their Pythagorean estimates.
My site has more details.
http://footballprofessor.blogspot.com
-
GuyM DSG, I think you have a problem with your leverage variable in this context. It's defined in part by saves, which will be somewhat correlated with team wins (you can't earn a save in a loss) and, even worse, with wins in close games. So we'd expect to find a correlation with over-performing pythag. What you need is a way of measuring leverage that uses only GF, or any other variables that tell you a pitcher is being used late in games, but not contaminated by saves. Alternatively, just look at the ratio of RA/G for a team's two best relievers compared to the team average, and assume these pitchers were used in high leverage situations.
-
David Gassko Guy,
I'm not sure that is necessarily a problem. Yes, saves will be correlated with wins in close games, but as long as we're accurately predicting LI, that's fine. The whole point is that a great closer (or great bullpen) can result in more close wins, allowing a team to outperform its Pythagorean record.
-
GuyM David: But if you define a great closer as someone whose teams won close games -- as you somewhat are -- then this is just tautological. I'm not saying that using your best pitchers in high LI situations doesn't help a team overperform its pythag -- that seems like a reasonable theory. But you won't know unless you develop a proxy for LI that doesn't include saves. Team wins and saves had an r of .65 last year (and correlation with winning 1- and 2-run games could be even higher), so using saves is almost like throwing team wins into your regression!
-
studes Personally, I think the LI issue is making the situation hard to understand. That's why I posted my comments above, trying to express what David did in other terms.
I kind of agree with Guy here. Although David is onto something, it's hard to separate how much of the impact can be attributed to the bullpen, vs. the bullpen saves being a natural outcome. Cause and effect, that sort of thing.
Teams with higher save totals than expected are teams that won more close games, I would think, leading to a positive variance in the Pythagorean formula. But those increased saves could also be the result of a team naturally being ahead by a bit thanks to its offense. Hard to see how you could resolve this without actual LI play by play.
-
John Beamer I think that is right .... but I'm not sure that saves is the root issue -- though it is an issue.
For instance, Guy's suggestion, which was to find pitchers with high LI without using the saves variable, may have the same issue. Teams that win in close games (i.e., who have more leveraged wins) will outperform their pythag and these teams will likely have better relievers.
Getting rid of saves in the regression equation, while more elegant wouldn't change the answer I don't believe.
-
notsellingjeans In layman's terms:
*A balanced lineup insulates a team against injury, players taking days off, and slumps, because the offense isn't overly reliant on one player. That makes them more likely to outperform their expected Pythag, like the A's did last year, because they always had at least 5-6 hitters performing at league average levels for their position in the lineup for all 162 games, even if there was always one starter getting the day off, one or two hurt, and one or two slumping. Here's the flip-side to that, hypothetically: Barry Bonds inflates the Giants' expected run and expected win totals. When he takes a day off, or goes on the Disabled List, the offense is in shambles, and the team will play .300 to .400 ball on those days. The argument with Bonds can be applied to other teams that derive a high percentage of their offense from one or two players. If their star takes a day off or is mired in a slump, the team is more likely to have their winning percentage suffer on those particular days than the well-balanced team is. Sure, they'll blow teams out when Star Hitter X goes on a hot streak...but when he goes cold, if he's carrying a high percentage of the load, they will lose more often than their Pythag suggests.
*I think that ALL relief, not just closers or set-up men, is certainly a factor in deviation from expected Pythag wins. Anecdotally, look at the Indians last year - it wasn't a coincidence that their bullpen was in shambles and they underachieved relative to their Pythag. Conversely, the A's had a dominant bullpen and outperformed theirs. The problem with focusing only on closers is that it ignores the relievers who come in and blow games in the 6th and 7th innings. Due to the small number of innings they throw, and the leverage of those innings, those relievers have an unfortunately small impact in calculating a team's expected Pythag wins, and yet a larger-than-expected impact in actual wins on the field.
-
studes Good points. I think everyone will agree with your comment about the entire bullpen. The issue is finding enough historical data to perform the analysis, and saves are just about the only historical relief data that has anything to do with Leverage.
-
John Beamer Agreed. David's regression estimates leveraged wins (i.e., using the save to signify a win) and not necessarily overall bullpen quality.
Another anecdotal data point on the bullpen issue and pythag underperformance was the Braves 2006 vs Braves 2007 ... the stats speak for themselves.
-
GuyM On balanced lineup, I think the impact has to be less variance in runs scored. A team that regularly scores 3, 4, or 5 runs will win a little more than a boom-bust team with the same mean RS.
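A small exact calculation of that idea, as a sketch. All three run distributions are invented for illustration (each averages 4 runs a game), and ties are treated as coin flips. The boom-bust team here is deliberately extreme, so the size of the gap should not be read as a realistic estimate.

```python
from fractions import Fraction

def win_probability(team_runs, opp_runs):
    """P(team outscores opponent), counting ties as coin flips (extra innings)."""
    total = Fraction(0)
    for rs, p_rs in team_runs.items():
        for ra, p_ra in opp_runs.items():
            if rs > ra:
                total += p_rs * p_ra
            elif rs == ra:
                total += p_rs * p_ra / 2
    return total

# All three distributions are made up for illustration; each has a mean of 4 runs.
steady = {3: Fraction(1, 3), 4: Fraction(1, 3), 5: Fraction(1, 3)}
boom_bust = {2: Fraction(3, 4), 10: Fraction(1, 4)}
opponent = {r: Fraction(1, 5) for r in range(2, 7)}

print(float(win_probability(steady, opponent)))     # 0.5
print(float(win_probability(boom_bust, opponent)))  # 0.325
```

A symmetric boom-bust team (say 0 or 8 runs with equal odds) would come out at exactly .500 against this opponent, so the penalty in this toy case comes from the skew: rare blowouts prop up the scoring average without adding wins.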
"Getting rid of saves in the regression equation, while more elegant wouldn't change the answer I don't believe." It's not just an issue of elegance. Saves are themselves an indicator of wins -- that's the problem. While a true LI measure might be correlated with saves, that's not itself a problem. And I don't think this is very hard: comparing a team's two best relievers (or two with most GF) to the team average will tell you what you want to know. If you have two 4.5 RA/G teams, but one's top relievers are 2.5 and the other's are 3.5, the first team can better exploit high LI situations. It's mainly a function of whether you have those kinds of pitchers available in the first place. While every team won't exploit them equally efficiently, that's an adequate assumption for David's analysis.
We may still find a connection to pythag over-performance. But it will almost certainly be weaker than if you rely on saves.
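A minimal sketch of that saves-free measure: the average RA/G of a team's two best relievers divided by the team RA/G, so lower means a bigger edge in high-LI spots. The function name and the extra reliever figures (4.8, 5.1) are hypothetical; only the 4.5, 2.5 and 3.5 values come from the example above.

```python
def top_reliever_ratio(reliever_ra, team_ra, n_best=2):
    """Average RA/G of the n_best relievers relative to the team's RA/G."""
    best = sorted(reliever_ra)[:n_best]  # lowest RA/G = best relievers
    return (sum(best) / len(best)) / team_ra

# Two 4.5 RA/G teams; the remaining reliever numbers are invented filler.
team_one = top_reliever_ratio([2.5, 2.5, 4.8, 5.1], team_ra=4.5)
team_two = top_reliever_ratio([3.5, 3.5, 4.8, 5.1], team_ra=4.5)

print(round(team_one, 3))  # 0.556
print(round(team_two, 3))  # 0.778
```

Because nothing in the ratio depends on saves or on game outcomes, it could stand in for the leverage variable in the regression without importing wins through the back door.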
-
John Beamer "Saves are themselves an indicator of wins -- that's the problem"
As, I suspect, is the LI of the two best relievers relative to the others. What's the correlation between the two lowest LI relievers on a team and saves?
Just thinking aloud ... it may be lower than I think because of what counts as a save, e.g. a reliever coming in with a five run lead, giving up some runs and earning the save.
Guy -- your regressions below on Leverage ... is that for bullpen LI or total team LI?
-
John Beamer David could have legitimately used saves in his regression rather than LI wins ... that wouldn't be invalid. The conclusion would have been that teams that save more games outperform their pythag.
MLB could bin the save stat and introduce leveraged wins. Leveraged wins will be higher for teams that win close games (i.e. in save situations) ... it is all the same thing
-
GuyM John: it's total team Lev (using BPro definition, not Tango's LI).
-
GuyM I think there's a way for David to check whether using saves to estimate leverage is a problem. Leverage is highly correlated with saves (r=.59) but only loosely correlated with wins (r=.17). (That's using BPro's Lev, but I assume that'll work for this purpose.) So David can check to see if the average team leverage in his system correlates with wins more than it actually should. If so, that's presumably the influence of using saves.
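A sketch of how that check could be run. The 0.17 benchmark is the figure quoted above; the function names and the sample numbers are hypothetical, and real team-season data would be needed for the result to mean anything.

```python
import math

def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def excess_over_benchmark(modeled_leverage, team_wins, benchmark_r=0.17):
    """How far the modeled leverage's r with wins sits above the benchmark."""
    return pearson_r(modeled_leverage, team_wins) - benchmark_r

# Made-up team-level numbers, only to show the call.
modeled = [0.95, 1.00, 1.02, 1.10, 1.15]
wins = [72, 80, 79, 90, 95]
print(f"excess r: {excess_over_benchmark(modeled, wins):+.2f}")
```

A clearly positive excess would suggest the modeled leverage is tracking wins more than true leverage does, which is the contamination in question.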
-
John Beamer Responding to my own question I think that any measure of "leveraged wins" will produce the same problem ... be it saves or looking at the LI of the best relievers.
Guy is right though -- what's important is trying to measure bullpen quality irrespective of wins. Maybe reliever runs per game for the entire pen is the simplest metric in this case ... after all, in the original article David states he is trying to control for quality of the bullpen.
-
GuyM John: It would be fine to use leveraged wins IF David's modeled LI was measuring only game situation, i.e. actual leverage. But by incorporating saves, the metric is also (I think) reflecting -- rather than predicting -- actual game outcomes. The modeled LI is not just picking up actual LI, it's telling us how many wins the team actually had, so using it to then 'predict' team wins is circular.
Imagine we wanted to test the theory that teams with great starting pitchers over-perform their pythag. But my dataset doesn't include ERA for a lot of years, so I decide to use W-L record as my measure of starter quality (it probably predicts starter ERA about as well as the model predicts LI). Surely, starter W-L record will 'predict' over/under on pythag. But I obviously haven't really proved that starters are important (because the W is a team outcome), only that teams that win, win.
-
GuyM Thinking a little more on this, even if we had a pure measure of LI, we'd still have a problem. Since LI is highest for a team with a lead, it is itself a byproduct of winning. A team with high leverage will tend to be a team that won, regardless of how well pitchers performed in high LI situations. (As I noted above, r=.17 over last 2 years using Woolner's Lev.) Your measure needs to tell us how well a team exploited its leverage opportunities, controlling for total amount of leverage the team had. So I think you need to divide each team's leveraged wins by the team LI (in addition to finding an LI measure not contaminated by saves).
I still think the cleanest shot will be some measure of how good a team's best relievers are compared to the rest of staff.
-
John Beamer Guy -- thanks for the comment. I agree.
Re-reading David's article it is clear he is trying to capture overall bullpen quality and not Leveraged Wins (ie, wins in close games) as I previously thought.
-
David Gassko Guy is right. The "r" between predicted LI and winning percentage was .57. I still believe my conclusions are right but the magnitude could be wrong; we need to use actual LI data to figure that out.
-
salb918 David,
As you know, I am a big fan of run distributions. Because the Pyth thm is a consequence of the shape of run distributions, I would expect deviations from the expected distribution shape to be reflected as deviations from the Pyth thm. Indeed, leverage and balance are two things I can imagine affecting the shape of the run distribution.
If I were able to give you a metric describing how far a team was away from its expected distribution, could you use that as an input for your regressions?
Well done.
-
GuyM Sal: is there reason to think, or evidence, that balance in a lineup yields less variance in runs scored per game?
-
salb918 I can only answer that question in re: offense.
On its face, no. I showed using a naive simulation that a team of league average players + one superstar has the same run distribution as a team of slightly above average players. In other words, when two teams have the same composite offensive rate stats, it doesn't matter whether they take a star and scrubs approach or an even steven approach (http://www.beyondtheboxscore.com/story/2005/9/22/103843/776).
In practice, maybe. A team that takes a star and scrub approach is more prone to injury. If the star goes down, you're left with just scrubs. But if one of your stevens goes down, the hit shouldn't be so bad.
As for leverage, I haven't properly explored its effect on run distribution. So I'll leave that as an open question at this point.
-
David Gassko Yeah, sure. The one question would be is that variance repeatable or due to luck? My interest is how much we can predict a team's Pythagorean variance before the fact, not what explains it after.
-
salb918 How much of the variance is repeatable or due to luck, I am not sure. I'll try to compile and send some data your way over the weekend.
Knowing a priori how a team will leverage its bullpen is difficult, if not impossible, since a) bullpen roles other than closer are among the most fluid in baseball b) relievers are more variable in their performance. And leverage looks like it is the most important factor here. Still, it's something worth exploring.
Maybe the question I should be asking is, "Given that the ability to leverage a bullpen allows a team to outperform its Pyth, is this phenomenon due to the bullpen leveraging distorting the run distribution?"
-
John Beamer All said and done, my perspective is:
1) David's equation showed that teams that win more close games outperform pythag -- now this is a result we'd expect.
2) This doesn't necessarily tell us that the better bullpens mean that teams outperform pythag
3) A strong hypothesis would be that balanced teams that have strong bullpens are more likely to play in close games, and are therefore more likely to outperform pythag. David's equation did not answer this.
In summary, David's equation wasn't completely tautological. Demonstrating that teams with more saves outperform pythag is an intuitive result although not necessarily a guaranteed one.
-
mcrawfordhulla In re: Pythagorean theorem and managers, did you know that there is a paper called "A Note on the Pythagorean Theorem of Baseball Production" by Ruggiero, et al in Managerial and Decision Economics? Worth reading, if you can search for it in JSTOR.
-
Gael64 I'm not a math guy, but the overall premise makes a great deal of sense. One's pitchers are put out every 5th, one's batters hit every 9 at bats. One's BP, though, can be leveraged. This year's Sox have nearly unhittable relievers in Okajima and Papelbon, and use those players in situations that are to the team's advantage. Given this ability to maximize a player's effectiveness, it only stands to reason that all runs scored are not created equally. Just like the football team that gives up a safety when it is ahead by six or the basketball team that gives up a lay-up when it's up by three, a baseball team does not uniformly place the same value on a run scored against. Another factor would be how the infield and outfield are positioned in a given situation. My compliments, David, on smashing this faulty assumption.
Blogging the Bases: Royals going for the sweep?
FanIQ Blog —
...
Pondering Pythagoras Because I have a feeling today's going to be a slow sports day, here's a reading assignment for stat geeks or not. I feel that somewhere in Sabermetrics there should be a Cubs factor added which would help explain under performance of numbers.
Pythagorus On The Cubs
A Hundred Next Years —
... Click here for an answer, courtesy of The Hardball Times and author David Gassko.
More chop links: May 27
Atlanta Braves —
... . David Gassko’s column on why teams beat their pythag record was excellent too, though, I suspect a little nerdy for many.
Expecting the Unexpected
Another Cubs Blog —
... , but we still don't know why the team is so much worse in close contests. A while back i said that, assuming the Cubs aren't playing above their collective heads, they should improve their record. Clearly, that hasn't happened as they've fallen far below .500 since that piece. (Was it really almost a month ago?) While i planned for my next major piece to test the assumption that they weren't playing above their collective heads, recent events have changed those plans. Specifically, i have changed plans after witnessing the implosion of the Cubs' combustible bullpen and reading work by David Gassko that asks a very important question: why should we be so sure that teams outperform or underperform their Pythagorean records just by chance? In the piece where Gassko asks that question, he then finds three areas that show a decent correlation to deviation from expected Pythagorean records: lineup balance, bullpen
What do Pythagorean residuals really measure?
MVN RSS —
... The answer is yes. Former StatSpeaker David Gassko found that a team with a balanced offensive lineup was more likely to outperform their projection. He also found that a team with a manager who had been around a while was marginally more likely to outperform their projection. The folks over at Baseball Prospectus found evidence that ...
How Best to Measure a Team's True Talent
Baseball Analysts —
... before, but here I take another look at it. Which one of these three metrics (WPCT, Pythagorean WPCT, and component Pythagorean WPCT) is best and is there some way to combine all three metrics to get the best possible estimator of a team's ability? ...

