Consistency is key

...or is it? Sal takes a look at which offenses were the most consistent, and which ones weren't, with a little help from his friend Waloddi Weibull.

Comments (12)

  • jscape2000 jscape2000
    +2

    So we can see how many games each team under or over performed in... how does this correlate to expected wins? Obviously, if a team's offense gets more consistent but their pitching stays bad or gets worse, they can't expect to add any wins.

    Also, can we conclude from this model (and from previous years) that good offenses are by nature inconsistent? If our givens are 1) there is a floor of Replacement Level, and 2) there is a ceiling so that a 1.000 winning percentage is unattainable, can we conclude that what makes the best offenses the best is that they can erupt? Does this article support the idea that hitting comes in bunches?

    Posted 12/3/2007 [reply] [flag]
    • salb918 salb918
      +2

      Hi jscape,

      The correlation to expected (Pythagorean) wins is quantitatively unclear.  But a team is most likely to match its Pythagorean wins if its distribution matches the Weibull distribution exactly.

      This model suggests that the consistency of an offense is not related to how good it is.  There is NO correlation between "consistency" (as I defined it) and how good the offense is.   It makes no conclusions about erupting and hitting coming in bunches.

      Posted 12/3/2007 [reply] [flag]
  • Ambiguity Ambiguity
    +1
    I really enjoyed this article. Thanks for the research you've done. It would be interesting to see something like this for individual players. A lot more work, but it would be interesting to see if some players were more likely to spread out their home runs (or some other stat) over the course of a season than the Weibull curve for that stat.
    Posted 12/3/2007 [reply] [flag]
    • salb918 salb918
      +1
      That is a good question.  I suspect that a Weibull curve would be inappropriate for describing the distribution of home run hitting (but then again, maybe not).  In any event, it would be interesting to see how players distribute their performance, and I would guess that we wouldn't see much difference between players.
      Posted 12/4/2007 [reply] [flag]
  • philosofool philosofool
    +1

    This is intense stuff. Thanks.  I haven't actually finished the article and absorbed all the analysis.  When I read that the bosox and yanks were among the least consistent, I immediately wondered if power hitting teams will tend to look less consistent on this model  because power will mean random chance of scoring many runs.  The quick and dirty analysis of AL teams in 2007 shows a correlation coefficient of -.26 between the consistency measures given in the article and team ISO (SLG - BA). (I did that analysis cause I happened to have a spreadsheet with BA and SLG for AL teams on hand.) It would be interesting to see whether HR/Game has a stronger correlation with inconsistency.

    What I thought was more interesting is that inconsistency is associated with higher runs/game! So even though we bemoan our "inconsistent" team, inconsistent teams were actually usually playing better in 2007. (Correlation coefficient = .65) (Since I haven't finished the article, for all I know, you point this out.)

    Posted 12/3/2007 [reply] [flag]
    • salb918 salb918
      +1

      Phil,

      That correlation is probably due to small sample size.  Using data from 2004-2007 for both leagues, the correlation between ISO and "consistency" - which is really just a toy to show off the utility of the Weibull distribution - is +.105, which isn't really all that impressive.  That correlation suggests that ISO "predicts" about 1% of the variance in the consistency measure (r squared is about .011).  HR/PA is similarly useless, r = +.08.

      The correlation between  overall offensive output (runs/game) and "consistency" over that time period is .0001 - practically nothing!
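The step from a correlation coefficient to "about 1%" is just squaring r. A minimal sketch of that arithmetic, using only the three correlation values quoted in this reply (the dictionary and function names are illustrative, not from the article):

```python
# Correlation values quoted in the reply above (2004-2007, both leagues).
correlations = {"ISO": 0.105, "HR/PA": 0.08, "runs/game": 0.0001}


def variance_explained(r):
    """Share of variance one variable 'predicts' in the other: r squared."""
    return r * r


shares = {name: variance_explained(r) for name, r in correlations.items()}

for name, share in shares.items():
    # Printed as a percentage of the variance in the consistency measure.
    print(f"{name}: r^2 = {share:.4%}")
```

Squaring makes small correlations look even smaller, which is why an r of +.105 amounts to roughly one percent.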

      Posted 12/4/2007 [reply] [flag]
  • RedsManRick RedsManRick
    +1

    jscape, I was thinking the same thing. Given that I'm at work and haven't actually done stats analysis since college, I figured I'd just play with the data for a little bit.

    I'm surely breaking statistical rules & assumptions. But given that we're using correlation coefficients as measurements of consistency and Sal's conjecture, I figured that a team who was RS consistent and RA inconsistent would tend to have a positive pythag.

    So I took the difference of the RS-RA coefficients as a generic consistency measurement where positives are good and regressed it against the pyth plus/minus. I ended up with an R2 of .009 (negative R), so basically, there's no relationship between my nonsensical variable RSR2-RAR2 and pythag performance.

    For what it's worth...

    Posted 12/3/2007 [reply] [flag]
    • salb918 salb918
      +1
      I'll be looking at the relationship between run distributions and Pythag in two weeks.  Tune in next time...
      Posted 12/4/2007 [reply] [flag]
  • weskelton weskelton
    +1

    Sal,

     I've really enjoyed this article as well as your others on Weibull in baseball.  You do a great job of making it understandable for the common man.  That being said, you made reference to Steven Miller's paper as to the means by which you generate your 3-parameter Weibull.  Unfortunately, at least for me, Miller's work is a little less readable.  I'm wondering if you can provide a description of how you derive your Weibull parameters in more layman's terms.

    Thanks.

    Posted 12/7/2007 [reply] [flag]
    • salb918 salb918
      +1

      Hi Wes,

      Thanks...I always hesitate to bust out ol' Weibull because I get the feeling I'm not explaining it well.  Miller's work is, IMO, groundbreaking, and I don't think he (or anyone) fully grasps how important of a result it is.  But, it is a tad on the academic side.

      The short story with the Weibull is that IF you assume that runs/game follow a Weibull distribution THEN the Pythagorean formula results.  The exponent in the Pyth formula is the same as the "gamma" parameter in the Weibull.
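A rough numerical check of the IF/THEN statement above, as a sketch only. It assumes runs scored and runs allowed are independent, continuous Weibull draws sharing one gamma and the beta = -0.5 shift mentioned later in this thread; the two alpha values, the seed, and all function names are made up for illustration.

```python
import math
import random

GAMMA = 1.8      # shape; assumed value inside the 1.6-1.9 range quoted in this thread
BETA = -0.5      # location, as stated in this thread
ALPHA_RS = 5.8   # hypothetical scale for runs scored
ALPHA_RA = 5.2   # hypothetical scale for runs allowed


def mean_runs(alpha, gamma=GAMMA, beta=BETA):
    """Mean of a shifted Weibull: beta + alpha * Gamma(1 + 1/gamma)."""
    return beta + alpha * math.gamma(1.0 + 1.0 / gamma)


def pythagorean(rs, ra, gamma=GAMMA, beta=BETA):
    """Pythagorean win fraction from average runs, each shifted by beta."""
    a = (rs - beta) ** gamma
    b = (ra - beta) ** gamma
    return a / (a + b)


def simulated_win_fraction(n_games=100_000, seed=2007):
    """Fraction of simulated games in which runs scored exceed runs allowed."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_games):
        # random.weibullvariate takes (scale, shape) in that order.
        scored = BETA + rng.weibullvariate(ALPHA_RS, GAMMA)
        allowed = BETA + rng.weibullvariate(ALPHA_RA, GAMMA)
        wins += scored > allowed
    return wins / n_games


predicted = pythagorean(mean_runs(ALPHA_RS), mean_runs(ALPHA_RA))
simulated = simulated_win_fraction()
print(f"Pythagorean: {predicted:.4f}  simulated: {simulated:.4f}")
```

Under these assumptions the two numbers should agree to within simulation noise, with the Weibull shape gamma playing the role of the Pythagorean exponent.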

      The way I get the gamma and alpha parameters is by a process called non-linear least squares (I think this is how Miller does it as well).  Essentially, you make a guess as to what alpha and gamma are and see how well it agrees with the data.  Then you slightly change the guess and see if the agreement gets better or worse.  If you use a computer program to do this in a controlled fashion, you get the so-called "best-fit" parameters - the alpha and gamma that give the best agreement between the distribution and the data.
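The guess-and-tweak idea can be sketched in a few lines. This is not the program used for the article: it is a bare-bones pattern search standing in for a real non-linear least squares routine, it fixes beta at -0.5, it assumes each run total r is compared with the Weibull probability of the bin from r - 0.5 to r + 0.5, and the "observed" frequencies are synthetic values generated from known parameters rather than real team data.

```python
import math


def bin_prob(runs, alpha, gamma):
    """Weibull probability of the bin [runs - 0.5, runs + 0.5) when beta = -0.5."""
    lower = math.exp(-((runs / alpha) ** gamma))
    upper = math.exp(-(((runs + 1) / alpha) ** gamma))
    return lower - upper


def sse(params, observed):
    """Sum of squared differences between observed and model frequencies."""
    alpha, gamma = params
    return sum((freq - bin_prob(r, alpha, gamma)) ** 2
               for r, freq in enumerate(observed))


def fit(observed, guess=(4.0, 1.0), step=0.5, tol=1e-6):
    """Tweak one parameter at a time; keep a tweak only if the fit improves."""
    best = list(guess)
    best_err = sse(best, observed)
    while step > tol:
        improved = False
        for i in range(2):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                if trial[i] <= 0:
                    continue  # alpha and gamma must stay positive
                err = sse(trial, observed)
                if err < best_err:
                    best, best_err, improved = trial, err, True
        if not improved:
            step /= 2  # no tweak helped, so try smaller tweaks
    return tuple(best), best_err


# Synthetic "observed" frequencies built from known parameters (not real data).
TRUE_ALPHA, TRUE_GAMMA = 5.5, 1.75
observed = [bin_prob(r, TRUE_ALPHA, TRUE_GAMMA) for r in range(25)]

(alpha_hat, gamma_hat), err = fit(observed)
print(f"alpha = {alpha_hat:.3f}, gamma = {gamma_hat:.3f}, SSE = {err:.2e}")
```

With real data, `observed` would be the fraction of a team's games with 0, 1, 2, ... runs, and a library optimizer would normally replace the hand-rolled search.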

      I hope this helped. 

      Posted 12/8/2007 [reply] [flag]
      • weskelton weskelton
        +1

        Sal,

         Thanks for the reply.  I guess I was hoping that you were going to tell me that there was a simple formula for each of the three parameters and that they were all just based on the relationship of runs scored vs allowed.  I guess that would have been too simple. 

         It sounds like you are telling me that the derivation of the best-fit numbers is really based on an educated guess followed by some systematic tweaking.  Is that right?

        The way I'm understanding this is that the parameters are 1) shape - how much area is under the curve  2) scale - how much the curve is spread out and 3) location - how much the curve is skewed.  I'm assuming that your gamma (pyth exp) is the "location" parameter.  What are typical values for shape and scale when applied to a typical baseball team?

        Let me know if I'm not making any sense here.

        Posted 12/9/2007 [reply] [flag]
        • salb918 salb918
          +1

           It sounds like you are telling me that the derivation of the best-fit numbers is really based on an educated guess followed by some systematic tweaking.  Is that right?

          Basically, yes. 

            Thanks for the reply.  I guess I was hoping that you were going to tell me that there was a simple formula for each of the three parameters and that they were all just based on the relationship of runs scored vs allowed.  I guess that would have been too simple.

          There is a shortcut that uses a "simple" (I guess it depends on your definition of simple) formula to compute the parameters.  See http://www.beyondtheboxscore.com/story/2006/2/23/164417/484. 

           

          The way I'm understanding this is that the parameters are 1) shape - how much area is under the curve  2) scale - how much the curve is spread out and 3) location - how much the curve is skewed.  I'm assuming that your gamma (pyth exp) is the "location" parameter.  What are typical values for shape and scale when applied to a typical baseball team?

          You're close.  The "shape" parameter is gamma.  It doesn't change the area under the curve because the area under the curve is always one (the probability of scoring between 0 and infinity runs is 100%).   It determines how "stretched" the curve is.  Gamma is typically 1.6-1.9.  The "scale" parameter is alpha and essentially describes where the maximum is located; it is usually 5-6.  Beta can be thought of as the "location" parameter.  It is always equal to -0.5.
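The roles of the three parameters can be checked numerically. A minimal sketch, assuming the standard three-parameter Weibull density with beta = -0.5; the alpha and gamma pairs are picked from the typical ranges quoted above, and the function names and integration settings are illustrative choices.

```python
import math


def weibull_pdf(x, alpha, gamma, beta=-0.5):
    """Density of the three-parameter Weibull; zero at or below the location beta."""
    if x <= beta:
        return 0.0
    z = (x - beta) / alpha
    return (gamma / alpha) * z ** (gamma - 1) * math.exp(-(z ** gamma))


def area(alpha, gamma, beta=-0.5, upper=60.0, steps=60_000):
    """Midpoint-rule integral of the density from beta to `upper`."""
    width = (upper - beta) / steps
    total = sum(weibull_pdf(beta + (i + 0.5) * width, alpha, gamma, beta)
                for i in range(steps))
    return total * width


def peak(alpha, gamma, beta=-0.5):
    """Location of the density's maximum (valid for gamma > 1)."""
    return beta + alpha * ((gamma - 1) / gamma) ** (1 / gamma)


for alpha, gamma in [(5.0, 1.6), (5.5, 1.75), (6.0, 1.9)]:
    print(f"alpha={alpha}, gamma={gamma}: "
          f"area={area(alpha, gamma):.4f}, peak at {peak(alpha, gamma):.2f} runs")
```

The area stays at one for every pair, while the peak moves with alpha (and, to a lesser degree, with gamma), which matches the description of alpha as the parameter that sets where the maximum sits.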

           I hope this helps!  Please ask if I'm not being clear!

          Posted 12/10/2007 [reply] [flag]

Links (6)

Another Reason I'm Not Worried About Next Season
Published 12/3/2007 by jscape2000 <info@pinstripealley.com> at Pinstripe Alley: Front Page Posts
The Yankees are going to bounce back in a big way, and here's another reason why from Sal Baxamusa and the Hardball Times: ...

links for 2007-12-03
Published 12/3/2007 by billfer at The Detroit Tiger Weblog
... Consistency is key — The Hardball Times The Tigers had a remarkably consistent offense - statistically speaking. Scout.com: TigsTown MLN: Signings, Surgery, and More Tigers signed catcher Raynolds Guzman from the Dominican and ...

Weibull
Published 12/3/2007 by Tangotiger (tangotiger@yahoo.com) at THE BOOK--Playing The Percentages In Baseball
... I always thought that the Tango Distribution was something that was already invented.  It seems that it must be a form of the Weibull Distribution.  Hopefully, one of you with some time on your hands can do the comparison, and perhaps finally make me retire the poorly named Tango Distribution. ...

White Sox Offense May Have Been Bad, But It Was Consistent
Published 12/3/2007 by Jay Maxwell at Black Sox Blog
... In a statistically heavy column at the Hardball Times, Sal Baxamusa analyzes which teams were the most consistent in 2007, i.e. which scored in a pocket of runs each and every game.  The White Sox had the most consistent offense in 2007. [ ...

White Sox to lose Fernando Hernandez in Rule 5 Draft
Published 12/4/2007 by The Cheat <info@southsidesox.com> at South Side Sox: Front Page Posts
... to a baby boy. Kenny says a decision has been made regarding who'll be on 3rd base in 2008. And Jerry Reinsdorf feels good about the 2008 Sox. BTW, the Sox will play the Mets in the 2nd annual Civil Rights Game on March 29 in Memphis and maybe against the Rangers on March 28 in Oklahoma City. Jim @ SoxMachine looks at the Quentin-Carter trade. FJM opens for comments. From Tango: Japan to MLB translations and Consistensy is key. Rowand already has a 5-year offer says Robothal. BTW, Minaya may ...

THT:Jaffe:Interview: John Thorn
Published 12/4/2007 at BBTF's Baseball Primer Newsblog
... I’m exhausting our supply of semicolons up there. Chris continues his series of interviews. This time’s victim is none other than John Thorn. Thorn has inspired some of my interest in the early, early days of baseball. Also, I enjoyed this Sal Baxamusa article from yesterday, but never got around to posting it.
