The injury zone
A method for identifying when a pitcher is injured.
Sabathia Leaves With The Flu
slidingintohome.blogspot.com — According to Bryan Hoch, CC Sabathia missed his scheduled live batting practice session today because he... came down with the flu. Yankees manager Joe Girardi said that Sabathia wanted to throw the session but vomited and was permitted to leave camp. Sabathia told Girardi that his children were ...
Sabathia saddled with an aging zoo
sfgate.com — For one thing, the Yankees' so-called powerhouse is hardly that. This is an old and terribly vulnerable... team, starting with Derek Jeter, who turns 35 this summer and continues a gradual decline so evident, the stat guys are calling him the worst ...
The 2009 Prospect Mine: Milwaukee Brewers
fangraphs.com — The Milwaukee Brewers system is loaded with offensive talent - even after the trade for C.C. Sabathia... last season, which included top offensive prospect Matt LaPorta. Unfortunately, the pitching depth is thin and the organization has traditionally ...
14 Comments
  • walshj58
    +2

    Excellent work, Josh. Really nice.

    One question though, regarding this:

    For instance, if I remove all the data below 0.5, the data that remain will contain about three times as many injured pitchers as healthy pitchers. This seems like a reasonable spot to me so I will call the region beyond 0.5 the injury zone. If a pitcher enters this zone he is much more likely to end up on the DL than he is to make his next scheduled start.

    You obtain the factor of three only after normalizing both distributions to unit area, correct?  I think the more relevant quantity is the proportion of injured pitchers in the sample with NN>.5 without normalizing.  In other words, if in real life you have many more healthy pitchers than injured ones, your factor of three will be much reduced.  In such a case, your last statement quoted above may not be true.
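To make the base-rate point concrete, here is a minimal sketch of the arithmetic. The tail fractions and sample sizes are hypothetical numbers chosen only for illustration; they are not taken from the article or its data.

```python
def injured_share_above_cut(tail_injured, tail_healthy, n_injured, n_healthy):
    """Fraction of starts above the cut that belong to injured pitchers.

    tail_injured / tail_healthy: fraction of each unit-area distribution
    that lies above the cut. n_injured / n_healthy: raw sample sizes.
    """
    injured_above = tail_injured * n_injured
    healthy_above = tail_healthy * n_healthy
    return injured_above / (injured_above + healthy_above)


# Hypothetical tail fractions giving a 3:1 ratio after normalization.
tail_injured, tail_healthy = 0.30, 0.10

# Equal sample sizes reproduce the normalized picture.
equal = injured_share_above_cut(tail_injured, tail_healthy, 100, 100)

# Hypothetical imbalance: 20 healthy starts for every injured one.
skewed = injured_share_above_cut(tail_injured, tail_healthy, 100, 2000)

print(f"equal sample sizes: {equal:.2f}")
print(f"20:1 healthy to injured: {skewed:.2f}")
```

Under these assumed numbers, the same 3:1 normalized ratio corresponds to a much smaller injured share once the healthy sample is far larger, which is why the unnormalized plot is the one that matters.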

    Can you show us the NN plot without the normalization applied? 

    Another question: does each pitch get a NN value or does a group of 10 pitches get a single NN value?  If it's the former, it'd be interesting to see how consistent the 10 pitches are. 

    Posted 2/17/2009
    • joshkalk
      +2

      You are absolutely right, John: it is the unnormalized graph that matters.  Here is the regular-scale plot:

      baseball.bornbybits.com/THT/full_NN.gif

      and the log-scale plot:

      baseball.bornbybits.com/THT/log_NN.gif

      Thanks a lot for catching this quickly. I will try to do a quick edit on this.

      As for your other question, I am sending the group of ten to the NN, but you make another good point about sending each pitch.  I will have to tell the network the pitch number, probably in reverse, but then there should be no need to stop at ten: I should be able to send the whole start and let the NN decide where the differences begin.  This will take some major recoding, so I won't have an answer right away, but I will get to work on that.  Thanks.
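One way the per-pitch idea could look, as a rough sketch only: each pitch becomes its own input row, tagged with its position counted back from the end of the start. The function name and the field names (`speed_mph`, `movement_in`, `reverse_pitch_number`) are hypothetical and are not the inputs actually used in the article.

```python
def add_reverse_pitch_number(start_pitches):
    """Return one feature row per pitch, tagged with its position counted
    back from the last pitch of the start (0 = final pitch)."""
    total = len(start_pitches)
    rows = []
    for i, pitch in enumerate(start_pitches):
        row = dict(pitch)  # copy so the input rows are left untouched
        row["reverse_pitch_number"] = total - 1 - i
        rows.append(row)
    return rows


# Hypothetical three-pitch "start"; values are made up for illustration.
start = [
    {"speed_mph": 93.1, "movement_in": 9.8},
    {"speed_mph": 92.4, "movement_in": 9.1},
    {"speed_mph": 91.0, "movement_in": 7.9},
]
rows = add_reverse_pitch_number(start)
print([r["reverse_pitch_number"] for r in rows])
```

Counting backward means the last pitch always has the same index regardless of how long the start was, so a network could in principle learn how far from the end the differences begin.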

      Posted 2/17/2009
  • pjensen
    +2

    You can ratchet down your cut, but the lower you go the less actual risk there is to the pitcher. This would lead to pitchers who weren't really in any danger being removed from the game.

    I am not sure that I would worry too much about this.  The important question is whether pitchers who display this fall-off in performance ever recover and return to pitching well during the game.  If not, then they should be pulled whether they are injured or not, and a determination of whether they actually are injured can be made by traditional diagnostics before the next start.

    Posted 2/17/2009
    • joshkalk
      +2
      You make a very good point there.  I certainly could add a runs100 check to this as well to determine how much a pitcher suffers when he gets near the injury zone.  If his performance is going to be poor anyway and he has a good chance of being injured, then there is no reason not to pull him.  I'll try to tie up this loose end when I tie up the loose end that John pointed out.  Thanks for the excellent input.
      Posted 2/17/2009
  • studes
    +2
    Wow.  Awesome, Josh.
    Posted 2/17/2009
  • richard.betzel
    +2
    A lot more goes into predicting injuries than studying pitch counts. If it were JUST pitch counts, relief pitchers would hardly ever break down. How often does a guy throw more than 40 pitches in relief? Likewise, recovery methods also play a huge role. The guy who goes out after a start and houses beers at the local watering hole is not helping his arm heal. Monitoring pitch counts is the first step. But examining the actual mechanics used for delivery and examining anthropometric data for the pitcher may prove more enlightening as to WHY certain guys break down at different points.
    Posted 2/17/2009
    • joshkalk
      +2
      I completely agree that pitch counts alone won't do the job.  I certainly would like to head in the direction that you suggest, but right now I am limited to the data that I have available.  Hopefully, when more and better data become available, we can start looking in a meaningful way into questions about why guys break down at different points.
      Posted 2/17/2009
  • wesketchum
    +2
    Great work! Can you make/show the ROC curve for the net output? I can kind of guess by eye, but it would be nice to see it in full (though I realize it's going to look very noisy).
    Posted 2/18/2009
    • joshkalk
      +2

      Here it is in all of its ugliness:

      baseball.bornbybits.com/THT/ROC.gif

      For those of you unfamiliar with this and interested in learning more, check here:

      en.wikipedia.org/wiki/Receiver_operating_characteristic
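For readers who want to see how such a curve is built, here is a minimal sketch that sweeps the cut across a set of network outputs and records the false-positive and true-positive rates at each step. The scores and labels are made-up values, not the article's data, and `roc_points` and `area_under_curve` are illustrative names.

```python
def roc_points(scores, labels):
    """Return (false_positive_rate, true_positive_rate) pairs, one per
    distinct threshold, for labels where 1 = injured and 0 = healthy."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort by score, highest first, so lowering the cut adds starts in order.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all starts tied at this score are counted.
        is_last_of_tie = i + 1 == len(ranked) or ranked[i + 1][0] != score
        if is_last_of_tie:
            points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical network outputs; 1 marks a start followed by a DL stint.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(auc)
```

Each point on the curve corresponds to one choice of cut, so it shows directly how many healthy pitchers get flagged for each additional injured pitcher caught.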
      Posted 2/18/2009
  • walshj58
    +2

    Hey Josh,

    Another question came to mind: did you use separate samples for training and for validation?  In other words, does the plot of the NN output contain the data that you trained the NN on? 

    A fairer test is to show the performance of the NN on data that have not been used for training.

    Just a thought. 
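A holdout test of the kind described here could be sketched as follows. This is an assumption-laden illustration, not the procedure used in the article: the class counts are hypothetical, and `stratified_split` is an illustrative helper that keeps some of the small injured sample in both parts.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices into train/test so both classes appear in the test set."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Hold out at least one example of every class.
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical sample: 10 injured starts (1) and 90 healthy starts (0).
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
held_out_injured = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), held_out_injured)
```

The network would be trained only on the `train_idx` rows, and the output plot would then be drawn from the `test_idx` rows alone.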

    Posted 2/18/2009
    • joshkalk
      +2
      Yes, the problem was that I didn't want to break up my signal sample because I thought it was already small enough.  The plan is to use 2009 as a testing ground for validation.
      Posted 2/18/2009
  • TheProgram
    +1
    Excellent analysis, Josh. It will help the baseball dummy (me) with drafting.
    Posted 2/18/2009
  • 1M1Ucla
    +1
    This is neat stuff -- I think there are a few factors that lower the ability to discriminate pitchers who are likely to hit the DL. One is the relatively coarse metric of going on the DL -- this happens for a variety of reasons (including roster management) and for differing levels of acuity. Some guys pitch a long time injured and in pain; some guys don't. There is also significant physiological and anatomical variation that makes some guys just more injury-prone (Mark Prior, for one). There are also several ways to get to the DL -- you mentioned that this is likely to be sudden, but anecdotally, the guys with a sudden acute episode are relatively few compared to the guys who develop the lesion as a more chronic condition that worsens over time. Given these confounding factors, the ability of your model to discriminate to the degree it does is impressive. I wonder, if the individuals were marked for prior (no pun intended) injury alone, would that improve the separation of the populations? Anyway, really cool stuff. Looking forward to more!
    Posted 2/20/2009
    • joshkalk
      +1
      A very well thought out paragraph there.  Let me try to address some of these issues.  First, going on the DL is indeed a very coarse metric and, as I mention in the article, I'd love to use a better metric, but that is all I have available to me at this point.  If you, or anyone, has a spreadsheet with more detailed information, I'd love to incorporate that.  The fact that I get such good discrimination with such a poor metric is indeed good news going forward.  Second, the hope is that comparing a pitcher to himself will help handle the issue of pitchers' differing durability levels.  For instance, Sabathia had a very heavy workload in terms of pitch counts, innings, and PAPs but hardly even registered in this study.  Third, I mentioned that landing on the DL could be sudden but generally isn't sudden at all.  Pitchers who ended up on the DL but who scored very low on this scale likely had sudden injuries.  You can see how relatively few of them there were.  It is very likely that this, or a similar metric, will never be able to do anything about these injuries.  If you really are pitching just like you normally would and then all of a sudden a tendon snaps, there just isn't much we can do as far as prevention.  Fortunately, this appears to be the exception, not the rule, and chronic conditions occur much more frequently.  Last, I am not exactly sure what you mean by marking certain individuals for prior injury.  Do you mean if they have been on the DL already this year, or on the DL the year before, or something like that?  If that is the case, then that certainly is something I could potentially add to the NN, but I am not sure how much it would add.  Pitchers who have been injured before should behave in a similar fashion when they are injured again, and hopefully that behavior is something this metric can pick up on.
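The idea of comparing a pitcher to himself can be illustrated with a small sketch. This is only one possible way to express it, not the calculation used in the article: `self_relative_score` is a hypothetical helper, and the movement values below are invented.

```python
from statistics import mean, stdev


def self_relative_score(baseline_values, recent_values):
    """How far the recent average sits from the pitcher's own baseline,
    in units of his baseline standard deviation."""
    spread = stdev(baseline_values)
    if spread == 0:
        return 0.0  # no variation in the baseline, so no meaningful scale
    return (mean(recent_values) - mean(baseline_values)) / spread


# Two hypothetical pitchers with different raw movement (inches) but the
# same drop relative to their own norms.
pitcher_a = self_relative_score([10, 9, 11, 10, 10], [8, 8.5, 7.5])
pitcher_b = self_relative_score([6, 5, 7, 6, 6], [4, 4.5, 3.5])
print(pitcher_a, pitcher_b)
```

Because each score is measured against the pitcher's own history, two pitchers with very different raw numbers can be put on the same scale.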
      Posted 2/20/2009
Blog Reactions

The injury zone - The Hardball Times
Beyond the Box Score: Josh Kalk's latest on pitching injuries is a major step, and one of the most, if not the most, advanced uses of PITCHf/x data to date.

The Injury Zone
THE BOOK--Playing The Percentages In Baseball — Josh keeps pumping out the great work: ...total movement is actually more important than speed. I’d think not only as a predictor of injuries, but of fatigue as well.  My guess is that the pitcher, knowing he doesn’t have as much bite on his pitches, will “reach back” for a little extra.  He gets the speed where he wants, but not the movement.

Kalk: Sabathia Not An Injury Risk
The Yankee Universe — ... The incredible Josh Kalk over at THT is doing some fascinating work with Pitchf/x data, and his latest installment is no exception. Although you are going to have to read his article to understand his methodology, the basic gist of the study was to try to utilize the data to better comprehend pitching injuries and the effects of overuse. The last paragraph in the piece should be heartening for Yankees fans: ...

Pitcher abuse, 'roids and Brosius
Pinstripe Alley — No, they're not related - just three links to the above topics. First off, good news regarding Carsten Charles (from a very interesting article about pitch counts and abuse): I want to close by looking at the pitcher who was ridden as hard as any in the game last year, C.C. Sabathia. Sabathia gobbled up innings late in the year as the Brewers desperately attempted to make the playoffs. There was much talk about the Brewers' right to use a rented player this way, and about then-manager Ned Yost in particular. Near the end of ...

Friday Filberts
ESPN Feed: neyer rob — ... . Everybody loves a two-fer!• Could one use PITCHf/x data to predict injuries? If anyone could, it'd be Josh Kalk . Enjoy him while you can, because within the next year some smart baseball team is going to snatch him away from us.• ShysterBall -- who is a professional and probably does know what he's talking about -- says the case against ...

Harper: Sabathia Workload A Problem?
The Yankee Universe — ... Harper’s concern is fair, although there is no reason to connect Sabathia’s performance yesterday to his high workload. However, this is an issue that we have tackled in the past, with the help of THT’s Josh Kalk. The quoted portion below contains part of Kalk’s article on using statistics to comprehend pitching injuries, as well as ...

Related Content
Yankees ready to take the field
yankees.lhblogs.com 2/22/2009 — CC Sabathia is back in action after some intestinal discomfort yesterday. He will throw BP today. Here are the pitching assignments today for live BP: Field 1 (starting at 12:25): Melancon, Tomko, Hacker, Brackman, De La Rosa, Claggett, ...
Sabathia On His New Team
slidingintohome.blogspot.com 1/19/2009 — From SI.com : "If you ask anybody in my family or anybody that knows me, I don't think there's any outside pressure that could be put on me that I don't put on myself," Sabathia said Saturday night before being honored with the Warren Spahn Award. "I put an enormous amount of pressure on ...
Sabathia will do his part — and more
yankees.lhblogs.com 2/14/2009 — Here’s what kind of pitcher the Yankees have on top of their rotation. When the Milwaukee Brewers were trying to make the playoffs last season, CC Sabathia was pitching deep into games and working on three days rest because they were short ...
One Bad (Spring) Start Must Mean It's Time To Panic
slidingintohome.blogspot.com 3/12/2009 — .. And panic about something completely unrelated to yesterday's start. From John Harper (hat-tips to Scott Proctor's Arm and Rob Neyer ) : CC Sabathia didn't forget how to pitch, let's get that out of the way first. Getting smacked around in early March is practically a rite of spring for ...
Sabathia content with simulated start | yankees.com: News
newyork.yankees.mlb.com 3/3/2009 — KISSIMMEE, Fla. -- Not long after his teammates boarded their bus for Kissimmee on Monday, Yankees left-hander CC Sabathia lumbered onto the mound at George M. Steinbrenner Field and began to throw. A small cluster of reporters and uniformed ...
The Big Man Hits the Big Town
futilityinfielder.com 12/11/2008 — Less than 24 hours after filing the K-Rod signing story for SI.com, I awoke to an email from their editor asking me to prep a similar article on CC Sabathia given the reports that he was poised to sign with the Yankees -- which was news to me at that ...
The Sabathia Opt-Out
jorgesaysno.blogspot.com 12/11/2008 — We look into why Sabathia's out clause after year 3 is great, for Sabathia.
Sabathia Got Pwned
thebuckychannel.com 4/7/2009 — Chances are this pitch thrown by CC Sabathia either went for a hit, was a ball four, or was a wild pitch. What we certainly know is that this pitch wasn't strike three, as Sabathia didn't throw a strikeout in a game for the first time since 2005. No matter what the result of this pitch was, it ...
CC Says Y-E-S to NY
hardballtimes.com 12/10/2008 — Jeez, you take two hours to write up a legal brief, and the whole world goes and gets itself in a hurry : CC Sabathia is not going to play on the West Coast. He is not going to play in the National League. CC Sabathia is going to be a Yankee, The Post has learned exclusively. After three ...
CC Sabathia Rumors: Monday
umpbump.com 12/11/2008 — Yep, we’re pretty much all sat around today checking on the latest CC Sabathia rumors . Here are all the rumors that mlbtraderumors.com missed (as always, the latest rumors are higher up): 10:40 pm: Ned Colletti has just been found in the MGM ...