-
walshj58 Excellent work, Josh. Really nice.
One question though, regarding this:
For instance, if I remove all the data below 0.5, the data that remain will contain about three times as many injured pitchers as healthy pitchers. This seems like a reasonable spot to me so I will call the region beyond 0.5 the injury zone. If a pitcher enters this zone he is much more likely to end up on the DL than he is to make his next scheduled start.
You obtain the factor of three only after normalizing both distributions to unit area, correct? I think the more relevant quantity is the proportion of injured pitchers in the sample with NN>.5 without normalizing. In other words, if in real life you have many more healthy pitchers than injured ones, your factor of three will be much reduced. In such a case, your last statement quoted above may not be true.
Can you show us the NN plot without the normalization applied?
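To put numbers on this, here is a minimal sketch. The efficiencies above the cut and the sample sizes are made up for illustration and are not taken from the article; the function name is hypothetical too.

```python
def injured_fraction_above_cut(tpr, fpr, n_injured, n_healthy):
    """Fraction of starts above the cut that belong to injured pitchers.

    tpr: share of injured starts scoring above the cut
    fpr: share of healthy starts scoring above the cut
    """
    injured_above = tpr * n_injured
    healthy_above = fpr * n_healthy
    return injured_above / (injured_above + healthy_above)


# Hypothetical efficiencies that give a 3:1 ratio once both curves have unit area.
tpr, fpr = 0.30, 0.10

# Equal sample sizes reproduce the normalized picture: 3 injured per healthy.
equal = injured_fraction_above_cut(tpr, fpr, n_injured=1000, n_healthy=1000)

# With 20 healthy starts per injured start, the region above the cut is mostly healthy.
skewed = injured_fraction_above_cut(tpr, fpr, n_injured=100, n_healthy=2000)

print(f"equal samples: {equal:.2f}")
print(f"20:1 healthy:  {skewed:.2f}")
```

Under these assumed numbers the same cut goes from three quarters injured to a small minority injured, which is why the unnormalized plot is the one to look at.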
Another question: does each pitch get a NN value or does a group of 10 pitches get a single NN value? If it's the former, it'd be interesting to see how consistent the 10 pitches are.
-
joshkalk You are absolutely right, John: it is the unnormalized graph that matters. Here is the regular scale plot
baseball.bornbybits.com/THT/full_NN.gif
and the log scale plot
baseball.bornbybits.com/THT/log_NN.gif
Thanks a lot for catching this quickly. I will try to do a quick edit on this.
As for your other question, I am sending the group of ten to the NN, but you make another good point about sending each pitch. I am going to have to tell the network the pitch number, probably in reverse, but then there should be no need to stop at ten: I should be able to send the whole start and let the NN decide where the differences begin. This will take some major recoding, so I won't have an answer right away, but I will get to work on that. Thanks.
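A rough sketch of what "pitch number in reverse" could look like as an input. The field names (speed, movement, pitches_from_end) and the values are placeholders, not the actual inputs used in the article.

```python
def with_reverse_pitch_number(pitches):
    """Attach a reverse pitch index: the last pitch of the start gets 1."""
    total = len(pitches)
    return [
        {**pitch, "pitches_from_end": total - i}
        for i, pitch in enumerate(pitches)
    ]


# Hypothetical three-pitch "start", oldest pitch first.
start = [
    {"speed": 93.1, "movement": 10.2},
    {"speed": 92.8, "movement": 9.9},
    {"speed": 91.5, "movement": 8.7},
]

rows = with_reverse_pitch_number(start)
print([r["pitches_from_end"] for r in rows])
```

Counting from the end lines up the final pitches of every start regardless of how long the start was, so the network is free to find where the differences begin.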
-
-
pjensen You can ratchet down your cut, but the lower you go the less actual risk there is to the pitcher. This would lead to pitchers who weren't really in any danger being removed from the game.
I am not sure that I would worry too much about this. The important question is whether pitchers who display this falloff in performance ever recover and return to pitching well during the game. If not, then they should be pulled whether they are injured or not, and a determination of whether they actually are injured can be made by traditional diagnostics before the next start.
-
joshkalk You make a very good point there. I certainly could add a runs100 check to this as well to determine how much a pitcher suffers when he gets near the injury zone. If his performance is going to be poor anyway and he has a good chance of being injured, then there is no reason not to pull him. I'll try to tie up this loose end when I tie up the loose end that John pointed out. Thanks for the excellent input.
-
-
studes Wow. Awesome, Josh. -
richard.betzel A lot more goes into predicting injuries than studying pitch counts. If it were JUST pitch counts, relief pitchers would hardly ever break down. How often does a guy throw more than 40 pitches in relief? Likewise, recovery methods also play a huge role. The guy who goes out after a start and houses beers at the local watering hole is not helping his arm heal. Monitoring pitch counts is the first step. But examining the actual mechanics used for delivery and examining anthropometric data for the pitcher may prove more enlightening as to WHY certain guys break down at different points.-
joshkalk I completely agree that just pitch counts won't do the job. I certainly would like to head in the direction that you suggest, but right now I am limited to the data that I have available. Hopefully, when more and better data becomes available, we can start looking in a meaningful way into questions about why guys break down at different points.
-
-
wesketchum Great work! Can you make/show the ROC curve for the net output? I can kind of guess by eye, but it would be nice to see it in full (though I realize it's going to look very noisy).-
joshkalk Here it is in all of its ugliness:
baseball.bornbybits.com/THT/ROC.gif
For those of you unfamiliar with this and interested in learning, check out here:
en.wikipedia.org/wiki/Receiver_operating_characteristic
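For anyone who wants to see how such a curve is built, here is a minimal sketch in plain Python. The scores and labels are toy values, not the NN output from the article, and the function names are just for illustration.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per cut.

    scores: NN outputs; labels: 1 for injured, 0 for healthy.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    # Lower the cut one distinct score at a time, highest first.
    for cut in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cut and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points (1.0 is perfect, 0.5 is chance)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data: three injured (1) and three healthy (0) starts.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
auc = area_under_curve(curve)
print(curve)
print(round(auc, 3))
```

With a small signal sample each injured start moves the curve by a visible step, which is where the noisy look comes from.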
-
-
walshj58 Hey Josh,
Another question came to mind: did you use separate samples for training and for validation? In other words, does the plot of the NN output contain the data that you trained the NN on?
A fairer test is to show the performance of the NN on data that have not been used for training.
Just a thought.
-
joshkalk Yes, the problem was that I didn't want to break my signal sample up because I thought it was already small enough. The plan is to use 2009 as a testing ground for validation.
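A minimal sketch of that kind of holdout, assuming each start is a record with a season field. The record layout and values here are hypothetical.

```python
def split_by_season(starts, holdout_season):
    """Train on every season except the holdout; validate on the holdout only."""
    train = [s for s in starts if s["season"] != holdout_season]
    test = [s for s in starts if s["season"] == holdout_season]
    return train, test


# Hypothetical records: one dict per start.
starts = [
    {"pitcher": "A", "season": 2007, "injured": 0},
    {"pitcher": "B", "season": 2008, "injured": 1},
    {"pitcher": "A", "season": 2008, "injured": 0},
    {"pitcher": "C", "season": 2009, "injured": 1},
    {"pitcher": "B", "season": 2009, "injured": 0},
]

train, test = split_by_season(starts, holdout_season=2009)
print(len(train), len(test))
```

Splitting by season rather than at random keeps every 2009 start out of training, so the NN output on 2009 is a fair test.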
-
-
TheProgram Excellent analysis Josh....it will help the baseball dummy (me) with drafting. -
1M1Ucla This is neat stuff -- I think there are a few factors that lower the ability to discriminate pitchers who are likely to hit the DL. One is the relatively coarse metric of going on the DL -- this happens for a variety of reasons (including roster management) and for differing levels of acuity. Some guys pitch a long time injured and in pain, some guys don't. There is also significant physiological and anatomical variation that makes some guys just more injury prone (Mark Prior, for one). There are also several ways to get to the DL -- you mentioned that this is likely to be sudden, but anecdotally, the number of guys with a sudden acute episode is relatively small compared to the guys who develop the lesion as a more chronic condition that worsens over time. Given these confounding factors, the ability of your model to discriminate to the degree it does is impressive. I wonder, if the individuals were marked for prior (no pun intended) injury alone, would that improve the separation of the populations? Anyway, really cool stuff. Looking forward to more!-
joshkalk A very well thought out paragraph there. Let me try to address some of these issues. First, going on the DL is indeed a very coarse metric and, as I mention in the article, I'd love to use a better metric, but that is all I have available to me at this point. If you, or anyone, has a spreadsheet with more detailed information, I'd love to incorporate that. The fact that I get such good discrimination with such a poor metric is indeed good news going forward. Second, the hope is that comparing a pitcher to himself will help handle issues of different pitchers' durability levels. For instance, Sabathia had a very heavy workload in terms of pitch counts, innings, and PAPs but hardly even registered in this study. Third, I mentioned that landing on the DL could be sudden but generally isn't sudden at all. Pitchers who ended up on the DL but who scored very low on this scale likely were sudden injuries. You can see how relatively few of them there were. It is very likely that this, or a similar metric, will never be able to do anything about these injuries. If you really are pitching just like you normally would and then all of a sudden a tendon snaps, there just isn't much we can do as far as prevention. Fortunately, this appears to be the exception, not the rule, and chronic conditions occur much more frequently. Last, I am not exactly sure what you mean by marking certain individuals for prior injury. Do you mean if they have been on the DL already this year or on the DL the year before or something like that? If that is the case, then that certainly is something I could potentially add to the NN, but I am not sure how much it would add. Pitchers who have been injured before should behave in a similar fashion when they are injured again, and hopefully that behavior is something this metric can pick up on.
-
The injury zone - The Hardball Times
Beyond the Box Score —
The injury zone - The Hardball Times
Josh Kalk's latest on pitching injuries is a major step, and one of the most, if not the most, advanced uses of PITCHf/x data to date.
The Injury Zone
THE BOOK--Playing The Percentages In Baseball —
Josh keeps pumping out the great work:
...total movement is actually more important than speed.
I’d think not only as a predictor of injuries, but of fatigue as well. My guess is that the pitcher, knowing he doesn’t have as much bite on his pitches, will “reach back” for a little extra. He gets the speed where he wants, but not the movement.
Kalk: Sabathia Not An Injury Risk
The Yankee Universe —
... The incredible Josh Kalk over at THT is doing some fascinating work with Pitchf/x data, and his latest installment is no exception. Although you are going to have to read his article to understand his methodology, the basic gist of the study was to try to utilize the data to better comprehend pitching injuries and the effects of overuse. The last paragraph in the piece should be heartening for Yankees fans: ...
Pitcher abuse, 'roids and Brosius
Pinstripe Alley —
No, they're not related - just three links to the above topics.
First off, good news regarding Carsten Charles (from a very interesting article about pitch counts and abuse):
I want to close by looking at the pitcher who was ridden as hard as any in the game last year, C.C. Sabathia. Sabathia gobbled up innings late in the year as the Brewers desperately attempted to make the playoffs. There was much talk about the Brewers' right to use a rented player this way, and about then-manager Ned Yost in particular. Near the end of ...
Friday Filberts
ESPN Feed: neyer rob —
... . Everybody loves a two-fer! • Could one use PITCHf/x data to predict injuries? If anyone could, it'd be Josh Kalk. Enjoy him while you can, because within the next year some smart baseball team is going to snatch him away from us. • ShysterBall -- who is a professional and probably does know what he's talking about -- says the case against ...
Harper: Sabathia Workload A Problem?
The Yankee Universe —
... Harper’s concern is fair, although there is no reason to connect Sabathia’s performance yesterday to his high workload. However, this is an issue that we have tackled in the past, with the help of THT’s Josh Kalk. The quoted portion below contains part of Kalk’s article on using statistics to comprehend pitching injuries, as well as ...



