-
sdanne Nice job, although I didn't understand the reasoning behind squaring the linear weights values of each event. And I'm not sure how GIDP or CS are positive relative to a single, either.
You should read this article from almost a year ago...
http://statspeak.net/2007/11/stats-204-the-proximity-matrix-or-re-visioning-similarity-scores.html
There's a "Zach" commenting on that thread, not sure if it's you or if it's just a coincidence. Two different ways to attack the same problem.
-
zwalters I'm the Zach in the comments from that previous article, as you suspect. It took me a while to follow up because my first SQL script was terribly inefficient.
-
jcdorhauer Good stuff. It is articles like this that keep me logging on daily to THT - simply the best at analysis, history, and context. I would love to see the comps on Bob Gibson's 1968 season: 13 shutouts and a 1.12 ERA. I note that all of your comps are for offensive stats, which raises the question of a parallel article for pitching comps.
-
ger8ry The trouble with Arlie Latham & Hugh Nicol is that a stolen base didn't mean then what it does today. Those guys could be credited with a steal for going from 1st to 3rd on a single.
Run-based similarity scores
THE BOOK--Playing The Percentages In Baseball
Great work, though I disagree with it. Pizza Cutter did similar work based on rate stats, and I left lots of comments on his thread. My key point is this:
If you are interested in looking for similar players to Vince Coleman, you may insist that the speed components (3b per 2b+3b and sb per sbOpp) be weighted much more than you otherwise would, because you are really interested in the speed players mostly.
So, in a run-based system, the speed components simply won’t have much ...
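To make the weighting point concrete, here is a minimal sketch in Python. It is not the method used in either article. The player names, rates, and weights are all made up for illustration, and the weighted Euclidean distance is just one assumed way of combining components. It shows how the "most similar" player can change once the speed components (triples per double-plus-triple, steals per opportunity) are given more weight.

```python
import math

# Hypothetical component rates; every number here is invented for illustration.
players = {
    "target":   {"triple_rate": 0.30, "sb_rate": 0.45, "hr_rate": 0.005, "bb_rate": 0.080},
    "player_a": {"triple_rate": 0.28, "sb_rate": 0.42, "hr_rate": 0.040, "bb_rate": 0.120},
    "player_b": {"triple_rate": 0.26, "sb_rate": 0.42, "hr_rate": 0.006, "bb_rate": 0.085},
}

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance over the components named in `weights`."""
    return math.sqrt(sum(w * (a[k] - b[k]) ** 2 for k, w in weights.items()))

def most_similar(target, pool, weights):
    """Return the name of the player in `pool` closest to `target`."""
    others = [name for name in pool if name != target]
    return min(others, key=lambda n: weighted_distance(pool[target], pool[n], weights))

# Every component counts equally.
equal_weights = {"triple_rate": 1, "sb_rate": 1, "hr_rate": 1, "bb_rate": 1}
# Assumed weights that emphasize the two speed components.
speed_weights = {"triple_rate": 5, "sb_rate": 5, "hr_rate": 1, "bb_rate": 1}

print(most_similar("target", players, equal_weights))
print(most_similar("target", players, speed_weights))
```

With equal weights the closest match is the player whose overall profile is nearest. With the speed weights, the player who matches on triples and steals wins despite differing more in power and walks. A real system would also need to put the components on a common scale before weighting them.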


