There’s been a lot of discussion/arguing about hitting mechanics and their relationship with success over at VEB recently. The sabermetric crowd (in which I clearly include myself) has proposed a number of different analytical studies to “test” the current hitting theories and estimate the size of their effects. I figured I’d outline here what I’m specifically proposing.

I’d want someone (doesn’t have to be Chris, and can even be multiple people) to fill out the following:

The variables are coded 1 if the player has the flaw and 0 if he does not. If the grader wanted a “sometimes” choice, I’d guess we could work that in. The overall grade is a scouting grade on swing mechanics. I’d probably look at multiple response variables, including but not limited to wOBA, ISO (or some other power metric), BA, etc. I wouldn’t want it to be a swing-by-swing results analysis, but rather a generalized season-level look.
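To make the coding concrete, here is a minimal sketch of what one graded player-season could look like as a record. Everything in it is an assumption for illustration: the class and function names are made up, `flaw_b` and `flaw_c` are stand-ins for whatever flaws the grader picks, and the numbers are placeholders rather than real player data.

```python
from dataclasses import dataclass

# Hypothetical flaw names; the real list would come from the grader.
FLAWS = ("leaky_back_elbow", "flaw_b", "flaw_c")

@dataclass
class SeasonGrade:
    player: str
    season: int
    flaws: dict          # flaw name -> 1 (has the flaw) or 0 (does not)
    overall_grade: int   # scouting grade on swing mechanics (scale is the grader's call)
    woba: float          # one season-level response variable

    def validate(self):
        """Raise ValueError if a flaw is missing or coded as anything but 0/1."""
        for name in FLAWS:
            if self.flaws.get(name) not in (0, 1):
                raise ValueError(f"{self.player}: {name} must be coded 0 or 1")

def to_row(g):
    """Flatten one graded season into a list: flaw codes, then grade, then wOBA."""
    g.validate()
    return [g.flaws[n] for n in FLAWS] + [g.overall_grade, g.woba]

# Made-up placeholder values, not real player data.
example = SeasonGrade(
    "Player A", 2010,
    {"leaky_back_elbow": 1, "flaw_b": 0, "flaw_c": 0},
    50, 0.330,
)
row = to_row(example)
```

A “sometimes” choice could be handled by relaxing the 0/1 check in `validate`, though that changes how the flaw would be interpreted later.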

I’d let experts identify which flaws were important enough to code in.

So where are the pitfalls? One big assumption is that the swing is reasonably stable over a season. I’d want the grader to look at swings from a few time frames to vet this assumption. Clearly there could be some results bias too. The grader would have to try not to let a player’s results factor into the grading (especially the overall category).
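One rough way to vet the stability assumption would be to have the grader code the same flaw from swings in a few separate time windows and see how often the coding agrees. The sketch below assumes three windows per player; the window labels, function names, and grades are all made up for illustration.

```python
# Hypothetical grades for one flaw, taken from swings in three time windows.
# Values are made-up placeholders, not real observations.
grades_by_window = {
    "Player A": {"early": 1, "mid": 1, "late": 1},
    "Player B": {"early": 0, "mid": 1, "late": 1},
    "Player C": {"early": 0, "mid": 0, "late": 0},
}

def is_stable(window_grades):
    """True if the flaw was coded the same way in every window."""
    return len(set(window_grades.values())) == 1

def stability_report(grades):
    """Return (share of players with a stable coding, list of unstable players)."""
    unstable = [p for p, w in grades.items() if not is_stable(w)]
    share = 1 - len(unstable) / len(grades)
    return share, unstable

share, unstable = stability_report(grades_by_window)
```

Players who land in `unstable` are the ones for whom a single season-level 0/1 code is shakiest, so they may deserve a second look or the “sometimes” choice.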

EDIT: Posted a little prematurely, but y’all get the idea, I think. Ideally this would be multi-year too, so as to do a WOWY (with-or-without-you) comparison if possible.

Steve Sommer

Simulation analyst by day, father and baseball nerd by night

9 Responses to “What I'm Proposing”

  1. It seems like binary variables would make it hard to run a proper regression. Why not just have him grade the degree of flaw on a 1-10 scale or something?

    • That would depend on how accurate the various grades are. An ordinal variable would only really work if the difference between a 3 and a 4 was the same as the difference between an 8 and a 9, after all.

    • I’d do it this way initially for a couple of reasons. I think it lessens subjectivity some, and thus hopefully reduces bias. Additionally, I think it’s easier to interpret the results if the variables are binary (you either have the flaw or you don’t). What do you do with the information that going from a 4 to a 5 in leaky back elbow lowers your wOBA by 0.005? It just seems hard to understand the 1-10 scale. That being said, I think I’d lean toward letting the scout pick the scale.

      • Well, I would assume you’d want to run the thing in a regression. How else would you tease out the multicollinearity and measure the relative importance of each individual flaw? Would a regression work if you regressed wOBA on 4 or 5 binary variables?

        I agree with Valentine that this would be simpler and may reduce bias; however, I disagree that the subjectivity in deciding what’s worse, 3-4 or 8-9, is a damning quality. Isn’t it much worse to assume that a hitter either has the flaw or doesn’t? That simplifies everything into either perfect or terrible.

        I would much rather have the 1-10 scale, all things being equal, but I suppose that’s not really what Chris is going for in his analyses.

      • Yes, I would use regression. I think it would be fine with 4-5 binary (categorical-style) variables. I wouldn’t be locked into that approach, though. If the subject matter expert (a scout) thought that the variables would be better expressed on a different scale, then I would be fine with that too.

        I like the binary nature mainly for the ease of interpretation of results.
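As a rough illustration of the interpretation point, here is a small ordinary least squares fit of wOBA on two binary flaws, written out with the normal equations so it needs no libraries. The function names and the data are made up: wOBA is constructed as exactly 0.340 - 0.020*flaw_1 - 0.010*flaw_2, so the fit should simply hand those numbers back. Real seasons would be noisy and would need far more players.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(x_rows, y):
    """Least-squares coefficients for y on x_rows, with an intercept added first."""
    x = [[1.0] + list(r) for r in x_rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Made-up seasons: two binary flaws, and wOBA built as exactly
# 0.340 - 0.020*flaw_1 - 0.010*flaw_2 so the fit can be checked by eye.
flaws = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
woba = [0.340 - 0.020 * f1 - 0.010 * f2 for f1, f2 in flaws]
intercept, b_flaw_1, b_flaw_2 = ols(flaws, woba)
```

Each flaw coefficient reads directly as the estimated wOBA difference between having and not having that flaw, with the other flaw held fixed, which is the ease of interpretation being argued for here.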

      • The scout is not the subject matter expert regarding which way the expression of the variables is more conducive to running a regression. That’s us.

        Again, I would think that running a regression with 5 (or however many categories TPG chooses) dummy variables would not produce usable results. What I’m reading online is that unless you exclude the intercept, running a regression with all dummy variables yields perfect multicollinearity.

      • Fair enough, you’ve researched it more than I. In the long run I don’t really care one way or the other. 1-10 is fine with me.

        I wasn’t suggesting the scout was the SME in how to create the variables re: regression; just that they have a good feel for the levels of the variables. Might as well express the variable in a way that matches the levels the scout already sees so they are apt to fill it out.

      • I believe the perfect collinearity arises only if you use all of the sub-components of a dummy variable with multiple categories (i.e. for a 4-category variable you’d use 3 dummy variables rather than all 4).

        I think it’s less applicable to having multiple separate variables.
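The distinction can be checked with a quick rank calculation on the design matrix. This is only a sketch with made-up rows and a hypothetical `rank` helper: it compares an intercept plus all 4 dummies of one 4-category variable, an intercept plus 3 of those dummies, and an intercept plus two separate yes/no flaws.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in r] for r in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(len(m)):
            if r != rk and m[r][col] != 0:
                f = m[r][col] / m[rk][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[rk])]
        rk += 1
    return rk

# One 4-category variable (categories 0..3) observed on six made-up players.
cats = [0, 1, 2, 3, 0, 2]
all_four = [[1] + [int(c == k) for k in range(4)] for c in cats]      # intercept + 4 dummies
drop_one = [[1] + [int(c == k) for k in range(1, 4)] for c in cats]   # intercept + 3 dummies

# Two separate yes/no flaws, each entered once alongside an intercept.
separate = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0], [1, 0, 0]]

deficient = rank(all_four) < len(all_four[0])    # True: the 4 dummies sum to the intercept
full_drop = rank(drop_one) == len(drop_one[0])   # True once one category is dropped
full_sep = rank(separate) == len(separate[0])    # True for separate binary flaws
```

Separate binary flaws can still end up collinear if two of them always show up together in the players graded, but that would be a feature of the data rather than of the coding scheme.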

      • OK, that makes sense. I’m a little fuzzy on the statistics of it anyway. I think that Chris might prefer to do a binary variable anyway, so if it’s able to be manipulated then it should be fine.

© 2011 Gas House Graphs