Over at BtB I did a post on clustering (grouping, for the non-math-inclined) hitters based on various stats (typically batted ball information combined with discipline information). I ended up discussing the clusters that resulted from using line drive rate (LD%), home runs per fly ball (HR/FB%), and walk rate (BB%). The following table summarizes the qualifying Cardinals:
| Name | LD% | HR/FB% | BB% | Cluster | wOBA | Cluster wOBA |
|---|---|---|---|---|---|---|
| Albert Pujols | 16% | 20% | 17% | 7 | 0.449 | 0.398 |
| Matt Holliday | 16% | 13% | 11% | 6 | 0.390 | 0.363 |
| Ryan Ludwick | 19% | 12% | 8% | 5 | 0.336 | 0.362 |
| Colby Rasmus | 20% | 9% | 7% | 3 | 0.311 | 0.330 |
| Skip Schumaker | 22% | 5% | 9% | 9 | 0.336 | 0.343 |
| Yadier Molina | 20% | 5% | 9% | 9 | 0.337 | 0.343 |
So what does this mean? Basically, if you average the wOBA of all the hitters whose LD%, HR/FB%, and BB% are similar to a listed player's, you get the number in the last column (click here to see the full sets of clusters based on various stat sets). Is this predictive? Are those below their cluster's wOBA due for an improvement, while those above it are due for regression? As Tango points out, not necessarily. There's clearly bias from the data sets used to generate the clusters, so more work would be needed to find the right data elements to include in the clustering. With that being said, it's still interesting to see what types of players are clustered together.
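For anyone who wants to poke at the numbers, here is a minimal sketch of the comparison described above, using only the six rows from the table. The function names (`gap_vs_cluster`, `mean_woba_by_cluster`) are made up for illustration and are not from the BtB post, and the sketch says nothing about how the clusters themselves were generated. It just shows each player's wOBA minus his cluster's wOBA, and how a within-cluster average would be computed.

```python
from collections import defaultdict

# (name, cluster, wOBA, cluster wOBA), copied from the table above
CARDINALS = [
    ("Albert Pujols", 7, 0.449, 0.398),
    ("Matt Holliday", 6, 0.390, 0.363),
    ("Ryan Ludwick", 5, 0.336, 0.362),
    ("Colby Rasmus", 3, 0.311, 0.330),
    ("Skip Schumaker", 9, 0.336, 0.343),
    ("Yadier Molina", 9, 0.337, 0.343),
]


def gap_vs_cluster(rows):
    """Return {name: player wOBA minus cluster wOBA}, rounded to 3 places."""
    return {
        name: round(woba - cluster_woba, 3)
        for name, _cluster, woba, cluster_woba in rows
    }


def mean_woba_by_cluster(rows):
    """Average wOBA over the given players only, grouped by cluster label."""
    groups = defaultdict(list)
    for _name, cluster, woba, _cluster_woba in rows:
        groups[cluster].append(woba)
    return {cluster: sum(vals) / len(vals) for cluster, vals in groups.items()}


if __name__ == "__main__":
    # Print players from furthest below their cluster to furthest above.
    gaps = gap_vs_cluster(CARDINALS)
    for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
        print(f"{name:16s} {gap:+.3f}")
```

Note that averaging only the two Cardinals in cluster 9 gives about 0.3365, not the 0.343 in the table, because the cluster wOBA column is an average over every hitter in the cluster, not just the Cardinals. A gap in either direction is descriptive only. Per the caveat above, it shouldn't be read as a forecast.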
*****
On an unrelated note, tomorrow I'll be attending the St. Louis Chapter of SABR's hot stove luncheon, where I'll also be sitting on a panel discussing Cardinals blogging. I'll be sure to report back with any interesting tidbits from the day.
*****
Final unrelated note: this website amuses me.