Created
August 4, 2012 22:34
Sabermetrics, Scouting and the Science of Baseball 2012 (saberseminar) Notes: Day 1
Sabermetrics, Scouting and the Science of Baseball | |
August 4, 2012 | |
Day One | |
== | |
The Science of Catching Balls | |
Dr. Michael K. McBeath | |
Dept. of Psychology @ ASU | |
Adjunct Neuroscience, Life Sciences, EE, Kinesiology, and Arts, Media & Engr. | |
1) What - Catching Fly Balls - Dynamic coupling of the environment | |
* Interception Theory - Models of Catching Baseballs | |
** Fielders don't perform complex physics calcs to track down balls | |
** Can't easily predict where ball will land based on initial trajectory | |
** Non-deterministic, real-time dynamics necessary | |
* Principle 1: Dynamic Coupling Loop | |
** Fielder -> Guided Actions (top-down) -> Environment -> Perceived | |
Information (bottom up) | |
* Optical Acceleration Cancellation (OAC) Interception Control Mechanism | |
** "elevator model" ball coming directly at you. 2d model | |
** keep ball 'elevating' visually at a constant speed. | |
* Linear Optical Trajectory (LOT) Interception Control Mechanism | |
** Similar model to OAC, but for the perpendicular (side-to-side) direction | |
** "keep it going in a straight line and you'll be guided to the right place" | |
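The OAC idea can be checked numerically. The sketch below is a minimal illustration, assuming a drag-free 2D trajectory and a stationary fielder; the launch speeds and the function names are hypothetical and not from the talk. It compares the "optical acceleration" (second time derivative of the tangent of the ball's elevation angle) for a fielder standing at, in front of, and behind the landing point.

```python
G = 9.81  # gravitational acceleration in m/s^2


def tan_elevation(t, vx, vy, fielder_x):
    """Tangent of the ball's elevation angle as seen by a fielder at fielder_x."""
    ball_x = vx * t
    ball_y = vy * t - 0.5 * G * t * t
    return ball_y / (fielder_x - ball_x)


def optical_acceleration(t, vx, vy, fielder_x, dt=0.05):
    """Second time derivative of tan(elevation), by central differences."""
    before = tan_elevation(t - dt, vx, vy, fielder_x)
    now = tan_elevation(t, vx, vy, fielder_x)
    after = tan_elevation(t + dt, vx, vy, fielder_x)
    return (after - 2.0 * now + before) / (dt * dt)


# Hypothetical fly ball: 20 m/s horizontal and 20 m/s vertical launch speed.
vx, vy = 20.0, 20.0
landing_x = vx * (2.0 * vy / G)  # drag-free landing point

for label, offset in [("too shallow", -10.0), ("at landing point", 0.0), ("too deep", 10.0)]:
    accel = optical_acceleration(1.0, vx, vy, landing_x + offset)
    print(f"{label:>16}: optical acceleration = {accel:+.5f}")
```

Under these assumptions the image accelerates upward when the fielder is too shallow, decelerates when the fielder is too deep, and rises at a constant rate only at the landing point, which is the quantity the "elevator model" asks the fielder to hold constant.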
2) Who - Frisbee Dogs - Natural selection of common mechanisms | |
* Do Dogs Do Calculus? | |
* Timothy J. Pennings - The College Mathematics Journal | |
** The spot where a dog enters the water to intercept a thrown ball approximates | |
the 'best' value as predicted by calculus | |
* Do dogs use LOT? Frisbee catching dogs. | |
** Yes, it appears so. 94% of variance; data mirrors human data | |
** Hawks (and other animals) | |
3) Where - Grounders & Tag - Actor-vantage basis of general mechanisms | |
* Pursuers Use Same Simple Navigational Heuristics to intercept airborne and | |
ground based targets (Catching Grounders and Robots Moving along Complex | |
Pathways) | |
* Mo-cap | |
* Does LOT apply to ground balls as well? | |
* Flip LOT model on its head; keep the image moving in a straight line | |
downward | |
* also applies for complex paths (robot weaving random paths toward | |
interceptor) | |
* small adjustments until radical reset of line necessary | |
4) How - Robotic Catching - Balanced functionality weighing costs-benefit | |
* Autonomous Ball-Catching Robots | |
* attempt to show that LOT model would work for robots to provide converging | |
evidence of LOT's | |
** Passive Control Algo - Fixed camera angle, ball image rises | |
** Active Control Algo - camera rotates up, ball image remains centered | |
** Does interception control utilize level coordinates or tilt to be | |
parallel to the ground? (latter) | |
5) When - Using Others' Gazes - parallel processing to enhance prediction | |
* can you use movements of other fielders to predict ball destination | |
** Yes. | |
6) Which - Football Pass Judgment - Natural regularities focus of coupled | |
alignment | |
* Perception of motion can be affected by background motion (but it doesn't | |
seem to affect fielders, only outside observers) | |
== | |
Bobby Valentine | |
Manager, Boston Red Sox | |
== | |
Offensive Value Percentage (OVP) and Coaching | |
Matt Fincher, Bench Coach, USC Upstate | |
Tim Bogar, Bench Coach, Boston Red Sox | |
* Historically baseball stats have counted positives | |
* Sabermetrics have changed this somewhat | |
* What happens in the 7.5/10 ABs when a hitter fails? | |
* Are the ways in which players make outs important? | |
* OVP - Who produces offensively beyond current metrics | |
** What about outs which still help the offense (sacs, etc) | |
* Intentionally omitted power metrics from OVP. | |
** Adequately measured elsewhere | |
* Hitter tries to accrue base hits | |
* Batter will manage an AB | |
Eight Batting Components of OVP: | |
1. H | |
2. BB | |
3. HBP | |
4. Sac (bunt and fly) | |
5. Adv. the lead baserunner on an out | |
6. Eight or more pitch AB | |
7. GIDP/TP (Line out DP/TP only count as one, covered by defensive metrics) | |
8. First Pitch out of any inning | |
* Team that throws fewest pitches wins a higher percentage of the time | |
* 8 or more pitches means the hitter or subsequent hitter reaches at a higher | |
percentage | |
Six Baserunning Components of OVP: | |
1. Stolen Base | |
2. Going 1st to 3rd on any single (situational, but good metric for | |
baserunning ability) | |
3. Caught Stealing | |
4. Picked Off | |
5. Doubled Off | |
6. Out on Bases while trying to advance | |
Calculating OVP | |
Step 1: | |
Add together: AB, BB, HBP, SAC, SB, 1b->3b, CS, PO, Doubled Off, Out | |
attempting to advance | |
Step 2: | |
Total Positive Opps | |
Hits, BB, HBP, Sac, Lead Runners Advanced, 8 pitch (slide went by too fast) | |
Step 3: | |
Subtract GIDP (-2 for GITP) and first-pitch outs of an inning from the OVP positives | |
Step 4: | |
Divide calc OVP positives by total # of OVP Opps | |
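The four steps can be sketched as a short function. This is only a reading of the notes, not the presenters' official formula: the positive list is the incomplete one captured above, the penalty weights (1 per GIDP, 2 per GITP, 1 per first-pitch out) are an assumed interpretation of Step 3, and every key name and the sample stat line are hypothetical.

```python
# Step 1 opportunity categories, as listed in the notes.
OPPORTUNITY_KEYS = ["AB", "BB", "HBP", "SAC", "SB", "first_to_third",
                    "CS", "PO", "doubled_off", "out_advancing"]
# Step 2 positive categories; the notes say this list is incomplete.
POSITIVE_KEYS = ["H", "BB", "HBP", "SAC", "lead_runner_advanced", "eight_pitch_ab"]


def ovp(stats):
    """Offensive Value Percentage under the assumptions described above."""
    opportunities = sum(stats[key] for key in OPPORTUNITY_KEYS)
    positives = sum(stats[key] for key in POSITIVE_KEYS)
    # Step 3 (assumed weights): 1 per GIDP, 2 per GITP, 1 per first-pitch out.
    penalties = stats["GIDP"] + 2 * stats["GITP"] + stats["first_pitch_out"]
    # Step 4: net positives divided by total opportunities.
    return (positives - penalties) / opportunities


# Made-up season line, for illustration only.
season = {
    "AB": 500, "H": 140, "BB": 50, "HBP": 5, "SAC": 5,
    "SB": 10, "first_to_third": 8, "CS": 3, "PO": 1,
    "doubled_off": 1, "out_advancing": 2,
    "lead_runner_advanced": 30, "eight_pitch_ab": 20,
    "GIDP": 10, "GITP": 0, "first_pitch_out": 5,
}
print(f"OVP = {ovp(season):.3f}")
```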
== | |
Statistics | |
Tom Tippett | |
Boston Red Sox | |
== | |
Umpires and the Human Element | |
Dr. Dan Brooks, Postdoc Brown University Neuroscience | |
Baseball Prospectus | |
brooksbaseball.net | |
* We ask umps make a binary judgment over an unmarked area of space of an | |
object traveling imperceptibly fast by a person attempting to be deceptive | |
* We treat the zone as if there is no grey area, that it is a sharp box | |
* In reality, what does the edge look like? | |
* n-parameter fermi function/logistic function | |
* y = a + b ./ (1+exp(-(x-m)/s)) | |
** y = strike proportion | |
** x = horizontal location | |
** a = lower bound (0 because no negative strikes) | |
** b = upper bound (1 because upper bound 100% strikes) | |
** m = "position on axis" (50% point of curve [m for midpoint]) | |
** s = "sharpness" (sharpness of curve. would be hard step in perfect | |
strike zone) | |
* use pitchf/x | |
* subset to taken pitches | |
* bin by 1" from +3/-3 feet for LHH/RHH | |
* use middle 1/3rd of vertical zone | |
* trim 99 total umps to 68 using criteria that they saw (1000 total pitches)? | |
All umpires: (outside to LHH) | |
a = 0.00055662 | |
b = 0.99245 | |
m = -14.3609 | |
s = 1.3412 | |
mse = 4.41e-005 | |
m param is almost 14.5, pretty shifted | |
s param is 1.35, pretty soft | |
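To make the fitted curve concrete, here is a minimal sketch that evaluates the logistic from the notes with the all-umpire parameters. The function names are hypothetical, x is presumably in inches (the pitches were binned by 1"), and the width helper assumes a = 0 and b = 1, which is close to but not exactly the fitted values.

```python
import math


def strike_proportion(x, a, b, m, s):
    """Four-parameter logistic from the notes: y = a + b / (1 + exp(-(x - m) / s))."""
    return a + b / (1.0 + math.exp(-(x - m) / s))


def edge_width(s, low=0.1, high=0.9):
    """Distance over which the curve climbs from `low` to `high`, taking a = 0 and b = 1."""
    logit = lambda p: math.log(p / (1.0 - p))
    return abs(s) * (logit(high) - logit(low))


# All-umpire fit for the edge outside to LHH, copied from the notes.
ALL_UMPS = {"a": 0.00055662, "b": 0.99245, "m": -14.3609, "s": 1.3412}

for x in (-18, -16, -14.3609, -12, -10):
    print(f"x = {x:9.4f}  strike proportion = {strike_proportion(x, **ALL_UMPS):.3f}")
print(f"10%-90% width = {edge_width(ALL_UMPS['s']):.2f}")
```

At x = m the curve sits at a + b/2, roughly a coin flip, and a smaller s squeezes the 10%-90% band toward the hard step of an idealized zone.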
Individual Umps: | |
CB Bucknor | |
m = -12.6443 | |
s = 1.524 | |
Tim Timmons: | |
m = -14.46 | |
s = 1.2743 | |
Wally Bell | |
m = -15.054 | |
s = 1.5183 | |
Mike Reilly | |
m = -14.339 | |
s = 0.89095 | |
Mike DiMuro | |
m = -14.8978 | |
s = 1.4 | |
The LHH Strikezone | |
m = -14.3509 | |
s = 1.3512 | |
m = 9.8661 | |
s = -1.5202 | |
RHH | |
m = -12.5627 | |
s = 1.4793 | |
m = 11.9625 | |
s = -1.3729 | |
* mparam indicates shifted LHH zone relative to RHH zone | |
* is this statistically reliable? It appears so. | |
* sparam indicates sharper zones on the outside of plate | |
* is this statistically reliable? It appears so. | |
* no correlations between m and s | |
* different pitch types? | |
* use pitch data from PitchInfo, create logistics for FB/OS | |
fb m = -14.56 | |
s = 1.3709 | |
os m = -14.1934 | |
s = ... | |
Conclusions: | |
* diff between FB and OS called on inside edge of zone for both LHH & RHH | |
* fits with idea that outside edge of zone is better discriminated | |
== | |
What Steroids do that Years of Experience Can't | |
Dr. Rich B. Ivry | |
Prof. Psychology, UCBerkeley, Cognition and Action Lab (CogAc) | |
== | |
Trackman | |
Alan Nathan | |
Doppler radar, RoC of distance of ball | |
want to determine location of the baseball as a function of time | |
doppler shift determines dr/dt | |
two wave phase shift determines r(0) (initial dist) | |
3 detector array phase shifts determines angles | |
everything Pitchf/x PLUS | |
actual release point (perceived velo) | |
total spin | |
many more trajectory points | |
also batted ball speed, launch and spray angles | |
(equiv to HITf/x) | |
landing point at ground level and hang time | |
initial spin | |
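The first two measurements combine into a distance track: the Doppler shift gives dr/dt, and integrating from r(0) gives r(t). The sketch below uses the standard two-way radar relation (shift = 2 * v * f0 / c); the carrier frequency, sample interval, and Doppler samples are made-up assumptions, not TrackMan specifications.

```python
C = 299_792_458.0  # speed of light in m/s


def radial_velocity(doppler_shift_hz, carrier_hz):
    """dr/dt in m/s; a positive shift (ball approaching) gives a negative dr/dt."""
    return -C * doppler_shift_hz / (2.0 * carrier_hz)


def range_track(r0, doppler_shifts_hz, carrier_hz, dt):
    """Integrate dr/dt forward from the initial distance r0, one sample per dt seconds."""
    distances = [r0]
    for shift in doppler_shifts_hz:
        distances.append(distances[-1] + radial_velocity(shift, carrier_hz) * dt)
    return distances


CARRIER_HZ = 10.5e9  # hypothetical carrier frequency, not a figure from the talk
shifts = [2800.0, 2780.0, 2760.0]  # made-up Doppler samples in Hz
print(range_track(18.0, shifts, CARRIER_HZ, dt=0.01))
```

The angles from the detector array would then turn each r(t) into a 3D position, which this sketch does not attempt.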
== | |
Abstracts | |
=== | |
Mining the Evolution of Pitch Sequences for Career Performance Evaluation | |
Daniel LC Mack, et al, Vanderbilt University, Inst. for Software Integrated Sys | |
Building heat maps for various two-pitch sequences for an individual player | |
over time | |
Are there enough patterns in history to model future performance? | |
Potential: | |
* isolate strats for batters | |
* find relationships between strats and injury | |
Future: | |
* Better type classification than MLBAM, more specific locations | |
* Improve heat maps to illustrate relationships over time better | |
=== | |
WAR. Huh?? Yeah! What is it Good For? | |
Glenn DuPaul and George DuPaul, Lehigh University | |
^ Beyond the Box Score and Hardball Times | |
* Eval the ability of WAR | |
** To describe perf in a given season | |
** Predict future perf in a subsequent season | |
* Analysis of the assoc between WAR vs Actual Wins | |
* Sens to diff in perf of playoff teams between leagues | |
* Pred of wins in a subsequent season | |
* Random samp of 80 teams 96-11 | |
* Calc cum WAR for each team and regressed it against actual wins | |
* Replacement-level teams exp to win between 46-52 games (i.e. WINS = | |
52+WAR) | |
* Used baseball ref WAR | |
* 83% variance in wins is accounted for in WAR | |
between 96-11 NL and AL playoff teams | |
sig diff in mean war between NL and AL; not signif diff in win totals | |
* predictive | |
** rand sample 30 06-11 | |
** summed predictor year WARs to project the win total of the subsequent | |
seez | |
Single seez WAR adequately describes what has happened | |
from 96-11 it took more talent to reach the PS for AL teams than NL, but for | |
some reason not more wins (DH? or # teams?) | |
* Sing seez WAR falls short in predicting future win totals (even for next | |
seez) | |
** too often saber-followers cite WAR in one year as a predictor of | |
subsequent year wins | |
=== | |
Run Expectancy | |
Women's Div I Softball | |
Jon Nachtigal | |
* NCAA first published p-by-p data in 11 | |
* 130 D1 playoff games | |
* 7636 plays | |
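Run expectancy (runs) by runners on base (rows; 0 = bases empty) and outs (columns) | |
Runners   0 outs   1 out   2 outs | |
0         0.518    0.287   0.100 | |
1         0.959    0.556   0.286 | |
2         1.032    0.890   0.356 | |
12        1.733    1.026   0.379 | |
3         1.000    0.915   0.545 | |
13        1.720    1.678   0.611 | |
23        1.615    1.589   0.744 | |
123       3.054    1.930   0.658 | |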
SB break even point | |
runn 1b, 0 outs, 90% success rate | |
runn 1b, 1 outs, 60% success rate | |
runn 1b, 2 outs, 80% success rate | |
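The break-even rates follow from the run expectancy figures, reading each value as runs with the decimal point restored (0518 as 0.518). The sketch below is a minimal reconstruction, assuming the usual break-even definition (the success rate at which attempting the steal has the same run expectancy as staying put); the dictionary layout and function name are hypothetical.

```python
# Run expectancy keyed by (runners, outs); values transcribed from the notes.
RUN_EXPECTANCY = {
    ("empty", 0): 0.518, ("empty", 1): 0.287, ("empty", 2): 0.100,
    ("1b", 0): 0.959, ("1b", 1): 0.556, ("1b", 2): 0.286,
    ("2b", 0): 1.032, ("2b", 1): 0.890, ("2b", 2): 0.356,
}


def steal_break_even(outs):
    """Success rate at which stealing second from first neither gains nor loses runs."""
    stay = RUN_EXPECTANCY[("1b", outs)]
    success = RUN_EXPECTANCY[("2b", outs)]
    # A caught stealing empties the bases and adds an out; a third out ends the inning.
    caught = RUN_EXPECTANCY[("empty", outs + 1)] if outs < 2 else 0.0
    return (stay - caught) / (success - caught)


for outs in (0, 1, 2):
    print(f"runner on 1b, {outs} outs: break-even = {steal_break_even(outs):.0%}")
```

This works out to roughly 90%, 58%, and 80%, in line with the rates in the notes (which round the 1-out case to 60%).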
=== | |
Steamer Projections | |
Dash Davidson, et al | |
* Basics of Proj Systems | |
** weighs stats from more recent seez more heavily | |
** regress to the mean | |
*** why Results == Ability + Luck | |
Steamer | |
* like most fancier systems | |
** uses adj minor league in addition to MLB | |
** adjusts for home BP, league, start v rele | |
* in addition | |
** uses different system for each component of projects (K%, etc) | |
K/PA for all P 93-11 | |
most accurate - high decay rate and low # regression PA | |
HR/PA for all P 93-11 | |
flipped: low decay high # regression PA | |
Steamer uses different mix for each tracked stat, tuned to the stat | |
regress to peer groups instead of lgAVG | |
marcel undervalues softer throwers and overvalues hard throwers | |
steamer has lower r^2 over past two seasons, caveat low sample size etc | |
min sample size of a peer group? not below 15-20 pitchers | |
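The two tuning knobs mentioned above, the decay rate and the number of regression PA, can be illustrated with a generic weighted-average projection. This is a sketch of the general technique only, not Steamer's actual model; the formula, parameter values, and sample seasons are all assumptions.

```python
def project_rate(seasons, peer_rate, decay_rate, regression_pa):
    """Project a per-PA rate.

    seasons: list of (years_ago, successes, plate_appearances), most recent = 0.
    peer_rate: the rate of the peer group being regressed toward.
    decay_rate: fraction of weight lost per year of age (higher favors recent data).
    regression_pa: phantom PA at peer_rate (higher pulls harder toward the peer group).
    """
    weighted_successes = 0.0
    weighted_pa = 0.0
    for years_ago, successes, pa in seasons:
        weight = (1.0 - decay_rate) ** years_ago
        weighted_successes += weight * successes
        weighted_pa += weight * pa
    return (weighted_successes + regression_pa * peer_rate) / (weighted_pa + regression_pa)


# Made-up pitcher: strikeouts and batters faced over the last three seasons.
seasons = [(0, 180, 800), (1, 150, 780), (2, 140, 790)]

# "High decay, low regression PA" versus "low decay, high regression PA".
print(f"{project_rate(seasons, peer_rate=0.18, decay_rate=0.5, regression_pa=100):.3f}")
print(f"{project_rate(seasons, peer_rate=0.18, decay_rate=0.1, regression_pa=1500):.3f}")
```

The first call leans on the most recent season and barely regresses, the pattern the notes describe for K/PA; the second spreads weight across seasons and regresses heavily, the pattern described for HR/PA.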
=== | |
Distribution of Draft WAR | |
Jesse Jeter | |
How much impact will a given draft pick have? | |
Small Sample Obstacle | |
* draft data for rounds 1-30, years 87-11 | |
* WAR from BR | |
* Data are suff to find dist of Draft WAR for all players | |
* not suff to find every dist contingent on overall selection number, N | |
1) prob a draftee plays in the Majors as a Fn(N) 30% | |
2) dist of WAR for all players together | |
3) "quantile" dist based on N | |
(all feed into) Dist of WAR based on N | |
1: inv linear fn, as N increases prob of playing decreases | |
2: 70% of all MLB produce 0 WAR; 5% produce >20WAR | |
exp fn for neg, gamma fn for pos values to model career CDF | |
3: quantiles indicates how well a player has performed relative to his draft | |
class; quantiles are determined by WAR but are independent of draft year | |
1st round DPs much more likely to create +WAR than their peers | |
can we fit a distribution based on selection numbers? yes, beta dists | |
Quantiles of each selection # can be modeled by a beta distribution, Bn | |
Distributions converge as N increases. | |
Synthesis: | |
(rest of pres rushed) | |
=== | |
Classifying Pitches using Pattern Recognition | |
or: Why 18 dimensions are better than two | |
Michael Schader CS Grad Student, GMU | |
CS -> AI -> ML -> PR | |
1) Select data: | |
* 9P from PFX | |
* discard pitches classified with low conf | |
* discard oddball pitches (knuck, eephus, etc) | |
* choose 300,000 pitches from 2011 | |
2) select algo: | |
* dozens of PR algos, chose Random Forests | |
** resilient to overfitting | |
* creates set of decision trees (if/then statements chosen according to info | |
theory) and has them vote | |
results: | |
iphone pic of confusion matrix | |
room for improvement? | |
* MLB builds custom neural nets for each pitcher, have 1100 | |
* Is there a simple way to include the pitcher's context along with each | |
pitch? | |
3) Feature Selection | |
9P: 81.8% | |
9P + repertoire: 95% (rep comes from MLBAM, which isn't very useful) | |
9P + Maxes: 98.3% (max val of each 9P param for each pitcher) | |
9P + Mins: 98.4% (min val of each 9P param for each pitcher) | |
pic of results confusion matrix | |
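The "9P + Maxes" and "9P + Mins" feature sets can be sketched as a small preprocessing step: each pitch's parameters are extended with the per-pitcher maximum (or minimum) of every parameter, so 9 features become 18. The function name and the toy two-feature data below are hypothetical; the classifier itself is not shown.

```python
from collections import defaultdict


def add_pitcher_context(pitches, reducer=max):
    """pitches: list of (pitcher_id, feature_list); returns the extended feature lists.

    reducer=max appends each pitcher's per-feature maxima, reducer=min the minima.
    """
    by_pitcher = defaultdict(list)
    for pitcher_id, features in pitches:
        by_pitcher[pitcher_id].append(features)

    # One context vector per pitcher: reduce each feature column over his pitches.
    context = {pid: [reducer(column) for column in zip(*rows)]
               for pid, rows in by_pitcher.items()}

    return [list(features) + context[pid] for pid, features in pitches]


# Toy data with two features per pitch instead of nine (speed, vertical break).
pitches = [("A", [95, 2]), ("A", [80, -6]), ("B", [88, 1])]
print(add_pitcher_context(pitches))
print(add_pitcher_context(pitches, reducer=min))
```

The extra columns give the classifier a reference point for each pitcher, so an 88 mph pitch can be read relative to that pitcher's own hardest or softest offering.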
4) Profit | |
* reproduce MLBAM's pitch class simply with high fid | |
* any handcrafted pitch class could be modeled using the PR to automate the | |
process | |
* future of bb data is hitfx, fieldfx and other vast streams of telemetry, | |
PR will allow us to make sense of it | |
Sneak Preview: | |
Fact: DPid has never faced Cole De Vries | |
Q: How can we predict what will happen? | |
A: More PR! | |
=== | |
Neural signatures of rapid recognition of baseball pitch | |
Jordan Muraskin and Jason Sherwin, PhD | |
Columbia University, LIINC lab? | |
prob: | |
* hit a ML pitch 430ms 95mph FB to 515ms 80mph CB | |
* identifying neural indicators of performance | |
* percep decision making tasks show neural correlates as early as 180ms and | |
330ms (non baseball related tasks) | |
* these correlates in baseball-related tasks? | |
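The 430ms and 515ms figures are close to what a constant-speed pitch covering the full 60.5 ft would take. The sketch below is a back-of-the-envelope check under that assumption (real pitches are released closer to the plate and slow down in flight); the function name is hypothetical.

```python
MOUND_TO_PLATE_FT = 60.5  # rubber-to-plate distance in feet
FT_PER_S_PER_MPH = 5280.0 / 3600.0  # feet per second in one mile per hour


def flight_time_ms(speed_mph, distance_ft=MOUND_TO_PLATE_FT):
    """Time in milliseconds to cover distance_ft at a constant speed_mph."""
    return 1000.0 * distance_ft / (speed_mph * FT_PER_S_PER_MPH)


for label, mph in [("95mph FB", 95.0), ("80mph CB", 80.0)]:
    print(f"{label}: {flight_time_ms(mph):.0f} ms")
```

This gives about 434 ms and 516 ms, so any recognition time saved is a direct fraction of a window that is under half a second for a fastball.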
science: | |
* when does a hitter's brain discriminate a pitch? | |
* dropped down to hs/coll speeds | |
* subjects swung by pressing keyboard button when they recognized a pitch, | |
fastball, curve, or slider | |
* using EEG to read brain sigs and compt analys, we could find the times at | |
which hitters identified the pitch | |
* when subjects otherwise identified the pitch, there is a common area | |
active when they do not (when they 'miss') | |
future: | |
* fMRI can reveal, with better spa reso of the brain | |
* EEG provides precise timing of the discrim | |
* simultaneous EEG/fMRI? | |
* neuro-feedback to improve recognition time | |
* isolate the neural signal, determine if swing will be successful or not | |
* if neural sig can be trained to approach limits of perceptual processing, | |
could possibly give hitters ~50-100ms more time (10-25%) | |
* frontal cortices in identification errors, could TMS or TDCS dis.... flipped | |
=== | |
New Approaches to Player Valuation: Analyzing how wins generate revenue for | |
MLB teams | |
Graham Tyler, Brown University | |
Web Analyst Intern for MLBAM | |
Concept of Marginal Revenue Product | |
* Player Eval -> Perf Contribution (WAR) | |
* Marginal Revenue per Win ($/W) | |
* Player Val: WAR * $ / W = MRP ($) | |
* Assumptions: | |
** Significant variation in valuation across teams | |
** Non-linear (win that takes you from 90->91 worth more than 60->61) | |
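A minimal sketch of the two valuation ideas above: the linear MRP formula, and a non-linear variant where a player's value is the revenue difference between the team's win total with and without him. The revenue curve, dollar figures, and function names are invented purely for illustration and are not estimates from the talk.

```python
def linear_mrp(war, dollars_per_win):
    """MRP = WAR * $/W, with a single marginal revenue per win for the team."""
    return war * dollars_per_win


def nonlinear_mrp(war, team_wins_without_player, revenue_from_wins):
    """Revenue gained by moving the team from its baseline wins to baseline + WAR."""
    return (revenue_from_wins(team_wins_without_player + war)
            - revenue_from_wins(team_wins_without_player))


def toy_revenue(wins):
    """Hypothetical convex revenue curve in dollars; not estimated from any data."""
    return 1_000_000.0 * wins + 50_000.0 * max(0.0, wins - 81.0) ** 2


print(linear_mrp(3.0, 5_000_000.0))
print(nonlinear_mrp(1.0, 60.0, toy_revenue))  # win number 61
print(nonlinear_mrp(1.0, 90.0, toy_revenue))  # win number 91
```

With a convex curve like this toy one, the same one-win player is worth more to a 90-win team than to a 60-win team, which is the non-linearity assumption stated above.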
Brewers: Winning generates value | |
DBacks: Winning generates -value | |
Linear Wins Coeff Regression: | |
Population: 5.886 (p-value .007) | |
Distance: -.145 (.000) | |
Sports: -23.243 (.003) | |
NL East sig +, NL West sig - | |
Distance explains about 4.28% of variation in returns to winning vs. 2.36% | |
for population and 2.85% for sports | |
Additional Factors: | |
* higher baseline revenue, less winning matters; Capacity constraint of | |
parks? | |
* Franchise Value instead of Revenue? | |
** Angels sign Pujols right before signing new TV contract | |