grahams/gist:3260381

## gistfile1.txt
Sabermetrics, Scouting and the Science of Baseball
August 4, 2012
Day One

==

The Science of Catching Balls
Dr. Michael K. McBeath
Dept. of Psychology @ ASU
Adjunct Neuroscience, Life Sciences, EE, Kinesiology, and Arts, Media & Engr.

1) What - Catching Fly Balls - Dynamic coupling of the environment

* Interception Theory - Models of Catching Baseballs
** Fielders don't perform complex physics calcs to track down balls
** Can't easily predict where ball will land based on initial trajectory
** Non-deterministic, real-time dynamics necessary
* Principle 1: Dynamic Coupling Loop
** Fielder -> Guided Actions (top-down) -> Environment -> Perceived
   Information (bottom up)
* Optical Acceleration Cancellation (OAC) Interception Control Mechanism
** "elevator model" ball coming directly at you.  2d model
** keep ball 'elevating' visually at a constant speed.
* Linear Optical Trajectory (LOT) Interception Control Mechanism
** Similar model, but for the perpendicular (sideways) direction
** "keep it going in a straight line and you'll be guided to the right place"
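A rough sketch of the OAC idea, assuming a drag-free parabolic fly ball hit straight at the fielder (the launch speeds and function names are made up for illustration, not from the talk). For a fielder standing at the landing point, the tangent of the ball's elevation angle rises at a constant rate, so its second difference over time is about zero; standing anywhere else gives a non-zero "optical acceleration".

```python
G = 9.81  # gravitational acceleration, m/s^2

def tan_elevation(t, vx, vy, fielder_x):
    """Tangent of the ball's elevation angle seen by a fielder standing at fielder_x."""
    ball_x = vx * t
    ball_y = vy * t - 0.5 * G * t * t
    return ball_y / (fielder_x - ball_x)

def optical_acceleration(vx, vy, fielder_x, dt=0.1, steps=20):
    """Second differences of tan(elevation); values near 0 mean constant optical speed."""
    samples = [tan_elevation(i * dt, vx, vy, fielder_x) for i in range(1, steps + 1)]
    return [samples[i + 1] - 2 * samples[i] + samples[i - 1]
            for i in range(1, len(samples) - 1)]

# Hypothetical launch: 20 m/s horizontal, 20 m/s vertical, no air resistance.
vx, vy = 20.0, 20.0
landing_x = vx * (2 * vy / G)

at_landing = optical_acceleration(vx, vy, landing_x)
too_deep = optical_acceleration(vx, vy, landing_x + 10.0)
too_shallow = optical_acceleration(vx, vy, landing_x - 10.0)
```

In this toy model the sign of the acceleration tells the fielder which way to move: negative when standing too deep (run in), positive when too shallow (run back).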

2) Who - Frisbee Dogs - Natural selection of common mechanisms

* Do Dogs Do Calculus?
* Timothy J. Pennings - The College of Mathematics Journal
** Dog intercepting a ball thrown into water choice of location approximates
   'best' value as predicted by calculus
* Do dogs use LOT?  Frisbee catching dogs.
** Yes, it appears so.  94% of variance; data mirrors human data
** Hawks (and other animals)

3) Where - Grounders & Tag - Actor-vantage basis of general mechanisms
* Pursuers Use Same Simple Navigational Heuristics to intercept airborne and
  ground based targets (Catching Grounders and Robots Moving along Complex
  Pathways)
* Mo-cap
* Does LOT apply to ground balls as well?
* Flip LOT model on its head;  keep the image moving in a straight line
  downward
* also applies for complex paths (robot weaving random paths toward
  interceptor)
* small adjustments until radical reset of line necessary

4) How - Robotic Catching - Balanced functionality weighing costs-benefit
* Autonomous Ball-Catching Robots
* attempt to show that LOT model would work for robots to provide converging
  evidence of LOT's
** Passive Control Algo - Fixed camera angle, ball image rises
** Active Control Algo - camera rotates up, ball image remains centered
** Does interception control utilize level coordinates or tilt to be
   parallel to the ground?  (latter)

5) When - Using Others' Gazes - parallel processing to enhance prediction
* can you use movements of other fielders to predict ball destination
** Yes.

6) Which - Football Pass Judgment - Natural regularities focus of coupled
   alignment
* Perception of Motion can be affected by background motion (but it doesn't
  seem to affect fielders, only outside observers)

==

Bobby Valentine
Manager, Boston Red Sox

==

Offensive Value Percentage (OVP) and Coaching
Matt Fincher, Bench Coach, USC Upstate
Tim Bogar, Bench Coach, Boston Red Sox

* Historically baseball stats have counted positives
* Sabermetrics have changed this somewhat
* What happens in the 7.5 of 10 ABs when a hitter fails?
* Are the ways in which players make outs important?
* OVP - Who produces offensively beyond current metrics
** What about outs which still help the offense (sacs, etc)
* Intentionally omitted power metrics from OVP.
** Adequately measured elsewhere
* Hitter tries to accrue base hits
* Batter will manage an AB

Eight Batting Components of OVP:
1. H
2. BB
3. HBP
4. Sac (bunt and fly)
5. Adv. the lead baserunner on an out
6. Eight or more pitch AB
7. GIDP/TP (Line out DP/TP only count as one, covered by defensive metrics)
8. First Pitch out of any inning

* Team that throws fewest pitches wins a higher percentage of the time

* 8 or more pitches means the hitter or subsequent hitter reaches at a higher
  percentage

Six Baserunning Components of OVP:
1. Stolen Base
2. Going 1st to 3rd on any single (situational, but good metric for
   baserunning ability)
3. Caught Stealing
4. Picked Off
5. Doubled Off
6. Out on Bases while trying to advance


Calculating OVP

Step 1:
Add together: AB, BB, HBP, SAC, SB, 1b->3b, CS, PO, Doubled Off, Out
attempting to advance

Step 2:
Total Positive Opps

Hits, BB, HBP, Sac, Lead Runners Advanced, 8 pitch (slide went by too fast)

Step 3:
Subtract GIDP (-2 for GITP) and first-pitch outs of an inning from the OVP
positives

Step 4:
Divide calc OVP positives by total # of OVP Opps
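The four steps can be sketched as below. This is only an illustration: the dictionary keys are hypothetical names, and the list of positives is limited to what the notes captured before the slide changed, so the real OVP formula may count more (e.g. stolen bases).

```python
def ovp(stats):
    """Offensive Value Percentage, following the four steps in the notes."""
    # Step 1: total opportunities
    opportunity_keys = ["AB", "BB", "HBP", "SAC", "SB", "FIRST_TO_THIRD",
                        "CS", "PO", "DOUBLED_OFF", "OUT_ADVANCING"]
    opportunities = sum(stats.get(k, 0) for k in opportunity_keys)

    # Step 2: positives (only the ones the notes list; possibly incomplete)
    positive_keys = ["H", "BB", "HBP", "SAC", "LEAD_RUNNER_ADVANCED",
                     "EIGHT_PITCH_AB"]
    positives = sum(stats.get(k, 0) for k in positive_keys)

    # Step 3: subtract GIDP (a triple play counts -2) and first-pitch outs
    positives -= (stats.get("GIDP", 0) + 2 * stats.get("GITP", 0)
                  + stats.get("FIRST_PITCH_OUT", 0))

    # Step 4: positives divided by opportunities
    return positives / opportunities if opportunities else 0.0

# Made-up stat line, purely to exercise the arithmetic.
sample = {"AB": 100, "H": 30, "BB": 10, "HBP": 2, "SAC": 3, "SB": 5,
          "FIRST_TO_THIRD": 4, "CS": 2, "PO": 1, "DOUBLED_OFF": 0,
          "OUT_ADVANCING": 1, "LEAD_RUNNER_ADVANCED": 6, "EIGHT_PITCH_AB": 4,
          "GIDP": 3, "GITP": 0, "FIRST_PITCH_OUT": 2}
sample_ovp = ovp(sample)  # 50 positives / 128 opportunities
```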

==

Statistics
Tom Tippett
Boston Red Sox

==

Umpires and the Human Element
Dr. Dan Brooks, Postdoc Brown University Neuroscience
Baseball Prospectus
brooksbaseball.net


* We ask umps to make a binary judgment over an unmarked area of space of an
  object traveling imperceptibly fast by a person attempting to be deceptive
* We treat the zone as if there is no grey area, that it is a sharp box
* In reality, what does the edge look like?

* n-parameter fermi function/logistic function
* y = a + b ./ (1+exp(-(x-m)/s))
** y = strike proportion
** x = horizontal location
** a = lower bound  (0 because no negative strikes)
** b = upper bound  (1 because upper bound 100% strikes)
** m = "position on axis"  (50% point of curve [m for midpoint])
** s = "sharpness" (sharpness of curve.  would be hard step in perfect
       strike zone)

* use pitchf/x
* subset to taken pitches
* bin by 1" from +3/-3 feet for LHH/RHH
* use middle 1/3rd of vertical zone
* trim 99 total umps to 68 using criteria that they saw (1000 total pitches)?

All umpires: (outside to LHH)
a = 0.00055662
b = 0.99245
m = -14.3609
s = 1.3412
mse = 4.41e-005

m param is almost 14.5, pretty shifted
s param is 1.35, pretty soft
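A small sketch for evaluating the fitted curve with the all-umpires parameters above. It assumes x is in inches (the notes bin by 1"); the function names are illustrative. The second helper gives the distance over which the logistic term climbs from 10% to 90% of its range, which is one way to put a number on "pretty soft".

```python
import math

def strike_probability(x, a, b, m, s):
    """Logistic (Fermi) curve from the notes: y = a + b / (1 + exp(-(x - m) / s))."""
    return a + b / (1.0 + math.exp(-(x - m) / s))

def transition_width(s, low=0.1, high=0.9):
    """Distance over which the logistic term rises from `low` to `high` of its range."""
    def logit(p):
        return math.log(p / (1.0 - p))
    return abs(s) * (logit(high) - logit(low))

# All-umpires fit (outside edge to LHH) as recorded in the notes.
ALL_UMPS = {"a": 0.00055662, "b": 0.99245, "m": -14.3609, "s": 1.3412}

at_midpoint = strike_probability(ALL_UMPS["m"], **ALL_UMPS)  # a + b/2, about 0.50
edge_width = transition_width(ALL_UMPS["s"])                 # about 5.9
```

With s = 1.3412 the 10%-to-90% band is roughly 5.9 units wide, versus zero for a perfectly sharp zone.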

Individual Umps:

CB Bucknor
m = -12.6443
s = 1.524

Tim Timmons:
m = -14.46
s = 1.2743

Wally Bell
m = -15.054
s = 1.5183

Mike Reilly
m = -14.339
s = 0.89095

Mike DiMuro

m = -14.8978
s =

s = 1.4


The LHH Strikezone


m = -14.3509
s = 1.3512

m = 9.8661
s = -1.5202


RHH

m = -12.5627
s = 1.4793

m = 11.9625
s = -1.3729


* mparam indicates shifted LHH zone relative to RHH zone
* is this statistically reliable? It appears so.

* sparam indicates sharper zones on the outside of plate
* is this statistically reliable? It appears so.

* no correlations between m and s

* different pitch types?
* use pitch data from PitchInfo, create logistics for FB/OS

fb m = -14.56
s = 1.3709

os m = -14.1934
s = ...

Conclusions:

*diff between FB and OS called on inside edge of zone for both LHH & RHH
*fits with idea that outside edge of zone is better discriminated

==

What Steroids do that Years of Experience Can't
Dr. Rich B. Ivry
Prof. Psychology, UCBerkeley, Cognition and Action Lab (CogAc)

==

Trackman
Alan Nathan

Doppler radar, RoC of distance of ball
want to determine location of the baseball as a function of time

doppler shift determines dr/dt
two wave phase shift determines r(0) (initial dist)
3 detector array phase shifts determines angles
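A minimal sketch of the first two steps, using the standard two-way radar Doppler relation dr/dt = -c * df / (2 * f0). The carrier frequency used in the example is an assumed value for illustration only; the talk notes do not give one.

```python
C = 299_792_458.0  # speed of light, m/s

def radial_velocity(doppler_shift_hz, carrier_hz):
    """dr/dt from a two-way Doppler shift. A positive shift means the ball is
    approaching the radar, so the range is shrinking and dr/dt is negative."""
    return -C * doppler_shift_hz / (2.0 * carrier_hz)

def range_over_time(r0, radial_velocities, dt):
    """Integrate dr/dt samples forward from the initial distance r0."""
    ranges = [r0]
    for v in radial_velocities:
        ranges.append(ranges[-1] + v * dt)
    return ranges

CARRIER_HZ = 10.5e9  # assumed carrier frequency, not from the notes

# Shift produced by a ball closing at 40 m/s, then recovered again.
shift = 2.0 * 40.0 * CARRIER_HZ / C
v = radial_velocity(shift, CARRIER_HZ)        # about -40 m/s
track = range_over_time(18.0, [v, v], 0.01)   # 18.0 -> 17.6 -> 17.2 m
```

The angles from the detector array would then turn r(t) into a 3D position; that step is not sketched here.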

everything Pitchf/x PLUS

actual release point (perceived velo)
total spin
many more trajectory points

also batted ball speed, launch and spray angles
(equiv to HITf/x)
landing point at ground level and hang time
initial spin


==

Abstracts

===

Mining the Evolution of Pitch Sequences for Career Performance Evaluation
Daniel LC Mack, et al, Vanderbilt University, Inst. for Software Integrated Sys

Building heat maps for various two-pitch sequences for an individual player
over time

Are there enough patterns in history to model future performance?

Potential:
* isolate strats for batters
* find relationships between strats and injury

Future:
* Better type classification than MLBAM, more specific locations
* Improve heat maps to illustrate relationships over time better

===

WAR.  Huh??  Yeah!  What is it Good For?

Glenn DuPaul and George DuPaul, Lehigh University
^ Beyond the Box Score and Hardball Times

*Eval the ability of WAR
** To describe perf in a given season
** Predict future perf in a subsequent season
* Analysis of the assoc between WAR vs Actual Wins
* Sens to diff in perf of playoff teams between leagues
* Pred of wins in a subsequent season

* Random samp of 80 teams 96-11
* Calc cum WAR for each team and regressed it against actual wins
* Replacement-level teams exp to win between 46-52 games (i.e. WINS =
  52+WAR)
* Used baseball ref WAR
* 83% variance in wins is accounted for in WAR

between 96-11 NL and AL playoff teams
sig diff in mean war between NL and AL;  not signif diff in win totals

* predictive
** rand sample 30 06-11
** summed predictor year WARs to project the win total of the subsequent
   seez

Single seez WAR adequately describes what has happened
from 96-11 it took more talent to reach the PS for AL teams than NL, but for
some reason not more wins (DH? or # teams?)

* Sing seez WAR falls short in predicting future win totals (even for next
  seez)
** too often saber-followers cite WAR in one year as a predictor of
   subsequent year wins

===

Run Expectancy
Women's Div I Softball
Jon Nachtigal

* NCAA first published p-by-p data in 11
* 130 D1 playoff games
* 7636 plays


Runners   0 out   1 out   2 out
none      0.518   0.287   0.100
1         0.959   0.556   0.286
2         1.032   0.890   0.356
12        1.733   1.026   0.379
3         1.000   0.915   0.545
13        1.720   1.678   0.611
23        1.615   1.589   0.744
123       3.054   1.930   0.658

SB break even point

runn 1b, 0 outs, 90% success rate
runn 1b, 1 outs, 60% success rate
runn 1b, 2 outs, 80% success rate
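The break-even rates can be checked against the run expectancy table, assuming its entries are runs with an implied decimal point (0959 = 0.959) and columns for 0, 1 and 2 outs. Break-even is the success rate p where p * RE(success) + (1 - p) * RE(caught) equals the run expectancy of staying put. Names below are illustrative.

```python
# Run expectancy by (base state, outs), read from the table in the notes.
RUN_EXPECTANCY = {
    "empty": (0.518, 0.287, 0.100),
    "1": (0.959, 0.556, 0.286),
    "2": (1.032, 0.890, 0.356),
}

def sb_break_even(outs):
    """Break-even steal success rate for a runner on 1st with `outs` outs."""
    stay = RUN_EXPECTANCY["1"][outs]
    success = RUN_EXPECTANCY["2"][outs]
    # Caught stealing: bases empty with one more out; a third out ends the inning.
    caught = RUN_EXPECTANCY["empty"][outs + 1] if outs < 2 else 0.0
    return (stay - caught) / (success - caught)

break_evens = [sb_break_even(outs) for outs in (0, 1, 2)]
```

This gives roughly 0.90, 0.58 and 0.80, in line with the 90% / 60% / 80% figures noted above.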

===

Steamer Projections
Dash Davidson, et al

* Basics of Proj Systems
** weighs stats from more recent seez more heavily
** regress to the mean
*** why Results == Ability + Luck

Steamer
* like most fancier systems
** uses adj minor league in addition to MLB
** adjusts for home BP, league, start v rele
* in addition
** uses different system for each component of projects (K%, etc)

K/PA for all P 93-11
most accurate - high decay rate and low # regression PA

HR/PA for all P 93-11
flipped: low decay high # regression PA

Steamer uses different mix for each tracked stat, tuned to the stat
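A generic sketch of how a decay rate and a number of regression PA combine in this kind of projection. This is not Steamer's actual formula, just the common weighted-average-plus-regression pattern; the function name, the weighting scheme (1 - decay_rate) ** years_ago and the sample numbers are assumptions.

```python
def project_rate(seasons, decay_rate, regression_pa, mean_rate):
    """Project a per-PA rate.

    seasons: list of (events, pa), most recent season first.
    decay_rate: how quickly older seasons lose weight (0 = no decay).
    regression_pa: phantom PA at mean_rate added to pull toward the mean.
    """
    weighted_events = 0.0
    weighted_pa = 0.0
    for years_ago, (events, pa) in enumerate(seasons):
        weight = (1.0 - decay_rate) ** years_ago
        weighted_events += weight * events
        weighted_pa += weight * pa
    return (weighted_events + regression_pa * mean_rate) / (weighted_pa + regression_pa)

# Made-up pitcher: 100 K in 400 PA last year, 50 K in 400 PA the year before.
k_rate = project_rate([(100, 400), (50, 400)], decay_rate=0.5,
                      regression_pa=400, mean_rate=0.20)
```

A high decay rate with few regression PA (as noted for K/PA) trusts recent results; a low decay rate with many regression PA (HR/PA) leans on history and the mean. Passing a peer-group mean instead of the league average as mean_rate corresponds to the peer-group regression mentioned in the talk.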

regress to peer groups instead of lgAVG

marcel undervalues softer throwers and overvalues hard throwers

steamer has lower r^2 over past two seasons, caveat low sample size etc

min sample size of a peer group? not below 15-20 pitchers

===

Distribution of Draft WAR
Jesse Jeter

How much impact will a given draft pick have?

Small Sample Obstacle

* draft data for rounds 1-30, years 87-11
* WAR from BR
* Data are suff to find dist of Draft WAR for all players
* not suff to find every dist contingent on overall selection number, N

1) prob a draftee plays in the Majors as a Fn(N) 30%
2) dist of WAR for all players together
3) "quantile" dist based on N

(all feed into) Dist of WAR based on N

1: inv linear fn, as N increases prob of playing decreases
2: 70% of all MLB produce 0 WAR; 5% produce >20WAR
   exp fn for neg, gamma fn for pos values to model career CDF
3: quantiles indicates how well a player has performed relative to his draft
   class; quantiles are determined by WAR but are independent of draft year
   1st round DPs much more likely to create +WAR than their peers
   can we fit a distribution based on selection numbers?  yes, beta dists
   Quantiles of each selection # can be modeled by a beta distribution, Bn
   Distributions converge as N increases.

Synthesis:

(rest of pres rushed)

===

Classifying Pitches using Pattern Recognition
or: Why 18 dimensions are better than two
Michael Schader CS Grad Student, GMU

CS -> AI -> ML -> PR

1) Select data:
* 9P from PFX
* discard pitches classified with low conf
* discard oddball pitches (knuck, eephus, etc)
* choose 300,000 pitches from 2011
2) select algo:
* dozens of PR algos, chose Random Forests
** resilient to overfitting
** creates set of decision trees (if/then statements chosen according to info
   theory) and has them vote

results:
iphone pic of confusion matrix

room for improvement?
* MLB builds custom neural nets for each pitcher, have 1100

* Is there a simple way to include the pitcher's context along with each
  pitch?

3) Feature Selection

9P: 81.8%
9P + repertoire: 95%  (rep comes from MLBAM, which isn't very useful)
9P + Maxes: 98.3% (max val of each 9P param for each pitcher)
9P + Mins: 98.4% (min val of each 9P param for each pitcher)
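A sketch of the "9P + Maxes" feature idea as described in the notes: each pitch's parameters are extended with the per-pitcher maximum of every parameter, turning 9 dimensions into 18. The data layout and function name are assumptions, and the example uses two parameters per pitch only to keep it short.

```python
def add_pitcher_maxes(pitches):
    """pitches: list of (pitcher_id, params), params being a list of numbers.

    Returns one feature vector per pitch: its own params followed by that
    pitcher's maximum for each param across all of his pitches.
    """
    maxes = {}
    for pitcher_id, params in pitches:
        if pitcher_id not in maxes:
            maxes[pitcher_id] = list(params)
        else:
            maxes[pitcher_id] = [max(m, p) for m, p in zip(maxes[pitcher_id], params)]
    return [list(params) + maxes[pitcher_id] for pitcher_id, params in pitches]

# Made-up pitches: (velocity, break) for two hypothetical pitchers.
pitches = [("A", [90.0, 5.0]), ("A", [80.0, 9.0]), ("B", [85.0, 2.0])]
features = add_pitcher_maxes(pitches)
```

Swapping max for min gives the "9P + Mins" variant; the extended vectors would then be fed to the classifier.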

pic of results confusion matrix

4) Profit

* reproduce MLBAM's pitch class simply with high fid
* any handcrafted pitch class could be modeled using the PR to automate the
  process
* future of bb data is hitfx, fieldfx and other vast streams of telemetry,
  PR will allow us to make sense of it

Sneak Preview:

Fact: DPid has never faced Cole De Vries
Q: How can we predict what will happen?
A: More PR!

===
Neural signatures of rapid recognition of baseball pitch
Jordan Muraskin and Jason Sherwin, PhD
Columbia University, LIINC lab?

prob:
* hit a ML pitch 430ms 95mph FB to 515ms 80mph CB
* identifying neural indicators of performance
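The timing figures can be roughly reproduced by assuming the ball travels the full 60.5 ft from rubber to plate at constant speed. That is a simplification (real pitches are released closer to the plate and slow down in flight), and the function name is illustrative.

```python
def flight_time_ms(speed_mph, distance_ft=60.5):
    """Time for a pitch to cover distance_ft at constant speed, in milliseconds."""
    feet_per_second = speed_mph * 5280.0 / 3600.0
    return 1000.0 * distance_ft / feet_per_second

fastball_ms = flight_time_ms(95)   # about 434 ms
curveball_ms = flight_time_ms(80)  # about 516 ms
```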

* percep decision making tasks show neural correlates as early as 180ms and
  330ms (non baseball related tasks)
* these correlates in baseball-related tasks?

science:
* when does a hitter's brain discriminate a pitch?

* dropped down to hs/coll speeds
* subjects swung by pressing keyboard button when they recognized a pitch,
  fastball, curve, or slider
* using EEG to read brain sigs and compt analys, we could find the times at
  which hitters identified the pitch
* when subjects otherwise identified the pitch, there is a common area
  active when they do not (when they 'miss')

future:
* fMRI can reveal, with better spa reso of the brain
* EEG provides precise timing of the discrim
* simultaneous EEG/fMRI?

* neuro-feedback to improve recognition time
* isolate the neural signal, determine if swing will be successful or not
* if neural sig can be trained to approach limits of perceptual processing,
  could possibly give hitters ~50-100ms more time (10-25%)

* frontal cortices in identification errors, could TMS or TDCS dis.... flipped

===

New Approaches to Player Valuation: Analyzing how wins generate revenue for
MLB teams
Graham Tyler, Brown University
Web Analyst Intern for MLBAM

Concept of Marginal Revenue Product

* Player Eval -> Perf Contribution (WAR)
* Marginal Revenue per Win ($/W)
* Player Val: WAR * $ / W = MRP ($)
* Assumptions:
** Significant variation in valuation across teams
** Non-linear (win that takes you from 90->91 worth more than 60->61)

Brewers:  Winning generates value
DBacks: Winning generates -value

Linear Wins Coeff Regression:
Population: 5.886 (p-value .007)
Distance: -.145 (.000)
Sports: -23.243 (.003)
NL East sig +, NL West sig -

Distance explains about 4.28% of variation in returns to winning vs. 2.36%
for population and 2.85% for sports

Additional Factors:
* higher baseline revenue, less winning matters;  Capacity constraint of
  parks?
* Franchise Value instead of Revenue?
** Angels sign Pujols right before signing new TV contract