Jim Albert bayesball
# functions to compute and graph component estimates of a batting average
# Jim Albert
# JQAS paper "Improved Component Predictions of Batting and Pitching Measures"
fit_comp_half <- function(d){
# input d - a list with components playerID, AB, H, HR, SO
# output - a list with components
# S - a data frame with all of the component and shrinkage estimates
# component -- values of K and eta for all the component fits
# shrinkage -- values of K and eta for the shrinkage fit
###############################
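The component idea behind `fit_comp_half` can be illustrated with a small identity. This is only a sketch: it assumes the batting average is split into a strikeout rate, a home run rate on non-strikeout at-bats, and a hit rate on the remaining balls in play. The helper name `component_rates` and the input numbers are hypothetical, and the shrinkage step (the K and eta values) is not shown.

```r
# Hypothetical helper: split a batting average into three component rates
# and recombine them. Not the gist's own code.
component_rates <- function(AB, H, HR, SO) {
  so_rate  <- SO / AB                    # strikeouts per at-bat
  hr_rate  <- HR / (AB - SO)             # home runs per non-strikeout at-bat
  hip_rate <- (H - HR) / (AB - SO - HR)  # hits per ball in play (HR excluded)
  # Recombining the components reproduces H / AB exactly
  avg <- (1 - so_rate) * (hr_rate + (1 - hr_rate) * hip_rate)
  list(so_rate = so_rate, hr_rate = hr_rate, hip_rate = hip_rate, avg = avg)
}

# Made-up season line, for illustration only
rates <- component_rates(AB = 500, H = 150, HR = 25, SO = 100)
```

Here `rates$avg` equals `150 / 500`, so the decomposition loses no information; the point of estimating each rate separately is that each can be shrunk by a different amount.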
# getting 2015 retrosheet data
# and computing run expectancies
# assuming chadwick files are installed
library(devtools)
setwd("~/Desktop/retrosheet")
source_gist("https://gist.github.com/bayesball/8892981")
source_gist("https://gist.github.com/bayesball/8892999")
parse.retrosheet2.pbp(2015)
bayesball / inplay_visualization.R
Last active May 2, 2016 11:19
R Code to Visualize all of the In-Play Events During a Single Day of MLB Baseball
# Note: the openWAR and ggplot2 packages need to be installed
library(openWAR)
library(ggplot2)
ds <- getData(start = '2016-04-29', end = '2016-04-29')
ds$event <- as.character(ds$event)
ds$stand <- ifelse(ds$stand=="R", "Right-Handed Batter",
"Left-Handed Batter")
bayesball / pitchcountgraphfunctions.R
Last active May 18, 2016 13:34
Pitch count graphs
# The inputs to these functions are
# data - Retrosheet play-by-play data frame with variable RUNS.VALUE that indicates the runs value for each play, and
# variables c01, c10, etc that indicate if the PA went through the specific pitch counts
# p - name of the player
# type - by default, type = "p" (pitcher); use another value of type for a batter
count_plot <- function(data, p, type="p"){
require(ggplot2)
require(Lahman)
require(tidyr)
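The input layout described in the comments above (a `RUNS.VALUE` column plus one indicator column per pitch count) suggests a summary along these lines. This is a minimal sketch on made-up toy data, not the body of `count_plot`; the helper name `mean_runs_by_count` is hypothetical, and it assumes the indicators are logical or 0/1.

```r
# Toy play-by-play data (values invented for illustration)
toy <- data.frame(
  RUNS.VALUE = c(0.5, -0.25, -0.25, 1.0),
  c01 = c(TRUE, TRUE, FALSE, FALSE),   # PA passed through a 0-1 count
  c10 = c(FALSE, FALSE, TRUE, TRUE)    # PA passed through a 1-0 count
)

# Hypothetical helper: average runs value of the PAs that passed
# through each named count
mean_runs_by_count <- function(data, counts) {
  sapply(counts, function(cn) {
    mean(data$RUNS.VALUE[as.logical(data[[cn]])])
  })
}

count_means <- mean_runs_by_count(toy, c("c01", "c10"))
```

The result is a named numeric vector with one entry per count, which is a convenient shape to reshape and hand to ggplot2.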
bayesball / markov_chain_pitch_count.R
Last active May 31, 2016 02:54
R code to compute transition probability matrix for Markov Chain model for pitch counts
# read in Retrosheet play-by-play data for 2015 season
load("~/OneDriveBusiness/Retrosheet/pbp.2015.Rdata")
# limit to batting plays
d2015 <- subset(d2015, BAT_EVENT_FL==TRUE)
# removes all non-pitches from PITCH_SEQ_TX
d2015$pseq <- gsub("[.>123N+*]", "", d2015$PITCH_SEQ_TX)
# create a b and s sequence
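To see what the `gsub` call above does, it can be tried on a couple of pitch-sequence strings. The sketch below uses invented strings and a simplified recoding that is an assumption, not the gist's own next step: `B` becomes `b`, and `C`, `S`, `F` (called strike, swinging strike, foul) become `s`; every other code is left untouched. The helper name `strip_non_pitches` is hypothetical.

```r
# Same pattern as above: drop pickoff throws, runner markers, and
# other non-pitch symbols from a pitch sequence
strip_non_pitches <- function(x) gsub("[.>123N+*]", "", x)

# Invented example sequences
pseq <- strip_non_pitches(c("CB>S.1FX", "BBCX"))

# Simplified, assumed recoding to balls (b) and strikes (s);
# codes other than B, C, S, F pass through unchanged
bs <- chartr("BCSF", "bsss", pseq)

# Number of balls in each sequence
n_balls <- nchar(gsub("[^b]", "", bs))
```

A full treatment would also need to decide how to handle other codes, such as intentional balls or pitchouts, which this sketch ignores.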
bayesball / simulation_half_inning.R
Created June 19, 2016 12:44
Functions to Simulate a Half-Inning of Baseball
# to simulate the number of runs in one half-inning
# st <- runs_setup()
# simulate_half_inning(st)
runs_setup <- function(){
# based on 2015 season data
Prob_Single <- matrix(0, 8, 8)
dimnames(Prob_Single)[[1]] <- c("000", "100", "010", "001",
"110", "101", "011", "111")
dimnames(Prob_Single)[[2]] <- c("000", "100", "010", "001",
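The 8 x 8 layout used for `Prob_Single` can be illustrated with an event whose transitions are known without any data: a home run always empties the bases. The sketch below assumes rows are starting runner states and columns are ending states; `Prob_HR` and `runs_on_HR` are hypothetical names, not objects from the gist.

```r
# Runner states: one character per base (1st, 2nd, 3rd), "1" = occupied
states <- c("000", "100", "010", "001", "110", "101", "011", "111")

# Transition matrix for a home run: every starting state moves to "000"
Prob_HR <- matrix(0, 8, 8, dimnames = list(states, states))
Prob_HR[, "000"] <- 1

# Runs scored on a home run: each runner on base plus the batter
runners <- sapply(strsplit(states, ""), function(s) sum(s == "1"))
runs_on_HR <- setNames(runners + 1, states)
```

Passing `dimnames` directly to `matrix()` sets both row and column names in one step, and indexing by state label (for example `Prob_HR["110", "000"]`) keeps the simulation code readable.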
bayesball / broom_career_trajectory.R
Created July 1, 2016 00:40
Illustrating broom package using career trajectory of home run rates
# read in Lahman batting and master files
# can also use Lahman package -- data is only through 2014 season
Batting <- read.csv("~/OneDriveBusiness/lahman-csv_2015-01-24/Batting.csv")
Master <- read.csv("~/OneDriveBusiness/lahman-csv_2015-01-24/Master.csv")
# find players with at least 500 career home runs (through 2015)
library(dplyr)
bayesball / compare_batting_trajectories.R
Last active July 20, 2016 21:41
Compare batting trajectories by scraping baseball-reference data
compare_batting_trajectories <- function(Names,
table="batting_value",
stat="oWAR",
NCOL=1,
playerIDs=FALSE){
# table value - one of "batting_standard", "batting_value"
require(Lahman)
require(XML)
require(ggplot2)
require(ggthemes)
bayesball / heat_plot.R
Created September 17, 2016 14:47
Constructs a heat map of the probability of a hit or home run for a specific player from PITCHf/x data
heat_plot <- function(player, d, HR=FALSE){
# inputs
# player - name of player
# d - pitchRx data frame with variables Batter, Event, and X, Z (location of pitch)
# will output a ggplot2 object
# need to use print function to display the plot
require(dplyr)
require(ggplot2)
require(mgcv)
# define the strike zone
bayesball / ws2016_game3.R
Created October 29, 2016 12:47
Code for Swing and Miss Study on 10/29/16
library(pitchRx)
library(dplyr)
library(ggplot2)
# scrape data
ws3 <- scrape("2016-10-28", "2016-10-28")
# choose variables of interest