Jim Albert bayesball

bayesball / pitchcount.R
Created July 16, 2015 13:12
pitch count transitions
# loads in the Retrosheet data
load("~/OneDriveBusiness/Retrosheet/pbp.2014.Rdata")
# removes all non-pitches from PITCH_SEQ_TX
pbp.14$pseq <- gsub("[.>123N+*]", "", pbp.14$PITCH_SEQ_TX)
# recode the pitches as a sequence of balls (b) and strikes (s)
pbp.14$pseq <- gsub("[BIPV]", "b", pbp.14$pseq)
pbp.14$pseq <- gsub("[CFKLMOQRST]", "s", pbp.14$pseq)
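The preview stops once each PITCH_SEQ_TX string has been reduced to b and s characters, so the transition step itself is not shown. The code below is a hedged sketch of one way to continue. It records the ball-strike count in effect before each pitch. The helper names `count_before_pitch` and `count_transitions` are hypothetical and do not come from the gist. The recoding folds fouls into "s", so the sketch assumes that an "s" thrown with two strikes leaves the count at two strikes.

```r
# Hypothetical helper (not part of the original gist): the ball-strike
# count in effect before each pitch of one recoded "b"/"s" sequence.
count_before_pitch <- function(pseq) {
  pitches <- strsplit(pseq, "")[[1]]
  balls <- 0
  strikes <- 0
  counts <- character(length(pitches))
  for (i in seq_along(pitches)) {
    counts[i] <- paste(balls, strikes, sep = "-")
    if (pitches[i] == "b") {
      balls <- balls + 1
    } else if (pitches[i] == "s") {
      # assumption: an "s" with two strikes that does not end the
      # plate appearance is a foul, so strikes never exceed 2
      strikes <- min(strikes + 1, 2)
    }
    # any other character (e.g. X, ball in play) leaves the count unchanged
  }
  counts
}

# Hypothetical helper: consecutive count pairs such as "0-0 -> 1-0"
count_transitions <- function(counts) {
  if (length(counts) < 2) return(character(0))
  paste(head(counts, -1), tail(counts, -1), sep = " -> ")
}

count_before_pitch("bssb")                      # "0-0" "1-0" "1-1" "1-2"
count_transitions(count_before_pitch("bssb"))
```

To tally transitions over a season, one could apply `count_transitions(count_before_pitch(x))` to each element of `pbp.14$pseq` (for example with `lapply`) and pass the unlisted result to `table()`.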
bayesball / groundball.plot.R
Last active August 29, 2015 14:27
Plots groundball statistics for all teams in a particular season
groundball.plot <- function(pbp, season){
require(dplyr)
require(ggplot2)
require(car)
inplay <- filter(pbp, BATTEDBALL_CD %in% c("F", "G", "L", "P"))
inplay <- mutate(inplay,
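The filter above keeps fly balls (F), ground balls (G), line drives (L), and pop-ups (P). A natural team-level groundball statistic is the share of those balls in play that are grounders. The base-R sketch below computes that share. The function name `groundball_rate`, the `team_col` argument, and the toy data are hypothetical. The gist itself uses dplyr and ggplot2, and its remaining code is not shown here.

```r
# Hypothetical sketch (base R, not the gist's dplyr code): share of
# balls in play that are ground balls, by team.
groundball_rate <- function(pbp, team_col) {
  # keep only fly balls (F), ground balls (G), line drives (L), pop-ups (P)
  inplay <- pbp[pbp$BATTEDBALL_CD %in% c("F", "G", "L", "P"), , drop = FALSE]
  # mean of a logical vector = proportion of ground balls per team
  rates <- tapply(inplay$BATTEDBALL_CD == "G", inplay[[team_col]], mean)
  sort(rates, decreasing = TRUE)
}

# toy data; the empty code is not a ball in play and is dropped
toy <- data.frame(
  team = c("A", "A", "A", "B", "B", "B"),
  BATTEDBALL_CD = c("G", "F", "G", "L", "G", "")
)
groundball_rate(toy, "team")
```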
bayesball / model.data.sim.R
Last active September 19, 2015 14:02
Illustrates Model-Data Simulation to Learn About a Player's Batting Ability Based on an "Oh-Fer" Slump
# Script to Learn About Ryan Howard's Batting Ability from a "0 for 35" Slump
# Uses a function from the BayesTestStreak package
# install_github("bayesball/BayesTestStreak")
library(MASS)
library(BayesTestStreak)
# Simulate 500 at-bats with a constant hitting probability p = 0.250
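The script is built on model-data simulation. It asks how often a slump like Howard's "0 for 35" would appear if a hitter really had a constant p = 0.250. The sketch below is a minimal base-R version and is not the BayesTestStreak function that the script uses. It simulates 500 at-bats many times and records the longest hitless run in each simulated season. The name `longest_ofer` is hypothetical.

```r
# Sketch in base R (not the BayesTestStreak function): how long is the
# longest hitless streak in 500 at-bats by a true .250 hitter?
longest_ofer <- function(hits) {
  runs <- rle(hits)                       # runs of consecutive 0s and 1s
  outs <- runs$lengths[runs$values == 0]  # lengths of the hitless runs
  if (length(outs) == 0) 0L else max(outs)
}

set.seed(2015)
sims <- replicate(1000, longest_ofer(rbinom(500, size = 1, prob = 0.25)))
mean(sims >= 35)  # estimated chance of a hitless streak of 35+ at-bats
```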
bayesball / murphywork.R
Created October 24, 2015 14:55
Daniel Murphy -- Learning about his home run ability and predicting his home run output in the 2015 World Series
# Daniel Murphy Exercise
# Part I -- learning about Murphy's home run ability
# and updating this knowledge after the NLDS and NLCS
library(ggplot2)
library(LearnBayes)
# career home run data for Murphy
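Suppose a home run rate has a beta prior and a player hits HR home runs in AB at-bats. The posterior is then also beta. Drawing a rate from that posterior and then a binomial count gives a prediction for a future series. The sketch below illustrates this conjugate approach in base R, assuming a beta prior. The function names and numbers are made up for illustration. They are not Murphy's data or the gist's code.

```r
# Hypothetical sketch: conjugate beta update of a home run rate and
# simulation of future home run counts (not the gist's own code).
update_hr_rate <- function(a, b, hr, ab) {
  # beta(a, b) prior + hr home runs in ab at-bats -> beta(a + hr, b + ab - hr)
  c(a = a + hr, b = b + ab - hr)
}

predict_hr <- function(a, b, future_ab, n_sims = 10000) {
  p <- rbeta(n_sims, a, b)        # draw plausible home run rates
  rbinom(n_sims, future_ab, p)    # home runs in future_ab at-bats
}

# made-up numbers purely for illustration, not Murphy's actual data
set.seed(1)
post <- update_hr_rate(a = 1, b = 1, hr = 3, ab = 10)
sims <- predict_hr(post["a"], post["b"], future_ab = 20)
table(sims)
```

Because the update is conjugate, the NLDS and NLCS results can be added one series at a time by calling `update_hr_rate` again on the current posterior.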
bayesball / component_average.R
Last active February 5, 2016 14:50
R functions for Improved Component Predictions of Batting and Pitching Measures
#######################################################################################
# R functions for Paper "Improved Component Predictions of Batting and Pitching Measures"
# Journal of Quantitative Analysis of Sports (2016)
# Jim Albert, albert@bgsu.edu
# functions fit_component_average, plot_avg_results
# fit_component_obp, plot_obp_results
# fit_component_fip, plot_fip_results
# requires installation of the packages Lahman, dplyr, ggplot2, and LearnBayes
########################################################################################
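A batting average can be written exactly in terms of three component rates: AVG = (1 - SO rate) x (HR rate + (1 - HR rate) x HIP rate). Here SO rate = SO/AB, HR rate = HR/(AB - SO), and HIP rate = (H - HR)/(AB - SO - HR). These are the strikeout, home run, and hit-in-play rates named in the next gist. The sketch below only checks this algebraic identity numerically. The helper names are hypothetical, and the paper's exact component definitions and fitting functions may differ.

```r
# Hypothetical sketch: rebuild AVG from three component rates.
component_rates <- function(H, HR, SO, AB) {
  c(so_rate  = SO / AB,                    # strikeouts per at-bat
    hr_rate  = HR / (AB - SO),             # home runs per non-strikeout at-bat
    hip_rate = (H - HR) / (AB - SO - HR))  # hits per ball in play (no HR)
}

avg_from_components <- function(so_rate, hr_rate, hip_rate) {
  (1 - so_rate) * (hr_rate + (1 - hr_rate) * hip_rate)
}

r <- component_rates(H = 130, HR = 20, SO = 100, AB = 500)
avg_from_components(r[["so_rate"]], r[["hr_rate"]], r[["hip_rate"]])  # 130 / 500 = 0.26
```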
bayesball / plot_career_trajectory_rates.R
Last active December 29, 2015 17:11
Plots trajectories of strikeout rates, home run rates, and hit-in-play rates for players with similar batting averages
# requires packages
# dplyr, Lahman, ggplot2
# some preliminary work
library(dplyr)
library(Lahman)
get.birthyear <- function(player.id){
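A common way to summarize a career trajectory is to fit a quadratic in age and read off the age at the vertex of the parabola. The gist's own fitting and plotting code is not shown in the preview, so the sketch below is only an illustration. `peak_age` is a hypothetical helper and the data are synthetic. For a rate that falls and then rises, such as a strikeout rate, the vertex is a minimum rather than a peak.

```r
# Hypothetical sketch: fit a quadratic trajectory rate ~ age + age^2 and
# report the age at the vertex of the fitted parabola.
peak_age <- function(age, rate) {
  fit <- lm(rate ~ age + I(age^2))
  b <- coef(fit)
  unname(-b[2] / (2 * b[3]))   # vertex of b0 + b1*age + b2*age^2
}

ages <- 22:36
hr_rate <- 0.05 - 0.0004 * (ages - 29)^2   # synthetic, noise-free curve
peak_age(ages, hr_rate)                    # 29, up to rounding error
```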
###############################################
# working with complete set of 2015 play-by-play data
# collected using the getData function in the openWAR
# package (retrieves MLBAM GameDay files)
# currently saved as an Rdata file
###############################################
load("alldata2015.Rdata")
# computes the run values of all plate appearances
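The usual definition of the run value of a plate appearance is the change in run expectancy plus the runs that score: RE(end state) - RE(start state) + runs. The preview stops before that computation, so the code below is a hedged sketch. The state labels ("runners outs", plus "3" for the end of an inning) and the run-expectancy numbers are placeholders, not values estimated from the 2015 data.

```r
# Hypothetical sketch: run value of a plate appearance =
#   RE(end state) - RE(start state) + runs scored on the play.
run_value <- function(state_start, state_end, runs_scored, re) {
  unname(re[state_end] - re[state_start] + runs_scored)
}

# placeholder run-expectancy table keyed by "runners outs";
# values are illustrative only, not estimates from 2015 data
re <- c("000 0" = 0.5, "100 0" = 0.9, "000 1" = 0.3, "3" = 0)

# a single, an out, and a solo home run with the bases empty, no outs
run_value(c("000 0", "000 0", "000 0"),
          c("100 0", "000 1", "000 0"),
          c(0, 0, 1), re)
```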
bayesball / pacestudy.R
Created January 30, 2016 21:52
Exploring the pitcher pace (time between pitches) for games played in a week of the 2015 season
library(pitchRx)
library(dplyr)
library(ggplot2)
dat <- scrape(start = "2015-09-05", end = "2015-09-11")
pitches <- inner_join(select(dat$atbat,
batter_name, pitcher_name, inning,
gameday_link, num, url),
select(dat$pitch,
start_speed, pitch_type, sv_id, num, url),
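Pitcher pace can be measured as the time between consecutive pitches. The base-R sketch below makes two assumptions that should be checked against the scraped data. First, the PITCHf/x `sv_id` field is a timestamp of the form YYMMDD_HHMMSS. Second, pitches are sorted by time within each group. The grouping key (for example, an at-bat identified by `url` and `num`) and the name `pitch_gaps` are hypothetical.

```r
# Hypothetical sketch: seconds between consecutive pitches within a group
# (e.g. the same at-bat), assuming sv_id looks like "150905_190102".
pitch_gaps <- function(sv_id, group) {
  t <- as.POSIXct(sv_id, format = "%y%m%d_%H%M%S", tz = "UTC")
  n <- length(t)
  d <- c(NA, as.numeric(difftime(t[-1], t[-n], units = "secs")))
  same_group <- c(FALSE, group[-1] == group[-n])
  ifelse(same_group, d, NA)  # first pitch of each group has no gap
}

pitch_gaps(c("150905_190102", "150905_190125", "150905_190150"),
           c("ab1", "ab1", "ab1"))   # NA 23 25
```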
bayesball / fit.model.R
Created February 6, 2016 16:04
Function to fit a beta-binomial exchangeable model
fit.model <- function(data){
# data is a list with two components
# - y: binomial counts
# - n: binomial sample sizes
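require(LearnBayes)
# posterior mode of theta = (logit(eta), log(K)) via laplace(), starting at (1, 1)
mode <- laplace(betabinexch, c(1, 1),
cbind(data$y, data$n))$mode
# transform back to the mean eta and precision K of the beta distribution
eta <- exp(mode[1]) / (1 + exp(mode[1]))
K <- exp(mode[2])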
list(eta=eta, K=K,
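With eta (the overall mean) and K (the precision) returned by fit.model, each rate can be estimated by its posterior mean. For fixed eta and K, the beta(K eta, K (1 - eta)) prior and y successes in n trials give a beta(y + K eta, n - y + K (1 - eta)) posterior, whose mean is (y + K eta)/(n + K). This estimate compromises between the observed y/n and eta, and it leans toward eta when n is small relative to K. The sketch below uses that standard conjugate result. It is not the gist's own code, whose returned list is cut off, and `shrink_estimates` is a hypothetical name.

```r
# Hypothetical sketch: posterior-mean estimates of each p_i under the
# beta-binomial exchangeable model, with eta and K held fixed.
shrink_estimates <- function(y, n, eta, K) {
  (y + K * eta) / (n + K)
}

shrink_estimates(y = c(10, 0), n = c(20, 20), eta = 0.25, K = 20)  # 0.375 0.125
```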
bayesball / efron_morris.R
Last active February 20, 2016 00:14
Modeling using Efron and Morris's famous dataset
# R script for
# "Revisiting Efron and Morris's Data"
# blog post of February 15, 2016
# the current working directory needs a download folder with two subfolders,
# zipped and unzipped
# (on a Windows computer, the Chadwick cwevent.exe program must be
# inside the "unzipped" folder)
library(devtools)
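Efron and Morris's dataset is the classic illustration of James-Stein shrinkage, which pulls early-season batting averages toward their overall mean. The script's own code is not visible past `library(devtools)`, so the sketch below is a simplified stand-in, not the author's model. It applies the positive-part James-Stein estimator to the raw proportions with a common binomial variance. The original Efron and Morris analysis instead worked on an arcsine-transformed scale. The hits in the example are toy numbers, not the original data, and `james_stein` is a hypothetical name.

```r
# Hypothetical sketch: positive-part James-Stein shrinkage of batting
# averages toward their grand mean, using a normal approximation with a
# common binomial variance (not the arcsine-scale original analysis).
james_stein <- function(hits, ab) {
  stopifnot(length(unique(ab)) == 1, length(hits) > 3)
  p <- hits / ab
  k <- length(p)
  pbar <- mean(p)
  sigma2 <- pbar * (1 - pbar) / ab[1]              # approx. sampling variance
  shrink <- 1 - (k - 3) * sigma2 / sum((p - pbar)^2)
  pbar + max(shrink, 0) * (p - pbar)               # never shrink past the mean
}

# toy data (not the Efron-Morris values)
js <- james_stein(hits = c(18, 15, 12, 10, 8, 7), ab = rep(45, 6))
js
```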