As the baseball season comes to a close, I've been in a few discussions about players switching from the New York Yankees and the Boston Red Sox. Below I've used RStudio
and the Lahman
and Retrosheet
databases to answer a few of those questions.
The necessary packages:
library(sqldf)
library(Lahman)
library(dplyr)
library(readr)
We'll start by identifying all the players who played for the Yankees and Red Sox.
ny_bos_batters <- Batting %>% filter(teamID %in% c('NYA','BOS')) %>% select(playerID, teamID, yearID)
ny_bos_pitchers <- Pitching %>% filter(teamID %in% c('NYA','BOS')) %>% select(playerID, teamID, yearID)
ny_bos <- rbind(ny_bos_batters, ny_bos_pitchers) %>% arrange(yearID)
Given their history, the Yankees and Red Sox don't make too many direct trades. We can get trade data up to November 22, 2017 (20171122) from Retrosheet transactions page.
trades <- read_csv("G:/My Drive/MSA/R simulations/trades.csv", col_names = FALSE)
colnames(trades) <- c('primary_date', 'time', 'approximate_indicator', 'secondary_date', 'approximate_indicator_secondary', 'transaction_ID', 'player', 'type', 'from_team', 'from_league', 'to_team', 'to_league', 'draft_type', 'draft_round', 'pick_number', 'info')
We'll identify all the players who've been traded between the Yankees and Red Sox.
ny_bos_trades <- trades %>%
dplyr::select(player, from_team, to_team, primary_date, type) %>%
filter(from_team %in% c('NYA','BOS') & to_team %in% c('NYA','BOS')) %>%
arrange(desc(primary_date))
I created a text file with the trade definitions from the readme page, and joined it to the original table so that we can see what type of transactions were most frequent between the teams.
trade_codebook <- read_delim("G:/My Drive/MSA/R simulations/trade_codebook.txt", "-", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE)
names(trade_codebook) <- c('type', 'definition')
trade_codebook <- na.omit(trade_codebook)
updat_ny_bos <- ny_bos_trades %>% inner_join(trade_codebook, by = 'type')
trade_types <- updat_ny_bos %>%
group_by(definition, from_team) %>%
summarise(most_recent = max(primary_date), count = n()) %>%
arrange(desc(count))
trade_types
definition | from_team | most_recent | count |
---|---|---|---|
trade | NYA | 20140731 | 52 |
trade | BOS | 20140731 | 42 |
purchase | BOS | 19940901 | 12 |
purchase | NYA | 19460618 | 8 |
waiver pick | BOS | 20120704 | 5 |
rule 5 draft pick | NYA | 20031215 | 4 |
waiver pick | NYA | 20040108 | 4 |
unknown (could have been two separate transactions) | NYA | 19490000 | 2 |
returned to original team after draft selection | BOS | 20040318 | 1 |
rule 5 draft pick | BOS | 19681202 | 1 |
If we want the player names, we need to bring in one more table from Retrosheets
retrosheet_ids <- read_csv("G:/My Drive/MSA/R simulations/retrosheet_ids.txt")
We can see the players involved in the most recent transactions. This involves joining four tables, which gets cluttered in dplyr
. We'll switch to sqldf
for sql:
sqldf("select b.definition Definition, d.First, d.Last, a.from_team, a.primary_date date
from trades a
inner join trade_types b
on a.from_team = b.from_team
and a.primary_date = b.most_recent
inner join trade_codebook c
on a.type = c.type
inner join retrosheet_ids d
on a.player = d.ID
where a.to_team in ('BOS','NYA')
and a.player is not NULL
and b.definition not like 'unknown%'
order by 5 desc")
Definition | First | Last | from_team | date |
---|---|---|---|---|
trade | Stephen | Drew | BOS | 20140731 |
trade | Kelly | Johnson | NYA | 20140731 |
waiver pick | Darnell | McDonald | BOS | 20120704 |
returned to original team after draft selection | Colter | Bean | BOS | 20040318 |
waiver pick | Michel | Hernandez | NYA | 20040108 |
rule 5 draft pick | Colter | Bean | NYA | 20031215 |
purchase | Scott | Bankhead | BOS | 19940901 |
rule 5 draft pick | Bobby | Mitchell | BOS | 19681202 |
purchase | Bill | Zuber | NYA | 19460618 |
And we can get all of the trades in one table:
all_trades <- sqldf("select c.First, c.Last, b.definition, a.from_team, a.primary_date date
from ny_bos_trades a
inner join trade_codebook b
on a.type = b.type
inner join retrosheet_ids c
on a.player = c.ID
where a.player is not NULL
order by 5 desc")
all_trades
Not too many trades or waiver wire claims between the Yankees and Red Sox anymore, and purchases have pretty much ended (but not before the greatest one in the history of baseball in 1919). As rivals, neither team wants to make a bad trade, plus there are far more teams to trade with than there were prior to the 60's.
all_trades %>% filter(from_team == 'BOS') %>% transmute(yearID = as.integer(substr(date,1,4))) %>% pull() %>% hist(col = rgb(0,0,1,0.75), main = '', breaks = 12)
all_trades %>% filter(from_team == 'NYA') %>% transmute(yearID = as.integer(substr(date,1,4))) %>% pull() %>% hist(col = rgb(1,0,0,0.75), breaks = 12, add = T)
legend('topright', legend = c('BOS -> NY', 'NY -> BOS'), fill = c(rgb(0,0,1,0.75),rgb(1,0,0,0.75)))
An earlier question I'd discussed with someone was all the players who moved from one team to the other the next year. We were talking specifically of Jacoby Ellsbury and Johny Damon at the time, two Red Sox players who left to sign massive contracts with the Yankees.
We can start by identifying all of the players who switched from the Yankees to the Red Sox, or vice versa, in the next year.
switched <- ny_bos %>%
arrange(playerID, yearID, teamID) %>%
mutate(next_team = lead(teamID),
next_ID = lead(playerID)) %>%
distinct %>%
mutate(to_yankees = ifelse(teamID == 'BOS' & next_team == 'NYA' & playerID == next_ID, 1, 0),
to_red_sox = ifelse(teamID == 'NYA' & next_team == 'BOS' & playerID == next_ID, 1, 0)) %>%
ungroup() %>%
inner_join(Lahman::Master, by = 'playerID') %>%
dplyr::select(nameFirst, nameLast, teamID, yearID, next_team, to_yankees, to_red_sox) %>%
filter(to_yankees == 1 | to_red_sox == 1) %>%
arrange(desc(yearID), teamID)
We'll remove any players who were traded, identified in the earlier table.
free_agents <- sqldf("select a.*, b.First
from switched a
left join (select b.First, b.Last
from ny_bos_trades a
inner join retrosheet_ids b
on a.player = b.ID) b
on a.nameLast = b.Last
and a.nameFirst = b.First
where b.First is NULL
order by a.yearID desc")
free_agents %>% select(nameFirst:next_team)
nameFirst | nameLast | teamID | yearID | next_team |
---|---|---|---|---|
Tommy | Layne | BOS | 2016 | NYA |
Chris | Young | NYA | 2015 | BOS |
Chris | Capuano | BOS | 2014 | NYA |
Andrew | Miller | BOS | 2014 | NYA |
Rich | Hill | NYA | 2014 | BOS |
Alfredo | Aceves | BOS | 2013 | NYA |
Andrew | Bailey | BOS | 2013 | NYA |
Jacoby | Ellsbury | BOS | 2013 | NYA |
Matt | Thornton | BOS | 2013 | NYA |
Rich | Hill | BOS | 2012 | NYA |
Brent | Lillibridge | BOS | 2012 | NYA |
Kevin | Youkilis | BOS | 2012 | NYA |
Gustavo | Molina | BOS | 2010 | NYA |
Alfredo | Aceves | NYA | 2010 | BOS |
Mark | Melancon | NYA | 2010 | BOS |
Kevin | Cash | NYA | 2009 | BOS |
David | Aardsma | BOS | 2008 | NYA |
Kevin | Cash | BOS | 2008 | NYA |
Bartolo | Colon | BOS | 2008 | NYA |
Billy | Traber | NYA | 2008 | BOS |
Eric | Hinske | BOS | 2007 | NYA |
Nick | Green | NYA | 2006 | BOS |
Mark | Bellhorn | BOS | 2005 | NYA |
Johnny | Damon | BOS | 2005 | NYA |
Alan | Embree | BOS | 2005 | NYA |
Mike | Myers | BOS | 2005 | NYA |
Mike | Stanton | BOS | 2005 | NYA |
Derek | Lowe | BOS | 2004 | NYA |
Ramiro | Mendoza | BOS | 2004 | NYA |
Doug | Mientkiewicz | BOS | 2004 | NYA |
... | ... | ... | ... | ... |
Roy | Johnson | BOS | 1935 | NYA |
Bob | Seeds | BOS | 1934 | NYA |
Doc | Farrell | NYA | 1933 | BOS |
Pete | Appleton | BOS | 1932 | NYA |
Fred | Heimach | BOS | 1926 | NYA |
Fred | Hofmann | NYA | 1925 | BOS |
Ben | Shields | NYA | 1925 | BOS |
Pee-Wee | Wanninger | NYA | 1925 | BOS |
Milt | Gaston | NYA | 1924 | BOS |
George | Burns | BOS | 1923 | NYA |
Ben | Paschal | BOS | 1920 | NYA |
Joe | Lucey | NYA | 1920 | BOS |
Braggo | Roth | BOS | 1919 | NYA |
Tim | Hendryx | NYA | 1917 | BOS |
Marty | McHale | NYA | 1915 | BOS |
Joe | Harris | NYA | 1914 | BOS |
Frank | Truesdale | NYA | 1914 | BOS |
Jimmy | Walsh | NYA | 1914 | BOS |
George | Whiteman | NYA | 1913 | BOS |
Marty | McHale | BOS | 1911 | NYA |
Eddie | Foster | NYA | 1910 | BOS |
Neal | Ball | NYA | 1909 | BOS |
Lou | Criger | BOS | 1908 | NYA |
John | Knight | BOS | 1907 | NYA |
George | Whiteman | BOS | 1907 | NYA |
Louis | LeRoy | NYA | 1906 | BOS |
Jack | Thoney | NYA | 1904 | BOS |
Doc | Adkins | BOS | 1902 | NYA |
Fred | Mitchell | BOS | 1902 | NYA |
Charlie | Hemphill | BOS | 1901 | NYA |
We can take a quick look at the number of players switching teams, and the frequency over the years. We can see that pretty much since free agancy started (the mid 1970's) the Yankees have signed far more players from the Red Sox than vice versa.
free_agents %>% filter(next_team == 'NYA') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(0,0,1,0.75), main = '', breaks = 12)
free_agents %>% filter(next_team == 'BOS') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(1,0,0,0.75), add = T)
legend('topleft', legend = c('BOS -> NY', 'NY -> BOS'), fill = c(rgb(0,0,1,0.75),rgb(1,0,0,0.75)))
A friend was wondering about players who went from one team to the other, and then won a World Series.
We'll start by identifying the first year on each team, so that we can tell which one came first.
ny_and_bos <- ny_bos %>% filter(teamID == 'NYA') %>%
inner_join(ny_bos %>% filter(teamID == 'BOS'), by = 'playerID') %>%
distinct() %>%
group_by(playerID, teamID.x) %>%
mutate(min_NY = min(yearID.x)) %>%
ungroup() %>%
group_by(playerID, teamID.y) %>%
mutate(min_BOS = min(yearID.y)) %>%
ungroup()
Then identify the World Series winners.
ws_winners <- ny_bos %>%
left_join(Lahman::SeriesPost %>% filter(round == 'WS'), by = 'yearID') %>%
mutate(WS_winner = case_when(
as.character(teamID) == as.character(teamIDwinner) & teamID == 'BOS' ~ 'BOS',
as.character(teamID) == as.character(teamIDwinner) & teamID == 'NYA' ~ 'NY')) %>%
dplyr::select(playerID:yearID, WS_winner) %>%
filter(WS_winner %in% c('BOS','NY'))
All the years of players who played in championships after switching teams.
winning_years <- ny_and_bos %>%
inner_join(dplyr::select(ws_winners,c(playerID, WS_winner, yearID)), by = 'playerID') %>%
filter(yearID == yearID.x | yearID == yearID.y) %>%
distinct() %>%
mutate(movement = case_when(WS_winner == 'NY' & min_BOS < yearID & min_NY > min_BOS ~ 'BOS-NY',
WS_winner == 'BOS' & min_NY < yearID & min_BOS > min_NY ~ 'NY-BOS')) %>%
filter(movement %in% c('BOS-NY','NY-BOS')) %>%
distinct(playerID, yearID, movement) %>%
inner_join(dplyr::select(Lahman::Master, c(playerID, nameFirst, nameLast)), by = 'playerID') %>%
dplyr::select(nameFirst, nameLast, yearID, movement) %>%
data.frame()
winning_years
nameFirst | nameLast | yearID | movement |
---|---|---|---|
Neal | Ball | 1912 | NY-BOS |
Clyde | Engle | 1912 | NY-BOS |
Frank | Truesdale | 1918 | NY-BOS |
Jimmy | Walsh | 1916 | NY-BOS |
Bullet Joe | Bush | 1923 | BOS-NY |
Waite | Hoyt | 1923 | BOS-NY |
Sad Sam | Jones | 1923 | BOS-NY |
Carl | Mays | 1923 | BOS-NY |
Mike | McNally | 1923 | BOS-NY |
Herb | Pennock | 1923 | BOS-NY |
Babe | Ruth | 1923 | BOS-NY |
Wally | Schang | 1923 | BOS-NY |
Everett | Scott | 1923 | BOS-NY |
Waite | Hoyt | 1927 | BOS-NY |
Ben | Paschal | 1927 | BOS-NY |
Herb | Pennock | 1927 | BOS-NY |
Babe | Ruth | 1927 | BOS-NY |
George | Burns | 1928 | BOS-NY |
Fred | Heimach | 1928 | BOS-NY |
Waite | Hoyt | 1928 | BOS-NY |
Ben | Paschal | 1928 | BOS-NY |
Herb | Pennock | 1928 | BOS-NY |
Babe | Ruth | 1928 | BOS-NY |
Danny | MacFayden | 1932 | BOS-NY |
Herb | Pennock | 1932 | BOS-NY |
Red | Ruffing | 1932 | BOS-NY |
Babe | Ruth | 1932 | BOS-NY |
Roy | Johnson | 1936 | BOS-NY |
Red | Ruffing | 1936 | BOS-NY |
Bob | Seeds | 1936 | BOS-NY |
... | ... | ... | ... |
Wes | Ferrell | 1939 | BOS-NY |
Red | Ruffing | 1939 | BOS-NY |
Red | Ruffing | 1941 | BOS-NY |
Bobo | Newsom | 1947 | BOS-NY |
Jack | Kramer | 1951 | BOS-NY |
Jim | McDonald | 1952 | BOS-NY |
Ray | Scarborough | 1952 | BOS-NY |
Jim | McDonald | 1953 | BOS-NY |
Ray | Scarborough | 1953 | BOS-NY |
Mickey | McDermott | 1956 | BOS-NY |
Tex | Clevenger | 1961 | BOS-NY |
Hal | Brown | 1962 | BOS-NY |
Tex | Clevenger | 1962 | BOS-NY |
Sparky | Lyle | 1977 | BOS-NY |
Sparky | Lyle | 1978 | BOS-NY |
Wade | Boggs | 1996 | BOS-NY |
Ramiro | Mendoza | 2004 | NY-BOS |
Mike | Lowell | 2007 | NY-BOS |
Mike | Stanton | 1998 | BOS-NY |
Roger | Clemens | 1999 | BOS-NY |
Tony | Fossas | 1999 | BOS-NY |
Jeff | Manto | 1999 | BOS-NY |
Mike | Stanton | 1999 | BOS-NY |
Jose | Canseco | 2000 | BOS-NY |
Roger | Clemens | 2000 | BOS-NY |
Mike | Stanton | 2000 | BOS-NY |
Alfredo | Aceves | 2013 | NY-BOS |
Kevin | Cash | 2009 | BOS-NY |
Johnny | Damon | 2009 | BOS-NY |
Eric | Hinske | 2009 | BOS-NY |
A plot of the players in championships. Pretty much impossible for players to go from the Yankees to Red Sox championship teams for the better part of the 20th century.
winning_years %>% filter(movement == 'BOS-NY') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(0,0,1,0.75), main = '', xlim = c(1915, 2010), add = F)
winning_years %>% filter(movement == 'NY-BOS') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(1,0,0,0.75), add = T)
legend('topright', legend = c('BOS -> NY', 'NY -> BOS'), fill = c(rgb(0,0,1,0.75),rgb(1,0,0,0.75)))
We can get the totals.
winning_years %>%
distinct(nameFirst, nameLast, movement) %>%
group_by(movement) %>%
tally() %>%
rename('count' = 'n')
movement | count |
---|---|
BOS-NY | 35 |
NY-BOS | 7 |
And finally, all the players that switched to championship teams.
winning_years %>%
distinct(nameFirst, nameLast, movement) %>%
arrange(movement, nameLast)
nameFirst | nameLast | movement |
---|---|---|
Wade | Boggs | BOS-NY |
Hal | Brown | BOS-NY |
George | Burns | BOS-NY |
Bullet Joe | Bush | BOS-NY |
Jose | Canseco | BOS-NY |
Kevin | Cash | BOS-NY |
Roger | Clemens | BOS-NY |
Tex | Clevenger | BOS-NY |
Babe | Dahlgren | BOS-NY |
Johnny | Damon | BOS-NY |
Wes | Ferrell | BOS-NY |
Tony | Fossas | BOS-NY |
Fred | Heimach | BOS-NY |
Eric | Hinske | BOS-NY |
Waite | Hoyt | BOS-NY |
Roy | Johnson | BOS-NY |
Sad Sam | Jones | BOS-NY |
Jack | Kramer | BOS-NY |
Sparky | Lyle | BOS-NY |
Danny | MacFayden | BOS-NY |
Jeff | Manto | BOS-NY |
Carl | Mays | BOS-NY |
Mickey | McDermott | BOS-NY |
Jim | McDonald | BOS-NY |
Mike | McNally | BOS-NY |
Bobo | Newsom | BOS-NY |
Ben | Paschal | BOS-NY |
Herb | Pennock | BOS-NY |
Red | Ruffing | BOS-NY |
Babe | Ruth | BOS-NY |
Ray | Scarborough | BOS-NY |
Wally | Schang | BOS-NY |
Everett | Scott | BOS-NY |
Bob | Seeds | BOS-NY |
Mike | Stanton | BOS-NY |
Alfredo | Aceves | NY-BOS |
Neal | Ball | NY-BOS |
Clyde | Engle | NY-BOS |
Mike | Lowell | NY-BOS |
Ramiro | Mendoza | NY-BOS |
Frank | Truesdale | NY-BOS |
Jimmy | Walsh | NY-BOS |