Skip to content

Instantly share code, notes, and snippets.

@czsheets
Last active October 29, 2018 01:33
Show Gist options
  • Save czsheets/cd89b3bd7785c0b61daa94a706eedd0c to your computer and use it in GitHub Desktop.
Save czsheets/cd89b3bd7785c0b61daa94a706eedd0c to your computer and use it in GitHub Desktop.
How do players move between the Yankees and Red Sox?

As the baseball season comes to a close, I've been in a few discussions about players switching from the New York Yankees and the Boston Red Sox. Below I've used RStudio and the Lahman and Retrosheet databases to answer a few of those questions.

  The necessary packages:

library(sqldf)
library(Lahman)
library(dplyr)
library(readr)

We'll start by identifying all the players who played for the Yankees and Red Sox.

ny_bos_batters <- Batting %>% filter(teamID %in% c('NYA','BOS')) %>% select(playerID, teamID, yearID)
ny_bos_pitchers <- Pitching %>% filter(teamID %in% c('NYA','BOS')) %>% select(playerID, teamID, yearID)
ny_bos <- rbind(ny_bos_batters, ny_bos_pitchers) %>% arrange(yearID)

Given their history, the Yankees and Red Sox don't make too many direct trades. We can get trade data up to November 22, 2017 (20171122) from Retrosheet transactions page.

trades <- read_csv("G:/My Drive/MSA/R simulations/trades.csv", col_names = FALSE)
colnames(trades) <- c('primary_date', 'time', 'approximate_indicator', 'secondary_date', 'approximate_indicator_secondary', 'transaction_ID', 'player', 'type', 'from_team', 'from_league', 'to_team', 'to_league', 'draft_type', 'draft_round', 'pick_number', 'info')

We'll identify all the players who've been traded between the Yankees and Red Sox.

ny_bos_trades <- trades %>% 
  dplyr::select(player, from_team, to_team, primary_date, type) %>%
  filter(from_team %in% c('NYA','BOS') & to_team %in% c('NYA','BOS')) %>%
  arrange(desc(primary_date))

I created a text file with the trade definitions from the readme page, and joined it to the original table so that we can see what type of transactions were most frequent between the teams.

trade_codebook <- read_delim("G:/My Drive/MSA/R simulations/trade_codebook.txt", "-", escape_double = FALSE, col_names = FALSE, trim_ws = TRUE)
names(trade_codebook) <- c('type', 'definition')
trade_codebook <- na.omit(trade_codebook)
updat_ny_bos <- ny_bos_trades %>% inner_join(trade_codebook, by = 'type')
trade_types <- updat_ny_bos %>% 
  group_by(definition, from_team) %>% 
  summarise(most_recent = max(primary_date), count = n()) %>%
  arrange(desc(count))
trade_types
definitionfrom_teammost_recentcount
trade NYA 20140731 52
trade BOS 20140731 42
purchase BOS 19940901 12
purchase NYA 19460618 8
waiver pick BOS 20120704 5
rule 5 draft pick NYA 20031215 4
waiver pick NYA 20040108 4
unknown (could have been two separate transactions)NYA 19490000 2
returned to original team after draft selection BOS 20040318 1
rule 5 draft pick BOS 19681202 1

 
If we want the player names, we need to bring in one more table from Retrosheets

retrosheet_ids <- read_csv("G:/My Drive/MSA/R simulations/retrosheet_ids.txt")

We can see the players involved in the most recent transactions. This involves joining four tables, which gets cluttered in dplyr. We'll switch to sqldf for sql:

sqldf("select b.definition Definition, d.First, d.Last, a.from_team, a.primary_date date
  from trades a
  inner join trade_types b
  on a.from_team = b.from_team
    and a.primary_date = b.most_recent
  inner join trade_codebook c
    on a.type = c.type
  inner join retrosheet_ids d
    on a.player = d.ID
  where a.to_team in ('BOS','NYA')
    and a.player is not NULL
    and b.definition not like 'unknown%'
  order by 5 desc")
DefinitionFirstLastfrom_teamdate
trade Stephen Drew BOS 20140731
trade Kelly Johnson NYA 20140731
waiver pick Darnell McDonald BOS 20120704
returned to original team after draft selectionColter Bean BOS 20040318
waiver pick Michel Hernandez NYA 20040108
rule 5 draft pick Colter Bean NYA 20031215
purchase Scott Bankhead BOS 19940901
rule 5 draft pick Bobby Mitchell BOS 19681202
purchase Bill Zuber NYA 19460618

    And we can get all of the trades in one table:

all_trades <- sqldf("select c.First, c.Last, b.definition, a.from_team, a.primary_date date
  from ny_bos_trades a
  inner join trade_codebook b
    on a.type = b.type
  inner join retrosheet_ids c
    on a.player = c.ID
  where a.player is not NULL
  order by 5 desc")
all_trades

Not too many trades or waiver wire claims between the Yankees and Red Sox anymore, and purchases have pretty much ended (but not before the greatest one in the history of baseball in 1919). As rivals, neither team wants to make a bad trade, plus there are far more teams to trade with than there were prior to the 60's.

all_trades %>% filter(from_team == 'BOS') %>% transmute(yearID = as.integer(substr(date,1,4))) %>% pull() %>% hist(col = rgb(0,0,1,0.75), main = '', breaks = 12)
all_trades %>% filter(from_team == 'NYA') %>% transmute(yearID = as.integer(substr(date,1,4))) %>% pull() %>% hist(col = rgb(1,0,0,0.75), breaks = 12, add = T)
legend('topright', legend = c('BOS -> NY', 'NY -> BOS'), fill = c(rgb(0,0,1,0.75),rgb(1,0,0,0.75)))

png

An earlier question I'd discussed with someone was all the players who moved from one team to the other the next year. We were talking specifically of Jacoby Ellsbury and Johny Damon at the time, two Red Sox players who left to sign massive contracts with the Yankees.

  We can start by identifying all of the players who switched from the Yankees to the Red Sox, or vice versa, in the next year.

switched <- ny_bos %>%
  arrange(playerID, yearID, teamID) %>%
  mutate(next_team = lead(teamID), 
         next_ID = lead(playerID)) %>%
  distinct %>%
  mutate(to_yankees = ifelse(teamID == 'BOS' & next_team == 'NYA' & playerID == next_ID, 1, 0),
         to_red_sox = ifelse(teamID == 'NYA' & next_team == 'BOS' & playerID == next_ID, 1, 0)) %>%
  ungroup() %>%
  inner_join(Lahman::Master, by = 'playerID') %>%
  dplyr::select(nameFirst, nameLast, teamID, yearID, next_team, to_yankees, to_red_sox) %>%
  filter(to_yankees == 1 | to_red_sox == 1) %>%
  arrange(desc(yearID), teamID)

We'll remove any players who were traded, identified in the earlier table.

free_agents <- sqldf("select a.*, b.First
  from switched a
  left join (select b.First, b.Last
        from ny_bos_trades a
        inner join retrosheet_ids b
        on a.player = b.ID) b
  on a.nameLast = b.Last
    and a.nameFirst = b.First
  where b.First is NULL
  order by a.yearID desc")
free_agents %>% select(nameFirst:next_team)
nameFirstnameLastteamIDyearIDnext_team
Tommy Layne BOS 2016 NYA
Chris Young NYA 2015 BOS
Chris Capuano BOS 2014 NYA
Andrew Miller BOS 2014 NYA
Rich Hill NYA 2014 BOS
Alfredo Aceves BOS 2013 NYA
Andrew Bailey BOS 2013 NYA
Jacoby Ellsbury BOS 2013 NYA
Matt Thornton BOS 2013 NYA
Rich Hill BOS 2012 NYA
Brent Lillibridge BOS 2012 NYA
Kevin Youkilis BOS 2012 NYA
Gustavo Molina BOS 2010 NYA
Alfredo Aceves NYA 2010 BOS
Mark Melancon NYA 2010 BOS
Kevin Cash NYA 2009 BOS
David Aardsma BOS 2008 NYA
Kevin Cash BOS 2008 NYA
Bartolo Colon BOS 2008 NYA
Billy Traber NYA 2008 BOS
Eric Hinske BOS 2007 NYA
Nick Green NYA 2006 BOS
Mark Bellhorn BOS 2005 NYA
Johnny Damon BOS 2005 NYA
Alan Embree BOS 2005 NYA
Mike Myers BOS 2005 NYA
Mike Stanton BOS 2005 NYA
Derek Lowe BOS 2004 NYA
Ramiro Mendoza BOS 2004 NYA
Doug MientkiewiczBOS 2004 NYA
...............
Roy Johnson BOS 1935 NYA
Bob Seeds BOS 1934 NYA
Doc Farrell NYA 1933 BOS
Pete Appleton BOS 1932 NYA
Fred Heimach BOS 1926 NYA
Fred Hofmann NYA 1925 BOS
Ben Shields NYA 1925 BOS
Pee-Wee WanningerNYA 1925 BOS
Milt Gaston NYA 1924 BOS
George Burns BOS 1923 NYA
Ben Paschal BOS 1920 NYA
Joe Lucey NYA 1920 BOS
Braggo Roth BOS 1919 NYA
Tim Hendryx NYA 1917 BOS
Marty McHale NYA 1915 BOS
Joe Harris NYA 1914 BOS
Frank TruesdaleNYA 1914 BOS
Jimmy Walsh NYA 1914 BOS
George Whiteman NYA 1913 BOS
Marty McHale BOS 1911 NYA
Eddie Foster NYA 1910 BOS
Neal Ball NYA 1909 BOS
Lou Criger BOS 1908 NYA
John Knight BOS 1907 NYA
George Whiteman BOS 1907 NYA
Louis LeRoy NYA 1906 BOS
Jack Thoney NYA 1904 BOS
Doc Adkins BOS 1902 NYA
Fred Mitchell BOS 1902 NYA
Charlie Hemphill BOS 1901 NYA

  We can take a quick look at the number of players switching teams, and the frequency over the years. We can see that pretty much since free agancy started (the mid 1970's) the Yankees have signed far more players from the Red Sox than vice versa.

free_agents %>% filter(next_team == 'NYA') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(0,0,1,0.75), main = '', breaks = 12)
free_agents %>% filter(next_team == 'BOS') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(1,0,0,0.75), add = T)
legend('topleft', legend = c('BOS -> NY', 'NY -> BOS'), fill = c(rgb(0,0,1,0.75),rgb(1,0,0,0.75)))

png

A friend was wondering about players who went from one team to the other, and then won a World Series.

  We'll start by identifying the first year on each team, so that we can tell which one came first.

ny_and_bos <- ny_bos %>% filter(teamID == 'NYA') %>%
  inner_join(ny_bos %>% filter(teamID == 'BOS'), by = 'playerID') %>%
  distinct() %>%
  group_by(playerID, teamID.x) %>%
  mutate(min_NY = min(yearID.x)) %>%
  ungroup() %>%
  group_by(playerID, teamID.y) %>%
  mutate(min_BOS = min(yearID.y)) %>%
  ungroup()

Then identify the World Series winners.

ws_winners <- ny_bos %>% 
  left_join(Lahman::SeriesPost %>% filter(round == 'WS'), by = 'yearID') %>%
  mutate(WS_winner = case_when(
      as.character(teamID) == as.character(teamIDwinner) & teamID == 'BOS' ~ 'BOS',
      as.character(teamID) == as.character(teamIDwinner) & teamID == 'NYA' ~ 'NY')) %>%
  dplyr::select(playerID:yearID, WS_winner) %>% 
  filter(WS_winner %in% c('BOS','NY'))

All the years of players who played in championships after switching teams.

winning_years <- ny_and_bos %>% 
  inner_join(dplyr::select(ws_winners,c(playerID, WS_winner, yearID)), by = 'playerID') %>%
  filter(yearID == yearID.x | yearID == yearID.y) %>%
  distinct() %>%
  mutate(movement = case_when(WS_winner == 'NY' & min_BOS < yearID & min_NY > min_BOS ~ 'BOS-NY',
                              WS_winner == 'BOS' & min_NY < yearID & min_BOS > min_NY ~ 'NY-BOS')) %>%
  filter(movement %in% c('BOS-NY','NY-BOS')) %>% 
  distinct(playerID, yearID, movement) %>%
  inner_join(dplyr::select(Lahman::Master, c(playerID, nameFirst, nameLast)), by = 'playerID') %>%
  dplyr::select(nameFirst, nameLast, yearID, movement) %>%
  data.frame()

winning_years
nameFirstnameLastyearIDmovement
Neal Ball 1912 NY-BOS
Clyde Engle 1912 NY-BOS
Frank Truesdale 1918 NY-BOS
Jimmy Walsh 1916 NY-BOS
Bullet JoeBush 1923 BOS-NY
Waite Hoyt 1923 BOS-NY
Sad Sam Jones 1923 BOS-NY
Carl Mays 1923 BOS-NY
Mike McNally 1923 BOS-NY
Herb Pennock 1923 BOS-NY
Babe Ruth 1923 BOS-NY
Wally Schang 1923 BOS-NY
Everett Scott 1923 BOS-NY
Waite Hoyt 1927 BOS-NY
Ben Paschal 1927 BOS-NY
Herb Pennock 1927 BOS-NY
Babe Ruth 1927 BOS-NY
George Burns 1928 BOS-NY
Fred Heimach 1928 BOS-NY
Waite Hoyt 1928 BOS-NY
Ben Paschal 1928 BOS-NY
Herb Pennock 1928 BOS-NY
Babe Ruth 1928 BOS-NY
Danny MacFayden 1932 BOS-NY
Herb Pennock 1932 BOS-NY
Red Ruffing 1932 BOS-NY
Babe Ruth 1932 BOS-NY
Roy Johnson 1936 BOS-NY
Red Ruffing 1936 BOS-NY
Bob Seeds 1936 BOS-NY
............
Wes Ferrell 1939 BOS-NY
Red Ruffing 1939 BOS-NY
Red Ruffing 1941 BOS-NY
Bobo Newsom 1947 BOS-NY
Jack Kramer 1951 BOS-NY
Jim McDonald 1952 BOS-NY
Ray Scarborough1952 BOS-NY
Jim McDonald 1953 BOS-NY
Ray Scarborough1953 BOS-NY
Mickey McDermott 1956 BOS-NY
Tex Clevenger 1961 BOS-NY
Hal Brown 1962 BOS-NY
Tex Clevenger 1962 BOS-NY
Sparky Lyle 1977 BOS-NY
Sparky Lyle 1978 BOS-NY
Wade Boggs 1996 BOS-NY
Ramiro Mendoza 2004 NY-BOS
Mike Lowell 2007 NY-BOS
Mike Stanton 1998 BOS-NY
Roger Clemens 1999 BOS-NY
Tony Fossas 1999 BOS-NY
Jeff Manto 1999 BOS-NY
Mike Stanton 1999 BOS-NY
Jose Canseco 2000 BOS-NY
Roger Clemens 2000 BOS-NY
Mike Stanton 2000 BOS-NY
Alfredo Aceves 2013 NY-BOS
Kevin Cash 2009 BOS-NY
Johnny Damon 2009 BOS-NY
Eric Hinske 2009 BOS-NY

  A plot of the players in championships. Pretty much impossible for players to go from the Yankees to Red Sox championship teams for the better part of the 20th century.

winning_years %>% filter(movement == 'BOS-NY') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(0,0,1,0.75), main = '', xlim = c(1915, 2010), add = F)
winning_years %>% filter(movement == 'NY-BOS') %>% dplyr::select(yearID) %>% pull() %>% hist(col = rgb(1,0,0,0.75), add = T)
legend('topright', legend = c('BOS -> NY', 'NY -> BOS'), fill = c(rgb(0,0,1,0.75),rgb(1,0,0,0.75)))

png

We can get the totals.

winning_years %>% 
  distinct(nameFirst, nameLast, movement) %>% 
  group_by(movement) %>% 
  tally() %>%
  rename('count' = 'n')
movementcount
BOS-NY35
NY-BOS 7

  And finally, all the players that switched to championship teams.

winning_years %>% 
  distinct(nameFirst, nameLast, movement) %>%
  arrange(movement, nameLast)
nameFirstnameLastmovement
Wade Boggs BOS-NY
Hal Brown BOS-NY
George Burns BOS-NY
Bullet Joe Bush BOS-NY
Jose Canseco BOS-NY
Kevin Cash BOS-NY
Roger Clemens BOS-NY
Tex Clevenger BOS-NY
Babe Dahlgren BOS-NY
Johnny Damon BOS-NY
Wes Ferrell BOS-NY
Tony Fossas BOS-NY
Fred Heimach BOS-NY
Eric Hinske BOS-NY
Waite Hoyt BOS-NY
Roy Johnson BOS-NY
Sad Sam Jones BOS-NY
Jack Kramer BOS-NY
Sparky Lyle BOS-NY
Danny MacFayden BOS-NY
Jeff Manto BOS-NY
Carl Mays BOS-NY
Mickey McDermott BOS-NY
Jim McDonald BOS-NY
Mike McNally BOS-NY
Bobo Newsom BOS-NY
Ben Paschal BOS-NY
Herb Pennock BOS-NY
Red Ruffing BOS-NY
Babe Ruth BOS-NY
Ray ScarboroughBOS-NY
Wally Schang BOS-NY
Everett Scott BOS-NY
Bob Seeds BOS-NY
Mike Stanton BOS-NY
Alfredo Aceves NY-BOS
Neal Ball NY-BOS
Clyde Engle NY-BOS
Mike Lowell NY-BOS
Ramiro Mendoza NY-BOS
Frank Truesdale NY-BOS
Jimmy Walsh NY-BOS
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment