Skip to content

Instantly share code, notes, and snippets.

@MonkmanMH
Last active August 29, 2015 14:05
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save MonkmanMH/efdf9c772054131ca22f to your computer and use it in GitHub Desktop.
Save MonkmanMH/efdf9c772054131ca22f to your computer and use it in GitHub Desktop.
Lahman 3.0 tests
---
title: "Testing Lahman 3.0"
author: "Martin Monkman"
date: "Sunday, August 31, 2014"
output: html_document
---
This markdown document incorporates a variety of short scripts that draw on the various tables in the `Lahman` package. (See the Lahman project page on RForge for more details <http://lahman.r-forge.r-project.org/>.)
Note that some of scripts appear in the documentation of other R packages; in those cases, the original source is noted prior to the script.
First of all, install and load the new version of `Lahman`, and load the `dplyr` package.
```{r}
# install.packages("Lahman", repos="http://R-Forge.R-project.org")
library(Lahman)
library(dplyr)
#
```
##Master & Fielding
This uses an inner join to merge the two tables `Master` and `Fielding` into a new data frame "MasterFielding". The first approach relies on the `merge` instruction, while the second uses the much faster `inner_join` command from `dplyr`.
```{r}
# throwing by position
# version 1 - "merge"
MasterFielding <- data.frame(merge(Master, Fielding, by="playerID"))
MasterFielding <- merge(Master, Fielding, by="playerID")
system.time(MasterFielding <- merge(Master, Fielding, by="playerID"))
# the dplyr version -- faster
MasterFielding <- inner_join(Fielding, Master, by="playerID")
system.time(MasterFielding <- inner_join(Fielding, Master, by="playerID"))
#
#
# a count of games played, by position
MasterFielding <- subset(MasterFielding, POS != "OF" & yearID > "1944")
#
MasterFielding %.%
group_by(playerID, POS, throws) %.%
summarise(gamecount = sum(G)) %.%
arrange(desc(gamecount)) %.%
head(5)
#
MasterFielding <- inner_join(Fielding, Master, by="playerID")
# select only those seasons since 1945 and
# omit the records that are OF summary (i.e. leave the RF, CF, and LF)
MasterFielding <- subset(MasterFielding, POS != "OF" & yearID > "1944")
#
```
Now with the MasterFielding table, we can use the `summarise` command (`dplyr`) to generate a new table, "Player_games", that in turn is summarised into a table counting the games played at each position.
```{r}
Player_games <- MasterFielding %.%
group_by(playerID, POS, throws) %.%
summarise(gamecount = sum(G))
Player_POS <- Player_games %.%
group_by(POS, throws) %.%
summarise(playercount = length(gamecount))
head(Player_POS)
#
```
##Batting (dplyr reference)
This chunk of code is lifted from the `dplyr` [Vignettes page](http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html), under the topic "Postgresql". NOTE: the package `postgres` is not available for R 3.1.1
```{r}
if (has_lahman("postgres")) {
hflights_postgres <- tbl(src_postgres("hflights"), "hflights")
}
```
##Reference list
[Lahman at CRAN](http://cran.r-project.org/web/packages/Lahman/index.html)
[Lahman at RForge](http://lahman.r-forge.r-project.org/)
### dplyr
[dplyr at CRAN](http://cran.r-project.org/web/packages/dplyr/index.html)
[dplyr: Introduction to dplyr (at CRAN)](http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html)
[dplyr: Vignettes](http://cran.r-project.org/web/packages/dplyr/vignettes/databases.html)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment