Skip to content

Instantly share code, notes, and snippets.

View MonkmanMH's full-sized avatar
🎯
Multi-tasking

Martin Monkman MonkmanMH

🎯
Multi-tasking
View GitHub Profile
@MonkmanMH
MonkmanMH / geom_bar_col.Rmd
Created April 19, 2020 15:55
geom_bar vs geom_col
---
title: "geom_col vs geom_bar"
author: "Martin Monkman"
date: "2020/04/19"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
@MonkmanMH
MonkmanMH / transform_data_solutions.Rmd
Created November 4, 2019 03:34
Answer key for tidyverse transform data exercises
---
title: "Transform Data"
subtitle: "hands-on examples, with answers"
output: html_notebook
---
<!-- This file by Charlotte Wickham (with some modifications by Martin Monkman) is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio and https://github.com/cwickham/data-science-in-tidyverse-solutions. -->
```{r setup}
library(tidyverse)
### ---
#
# from @expersso
set.seed(894) # number of regular season NHL goals Wayne Gretzky scored
x <- replicate(10000, sum(sample(0:1, 20, TRUE, c(0.945, 0.055))))
table(ifelse(x == 0, "Team A win", ifelse(x == 1, "Draw", "Team B win"))) / 100
@MonkmanMH
MonkmanMH / gist:5802497
Created June 18, 2013 03:30
Annotating select points on an X-Y plot using ggplot2
#
# for details see
# http://bayesball.blogspot.ca/2013/06/annotating-select-points-on-x-y-plot.html
#
# load the ggplot2 and grid packages
library(ggplot2)
library(grid)
# read data (note csv files are renamed)
tbl1 = read.csv("FanGraphs_Leaderboard_h.csv")
tbl2 = read.csv("FanGraphs_Leaderboard_d.csv")
@MonkmanMH
MonkmanMH / correlatedrandomvariables.r
Last active October 20, 2017 22:31
correlated random variables
## make it reproducible
set.seed(8675309)
## two normally distributed variables
X1 <- rnorm(100, mean = 0, sd = 1)
X2 <- rnorm(100, mean = 0, sd = 1)
@MonkmanMH
MonkmanMH / dummy_var.R
Last active October 18, 2017 18:51
quick
# quick example of mutate (in the dplyr R package) to create a dummy variable
# packages (from the tidyverse)
library(tibble)
library(dplyr)
# a little tibble with an ID number and a gender variable (5 Female, 3 Male, 2 Not Stated)
mydata <- tibble(id = 1:10, gender = c("F", "F", "F", "F", "F",
"M", "M", "M",
"NS", "NS"))
@MonkmanMH
MonkmanMH / mutate_alternate.r
Created July 5, 2017 02:40
mutate alternate values
library(tidyverse)
datatab <- as.tibble(c(1:10))
# modulo division
datatab$value %% 2
# since we have alternating even and odd value in "value" variable
datatab %>%
mutate(valueplus = ifelse((value %% 2) == 0, "even", "odd"))
@MonkmanMH
MonkmanMH / col_name.r
Created July 5, 2017 02:04
column naming loop
# Problem:
# - row 2 of data file has non-data title that repeats every two columns
# - column 1 / row 1 header label is fine
# - the header in every even-numbered column applies to the next odd-humbered column (eg 2 applies to 3, 4 to 5, etc)
# - the header in those odd-numbered columns (3, 5, 7, etc) is read initially as an NA
# Solution
# - read column names only
# - hard code even and odd suffix
# - copy header value in those even columns to odd columns
@MonkmanMH
MonkmanMH / gist:5711584
Created June 5, 2013 04:21
MLB runs per game (Lahman database)
# load the package and data set "Teams"
install.packages("Lahman")
library("Lahman")
data(Teams)
#
#
# CREATE LEAGUE SUMMARY TABLES
# ============================
#
# select a sub-set of teams from 1901 [the establishment of the American League] forward to 2012
@MonkmanMH
MonkmanMH / datefixLahman.R
Created January 6, 2016 05:11
Work-around quick fix for inconsistent date values in the Master table of the Lahman package (R)
#
library(Lahman)
data(Master)
#
# `debut` variable; create new version `debutDate`
Master$debutDate <- (as.Date(Master$debut, "%m/%d/%Y"))
Master$debutDate[is.na(Master$debutDate)] <-
as.Date(Master$debut[is.na(Master$debutDate)])
#
# `finalGame` variable; create new version `finalGameDate`