Martin Monkman MonkmanMH
@MonkmanMH
MonkmanMH / mutate_alternate.r
Created Jul 5, 2017
mutate alternate values
library(tidyverse)
datatab <- tibble(value = 1:10)  # one column, "value", holding 1 to 10
# modulo division
datatab$value %% 2
# "value" alternates between odd and even, so label each row accordingly
datatab %>%
  mutate(valueplus = ifelse(value %% 2 == 0, "even", "odd"))
@MonkmanMH
MonkmanMH / col_name.r
Created Jul 5, 2017
column naming loop
# Problem:
# - row 2 of data file has non-data title that repeats every two columns
# - column 1 / row 1 header label is fine
# - the header in every even-numbered column applies to the next odd-numbered column (e.g. 2 applies to 3, 4 to 5, etc.)
# - the header in those odd-numbered columns (3, 5, 7, etc) is read initially as an NA
# Solution
# - read column names only
# - hard code even and odd suffix
# - copy header value in those even columns to odd columns
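The preview above stops before any code, so the following is only a minimal sketch of the three solution steps, not the gist's actual implementation. The header labels and the `_a` / `_b` suffixes are made up for illustration.

```r
# Hypothetical header row matching the problem description: column 1 is fine,
# each even-numbered column has a label, each following odd column is NA.
header <- c("id", "sales", NA, "costs", NA)

# Copy the label in each even column to the odd column that follows it.
for (i in seq_along(header)) {
  if (i > 1 && is.na(header[i])) {
    header[i] <- header[i - 1]
  }
}

# Hard-code a suffix for even and odd columns (suffix text is an assumption),
# leaving column 1 untouched.
idx <- seq_along(header)[-1]
header[idx] <- paste0(header[idx], ifelse(idx %% 2 == 0, "_a", "_b"))

header
```

The resulting vector could then be assigned with `names()` to a data frame read without its header rows.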
@MonkmanMH
MonkmanMH / datefixLahman.R
Created Jan 6, 2016
Work-around quick fix for inconsistent date values in the Master table of the Lahman package (R)
#
library(Lahman)
data(Master)
#
# `debut` variable; create new version `debutDate`
Master$debutDate <- as.Date(Master$debut, "%m/%d/%Y")
Master$debutDate[is.na(Master$debutDate)] <-
  as.Date(Master$debut[is.na(Master$debutDate)])
#
# `finalGame` variable; create new version `finalGameDate`
View gist:efdf9c772054131ca22f
---
title: "Testing Lahman 3.0"
author: "Martin Monkman"
date: "Sunday, August 31, 2014"
output: html_document
---
This markdown document incorporates a variety of short scripts that draw on the various tables in the `Lahman` package. (See the Lahman project page on RForge for more details <http://lahman.r-forge.r-project.org/>.)
Note that some of the scripts appear in the documentation of other R packages; in those cases, the original source is noted before the script.
@MonkmanMH
MonkmanMH / gist:0f92cba504f2e7f11bba
Created Jul 29, 2014
Wes Anderson palette in R
if (!require(wesanderson)) install.packages("wesanderson")
library(wesanderson)
# for more on the Wes Anderson colour palette:
# https://github.com/karthik/wesanderson#wes-anderson-palettes
# http://blog.revolutionanalytics.com/2014/03/give-your-r-charts-that-wes-anderson-style.html
#
#
#
# add some Wes Anderson "Grand Budapest Hotel" colour to print object "p2"
# (older releases of the package used wes.palette(4, "GrandBudapest"))
p2 + scale_fill_manual(values = wes_palette("GrandBudapest1", 4))
@MonkmanMH
MonkmanMH / gist:3c0da6afd58eb61e2c51
Last active Aug 29, 2015
dplyr testing and goofing
#
# setwd("D:/R_the software/datatrials/Lahman")
#
library(Lahman)
library(dplyr)
#
# throwing by position
# version 1 - "merge"
# merge() already returns a data frame, so no data.frame() wrapper is needed
MasterFielding <- merge(Master, Fielding, by = "playerID")
@MonkmanMH
MonkmanMH / gist:9190970
Last active Sep 12, 2019
Categorical data analysis in R - a resource list
@MonkmanMH
MonkmanMH / gist:8798762
Last active Aug 29, 2015
Percentile function in R
# CALCULATING PERCENTILES IN R
#
# a basic percentile function using "ecdf" [Empirical Cumulative Distribution Function]
# using a data file "percentiledata" with variable VALUE
percentileFUN <- ecdf(percentiledata$VALUE)
percentileFUN
percentileFUN(percentiledata$VALUE)
# write the percentile values to the source file
percentiledata$pctl <- percentileFUN(percentiledata$VALUE)
#
@MonkmanMH
MonkmanMH / gist:7740998
Last active Sep 6, 2020
Random number generation in R (rstats, #rstats)

Random numbers in R

The creation of random numbers, or the random selection of elements in a set (or population), is an important part of statistics and data science. From simulating coin tosses to selecting potential respondents for a survey, we rely heavily on random number generation.

R offers us a variety of solutions for random number generation; here's a quick overview of some of the options.

runif, rbinom, rnorm

One simple solution is to use the runif function, which generates a stated number of values between two end points (but not the end points themselves!). The function uses the continuous uniform distribution, meaning that every value between the two end points has an equal probability of being sampled.
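
As a quick illustration of runif, here is a minimal sketch; the seed, the number of draws, and the end points 0 and 100 are arbitrary choices for the example rather than values from the original text.

```r
set.seed(42)  # arbitrary seed, used only so the draws can be reproduced

# ten values from the continuous uniform distribution between 0 and 100
x <- runif(10, min = 0, max = 100)

length(x)             # how many values were generated
all(x > 0 & x < 100)  # every value lies strictly between the end points
```

Leaving out min and max falls back on the defaults of 0 and 1, so runif(10) gives ten values between 0 and 1.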
