Tom Hopper (tomhopper), Michigan, United States
tomhopper / addNewData.R
Created October 9, 2016 00:12 — forked from dfalster/addNewData.R
The function addNewData (in addNewData.R) modifies a data frame using a lookup table. This is useful when you want to supplement data loaded from a file with other data, e.g. to add details, change treatment names, or similar. The function readNewData is also included; it runs some checks on the new table to ensure it has correct variable names and val…
##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace values that
##' match 'lookupValue' within column 'lookupVariable' with the value
##' 'newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
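The replacement rule documented above can be sketched as follows. This is a minimal, hypothetical illustration, not the body of addNewData (which this preview does not show): the name `apply_lookup`, passing the lookup table as a data frame instead of reading it from newDataFileName, and creating a missing target column are all assumptions.

```r
# Hypothetical sketch, NOT dfalster's addNewData(): applies a lookup table
# with columns lookupVariable, lookupValue, newVariable, newValue, source.
apply_lookup <- function(data, lookup) {
  required <- c("lookupVariable", "lookupValue", "newVariable", "newValue", "source")
  missing_cols <- setdiff(required, names(lookup))
  if (length(missing_cols) > 0) {
    stop("lookup table is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  for (i in seq_len(nrow(lookup))) {
    new_var <- lookup$newVariable[i]
    new_val <- lookup$newValue[i]
    # Assumption: create the target column if it does not exist yet
    if (!new_var %in% names(data)) data[[new_var]] <- NA
    if (is.na(lookup$lookupVariable[i])) {
      # NA lookupVariable: overwrite *all* elements of newVariable
      data[[new_var]] <- new_val
    } else {
      # Replace only rows where lookupVariable matches lookupValue
      rows <- data[[lookup$lookupVariable[i]]] == lookup$lookupValue[i]
      rows[is.na(rows)] <- FALSE
      data[[new_var]][rows] <- new_val
    }
  }
  data
}

# Usage with made-up example data
plants <- data.frame(species = c("a", "b", "a"),
                     treatment = c("t1", "t2", "t1"),
                     stringsAsFactors = FALSE)
lookup <- data.frame(lookupVariable = c("treatment", NA),
                     lookupValue = c("t1", NA),
                     newVariable = c("treatment", "site"),
                     newValue = c("control", "north"),
                     source = c("notes", "notes"),
                     stringsAsFactors = FALSE)
out <- apply_lookup(plants, lookup)
out$treatment  # "control" "t2" "control"
out$site       # "north" "north" "north"
```

Rows are applied in order, so a later lookup row can overwrite an earlier one; the `source` column is only checked for presence here, on the assumption that it documents provenance.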
tomhopper / dplyr_filter_ungroup.R
Created January 29, 2016 20:31 — forked from jhofman/dplyr_filter_ungroup.R
Be careful when filtering with many groups in dplyr.
library(dplyr)
# create a dummy dataframe with 100,000 groups and 1,000,000 rows
# and partition by group_id
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)
# filter rows with a value of 1 naively
system.time(df %>% filter(val == 1))
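The file name suggests the remedy is to ungroup before a filter whose condition does not depend on the groups. A minimal sketch of that comparison, assuming a recent dplyr; timings vary by machine and dplyr version, so none are quoted here:

```r
library(dplyr)

# Same dummy data as above: about 100,000 groups over 1,000,000 rows
set.seed(1)
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)

# Grouped filter: the condition is evaluated within each group
grouped <- system.time(res_grouped <- df %>% filter(val == 1))

# Ungrouped filter: one vectorised pass over the whole column
ungrouped <- system.time(res_ungrouped <- df %>% ungroup() %>% filter(val == 1))

# val == 1 does not depend on the grouping, so both keep the same rows
stopifnot(nrow(res_grouped) == nrow(res_ungrouped))
print(rbind(grouped = grouped[["elapsed"]], ungrouped = ungrouped[["elapsed"]]))
```

Note that the ungrouped result is no longer grouped; call `group_by(group_id)` again afterwards if later steps need the groups.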
# Create 2 replicates of 5 "words" generated from random letters (no letter
# repeated within a word). Each word is 5 characters plus a Poisson(3) draw,
# so it is at least 5 and rarely more than about 15 characters long.
# rpois(1, ...) draws one length per word; sample() needs a single size.
rep(replicate(5, paste(sample(letters, rpois(1, lambda = 3) + 5, replace = FALSE),
                       collapse = "")),
    2)
# Sample output:
# [1] "rfexnwyjst" "vwtadhjnly" "ztfgvldo" "tmerol" "mcqhosap" "rfexnwyjst" "vwtadhjnly" "ztfgvldo" "tmerol"
#[10] "mcqhosap"
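The same idea can be wrapped in a small reusable function. `random_words` and its parameters are hypothetical; it draws one Poisson length per word and caps the length at 26, since sampling `letters` without replacement cannot return more than 26 characters.

```r
# Hypothetical helper wrapping the one-liner above
random_words <- function(n_words = 5, n_reps = 2, lambda = 3, min_len = 5) {
  words <- replicate(n_words, {
    # Word length: min_len plus a Poisson draw, capped at the alphabet size
    len <- min(rpois(1, lambda) + min_len, length(letters))
    paste(sample(letters, len, replace = FALSE), collapse = "")
  })
  # Repeat the whole set of words n_reps times
  rep(words, n_reps)
}

set.seed(42)
w <- random_words()
print(w)
```

The cap only matters for large `lambda` or `min_len`; with the defaults it is practically never reached.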