@eddjberry
eddjberry / sparklyr_cv_pipeline_example.R
Last active August 22, 2019 21:24
An example of creating a Spark pipeline with sparklyr
# Load packages
library(dplyr)
library(sparklyr)
# Set up the Spark connection
sc <- spark_connect(master = "local")
# Create a Spark DataFrame of mtcars
mtcars_sdf <- copy_to(sc, mtcars)
eddjberry / sim_binom.R
Last active May 26, 2018 10:36
Simulate a binomial target and some features
sim_binom <- function(n_samples = 1000, n_features = 2,
                      true_target_prob = 0.5, beta = NULL, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  x = matrix(rnorm(n_samples * n_features),
             nrow = n_samples, ncol = n_features)
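The listing above is cut off after the feature matrix is drawn. As a rough sketch of how such a simulation could be completed, the version below assumes a logistic link, an intercept of `qlogis(true_target_prob)`, and all-zero coefficients when `beta` is `NULL`. None of these choices is confirmed by the original; the name `sim_binom_sketch` and the output column names are hypothetical.

```r
# A hypothetical completion, NOT the original function body.
sim_binom_sketch <- function(n_samples = 1000, n_features = 2,
                             true_target_prob = 0.5, beta = NULL, seed = NULL) {
  if (!is.null(seed)) {
    set.seed(seed)
  }
  # standard normal features, one column per feature
  x <- matrix(rnorm(n_samples * n_features),
              nrow = n_samples, ncol = n_features)
  # assumption: no feature effects unless coefficients are supplied
  if (is.null(beta)) {
    beta <- rep(0, n_features)
  }
  # assumption: logistic link with the intercept set from the target probability
  eta <- qlogis(true_target_prob) + as.vector(x %*% beta)
  y <- rbinom(n_samples, size = 1, prob = plogis(eta))
  # unnamed matrix columns become X1, X2, ... in the data frame
  data.frame(y = y, x)
}

sim_a <- sim_binom_sketch(n_samples = 200, n_features = 3, seed = 42)
sim_b <- sim_binom_sketch(n_samples = 200, n_features = 3, seed = 42)
```

Passing the same `seed` twice should give identical data, which is a quick check that the seeding works as intended.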
eddjberry / split_df_csv.R
Created August 14, 2018 14:25
Create separate csv files of the data for each level of some categorical column
library(tidyverse)
# Nest iris by Species
iris_nest <- iris %>%
  group_by(Species) %>%
  nest()
# Get the data list and set the names of the list to Species
# write_csv for each df in the data list with its name as the filename
iris_nest %>%
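The listing stops before the files are written. As an illustration of the same idea, the sketch below swaps the nested-tibble approach for base R's `split()` and `write.csv()`, so it is not the original pipeline. The output directory under `tempdir()` is an assumption made for the example.

```r
# Base R sketch: one CSV per level of a categorical column.
# The output directory is hypothetical, chosen to keep the example self-contained.
out_dir <- file.path(tempdir(), "iris_by_species")
dir.create(out_dir, showWarnings = FALSE)

# split() returns a named list of data frames, one per Species level
iris_list <- split(iris, iris$Species)

# write each data frame to <level name>.csv
for (nm in names(iris_list)) {
  write.csv(iris_list[[nm]],
            file = file.path(out_dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

written_files <- sort(list.files(out_dir))
```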
eddjberry / filter_at_remove_nas.R
Last active February 19, 2019 08:36
Using filter_at to remove rows with some or all NAs in a specified set of columns. To drop rows that are entirely NA across all columns, we could use janitor::remove_empty('rows') instead
library(dplyr)
# create some data
# (tibble() replaces the deprecated data_frame())
(df <- tibble(x = 1:2,
              y = c(NA, NA),
              z = c(NA, 3)))
# remove rows where either col y or z contains NA
# i.e. keep rows where all of the selected columns are not NA
df %>%
  filter_at(vars(y:z), all_vars(!is.na(.)))
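For comparison, here is a sketch of the same two filters written with `if_all()` and `if_any()`, which recent dplyr versions (1.0.4 and later) offer in place of the scoped `filter_at()` verb. The result names `no_nas` and `not_all_nas` are made up for the example.

```r
library(dplyr)

df <- tibble(x = 1:2,
             y = c(NA, NA),
             z = c(NA, 3))

# drop rows where ANY of y, z is NA (same logic as all_vars(!is.na(.)))
no_nas <- df %>%
  filter(if_all(y:z, ~ !is.na(.x)))

# drop only rows where ALL of y, z are NA (same logic as any_vars(!is.na(.)))
not_all_nas <- df %>%
  filter(if_any(y:z, ~ !is.na(.x)))
```

With this data, `y` is missing in every row, so `no_nas` is empty, while `not_all_nas` keeps the second row because `z` is present there.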
eddjberry / show_palette_cols.R
Created January 24, 2019 11:48
Show the colours in a palette with hex codes
library(scales)
library(viridis)
show_col(viridis(12))
eddjberry / tibble_select_column.R
Last active January 24, 2019 16:11
Different return types for selecting columns from a tibble
# create a tibble----------------------
tbl <- tibble::tibble(x = letters[1:5],
y = letters[5:1])
# returns a tibble --------------------
dplyr::select(tbl, x)
tbl[1]
tbl[, 1]
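The listing stops after the forms that return a tibble. Assuming the contrast the title has in mind is with forms that return a bare vector, a small sketch of those follows. The variable names are made up for the example.

```r
# create the same tibble as above
tbl <- tibble::tibble(x = letters[1:5],
                      y = letters[5:1])

# single-bracket indexing keeps the tibble class (no dropping to a vector)
still_tibble <- tbl[, 1]

# these forms return the column as a plain character vector
vec_dollar <- tbl$x
vec_double <- tbl[[1]]
vec_pull   <- dplyr::pull(tbl, x)
```

This differs from a base `data.frame`, where `df[, 1]` drops to a vector by default.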
eddjberry / str_proper.R
Last active February 19, 2019 11:56
Format a string such that the first character is upper case and the rest are lower case
# a function to format strings
# to be in Proper case
str_proper <- function(string) {
  # get the first letter
  first_letter = substring(string, first = 1, last = 1)
  # get the other letters
  other_letters = substring(string, first = 2)
  # combine the first letter (upper case)
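The remainder of the function is not shown in the listing. Going by the description (first character upper case, the rest lower case), a minimal base R sketch might look like the following. The name `str_proper_sketch` is hypothetical, and the original may combine the pieces differently.

```r
# A hypothetical completion based on the description, NOT the original body.
str_proper_sketch <- function(string) {
  # first character of each string
  first_letter <- substring(string, first = 1, last = 1)
  # everything after the first character
  other_letters <- substring(string, first = 2)
  # upper-case the first letter, lower-case the rest, and join them
  paste0(toupper(first_letter), tolower(other_letters))
}

str_proper_sketch("hELLO wORLD")     # "Hello world"
str_proper_sketch(c("r", "SPARK"))   # "R" "Spark"
```

Note that only the first character of the whole string is capitalised, not the first character of each word.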
eddjberry / group_prop.R
Last active April 27, 2020 11:07
Get counts and proportions by group(s)
group_prop <- function(df, ...) {
  # enquo the dots
  vars <- enquos(...)
  # count then calculate
  # proportions
  df_count <- df %>%
    count(!!!vars)
  if (length(vars) > 1) {
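The listing cuts off at the branch for more than one grouping variable, so how the original handles that case is unknown. The simplified sketch below only covers the basic idea of counting and then dividing by the overall total; it passes the dots straight to `count()` and does not attempt to reproduce the multi-variable branch. The name `group_prop_sketch` and the `prop` column are assumptions.

```r
library(dplyr)

# Simplified sketch: proportions are taken over the whole table,
# regardless of how many grouping variables are supplied.
group_prop_sketch <- function(df, ...) {
  df %>%
    count(...) %>%              # one row per combination, counts in column n
    mutate(prop = n / sum(n))   # share of all rows
}

cyl_props <- group_prop_sketch(mtcars, cyl)
```

For `mtcars`, the counts by `cyl` are 11, 7 and 14 out of 32 rows, so the proportions sum to 1.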
eddjberry / round_nearest.R
Created June 5, 2019 10:28
Round values to the nearest 0.05, 5, 10 etc.
# function to round a value to the nearest multiple of `nearest`
# e.g. if nearest = 5 then 42 would be rounded to 40
# and 47 would be rounded to 45
# source: http://r.789695.n4.nabble.com/Rounding-to-the-nearest-5-td863189.html
round_nearest <- function(x, nearest) {
  nearest * round(x / nearest)
}
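A few calls show how the function behaves, including with fractional steps and vector input. The function is repeated so the block runs on its own.

```r
round_nearest <- function(x, nearest) {
  nearest * round(x / nearest)
}

round_nearest(42, 5)                 # 40
round_nearest(47, 5)                 # 45
round_nearest(0.12, 0.05)            # 0.1 (up to floating-point error)
round_nearest(c(42, 47, 123), 10)    # 40 50 120

# round() rounds exact halves to the even digit, so 25 goes to 20, not 30
round_nearest(25, 10)                # 20
```

Because `round()` sends exact halves to the even digit, values exactly halfway between two multiples may not round up as one might expect.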
eddjberry / prop_test_power_curves.R
Created July 12, 2019 15:40
Power curves for a prop.test created using pwr & ggplot2
#========================================================#
# Setup
#========================================================#
library(dplyr)
library(ggplot2)
library(here)
library(pwr)
library(scales)
library(stringr)
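Only the setup is visible in the listing. As a rough sketch of how the data behind such a power curve could be computed, the example below uses `pwr::ES.h()` and `pwr::pwr.2p.test()` for a two-sample comparison of proportions. The proportions, sample sizes, and significance level are hypothetical values chosen for illustration, and the original script may use different pwr functions.

```r
library(pwr)

# hypothetical inputs, not taken from the original script
p_baseline  <- 0.10
p_treatment <- 0.15
n_per_group <- seq(100, 1000, by = 100)

# Cohen's effect size h for the difference between two proportions
h <- ES.h(p_treatment, p_baseline)

# power of a two-sided test at each per-group sample size
power <- sapply(n_per_group, function(n) {
  pwr.2p.test(h = h, n = n, sig.level = 0.05)$power
})

power_df <- data.frame(n_per_group = n_per_group, power = power)
```

A data frame like `power_df` could then be passed to ggplot2 with `n_per_group` on the x-axis and `power` on the y-axis to draw the curve.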