David Tedford Holt davidtedfordholt

View GitHub Profile
davidtedfordholt / obj_size.R
Created May 29, 2024 12:27
print object dimensions and size in R
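obj_size <- function(obj, units = 'Mb') {
  dimensions <- dim(obj)
  # dim() is NULL for vectors and lists; fall back to length x 1 so that
  # data.frame() does not fail on zero-length columns
  if (is.null(dimensions)) dimensions <- c(length(obj), 1L)
  data.frame(
    rows = format(dimensions[1], big.mark = ","),
    columns = format(dimensions[2], big.mark = ","),
    size = format(object.size(obj), units = units, big.mark = ",")
  )
}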
davidtedfordholt / get_BQ_project_table_sizes.SQL
Last active May 15, 2024 19:17
get BigQuery table sizes
DECLARE dataset_names ARRAY<STRING>;
DECLARE batch ARRAY<STRING>;
DECLARE batch_size INT64 DEFAULT 25;
CREATE TEMP TABLE results (
  project_id STRING,
  dataset_id STRING,
  last_modified DATE,
  table_id STRING,
  row_count INT64,
davidtedfordholt / gg_snippets.R
Created September 14, 2021 14:33
gg_snippets
snippet ggstuff
	labs(title = "${1:title}",
	     subtitle = "${2:subtitle}",
	     x = "${3:xlab}",
	     y = "${4:ylab}") +
	theme_minimal()
snippet gglabs
	labs(title = "${1:title}",
	     x = "${2:xlab}",
davidtedfordholt / sample_groups.R
Created July 30, 2021 14:03
A function for sampling groups in data frames, tibbles, and tsibbles, to make it easy to create train and test sets from time series data
#' Sample groups randomly in a grouped data frame
#'
#' @param .data dataframe
#' @param ... names of key variables to define groups. If unspecified, dataframe must be grouped.
#' @param n If an integer, the number of unique groups to keep. If between 0 and 1, the proportion of unique groups to keep.
#'
#' @return tibble of all rows for sampled groups
#' @export
#'
#' @examples
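The preview stops before the function body. As a rough illustration of the behaviour the documentation describes, here is a minimal base R sketch. `sample_groups_sketch()` and its `group_col` argument are hypothetical names, not the gist's own code, and it takes a single grouping column rather than `...` or an existing grouping.

```r
# Hypothetical helper (not the gist's implementation): keep every row that
# belongs to a random subset of groups.
sample_groups_sketch <- function(.data, group_col, n) {
  groups <- unique(.data[[group_col]])
  # Assumption: n strictly between 0 and 1 is a proportion of groups,
  # anything else is a count of groups (capped at the number available).
  n_keep <- if (n > 0 && n < 1) ceiling(n * length(groups)) else min(n, length(groups))
  kept <- groups[sample.int(length(groups), n_keep)]
  .data[.data[[group_col]] %in% kept, , drop = FALSE]
}

set.seed(42)
series <- data.frame(
  id    = rep(c("a", "b", "c", "d"), each = 3),
  t     = rep(1:3, times = 4),
  value = rnorm(12)
)

# Whole series go to either train or test, so no series is split across both.
train <- sample_groups_sketch(series, "id", n = 0.5)
test  <- series[!series$id %in% train$id, ]
```

Sampling whole groups rather than rows keeps each time series intact on one side of the split.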
davidtedfordholt / scale_between.R
Created June 17, 2021 13:12
small R function to rescale variables
#' Scale a Variable Between a Given Min and Max
#'
#' @param x values to be scaled
#' @param new_min new minimum value
#' @param new_max new maximum value
#'
#' @return an object of the same class as `x`
#' @export
#'
#' @examples
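The function body is not shown in the preview. One common way to do what the documentation describes is linear min-max rescaling, sketched below. `scale_between_sketch()` is a hypothetical name, and the defaults of 0 and 1 are assumptions rather than the gist's own.

```r
# Hypothetical sketch of min-max rescaling (not the gist's code).
scale_between_sketch <- function(x, new_min = 0, new_max = 1) {
  old_range <- range(x, na.rm = TRUE)
  # A constant vector has zero width, so the division below would be undefined.
  stopifnot(diff(old_range) > 0)
  (x - old_range[1]) / diff(old_range) * (new_max - new_min) + new_min
}

# The smallest input maps to new_min and the largest to new_max.
scaled <- scale_between_sketch(c(2, 4, 6, 10), new_min = -1, new_max = 1)
```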
davidtedfordholt / symlog_trans.R
Last active May 31, 2022 18:14
A `scales`/`ggplot2` implementation of the `symlog` transformation
#' symlog transformation
#'
#' `symlog_trans()` applies a sign-preserving logarithm where `abs(x) > thr`,
#' with `thr` a tunable threshold, and leaves the data linear where
#' `abs(x) < thr`.
#' (credit for base code to https://stackoverflow.com/users/1320535/julius-vainora)
#'
#' @param base base of logarithm
#' @param thr numeric threshold for transitioning from log to linear
#' @param scale numeric scaling factor for data
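The preview ends before the transformation itself. The sketch below shows one common symlog formulation in base R, without the `scales` wrapper. The function names, the default values, and the exact formula are assumptions for illustration and may differ from the gist.

```r
# Hypothetical sketch of a symlog pair (not the gist's code).
# Forward: identity inside (-thr, thr), sign-preserving log outside.
symlog_forward <- function(x, base = 10, thr = 1, scale = 1) {
  out <- x
  big <- !is.na(x) & abs(x) >= thr
  out[big] <- sign(x[big]) * (thr + scale * log(abs(x[big]) / thr, base = base))
  out
}

# Inverse: undo the log branch for transformed values at or beyond thr.
symlog_inverse <- function(y, base = 10, thr = 1, scale = 1) {
  out <- y
  big <- !is.na(y) & abs(y) >= thr
  out[big] <- sign(y[big]) * thr * base^((abs(y[big]) - thr) / scale)
  out
}

x <- c(-1000, -10, -0.5, 0, 0.5, 10, 1000)
y <- symlog_forward(x)
x_back <- symlog_inverse(y)
```

Adding `thr` to the log branch keeps the two pieces continuous at `abs(x) == thr`, which is what lets the inverse use the same threshold on the transformed scale.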
davidtedfordholt / rand_within_p.sql
Created October 29, 2020 14:11
BigQuery - "random" number within a specified percentage of a number
-- implementation
WITH input AS (
  SELECT *
  FROM UNNEST([100,100,100,100,100,100,100,100,100,100]) x
  JOIN UNNEST([.4,.4,.4,.4,.4,.4,.4,.4,.4,.4]) p
)
SELECT x * (1 + (-p + (RAND() * (2 * p)))) AS rand_within_p_percent_of_x
FROM input;
-- verification of concept
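The expression in the implementation query shifts a uniform draw on [0, 1) to the interval [-p, p) and scales `x` by one plus that offset, so results fall between `x * (1 - p)` and `x * (1 + p)`. A quick check of the same arithmetic in R, with `runif()` standing in for `RAND()` (the sample size and seed are arbitrary choices):

```r
set.seed(1)
x <- 100
p <- 0.4
n <- 10000

# Same arithmetic as the SQL expression, with runif() in place of RAND().
draws <- x * (1 + (-p + runif(n) * (2 * p)))

# Every draw should lie within p percent of x, centred on x.
bounds <- c(lower = x * (1 - p), upper = x * (1 + p))
in_bounds <- all(draws >= bounds["lower"] & draws <= bounds["upper"])
```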
davidtedfordholt / source_R_dir.R
Created September 8, 2020 17:29
source everything in R/
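# Source only R scripts; spell out TRUE because T can be reassigned
invisible(lapply(list.files("R/", pattern = "\\.[Rr]$", full.names = TRUE), source))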
davidtedfordholt / copy_to_gcp.R
Last active May 19, 2020 20:37
copy a dataset to a CSV file in Google Cloud Storage using cloudml
copy_to_gcp <- function(.data, filename_prefix, gcp_folder = NULL, gcp_bucket) {
  # Accept bucket names with or without the gs:// scheme
  if (!startsWith(gcp_bucket, "gs://")) {
    gcp_bucket <- paste0("gs://", gcp_bucket)
  }
  # Local file name: <prefix>_<YYYYMMDD>.csv
  source <- paste0(filename_prefix, "_", format(Sys.Date(), "%Y%m%d"), ".csv")
  # The condition is a scalar, so use if/else rather than vectorised ifelse()
  destination <- if (is.null(gcp_folder)) gcp_bucket else paste0(gcp_bucket, "/", gcp_folder)
  packageStartupMessage("Writing file")
  write.csv(.data, source)
davidtedfordholt / Dockerfile
Created April 29, 2020 17:09
Dockerfile for davidtedfordholt/plumber-verse
## Plumber-verse Base Image #####################
FROM rocker/verse
RUN apt-get update -qq && apt-get install -y \
    git-core \
    libssl-dev \
    libcurl4-gnutls-dev
## Install Plumber