Benjamin Wolfe BenjaminWolfe

BenjaminWolfe / panel-gridlines-color-vector.R
Last active August 27, 2019 14:18
If you specify a vector of gridline colors, ggplot2 cycles through them
library(tidyverse) # dplyr, stringr, ggplot2
library(lubridate)
#' The `sunspot.month` dataset as a tibble:
sunspots_reshaped <-
  tibble(
    sunspot_month = sunspot.month %>%
      start() %>%
      c(1) %>%
      paste(collapse = "-") %>%
#' Eureka!
#'
#' Running `devtools::check()` on my Windows laptop,
#' I kept getting the following warning:
#'
#' ```
#' 'qpdf' is needed for checks on size reduction of PDFs
#' ```
#'
#' I downloaded the latest version of `qpdf` from SourceForge,
library(tidyverse)
library(slackteams)
library(slackr)
library(lubridate)
library(here)
library(glue)
library(conflicted)
library(fs)
conflict_prefer("filter", "dplyr", "stats")
# my 15yo's homework today during lockdown
# find an equation for the cooling of a cup of coffee
# with the following temperature measurements
library(tidyverse)
decay_tbl <- tribble(
  ~time, ~temperature,
      0,        179.5,
      5,        168.7,
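The preview cuts off after two measurements, so the fit itself isn't shown. As a rough sketch of what "an equation for the cooling" could look like, here is Newton's law of cooling pinned to just those two visible points. The ambient temperature `T_ENV` is my assumption (it is not in the data above), and with the full table you would fit the curve to all measurements rather than solve from two.

```python
import math

# The two measurements visible in the preview above (time, temperature).
t0, temp0 = 0, 179.5
t1, temp1 = 5, 168.7

# ASSUMPTION: ambient temperature, in the same units as the measurements.
# This value is hypothetical and does not come from the original data.
T_ENV = 70.0

# Newton's law of cooling: T(t) = T_env + (T0 - T_env) * exp(-k * t).
# With T_env fixed, two points are enough to solve for the rate constant k.
k = -math.log((temp1 - T_ENV) / (temp0 - T_ENV)) / (t1 - t0)


def temperature(t: float) -> float:
    """Modeled temperature at time t under the assumptions above."""
    return T_ENV + (temp0 - T_ENV) * math.exp(-k * t)
```

By construction the curve passes through both visible points; how well it matches the rest of the measurements depends entirely on the assumed `T_ENV`.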
BenjaminWolfe / group-and-fill.R
Last active August 11, 2020 16:18
grouping, ordering, and filling in Python and R
# code to be compared to group-and-fill.py
# task: fill specific columns down within each group, ordered by the `order` column.
library(tidyverse)
df <- tribble(
  ~group, ~order, ~attribute_1, ~attribute_2, ~irrelevant,
  "a",    0,      1,            1,            "hello",
  "a",    2,      NA,           3,            NA,      # this one is out of order
  "a",    1,      6,            NA,           "world",
  "b",    0,      2,            7,            "foo",
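The Python counterpart (group-and-fill.py) isn't shown in this listing, so what follows is only a minimal pandas sketch of the same task, not the code from that gist. It uses just the four rows visible above, and it assumes the columns to fill are `attribute_1` and `attribute_2` while `irrelevant` is left alone.

```python
import numpy as np
import pandas as pd

# Only the rows visible in the preview above; the original table continues.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b"],
        "order": [0, 2, 1, 0],
        "attribute_1": [1, np.nan, 6, 2],
        "attribute_2": [1, 3, np.nan, 7],
        "irrelevant": ["hello", None, "world", "foo"],
    }
)

# ASSUMPTION: these are the "specific columns" to fill down.
fill_cols = ["attribute_1", "attribute_2"]

# Sort so that each group's rows are in `order`, then forward-fill
# the chosen columns within each group only.
filled = df.sort_values(["group", "order"]).reset_index(drop=True)
filled[fill_cols] = filled.groupby("group")[fill_cols].ffill()
```

Filling through `groupby` keeps values from leaking across group boundaries, and restricting it to `fill_cols` leaves the missing value in `irrelevant` untouched.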
BenjaminWolfe / nested_got_data.R
Last active August 31, 2020 17:30
Nested Game of Thrones Data - Snowflake + R
library(tidyverse)
library(repurrrsive) # Game of Thrones dataset
library(listviewer) # jsonedit for pretty viewing
jsonedit(got_chars, mode = "view")
really_nested <- tibble(id = 1:30, nested_stuff = got_chars)
really_nested %>%
  hoist(

Say you want to access GitLab.com for work (using your work email address), and you also want to access it for personal reasons (using your personal email address), and you want to do both on the same laptop.

Generate two SSH key pairs and save them in your ~/.ssh directory. I'll call them work_key and personal_key. As with any SSH key on GitHub or GitLab, add each public key to the corresponding GitLab account.

Then point to both private keys in your ~/.ssh/config file:

BenjaminWolfe / test_chaining_series.py
Last active April 19, 2021 17:50
This is a nice test case to learn to use %timeit, as well as np.random.seed. I needed to do something like df.query, but with a series. Turns out that in at least simple cases it can be easy and fast with .loc and a lambda function.
import numpy as np, pandas as pd
df_len = 1000 # integer multiple of 4
np.random.seed(42)
# create a random data frame
df = pd.DataFrame(
    {
        # floor division keeps the upper bound an integer
        "group_a": np.random.randint(0, df_len // 4, size=df_len),
        "group_b": np.random.randint(0, df_len // 4, size=df_len),
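The preview stops before the filtering step, so here is a small sketch of the idea from the description: `.loc` accepts a callable, which lets you filter a series in the middle of a method chain, much as `df.query` does for a data frame. The series `s` and the threshold are hypothetical, not taken from the gist.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Hypothetical series, for illustration only.
s = pd.Series(np.random.randint(0, 100, size=1000), name="value")

# .loc calls the lambda with the series as it exists at that point
# in the chain, so no intermediate variable is needed.
chained = s.mul(2).loc[lambda x: x > 150]

# The same result written with an intermediate variable.
doubled = s.mul(2)
unchained = doubled[doubled > 150]
```

In IPython or Jupyter, `%timeit s.mul(2).loc[lambda x: x > 150]` would time the chained version; `%timeit` is an IPython magic and does not run in a plain Python script.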
BenjaminWolfe / np-where-multiple-columns.py
Created April 28, 2021 02:59
How do I use np.where with multiple columns at once?
# how to use np.where with multiple columns at once, even whole data frames?
import numpy as np
import pandas as pd
np.random.seed(42)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
condition = pd.Series(np.random.choice(a=[False, True], size=100, p=[.1, .9]))
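The answer itself is cut off in this preview. One way to do it, offered as a sketch rather than as the gist's actual solution, is to reshape the row-wise condition into a column vector so that NumPy broadcasts it across all four columns. The setup repeats the lines above so the block runs on its own; `picked` is a name I made up.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
condition = pd.Series(np.random.choice(a=[False, True], size=100, p=[0.1, 0.9]))

# condition has shape (100,); [:, None] makes it (100, 1), which
# broadcasts against the (100, 4) frames, one decision per row.
mask = condition.to_numpy()[:, None]

# np.where returns a plain ndarray, so wrap it back into a DataFrame.
picked = pd.DataFrame(
    np.where(mask, df1, df2),
    columns=df1.columns,
    index=df1.index,
)
```

Each row of `picked` comes from `df1` where `condition` is true and from `df2` otherwise.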