Benjamin Wolfe BenjaminWolfe

@BenjaminWolfe
BenjaminWolfe / curl.R
Last active Apr 3, 2022
Create Pages in Notion with R
# See `curl.md`. Here is the R equivalent.
library(httr)
library(jsonlite)
make_page <- function(url, headers, data) {
  response <- POST(
    url = url,
    body = toJSON(data, auto_unbox = TRUE),
    config = add_headers(.headers = headers)
BenjaminWolfe / np-where-multiple-columns.py
Created Apr 28, 2021
How do I use np.where with multiple columns at once?
# how to use np.where with multiple columns at once, even whole data frames?
import numpy as np
import pandas as pd
np.random.seed(42)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
condition = pd.Series(np.random.choice(a=[False, True], size=100, p=[.1, .9]))
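One possible answer to the question, sketched under the assumption that the goal is to take whole rows from `df1` where `condition` is true and from `df2` otherwise. The names `mask` and `result` are illustrative and may not match what the full gist does. The idea is to reshape the row-wise condition into a column vector so NumPy broadcasts it across every column.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list("ABCD"))
condition = pd.Series(np.random.choice(a=[False, True], size=100, p=[.1, .9]))

# Reshape the condition from shape (100,) to (100, 1) so it broadcasts
# across all four columns instead of being matched against them.
mask = condition.to_numpy()[:, np.newaxis]

# np.where returns a plain ndarray, so rebuild the frame with the original labels.
result = pd.DataFrame(
    np.where(mask, df1, df2),
    index=df1.index,
    columns=df1.columns,
)
```

Rows where `condition` is true come from `df1`; all other rows come from `df2`.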
BenjaminWolfe / test_chaining_series.py
Last active Apr 19, 2021
This is a nice test case for learning %timeit and np.random.seed. I needed something like df.query, but for a Series. It turns out that, at least in simple cases, this is easy and fast with .loc and a lambda function.
import numpy as np, pandas as pd
df_len = 1000 # integer multiple of 4
np.random.seed(42)
# create a random data frame
df = pd.DataFrame(
    {
        "group_a": np.random.randint(0, df_len // 4, size=df_len),
        "group_b": np.random.randint(0, df_len // 4, size=df_len),
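The preview cuts off before the chaining itself. As a minimal sketch of the `.loc`-plus-lambda idea from the description, assuming a hypothetical `value` column and an arbitrary threshold of 2.5 (neither appears in the original), filtering a Series mid-chain might look like this:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df_len = 1000

# Hypothetical frame: one grouping column and one numeric column.
df = pd.DataFrame(
    {
        "group_a": np.random.randint(0, df_len // 4, size=df_len),
        "value": np.random.rand(df_len),
    }
)

# Aggregate to a Series, then filter it in the same chain.
# The lambda receives the intermediate Series, so no temporary name is needed.
big_groups = (
    df.groupby("group_a")["value"]
    .sum()
    .loc[lambda s: s > 2.5]
)

# Equivalent two-step version, useful as a baseline when timing with %timeit.
sums = df.groupby("group_a")["value"].sum()
big_groups_two_step = sums[sums > 2.5]
```

Both versions select the same groups; the chained form simply avoids naming the intermediate result.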
View multiple-git-credentials.md

Say you want to access GitLab.com for work (using your work email address), and you also want to access it for personal reasons (using your personal email address), and you want to do both on the same laptop.

Generate two SSH keys and save them in your ~/.ssh directory. I'll call them work_key and personal_key. As always with SSH keys in GitHub or GitLab, you'll want to generate a public key for each and add it to the corresponding GitLab account.

Then point to both private keys in your ~/.ssh/config file:
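The preview ends before showing the config itself. As a rough sketch only, the snippet below renders what such a file could contain, with one `Host` alias per key. The aliases `gitlab-work` and `gitlab-personal` and the helper `render_ssh_config` are hypothetical; `Host`, `HostName`, `User`, `IdentityFile`, and `IdentitiesOnly` are standard OpenSSH client options.

```python
def render_ssh_config(hosts: dict[str, str], hostname: str = "gitlab.com") -> str:
    """Build ~/.ssh/config text with one Host alias per private key.

    `hosts` maps a hypothetical alias to the key file name inside ~/.ssh.
    """
    blocks = []
    for alias, key_name in hosts.items():
        blocks.append(
            "\n".join(
                [
                    f"Host {alias}",
                    f"    HostName {hostname}",
                    "    User git",
                    f"    IdentityFile ~/.ssh/{key_name}",
                    # Offer only this key, so the other account's key is never tried.
                    "    IdentitiesOnly yes",
                ]
            )
        )
    return "\n\n".join(blocks) + "\n"


config_text = render_ssh_config(
    {"gitlab-work": "work_key", "gitlab-personal": "personal_key"}
)
print(config_text)
```

With aliases like these, a remote URL would use the alias in place of the real host name, so that SSH picks the matching key.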

BenjaminWolfe / nested_got_data.R
Last active Aug 31, 2020
Nested Game of Thrones Data - Snowflake + R
library(tidyverse)
library(repurrrsive) # Game of Thrones dataset
library(listviewer) # jsonedit for pretty viewing
jsonedit(got_chars, mode = "view")
really_nested <- tibble(id = 1:30, nested_stuff = got_chars)
really_nested %>%
  hoist(
View symlinks.sh
BenjaminWolfe / group-and-fill.R
Last active Aug 11, 2020
grouping, ordering, and filling in Python and R
# code to be compared to group-and-fill.py
# task: fill specific columns down, within each group, ordering by the `order` column.
library(tidyverse)
df <- tribble(
  ~group, ~order, ~attribute_1, ~attribute_2, ~irrelevant,
  "a",    0,      1,            1,            "hello",
  "a",    2,      NA,           3,            NA,      # this one is out of order
  "a",    1,      6,            NA,           "world",
  "b",    0,      2,            7,            "foo",
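The task stated in the comments (fill chosen columns down within each group, in `order` order) could be sketched in pandas roughly as follows. This is not the author's `group-and-fill.py`, which is not shown here; it uses only the four rows visible in the preview, and the names `fill_cols` and `filled` are illustrative.

```python
import numpy as np
import pandas as pd

# Only the four rows visible in the preview above.
df = pd.DataFrame(
    {
        "group": ["a", "a", "a", "b"],
        "order": [0, 2, 1, 0],
        "attribute_1": [1, np.nan, 6, 2],
        "attribute_2": [1, 3, np.nan, 7],
        "irrelevant": ["hello", None, "world", "foo"],
    }
)

fill_cols = ["attribute_1", "attribute_2"]

# Sort first so "down" means increasing `order` within each group.
filled = df.sort_values(["group", "order"]).copy()

# Forward-fill only the chosen columns, never across group boundaries.
filled[fill_cols] = filled.groupby("group")[fill_cols].ffill()
```

The `irrelevant` column is left alone, so its missing value stays missing.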
View collect-iteratively.R
collect_iteratively <- function(x, size = 500, timeout = 3600) {
  start_time <- Sys.time()
  message("pulling ", size, " records at a time...")
  message("starting at ", start_time)
  message("will time out at ", start_time + timeout, " (", timeout, "s later)")
  con <- x$src$con
  sql <- dbplyr::db_sql_render(con, x)
  res <- DBI::dbSendQuery(con, sql)
View coffee-cooling-homework.R
# my 15yo's homework today during lockdown
# find an equation for the cooling of a cup of coffee
# with the following temperature measurements
library(tidyverse)
decay_tbl <- tribble(
  ~time, ~temperature,
  0,     179.5,
  5,     168.7,
View slack-random-draw.R
library(tidyverse)
library(slackteams)
library(slackr)
library(lubridate)
library(here)
library(glue)
library(conflicted)
library(fs)
conflict_prefer("filter", "dplyr", "stats")