Matt Dray matt-dray

@matt-dray
matt-dray / count-dupe-rows.R
Created August 22, 2023 19:08
Count the number of times that duplicate records appear in a dataframe
x <- data.frame(
  col_a = c("A", "B", "A", "C", "A", "D", "B", "C"),
  col_b = c(1, 2, 1, 3, 9, 4, 2, 9)
)
x
#   col_a col_b
# 1     A     1
# 2     B     2
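The preview above is cut off before the counting step. As a minimal sketch of one way to count duplicate records in base R (not necessarily the approach the gist takes), a helper column of ones can be summed within each unique row. The column name `n` and the objects `counts` and `dupes` are assumptions made for illustration.

```r
# Example data, mirroring the data frame shown in the preview
x <- data.frame(
  col_a = c("A", "B", "A", "C", "A", "D", "B", "C"),
  col_b = c(1, 2, 1, 3, 9, 4, 2, 9)
)

# Add a helper column of ones (hypothetical name 'n'), then sum it
# within each unique combination of col_a and col_b
x$n <- 1L
counts <- aggregate(n ~ col_a + col_b, data = x, FUN = sum)

# Keep only the records that appear more than once
dupes <- counts[counts$n > 1, ]
dupes
```

Here `counts` holds one row per unique record and `dupes` keeps only the records that are repeated.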
@matt-dray
matt-dray / get-categories.R
Last active August 21, 2023 07:53
Get all the categories from Quarto blog posts (between 'categories: ' and second instance of '---')
posts <-
  list.files("posts", pattern = "\\.qmd$", recursive = TRUE, full.names = TRUE)
get_categories <- function(post_path, ignore_rx = "resources") {
  post_lines <- readLines(post_path, warn = FALSE)
  cats_start <- which(post_lines == "categories:") + 1
  cats_end <- which(post_lines == "---")[2] - 1
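The function preview is truncated here. As a rough sketch of how the two indices can be used, the example below applies them to a made-up front matter held in memory rather than read from a file. It assumes `categories:` is the last key before the closing `---` and that each category is written as a `- item` list entry; the cleanup step with `sub()` and `trimws()` is an assumption, not taken from the gist.

```r
# Hypothetical front matter, standing in for readLines() on a .qmd file
post_lines <- c(
  "---",
  "title: Example post",
  "categories:",
  "  - r",
  "  - quarto",
  "---",
  "Body text."
)

cats_start <- which(post_lines == "categories:") + 1  # first category line
cats_end <- which(post_lines == "---")[2] - 1         # line before the closing '---'

# Slice out the category lines, then strip the leading list marker
cats <- post_lines[cats_start:cats_end]
cats <- trimws(sub("^\\s*-\\s*", "", cats))
cats
```

If another YAML key followed `categories:`, this slice would pick it up too, so a stricter version would need an extra filter.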
@matt-dray
matt-dray / bd2q-redirects.R
Last active August 26, 2023 18:23
Create a redirects file from {blogdown}-style URL paths to Quarto-style (for rostrum.blog)
redirect_to <- paste0("/", list.dirs("posts", recursive = FALSE))
date_rx <- "\\d{4}-\\d{2}-\\d{2}"
date_portion <- regexpr(date_rx, redirect_to) |>
  regmatches(redirect_to, m = _) |>
  gsub("-", "/", x = _)
name_portion <- gsub(paste0("posts/", date_rx, "-"), "", redirect_to)
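To make the string handling concrete, here is a small sketch that runs the same steps on two hard-coded directory names instead of `list.dirs()`. The final `redirect_from` object and the way the pieces are pasted together are assumptions for illustration; the preview does not show how the gist assembles or writes the redirects file. Nested calls are used in place of the `|>` pipe with the `_` placeholder, which needs R 4.2 or later.

```r
# Hypothetical Quarto-style post paths (normally built from list.dirs())
redirect_to <- c(
  "/posts/2023-06-07-rectangular-officer",
  "/posts/2020-05-16-postcode-pandemonium"
)

date_rx <- "\\d{4}-\\d{2}-\\d{2}"

# Pull out the yyyy-mm-dd date and turn it into yyyy/mm/dd
date_match <- regmatches(redirect_to, regexpr(date_rx, redirect_to))
date_portion <- gsub("-", "/", date_match)

# Drop the 'posts/yyyy-mm-dd-' prefix, leaving '/post-name'
name_portion <- gsub(paste0("posts/", date_rx, "-"), "", redirect_to)

# Assumed {blogdown}-style path: /yyyy/mm/dd/post-name/
redirect_from <- paste0("/", date_portion, name_portion, "/")
redirect_from
```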
@matt-dray
matt-dray / q2bd-blog-post-dirs.R
Last active August 16, 2023 21:25
Rearranging post directory structure for a Quarto blog to mimic the URL path of a {blogdown} blog (for rostrum.blog)
paths <- list.dirs("posts", recursive = FALSE)
for (i in paths) {
  from_dir <- i
  date_rx <- "\\d{4}-\\d{2}-\\d{2}"
  dates <- regexpr(date_rx, basename(i)) |>
    regmatches(basename(i), m = _) |>
@matt-dray
matt-dray / extract-html-data-leaflet.R
Last active August 16, 2023 21:25
Using R to extract data out of some HTML code for a leaflet map (needed for a blogdown to Quarto blog conversion for rostrum.blog)
x <- readLines("~/Desktop/leaflet-map.txt")
popup_html <- stringr::str_split_1(x, "\",\"")
lmb_simple <- tibble::tibble(
  status_id = stringr::str_extract(popup_html, "\\d{19}"),
  lat = stringr::str_extract(popup_html, "(?<=📍 )5\\d{1}\\.\\d{0,4}(?=, )"),
  lon = stringr::str_extract(popup_html, "(?<=\\d, )(-)?\\d\\.\\d{0,4}(?=<br>📮)"),
  osm_url = glue::glue("https://www.openstreetmap.org/#map=17/{lat}/{lon}/"),
  media_url = stringr::str_extract(popup_html, "(?<=img src=\\')http://pbs\\.twimg\\.com/media/.*\\.jpg(?=\\' width)")
@matt-dray
matt-dray / berthas-sadness.R
Created August 1, 2023 13:13
A stunning dual-y-axis chart to commemorate the reduction of people in the office called Bertha (not actual name) and the ensuing sadness.
library(ggplot2)
library(ggthemes)
library(extrafont)
font_import() # might take a minute
loadfonts(device = "win")
df <- data.frame(
  Time = 1:3,
  Berthas = 2:0,
  Sadness = 0:2
@matt-dray
matt-dray / remove-massive-file-from-git.sh
Created July 21, 2023 20:00
Accidentally added a big file and Git refused to push. Adding a commit to remove the file didn't help because the file was still in the history. The fix was to reset to the prior commit, undo it but keep everything staged, use git rm to remove the file, then re-add, commit and push.
git log --oneline
git reset dc871db
git log --oneline
git reset --soft HEAD~
git status
git rm posts/2020-05-16-postcode-pandemonium/Data/NSPL21_FEB_2023_UK.csv
git status
git add .
git commit -m "Fix back to 2020-05-16 postcodes, rename folder dates, correct punctuation post, rebuild site"
git push origin md-fix-latest
@matt-dray
matt-dray / abominable-namespacing.R
Created June 14, 2023 08:15
Everything's a function in R, including namespacing
`::`(dplyr, `%>%`)(mtcars, `::`(dplyr, select)(., cyl))
# Challenge for the reader: isn't the bracket a function too...?
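As a hint towards the challenge, the sketch below uses only base R, so no {dplyr} is needed. It shows that the parenthesis, an infix operator and `::` itself can all be called in prefix form with backticks. The names `three`, `total` and `paste_fn` are made up for illustration.

```r
# `(` is an ordinary function that returns its argument
is.function(`(`)
three <- `(`(1 + 2)

# Infix operators can be called in prefix form too
total <- `+`(1, 2)

# `::` takes a package name and an object name, and returns the object
paste_fn <- `::`(base, paste)
identical(paste_fn, base::paste)
```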
@matt-dray
matt-dray / extract-word-tables.R
Last active June 9, 2023 16:31
Extract tables from Word files that are in different subfolders, then combine them.
# Extract and combine tables from multiple Word files
# This script creates some dummy docx files in temporary subfolders to mimic a
# user's filesystem. It then uses docxtractr::read_docx() to extract all the
# tables, and combines them with rbind().
# A follow-up to my blogpost:
# https://www.rostrum.blog/2023/06/07/rectangular-officer/
# Attach packages (all are available from CRAN)
@matt-dray
matt-dray / rectangularise-tables.R
Last active July 10, 2023 09:07
{officer} is an R package that lets you extract elements of a Word document, including tables, into a tidy dataframe. I've written a function to 're-rectangularise' an extracted Word table into an R dataframe.
# Functions building on the {officer} package:
# https://davidgohel.github.io/officer/
# You can read more about these functions in a blog post:
# https://www.rostrum.blog/2023/06/07/rectangular-officer/
# There are other solutions. You can also try {docxtractr} by Bob Rudis
# (on CRAN), which doesn't depend on {officer}, or {officerExtras} by Eli
# Pousson (on GitHub).