@thisisnic
thisisnic / print_tz.R
Created September 12, 2023 07:32
Detect legacy timezone symlinks in code (2023c-8 not 2023c-10)
# Detect invalid timezones
all_names <- tzdb::tzdb_names()
bad_names <- c(
all_names[startsWith(all_names, "US/")],
all_names[!stringr::str_detect(all_names, "/")]
)
# keep only the names that are not flagged above (vectorised; no purrr needed)
all_names[!all_names %in% bad_names]
@thisisnic
thisisnic / stream_to_feather_in_r.md
Created July 28, 2023 07:53
Stream to Arrow/Feather in R
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
@thisisnic
thisisnic / stream_to_parquet_in_r.md
Created July 27, 2023 21:55
Reprex showing how to stream data into a Parquet file in R
library(arrow)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
@thisisnic
thisisnic / blanks.R
Last active February 9, 2023 10:17
Example showing how a combination of blank values can cause an error when reading a CSV
``` r
library(arrow)
library(dplyr)
library(stringr)
tf <- tempfile()
# values to save - note the space after the final new line
dodgy_vals <- "x,y\n0,1\n ,4"
cat(dodgy_vals)
@thisisnic
thisisnic / gist:14fb9c1001261f2cf249f9317cda6466
Last active September 8, 2022 15:14
lazy_query from dbplyr
# query details copied from https://github.com/voltrondata-labs/arrowbench/blob/main/R/tpch-queries.R
query_results <- lineitem_db %>%
select(l_shipdate, l_returnflag, l_linestatus, l_quantity,
l_extendedprice, l_discount, l_tax) %>%
# kludge, should be: filter(l_shipdate <= "1998-12-01" - interval x day) %>%
# where x is between 60 and 120, 90 is the only one that will validate.
filter(l_shipdate <= as.Date("1998-09-02")) %>%
select(l_returnflag, l_linestatus, l_quantity, l_extendedprice, l_discount, l_tax) %>%
group_by(l_returnflag, l_linestatus) %>%
summarise(
---
title: "Apache Arrow R Questions on Stack Overflow"
format: html
---
```{r}
#| label: load-packages-and-code
#| include: false
library(httr)
library(dplyr)
@thisisnic
thisisnic / pre-commit
Created August 26, 2021 17:03
pre-commit file which runs styler on everything
#!/bin/bash
set -e
SOURCE_DIR='<path_to_project_root_goes_here>'
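# Find all .R files which have been staged via git add
# (pattern is anchored so .Rmd/.RData are skipped; `|| true` stops `set -e` aborting when nothing matches)
FILES_TO_STYLE=$(git diff --name-only --staged | grep "\.R$" || true)
for FILE in $FILES_TO_STYLE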
do
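The hook above selects staged files by matching their names against an `.R` pattern. As a minimal sketch of that filtering step on its own (the file names here are hypothetical, and a plain string stands in for the output of `git diff --name-only --staged`), anchoring the pattern with `$` keeps only names that end in `.R`:

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for the output of `git diff --name-only --staged`
staged_files='R/utils.R
README.Rmd
data/raw.RData
analysis.R'

# Keep only names ending in .R; `|| true` avoids a non-zero exit when grep finds nothing
r_files=$(printf '%s\n' "$staged_files" | grep '\.R$' || true)

printf '%s\n' "$r_files"
```

Without the `$` anchor, `README.Rmd` and `data/raw.RData` would also match, since both contain `.R`.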
@thisisnic
thisisnic / git.txt
Last active April 12, 2021 08:35
git rebase when you have a PR open on my-branch but the upstream/master branch has new commits
# checkout the branch you're working on
git checkout my-branch
# fetch all branches from the upstream repo
git fetch upstream
# rebase your changes on top of the upstream master branch
git rebase upstream/master
# force push your changes to your branch
@thisisnic
thisisnic / instructions.txt
Created November 25, 2020 08:43
Saving Coursera Assignments and Supporting Files in Bulk
Instructions condensed from https://stackoverflow.com/questions/62613253/downloading-all-jupyter-notebooks-from-coursera-tar-size-exeeding-100mb
# Click on the 'jupyter' logo in any notebook for that course to open the working directory that Jupyter is running in.
# Click "new" (on the right) and then "terminal" and run the following commands
# navigate to parent directory
cd ..
# compress all your notebooks and other associated files
@thisisnic
thisisnic / orderInput.R
Created October 8, 2019 23:11
orderInput stuff
library(shinyjqui)
library(shiny)
show_columns <- paste("column", letters[1:16])    # "column a" ... "column p"
hidden_columns <- paste("column", letters[17:26]) # "column q" ... "column z"
ui <- fluidPage(
column(
width = 12,