
John Blischak (jdblischak)

nanxstats / simtrial-10k.tsv
Last active November 16, 2023 06:19
simtrial backend benchmark sketch
n                   1         2         4         8        16
dplyr         5093.77   2671.44   1447.21    810.42    446.06
data.table    1336.79    677.94     364.5    217.75    143.95
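The speedups implied by this table can be computed directly. Below is a minimal R sketch that copies the numbers above. The gist does not say what `n` counts (plausibly the number of parallel workers) or which time unit is used, so the sketch reports only unit-free ratios.

```r
# Timings copied from the simtrial-10k.tsv preview above.
# Assumption: each row is a backend and each column an n setting; units unknown.
bench <- data.frame(
  n          = c(1, 2, 4, 8, 16),
  dplyr      = c(5093.77, 2671.44, 1447.21, 810.42, 446.06),
  data.table = c(1336.79, 677.94, 364.5, 217.75, 143.95)
)

# How many times faster data.table is than dplyr at each n
bench$speedup <- bench$dplyr / bench$data.table

# How each backend scales relative to its own n = 1 timing
bench$dplyr_scaling <- bench$dplyr[1] / bench$dplyr
bench$dt_scaling    <- bench$data.table[1] / bench$data.table

print(round(bench, 2))
```

On these numbers, data.table is roughly 3.1 to 4.0 times faster at every `n`. Going from `n = 1` to `n = 16` speeds up dplyr about 11.4-fold and data.table about 9.3-fold.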
nanxstats / gsDesign-gource.sh
Created October 28, 2022 02:13
Shell commands to generate version control visualization video for gsDesign using gource
# Clone repo
git clone https://github.com/keaven/gsDesign.git
cd gsDesign
# Run gource - this will generate a 411GB ppm file
gource -3840x2160 --seconds-per-day 0.1 --auto-skip-seconds 0.01 --file-idle-time 0 --font-size 34 --key --logo man/figures/logo.png -o gsDesign.ppm
# Convert ppm to mp4
ffmpeg -y -r 60 -f image2pipe -vcodec ppm -i gsDesign.ppm -vcodec libx264 -preset medium -pix_fmt yuv420p -crf 1 -threads 0 -bf 0 gsDesign.mp4
# Merge audio to video
ffmpeg -i gsDesign.mp4 -i music.mp3 -c:v copy -c:a aac output.mp4
# Recommended by YouTube
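A quick back-of-the-envelope calculation shows where the "411GB" in the script comes from. This R sketch makes three assumptions. First, gource writes binary PPM frames at 3 bytes per RGB pixel, and the small per-frame headers can be ignored. Second, gource outputs at its default rate of 60 fps, which matches the `-r 60` passed to ffmpeg. Third, "GB" means 10^9 bytes.

```r
# Back-of-the-envelope size of the raw PPM stream written by gource.
# Assumptions: binary PPM (3 bytes per RGB pixel), per-frame headers ignored,
# 60 frames per second, and "411GB" read as 411 * 10^9 bytes.
width  <- 3840
height <- 2160
bytes_per_frame <- width * height * 3      # about 24.9 MB per frame
total_bytes <- 411e9
frames  <- total_bytes / bytes_per_frame
seconds <- frames / 60
cat(sprintf("%.0f frames, about %.1f minutes of video\n", frames, seconds / 60))
```

That works out to about 16,500 frames, or roughly 4.6 minutes of video. Because the size is linear in pixel count, halving both dimensions would cut the intermediate file by a factor of four.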
data(diamonds, package = "ggplot2")
# Most straightforward
diamonds$ppc <- diamonds$price / diamonds$carat
# Avoid repeating diamonds
diamonds$ppc <- with(diamonds, price / carat)
# The inspiration for dplyr's mutate
diamonds <- transform(diamonds, ppc = price / carat)
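The three idioms above should all produce the same column. The small self-contained check below uses a made-up three-row data frame with the same `price` and `carat` columns in place of `ggplot2::diamonds`. It also adds base R's `within()`, a close relative of `transform()`.

```r
# Made-up stand-in for ggplot2::diamonds (same column names, invented values)
d <- data.frame(carat = c(0.5, 1, 2), price = c(1000, 3000, 9000))

v1 <- d; v1$ppc <- v1$price / v1$carat            # direct $ assignment
v2 <- d; v2$ppc <- with(v2, price / carat)        # with(): evaluate inside the data
v3 <- transform(d, ppc = price / carat)           # transform(): returns a modified copy
v4 <- within(d, ppc <- price / carat)             # within(): assign inside the data

# All four approaches add an identical ppc column
stopifnot(identical(v1$ppc, v2$ppc), identical(v1$ppc, v3$ppc), identical(v1$ppc, v4$ppc))
print(v3)
```

With dplyr loaded, `mutate(d, ppc = price / carat)` is the corresponding call.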
jokergoo / r_pkg_downloads.R
Created October 2, 2022 15:07
Number of downloads from CRAN/Bioc/conda
library(rvest)
library(jsonlite)
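downloads_from_conda = function(pkg) {
  # Scrape the anaconda.org search results for the conda package "r-<pkg>"
  x = read_html(paste0("https://anaconda.org/search?q=r-", pkg))
  tb = html_nodes(x, "table") %>% html_table()
  if(length(tb) > 0) {
    # Sum the second column of the first results table (presumably the download counts)
    tb = tb[[1]]
    sum(tb[, 2])
  } else {
    0
  }
}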
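The gist title also covers CRAN, but the preview does not show how the author queries it. One possible approach, offered here as an assumption, is the public cranlogs API (https://cranlogs.r-pkg.org). It returns JSON, which `jsonlite::fromJSON()` turns into a data frame. The helper names below are hypothetical.

```r
library(jsonlite)

# Hypothetical helper: URL for total CRAN downloads over a period.
# period can be "last-day", "last-week", "last-month" or "YYYY-MM-DD:YYYY-MM-DD".
cranlogs_url <- function(pkg, period = "last-month") {
  paste0("https://cranlogs.r-pkg.org/downloads/total/", period, "/", pkg)
}

# Sum the "downloads" field of a cranlogs JSON response (a URL or a JSON string)
sum_cranlogs <- function(json) {
  x <- fromJSON(json)
  sum(x$downloads)
}

# Hypothetical CRAN counterpart to downloads_from_conda(); needs network access
downloads_from_cran <- function(pkg, period = "last-month") {
  sum_cranlogs(cranlogs_url(pkg, period))
}

# Offline example with a response shaped like the API's output
example_json <- '[{"start":"2022-09-01","end":"2022-09-30","downloads":1234,"package":"foo"}]'
sum_cranlogs(example_json)
```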
wviechtb / output.txt
Created May 4, 2022 09:23
Benchmark comparison of for-loops versus apply()/sapply() in 3 different versions of R (2.5.0, 3.0.0, 4.2.0)
> ############################################################################
>
> # A comparison of for-loops versus apply() and sapply() for 1) computing the
> # row means of a matrix and 2) computing the mean of each element of a
> # list. For task 1), we can also examine the performance of rowMeans() as a
> # specialized / vectorized function and for task 2), we can also compare
> # sapply() with vapply() (note: vapply() was added in version R-2.12.0). Also,
> # for the for-loop, we can examine what the impact is of pre-allocating the
> # vector in which to store the results versus 'growing' the vector in each
> # iteration.
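The comparison described above can be sketched in a few lines of R. This is not the author's benchmark script, since the preview stops before the code. The matrix size, list size, and repetition count are arbitrary choices, and `system.time()` stands in for whatever timing method the original uses.

```r
set.seed(1)
X <- matrix(rnorm(2000 * 50), nrow = 2000)          # task 1: row means of a matrix
L <- replicate(2000, rnorm(50), simplify = FALSE)   # task 2: means of list elements

# Task 1 variants
grow_loop <- function(X) {                   # for-loop that grows the result
  out <- c()
  for (i in seq_len(nrow(X))) out <- c(out, mean(X[i, ]))
  out
}
prealloc_loop <- function(X) {               # for-loop with a pre-allocated result
  out <- numeric(nrow(X))
  for (i in seq_len(nrow(X))) out[i] <- mean(X[i, ])
  out
}
with_apply    <- function(X) apply(X, 1, mean)
with_rowMeans <- function(X) rowMeans(X)     # specialized, vectorized

# Task 2 variants
list_sapply <- function(L) sapply(L, mean)
list_vapply <- function(L) vapply(L, mean, numeric(1))  # declares the result type

# All variants must agree before timing them means anything
stopifnot(
  identical(grow_loop(X), prealloc_loop(X)),
  isTRUE(all.equal(prealloc_loop(X), with_apply(X))),
  isTRUE(all.equal(prealloc_loop(X), with_rowMeans(X))),
  identical(list_sapply(L), list_vapply(L))
)

# Elapsed seconds for 20 repetitions of each variant
time_it <- function(f, x, reps = 20) system.time(for (r in seq_len(reps)) f(x))[["elapsed"]]
task1 <- sapply(list(grow = grow_loop, prealloc = prealloc_loop,
                     apply = with_apply, rowMeans = with_rowMeans), time_it, x = X)
task2 <- sapply(list(sapply = list_sapply, vapply = list_vapply), time_it, x = L)
print(task1)
print(task2)
```

Pre-allocation matters more as the number of iterations grows, because `c(out, ...)` copies the whole vector on every pass.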
ernstki / getsnpcoords
Last active April 18, 2022 19:24
Fetch SNP coordinates from the UCSC MySQL server
#!/usr/bin/env bash
#
# Script to query UCSC MySQL server for SNP coordinates (but could easily be
# repurposed to query any arbitrary database/table)
#
# Author: Kevin Ernst
# Date: 2 March 2019; updated 30 August 2021
# Source: https://gist.github.com/ernstki/91b427d6714cdd4dd6560e5b4fb961f4
# License: MIT
#
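The query logic of the script is not shown in the preview. As a hedged illustration of the kind of query it might send, here is an R sketch that builds one. Several details are assumptions rather than details taken from the script: the public server `genome-mysql.soe.ucsc.edu` (user `genome`), the `hg38` database, and the dbSNP table `snp151`. `snp_coords_query()` is a hypothetical helper.

```r
# Hypothetical helper: SQL for the coordinates of a set of rsIDs.
# Assumes a UCSC dbSNP table such as snp151 (columns chrom, chromStart, chromEnd, name).
snp_coords_query <- function(rsids, table = "snp151") {
  # Only accept well-formed identifiers, so nothing unexpected reaches the SQL string
  stopifnot(all(grepl("^rs[0-9]+$", rsids)), grepl("^[A-Za-z0-9_]+$", table))
  sprintf("SELECT chrom, chromStart, chromEnd, name FROM %s WHERE name IN (%s);",
          table, paste0("'", rsids, "'", collapse = ", "))
}

q <- snp_coords_query(c("rs429358", "rs7412"))
cat(q, "\n")

# Running it needs the mysql client and network access (not executed here):
# system2("mysql", c("--user=genome", "--host=genome-mysql.soe.ucsc.edu", "-A",
#                    "-e", shQuote(q), "hg38"))
```

UCSC tables store `chromStart` as a 0-based coordinate, so add 1 to get a 1-based position.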
smithjd / Extending_Trackdown.md
Last active May 29, 2024 17:10
Using the R trackdown package for multiple pages and multiple authors

OVERVIEW

I have found that the trackdown package is incredibly useful for collaboration with non-R users. Its design suggests that the main use case was a group of researchers all working on one paper (a big .Rmd file). The package documentation has a very clear workflow description.

I've put some wrappers and additional code around the package to make working with a couple dozen Distill pages easier.

This set of functions is handy for synchronizing more than a dozen .Rmd files by simplifying the following:
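The gist's own wrapper functions are cut off in the preview. As an illustration only, here is a minimal R sketch of the general idea: apply one trackdown function to every .Rmd file in a folder, and keep going if one file fails. `trackdown_all()` is hypothetical. It assumes that trackdown's `upload_file()`, `update_file()` and `download_file()` take the local path as their `file` argument.

```r
# Hypothetical wrapper: run one trackdown operation on every .Rmd in `dir`.
# `action` defaults to trackdown::update_file; extra arguments go to `action`.
trackdown_all <- function(dir = ".", action = trackdown::update_file, ...) {
  files <- list.files(dir, pattern = "\\.Rmd$", full.names = TRUE)
  results <- lapply(files, function(f) {
    tryCatch(
      action(file = f, ...),
      error = function(e) {
        # Report the failing file and continue with the rest
        warning(sprintf("%s: %s", basename(f), conditionMessage(e)), call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- basename(files)
  invisible(results)
}

# Real use needs trackdown installed and a Google Drive login, e.g.:
# trackdown_all("pages", action = trackdown::upload_file)
# trackdown_all("pages", action = trackdown::download_file)
```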

davebraze / emacs-data-work.md
Last active March 26, 2024 21:49
On using Emacs for data work with R

I use GNU Emacs on MS Windows 11, specifically the pre-packaged, pre-compiled distributions for Windows provided by Vince Goulet (https://vigou3.gitlab.io/emacs-modified-windows/). He also provides a bundle for macOS (https://vigou3.gitlab.io/emacs-modified-macos/). I have used, and occasionally still use, Emacs on a variety of different unixen. I believe most of what follows will apply to any GNU Emacs distribution or derivative on any platform, but of course, YMMV.

By way of background, I've been using Emacs since the late 80s as an IDE for various programming languages (e.g., Pascal, C, Lisp, MATLAB, Python), and as a general text editor. I've also gotten a lot of mileage out of its features for calendaring, scheduling, note-taking, and agenda making. So, when I started using R around 2001, it was natural to do my R scripting and programming in Emacs (using its ESS package, which I'd already been using with SAS since the early 90s). When RStudio came out in about 2011, I did give it a look, but it was

# R port of Dmitry Kobak's excellent PCA animations.
# See https://gist.github.com/anonymous/7d888663c6ec679ea65428715b99bfdd
# for matlab/octave code
dir.create("gif", showWarnings = FALSE)
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)
X <- X %*% chol(matrix(c(1, 0.6, 0.6, 0.6), ncol = 2, byrow = TRUE))
X <- apply(X, 2, function(col) col - mean(col))
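The animations being ported show a line rotating through the data. The first principal axis is the direction onto which the centered points have the largest projected variance. The small sketch below checks this by brute force on the same simulated `X`. It rotates a unit vector from 0 to pi, computes the variance of the projections at each angle, and compares the best angle with the first eigenvector of `cov(X)`. The 721-angle grid is an arbitrary choice.

```r
# Same simulated, centered data as the script above
set.seed(42)
X <- matrix(rnorm(200), ncol = 2)
X <- X %*% chol(matrix(c(1, 0.6, 0.6, 0.6), ncol = 2, byrow = TRUE))
X <- apply(X, 2, function(col) col - mean(col))

# Variance of the projections onto the unit vector (cos a, sin a)
angles <- seq(0, pi, length.out = 721)
proj_var <- sapply(angles, function(a) var(drop(X %*% c(cos(a), sin(a)))))
best <- angles[which.max(proj_var)]

# First principal component from the eigendecomposition of the covariance
e <- eigen(cov(X), symmetric = TRUE)
pc1 <- e$vectors[, 1]
pc1_angle <- atan2(pc1[2], pc1[1]) %% pi   # a direction and its negative are the same axis

cat(sprintf("grid search: angle %.3f, variance %.4f\n", best, max(proj_var)))
cat(sprintf("eigen():     angle %.3f, variance %.4f\n", pc1_angle, e$values[1]))
```

The two lines should agree up to the grid resolution. The largest projected variance equals the first eigenvalue, which is what the rotating line in the animation converges to.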
nanxstats / release_checklist_nanxstats.md
Last active June 4, 2024 14:42
Nan's simple R package release checklist

First release

  • Proofread Title: and Description: and ensure they are informative
  • Check that all exported functions have @returns and @examples
  • Check that Authors@R: includes a copyright holder (role 'cph')
  • Review extrachecks
  • usethis::use_cran_comments() (optional)
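Two of the first-release checks can be partly automated. The rough R sketch below is not part of the original checklist. `has_cph()` parses `Authors@R:` from DESCRIPTION and looks for the `cph` role. `rd_missing()` lists Rd files in `man/` that lack a `\value` or `\examples` section. Both helper names are hypothetical, and `rd_missing()` does not distinguish exported topics from internal ones.

```r
# Hypothetical helper: does Authors@R in DESCRIPTION list a copyright holder?
has_cph <- function(desc_path = "DESCRIPTION") {
  authors_src <- read.dcf(desc_path, fields = "Authors@R")[1, 1]
  if (is.na(authors_src)) return(FALSE)
  authors <- eval(parse(text = authors_src))   # evaluates the person() calls
  "cph" %in% unlist(authors$role)
}

# Hypothetical helper: Rd files lacking \value or \examples
rd_missing <- function(man_dir = "man") {
  rd_files <- list.files(man_dir, pattern = "\\.Rd$", full.names = TRUE)
  tags <- lapply(rd_files, function(f) unlist(lapply(tools::parse_Rd(f), attr, "Rd_tag")))
  missing <- vapply(tags, function(t) !all(c("\\value", "\\examples") %in% t), logical(1))
  basename(rd_files[missing])
}
```

Run both helpers from the package root. With roxygen2, `@returns` and `@examples` become the `\value` and `\examples` sections after `devtools::document()`.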

Prepare for release