Jeffrey W Hollister jhollist
@benmarwick
benmarwick / super-and-sub-script-labels.R
Last active April 11, 2024 23:09
ggplot axis labels with superscript and subscript
library(tidyverse)
# using expression() for the text formatting:
ggplot(mtcars,
aes(disp,
mpg)) +
geom_point() +
# ~ for spaces, and * for no-space between (unquoted) expressions
ylab(expression(Anthropogenic~SO[4]^{"2-"}~(ngm^-3))) +
xlab(expression(italic(delta)^13*C[ap]*"‰")) +
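The `~` and `*` operators in the preview above come from R's plotmath syntax. As a minimal sketch in base graphics (the label text here is illustrative, and the variable name `so4_label` is made up for this example), the same kind of expression can be built once and reused as an axis label:

```r
# Illustrative only: a plotmath label with a subscript, a superscript,
# and `~` producing visible spaces between the pieces.
so4_label <- expression(SO[4]^{"2-"} ~ (ng ~ m^-3))

# Base graphics accept the expression directly as an axis label.
plot(mtcars$disp, mtcars$mpg, xlab = "disp", ylab = so4_label)
```

Square brackets give subscripts, `^` gives superscripts, and quoting `"2-"` stops R from trying to parse it as arithmetic.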
#!/usr/bin/env Rscript
json_in <- file('stdin', 'r')
lat_newp <- '{"t":"RawBlock","c":["latex","\\\\newpage"]}'
doc_newp <- '{"t":"RawBlock","c":["openxml","<w:p><w:r><w:br w:type=\\"page\\"/></w:r></w:p>"]}'
ast <- paste(readLines(json_in, warn=FALSE), collapse="\n")
@fawda123
fawda123 / gist:5ecb73e1304e7faee83eb05b922937e7
Created August 31, 2017 20:09
save git log to csv with header
echo sha, contributor, date, message > log.csv
git log --date=local --pretty=format:'%h, %an, %ad, "%s"' >> log.csv
@Myfanwy
Myfanwy / timer.R
Created July 28, 2017 04:31
Timer function for R
# Depends on the `beepr` package by @rasmusab: https://cran.r-project.org/web/packages/beepr/README.html
# Code greatly improved by flodel on Stack Exchange Code Review (thanks flodel)
# Planned improvements/expansions: new end-of-timer sounds, add functionality for starting a separate R session so that it doesn't bogart the current one.
timer <- function(interval, units = c("secs", "mins", "hours", "days", "weeks")) {
units <- match.arg(units)
num_sec <- interval * switch(units, secs = 1, mins = 60, hours = 3600,
days = 86400, weeks = 604800)
Sys.sleep(num_sec)
if (require(beepr)) beep(2) else message("TIMER DONE YO!")
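The unit handling in the preview above relies on `match.arg()` plus `switch()`. A minimal sketch of just that conversion step, pulled into a hypothetical helper named `to_seconds` (not part of the original gist) so it can be tried without sleeping:

```r
# Hypothetical helper for illustration; the original gist does this inline.
to_seconds <- function(interval,
                       units = c("secs", "mins", "hours", "days", "weeks")) {
  # match.arg() picks the first choice by default and errors on unknown units
  units <- match.arg(units)
  interval * switch(units,
                    secs = 1, mins = 60, hours = 3600,
                    days = 86400, weeks = 604800)
}

to_seconds(2, "mins")
```

Because `match.arg()` validates against the default vector, a call such as `to_seconds(1, "years")` fails early instead of silently returning `NULL` from `switch()`.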
@jhollist
jhollist / get_nars.R
Last active January 26, 2017 18:12
R script to scrape US EPA NARS data and metadata
#Script to scrape NARS website
library(rvest)
library(dplyr)
nars <- read_html("https://www.epa.gov/national-aquatic-resource-surveys/data-national-aquatic-resource-surveys")
links <- nars %>%
html_nodes(".file-link, a") %>%
html_attr("href") %>%
as_tibble() %>%
filter(grepl("files",value)) %>%
@dill
dill / irltoots.R
Last active January 5, 2018 10:45
Make a grid of twitter folks I've met IRL
# you need my horse library from here: https://github.com/dill/horse
library(horse)
library(magrittr)
library(knitr)
setup_twitter_oauth("your auth stuff",
"goes here",
"you can't use mine",
"it's mine")

Playing with adding + operator for easily joining geojson objects together into a single valid geojson object. This is relatively easy with lists, but not so easy with json unless you speak json.

Get lawn package too for viewing data

devtools::install_github("ropensci/geojsonio")
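To illustrate why a `+` operator is easy to define when the data are plain lists, here is a minimal base R sketch. The class name `geo_sketch` and the helpers `new_point_feature` and `as_features` are hypothetical and are not the geojsonio API; the sketch only shows the S3 dispatch idea.

```r
# Hypothetical constructor: a GeoJSON-like point feature stored as a list.
new_point_feature <- function(lon, lat, name) {
  structure(
    list(type = "Feature",
         geometry = list(type = "Point", coordinates = c(lon, lat)),
         properties = list(name = name)),
    class = "geo_sketch"
  )
}

# Return a list of features whether x is one feature or a collection.
as_features <- function(x) {
  if (identical(x$type, "FeatureCollection")) x$features else list(unclass(x))
}

# S3 method for `+`: joining always yields a single FeatureCollection.
"+.geo_sketch" <- function(e1, e2) {
  structure(
    list(type = "FeatureCollection",
         features = c(as_features(e1), as_features(e2))),
    class = "geo_sketch"
  )
}

a <- new_point_feature(-122.7, 45.5, "a")
b <- new_point_feature(-71.4, 41.8, "b")
c3 <- new_point_feature(-0.1, 51.5, "c")
combined <- a + b + c3
```

Keeping the result in the same class means chained additions keep dispatching to the same method, so any number of objects collapse into one collection.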
@hadley
hadley / ds-training.md
Created March 13, 2015 18:49
My advice on what you need to do to become a data scientist...

If you were to give recommendations to your "little brother/sister" on things that they need to do to become a data scientist, what would those things be?

I think the "Data Science Venn Diagram" (http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram) is a great place to start. You need three things to be a good data scientist:

  • Statistical knowledge
  • Programming/hacking skills
  • Domain expertise

Statistical knowledge

@aammd
aammd / fill_down.R
Last active August 29, 2015 14:16
An R function I wrote in the process of translating a list kept in a .docx file to a proper R dataframe. After moving it from .docx to .txt via pandoc, I needed to turn section headers into levels of a grouping factor
#' convert positional information to two columns
#'
#' Sometimes text is organized by position. This function
#' turns positional group labels (e.g. headers) into the levels of a grouping variable
#' @param x character vector containing group labels followed by group members
#' @param pattern regular expression that identifies the group labels
fill_down <- function(x, pattern){
## find matches of the pattern
x <- as.character(x)
value_matches <- grepl(pattern = pattern, x = x)
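The preview above is cut off, so the rest of the author's implementation is not shown. As a rough sketch of the general technique only (the function name `fill_down_sketch` and the output column names are assumptions, not the original code), a label can be carried down to its members with `cumsum()` in base R:

```r
# Hypothetical sketch, not the original gist.
fill_down_sketch <- function(x, pattern) {
  x <- as.character(x)
  is_label <- grepl(pattern = pattern, x = x)
  # cumsum() numbers each element by how many labels have been seen so far
  group_id <- cumsum(is_label)
  labels <- x[is_label]
  # Elements that appear before the first label get NA as their group
  group <- c(NA_character_, labels)[group_id + 1]
  data.frame(group = group[!is_label],
             member = x[!is_label],
             stringsAsFactors = FALSE)
}

notes <- c("# Fruit", "apple", "pear", "# Veg", "leek")
filled <- fill_down_sketch(notes, pattern = "^# ")
```

Each non-label row ends up paired with the most recent label above it, which is the two-column shape the roxygen comment describes.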
@hadley
hadley / advise.md
Created February 13, 2015 21:32
Advice for teaching an R workshop

I think the two most important messages that people can get from a short course are:

a) the material is important and worthwhile to learn (even if it's challenging), and b) it's possible to learn it!

For those reasons, I usually start by diving as quickly as possible into visualisation. I think it's a bad idea to start by explicitly teaching programming concepts (like data structures), because the payoff isn't obvious. If you start with visualisation, the payoff is really obvious and people are more motivated to push past any initial teething problems. In stat405, I used to start with some very basic templates that got people up and running with scatterplots and histograms - they wouldn't necessarily understand the code, but they'd know which bits could be varied for different effects.
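A template of the kind described might look something like the following. This is only a guess at the style, assuming ggplot2 and the built-in `mtcars` data; the actual stat405 templates are not shown here, and the object names are made up.

```r
library(ggplot2)

# Scatterplot template: swap the data frame and the two variables in aes().
scatter_template <- ggplot(mtcars, aes(x = disp, y = mpg)) +
  geom_point()

# Histogram template: swap the variable and adjust the number of bins.
histogram_template <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 10)

# Printing a plot object draws it.
print(scatter_template)
print(histogram_template)
```

The parts a beginner would vary are the data frame, the variable names inside `aes()`, and the `bins` value.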

Apart from visualisation, I think the two most important topics to cover are tidy data (i.e. http://www.jstatsoft.org/v59/i10/ + tidyr) and data manipulation (dplyr). These are both important for when people go off and apply