Tom Hopper tomhopper

  • Michigan, United States
@tomhopper
tomhopper / align_common_baseline.R
Last active November 5, 2016 17:47
Examples of aligning against a common baseline, using Cleveland-style dot plots
# Response to a post at Storytelling with Data:
# \url{http://www.storytellingwithdata.com/blog/orytellingwithdata.com/2015/07/align-against-common-baseline.html}
# Demonstrates
# * Cleveland-style dot plots (improvement over pie and bar charts)
# * Sorting categorical data by a numerical variable with more than one grouping variable
# * Highlighting differences between groups graphically
library(ggplot2)
library(scales)
tomhopper / sort_factors.R
Created June 16, 2015 19:29
Several methods of sorting a factor in a data frame by a numeric variable so that it plots in ascending (or descending) order using ggplot2
#' @title Sorting data frames factor levels for ggplot2
#' @description Sorting a factor variable by a numeric variable.
#' In one case, each factor level is matched to one numeric value.
#' In the other case, each factor level is repeated across a second
#' grouping factor variable, and we want to sort only the
library(dplyr)
library(tidyr)
library(ggplot2)
# Sort a factor variable by a numeric variable
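The preview stops before any sorting code. As a minimal sketch of the simpler case (each factor level matched to one numeric value), base R's `reorder()` can reset the factor levels so that ggplot2 plots them in ascending order. The data frame `df` and its columns `category` and `value` are hypothetical names for illustration, not taken from the gist, and this is not necessarily the method the gist uses.

```r
# Hypothetical example data; df, category and value are assumed names.
df <- data.frame(
  category = factor(c("a", "b", "c", "d")),
  value    = c(3.2, 1.5, 4.8, 2.1)
)

# reorder() sets the factor levels according to the numeric variable,
# smallest value first, so plots show the levels in ascending order.
df$category <- reorder(df$category, df$value)
levels(df$category)

# Negating the numeric variable gives descending order instead.
df$category_desc <- reorder(df$category, -df$value)
levels(df$category_desc)
```

Only the level order changes; the rows of the data frame stay where they are, which is all ggplot2 needs for a discrete axis.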
tomhopper / .Rprofile
Last active June 13, 2019 19:08
Rprofile file
## For original file showing use of .env to add functions invisibly, see
## \link{http://gettinggeneticsdone.blogspot.com/2013/06/customize-rprofile.html}
## Load packages
#library(BiocInstaller)
## Don't show those silly significance stars
#options(show.signif.stars=FALSE)
## Do you want to automatically convert strings to factor variables in a data.frame?
# Create 2 replicates of 5 "words" generated from random characters,
# each "word" 5 - 15 characters long, with word length following a
# Poisson distribution (lambda = 3, shifted by 5 and capped at 15).
# One length is drawn per word; sample() needs a single size value.
rep(replicate(5, paste(sample(letters, min(rpois(1, lambda = 3) + 5, 15), replace = FALSE), collapse = "")), 2)
# Sample output:
# [1] "rfexnwyjst" "vwtadhjnly" "ztfgvldo" "tmerol" "mcqhosap" "rfexnwyjst" "vwtadhjnly" "ztfgvldo" "tmerol"
#[10] "mcqhosap"
tomhopper / ggplot2_xkcd_Humor_Sans.R
Created March 18, 2015 15:58
Use the font Humor Sans instead of font xkcd with theme_xkcd()
# The xkcd font used by the package xkcd (which provides a theme for ggplot2)
# is missing many characters and some characters don't seem to display correctly.
# An alternate xkcd-style font is Humor Sans, available free from
# \url{http://antiyawn.com/uploads/humorsans.html}
# The code below forces the use of Humor Sans instead of xkcd.
# The xkcd and ggplot2 packages are available from CRAN.
library(ggplot2)
library(xkcd)
tomhopper / find_and_delete.sh
Last active November 10, 2022 02:55
Use the Mac OS X terminal (UNIX command line) to find and delete all files matching a pattern
find . -name '.filename' -print -exec rm -r {} \;
# . = start the search in the current directory (subdirectories included)
# -name = file name (or quoted pattern) to match
# -print = print each match's path to standard output
# -exec = execute the following command on each match
# {} = replaced with the path of the current match
# \; = semicolon to terminate the -exec command; the backslash escapes
# it so that the shell doesn't treat the semicolon as a command
# separator (used for stringing together multiple commands).
tomhopper / rnorm.r
Last active August 18, 2023 03:25
Functions to create normally distributed data between two values minimum and maximum. One function pegs the minimum and maximum; the other uses a 99.7% tolerance interval.
#' @title Returns a normally distributed vector within the 99.7% tolerance interval defined by minimum and maximum
#' @param n (required) The number of random numbers to generate
#' @param minimum (optional) The lower 99.7% tolerance limit
#' @param maximum (optional) The upper 99.7% tolerance limit
#' @return numeric vector with n elements randomly distributed so that approximately 99.7% of values will fall between minimum and maximum
#' @examples
#' rnorm.within(10)
#' rnorm.within(10, 10, 20)
#' summary(rnorm.within(10000, 10, 20))
rnorm.within <- function(n, minimum=0, maximum=1)
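The preview ends at the function signature, so the body is not shown. As a minimal sketch of the 99.7% tolerance-interval idea described above: for a normal distribution, about 99.7% of values fall within three standard deviations of the mean, so the mean can be placed at the midpoint and the standard deviation set to one sixth of the range. The function name `rnorm_within_sketch` is hypothetical and this is not the gist's implementation.

```r
# Hypothetical helper illustrating the 99.7% approach; not the gist's code.
rnorm_within_sketch <- function(n, minimum = 0, maximum = 1) {
  stopifnot(maximum > minimum)
  centre <- (minimum + maximum) / 2   # midpoint of the interval
  spread <- (maximum - minimum) / 6   # interval spans mean +/- 3 sd
  rnorm(n, mean = centre, sd = spread)
}

set.seed(42)
x <- rnorm_within_sketch(10000, 10, 20)

# Proportion of values inside the limits; should be close to 0.997.
inside <- mean(x >= 10 & x <= 20)
inside
```

Values can still fall outside the limits about 0.3% of the time, which is what separates this variant from one that pegs the minimum and maximum.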
tomhopper / facet_labelling.R
Last active August 29, 2015 14:06
Custom labels for ggplot2 facets.
#' Data frame column names are rarely human-readable, concise and clear, but are usually meaningful. Rather
#' than trying to modify the data, we can provide custom labels for facets.
library(data.table)
library(lubridate)
library(reshape2)
library(ggplot2)
#' Download raw data from "Weather Data" at \link{http://datamonitoring.marec.gvsu.edu/DataDownload.aspx},
#' rename the file to "Marec_weather.csv" and save it to /data/ in the current working directory.
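The preview does not show how the labels are applied. One way to sketch the idea, assuming a ggplot2 release recent enough to provide `as_labeller()` (the gist may predate it and use a different mechanism), is to map raw column values to readable strings with a named character vector. The data frame `d`, its columns, and the label text below are made up for illustration and are not the MAREC weather data.

```r
library(ggplot2)

# Hypothetical long-format data; names and values are assumptions.
d <- data.frame(
  day     = rep(1:5, times = 2),
  station = rep(c("stn_a", "stn_b"), each = 5),
  temp_c  = c(12, 14, 13, 15, 16, 11, 12, 14, 13, 15)
)

# Named vector: names are the raw values in the data, values are the
# human-readable labels shown in the facet strips.
facet_labels <- c(stn_a = "Station A", stn_b = "Station B")

p <- ggplot(d, aes(x = day, y = temp_c)) +
  geom_line() +
  facet_wrap(~ station, labeller = as_labeller(facet_labels)) +
  labs(x = "Day", y = "Temperature (C)")

print(p)
```

The data itself is left untouched; only the strip text changes.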
tomhopper / plot_aligned_series.R
Last active June 25, 2023 17:36
Align multiple ggplot2 graphs with a common x axis and different y axes, each with different y-axis labels.
#' When plotting multiple data series that share a common x axis but different y axes,
#' we can just plot each graph separately. This suffers from the drawback that the shared axis will typically
#' not align across graphs due to different plot margins.
#' One easy solution is to reshape2::melt() the data and use ggplot2's facet_grid() mapping. However, there is
#' no way to label individual y axes.
#' facet_grid() and facet_wrap() were designed to plot small multiples, where both x- and y-axis ranges are
#' shared across all plots in the faceting. While the facet_ calls allow us to use different scales with
#' the \code{scales = "free"} argument, they should not be used this way.
#' A more robust approach is to use the grid package's grid.draw() with rbind() and ggplotGrob() to create a grid of
#' individual plots where the plot axes are properly aligned within the grid.
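The approach described above can be sketched roughly as follows. The data frame `d` and the plot names are hypothetical, and the choice of `size = "first"` (take the column widths from the first plot) is an assumption; the gist's own code is not visible in the preview.

```r
library(ggplot2)
library(grid)

# Hypothetical data: two series sharing an x axis but on different scales.
d <- data.frame(
  x        = 1:20,
  temp     = 20 + sin(1:20),
  pressure = 1000 + 5 * cos(1:20)
)

# One plot per series, each with its own y-axis label.
p_temp <- ggplot(d, aes(x, temp)) + geom_line() + ylab("Temperature")
p_pres <- ggplot(d, aes(x, pressure)) + geom_line() + ylab("Pressure")

# ggplotGrob() converts each plot to a gtable; rbind() stacks the gtables
# so the panels share the same column widths and the x axes line up.
g <- rbind(ggplotGrob(p_temp), ggplotGrob(p_pres), size = "first")

grid.newpage()
grid.draw(g)
```

Because each series keeps its own plot object, each y axis can carry its own label, which the facet-based approach does not allow.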
tomhopper / dt_merge_nodups.R
Last active February 10, 2017 02:58
Merge two data.tables and eliminate duplicated rows
library(data.table)
# See \link{http://stackoverflow.com/questions/11792527/filtering-out-duplicated-non-unique-rows-in-data-table}
# for a discussion of how to eliminate duplicate rows.
# The problem is that the \code{unique()} function will use a key, if it exists. We need to
# eliminate the key.
# Create one column of data
temp1 <- data.table(sample(letters,size = 15, replace = FALSE))
temp2 <- data.table(sample(letters,size = 15, replace = FALSE))
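The preview ends before the merge itself. A minimal sketch of one way to combine the two tables and drop duplicated rows, assuming "merge" here means stacking the rows (the gist may use `merge()` instead): `rbindlist()` stacks the tables, `setkey(x, NULL)` removes any key as the comments above advise, and `unique()` then keeps one copy of each row. The column name `V1` and the object names `combined` and `merged` are illustrative.

```r
library(data.table)

set.seed(1)
# Recreate two single-column tables of random letters (as in the gist).
temp1 <- data.table(V1 = sample(letters, size = 15, replace = FALSE))
temp2 <- data.table(V1 = sample(letters, size = 15, replace = FALSE))

# Stack the two tables; letters present in both now appear twice.
combined <- rbindlist(list(temp1, temp2))

# Remove any key so that unique() compares whole rows.
setkey(combined, NULL)

# Keep one copy of each distinct row.
merged <- unique(combined)
nrow(merged)
```

In recent data.table releases `unique()` compares all columns by default, so the `setkey()` step is a precaution rather than a requirement.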