Tom Hopper tomhopper

  • Michigan, United States
@tomhopper
tomhopper / model_fit_stats.R
Created June 19, 2017 23:39
Accepts one or more lm objects and returns a single data frame containing the fit statistics R-squared, adjusted R-squared, predictive R-squared and PRESS for each model.
#' Model Fit Statistics
#' @description Returns lm model fit statistics R-squared, adjusted R-squared,
#' predictive R-squared and PRESS.
#' Thanks to John Mount for his 6-June-2014 blog post, "R style tip: prefer functions that return data frames," for
#' the idea \url{http://www.win-vector.com/blog/2014/06/r-style-tip-prefer-functions-that-return-data-frames}
#' @param ... One or more \code{lm()} models.
#' @return A data frame with rows for the R-squared, adjusted R-squared, predictive R-squared and PRESS statistics, and a column for each model passed to the function.
model_fit_stats <- function(...) {
  var_names <- as.character(match.call())[-1]
  dots <- list(...)
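
The preview above is cut off before the statistics are computed. As a minimal sketch only (not the gist's actual implementation), PRESS and predictive R-squared can be obtained from the leave-one-out residuals of an `lm` fit, which equal the ordinary residuals divided by one minus the hat values. The helper names `press_stat` and `pred_r_squared`, and the use of `mtcars`, are assumptions made for illustration.

```r
# Hypothetical helpers; names are illustrative and not taken from the gist.

# PRESS: sum of squared leave-one-out prediction errors.
# For a linear model these are residual_i / (1 - h_ii).
press_stat <- function(model) {
  loo_resid <- residuals(model) / (1 - hatvalues(model))
  sum(loo_resid^2)
}

# Predictive R-squared: 1 - PRESS / total sum of squares.
pred_r_squared <- function(model) {
  y <- model.response(model.frame(model))
  tss <- sum((y - mean(y))^2)
  1 - press_stat(model) / tss
}

# Two example models on a built-in data set.
models <- list(
  wt_only = lm(mpg ~ wt, data = mtcars),
  wt_hp   = lm(mpg ~ wt + hp, data = mtcars)
)

# One row per statistic, one column per model.
stats_df <- as.data.frame(sapply(models, function(m) {
  c(r.squared      = summary(m)$r.squared,
    adj.r.squared  = summary(m)$adj.r.squared,
    pred.r.squared = pred_r_squared(m),
    PRESS          = press_stat(m))
}))
print(stats_df)
```

Returning a data frame keeps the statistics for several models side by side, which is the layout described in the `@return` tag above.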
@tomhopper
tomhopper / dplyr_filter_ungroup.R
Created January 29, 2016 20:31 — forked from jhofman/dplyr_filter_ungroup.R
careful when filtering with many groups in dplyr
library(dplyr)
# create a dummy dataframe with 100,000 groups and 1,000,000 rows
# and partition by group_id
df <- data.frame(group_id = sample(1:1e5, 1e6, replace = TRUE),
                 val = sample(1:100, 1e6, replace = TRUE)) %>%
  group_by(group_id)
# filter rows with a value of 1 naively
system.time(df %>% filter(val == 1))
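
The preview stops at the naive filter. The file name suggests the comparison is against ungrouping first, so the following is a sketch under that assumption: drop the grouping, filter, then restore the grouping if later steps need it. The smaller data size and the names `df_small`, `grouped_result` and `ungrouped_result` are illustrative. How large the timing gap is, if any, depends on the installed dplyr version.

```r
library(dplyr)

set.seed(1)

# Smaller dummy data (10,000 groups, 100,000 rows) so the sketch runs quickly.
df_small <- data.frame(group_id = sample(1:1e4, 1e5, replace = TRUE),
                       val = sample(1:100, 1e5, replace = TRUE)) %>%
  group_by(group_id)

# Naive: filter while the data frame is still grouped.
grouped_time <- system.time(
  grouped_result <- df_small %>% filter(val == 1)
)

# Alternative: ungroup, filter, then regroup.
ungrouped_time <- system.time(
  ungrouped_result <- df_small %>%
    ungroup() %>%
    filter(val == 1) %>%
    group_by(group_id)
)

print(rbind(grouped = grouped_time, ungrouped = ungrouped_time))
```

Because `val == 1` does not depend on group membership, both pipelines keep the same rows; only the work done per group differs.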
@tomhopper
tomhopper / Child_Height_Weight_Stats.R
Created October 29, 2017 12:07
Growth chart summary statistics for Hong Kong children, ages 6 to 18, for 1963, 1993, 2005/6
## Download growth chart summary statistics for Hong Kong children, ages 6 to 18, for 1963, 1993, 2005/6
## Data from
## So, Hung-Kwan et al. “Secular Changes in Height, Weight and Body Mass Index in Hong Kong Children.” BMC Public Health 8 (2008): 320. PMC. Web. 29 Oct. 2017.
## Article at \url{https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2572616/}
## PMC Copyright and reuse terms: \url{https://www.ncbi.nlm.nih.gov/pmc/about/copyright/}
## Heights in cm
## Weights in kg
## Libraries ####
library(rvest)
@tomhopper
tomhopper / SOCR_Data_25000_Human_Height_Weight.R
Created October 29, 2017 21:45
SOCR Data - 25,000 Records of Human Heights (in) and Weights (lbs)
## Height and Weight of 18 year olds
## from Hong Kong 1993 Growth Survey data,
## simulated by SOCR from reported summary statistics
## Heights in inches
## Weights in pounds
## Explanation \url{http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights}
## Data \url{http://socr.ucla.edu/docs/resources/SOCR_Data/SOCR_Data_Dinov_020108_HeightsWeights.html}
## Libraries ####
library(rvest) # Web scraping
@tomhopper
tomhopper / Top_Charities_Hurricane_X.R
Created October 29, 2017 21:51
Web scrape and display top charities for hurricane relief.
## Top Charities for Hurricane Harvey Relief
## According to both Charity Navigator and Charity Watch
## Approach:
## Scrape data from Charity Navigator and Charity Watch.
## Merge and display the intersection (common entries) of
## the two data sets.
## ** BROKEN ** As of 2017-10-29, Charity Navigator has changed their page
## and the organization of the table of charities.
## Libraries ####
@tomhopper
tomhopper / data_from_packages.R
Created October 30, 2017 11:56
Programmatically access data provided by packages in R
## Access data provided by packages
## List all available data sets, plus ancillary information ####
the_data <- data()
## Extract the table of data sets (package, item, title) as a data frame ####
the_data_df <- data.frame(the_data$results, stringsAsFactors = FALSE)
## Extract a specific data set by name (here, the first one listed) ####
specific_data <- get(the_data_df$Item[1])
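
One caveat with `get()` on the `Item` column: some entries take the form `name (file)`, and data sets from packages that are not attached are not visible to `get()`. A minimal sketch of one way to handle both cases is shown below, assuming the base `datasets` package; the helper `load_dataset` and the column names `object` and `topic` are hypothetical names introduced here.

```r
# List the data sets of a single package (here the base "datasets" package).
listing <- data(package = "datasets")
listing_df <- data.frame(listing$results, stringsAsFactors = FALSE)

# Items may look like "BJsales.lead (BJsales)": object name, then the
# file (topic) it is stored under. Split the two parts.
has_topic <- grepl("\\(", listing_df$Item)
listing_df$object <- sub(" \\(.*\\)$", "", listing_df$Item)
listing_df$topic <- ifelse(has_topic,
                           sub("^.*\\((.*)\\)$", "\\1", listing_df$Item),
                           listing_df$object)

# Hypothetical helper: load one data set into a private environment so the
# global workspace is left untouched, then return the object.
load_dataset <- function(object, topic = object, package) {
  env <- new.env()
  data(list = topic, package = package, envir = env)
  get(object, envir = env)
}

cars_df <- load_dataset("cars", package = "datasets")
str(cars_df)
```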
@tomhopper
tomhopper / central_limit_theorem.R
Created October 30, 2017 14:54
Demonstration of central limit theorem
## Demonstration of central limit theorem
## Based on code in an anonymous comment to the blog post at \url{https://sas-and-r.blogspot.com/2012/01/example-919-demonstrating-central-limit.html}
## Libraries ####
library(nortest)
library(dplyr)
library(ggplot2)
## Data used ####
# right-triangle distribution (peak at 0; minimum at 1)
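
The preview ends before the data are generated. As an independent sketch (not the gist's code), a right-triangle distribution on [0, 1] with density 2(1 - x) can be sampled by inverse-transform sampling, and the means of repeated samples compared with the normal approximation the central limit theorem predicts. The function name `r_right_triangle` and the sample sizes are assumptions chosen for illustration.

```r
set.seed(42)

# Inverse-transform sampler for the density f(x) = 2 * (1 - x) on [0, 1]:
# the CDF is 1 - (1 - x)^2, so x = 1 - sqrt(u) for u ~ Uniform(0, 1).
r_right_triangle <- function(n) 1 - sqrt(runif(n))

n_per_sample <- 30    # observations averaged in each sample
n_samples <- 5000     # number of sample means to collect

sample_means <- replicate(n_samples, mean(r_right_triangle(n_per_sample)))

# The distribution has mean 1/3 and variance 1/18, so the sample means
# should be roughly normal with these parameters.
theory_mean <- 1 / 3
theory_sd <- sqrt((1 / 18) / n_per_sample)

print(c(observed_mean = mean(sample_means), theory_mean = theory_mean))
print(c(observed_sd = sd(sample_means), theory_sd = theory_sd))
```

Plotting `hist(sample_means)` against `hist(r_right_triangle(5000))` shows the contrast between the skewed parent distribution and the roughly symmetric distribution of its sample means.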
@tomhopper
tomhopper / getCRANPackages.R
Created February 21, 2018 22:27
Returns a data frame of selected information on all packages on CRAN
#' @description Returns a data frame of selected information on all packages on CRAN
#' @param columns_list A character vector of field names to return from package DESCRIPTION files
#' @return A data frame containing all packages on CRAN
#' @details Function modified from StackOverflow answer at \url{https://stackoverflow.com/a/11561793}.
#' @importFrom magrittr %>%
#' @importFrom tibble as.tibble
#' @importFrom dplyr select_
getCRANPackages <- function(columns_list = c("Package", "Title", "Version", "Date", "Published", "URL")) {
contrib.url(getOption("repos")["CRAN"], "source")
description <- sprintf("%s/web/packages/packages.rds",
@tomhopper
tomhopper / captioning_rmarkdown.RMD
Last active April 29, 2018 13:16
Example of creating captions and automatic numbering in an RMarkdown document.
The _captioner_ library makes this work.
```{r libraries}
library(captioner)
```
Before captioning any tables, we have to set up the table captions and the numbering using `captioner::captioner()`. _captioner_ numbers tables in the order they appear in this code block, so _tab_curve_ will be table 1, and _tab_comp_ will be table 2, wherever they appear in the document.
```{r setup_captions}
table_nums <- captioner(prefix = "Table")
@tomhopper
tomhopper / rmarkdown_num_equations.rmd
Last active April 29, 2018 21:46
Demonstration of creating automatically-numbered equations in RMarkdown documents.
At the top of your R Markdown document, beneath the YAML title block, add the following `<script>` section:
---
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
TeX: {
equationNumbers: {
autoNumber: "all",
formatNumber: function (n) {return n}