Aaron Mayzes Roon

#' Parse a codebook file with variable and level information.
#'
#' Parses a codebook file where lines starting at column zero (far left) represent
#' variable information (e.g. name, description, type) and indented lines
#' (i.e. lines beginning with white space, either tabs or spaces, etc.) represent factor
#' levels and labels.
#'
#' Note that white space at the beginning and end of each line is stripped before
#' processing that line.
#'
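The grouping rule described in this preview can be sketched in Python, the language of the other previews on this page. This is only an illustration under stated assumptions, not the gist's R implementation: the function name `parse_codebook`, the sample text, and the choice to skip blank lines are all hypothetical, and the fields within each line are left unsplit because the preview does not say how they are delimited.

```python
def parse_codebook(text):
    """Group codebook lines into variables and their indented level lines."""
    variables = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # assumption: blank lines carry no information
        entry = raw.strip()  # leading and trailing white space is stripped
        if raw[0].isspace():
            # Indented line (tab or space): a level of the latest variable.
            if not variables:
                raise ValueError("level line appears before any variable line")
            variables[-1]["levels"].append(entry)
        else:
            # Line starting at column zero: a new variable.
            variables.append({"variable": entry, "levels": []})
    return variables


# Hypothetical sample: two variables, the first with two levels.
sample = (
    "gender\tRespondent gender\tfactor\n"
    "\t1\tFemale\n"
    "    2\tMale\n"
    "age\tAge in years\tinteger\n"
)
codebook = parse_codebook(sample)
```

Each returned entry keeps the stripped variable line and the stripped level lines that followed it, so a later step can split the fields once the real delimiter is known.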
@Roon
Roon / package.R
Created September 28, 2015 15:19 — forked from jbryer/package.R
#' Simplified loading and installing of packages
#'
#' This is a wrapper to \code{\link{require}} and \code{\link{install.packages}}.
#' Specifically, this will first try to load the package(s) and if not found
#' it will install then load the packages. Additionally, if the
#' \code{update=TRUE} parameter is specified it will check the currently
#' installed package version with what is available on CRAN (or mirror) and
#' install the newer version.
#'
#' @param pkgs a character vector with the names of the packages to load.
@Roon
Roon / advise.md
Created October 3, 2015 03:33 — forked from hadley/advise.md
Advice for teaching an R workshop

I think the two most important messages that people can get from a short course are:

a) the material is important and worthwhile to learn (even if it's challenging), and b) it's possible to learn it!

For those reasons, I usually start by diving as quickly as possible into visualisation. I think it's a bad idea to start by explicitly teaching programming concepts (like data structures), because the payoff isn't obvious. If you start with visualisation, the payoff is really obvious and people are more motivated to push past any initial teething problems. In stat405, I used to start with some very basic templates that got people up and running with scatterplots and histograms - they wouldn't necessarily understand the code, but they'd know which bits could be varied for different effects.

Apart from visualisation, I think the two most important topics to cover are tidy data (i.e. http://www.jstatsoft.org/v59/i10/ + tidyr) and data manipulation (dplyr). These are both important for when people go off and apply

@Roon
Roon / README.md
Created December 9, 2015 18:50 — forked from sebkopf/README.md
wolfram alpha from R

Wolfram Alpha API from R

The attached code file provides a basic, easy-to-use interface to the Wolfram Alpha API. It is inspired by the wolframalpha module available for Python.

source("wa_lib.R")

Initialize client

@Roon
Roon / Unattended Encrypted Incremental Backup to Amazon S3.md
Created August 2, 2016 16:32 — forked from janikvonrotz/Unattended Encrypted Incremental Backup to Amazon S3.md
Ubuntu: Unattended Encrypted Incremental Backup to Amazon S3 #AmazonAWS #Markdown

Introduction

For this task we are going to configure a duplicity script wrapper. Regardless of the installation instructions, you are expected to have already signed up for an Amazon account and to know how to use its services.

Requirements

  • Ubuntu server
  • duplicity, Git, GnuPG
  • MySQL
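import csv, io, urllib.request  # Python 3 replacement for urllib2
import matplotlib.pyplot as plt
import datetime
import seaborn
import numpy, scipy.stats, math
f = urllib.request.urlopen('https://raw.githubusercontent.com/datasets/s-and-p-500/master/data/data.csv')
reader = csv.reader(io.TextIOWrapper(f, encoding='utf-8'))  # decode bytes; avoid shadowing the csv module
next(reader)  # skip the header row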
@Roon
Roon / pep-440-semver.md
Created September 14, 2018 17:43 — forked from colinvh/pep-440-semver.md
440 Semantic

PEP-440-Compatible Semantic Versioning

This document attempts to refine Python's PEP 440 to include the principles of Semantic Versioning.

Specification

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

@Roon
Roon / graphql_example.py
Created November 5, 2018 18:24 — forked from gbaman/graphql_example.py
An example of using the GitHub GraphQL API with Python 3
# An example to get the remaining rate limit using the Github GraphQL API.
import requests

headers = {"Authorization": "Bearer YOUR API KEY"}

def run_query(query):  # A simple function to use requests.post to make the API call. Note the json= section.
    request = requests.post('https://api.github.com/graphql', json={'query': query}, headers=headers)
    if request.status_code == 200:
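The preview above is cut off before the query itself. As a rough, self-contained sketch of the same idea, the block below builds the POST request with the standard library's `urllib.request` instead of `requests`, so the request can be inspected without network access. The helper name `build_request`, the constant names, and the exact fields selected from `rateLimit` are assumptions made for illustration, not the gist's code.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Assumed query: ask only for the rate-limit status of the token's owner.
RATE_LIMIT_QUERY = "{ rateLimit { limit remaining resetAt } }"


def build_request(query, token):
    """Build the POST request; GraphQL expects a JSON body with a 'query' key."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(query, token):
    """Send the query and return the decoded JSON response.

    urlopen raises urllib.error.HTTPError for non-2xx responses,
    so no explicit status check is needed here.
    """
    with urllib.request.urlopen(build_request(query, token)) as response:
        return json.load(response)


# Usage (needs a real token and network access):
# result = run_query(RATE_LIMIT_QUERY, "YOUR API KEY")
# print(result["data"]["rateLimit"]["remaining"])
```

Separating request construction from sending keeps the part that needs a token and a network connection small.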
@Roon
Roon / Writing Tools Writeup.markdown
Created January 8, 2019 16:14 — forked from mojavelinux/Writing Tools Writeup.markdown
How To Write A Technical Book (One Man's Modest Suggestions)
# This function takes a vector x and returns a factor representation of the same vector.
# The key advantage of factorize is that you can assign levels for infrequent categories,
# as well as empty and NA values. This makes it much easier to perform
# multidimensional/thematic analysis on your largest population subsets.
factorize <- function(
  x,                        # vector to be transformed
  min_freq = .01,           # all levels < this % of records will be bucketed
  min_n = 1,                # all levels < this # of records will be bucketed
  NA_level = '(missing)',   # level created for NA values
  blank_level = '(blank)',  # level created for "" values
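The R preview stops partway through the argument list, so its behaviour beyond the comments is not visible. The bucketing idea those comments describe can be sketched in Python as follows. Everything not shown above is an assumption: the `other_level` name and its default, the use of `None` to stand in for NA, returning a plain list of labels rather than a factor, and the decision to always keep the missing and blank levels regardless of the thresholds.

```python
from collections import Counter


def factorize(values, min_freq=0.01, min_n=1,
              na_level="(missing)", blank_level="(blank)",
              other_level="(other)"):
    """Relabel values, bucketing infrequent levels into `other_level`."""
    # Give NA (None here) and empty strings their own explicit levels.
    labelled = [
        na_level if v is None else blank_level if v == "" else str(v)
        for v in values
    ]
    counts = Counter(labelled)
    total = len(labelled)
    # Keep a level only if it meets both the count and the share threshold.
    keep = {
        level for level, n in counts.items()
        if n >= min_n and n / total >= min_freq
    }
    # Assumption: the special levels are never bucketed.
    keep.update({na_level, blank_level})
    return [v if v in keep else other_level for v in labelled]


values = ["a"] * 6 + ["b"] * 3 + ["c", "", None]
result = factorize(values, min_freq=0.1, min_n=2)
```

In this example `"c"` appears once in twelve records, below both thresholds, so it is relabelled as `"(other)"`, while the empty string and `None` become `"(blank)"` and `"(missing)"`.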