@ramhiser
ramhiser / impute-naive.r
Created March 27, 2015 18:27
Naive imputation of missing data within an R data frame
#' Naive imputation of missing data
#'
#' Imputes missing data in a data frame one column at a time, i.e., univariately.
#' Missing numeric values are replaced with the median. Similarly, missing
#' factor values are replaced with the mode.
#'
#' If \code{draw} is set to \code{TRUE}, missing data are drawn from a basic
#' distribution to make the imputation slightly less naive. For continuous
#' columns, values are drawn from a uniform distribution ranging from the
#' minimum to the maximum value observed within the column. For categorical
#' columns, values are drawn from a
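The imputation rule described in this preview can be sketched in a few lines. This is a rough Python analogue under stated assumptions, not the gist's R code: `impute_naive` is a hypothetical name, `None` stands in for `NA`, and a column is a plain list. The description of the categorical `draw` case is cut off above, so the sketch does not guess at it and keeps the mode for non-numeric columns.

```python
import random
import statistics


def impute_naive(column, draw=False, rng=None):
    """Return a copy of `column` (a list) with None entries filled in.

    Hypothetical helper: None stands in for R's NA.
    """
    rng = rng or random.Random()
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # nothing observed, so nothing to impute from

    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    if is_numeric and draw:
        # Draw each missing value uniformly between the observed min and max.
        low, high = min(observed), max(observed)
        return [rng.uniform(low, high) if v is None else v for v in column]
    if is_numeric:
        fill = statistics.median(observed)
    else:
        # The categorical draw rule is truncated in the description above,
        # so this sketch always uses the mode for non-numeric columns.
        fill = statistics.mode(observed)
    return [fill if v is None else v for v in column]
```

Applying the function to each column in turn gives the univariate behaviour the description mentions, since no column's fill value depends on any other column.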
ramhiser / austin-thd-stores.html
Created April 21, 2015 18:08
Basic Leaflet Example - Home Depot Stores in Austin Area
<!DOCTYPE html>
<html>
<head>
<title>Leaflet Example -- Home Depot Stores</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="stylesheet" href="http://cdn.leafletjs.com/leaflet-0.7.3/leaflet.css" />
ramhiser / add-na-rows.r
Last active August 29, 2015 14:20
Add NA row for each group in data frame
# Useful for drawing polygons with leaflet
# Polygons are stored in a `tbl_df` object with a mandatory `NA` row between each
# polygon so that `leaflet` knows to stop drawing between each polygon.
# Rather than magic, I found a slick way to do this via `dplyr::arrange`
# See: (http://stackoverflow.com/a/25267681/234233).
# Example using the iris data set:
library(dplyr)  # provides tbl_df() and arrange()
df_na <- matrix(NA, nrow=nlevels(iris$Species), ncol=ncol(iris) - 1)
df_na <- tbl_df(as.data.frame(df_na))
colnames(df_na) <- setdiff(colnames(iris), "Species")
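The preview stops before the rows are combined, so the remaining steps are not shown here. A minimal Python sketch of the general idea, assuming the rest of the gist appends the `NA` rows to the data and then sorts by group: build one all-missing row per group that keeps only the group label, append those rows, and rely on a stable sort so each one lands at the end of its group. `add_na_rows` and the list-of-dicts layout are illustrative choices, not the gist's code.

```python
def add_na_rows(rows, group_key):
    """Return rows sorted by group with one all-None separator row per group.

    `rows` is a list of dicts sharing the same keys (hypothetical layout).
    """
    if not rows:
        return []
    # Remember groups in order of first appearance.
    groups = []
    for row in rows:
        if row[group_key] not in groups:
            groups.append(row[group_key])
    columns = list(rows[0].keys())
    # One separator row per group: everything missing except the group label.
    separators = [
        {col: (group if col == group_key else None) for col in columns}
        for group in groups
    ]
    # sorted() is stable, so the separators (appended last) end up last
    # within their group after sorting on the group label.
    return sorted(rows + separators, key=lambda row: groups.index(row[group_key]))
```

The stable sort is what replaces any explicit loop over groups: the separator rows are placed correctly just by being appended after the real rows.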
ramhiser / leaflet-county-explorer.r
Created May 4, 2015 18:25
Leaflet app in R to explore U.S. Census demographics by county
# TODO: Add a Shiny dropdown to select demographic variable
library(leaflet)
library(noncensus)
library(dplyr)
data("counties", package="noncensus")
data("county_polygons", package="noncensus")
data("quick_facts", package="noncensus")
counties <- counties %>%
ramhiser / exercise-4.7.r
Created July 4, 2015 03:45
Exercise 4.7 (p.160) from Box, Hunter and Hunter (2005) "Statistics for Experimenters"
library(dplyr)
library(BHH2)
df <- expand.grid(drivers=c('I', 'II', 'III', 'IV'),
                  cars=1:4)
df <- rbind(df, df) %>% arrange(drivers, cars)
df$treatment <- c(
  rep(c('A', 'B', 'D', 'C'), each=2),
  rep(c('D', 'C', 'A', 'B'), each=2),
  rep(c('B', 'D', 'C', 'A'), each=2),
ramhiser / boundieboxes.py
Created July 18, 2015 22:16
Boundieboxes from Pierce (OpenCV 3.0)
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatch
import matplotlib.cm as cm
import cv2
import csv
from sklearn import cluster
def find_points(gray_img, color_img, num_points):
ramhiser / thd.py
Created July 19, 2015 02:11
Simple EDA of daily Home Depot stock prices
%matplotlib inline
from yahoo_finance import Share
import matplotlib.pylab
import pandas as pd
import numpy as np
thd = Share('HD')
thd_prices = thd.get_historical('2010-01-01', '2015-06-01')
thd_prices = pd.DataFrame(thd_prices)
ramhiser / sample.js
Last active August 29, 2015 14:26
Weighted random sample from a vector in JavaScript
// Weighted random sample from a vector
//
// By default, the `weights` are set to 1. This equates to equal weighting.
// Loosely based on http://codereview.stackexchange.com/a/4265
//
// If any weight is `null`, revert to default weights (i.e., all 1).
//
// A random-number generator (RNG) seed is optionally set via seedrandom.js.
// NOTE: The JS file is loaded via jQuery.
// Details: https://github.com/davidbau/seedrandom
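The behaviour these comments describe (equal weights by default, a fallback to equal weights when any weight is `null`, and an optional seed) can be sketched as follows. This is a Python analogue rather than the gist's JavaScript: `weighted_sample` is a hypothetical name, `None` plays the role of `null`, and the standard-library `random.Random` stands in for seedrandom.js. The cumulative-sum lookup is one common way to do weighted sampling and is an assumption about the approach, not taken from the gist.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def weighted_sample(items, weights=None, seed=None):
    """Pick one element of `items` with probability proportional to its weight."""
    if not items:
        raise ValueError("items must not be empty")
    # Default to equal weighting, also when any single weight is missing.
    if weights is None or any(w is None for w in weights):
        weights = [1] * len(items)
    if len(weights) != len(items):
        raise ValueError("items and weights must have the same length")

    cumulative = list(accumulate(weights))
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    rng = random.Random(seed)  # seed=None gives a non-reproducible draw
    threshold = rng.random() * total
    # First position whose cumulative weight exceeds the threshold.
    index = bisect_right(cumulative, threshold)
    return items[min(index, len(items) - 1)]
```

Passing the same `seed` twice returns the same element, which is the reproducibility that seeding the generator is meant to provide.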
ramhiser / sphinx2vtt.py
Created September 10, 2015 02:49
Convert CMU Sphinx closed-captioning auto alignment to WebVTT format
#!/usr/bin/env python
import argparse
import sys
import time
from itertools import izip, count
def parse_sphinx_line(line):
    '''Parse a line from Sphinx's closed captioning alignment'''
    line_split = line.split()
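The preview ends before the output side of the conversion, and the Sphinx alignment format is not shown here, so the sketch below covers only the WebVTT half. It assumes the cues have already been parsed into `(start_seconds, end_seconds, text)` tuples, which is a hypothetical intermediate form, and it is written for Python 3 whereas the gist targets Python 2. `vtt_timestamp` and `to_webvtt` are illustrative names.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Render (start, end, text) tuples as the text of a WebVTT file."""
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for start, end, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

Working in integer milliseconds before splitting into hours, minutes, and seconds avoids rounding a value such as 59.9996 seconds up to an invalid `60.000` seconds field.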
ramhiser / RSQlite.r
Created June 9, 2011 23:48
Input data from a SQLite database with RSQLite
library('RSQLite')
# Establishes a connection to the specified SQLite database file.
db_filename <- "choose_filename.db3"
db_driver <- dbDriver("SQLite")
db_conn <- dbConnect(db_driver, dbname = db_filename)
# An equivalent alternative is to pass the driver constructor directly:
# db_conn <- dbConnect(SQLite(), dbname = db_filename)