Matt Parker mmparker

mmparker / data_indeed_searches.md
Last active August 29, 2015 14:16
Data Analyst/Scientist/Etc. Indeed Searches

No SAS, no ads, and no post-docs, please

("statistical programmer" or "statistical programming") -SAS
data scientist -marketing -advertising
applied data (analysis or analyst) -SAS -postdoc -postdoctoral -"post-doctoral"

mmparker / transform_example.r
Created February 17, 2015 17:33
Quick illustration of variable transformations
# This is a quick script to illustrate how to go about transforming variables
# for statistical analysis and the effects of some basic transformations.
# I'm by no means an expert on transformations, so be sure to read up on
# how to best apply the transformations!
# These two packages are for demonstrating the transformations -
# not necessary for the transformations themselves.
library(reshape2)
library(ggplot2)
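As a rough illustration of what such transformations do, the sketch below applies a log and a square-root transformation to a small, made-up right-skewed vector. The data and the names `skewed`, `logged`, `rooted`, and `gap` are assumptions for this example, not part of the original script, and only base R is used.

```r
# Hypothetical right-skewed data: each value doubles the previous one
skewed <- c(1, 2, 4, 8, 16, 32, 64, 128)

# A log transformation pulls in the long right tail; on doubling data,
# log2 yields evenly spaced values
logged <- log2(skewed)

# A square-root transformation is milder: it compresses large values
# less aggressively than the log does
rooted <- sqrt(skewed)

# Crude skewness check: the gap between mean and median shrinks as the
# distribution becomes more symmetric
gap <- function(v) mean(v) - median(v)
c(raw = gap(skewed), sqrt = gap(rooted), log = gap(logged))
```

Which transformation is appropriate depends on the data and the model, so treat this as a demonstration of the mechanics only.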
get_random_subset <- function(df, id, n_ids = 1, select = TRUE) {
    # Pick n_ids distinct IDs at random to review
    random_id <- sample(unique(df[ , id]), size = n_ids)
    # Return every row belonging to the sampled IDs
    df[df[ , id] %in% random_id, select]
}
mmparker / equals_vs_in.r
Created October 20, 2014 23:55
== vs. %in%
# Broadly speaking, it's safer to use %in% instead of == when building a
# logical vector for indexing in R. Comparing NA with == yields NA, and
# indexing with NA returns an NA element - which is probably not the
# intuitive behavior. Here's what I mean:
x <- c("a", "b", NA, "c")
# Indexing with ==
x[x == "a"]
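A minimal sketch of the difference, reusing the same `x`. The `%in%` lines and the output comments are additions for illustration and may not match what the rest of the gist shows.

```r
x <- c("a", "b", NA, "c")

# == propagates the NA into the logical vector...
x == "a"       # TRUE FALSE NA FALSE

# ...so indexing with it returns an extra NA element
x[x == "a"]    # "a" NA

# %in% never returns NA, so only the true match comes back
x %in% "a"     # TRUE FALSE FALSE FALSE
x[x %in% "a"]  # "a"
```

If the NA rows are genuinely wanted, it is probably clearer to ask for them explicitly with `is.na(x)` than to rely on `==` leaking them through.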
mmparker / calc_percent.r
Created October 15, 2014 16:31
Percentage calculation snippet
# Calculate percent of responses by department
# The input here is a subset of kano_questions - all of the
# responses related to one item
library(plyr) # Provides ddply()
x_by_dept <- ddply(subset(x, !is.na(department)), # Drop the one respondent with no department
                   .variables = "department",     # Split x by department
                   .fun = function(y) {           # Anonymous function applied
                                                  # to each department's chunk of x
# Dummy list
z <- list(list('a' = 1, 'b' = 2, 'c' = 3), list('a' = 4, 'b' = 5, 'c' = 6))
# If you just want to stack all of the values, not keeping any of the list name data
# but ensuring they're all of one type
data.frame(values = as.numeric(unlist(z)))
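If the list names should be kept as columns instead, one possible approach is to convert each inner list to a one-row data.frame and stack the rows. This is a sketch on the same dummy list; the name `stacked` is an assumption for the example.

```r
# Same dummy list as above
z <- list(list('a' = 1, 'b' = 2, 'c' = 3), list('a' = 4, 'b' = 5, 'c' = 6))

# Turn each inner list into a one-row data.frame, then stack the rows
# so that the names a, b, c become columns
stacked <- do.call(rbind, lapply(z, as.data.frame))
stacked
```

This assumes every inner list has the same names; `rbind` will fail if the columns don't match.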
mmparker / date_diffs.r
Created May 22, 2014 21:48
Successive pairwise date diffs
library(zoo)
# Some sequence of Dates
x <- as.Date("2014-01-01") + c(0, 1, 2, 5, 10)
data.frame(x,
           basediff = c(NA, diff(x))) # Days since the previous date; NA for the first row
mmparker / applytorows.r
Created April 18, 2014 00:03
Ways to use a non-vectorized function on every row of a data.frame
# Sample data
X <- data.frame(
x = c(1, 2, 3),
y = c(4, 5, 6),
etc = c("a", "b", "c")
)
# Arbitrary stand-in for a function that can't be vectorized (pretend pmax() doesn't exist)
max.fun <- function(a, b) { max(c(a, b)) }
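Here is a sketch of three base-R ways to call such a function once per row, repeating the sample data so the block runs on its own. The new column names (`max_mapply`, `max_vapply`, `max_apply`) are assumptions for illustration, and these may not be the approaches the full gist goes on to show.

```r
X <- data.frame(
  x = c(1, 2, 3),
  y = c(4, 5, 6),
  etc = c("a", "b", "c")
)
max.fun <- function(a, b) { max(c(a, b)) }

# Option 1: mapply() walks the two columns in parallel,
# calling max.fun once per row
X$max_mapply <- mapply(max.fun, X$x, X$y)

# Option 2: loop over row indices with vapply(), which also checks
# that each call returns a single numeric value
X$max_vapply <- vapply(seq_len(nrow(X)),
                       function(i) max.fun(X$x[i], X$y[i]),
                       numeric(1))

# Option 3: apply() over rows, restricted to the numeric columns so the
# character column doesn't coerce every value to a string
X$max_apply <- apply(X[ , c("x", "y")], 1,
                     function(r) max.fun(r["x"], r["y"]))

X
```

`mapply()` is usually the least fussy of the three when the function takes its arguments column by column; `apply()` needs care because it converts the data.frame to a matrix first.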
mmparker / printable_table.css
Created February 20, 2014 19:20
CSS for printing an HTML table, one row per page (it's a tall row)
@media print {
tr{
page-break-after: always;
display: block;
}
}
mmparker / calc_qtr_end.r
Created January 30, 2014 16:50
Calculate the last day of a quarter, given year and quarter
# Calculate the date of the last day of a given quarter by pasting
# together the first day of the quarter's final month, adding one month,
# and subtracting a day.
# Very elegance
# Such vectorized
calc_qtr_end <- function(year, qtr) {
require(lubridate) # Easiest way to add a month to a date
    (as.Date(paste(year, qtr * 3, "01", sep = "-")) %m+% months(1)) - 1
}
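For a quick cross-check without lubridate, the same idea can be sketched in base R: build the first day of the month after the quarter ends, then step back one day. The function name `calc_qtr_end_base` is hypothetical and this is a swapped-in variant, not the gist's own method.

```r
# Hypothetical base-R variant of the quarter-end calculation
calc_qtr_end_base <- function(year, qtr) {
  next_month <- qtr * 3 + 1                  # 4, 7, 10, or 13
  next_year  <- year + (next_month > 12)     # Q4 rolls into the next year
  next_month <- ifelse(next_month > 12, 1, next_month)
  # First day of the following month, minus one day
  as.Date(sprintf("%d-%02d-01", next_year, next_month)) - 1
}

# Vectorized over qtr, like the original
calc_qtr_end_base(2014, 1:4)
# "2014-03-31" "2014-06-30" "2014-09-30" "2014-12-31"
```

The rollover handling is the part that lubridate's `%m+%` takes care of in the original, which is why that version is shorter.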