boB Rudis hrbrmstr

@hrbrmstr
hrbrmstr / logistic_regression.R
Created April 17, 2016 16:59 — forked from mick001/logistic_regression.R
Logistic regression tutorial code. Full article available at http://datascienceplus.com/perform-logistic-regression-in-r/
# Load the raw training data and replace missing values with NA
training.data.raw <- read.csv('train.csv',header=T,na.strings=c(""))
# Output the number of missing values for each column
sapply(training.data.raw,function(x) sum(is.na(x)))
# Quick check for how many different values for each feature
sapply(training.data.raw, function(x) length(unique(x)))
# A visual way to check for missing data
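The snippet is truncated right at the visual check. A minimal sketch of that step, assuming the Amelia package's missmap() (which the linked tutorial uses) is the intended tool:

# Hedged sketch: plot missing vs. observed values with Amelia::missmap()
library(Amelia)
missmap(training.data.raw, main = "Missing values vs observed")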
@hrbrmstr
hrbrmstr / benchread.r
Created February 23, 2016 12:20
benchmarking R line readers
library(stringi)
library(microbenchmark)
library(ggplot2)
library(readr)
ex <- "example.txt"
mb <- microbenchmark(readLines = readLines(ex),
                     read_lines = read_lines(ex),
                     stri_read_lines = stri_read_lines(ex))
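ggplot2 is loaded but the plotting step is cut off here; a minimal sketch of the usual way to look at the result (an assumption, not the gist's own code):

# Hedged sketch: microbenchmark results print as a summary table and can be
# plotted via the package's ggplot2-based autoplot() method
print(mb)
autoplot(mb)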
library('httr')
library('jsonlite')
get_measures <- function(deviceid) {
  .h <- list(
    "Origin" = "http://anasim.iet.unipi.it",
    "Accept-Encoding" = "gzip, deflate",
    "Accept-Language" = "it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4",
    "User-Agent" = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36",

Livecoding D3 in RStudio

This is a proof-of-concept of how one can use RStudio to livecode D3 visualizations.

Usage

You will need to install a couple of packages before getting started:

devtools::install_github("yihui/servr")
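The gist is truncated here; as a hedged sketch of the general idea (the directory name and port are made up), servr can serve the directory holding the D3 page so edits made in RStudio show up on refresh:

# Hedged sketch: serve a local directory of HTML/JS (the D3 page) --
# the directory name and port are illustrative only
library(servr)
servr::httd("d3-livecode", port = 4321)   # browse http://localhost:4321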
library(SmarterPoland)
library(riverplot)
library(RColorBrewer)
library(graphics)
library(reshape2)
library(plyr)
library(stringr)
library(countrycode)
# DOWNLOAD THE DATA
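The data-download step is cut off above. Purely as a hedged, self-contained illustration of the riverplot API the gist builds toward (the nodes, edges, and values are invented toy data):

# Hedged sketch: minimal riverplot (Sankey-style) diagram with invented toy data
nodes <- data.frame(ID = c("A", "B", "C"),
                    x  = c(1, 2, 2),
                    stringsAsFactors = FALSE)
edges <- data.frame(N1 = c("A", "A"),
                    N2 = c("B", "C"),
                    Value = c(10, 5),
                    stringsAsFactors = FALSE)
plot(makeRiver(nodes, edges))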
@hrbrmstr
hrbrmstr / experiments-spark.R
Created October 26, 2015 11:00
Sample code for working with Apache Spark (v1.4), SparkR and Parquet files from RStudio
# see github repos & package documentation
# - http://github.com/apache/spark/tree/master/R
# - http://spark.apache.org/docs/latest/api/R/
# install the SparkR package
devtools::install_github("apache/spark", ref="master", subdir="R/pkg")
# load the SparkR & ggplot2 packages
library('SparkR')
library('ggplot2')
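The snippet stops after the library() calls; a hedged sketch of the SparkR 1.4-era workflow the title describes (the master setting, Parquet path, and column name are placeholders):

# Hedged sketch of the SparkR 1.4 API -- paths and names are placeholders
sc <- sparkR.init(master = "local[*]", appName = "parquet-demo")
sqlContext <- sparkRSQL.init(sc)

df <- read.df(sqlContext, "measures.parquet", source = "parquet")  # placeholder file
printSchema(df)

local_df <- collect(df)   # bring the (small) result back as a base R data.frame
ggplot(local_df, aes(x = value)) + geom_histogram()                # placeholder column

sparkR.stop()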
Data sources:
CDC data on vaccine coverage: http://www.cdc.gov/flu/professionals/vaccination/reporti1112/reporti/index.htm
CDC data on current influenza situation: http://gis.cdc.gov/grasp/fluview/fluportaldashboard.html
Historical flu data: http://www.cdc.gov/flu/weekly/pastreports.htm
Locations of hospitals in Boston: https://data.cityofboston.gov/Public-Health/Hospital-Locations/46f7-2snz
NetLogo, a simple agent based model: http://ccl.northwestern.edu/netlogo/
Nemsis, emergency medical services API (haven't used this one myself): http://www.nemsis.org/v3/downloads/v3Archive.html
Written for R, but useful for epi basics in any language: cran.r-project.org/doc/contrib/Epicalc_Book.pdf
"Provides access to health statistics and information on hospital inpatient and emergency department utilization": http://hcupnet.ahrq.gov/
Healthcare Research and Quality data directory: http://www.ahrq.gov/data/dataresources.htm
doInstall <- TRUE
toInstall <- c("XML", "maps", "ggplot2", "sp")
# install (if requested) and load the required packages
if (doInstall) install.packages(toInstall, repos = "http://cran.us.r-project.org")
lapply(toInstall, library, character.only = TRUE)
myURL <- "http://en.wikipedia.org/wiki/United_States_presidential_election,_2012"
allTables <- readHTMLTable(myURL)
str(allTables) # Look at the allTables object to find the specific table we want
stateTable <- allTables[[14]] # We want the 14th table in the list (maybe 13th?)
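Since the index is uncertain ("maybe 13th?"), a quick hedged check of the candidate tables' shapes and columns helps confirm the choice before going further:

# Hedged helper: inspect the candidates instead of guessing the index
sapply(allTables, function(x) if (is.data.frame(x)) ncol(x) else NA)
lapply(allTables[13:14], head, 3)   # eyeball the two likely candidates
names(stateTable)                   # confirm the chosen table's columns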