Daniel J. Hocking djhocking

@djhocking
djhocking / subset_save.R
Last active November 9, 2018 04:01
Subset and Save
# make fake dataset
df <- data.frame(x = runif(100, 0, 1), y = rnorm(100, 10, 3), z = rpois(100, 10))
# subset dataframe
df_sub <- df[which(df$x >= 0.75), ]
# subset using dplyr
library(dplyr)
df_sub2 <- df %>%
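The preview stops at the open pipe. As a minimal sketch, assuming the dplyr step mirrors the base R condition above and that the subset is then written to a CSV file (the `set.seed()` call, the `out_file` path, and the `df_check` read-back are hypothetical additions, not part of the gist), the full subset-and-save flow might look like this:

```r
library(dplyr)

set.seed(1)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(x = runif(100, 0, 1), y = rnorm(100, 10, 3), z = rpois(100, 10))

# base R subset, as in the gist
df_sub <- df[which(df$x >= 0.75), ]

# assumed dplyr equivalent of the truncated pipe
df_sub2 <- df %>%
  filter(x >= 0.75)

# hypothetical output location; the gist's own save step is not shown
out_file <- file.path(tempdir(), "df_sub.csv")
write.csv(df_sub2, out_file, row.names = FALSE)

# read the file back to confirm what was written
df_check <- read.csv(out_file)
```

Note that `filter()` resets row names, whereas the base R subset keeps the original ones, so the two results match in content but not in row labels.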
djhocking / predict_vs_simulate.org
Created April 21, 2016 19:21 — forked from tmalsburg/predict_vs_simulate.org
Predict vs simulate in lme4


For this investigation we are going to use the sleepstudy data set from the lme4 package. Here is the head of the data frame:
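The preview ends before any output or model code. The following is a rough sketch only, assuming the data set is lme4's `sleepstudy` and using an illustrative random-slope model that may differ from the one in the original document; the names `m`, `p`, and `s` are hypothetical. It prints the head of the data and contrasts `predict()` with `simulate()`:

```r
library(lme4)

# sleepstudy ships with lme4: reaction times over days of sleep deprivation
head(sleepstudy)

# assumed model for illustration: random intercept and slope per subject
m <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# predict(): deterministic fitted values, one per row of the data
p <- predict(m)

# simulate(): new responses drawn from the fitted model, one column per draw
s <- simulate(m, nsim = 2, seed = 1)
```

Calling `predict(m)` twice gives the same vector, while each column of `s` is a different random draw, which is the distinction the title refers to.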

djhocking / ggplot boxplot continuous x with groups
Last active February 12, 2016 20:58
Trying to make a boxplot with ggplot2 in R where the x-axis is continuous but there are paired boxplots for each value on the x-axis based on another factor
# ggplot boxplot groups of continuous x
### Daniel J. Hocking
I am trying to make a boxplot with ggplot2 in R where the x-axis is continuous but there are paired boxplots for each value on the x-axis, based on another factor with two possible values. I want the boxplots arranged by number of survey years (n_years) on the x-axis and paired by spatialTF (two boxplots for every value of n_years), even though the n_years values are not evenly spaced.
This plot gets the paired boxplots correct by year but the years on the x-axis are evenly spaced and don't reflect the actual (continuous) time between years.
```
ggplot(df_converged, aes(factor(n_years), mean_N_est)) +
djhocking / temperature_data
Created February 11, 2015 19:23
Pull temperature data from the database
# table references
tbl_locations <- tbl(db, 'locations') %>%
rename(location_id=id, location_name=name, location_description=description) %>%
select(-created_at, -updated_at)
tbl_series <- tbl(db, 'series') %>%
rename(series_id=id) %>%
select(-created_at, -updated_at)
tbl_variables <- tbl(db, 'variables') %>%
rename(variable_id=id, variable_name=name, variable_description=description) %>%
djhocking / derive_metrics
Created January 15, 2015 21:29
R code to derive metrics for all catchments
# Get list of unique catchments with daymet data in our database
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, dbname=...)
# con <- dbConnect(drv, dbname=...)
qry <- "SELECT DISTINCT featureid FROM daymet;"
result <- dbSendQuery(con, qry)
catchments <- fetch(result, n=-1)
catchments <- as.character(catchments$featureid)
# get daymet data for a subset of catchments
djhocking / gist:eff1072b54b6d8049270
Last active August 29, 2015 14:13
Big Queries and collect with dplyr
# fetch temperature data
tbl_values <- left_join(tbl_series,
select(tbl_variables, variable_id, variable_name),
by=c('variable_id'='variable_id')) %>%
select(-file_id) %>%
filter(location_id %in% df_locations$location_id,
variable_name=="TEMP") %>%
left_join(tbl_values,
by=c('series_id'='series_id')) %>%
left_join(select(tbl_locations, location_id, location_name, latitude, longitude, featureid=catchment_id),
djhocking / temperature_equations
Created January 7, 2015 04:20
First example of using Markdown with LaTeX equations
We assumed stream temperature measurements were normally distributed following,
\\[ t_{s,h,d,y} \sim \mathcal{N}(\mu_{s,h,d,y}, \sigma) \\]
where $t_{s,h,d,y}$ is the observed stream water temperature at the site ($s$) within the sub-basin identified by the 8-digit Hydrologic Unit Code (HUC8; $h$) for each day ($d$) in each year ($y$). We describe the normal distribution with the standard deviation ($\sigma$). The expected temperature follows a linear trend
\\[ \omega_{s,h,d,y} = X^0 B^0 + X_{h}^{huc} B_{h}^{huc} + X_{s,h}^{site} B_{s,h}^{site} + X_{y}^{year} B_{y}^{year} \\]
but the expected temperature ($\mu_{s,h,d,y}$) is adjusted based on the residual error from the previous day
# purl() is in the knitr package
library(knitr)
setwd("C:/ALR/Models/boo") # example using a local Windows directory; can easily be switched
# it can be really simple:
# just specify the Rmd file name, then the R file name, with extensions
purl("script1.Rmd", "script1.R")
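For a version that runs without an existing `script1.Rmd` or the Windows path above, here is a small sketch that writes a throwaway R Markdown file to a temporary directory and purls it. The file contents and the names `fence`, `rmd_file`, `r_file`, and `r_lines` are assumptions made for illustration only:

```r
library(knitr)

# build the chunk delimiter (three backticks) programmatically
fence <- strrep("`", 3)

# hypothetical input file written to a temporary directory
rmd_file <- file.path(tempdir(), "script1.Rmd")
writeLines(c("Some text.", "", paste0(fence, "{r}"), "x <- 1 + 1", fence), rmd_file)

# extract only the R code from the Rmd file into an R script
r_file <- file.path(tempdir(), "script1.R")
purl(rmd_file, output = r_file, quiet = TRUE)

# inspect the extracted script
r_lines <- readLines(r_file)
```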
djhocking / dplyr-select-names.R
Last active February 28, 2022 19:08
Select columns by vector of names using dplyr
one <- 1:10
two <- rnorm(10)
three <- runif(10, 1, 2)
four <- -10:-1
df <- data.frame(one, two, three)
df2 <- data.frame(one, two, three, four)
str(df)
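The preview ends before the selection step itself. One way to select columns from a character vector of names, sketched here on the assumption that a recent dplyr (1.0.0 or later) is available, is the `all_of()` and `any_of()` helpers; the vectors `cols` and `maybe_cols` and the result names are hypothetical:

```r
library(dplyr)

one <- 1:10
two <- rnorm(10)
three <- runif(10, 1, 2)
four <- -10:-1
df <- data.frame(one, two, three)
df2 <- data.frame(one, two, three, four)

# hypothetical vector of column names to keep
cols <- c("one", "three")

# all_of() is strict: every name in cols must exist in the data frame
df_sel <- df2 %>% select(all_of(cols))

# any_of() is lenient: names missing from the data frame are silently skipped
maybe_cols <- c("one", "four")
df_any <- df %>% select(any_of(maybe_cols))
```

Here `df_any` keeps only `one`, because `df` has no column named `four`; using `all_of(maybe_cols)` on `df` would raise an error instead.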
djhocking / glmmBoot
Created May 21, 2014 14:12
Bootstrap mixed effects logistic regression predictions
# Function for getting bootstrapped glmer predictions in parallel
glmmBoot <- function(dat, form, R, nc){
# dat = data for glmer (lme4) logistic regression
# form = formula of glmer equation for fitting
# R = total number of bootstrap draws - should be multiple of nc b/c divided among cores evenly
# nc = number of cores to use in parallel
library(parallel)
cl <- makeCluster(nc) # Request # cores
clusterExport(cl, c("dat", "form", "nc", "R"), envir = environment()) # Make these available to each core