Clark Fitzgerald clarkfitzg

  • Math and Stats Department
  • CSU Sacramento
View papers.bib
@article{floratou2014sql,
title={SQL-on-Hadoop: full circle back to shared-nothing database architectures},
author={Floratou, Avrilia and Minhas, Umar Farooq and {\"O}zcan, Fatma},
journal={Proceedings of the VLDB Endowment},
volume={7},
number={12},
pages={1295--1306},
year={2014},
publisher={VLDB Endowment}
}
clarkfitzg / tasks.csv
Created Oct 15, 2018
Using the W3C CSV standard (https://www.w3.org/TR/tabular-data-primer/) to see how we would like to extend it to work with statistical data.
date,category,task,complete,notes
2018-10-15,support,type and share meeting notes from Friday with Duncan,0,
2018-10-15,revise,rewrite software alchemy example for clarity following Duncan's feedback,0,
clarkfitzg / Rtesseract_install.log
Created Sep 8, 2018
Fun with installing Rtesseract
clark@campus-108-089 ~/dev/Rtesseract (master)
$ ./configure
checking for pkg-config... /usr/local/bin/pkg-config
Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
No package 'tesseract' found
Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
View pingpong.R
#!/usr/bin/env Rscript
# 2018-06-04 11:26:12
# Automatically generated from R by autoparallel version 0.0.1
library(parallel)
nworkers = 2
timeout = 600
clarkfitzg / forloop.R
Last active Nov 27, 2017
R for loops with possibly difficult vectorization
# Given observations of linear functions f and g at points a and b this
# calculates the integral of f * g from a to b.
#
# Looks like it will already work as a vectorized function. Sweet!
inner_one_piece = function(a, b, fa, fb, ga, gb)
{
    # Roughly following my notes
    fslope = (fb - fa) / (b - a)
    gslope = (gb - ga) / (b - a)
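The preview above stops before the integral itself is computed. As a rough illustration of the idea in the header comment, the sketch below uses the closed form for the integral of a product of two linear functions, (b - a) / 6 * (2 fa ga + fa gb + fb ga + 2 fb gb). The name `integral_linear_product` is hypothetical and this is not necessarily how the gist's `inner_one_piece` finishes.

```r
# Hypothetical helper (not the gist's code): closed-form integral of f * g
# over [a, b], where f and g are linear and known only at the endpoints.
integral_linear_product = function(a, b, fa, fb, ga, gb)
{
    (b - a) / 6 * (2 * fa * ga + fa * gb + fb * ga + 2 * fb * gb)
}

# Compare with numerical integration for f(x) = 1 + 2x, g(x) = 3 - x on [0, 2].
f = function(x) 1 + 2 * x
g = function(x) 3 - x
exact = integral_linear_product(0, 2, f(0), f(2), g(0), g(2))
approx = integrate(function(x) f(x) * g(x), 0, 2)$value
stopifnot(isTRUE(all.equal(exact, approx)))
```

Because the expression uses only elementwise arithmetic, it also accepts vectors of endpoints, which matches the comment's remark about vectorization.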
View test_rslurm.R
# Following:
# https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html
library(rslurm)
test_func <- function(par_mu, par_sd) {
    samp <- rnorm(10^6, par_mu, par_sd)
    c(s_mu = mean(samp), s_sd = sd(samp))
}
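Before submitting to a cluster, it can help to check the function locally. The sketch below applies `test_func` to each row of a small parameter data frame with base R's `mapply`, with no Slurm involved. The data frame `pars` and its values are assumptions made for illustration.

```r
test_func = function(par_mu, par_sd) {
    samp = rnorm(10^6, par_mu, par_sd)
    c(s_mu = mean(samp), s_sd = sd(samp))
}

# Hypothetical parameter grid: one row per call, columns named after the
# arguments of test_func.
pars = data.frame(par_mu = 1:3, par_sd = c(0.1, 0.5, 1))

# Local, serial equivalent of applying test_func over the rows of pars.
results = do.call(mapply, c(list(FUN = test_func), pars))
```

Here `results` is a matrix with one column per parameter row and rows named `s_mu` and `s_sd`.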
View split_write.R
#' Split And Append Results To CSV File
#'
#' x will be split by f and each group will be appended to a directory of
#' csv files named according to f
#'
#' @param x data frame to split
#' @param f factor defining splits
#' @param ... further arguments to split
#' @param dirname character directory, will be created if it doesn't exist
#' @return NULL
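The function body is not shown in this preview. A minimal sketch of what the documentation describes might look like the following, assuming one CSV file per group named after the group and a header written only when the file is first created. The name `split_write_sketch`, the default `dirname`, and the `.csv` naming scheme are all assumptions, not the gist's implementation.

```r
# Hypothetical sketch; the gist's actual implementation is not shown.
split_write_sketch = function(x, f, ..., dirname = "split_output")
{
    dir.create(dirname, showWarnings = FALSE, recursive = TRUE)
    groups = split(x, f, ...)
    for (nm in names(groups)) {
        path = file.path(dirname, paste0(nm, ".csv"))
        new_file = !file.exists(path)
        # Write the header only for a new file so that later appends do
        # not repeat it.
        write.table(groups[[nm]], path, sep = ",", qmethod = "double",
                    row.names = FALSE, col.names = new_file,
                    append = !new_file)
    }
    invisible(NULL)
}
```

Calling it twice with the same data doubles the rows in each group's file while keeping a single header line.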
View scale.R
# Mon Aug 28 16:33:46 PDT 2017
#
# sweep() used to implement scale() is inefficient. Profiling shows that
# only 2% of the time is spent in colMeans. The only other thing to do is
# subtract the mean, which should be fast, but isn't because memory
# layout requires a transpose to use recycling (broadcasting).
#
# But I don't know how to do any better short of writing in C
library(microbenchmark)
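To make the comment about recycling concrete, here is a small sketch of three ways to subtract column means in base R. The function names are hypothetical, and no claim is made here about which is fastest; that depends on the matrix size and R version.

```r
# Hypothetical helpers for comparison; all three center the columns of x.
center_sweep = function(x) sweep(x, 2L, colMeans(x))

# Recycling runs down columns, so transpose first to line the means up
# with the original columns, then transpose back.
center_transpose = function(x) t(t(x) - colMeans(x))

# Expand the means to the full column-major layout instead of transposing.
center_rep = function(x) x - rep(colMeans(x), each = nrow(x))

set.seed(1)
x = matrix(rnorm(12), nrow = 4, ncol = 3)
ref = scale(x, center = TRUE, scale = FALSE)
stopifnot(
    isTRUE(all.equal(center_sweep(x), ref, check.attributes = FALSE)),
    isTRUE(all.equal(center_transpose(x), ref, check.attributes = FALSE)),
    isTRUE(all.equal(center_rep(x), ref, check.attributes = FALSE))
)
```

The three variants could then be timed against `scale()` with `microbenchmark::microbenchmark()` on a larger matrix.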
clarkfitzg / cov_chunked.R
Created Aug 10, 2017
Chunked version of covariance
cov_chunked = function(x, nchunks = 2L)
{
    p = ncol(x)
    indices = parallel::splitIndices(p, nchunks)
    diagonal_blocks = lapply(indices, function(idx) cov(x[, idx, drop = FALSE]))
    upper_right_indices = combn(indices, 2, simplify = FALSE)
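The preview ends before the blocks are assembled. One way to finish the idea, sketched under the assumption that `nchunks` is no larger than `ncol(x)`, is to fill a p by p matrix block by block: `cov()` on a single chunk gives a diagonal block, and `cov()` on two different chunks gives an off-diagonal block whose transpose fills the mirrored position. The name `cov_chunked_sketch` is hypothetical and the loop structure is not taken from the gist.

```r
# Hypothetical sketch, not the gist's code. Assumes nchunks <= ncol(x).
cov_chunked_sketch = function(x, nchunks = 2L)
{
    p = ncol(x)
    indices = parallel::splitIndices(p, nchunks)
    out = matrix(NA_real_, p, p, dimnames = list(colnames(x), colnames(x)))
    for (i in seq_along(indices)) {
        ii = indices[[i]]
        # Diagonal block: covariance within one chunk of columns.
        out[ii, ii] = cov(x[, ii, drop = FALSE])
        for (j in seq_len(i - 1L)) {
            jj = indices[[j]]
            # Off-diagonal block between two chunks, mirrored by symmetry.
            block = cov(x[, jj, drop = FALSE], x[, ii, drop = FALSE])
            out[jj, ii] = block
            out[ii, jj] = t(block)
        }
    }
    out
}

set.seed(1)
x = matrix(rnorm(200), nrow = 20, ncol = 10)
stopifnot(isTRUE(all.equal(cov_chunked_sketch(x, 3L), cov(x))))
```

Each call to `cov()` touches at most two chunks of columns, so the blocks could be computed independently, for example in parallel.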
View recurse_globals.R
# From wlandau
# https://github.com/duncantl/CodeDepends/issues/19
library(CodeDepends)
#' Recursively Find Global Variables
#'
#' TODO: Modify this to work without requiring that the code be evaluated
#' Probably means we can't use codetools::findGlobals
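The function itself is cut off in this preview. As a rough sketch of the evaluated-code approach the TODO mentions, the following uses `codetools::findGlobals` and descends only into functions defined directly in a given environment. The name `find_globals_recursive_sketch` and the rule for when to recurse are assumptions, not the code from the linked issue.

```r
# Hypothetical sketch; requires that the functions already exist in `env`.
find_globals_recursive_sketch = function(fun, env = environment(fun))
{
    seen = character()
    visit = function(f) {
        for (nm in codetools::findGlobals(f, merge = TRUE)) {
            if (nm %in% seen) next
            seen <<- c(seen, nm)
            # Only follow names defined directly in `env`, so base and
            # package functions are recorded but not descended into.
            if (exists(nm, envir = env, inherits = FALSE)) {
                val = get(nm, envir = env, inherits = FALSE)
                if (is.function(val) && !is.primitive(val)) visit(val)
            }
        }
    }
    visit(fun)
    seen
}

# Small example: f calls g, g calls h, and h uses the variable const.
e = new.env()
local({
    const = 10
    h = function(z) z + const
    g = function(y) h(y) * 2
    f = function(x) g(x) + 1
}, envir = e)
globals = find_globals_recursive_sketch(e$f)
stopifnot(all(c("g", "h", "const") %in% globals))
```

Tracking `seen` names also keeps mutually recursive functions from looping forever.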