Clark Fitzgerald clarkfitzg

  • Math and Stats Department
  • CSU Sacramento
View papers.bib
title={SQL-on-Hadoop: full circle back to shared-nothing database architectures},
author={Floratou, Avrilia and Minhas, Umar Farooq and {\"O}zcan, Fatma},
journal={Proceedings of the VLDB Endowment},
publisher={VLDB Endowment}
clarkfitzg / tasks.csv
Created Oct 15, 2018
Using the W3C CSV standard to see how we would like to extend it to work with statistical data.
date,category,task,complete,notes
2018-10-15,support,type and share meeting notes from Friday with Duncan,0,
2018-10-15,revise,rewrite software alchemy example for clarity following Duncan's feedback,0,
clarkfitzg / Rtesseract_install.log
Created Sep 8, 2018
Fun with installing Rtesseract
clark@campus-108-089 ~/dev/Rtesseract (master)
$ ./configure
checking for pkg-config... /usr/local/bin/pkg-config
Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
No package 'tesseract' found
Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
View pingpong.R
#!/usr/bin/env Rscript
# 2018-06-04 11:26:12
# Automatically generated from R by autoparallel version 0.0.1
nworkers = 2
timeout = 600
clarkfitzg / forloop.R
Last active Nov 27, 2017
R for loops with possibly difficult vectorization
# Given observations of linear functions f and g at points a and b this
# calculates the integral of f * g from a to b.
# Looks like it will already work as a vectorized function. Sweet!
inner_one_piece = function(a, b, fa, fb, ga, gb)
# Roughly following my notes
fslope = (fb - fa) / (b - a)
gslope = (gb - ga) / (b - a)
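The preview above stops before the integral itself is computed. As a minimal sketch of the idea in the header comment, assuming f and g are exactly linear on [a, b], the product can be integrated in closed form from the endpoint values alone. The name `integrate_linear_product` is hypothetical and this is not necessarily how the gist finishes the calculation.

```r
# Hypothetical helper, not the gist's own code: integral over [a, b] of f * g
# when f and g are both linear and known only at the endpoints a and b.
integrate_linear_product = function(a, b, fa, fb, ga, gb)
{
    # Closed form from expanding the product of the two lines and
    # integrating term by term.
    (b - a) / 6 * (2 * fa * ga + fa * gb + fb * ga + 2 * fb * gb)
}

# Only arithmetic operators are used, so vectors of intervals work directly.
# Here f(x) = g(x) = x on the intervals [0, 1] and [1, 3].
integrate_linear_product(a = c(0, 1), b = c(1, 3),
                         fa = c(0, 1), fb = c(1, 3),
                         ga = c(0, 1), gb = c(1, 3))
```

Because the formula is built from elementwise arithmetic, it supports the comment's observation that the function already works as a vectorized one.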
View test_rslurm.R
# Following:
test_func <- function(par_mu, par_sd) {
    # Draw one million normal samples and summarize them
    samp <- rnorm(10^6, par_mu, par_sd)
    c(s_mu = mean(samp), s_sd = sd(samp))
}
View split_write.R
#' Split And Append Results To CSV File
#' x will be split by f and each group will be appended to a directory of
#' csv files named according to f
#' @param x data frame to split
#' @param f factor defining splits
#' @param ... further arguments to split
#' @param dirname character directory, will be created if it doesn't exist
#' @return NULL
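The documentation above describes the behavior but the preview omits the function body. The following is a minimal sketch of one way to satisfy that description, not the gist's implementation. The function name `split_append_csv` and the default value of `dirname` are assumptions made for illustration.

```r
# Hypothetical sketch: name and dirname default are assumptions.
split_append_csv = function(x, f, ..., dirname = "split_csv")
{
    if (!dir.exists(dirname)) dir.create(dirname, recursive = TRUE)

    groups = split(x, f, ...)
    for (g in names(groups)) {
        chunk = groups[[g]]
        if (nrow(chunk) == 0L) next

        path = file.path(dirname, paste0(g, ".csv"))
        # Write the header only when the file is new, so that repeated
        # calls append rows beneath the existing ones.
        new_file = !file.exists(path)
        write.table(chunk, path, sep = ",", qmethod = "double",
                    row.names = FALSE, col.names = new_file,
                    append = !new_file)
    }
    invisible(NULL)
}
```

Calling it twice with the same data frame should leave each group's file with one header line followed by two copies of that group's rows.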
View scale.R
# Mon Aug 28 16:33:46 PDT 2017
# sweep() used to implement scale() is inefficient. Profiling shows that
# only 2% of the time is spent in colMeans. The only other thing to do is
# subtract the mean, which should be fast, but isn't because memory
# layout requires a transpose to use recycling (broadcasting).
# But I don't know how to do any better short of writing in C
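To make the recycling issue in these comments concrete, here is a small sketch of three ways to subtract column means in plain R. The matrix size and variable names are arbitrary choices for illustration. No claim is made about which variant is fastest; that would need timing with something like `system.time()` on realistic data.

```r
set.seed(1)
x = matrix(rnorm(2000 * 50), nrow = 2000)
mu = colMeans(x)

# Reference: sweep() subtracts a per-column statistic (FUN defaults to "-").
centered_sweep = sweep(x, 2L, mu)

# Recycling runs down columns, so subtracting a per-column vector directly
# needs a transpose before and after.
centered_transpose = t(t(x) - mu)

# Alternative that avoids the transposes by repeating each mean nrow(x)
# times, which matches the column-major layout of x.
centered_rep = x - rep(mu, each = nrow(x))

stopifnot(isTRUE(all.equal(centered_sweep, centered_transpose)),
          isTRUE(all.equal(centered_sweep, centered_rep)))
```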
clarkfitzg / cov_chunked.R
Created Aug 10, 2017
Chunked version of covariance
cov_chunked = function(x, nchunks = 2L)
p = ncol(x)
indices = parallel::splitIndices(p, nchunks)
diagonal_blocks = lapply(indices, function(idx) cov(x[, idx, drop = FALSE]))
upper_right_indices = combn(indices, 2, simplify = FALSE)
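The preview ends before the blocks are assembled. As a hedged sketch of how the pieces could fit together, the function below computes the diagonal blocks within each chunk, the off-diagonal blocks between each pair of chunks, and fills the lower triangle by symmetry. The name `cov_by_blocks` and the assembly strategy are assumptions, not the gist's code.

```r
# Hypothetical completion for illustration; not the gist's cov_chunked().
cov_by_blocks = function(x, nchunks = 2L)
{
    p = ncol(x)
    indices = parallel::splitIndices(p, nchunks)
    out = matrix(NA_real_, nrow = p, ncol = p)

    # Diagonal blocks: covariance within each chunk of columns.
    for (idx in indices) {
        out[idx, idx] = cov(x[, idx, drop = FALSE])
    }

    # Off-diagonal blocks: covariance between each pair of chunks,
    # filled in on both sides of the diagonal by symmetry.
    if (length(indices) > 1L) {
        for (pair in combn(indices, 2, simplify = FALSE)) {
            i = pair[[1]]
            j = pair[[2]]
            block = cov(x[, i, drop = FALSE], x[, j, drop = FALSE])
            out[i, j] = block
            out[j, i] = t(block)
        }
    }

    dimnames(out) = list(colnames(x), colnames(x))
    out
}
```

Each call to `cov()` touches at most two chunks of columns, which is what would allow the blocks to be computed independently.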
View recurse_globals.R
# From wlandau
#' Recursively Find Global Variables
#' TODO: Modify this to work without requiring that the code be evaluated
#' Probably means we can't use codetools::findGlobals
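The body of this function is not shown in the preview. The sketch below illustrates the general idea under the very limitation the TODO mentions: it relies on `codetools::findGlobals`, so every function it follows must already be defined in an environment. The name `find_globals_recursive` and its arguments are hypothetical, and this is neither wlandau's code nor the gist's.

```r
# Hypothetical sketch, assuming the functions have already been evaluated
# into `envir`.
find_globals_recursive = function(fun, envir = environment(fun),
                                  seen = character())
{
    globals = codetools::findGlobals(fun, merge = TRUE)

    for (name in setdiff(globals, seen)) {
        seen = c(seen, name)
        # Look only in envir itself, so base functions such as `+` are
        # recorded but not followed.
        if (exists(name, envir = envir, inherits = FALSE)) {
            value = get(name, envir = envir, inherits = FALSE)
            if (is.function(value) && !is.primitive(value)) {
                seen = find_globals_recursive(value, envir = envir,
                                              seen = seen)
            }
        }
    }
    seen
}
```

Passing `seen` through the recursion keeps mutually recursive functions from looping forever. The result also includes operators such as `+`, which a caller may want to filter out.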