Clark Fitzgerald clarkfitzg

  • Mathematics and Statistics Department, CSU Sacramento
  • Sacramento
@clarkfitzg
clarkfitzg / recursive_numba.py
Created March 25, 2017 00:45
Way faster version using numba
import numpy as np
import pandas as pd
from numba import jit
n_smpl = int(1e6)
ni = 5
group_id = np.repeat(np.arange(n_smpl), ni)
a = np.repeat(1, len(group_id))
b = np.repeat(1, len(group_id))
> lookup::lookup(rt)
stats::rt [closure]
function (n, df, ncp)
{
if (missing(ncp))
.Call(C_rt, n, df)
else if (is.na(ncp)) {
warning("NAs produced")
rep(NaN, n)
}
#' Cut Into Bins
#'
#' No boundaries on the endpoints, and handles character \code{x}.
#' A little different than normal \code{\link[base]{cut}}
#'
#' @param x column to be cut
#' @param breaks define the bins
#' @param bin_names names for the result
#' @return bins factor
cutbin = function(x, breaks, bin_names)
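The preview above shows only the signature of cutbin, so the following Python sketch is an illustration of the documented idea rather than the gist's implementation. It assumes that "no boundaries on the endpoints" means the first and last bins are open-ended, that character input should be converted to numbers, and that a value equal to a break falls in the upper bin. All three are assumptions.

```python
from bisect import bisect_right


def cutbin(x, breaks, bin_names):
    """Assign each value in x to a named bin; the two end bins are open-ended.

    breaks must be sorted in increasing order, and bin_names needs exactly
    one more entry than breaks.
    """
    if len(bin_names) != len(breaks) + 1:
        raise ValueError("need exactly one more bin name than breaks")
    # Assumption: character (string) input is converted to numbers first.
    values = [float(v) for v in x]
    # bisect_right counts the breaks that are <= v, which is the bin index.
    return [bin_names[bisect_right(breaks, v)] for v in values]


bins = cutbin(["-3", 0.5, 2, 10], breaks=[0, 2], bin_names=["low", "mid", "high"])
```

With two breaks there are three bins, and values below the first break or above the last one still receive a label instead of a missing value.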
# From wlandau
# https://github.com/duncantl/CodeDepends/issues/19
library(CodeDepends)
#' Recursively Find Global Variables
#'
#' TODO: Modify this to work without requiring that the code be evaluated.
#' That probably means we can't use codetools::findGlobals.
@clarkfitzg
clarkfitzg / cov_chunked.R
Created August 10, 2017 23:57
Chunked version of covariance
cov_chunked = function(x, nchunks = 2L)
{
p = ncol(x)
indices = parallel::splitIndices(p, nchunks)
diagonal_blocks = lapply(indices, function(idx) cov(x[, idx, drop = FALSE]))
upper_right_indices = combn(indices, 2, simplify = FALSE)
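The preview stops before the blocks are assembled, so the sketch below is one possible way to finish the idea in plain Python, not the gist's code. It computes the diagonal blocks and one block per unordered pair of chunks, then mirrors each off-diagonal block across the diagonal. The helper split_indices is a hypothetical stand-in for parallel::splitIndices, and its chunk boundaries may differ from R's.

```python
from itertools import combinations


def cov_block(x, cols_a, cols_b):
    """Sample covariance between two sets of columns of x (a list of rows)."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return {
        (i, j): sum((row[i] - means[i]) * (row[j] - means[j]) for row in x) / (n - 1)
        for i in cols_a
        for j in cols_b
    }


def split_indices(p, nchunks):
    """Split range(p) into nchunks contiguous, nearly equal groups."""
    bounds = [k * p // nchunks for k in range(nchunks + 1)]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(nchunks)]


def cov_chunked(x, nchunks=2):
    """Covariance matrix of the columns of x, computed block by block."""
    p = len(x[0])
    indices = split_indices(p, nchunks)
    out = [[0.0] * p for _ in range(p)]
    # Diagonal blocks: covariance within each chunk of columns.
    blocks = [(idx, idx) for idx in indices]
    # Off-diagonal blocks: one per unordered pair of chunks.
    blocks += list(combinations(indices, 2))
    for cols_a, cols_b in blocks:
        for (i, j), value in cov_block(x, cols_a, cols_b).items():
            out[i][j] = value
            out[j][i] = value  # the covariance matrix is symmetric
    return out
```

Each block depends only on its own columns, which is what would make the blocks candidates for separate workers.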
# Mon Aug 28 16:33:46 PDT 2017
#
# sweep(), as used to implement scale(), is inefficient. Profiling shows that
# only 2% of the time is spent in colMeans. The only other thing to do is
# subtract the mean, which should be fast, but isn't because memory
# layout requires a transpose to use recycling (broadcasting).
#
# But I don't know how to do any better short of writing in C
library(microbenchmark)
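The memory-layout point in the comment above can be illustrated with a small Python sketch. It models a column-major matrix as a flat list, which is an assumption made purely for illustration. Recycling a vector of column means along the storage order pairs each mean with a row rather than a column, which is why a transpose is needed if recycling is the only tool. Indexing by column avoids the transpose.

```python
def col_means(flat, n, p):
    """Column means of an n x p matrix stored column-major in a flat list."""
    return [sum(flat[j * n:(j + 1) * n]) / n for j in range(p)]


def recycle_subtract(flat, v):
    """Mimic recycling: v is repeated along the flat storage order."""
    return [val - v[k % len(v)] for k, val in enumerate(flat)]


def center_columns(flat, n, p):
    """Subtract each column's mean by walking the storage column by column."""
    means = col_means(flat, n, p)
    return [val - means[k // n] for k, val in enumerate(flat)]


# 3 x 2 matrix with columns (1, 2, 3) and (10, 20, 30), stored column-major.
flat = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
means = col_means(flat, 3, 2)
wrong = recycle_subtract(flat, means)  # means cycle down the rows
right = center_columns(flat, 3, 2)     # each column loses its own mean
```

Here wrong and right differ because the recycled means line up with rows, while the centered result needs one mean per column.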
#' Split And Append Results To CSV File
#'
#' x will be split by f, and each group will be appended to its own csv
#' file, named according to f, inside a directory of such files
#'
#' @param x data frame to split
#' @param f factor defining splits
#' @param ... further arguments to split
#' @param dirname character directory, will be created if it doesn't exist
#' @return NULL
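Only the documentation of this function is visible in the preview, so the following is a minimal Python sketch of the documented behavior, not the gist's implementation. The function name split_append_csv and the file naming scheme (one file per group key, with a .csv extension) are assumptions.

```python
import csv
import os
from collections import defaultdict


def split_append_csv(rows, keys, dirname):
    """Append each row to <dirname>/<key>.csv, grouping rows by their key."""
    os.makedirs(dirname, exist_ok=True)  # create the directory if needed
    groups = defaultdict(list)
    for row, key in zip(rows, keys):
        groups[key].append(row)
    for key, group in groups.items():
        path = os.path.join(dirname, f"{key}.csv")
        # Mode "a" appends, so repeated calls accumulate rows per group.
        with open(path, "a", newline="") as f:
            csv.writer(f).writerows(group)
    return None
```

Because the files are opened in append mode, calling the function once per chunk of a large dataset leaves each group's rows together in a single file.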
# Following:
# https://cran.r-project.org/web/packages/rslurm/vignettes/rslurm.html
library(rslurm)
test_func <- function(par_mu, par_sd) {
samp <- rnorm(10^6, par_mu, par_sd)
c(s_mu = mean(samp), s_sd = sd(samp))
}
@clarkfitzg
clarkfitzg / forloop.R
Last active November 27, 2017 23:29
R for loops with possibly difficult vectorization
# Given observations of linear functions f and g at points a and b, this
# calculates the integral of f * g from a to b.
#
# Looks like it will already work as a vectorized function. Sweet!
inner_one_piece = function(a, b, fa, fb, ga, gb)
{
# Roughly following my notes
fslope = (fb - fa) / (b - a)
gslope = (gb - ga) / (b - a)
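The preview ends after the slopes are computed. For reference, the integral of a product of two linear functions has a closed form in terms of the endpoint values alone: (b - a) / 6 * (2 fa ga + fa gb + fb ga + 2 fb gb). The Python sketch below uses that closed form, which may differ from the slope-based route the gist takes.

```python
def inner_one_piece(a, b, fa, fb, ga, gb):
    """Integral of f * g over [a, b] for linear f and g.

    f and g are given by their values at the endpoints:
    f(a) = fa, f(b) = fb, g(a) = ga, g(b) = gb.
    """
    return (b - a) * (2 * fa * ga + fa * gb + fb * ga + 2 * fb * gb) / 6


# f(x) = x and g(x) = x on [0, 1]; the exact integral of x^2 is 1/3.
third = inner_one_piece(0.0, 1.0, 0.0, 1.0, 0.0, 1.0)
```

The expression uses only arithmetic on its arguments, so the same formula applies elementwise when the arguments are arrays, which matches the vectorization remark in the comment above.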
#!/usr/bin/env Rscript
# 2018-06-04 11:26:12
# Automatically generated from R by autoparallel version 0.0.1
library(parallel)
nworkers = 2
timeout = 600