Clark Fitzgerald clarkfitzg

  • Math and Stats Department
  • CSU Sacramento
View cut.R
#' Cut Into Bins
#'
#' No boundaries on the endpoints, and handles character \code{x}.
#' A little different than normal \code{\link[base]{cut}}
#'
#' @param x column to be cut
#' @param breaks define the bins
#' @param bin_names names for the result
#' @return bins factor
cutbin = function(x, breaks, bin_names)
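The body of `cutbin` is not shown in this preview. As a rough illustration of the documented behaviour (no boundaries on the end bins, and character input is accepted), here is a minimal Python sketch. The tie-breaking rule (a value equal to a break goes to the bin on its right), the requirement that `breaks` be sorted, and the plain-list return value instead of a factor are all assumptions, not taken from the original function.

```python
from bisect import bisect_right


def cutbin(x, breaks, bin_names):
    """Assign each value in x to a named bin.

    Assumptions (not from the original R code): breaks are sorted, and a
    value equal to a break falls in the bin to its right. The first and
    last bins are unbounded, so every value gets a bin.
    """
    if len(bin_names) != len(breaks) + 1:
        raise ValueError("need exactly one more bin name than breaks")
    if list(breaks) != sorted(breaks):
        raise ValueError("breaks must be sorted")
    # bisect_right returns how many breaks are <= value, which is the bin index.
    return [bin_names[bisect_right(breaks, value)] for value in x]


# Numeric input: two breaks give three bins.
ages = cutbin([3, 18, 40, 95], breaks=[18, 65],
              bin_names=["minor", "adult", "senior"])

# Character input works the same way, using string ordering.
words = cutbin(["apple", "mango", "zebra"], breaks=["m"],
               bin_names=["a-l", "m-z"])
```

Because the comparison relies only on ordering, the same function handles numbers and strings, which is the property the roxygen comment highlights.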
View lookup_rt.R
> lookup::lookup(rt)
stats::rt [closure]
function (n, df, ncp)
{
    if (missing(ncp))
        .Call(C_rt, n, df)
    else if (is.na(ncp)) {
        warning("NAs produced")
        rep(NaN, n)
    }
clarkfitzg / install.md
Created May 19, 2017
Working through install problems for RSQLiteUDF.
View install.md

Fri May 19 09:34:10 PDT 2017

Working through install problems for RSQLiteUDF.

** testing if installed package can be loaded
Error in dyn.load(file, DLLpath = DLLpath, ...) :
  unable to load shared object '/usr/local/lib/R/site-library/RSQLiteUDF/libs/RSQLiteUDF.so':
  /usr/local/lib/R/site-library/RSQLiteUDF/libs/RSQLiteUDF.so: undefined symbol: sqlite3_enable_load_extension
Error: loading failed
clarkfitzg / recursive_numba.py
Created Mar 25, 2017
Way faster version using numba
View recursive_numba.py
import numpy as np
import pandas as pd
from numba import jit
n_smpl = int(1e6)
ni = 5
group_id = np.repeat(np.arange(n_smpl), ni)
a = np.repeat(1, len(group_id))
b = np.repeat(1, len(group_id))
clarkfitzg / recursive_normal.py
Last active Mar 24, 2017
Comparing groupby speed in pandas versus R data.table
View recursive_normal.py
"""
http://stackoverflow.com/questions/41886507/data-table-faster-row-wise-recursive-update-within-group/41891693#41891693
require(data.table) # v1.10.0
n_smpl = 1e6
ni = 5
id = rep(1:n_smpl, each = ni)
smpl = data.table(id)
smpl[, time := 1:.N, by = id]
a_init = 1; b_init = 1
View shuffle.R
x = 1:3
y = 11:13
# Given vectors x, y, what's the cleanest general way to make z?
# z = c(1, 11, 2, 12, 3, 13)
shuffle = function(x, y)
{
    as.vector(mapply(c, x, y))
}
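For comparison with the Python gists in this listing, the same interleaving can be sketched in Python. This is only an illustration of the idea, not part of the original gist; it assumes `x` and `y` have equal length, whereas R's `mapply` would recycle the shorter argument.

```python
from itertools import chain


def shuffle(x, y):
    """Interleave two equal-length sequences: x[0], y[0], x[1], y[1], ..."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    # zip pairs the elements up; chain.from_iterable flattens the pairs.
    return list(chain.from_iterable(zip(x, y)))


x = [1, 2, 3]
y = [11, 12, 13]
z = shuffle(x, y)
print(z)  # [1, 11, 2, 12, 3, 13]
```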
clarkfitzg / ripser.md
Created Dec 17, 2016
Calling "ripser" from R
View ripser.md

First install ripser.

Then make it locatable on your system PATH, something like:

$ ln -s /home/clark/dev/ripser/ripser /usr/local/bin/ripser

After this your system should be able to find ripser:

$ which ripser
clarkfitzg / simple_parallel.R
Created Nov 17, 2016
For tutorial on Nov 17, 2016
View simple_parallel.R
# A very simple parallel program
#
# We specify the probability that each individual
# votes for a candidate, and then simulate the counts
# for n such voters.
#
# count_votes and count_votes_slow are the functions
# to parallelize. Typically each run will take some
# time to complete.
#
clarkfitzg / keyvalue.R
Created Jul 18, 2016
Use serialization to store arbitrary R objects as key value pairs in Spark DataFrames
View keyvalue.R
# Mon Jul 18 08:08:09 PDT 2016
# Goal: Store arbitrary objects in DataFrames as bytes to make dapply more
# general
#
# Inefficient- this uses CLOB rather than BLOB
# Comments throughout this question are helpful
# http://stackoverflow.com/questions/5950084/how-to-handle-binary-strings-in-r
library(SparkR)
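The rest of this script is not shown in the preview. The general idea described in the comments, serializing an arbitrary object and storing it as text rather than raw bytes, can be sketched in Python as follows. The use of `pickle` and base64 here is an assumption made for illustration; the original R code may encode its bytes differently, and the helper names `to_text` and `from_text` are hypothetical.

```python
import base64
import pickle


def to_text(obj):
    """Serialize obj to bytes, then encode the bytes as ASCII text."""
    raw = pickle.dumps(obj)
    return base64.b64encode(raw).decode("ascii")


def from_text(text):
    """Reverse to_text: decode the text to bytes, then deserialize."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


# A plain dict stands in for a key-value table with a text-only value column.
original = {"coef": [1.5, -2.0], "name": "fit1"}
store = {"model": to_text(original)}
restored = from_text(store["model"])

# The text form is larger than the raw bytes (base64 adds about one third).
raw_size = len(pickle.dumps(original))
text_size = len(store["model"])
```

The size overhead of the text encoding is one reason a character column is less efficient than a true binary column for this purpose.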
clarkfitzg / ddR_parts.R
Last active Jul 14, 2016
Creating and accessing the parts of a nonuniform distributed array
View ddR_parts.R
> library(ddR)
Welcome to 'ddR' (Distributed Data-structures in R)!
For more information, visit: https://github.com/vertica/ddR
Attaching package: 'ddR'
The following objects are masked from 'package:base':
    cbind, rbind