David Rodriguez DavidRdgz

  • San Francisco State University - Graduate Student
  • Berkeley, CA
@DavidRdgz
DavidRdgz / Vagrantfile
Created May 5, 2018 18:00
Quick Vagrant machine with Hadoop & Spark using Ansible
Vagrant.configure("2") do |config|
config.vm.box = "ubuntu/xenial64"
config.vm.hostname = "spark.xenial.box"
config.vm.network :private_network, ip: "192.168.0.42"
config.vm.synced_folder "./data", "/vagrant_data"
config.vm.provider "virtualbox" do |vb|
vb.gui = false
vb.memory = 4096
DavidRdgz / MapRawData.scala
Created May 7, 2018 17:16
Spark unique counts using HyperLogLog Algebird object with tests
package com.dvidr.counts
import com.twitter.algebird.{HLL, HyperLogLogMonoid}
import org.apache.spark.rdd.RDD
case class EmailSchema(sender: String,
to: String,
cc: String,
bcc: String,
sentDate: String,
DavidRdgz / expert_mixtures.py
Last active June 23, 2018 19:10
A very small (multi-class) example that is almost a mixture-of-experts model. Strictly speaking, the belief_per_model function should assign probabilities based on a function rather than the fixed weights used here.
from keras.models import Model
from keras.layers import Input, Lambda, Dense
from keras.utils import to_categorical
from numpy.random import randint
import numpy as np
def belief_per_model(x):
    x1, x2, x3, x4 = x
    return x1 * .2 + x2 * .3 + x3 * .4 + x4 * .1
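The preview above blends four expert outputs with fixed weights. As a rough sketch of what weights "based on a function" could look like, the snippet below derives them from gate scores with a softmax. The names `softmax` and `gated_belief` and the sample numbers are hypothetical and not taken from the gist, and plain Python lists stand in for Keras tensors.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_belief(expert_outputs, gate_scores):
    """Blend per-expert class probabilities using softmax gate weights.

    expert_outputs: one probability vector per expert (hypothetical data).
    gate_scores: one raw score per expert, e.g. produced by a gating network.
    """
    weights = softmax(gate_scores)
    n_classes = len(expert_outputs[0])
    return [
        sum(w * out[c] for w, out in zip(weights, expert_outputs))
        for c in range(n_classes)
    ]


# Four experts, three classes; equal gate scores reduce to a plain average.
experts = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.25, 0.25, 0.50],
]
blended = gated_belief(experts, [0.0, 0.0, 0.0, 0.0])
```

Because the weights come from a softmax, the blended vector stays a valid probability distribution whenever each expert's output is one.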
DavidRdgz / mixture_of_experts.py
Last active June 23, 2018 19:12
Another toy (binary) mixture of experts model with 4 experts and a gating network.
from keras.models import Model
from keras.layers import Input, Dense, concatenate, dot
from numpy.random import randint
import numpy as np
def my_model(n=20):
    inputs = Input(shape=(n,))
    m1 = Dense(1)(inputs)
DavidRdgz / Discrete.scala
Last active July 30, 2018 15:23
A few discrete probability distributions for Rainier
/**
* Bernoulli distribution with expectation `p`
*
* @param p The probability of success
*/
final case class Bernoulli(p: Real) extends Discrete {
val generator: Generator[Int] =
Generator.require(Set(p)) { (r, n) =>
val u = r.standardUniform
val l = n.toDouble(p)
DavidRdgz / README.md
Last active August 5, 2018 18:42
Negative Binomial Approximation to Normal Threshold

NB approximation to normal threshold

There are many ways to test whether a negative binomial (NB) distribution is approximately normal, e.g.:

  • visualize the Q-Q plot
  • normalize the NB sample and perform the Shapiro-Wilk test

Below is an image of the envelope where the negative binomial parameters create a distribution that is approximately normal.
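
As a minimal sketch of the Q-Q idea, the snippet below standardizes an NB sample and measures how closely its sorted values track standard normal quantiles, using the correlation of the Q-Q points as a numeric stand-in for the visual plot. It is not the Shapiro-Wilk test and not the code behind the envelope image. The function names, parameter values, and sample size are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def sample_negative_binomial(r, p, rng):
    """Count failures before the r-th success in Bernoulli(p) trials."""
    failures = 0
    successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def qq_correlation(sample):
    """Pearson correlation between standardized order statistics and
    standard normal quantiles; values near 1 suggest near-normality."""
    n = len(sample)
    mu, sigma = mean(sample), pstdev(sample)
    observed = sorted((x - mu) / sigma for x in sample)
    normal = NormalDist()
    theoretical = [normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mo, mt = mean(observed), mean(theoretical)
    cov = sum((o - mo) * (t - mt) for o, t in zip(observed, theoretical))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_t = sum((t - mt) ** 2 for t in theoretical)
    return cov / math.sqrt(var_o * var_t)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Illustrative parameters: many required successes vs. a single one.
near_normal = [sample_negative_binomial(200, 0.5, rng) for _ in range(2000)]
skewed = [sample_negative_binomial(1, 0.5, rng) for _ in range(2000)]

print("r=200:", qq_correlation(near_normal))
print("r=1:  ", qq_correlation(skewed))
```

With many required successes the NB sample should score close to 1, while the single-success case (a geometric distribution) is strongly skewed and should score noticeably lower.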

DavidRdgz / Discrete.scala
Last active August 18, 2018 21:49
[Rainier] Example truncating combinator on base Discrete distributions.
package com.stripe.rainier.core
import com.stripe.rainier.compute.{Evaluator, If, Real}
trait Discrete extends Distribution[Int] {
self: Discrete =>
val emptyEvaluator = new Evaluator(Map.empty)
def logDensity(v: Real): Real
DavidRdgz / SparkRainer.scala
Last active August 23, 2018 18:02
[Rainier] Massive Bayesian Inference in Spark using Rainier
import com.stripe.rainier.core.{Normal, Poisson}
import com.stripe.rainier.sampler.{RNG, ScalaRNG}
import org.apache.spark.{SparkConf, SparkContext}
object Driver {
implicit val rng: RNG = ScalaRNG(1527608515939L)
val DROP_BURN_IN = 100
/*
Refer to StackOverflow Q, about serializing methods/objects:
DavidRdgz / Numpy.scala
Created January 2, 2019 21:34
Creating NumPy-like arrays in Scala using implicit class conversion
package numpy
trait NumpyWriter[A] {
  def lessThan(list: List[A])(value: A): List[A]
  def greaterThan(list: List[A])(value: A): List[A]
  def multiply(list: List[A])(value: A): List[A]
  def add(list: List[A])(value: A): List[A]
  def subtract(list: List[A])(value: A): List[A]
}
DavidRdgz / annotations.py
Created January 7, 2019 17:33
XGBoost model on the Prudential life insurance dataset from Kaggle
"""
A simple example of decluttering the settings for pandas so
that when developing the model and testing it, the dataframe
is a little cleaner and more readable.
"""
def pandas_defaults(defaults, pd):
    def decorator(f):
        def wrapper(*args, **kwargs):
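
The preview cuts off inside `wrapper`, so the rest of the gist is not shown here. As a hedged sketch of how an options-setting decorator of this shape might work, the snippet below applies each option through the backend's `set_option` before calling the wrapped function. `with_display_options` and `RecordingBackend` are hypothetical names. The stub backend stands in for the `pd` module so the example runs without pandas, where `pd.set_option(name, value)` is the real call it imitates.

```python
import functools


class RecordingBackend:
    """Hypothetical stand-in for the pandas module: records set_option calls."""

    def __init__(self):
        self.calls = []

    def set_option(self, name, value):
        self.calls.append((name, value))


def with_display_options(options, backend):
    """Decorator factory: apply every option before the wrapped call.

    This is an assumed completion, not the gist's actual body.
    """
    def decorator(f):
        @functools.wraps(f)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            for name, value in options.items():
                backend.set_option(name, value)
            return f(*args, **kwargs)
        return wrapper
    return decorator


backend = RecordingBackend()


@with_display_options({"display.max_rows": 20, "display.width": 120}, backend)
def summarize(rows):
    """Toy function standing in for model-development code."""
    return len(rows)
```

Passing the backend in as an argument, as the original signature does with `pd`, keeps the decorator easy to exercise with a stub like the one above.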