Rob Coleman RobColeman

package com.chartboost.adrel.preprocessing.featureComputers
import com.chartboost.adrel.dataModels.EcpmPredictionRequest
import com.chartboost.adrel.util.JsonSaving
import org.apache.spark.mllib.linalg.{Vector => SparkVector, Vectors => SparkVectors}
import scala.util.hashing.MurmurHash3
/**
* The meta-data for computing features. In this format to easily save and load with models.
*
@RobColeman
RobColeman / ApproximateDistribution.scala
Last active January 29, 2016 01:06
An Approximate Distribution wrapper class for TDunning's Java TDigest library
package com.preact.platform.math.models
import java.lang.System._
import java.util
import com.tdunning.math.stats.{Centroid, TreeDigest}
import org.apache.commons.math3.distribution.NormalDistribution
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, SparkConf}
import scala.collection.immutable.IndexedSeq
@RobColeman
RobColeman / TreeDigestHelper.scala
Last active March 26, 2019 16:09
Helpers for TDunning's Java TDigest library
package com.preact.platform.math.models
import java.lang.System._
import java.nio.ByteBuffer
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.commons.math3.distribution.ExponentialDistribution
import org.apache.commons.math3.distribution.NormalDistribution
import com.tdunning.math.stats.TreeDigest
@RobColeman
RobColeman / TDigest.scala
Last active January 18, 2018 17:21
A fast, serializable, Scala implementation of tDigest. Thrown together for a Spark project.
import java.nio.ByteBuffer
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.commons.math3.distribution.ExponentialDistribution
import scala.collection._
import scala.collection.generic.CanBuildFrom
import scala.util.Random
case class Centroid(var mean: Double, var count: Long) extends Ordered[Centroid] with Serializable {
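The preview above stops at the `Centroid` declaration, so the rest of the gist's implementation is not visible here. As a rough illustration only, a centroid that holds a mean and a count and sorts by mean might look like the following Python sketch. The `add` method and its `weight` parameter are assumptions made for this example, not names taken from the gist; the update rule is the standard incremental weighted mean.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Centroid:
    """A cluster summary: the mean of the absorbed points and how many there are.

    order=True compares centroids by (mean, count), so sorting orders by mean first.
    """
    mean: float
    count: int = 1

    def add(self, x, weight=1):
        # Hypothetical helper: absorb a point x carrying the given weight by
        # updating the running weighted mean in place.
        self.count += weight
        self.mean += weight * (x - self.mean) / self.count


c = Centroid(1.0)
c.add(3.0)  # the centroid now summarizes the points 1.0 and 3.0
```

Keeping only a mean and a count per centroid is what lets a digest summarize many points in a small, easily serialized structure.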
@RobColeman
RobColeman / some_algorithms.py
Created November 9, 2015 22:19
Some Algorithms In Python
# binary search
def binary_search(x, arr):
    """
    x is an integer,
    arr is an array of integers, in sorted order
    Binary search will tell you if x is contained in arr
    A binary search bisects the array, recursively, to search smaller and smaller
    sub-sections of the array until it either finds the value or finds no more candidate
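The preview is cut off before the function body, so the following is only a minimal sketch of the recursive bisection the docstring describes, not the gist's own code. The `lo` and `hi` parameters are assumptions added here so the recursion can narrow the search range without copying slices of the list.

```python
def binary_search(x, arr, lo=0, hi=None):
    """Return True if x is in the sorted list arr, otherwise False."""
    if hi is None:
        hi = len(arr)  # search the half-open range [lo, hi)
    if lo >= hi:
        return False  # no candidate positions remain
    mid = (lo + hi) // 2
    if arr[mid] == x:
        return True
    if arr[mid] < x:
        # x can only be in the right half
        return binary_search(x, arr, mid + 1, hi)
    # x can only be in the left half
    return binary_search(x, arr, lo, mid)
```

Each call halves the range, so the search takes on the order of log2(n) comparisons for a list of n items.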
@RobColeman
RobColeman / count_nucleotides.py
Last active October 27, 2015 18:39
Counting nucleotides from a sequence
def count_nucleotides_by_type(sequence):
    # use a dictionary to keep track of the counts of each nucleotide type in the sequence
    counts = {
        "A": 0,
        "C": 0,
        "G": 0,
        "T": 0,
        "length": 0
    }
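The preview ends after the dictionary is set up. One way the counting could proceed is sketched below; this is an assumption about the intended behavior, not the gist's actual code. In particular, treating input case-insensitively and counting every character (including unrecognized ones) toward `"length"` are choices made for this example.

```python
def count_nucleotides_by_type(sequence):
    # Dictionary of per-nucleotide counts plus the total sequence length.
    counts = {"A": 0, "C": 0, "G": 0, "T": 0, "length": 0}
    for base in sequence.upper():
        if base in "ACGT":
            counts[base] += 1
        # Assumption: every character counts toward the length,
        # even if it is not one of the four nucleotides.
        counts["length"] += 1
    return counts


result = count_nucleotides_by_type("AACGT")
```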
for x in "asdfasdfas2343asdf232dfasdf": print(x.isdigit())
def how_many_numbers_in_the_word(word):
    counter = 0
    for x in word:
        if x.isdigit():
            counter = counter + 1
    return counter
@RobColeman
RobColeman / index.html
Created April 3, 2015 20:33
meteorites_module2_exercize
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Loading CSV Data with D3</title>
<script type="text/javascript" src="http://d3js.org/d3.v3.js"></script>
</head>
<body>
<p>Not much to see here; try looking in the console!</p>
@RobColeman
RobColeman / d3_course_dataset_meteorites
Last active August 29, 2015 14:18
d3_course_dataset_meteorites
I have chosen the meteorites dataset from The Meteorite Society, hosted at http://visualizing.org/datasets/meteorite-landings.
It has a couple of interesting things going for it.
- It's only about 3.1MB: not very big, but not too small either.
- It has a lot of different data types (variable types), with interesting relationships to play with:
  - Ratio (Mass, year)
  - Qualitative - Interval (lat, long)
  - Categorical non-ordinal (class/type, namtype)
- It's already clean
@RobColeman
RobColeman / pdc_dtf.py
Last active August 29, 2015 14:06 — forked from agramfort/pdc_dtf.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Implements Partial Directed Coherence and Direct Transfer Function
using MVAR processes.
Reference
---------
Luiz A. Baccala and Koichi Sameshima. Partial directed coherence:
a new concept in neural structure determination.