Jeremy Freeman freeman-lab
@freeman-lab
freeman-lab / StreamingKMeans.scala
Last active February 26, 2019 07:13
Spark Streaming + MLLib integration examples
package thunder.streaming
import org.apache.spark.{SparkConf, Logging}
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.mllib.clustering.KMeansModel
import scala.util.Random.nextDouble
freeman-lab / bisecting.scala
Last active December 29, 2015 07:45
Bisecting k-means for hierarchical clustering in Spark
/**
* bisecting <master> <input> <nNodes> <subIterations>
*
* divisive hierarchical clustering using bisecting k-means
* assumes input is a text file, each row is a data point
* given as numbers separated by spaces
*
*/
import org.apache.spark.SparkContext
freeman-lab / FixedLengthBinaryInputFormat.scala
Created August 12, 2014 23:54
Binary input with fixed record length
import java.io.{FileFilter, File}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, BytesWritable}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.{JobContext, InputSplit, RecordReader, TaskAttemptContext}
/**
* Custom input format for reading and splitting flat binary files that contain records, each of which
* is a fixed size in bytes. The fixed record size is specified through the parameter recordLength
* in the Hadoop configuration.
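The idea in the comment above (a flat binary file cut into records of one fixed byte length, with splits that never straddle a record) can be sketched outside Hadoop in a few lines of Python. This is only an illustration of the record layout, not the gist's Scala implementation; `read_fixed_records`, `split_offsets`, and `target_split_size` are hypothetical names introduced here.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple


def read_fixed_records(stream: BinaryIO, record_length: int) -> Iterator[bytes]:
    """Yield consecutive records of exactly record_length bytes (hypothetical helper)."""
    if record_length <= 0:
        raise ValueError("record_length must be positive")
    while True:
        record = stream.read(record_length)
        if not record:
            return  # clean end of file
        if len(record) != record_length:
            # The file size is not a multiple of the record length.
            raise ValueError("trailing partial record of %d bytes" % len(record))
        yield record


def split_offsets(file_size: int, record_length: int,
                  target_split_size: int) -> List[Tuple[int, int]]:
    """Return (start, length) byte ranges whose boundaries fall on record boundaries."""
    records_per_split = max(1, target_split_size // record_length)
    step = records_per_split * record_length
    return [(start, min(step, file_size - start))
            for start in range(0, file_size, step)]


records = list(read_fixed_records(io.BytesIO(b"abcdefgh"), 4))
splits = split_offsets(file_size=20, record_length=4, target_split_size=10)
```

Rounding each split down to a whole number of records is what lets every worker start reading at a record boundary without coordinating with its neighbours.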
var THREE = require('three.js');
var _ = require('lodash');
var ParticleTest = function(selector, data, images, opts) {
var width = $(selector).width();
var height = width * 0.7;
freeman-lab / channel-parallel.ipynb
Created May 24, 2015 21:30
Multi-channel time series parallelization in Thunder
freeman-lab / sklearn-mllib-local.ipynb
Created July 30, 2015 17:47
Comparing sklearn & mllib locally
freeman-lab / lightning-for-loop.ipynb
Created September 5, 2015 04:25
Example using a for loop with Lightning in the notebook
freeman-lab / notes.md
Last active October 4, 2015 21:15
chrome @ home

Notes on compiling chromium using an EC2 cluster

This note lays out the machine specifications and configurations used to get a distributed Chrome build working on cloud compute (in this case AWS EC2) using the icecc tool.

These links were a good starting point:

but I couldn't find a full walkthrough, so hopefully this makes it easier for others.

freeman-lab / convert.py
Created November 5, 2015 13:37
Hack to split images along black gaps
from numpy import array
from thunder import Images, ThunderContext
tsc = ThunderContext.start()
rawpath = ''
savepath = ''
data = tsc.loadImages(rawpath, inputFormat='tif', nplanes=1)
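The preview stops before the splitting step itself, so what follows is a guess at the general idea rather than the gist's code: treat any column (or row) whose pixels are all at or below a threshold as part of a black gap, and cut the image into the runs between gaps. The function name `split_along_black_gaps` and its `axis` and `threshold` parameters are assumptions made for this sketch, and it uses plain NumPy rather than Thunder.

```python
import numpy as np


def split_along_black_gaps(image, axis=1, threshold=0):
    """Split a 2D image into pieces separated by all-black columns (axis=1) or rows (axis=0)."""
    image = np.asarray(image)
    # Reduce over the other axis to get one flag per column (axis=1) or per row (axis=0).
    other = 0 if axis == 1 else 1
    is_gap = (image <= threshold).all(axis=other)

    # Collect [start, stop) index ranges of consecutive non-gap positions.
    ranges = []
    start = None
    for i, gap in enumerate(is_gap):
        if not gap and start is None:
            start = i
        elif gap and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(is_gap)))

    if axis == 1:
        return [image[:, a:b] for a, b in ranges]
    return [image[a:b, :] for a, b in ranges]


example = np.array([[1, 2, 0, 0, 3],
                    [4, 5, 0, 0, 6]])
pieces = split_along_black_gaps(example, axis=1)
```

With real microscopy data the gaps are rarely exactly zero, so `threshold` would likely need to be set just above the background level.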
freeman-lab / loading-sample-data.ipynb
Created November 17, 2015 21:52
Loading sample neuro data