jhaberstro / unfair-casino-hmm.js
Created January 25, 2019 07:33
WebPPL Hidden Markov Model Inference
/*
Dealer repeatedly flips a coin. Sometimes the coin is fair, with P(heads) = 0.5,
sometimes it's loaded, with P(heads) = 0.8. The dealer occasionally switches coins,
invisibly to you. Given a list of observed coin flips, infer whether the coin was
fair for each individual flip.
*/
var hmm = function(n, initial, transitionf, observef) {
  var impl = function(N) {
    if (N > 1) {
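For reference, the per-flip posterior this WebPPL program infers can be computed exactly for a two-state HMM with the forward-backward algorithm. A minimal NumPy sketch; the 0.1 switch probability, the uniform initial state, and every name below are my assumptions, not the gist's code:

import numpy as np

T = np.array([[0.9, 0.1],          # P(next coin | current coin); the 0.1 switch
              [0.1, 0.9]])         # probability is an assumed value
p_heads = np.array([0.5, 0.8])     # P(heads | coin): fair = 0.5, loaded = 0.8

def posterior_fair(flips):
    # flips: sequence of 1 (heads) / 0 (tails); returns P(coin was fair) per flip
    n = len(flips)
    # emission likelihoods: e[t, s] = P(flip_t | state s)
    e = np.array([[p if f == 1 else 1.0 - p for p in p_heads] for f in flips])
    fwd = np.zeros((n, 2))
    bwd = np.zeros((n, 2))
    fwd[0] = 0.5 * e[0]                     # assumed uniform initial state
    fwd[0] /= fwd[0].sum()                  # normalize each step to avoid underflow
    for t in range(1, n):
        fwd[t] = e[t] * (fwd[t - 1] @ T)
        fwd[t] /= fwd[t].sum()
    bwd[-1] = 1.0
    for t in range(n - 2, -1, -1):
        bwd[t] = T @ (e[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]                       # column 0 = fair coin

print(posterior_fair([1, 1, 1, 1, 1, 0]))   # a long run of heads looks loaded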
jhaberstro / optical_flow_horn_schunk.py
Last active January 10, 2018 05:52
Horn-Schunck Dense Optical Flow
import numpy as np
from skimage import filters
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

def optical_flow_hs(t0, t1, alpha):
    h, w = t0.shape[:2]
    # spatial gradients (np.gradient returns [d/dy, d/dx]) and temporal gradient
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
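The gist imports csc_matrix and spsolve, so presumably it assembles the Horn-Schunck equations into one large sparse linear system. For comparison, the same objective can be minimized with the classic Jacobi-style fixed-point iteration; a minimal sketch (the averaging kernel, iteration count, and function name are my assumptions):

import numpy as np
from scipy.ndimage import convolve

def optical_flow_hs_iterative(t0, t1, alpha, n_iters=100):
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    u = np.zeros(t0.shape, dtype=float)     # horizontal flow
    v = np.zeros(t0.shape, dtype=float)     # vertical flow
    k = np.array([[1.0, 2.0, 1.0],          # weighted neighborhood average,
                  [2.0, 0.0, 2.0],          # the standard Horn-Schunck kernel
                  [1.0, 2.0, 1.0]]) / 12.0
    denom = alpha ** 2 + dx ** 2 + dy ** 2
    for _ in range(n_iters):
        u_bar = convolve(u, k)
        v_bar = convolve(v, k)
        common = (dx * u_bar + dy * v_bar + dt) / denom
        u = u_bar - dx * common             # classic Horn-Schunck update
        v = v_bar - dy * common
    return u, v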
jhaberstro / optical_flow_lucas_kanade.py
Last active January 9, 2018 04:30
Lucas-Kanade Dense Optical Flow
import numpy as np
from skimage import filters

def optical_flow_lk(t0, t1, sigma):
    # set up the local linear systems of equations
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    # Gaussian-weighted sums for the structure tensor
    A00 = filters.gaussian(dx * dx, sigma)
    A11 = filters.gaussian(dy * dy, sigma)
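A plausible continuation of the truncated function above, shown as a self-contained sketch: finish the Gaussian-weighted structure tensor, then solve each pixel's 2x2 system A [u, v]^T = -b in closed form with Cramer's rule. This is the standard Lucas-Kanade math, not necessarily the gist's actual remainder:

import numpy as np
from skimage import filters

def optical_flow_lk_sketch(t0, t1, sigma, eps=1e-9):
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    A00 = filters.gaussian(dx * dx, sigma)  # Gaussian-weighted structure tensor
    A01 = filters.gaussian(dx * dy, sigma)
    A11 = filters.gaussian(dy * dy, sigma)
    b0 = filters.gaussian(dx * dt, sigma)   # right-hand-side terms
    b1 = filters.gaussian(dy * dt, sigma)
    det = A00 * A11 - A01 * A01
    det = np.where(np.abs(det) < eps, eps, det)  # guard near-singular pixels
    u = (A01 * b1 - A11 * b0) / det         # Cramer's rule for A [u, v]^T = -b
    v = (A01 * b0 - A00 * b1) / det
    return u, v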
jhaberstro / esvi_gaussian.py
Created September 23, 2017 16:46
Variational inference by evolutionary optimization for a simple gaussian model
import numpy as np
from scipy.stats import norm
from math import log, pi

N = 1000
true_loc = 10.0
true_stddev = 0.1
x_data = true_loc + (np.random.randn(N) * true_stddev)

def lognormalpdf(x, loc, scale):
    # log-density of the normal distribution N(loc, scale^2) evaluated at x
    return -log(scale) - 0.5 * log(2.0 * pi) - 0.5 * ((x - loc) / scale) ** 2
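A minimal sketch of the idea the title describes, reusing the imports and x_data above: fit a variational posterior q(loc) = N(m, s) to the unknown mean by maximizing a Monte Carlo ELBO with a simple (1+1) evolution strategy instead of gradients. The N(0, 10) prior, the known likelihood scale, the step sizes, and every name below are my assumptions, not the gist's code:

def elbo(params, data, n_samples=64):
    m, log_s = params
    s = np.exp(log_s)                                # parameterize s > 0
    locs = m + s * np.random.randn(n_samples)        # samples from q(loc)
    log_prior = norm.logpdf(locs, 0.0, 10.0)         # assumed N(0, 10) prior
    log_lik = np.array([norm.logpdf(data, l, true_stddev).sum() for l in locs])
    log_q = norm.logpdf(locs, m, s)
    return np.mean(log_lik + log_prior - log_q)      # MC estimate of the ELBO

def fit_es(data, n_steps=500, step=0.1):
    params = np.array([0.0, 0.0])                    # (m, log_s)
    best = elbo(params, data)
    for _ in range(n_steps):
        cand = params + step * np.random.randn(2)    # mutate the parent
        score = elbo(cand, data)
        if score > best:                             # (1+1)-ES selection
            params, best = cand, score
    return params                                    # m should approach true_loc

m, log_s = fit_es(x_data)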
jhaberstro / gpu_arch_resources
Last active October 26, 2023 00:14
GPU Architecture Learning Resources
http://courses.cms.caltech.edu/cs179/
http://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
https://community.arm.com/graphics/b/blog
http://cdn.imgtec.com/sdk-documentation/PowerVR+Hardware.Architecture+Overview+for+Developers.pdf
http://cdn.imgtec.com/sdk-documentation/PowerVR+Series5.Architecture+Guide+for+Developers.pdf
https://www.imgtec.com/blog/a-look-at-the-powervr-graphics-architecture-tile-based-rendering/
https://www.imgtec.com/blog/the-dr-in-tbdr-deferred-rendering-in-rogue/
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-412605
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
https://community.arm.com/graphics/b/documents/posts/moving-mobile-graphics#siggraph2015
jhaberstro / hkt.cpp
Last active September 25, 2020 12:28
Functor, Maybe, and Higher-Kinded Types in C++
// Example program
#include <iostream>
#include <string>
#include <vector>
#include <type_traits>
//---------------------
// Maybe and MyVector, two totally unrelated classes whose only commonality is that they are both type constructors of the same arity (i.e. 1) and order (i.e. 1).
//---------------------
template< typename T >
jhaberstro / compiler_optimize_atomics.md
Created May 22, 2016 19:23
"N4455 No Sane Compiler Would Optimize Atomics" notes

Showcases some interesting and non-obvious optimizations that compilers can make on and around atomics. In particular, I liked this example: the following code

int x = 0;
std::atomic<int> y;
int dso() {
  x = 0;
  int z = y.load(std::memory_order_seq_cst);
  y.store(0, std::memory_order_seq_cst);
  x = 1;
  return z;
}

can have the dead store x = 0 eliminated (as I understood the paper): any other thread observing x between the two stores would be a data race, so only x = 1 has to survive, even across the seq_cst atomic operations.
jhaberstro / spinlock.cpp
Created April 15, 2016 08:34
C++11 recursive spinlock
// spinlock.h
#include <thread>

class Mutex
{
public:
    Mutex();
    Mutex(Mutex const&) = delete;
jhaberstro / jaguar_optimization_notes.md
Last active April 10, 2016 01:13
"Taming the Jaguar x86 Optimization at Insomniac Games" Notes

Taming the Jaguar x86 Optimization at Insomniac Games

  • When the branch predictor is wrong and speculatively executes code from a branch that is not taken, that can actually pollute caches, causing much worse performance than just the wasted fetch, decode, and ALU cycles.
  • Retiring: all instructions retire (commit) in program order, at a max rate of 2/cycle.
    • i.e. the visible side-effects of an instruction are committed in order, even if executed out of order.
  • L1 hit takes 3 cycles, L2 hit takes 25 cycles, i.e. L2 is ~8x slower.
  • Main memory takes ~200 cycles, i.e. around 66x slower than L1.
  • Retire control unit (RCU) can only store 64 instructions.
  • L2 miss + full RCU can be a recipe for disaster (back-of-envelope arithmetic below):
    • An L2 miss will not retire for 200+ cycles while the frontend is (almost) always fetching 2 instructions/cycle, which means after ~32 cycles (64 instructions) the RCU is full and the entire pipeline must stall. The CPU can no longer execute (out of order) instructions that occur after the memory op to hide that memory latency.
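Back-of-envelope check of that scenario, using only the numbers from the notes (the arithmetic sketch itself is mine):

rcu_slots = 64        # RCU capacity, in instructions
fetch_rate = 2        # instructions fetched per cycle
miss_latency = 200    # cycles for a main-memory access

cycles_to_fill_rcu = rcu_slots // fetch_rate       # 64 / 2 = 32 cycles until the RCU is full
fully_stalled = miss_latency - cycles_to_fill_rcu  # ~168 cycles of full pipeline stall
print(cycles_to_fill_rcu, fully_stalled)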