jhaberstro / unfair-casino-hmm.js
Created January 25, 2019 07:33
WebPPL Hidden Markov Model Inference
/*
Dealer repeatedly flips a coin. Sometimes the coin is fair, with P(heads) = 0.5,
sometimes it's loaded, with P(heads) = 0.8. The dealer occasionally switches coins,
invisibly to you. Given a list of observed coin flips, infer whether the coin was
fair for each individual flip.
*/
var hmm = function(n, initial, transitionf, observef) {
  var impl = function(N) {
    if (N > 1) {
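For reference, the per-flip posterior this WebPPL program infers can be computed exactly for a two-state HMM with the forward-backward algorithm. A minimal NumPy sketch; the 0.1 switch probability, the uniform initial state, and every name below are my assumptions, not the gist's code:

import numpy as np

T = np.array([[0.9, 0.1],          # P(next coin | current coin); the 0.1 switch
              [0.1, 0.9]])         # probability is an assumed value
p_heads = np.array([0.5, 0.8])     # P(heads | coin): fair = 0.5, loaded = 0.8

def posterior_fair(flips):
    # flips: sequence of 1 (heads) / 0 (tails); returns P(coin was fair) per flip
    n = len(flips)
    # emission likelihoods: e[t, s] = P(flip_t | state s)
    e = np.array([[p if f == 1 else 1.0 - p for p in p_heads] for f in flips])
    fwd = np.zeros((n, 2))
    bwd = np.zeros((n, 2))
    fwd[0] = 0.5 * e[0]                     # assumed uniform initial state
    fwd[0] /= fwd[0].sum()                  # normalize each step to avoid underflow
    for t in range(1, n):
        fwd[t] = e[t] * (fwd[t - 1] @ T)
        fwd[t] /= fwd[t].sum()
    bwd[-1] = 1.0
    for t in range(n - 2, -1, -1):
        bwd[t] = T @ (e[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]                       # column 0 = fair coin

print(posterior_fair([1, 1, 1, 1, 1, 0]))   # a long run of heads looks loaded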
jhaberstro / optical_flow_horn_schunk.py
Last active January 10, 2018 05:52
Horn-Schunck Dense Optical Flow
import numpy as np
from skimage import filters
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import spsolve

def optical_flow_hs(t0, t1, alpha):
    h, w = t0.shape[:2]
    # spatial gradients (np.gradient returns [d/dy, d/dx]) and temporal gradient
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
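The gist imports csc_matrix and spsolve, so presumably it assembles the Horn-Schunck equations into one large sparse linear system. For comparison, the same objective can be minimized with the classic Jacobi-style fixed-point iteration; a minimal sketch (the averaging kernel, iteration count, and function name are my assumptions):

import numpy as np
from scipy.ndimage import convolve

def optical_flow_hs_iterative(t0, t1, alpha, n_iters=100):
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    u = np.zeros(t0.shape, dtype=float)     # horizontal flow
    v = np.zeros(t0.shape, dtype=float)     # vertical flow
    k = np.array([[1.0, 2.0, 1.0],          # weighted neighborhood average,
                  [2.0, 0.0, 2.0],          # the standard Horn-Schunck kernel
                  [1.0, 2.0, 1.0]]) / 12.0
    denom = alpha ** 2 + dx ** 2 + dy ** 2
    for _ in range(n_iters):
        u_bar = convolve(u, k)
        v_bar = convolve(v, k)
        common = (dx * u_bar + dy * v_bar + dt) / denom
        u = u_bar - dx * common             # classic Horn-Schunck update
        v = v_bar - dy * common
    return u, v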
jhaberstro / optical_flow_lucas_kanade.py
Last active January 9, 2018 04:30
Lucas-Kanade Dense Optical Flow
import numpy as np
from skimage import filters

def optical_flow_lk(t0, t1, sigma):
    # set up the local linear systems of equations
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    # Gaussian-weighted sums for the structure tensor
    A00 = filters.gaussian(dx * dx, sigma)
    A11 = filters.gaussian(dy * dy, sigma)
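A plausible continuation of the truncated function above, shown as a self-contained sketch: finish the Gaussian-weighted structure tensor, then solve each pixel's 2x2 system A [u, v]^T = -b in closed form with Cramer's rule. This is the standard Lucas-Kanade math, not necessarily the gist's actual remainder:

import numpy as np
from skimage import filters

def optical_flow_lk_sketch(t0, t1, sigma, eps=1e-9):
    gradients = np.gradient(t0)
    dx, dy = gradients[1], gradients[0]
    dt = t1 - t0
    A00 = filters.gaussian(dx * dx, sigma)  # Gaussian-weighted structure tensor
    A01 = filters.gaussian(dx * dy, sigma)
    A11 = filters.gaussian(dy * dy, sigma)
    b0 = filters.gaussian(dx * dt, sigma)   # right-hand-side terms
    b1 = filters.gaussian(dy * dt, sigma)
    det = A00 * A11 - A01 * A01
    det = np.where(np.abs(det) < eps, eps, det)  # guard near-singular pixels
    u = (A01 * b1 - A11 * b0) / det         # Cramer's rule for A [u, v]^T = -b
    v = (A01 * b0 - A00 * b1) / det
    return u, v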
jhaberstro / esvi_gaussian.py
Created September 23, 2017 16:46
Variational inference by evolutionary optimization for a simple gaussian model
import numpy as np
from scipy.stats import norm
from math import log, pi

N = 1000
true_loc = 10.0
true_stddev = 0.1
x_data = true_loc + (np.random.randn(N) * true_stddev)

def lognormalpdf(x, loc, scale):
    # log-density of the normal distribution N(loc, scale^2) evaluated at x
    return -log(scale) - 0.5 * log(2.0 * pi) - 0.5 * ((x - loc) / scale) ** 2
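A minimal sketch of the idea the title describes, reusing the imports and x_data above: fit a variational posterior q(loc) = N(m, s) to the unknown mean by maximizing a Monte Carlo ELBO with a simple (1+1) evolution strategy instead of gradients. The N(0, 10) prior, the known likelihood scale, the step sizes, and every name below are my assumptions, not the gist's code:

def elbo(params, data, n_samples=64):
    m, log_s = params
    s = np.exp(log_s)                                # parameterize s > 0
    locs = m + s * np.random.randn(n_samples)        # samples from q(loc)
    log_prior = norm.logpdf(locs, 0.0, 10.0)         # assumed N(0, 10) prior
    log_lik = np.array([norm.logpdf(data, l, true_stddev).sum() for l in locs])
    log_q = norm.logpdf(locs, m, s)
    return np.mean(log_lik + log_prior - log_q)      # MC estimate of the ELBO

def fit_es(data, n_steps=500, step=0.1):
    params = np.array([0.0, 0.0])                    # (m, log_s)
    best = elbo(params, data)
    for _ in range(n_steps):
        cand = params + step * np.random.randn(2)    # mutate the parent
        score = elbo(cand, data)
        if score > best:                             # (1+1)-ES selection
            params, best = cand, score
    return params                                    # m should approach true_loc

m, log_s = fit_es(x_data)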
jhaberstro / gpu_arch_resources
Last active October 26, 2023 00:14
GPU Architecture Learning Resources
http://courses.cms.caltech.edu/cs179/
http://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
https://community.arm.com/graphics/b/blog
http://cdn.imgtec.com/sdk-documentation/PowerVR+Hardware.Architecture+Overview+for+Developers.pdf
http://cdn.imgtec.com/sdk-documentation/PowerVR+Series5.Architecture+Guide+for+Developers.pdf
https://www.imgtec.com/blog/a-look-at-the-powervr-graphics-architecture-tile-based-rendering/
https://www.imgtec.com/blog/the-dr-in-tbdr-deferred-rendering-in-rogue/
http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-412605
https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
https://community.arm.com/graphics/b/documents/posts/moving-mobile-graphics#siggraph2015
jhaberstro / hkt.cpp
Last active September 25, 2020 12:28
Functor, Maybe, and Higher-Kinded Types in C++
// Example program
#include <iostream>
#include <string>
#include <vector>
#include <type_traits>
//---------------------
// Maybe and MyVector, two totally unrelated classes whose only commonality is that they are both type constructors of the same arity (i.e. 1) and order (i.e. 1).
//---------------------
template< typename T >
jhaberstro / compiler_optimize_atomics.md
Created May 22, 2016 19:23
"N4455 No Sane Compiler Would Optimize Atomics" notes

Showcases some interesting and non-obvious optimizations that compilers can make on and around atomics. In particular, I liked this example: the following code

int x = 0;
std::atomic<int> y;
int dso() {
  x = 0;
  int z = y.load(std::memory_order_seq_cst);
  y.store(0, std::memory_order_seq_cst);
  x = 1;
  return z;
}

can have the dead store x = 0 eliminated (as I understood the paper): any other thread observing x between the two stores would be a data race, so only x = 1 has to survive, even across the seq_cst atomic operations.
jhaberstro / spinlock.cpp
Created April 15, 2016 08:34
C++11 recursive spinlock
// spinlock.h
#include <thread>

class Mutex
{
public:
    Mutex();
    Mutex(Mutex const&) = delete;
jhaberstro / jaguar_optimization_notes.md
Last active April 10, 2016 01:13
"Taming the Jaguar x86 Optimization at Insomniac Games" Notes

Taming the Jaguar x86 Optimization at Insomniac Games

  • When the branch predictor is wrong and speculatively executes code from a branch that is not taken, that can actually pollute caches, causing much worse performance than just the wasted fetch, decode, and ALU cycles.
  • Retiring: all instructions retire (commit) in program order, at a max rate of 2/cycle.
    • i.e. the visible side-effects of an instruction are committed in order, even if executed out of order.
  • L1 hit takes 3 cycles, L2 hit takes 25 cycles, i.e. L2 is ~8x slower.
  • Main memory takes ~200 cycles, i.e. around 66x slower than L1.
  • Retire control unit (RCU) can only store 64 instructions.
  • L2 miss + full RCU can be a recipe for disaster (back-of-envelope arithmetic below):
    • An L2 miss will not retire for 200+ cycles while the frontend is (almost) always fetching 2 instructions/cycle, which means after ~32 cycles (64 instructions) the RCU is full and the entire pipeline must stall. The CPU can no longer execute (out of order) instructions that occur after the memory op to hide that memory latency.
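Back-of-envelope check of that scenario, using only the numbers from the notes (the arithmetic sketch itself is mine):

rcu_slots = 64        # RCU capacity, in instructions
fetch_rate = 2        # instructions fetched per cycle
miss_latency = 200    # cycles for a main-memory access

cycles_to_fill_rcu = rcu_slots // fetch_rate       # 64 / 2 = 32 cycles until the RCU is full
fully_stalled = miss_latency - cycles_to_fill_rcu  # ~168 cycles of full pipeline stall
print(cycles_to_fill_rcu, fully_stalled)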