David Young dcyoung
@dcyoung
dcyoung / EM_random_initializations.matlab
Created April 3, 2018 20:39
Random initializations used for the exploration of variation.
% Initialize the pi vector and the mu matrix to random values (normalized).
% pi is a vector of mixing probabilities: one value for each segment.
% Interpretation: when the image was generated, the color of each pixel was
% drawn from one of "n" normal distributions. Each value in this vector is
% the probability that the corresponding normal distribution was chosen.
pi = rand(nSegments, 1); %repmat(1/nSegments, nSegments, 1);
pi = pi./sum(pi);
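The normalization step above can be illustrated outside MATLAB. The following is a minimal C++ sketch, not the gist's code: `randomMixingWeights` and its `seed` parameter are hypothetical names introduced here. It draws one positive random value per segment and rescales the values so they sum to 1, which is what `pi = pi./sum(pi)` does.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not part of the gist): draw nSegments positive random
// values and rescale them so that they form a probability vector.
std::vector<double> randomMixingWeights(std::size_t nSegments, unsigned seed) {
    std::mt19937 gen(seed);
    // Assumption: the lower bound sits slightly above 0 so the sum is never zero.
    std::uniform_real_distribution<double> dist(1e-6, 1.0);

    std::vector<double> weights(nSegments);
    for (double& w : weights) {
        w = dist(gen);
    }

    // Equivalent of pi = pi./sum(pi): divide every entry by the total.
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    for (double& w : weights) {
        w /= total;
    }
    return weights;
}
```

Passing a different `seed` for each run would give a different starting point for EM, which is one way to explore how much the segmentation varies with initialization.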
dcyoung / EM_image_segmentation.matlab
Created April 3, 2018 20:38
Image Segmentation Using Expectation Maximization (EM) Algorithm
% Clear the workspace
clear all; clc;
% Set the workspace
cd '/directoryWithImageNamesGoHere'
% Potential image names
imgNames = {'balloons', 'mountains', 'nature', 'ocean', 'polarlights'};
segmentCounts = [10,20,50];
dcyoung / parallel_vector_addition_cuda_gpu.c
Created April 3, 2018 20:26
Weighted Vector Addition on the NVIDIA CUDA Framework
// kernel for weighted vector addition on GPU
__global__ void weightedVecAddKernel(float* out, float* A, float* B, int len, float weight_a, float weight_b) {
    int thisThreadIndex = blockIdx.x*blockDim.x + threadIdx.x;
    if (thisThreadIndex < len) {
        out[thisThreadIndex] = A[thisThreadIndex] * weight_a + B[thisThreadIndex] * weight_b;
    }
}
// compute weighted vector addition on GPU: out = weight_a*A + weight_b*B
void weightedVecAdd(float* out, float* A, float* B, int len, float weight_a, float weight_b) {
// figures out how to fit computation to the "geometry"
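The host function is cut off in this preview at the point where it fits the computation to the launch geometry. A common way to do that is a ceiling division of the element count by the block size. The sketch below is an assumption about that step, not the gist's actual code; `blocksNeeded` and `threadsPerBlock` are hypothetical names.

```cpp
// Hypothetical helper (not part of the gist): number of thread blocks needed
// so that blocks * threadsPerBlock covers at least `len` elements.
int blocksNeeded(int len, int threadsPerBlock) {
    // Integer ceiling division: rounds up whenever len is not a multiple
    // of threadsPerBlock.
    return (len + threadsPerBlock - 1) / threadsPerBlock;
}
```

With `len = 1000` and 256 threads per block this gives 4 blocks, that is 1024 threads. The `thisThreadIndex < len` check in the kernel is what keeps the 24 surplus threads from writing past the end of `out`.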
// compute weighted vector addition on CPU: out = weight_a*A + weight_b*B
void cpuWeightedVectorAdd(float* out, float* A, float* B, int len, float weight_a, float weight_b) {
    for (int i = 0; i < len; i++) {
        out[i] = A[i] * weight_a + B[i] * weight_b;
    }
}
// naive CPU reduction
float reductionCPU(float* A, int len) {
    float result = 0.0f;
    for (int i = 0; i < len; i++) {
        result += A[i];
    }
    return result;
}
dcyoung / parallel_reduction_cuda_gpu.c
Last active December 3, 2021 10:58
Parallel Reduction: Interleaved Addressing with the CUDA Framework
// kernel for reduction on GPU
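__global__ void reductionKernel(float* A, int len, int level) {
    int thisThreadIndex = blockIdx.x*blockDim.x + threadIdx.x;
    // each thread owns the left element of a pair that is "level" apart
    thisThreadIndex = thisThreadIndex * 2 * level;
    // guard the partner index as well, so the read never runs past the end of A
    if (thisThreadIndex + level < len) {
        A[thisThreadIndex] = A[thisThreadIndex] + A[thisThreadIndex + level];
    }
}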
// Compute reduction of elements in A
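The host function that drives the kernel is not shown in this preview. To make the interleaved addressing pattern concrete, here is a CPU-side C++ sketch under the assumption that the kernel is launched once per `level`, with `level` doubling each time (1, 2, 4, ...). `interleavedReduce` is a hypothetical name, and the inner loop stands in for the threads of one launch.

```cpp
#include <cstddef>
#include <vector>

// CPU-side sketch of interleaved-addressing reduction (hypothetical helper,
// not the gist's host function). Works on a copy and returns the sum.
float interleavedReduce(std::vector<float> a) {
    const std::size_t len = a.size();
    if (len == 0) {
        return 0.0f;
    }
    // Each outer pass mirrors one kernel launch with the given level.
    for (std::size_t level = 1; level < len; level *= 2) {
        // Each inner iteration mirrors one thread: element i absorbs its
        // partner at i + level, skipped when the partner is out of range.
        for (std::size_t i = 0; i + level < len; i += 2 * level) {
            a[i] += a[i + level];
        }
    }
    // After the last pass the total has accumulated in element 0.
    return a[0];
}
```

For the input `{1, 2, 3, 4, 5}` the passes leave `a[0]` at 3, then 10, then 15. The partner check `i + level < len` is what lets the scheme handle lengths that are not a power of two.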
dcyoung / solver.c
Created April 3, 2018 20:17
Optimizing a Rudimentary Eigenvalue Solver with Intel x86 SSE Intrinsics
#include <string.h>
#include <stdio.h>
#include <math.h>
#include "benchmark.h"
#include <nmmintrin.h>
#include <smmintrin.h>
#include <omp.h>
/** Computes the dot product of 2 vectors*/
float dotp(float* u, float* A, size_t n);
dcyoung / unique_identifier.js
Created September 8, 2017 23:04
Simple JavaScript method to produce a GUID (Globally Unique Identifier)
// Creates a GUID - Globally Unique Identifier
let createGUID = () => {
    function s4() {
        return Math.floor((1 + Math.random()) * 0x10000).toString(16).substring(1);
    }
    return s4() + s4() + '-' + s4() + '-' + s4() + '-' +
        s4() + '-' + s4() + s4() + s4();
};
dcyoung / simple_timer.cpp
Last active September 8, 2017 22:54
A simple C++ timer class, useful for testing.
#include <chrono>
#include <iostream>
class Timer {
public:
    Timer() : beg_(clock_::now()) {}
    void reset() { beg_ = clock_::now(); }
    double elapsed() const { return std::chrono::duration_cast<second_>(clock_::now() - beg_).count(); }
private: