Skip to content

Instantly share code, notes, and snippets.

View vangheem's full-sized avatar

Nathan Van Gheem vangheem

View GitHub Profile
I am the owner of And I'm glad to hear it's helpful. In truth, it's just a fancy DNS trick. and all of it's sub-domains just point back to your computer ( That means running ssl is as simple (or difficult) as running ssl on your computer.
I'm not sure how comfortable you are with the command line, but here's my how I setup my development environment. (rvm, passenger, nginx w/ SSL, etc).
# Install rvm (no sudo!)
# ------------------------------------------------------
bash < <( curl )
source ~/.rvm/scripts/rvm
rvm install ree-1.8.7-2010.02
nicowilliams /
Last active May 18, 2024 14:10
fork() is evil; vfork() is goodness; afork() would be better; clone() is stupid

I recently happened upon a very interesting implementation of popen() (different API, same idea) called popen-noshell using clone(2), and so I opened an issue requesting use of vfork(2) or posix_spawn() for portability. It turns out that on Linux there's an important advantage to using clone(2). I think I should capture the things I wrote there in a better place. A gist, a blog, whatever.

This is not a paper. I assume reader familiarity with fork() in particular and Unix in general, though, of course, I link to relevant wiki pages, so if the unfamiliar reader is willing to go down the rabbit hole, they should be able to come ou

pfreixes /
Created January 26, 2018 20:35
Testing how slow or fast is fetch attributes and call external functions
from timeit import timeit
class Object:
def __init__(self, x):
self.x = x
class Rule:
def __init__(self, valid_values, valid_values2):
self.valid_values = valid_values
self.valid_values2 = valid_values2

Executing code on a GPU vs CPU

I have had working knowledge of GPUs for a very long time now. I remember playing around with NVIDIA RIVA 128 or something similar with DirectX when they were still 3D graphics accelerators. I have also tried to keep up with the times and did some basic shader programming on a contemporary NVIDIA or AMD GPU.

However, today's GPUs are necessary for another reason – the explosion of AI workloads, including large language models (LLMs). From a GPU perspective, AI workloads are just massive applications of tensor operations such as matrix addition and multiplication. However, how does the modern GPU execute them, which is much more efficient than running the workloads on a CPU?

Consider CUDA, NVIDIA's programming language that extends C to exploit data parallelism on GPUs. In CUDA, you write code for CPU (host code) or GPU (device code). CPU code is just mostly plain C, but CUDA extends the language in two ways: it allows you to define functions for GPUs (kernels) and also prov