Skip to content

Instantly share code, notes, and snippets.

@sebbbi
sebbbi / FastUniformLoadWithWaveOps.txt
Last active February 15, 2024 08:41
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often run into a problem where you want to iterate an array in memory over all pixels in a compute shader
group (tile). Tiled deferred lighting is the most common case. 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]

Quick Tips for Fast Code on the JVM

I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information. It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic. So… here's my word of mouth.

This is by no means a comprehensive gist. It's also important to understand that the techniques that I outline in here are not 100% absolute either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. Also, for many or even most applications, there will be other techniques that I'm not mentioning which will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friend! Mea

A Tour of PyTorch Internals (Part I)

The fundamental unit in PyTorch is the Tensor. This post will serve as an overview for how we implement Tensors in PyTorch, such that the user can interact with it from the Python shell. In particular, we want to answer four main questions:

  1. How does PyTorch extend the Python interpreter to define a Tensor type that can be manipulated from Python code?
  2. How does PyTorch wrap the C libraries that actually define the Tensor's properties and methods?
  3. How does PyTorch cwrap work to generate code for Tensor methods?
  4. How does PyTorch's build system take all of these components to compile and generate a workable application?

Extending the Python Interpreter

PyTorch defines a new package torch. In this post we will consider the ._C module. This module is known as an "extension module" - a Python module written in C. Such modules allow us to define new built-in object types (e.g. the Tensor) and to call C/C++ functions.

@rygorous
rygorous / box_pruning_notes.txt
Created February 17, 2017 00:41
Note on changes to the box pruning code.
Brief explanation what I did to get the speed-up, and the thought process behind it.
The original code went:
EnterLoop:
movaps xmm3, xmmword ptr [edx+ecx*2] // Box1YZ
cmpnltps xmm3, xmm2
movmskps eax, xmm3
cmp eax, 0Ch
@yossorion
yossorion / what-i-wish-id-known-about-equity-before-joining-a-unicorn.md
Last active April 7, 2024 22:55
What I Wish I'd Known About Equity Before Joining A Unicorn

What I Wish I'd Known About Equity Before Joining A Unicorn

Disclaimer: This piece is written anonymously. The names of a few particular companies are mentioned, but as common examples only.

This is a short write-up on things that I wish I'd known and considered before joining a private company (aka startup, aka unicorn in some cases). I'm not trying to make the case that you should never join a private company, but the power imbalance between founder and employee is extreme, and that potential candidates would

Getting Started in Scala

This is my attempt to give Scala newcomers a quick-and-easy rundown to the prerequisite steps they need to a) try Scala, and b) get a standard project up and running on their machine. I'm not going to talk about the language at all; there are plenty of better resources a google search away. This is just focused on the prerequisite tooling and machine setup. I will not be assuming you have any background in JVM languages. So if you're coming from Python, Ruby, JavaScript, Haskell, or anywhere…  I hope to present the information you need without assuming anything.

Disclaimer It has been over a decade since I was new to Scala, and when I was new to Scala, I was coming from a Java and Ruby background. This has probably caused me to unknowingly make some assumptions. Please feel free to call me out in comments/tweets!

One assumption I'm knowingly making is that you're on a Unix-like platform. Sorry, Windows users.

Getting the JVM

@Reedbeta
Reedbeta / cool-game-programming-blogs.opml
Last active December 29, 2023 10:02
List of cool blogs on game programming, graphics, theoretical physics, and other random stuff
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
<head>
<title>Graphics, Games, Programming, and Physics Blogs</title>
</head>
<body>
<outline text="Tech News" title="Tech News">
<outline type="rss" text="Ars Technica" title="Ars Technica" xmlUrl="http://feeds.arstechnica.com/arstechnica/index/" htmlUrl="https://arstechnica.com"/>
<outline type="rss" text="Polygon - Full" title="Polygon - Full" xmlUrl="http://www.polygon.com/rss/index.xml" htmlUrl="https://www.polygon.com/"/>
<outline type="rss" text="Road to VR" title="Road to VR" xmlUrl="http://www.roadtovr.com/feed" htmlUrl="https://www.roadtovr.com"/>
#!/usr/bin/env python
import pdfkit
chapters = [
'intro',
'linear_algebra',
'prob',
'numerical',
@EderSantana
EderSantana / CATCH_Keras_RL.md
Last active October 16, 2023 08:32
Keras plays catch - a single file Reinforcement Learning example
@smarr
smarr / truffle-material.md
Last active March 16, 2023 14:06
Truffle: Languages and Material