Jedd Haberstro jhaberstro

## FastUniformLoadWithWaveOps.txt
In shader programming, you often need every pixel in a compute shader thread group (tile) to iterate over the same array in memory.
Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list that was culled for that tile.

Simplified HLSL code looks like this:

Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;

[numthreads(8, 8, 1)]

## quick-tips-optimizing-jvm.md

      
djspiewak / quick-tips-optimizing-jvm.md (1 file, 46 forks, 9 comments, 523 stars; last active April 29, 2024 10:05)
    Quick Tips for Fast Code on the JVM

I was talking to a coworker recently about general techniques that almost always form the core of any effort to write very fast, down-to-the-metal hot path code on the JVM, and they pointed out that there really isn't a particularly good place to go for this information.  It occurred to me that, really, I had more or less picked up all of it by word of mouth and experience, and there just aren't any good reference sources on the topic.  So… here's my word of mouth.
This is by no means a comprehensive gist. It's also important to understand that the techniques that I outline in here are not 100% absolute either. Performance on the JVM is an incredibly complicated subject, and while there are rules that almost always hold true, the "almost" remains very salient. Also, for many or even most applications, there will be other techniques that I'm not mentioning which will have a greater impact. JMH, Java Flight Recorder, and a good profiler are your very best friends! Mea

  
## internals.md

      
killeent / internals.md (1 file, 11 forks, 2 comments, 122 stars; last active February 14, 2023 05:15)
    A Tour of PyTorch Internals (Part I)

The fundamental unit in PyTorch is the Tensor. This post will serve as an overview for how we implement Tensors in PyTorch, such that the user can interact with it from the Python shell. In particular, we want to answer four main questions:

How does PyTorch extend the Python interpreter to define a Tensor type that can be manipulated from Python code?
How does PyTorch wrap the C libraries that actually define the Tensor's properties and methods?
How does PyTorch cwrap work to generate code for Tensor methods?
How does PyTorch's build system take all of these components to compile and generate a workable application?

Extending the Python Interpreter

PyTorch defines a new package torch. In this post we will consider the ._C module. This module is known as an "extension module" - a Python module written in C. Such modules allow us to define new built-in object types (e.g. the Tensor) and to call C/C++ functions.
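
To make the idea of an extension module concrete, here is a minimal sketch (not taken from the PyTorch sources) that uses only the Python standard library to classify a module as built into the interpreter, loaded from a compiled extension, or written in pure Python. The helper name `module_kind` is hypothetical and exists only for this illustration.

```python
import importlib
import importlib.machinery
import sys


def module_kind(name: str) -> str:
    """Classify an importable module as 'builtin', 'extension', or 'python'.

    Hypothetical helper for illustration; not part of PyTorch or the stdlib.
    """
    # Built-in modules are compiled directly into the interpreter binary.
    if name in sys.builtin_module_names:
        return "builtin"
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None) or ""
    # Extension modules are shared libraries; their file names end with one
    # of the platform-specific suffixes listed in EXTENSION_SUFFIXES.
    if path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"
    return "python"


if __name__ == "__main__":
    for name in ("sys", "json"):
        print(name, module_kind(name))
```

Assuming PyTorch is installed, calling `module_kind("torch._C")` would be expected to report an extension module, since `._C` is described above as a Python module written in C.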

  
## box_pruning_notes.txt
Brief explanation of what I did to get the speed-up, and the thought process behind it.

The original code went:

EnterLoop:
			movaps		xmm3, xmmword ptr [edx+ecx*2]		// Box1YZ
			cmpnltps	xmm3, xmm2
			movmskps	eax, xmm3

			cmp			eax, 0Ch

## what-i-wish-id-known-about-equity-before-joining-a-unicorn.md

      
yossorion / what-i-wish-id-known-about-equity-before-joining-a-unicorn.md (1 file, 243 forks, 63 comments, 1946 stars; last active April 7, 2024 22:55)
    What I Wish I'd Known About Equity Before Joining A Unicorn

Disclaimer: This piece is written anonymously. The names of a few
particular companies are mentioned, but as common examples only.
This is a short write-up on things that I wish I'd known and
considered before joining a private company (aka startup, aka unicorn
in some cases). I'm not trying to make the case that you should
never join a private company, but the power imbalance between
founder and employee is extreme, and that potential candidates would

  
## getting-started-in-scala.md

      
djspiewak / getting-started-in-scala.md (1 file, 20 forks, 16 comments, 193 stars; last active April 10, 2022 18:52)
    Getting Started in Scala

This is my attempt to give Scala newcomers a quick-and-easy rundown of the prerequisite steps they need to a) try Scala, and b) get a standard project up and running on their machine.  I'm not going to talk about the language at all; there are plenty of better resources a Google search away.  This is just focused on the prerequisite tooling and machine setup.  I will not be assuming you have any background in JVM languages.  So if you're coming from Python, Ruby, JavaScript, Haskell, or anywhere…  I hope to present the information you need without assuming anything.
Disclaimer: It has been over a decade since I was new to Scala, and when I was new to Scala, I was coming from a Java and Ruby background.  This has probably caused me to unknowingly make some assumptions.  Please feel free to call me out in comments/tweets!
One assumption I'm knowingly making is that you're on a Unix-like platform.  Sorry, Windows users.
Getting the JVM


## cool-game-programming-blogs.opml
<?xml version="1.0" encoding="UTF-8"?>
<opml version="1.0">
    <head>
        <title>Graphics, Games, Programming, and Physics Blogs</title>
    </head>
    <body>
        <outline text="Tech News" title="Tech News">
            <outline type="rss" text="Ars Technica" title="Ars Technica" xmlUrl="http://feeds.arstechnica.com/arstechnica/index/" htmlUrl="https://arstechnica.com"/>
            <outline type="rss" text="Polygon - Full" title="Polygon - Full" xmlUrl="http://www.polygon.com/rss/index.xml" htmlUrl="https://www.polygon.com/"/>
            <outline type="rss" text="Road to VR" title="Road to VR" xmlUrl="http://www.roadtovr.com/feed" htmlUrl="https://www.roadtovr.com"/>

## book.py
#!/usr/bin/env python

import pdfkit


chapters = [
  'intro',
  'linear_algebra',
  'prob',
  'numerical',

## CATCH_Keras_RL.md

      
EderSantana / CATCH_Keras_RL.md (3 files, 63 forks, 27 comments, 153 stars; last active October 16, 2023 08:32)
Keras plays catch - a single file Reinforcement Learning example
    Code for Keras plays catch blog post
Train

python qlearn.py
Test


Generate figures


## truffle-material.md

      
smarr / truffle-material.md (1 file, 23 forks, 1 comment, 128 stars; last active March 16, 2023 14:06)
Truffle: Languages and Material
    Introductory Material


Add Graal JIT Compilation to Your JVM Language in 5 Steps, A Tutorial
http://stefan-marr.de/2015/11/add-graal-jit-compilation-to-your-jvm-language-in-5-easy-steps-step-1/


The SimpleLanguage, an example of using Truffle with great JavaDocs. It is the official getting-started project:
https://github.com/graalvm/simplelanguage


Truffle Tutorial, Christian Wimmer, PLDI 2016, 3h recording
https://youtu.be/FJY96_6Y3a4 Slides