Tristan Hume trishume

## imgui_node_graph_test.cpp
// Creating a node graph editor for Dear ImGui
// Quick sample, not production code!
// This is a quick demo I crafted in a few hours in 2015 showcasing how to use Dear ImGui to create custom stuff,
// which ended up feeding a thread full of better experiments.
// See https://github.com/ocornut/imgui/issues/306 for details

// Fast forward to 2023, see e.g. https://github.com/ocornut/imgui/wiki/Useful-Extensions#node-editors

// Changelog
// - v0.05 (2023-03): fixed for renamed api: AddBezierCurve()->AddBezierCubic().

## Rust Optimization.md

      
jFransham / Rust Optimization.md
1 file, 76 forks, 11 comments, 897 stars. Last active April 22, 2024 14:25.

Achieving warp speed with Rust

Contents:

- Number one optimization tip: don't
- Never optimize blindly
- Don't bother optimizing one-time costs
- Improve your algorithms
- CPU architecture primer
- Keep as much as possible in cache
- Keep as much as possible in registers


## all-the-rust-blogs.md

      
brson / all-the-rust-blogs.md
1 file, 49 forks, 0 comments, 350 stars. Last active April 20, 2024 11:35.

A collection of notable Rust blog posts

Edit: This list is now maintained in the rust-anthology repo.

### Introduction

- Understanding Over Guesswork
- An Alternative Introduction to Rust
- Rust and CSV Parsing

### Ownership

- Where Rust Really Shines
- Rust Means Never Having to Close a Socket
- The Problem with Single-threaded Shared Mutability


## anonymous
#pragma once

#pragma warning(push)
#pragma warning(disable:4996)

#include <cassert>
#include <memory>
#include <vector>


## mcpwm.c
// CHANGE the clock prescalers in the mcpwm.c file to make it possible to get a PWM frequency of 30000 Hz (15 kHz in center-aligned mode)

#define MCPWM_BASE_CLK (2 * APB_CLK_FREQ)   // 2 * APB_CLK_FREQ = 160 MHz
#define MCPWM_CLK_PRESCL 1       // MCPWM clock prescale, original = 15    <---------------------
#define TIMER_CLK_PRESCALE 1      // MCPWM timer prescale, original = 9    <---------------------
#define MCPWM_CLK (MCPWM_BASE_CLK / (MCPWM_CLK_PRESCL + 1))
#define MCPWM_PIN_IGNORE    (-1)
#define OFFSET_FOR_GPIO_IDX_1  6
#define OFFSET_FOR_GPIO_IDX_2 75
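
The effect of the two prescaler changes can be checked with a little arithmetic. The sketch below is not part of mcpwm.c. It assumes a 160 MHz base clock (as the comment above states) and that both prescalers divide by (value + 1). The `MCPWM_CLK` macro confirms that for the first stage, but it is an assumption for the timer stage. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base clock: 2 * APB_CLK_FREQ = 160 MHz, per the snippet's comment. */
#define BASE_CLK_HZ 160000000u

/* Timer tick rate after both prescalers.
   Assumption: each stage divides by (prescale value + 1). */
static uint32_t timer_tick_hz(uint32_t clk_prescale, uint32_t timer_prescale) {
    return BASE_CLK_HZ / (clk_prescale + 1) / (timer_prescale + 1);
}

/* Whole timer ticks available in one PWM period (rounded down). */
static uint32_t ticks_per_period(uint32_t tick_hz, uint32_t pwm_hz) {
    assert(pwm_hz > 0);
    return tick_hz / pwm_hz;
}
```

Under these assumptions the original values (15 and 9) give a 1 MHz tick, so a 30000 Hz period is only 33 ticks long and cannot be hit exactly. With both prescalers set to 1 the tick is 40 MHz and the same period spans 1333 ticks, which gives far finer duty-cycle steps.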

## asm_x64.c
// x64 encoding

enum Reg {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8,  R9,  R10, R11, R12, R13, R14, R15,
};

enum XmmReg {
    XMM0, XMM1, XMM2,  XMM3,  XMM4,  XMM5,  XMM6,  XMM7,
    XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15,
};
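
The order of the `Reg` enumerators matches the hardware register numbers, so an enum value can be dropped straight into an instruction encoding. As a minimal sketch (the `emit_mov_rr` helper is hypothetical and not taken from asm_x64.c), here is a register-to-register 64-bit `mov` using opcode 0x89: the low three bits of each register go into ModRM and the fourth bit goes into the REX prefix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum Reg {
    RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
    R8,  R9,  R10, R11, R12, R13, R14, R15,
};

/* Hypothetical helper: encode `mov dst, src` (MOV r/m64, r64) into out.
   Returns the number of bytes written, which is always 3. */
static size_t emit_mov_rr(uint8_t *out, enum Reg dst, enum Reg src) {
    assert(dst <= R15 && src <= R15);
    /* REX = 0100WRXB: W=1 selects 64-bit operands, R extends ModRM.reg,
       B extends ModRM.rm. */
    uint8_t rex = (uint8_t)(0x48 | ((src >> 3) << 2) | (dst >> 3));
    /* ModRM: mod=11 (register direct), reg=src, rm=dst. */
    uint8_t modrm = (uint8_t)(0xC0 | ((src & 7) << 3) | (dst & 7));
    out[0] = rex;
    out[1] = 0x89;
    out[2] = modrm;
    return 3;
}
```

For example, `emit_mov_rr(buf, RAX, RCX)` produces the bytes 48 89 C8, and `emit_mov_rr(buf, R8, R15)` produces 4D 89 F8.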

## rh_grow.c
// This can grow a Robin Hood linear probing hash table near word-at-a-time memcpy speeds. If you're confused why I use 'keys'
// to describe the hash values, it's because my favorite perspective on Robin Hood (which I learned from Paul Khuong)
// is that it's just a sorted gap array which is MSB bucketed and insertion sorted per chain:
// https://pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/
// The more widely known "max displacement" picture of Robin Hood hashing also has strengths since the max displacement
// can be stored very compactly. You can see a micro-optimized example of that here for small tables where the max displacement
// can fit in 4 bits: Sub-nanosecond Searches Using Vector Instructions, https://www.youtube.com/watch?v=paxIkKBzqBU
void grow(Table *table) {
	u64 exp = 64 - table->shift;
	// We grow the table downward in place by a factor of 2 (not counting the overflow area at table->end).
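
The "sorted gap array" view described in the comment above can be illustrated with a plain insert routine. This is a sketch under stated assumptions, not the gist's `grow()` and not its `Table` layout: the flat `slots` array, the use of 0 as the empty marker, and the names `rh_insert` and `rh_is_sorted` are all hypothetical. The home slot comes from the key's most significant bits, each chain is kept in insertion-sorted order, and entries spill into an overflow area instead of wrapping around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t u64;

/* Hypothetical layout: `cap` slots, where the first (1 << (64 - shift)) are
   home buckets and the remainder is overflow space. 0 marks an empty slot. */
static int rh_insert(u64 *slots, size_t cap, unsigned shift, u64 key) {
    assert(key != 0 && shift > 0 && shift < 64);
    size_t pos = (size_t)(key >> shift);   /* MSB bucketing: home slot */
    while (pos < cap && slots[pos] != 0 && slots[pos] <= key)
        pos++;                             /* skip entries that sort before key */
    size_t gap = pos;
    while (gap < cap && slots[gap] != 0)
        gap++;                             /* find the next empty slot */
    if (gap == cap)
        return 0;                          /* overflow area exhausted */
    /* Shift the rest of the chain right by one slot and place the key. */
    memmove(&slots[pos + 1], &slots[pos], (gap - pos) * sizeof slots[0]);
    slots[pos] = key;
    return 1;
}

/* True when the non-empty slots, read left to right, are in ascending order. */
static int rh_is_sorted(const u64 *slots, size_t cap) {
    u64 prev = 0;
    for (size_t i = 0; i < cap; i++) {
        if (slots[i] == 0) continue;
        if (slots[i] < prev) return 0;
        prev = slots[i];
    }
    return 1;
}
```

Because the home slot is taken from the top bits, a key with a smaller home is also a smaller key, which is why the whole array stays sorted once the gaps are ignored.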

## segregated_tables.c
// Length-segregated string tables for length < 16. You use a separate overflow table for length >= 16.
// By segregating like this you can pack the string data in the table itself tightly without any padding. The datapath
// is uniform and efficient for all lengths < 16 by using unaligned 16-byte SIMD loads/compares and masking off the length prefix.

// One of the benefits of packing string data tightly for each length table is that you can afford to reduce the load factor
// on shorter length tables without hurting space utilization too much. This can push hole-in-one rates into the 95% range without
// too much of a negative impact on cache utilization.

// Since get() takes a vector register as an argument with the key, you want to shape the upstream code so the string to be queried
// is naturally in a vector. For example, in an optimized identifier lexer you should already have a SIMD fast path for length < 16

## perfold.c
// Heavily based on ideas from https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_opt_fold.c
// The most fundamental deviation is that I eschew the big hash table and the lj_opt_fold()
// trampoline for direct tail calls. The biggest problem with a trampoline is that you lose
// the control flow context. Another problem is that there's too much short-term round-tripping
// of data through memory. It's also easier to do ad-hoc sharing between rules with my approach.
// From what I can tell, it also isn't possible to do general reassociation with LJ's fold engine
// since that requires non-tail recursion, so LJ does cases like (x + n1) + n2 => x + (n1 + n2)
// but not (x + n1) + (y + n2) => x + (y + (n1 + n2)) which is common in address generation. The
// code below has some not-so-obvious micro-optimizations for register passing and calling conventions,
// e.g. the unary_cse/binary_cse parameter order, the use of long fields in ValueRef.
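
The two reassociation cases mentioned above can be sketched with a toy folder for additions. This is neither LuaJIT's fold engine nor the code from perfold.c: the `Node` type, the fixed-size pool and all function names are hypothetical, and there is no CSE. The point is only that the second rule needs the result of an inner fold before it can build the outer node, which is the non-tail recursion the comment refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { K_VAR, K_CONST, K_ADD } Kind;

typedef struct Node {
    Kind kind;
    uint64_t value;              /* variable id or constant (wraps on overflow) */
    const struct Node *lhs, *rhs;
} Node;

#define POOL_SIZE 256
static Node pool[POOL_SIZE];
static size_t pool_used;

static const Node *emit(Kind kind, uint64_t value, const Node *lhs, const Node *rhs) {
    assert(pool_used < POOL_SIZE);
    Node *n = &pool[pool_used++];
    n->kind = kind; n->value = value; n->lhs = lhs; n->rhs = rhs;
    return n;
}

static const Node *var(uint64_t id)  { return emit(K_VAR, id, NULL, NULL); }
static const Node *konst(uint64_t v) { return emit(K_CONST, v, NULL, NULL); }

/* True for nodes of the shape (x + n) with a constant on the right. */
static int is_add_const(const Node *n) {
    return n->kind == K_ADD && n->rhs->kind == K_CONST;
}

static const Node *fold_add(const Node *a, const Node *b) {
    /* n1 + n2 => (n1 + n2) */
    if (a->kind == K_CONST && b->kind == K_CONST)
        return konst(a->value + b->value);
    /* (x + n1) + n2 => x + (n1 + n2): a tail call is enough here. */
    if (is_add_const(a) && b->kind == K_CONST)
        return fold_add(a->lhs, konst(a->rhs->value + b->value));
    /* (x + n1) + (y + n2) => x + (y + (n1 + n2)): the inner fold must
       return before the outer node can be built (non-tail recursion). */
    if (is_add_const(a) && is_add_const(b)) {
        const Node *inner = fold_add(b->lhs, konst(a->rhs->value + b->rhs->value));
        return fold_add(a->lhs, inner);
    }
    return emit(K_ADD, 0, a, b);
}
```

Folding `(x + 4) + (y + 8)` with this sketch yields `x + (y + 12)`, and folding `(x + 1) + 2` yields `x + 3`.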

## bluenoise.md

      
pixelmager / bluenoise.md
1 file, 10 forks, 1 comment, 107 stars. Last active October 11, 2023 07:05.

Blue Noise links
### Use cases

- Bluenoise in the game INSIDE (dithering, raymarching, reflections)
- Dithering, Ray marching, shadows etc
- A Survey of Blue Noise and Its Applications

### Textures/Matrices for direct use (data!)

- Moments In Graphics (void-and-cluster)
  - 2D
  - 3D and 4D
- Bart Wronski Implementation of Solid Angle algorithm