Skip to content

Instantly share code, notes, and snippets.

View trishume's full-sized avatar

Tristan Hume trishume

View GitHub Profile
@ocornut
ocornut / imgui_node_graph_test.cpp
Last active April 23, 2024 19:02
Node graph editor basic demo for ImGui
// Creating a node graph editor for Dear ImGui
// Quick sample, not production code!
// This is quick demo I crafted in a few hours in 2015 showcasing how to use Dear ImGui to create custom stuff,
// which ended up feeding a thread full of better experiments.
// See https://github.com/ocornut/imgui/issues/306 for details
// Fast forward to 2023, see e.g. https://github.com/ocornut/imgui/wiki/Useful-Extensions#node-editors
// Changelog
// - v0.05 (2023-03): fixed for renamed api: AddBezierCurve()->AddBezierCubic().
#pragma once
#pragma warning(push)
#pragma warning(disable:4996)
#include <cassert>
#include <memory>
#include <vector>
@fabiovila
fabiovila / mcpwm.c
Last active March 28, 2024 11:05
How sync ESP32 Timers MCPWM?
// CHANGE the clock prescale in mcpwm.c file to make possible get 30000hz of pwm frequency (15Khz in center aligned mode)
#define MCPWM_BASE_CLK (2 * APB_CLK_FREQ) //2*APB_CLK_FREQ 160Mhz
#define MCPWM_CLK_PRESCL 1 //MCPWM clock prescale Original = 15 <---------------------
#define TIMER_CLK_PRESCALE 1 //MCPWM timer prescales Original = 9 <---------------------
#define MCPWM_CLK (MCPWM_BASE_CLK/(MCPWM_CLK_PRESCL +1))
#define MCPWM_PIN_IGNORE (-1)
#define OFFSET_FOR_GPIO_IDX_1 6
#define OFFSET_FOR_GPIO_IDX_2 75
// x64 encoding
enum Reg {
RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
R8, R9, R10, R11, R12, R13, R14, R15,
};
enum XmmReg {
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15,
// This can grow a Robin Hood linear probing hash table near word-at-a-time memcpy speeds. If you're confused why I use 'keys'
// to describe the hash values, it's because my favorite perspective on Robin Hood (which I learned from Paul Khuong)
// is that it's just a sorted gap array which is MSB bucketed and insertion sorted per chain:
// https://pvk.ca/Blog/2019/09/29/a-couple-of-probabilistic-worst-case-bounds-for-robin-hood-linear-probing/
// The more widely known "max displacement" picture of Robin Hood hashing also has strengths since the max displacement
// can be stored very compactly. You can see a micro-optimized example of that here for small tables where the max displacement
// can fit in 4 bits: Sub-nanosecond Searches Using Vector Instructions, https://www.youtube.com/watch?v=paxIkKBzqBU
void grow(Table *table) {
u64 exp = 64 - table->shift;
// We grow the table downward in place by a factor of 2 (not counting the overflow area at table->end).
// Length-segregated string tables for length < 16. You use a separate overflow table for length >= 16.
// By segregating like this you can pack the string data in the table itself tightly without any padding. The datapath
// is uniform and efficient for all lengths < 16 by using unaligned 16-byte SIMD loads/compares and masking off the length prefix.
// One of the benefits of packing string data tightly for each length table is that you can afford to reduce the load factor
// on shorter length tables without hurting space utilization too much. This can push hole-in-one rates into the 95% range without
// too much of a negative impact on cache utilization.
// Since get() takes a vector register as an argument with the key, you want to shape the upstream code so the string to be queried
// is naturally in a vector. For example, in an optimized identifier lexer you should already have a SIMD fast path for length < 16
// Heavily based on ideas from https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_opt_fold.c
// The most fundamental deviation is that I eschew the big hash table and the lj_opt_fold()
// trampoline for direct tail calls. The biggest problem with a trampoline is that you lose
// the control flow context. Another problem is that there's too much short-term round-tripping
// of data through memory. It's also easier to do ad-hoc sharing between rules with my approach.
// From what I can tell, it also isn't possible to do general reassociation with LJ's fold engine
// since that requires non-tail recursion, so LJ does cases like (x + n1) + n2 => x + (n1 + n2)
// but not (x + n1) + (y + n2) => x + (y + (n1 + n2)) which is common in address generation. The
// code below has some not-so-obvious micro-optimizations for register passing and calling conventions,
// e.g. the unary_cse/binary_cse parameter order, the use of long fields in ValueRef.