Yasser Arguelles Snape RealNeGate

  • Washington, USA

For simplicity, this will just go over the unsigned case.
Written for someone, such as myself, who isn't as quick to put together the same logical steps.

Given $y \in \mathbb{Z}$ where $y > 0$
Find $a, sh \in \mathbb{Z}$ such that, for every unsigned $N$-bit $x$ (that is, $x \in \mathbb{Z}$ with $0 \le x < 2^N$), $\lfloor \frac{x}{y} \rfloor = \lfloor x \cdot a / 2^{sh} \rfloor$
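To make the goal concrete, here is a minimal sketch in C that restricts $x$ to 16-bit unsigned values so that everything fits in 64-bit arithmetic. It uses the common round-up construction ($sh = 16 + \lceil \log_2 y \rceil$, $a = \lceil 2^{sh} / y \rceil$), which is an assumption on my part and may not be the exact pair this note goes on to derive. The names `Magic16`, `magic16_for`, and `magic16_div` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// A multiplier/shift pair (a, sh) standing in for division by a fixed y.
typedef struct {
    uint64_t a;
    unsigned sh;
} Magic16;

// Smallest l such that 2^l >= y.
static unsigned ceil_log2_u16(uint16_t y) {
    unsigned l = 0;
    while (((uint32_t)1 << l) < y) l++;
    return l;
}

// Assumed construction: sh = 16 + ceil(log2(y)), a = ceil(2^sh / y).
Magic16 magic16_for(uint16_t y) {
    assert(y > 0);
    Magic16 m;
    m.sh = 16 + ceil_log2_u16(y);
    m.a = (((uint64_t)1 << m.sh) + y - 1) / y;
    return m;
}

// floor(x * a / 2^sh); x * a stays below 2^33, so uint64_t cannot overflow.
uint16_t magic16_div(uint16_t x, Magic16 m) {
    return (uint16_t)(((uint64_t)x * m.a) >> m.sh);
}
```

For example, `magic16_for(3)` yields `a = 87382` and `sh = 18`, and `magic16_div(65535, m)` then returns 21845, the same as `65535 / 3`.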
To motivate what we'll be doing, let's start by considering:

$$ \begin{align*}

Code generation

ABI Switching

Optimizers

Ramblings about code optimization.

TODO

mmozeiko / !README.md
Last active July 8, 2024 18:25
Download MSVC compiler/linker & Windows SDK without installing full Visual Studio

This downloads the standalone MSVC compiler, linker, and other tools, plus headers and libraries from the Windows SDK, into a portable folder without installing Visual Studio. It has only the bare minimum components: no UWP/Store/WindowsRT stuff, just files and tools for native desktop app development.

Run py.exe portable-msvc.py and it will download its output into the msvc folder. By default it downloads the latest available MSVC and Windows SDK versions, currently v14.40.33807 and v10.0.26100.0.

You can list available versions with py.exe portable-msvc.py --show-versions and then pass the versions you want with the --msvc-version and --sdk-version arguments.

To use cl.exe/link.exe, first run setup_TARGET.bat. After that, the PATH/INCLUDE/LIB env variables are updated so you can use all the tools as usual. You can also use clang-cl.exe with these includes and libraries.

To use clang-cl.exe without running setup_TARGET.bat, pass the extra /winsysroot msvc argument (msvc is the name of the folder where the output is stored).

// I tried not doing anything too non-portable so it should be possible to run
// this on Mac or Linux... probably... even then, you can't use the obj files there
//
// once you have the obj file you should be able to do:
//    link YOUROBJ.obj /defaultlib:libcmt
//                     ^^^^^^^^^^^^^^^^^^
//                     linking against crt
#define _CRT_SECURE_NO_WARNINGS
#include <stdint.h>
#include <stdlib.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>
static const char keywords[][16] = {
"auto",
"break",
"case",
// Example: Opcode dispatch in a bytecode VM. Assume the opcode case dispatching is mispredict heavy,
// and that pc, ins, next_ins, next_opcase are always in registers.
#define a ((ins >> 8) & 0xFF)
#define b ((ins >> 16) & 0xFF)
#define c ((ins >> 24) & 0xFF)
// Version 1: Synchronous instruction fetch and opcode dispatch. The big bottleneck is that given how light
// the essential work is for each opcode case (e.g. something like ADD is typical), you're dominated
// by the cost of the opcode dispatch branch mispredicts. When there's a mispredict, the pipeline restarts
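As a rough illustration of the Version 1 structure described above (fetch an instruction, then immediately branch on its opcode), here is a minimal sketch in C. The opcode names, the `vm_encode` and `vm_run` helpers, and the placement of the opcode in the low 8 bits are all assumptions for illustration; only the `a`, `b`, `c` field positions come from the macros above.

```c
#include <stdint.h>

// Hypothetical opcodes; the gist's real instruction set is not shown.
enum { VM_LOADI, VM_ADD, VM_HALT };

// Assumed layout: opcode in bits 0..7, then a, b, c as in the macros above.
static inline uint32_t vm_encode(uint32_t op, uint32_t a, uint32_t b, uint32_t c) {
    return (op & 0xFF) | ((a & 0xFF) << 8) | ((b & 0xFF) << 16) | ((c & 0xFF) << 24);
}

// Synchronous fetch and dispatch. regs must have room for 256 entries,
// since each operand field is 8 bits wide.
int64_t vm_run(const uint32_t *pc, int64_t *regs) {
    for (;;) {
        uint32_t ins = *pc++;            // fetch the next instruction
        uint32_t a = (ins >> 8) & 0xFF;
        uint32_t b = (ins >> 16) & 0xFF;
        uint32_t c = (ins >> 24) & 0xFF;
        switch (ins & 0xFF) {            // dispatch: the hard-to-predict branch
        case VM_LOADI: regs[a] = (int64_t)b;        break;
        case VM_ADD:   regs[a] = regs[b] + regs[c]; break;
        case VM_HALT:  return regs[a];
        default:       return -1;        // unknown opcode
        }
    }
}
```

Each case does very little work, so nothing useful can overlap with the dispatch branch: the next opcode is only known once the fetch at the top of the loop completes.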
mmozeiko / win32_d3d11.c
Last active May 17, 2024 08:42
setting up and using D3D11 in C
// example how to set up D3D11 rendering on Windows in C
#define COBJMACROS
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include <d3d11.h>
#include <dxgi1_3.h>
#include <d3dcompiler.h>
#include <dxgidebug.h>
// Heavily based on ideas from https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_opt_fold.c
// The most fundamental deviation is that I eschew the big hash table and the lj_opt_fold()
// trampoline for direct tail calls. The biggest problem with a trampoline is that you lose
// the control flow context. Another problem is that there's too much short-term round-tripping
// of data through memory. It's also easier to do ad-hoc sharing between rules with my approach.
// From what I can tell, it also isn't possible to do general reassociation with LJ's fold engine
// since that requires non-tail recursion, so LJ does cases like (x + n1) + n2 => x + (n1 + n2)
// but not (x + n1) + (y + n2) => x + (y + (n1 + n2)) which is common in address generation. The
// code below has some not-so-obvious micro-optimizations for register passing and calling conventions,
// e.g. the unary_cse/binary_cse parameter order, the use of long fields in ValueRef.
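The two reassociation rewrites named above can be sketched on a toy expression tree. This is only an illustration of the rewrites themselves, not of the gist's engine: there is no CSE, no tail-call dispatch, and every name here (`Ex`, `ExPool`, `ex_add`, and so on) is hypothetical. Constants use wrapping `uint64_t` arithmetic to keep the sketch free of signed overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { EX_VAR, EX_CONST, EX_ADD } ExKind;

typedef struct Ex {
    ExKind kind;
    uint64_t value;             // variable index (EX_VAR) or constant (EX_CONST)
    const struct Ex *lhs, *rhs; // operands (EX_ADD only)
} Ex;

typedef struct { Ex nodes[64]; size_t count; } ExPool;

static const Ex *ex_new(ExPool *p, ExKind k, uint64_t v, const Ex *l, const Ex *r) {
    assert(p->count < sizeof p->nodes / sizeof p->nodes[0]);
    Ex *e = &p->nodes[p->count++];
    e->kind = k; e->value = v; e->lhs = l; e->rhs = r;
    return e;
}

const Ex *ex_var(ExPool *p, uint64_t index) { return ex_new(p, EX_VAR, index, NULL, NULL); }
const Ex *ex_const(ExPool *p, uint64_t v)   { return ex_new(p, EX_CONST, v, NULL, NULL); }

// True for nodes of the form (x + n) with n constant.
static int is_add_const(const Ex *e) {
    return e->kind == EX_ADD && e->rhs->kind == EX_CONST;
}

// Builds l + r, folding and reassociating constants toward the right.
const Ex *ex_add(ExPool *p, const Ex *l, const Ex *r) {
    if (l->kind == EX_CONST && r->kind == EX_CONST)
        return ex_const(p, l->value + r->value);                 // n1 + n2
    if (l->kind == EX_CONST) { const Ex *t = l; l = r; r = t; }  // n + x => x + n
    if (is_add_const(l) && r->kind == EX_CONST)                  // (x + n1) + n2
        return ex_add(p, l->lhs, ex_const(p, l->rhs->value + r->value));
    if (is_add_const(l) && is_add_const(r)) {                    // (x + n1) + (y + n2)
        const Ex *n = ex_const(p, l->rhs->value + r->rhs->value);
        // The inner sum must be built before the outer node exists,
        // so this call cannot be a tail call.
        return ex_new(p, EX_ADD, 0, l->lhs, ex_add(p, r->lhs, n));
    }
    return ex_new(p, EX_ADD, 0, l, r);
}

// Evaluates an expression given the variable values.
uint64_t ex_eval(const Ex *e, const uint64_t *vars) {
    switch (e->kind) {
    case EX_VAR:   return vars[e->value];
    case EX_CONST: return e->value;
    default:       return ex_eval(e->lhs, vars) + ex_eval(e->rhs, vars);
    }
}
```

Building `(x + 3) + (y + 4)` through `ex_add` produces the tree `x + (y + 7)`, which is the shape that lets an address-generation pass fold the combined constant into a displacement.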
mmozeiko / astar.h
Last active December 29, 2023 10:18
generic A* in C
// generic A* pathfinding
//
// INTERFACE
//
// mandatory macros
#ifndef ASTAR_POS_TYPE
#error ASTAR_POS_TYPE should specify position type
bnnm / lz4.c
Created March 7, 2020 00:14
LZ4 from XNB decompressor
// Decompresses LZ4 found in XNB (just a test tool for vgmstream).
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
/* Decompresses LZ4 from MonoGame. The original C lib has a lot of modes and configs, but
* MonoGame only uses the core 'block' part, which is a fairly simple LZ77 (has one command
* to copy literal and window values, with variable copy lengths).
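For reference, the single command described above (a literal run followed by a window copy, each with an extensible length) can be sketched as a decoder for the standard LZ4 block format. This is a minimal illustration with hypothetical names (`lz4_read_ext`, `lz4_block_decode`), not the gist's code, and it assumes the input is a bare block with any XNB or MonoGame framing already stripped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Extends *len with 255-valued continuation bytes. Returns 0 on truncated input.
static int lz4_read_ext(const uint8_t *src, size_t src_len, size_t *ip, size_t *len) {
    uint8_t b;
    do {
        if (*ip >= src_len) return 0;
        b = src[(*ip)++];
        *len += b;
    } while (b == 255);
    return 1;
}

// Decodes one LZ4 block. Returns the number of bytes written, or -1 if the
// input is malformed or the output buffer is too small.
long lz4_block_decode(const uint8_t *src, size_t src_len, uint8_t *dst, size_t dst_cap) {
    size_t ip = 0, op = 0;
    while (ip < src_len) {
        uint8_t token = src[ip++];

        // Literal run: length in the token's high nibble, 15 means "extended".
        size_t lit = token >> 4;
        if (lit == 15 && !lz4_read_ext(src, src_len, &ip, &lit)) return -1;
        if (lit > src_len - ip || lit > dst_cap - op) return -1;
        memcpy(dst + op, src + ip, lit);
        ip += lit;
        op += lit;
        if (ip == src_len) break; // the final sequence carries literals only

        // Window copy: 2-byte little-endian offset back into the output.
        if (src_len - ip < 2) return -1;
        size_t offset = (size_t)src[ip] | ((size_t)src[ip + 1] << 8);
        ip += 2;
        if (offset == 0 || offset > op) return -1;

        // Match length: low nibble plus the minimum match length of 4.
        size_t len = token & 0x0F;
        if (len == 15 && !lz4_read_ext(src, src_len, &ip, &len)) return -1;
        len += 4;
        if (len > dst_cap - op) return -1;

        // Copy byte by byte: source and destination may overlap (offset < len).
        for (size_t i = 0; i < len; i++, op++) dst[op] = dst[op - offset];
    }
    return (long)op;
}
```

The byte-by-byte window copy is deliberate: with an offset of 1, the copy keeps re-reading bytes it has just written, which is how LZ4 encodes runs of a repeated byte.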