Skip to content

Instantly share code, notes, and snippets.

View mmozeiko's full-sized avatar

Mārtiņš Možeiko mmozeiko

View GitHub Profile
@zingaburga
zingaburga / sve2.md
Last active April 10, 2024 06:21
ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to their instruction set, which was announced back in 2016. A follow-up SVE2 extension was announced in 2019, designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).

Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#define BREAK asm("int3")
#else
#error Implement macros for your CPU.
#endif
@christianparpart
christianparpart / terminal-synchronized-output.md
Last active April 5, 2024 23:07
Terminal Spec: Synchronized Output

Synchronized Output

Synchronized output is merely implementing the feature as inspired by iTerm2 synchronized output, except that it's not using the rare DCS but rather the well known SM ? and RM ?. iTerm2 has now also adopted to use the new syntax instead of using DCS.

Semantics

When rendering the screen of the terminal, the Emulator usually iterates through each visible grid cell and renders its current state. With applications updating the screen a at higher frequency this can cause tearing.

This mode attempts to mitigate that.

@dondragmer
dondragmer / PrefixSort.compute
Created January 20, 2021 23:32
An optimized GPU counting sort
#pragma use_dxc //enable SM 6.0 features, in Unity this is only supported on version 2020.2.0a8 or later with D3D12 enabled
#pragma kernel CountTotalsInBlock
#pragma kernel BlockCountPostfixSum
#pragma kernel CalculateOffsetsForEachKey
#pragma kernel FinalSort
uint _FirstBitToSort;
int _NumElements;
int _NumBlocks;
bool _ShouldSortPayload;
@animetosho
animetosho / gf2p8affineqb-articles.md
Last active February 2, 2024 11:53
A list of articles documenting uses of the GF2P8AFFINE instruction

Unexpected Uses for the Galois Field Affine Transformation Instruction

Intel added the Galois Field instruction set (GFNI) extensions to their Sunny Cove and Tremont cores. What’s particularly interesting is that GFNI is the only new SIMD extension that came with SSE and VEX/AVX encodings (in addition to EVEX/AVX512), to allow it to be supported on all future Intel cores, including those which don’t support AVX512 (such as the Atom line, as well as Celeron/Pentium branded “big” cores).

I suspect GFNI was aimed at accelerating SM4 encryption, however, one of the instructions can be used for many other purposes. The extension includes three instructions, but of particular interest here is the Affine Transformation (GF2P8AFFINEQB), aka bit-matrix multiply, instruction.

There have been various articles which discuss out-of-band

Perfect Quantization of DXT endpoints
-------------------------------------
One of the issues that affect the quality of most DXT compressors is the way floating point colors are rounded.
For example, stb_dxt does:
max16 = (unsigned short)(stb__sclamp((At1_r*yy - At2_r*xy)*frb+0.5f,0,31) << 11);
max16 |= (unsigned short)(stb__sclamp((At1_g*yy - At2_g*xy)*fg +0.5f,0,63) << 5);
max16 |= (unsigned short)(stb__sclamp((At1_b*yy - At2_b*xy)*frb+0.5f,0,31) << 0);
@JarkkoPFC
JarkkoPFC / sphere_screen_extents.h
Last active June 13, 2023 21:27
Calculates view space 3D sphere extents on the screen
struct vec3f {float x, y, z;};
struct vec4f {float x, y, z, w;};
struct mat44f {vec4f x, y, z, w;};
//============================================================================
// sphere_screen_extents
//============================================================================
// Calculates the exact screen extents xyzw=[left, bottom, right, top] in
// normalized screen coordinates [-1, 1] for a sphere in view space. For
// performance, the projection matrix (v2p) is assumed to be setup so that
@vurtun
vurtun / x11_clipboard.c
Last active June 13, 2023 21:28
X11 clipboard
static char*
sys_clip_get(struct sys *s, Atom selection, Atom target)
{
assert(s);
struct sys_x11 *x11 = s->platform;
/* blocking wait for clipboard data */
XEvent notify;
XConvertSelection(x11->dpy, selection, target, selection, x11->helper, CurrentTime);
while (!XCheckTypedWindowEvent(x11->dpy, x11->helper, SelectionNotify, &notify)) {
#include <stdio.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include <assert.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
@vurtun
vurtun / _readme_quarks.md
Last active December 9, 2023 12:03
Quarks: Graphical user interface

gui