Ash Vardanian ashvardanian

@animetosho
animetosho / gf2p8affineqb-articles.md
Last active February 2, 2024 11:53
A list of articles documenting uses of the GF2P8AFFINE instruction

Unexpected Uses for the Galois Field Affine Transformation Instruction

Intel added the Galois Field New Instructions (GFNI) extension to their Sunny Cove and Tremont cores. What’s particularly interesting is that GFNI is the only new SIMD extension that came with SSE and VEX/AVX encodings (in addition to EVEX/AVX512), which allows it to be supported on all future Intel cores, including those which don’t support AVX512 (such as the Atom line, as well as Celeron/Pentium branded “big” cores).

I suspect GFNI was aimed at accelerating SM4 encryption; however, one of the instructions can be used for many other purposes. The extension includes three instructions, but of particular interest here is the Affine Transformation instruction (GF2P8AFFINEQB), also known as bit-matrix multiply.
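To make the "bit-matrix multiply" description concrete, here is a minimal scalar sketch in C of what the instruction computes for a single byte, based on my reading of Intel's published pseudocode. The names `parity8` and `gf2_affine_byte` are hypothetical helpers, not part of any library. The real instruction applies the same operation to every byte of a vector, using one 8x8 bit matrix per 64-bit lane.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of a byte: 1 if an odd number of bits are set. */
static unsigned parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

/*
 * Scalar emulation of GF2P8AFFINEQB for one byte (hypothetical helper).
 * matrix: 8x8 bit matrix packed into 64 bits, one row per byte.
 * x:      input byte, treated as a vector of 8 bits.
 * imm:    constant XORed into the result (the "affine" part).
 * Output bit i is parity(row & x) XOR imm bit i, where the row for
 * output bit i is byte (7 - i) of the matrix.
 */
static uint8_t gf2_affine_byte(uint64_t matrix, uint8_t x, uint8_t imm)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));
        unsigned bit = parity8((uint8_t)(row & x)) ^ ((imm >> i) & 1u);
        out |= (uint8_t)(bit << i);
    }
    return out;
}
```

Under this row ordering, the matrix `0x0102040810204080` acts as the identity and `0x8040201008040201` reverses the bits within each byte, which is one of the commonly cited non-cryptographic uses.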

There have been various articles which discuss out-of-band

@resilar
resilar / ctz.c
Created June 15, 2016 13:59
de Bruijn CTZ with proper handling of 0
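// The well-known de Bruijn trick to count trailing zeros.
// Note: this naive version returns 0 for x == 0, the same result as for x == 1.
#include <stdint.h>

static inline int ctz32_naive(uint32_t x)
{
    static const unsigned char debruijn_ctz32[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    // (x & -x) isolates the lowest set bit; multiplying by the de Bruijn
    // constant places a unique 5-bit table index in the top bits.
    return debruijn_ctz32[((x & -x) * 0x077CB531u) >> 27];
}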
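The lookup above cannot distinguish x == 0 from x == 1, since both map to table index 0. The rest of the gist is not shown here, so the following is only a minimal sketch of one straightforward way to handle zero, not necessarily the gist author's approach. The name `ctz32_checked` and the choice to return 32 for a zero input are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical zero-safe variant: returns 32 when x == 0 (an assumed
 * convention, meaning "all 32 bits are trailing zeros"), otherwise the
 * index of the lowest set bit via the same de Bruijn lookup.
 */
static inline int ctz32_checked(uint32_t x)
{
    static const unsigned char debruijn_ctz32[32] = {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };
    if (x == 0)
        return 32; /* explicit branch for the input the table cannot encode */
    return debruijn_ctz32[((x & -x) * 0x077CB531u) >> 27];
}
```

The explicit branch costs a comparison, but it keeps the table unchanged and makes the zero case obvious to readers.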
@iandees
iandees / dlib_plus_osm.md
Last active May 30, 2018 19:07
Detecting Road Signs in Mapillary Images with dlib C++

I've been interested in computer vision for a long time, but I didn't have the free time to make any progress until this holiday season. Over Christmas and New Year's I experimented with various methodologies in OpenCV to detect road signs and other objects of interest to OpenStreetMap. After some failed experiments with thresholding and feature detection, the excellent /r/computervision suggested using the dlib C++ library because it has more consistently good documentation and its pre-built tools are faster.

After a day or two figuring out how to compile the examples, I finally made some progress:

Compiling dlib C++ on a Mac with Homebrew

  1. Clone dlib from GitHub to your local machine:
@markusl
markusl / IpToCountrySlow.cpp
Created July 16, 2011 15:32
C++ class to map IP addresses to countries using database from http://software77.net/geo-ip/
#include <string>
#include <fstream>
#include <vector>
#include <sstream>
#include <algorithm>
#include <stdexcept>
std::vector<std::string> &split(const std::string &s, char delim, std::vector<std::string> &elems) {
std::stringstream ss(s);
std::string item;