- 2011 - A trip through the Graphics Pipeline 2011
- 2013 - Performance Optimization Guidelines and the GPU Architecture behind them
- 2015 - Life of a triangle - NVIDIA's logical pipeline
- 2015 - Render Hell 2.0
- 2016 - How bad are small triangles on GPU and why?
- 2017 - GPU Performance for Game Artists
- 2019 - Understanding the anatomy of GPUs using Pokémon
Why doesn't radfft support AVX on PC?
So there are two separate issues here: using instructions added in AVX and using 256-bit wide vectors. The former turns out to be much easier than the latter for our use case.
Problem number 1 was that you positively need to put AVX code in a separate file with different compiler settings (/arch:AVX for VC++, -mavx for GCC/Clang), which make all SSE code emitted in that file use VEX encoding as well. At the time radfft was written, CDep had no way to set compiler flags for just one file, only for the overall build.
[There's the GCC "target" annotations on individual funcs, which in principle fix this, but I ran into nasty problems with this for several compiler versions, and VC++ has no equivalent, so we're not currently using that and just sticking with different compilation units.]
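For reference, here is a minimal sketch of what the per-function "target" route looks like on GCC/Clang. This is not radfft code: the names `scale`, `scale_avx` and `scale_scalar` are made up for illustration, and the runtime check via `__builtin_cpu_supports` is an assumption about how one might dispatch. VC++ has no equivalent attribute, so the AVX path is compiled out there.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX_PATH 1
#endif

// Scalar fallback, built with the translation unit's default flags.
static void scale_scalar(float* data, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) data[i] *= k;
}

#ifdef SKETCH_HAVE_AVX_PATH
// Only this function is compiled as if -mavx were set; the rest of the
// file keeps the default instruction set.
__attribute__((target("avx")))
static void scale_avx(float* data, std::size_t n, float k) {
    const __m256 vk = _mm256_set1_ps(k);
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(data + i);
        _mm256_storeu_ps(data + i, _mm256_mul_ps(v, vk));
    }
    for (; i < n; ++i) data[i] *= k;  // leftover elements
}
#endif

// Hypothetical entry point: pick the AVX version only if the CPU reports it.
void scale(float* data, std::size_t n, float k) {
    assert(data != nullptr || n == 0);
#ifdef SKETCH_HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        scale_avx(data, n, k);
        return;
    }
#endif
    scale_scalar(data, n, k);
}
```

The separate-compilation-unit approach described above does the same dispatch, except that `scale_avx` would live in its own file built with /arch:AVX or -mavx.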
The other issue is to do with CPU power management.
#pragma once
#include <typestring/typestring.hh>
namespace rfl
{
    // Compile-time data member description
// TransientFunction: A light-weight alternative to std::function [C++11]
// Pass any callback - including capturing lambdas - cheaply and quickly as a
// function argument
//
// Based on:
// https://deplinenoise.wordpress.com/2014/02/23/using-c11-capturing-lambdas-w-vanilla-c-api-functions/
//
// - No instantiation of called function at each call site
// - Simple to use - use TransientFunction<> as the function argument
// - Low cost: cheap setup, one indirect function call to invoke
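The idea in the comment above can be sketched roughly as follows. This is not the gist's actual implementation: `TransientFunctionSketch` and `call_twice` are hypothetical names, and the sketch only handles callable objects such as lambdas (passing a plain function by name will not compile). It stores a pointer to the caller's callable plus one function pointer, so invoking it costs a single indirect call and nothing is copied or allocated.

```cpp
#include <type_traits>
#include <utility>

template <typename> class TransientFunctionSketch;

template <typename R, typename... Args>
class TransientFunctionSketch<R(Args...)> {
    using Thunk = R (*)(void*, Args...);

    void* target_;  // points at the caller's callable; not owned
    Thunk thunk_;   // knows the callable's real type

    template <typename F>
    static R invoke(void* target, Args... args) {
        return (*static_cast<F*>(target))(std::forward<Args>(args)...);
    }

public:
    // Accept any callable object except another TransientFunctionSketch,
    // so ordinary copy construction still works.
    template <typename F,
              typename = typename std::enable_if<!std::is_same<
                  typename std::decay<F>::type, TransientFunctionSketch>::value>::type>
    TransientFunctionSketch(F&& f)
        : target_(const_cast<void*>(static_cast<const void*>(&f))),
          thunk_(&invoke<typename std::remove_reference<F>::type>) {}

    R operator()(Args... args) const {
        return thunk_(target_, std::forward<Args>(args)...);
    }
};

// Hypothetical consumer: takes the callback by value as a function argument.
int call_twice(TransientFunctionSketch<int(int)> fn, int x) {
    fn(x);
    return fn(x);
}
```

As the name suggests, an object like this must not outlive the callable it points at, so it is only safe as a function argument and not as a stored member.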
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae
// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
float4 SampleTextureCatmullRom(in Texture2D<float4> tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
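The rounding step described in the comments, and the per-axis Catmull-Rom weights it feeds, can be sketched in one dimension as below. This is a C++ illustration, not the gist's HLSL: the function and struct names are hypothetical, the weights are the standard Catmull-Rom basis polynomials, and the sketch stops before the step that merges taps to get from 16 fetches down to 9.

```cpp
#include <cmath>

// Weights for the four texels along one axis of the 4x4 grid.
struct CatmullRomWeights { float w0, w1, w2, w3; };

// Center of the "starting" texel (grid location 1) for a sample position
// given in texel units, i.e. uv * texSize along one axis.
float start_texel_center(float samplePos) {
    return std::floor(samplePos - 0.5f) + 0.5f;
}

// Standard Catmull-Rom basis, where f in [0, 1) is the fractional offset
// of the sample from the starting texel center.
CatmullRomWeights catmull_rom_weights(float f) {
    CatmullRomWeights w;
    w.w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    w.w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    w.w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    w.w3 = f * f * (-0.5f + 0.5f * f);
    return w;
}
```

The four weights always sum to 1, and at `f = 0` all of the weight lands on the starting texel.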
This article has been updated and is available here.
#include <stdint.h> //for int8_t
#include <string.h> //for memcmp
#include <wmmintrin.h> //for intrinsics for AES-NI
//compile using gcc and following arguments: -g;-O0;-Wall;-msse2;-msse;-march=native;-maes
//internal stuff
//macros
#define DO_ENC_BLOCK(m,k) \
do{\
Note: this was written in April/May 2014 and the API has definitely changed since. I have nothing to do with Tinder, nor its API, and I do not offer any support for anything you may build on top of this. Proceed with caution.
I've sniffed most of the Tinder API to see how it works. You can use this to create bots (etc.) very trivially. Some example Python bot code is here -> https://gist.github.com/rtt/5a2e0cfa638c938cca59 (horribly quick and dirty, you've been warned!)