Dietmar Suoch didito

## Volition Programmer Test 2003.txt
Volition, Inc. Programmer's Test
Created: October 12, 1999
Last Revision: Tuesday, January 7, 2003 (MWA)

Please attempt all questions on this test.  Type your answers immediately
after the questions.  If you are unable to solve a problem, typing your
thoughts as you attempt the problem is useful.

There are eleven questions on this test.  If you get stuck on one, move to the
next one.  Please be sure that you completely understand the problem

## gf2p8affineqb-articles.md

      
animetosho / gf2p8affineqb-articles.md
Last active February 2, 2024 11:53
A list of articles documenting uses of the GF2P8AFFINE instruction

Unexpected Uses for the Galois Field Affine Transformation Instruction

Intel added the Galois Field instruction set (GFNI) extensions to their Sunny Cove and Tremont cores. What’s particularly interesting is that GFNI is the only new SIMD extension that came with SSE and VEX/AVX encodings (in addition to EVEX/AVX512), to allow it to be supported on all future Intel cores, including those which don’t support AVX512 (such as the Atom line, as well as Celeron/Pentium branded “big” cores).
I suspect GFNI was aimed at accelerating SM4 encryption; however, one of its instructions can be used for many other purposes. The extension includes three instructions, but of particular interest here is the Affine Transformation instruction (GF2P8AFFINEQB), also known as bit-matrix multiply.
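
To make the "bit-matrix multiply" description concrete, here is a minimal scalar sketch of what the instruction does to a single byte. It follows the per-byte definition as given in Intel's instruction reference to the best of my understanding (row `7 - i` of the 64-bit matrix produces output bit `i`, followed by an XOR with the immediate); the function names are made up for illustration, and the real instruction applies this to every byte of a vector at once.

```python
def parity(value: int) -> int:
    """Return 1 if an odd number of bits are set, else 0."""
    return bin(value).count("1") & 1


def gf2p8affine_byte(matrix: int, x: int, imm8: int = 0) -> int:
    """Scalar model of GF2P8AFFINEQB for one byte (illustrative helper).

    matrix: 64-bit integer holding eight 8-bit rows.
    x:      input byte.
    imm8:   constant byte XORed into the result.
    """
    out = 0
    for i in range(8):
        # Output bit i is built from matrix byte 7 - i.
        row = (matrix >> (8 * (7 - i))) & 0xFF
        bit = parity(row & x) ^ ((imm8 >> i) & 1)
        out |= bit << i
    return out


IDENTITY = 0x0102040810204080      # each output bit copies the same input bit
BIT_REVERSE = 0x8040201008040201   # output bit i copies input bit 7 - i

print(hex(gf2p8affine_byte(IDENTITY, 0xA5)))
print(hex(gf2p8affine_byte(BIT_REVERSE, 0x16)))
```

Choosing a different matrix constant changes the operation without changing the code path, which is what makes the instruction useful well outside of cryptography.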
There have been various articles which discuss out-of-band

  
## CloudsResources.md

      
pixelsnafu / CloudsResources.md
Last active July 10, 2024 08:04
Useful Resources for Rendering Volumetric Clouds

Volumetric Clouds Resources List


A. Schneider, "Real-Time Volumetric Cloudscapes," in GPU Pro 7: Advanced
Rendering Techniques, 2016, pp. 97-127. (Follow up presentations here, and here.)


S. Hillaire, "Physically Based Sky, Atmosphere and Cloud Rendering in Frostbite"
in Physically Based Shading in Theory and Practice course, SIGGRAPH 2016. [video] [course notes] [scatter integral shadertoy]


[R. Högfeldt, "Convincing Cloud Rendering – An Implementation of Real-Time Dynamic Volumetric Clouds in Frostbite"](https://odr.chalmers.se/hand


## roblox-graphics-apis-2020.md

      
zeux / roblox-graphics-apis-2020.md
Last active May 10, 2021 14:24

State of Roblox graphics API across all platforms, with percentage deltas since EOY 2019. Updated December 27 2020.

Windows


| API | Share |
|---|---|
| Direct3D 11+ | 89% (+4%) |
| Direct3D 10.1 | 7% (-2%) |
| Direct3D 10.0 | 3.5% (-1.5%) |
| Direct3D 9 | 0.5% (-0.5%) |


## GPUOptimizationForGames.md

      
              1 file
            
          
              0 forks
            
          
              0 comments
            
          
              9 stars
            
          
                JoseEmilio-ARM
                / GPUOptimizationForGames.md
            
            
              Last active
              January 31, 2024 16:56
                — forked from silvesthu/GPUOptimizationForGameDev.md
            
              
                GPU Optimization for Games
              
          
    GPU Optimization for Games

By person (random order)


Emil Persson @Humus

Blog
<2013> Low-Level Thinking in High-Level Shading Languages
<2014> Low-Level Shader Optimization for Next-Gen and DX11
<2018> Rule of optimization


Matt Pettineo @mynameismjp


## cache-counters-rant.md

      
travisdowns / cache-counters-rant.md
Created October 13, 2019 16:46
Discussion of x86 L1D related cache counters

The counters that are the easiest to understand and the best for making ratios that are internally consistent (i.e., always fall in the range 0.0 to 1.0) are the mem_load_retired events, e.g., mem_load_retired.l1_hit and mem_load_retired.l1_miss.
These count at the instruction level, i.e., the universe of retired instructions. For example, you could make a reasonable hit ratio from mem_load_retired.l1_hit / mem_inst_retired.all_loads and it will be sane (it will never indicate a hit rate of more than 100%, for example).
That one isn't perfect though, in that it may not reflect the true costs of cache misses and the behavior of the program for at least the following reasons:

- It applies only to loads and can't catch misses imposed by stores (AFAICT there is no event that counts store misses).
- It only counts loads that retire - a lot of the load activity in your process may be due to loads on a speculative path that never retire. Loads on a speculative path may bring in data that is never used, causing misses and d
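
The retired-load hit ratio described above can be sketched as a small calculation. This is only an illustration: the event names come from the text, but the helper name and the sample counts are invented, and how you collect the counts (for example with a profiler) is left out.

```python
def l1_hit_ratio(counts: dict) -> float:
    """Hit ratio over retired loads: l1_hit / all_loads.

    Both events count retired load instructions, so the numerator is a
    subset of the denominator and the result stays within 0.0 to 1.0.
    """
    hits = counts["mem_load_retired.l1_hit"]
    loads = counts["mem_inst_retired.all_loads"]
    if loads == 0:
        return 0.0  # no retired loads were observed
    return hits / loads


# Made-up sample counts, for illustration only.
sample = {
    "mem_load_retired.l1_hit": 950_000,
    "mem_inst_retired.all_loads": 1_000_000,
}
print(f"L1 hit ratio: {l1_hit_ratio(sample):.3f}")
```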


## GPUOptimizationForGameDev.md

      
silvesthu / GPUOptimizationForGameDev.md
Last active July 19, 2024 21:32

GPU Optimization for GameDev

Graphics Pipeline / GPU Architecture Overview


2011 - A trip through the Graphics Pipeline 2011
2015 - Life of a triangle - NVIDIA's logical pipeline
2015 - Render Hell 2.0
2016 - How bad are small triangles on GPU and why?
2017 - GPU Performance for Game Artists
2019 - Understanding the anatomy of GPUs using Pokémon
2020 - GPU ARCHITECTURE RESOURCES


## BetterBuffers.txt
All current buffer types in shading languages are slightly different ways to present homogeneous arrays (single struct or type repeating N times in memory).

DirectX has raw buffers (RWByteAddressBuffer), but they are limited to 32-bit integer types, and the implementation doesn't require natural alignment for wide loads, resulting in suboptimal codegen on Nvidia GPUs.

Complex use cases, such as tree traversal in spatial data structures (physics, ray tracing, etc.), require data structures that are non-homogeneous. You want different node payloads and a tight memory layout.

The ability to mix 8/16/32-bit data types and 1d/2d/4d vectors in the same data structure, to facilitate GPU wide loads (max bandwidth), is crucial for complex use cases like this.
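
As a rough CPU-side illustration of such a non-homogeneous, tightly packed layout, the sketch below packs two kinds of tree nodes with different payloads into one byte buffer. The node layouts, field names, and helper functions are hypothetical and chosen only to mix 8-, 16- and 32-bit fields; they do not describe any real shading-language feature.

```python
import struct

# Hypothetical layouts: every node starts with a 4-byte header, followed
# by a payload whose shape depends on the node kind. "<" means
# little-endian with no padding, so fields are packed back to back.
HEADER = struct.Struct("<BBH")   # uint8 kind, uint8 count, uint16 flags
INNER = struct.Struct("<4H2I")   # 4 x uint16 bounds, 2 x uint32 child offsets
LEAF = struct.Struct("<4B2I")    # 4 x uint8 material ids, 2 x uint32 indices

KIND_INNER, KIND_LEAF = 0, 1


def pack_inner(bounds, children, flags=0):
    return HEADER.pack(KIND_INNER, len(children), flags) + INNER.pack(*bounds, *children)


def pack_leaf(materials, indices, flags=0):
    return HEADER.pack(KIND_LEAF, len(indices), flags) + LEAF.pack(*materials, *indices)


def read_node(buf, offset):
    """Decode the node at `offset`; return (node, offset of the next node)."""
    kind, _count, _flags = HEADER.unpack_from(buf, offset)
    body = offset + HEADER.size
    if kind == KIND_INNER:
        *bounds, c0, c1 = INNER.unpack_from(buf, body)
        return ("inner", bounds, [c0, c1]), body + INNER.size
    fields = LEAF.unpack_from(buf, body)
    return ("leaf", list(fields[:4]), list(fields[4:])), body + LEAF.size


buf = pack_inner((1, 2, 3, 4), (20, 36)) + pack_leaf((7, 8, 9, 10), (100, 200))
node, next_offset = read_node(buf, 0)
print(len(buf), node, next_offset)
```

The inner node occupies 20 bytes and the leaf 16, so the buffer is not an array of one repeating struct; this manual packing and unpacking is the kind of boilerplate the text wants the language to take over.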

On the other hand, we want better, more readable and maintainable code syntax than DirectX raw buffers, without manual bit packing/extracting and reinterpret casting. The goal should be to allow modern GPUs to use sub-register addressing (SDWA on AMD hardware). Saving both ALU and register

## ss-fs.glsl
#version 300 es
precision highp float;

in vec2 UV;
out vec4 out_color;
uniform float ratio, time;
uniform sampler2D texture0;

const float PI_3 = 1.0471975512;

## reducing-build-times.md

      
niklas-ourmachinery / reducing-build-times.md
Created January 24, 2019 16:30

Reducing build times by 20 % with a one line change

While experimenting a bit with the /d2cgsummary and /d1reportTime flags described by Aras here and here, I noticed that one of our functions was consistently showing up in the Anomalistic Compile Times section:
1>	Anomalistic Compile Times: 2
1>		create_truth_types: 0.643 sec, 2565 instrs
1>		draw_nodes: 0.180 sec, 5348 instrs