Sebastian Aaltonen sebbbi

  • Unity
  • Helsinki
sebbbi / BDF2_integrate_HLSL.txt
Last active Mar 28, 2018
BDF2 integrator in HLSL
void BFD2(inout ParticleSimulationData Particle, float3 Accel)
float3 x = Particle.Position;
float3 v = Particle.Velocity;
float3 x1 = Particle.PositionPrev;
float3 v1 = Particle.VelocityPrev;
Particle.Position = (4.0/3.0) * x - (1.0/3.0) * x1 + 1.0 * ((8.0/9.0) * v - (2.0/9.0) * v1 + (4.0/9.0) * TimeStep2 * Accel);
Particle.PositionPrev = x;
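For reference, a minimal CPU-side sketch of a BDF2 step in Python, for a 1D particle. This is the textbook two-step formula, not necessarily the author's exact scheme: `bdf2_step`, `bdf2_position_expanded`, and the `dt` parameter are hypothetical names, velocities are assumed to be in units per second, and the acceleration is treated as given. Substituting the velocity update into the position update produces the 8/9, 2/9 and 4/9 coefficients seen in the snippet above (which appears to fold the time step into its stored velocities, an assumption on my part).

```python
def bdf2_step(x, x_prev, v, v_prev, accel, dt):
    """One textbook BDF2 step; returns (x_new, x, v_new, v)."""
    # Velocity update: v_new = 4/3 v - 1/3 v_prev + 2/3 dt a
    v_new = (4.0 / 3.0) * v - (1.0 / 3.0) * v_prev + (2.0 / 3.0) * dt * accel
    # Position update: x_new = 4/3 x - 1/3 x_prev + 2/3 dt v_new
    x_new = (4.0 / 3.0) * x - (1.0 / 3.0) * x_prev + (2.0 / 3.0) * dt * v_new
    # The current state becomes the "previous" state for the next step.
    return x_new, x, v_new, v


def bdf2_position_expanded(x, x_prev, v, v_prev, accel, dt):
    """Same position update with v_new substituted in (single expression)."""
    return ((4.0 / 3.0) * x - (1.0 / 3.0) * x_prev
            + dt * ((8.0 / 9.0) * v - (2.0 / 9.0) * v_prev)
            + (4.0 / 9.0) * dt * dt * accel)
```

For constant acceleration the step reproduces the analytic trajectory: with x(t) = t^2 (accel = 2) and dt = 1, stepping from x = 1, x_prev = 0, v = 2, v_prev = 0 gives a new position of 4.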
sebbbi / PerfTestNewOutput.txt
Created Nov 10, 2018
Improved PerfTest output. Compare to RGBA8. 30-frame warm-up + 30-frame benchmark. No printf spam, to ensure the GPU-bound case.
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: Radeon (TM) RX 480 Graphics
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 0
Running 30 warm-up frames and 30 benchmark frames:
sebbbi / PerfTestResult6700K.txt
Created Nov 10, 2018
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: Radeon (TM) RX 480 Graphics
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 1
Running 5 warm-up frames and 5 benchmark frames:
sebbbi / PerfTestRX480.txt
Created Nov 10, 2018
PerfTest new constant buffer and structured buffer test cases
PerfTest results on RX480
NEW: Added constant buffer and structured buffer test cases.
Buffer<R8>.Load uniform: 0.367ms
Buffer<R8>.Load linear: 0.374ms
Buffer<R8>.Load random: 1.431ms
Buffer<RG8>.Load uniform: 1.608ms
Buffer<RG8>.Load linear: 1.624ms
Buffer<RG8>.Load random: 1.608ms
Buffer<RGBA8>.Load uniform: 1.430ms
sebbbi / FramentShaderWaveCoherency.txt
Last active Nov 28, 2018
FramentShaderWaveCoherency test shader (Vulkan 1.1)
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_ballot : enable
#extension GL_KHR_shader_subgroup_vote : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable
layout(location = 0) out vec4 outColor;
sebbbi / BadCode.txt
Last active Dec 23, 2018
Let's improve this
int i13;
i13 = 0;
for (;i13<3;)
int i14;
i14 = 0;
for (;i14<3;)
uvec3 v15;
v15.x = 0u;
sebbbi / BetterBuffers.txt
All current buffer types in shading languages are slightly different ways to present homogeneous arrays (a single struct or type repeated N times in memory).
DirectX has raw buffers (RWByteAddressBuffer), but they are limited to 32-bit integer types, and the implementation doesn't require natural alignment for wide loads, resulting in suboptimal codegen on Nvidia GPUs.
Complex use cases, such as tree traversal in spatial data structures (physics, ray tracing, etc.), require non-homogeneous data structures. You want different node payloads and a tight memory layout.
The ability to mix 8/16/32-bit data types and 1d/2d/4d vectors in the same data structure, to facilitate GPU wide loads (max bandwidth), is crucial for complex use cases like this.
On the other hand, we want cleaner, more readable and maintainable syntax than DirectX raw buffers, without manual bit packing/extracting and reinterpret casting. The goal should be to allow modern GPUs to use sub-register addressing (SDWA on AMD hardware). Saving both ALU and register
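
To make the "manual bit packing/extracting and reinterpret casting" concrete, here is a small Python sketch that models a raw buffer as a list of 32-bit words. The node layout (8-bit type, 8-bit child count, 16-bit first-child index, followed by a float payload) and all function names are hypothetical, chosen only for illustration; they are not from the original notes.

```python
import struct


def pack_node_header(node_type, child_count, first_child):
    """Pack an 8-bit type, 8-bit count and 16-bit index into one 32-bit word."""
    assert 0 <= node_type < 256
    assert 0 <= child_count < 256
    assert 0 <= first_child < 65536
    return node_type | (child_count << 8) | (first_child << 16)


def unpack_node_header(word):
    """Extract the three fields again with shifts and masks."""
    return word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFFFF


def float_bits(value):
    """Reinterpret a float32 as a uint32 (comparable to asuint in HLSL)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_float(word):
    """Reinterpret a uint32 as a float32 (comparable to asfloat in HLSL)."""
    return struct.unpack("<f", struct.pack("<I", word))[0]


# One node: a packed header word followed by a float payload word.
buffer_words = [pack_node_header(2, 4, 1000), float_bits(1.5)]
```

Every field access goes through a shift, a mask, or a reinterpret cast, which is the boilerplate the notes above argue a better buffer syntax should remove.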
sebbbi / fast_spheres.txt
Created Feb 18, 2018
Fast way to render lots of spheres
1. Index buffer containing N quads (2 triangles each), where N is the maximum number of spheres. Repeating pattern of {0,1,2,1,3,2} + K*4, where K is the quad index.
2. No vertex buffer.
Render N*2 triangles, where N is the number of spheres you have.
Vertex shader:
1. Sphere index = V/4 (V = SV_VertexId)
2. Quad coord: Q = float2(V%2, (V%4)/2) * 2.0 - 1.0
3. Transform sphere center -> pos
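
The index pattern and the per-vertex arithmetic in the steps above can be checked on the CPU. The following Python sketch is only an illustration: the function names are hypothetical, `vertex_id` stands in for SV_VertexId, and the divisions are assumed to be integer divisions as they would be for integer operands in a shader.

```python
def quad_indices(num_quads):
    """Index buffer contents: pattern {0,1,2,1,3,2} offset by 4 per quad."""
    pattern = (0, 1, 2, 1, 3, 2)
    return [k * 4 + i for k in range(num_quads) for i in pattern]


def sphere_index(vertex_id):
    """Four vertices per quad, so the sphere index is vertex_id / 4."""
    return vertex_id // 4


def quad_coord(vertex_id):
    """Corner of the quad in [-1, 1] x [-1, 1] for this vertex."""
    qx = (vertex_id % 2) * 2.0 - 1.0
    qy = ((vertex_id % 4) // 2) * 2.0 - 1.0
    return (qx, qy)
```

For example, `quad_indices(2)` yields `[0, 1, 2, 1, 3, 2, 4, 5, 6, 5, 7, 6]`, and vertex 5 belongs to sphere 1 at corner `(1.0, -1.0)`.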
sebbbi / 5600x.txt
Last active Nov 9, 2020
5600X vs 3700X
TestName (lower = better): 3700X -> 5600X (performance difference)
Less than 1% difference = tie
Office and Science
Agisoft Photoscan (lower = better): 2377 -> 2133 (+11.4%)
GIMP (lower = better): 20.72 -> 17.15 (+20.8%)
3D particle movement non-AVX: 2768 -> 2452 (-11.4%)
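
The percentages above are consistent with a simple ratio, sketched below in Python. This is a reconstruction, not the author's script: the function names are hypothetical, and treating the 3D particle movement score as "higher = better" is an assumption inferred from the sign of its result.

```python
def perf_difference(old, new, lower_is_better=True):
    """Performance difference in percent; positive means the new CPU is faster."""
    ratio = old / new if lower_is_better else new / old
    return (ratio - 1.0) * 100.0


def format_result(old, new, lower_is_better=True):
    """Apply the 'less than 1% difference = tie' rule and format the result."""
    diff = perf_difference(old, new, lower_is_better)
    if abs(diff) < 1.0:
        return "tie"
    return f"{diff:+.1f}%"
```

With these definitions, `format_result(2377, 2133)` gives `+11.4%` and `format_result(20.72, 17.15)` gives `+20.8%`, matching the listed values.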
sebbbi / leroy.txt
Last active Jan 1, 2021
Leroy investigation
b2 mid+high:
3 frame startup and can interrupt many (non-NC) strings.
1 or 2 followup = 30 damage
Against slow recovery moves can launch with b3 or uf4.
3+4 (Hermit stance) low:
3 frame startup and can interrupt many (non-NC) strings. Hermit string transitions parry dick jab even at -9.
4,1+2 followup = 56 damage