Sebastian Aaltonen sebbbi

  • Unity
  • Helsinki
@sebbbi
sebbbi / fast_spheres.txt
Created Feb 18, 2018
Fast way to render lots of spheres
Setup:
1. Index buffer containing N quads (2 triangles each), where N is the maximum number of spheres. The indices repeat the pattern {0,1,2,1,3,2} + K*4 for quad K.
2. No vertex buffer.
Render N*2 triangles, where N is the number of spheres you have.
Vertex shader:
1. Sphere index = V/4 (V = SV_VertexId, integer division)
2. Quad coord: Q = float2(V%2, (V%4)/2) * 2.0 - 1.0
3. Transform sphere center -> pos
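The setup and the first two vertex shader steps can be checked on the CPU. The following is a minimal Python sketch, not the gist's shader code: `build_quad_indices` and `decode_vertex` are hypothetical helper names, and the sketch only mirrors the index pattern and the vertex ID arithmetic listed above.

```python
def build_quad_indices(max_spheres):
    """Build the index buffer: pattern {0,1,2,1,3,2} + K*4 for each quad K."""
    pattern = (0, 1, 2, 1, 3, 2)
    return [k * 4 + i for k in range(max_spheres) for i in pattern]


def decode_vertex(vertex_id):
    """Mirror steps 1 and 2 of the vertex shader for one vertex ID."""
    sphere_index = vertex_id // 4                 # step 1: integer division
    qx = (vertex_id % 2) * 2.0 - 1.0              # step 2: x corner in {-1, +1}
    qy = ((vertex_id % 4) // 2) * 2.0 - 1.0       # step 2: y corner in {-1, +1}
    return sphere_index, (qx, qy)
```

Because every index of quad K lies in the range K*4 to K*4+3, dividing the vertex ID by 4 recovers the sphere, and the low two bits select one of the four quad corners. That is why no vertex buffer is needed.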
sebbbi / BDF2_integrate_HLSL.txt
Last active Mar 28, 2018
BDF2 integrator in HLSL
void BFD2(inout ParticleSimulationData Particle, float3 Accel)
{
float3 x = Particle.Position;
float3 v = Particle.Velocity;
float3 x1 = Particle.PositionPrev;
float3 v1 = Particle.VelocityPrev;
Particle.Position = (4.0/3.0) * x - (1.0/3.0) * x1 + 1.0 * ((8.0/9.0) * v - (2.0/9.0) * v1 + (4.0/9.0) * TimeStep2 * Accel);
Particle.PositionPrev = x;
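For comparison, here is a minimal Python sketch of the textbook BDF2 update for one scalar component. It is not the gist's code: `bdf2_step` and the explicit step size `h` are assumptions made for illustration, and the acceleration is treated as a given input. Substituting the velocity update into the position update yields the same 4/3, 1/3, 8/9, 2/9 and 4/9 coefficients that appear in the HLSL line above. The HLSL multiplies by `1.0` and `TimeStep2` where this sketch uses `h`, which may mean the time step is folded into the units there.

```python
def bdf2_step(x, x1, v, v1, accel, h):
    """One BDF2 step for a scalar component.

    x, v   : current position and velocity
    x1, v1 : previous position and velocity
    accel  : acceleration used for this step (assumed given)
    h      : time step (assumed constant between steps)
    """
    v_new = (4.0 / 3.0) * v - (1.0 / 3.0) * v1 + (2.0 / 3.0) * h * accel
    x_new = (4.0 / 3.0) * x - (1.0 / 3.0) * x1 + (2.0 / 3.0) * h * v_new
    return x_new, v_new
```

With constant acceleration the step reproduces the exact trajectory, which makes a convenient sanity check: starting from x(0) = 0, x(1) = 1, v(0) = 0, v(1) = 2 with accel = 2 and h = 1, the step lands on position 4 and velocity 4.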
sebbbi / SinglePassMipPyramid.hlsl
Last active May 29, 2021
Single pass globallycoherent mip pyramid generation
// NOTE: Must bind 8x single mip RWTexture views, because HLSL doesn't have .mips member for RWTexture2D. (SRVs only have .mips member)
// NOTE: The globallycoherent attribute is needed. Without it, writes aren't guaranteed to be visible to other groups
globallycoherent RWTexture2D<float> MipTextures[8];
RWTexture2D<uint> Counters[8];
groupshared uint CounterReturnLDS;
[numthreads(16, 16, 1)]
void GenerateMipPyramid(uint3 Tid : SV_DispatchThreadID, uint3 Group : SV_GroupId, uint Gix : SV_GroupIndex)
{
[unroll]
sebbbi / ConeTraceAnalytic.txt
Created Aug 27, 2018
Cone trace analytic solution
The analytic solution for a spherical-cap cone is a 1D problem, since the cone cap sphere slides along the ray. The point where it meets the empty-space sphere is always on the ray.
S : radius of cone cap sphere at t=1
r(d) : cone cap sphere radius at distance d
r(d) = d*S
p = distance of current SDF sample
SDF(p) = sdf function result at location p
x = distance after conservative step
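One way to turn these definitions into a step formula is sketched below in Python. It rests on an assumption that is not stated in the lines above: the step is conservative when the far side of the cone cap sphere at distance x just touches the far side of the empty-space sphere around the sample, i.e. x + r(x) = p + SDF(p). Solving with r(x) = x*S gives x = (p + SDF(p)) / (1 + S). The function name `conservative_step` is hypothetical.

```python
def conservative_step(p, sdf_p, S):
    """Distance x along the ray after one conservative cone step.

    p     : distance of the current SDF sample along the ray
    sdf_p : SDF(p), the radius of the empty-space sphere at the sample
    S     : radius of the cone cap sphere at t = 1, so r(d) = d * S

    Assumes the touch condition x + x*S = p + sdf_p (see lead-in).
    """
    if S < 0.0:
        raise ValueError("cone slope S must be non-negative")
    return (p + sdf_p) / (1.0 + S)
```

With S = 0 the cone degenerates to a ray and the result is ordinary sphere tracing, x = p + SDF(p). If SDF(p) is smaller than the cone radius at p, the result falls behind p, which under this assumption would indicate that the cone already overlaps geometry.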
sebbbi / BadCode.txt
Last active Dec 23, 2018
Let's improve this
int i13;
i13 = 0;
for (;i13<3;)
{
int i14;
i14 = 0;
for (;i14<3;)
{
uvec3 v15;
v15.x = 0u;
sebbbi / PerfTestRX480.txt
Created Nov 10, 2018
PerfTest new constant buffer and structured buffer test cases
PerfTest results on RX480
NEW: Added constant buffer and structured buffer test cases.
Buffer<R8>.Load uniform: 0.367ms
Buffer<R8>.Load linear: 0.374ms
Buffer<R8>.Load random: 1.431ms
Buffer<RG8>.Load uniform: 1.608ms
Buffer<RG8>.Load linear: 1.624ms
Buffer<RG8>.Load random: 1.608ms
Buffer<RGBA8>.Load uniform: 1.430ms
sebbbi / PerfTestNewOutput.txt
Created Nov 10, 2018
Improved PerfTest output. Compare to RGBA8. 30 frame warm-up + 30 frame benchmark. No printf spam to ensure GPU bound case.
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: Radeon (TM) RX 480 Graphics
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 0
Running 30 warm-up frames and 30 benchmark frames:
sebbbi / PerfTestResult6700K.txt
Created Nov 10, 2018
PerfTestResult6700K.txt
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: Radeon (TM) RX 480 Graphics
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 1
Running 5 warm-up frames and 5 benchmark frames:
sebbbi / FastUniformLoadWithWaveOps.txt
Last active Mar 26, 2021
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often need every pixel in a compute shader group (tile) to iterate over the same array in memory.
Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list that was culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
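To see where a factor of up to 64 can come from, the tile loop can be modeled on the CPU. This Python sketch is only a load-counting model, not the gist's HLSL and not a statement about any particular GPU. `shade_tile_per_thread`, `shade_tile_uniform` and the flat list of floats standing in for `lightDatas` are hypothetical, and "shading" is reduced to summing the light values.

```python
TILE_THREADS = 8 * 8  # one thread per pixel in an 8x8 tile


def shade_tile_per_thread(light_data, start, count):
    """Every thread loads every light of the tile's list itself."""
    loads = 0
    results = []
    for _thread in range(TILE_THREADS):
        total = 0.0
        for i in range(start, start + count):
            total += light_data[i]  # per-thread load of the same address
            loads += 1
        results.append(total)
    return results, loads


def shade_tile_uniform(light_data, start, count):
    """Each light is loaded once and the value is shared by all threads."""
    loads = 0
    totals = [0.0] * TILE_THREADS
    for i in range(start, start + count):
        light = light_data[i]  # single load for the whole tile
        loads += 1
        for t in range(TILE_THREADS):
            totals[t] += light
    return totals, loads
```

Both functions produce the same per-pixel results, but the first issues 64 loads per light and the second issues one. Whether real hardware sees the full ratio depends on how the compiler and the GPU handle loads from an address that is identical across the wave.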
sebbbi / FramentShaderWaveCoherency.txt
Last active Nov 28, 2018
FramentShaderWaveCoherency test shader (Vulkan 1.1)
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_ballot : enable
#extension GL_KHR_shader_subgroup_vote : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable
layout(location = 0) out vec4 outColor;
//#define VISUALIZE_WAVES