Sebastian Aaltonen sebbbi

  • Unity
  • Helsinki
@sebbbi
sebbbi / lidia.txt
Last active Mar 28, 2021
Lidia investigation
i10 jab / punish:
1,2,2: mid NC, -10 block, +8 CFT hit
1,2,4: low, -13 block, +3 hit. CH: ff3 followup = 34 dmg
+8 CFT mixup:
1: mid i20, -9 block, hit KD, CH ff1+2 followup = 59 dmg
4: low i19, -26 block, ff3 followup = 26 dmg
ff2 i14 long range mid:
block: -2 pushback -> backdash
sebbbi / jack.txt
Last active Feb 24, 2021
Jack investigation
Top moves:
2: i11 high, +1 block, +9 hit, 10 dmg, followup (NC): (1) mid, -2 block, +3 hit, NC = 22 dmg
f2: i10 high, -12 block, +5 hit, 17 dmg, CH followups: ff1 = 40 dmg, ff3 = 48 dmg, f3+4 = 42 dmg
f1: i14 mid, -6 block, +5 hit, 15 dmg, followup (NC): (1) high, -7 block, NC = 40 dmg
df1: i14 mid, -4 block, +3 hit, 12 dmg, followups (CH NC): (2,1) delayable high,high, launch, (1) mid, -12 block. CH NC = 55 dmg
db1: i12 low, -12 block, +2 hit, 13 dmg
df2: i15 mid, -14 block (safe tip range), launch
f1+2: i15 mid, -19 block (pushback), wall bounce
db,d,df1: i24 low, high crush, -37 block, 30 dmg
FC db1: i12 low, high crush, -8 block, +6 hit, 15 dmg
sebbbi / asuka.txt
Last active Jan 8, 2021
Asuka investigation
Top moves (close):
1,2,4/3: i10 high, -2 block, followups: mid(-8 push),mid(-12 push), low(-11 / 0 hit)
1+2: i16 mid, -9 block, launch
df1,2/4: i13 mid,high/mid, -3 block, followups: high(-1), mid(-12)
df2: i15 mid, -6 block, launch (no crouch)
d1+2: i20 low, -18 block, high crush, 36 damage minicombo (d2, f2)
d3+4: i14 low,high, -6 block (push), low crush, CH launch
db1,2: i14 mid,high, -9 block, high crush, followup CH launch
db2: i20 mid, -11 block, high crush, launch
b2,1,4/4/1+2,4: i15 mid, -4 block, followups: mid,low/high (-7,-6 push), low(-11), high,mid (-9,-13)
sebbbi / 5600x.txt
Last active Nov 9, 2020
5600X vs 3700X
Source: https://www.anandtech.com/show/16214/amd-zen-3-ryzen-deep-dive-review-5950x-5900x-5800x-and-5700x-tested
Format:
TestName (lower = better): 3700X -> 5600X (performance difference)
Less than 1% difference = tie
Office and Science
Agisoft Photoscan (lower = better): 2377 -> 2133 (+11.4%)
GIMP (lower = better): 20.72 -> 17.15 (+20.8%)
3D particle movement non-AVX (higher = better): 2768 -> 2452 (-11.4%)
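The percentages above can be reproduced with a small helper. This is a minimal sketch, not the author's actual method: the function names `perf_diff` and `describe` are hypothetical, and the formula (old/new - 1 for lower = better, new/old - 1 for higher = better) is an assumption chosen because it matches the three figures listed. The 3D particle movement line only matches when treated as a higher = better score.

```python
def perf_diff(old, new, lower_is_better=True):
    """Return the performance difference of `new` vs `old` in percent.

    Assumed formula: speedup ratio minus one, so a positive result
    means the new CPU (5600X) is faster than the old one (3700X).
    """
    ratio = old / new if lower_is_better else new / old
    return (ratio - 1.0) * 100.0


def describe(old, new, lower_is_better=True):
    """Format a result, treating differences under 1% as a tie."""
    diff = perf_diff(old, new, lower_is_better)
    return "tie" if abs(diff) < 1.0 else f"{diff:+.1f}%"


if __name__ == "__main__":
    print("Agisoft Photoscan:", describe(2377, 2133))
    print("GIMP:", describe(20.72, 17.15))
    print("3DPM non-AVX:", describe(2768, 2452, lower_is_better=False))
```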
sebbbi / leroy.txt
Last active Jan 1, 2021
Leroy investigation
Parry:
b2 mid+high:
3 frame startup and can interrupt many (non-NC) strings.
1 or 2 followup = 30 damage
Against slow recovery moves can launch with b3 or uf4.
3+4 (Hermit stance) low:
3 frame startup and can interrupt many (non-NC) strings. Hermit string transitions parry dick jab even at -9.
4,1+2 followup = 56 damage
View BetterBuffers.txt
All current buffer types in shading languages are slightly different ways to present homogeneous arrays (a single struct or type repeating N times in memory).
DirectX has raw buffers (RWByteAddressBuffer), but they are limited to 32-bit integer types, and the implementation doesn't require natural alignment for wide loads, resulting in suboptimal codegen on Nvidia GPUs.
Complex use cases, such as tree traversal in spatial data structures (physics, ray-tracing, etc.), require a non-homogeneous data structure. You want different node payloads and a tight memory layout.
The ability to mix 8/16/32-bit data types and 1d/2d/4d vectors in the same data structure, to facilitate GPU wide loads (max bandwidth), is crucial for complex use cases like this.
On the other hand, we want more readable and maintainable code syntax than DirectX raw buffers, without manual bit packing/extracting and reinterpret casting. The goal should be to allow modern GPUs to use sub-register addressing (SDWA on AMD hardware), saving both ALU and register
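To illustrate the manual bit packing/extracting and reinterpret casting that a raw buffer of 32-bit words forces on the programmer, here is a CPU-side Python sketch. It is not shader code and not from the original notes: the node layout (24-bit child index, 8-bit flags, one float radius) and the names `pack_node` and `unpack_node` are hypothetical, chosen only to show the kind of shifting, masking, and bit reinterpretation involved.

```python
import struct


def pack_node(child_index, flags, radius):
    """Pack a hypothetical tree node into two 32-bit words.

    word0: bits 0..23 = child index, bits 24..31 = flags
    word1: the float radius reinterpreted as a 32-bit unsigned integer
    """
    assert 0 <= child_index < (1 << 24)
    assert 0 <= flags < (1 << 8)
    word0 = child_index | (flags << 24)
    # Reinterpret cast: float bits -> uint (what asuint() does in HLSL).
    word1 = struct.unpack("<I", struct.pack("<f", radius))[0]
    return [word0, word1]


def unpack_node(words):
    """Extract the fields again with masks, shifts, and a reinterpret cast."""
    child_index = words[0] & 0xFFFFFF
    flags = (words[0] >> 24) & 0xFF
    # Reinterpret cast: uint bits -> float (what asfloat() does in HLSL).
    radius = struct.unpack("<f", struct.pack("<I", words[1]))[0]
    return child_index, flags, radius


if __name__ == "__main__":
    words = pack_node(123456, 0xA5, 1.5)
    print([hex(w) for w in words], unpack_node(words))
```

Every field access costs extra mask and shift operations here, which is the overhead that typed 8/16-bit access with sub-register addressing would avoid.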
sebbbi / FramentShaderWaveCoherency.txt
Last active Nov 28, 2018
FramentShaderWaveCoherency test shader (Vulkan 1.1)
#version 450
#extension GL_ARB_separate_shader_objects : enable
#extension GL_KHR_shader_subgroup_basic : enable
#extension GL_KHR_shader_subgroup_ballot : enable
#extension GL_KHR_shader_subgroup_vote : enable
#extension GL_KHR_shader_subgroup_arithmetic : enable
layout(location = 0) out vec4 outColor;
//#define VISUALIZE_WAVES
sebbbi / FastUniformLoadWithWaveOps.txt
Last active Mar 26, 2021
Fast uniform load with wave ops (up to 64x speedup)
In shader programming, you often need every pixel in a compute shader
group (tile) to iterate over the same array in memory. Tiled deferred lighting is the most common case: each 8x8 tile loops over a light list culled for that tile.
Simplified HLSL code looks like this:
Buffer<float4> lightDatas;
Texture2D<uint2> lightStartCounts;
RWTexture2D<float4> output;
[numthreads(8, 8, 1)]
sebbbi / PerfTestResult6700K.txt
Created Nov 10, 2018
PerfTestResult6700K.txt
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: Radeon (TM) RX 480 Graphics
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 1
Running 5 warm-up frames and 5 benchmark frames:
sebbbi / PerfTestNewOutput.txt
Created Nov 10, 2018
Improved PerfTest output. Compare to RGBA8. 30 frame warm-up + 30 frame benchmark. No printf spam to ensure GPU bound case.
PerfTest
To select adapter, use: PerfTest.exe [ADAPTER_INDEX]
Adapters found:
0: Radeon (TM) RX 480 Graphics
1: Intel(R) HD Graphics 530
2: Microsoft Basic Render Driver
Using adapter 0
Running 30 warm-up frames and 30 benchmark frames: