@vassvik
Last active March 26, 2023 01:58

Compute memory access pattern throughput benchmarks

We measure the runtime of simple compute shaders that read from one 3D texture, apply some kind of filter, and write the result to another 3D texture (the first test only writes). The local work group size is varied over an arbitrary set of shapes, and the effect of the texture's internal format is studied.

All tests use 512x512x512 3D textures. At this size, memory throughput and latency should be the primary bottleneck, so the small amount of arithmetic in each shader should have a negligible impact on the timings.

Each timing is the frame time averaged over 128 frames, after a 128-frame warmup, with vsync disabled. Using GPU timer queries might provide more stable numbers.
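
The averaging can be sketched as follows, assuming the per-frame times have already been collected in milliseconds. The helper name `summarize_frame_times` and the use of the population standard deviation are assumptions; the original harness is not shown.

```python
import statistics

WARMUP_FRAMES = 128    # frames discarded before measuring
MEASURED_FRAMES = 128  # frames averaged for each reported timing

def summarize_frame_times(frame_times_ms):
    """Return (mean, standard deviation) in ms of the measured frames.

    Hypothetical helper: the first WARMUP_FRAMES samples are dropped and the
    next MEASURED_FRAMES samples are summarized.
    """
    needed = WARMUP_FRAMES + MEASURED_FRAMES
    if len(frame_times_ms) < needed:
        raise ValueError(f"need at least {needed} frame times, got {len(frame_times_ms)}")
    measured = frame_times_ms[WARMUP_FRAMES:needed]
    return statistics.fmean(measured), statistics.pstdev(measured)

# Slow warmup frames are ignored; the measured frames alternate 2 ms and 4 ms.
print(summarize_frame_times([5.0] * 128 + [2.0, 4.0] * 64))  # (3.0, 1.0)
```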

The work group sizes are:

[8, 8, 8]
[32, 32, 1]
[32, 1, 32]
[16, 16, 1]
[16, 1, 16]
[16, 16, 4]
[16, 4, 16]
[4, 16, 16]
[4, 2, 16]
[16, 2, 4]
[16, 4, 2]
[4, 16, 2]
[8, 2, 16]
[8, 16, 2]
[128, 4, 1]
[256, 4, 1]
[128, 1, 4]
[256, 1, 4]
[128, 1, 1]
[256, 1, 1]
[512, 1, 1]
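
Because the shaders index the textures directly with `gl_GlobalInvocationID` and have no bounds check, each dispatch presumably launches exactly 512 / local_size work groups per axis; the dispatch call itself is not shown, so this is an assumption. The sketch below computes those counts and the invocations per group. Every shape in the list has between 128 and 1024 invocations, within the 1024 that OpenGL 4.3 guarantees as the minimum for `GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS`.

```python
# Work group shapes from the list above, as (local_size_x, local_size_y, local_size_z).
WORK_GROUP_SIZES = [
    (8, 8, 8), (32, 32, 1), (32, 1, 32), (16, 16, 1), (16, 1, 16),
    (16, 16, 4), (16, 4, 16), (4, 16, 16), (4, 2, 16), (16, 2, 4),
    (16, 4, 2), (4, 16, 2), (8, 2, 16), (8, 16, 2), (128, 4, 1),
    (256, 4, 1), (128, 1, 4), (256, 1, 4), (128, 1, 1), (256, 1, 1),
    (512, 1, 1),
]
TEXTURE_SIZE = (512, 512, 512)

def dispatch_counts(local_size, texture_size=TEXTURE_SIZE):
    """Work groups per axis so that one invocation covers exactly one texel."""
    counts = []
    for extent, local in zip(texture_size, local_size):
        if extent % local != 0:
            raise ValueError(f"texture extent {extent} is not a multiple of {local}")
        counts.append(extent // local)
    return tuple(counts)

for size in WORK_GROUP_SIZES:
    x, y, z = size
    print(size, "invocations per group:", x * y * z, "groups:", dispatch_counts(size))
```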

And the internal texture formats used are:

R16F
R32F
RG16F
RG32F
RGBA16F
RGBA32F

Formats with the same texel size (R32F and RG16F at 4 bytes, RG32F and RGBA16F at 8 bytes) are expected to give roughly the same timings, and timings should scale roughly linearly with texel size. In an ideal world, R16F (2 bytes per texel) would be 8x faster than RGBA32F (16 bytes per texel), although this is not the case in practice. In particular, timings for small texels (2 and 4 bytes) seem to vary much more than for larger ones (8 and 16 bytes).
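
The expected ratios follow directly from the bytes per texel. A minimal sketch, assuming tightly packed texels with no padding (the driver's actual storage may differ):

```python
# Bytes per texel for each internal format, assuming tightly packed texels.
BYTES_PER_TEXEL = {
    "R16F": 2, "R32F": 4, "RG16F": 4,
    "RG32F": 8, "RGBA16F": 8, "RGBA32F": 16,
}
TEXELS = 512 ** 3  # texels in one 512x512x512 texture

def texture_bytes(fmt):
    """Size of one texture in bytes."""
    return TEXELS * BYTES_PER_TEXEL[fmt]

def ideal_speedup(fast_fmt, slow_fmt):
    """Speedup expected if runtime were purely proportional to bytes moved."""
    return BYTES_PER_TEXEL[slow_fmt] / BYTES_PER_TEXEL[fast_fmt]

for fmt in BYTES_PER_TEXEL:
    print(f"{fmt:8s} {texture_bytes(fmt) // 2**20:5d} MiB")
print(ideal_speedup("R16F", "RGBA32F"))  # 8.0
```

For comparison, the texture write timings at [8, 8, 8] give 9.875 / 1.898 ≈ 5.2 for RGBA32F versus R16F, rather than the ideal 8.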

Note: In each timing table, the first column is the average frame time in milliseconds, the second column is the standard deviation of the frame times, and the last two columns are the internal format and the work group size. Some measurements are noticeably more uncertain than others. Better timings will be provided in the future.
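
A few rows have a standard deviation larger than the mean, for example `4.193   6.952   R16F   [128, 1, 4]` in the copy test, which presumably means a handful of very slow frames. A small sketch for parsing rows in this format and flagging such cases; the 0.5 relative threshold is an arbitrary, hypothetical choice.

```python
import re

# mean, std, format, [x, y, z]
ROW = re.compile(r"^\s*([\d.]+)\s+([\d.]+)\s+(\w+)\s+\[(\d+),\s*(\d+),\s*(\d+)\]\s*$")

def parse_row(line):
    """Parse one timing row into (mean_ms, std_ms, format, (x, y, z))."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"unrecognized row: {line!r}")
    mean, std, fmt, x, y, z = m.groups()
    return float(mean), float(std), fmt, (int(x), int(y), int(z))

def is_uncertain(line, max_relative_std=0.5):
    """True if the standard deviation is large relative to the mean."""
    mean, std, _, _ = parse_row(line)
    return std / mean > max_relative_std

print(is_uncertain("4.193   6.952   R16F   [128, 1, 4] "))  # True
print(is_uncertain("3.230   0.131   R16F   [128, 1, 1] "))  # False
```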

Note: Declaring the image variable's format qualifier to match each texture's internal format did not seem to matter much in practice, so I kept it at rgba32f in all shaders for simplicity.

Texture write

Shader code

#version 440 core

// The %d placeholders are filled in with each work group size before compiling.
layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	imageStore(write_image, index, vec4(index, 1.0));
}

Timings

1.898   0.156   R16F   [8, 8, 8] 
2.065   0.124   R16F   [32, 32, 1] 
2.059   0.133   R16F   [32, 1, 32] 
1.732   0.124   R16F   [16, 16, 1] 
1.729   0.122   R16F   [16, 1, 16] 
2.062   0.161   R16F   [16, 16, 4] 
2.076   0.149   R16F   [16, 4, 16] 
2.073   0.155   R16F   [4, 16, 16] 
1.903   0.105   R16F   [4, 2, 16] 
1.903   0.107   R16F   [16, 2, 4] 
1.904   0.109   R16F   [16, 4, 2] 
1.907   0.129   R16F   [4, 16, 2] 
1.712   0.102   R16F   [8, 2, 16] 
1.711   0.103   R16F   [8, 16, 2] 
1.848   0.111   R16F   [128, 4, 1] 
2.046   0.111   R16F   [256, 4, 1] 
1.853   0.105   R16F   [128, 1, 4] 
2.040   0.113   R16F   [256, 1, 4] 
1.908   0.112   R16F   [128, 1, 1] 
1.718   0.121   R16F   [256, 1, 1] 
1.851   0.118   R16F   [512, 1, 1] 

2.619   0.144   R32F   [8, 8, 8] 
2.620   0.149   R32F   [32, 32, 1] 
2.613   0.143   R32F   [32, 1, 32] 
2.616   0.145   R32F   [16, 16, 1] 
2.617   0.144   R32F   [16, 1, 16] 
2.616   0.143   R32F   [16, 16, 4] 
2.614   0.143   R32F   [16, 4, 16] 
2.615   0.142   R32F   [4, 16, 16] 
2.614   0.138   R32F   [4, 2, 16] 
2.612   0.141   R32F   [16, 2, 4] 
2.615   0.146   R32F   [16, 4, 2] 
2.616   0.143   R32F   [4, 16, 2] 
2.614   0.143   R32F   [8, 2, 16] 
2.615   0.157   R32F   [8, 16, 2] 
2.616   0.145   R32F   [128, 4, 1] 
2.621   0.144   R32F   [256, 4, 1] 
2.617   0.141   R32F   [128, 1, 4] 
2.616   0.144   R32F   [256, 1, 4] 
2.625   0.156   R32F   [128, 1, 1] 
2.619   0.146   R32F   [256, 1, 1] 
2.620   0.139   R32F   [512, 1, 1] 

2.617   0.143   RG16F   [8, 8, 8] 
2.622   0.140   RG16F   [32, 32, 1] 
2.618   0.142   RG16F   [32, 1, 32] 
2.618   0.141   RG16F   [16, 16, 1] 
2.619   0.143   RG16F   [16, 1, 16] 
2.628   0.140   RG16F   [16, 16, 4] 
2.619   0.142   RG16F   [16, 4, 16] 
2.619   0.146   RG16F   [4, 16, 16] 
2.620   0.149   RG16F   [4, 2, 16] 
2.617   0.144   RG16F   [16, 2, 4] 
2.617   0.139   RG16F   [16, 4, 2] 
2.618   0.141   RG16F   [4, 16, 2] 
2.616   0.139   RG16F   [8, 2, 16] 
2.621   0.140   RG16F   [8, 16, 2] 
2.618   0.135   RG16F   [128, 4, 1] 
2.622   0.143   RG16F   [256, 4, 1] 
2.617   0.138   RG16F   [128, 1, 4] 
2.613   0.136   RG16F   [256, 1, 4] 
2.618   0.140   RG16F   [128, 1, 1] 
2.621   0.143   RG16F   [256, 1, 1] 
2.618   0.140   RG16F   [512, 1, 1] 

5.007   0.167   RG32F   [8, 8, 8] 
5.010   0.168   RG32F   [32, 32, 1] 
5.017   0.175   RG32F   [32, 1, 32] 
5.092   0.206   RG32F   [16, 16, 1] 
5.031   0.177   RG32F   [16, 1, 16] 
5.036   0.182   RG32F   [16, 16, 4] 
5.035   0.181   RG32F   [16, 4, 16] 
5.030   0.174   RG32F   [4, 16, 16] 
5.048   0.185   RG32F   [4, 2, 16] 
5.030   0.180   RG32F   [16, 2, 4] 
5.034   0.182   RG32F   [16, 4, 2] 
5.029   0.185   RG32F   [4, 16, 2] 
5.052   0.191   RG32F   [8, 2, 16] 
5.052   0.201   RG32F   [8, 16, 2] 
5.035   0.187   RG32F   [128, 4, 1] 
5.062   0.216   RG32F   [256, 4, 1] 
5.046   0.192   RG32F   [128, 1, 4] 
5.013   0.172   RG32F   [256, 1, 4] 
5.094   0.206   RG32F   [128, 1, 1] 
5.094   0.215   RG32F   [256, 1, 1] 
5.096   0.210   RG32F   [512, 1, 1] 

5.021   0.171   RGBA16F   [8, 8, 8] 
5.033   0.184   RGBA16F   [32, 32, 1] 
5.034   0.179   RGBA16F   [32, 1, 32] 
5.027   0.172   RGBA16F   [16, 16, 1] 
5.020   0.177   RGBA16F   [16, 1, 16] 
5.027   0.174   RGBA16F   [16, 16, 4] 
5.016   0.176   RGBA16F   [16, 4, 16] 
5.004   0.166   RGBA16F   [4, 16, 16] 
5.018   0.175   RGBA16F   [4, 2, 16] 
5.006   0.169   RGBA16F   [16, 2, 4] 
5.017   0.169   RGBA16F   [16, 4, 2] 
5.013   0.170   RGBA16F   [4, 16, 2] 
5.017   0.174   RGBA16F   [8, 2, 16] 
5.010   0.174   RGBA16F   [8, 16, 2] 
5.093   0.257   RGBA16F   [128, 4, 1] 
5.049   0.229   RGBA16F   [256, 4, 1] 
5.002   0.205   RGBA16F   [128, 1, 4] 
4.999   0.214   RGBA16F   [256, 1, 4] 
5.009   0.206   RGBA16F   [128, 1, 1] 
5.014   0.217   RGBA16F   [256, 1, 1] 
5.011   0.211   RGBA16F   [512, 1, 1] 

9.875   0.290   RGBA32F   [8, 8, 8] 
9.867   0.275   RGBA32F   [32, 32, 1] 
10.159  0.239   RGBA32F   [32, 1, 32] 
9.861   0.263   RGBA32F   [16, 16, 1] 
9.886   0.278   RGBA32F   [16, 1, 16] 
9.860   0.277   RGBA32F   [16, 16, 4] 
9.872   0.274   RGBA32F   [16, 4, 16] 
9.854   0.277   RGBA32F   [4, 16, 16] 
9.828   0.277   RGBA32F   [4, 2, 16] 
9.876   0.271   RGBA32F   [16, 2, 4] 
9.875   0.276   RGBA32F   [16, 4, 2] 
9.793   0.244   RGBA32F   [4, 16, 2] 
9.800   0.246   RGBA32F   [8, 2, 16] 
9.809   0.244   RGBA32F   [8, 16, 2] 
9.872   0.280   RGBA32F   [128, 4, 1] 
9.858   0.262   RGBA32F   [256, 4, 1] 
9.852   0.258   RGBA32F   [128, 1, 4] 
9.861   0.277   RGBA32F   [256, 1, 4] 
9.916   0.290   RGBA32F   [128, 1, 1] 
9.879   0.262   RGBA32F   [256, 1, 1] 
9.876   0.251   RGBA32F   [512, 1, 1] 
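
Since the write test touches each texel exactly once, dividing the texture size by the frame time gives a rough effective write bandwidth. A minimal sketch, assuming tightly packed texels and 1 GB = 1e9 bytes; because the frame time may include other per-frame work, the result is a lower bound on what the shader itself achieves.

```python
BYTES_PER_TEXEL = {
    "R16F": 2, "R32F": 4, "RG16F": 4,
    "RG32F": 8, "RGBA16F": 8, "RGBA32F": 16,
}
TEXELS = 512 ** 3

def write_bandwidth_gbs(fmt, frame_time_ms):
    """Bytes written per second in GB/s (1 GB = 1e9 bytes)."""
    bytes_written = TEXELS * BYTES_PER_TEXEL[fmt]
    return bytes_written / (frame_time_ms * 1e-3) / 1e9

print(f"{write_bandwidth_gbs('RGBA32F', 9.875):.1f}")  # [8, 8, 8] row: 217.5
print(f"{write_bandwidth_gbs('R16F', 1.898):.1f}")     # [8, 8, 8] row: 141.4
```

The gap between the two formats is another view of the sub-linear scaling noted earlier: small texels do not reach the same bandwidth as large ones.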

Texture copy

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	vec4 t = texelFetch(read_sampler, index, 0);
	imageStore(write_image, index, t);
}

Timings

3.573   0.178   R16F   [8, 8, 8] 
3.510   0.148   R16F   [32, 32, 1] 
3.208   0.195   R16F   [32, 1, 32] 
3.271   0.170   R16F   [16, 16, 1] 
3.580   0.174   R16F   [16, 1, 16] 
3.625   0.173   R16F   [16, 16, 4] 
3.529   0.188   R16F   [16, 4, 16] 
3.704   0.201   R16F   [4, 16, 16] 
3.230   0.152   R16F   [4, 2, 16] 
3.710   0.173   R16F   [16, 2, 4] 
3.378   0.145   R16F   [16, 4, 2] 
3.393   0.157   R16F   [4, 16, 2] 
3.318   0.158   R16F   [8, 2, 16] 
3.431   0.150   R16F   [8, 16, 2] 
3.282   0.145   R16F   [128, 4, 1] 
3.459   0.159   R16F   [256, 4, 1] 
4.193   6.952   R16F   [128, 1, 4] 
3.770   0.148   R16F   [256, 1, 4] 
3.230   0.131   R16F   [128, 1, 1] 
3.281   0.128   R16F   [256, 1, 1] 
3.349   0.151   R16F   [512, 1, 1] 

5.517   0.176   R32F   [8, 8, 8] 
5.366   0.176   R32F   [32, 32, 1] 
5.212   0.171   R32F   [32, 1, 32] 
5.366   0.194   R32F   [16, 16, 1] 
5.338   0.196   R32F   [16, 1, 16] 
5.535   0.171   R32F   [16, 16, 4] 
5.331   0.180   R32F   [16, 4, 16] 
5.675   0.182   R32F   [4, 16, 16] 
5.311   0.173   R32F   [4, 2, 16] 
5.463   0.168   R32F   [16, 2, 4] 
5.463   0.234   R32F   [16, 4, 2] 
5.519   0.246   R32F   [4, 16, 2] 
5.321   0.194   R32F   [8, 2, 16] 
5.462   0.175   R32F   [8, 16, 2] 
5.404   0.188   R32F   [128, 4, 1] 
5.387   0.178   R32F   [256, 4, 1] 
5.533   0.233   R32F   [128, 1, 4] 
5.450   0.204   R32F   [256, 1, 4] 
5.398   0.188   R32F   [128, 1, 1] 
5.384   0.193   R32F   [256, 1, 1] 
5.378   0.186   R32F   [512, 1, 1] 

5.583   0.231   RG16F   [8, 8, 8] 
5.409   0.197   RG16F   [32, 32, 1] 
5.279   0.233   RG16F   [32, 1, 32] 
5.383   0.198   RG16F   [16, 16, 1] 
5.332   0.182   RG16F   [16, 1, 16] 
5.547   0.187   RG16F   [16, 16, 4] 
5.346   0.179   RG16F   [16, 4, 16] 
5.700   0.183   RG16F   [4, 16, 16] 
5.338   0.210   RG16F   [4, 2, 16] 
5.484   0.174   RG16F   [16, 2, 4] 
5.449   0.175   RG16F   [16, 4, 2] 
5.452   0.184   RG16F   [4, 16, 2] 
5.364   0.211   RG16F   [8, 2, 16] 
5.479   0.206   RG16F   [8, 16, 2] 
5.380   0.170   RG16F   [128, 4, 1] 
5.382   0.169   RG16F   [256, 4, 1] 
5.448   0.168   RG16F   [128, 1, 4] 
5.410   0.177   RG16F   [256, 1, 4] 
5.360   0.162   RG16F   [128, 1, 1] 
5.382   0.232   RG16F   [256, 1, 1] 
5.375   0.178   RG16F   [512, 1, 1] 

10.770   0.178   RG32F   [8, 8, 8] 
10.535   0.188   RG32F   [32, 32, 1] 
10.440   0.218   RG32F   [32, 1, 32] 
10.530   0.185   RG32F   [16, 16, 1] 
10.414   0.182   RG32F   [16, 1, 16] 
10.929   0.194   RG32F   [16, 16, 4] 
10.509   0.207   RG32F   [16, 4, 16] 
10.664   0.188   RG32F   [4, 16, 16] 
10.470   0.189   RG32F   [4, 2, 16] 
10.605   0.179   RG32F   [16, 2, 4] 
10.594   0.178   RG32F   [16, 4, 2] 
10.790   0.199   RG32F   [4, 16, 2] 
10.475   0.183   RG32F   [8, 2, 16] 
10.775   0.189   RG32F   [8, 16, 2] 
10.531   0.229   RG32F   [128, 4, 1] 
10.521   0.182   RG32F   [256, 4, 1] 
10.662   0.176   RG32F   [128, 1, 4] 
10.606   0.182   RG32F   [256, 1, 4] 
10.538   0.206   RG32F   [128, 1, 1] 
10.562   0.207   RG32F   [256, 1, 1] 
10.562   0.204   RG32F   [512, 1, 1] 

10.719   0.177   RGBA16F   [8, 8, 8] 
10.534   0.185   RGBA16F   [32, 32, 1] 
10.364   0.180   RGBA16F   [32, 1, 32] 
10.526   0.189   RGBA16F   [16, 16, 1] 
10.415   0.178   RGBA16F   [16, 1, 16] 
10.874   0.182   RGBA16F   [16, 16, 4] 
10.476   0.182   RGBA16F   [16, 4, 16] 
10.658   0.182   RGBA16F   [4, 16, 16] 
10.467   0.184   RGBA16F   [4, 2, 16] 
10.604   0.188   RGBA16F   [16, 2, 4] 
10.593   0.189   RGBA16F   [16, 4, 2] 
10.756   0.371   RGBA16F   [4, 16, 2] 
10.476   0.181   RGBA16F   [8, 2, 16] 
10.761   0.178   RGBA16F   [8, 16, 2] 
10.516   0.172   RGBA16F   [128, 4, 1] 
10.523   0.186   RGBA16F   [256, 4, 1] 
10.649   0.185   RGBA16F   [128, 1, 4] 
10.618   0.183   RGBA16F   [256, 1, 4] 
10.525   0.201   RGBA16F   [128, 1, 1] 
10.518   0.177   RGBA16F   [256, 1, 1] 
10.526   0.193   RGBA16F   [512, 1, 1] 

20.926   0.073   RGBA32F   [8, 8, 8] 
20.789   0.110   RGBA32F   [32, 32, 1] 
31.603   0.284   RGBA32F   [32, 1, 32] 
20.785   0.117   RGBA32F   [16, 16, 1] 
20.831   0.118   RGBA32F   [16, 1, 16] 
21.077   0.102   RGBA32F   [16, 16, 4] 
20.707   0.098   RGBA32F   [16, 4, 16] 
20.835   0.266   RGBA32F   [4, 16, 16] 
20.837   0.263   RGBA32F   [4, 2, 16] 
20.705   0.086   RGBA32F   [16, 2, 4] 
20.901   0.209   RGBA32F   [16, 4, 2] 
20.890   0.074   RGBA32F   [4, 16, 2] 
20.690   0.085   RGBA32F   [8, 2, 16] 
20.843   0.064   RGBA32F   [8, 16, 2] 
20.704   0.106   RGBA32F   [128, 4, 1] 
20.699   0.083   RGBA32F   [256, 4, 1] 
20.799   0.076   RGBA32F   [128, 1, 4] 
20.912   0.310   RGBA32F   [256, 1, 4] 
20.696   0.187   RGBA32F   [128, 1, 1] 
20.661   0.116   RGBA32F   [256, 1, 1] 
20.702   0.125   RGBA32F   [512, 1, 1] 
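
A copy reads one texel and writes one texel per invocation, so it moves twice the bytes of the write test. A sketch of the corresponding effective bandwidth, assuming the source and destination textures share the same internal format (the setup code is not shown, so this is an assumption):

```python
BYTES_PER_TEXEL = {
    "R16F": 2, "R32F": 4, "RG16F": 4,
    "RG32F": 8, "RGBA16F": 8, "RGBA32F": 16,
}
TEXELS = 512 ** 3

def copy_bandwidth_gbs(fmt, frame_time_ms):
    """(bytes read + bytes written) per second in GB/s, with 1 GB = 1e9 bytes."""
    bytes_moved = 2 * TEXELS * BYTES_PER_TEXEL[fmt]  # one read and one write per texel
    return bytes_moved / (frame_time_ms * 1e-3) / 1e9

print(f"{copy_bandwidth_gbs('RGBA32F', 20.926):.1f}")  # [8, 8, 8]: 205.2
print(f"{copy_bandwidth_gbs('RGBA32F', 31.603):.1f}")  # [32, 1, 32]: 135.9
```

Most RGBA32F shapes land close to the write-only figure, while [32, 1, 32] stands out as a clear outlier.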

3-stencil 1D filter X

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
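	// At x = 0 and x = 511 one neighbor lies outside the texture; texelFetch
	// results are undefined there, so those border outputs are not meaningful.
	vec4 t1 = texelFetch(read_sampler, index-ivec3(1, 0, 0), 0);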
	vec4 t2 = texelFetch(read_sampler, index, 0);
	vec4 t3 = texelFetch(read_sampler, index+ivec3(1, 0, 0), 0);
	imageStore(write_image, index, 0.25*(t1 + 2*t2 + t3));
}
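
For validating the output, a CPU reference of the same [1, 2, 1] / 4 stencil can help. This is a sketch assuming a single-channel volume stored as nested Python lists indexed `volume[z][y][x]`; the name `filter_x_reference` and the choice to copy border texels unchanged are made here and are not part of the original benchmark.

```python
def filter_x_reference(volume):
    """CPU version of the shader's 0.25*(t1 + 2*t2 + t3) stencil along x.

    Texels at x = 0 and x = width - 1 are copied unchanged, because the
    shader's output there depends on undefined out-of-range fetches.
    """
    out = [[row[:] for row in plane] for plane in volume]  # copy, input is not modified
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x in range(1, len(row) - 1):
                out[z][y][x] = 0.25 * (row[x - 1] + 2.0 * row[x] + row[x + 1])
    return out

# A linear ramp is unchanged in the interior; a spike is spread out.
print(filter_x_reference([[[0.0, 1.0, 2.0, 3.0]]]))       # [[[0.0, 1.0, 2.0, 3.0]]]
print(filter_x_reference([[[0.0, 0.0, 4.0, 0.0, 0.0]]]))  # [[[0.0, 1.0, 2.0, 1.0, 0.0]]]
```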

Timings

4.003   0.147   R16F   [8, 8, 8] 
4.327   0.163   R16F   [32, 32, 1] 
4.192   0.163   R16F   [32, 1, 32] 
3.680   0.145   R16F   [16, 16, 1] 
4.170   0.166   R16F   [16, 1, 16] 
4.383   0.161   R16F   [16, 16, 4] 
4.270   0.153   R16F   [16, 4, 16] 
4.336   0.165   R16F   [4, 16, 16] 
3.674   0.332   R16F   [4, 2, 16] 
4.185   0.263   R16F   [16, 2, 4] 
3.748   0.260   R16F   [16, 4, 2] 
3.720   0.164   R16F   [4, 16, 2] 
3.797   0.230   R16F   [8, 2, 16] 
3.841   0.260   R16F   [8, 16, 2] 
3.811   0.272   R16F   [128, 4, 1] 
4.113   0.232   R16F   [256, 4, 1] 
4.218   0.161   R16F   [128, 1, 4] 
4.380   0.151   R16F   [256, 1, 4] 
3.643   0.167   R16F   [128, 1, 1] 
3.728   0.245   R16F   [256, 1, 1] 
3.840   0.167   R16F   [512, 1, 1] 

5.713   0.271   R32F   [8, 8, 8] 
5.544   0.270   R32F   [32, 32, 1] 
6.379   6.625   R32F   [32, 1, 32] 
5.505   0.255   R32F   [16, 16, 1] 
5.969   0.209   R32F   [16, 1, 16] 
5.636   0.204   R32F   [16, 16, 4] 
5.442   0.253   R32F   [16, 4, 16] 
6.047   0.436   R32F   [4, 16, 16] 
5.434   0.284   R32F   [4, 2, 16] 
5.503   0.350   R32F   [16, 2, 4] 
5.509   0.233   R32F   [16, 4, 2] 
5.567   0.251   R32F   [4, 16, 2] 
5.493   0.250   R32F   [8, 2, 16] 
5.572   0.225   R32F   [8, 16, 2] 
5.449   0.246   R32F   [128, 4, 1] 
5.434   0.217   R32F   [256, 4, 1] 
5.505   0.214   R32F   [128, 1, 4] 
5.492   0.196   R32F   [256, 1, 4] 
5.443   0.198   R32F   [128, 1, 1] 
5.440   0.250   R32F   [256, 1, 1] 
5.438   0.229   R32F   [512, 1, 1] 

5.658   0.244   RG16F   [8, 8, 8] 
5.468   0.204   RG16F   [32, 32, 1] 
5.785   0.202   RG16F   [32, 1, 32] 
5.381   0.176   RG16F   [16, 16, 1] 
5.954   0.175   RG16F   [16, 1, 16] 
5.599   0.192   RG16F   [16, 16, 4] 
5.384   0.185   RG16F   [16, 4, 16] 
5.954   0.173   RG16F   [4, 16, 16] 
5.348   0.177   RG16F   [4, 2, 16] 
5.467   0.182   RG16F   [16, 2, 4] 
5.451   0.176   RG16F   [16, 4, 2] 
5.507   0.178   RG16F   [4, 16, 2] 
5.408   0.185   RG16F   [8, 2, 16] 
5.501   0.177   RG16F   [8, 16, 2] 
5.395   0.191   RG16F   [128, 4, 1] 
5.454   0.216   RG16F   [256, 4, 1] 
5.490   0.211   RG16F   [128, 1, 4] 
5.482   0.199   RG16F   [256, 1, 4] 
5.389   0.187   RG16F   [128, 1, 1] 
5.387   0.189   RG16F   [256, 1, 1] 
5.394   0.263   RG16F   [512, 1, 1] 

10.817   0.175   RG32F   [8, 8, 8] 
10.584   0.234   RG32F   [32, 32, 1] 
10.287   0.206   RG32F   [32, 1, 32] 
10.533   0.346   RG32F   [16, 16, 1] 
10.449   0.251   RG32F   [16, 1, 16] 
11.136   0.287   RG32F   [16, 16, 4] 
10.482   0.198   RG32F   [16, 4, 16] 
10.730   0.303   RG32F   [4, 16, 16] 
10.337   0.220   RG32F   [4, 2, 16] 
10.630   0.231   RG32F   [16, 2, 4] 
10.576   0.194   RG32F   [16, 4, 2] 
10.783   0.214   RG32F   [4, 16, 2] 
10.414   0.243   RG32F   [8, 2, 16] 
10.794   0.208   RG32F   [8, 16, 2] 
10.558   0.232   RG32F   [128, 4, 1] 
10.537   0.208   RG32F   [256, 4, 1] 
10.672   0.203   RG32F   [128, 1, 4] 
10.659   0.210   RG32F   [256, 1, 4] 
10.512   0.194   RG32F   [128, 1, 1] 
10.523   0.210   RG32F   [256, 1, 1] 
10.527   0.216   RG32F   [512, 1, 1] 

10.812   0.180   RGBA16F   [8, 8, 8] 
10.570   0.189   RGBA16F   [32, 32, 1] 
10.312   0.253   RGBA16F   [32, 1, 32] 
10.510   0.210   RGBA16F   [16, 16, 1] 
10.415   0.196   RGBA16F   [16, 1, 16] 
11.011   0.210   RGBA16F   [16, 16, 4] 
10.460   0.183   RGBA16F   [16, 4, 16] 
10.655   0.186   RGBA16F   [4, 16, 16] 
10.374   0.216   RGBA16F   [4, 2, 16] 
10.616   0.208   RGBA16F   [16, 2, 4] 
10.605   0.197   RGBA16F   [16, 4, 2] 
10.788   0.191   RGBA16F   [4, 16, 2] 
10.445   0.209   RGBA16F   [8, 2, 16] 
10.786   0.186   RGBA16F   [8, 16, 2] 
10.500   0.213   RGBA16F   [128, 4, 1] 
10.502   0.188   RGBA16F   [256, 4, 1] 
10.680   0.189   RGBA16F   [128, 1, 4] 
10.640   0.179   RGBA16F   [256, 1, 4] 
10.506   0.214   RGBA16F   [128, 1, 1] 
10.504   0.196   RGBA16F   [256, 1, 1] 
10.505   0.194   RGBA16F   [512, 1, 1] 

20.974   0.115   RGBA32F   [8, 8, 8] 
20.833   0.073   RGBA32F   [32, 32, 1] 
30.154   0.122   RGBA32F   [32, 1, 32] 
20.728   0.093   RGBA32F   [16, 16, 1] 
20.582   0.086   RGBA32F   [16, 1, 16] 
21.177   0.143   RGBA32F   [16, 16, 4] 
20.817   0.145   RGBA32F   [16, 4, 16] 
20.885   0.109   RGBA32F   [4, 16, 16] 
20.650   0.107   RGBA32F   [4, 2, 16] 
20.722   0.103   RGBA32F   [16, 2, 4] 
20.828   0.072   RGBA32F   [16, 4, 2] 
20.781   0.177   RGBA32F   [4, 16, 2] 
20.519   0.155   RGBA32F   [8, 2, 16] 
20.689   0.076   RGBA32F   [8, 16, 2] 
20.660   0.143   RGBA32F   [128, 4, 1] 
20.610   0.106   RGBA32F   [256, 4, 1] 
20.821   0.099   RGBA32F   [128, 1, 4] 
20.838   0.132   RGBA32F   [256, 1, 4] 
20.699   0.131   RGBA32F   [128, 1, 1] 
20.604   0.098   RGBA32F   [256, 1, 1] 
20.625   0.474   RGBA32F   [512, 1, 1] 

3-stencil 1D filter Y

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	vec4 t1 = texelFetch(read_sampler, index-ivec3(0, 1, 0), 0);
	vec4 t2 = texelFetch(read_sampler, index, 0);
	vec4 t3 = texelFetch(read_sampler, index+ivec3(0, 1, 0), 0);
	imageStore(write_image, index, 0.25*(t1 + 2*t2 + t3));
}
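
One way to see why the filter axis can matter is the distance in memory between a texel and its neighbor. The sketch below assumes a naive row-major layout with x fastest. Real GPUs generally store textures in tiled layouts whose details are driver-specific, and those layouts keep 3D neighbors much closer than this, so the numbers are for illustration only.

```python
WIDTH = HEIGHT = 512

def linear_offset_bytes(x, y, z, bytes_per_texel, width=WIDTH, height=HEIGHT):
    """Byte offset of texel (x, y, z) in a hypothetical row-major (x fastest) layout."""
    return ((z * height + y) * width + x) * bytes_per_texel

def neighbor_distance_bytes(axis, bytes_per_texel):
    """Bytes between texel (0, 0, 0) and its +1 neighbor along axis 0 (x), 1 (y) or 2 (z)."""
    step = [(1, 0, 0), (0, 1, 0), (0, 0, 1)][axis]
    return linear_offset_bytes(*step, bytes_per_texel)

for axis, name in enumerate("xyz"):
    print(name, neighbor_distance_bytes(axis, 16))  # RGBA32F: 16, 8192, 4194304
```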

Timings

4.079   0.165   R16F   [8, 8, 8] 
4.338   0.154   R16F   [32, 32, 1] 
4.209   0.163   R16F   [32, 1, 32] 
3.726   0.160   R16F   [16, 16, 1] 
4.210   0.149   R16F   [16, 1, 16] 
4.437   0.151   R16F   [16, 16, 4] 
4.377   0.158   R16F   [16, 4, 16] 
4.461   0.169   R16F   [4, 16, 16] 
4.174   0.156   R16F   [4, 2, 16] 
4.093   0.144   R16F   [16, 2, 4] 
3.729   0.154   R16F   [16, 4, 2] 
3.802   0.149   R16F   [4, 16, 2] 
4.198   0.146   R16F   [8, 2, 16] 
3.852   0.158   R16F   [8, 16, 2] 
3.847   0.161   R16F   [128, 4, 1] 
4.195   0.171   R16F   [256, 4, 1] 
4.259   0.155   R16F   [128, 1, 4] 
4.950   6.184   R16F   [256, 1, 4] 
3.699   0.156   R16F   [128, 1, 1] 
3.759   0.152   R16F   [256, 1, 1] 
3.941   0.166   R16F   [512, 1, 1] 

5.489   0.203   R32F   [8, 8, 8] 
5.476   0.183   R32F   [32, 32, 1] 
5.430   0.184   R32F   [32, 1, 32] 
5.419   0.186   R32F   [16, 16, 1] 
5.736   0.178   R32F   [16, 1, 16] 
5.672   0.188   R32F   [16, 16, 4] 
5.417   0.173   R32F   [16, 4, 16] 
5.774   0.204   R32F   [4, 16, 16] 
5.515   0.183   R32F   [4, 2, 16] 
5.504   0.176   R32F   [16, 2, 4] 
5.499   0.183   R32F   [16, 4, 2] 
5.564   0.175   R32F   [4, 16, 2] 
5.408   0.177   R32F   [8, 2, 16] 
5.532   0.176   R32F   [8, 16, 2] 
5.416   0.185   R32F   [128, 4, 1] 
5.460   0.164   R32F   [256, 4, 1] 
5.513   0.246   R32F   [128, 1, 4] 
5.498   0.177   R32F   [256, 1, 4] 
5.409   0.158   R32F   [128, 1, 1] 
5.408   0.173   R32F   [256, 1, 1] 
5.420   0.176   R32F   [512, 1, 1] 

5.476   0.182   RG16F   [8, 8, 8] 
5.471   0.188   RG16F   [32, 32, 1] 
5.445   0.766   RG16F   [32, 1, 32] 
5.407   0.199   RG16F   [16, 16, 1] 
5.763   0.269   RG16F   [16, 1, 16] 
5.676   0.259   RG16F   [16, 16, 4] 
5.401   0.196   RG16F   [16, 4, 16] 
5.772   0.223   RG16F   [4, 16, 16] 
5.528   0.242   RG16F   [4, 2, 16] 
5.497   0.182   RG16F   [16, 2, 4] 
5.488   0.200   RG16F   [16, 4, 2] 
5.657   0.332   RG16F   [4, 16, 2] 
5.411   0.223   RG16F   [8, 2, 16] 
5.536   0.195   RG16F   [8, 16, 2] 
5.422   0.184   RG16F   [128, 4, 1] 
5.453   0.202   RG16F   [256, 4, 1] 
5.485   0.203   RG16F   [128, 1, 4] 
5.499   0.197   RG16F   [256, 1, 4] 
5.408   0.196   RG16F   [128, 1, 1] 
5.409   0.206   RG16F   [256, 1, 1] 
5.417   0.208   RG16F   [512, 1, 1] 

10.699   0.240   RG32F   [8, 8, 8] 
10.565   0.218   RG32F   [32, 32, 1] 
12.087   0.210   RG32F   [32, 1, 32] 
10.550   0.219   RG32F   [16, 16, 1] 
10.539   0.282   RG32F   [16, 1, 16] 
10.772   0.205   RG32F   [16, 16, 4] 
10.806   0.305   RG32F   [16, 4, 16] 
12.837   0.192   RG32F   [4, 16, 16] 
10.520   0.263   RG32F   [4, 2, 16] 
10.684   0.246   RG32F   [16, 2, 4] 
10.611   0.213   RG32F   [16, 4, 2] 
10.811   0.430   RG32F   [4, 16, 2] 
10.533   0.249   RG32F   [8, 2, 16] 
10.638   0.215   RG32F   [8, 16, 2] 
10.541   0.206   RG32F   [128, 4, 1] 
10.543   0.263   RG32F   [256, 4, 1] 
10.709   0.199   RG32F   [128, 1, 4] 
10.661   0.260   RG32F   [256, 1, 4] 
10.533   0.308   RG32F   [128, 1, 1] 
10.549   0.267   RG32F   [256, 1, 1] 
10.572   0.222   RG32F   [512, 1, 1] 

10.697   0.214   RGBA16F   [8, 8, 8] 
10.617   0.283   RGBA16F   [32, 32, 1] 
12.152   0.233   RGBA16F   [32, 1, 32] 
10.515   0.213   RGBA16F   [16, 16, 1] 
10.517   0.274   RGBA16F   [16, 1, 16] 
10.746   0.196   RGBA16F   [16, 16, 4] 
10.753   0.207   RGBA16F   [16, 4, 16] 
12.835   0.207   RGBA16F   [4, 16, 16] 
10.519   0.257   RGBA16F   [4, 2, 16] 
10.663   0.192   RGBA16F   [16, 2, 4] 
10.627   0.203   RGBA16F   [16, 4, 2] 
10.729   0.302   RGBA16F   [4, 16, 2] 
10.592   0.310   RGBA16F   [8, 2, 16] 
10.729   0.261   RGBA16F   [8, 16, 2] 
10.568   0.260   RGBA16F   [128, 4, 1] 
10.582   0.225   RGBA16F   [256, 4, 1] 
10.746   0.266   RGBA16F   [128, 1, 4] 
10.643   0.211   RGBA16F   [256, 1, 4] 
10.547   0.243   RGBA16F   [128, 1, 1] 
10.513   0.195   RGBA16F   [256, 1, 1] 
10.571   0.261   RGBA16F   [512, 1, 1] 

23.959   0.150   RGBA32F   [8, 8, 8] 
20.694   0.109   RGBA32F   [32, 32, 1] 
57.408   0.156   RGBA32F   [32, 1, 32] 
20.672   0.087   RGBA32F   [16, 16, 1] 
24.954   0.125   RGBA32F   [16, 1, 16] 
21.663   0.163   RGBA32F   [16, 16, 4] 
34.343   0.135   RGBA32F   [16, 4, 16] 
25.197   0.198   RGBA32F   [4, 16, 16] 
28.447   0.141   RGBA32F   [4, 2, 16] 
20.743   0.228   RGBA32F   [16, 2, 4] 
20.786   0.072   RGBA32F   [16, 4, 2] 
20.714   0.060   RGBA32F   [4, 16, 2] 
28.122   0.094   RGBA32F   [8, 2, 16] 
20.760   0.073   RGBA32F   [8, 16, 2] 
20.731   0.085   RGBA32F   [128, 4, 1] 
20.739   0.069   RGBA32F   [256, 4, 1] 
20.926   0.073   RGBA32F   [128, 1, 4] 
20.871   0.080   RGBA32F   [256, 1, 4] 
20.672   0.100   RGBA32F   [128, 1, 1] 
20.686   0.074   RGBA32F   [256, 1, 1] 
20.704   0.069   RGBA32F   [512, 1, 1] 

3-stencil 1D filter Z

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	vec4 t1 = texelFetch(read_sampler, index-ivec3(0, 0, 1), 0);
	vec4 t2 = texelFetch(read_sampler, index, 0);
	vec4 t3 = texelFetch(read_sampler, index+ivec3(0, 0, 1), 0);
	imageStore(write_image, index, 0.25*(t1 + 2*t2 + t3));
}

Timings

4.167   0.165   R16F   [8, 8, 8] 
4.688   0.188   R16F   [32, 32, 1] 
4.184   0.150   R16F   [32, 1, 32] 
4.225   0.208   R16F   [16, 16, 1] 
4.111   0.148   R16F   [16, 1, 16] 
4.664   0.147   R16F   [16, 16, 4] 
4.366   0.140   R16F   [16, 4, 16] 
4.447   0.146   R16F   [4, 16, 16] 
3.800   0.144   R16F   [4, 2, 16] 
4.220   0.133   R16F   [16, 2, 4] 
4.266   0.156   R16F   [16, 4, 2] 
4.310   0.165   R16F   [4, 16, 2] 
3.945   0.148   R16F   [8, 2, 16] 
4.383   0.175   R16F   [8, 16, 2] 
4.405   0.212   R16F   [128, 4, 1] 
4.705   0.176   R16F   [256, 4, 1] 
4.341   0.163   R16F   [128, 1, 4] 
4.601   0.177   R16F   [256, 1, 4] 
4.183   0.181   R16F   [128, 1, 1] 
4.233   0.193   R16F   [256, 1, 1] 
4.358   0.173   R16F   [512, 1, 1] 

6.155    0.195   R32F   [8, 8, 8] 
10.654   6.695   R32F   [32, 32, 1] 
5.712    0.182   R32F   [32, 1, 32] 
10.185   0.192   R32F   [16, 16, 1] 
6.187    0.184   R32F   [16, 1, 16] 
6.740    0.196   R32F   [16, 16, 4] 
5.676    0.200   R32F   [16, 4, 16] 
6.160    0.258   R32F   [4, 16, 16] 
5.753    0.219   R32F   [4, 2, 16] 
6.676    0.208   R32F   [16, 2, 4] 
7.772    0.207   R32F   [16, 4, 2] 
7.742    0.185   R32F   [4, 16, 2] 
5.659    0.172   R32F   [8, 2, 16] 
7.745    0.167   R32F   [8, 16, 2] 
10.152   0.167   R32F   [128, 4, 1] 
10.158   0.175   R32F   [256, 4, 1] 
6.618    0.182   R32F   [128, 1, 4] 
6.620    0.175   R32F   [256, 1, 4] 
10.193   0.173   R32F   [128, 1, 1] 
10.204   0.178   R32F   [256, 1, 1] 
10.216   0.180   R32F   [512, 1, 1] 

6.136    0.181   RG16F   [8, 8, 8] 
10.212   0.177   RG16F   [32, 32, 1] 
5.693    0.162   RG16F   [32, 1, 32] 
10.154   0.181   RG16F   [16, 16, 1] 
6.180    0.169   RG16F   [16, 1, 16] 
6.724    0.185   RG16F   [16, 16, 4] 
5.662    0.183   RG16F   [16, 4, 16] 
6.089    0.187   RG16F   [4, 16, 16] 
5.686    0.185   RG16F   [4, 2, 16] 
6.620    0.179   RG16F   [16, 2, 4] 
7.719    0.194   RG16F   [16, 4, 2] 
7.751    0.181   RG16F   [4, 16, 2] 
5.687    0.204   RG16F   [8, 2, 16] 
7.760    0.194   RG16F   [8, 16, 2] 
10.156   0.178   RG16F   [128, 4, 1] 
10.166   0.181   RG16F   [256, 4, 1] 
6.624    0.188   RG16F   [128, 1, 4] 
6.624    0.185   RG16F   [256, 1, 4] 
10.299   0.218   RG16F   [128, 1, 1] 
10.266   0.200   RG16F   [256, 1, 1] 
10.227   0.183   RG16F   [512, 1, 1] 

11.890   0.172   RG32F   [8, 8, 8] 
20.151   0.057   RG32F   [32, 32, 1] 
11.462   0.189   RG32F   [32, 1, 32] 
20.045   0.044   RG32F   [16, 16, 1] 
11.589   0.187   RG32F   [16, 1, 16] 
13.064   0.156   RG32F   [16, 16, 4] 
11.018   0.176   RG32F   [16, 4, 16] 
11.176   0.803   RG32F   [4, 16, 16] 
10.986   0.188   RG32F   [4, 2, 16] 
12.855   0.159   RG32F   [16, 2, 4] 
15.148   0.124   RG32F   [16, 4, 2] 
15.235   0.137   RG32F   [4, 16, 2] 
11.072   0.188   RG32F   [8, 2, 16] 
15.298   0.128   RG32F   [8, 16, 2] 
20.046   0.080   RG32F   [128, 4, 1] 
19.978   0.043   RG32F   [256, 4, 1] 
12.891   0.149   RG32F   [128, 1, 4] 
12.853   0.156   RG32F   [256, 1, 4] 
20.136   0.070   RG32F   [128, 1, 1] 
20.132   0.073   RG32F   [256, 1, 1] 
20.083   0.057   RG32F   [512, 1, 1] 

11.919   0.192   RGBA16F   [8, 8, 8] 
20.137   0.047   RGBA16F   [32, 32, 1] 
11.386   0.179   RGBA16F   [32, 1, 32] 
20.005   0.084   RGBA16F   [16, 16, 1] 
11.558   0.189   RGBA16F   [16, 1, 16] 
13.074   0.160   RGBA16F   [16, 16, 4] 
11.014   0.175   RGBA16F   [16, 4, 16] 
11.194   0.164   RGBA16F   [4, 16, 16] 
11.006   0.198   RGBA16F   [4, 2, 16] 
12.883   0.167   RGBA16F   [16, 2, 4] 
15.135   0.112   RGBA16F   [16, 4, 2] 
15.184   0.105   RGBA16F   [4, 16, 2] 
11.035   0.186   RGBA16F   [8, 2, 16] 
15.227   0.120   RGBA16F   [8, 16, 2] 
20.003   0.054   RGBA16F   [128, 4, 1] 
20.018   0.072   RGBA16F   [256, 4, 1] 
12.938   0.169   RGBA16F   [128, 1, 4] 
12.893   0.164   RGBA16F   [256, 1, 4] 
20.024   0.049   RGBA16F   [128, 1, 1] 
20.062   0.083   RGBA16F   [256, 1, 1] 
20.088   0.071   RGBA16F   [512, 1, 1] 

23.258   0.081   RGBA32F   [8, 8, 8] 
39.570   0.086   RGBA32F   [32, 32, 1] 
34.507   0.144   RGBA32F   [32, 1, 32] 
39.456   0.074   RGBA32F   [16, 16, 1] 
21.960   0.053   RGBA32F   [16, 1, 16] 
25.446   0.211   RGBA32F   [16, 16, 4] 
21.855   0.152   RGBA32F   [16, 4, 16] 
21.957   0.187   RGBA32F   [4, 16, 16] 
22.197   0.153   RGBA32F   [4, 2, 16] 
25.247   0.120   RGBA32F   [16, 2, 4] 
29.737   0.219   RGBA32F   [16, 4, 2] 
29.744   0.062   RGBA32F   [4, 16, 2] 
21.855   0.082   RGBA32F   [8, 2, 16] 
29.758   0.067   RGBA32F   [8, 16, 2] 
39.328   0.074   RGBA32F   [128, 4, 1] 
39.696   0.090   RGBA32F   [256, 4, 1] 
25.533   0.085   RGBA32F   [128, 1, 4] 
25.449   0.089   RGBA32F   [256, 1, 4] 
39.519   0.107   RGBA32F   [128, 1, 1] 
39.498   0.076   RGBA32F   [256, 1, 1] 
39.499   0.099   RGBA32F   [512, 1, 1] 
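
A simple way to read these Z-filter results is the number of distinct texels one work group reads per texel it writes. For a 3-point stencil along z, the group's box grows by one texel on each side in z. This sketch ignores reuse through caches shared between work groups, so it is only a crude model. Even so, the RGBA32F rows above group by this ratio: 3.0 gives about 39.3 to 39.7 ms, 2.0 about 29.7 ms, 1.5 about 25.2 to 25.5 ms, 1.25 ([8, 8, 8]) 23.3 ms, and 1.125 about 21.9 to 22.2 ms. The one clear exception is [32, 1, 32], with a ratio of 1.0625 but 34.5 ms.

```python
def stencil_footprint(local_size, axis, radius=1):
    """Distinct texels one work group reads for a (2*radius + 1)-point stencil along `axis`."""
    dims = list(local_size)
    dims[axis] += 2 * radius  # the group's box grows by `radius` texels on both sides
    return dims[0] * dims[1] * dims[2]

def overfetch_ratio(local_size, axis, radius=1):
    """Texels read per texel written, ignoring reuse between work groups."""
    x, y, z = local_size
    return stencil_footprint(local_size, axis, radius) / (x * y * z)

for size in [(32, 32, 1), (16, 4, 2), (16, 16, 4), (8, 8, 8), (16, 1, 16)]:
    print(size, overfetch_ratio(size, axis=2))  # 3.0, 2.0, 1.5, 1.25, 1.125
```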

5-stencil 2D filter

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	vec4 t1 = texelFetch(read_sampler, index-ivec3(0, 1, 0), 0);
	vec4 t2 = texelFetch(read_sampler, index-ivec3(1, 0, 0), 0);
	vec4 t3 = texelFetch(read_sampler, index, 0);
	vec4 t4 = texelFetch(read_sampler, index+ivec3(1, 0, 0), 0);
	vec4 t5 = texelFetch(read_sampler, index+ivec3(0, 1, 0), 0);
	imageStore(write_image, index, 0.125*(t1 + t2 + 4*t3 + t4 + t5));
}
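
For a cross-shaped stencil the per-group footprint no longer has a simple box shape, but it is easy to count by brute force. A sketch with a hypothetical `CROSS_5` offset list matching the shader's five fetches. The per-group ratio alone does not explain the timings that follow: for RGBA32F, [16, 1, 16] (ratio 3.125) runs in 22.857 ms while [32, 1, 32] (ratio 3.0625) takes 55.123 ms, so reuse through caches shared between work groups presumably matters too.

```python
from itertools import product

# Offsets fetched by the 5-stencil shader, as (dx, dy, dz).
CROSS_5 = [(0, -1, 0), (-1, 0, 0), (0, 0, 0), (1, 0, 0), (0, 1, 0)]

def unique_texels_read(local_size, offsets=CROSS_5):
    """Distinct texel coordinates fetched by one work group (brute force)."""
    lx, ly, lz = local_size
    touched = set()
    for x, y, z in product(range(lx), range(ly), range(lz)):
        for dx, dy, dz in offsets:
            touched.add((x + dx, y + dy, z + dz))
    return len(touched)

def overfetch_ratio(local_size, offsets=CROSS_5):
    """Texels read per texel written, ignoring reuse between work groups."""
    lx, ly, lz = local_size
    return unique_texels_read(local_size, offsets) / (lx * ly * lz)

for size in [(16, 16, 1), (8, 8, 8), (32, 1, 32), (16, 1, 16)]:
    print(size, overfetch_ratio(size))  # 1.25, 1.5, 3.0625, 3.125
```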

Timings

5.104   0.158   R16F   [8, 8, 8] 
5.452   0.164   R16F   [32, 32, 1] 
5.334   0.154   R16F   [32, 1, 32] 
4.813   0.152   R16F   [16, 16, 1] 
5.121   0.164   R16F   [16, 1, 16] 
5.664   0.157   R16F   [16, 16, 4] 
5.739   0.156   R16F   [16, 4, 16] 
5.706   0.150   R16F   [4, 16, 16] 
4.997   0.144   R16F   [4, 2, 16] 
5.037   0.155   R16F   [16, 2, 4] 
4.861   0.152   R16F   [16, 4, 2] 
4.886   0.149   R16F   [4, 16, 2] 
4.998   0.154   R16F   [8, 2, 16] 
4.906   0.150   R16F   [8, 16, 2] 
4.960   0.151   R16F   [128, 4, 1] 
5.264   0.168   R16F   [256, 4, 1] 
5.155   0.156   R16F   [128, 1, 4] 
5.484   0.168   R16F   [256, 1, 4] 
4.761   0.148   R16F   [128, 1, 1] 
4.791   0.164   R16F   [256, 1, 1] 
4.992   0.170   R16F   [512, 1, 1] 

6.137   0.168   R32F   [8, 8, 8] 
6.512   0.184   R32F   [32, 32, 1] 
6.382   0.180   R32F   [32, 1, 32] 
5.851   0.180   R32F   [16, 16, 1] 
6.826   0.199   R32F   [16, 1, 16] 
6.664   0.176   R32F   [16, 16, 4] 
6.219   0.183   R32F   [16, 4, 16] 
7.171   0.187   R32F   [4, 16, 16] 
6.213   0.175   R32F   [4, 2, 16] 
5.818   0.166   R32F   [16, 2, 4] 
5.878   0.172   R32F   [16, 4, 2] 
6.141   0.180   R32F   [4, 16, 2] 
6.157   0.178   R32F   [8, 2, 16] 
6.084   0.175   R32F   [8, 16, 2] 
6.028   0.171   R32F   [128, 4, 1] 
6.429   0.174   R32F   [256, 4, 1] 
6.031   0.173   R32F   [128, 1, 4] 
6.309   0.181   R32F   [256, 1, 4] 
5.972   0.172   R32F   [128, 1, 1] 
6.012   0.168   R32F   [256, 1, 1] 
6.220   0.172   R32F   [512, 1, 1] 

6.131   0.174   RG16F   [8, 8, 8] 
6.506   0.188   RG16F   [32, 32, 1] 
6.388   0.178   RG16F   [32, 1, 32] 
5.850   0.181   RG16F   [16, 16, 1] 
6.827   0.189   RG16F   [16, 1, 16] 
6.667   0.185   RG16F   [16, 16, 4] 
6.224   0.180   RG16F   [16, 4, 16] 
7.172   0.180   RG16F   [4, 16, 16] 
6.212   0.169   RG16F   [4, 2, 16] 
5.817   0.174   RG16F   [16, 2, 4] 
5.879   0.176   RG16F   [16, 4, 2] 
6.150   0.179   RG16F   [4, 16, 2] 
6.163   0.172   RG16F   [8, 2, 16] 
6.091   0.196   RG16F   [8, 16, 2] 
6.028   0.186   RG16F   [128, 4, 1] 
6.434   0.176   RG16F   [256, 4, 1] 
6.034   0.180   RG16F   [128, 1, 4] 
6.309   0.173   RG16F   [256, 1, 4] 
5.975   0.179   RG16F   [128, 1, 1] 
6.012   0.171   RG16F   [256, 1, 1] 
6.237   0.185   RG16F   [512, 1, 1] 

10.669   0.191   RG32F   [8, 8, 8] 
10.685   0.190   RG32F   [32, 32, 1] 
11.355   0.186   RG32F   [32, 1, 32] 
10.548   0.184   RG32F   [16, 16, 1] 
11.391   0.172   RG32F   [16, 1, 16] 
11.129   0.175   RG32F   [16, 16, 4] 
10.524   0.184   RG32F   [16, 4, 16] 
12.825   0.210   RG32F   [4, 16, 16] 
10.668   0.198   RG32F   [4, 2, 16] 
10.590   0.185   RG32F   [16, 2, 4] 
10.584   0.187   RG32F   [16, 4, 2] 
10.677   0.186   RG32F   [4, 16, 2] 
10.508   0.192   RG32F   [8, 2, 16] 
10.716   0.184   RG32F   [8, 16, 2] 
10.545   0.189   RG32F   [128, 4, 1] 
10.533   0.182   RG32F   [256, 4, 1] 
10.668   0.174   RG32F   [128, 1, 4] 
10.608   0.202   RG32F   [256, 1, 4] 
10.531   0.185   RG32F   [128, 1, 1] 
10.528   0.180   RG32F   [256, 1, 1] 
10.516   0.183   RG32F   [512, 1, 1] 

10.661   0.183   RGBA16F   [8, 8, 8] 
10.675   0.198   RGBA16F   [32, 32, 1] 
11.355   0.202   RGBA16F   [32, 1, 32] 
10.548   0.189   RGBA16F   [16, 16, 1] 
11.382   0.174   RGBA16F   [16, 1, 16] 
11.129   0.179   RGBA16F   [16, 16, 4] 
10.528   0.205   RGBA16F   [16, 4, 16] 
12.825   0.153   RGBA16F   [4, 16, 16] 
10.667   0.187   RGBA16F   [4, 2, 16] 
10.590   0.200   RGBA16F   [16, 2, 4] 
10.585   0.180   RGBA16F   [16, 4, 2] 
10.678   0.196   RGBA16F   [4, 16, 2] 
10.506   0.189   RGBA16F   [8, 2, 16] 
10.709   0.188   RGBA16F   [8, 16, 2] 
10.542   0.194   RGBA16F   [128, 4, 1] 
10.532   0.188   RGBA16F   [256, 4, 1] 
10.664   0.190   RGBA16F   [128, 1, 4] 
10.607   0.180   RGBA16F   [256, 1, 4] 
10.534   0.178   RGBA16F   [128, 1, 1] 
10.519   0.188   RGBA16F   [256, 1, 1] 
10.519   0.196   RGBA16F   [512, 1, 1] 

23.525   0.068   RGBA32F   [8, 8, 8] 
20.762   0.057   RGBA32F   [32, 32, 1] 
55.123   0.086   RGBA32F   [32, 1, 32] 
20.667   0.099   RGBA32F   [16, 16, 1] 
22.857   0.067   RGBA32F   [16, 1, 16] 
21.478   0.063   RGBA32F   [16, 16, 4] 
33.865   0.088   RGBA32F   [16, 4, 16] 
25.081   0.052   RGBA32F   [4, 16, 16] 
26.124   0.088   RGBA32F   [4, 2, 16] 
20.719   0.065   RGBA32F   [16, 2, 4] 
20.753   0.047   RGBA32F   [16, 4, 2] 
20.654   0.059   RGBA32F   [4, 16, 2] 
26.513   0.099   RGBA32F   [8, 2, 16] 
20.726   0.051   RGBA32F   [8, 16, 2] 
20.635   0.053   RGBA32F   [128, 4, 1] 
20.632   0.057   RGBA32F   [256, 4, 1] 
20.902   0.065   RGBA32F   [128, 1, 4] 
23.612   0.068   RGBA32F   [256, 1, 4] 
20.714   0.051   RGBA32F   [128, 1, 1] 
20.798   0.046   RGBA32F   [256, 1, 1] 
24.009   0.031   RGBA32F   [512, 1, 1] 

9-stencil 2D filter

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	vec4 t1 = texelFetch(read_sampler, index+ivec3(-1, -1, 0), 0);
	vec4 t2 = texelFetch(read_sampler, index+ivec3( 0, -1, 0), 0);
	vec4 t3 = texelFetch(read_sampler, index+ivec3(+1, -1, 0), 0);
	vec4 t4 = texelFetch(read_sampler, index+ivec3(-1,  0, 0), 0);
	vec4 t5 = texelFetch(read_sampler, index+ivec3( 0,  0, 0), 0);
	vec4 t6 = texelFetch(read_sampler, index+ivec3(+1,  0, 0), 0);
	vec4 t7 = texelFetch(read_sampler, index+ivec3(-1, +1, 0), 0);
	vec4 t8 = texelFetch(read_sampler, index+ivec3( 0, +1, 0), 0);
	vec4 t9 = texelFetch(read_sampler, index+ivec3(+1, +1, 0), 0);
	
	// 3x3 binomial kernel in the xy-plane ([1 2 1] outer product with itself); the weights sum to 16
	imageStore(write_image, index, (1.0/16.0)*(t1 + 2*t2 + t3 + 2*t4 + 4*t5 + 2*t6 + t7 + 2*t8 + t9));
}

Timings

7.612   0.183   R16F   [8, 8, 8] 
7.798   0.190   R16F   [32, 32, 1] 
7.812   0.171   R16F   [32, 1, 32] 
7.331   0.171   R16F   [16, 16, 1] 
7.333   0.168   R16F   [16, 1, 16] 
7.890   0.182   R16F   [16, 16, 4] 
7.478   0.166   R16F   [16, 4, 16] 
7.861   0.185   R16F   [4, 16, 16] 
7.265   0.182   R16F   [4, 2, 16] 
7.329   0.172   R16F   [16, 2, 4] 
7.289   0.170   R16F   [16, 4, 2] 
7.327   0.190   R16F   [4, 16, 2] 
7.355   0.198   R16F   [8, 2, 16] 
7.445   0.219   R16F   [8, 16, 2] 
7.601   0.229   R16F   [128, 4, 1] 
7.860   0.215   R16F   [256, 4, 1] 
7.648   0.221   R16F   [128, 1, 4] 
7.858   0.193   R16F   [256, 1, 4] 
7.222   0.195   R16F   [128, 1, 1] 
7.322   0.203   R16F   [256, 1, 1] 
7.585   0.173   R16F   [512, 1, 1] 

8.426   0.184   R32F   [8, 8, 8] 
8.720   0.284   R32F   [32, 32, 1] 
8.240   0.227   R32F   [32, 1, 32] 
8.703   0.195   R32F   [16, 16, 1] 
8.322   0.264   R32F   [16, 1, 16] 
8.721   0.227   R32F   [16, 16, 4] 
8.334   0.230   R32F   [16, 4, 16] 
9.333   0.197   R32F   [4, 16, 16] 
8.538   0.183   R32F   [4, 2, 16] 
8.160   0.201   R32F   [16, 2, 4] 
8.633   0.294   R32F   [16, 4, 2] 
8.577   0.241   R32F   [4, 16, 2] 
8.471   0.261   R32F   [8, 2, 16] 
8.558   0.231   R32F   [8, 16, 2] 
8.625   0.163   R32F   [128, 4, 1] 
8.710   0.272   R32F   [256, 4, 1] 
8.515   0.223   R32F   [128, 1, 4] 
8.527   0.194   R32F   [256, 1, 4] 
8.676   0.308   R32F   [128, 1, 1] 
8.662   0.171   R32F   [256, 1, 1] 
8.646   0.191   R32F   [512, 1, 1] 

8.403   0.175   RG16F   [8, 8, 8] 
8.692   0.202   RG16F   [32, 32, 1] 
8.186   0.170   RG16F   [32, 1, 32] 
8.742   0.259   RG16F   [16, 16, 1] 
8.329   0.220   RG16F   [16, 1, 16] 
8.746   0.273   RG16F   [16, 16, 4] 
8.248   0.196   RG16F   [16, 4, 16] 
9.294   0.157   RG16F   [4, 16, 16] 
8.565   0.201   RG16F   [4, 2, 16] 
8.189   0.213   RG16F   [16, 2, 4] 
8.667   0.253   RG16F   [16, 4, 2] 
8.592   0.283   RG16F   [4, 16, 2] 
8.397   0.186   RG16F   [8, 2, 16] 
8.552   0.234   RG16F   [8, 16, 2] 
8.671   0.193   RG16F   [128, 4, 1] 
8.698   0.199   RG16F   [256, 4, 1] 
8.519   0.227   RG16F   [128, 1, 4] 
8.475   0.161   RG16F   [256, 1, 4] 
8.635   0.151   RG16F   [128, 1, 1] 
8.637   0.166   RG16F   [256, 1, 1] 
8.701   0.223   RG16F   [512, 1, 1] 

11.206   0.149   RG32F   [8, 8, 8] 
11.417   0.169   RG32F   [32, 32, 1] 
12.442   0.227   RG32F   [32, 1, 32] 
11.812   0.202   RG32F   [16, 16, 1] 
13.048   0.223   RG32F   [16, 1, 16] 
12.869   0.186   RG32F   [16, 16, 4] 
11.033   0.212   RG32F   [16, 4, 16] 
13.954   0.225   RG32F   [4, 16, 16] 
13.155   0.212   RG32F   [4, 2, 16] 
11.075   0.186   RG32F   [16, 2, 4] 
11.504   0.226   RG32F   [16, 4, 2] 
11.302   0.236   RG32F   [4, 16, 2] 
12.346   0.215   RG32F   [8, 2, 16] 
11.305   0.297   RG32F   [8, 16, 2] 
11.294   0.246   RG32F   [128, 4, 1] 
11.014   0.303   RG32F   [256, 4, 1] 
11.692   0.205   RG32F   [128, 1, 4] 
11.754   0.263   RG32F   [256, 1, 4] 
11.551   0.313   RG32F   [128, 1, 1] 
11.716   0.239   RG32F   [256, 1, 1] 
11.441   0.231   RG32F   [512, 1, 1] 

11.318   0.266   RGBA16F   [8, 8, 8] 
11.446   0.247   RGBA16F   [32, 32, 1] 
12.464   0.231   RGBA16F   [32, 1, 32] 
11.842   0.245   RGBA16F   [16, 16, 1] 
13.018   0.212   RGBA16F   [16, 1, 16] 
12.840   0.231   RGBA16F   [16, 16, 4] 
11.098   0.280   RGBA16F   [16, 4, 16] 
13.943   0.268   RGBA16F   [4, 16, 16] 
13.171   0.238   RGBA16F   [4, 2, 16] 
11.148   0.239   RGBA16F   [16, 2, 4] 
11.528   0.251   RGBA16F   [16, 4, 2] 
11.326   0.317   RGBA16F   [4, 16, 2] 
12.289   0.198   RGBA16F   [8, 2, 16] 
11.269   0.252   RGBA16F   [8, 16, 2] 
11.247   0.209   RGBA16F   [128, 4, 1] 
10.885   0.220   RGBA16F   [256, 4, 1] 
11.721   0.277   RGBA16F   [128, 1, 4] 
11.773   0.295   RGBA16F   [256, 1, 4] 
11.439   0.161   RGBA16F   [128, 1, 1] 
11.609   0.160   RGBA16F   [256, 1, 1] 
11.391   0.189   RGBA16F   [512, 1, 1] 

21.437   0.120   RGBA32F   [8, 8, 8] 
24.676   0.177   RGBA32F   [32, 32, 1] 
52.367   0.278   RGBA32F   [32, 1, 32] 
21.062   0.258   RGBA32F   [16, 16, 1] 
26.020   0.295   RGBA32F   [16, 1, 16] 
24.131   0.223   RGBA32F   [16, 16, 4] 
30.333   0.228   RGBA32F   [16, 4, 16] 
25.640   0.194   RGBA32F   [4, 16, 16] 
23.129   0.212   RGBA32F   [4, 2, 16] 
21.113   0.317   RGBA32F   [16, 2, 4] 
21.084   0.315   RGBA32F   [16, 4, 2] 
20.929   0.174   RGBA32F   [4, 16, 2] 
23.290   0.240   RGBA32F   [8, 2, 16] 
21.013   0.232   RGBA32F   [8, 16, 2] 
23.660   0.171   RGBA32F   [128, 4, 1] 
23.203   0.111   RGBA32F   [256, 4, 1] 
26.930   0.112   RGBA32F   [128, 1, 4] 
35.940   0.146   RGBA32F   [256, 1, 4] 
25.518   0.135   RGBA32F   [128, 1, 1] 
29.479   0.154   RGBA32F   [256, 1, 1] 
35.506   0.163   RGBA32F   [512, 1, 1] 

7-stencil 3D filter

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	vec4 t1 = texelFetch(read_sampler, index+ivec3( 0,  0, -1), 0);
	vec4 t2 = texelFetch(read_sampler, index+ivec3( 0, -1,  0), 0);
	vec4 t3 = texelFetch(read_sampler, index+ivec3(-1,  0,  0), 0);
	vec4 t4 = texelFetch(read_sampler, index+ivec3( 0,  0,  0), 0);
	vec4 t5 = texelFetch(read_sampler, index+ivec3(+1,  0,  0), 0);
	vec4 t6 = texelFetch(read_sampler, index+ivec3( 0, +1,  0), 0);
	vec4 t7 = texelFetch(read_sampler, index+ivec3( 0,  0, +1), 0);
	
	// unweighted sum of the center texel and its 6 face neighbors (no normalization)
	imageStore(write_image, index, t1+t2+t3+t4+t5+t6+t7);
}

Timings

6.435   0.188   R16F   [8, 8, 8] 
6.583   0.252   R16F   [32, 32, 1] 
6.533   0.170   R16F   [32, 1, 32] 
6.142   0.192   R16F   [16, 16, 1] 
6.290   0.171   R16F   [16, 1, 16] 
6.914   0.207   R16F   [16, 16, 4] 
6.407   0.205   R16F   [16, 4, 16] 
6.729   0.162   R16F   [4, 16, 16] 
6.175   0.178   R16F   [4, 2, 16] 
6.226   0.206   R16F   [16, 2, 4] 
6.296   0.299   R16F   [16, 4, 2] 
6.326   0.264   R16F   [4, 16, 2] 
6.174   0.212   R16F   [8, 2, 16] 
6.415   0.187   R16F   [8, 16, 2] 
6.340   0.195   R16F   [128, 4, 1] 
6.500   0.198   R16F   [256, 4, 1] 
6.361   0.198   R16F   [128, 1, 4] 
6.739   0.154   R16F   [256, 1, 4] 
6.094   0.193   R16F   [128, 1, 1] 
6.140   0.218   R16F   [256, 1, 1] 
6.365   0.189   R16F   [512, 1, 1] 

7.382    0.214   R32F   [8, 8, 8] 
10.918   0.187   R32F   [32, 32, 1] 
7.897    0.391   R32F   [32, 1, 32] 
10.409   0.283   R32F   [16, 16, 1] 
7.891    0.265   R32F   [16, 1, 16] 
8.387    0.191   R32F   [16, 16, 4] 
7.356    0.189   R32F   [16, 4, 16] 
8.501    0.238   R32F   [4, 16, 16] 
7.503    0.236   R32F   [4, 2, 16] 
7.399    0.265   R32F   [16, 2, 4] 
8.215    0.241   R32F   [16, 4, 2] 
8.559    0.251   R32F   [4, 16, 2] 
7.418    0.197   R32F   [8, 2, 16] 
8.491    0.218   R32F   [8, 16, 2] 
10.319   0.195   R32F   [128, 4, 1] 
10.725   0.229   R32F   [256, 4, 1] 
7.564    0.206   R32F   [128, 1, 4] 
8.122    0.178   R32F   [256, 1, 4] 
10.342   0.230   R32F   [128, 1, 1] 
10.356   0.254   R32F   [256, 1, 1] 
10.475   0.220   R32F   [512, 1, 1] 

7.367    0.214   RG16F   [8, 8, 8] 
10.905   0.269   RG16F   [32, 32, 1] 
7.739    0.215   RG16F   [32, 1, 32] 
10.310   0.193   RG16F   [16, 16, 1] 
7.867    0.238   RG16F   [16, 1, 16] 
8.405    0.215   RG16F   [16, 16, 4] 
7.397    0.235   RG16F   [16, 4, 16] 
8.487    0.187   RG16F   [4, 16, 16] 
7.474    0.170   RG16F   [4, 2, 16] 
7.388    0.198   RG16F   [16, 2, 4] 
8.197    0.194   RG16F   [16, 4, 2] 
8.526    0.174   RG16F   [4, 16, 2] 
7.450    0.210   RG16F   [8, 2, 16] 
8.502    0.203   RG16F   [8, 16, 2] 
10.348   0.199   RG16F   [128, 4, 1] 
10.749   0.170   RG16F   [256, 4, 1] 
7.605    0.174   RG16F   [128, 1, 4] 
8.149    0.193   RG16F   [256, 1, 4] 
10.344   0.183   RG16F   [128, 1, 1] 
10.361   0.188   RG16F   [256, 1, 1] 
10.471   0.173   RG16F   [512, 1, 1] 

11.884   0.185   RG32F   [8, 8, 8] 
20.197   0.195   RG32F   [32, 32, 1] 
14.985   0.129   RG32F   [32, 1, 32] 
20.009   0.443   RG32F   [16, 16, 1] 
12.605   0.324   RG32F   [16, 1, 16] 
13.655   0.187   RG32F   [16, 16, 4] 
11.118   0.233   RG32F   [16, 4, 16] 
13.773   0.191   RG32F   [4, 16, 16] 
11.213   0.185   RG32F   [4, 2, 16] 
12.893   0.254   RG32F   [16, 2, 4] 
15.266   0.273   RG32F   [16, 4, 2] 
15.211   0.223   RG32F   [4, 16, 2] 
11.204   0.232   RG32F   [8, 2, 16] 
15.443   0.313   RG32F   [8, 16, 2] 
20.023   0.152   RG32F   [128, 4, 1] 
20.105   0.214   RG32F   [256, 4, 1] 
12.929   0.219   RG32F   [128, 1, 4] 
13.282   0.164   RG32F   [256, 1, 4] 
20.029   0.136   RG32F   [128, 1, 1] 
20.044   0.189   RG32F   [256, 1, 1] 
20.220   0.261   RG32F   [512, 1, 1] 

12.052   0.239   RGBA16F   [8, 8, 8] 
20.296   0.274   RGBA16F   [32, 32, 1] 
14.948   0.169   RGBA16F   [32, 1, 32] 
20.002   0.099   RGBA16F   [16, 16, 1] 
12.571   0.194   RGBA16F   [16, 1, 16] 
13.628   0.148   RGBA16F   [16, 16, 4] 
11.086   0.212   RGBA16F   [16, 4, 16] 
13.747   0.157   RGBA16F   [4, 16, 16] 
11.219   0.203   RGBA16F   [4, 2, 16] 
12.843   0.181   RGBA16F   [16, 2, 4] 
15.167   0.175   RGBA16F   [16, 4, 2] 
15.283   0.233   RGBA16F   [4, 16, 2] 
11.423   0.259   RGBA16F   [8, 2, 16] 
15.398   0.253   RGBA16F   [8, 16, 2] 
20.011   0.119   RGBA16F   [128, 4, 1] 
20.186   0.208   RGBA16F   [256, 4, 1] 
13.105   0.268   RGBA16F   [128, 1, 4] 
13.279   0.151   RGBA16F   [256, 1, 4] 
20.110   0.245   RGBA16F   [128, 1, 1] 
20.139   0.151   RGBA16F   [256, 1, 1] 
20.232   0.258   RGBA16F   [512, 1, 1] 

26.664   0.247   RGBA32F   [8, 8, 8] 
40.048   0.298   RGBA32F   [32, 32, 1] 
58.822   0.215   RGBA32F   [32, 1, 32] 
39.535   0.238   RGBA32F   [16, 16, 1] 
26.678   0.220   RGBA32F   [16, 1, 16] 
26.197   0.112   RGBA32F   [16, 16, 4] 
31.270   0.111   RGBA32F   [16, 4, 16] 
26.354   0.242   RGBA32F   [4, 16, 16] 
27.780   0.219   RGBA32F   [4, 2, 16] 
25.236   0.207   RGBA32F   [16, 2, 4] 
29.670   0.099   RGBA32F   [16, 4, 2] 
29.739   0.130   RGBA32F   [4, 16, 2] 
27.502   0.156   RGBA32F   [8, 2, 16] 
29.772   0.228   RGBA32F   [8, 16, 2] 
39.432   0.086   RGBA32F   [128, 4, 1] 
39.470   0.136   RGBA32F   [256, 4, 1] 
26.273   0.087   RGBA32F   [128, 1, 4] 
32.123   0.093   RGBA32F   [256, 1, 4] 
40.648   0.215   RGBA32F   [128, 1, 1] 
40.724   0.213   RGBA32F   [256, 1, 1] 
41.236   0.318   RGBA32F   [512, 1, 1] 

27-stencil 3D filter

Shader code

#version 440 core

layout(local_size_x = %d, local_size_y = %d, local_size_z = %d) in;

uniform layout(binding=0) sampler3D read_sampler;
uniform layout(binding=0, rgba32f) writeonly image3D write_image;

void main() {
	ivec3 index = ivec3(gl_GlobalInvocationID.xyz);
	
	vec4 t1 = texelFetch(read_sampler, index+ivec3(-1, -1, -1), 0);
	vec4 t2 = texelFetch(read_sampler, index+ivec3( 0, -1, -1), 0);
	vec4 t3 = texelFetch(read_sampler, index+ivec3(+1, -1, -1), 0);
	vec4 t4 = texelFetch(read_sampler, index+ivec3(-1,  0, -1), 0);
	vec4 t5 = texelFetch(read_sampler, index+ivec3( 0,  0, -1), 0);
	vec4 t6 = texelFetch(read_sampler, index+ivec3(+1,  0, -1), 0);
	vec4 t7 = texelFetch(read_sampler, index+ivec3(-1, +1, -1), 0);
	vec4 t8 = texelFetch(read_sampler, index+ivec3( 0, +1, -1), 0);
	vec4 t9 = texelFetch(read_sampler, index+ivec3(+1, +1, -1), 0);

	vec4 t10 = texelFetch(read_sampler, index+ivec3(-1, -1, 0), 0);
	vec4 t11 = texelFetch(read_sampler, index+ivec3( 0, -1, 0), 0);
	vec4 t12 = texelFetch(read_sampler, index+ivec3(+1, -1, 0), 0);
	vec4 t13 = texelFetch(read_sampler, index+ivec3(-1,  0, 0), 0);
	vec4 t14 = texelFetch(read_sampler, index+ivec3( 0,  0, 0), 0);
	vec4 t15 = texelFetch(read_sampler, index+ivec3(+1,  0, 0), 0);
	vec4 t16 = texelFetch(read_sampler, index+ivec3(-1, +1, 0), 0);
	vec4 t17 = texelFetch(read_sampler, index+ivec3( 0, +1, 0), 0);
	vec4 t18 = texelFetch(read_sampler, index+ivec3(+1, +1, 0), 0);

	vec4 t19 = texelFetch(read_sampler, index+ivec3(-1, -1, +1), 0);
	vec4 t20 = texelFetch(read_sampler, index+ivec3( 0, -1, +1), 0);
	vec4 t21 = texelFetch(read_sampler, index+ivec3(+1, -1, +1), 0);
	vec4 t22 = texelFetch(read_sampler, index+ivec3(-1,  0, +1), 0);
	vec4 t23 = texelFetch(read_sampler, index+ivec3( 0,  0, +1), 0);
	vec4 t24 = texelFetch(read_sampler, index+ivec3(+1,  0, +1), 0);
	vec4 t25 = texelFetch(read_sampler, index+ivec3(-1, +1, +1), 0);
	vec4 t26 = texelFetch(read_sampler, index+ivec3( 0, +1, +1), 0);
	vec4 t27 = texelFetch(read_sampler, index+ivec3(+1, +1, +1), 0);
	
	// unweighted sum of the full 3x3x3 neighborhood (no normalization)
	imageStore(write_image, index, t1+t2+t3+t4+t5+t6+t7+t8+t9+t10+t11+t12+t13+t14+t15+t16+t17+t18+t19+t20+t21+t22+t23+t24+t25+t26+t27);
}

Timings

19.471   0.053   R16F   [8, 8, 8] 
19.214   0.085   R16F   [32, 32, 1] 
19.010   0.054   R16F   [32, 1, 32] 
18.250   0.056   R16F   [16, 16, 1] 
18.527   0.060   R16F   [16, 1, 16] 
19.610   0.180   R16F   [16, 16, 4] 
19.346   0.092   R16F   [16, 4, 16] 
19.547   0.095   R16F   [4, 16, 16] 
18.286   0.095   R16F   [4, 2, 16] 
18.202   0.043   R16F   [16, 2, 4] 
18.296   0.066   R16F   [16, 4, 2] 
18.397   0.040   R16F   [4, 16, 2] 
18.627   0.055   R16F   [8, 2, 16] 
18.861   0.256   R16F   [8, 16, 2] 
18.548   0.057   R16F   [128, 4, 1] 
19.192   0.094   R16F   [256, 4, 1] 
18.985   0.051   R16F   [128, 1, 4] 
19.231   0.042   R16F   [256, 1, 4] 
18.103   0.040   R16F   [128, 1, 1] 
18.236   0.060   R16F   [256, 1, 1] 
18.677   0.150   R16F   [512, 1, 1] 

20.099   0.075   R32F   [8, 8, 8] 
20.574   0.073   R32F   [32, 32, 1] 
19.790   0.055   R32F   [32, 1, 32] 
20.051   0.055   R32F   [16, 16, 1] 
19.224   0.219   R32F   [16, 1, 16] 
20.196   0.556   R32F   [16, 16, 4] 
19.587   0.070   R32F   [16, 4, 16] 
20.668   0.085   R32F   [4, 16, 16] 
19.444   0.356   R32F   [4, 2, 16] 
18.888   0.072   R32F   [16, 2, 4] 
18.934   0.052   R32F   [16, 4, 2] 
19.253   0.065   R32F   [4, 16, 2] 
19.262   0.051   R32F   [8, 2, 16] 
19.929   0.047   R32F   [8, 16, 2] 
20.520   0.051   R32F   [128, 4, 1] 
20.436   0.040   R32F   [256, 4, 1] 
19.771   0.056   R32F   [128, 1, 4] 
19.976   0.076   R32F   [256, 1, 4] 
19.350   0.073   R32F   [128, 1, 1] 
19.989   0.083   R32F   [256, 1, 1] 
20.525   0.085   R32F   [512, 1, 1] 

20.305   0.055   RG16F   [8, 8, 8] 
20.827   0.085   RG16F   [32, 32, 1] 
19.885   0.074   RG16F   [32, 1, 32] 
20.289   0.197   RG16F   [16, 16, 1] 
19.302   0.044   RG16F   [16, 1, 16] 
20.386   0.068   RG16F   [16, 16, 4] 
19.803   0.068   RG16F   [16, 4, 16] 
20.741   0.050   RG16F   [4, 16, 16] 
19.272   0.053   RG16F   [4, 2, 16] 
18.801   0.061   RG16F   [16, 2, 4] 
18.953   0.092   RG16F   [16, 4, 2] 
19.186   0.059   RG16F   [4, 16, 2] 
19.220   0.087   RG16F   [8, 2, 16] 
19.948   0.102   RG16F   [8, 16, 2] 
20.488   0.060   RG16F   [128, 4, 1] 
20.426   0.058   RG16F   [256, 4, 1] 
19.782   0.062   RG16F   [128, 1, 4] 
19.972   0.137   RG16F   [256, 1, 4] 
19.305   0.039   RG16F   [128, 1, 1] 
20.000   0.063   RG16F   [256, 1, 1] 
20.296   0.085   RG16F   [512, 1, 1] 

22.543   0.093   RG32F   [8, 8, 8] 
25.244   0.064   RG32F   [32, 32, 1] 
38.323   0.076   RG32F   [32, 1, 32] 
24.397   0.071   RG32F   [16, 16, 1] 
30.501   0.071   RG32F   [16, 1, 16] 
23.389   0.100   RG32F   [16, 16, 4] 
21.724   0.172   RG32F   [16, 4, 16] 
24.561   0.137   RG32F   [4, 16, 16] 
28.405   0.179   RG32F   [4, 2, 16] 
21.194   0.254   RG32F   [16, 2, 4] 
21.453   0.063   RG32F   [16, 4, 2] 
22.132   0.110   RG32F   [4, 16, 2] 
22.337   0.080   RG32F   [8, 2, 16] 
22.534   0.052   RG32F   [8, 16, 2] 
23.815   0.081   RG32F   [128, 4, 1] 
24.331   0.068   RG32F   [256, 4, 1] 
25.852   0.053   RG32F   [128, 1, 4] 
32.502   0.095   RG32F   [256, 1, 4] 
25.990   0.064   RG32F   [128, 1, 1] 
27.928   0.105   RG32F   [256, 1, 1] 
27.420   0.067   RG32F   [512, 1, 1] 

22.620   0.062   RGBA16F   [8, 8, 8] 
25.231   0.065   RGBA16F   [32, 32, 1] 
38.369   0.094   RGBA16F   [32, 1, 32] 
24.485   0.062   RGBA16F   [16, 16, 1] 
30.553   0.073   RGBA16F   [16, 1, 16] 
23.312   0.052   RGBA16F   [16, 16, 4] 
21.613   0.054   RGBA16F   [16, 4, 16] 
24.438   0.067   RGBA16F   [4, 16, 16] 
28.292   0.059   RGBA16F   [4, 2, 16] 
21.006   0.060   RGBA16F   [16, 2, 4] 
21.573   0.073   RGBA16F   [16, 4, 2] 
22.178   0.088   RGBA16F   [4, 16, 2] 
22.279   0.217   RGBA16F   [8, 2, 16] 
22.647   0.059   RGBA16F   [8, 16, 2] 
23.864   0.078   RGBA16F   [128, 4, 1] 
24.394   0.072   RGBA16F   [256, 4, 1] 
25.863   0.074   RGBA16F   [128, 1, 4] 
32.399   0.092   RGBA16F   [256, 1, 4] 
26.014   0.067   RGBA16F   [128, 1, 1] 
27.845   0.067   RGBA16F   [256, 1, 1] 
27.354   0.051   RGBA16F   [512, 1, 1] 

263.142   20.231   RGBA32F   [8, 8, 8] 
539.418   27.280   RGBA32F   [32, 32, 1] 
896.605   32.594   RGBA32F   [32, 1, 32] 
538.636   28.211   RGBA32F   [16, 16, 1] 
245.417   20.118   RGBA32F   [16, 1, 16] 
280.745   19.339   RGBA32F   [16, 16, 4] 
335.301   22.527   RGBA32F   [16, 4, 16] 
318.864   22.657   RGBA32F   [4, 16, 16] 
258.222   20.055   RGBA32F   [4, 2, 16] 
269.406   20.556   RGBA32F   [16, 2, 4] 
358.750   24.258   RGBA32F   [16, 4, 2] 
378.604   24.653   RGBA32F   [4, 16, 2] 
241.925   19.392   RGBA32F   [8, 2, 16] 
365.230   23.817   RGBA32F   [8, 16, 2] 
530.502   26.973   RGBA32F   [128, 4, 1] 
532.646   27.821   RGBA32F   [256, 4, 1] 
267.446   19.633   RGBA32F   [128, 1, 4] 
268.398   19.261   RGBA32F   [256, 1, 4] 
532.141   28.114   RGBA32F   [128, 1, 1] 
531.302   27.661   RGBA32F   [256, 1, 1] 
531.910   29.231   RGBA32F   [512, 1, 1] 