# TheRealMJP/Tex2DCatmullRom.hlsl

Last active Sep 16, 2022
An HLSL function for sampling a 2D texture with Catmull-Rom filtering, using 9 texture samples instead of 16
``````hlsl
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae

// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
float4 SampleTextureCatmullRom(in Texture2D tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
    float2 texPos1 = floor(samplePos - 0.5f) + 0.5f;

    // Compute the fractional offset from our starting texel to our original sample location, which we'll
    // feed into the Catmull-Rom spline function to get our filter weights.
    float2 f = samplePos - texPos1;

    // Compute the Catmull-Rom weights using the fractional offset that we calculated earlier.
    // These equations are pre-expanded based on our knowledge of where the texels will be located,
    // which lets us avoid having to evaluate a piece-wise function.
    float2 w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    float2 w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    float2 w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    float2 w3 = f * f * (-0.5f + 0.5f * f);

    // Work out weighting factors and sampling offsets that will let us use bilinear filtering to
    // simultaneously evaluate the middle 2 samples from the 4x4 grid.
    float2 w12 = w1 + w2;
    float2 offset12 = w2 / (w1 + w2);

    // Compute the final UV coordinates we'll use for sampling the texture
    float2 texPos0 = texPos1 - 1;
    float2 texPos3 = texPos1 + 2;
    float2 texPos12 = texPos1 + offset12;

    texPos0 /= texSize;
    texPos3 /= texSize;
    texPos12 /= texSize;

    float4 result = 0.0f;
    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos0.y), 0.0f) * w0.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos0.y), 0.0f) * w12.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos0.y), 0.0f) * w3.x * w0.y;

    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos12.y), 0.0f) * w0.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos12.y), 0.0f) * w12.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos12.y), 0.0f) * w3.x * w12.y;

    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos3.y), 0.0f) * w0.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos3.y), 0.0f) * w12.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos3.y), 0.0f) * w3.x * w3.y;

    return result;
}
``````
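The 9-tap saving comes from the middle two taps on each axis. Along one axis, let $T_1$ and $T_2$ be the texels whose centers are at `texPos1` and `texPos1 + 1` (in texel units). A bilinear fetch at `texPos1 + t`, with $0 \le t \le 1$, returns $(1 - t)\,T_1 + t\,T_2$, so the two weighted taps collapse into one fetch:

$$
w_1 T_1 + w_2 T_2 = (w_1 + w_2)\left[\frac{w_1}{w_1 + w_2}\,T_1 + \frac{w_2}{w_1 + w_2}\,T_2\right] = w_{12}\,\big[(1 - t)\,T_1 + t\,T_2\big], \qquad t = \frac{w_2}{w_1 + w_2}
$$

Here $w_{12}$ and $t$ correspond to `w12` and `offset12`. The trick relies on $w_1, w_2 \ge 0$ for $f \in [0, 1)$, which keeps $t$ inside $[0, 1]$. The outer weights $w_0 = -\tfrac{1}{2} f (1 - f)^2$ and $w_3 = \tfrac{1}{2} f^2 (f - 1)$ are never positive on that interval, so they keep their own fetches. That leaves three fetch positions per axis, and because the 2D weights are products of per-axis weights (e.g. `w12.x * w0.y`), $3 \times 3 = 9$ fetches replace the $4 \times 4 = 16$ of a direct evaluation.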

### aras-p commented Sep 21, 2016

btw a coworker suggested this small optimization:

``````hlsl
// get rid of f3 (keep f2 = f * f), and:
float2 w0 = (1.0f / 2.0f) * f * (-1.0f + f * (2.0f - f));
float2 w1 = (1.0f / 6.0f) * f2 * (-15.0f + 9.0f * f) + 1.0f;
float2 w2 = (1.0f / 6.0f) * f * (3.0f + f * (12.0f - f * 9.0f));
float2 w3 = (1.0f / 2.0f) * f2 * (f - 1.0f);
``````

Checking with Pyramid using AMDDXX for the Bonaire target: VGPRs 51 -> 49, VALU 147 -> 146.
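Expanding these factored weights (with $f_2 = f^2$) gives the standard Catmull-Rom basis polynomials in the fractional offset $f$, the same polynomials the main function evaluates, and the four weights sum to 1 for every $f$:

$$
\begin{aligned}
w_0 &= \tfrac{1}{2}\left(-f^3 + 2f^2 - f\right) \\
w_1 &= \tfrac{1}{2}\left(3f^3 - 5f^2 + 2\right) \\
w_2 &= \tfrac{1}{2}\left(-3f^3 + 4f^2 + f\right) \\
w_3 &= \tfrac{1}{2}\left(f^3 - f^2\right) \\
w_0 + w_1 + w_2 + w_3 &= 1
\end{aligned}
$$

The sum-to-one property is what lets the filter reproduce a constant-colored texture exactly, so any rewrite of the weights should preserve it.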

### pixelmager commented Sep 21, 2016

Alternatively, put the polynomials straight into Horner form:

``````hlsl
float2 w0 = f * ( -0.5 + f * (1.0 - 0.5*f));
float2 w1 = 1.0 + f * f * (-2.5 + 1.5*f );
float2 w2 = f * ( 0.5 + f * (2.0 - 1.5*f) );
float2 w3 = f * f * (-0.5 + 0.5 * f);
``````

Pyramid, AMDDXX, Bonaire (http://pastebin.com/12ccE9Lk): VGPRs 55 -> 47, VALU 146 -> 135.
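Horner's rule rewrites a polynomial as nested multiply-adds, so $f^2$ and $f^3$ never need to be formed as separate values. The general cubic and the $w_0$ case from the snippet look like this:

$$
\begin{aligned}
a + b f + c f^2 + d f^3 &= a + f\,\big(b + f\,(c + d f)\big) \\
-\tfrac{1}{2} f + f^2 - \tfrac{1}{2} f^3 &= f\,\big(-\tfrac{1}{2} + f\,(1 - \tfrac{1}{2} f)\big)
\end{aligned}
$$

Each nesting level maps naturally onto a single multiply-add instruction; how many instructions and registers that actually saves depends on the compiler and target.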

### TheRealMJP commented Sep 22, 2016

Thanks guys! I updated the code with the optimizations.

### dwulive commented Aug 22, 2022

If you are doing the filtering yourself and want to keep the texels in a linear buffer, you can fetch them with `rawBuffer0.Load4()`. Memory access coherency might or might not be worse; it depends. Dynamic updates are usually easier.
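A minimal sketch of that approach, assuming the texels are stored row-major as one `float4` (16 bytes) each in a `ByteAddressBuffer` named `rawBuffer0`, with clamp-to-edge addressing done by hand. `LoadTexel`, `SampleBufferCatmullRom`, and the integer `texSize` parameter are hypothetical, not part of the gist. Without a hardware sampler there is no free bilinear merge, so this version does all 16 loads, using the same weights as `SampleTextureCatmullRom`.

``````hlsl
// Hypothetical layout: texSize.x * texSize.y texels, row-major, one float4 (16 bytes) per texel.
ByteAddressBuffer rawBuffer0;

// Loads one texel, clamping the coordinate to the edge of the image.
float4 LoadTexel(int2 p, int2 texSize)
{
    p = clamp(p, int2(0, 0), texSize - 1);
    uint index = uint(p.y * texSize.x + p.x);
    return asfloat(rawBuffer0.Load4(index * 16));
}

// 16-tap Catmull-Rom filter built from manual buffer loads.
float4 SampleBufferCatmullRom(float2 uv, int2 texSize)
{
    float2 samplePos = uv * float2(texSize);
    float2 texPos1 = floor(samplePos - 0.5f) + 0.5f;
    float2 f = samplePos - texPos1;

    // Same Horner-form weights as SampleTextureCatmullRom.
    float2 w[4];
    w[0] = f * (-0.5f + f * (1.0f - 0.5f * f));
    w[1] = 1.0f + f * f * (-2.5f + 1.5f * f);
    w[2] = f * (0.5f + f * (2.0f - 1.5f * f));
    w[3] = f * f * (-0.5f + 0.5f * f);

    // Integer coordinate of texel [0, 0] in the 4x4 grid (texPos1 - 0.5 is an exact integer).
    int2 gridOrigin = int2(texPos1 - 0.5f) - 1;

    float4 result = 0.0f;
    [unroll]
    for (int y = 0; y < 4; ++y)
    {
        [unroll]
        for (int x = 0; x < 4; ++x)
            result += LoadTexel(gridOrigin + int2(x, y), texSize) * (w[x].x * w[y].y);
    }
    return result;
}
``````

Because every tap is an explicit load, the negative outer weights need no special handling here; the cost is 16 loads instead of 9 sampler fetches.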

