# TheRealMJP/Tex2DCatmullRom.hlsl

Last active Sep 16, 2022
An HLSL function for sampling a 2D texture with Catmull-Rom filtering, using 9 texture samples instead of 16
``````
// The following code is licensed under the MIT license: https://gist.github.com/TheRealMJP/bc503b0b87b643d3505d41eab8b332ae

// Samples a texture with Catmull-Rom filtering, using 9 texture fetches instead of 16.
// See http://vec3.ca/bicubic-filtering-in-fewer-taps/ for more details
float4 SampleTextureCatmullRom(in Texture2D tex, in SamplerState linearSampler, in float2 uv, in float2 texSize)
{
    // We're going to sample a 4x4 grid of texels surrounding the target UV coordinate. We'll do this by rounding
    // down the sample location to get the exact center of our "starting" texel. The starting texel will be at
    // location [1, 1] in the grid, where [0, 0] is the top left corner.
    float2 samplePos = uv * texSize;
    float2 texPos1 = floor(samplePos - 0.5f) + 0.5f;

    // Compute the fractional offset from our starting texel to our original sample location, which we'll
    // feed into the Catmull-Rom spline function to get our filter weights.
    float2 f = samplePos - texPos1;

    // Compute the Catmull-Rom weights using the fractional offset that we calculated earlier.
    // These equations are pre-expanded based on our knowledge of where the texels will be located,
    // which lets us avoid having to evaluate a piece-wise function.
    float2 w0 = f * (-0.5f + f * (1.0f - 0.5f * f));
    float2 w1 = 1.0f + f * f * (-2.5f + 1.5f * f);
    float2 w2 = f * (0.5f + f * (2.0f - 1.5f * f));
    float2 w3 = f * f * (-0.5f + 0.5f * f);

    // Work out weighting factors and sampling offsets that will let us use bilinear filtering to
    // simultaneously evaluate the middle 2 samples from the 4x4 grid.
    float2 w12 = w1 + w2;
    float2 offset12 = w2 / (w1 + w2);

    // Compute the final UV coordinates we'll use for sampling the texture
    float2 texPos0 = texPos1 - 1;
    float2 texPos3 = texPos1 + 2;
    float2 texPos12 = texPos1 + offset12;

    texPos0 /= texSize;
    texPos3 /= texSize;
    texPos12 /= texSize;

    float4 result = 0.0f;
    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos0.y), 0.0f) * w0.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos0.y), 0.0f) * w12.x * w0.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos0.y), 0.0f) * w3.x * w0.y;

    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos12.y), 0.0f) * w0.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos12.y), 0.0f) * w12.x * w12.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos12.y), 0.0f) * w3.x * w12.y;

    result += tex.SampleLevel(linearSampler, float2(texPos0.x, texPos3.y), 0.0f) * w0.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos12.x, texPos3.y), 0.0f) * w12.x * w3.y;
    result += tex.SampleLevel(linearSampler, float2(texPos3.x, texPos3.y), 0.0f) * w3.x * w3.y;

    return result;
}
``````
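
The 9-fetch version relies on one identity: a bilinear fetch at `texPos1 + w2 / (w1 + w2)`, scaled by `w1 + w2`, returns `w1 * t1 + w2 * t2` for the two middle texels. The following is a rough CPU-side sketch in Python (not part of the gist) that compares the 9-tap scheme against a plain 16-tap sum. It assumes a single-channel texture stored as a list of rows and clamp-to-edge addressing; every function name here is hypothetical and chosen only for illustration.

```python
import math

def catmull_rom_weights(f):
    """Horner-form weights matching the HLSL function, for a scalar f in [0, 1)."""
    w0 = f * (-0.5 + f * (1.0 - 0.5 * f))
    w1 = 1.0 + f * f * (-2.5 + 1.5 * f)
    w2 = f * (0.5 + f * (2.0 - 1.5 * f))
    w3 = f * f * (-0.5 + 0.5 * f)
    return w0, w1, w2, w3

def texel(tex, x, y):
    """Point fetch with clamp-to-edge addressing (an assumed sampler mode)."""
    y = min(max(y, 0), len(tex) - 1)
    x = min(max(x, 0), len(tex[0]) - 1)
    return tex[y][x]

def bilinear(tex, px, py):
    """Emulates a bilinear fetch at (px, py) in texel units; centers sit at n + 0.5."""
    x, y = px - 0.5, py - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * texel(tex, x0, y0) + fx * texel(tex, x0 + 1, y0)
    bottom = (1 - fx) * texel(tex, x0, y0 + 1) + fx * texel(tex, x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom

def axis_setup(p):
    """Returns the center of the 'starting' texel and the four weights for one axis."""
    base = math.floor(p - 0.5) + 0.5
    return base, catmull_rom_weights(p - base)

def sample_16(tex, px, py):
    """Reference: explicit weighted sum over the 4x4 texel footprint."""
    bx, wx = axis_setup(px)
    by, wy = axis_setup(py)
    ix, iy = math.floor(bx), math.floor(by)  # integer index of texel [1, 1]
    total = 0.0
    for j in range(4):
        for i in range(4):
            total += wx[i] * wy[j] * texel(tex, ix - 1 + i, iy - 1 + j)
    return total

def taps(p):
    """Three (position, weight) pairs per axis; the middle one merges w1 and w2."""
    base, (w0, w1, w2, w3) = axis_setup(p)
    w12 = w1 + w2
    return ((base - 1, w0), (base + w2 / w12, w12), (base + 2, w3))

def sample_9(tex, px, py):
    """Nine bilinear fetches, mirroring the structure of the HLSL function."""
    total = 0.0
    for y, wy in taps(py):
        for x, wx in taps(px):
            total += wx * wy * bilinear(tex, x, y)
    return total
```

Calling `sample_9(tex, px, py)` and `sample_16(tex, px, py)` with the same position in texel units (that is, `uv * texSize`) should agree to within floating-point error. Note that `w1 + w2 = 1 + 0.5 * f * (1 - f)`, which is at least 1 on `[0, 1)`, so the division in `taps` is safe.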

### aras-p commented Sep 21, 2016

btw a coworker suggested this small optimization:

``````
// get rid of f3, and:
float2 w0 = (1.0f / 2.0f) * f * (-1.0f + f * (2.0f - f));
float2 w1 = (1.0f / 6.0f) * f2 * (-15.0f + 9.0f * f) + 1.0f;
float2 w2 = (1.0f / 6.0f) * f * (3.0f + f * (12.0f - f * 9.0f));
float2 w3 = (1.0f / 2.0f) * f2 * (f - 1.0f);
``````

Checking with Pyramid using AMDDXX for Bonaire target:
VGPRs: 51 -> 49
VALU: 147 -> 146

### pixelmager commented Sep 21, 2016 • edited

Alternatively, put the polynomials straight into Horner form:

``````
float2 w0 = f * ( -0.5 + f * (1.0 - 0.5*f));
float2 w1 = 1.0 + f * f * (-2.5 + 1.5*f );
float2 w2 = f * ( 0.5 + f * (2.0 - 1.5*f) );
float2 w3 = f * f * (-0.5 + 0.5 * f);
``````

Pyramid, AMDDXX, Bonaire ( http://pastebin.com/12ccE9Lk )
VGPRs: 55 -> 47
VALU: 146 -> 135
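
Both rewrites above should be algebraically identical to the standard Catmull-Rom cubic (tension 0.5); only the instruction count differs. As a quick sanity check, here is a small Python sketch that compares the factored form and the Horner form against the expanded polynomials over `[0, 1]`. It is a scalar, CPU-side approximation of the `float2` HLSL math, and the function names are made up for this example.

```python
def weights_expanded(f):
    """Textbook Catmull-Rom weights written out as expanded cubics."""
    f2 = f * f
    f3 = f2 * f
    return (
        -0.5 * f3 + f2 - 0.5 * f,
        1.5 * f3 - 2.5 * f2 + 1.0,
        -1.5 * f3 + 2.0 * f2 + 0.5 * f,
        0.5 * f3 - 0.5 * f2,
    )

def weights_factored(f):
    """The partially factored form from the first suggestion (keeps f2, drops f3)."""
    f2 = f * f
    return (
        (1.0 / 2.0) * f * (-1.0 + f * (2.0 - f)),
        (1.0 / 6.0) * f2 * (-15.0 + 9.0 * f) + 1.0,
        (1.0 / 6.0) * f * (3.0 + f * (12.0 - f * 9.0)),
        (1.0 / 2.0) * f2 * (f - 1.0),
    )

def weights_horner(f):
    """The Horner form from the second suggestion."""
    return (
        f * (-0.5 + f * (1.0 - 0.5 * f)),
        1.0 + f * f * (-2.5 + 1.5 * f),
        f * (0.5 + f * (2.0 - 1.5 * f)),
        f * f * (-0.5 + 0.5 * f),
    )

def max_disagreement(steps=100):
    """Largest absolute difference between any two forms, sampled across [0, 1]."""
    worst = 0.0
    for i in range(steps + 1):
        f = i / steps
        ref = weights_expanded(f)
        for other in (weights_factored(f), weights_horner(f)):
            for a, b in zip(ref, other):
                worst = max(worst, abs(a - b))
    return worst
```

`max_disagreement()` should come out on the order of floating-point rounding error, and the four weights sum to 1 for any `f`, which is why the filter preserves constant regions of the texture.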

### TheRealMJP commented Sep 22, 2016

Thanks guys! I updated the code with the optimizations.

### dwulive commented Aug 22, 2022

If you are doing the filtering yourself and you want to use a linear buffer, you can use rawBuffer0.Load4().
Coherency might or might not be worse; it depends. Dynamic updates are usually easier.
