Skip to content

Instantly share code, notes, and snippets.

@Joshua-Ashton
Last active June 10, 2024 10:31
Show Gist options
  • Save Joshua-Ashton/04f666b8a0a0a15f6ab133937f6e0db8 to your computer and use it in GitHub Desktop.
Save Joshua-Ashton/04f666b8a0a0a15f6ab133937f6e0db8 to your computer and use it in GitHub Desktop.
template <class _Ty>
_NODISCARD /* constexpr */ _Ty _Common_lerp(const _Ty _ArgA, const _Ty _ArgB, const _Ty _ArgT) noexcept {
// on a line intersecting {(0.0, _ArgA), (1.0, _ArgB)}, return the Y value for X == _ArgT
const int _Finite_mask = (int{isfinite(_ArgA)} << 2) | (int{isfinite(_ArgB)} << 1) | int{isfinite(_ArgT)};
if (_Finite_mask == 0b111) {
// 99% case, put it first; this block comes from P0811R3
if ((_ArgA <= 0 && _ArgB >= 0) || (_ArgA >= 0 && _ArgB <= 0)) {
// exact, monotonic, bounded, determinate, and (for _ArgA == _ArgB == 0) consistent:
return _ArgT * _ArgB + (1 - _ArgT) * _ArgA;
}
if (_ArgT == 1) {
// exact
return _ArgB;
}
// exact at _ArgT == 0, monotonic except near _ArgT == 1, bounded, determinate, and consistent:
const auto _Candidate = _ArgA + _ArgT * (_ArgB - _ArgA);
// monotonic near _ArgT == 1:
if ((_ArgT > 1) == (_ArgB > _ArgA)) {
if (_ArgB > _Candidate) {
return _ArgB;
}
} else {
if (_Candidate > _ArgB) {
return _ArgB;
}
}
return _Candidate;
}
if (isnan(_ArgA)) {
return _ArgA;
}
if (isnan(_ArgB)) {
return _ArgB;
}
if (isnan(_ArgT)) {
return _ArgT;
}
switch (_Finite_mask) {
case 0b000:
// All values are infinities
if (_ArgT >= 1) {
return _ArgB;
}
return _ArgA;
case 0b010:
case 0b100:
case 0b110:
// _ArgT is an infinity; return infinity in the "direction" of _ArgA and _ArgB
return _ArgT * (_ArgB - _ArgA);
case 0b001:
// Here _ArgA and _ArgB are infinities
if (_ArgA == _ArgB) {
// same sign, so T doesn't matter
return _ArgA;
}
// Opposite signs, choose the "infinity direction" according to T if it makes sense.
if (_ArgT <= 0) {
return _ArgA;
}
if (_ArgT >= 1) {
return _ArgB;
}
// Interpolating between infinities of opposite signs doesn't make sense, NaN
if constexpr (sizeof(_Ty) == sizeof(float)) {
return __builtin_nanf("0");
} else {
return __builtin_nan("0");
}
case 0b011:
// _ArgA is an infinity but _ArgB is not
if (_ArgT == 1) {
return _ArgB;
}
if (_ArgT < 1) {
// towards the infinity, return it
return _ArgA;
}
// away from the infinity
return -_ArgA;
case 0b101:
// _ArgA is finite and _ArgB is an infinity
if (_ArgT == 0) {
return _ArgA;
}
if (_ArgT > 0) {
// toward the infinity
return _ArgB;
}
return -_ArgB;
case 0b111: // impossible; handled in fast path
default:
_CSTD abort();
}
}
@angelfor3v3r
Copy link

The only reasonable change is to remove it, it shouldn't be in stl, 90% of people don't need such a complex lerp function and if they do they should implement it themselves.

It’s not complex and does it’s job. You fall into the 99% case usually and it’s just a basic lerp function. If you think it’s complicated then this function is not for you.

You should go complain to the people who write the standard, not the people who implement it.

@Nielsbishere
Copy link

It is complex for a lerp function; one that's literally just a * perc + b * (1 - perc); which is literally 2 instructions for AVX2 (and that's literally 8 floats instead of just 1). This shouldn't be in stl. The only real reason I could think of is if your input isn't defined, but in 99% of the cases, they are; and that 1% shouldn't be catered to. And then you'll just have unnecessary branches and checks in a very simple function. Putting this into the standard makes it so everyone will use it (since they will assume that std is fast); lerp is commonly used and this WILL be a bottleneck if called very often.
https://gist.github.com/Joshua-Ashton/1f25704b6ccb989c03c1dd60757a6044 It shouldn't generate this, it's literally 2 asm instructions, that's it.

@angelfor3v3r
Copy link

It is complex for a lerp function; one that's literally just a * perc + b * (1 - perc); which is literally 2 instructions for AVX2 (and that's literally 8 floats instead of just 1). This shouldn't be in stl. The only real reason I could think of is if your input isn't defined, but in 99% of the cases, they are; and that 1% shouldn't be catered to. And then you'll just have unnecessary branches and checks in a very simple function. Putting this into the standard makes it so everyone will use it (since they will assume that std is fast); lerp is commonly used and this WILL be a bottleneck if called very often.
https://gist.github.com/Joshua-Ashton/1f25704b6ccb989c03c1dd60757a6044 It shouldn't generate this, it's literally 2 asm instructions, that's it.

First off, It depends on the compiler. You keep saying it's 2 instructions for AVX2 when that only depends on the compiler and target CPU.

MSVC by default compiles to SSE2 for x86-64 and usually won't use special instructions unless you specify AVX/AVX2. Basically; Compatibility over speed. There's nothing wrong in the snippit of asm you posted, go look at the label that has the "99% case" in it, it's literally ~5 SSE/2 instructions for a simple linear interpolation.

Also, go take your ideas to the people who write the STL, not the people who implement it.

And as a final note, it's not slow. A branch isn't slow on modern CPUs (since you argue the func should be using AVX instructions) since we have good branch predictors these days.

The function is going for safety over speed. Want a "faster" func? Write it yourself.

@Nielsbishere
Copy link

Even without AVX2, SSE2 is still able to do 4 floats within a few instructions. I don't see the "it's just a few instructions" as clearly it calls a few other functions (though with inlining this might be more preventable).
I do agree tho that msvc can't do anything about it since the standard defines this, so it's not up to the implementation.
Branches are still slower than not requiring branches for a few SSE or AVX instructions. Yes we do have good branch prediction but with spectre mitigations, they are slower.
My point is that 'safety over speed' is only applicable for a small percentage of people and people that see that std has lerp, will use that since they assume it is fast.

@angelfor3v3r
Copy link

angelfor3v3r commented Apr 8, 2020

My main point was bringing up the "99% case". It's done in a few instructions, the branches before that are negligible for performance.

The compiler will generally know what it's doing, more so than handwritten asm. Looks fine to me.

Also, It should be "fast enough". But do your own benchmarks if you think its slow. If someone truly cared about performance they wouldn't be using the STL all over the place. But in a modern application, it's still fast enough.

@justinmeiners
Copy link

Compare with original STL which does exactly what you would think:

https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h

@ondrej008
Copy link

Compare with original STL which does exactly what you would think:

https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h

What does that do? I don't even see a lerp function in there? Compare this gist to what, exactly?

@justinmeiners
Copy link

Compare with original STL which does exactly what you would think:
https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h

What does that do? I don't even see a lerp function in there? Compare this gist to what, exactly?

You're right. Lerp is new in the latest C++ revision, and so is not included here.
However, if you look at the functions that you would expect to be simple (inner_product, etc) they are implemented
in a straightforward manner without a fixation on generality at all costs, in every strange case.
The particular file is just one example, other parts of the repo are worth looking at.

@BillyONeal
Copy link

functions that you would expect to be simple (inner_product, etc)

inner_product is not a function where the "simple" implementation produces incorrect answers.

@justinmeiners
Copy link

justinmeiners commented Aug 15, 2020

functions that you would expect to be simple (inner_product, etc)

inner_product is not a function where the "simple" implementation produces incorrect answers.

You don't think we could find inputs that would be incorrect? We would have to seek them out, but that's what has been done for lerp.
The end result of anticipating every possible misuse is it just won't be useful for anyone. Any implementation is a compromise compared to its theoretical design. As other posters have mentioned, who is this lerp for?

Let's assume you're right though. inner_product was just one example. I shared the link to show that the STL used to be extremely simple. It's doubtful ANY of the functions or data structures you find in the original STL are even close to as long or complex as the current standard library has made them. Can you find any of them which has become more simple in any major implementation?

@BillyONeal
Copy link

The priorities are (1) correctness, (2) performance, (3) simplicity. The result of that is that simplicity goes down over time in favor of correctness or perf.

lerp got standardized precisely because a correct implementation is not simple. All the normal floating point rules already do the correct thing for inner_product (e.g. always take the leftmost NaN, raise the right floating point exceptions, don't accumulate rounding errors within the algorithm, etc.).

If you want the 'simple' one then go for it. The library function doesn't exist for you.

@justinmeiners
Copy link

justinmeiners commented Sep 6, 2020

Correctness is relative to a domain. You can often get all three (correctness, performance, and simplicity) by constraining your domain to more reasonable inputs to avoid pathological results. For example, some may say code which is not threadsafe is not correct. You can resolve this by throwing locks on everything, or you can just declare that accessing it from multiple threads is incorrect. Could we accomplish the same with Lerp by specifying a reasonable domain instead?

@yamirui
Copy link

yamirui commented Aug 5, 2022

If someone truly cared about performance they wouldn't be using the STL all over the place

Ah yes, great, if I cared about performance, I'd instead be reinventing my own wheels instead of standing on shoulders of giants, of course I would do that, such an amazing idea, surely one that works out just great for many people...

@angelfor3v3r
Copy link

angelfor3v3r commented Aug 5, 2022

If someone truly cared about performance they wouldn't be using the STL all over the place

Ah yes, great, if I cared about performance, I'd instead be reinventing my own wheels instead of standing on shoulders of giants, of course I would do that, such an amazing idea, surely one that works out just great for many people...

This is more common than you think in high performance applications in the professional workspace. Look at Rad Game Tools for example.

But, I digress… I too enjoy using the STL as much as I can.

You can take your trolling elsewhere though, with all the sarcasm and such.

@TruncatedDinoSour
Copy link

beautiful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment