Skip to content

Instantly share code, notes, and snippets.

@Joshua-Ashton
Last active August 5, 2022 22:16
Show Gist options
  • Save Joshua-Ashton/04f666b8a0a0a15f6ab133937f6e0db8 to your computer and use it in GitHub Desktop.
Save Joshua-Ashton/04f666b8a0a0a15f6ab133937f6e0db8 to your computer and use it in GitHub Desktop.
template <class _Ty>
_NODISCARD /* constexpr */ _Ty _Common_lerp(const _Ty _ArgA, const _Ty _ArgB, const _Ty _ArgT) noexcept {
// on a line intersecting {(0.0, _ArgA), (1.0, _ArgB)}, return the Y value for X == _ArgT
const int _Finite_mask = (int{isfinite(_ArgA)} << 2) | (int{isfinite(_ArgB)} << 1) | int{isfinite(_ArgT)};
if (_Finite_mask == 0b111) {
// 99% case, put it first; this block comes from P0811R3
if ((_ArgA <= 0 && _ArgB >= 0) || (_ArgA >= 0 && _ArgB <= 0)) {
// exact, monotonic, bounded, determinate, and (for _ArgA == _ArgB == 0) consistent:
return _ArgT * _ArgB + (1 - _ArgT) * _ArgA;
}
if (_ArgT == 1) {
// exact
return _ArgB;
}
// exact at _ArgT == 0, monotonic except near _ArgT == 1, bounded, determinate, and consistent:
const auto _Candidate = _ArgA + _ArgT * (_ArgB - _ArgA);
// monotonic near _ArgT == 1:
if ((_ArgT > 1) == (_ArgB > _ArgA)) {
if (_ArgB > _Candidate) {
return _ArgB;
}
} else {
if (_Candidate > _ArgB) {
return _ArgB;
}
}
return _Candidate;
}
if (isnan(_ArgA)) {
return _ArgA;
}
if (isnan(_ArgB)) {
return _ArgB;
}
if (isnan(_ArgT)) {
return _ArgT;
}
switch (_Finite_mask) {
case 0b000:
// All values are infinities
if (_ArgT >= 1) {
return _ArgB;
}
return _ArgA;
case 0b010:
case 0b100:
case 0b110:
// _ArgT is an infinity; return infinity in the "direction" of _ArgA and _ArgB
return _ArgT * (_ArgB - _ArgA);
case 0b001:
// Here _ArgA and _ArgB are infinities
if (_ArgA == _ArgB) {
// same sign, so T doesn't matter
return _ArgA;
}
// Opposite signs, choose the "infinity direction" according to T if it makes sense.
if (_ArgT <= 0) {
return _ArgA;
}
if (_ArgT >= 1) {
return _ArgB;
}
// Interpolating between infinities of opposite signs doesn't make sense, NaN
if constexpr (sizeof(_Ty) == sizeof(float)) {
return __builtin_nanf("0");
} else {
return __builtin_nan("0");
}
case 0b011:
// _ArgA is an infinity but _ArgB is not
if (_ArgT == 1) {
return _ArgB;
}
if (_ArgT < 1) {
// towards the infinity, return it
return _ArgA;
}
// away from the infinity
return -_ArgA;
case 0b101:
// _ArgA is finite and _ArgB is an infinity
if (_ArgT == 0) {
return _ArgA;
}
if (_ArgT > 0) {
// toward the infinity
return _ArgB;
}
return -_ArgB;
case 0b111: // impossible; handled in fast path
default:
_CSTD abort();
}
}
@Joshua-Ashton
Copy link
Author

The MSVC STL devs are very smart

I beg to differ.

@angelfor3v3r
Copy link

angelfor3v3r commented Apr 4, 2020

Way to take the rest of my post out of context. Fix the code if you find it so bad.

Your other gists showing the raw x86-64 asm is also wrong because both of the asm do the same thing except the one you claim is “good” is exactly the same as this snippet minus the sanity checks... which is a bad thing.

@marzer
Copy link

marzer commented Apr 4, 2020

The MSVC STL devs are very smart

I beg to differ.

Arrogant. They're objectively intelligent. And also human beings. Microsoft's STL is open source; if you think they've missed an implementation opportunity, submit a bug report or pull request, instead of just sniping at them without giving them a right-of-reply.

Speaking of right-of-reply: @StephanTLavavej @CaseyCarter @BillyONeal

@ondrej008
Copy link

What's wrong with this?
The only improvements you could make to this is assuming the parameters are finite and stuff like that, but then this wouldn't be a generic/standard lerp.

@Joshua-Ashton
Copy link
Author

The only improvements you could make to this is assuming the parameters are finite and stuff like that, but then this wouldn't be a generic/standard lerp.

The C++ std::lerp is defined in a way such that it can be implemented without finite checks like this if you order things right.

@Joshua-Ashton
Copy link
Author

Honestly, may as well just toss this entire gist up as a failed trolling attempt. Good try though.

Please view the asm this generates compared to a simpler lerp (or even the one from libc++) It's pretty insane given how common lerp can be [and how common in fast paths] in some codebases.

@angelfor3v3r
Copy link

The C++ std::lerp is defined in a way such that it can be implemented without finite checks like this if you order things right.

You're incorrect.

Read "26.8.4 [c.math.lerp]":
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/n4849.pdf

Constexpr is obviously still missing, but that's why its a work in progress.

Please view the asm this generates compared to a simpler lerp (or even the one from libc++) It's pretty insane given how common lerp can be [and how common in fast paths] in some codebases.

What does this even mean? The first few branches to std::isfinite is a sanity check which the standard mandates and the first if statement produces identical asm across compilers.

The other cases (like the switch cases) are for handling edge-cases and are also required according to the latest standard.

Once again, unless you have some valid issues in the snippit of code you provided then don't bother replying. Even then, if you do have an issue then go create an issue on the MSVC STL github repo -> https://github.com/microsoft/STL

Or fix it yourself.

@ocornut
Copy link

ocornut commented Apr 4, 2020

The problem, as pointed out by the title of this gist, is that the C++ standard tends to push implementers into creating those monstrosities. Thus it’s probably not even fully fixable, the only fix is not avoid using those functions (or fork C++ back into sanity).

@angelfor3v3r
Copy link

Complaining about speed and safety? Seems odd to me. I can't defend your point or the creator of the gists. Their implementation is early but is objectively fast and safe.

@dmitsuki
Copy link

dmitsuki commented Apr 4, 2020

The point is it's only "speed and safety" given the context of the C++ standards committee's "standards".

@Joshua-Ashton
Copy link
Author

What does this even mean? The first few branches to std::isfinite is a sanity check which the standard mandates

image

The C++ standard dictates what the return values should be for given inputs and types of inputs, not the implementation.

Implementing this genuinely with isnan and isfinite is incredibly naïve, as I assure you it can be done by just understanding the quirks of NaN/INF and how comparisons with these types work. (NaN != NaN, comparison ordering, etc)

This also brings up the point that we still don't have a singular standardized STL implementation like other languages std libs which leads this to happen.

@Joshua-Ashton
Copy link
Author

You're making yourself look dumber with each post. "NaN != NaN" is compiler-specific and setting-specific (fast floating point math). Your point means nothing.

This is the MSVC implementation -- therefore it can adhere to the floating point standard it implements, which is IEEE 754 (not like anyone implements anything else anyway.)

As for /fp:fast, nobody using that will give a shit about these checks if they're already saying they don't give a shit about them by using that flag.
Alternatively though, they could ifdef this for that case, but that would suck and go against what /fp:fast is for.

@StephanTLavavej
Copy link

I would very much like to have a simpler implementation that handles all of the corner cases described in microsoft/STL#65 .

Should the Standard provide a separate function for linear interpolation that explicitly doesn't attempt to handle corner cases and can be directly implemented with the obvious expression? Maybe, but that's a question for the Library Evolution Working Group - our job as STL implementers is to Do What The Standard Says even when the Standard says (indirectly) to do something very complicated.

@dmitsuki
Copy link

dmitsuki commented Apr 5, 2020

@StephanTLavavej
Yep, so it's like I thought. It's written about as good as you are going to get given the confines of the standards, so the problem is not with the implementation, but the standard.

@BillyONeal
Copy link

problem is not with the implementation, but the standard

I don't even see this as a problem with the standard. Yes, this implementation is long. But most of the time you fall into the 'inputs are finite' case.

NaN != NaN

Unfortunately, not in some modes for some of the compilers we support.

As for /fp:fast, nobody using that will give a shit about these checks if they're already saying they don't give a shit about them by using that flag.

I disagree. There is value in allowing the compiler to make inferences like x==x is true without removing explicit isnan tests. /fp:fast means 'I don't care about every last ULP' which allows floating point loops to be vectorized or operations to be fused; not making NaNs infectious is a substantially bigger change than that.

I assure you it can be done by just understanding the quirks of NaN/INF and how comparisons with these types work

I don't believe passing our current battery of tests is possible without the explicit tests, though mostly due to infinites rather than NaNs, because almost everything turns into infinity*0 => NaN. However, since we shipped that implementation the authors of P0811 indicated that they desire different behavior for infinites which may allow removal of the explicit tests (see issue linked below).

If anyone is interested in actual product change I recommend continuing over in microsoft/STL#65. If you do, please be mindful of our Code of Conduct and refrain from personal attacks. Saying 'XYZ is broken' is not helpful. Saying 'XYZ should work like this' is.

@MightyPrinny
Copy link

The only reasonable change is to remove it, it shouldn't be in stl, 90% of people don't need such a complex lerp function and if they do they should implement it themselves.

@angelfor3v3r
Copy link

The only reasonable change is to remove it, it shouldn't be in stl, 90% of people don't need such a complex lerp function and if they do they should implement it themselves.

It’s not complex and does it’s job. You fall into the 99% case usually and it’s just a basic lerp function. If you think it’s complicated then this function is not for you.

You should go complain to the people who write the standard, not the people who implement it.

@Nielsbishere
Copy link

It is complex for a lerp function; one that's literally just a * perc + b * (1 - perc); which is literally 2 instructions for AVX2 (and that's literally 8 floats instead of just 1). This shouldn't be in stl. The only real reason I could think of is if your input isn't defined, but in 99% of the cases, they are; and that 1% shouldn't be catered to. And then you'll just have unnecessary branches and checks in a very simple function. Putting this into the standard makes it so everyone will use it (since they will assume that std is fast); lerp is commonly used and this WILL be a bottleneck if called very often.
https://gist.github.com/Joshua-Ashton/1f25704b6ccb989c03c1dd60757a6044 It shouldn't generate this, it's literally 2 asm instructions, that's it.

@angelfor3v3r
Copy link

It is complex for a lerp function; one that's literally just a * perc + b * (1 - perc); which is literally 2 instructions for AVX2 (and that's literally 8 floats instead of just 1). This shouldn't be in stl. The only real reason I could think of is if your input isn't defined, but in 99% of the cases, they are; and that 1% shouldn't be catered to. And then you'll just have unnecessary branches and checks in a very simple function. Putting this into the standard makes it so everyone will use it (since they will assume that std is fast); lerp is commonly used and this WILL be a bottleneck if called very often.
https://gist.github.com/Joshua-Ashton/1f25704b6ccb989c03c1dd60757a6044 It shouldn't generate this, it's literally 2 asm instructions, that's it.

First off, It depends on the compiler. You keep saying it's 2 instructions for AVX2 when that only depends on the compiler and target CPU.

MSVC by default compiles to SSE2 for x86-64 and usually won't use special instructions unless you specify AVX/AVX2. Basically; Compatibility over speed. There's nothing wrong in the snippit of asm you posted, go look at the label that has the "99% case" in it, it's literally ~5 SSE/2 instructions for a simple linear interpolation.

Also, go take your ideas to the people who write the STL, not the people who implement it.

And as a final note, it's not slow. A branch isn't slow on modern CPUs (since you argue the func should be using AVX instructions) since we have good branch predictors these days.

The function is going for safety over speed. Want a "faster" func? Write it yourself.

@Nielsbishere
Copy link

Even without AVX2, SSE2 is still able to do 4 floats within a few instructions. I don't see the "it's just a few instructions" as clearly it calls a few other functions (though with inlining this might be more preventable).
I do agree tho that msvc can't do anything about it since the standard defines this, so it's not up to the implementation.
Branches are still slower than not requiring branches for a few SSE or AVX instructions. Yes we do have good branch prediction but with spectre mitigations, they are slower.
My point is that 'safety over speed' is only applicable for a small percentage of people and people that see that std has lerp, will use that since they assume it is fast.

@angelfor3v3r
Copy link

angelfor3v3r commented Apr 8, 2020

My main point was bringing up the "99% case". It's done in a few instructions, the branches before that are negligible for performance.

The compiler will generally know what it's doing, more so than handwritten asm. Looks fine to me.

Also, It should be "fast enough". But do your own benchmarks if you think its slow. If someone truly cared about performance they wouldn't be using the STL all over the place. But in a modern application, it's still fast enough.

@justinmeiners
Copy link

Compare with original STL which does exactly what you would think:

https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h

@ondrej008
Copy link

Compare with original STL which does exactly what you would think:

https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h

What does that do? I don't even see a lerp function in there? Compare this gist to what, exactly?

@justinmeiners
Copy link

Compare with original STL which does exactly what you would think:
https://github.com/justinmeiners/sgi-stl/blob/master/stl_numeric.h

What does that do? I don't even see a lerp function in there? Compare this gist to what, exactly?

You're right. Lerp is new in the latest C++ revision, and so is not included here.
However, if you look at the functions that you would expect to be simple (inner_product, etc) they are implemented
in a straightforward manner without a fixation on generality at all costs, in every strange case.
The particular file is just one example, other parts of the repo are worth looking at.

@BillyONeal
Copy link

functions that you would expect to be simple (inner_product, etc)

inner_product is not a function where the "simple" implementation produces incorrect answers.

@justinmeiners
Copy link

justinmeiners commented Aug 15, 2020

functions that you would expect to be simple (inner_product, etc)

inner_product is not a function where the "simple" implementation produces incorrect answers.

You don't think we could find inputs that would be incorrect? We would have to seek them out, but that's what has been done for lerp.
The end result of anticipating every possible misuse is it just won't be useful for anyone. Any implementation is a compromise compared to its theoretical design. As other posters have mentioned, who is this lerp for?

Let's assume you're right though. inner_product was just one example. I shared the link to show that the STL used to be extremely simple. It's doubtful ANY of the functions or data structures you find in the original STL are even close to as long or complex as the current standard library has made them. Can you find any of them which has become more simple in any major implementation?

@BillyONeal
Copy link

The priorities are (1) correctness, (2) performance, (3) simplicity. The result of that is that simplicity goes down over time in favor of correctness or perf.

lerp got standardized precisely because a correct implementation is not simple. All the normal floating point rules already do the correct thing for inner_product (e.g. always take the leftmost NaN, raise the right floating point exceptions, don't accumulate rounding errors within the algorithm, etc.).

If you want the 'simple' one then go for it. The library function doesn't exist for you.

@justinmeiners
Copy link

justinmeiners commented Sep 6, 2020

Correctness is relative to a domain. You can often get all three (correctness, performance, and simplicity) by constraining your domain to more reasonable inputs to avoid pathological results. For example, some may say code which is not threadsafe is not correct. You can resolve this by throwing locks on everything, or you can just declare that accessing it from multiple threads is incorrect. Could we accomplish the same with Lerp by specifying a reasonable domain instead?

@yamirui
Copy link

yamirui commented Aug 5, 2022

If someone truly cared about performance they wouldn't be using the STL all over the place

Ah yes, great, if I cared about performance, I'd instead be reinventing my own wheels instead of standing on shoulders of giants, of course I would do that, such an amazing idea, surely one that works out just great for many people...

@angelfor3v3r
Copy link

angelfor3v3r commented Aug 5, 2022

If someone truly cared about performance they wouldn't be using the STL all over the place

Ah yes, great, if I cared about performance, I'd instead be reinventing my own wheels instead of standing on shoulders of giants, of course I would do that, such an amazing idea, surely one that works out just great for many people...

This is more common than you think in high performance applications in the professional workspace. Look at Rad Game Tools for example.

But, I digress… I too enjoy using the STL as much as I can.

You can take your trolling elsewhere though, with all the sarcasm and such.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment