@lionello
Last active Apr 7, 2022
__sync_val_compare_and_swap for __uint128_t GCC built-in
/*
__sync_val_compare_and_swap that works with __uint128_t, by Lionello Lunesu.
Placed in the public domain
*/
#undef NDEBUG
#include <assert.h>
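static inline __uint128_t InterlockedCompareExchange128( volatile __uint128_t * src, __uint128_t cmp, __uint128_t with )
{
    /* src must be 16-byte aligned, otherwise cmpxchg16b faults. */
    __asm__ __volatile__
    (
        "lock cmpxchg16b %1"
        : "+A" ( cmp )                      /* rdx:rax: comparand in, previous *src out */
        , "+m" ( *src )
        : "b" ( (long long)with )           /* rbx: low 64 bits of the new value */
        , "c" ( (long long)(with>>64) )     /* rcx: high 64 bits of the new value */
        : "cc", "memory"                    /* also a compiler barrier, like the __sync built-ins */
    );
    return cmp;                             /* equals the original cmp exactly when the swap happened */
}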
int main(int argc, char* argv[])
{
    __uint128_t a=0, b=0, c=0x0123456789ABCDEFULL;
    c <<= 64;
    c |= 0xFEDCBA9876543210ULL;
    assert(b == InterlockedCompareExchange128(&a, b, c));
    assert(a == c);
    assert(c == InterlockedCompareExchange128(&a, b, b));
    assert(a == c);
    assert(c == InterlockedCompareExchange128(&a, c, b));
    assert(a == b);
    assert(b == InterlockedCompareExchange128(&a, c, c));
    assert(a == b);
    return 0;
}
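A value-returning CAS like the one above is typically used in a retry loop. The following is only a sketch, assuming x86-64 and GCC or a compatible compiler; `cas128` and `fetch_add128` are hypothetical names, not part of the gist. The primitive is repeated so the block compiles on its own, passes the comparand as explicit rax/rdx halves instead of relying on the "A" constraint, and includes a "memory" clobber so that it also acts as a compiler barrier.

```c
#include <assert.h>

/* Hypothetical stand-alone CAS: returns the value *dst held before the instruction. */
static inline __uint128_t cas128(volatile __uint128_t *dst, __uint128_t expected, __uint128_t desired)
{
    unsigned long long lo = (unsigned long long)expected;
    unsigned long long hi = (unsigned long long)(expected >> 64);
    __asm__ __volatile__
    (
        "lock cmpxchg16b %2"
        : "+a" (lo)                                     /* rax: comparand low in, old low out   */
        , "+d" (hi)                                     /* rdx: comparand high in, old high out */
        , "+m" (*dst)
        : "b" ((unsigned long long)desired)             /* rbx: new low half  */
        , "c" ((unsigned long long)(desired >> 64))     /* rcx: new high half */
        : "cc", "memory"
    );
    return ((__uint128_t)hi << 64) | lo;
}

/* Hypothetical helper: atomically add delta to *dst and return the previous value. */
static __uint128_t fetch_add128(volatile __uint128_t *dst, __uint128_t delta)
{
    /* This plain read may be torn; it is only a first guess that the CAS validates. */
    __uint128_t seen = *dst;
    for (;;) {
        __uint128_t prev = cas128(dst, seen, seen + delta);
        if (prev == seen)
            return prev;    /* the swap happened */
        seen = prev;        /* lost a race: retry with the value actually observed */
    }
}
```

On failure the loop reuses the value returned by the CAS rather than re-reading the destination, since that return value is already an atomic snapshot.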

jkriegshauser commented Jan 12, 2020

The Windows InterlockedCompareExchange128 returns a boolean indicating whether the CAS succeeded, and takes the exchange value as two separate quadwords. The following GNU asm should replicate the Windows function.

bool InterlockedCompareExchange128(long long volatile* Destination, long long ExchangeHigh, long long ExchangeLow, long long *ComparandResult)
{
    bool success;
    __asm__ __volatile__
    (
        "lock cmpxchg16b %3"
        : "=@ccz" (success)
        , "=a" (ComparandResult[0])
        , "=d" (ComparandResult[1])
        , "+m" (*Destination)
        : "b" (ExchangeLow)
        , "c" (ExchangeHigh)
        , "a" (ComparandResult[0])
        , "d" (ComparandResult[1])
        , "m" (*Destination)
        : "cc"
    );
    return success;
}

Here's the godbolt link
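The relationship between the two interfaces can also be sketched in plain C: a value-returning CAS succeeded exactly when the value it returns equals the comparand, and the Windows-style function additionally writes the observed value back through ComparandResult. This is only an illustration, assuming x86-64 and GCC or a compatible compiler; `cas128_val` and `cas128_bool` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value-returning CAS: yields the value *dst held before the instruction. */
static inline __uint128_t cas128_val(volatile __uint128_t *dst, __uint128_t expected, __uint128_t desired)
{
    unsigned long long lo = (unsigned long long)expected;
    unsigned long long hi = (unsigned long long)(expected >> 64);
    __asm__ __volatile__
    (
        "lock cmpxchg16b %2"
        : "+a" (lo), "+d" (hi), "+m" (*dst)
        : "b" ((unsigned long long)desired)
        , "c" ((unsigned long long)(desired >> 64))
        : "cc", "memory"
    );
    return ((__uint128_t)hi << 64) | lo;
}

/* Hypothetical Windows-style wrapper: boolean result, comparand overwritten with the old value. */
static inline bool cas128_bool(volatile __uint128_t *dst, long long exch_hi, long long exch_lo,
                               long long *cmp_result)
{
    __uint128_t expected = ((__uint128_t)(unsigned long long)cmp_result[1] << 64)
                         | (unsigned long long)cmp_result[0];
    __uint128_t desired  = ((__uint128_t)(unsigned long long)exch_hi << 64)
                         | (unsigned long long)exch_lo;
    __uint128_t prev = cas128_val(dst, expected, desired);

    /* Report what was actually in memory, whether or not the swap happened. */
    cmp_result[0] = (long long)(unsigned long long)prev;
    cmp_result[1] = (long long)(unsigned long long)(prev >> 64);
    return prev == expected;
}
```

The wrapper costs a comparison in C that the flag-output version gets for free from ZF, so it is mainly useful for seeing how the two signatures correspond.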


lionello commented Jan 12, 2020

@jkriegshauser Thanks!


moon-chilled commented Apr 6, 2022

I think you don't need to specify cc as a clobber, since ccz is already an output; similarly, you can specify ComparandResult[0] and ComparandResult[1] as input/output operands (+ instead of =) rather than listing them separately as inputs and outputs.

So it simplifies to just

 __asm__("lock cmpxchg16b %3" : "=@cce" (success),
                                "+a" (ComparandResult[0]),
                                "+d" (ComparandResult[1]),
                                "+m" (*Destination)
                              : "b" (ExchangeLow),
                                "c" (ExchangeHigh));

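Also, you don't need to mark it as 'volatile'. With the destination as a read-write memory operand, it won't be elided because the memory may be aliased. The lock prefix makes the instruction itself a full barrier in hardware, but without a blanket memory clobber the compiler may still move unrelated loads and stores across the asm; add the memory clobber if you want seqcst semantics.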
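Put together as a complete function, the simplified form might look like the sketch below. It assumes x86-64 and a compiler with asm flag outputs (GCC 6 or later, or a recent Clang); `cas128_flag` is a hypothetical name. It takes the conservative route of keeping `volatile` and adding a blanket "memory" clobber, partly because the "+m" operand here is a single `long long` and so only describes the first 8 of the 16 bytes the instruction may write.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical Windows-style 128-bit CAS using a flag output operand.
   Destination must point to 16-byte-aligned storage for two quadwords. */
static inline bool cas128_flag(long long volatile *Destination, long long ExchangeHigh,
                               long long ExchangeLow, long long *ComparandResult)
{
    bool success;
    __asm__ __volatile__
    (
        "lock cmpxchg16b %3"
        : "=@ccz" (success)             /* ZF is set when the swap happened          */
        , "+a" (ComparandResult[0])     /* rax: comparand low in, old low out        */
        , "+d" (ComparandResult[1])     /* rdx: comparand high in, old high out      */
        , "+m" (*Destination)
        : "b" (ExchangeLow)             /* rbx: new low quadword                     */
        , "c" (ExchangeHigh)            /* rcx: new high quadword                    */
        : "memory"                      /* conservative: covers both quadwords and
                                           orders surrounding accesses               */
    );
    return success;
}
```

A caller would declare the destination with something like `long long dest[2] __attribute__((aligned(16)))`, since a misaligned operand makes cmpxchg16b fault.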
