Skip to content

Instantly share code, notes, and snippets.

@cocowalla
Last active April 23, 2023 15:07
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cocowalla/6070a53445e872f2bb24304712a3e1d2 to your computer and use it in GitHub Desktop.
Save cocowalla/6070a53445e872f2bb24304712a3e1d2 to your computer and use it in GitHub Desktop.
64-bit integer multiplication
// Non-intrinsics path is based on https://stackoverflow.com/a/51587262/25758, which is faster than any other portable 64-bit multiplication function I've found
public static class Maths
{
/// <summary>
/// Multiplies 2 unsigned 64-bit integers, returning the result in 2 ulongs representing the hi and lo bits
/// of the resulting 128-bit integer
/// </summary>
/// <remarks>
/// <seealso cref="System.Numerics.BigInteger"/> can perform multiplication on large integers, but it's
/// comparatively slow, and an equivalent method allocates around 360B/call
/// </remarks>
/// <param name="x">First 64-bit integer</param>
/// <param name="y">Second 64-bit integer</param>
/// <returns>Product of <paramref name="x"/> and <paramref name="y"/></returns>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static (ulong Hi, ulong Lo) Multiply64(ulong x, ulong y)
{
ulong hi;
ulong lo;
// Use BMI2 intrinsics where available
#if NETCOREAPP3_0
if (System.Runtime.Intrinsics.X86.Bmi2.X64.IsSupported)
{
hi = System.Runtime.Intrinsics.X86.Bmi2.X64.MultiplyNoFlags(x, y, &lo);
return (hi, lo);
}
#endif
lo = x * y;
ulong x0 = (uint)x;
ulong x1 = x >> 32;
ulong y0 = (uint)y;
ulong y1 = y >> 32;
ulong p11 = x1 * y1;
ulong p01 = x0 * y1;
ulong p10 = x1 * y0;
ulong p00 = x0 * y0;
// 64-bit product + two 32-bit values
ulong middle = p10 + (p00 >> 32) + (uint)p01;
// 64-bit product + two 32-bit values
hi = p11 + (middle >> 32) + (p01 >> 32);
return (hi, lo);
}
}
@kaalus
Copy link

kaalus commented Apr 3, 2020

Both x64 and ARM8 architectures contain native instructions that multiply two 64 bits numbers and give 128 bit result. In fact, these instructions are emitted by the jitter when you perform long * long multiplication in C# on a 64 bit platform. It's just that the jitter discards the highest 64 bits of the result. So, the above solution, while much faster than BigInteger, is still very unsatisfying and inefficient compared to what we could have if the language allowed us to access the highest 64 bits of the result instead of discarding them.

@cocowalla
Copy link
Author

@kaalus yes, I agree that it shouldn't have to be so difficult on .NET Core 😅

Note that my original solution here does already make use of the built in ulong * ulong to get only the low bits.

I've just updated it so it can use BMI2 intrinsics if they are available! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment