#include <stdint.h> // uint16_t, uint32_t
#include <string.h> // memcpy

// float32
// Martin Kallman
//
// Fast half-precision to single-precision floating point conversion
// - Supports signed zero and denormals-as-zero (DAZ)
// - Does not support infinities or NaN
// - Few, partially pipelinable, non-branching instructions
// - Core operations ~6 clock cycles on modern x86-64
void float32(float* __restrict out, const uint16_t in) {
    uint32_t t1;
    uint32_t t2;
    uint32_t t3;
    t1 = in & 0x7fff;                      // Non-sign bits
    t2 = in & 0x8000;                      // Sign bit
    t3 = in & 0x7c00;                      // Exponent
    t1 <<= 13;                             // Align mantissa on MSB
    t2 <<= 16;                             // Shift sign bit into position
    t1 += 0x38000000;                      // Adjust bias: (127 - 15) << 23
    t1 = (t3 == 0 ? 0 : t1);               // Denormals-as-zero
    t1 |= t2;                              // Re-insert sign bit
    memcpy(out, &t1, sizeof(*out));        // Store bits without breaking strict aliasing
}
// float16
// Martin Kallman
//
// Fast single-precision to half-precision floating point conversion
// - Supports signed zero, denormals-as-zero (DAZ), flush-to-zero (FTZ),
//   clamp-to-max
// - Does not support infinities or NaN
// - Few, partially pipelinable, non-branching instructions
// - Core operations ~10 clock cycles on modern x86-64
void float16(uint16_t* __restrict out, const float in) {
    uint32_t inu;
    memcpy(&inu, &in, sizeof(inu));        // Read bits without breaking strict aliasing
    uint32_t t1;
    uint32_t t2;
    uint32_t t3;
    t1 = inu & 0x7fffffff;                 // Non-sign bits
    t2 = inu & 0x80000000;                 // Sign bit
    t3 = inu & 0x7f800000;                 // Exponent
    t1 >>= 13;                             // Align mantissa on MSB (low bits truncated)
    t2 >>= 16;                             // Shift sign bit into position
    t1 -= 0x1c000;                         // Adjust bias: (127 - 15) << 10
    t1 = (t3 < 0x38800000) ? 0 : t1;       // Flush-to-zero: |in| < 2^-14
    t1 = (t3 > 0x47000000) ? 0x7bff : t1;  // Clamp-to-max: |in| >= 2^16 becomes 65504
    t1 = (t3 == 0 ? 0 : t1);               // Denormals-as-zero (also covered by FTZ)
    t1 |= t2;                              // Re-insert sign bit
    *out = (uint16_t)t1;
}
stingoh
commented
Jul 14, 2014
I saw this answer on stackoverflow but do not have enough (any!) rep to comment. On line 40 you are doing type punning (from float* to int*). When compiling this with strict aliasing (which gcc and clang let you set, and which gcc enables by default at -O2 and above), you will run into trouble, more likely so if a call to float16() gets inlined. Under strict aliasing rules, pointers of different types are assumed not to alias. Therefore, reads and writes to the same address, if done via pointers of different types (here float* and int*), are considered independent and can be reordered by the compiler. So with float16() inlined, 'inu' could be read before the calling code performs the write to that address. The proper way to do this would be via a union. Visual C++ stopped exposing strict/non-strict aliasing settings a long time ago, so it wouldn't actually give you issues, but other compilers would. Cheers.
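A minimal sketch of two aliasing-safe ways to read a float's bits, assuming a 32-bit IEEE 754 float. The helper names bits_via_memcpy and bits_via_union are hypothetical. memcpy is well-defined in both C and C++; reading the inactive member of a union, as suggested above, is permitted in C (C99 TC3 and C11) but is undefined behavior in C++, where memcpy (or std::bit_cast in C++20) is the portable choice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the object representation of a float into a
 * uint32_t. Optimizing compilers typically reduce this memcpy to one move. */
static uint32_t bits_via_memcpy(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: type punning through a union.
 * Defined behavior in C (C99 TC3 / C11), but not in C++. */
static uint32_t bits_via_union(float f) {
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;       /* write the float member */
    return pun.u;    /* read the same bytes as an integer */
}
```

Either helper can replace the pointer cast in float16(), and the same memcpy pattern works in the other direction for the store in float32().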
fjansson
Jul 22, 2015
I also found this on stackoverflow (http://stackoverflow.com/questions/1659440/32-bit-to-16-bit-floating-point-conversion), so I will comment here too.
In float16, the clamp-to-max test is clearly wrong: it is always triggered, because the masked exponent t3 is at most 0x7f800000, which is below 0x8e000000. The flush-to-zero test has its comparison operator reversed. I think the two tests should be:
t1 = (t3 < 0x38800000) ? 0 : t1;
t1 = (t3 > 0x47000000) ? 0x7bff : t1;
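The corrected thresholds can be checked against the float encodings they stand for. This sketch, assuming 32-bit IEEE 754 floats, shows that 0x38800000 is the bit pattern of 2^-14 (the smallest normal half-precision value), 0x47000000 is the pattern of 2^15 (the largest power of two a half can represent), and the masked exponent can never reach 0x8e000000, which is why the original clamp always fires. All helper names are hypothetical.

```c
#include <math.h>    /* INFINITY */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bit pattern of a 32-bit float, read via memcpy. */
static uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The exponent field, masked exactly as float16() computes t3. */
static uint32_t exponent_field(float f) {
    return float_bits(f) & 0x7f800000u;
}

/* Original clamp test: t3 is at most 0x7f800000, so this is always true. */
static int original_clamp_fires(float f) {
    return exponent_field(f) < 0x8e000000u;
}

/* Corrected tests proposed above: an exponent below that of 2^-14 flushes to
 * zero; an exponent above that of 2^15 (|f| >= 65536) clamps to 0x7bff. */
static int corrected_ftz_fires(float f)   { return exponent_field(f) < 0x38800000u; }
static int corrected_clamp_fires(float f) { return exponent_field(f) > 0x47000000u; }
```

Note that 0x8e000000 is 142, the biased exponent of 2^15, shifted left by 24 instead of 23; shifting by 23 gives 0x47000000.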
vmarkovtsev
Aug 17, 2015
The code that converts float16 to float32 does not deal with ±∞ and NaN. NumPy has a reference implementation: https://github.com/numpy/numpy/blob/master/numpy/core/src/npymath/halffloat.c#L466
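A sketch of a branching half-to-float variant that covers the cases float32() skips, assuming IEEE 754 binary16 input and 32-bit float output. The name float32_full is hypothetical and this is not the NumPy code linked above; it follows the same general idea of mapping exponent 31 to infinity/NaN and renormalizing subnormals instead of flushing them to zero.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical full-range half -> float conversion (branching, so slower
 * than float32()). */
static float float32_full(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;  /* sign bit to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1fu;              /* 5-bit biased exponent */
    uint32_t mant = h & 0x3ffu;                     /* 10-bit mantissa */
    uint32_t bits;

    if (exp == 0x1fu) {
        /* Infinity (mant == 0) or NaN (mant != 0): max exponent, payload kept. */
        bits = sign | 0x7f800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal: value = mant * 2^-24. Shift until the implicit bit
         * (bit 10) is set, lowering the exponent once per shift, starting
         * from the biased float exponent of 2^-14. */
        uint32_t e = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            e--;
        }
        bits = sign | (e << 23) | ((mant & 0x3ffu) << 13);
    }

    float out;
    memcpy(&out, &bits, sizeof out);
    return out;
}
```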
anouarIT
commented
Feb 10, 2017
Hi, do you have an idea how I could do the same thing in JavaScript, please?
TaihuLight
Mar 15, 2017
Could you give a demo for testing the code? Also, I do not know where uint32_t and uint16_t are declared.
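uint16_t and uint32_t are declared in <stdint.h> (<cstdint> in C++). Below is a minimal test sketch, assuming the corrected float16() comparisons proposed in this thread and memcpy in place of the pointer casts. It round-trips every normal half-precision bit pattern, positive and negative, through float32() and float16() and counts mismatches; the count should be zero because every normal half is exactly representable as a float. The function count_roundtrip_mismatches is hypothetical; call it from your own main().

```c
#include <stdint.h>  /* uint16_t, uint32_t */
#include <string.h>  /* memcpy */

/* Compact copy of float32(): half -> float with denormals-as-zero. */
void float32(float *out, uint16_t in) {
    uint32_t t1 = ((uint32_t)(in & 0x7fff) << 13) + 0x38000000u; /* align, rebias */
    uint32_t t2 = (uint32_t)(in & 0x8000) << 16;                  /* sign bit */
    t1 = ((in & 0x7c00) == 0) ? 0 : t1;                           /* denormals-as-zero */
    t1 |= t2;
    memcpy(out, &t1, sizeof *out);
}

/* Compact copy of float16() with the corrected thresholds; DAZ is implied by FTZ. */
void float16(uint16_t *out, float in) {
    uint32_t inu;
    memcpy(&inu, &in, sizeof inu);
    uint32_t t1 = ((inu & 0x7fffffffu) >> 13) - 0x1c000u; /* align, rebias */
    uint32_t t2 = (inu & 0x80000000u) >> 16;              /* sign bit */
    uint32_t t3 = inu & 0x7f800000u;                      /* exponent */
    t1 = (t3 < 0x38800000u) ? 0 : t1;                     /* flush-to-zero */
    t1 = (t3 > 0x47000000u) ? 0x7bffu : t1;               /* clamp-to-max */
    *out = (uint16_t)(t1 | t2);
}

/* Hypothetical test: every normal half must survive half -> float -> half
 * unchanged. Returns the number of failures. */
int count_roundtrip_mismatches(void) {
    int mismatches = 0;
    for (uint32_t h = 0; h <= 0xffff; ++h) {
        uint32_t exp = (h >> 10) & 0x1f;
        if (exp == 0 || exp == 0x1f) continue;  /* skip zero/denormals, inf/NaN */
        float f;
        uint16_t back;
        float32(&f, (uint16_t)h);
        float16(&back, f);
        if (back != h) ++mismatches;
    }
    return mismatches;
}
```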
Dmitro25
Mar 28, 2018
I agree with fjansson. The code should be corrected to his variant.
For example, the test case float32(float16(1.0)) gives a wrong result with martinkallman's code.
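Tracing the select steps of float16() for 1.0f shows why this case fails. 1.0f has exponent field t3 = 0x3f800000 and rebiased value t1 = (0x3f800000 >> 13) - 0x1c000 = 0x3c00. Under the original tests, flush-to-zero fires (0x3f800000 > 0x38800000) and then clamp-to-max fires (0x3f800000 < 0x8e000000), so the result is 0x7bff, which float32() decodes as 65504.0f rather than 1.0f. A sketch of both variants of just those steps, with hypothetical function names:

```c
#include <stdint.h>

/* Select steps of float16() as originally published.
 * t1: rebiased half bits, t3: masked float exponent. */
static uint32_t select_original(uint32_t t1, uint32_t t3) {
    t1 = (t3 > 0x38800000u) ? 0 : t1;       /* intended flush-to-zero */
    t1 = (t3 < 0x8e000000u) ? 0x7bffu : t1; /* intended clamp-to-max */
    return (t3 == 0) ? 0 : t1;              /* denormals-as-zero */
}

/* The same steps with the corrected comparisons. */
static uint32_t select_corrected(uint32_t t1, uint32_t t3) {
    t1 = (t3 < 0x38800000u) ? 0 : t1;       /* flush-to-zero below 2^-14 */
    t1 = (t3 > 0x47000000u) ? 0x7bffu : t1; /* clamp-to-max from 2^16 up */
    return (t3 == 0) ? 0 : t1;              /* denormals-as-zero */
}
```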