@asb
Last active May 20, 2017 10:31

[RFC] Addressing the problems with opaque floating point encodings in F/D/Q

Alex Bradbury, lowRISC CIC

Brief problem summary

The current version of the RISC-V ISA specification explicitly leaves the encoding of a single precision value undefined when it is either converted to a wider integer register or written to a wider memory location (e.g. a float written with fsd or fmv.x.d). The motivation for this is to allow a low overhead internal recoding. Unfortunately, the freedom to keep this encoding opaque is illusory, as the chosen encoding is architecturally visible in a way that can cause real compatibility issues unless it is standardised.

See issue #30 on riscv-isa-manual for further discussion. Thank you to all contributors to the discussion there: Krste Asanovic, David Horner, Andrew Waterman and especially Stefan O'Rear for helping me to understand the scope of the problem.

A version of this document was submitted for discussion on the isa-dev mailing list and triggered substantial discussion. You may want to skip ahead to the resulting proposal from Krste, and his follow-on proposal in a new thread.

Background and more detailed problem description

The RISC-V ISA specification describes floating point support in the 'F' (single precision), 'D' (double precision), and 'Q' (quad precision) extensions. The proposed vector extension additionally introduces scalar instructions for half precision floats. These extensions build upon each other, and introduce 32 floating point registers with a length ('flen') that depends on the widest extension implemented. For example, RV32IF has 32x32-bit FPRs, while RV32IFD has 32x64-bit FPRs. This document will focus on floats and doubles, although analogous issues exist for quad and half precision values.
This document does not (yet) consider potential interaction with the future 'L' decimal floating point extension, which has yet to be written.

A single precision value stored to memory using fsw will be encoded according to the IEEE-754-2008 standard. The same holds for a double stored using fsd. However, the encoding for a single precision value written using fsd is currently implementation defined. This can occur when a callee-saved register is spilled (saved on the stack), or when register state is saved while context switching. At the point the state is stored, it won't be known what type of value resides in the FPR. The problem is most obviously apparent if you consider a process running on a core that uses one encoding, which is then migrated to a core that uses a different one, for example when switching cores in a heterogeneous cluster.
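The migration hazard can be illustrated with a minimal C sketch. It models two hypothetical cores: "core A" keeps a float in the low 32 bits of a 64-bit FPR with the upper bits zero, and "core B" widens the float into double format inside the register. Both encodings, and all function names, are assumptions made for illustration; neither is claimed to be what any real implementation does.

```c
#include <stdint.h>
#include <string.h>

/* Raw IEEE 754 bit patterns, copied without any value conversion. */
static uint32_t f32_bits(float f)    { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    bits_f32(uint32_t u) { float f;    memcpy(&f, &u, sizeof f); return f; }
static uint64_t f64_bits(double d)   { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double   bits_f64(uint64_t u) { double d;   memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical core A: float lives in the low 32 bits, upper 32 bits are zero.
 * "spill" models what fsd would write; "restore" models a later
 * single-precision read of a register reloaded with fld. */
uint64_t core_a_spill(float f)      { return (uint64_t)f32_bits(f); }
float    core_a_restore(uint64_t r) { return bits_f32((uint32_t)r); }

/* Hypothetical core B: float is widened to double format inside the register. */
uint64_t core_b_spill(float f)      { return f64_bits((double)f); }
float    core_b_restore(uint64_t r) { return (float)bits_f64(r); }
```

Under these assumptions, spilling 1.5f on core B and restoring it on core A (`core_a_restore(core_b_spill(1.5f))`) yields 0.0f rather than 1.5f, because core A only looks at the low 32 bits. The value is silently corrupted, which is exactly the failure mode described above.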

To clarify the discussion, I should also point out where this is not a concern. It will not pose a problem for shared data structures containing doubles, because for correctness the compiler must insert a conversion (i.e. an fcvt.d.s instruction) in cases like the one below:

double d;

void callee(float f) {
  d = f;
}

Impact

Fundamentally, this issue affects any case where registers are spilled using one encoding, and potentially read back and interpreted by an FPU that expects a different one:

  • Task migration across heterogeneous cores within the same SoC.
  • Task migration across different devices, either using virtual machine migration or something like CRIU in Linux
  • Debug and validation. e.g. it's useful to verify one implementation against another by checking that the same instruction produces the same results. (See here).
  • Performing design space exploration, performance analysis, or testing in a way that might involve switching between different models (e.g. a "fast" and a "slow" model).

Being able to safely transfer state between cores that claim to implement the same ISA string (within reason) is a useful property, and this kind of ability is something that is actively used with other architectures. For instance, provided I ensure a minimal cpuid is exposed I am free to transfer virtual machines between Intel and AMD implementations of x86, without worrying that spilled floating point state will be corrupted (interpreted differently).

Solutions

The first possibility is we leave things as they are, and hope this isn't an issue in practice. This would harm the ability to mix cores from different vendors (or even cores by different design teams in the same company, unless they properly synchronise on this issue). I think the only reasonable path is to decide upon a standardised encoding/serialisation. In fact, to avoid issues such as this in the future we should consider any state directly accessible to user-level code to be architecturally visible and subject to a standard encoding.

First of all, a refresher on IEEE FP:

  • Single precision (32 bits): 1 bit sign, 8 bit exponent, 23 bit significand
  • Double precision (64 bits): 1 bit sign, 11 bit exponent, 52 bit significand
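As a small aid to the refresher above, the following C sketch splits a float and a double into the three fields listed. The `fp_fields` struct and the `split_f32`/`split_f64` names are hypothetical helpers introduced here, and the code assumes the host stores `float` and `double` in IEEE 754 binary32/binary64 format.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three IEEE 754 fields. */
typedef struct {
    uint32_t sign;        /* 1 bit */
    uint32_t exponent;    /* 8 bits (single) or 11 bits (double), biased */
    uint64_t significand; /* 23 bits (single) or 52 bits (double), stored part only */
} fp_fields;

fp_fields split_f32(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);            /* reinterpret bits, no conversion */
    fp_fields r = { u >> 31, (u >> 23) & 0xFFu, u & 0x7FFFFFu };
    return r;
}

fp_fields split_f64(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    fp_fields r = {
        (uint32_t)(u >> 63),
        (uint32_t)((u >> 52) & 0x7FFu),
        u & ((1ull << 52) - 1)
    };
    return r;
}
```

The different exponent widths and biases (127 versus 1023) are the reason a single precision bit pattern cannot simply be reinterpreted as a double, and why each option below has to define how the narrower value is placed in the wider register.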

There are a number of choices for how you may arrange to store and encode single and double precision values in your register file:

  • Have a single floating point register file of fixed width (e.g. 32 32-bit registers), and use register pairs to support double precision floating point.

    • MIPS, among others, use this approach
    • Gives the advantage of allowing ABI compatibility when adding support for higher precision. As it stands the 'Q' extension introduces a completely new ABI requiring a complete rebuild (RV32E, RV32I, RV32IF, RV64IF, RV64IFD, RV64IFDQ are all different ABIs).
    • The RISC-V spec doesn't allow this approach
    • Downside: using a register pair means two single precision registers are now unavailable for use
  • Pack floats tightly inside wider double registers. Logically 'f2' refers to a different register depending on if it's used in an F or a D instruction (the lower 32 bits of the 2nd 64-bit register in an F context, and the whole of the 3rd 64-bit register in a D context).

    • ARMv7 (and below?) used this approach
    • Downside: writing to any of the first 16 64-bit registers using a double operation clobbers two registers for single precision use
    • The RISC-V spec doesn't allow this approach
  • Store a single precision float in the lower half of a 64-bit register. This is perhaps the most obvious solution.

    • Allowed by the current RISC-V spec
    • Simple and easy to understand. AArch64 and presumably a number of other architectures use this approach.
  • Standardise on the UCB recoded format used in Rocket's current FPU

    • Is this documented anywhere?
  • When performing an flw on a system with 64-bit FPRs, unpack the exponent and significand into the appropriate locations in the 64-bit register. Perform appropriate masking/rounding when executing single-precision operations

    • Andrew indicates this approach was used by POWER6 and Alpha, but may hurt single precision latency
  • Come up with a new NaN-based encoding (as first suggested by Krste)

    • A double-precision NaN is represented as a value where the exponent is all 1s and the 52-bit significand is non-zero. This is a huge encoding space, and a standard encoding could easily be chosen
    • There is an advantage in that debug tools could determine with a high degree of certainty whether the dumped state from a floating point register is holding a value that is meant to be interpreted as a single-precision float
    • Similar encodings could be used to represent a half precision value in a float register, or a double in a quad register
    • There's perhaps more flexibility for eagerly recoding what seems to be a single-precision float to a different internal representation upon an fld (rather than on-demand when a single precision operation is performed). However, for IEEE compliance any such value would still need to act as a NaN when used in a double precision operation.
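To make the NaN-based option more concrete, here is a minimal C sketch. The document does not fix a particular encoding, so the one used here (float in the low 32 bits, upper 32 bits all ones) is purely an assumption for illustration, as are the names `BOX_TAG`, `box_f32`, `is_boxed_f32`, `unbox_f32` and `is_f64_nan`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed tag, for illustration only: upper 32 bits all ones.
 * That sets the double's exponent to all 1s and makes the 52-bit
 * significand non-zero, so the 64-bit pattern is always a NaN. */
#define BOX_TAG 0xFFFFFFFF00000000ull

/* Place a float's IEEE 754 bits in the low half of a 64-bit register image. */
uint64_t box_f32(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return BOX_TAG | u;
}

/* True if the register image carries the assumed single-precision tag. */
bool is_boxed_f32(uint64_t reg) {
    return (reg & BOX_TAG) == BOX_TAG;
}

/* Recover the float from the low 32 bits. */
float unbox_f32(uint64_t reg) {
    uint32_t u = (uint32_t)reg;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Double-precision NaN test: exponent all 1s and significand non-zero. */
bool is_f64_nan(uint64_t reg) {
    uint64_t exponent    = (reg >> 52) & 0x7FFu;
    uint64_t significand = reg & ((1ull << 52) - 1);
    return exponent == 0x7FFu && significand != 0;
}
```

With this choice, `box_f32(1.5f)` is `0xFFFFFFFF3FC00000`, which `is_f64_nan` accepts, so the value behaves as a NaN if consumed by a double precision operation. A debugger can use `is_boxed_f32` to guess that a dumped register holds a float, although a double NaN that happens to have the same upper bits would be indistinguishable.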

I argue that what matters above all else is that one of these options is chosen and used consistently. It's worth noting that an implementation is still free to use a different internal recoding; it would just need to support serialising and deserialising to the standard encoding that is chosen.

Backwards compatibility impact

I believe this change can be made in a backwards compatible way (i.e. all standards-compliant RISC-V software would continue to work on a newer revision). It also seems likely there is still time to specify this change and have it adopted before any RISC-V FPU implementations are available in shipping systems.

Other related issues

  • An RV64IFD system hoping to support Q would, as it stands, have to break ABI compatibility when Q is in use, unless all Q state is viewed as caller-saved. Is the cost of adding yet another ABI worth it, or can it be avoided?

Conclusion/summary

  • Leaving the encoding of lower precision values in higher precision fp registers undefined appears to give more microarchitectural freedom, but in reality this is an architecturally visible property that 'leaks' and causes potential issues in use cases that the RISC-V community should care about (e.g. migration on a heterogeneous cluster)
  • The RISC-V community would benefit from standardising on a single externally visible encoding ('serialisation'), and doing so quickly
@sorear

sorear commented Mar 23, 2017

A nit, but you can use Q code with the non-Q ABI (since function calls only save and restore the "low 64 bits", you need to spill long doubles around function calls).

User spec v2.1 §7.3 first paragraph specifies signalling behavior (MSB of the significand, 1=qNaN, 0=sNaN)

@asb
Author

asb commented Mar 23, 2017

Hi Stefan, I've tried to clarify the relevance of Q ABI concerns to choices about floating point encoding/register aliasing here https://groups.google.com/a/groups.riscv.org/d/msg/isa-dev/_r7hBlzsEd8/IUIXWEZ4AwAJ. [Edit: I suppose in addition any software that tried to do user-level preemptive threading (e.g. using SIGALRM) in a process using Q operations would also need to be changed, but this is a non-standard thing to do anyway, and would likely have issues with other extensions as well].
[Edit2: To really nit pick - whether an fsd of a register holding a quad value stores the low 64-bits or not isn't defined, is it? Either way, you want to spill all your quads around function calls if you want to maintain the RV64IFD ABI and can't rely on other knowledge]

Thanks, I'd been searching for 'signaling' and missed that reference. I agree MSB=1 => qNaN is most logical but it could perhaps be slightly more explicit given different systems have used the opposite convention in the past. This is a minor nit.
