asb/riscv-opaque-fp-rfc.mkd

## riscv-opaque-fp-rfc.mkd

      
# [RFC] Addressing the problems with opaque floating point encodings in F/D/Q

Alex Bradbury, lowRISC CIC

## Brief problem summary

The current version of the RISC-V ISA specification explicitly leaves the
encoding of a single precision value undefined when it is either moved to
a wider integer register or written to a wider memory location (e.g. a float
written with fsd or moved with fmv.x.d). The motivation for this is to allow a
low overhead internal recoding. Unfortunately, the freedom to keep this
encoding opaque is illusory, as the chosen encoding is architecturally visible in
a way that can cause real compatibility issues unless it is standardised.
See issue #30 on riscv-isa-manual
for further discussion. Thank you to all contributors to the discussion there:
Krste Asanovic, David Horner, Andrew Waterman and especially Stefan O'Rear for helping
me to understand the scope of the problem.

A version of this document was submitted for discussion on the isa-dev mailing list and triggered substantial discussion. You may want to skip ahead to the resulting proposal from Krste, and his follow-on proposal in a new thread.

## Background and more detailed problem description

The RISC-V ISA specification describes floating point support in the 'F'
(single precision), 'D' (double precision), and 'Q' (quad precision)
extensions. The proposed vector extension additionally introduces scalar
instructions for half precision floats. These extensions build upon one
another, and introduce 32 floating point registers whose length ('flen')
depends on the widest extension implemented, e.g. RV32IF has 32x32-bit
FPRs, while RV32IFD has 32x64-bit FPRs. This document will focus on floats and
doubles, although analogous issues exist for quad and half precision values.

This document does not (yet) consider potential interaction with the future 'L'
decimal floating point extension, which has yet to be written.

A single precision value stored to memory using fsw will be encoded according
to the IEEE-754-2008 standard, as will a double stored using fsd.
However, the encoding for a single precision value written using fsd is
currently implementation defined. This can occur when a callee-saved register
is spilled (saved on the stack), or when register state is saved while context
switching. At the point the state is stored, it won't be known what
type of value resides in the FPR. The problem is most apparent if
you consider a process running on a core that uses one encoding, which is then
migrated to a core that uses a different one, for example when switching cores in a
heterogeneous cluster.
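
The failure mode can be sketched in C by modelling two cores that each pick a different, individually reasonable, 64-bit image for a single precision value. Both "cores" and all function names here are hypothetical, chosen only for illustration. Core A is assumed to keep the IEEE single bits in the low half with the upper half zero, and core B is assumed to hold singles widened to double format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Bit-level reinterpretation helpers (memcpy avoids aliasing problems). */
static uint32_t float_bits(float f)     { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float    bits_float(uint32_t u)  { float f;    memcpy(&f, &u, sizeof f); return f; }
static uint64_t double_bits(double d)   { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double   bits_double(uint64_t u) { double d;   memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical core A: "fsd" of a single writes the IEEE single bits
 * in the low 32 bits and zeroes in the upper 32 bits. */
uint64_t core_a_spill(float f)          { return (uint64_t)float_bits(f); }
float    core_a_restore(uint64_t image) { return bits_float((uint32_t)image); }

/* Hypothetical core B: singles are held widened to double format, so
 * "fsd" of a single writes the bits of the equivalent double. */
uint64_t core_b_spill(float f)          { return double_bits((double)f); }
float    core_b_restore(uint64_t image) { return (float)bits_double(image); }
```

Each core round-trips its own spills correctly, but a value spilled by `core_a_spill` and read back with `core_b_restore` (or the reverse) no longer compares equal to the original, which is the corruption a migrated process would observe.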

To clarify the discussion, I should also point out where this is not a
concern. It will not pose a problem for shared data structures containing
doubles, as for correctness the compiler must insert a conversion (i.e. an
fcvt.d.s instruction) for cases like the one below:

```c
double d;

void callee(float f) {
  d = f;  /* implicit float-to-double conversion, i.e. fcvt.d.s */
}
```

## Impact

Fundamentally, this issue affects any case where registers are spilled using
one encoding, and potentially read back and interpreted by an FPU that expects
a different one:

* Task migration across heterogeneous cores within the same SoC.
* Task migration across different devices, either using virtual machine
  migration or something like CRIU in Linux.
* Debug and validation, e.g. it's useful to verify one implementation against
  another by checking that the same instruction produces the same results. (See
  here).
* Performing design space exploration, performance analysis, or testing in a
  way that might involve switching between different models (e.g. a "fast" and a
  "slow" model).

Being able to safely transfer state between cores that claim to implement the
same ISA string (within reason) is a useful property, and this kind of ability
is something that is actively used with other architectures. For instance,
provided I ensure a minimal cpuid is exposed I am free to transfer virtual
machines between Intel and AMD implementations of x86, without worrying that
spilled floating point state will be corrupted (interpreted differently).

## Solutions

The first possibility is we leave things as they are, and hope this isn't an
issue in practice. This would harm the ability to mix cores from different
vendors (or even cores by different design teams in the same company, unless
they properly synchronise on this issue). I think the only reasonable path is
to decide upon a standardised encoding/serialisation. In fact, to avoid issues
such as this in the future we should consider any state directly accessible to
user-level code to be architecturally visible and subject to a standard encoding.

First of all, a refresher on IEEE FP:

* Single precision (32 bits): 1 bit sign, 8 bit exponent, 23 bit significand
* Double precision (64 bits): 1 bit sign, 11 bit exponent, 52 bit significand
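
As a minimal sketch of the layouts above, the following C splits a float and a double into their three fields. The `fp_fields` struct and the `decode_*` names are illustrative only, and the code assumes the host represents `float` and `double` as IEEE binary32 and binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Illustrative container for the three IEEE fields. The significand
 * holds only the stored bits; the leading 1 of a normal number is implicit. */
struct fp_fields {
    unsigned sign;        /* 1 bit */
    unsigned exponent;    /* biased: 8 bits (single) or 11 bits (double) */
    uint64_t significand; /* 23 bits (single) or 52 bits (double) */
};

struct fp_fields decode_single(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    struct fp_fields r;
    r.sign        = u >> 31;
    r.exponent    = (u >> 23) & 0xFFu;
    r.significand = u & 0x7FFFFFu;
    return r;
}

struct fp_fields decode_double(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    struct fp_fields r;
    r.sign        = (unsigned)(u >> 63);
    r.exponent    = (unsigned)((u >> 52) & 0x7FFu);
    r.significand = u & UINT64_C(0xFFFFFFFFFFFFF);
    return r;
}
```

For example, `decode_single(1.5f)` gives a biased exponent of 127 and a significand with only its top stored bit set, while `decode_double(-2.0)` gives sign 1, a biased exponent of 1024 and a zero significand.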

There are a number of choices for how you may arrange to store and encode single and
double precision values in your register file:


1. Have a single floating point register file of fixed width (e.g. 32 32-bit
   registers), and use register pairs to support double precision floating point.

   * MIPS, among others, uses this approach.
   * Gives the advantage of allowing ABI compatibility when adding support for
     higher precision. As it stands, the 'Q' extension introduces a completely
     new ABI requiring a complete rebuild (RV32E, RV32I, RV32IF, RV64IF,
     RV64IFD, RV64IFDQ are all different ABIs).
   * The RISC-V spec doesn't allow this approach.
   * Downside: using a register pair means two single precision registers are
     now unavailable for use.

2. Pack floats tightly inside wider double registers. Logically 'f2' refers to
   a different register depending on whether it's used in an F or a D
   instruction (the lower 32 bits of the 2nd 64-bit register in an F context,
   and the whole of the 3rd 64-bit register in a D context).

   * ARMv7 (and below?) used this approach.
   * Downside: writing to any of the first 16 64-bit registers using a double
     operation clobbers two registers for single precision use.
   * The RISC-V spec doesn't allow this approach.

3. Store a single precision float in the lower half of a 64-bit register. This
   is perhaps the most obvious solution.

   * Allowed by the current RISC-V spec.
   * Simple and easy to understand. AArch64 and presumably a number of other
     architectures use this approach.

4. Standardise on the UCB recoded format used in Rocket's current FPU.

   * Is this documented anywhere?

5. When performing an flw on a system with 64-bit FPRs, unpack the exponent and
   significand into the appropriate locations in the 64-bit register. Perform
   appropriate masking/rounding when executing single-precision operations.

   * Andrew indicates this approach was used by POWER6 and Alpha, but may hurt
     single precision latency.

6. Come up with a new NaN-based encoding (as first suggested by Krste).

   * A double-precision NaN is represented as a value where the exponent is all
     1s and the 52-bit significand is non-zero. This is a huge encoding space,
     and a standard encoding could easily be chosen.
   * There is an advantage in that debug tools could determine with a high
     degree of certainty whether the dumped state from a floating point register
     is holding a value that is meant to be interpreted as a single-precision
     float.
   * Similar encodings could be used to represent a half precision value in a
     float register, or a double in a quad register.
   * There's perhaps more flexibility for eagerly recoding what seems to be a
     single-precision float to a different internal representation upon an fld
     (rather than on-demand when a single precision operation is performed).
     However, for IEEE compliance any such value would still need to act as a NaN
     when used in a double precision operation.
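
To make the NaN-based option more concrete, here is one possible encoding sketched in C. The choice of tag (upper 32 bits all ones) and the function names are assumptions made for illustration; this document does not fix a particular pattern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tag: upper 32 bits all ones. Read as a double, this sets the
 * 11-bit exponent to all 1s and leaves the significand non-zero, so the
 * 64-bit image is always a double-precision NaN. */
#define BOX_TAG UINT64_C(0xFFFFFFFF00000000)

/* Wrap the raw bits of an IEEE single in the assumed 64-bit encoding. */
uint64_t box_single(uint32_t single_bits) {
    return BOX_TAG | (uint64_t)single_bits;
}

/* True if a 64-bit register image carries the assumed tag. */
bool is_boxed_single(uint64_t image) {
    return (image & BOX_TAG) == BOX_TAG;
}

/* Recover the raw single bits from a tagged image. */
uint32_t unbox_single(uint64_t image) {
    return (uint32_t)image;
}

/* True if the image is a NaN when interpreted as an IEEE double. */
bool is_double_nan(uint64_t image) {
    uint64_t exponent    = (image >> 52) & 0x7FFu;
    uint64_t significand = image & UINT64_C(0xFFFFFFFFFFFFF);
    return exponent == 0x7FFu && significand != 0;
}
```

With this tag, a debug tool can call `is_boxed_single` on dumped register state. The answer is only "a high degree of certainty" rather than proof, because a genuine double NaN whose upper 32 bits happen to be all ones is indistinguishable from a tagged single.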


I argue that what matters above all else is that one of these options is
chosen and used consistently. It's worth noting that an implementation is
still free to use a different internal recoding; it would just need to support
serialising and deserialising to the standard encoding that is chosen.
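
The split between an internal recoding and a standard external image could look roughly like the C below. Everything here is hypothetical: the internal `struct fpr` layout, the function names, and the assumed standard image for singles (upper 32 bits all ones, one possible NaN-based choice). It is a sketch of the idea, not a description of any existing FPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");
static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Assumed standard tag marking a single held in a 64-bit image. */
#define BOX_TAG UINT64_C(0xFFFFFFFF00000000)

/* Hypothetical internal register: every value is held in double format,
 * with a side bit recording that it is logically a single. */
struct fpr {
    bool   holds_single;
    double value;
};

/* Model of fld: deserialise the standard image, recoding eagerly. */
struct fpr fpr_load64(uint64_t image) {
    struct fpr r;
    if ((image & BOX_TAG) == BOX_TAG) {
        uint32_t lo = (uint32_t)image;
        float f;
        memcpy(&f, &lo, sizeof f);
        r.holds_single = true;
        r.value = (double)f; /* widening is exact for non-NaN singles */
    } else {
        r.holds_single = false;
        memcpy(&r.value, &image, sizeof r.value);
    }
    return r;
}

/* Model of fsd: serialise the internal form back to the standard image.
 * Caveat: the float/double conversions may not preserve NaN payloads, so a
 * real design would need to handle single-precision NaNs separately. */
uint64_t fpr_store64(struct fpr r) {
    uint64_t image;
    if (r.holds_single) {
        float f = (float)r.value;
        uint32_t lo;
        memcpy(&lo, &f, sizeof lo);
        image = BOX_TAG | (uint64_t)lo;
    } else {
        memcpy(&image, &r.value, sizeof image);
    }
    return image;
}
```

The property that matters for migration is that `fpr_store64(fpr_load64(image))` returns the same image, regardless of what the core does with the value in between.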

## Backwards compatibility impact

I believe this change can be made in a backwards compatible way (i.e. all
standards-compliant RISC-V software would continue to work on a newer
revision). It also seems likely there is still time to specify this change
and have it adopted before any RISC-V FPU implementations are available in
shipping systems.

## Other related issues


An RV64IFD system hoping to support Q would, as it stands, have to break ABI
compatibility when Q is in use, unless all Q state is viewed as caller-saved. Is the cost of adding yet another ABI worth
it, or can it be avoided?

## Conclusion/summary


* Leaving the encoding of lower precision values in higher precision fp
  registers undefined appears to give more microarchitectural freedom, but in
  reality this is an architecturally visible property that 'leaks' and causes
  potential issues in use cases that the RISC-V community should care about
  (e.g. migration on a heterogeneous cluster).
* The RISC-V community would benefit from standardising on a single externally
  visible encoding ('serialisation'), and doing so quickly.