nlitsme/r32.md

## r32.md

      
RV32I Base Integer Instruction Set, Version 2.1

This chapter describes the RV32I base integer instruction set.

RV32I was designed to be sufficient to form a compiler target and to
support modern operating system environments. The ISA was also designed
to reduce the hardware required in a minimal implementation. RV32I
contains 40 unique instructions, though a simple implementation might
cover the ECALL/EBREAK instructions with a single SYSTEM hardware
instruction that always traps and might be able to implement the FENCE
instruction as a NOP, reducing base instruction count to 38 total. RV32I
can emulate almost any other ISA extension (except the A extension,
which requires additional hardware support for atomicity).
In practice, a hardware implementation including the machine-mode
privileged architecture will also require the 6 CSR instructions.
Subsets of the base integer ISA might be useful for pedagogical
purposes, but the base has been defined such that there should be little
incentive to subset a real hardware implementation beyond omitting
support for misaligned memory accesses and treating all SYSTEM
instructions as a single trap.


The standard RISC-V assembly language syntax is documented in the
Assembly Programmer’s Manual.


Most of the commentary for RV32I also applies to the RV64I base.

Programmers’ Model for Base Integer ISA

Figure [gprs] shows the unprivileged state for the
base integer ISA. For RV32I, the 32 x registers are each 32 bits wide,
i.e., XLEN=32. Register x0 is hardwired with all bits equal to 0.
General purpose registers x1–x31 hold values that various
instructions interpret as a collection of Boolean values, or as two’s
complement signed binary integers or unsigned binary integers.
There is one additional unprivileged register: the program counter pc
holds the address of the current instruction.


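(Figure [gprs]: unprivileged integer register state, x0 (hardwired zero) through x31 plus pc, each XLEN bits wide.)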


There is no dedicated stack pointer or subroutine return address link
register in the Base Integer ISA; the instruction encoding allows any
x register to be used for these purposes. However, the standard
software calling convention uses register x1 to hold the return
address for a call, with register x5 available as an alternate link
register. The standard calling convention uses register x2 as the
stack pointer.
Hardware might choose to accelerate function calls and returns that use
x1 or x5. See the descriptions of the JAL and JALR instructions.
The optional compressed 16-bit instruction format is designed around the
assumption that x1 is the return address register and x2 is the
stack pointer. Software using other conventions will operate correctly
but may have greater code size.


The number of available architectural registers can have large impacts
on code size, performance, and energy consumption. Although 16 registers
would arguably be sufficient for an integer ISA running compiled code,
it is impossible to encode a complete ISA with 16 registers in 16-bit
instructions using a 3-address format. Although a 2-address format would
be possible, it would increase instruction count and lower efficiency.
We wanted to avoid intermediate instruction sizes (such as Xtensa’s
24-bit instructions) to simplify base hardware implementations, and once
a 32-bit instruction size was adopted, it was straightforward to support
32 integer registers. A larger number of integer registers also helps
performance on high-performance code, where there can be extensive use
of loop unrolling, software pipelining, and cache tiling.
For these reasons, we chose a conventional size of 32 integer registers
for RV32I. Dynamic register usage tends to be dominated by a few
frequently accessed registers, and regfile implementations can be
optimized to reduce access energy for the frequently accessed
registers. The optional compressed 16-bit instruction format mostly
only accesses 8 registers and hence can provide a dense instruction
encoding, while additional instruction-set extensions could support a
much larger register space (either flat or hierarchical) if desired.
For resource-constrained embedded applications, we have defined the
RV32E subset, which only has 16 registers
(Chapter [rv32e]).

Base Instruction Formats

In the base RV32I ISA, there are four core instruction formats
(R/I/S/U), as shown in
Figure [fig:baseinstformats]. All are
a fixed 32 bits in length. The base ISA has IALIGN=32, meaning that
instructions must be aligned on a four-byte boundary in memory. An
instruction-address-misaligned exception is generated on a taken branch
or unconditional jump if the target address is not IALIGN-bit aligned.
This exception is reported on the branch or jump instruction, not on the
target instruction. No instruction-address-misaligned exception is
generated for a conditional branch that is not taken.

The alignment constraint for base ISA instructions is relaxed to a
two-byte boundary when instruction extensions with 16-bit lengths or
other odd multiples of 16-bit lengths are added (i.e., IALIGN=16).
Instruction-address-misaligned exceptions are reported on the branch or
jump that would cause instruction misalignment to help debugging, and to
simplify hardware design for systems with IALIGN=32, where these are the
only places where misalignment can occur.

The behavior upon decoding a reserved instruction is unspecified.

Some platforms may require that opcodes reserved for standard use raise
an illegal-instruction exception. Other platforms may permit reserved
opcode space be used for non-conforming extensions.


funct7
rs2
rs1
funct3
rd
opcode
R-type


imm[11:0]

rs1
funct3
rd
opcode
I-type


imm[11:5]
rs2
rs1
funct3
imm[4:0]
opcode
S-type


imm[31:12]


rd
opcode
U-type


The RISC-V ISA keeps the source (rs1 and rs2) and destination (rd)
registers at the same position in all formats to simplify decoding.
Except for the 5-bit immediates used in CSR instructions
(Chapter [csrinsts]), immediates are always
sign-extended, and are generally packed towards the leftmost available
bits in the instruction and have been allocated to reduce hardware
complexity. In particular, the sign bit for all immediates is always in
bit 31 of the instruction to speed sign-extension circuitry.

Decoding register specifiers is usually on the critical paths in
implementations, and so the instruction format was chosen to keep all
register specifiers at the same position in all formats at the expense
of having to move immediate bits across formats (a property shared with
RISC-IV, a.k.a. SPUR).
In practice, most immediates are either small or require all XLEN bits.
We chose an asymmetric immediate split (12 bits in regular instructions
plus a special load-upper-immediate instruction with 20 bits) to
increase the opcode space available for regular instructions.
Immediates are sign-extended because we did not observe a benefit to
using zero-extension for some immediates as in the MIPS ISA and wanted
to keep the ISA as simple as possible.

Immediate Encoding Variants

There are a further two variants of the instruction formats (B/J) based
on the handling of immediates, as shown in
Figure [fig:baseinstformatsimm].


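| Format | Fields, most-significant first (instruction bit ranges in parentheses) |
|:- |:- |
| R-type | funct7 (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · rd (11:7) · opcode (6:0) |
| I-type | imm[11:0] (31:20) · rs1 (19:15) · funct3 (14:12) · rd (11:7) · opcode (6:0) |
| S-type | imm[11:5] (31:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · imm[4:0] (11:7) · opcode (6:0) |
| B-type | imm[12] (31) · imm[10:5] (30:25) · rs2 (24:20) · rs1 (19:15) · funct3 (14:12) · imm[4:1] (11:8) · imm[11] (7) · opcode (6:0) |
| U-type | imm[31:12] (31:12) · rd (11:7) · opcode (6:0) |
| J-type | imm[20] (31) · imm[10:1] (30:21) · imm[11] (20) · imm[19:12] (19:12) · rd (11:7) · opcode (6:0) |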


The only difference between the S and B formats is that the 12-bit
immediate field is used to encode branch offsets in multiples of 2 in
the B format. Instead of shifting all bits in the instruction-encoded
immediate left by one in hardware as is conventionally done, the middle
bits (imm[10:1]) and sign bit stay in fixed positions, while the
lowest bit in S format (inst[7]) encodes a high-order bit in B format.
Similarly, the only difference between the U and J formats is that the
20-bit immediate is shifted left by 12 bits to form U immediates and by
1 bit to form J immediates. The location of instruction bits in the U
and J format immediates is chosen to maximize overlap with the other
formats and with each other.
Figure [fig:immtypes] shows the immediates
produced by each of the base instruction formats, and is labeled to show
which instruction bit (inst[y]) produces each bit of the immediate
value.


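| Immediate | Immediate bits ← instruction bits |
|:- |:- |
| I-immediate | [31:11] ← inst[31] · [10:5] ← inst[30:25] · [4:1] ← inst[24:21] · [0] ← inst[20] |
| S-immediate | [31:11] ← inst[31] · [10:5] ← inst[30:25] · [4:1] ← inst[11:8] · [0] ← inst[7] |
| B-immediate | [31:12] ← inst[31] · [11] ← inst[7] · [10:5] ← inst[30:25] · [4:1] ← inst[11:8] · [0] ← 0 |
| U-immediate | [31] ← inst[31] · [30:20] ← inst[30:20] · [19:12] ← inst[19:12] · [11:0] ← 0 |
| J-immediate | [31:20] ← inst[31] · [19:12] ← inst[19:12] · [11] ← inst[20] · [10:5] ← inst[30:25] · [4:1] ← inst[24:21] · [0] ← 0 |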


Sign-extension is one of the most critical operations on immediates
(particularly for XLEN>32), and in RISC-V the sign bit for all
immediates is always held in bit 31 of the instruction to allow
sign-extension to proceed in parallel with instruction decoding.
Although more complex implementations might have separate adders for
branch and jump calculations and so would not benefit from keeping the
location of immediate bits constant across types of instruction, we
wanted to reduce the hardware cost of the simplest implementations. By
rotating bits in the instruction encoding of B and J immediates instead
of using dynamic hardware muxes to multiply the immediate by 2, we
reduce instruction signal fanout and immediate mux costs by around a
factor of 2. The scrambled immediate encoding will add negligible time
to static or ahead-of-time compilation. For dynamic generation of
instructions, there is some small additional overhead, but the most
common short forward branches have straightforward immediate encodings.
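As a cross-check of Figure [fig:immtypes], the following C sketch extracts each immediate from a 32-bit instruction word. The helper names (`sext`, `imm_i`, `imm_s`, `imm_b`, `imm_u`, `imm_j`) are hypothetical and not taken from any reference implementation; sign extension goes through a 64-bit intermediate so it does not depend on implementation-defined signed conversions.

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `value` (1 <= bits <= 32). */
static int32_t sext(uint32_t value, unsigned bits) {
    uint32_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    int64_t v = (int64_t)(value & mask);
    if (v & ((int64_t)1 << (bits - 1)))   /* sign bit of the field is set */
        v -= (int64_t)1 << bits;
    return (int32_t)v;
}

/* I-immediate: inst[31:20]. */
static int32_t imm_i(uint32_t inst) { return sext(inst >> 20, 12); }

/* S-immediate: inst[31:25] gives imm[11:5], inst[11:7] gives imm[4:0]. */
static int32_t imm_s(uint32_t inst) {
    return sext(((inst >> 25) << 5) | ((inst >> 7) & 0x1Fu), 12);
}

/* B-immediate: like S, but inst[7] becomes imm[11] and imm[0] is always 0. */
static int32_t imm_b(uint32_t inst) {
    uint32_t imm = (((inst >> 31) & 0x1u) << 12)
                 | (((inst >> 7) & 0x1u) << 11)
                 | (((inst >> 25) & 0x3Fu) << 5)
                 | (((inst >> 8) & 0xFu) << 1);
    return sext(imm, 13);
}

/* U-immediate: inst[31:12] in place, low 12 bits zero. */
static int32_t imm_u(uint32_t inst) { return sext(inst & 0xFFFFF000u, 32); }

/* J-immediate: inst[31] -> imm[20], inst[19:12] in place, inst[20] -> imm[11],
   inst[30:21] -> imm[10:1], imm[0] always 0. */
static int32_t imm_j(uint32_t inst) {
    uint32_t imm = (((inst >> 31) & 0x1u) << 20)
                 | (((inst >> 12) & 0xFFu) << 12)
                 | (((inst >> 20) & 0x1u) << 11)
                 | (((inst >> 21) & 0x3FFu) << 1);
    return sext(imm, 21);
}
```

For example, the word 0xFFDFF06F (jal x0, -4) yields imm_j == -4, and 0x00051463 (bne a0, x0, 8) yields imm_b == 8.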

Integer Computational Instructions

Most integer computational instructions operate on XLEN bits of values
held in the integer register file. Integer computational instructions
are either encoded as register-immediate operations using the I-type
format or as register-register operations using the R-type format. The
destination is register rd for both register-immediate and
register-register instructions. No integer computational instructions
cause arithmetic exceptions.

We did not include special instruction-set support for overflow checks
on integer arithmetic operations in the base instruction set, as many
overflow checks can be cheaply implemented using RISC-V branches.
Overflow checking for unsigned addition requires only a single
additional branch instruction after the addition:
add t0, t1, t2; bltu t0, t1, overflow.
For signed addition, if one operand’s sign is known, overflow checking
requires only a single branch after the addition:
addi t0, t1, +imm; blt t0, t1, overflow. This covers the common case
of addition with an immediate operand (the sequence shown is for a
positive immediate; for a negative immediate the blt operands are
swapped).
For general signed addition, three additional instructions after the
addition are required, leveraging the observation that the sum should be
less than one of the operands if and only if the other operand is
negative.

    add t0, t1, t2
    slti t3, t2, 0
    slt t4, t0, t1
    bne t3, t4, overflow

In RV64I, checks of 32-bit signed additions can be optimized further by
comparing the results of ADD and ADDW on the operands.
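The three overflow-check idioms above can be mirrored in C to see why they work. This is a sketch under stated assumptions: registers are modeled as 32-bit values, ADD wraps modulo 2^32 as in RISC-V, and the function names are hypothetical. Each comment names the instruction the line stands in for.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a 32-bit pattern as two's complement without
   implementation-defined conversions. */
static int32_t as_signed(uint32_t x) {
    return x <= (uint32_t)INT32_MAX ? (int32_t)x
                                    : (int32_t)((int64_t)x - 4294967296LL);
}

/* add t0, t1, t2; bltu t0, t1, overflow */
static bool add_overflows_unsigned(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;              /* wraps modulo 2^32, like ADD */
    return sum < a;                    /* BLTU: a carry out makes the sum smaller */
}

/* addi t0, t1, +imm; blt t0, t1, overflow  (assumes imm > 0) */
static bool addi_overflows_signed(int32_t a, int32_t imm) {
    int32_t sum = as_signed((uint32_t)a + (uint32_t)imm);
    return sum < a;                    /* adding a positive value only shrinks on overflow */
}

/* add t0, t1, t2; slti t3, t2, 0; slt t4, t0, t1; bne t3, t4, overflow */
static bool add_overflows_signed(int32_t a, int32_t b) {
    int32_t sum = as_signed((uint32_t)a + (uint32_t)b);
    bool b_negative = b < 0;           /* SLTI t3, t2, 0 */
    bool sum_below_a = sum < a;        /* SLT t4, t0, t1 */
    return b_negative != sum_below_a;  /* BNE: these agree unless overflow occurred */
}
```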

Integer Register-Immediate Instructions


M
R
S
R
O


5
3
5
7


I-immediate[11:0]
src
ADDI/SLTI[U]
dest
OP-IMM


I-immediate[11:0]
src
ANDI/ORI/XORI
dest
OP-IMM


ADDI adds the sign-extended 12-bit immediate to register rs1.
Arithmetic overflow is ignored and the result is simply the low XLEN
bits of the result. ADDI rd, rs1, 0 is used to implement the MV rd,
rs1 assembler pseudoinstruction.
SLTI (set less than immediate) places the value 1 in register rd if
register rs1 is less than the sign-extended immediate when both are
treated as signed numbers, else 0 is written to rd. SLTIU is similar
but compares the values as unsigned numbers (i.e., the immediate is
first sign-extended to XLEN bits then treated as an unsigned number).
Note, SLTIU rd, rs1, 1 sets rd to 1 if rs1 equals zero, otherwise
sets rd to 0 (assembler pseudoinstruction SEQZ rd, rs).
ANDI, ORI, XORI are logical operations that perform bitwise AND, OR, and
XOR on register rs1 and the sign-extended 12-bit immediate and place
the result in rd. Note, XORI rd, rs1, -1 performs a bitwise logical
inversion of register rs1 (assembler pseudoinstruction NOT rd, rs).
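A minimal C model of the register-immediate operations just described, assuming 32-bit registers held in uint32_t and a raw 12-bit immediate field; the function names are illustrative only. The signed comparison flips the sign bit of both operands, which maps two's-complement order onto unsigned order.

```c
#include <stdint.h>

/* Sign-extend a raw 12-bit I-immediate field to XLEN=32 bits. */
static uint32_t sext12(uint32_t imm12) {
    imm12 &= 0xFFFu;
    return (imm12 & 0x800u) ? (imm12 | 0xFFFFF000u) : imm12;
}

static uint32_t addi(uint32_t rs1, uint32_t imm12) { return rs1 + sext12(imm12); } /* overflow ignored */
static uint32_t andi(uint32_t rs1, uint32_t imm12) { return rs1 & sext12(imm12); }
static uint32_t ori(uint32_t rs1, uint32_t imm12)  { return rs1 | sext12(imm12); }
static uint32_t xori(uint32_t rs1, uint32_t imm12) { return rs1 ^ sext12(imm12); }

/* SLTI: signed compare; XOR with 0x80000000 turns signed order into unsigned order. */
static uint32_t slti(uint32_t rs1, uint32_t imm12) {
    return (rs1 ^ 0x80000000u) < (sext12(imm12) ^ 0x80000000u);
}

/* SLTIU: the immediate is still sign-extended first, then compared as unsigned. */
static uint32_t sltiu(uint32_t rs1, uint32_t imm12) {
    return rs1 < sext12(imm12);
}
```

With these definitions sltiu(x, 1) behaves as SEQZ and xori(x, 0xFFF) as NOT, since 0xFFF sign-extends to all ones. Note also that andi(x, 0x800) keeps the upper 20 bits of x, because the immediate is sign-extended before the AND.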


S
R
R
S
R
O


5
5
3
5
7


0000000
shamt[4:0]
src
SLLI
dest
OP-IMM


0000000
shamt[4:0]
src
SRLI
dest
OP-IMM


0100000
shamt[4:0]
src
SRAI
dest
OP-IMM


Shifts by a constant are encoded as a specialization of the I-type
format. The operand to be shifted is in rs1, and the shift amount is
encoded in the lower 5 bits of the I-immediate field. The right shift
type is encoded in bit 30. SLLI is a logical left shift (zeros are
shifted into the lower bits); SRLI is a logical right shift (zeros are
shifted into the upper bits); and SRAI is an arithmetic right shift (the
original sign bit is copied into the vacated upper bits).
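The constant shifts can be sketched in C as follows, assuming 32-bit registers in uint32_t. Because right-shifting a negative signed integer is implementation-defined in older C standards, SRAI is built from a logical shift plus an explicit sign fill. The decoder helper is hypothetical; it relies on the standard funct3 values (001 for SLLI, 101 for SRLI/SRAI) and uses instruction bit 30 to pick the right-shift type, as described above.

```c
#include <stdint.h>

static uint32_t slli(uint32_t x, unsigned shamt) { return x << (shamt & 31u); }
static uint32_t srli(uint32_t x, unsigned shamt) { return x >> (shamt & 31u); }

/* Arithmetic right shift: copy the original sign bit into the vacated upper bits. */
static uint32_t srai(uint32_t x, unsigned shamt) {
    shamt &= 31u;
    uint32_t result = x >> shamt;
    if (x & 0x80000000u)
        result |= ~(0xFFFFFFFFu >> shamt);   /* ones in the top `shamt` bits */
    return result;
}

/* Execute an OP-IMM shift given the full instruction word and the value of rs1. */
static uint32_t exec_shift_imm(uint32_t inst, uint32_t rs1_value) {
    unsigned shamt = (inst >> 20) & 0x1Fu;   /* lower 5 bits of the I-immediate */
    unsigned funct3 = (inst >> 12) & 0x7u;
    if (funct3 == 0x1u)                      /* 001: SLLI */
        return slli(rs1_value, shamt);
    return ((inst >> 30) & 1u) ? srai(rs1_value, shamt)   /* bit 30 set: SRAI */
                               : srli(rs1_value, shamt);  /* bit 30 clear: SRLI */
}
```

For example, 0x40455513 encodes srai a0, a0, 4, and exec_shift_imm(0x40455513, 0x80000000) returns 0xF8000000.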


U
R
O


5
7


U-immediate[31:12]
dest
LUI


U-immediate[31:12]
dest
AUIPC


LUI (load upper immediate) is used to build 32-bit constants and uses
the U-type format. LUI places the 32-bit U-immediate value into the
destination register rd, filling in the lowest 12 bits with zeros.
AUIPC (add upper immediate to pc) is used to build pc-relative
addresses and uses the U-type format. AUIPC forms a 32-bit offset from
the U-immediate, filling in the lowest 12 bits with zeros, adds this
offset to the address of the AUIPC instruction, then places the result
in register rd.

The assembly syntax for lui and auipc does not represent the lower
12 bits of the U-immediate, which are always zero.
The AUIPC instruction supports two-instruction sequences to access
arbitrary offsets from the pc for both control-flow transfers and data
accesses. The combination of an AUIPC and the 12-bit immediate in a JALR
can transfer control to any 32-bit pc-relative address, while an AUIPC
plus the 12-bit immediate offset in regular load or store instructions
can access any 32-bit pc-relative data address.
The current pc can be obtained by setting the U-immediate to 0.
Although a JAL +4 instruction could also be used to obtain the local
pc (of the instruction following the JAL), it might cause pipeline
breaks in simpler microarchitectures or pollute branch-target buffer
structures in more complex microarchitectures.
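Because the 12-bit immediate added by ADDI (or by a load, store, or JALR) is sign-extended, the upper 20 bits given to LUI or AUIPC must be rounded up whenever bit 11 of the target value is set. The sketch below shows that split for the LUI + ADDI case; the same arithmetic applies to pc-relative offsets with AUIPC. The function names are hypothetical.

```c
#include <stdint.h>

/* Split a 32-bit value into a 20-bit upper part and a 12-bit lower part such that
   (hi20 << 12) + sign_extend(lo12) == value, modulo 2^32. */
static void split_hi_lo(uint32_t value, uint32_t *hi20, uint32_t *lo12) {
    *lo12 = value & 0xFFFu;
    /* Adding 0x800 rounds the upper part up exactly when lo12 will be negative. */
    *hi20 = ((value + 0x800u) >> 12) & 0xFFFFFu;
}

/* Model of: lui rd, hi20  followed by  addi rd, rd, lo12 */
static uint32_t lui_addi(uint32_t hi20, uint32_t lo12) {
    uint32_t rd = hi20 << 12;                                      /* LUI zero-fills the low 12 bits */
    uint32_t imm = (lo12 & 0x800u) ? (lo12 | 0xFFFFF000u) : lo12;  /* ADDI sign-extends */
    return rd + imm;
}
```

For 0x12345FFF the split is hi20 = 0x12346 and lo12 = 0xFFF (that is, -1), because 0x12346000 - 1 = 0x12345FFF.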

Integer Register-Register Operations

RV32I defines several arithmetic R-type operations. All operations read
the rs1 and rs2 registers as source operands and write the result
into register rd. The funct7 and funct3 fields select the type of
operation.


S
R
R
S
R
O


5
5
3
5
7


0000000
src2
src1
ADD/SLT[U]
dest
OP


0000000
src2
src1
AND/OR/XOR
dest
OP


0000000
src2
src1
SLL/SRL
dest
OP


0100000
src2
src1
SUB/SRA
dest
OP


ADD performs the addition of rs1 and rs2. SUB performs the
subtraction of rs2 from rs1. Overflows are ignored and the low XLEN
bits of results are written to the destination rd. SLT and SLTU
perform signed and unsigned compares respectively, writing 1 to rd if
$\mathit{rs1} < \mathit{rs2}$, 0 otherwise. Note, SLTU rd, x0, rs2 sets rd to 1 if
rs2 is not equal to zero, otherwise sets rd to zero (assembler
pseudoinstruction SNEZ rd, rs). AND, OR, and XOR perform bitwise
logical operations.
SLL, SRL, and SRA perform logical left, logical right, and arithmetic
right shifts on the value in register rs1 by the shift amount held in
the lower 5 bits of register rs2.
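The register-register operations can be collected into a single dispatch on funct3 and funct7, sketched below in C with 32-bit registers as uint32_t. The funct3 values follow the standard RV32I OP encoding (000 ADD/SUB, 001 SLL, 010 SLT, 011 SLTU, 100 XOR, 101 SRL/SRA, 110 OR, 111 AND), and funct7 = 0100000 selects SUB or SRA; the function name is hypothetical.

```c
#include <stdint.h>

/* One OP-format ALU step: returns the value written to rd. */
static uint32_t alu_rr(unsigned funct3, unsigned funct7, uint32_t rs1, uint32_t rs2) {
    int alt = (funct7 == 0x20u);        /* 0100000: SUB / SRA */
    unsigned shamt = rs2 & 0x1Fu;       /* shifts use only the lower 5 bits of rs2 */
    switch (funct3 & 0x7u) {
    case 0x0: return alt ? rs1 - rs2 : rs1 + rs2;               /* SUB / ADD, overflow ignored */
    case 0x1: return rs1 << shamt;                              /* SLL */
    case 0x2: return (rs1 ^ 0x80000000u) < (rs2 ^ 0x80000000u); /* SLT (signed) */
    case 0x3: return rs1 < rs2;                                 /* SLTU */
    case 0x4: return rs1 ^ rs2;                                 /* XOR */
    case 0x5:
        if (!alt) return rs1 >> shamt;                          /* SRL */
        /* SRA: logical shift, then fill the vacated bits with the sign bit */
        return (rs1 >> shamt) | ((rs1 & 0x80000000u) ? ~(0xFFFFFFFFu >> shamt) : 0u);
    case 0x6: return rs1 | rs2;                                 /* OR */
    default:  return rs1 & rs2;                                 /* AND (111) */
    }
}
```

Here alu_rr(0x3, 0, 0, x) is SLTU rd, x0, rs2, so it returns 1 exactly when x is nonzero, matching SNEZ.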

NOP Instruction


M
R
S
R
O


5
3
5
7


0
0
ADDI
0
OP-IMM


The NOP instruction does not change any architecturally visible state,
except for advancing the pc and incrementing any applicable
performance counters. NOP is encoded as ADDI x0, x0, 0.

NOPs can be used to align code segments to microarchitecturally
significant address boundaries, or to leave space for inline code
modifications. Although there are many possible ways to encode a NOP, we
define a canonical NOP encoding to allow microarchitectural
optimizations as well as for more readable disassembly output. The other
NOP encodings are made available for HINT instructions
(Section 1.9).
ADDI was chosen for the NOP encoding as this is most likely to take
fewest resources to execute across a range of systems (if not optimized
away in decode). In particular, the instruction only reads one register.
Also, an ADDI functional unit is more likely to be available in a
superscalar design as adds are the most common operation. In particular,
address-generation functional units can execute ADDI using the same
hardware needed for base+offset address calculations, while
register-register ADD or logical/shift operations require additional
hardware.
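To make the canonical encoding concrete, the sketch below packs I-type fields into a word using a hypothetical helper (not a toolchain API). OP-IMM is the major opcode 0010011 (0x13) and ADDI uses funct3 = 000, so ADDI x0, x0, 0 assembles to 0x00000013.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_OP_IMM 0x13u   /* 0010011 */

/* Pack an I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode. */
static uint32_t encode_itype(int32_t imm, unsigned rs1, unsigned funct3,
                             unsigned rd, unsigned opcode) {
    assert(imm >= -2048 && imm <= 2047);   /* must fit in a signed 12-bit field */
    return (((uint32_t)imm & 0xFFFu) << 20) | ((rs1 & 0x1Fu) << 15) |
           ((funct3 & 0x7u) << 12) | ((rd & 0x1Fu) << 7) | (opcode & 0x7Fu);
}

/* Only this exact word is the canonical NOP; the other NOP encodings are left for HINTs. */
static bool is_canonical_nop(uint32_t inst) {
    return inst == encode_itype(0, 0, 0x0u, 0, OPCODE_OP_IMM);
}
```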

Control Transfer Instructions

RV32I provides two types of control transfer instructions: unconditional
jumps and conditional branches. Control transfer instructions in RV32I
do not have architecturally visible delay slots.
If an instruction access-fault or instruction page-fault exception
occurs on the target of a jump or taken branch, the exception is
reported on the target instruction, not on the jump or branch
instruction.

Unconditional Jumps

The jump and link (JAL) instruction uses the J-type format, where the
J-immediate encodes a signed offset in multiples of 2 bytes. The offset
is sign-extended and added to the address of the jump instruction to
form the jump target address. Jumps can therefore target a ±1 MiB range. JAL
stores the address of the instruction that follows the JAL (pc+4) into
register rd. The standard software calling convention uses x1 as the
return address register and x5 as an alternate link register.

The alternate link register supports calling millicode routines (e.g.,
those to save and restore registers in compressed code) while preserving
the regular return address register. The register x5 was chosen as the
alternate link register as it maps to a temporary in the standard
calling convention, and has an encoding that is only one bit different
than the regular link register.

Plain unconditional jumps (assembler pseudoinstruction J) are encoded as
a JAL with rd=x0.
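The J-immediate scatter can be inverted to assemble a JAL. This is a sketch with a hypothetical helper name; it asserts the ±1 MiB range and the 2-byte offset granularity described above and uses JAL's major opcode 1101111 (0x6F). The pseudoinstruction J offset corresponds to encode_jal(0, offset).

```c
#include <assert.h>
#include <stdint.h>

/* Assemble jal rd, offset (offset relative to the JAL's own address). */
static uint32_t encode_jal(unsigned rd, int32_t offset) {
    assert(offset % 2 == 0);                              /* multiples of 2 bytes */
    assert(offset >= -(1 << 20) && offset < (1 << 20));   /* +/-1 MiB */
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 20) & 0x1u) << 31)     /* imm[20]    -> inst[31]    */
         | (((imm >> 1) & 0x3FFu) << 21)    /* imm[10:1]  -> inst[30:21] */
         | (((imm >> 11) & 0x1u) << 20)     /* imm[11]    -> inst[20]    */
         | (((imm >> 12) & 0xFFu) << 12)    /* imm[19:12] -> inst[19:12] */
         | ((rd & 0x1Fu) << 7)
         | 0x6Fu;                           /* JAL opcode 1101111 */
}
```

For example, encode_jal(1, 8) gives 0x008000EF (jal ra, 8), and encode_jal(0, -4) gives 0xFFDFF06F.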


W
E
W
R
R
O


10

8
5
7


dest
JAL


The indirect jump instruction JALR (jump and link register) uses the
I-type encoding. The target address is obtained by adding the
sign-extended 12-bit I-immediate to the register rs1, then setting the
least-significant bit of the result to zero. The address of the
instruction following the jump (pc+4) is written to register rd.
Register x0 can be used as the destination if the result is not
required.


M
R
F
R
O


5
3
5
7


offset[11:0]
base
0
dest
JALR


The unconditional jump instructions all use pc-relative addressing to
help support position-independent code. The JALR instruction was defined
to enable a two-instruction sequence to jump anywhere in a 32-bit
absolute address range. A LUI instruction can first load rs1 with the
upper 20 bits of a target address, then JALR can add in the lower bits.
Similarly, AUIPC then JALR can jump anywhere in a 32-bit pc-relative
address range.
Note that the JALR instruction does not treat the 12-bit immediate as
multiples of 2 bytes, unlike the conditional branch instructions. This
avoids one more immediate format in hardware. In practice, most uses of
JALR will have either a zero immediate or be paired with a LUI or AUIPC,
so the slight reduction in range is not significant.
Clearing the least-significant bit when calculating the JALR target
address both simplifies the hardware slightly and allows the low bit of
function pointers to be used to store auxiliary information. Although
there is potentially a slight loss of error checking in this case, in
practice jumps to an incorrect instruction address will usually quickly
raise an exception.
When used with a base rs1=x0, JALR can be used to implement a single
instruction subroutine call to the lowest or highest address region from
anywhere in the address space, which could be used to implement fast
calls to a small runtime library. Alternatively, an ABI could dedicate a
general-purpose register to point to a library elsewhere in the address
space.

The JAL and JALR instructions will generate an
instruction-address-misaligned exception if the target address is not
aligned to an IALIGN-bit boundary.

Instruction-address-misaligned exceptions are not possible on machines
with IALIGN=16, such as those that support the compressed
instruction-set extension, C.
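The JALR target rule and the misalignment check combine as in the following sketch (hypothetical helper names, 32-bit addresses assumed). Since JALR always clears bit 0, and JAL and branch offsets are multiples of 2, a target can only be misaligned when IALIGN=32, which is why the exception cannot occur with IALIGN=16.

```c
#include <stdbool.h>
#include <stdint.h>

/* JALR target: (rs1 + sign-extended imm) with the least-significant bit cleared. */
static uint32_t jalr_target(uint32_t rs1_value, int32_t imm12) {
    return (rs1_value + (uint32_t)imm12) & ~1u;
}

/* True if `target` is not aligned to an IALIGN-bit (16 or 32) boundary. */
static bool target_misaligned(uint32_t target, unsigned ialign) {
    uint32_t mask = ialign / 8u - 1u;   /* IALIGN=32 -> 0b11, IALIGN=16 -> 0b1 */
    return (target & mask) != 0;
}
```

For example, jalr_target(0x1001, 0) is 0x1000, so an auxiliary flag stored in bit 0 of a function pointer is simply discarded, while a target of 0x1002 raises an instruction-address-misaligned exception only when IALIGN=32.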

Return-address prediction stacks are a common feature of
high-performance instruction-fetch units, but require accurate detection
of instructions used for procedure calls and returns to be effective.
For RISC-V, hints as to the instructions’ usage are encoded implicitly
via the register numbers used. A JAL instruction should push the return
address onto a return-address stack (RAS) only when rd is x1 or
x5. JALR instructions should push/pop a RAS as shown in
Table 1.1.


rd is x1/x5
rs1 is x1/x5
rd=rs1
RAS action


No
No
–
None


No
Yes
–
Pop


Yes
No
–
Push


Yes
Yes
No
Pop, then push


Yes
Yes
Yes
Push


Return-address stack prediction hints encoded in the register operands
of a JALR instruction.


Some other ISAs added explicit hint bits to their indirect-jump
instructions to guide return-address stack manipulation. We use implicit
hinting tied to register numbers and the calling convention to reduce
the encoding space used for these hints.
When two different link registers (x1 and x5) are given as rs1 and
rd, then the RAS is both popped and pushed to support coroutines. If
rs1 and rd are the same link register (either x1 or x5), the RAS
is only pushed to enable macro-op fusion of the sequences:
lui ra, imm20; jalr ra, imm12(ra) and
auipc ra, imm20; jalr ra, imm12(ra).
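Table 1.1 reduces to a few comparisons on the register numbers. The C sketch below is one way a front-end model might classify JAL and JALR for RAS prediction; the names are hypothetical.

```c
#include <stdbool.h>

enum ras_action { RAS_NONE, RAS_POP, RAS_PUSH, RAS_POP_THEN_PUSH };

/* x1 (ra) and x5 (t0) are the link registers for RAS hinting. */
static bool is_link(unsigned reg) { return reg == 1 || reg == 5; }

/* JAL pushes the return address only when rd is a link register. */
static bool jal_pushes_ras(unsigned rd) { return is_link(rd); }

/* JALR action from Table 1.1. */
static enum ras_action jalr_ras_action(unsigned rd, unsigned rs1) {
    bool rd_link = is_link(rd), rs1_link = is_link(rs1);
    if (!rd_link && !rs1_link) return RAS_NONE;
    if (!rd_link)              return RAS_POP;     /* e.g. ret = jalr x0, 0(x1) */
    if (!rs1_link)             return RAS_PUSH;    /* call through a non-link register */
    return (rd == rs1) ? RAS_PUSH                  /* lui/auipc + jalr fusion case */
                       : RAS_POP_THEN_PUSH;        /* coroutine swap between x1 and x5 */
}
```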

Conditional Branches

All branch instructions use the B-type instruction format. The 12-bit
B-immediate encodes signed offsets in multiples of 2 bytes. The offset
is sign-extended and added to the address of the branch instruction to
give the target address. The conditional branch range is ±4 KiB.
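The B-immediate layout can be exercised with a small encoder. This sketch uses a hypothetical helper name, checks the ±4 KiB range and 2-byte granularity, and uses the BRANCH major opcode 1100011 (0x63); the funct3 values shown are BEQ 000 and BNE 001 (the others being BLT 100, BGE 101, BLTU 110, BGEU 111).

```c
#include <assert.h>
#include <stdint.h>

#define FUNCT3_BEQ 0x0u
#define FUNCT3_BNE 0x1u

/* Assemble a conditional branch; offset is relative to the branch's own address. */
static uint32_t encode_branch(unsigned funct3, unsigned rs1, unsigned rs2, int32_t offset) {
    assert(offset % 2 == 0);                      /* multiples of 2 bytes */
    assert(offset >= -4096 && offset < 4096);     /* +/-4 KiB */
    uint32_t imm = (uint32_t)offset;
    return (((imm >> 12) & 0x1u) << 31)   /* imm[12]   -> inst[31]    */
         | (((imm >> 5) & 0x3Fu) << 25)   /* imm[10:5] -> inst[30:25] */
         | ((rs2 & 0x1Fu) << 20)
         | ((rs1 & 0x1Fu) << 15)
         | ((funct3 & 0x7u) << 12)
         | (((imm >> 1) & 0xFu) << 8)     /* imm[4:1]  -> inst[11:8]  */
         | (((imm >> 11) & 0x1u) << 7)    /* imm[11]   -> inst[7]     */
         | 0x63u;                         /* BRANCH opcode 1100011 */
}
```

For example, encode_branch(FUNCT3_BNE, 10, 0, 8) produces 0x00051463, that is, bne a0, x0, 8.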


W
R
F
F
R
R
F
S


6
5
5
3
4
1
7


src2
src1
BEQ/BNE

BRANCH


src2
src1
BLT[U]

BRANCH


src2
src1
BGE[U]

BRANCH


Branch instructions compare two registers. BEQ and BNE take the branch
if registers rs1 and rs2 are equal or unequal respectively. BLT and
BLTU take the branch if rs1 is less than rs2, using signed and
unsigned comparison respectively. BGE and BGEU take the branch if rs1
is greater than or equal to rs2, using signed and unsigned comparison
respectively. Note, BGT, BGTU, BLE, and BLEU can be synthesized by
reversing the operands to BLT, BLTU, BGE, and BGEU, respectively.

Signed array bounds may be checked with a single BLTU instruction, since
any negative index will compare greater than any nonnegative bound.
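The comparison semantics, the operand-swapping trick for BGT, and the single-BLTU bounds check can be sketched in C as below, with registers as uint32_t and hypothetical function names. Signed order is obtained by flipping the sign bit before an unsigned compare.

```c
#include <stdbool.h>
#include <stdint.h>

enum { FUNCT3_BEQ = 0, FUNCT3_BNE = 1, FUNCT3_BLT = 4,
       FUNCT3_BGE = 5, FUNCT3_BLTU = 6, FUNCT3_BGEU = 7 };

static bool branch_taken(unsigned funct3, uint32_t rs1, uint32_t rs2) {
    uint32_t s1 = rs1 ^ 0x80000000u, s2 = rs2 ^ 0x80000000u;  /* signed order */
    switch (funct3) {
    case FUNCT3_BEQ:  return rs1 == rs2;
    case FUNCT3_BNE:  return rs1 != rs2;
    case FUNCT3_BLT:  return s1 < s2;
    case FUNCT3_BGE:  return s1 >= s2;
    case FUNCT3_BLTU: return rs1 < rs2;
    case FUNCT3_BGEU: return rs1 >= rs2;
    default:          return false;   /* 010 and 011 are not assigned to branches */
    }
}

/* BGT a, b is BLT with the operands reversed. */
static bool bgt(uint32_t a, uint32_t b) { return branch_taken(FUNCT3_BLT, b, a); }

/* Signed index, nonnegative bound: one unsigned compare also rejects negatives. */
static bool index_in_bounds(int32_t index, int32_t bound) {
    return (uint32_t)index < (uint32_t)bound;   /* a single BLTU */
}
```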

Software should be optimized such that the sequential code path is the
most common path, with less-frequently taken code paths placed out of
line. Software should also assume that backward branches will be
predicted taken and forward branches as not taken, at least the first
time they are encountered. Dynamic predictors should quickly learn any
predictable branch behavior.
Unlike some other architectures, the RISC-V jump (JAL with rd=x0)
instruction should always be used for unconditional branches instead of
a conditional branch instruction with an always-true condition. RISC-V
jumps are also pc-relative and support a much wider offset range than
branches, and will not pollute conditional-branch prediction tables.

The conditional branches were designed to include arithmetic comparison
operations between two registers (as also done in PA-RISC, Xtensa, and
MIPS R6), rather than use condition codes (x86, ARM, SPARC, PowerPC), or
to only compare one register against zero (Alpha, MIPS), or two
registers only for equality (MIPS). This design was motivated by the
observation that a combined compare-and-branch instruction fits into a
regular pipeline, avoids additional condition code state or use of a
temporary register, and reduces static code size and dynamic instruction
fetch traffic. Another point is that comparisons against zero require
non-trivial circuit delay (especially after the move to static logic in
advanced processes) and so are almost as expensive as arithmetic
magnitude compares. Another advantage of a fused compare-and-branch
instruction is that branches are observed earlier in the front-end
instruction stream, and so can be predicted earlier. There is perhaps an
advantage to a design with condition codes in the case where multiple
branches can be taken based on the same condition codes, but we believe
this case to be relatively rare.
We considered but did not include static branch hints in the instruction
encoding. These can reduce the pressure on dynamic predictors, but
require more instruction encoding space and software profiling for best
results, and can result in poor performance if production runs do not
match profiling runs.
We considered but did not include conditional moves or predicated
instructions, which can effectively replace unpredictable short forward
branches. Conditional moves are the simpler of the two, but are
difficult to use with conditional code that might cause exceptions
(memory accesses and floating-point operations). Predication adds
additional flag state to a system, additional instructions to set and
clear flags, and additional encoding overhead on every instruction. Both
conditional move and predicated instructions add complexity to
out-of-order microarchitectures, adding an implicit third source operand
due to the need to copy the original value of the destination
architectural register into the renamed destination physical register if
the predicate is false. Also, static compile-time decisions to use
predication instead of branches can result in lower performance on
inputs not included in the compiler training set, especially given that
unpredictable branches are rare, and becoming rarer as branch prediction
techniques improve.
We note that various microarchitectural techniques exist to dynamically
convert unpredictable short forward branches into internally predicated
code to avoid the cost of flushing pipelines on a branch mispredict and
have been implemented in commercial processors. The simplest techniques
just reduce the penalty of recovering from a mispredicted short forward
branch by only flushing instructions in the branch shadow instead of the
entire fetch pipeline, or by fetching instructions from both sides using
wide instruction fetch or idle instruction fetch slots. More complex
techniques for out-of-order cores add internal predicates on
instructions in the branch shadow, with the internal predicate value
written by the branch instruction, allowing the branch and following
instructions to be executed speculatively and out-of-order with respect
to other code.

The conditional branch instructions will generate an
instruction-address-misaligned exception if the target address is not
aligned to an IALIGN-bit boundary and the branch condition evaluates to
true. If the branch condition evaluates to false, the
instruction-address-misaligned exception will not be raised.

Instruction-address-misaligned exceptions are not possible on machines
with IALIGN=16, such as those that support the compressed
instruction-set extension, C.

Load and Store Instructions

RV32I is a load-store architecture, where only load and store
instructions access memory and arithmetic instructions only operate on
CPU registers. RV32I provides a 32-bit address space that is
byte-addressed. The EEI will define what portions of the address space
are legal to access with which instructions (e.g., some addresses might
be read only, or support word access only). Loads with a destination of
x0 must still raise any exceptions and cause any other side effects
even though the load value is discarded.
The EEI will define whether the memory system is little-endian or
big-endian. In RISC-V, endianness is byte-address invariant.

In a system for which endianness is byte-address invariant, the
following property holds: if a byte is stored to memory at some address
in some endianness, then a byte-sized load from that address in any
endianness returns the stored value.
In a little-endian configuration, multibyte stores write the
least-significant register byte at the lowest memory byte address,
followed by the other register bytes in ascending order of their
significance. Loads similarly transfer the contents of the lesser memory
byte addresses to the less-significant register bytes.
In a big-endian configuration, multibyte stores write the
most-significant register byte at the lowest memory byte address,
followed by the other register bytes in descending order of their
significance. Loads similarly transfer the contents of the greater
memory byte addresses to the less-significant register bytes.
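A short C model of the two byte orders, assuming a 4-byte buffer stands in for memory at increasing addresses (mem[0] lowest); the names are hypothetical. Storing and loading in the same configuration round-trips, and single-byte accesses do not depend on the configuration at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a 32-bit register value to mem[0..3] (mem[0] is the lowest address). */
static void store32(uint8_t mem[4], uint32_t value, bool big_endian) {
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 8u * (3u - i)   /* most-significant byte first */
                                    : 8u * i;         /* least-significant byte first */
        mem[i] = (uint8_t)(value >> shift);
    }
}

/* Load a 32-bit value from mem[0..3] using the same convention. */
static uint32_t load32(const uint8_t mem[4], bool big_endian) {
    uint32_t value = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned shift = big_endian ? 8u * (3u - i) : 8u * i;
        value |= (uint32_t)mem[i] << shift;
    }
    return value;
}
```

Storing 0x11223344 places 0x44 at the lowest address in the little-endian case and 0x11 there in the big-endian case.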


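| imm[11:0] | rs1 | funct3 | rd | opcode |
|:- |:- |:- |:- |:- |
| 12 | 5 | 3 | 5 | 7 |
| offset[11:0] | base | width | dest | LOAD |

| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode |
|:- |:- |:- |:- |:- |:- |
| 7 | 5 | 5 | 3 | 5 | 7 |
| offset[11:5] | src | base | width | offset[4:0] | STORE |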


Load and store instructions transfer a value between the registers and
memory. Loads are encoded in the I-type format and stores are S-type.
The effective address is obtained by adding register rs1 to the
sign-extended 12-bit offset. Loads copy a value from memory to register
rd. Stores copy the value in register rs2 to memory.
The LW instruction loads a 32-bit value from memory into rd. LH loads
a 16-bit value from memory, then sign-extends it to 32 bits before storing
in rd. LHU loads a 16-bit value from memory but then zero-extends it to
32 bits before storing in rd. LB and LBU are defined analogously for
8-bit values. The SW, SH, and SB instructions store 32-bit, 16-bit, and
8-bit values from the low bits of register rs2 to memory.
Regardless of EEI, loads and stores whose effective addresses are
naturally aligned shall not raise an address-misaligned exception. Loads
and stores whose effective address is not naturally aligned to the
referenced datatype (i.e., the effective address is not divisible by the
size of the access in bytes) have behavior dependent on the EEI.
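The load widths with their sign or zero extension, together with the natural-alignment test, are sketched below. The model assumes a little-endian byte array as memory and uses hypothetical function names; it does not model the EEI-specific handling of misaligned accesses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective address: rs1 + sign-extended 12-bit offset (wraps modulo 2^32). */
static uint32_t effective_address(uint32_t rs1, uint32_t offset12) {
    offset12 &= 0xFFFu;
    uint32_t offset = (offset12 & 0x800u) ? (offset12 | 0xFFFFF000u) : offset12;
    return rs1 + offset;
}

/* Naturally aligned: the address is divisible by the access size in bytes. */
static bool naturally_aligned(uint32_t addr, unsigned size) {
    return addr % size == 0;
}

/* LB/LH/LW (zero_extend = false) and LBU/LHU (zero_extend = true), little-endian. */
static uint32_t load_le(const uint8_t *mem, uint32_t addr, unsigned size, bool zero_extend) {
    uint32_t raw = 0;
    for (unsigned i = 0; i < size; i++)
        raw |= (uint32_t)mem[addr + i] << (8u * i);
    if (size == 4 || zero_extend)
        return raw;                                              /* LW, LBU, LHU */
    uint32_t sign_bit = 1u << (8u * size - 1u);
    return (raw & sign_bit) ? raw | ~((sign_bit << 1) - 1u) : raw; /* LB, LH */
}
```

With memory bytes 0x80 and 0xFF at addresses 0 and 1, LB returns 0xFFFFFF80, LBU returns 0x00000080, and LHU returns 0x0000FF80.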
An EEI may guarantee that misaligned loads and stores are fully
supported, and so the software running inside the execution environment
will never experience a contained or fatal address-misaligned trap. In
this case, the misaligned loads and stores can be handled in hardware,
or via an invisible trap into the execution environment implementation,
or possibly a combination of hardware and invisible trap depending on
address.
An EEI may not guarantee misaligned loads and stores are handled
invisibly. In this case, loads and stores that are not naturally aligned
may either complete execution successfully or raise an exception. The
exception raised can be either an address-misaligned exception or an
access-fault exception. For a memory access that would otherwise be able
to complete except for the misalignment, an access-fault exception can
be raised instead of an address-misaligned exception if the misaligned
access should not be emulated, e.g., if accesses to the memory region
have side effects. When an EEI does not guarantee misaligned loads and
stores are handled invisibly, the EEI must define if exceptions caused
by address misalignment result in a contained trap (allowing software
running inside the execution environment to handle the trap) or a fatal
trap (terminating execution).

Misaligned accesses are occasionally required when porting legacy code,
and help performance on applications when using any form of packed-SIMD
extension or handling externally packed data structures. Our rationale
for allowing EEIs to choose to support misaligned accesses via the
regular load and store instructions is to simplify the addition of
misaligned hardware support. One option would have been to disallow
misaligned accesses in the base ISAs and then provide some separate ISA
support for misaligned accesses, either special instructions to help
software handle misaligned accesses or a new hardware addressing mode
for misaligned accesses. Special instructions are difficult to use,
complicate the ISA, and often add new processor state (e.g., SPARC VIS
align address offset register) or complicate access to existing
processor state (e.g., MIPS LWL/LWR partial register writes). In
addition, for loop-oriented packed-SIMD code, the extra overhead when
operands are misaligned motivates software to provide multiple forms of
loop depending on operand alignment, which complicates code generation
and adds to loop startup overhead. New misaligned hardware addressing
modes take considerable space in the instruction encoding or require
very simplified addressing modes (e.g., register indirect only).

Even when misaligned loads and stores complete successfully, these
accesses might run extremely slowly depending on the implementation
(e.g., when implemented via an invisible trap). Furthermore, whereas
naturally aligned loads and stores are guaranteed to execute atomically,
misaligned loads and stores might not, and hence require additional
synchronization to ensure atomicity.

We do not mandate atomicity for misaligned accesses so execution
environment implementations can use an invisible machine trap and a
software handler to handle some or all misaligned accesses. If hardware
misaligned support is provided, software can exploit this by simply
using regular load and store instructions. Hardware can then
automatically optimize accesses depending on whether runtime addresses
are aligned.
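The sketch below is a rough illustration of what a software handler behind an invisible trap might do. It checks natural alignment, then rebuilds a misaligned 32-bit load from individual byte reads. It assumes a little-endian memory system and models memory as a plain byte array. The function names are hypothetical. A real handler would also decode the trapping instruction, sign-extend for LW on RV64, and write rd.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Natural alignment: the address must be a multiple of the access size
   (size is assumed to be a power of two: 1, 2, 4, or 8 bytes). */
bool is_misaligned(uint64_t addr, unsigned size) {
    return (addr & (uint64_t)(size - 1)) != 0;
}

/* Hypothetical trap-handler core: emulate a 32-bit load at any byte address
   by reading the four bytes separately and assembling them little-endian.
   The four byte reads together are not a single atomic access. */
uint32_t emulate_load32(const uint8_t *mem, size_t addr) {
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

The byte-by-byte approach is one reason emulated misaligned accesses can be slow and are not atomic, as noted above.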

Memory Ordering Instructions


| 31–28 | 27–24 | 23–20 | 19–15 | 14–12 | 11–7 | 6–0 |
|:- |:- |:- |:- |:- |:- |:- |
| fm | pred (PI PO PR PW) | succ (SI SO SR SW) | rs1 | funct3 | rd | opcode |
| 4 | 4 | 4 | 5 | 3 | 5 | 7 |
| FM | predecessor | successor | 0 | FENCE | 0 | MISC-MEM |


The FENCE instruction is used to order device I/O and memory accesses as
viewed by other RISC-V harts and external devices or coprocessors. Any
combination of device input (I), device output (O), memory reads (R),
and memory writes (W) may be ordered with respect to any combination of
the same. Informally, no other RISC-V hart or external device can
observe any operation in the successor set following a FENCE before
any operation in the predecessor set preceding the FENCE.
Chapter [ch:memorymodel] provides a precise
description of the RISC-V memory consistency model.
The FENCE instruction also orders memory reads and writes made by the
hart as observed by memory reads and writes made by an external device.
However, FENCE does not order observations of events made by an external
device using any other signaling mechanism.

A device might observe an access to a memory location via some external
communication mechanism, e.g., a memory-mapped control register that
drives an interrupt signal to an interrupt controller. This
communication is outside the scope of the FENCE ordering mechanism and
hence the FENCE instruction can provide no guarantee on when a change in
the interrupt signal is visible to the interrupt controller. Specific
devices might provide additional ordering guarantees to reduce software
overhead but those are outside the scope of the RISC-V memory model.

The EEI will define what I/O operations are possible, and in particular,
which memory addresses when accessed by load and store instructions will
be treated and ordered as device input and device output operations
respectively rather than memory reads and writes. For example,
memory-mapped I/O devices will typically be accessed with uncached loads
and stores that are ordered using the I and O bits rather than the R and
W bits. Instruction-set extensions might also describe new I/O
instructions that will also be ordered using the I and O bits in a
FENCE.


| fm field | Mnemonic | Meaning |
|:- |:- |:- |
| 0000 | none | Normal Fence |
| 1000 | TSO | With FENCE RW,RW: exclude write-to-read ordering; otherwise: reserved for future use |
| other | | Reserved for future use |

Fence mode encoding.


The fence mode field fm defines the semantics of the FENCE. A FENCE
with fm=0000 orders all memory operations in its predecessor set
before all memory operations in its successor set.
The FENCE.TSO instruction is encoded as a FENCE instruction with
fm=1000, predecessor=RW, and successor=RW. FENCE.TSO orders all
load operations in its predecessor set before all memory operations in
its successor set, and all store operations in its predecessor set
before all store operations in its successor set. This leaves non-AMO
store operations in the FENCE.TSO’s predecessor set unordered with
non-AMO loads in its successor set.

Because FENCE RW,RW imposes a superset of the orderings that FENCE.TSO
imposes, it is correct to ignore the fm field and implement FENCE.TSO
as FENCE RW,RW.

The unused fields in the FENCE instructions (rs1 and rd) are reserved
for finer-grain fences in future extensions. For forward compatibility,
base implementations shall ignore these fields, and standard software
shall zero these fields. Likewise, many fm and predecessor/successor
set settings in Table 1.2 are also reserved for future use. Base
implementations shall treat all such reserved configurations as normal
fences with fm=0000, and standard software shall use only non-reserved
configurations.
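The C sketch below shows how the fm field, the predecessor and successor sets, and the reserved-configuration rule fit together. The bit values I=8, O=4, R=2, W=1 follow the order of the pred and succ fields, from the most-significant bit down, as in the usual assembler syntax. The helper names are hypothetical, and the sketch only handles the base FENCE with rs1 = rd = 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bits within pred (bits 27:24) and succ (bits 23:20). */
enum { FENCE_W = 1, FENCE_R = 2, FENCE_O = 4, FENCE_I = 8 };

/* FENCE: opcode MISC-MEM (0001111), funct3 = 000, rd = rs1 = 0. */
uint32_t encode_fence(unsigned fm, unsigned pred, unsigned succ) {
    return (fm & 0xFu) << 28 | (pred & 0xFu) << 24 | (succ & 0xFu) << 20 | 0x0Fu;
}

/* Only fm=1000 with pred=succ=RW is FENCE.TSO. fm=1000 with other sets,
   and any fm other than 0000/1000, is reserved and runs as a normal fence. */
bool is_fence_tso(uint32_t insn) {
    if ((insn & 0x707Fu) != 0x000Fu)      /* not MISC-MEM with funct3 = FENCE */
        return false;
    unsigned fm   = insn >> 28;
    unsigned pred = (insn >> 24) & 0xFu;
    unsigned succ = (insn >> 20) & 0xFu;
    return fm == 0x8u && pred == (FENCE_R | FENCE_W) && succ == (FENCE_R | FENCE_W);
}
```

With this layout, FENCE RW,RW encodes as 0x0330000F and FENCE.TSO as 0x8330000F. An implementation that ignores fm therefore executes FENCE.TSO as FENCE RW,RW, which is the conservative behavior described above.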

We chose a relaxed memory model to allow high performance from simple
machine implementations and from likely future coprocessor or
accelerator extensions. We separate out I/O ordering from memory R/W
ordering to avoid unnecessary serialization within a device-driver hart
and also to support alternative non-memory paths to control added
coprocessors or I/O devices. Simple implementations may additionally
ignore the predecessor and successor fields and always execute a
conservative fence on all operations.

Environment Call and Breakpoints

SYSTEM instructions are used to access system functionality that might
require privileged access and are encoded using the I-type instruction
format. These can be divided into two main classes: those that
atomically read-modify-write control and status registers (CSRs), and
all other potentially privileged instructions. CSR instructions are
described in Chapter [csrinsts], and the base unprivileged
instructions are described in the following section.

The SYSTEM instructions are defined to allow simpler implementations to
always trap to a single software trap handler. More sophisticated
implementations might execute more of each system instruction in
hardware.


| 31–20 | 19–15 | 14–12 | 11–7 | 6–0 |
|:- |:- |:- |:- |:- |
| funct12 | rs1 | funct3 | rd | opcode |
| 12 | 5 | 3 | 5 | 7 |
| ECALL | 0 | PRIV | 0 | SYSTEM |
| EBREAK | 0 | PRIV | 0 | SYSTEM |


These two instructions cause a precise requested trap to the supporting
execution environment.
The ECALL instruction is used to make a service request to the execution
environment. The EEI will define how parameters for the service request
are passed, but usually these will be in defined locations in the
integer register file.
The EBREAK instruction is used to return control to a debugging
environment.

ECALL and EBREAK were previously named SCALL and SBREAK. The
instructions have the same functionality and encoding, but were renamed
to reflect that they can be used more generally than to call a
supervisor-level operating system or debugger.


EBREAK was primarily designed to be used by a debugger to cause
execution to stop and fall back into the debugger. EBREAK is also used
by the standard gcc compiler to mark code paths that should not be
executed.
Another use of EBREAK is to support “semihosting”, where the execution
environment includes a debugger that can provide services over an
alternate system call interface built around the EBREAK instruction.
Because the RISC-V base ISAs do not provide more than one EBREAK
instruction, RISC-V semihosting uses a special sequence of instructions
to distinguish a semihosting EBREAK from a debugger-inserted EBREAK:

    slli x0, x0, 0x1f   # Entry NOP
    ebreak              # Break to debugger
    srai x0, x0, 7      # NOP encoding the semihosting call number 7

Note that these three instructions must be 32-bit-wide instructions,
i.e., they mustn’t be among the compressed 16-bit instructions described
in Chapter [compressed].
The shift NOP instructions are still considered available for use as
HINTs.
Semihosting is a form of service call and would be more naturally
encoded as an ECALL using an existing ABI, but this would require the
debugger to be able to intercept ECALLs, which is a newer addition to
the debug standard. We intend to move over to using ECALLs with a
standard ABI, in which case, semihosting can share a service ABI with an
existing standard.
We note that ARM processors have also moved to using SVC instead of BKPT
for semihosting calls in newer designs.
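A debugger that supports semihosting has to recognize the three-instruction sequence above. The sketch below assumes the debugger has already read the 32-bit words around the EBREAK into an array. The constants are the encodings of the three listed instructions. The function name and buffer interface are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SEMIHOST_ENTRY 0x01F01013u  /* slli x0, x0, 0x1f */
#define EBREAK_INSN    0x00100073u  /* ebreak */
#define SEMIHOST_EXIT  0x40705013u  /* srai x0, x0, 7 */

/* True if code[i] is an EBREAK bracketed by the semihosting NOPs.
   code[] holds n consecutive 32-bit instruction words. */
bool is_semihosting_ebreak(const uint32_t *code, size_t n, size_t i) {
    return i > 0 && i + 1 < n
        && code[i - 1] == SEMIHOST_ENTRY
        && code[i] == EBREAK_INSN
        && code[i + 1] == SEMIHOST_EXIT;
}
```

The exact word comparisons only work because the sequence must use 32-bit encodings. Compressed forms would not match.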

HINT Instructions

RV32I reserves a large encoding space for HINT instructions, which are
usually used to communicate performance hints to the microarchitecture.
Like the NOP instruction, HINTs do not change any architecturally
visible state, except for advancing the pc and any applicable
performance counters. Implementations are always allowed to ignore the
encoded hints.
Most RV32I HINTs are encoded as integer computational instructions with
rd=x0. The other RV32I HINTs are encoded as FENCE instructions with
a null predecessor or successor set and with fm=0.

These HINT encodings have been chosen so that simple implementations can
ignore HINTs altogether, and instead execute a HINT as a regular
instruction that happens not to mutate the architectural state. For
example, ADD is a HINT if the destination register is x0; the five-bit
rs1 and rs2 fields encode arguments to the HINT. However, a simple
implementation can simply execute the HINT as an ADD of rs1 and rs2
that writes x0, which has no architecturally visible effect.
As another example, a FENCE instruction with a zero pred field and a
zero fm field is a HINT; the succ, rs1, and rd fields encode the
arguments to the HINT. A simple implementation can simply execute the
HINT as a FENCE that orders the null set of prior memory accesses before
whichever subsequent memory accesses are encoded in the succ field.
Since the intersection of the predecessor and successor sets is null,
the instruction imposes no memory orderings, and so it has no
architecturally visible effect.
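The claim that a simple core can execute HINTs as ordinary instructions can be sketched with a toy RV32 register file in which writes to x0 are discarded. The type and function names are hypothetical, and only ADD is modeled.

```c
#include <stdint.h>

/* Toy RV32 integer register file; x[0] stays zero. */
typedef struct { uint32_t x[32]; } hart_t;

/* Writes to x0 are dropped, which is what makes rd=x0 encodings harmless. */
static void write_reg(hart_t *h, unsigned rd, uint32_t value) {
    if (rd != 0)
        h->x[rd] = value;
}

/* ADD rd, rs1, rs2. With rd = x0 this is a HINT, but executing it as a
   plain ADD computes the sum and then discards it, so no state changes. */
void exec_add(hart_t *h, unsigned rd, unsigned rs1, unsigned rs2) {
    write_reg(h, rd, h->x[rs1] + h->x[rs2]);
}
```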

Table [tab:rv32i-hints] lists all RV32I
HINT code points. 91% of the HINT space is reserved for standard HINTs.
The remainder of the HINT space is designated for custom HINTs: no
standard HINTs will ever be defined in this subspace.

We anticipate standard hints to eventually include memory-system spatial
and temporal locality hints, branch prediction hints, thread-scheduling
hints, security tags, and instrumentation flags for
simulation/emulation.


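| Instruction | Constraints | Code Points | Purpose |
|:- |:- |:- |:- |
| LUI | rd=x0 | 2²⁰ | Designated for future standard use |
| AUIPC | rd=x0 | 2²⁰ | Designated for future standard use |
| ADDI | rd=x0, and either rs1≠x0 or imm≠0 | 2¹⁷ − 1 | Designated for future standard use |
| ANDI | rd=x0 | 2¹⁷ | Designated for future standard use |
| ORI | rd=x0 | 2¹⁷ | Designated for future standard use |
| XORI | rd=x0 | 2¹⁷ | Designated for future standard use |
| ADD | rd=x0, rs1≠x0 | 2¹⁰ − 32 | Designated for future standard use |
| ADD | rd=x0, rs1=x0, rs2≠x2–x5 | 28 | Designated for future standard use |
| ADD | rd=x0, rs1=x0, rs2=x2–x5 | 4 | (rs2=x2) NTL.P1, (rs2=x3) NTL.PALL, (rs2=x4) NTL.S1, (rs2=x5) NTL.ALL |
| SUB | rd=x0 | 2¹⁰ | Designated for future standard use |
| AND | rd=x0 | 2¹⁰ | Designated for future standard use |
| OR | rd=x0 | 2¹⁰ | Designated for future standard use |
| XOR | rd=x0 | 2¹⁰ | Designated for future standard use |
| SLL | rd=x0 | 2¹⁰ | Designated for future standard use |
| SRL | rd=x0 | 2¹⁰ | Designated for future standard use |
| SRA | rd=x0 | 2¹⁰ | Designated for future standard use |
| FENCE | rd=x0, rs1≠x0, fm=0, and either pred=0 or succ=0 | 2¹⁰ − 63 | Designated for future standard use |
| FENCE | rd≠x0, rs1=x0, fm=0, and either pred=0 or succ=0 | 2¹⁰ − 63 | Designated for future standard use |
| FENCE | rd=rs1=x0, fm=0, pred=0, succ≠0 | 15 | Designated for future standard use |
| FENCE | rd=rs1=x0, fm=0, pred≠W, succ=0 | 15 | Designated for future standard use |
| FENCE | rd=rs1=x0, fm=0, pred=W, succ=0 | 1 | PAUSE |
| SLTI | rd=x0 | 2¹⁷ | Designated for custom use |
| SLTIU | rd=x0 | 2¹⁷ | Designated for custom use |
| SLLI | rd=x0 | 2¹⁰ | Designated for custom use |
| SRLI | rd=x0 | 2¹⁰ | Designated for custom use |
| SRAI | rd=x0 | 2¹⁰ | Designated for custom use |
| SLT | rd=x0 | 2¹⁰ | Designated for custom use |
| SLTU | rd=x0 | 2¹⁰ | Designated for custom use |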
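The 91% figure can be checked by summing the code points of the HINT table. The C sketch below does that arithmetic. Rows from LUI through the FENCE rows count as standard, and SLTI through SLTU count as custom. The function names are illustrative only.

```c
/* Code points designated for standard use (LUI through the FENCE rows). */
long hint_standard_points(void) {
    long lui_auipc     = 2L * (1L << 20);
    long addi          = (1L << 17) - 1;
    long andi_ori_xori = 3L * (1L << 17);
    long add           = ((1L << 10) - 32) + 28 + 4;          /* includes the 4 NTL hints */
    long reg_reg       = 7L * (1L << 10);                     /* SUB, AND, OR, XOR, SLL, SRL, SRA */
    long fence         = 2L * ((1L << 10) - 63) + 15 + 15 + 1; /* includes PAUSE */
    return lui_auipc + addi + andi_ori_xori + add + reg_reg + fence;
}

/* Code points designated for custom use (SLTI, SLTIU, SLLI, SRLI, SRAI, SLT, SLTU). */
long hint_custom_points(void) {
    return 2L * (1L << 17) + 3L * (1L << 10) + 2L * (1L << 10);
}

/* Standard share of the HINT space, rounded to the nearest percent. */
long hint_standard_percent(void) {
    long s = hint_standard_points();
    long total = s + hint_custom_points();
    return (100 * s + total / 2) / total;
}
```

This gives 2,631,584 standard and 267,264 custom code points. That is about 90.8% standard, which rounds to the 91% stated above.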


“M” Standard Extension for Integer Multiplication and Division, Version 2.0

This chapter describes the standard integer multiplication and division
instruction extension, which is named “M” and contains instructions that
multiply or divide values held in two integer registers.

We separate integer multiply and divide out from the base to simplify
low-end implementations, or for applications where integer multiply and
divide operations are either infrequent or better handled in attached
accelerators.

Multiplication Operations


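| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 |
|:- |:- |:- |:- |:- |:- |
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| 7 | 5 | 5 | 3 | 5 | 7 |
| MULDIV | multiplier | multiplicand | MUL/MULH[[S]U] | dest | OP |
| MULDIV | multiplier | multiplicand | MULW | dest | OP-32 |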


MUL performs an XLEN-bit×XLEN-bit multiplication of rs1 by rs2 and
places the lower XLEN bits in the destination register. MULH, MULHU, and
MULHSU perform the same multiplication but return the upper XLEN bits of
the full 2×XLEN-bit product, for signed×signed, unsigned×unsigned, and
signed rs1×unsigned rs2 multiplication, respectively. If both the high
and low bits of the same product are required, then the recommended code
sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register
specifiers must be in the same order and rdh cannot be the same as rs1
or rs2). Microarchitectures can then fuse these into a single multiply
operation instead of performing two separate multiplies.
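For XLEN=32, the four multiply instructions can be modeled in C with 64-bit intermediates, which makes the signedness of each high-half variant explicit. This is a behavioral sketch, not an implementation. The rv_ function names are hypothetical, and the code assumes the usual two's-complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/* Low 32 bits of the product; identical for signed and unsigned operands. */
uint32_t rv_mul(uint32_t rs1, uint32_t rs2) {
    return rs1 * rs2;
}

/* Upper 32 bits, signed x signed. */
uint32_t rv_mulh(uint32_t rs1, uint32_t rs2) {
    int64_t p = (int64_t)(int32_t)rs1 * (int64_t)(int32_t)rs2;
    return (uint32_t)((uint64_t)p >> 32);
}

/* Upper 32 bits, unsigned x unsigned. */
uint32_t rv_mulhu(uint32_t rs1, uint32_t rs2) {
    return (uint32_t)(((uint64_t)rs1 * rs2) >> 32);
}

/* Upper 32 bits, signed rs1 x unsigned rs2. */
uint32_t rv_mulhsu(uint32_t rs1, uint32_t rs2) {
    int64_t p = (int64_t)(int32_t)rs1 * (int64_t)rs2;
    return (uint32_t)((uint64_t)p >> 32);
}
```

For example, with both operands equal to 0xFFFFFFFF, MUL returns 1 and MULH returns 0 (the product −1 × −1), while MULHU returns 0xFFFFFFFE and MULHSU returns 0xFFFFFFFF. The recommended MULH[[S]U]/MUL pair yields the full 64-bit product as rdh:rdl.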

MULHSU is used in multi-word signed multiplication to multiply the
most-significant word of the multiplicand (which contains the sign bit)
with the less-significant words of the multiplier (which are unsigned).
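As a sketch of that use, the C function below computes the upper 64 bits of a signed 64×64-bit product from 32-bit words. It follows the classic word-splitting scheme for a signed high multiply, as presented in Hacker's Delight. Each signed×unsigned word product is what a MULHSU/MUL pair would produce on RV32. The name mulh64 is hypothetical, and the sketch substitutes 64-bit C arithmetic for the individual RV32 instructions.

```c
#include <stdint.h>

/* Arithmetic right shift that stays well defined for negative inputs. */
static int64_t asr64(int64_t x, int s) {
    return x < 0 ? ~(~x >> s) : x >> s;
}

/* Upper 64 bits of the signed 128-bit product u * v, built from 32-bit words.
   u1/v1 are the signed high words, u0/v0 the unsigned low words. */
int64_t mulh64(int64_t u, int64_t v) {
    uint32_t u0 = (uint32_t)u;
    int64_t  u1 = asr64(u, 32);
    uint32_t v0 = (uint32_t)v;
    int64_t  v1 = asr64(v, 32);

    uint64_t w0 = (uint64_t)u0 * v0;                        /* unsigned x unsigned: MULHU/MUL */
    int64_t  t  = u1 * (int64_t)v0 + (int64_t)(w0 >> 32);   /* signed x unsigned: MULHSU/MUL */
    int64_t  w1 = (int64_t)(uint32_t)t;                     /* low word of t */
    int64_t  w2 = asr64(t, 32);                             /* signed carry into the next word */
    w1 = (int64_t)u0 * v1 + w1;                             /* unsigned x signed: MULHSU/MUL */
    return u1 * v1 + w2 + asr64(w1, 32);                    /* signed x signed: MULH/MUL */
}
```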

MULW is an RV64 instruction that multiplies the lower 32 bits of the
source registers, placing the sign-extension of the lower 32 bits of the
result into the destination register.

In RV64, MUL can be used to obtain the upper 32 bits of the 64-bit
product, but signed arguments must be proper 32-bit signed values,
whereas unsigned arguments must have their upper 32 bits clear. If the
arguments are not known to be sign- or zero-extended, an alternative is
to shift both arguments left by 32 bits, then use MULH[[S]U].

Division Operations


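| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 |
|:- |:- |:- |:- |:- |:- |
| funct7 | rs2 | rs1 | funct3 | rd | opcode |
| 7 | 5 | 5 | 3 | 5 | 7 |
| MULDIV | divisor | dividend | DIV[U]/REM[U] | dest | OP |
| MULDIV | divisor | dividend | DIV[U]W/REM[U]W | dest | OP-32 |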


DIV and DIVU perform an XLEN bits by XLEN bits signed and unsigned
integer division of rs1 by rs2, rounding towards zero. REM and REMU
provide the remainder of the corresponding division operation. For REM,
the sign of the result equals the sign of the dividend.

For both signed and unsigned division, it holds that
dividend = divisor × quotient + remainder.

If both the quotient and remainder are required from the same division,
the recommended code sequence is: DIV[U] rdq, rs1, rs2; REM[U]
rdr, rs1, rs2 (rdq cannot be the same as rs1 or rs2).
Microarchitectures can then fuse these into a single divide operation
instead of performing two separate divides.
DIVW and DIVUW are RV64 instructions that divide the lower 32 bits of
rs1 by the lower 32 bits of rs2, treating them as signed and
unsigned integers respectively, placing the 32-bit quotient in rd,
sign-extended to 64 bits. REMW and REMUW are RV64 instructions that
provide the corresponding signed and unsigned remainder operations
respectively. Both REMW and REMUW always sign-extend the 32-bit result
to 64 bits, including on a divide by zero.
The semantics for division by zero and division overflow are summarized
in Table 1.1. The quotient of division by zero
has all bits set, and the remainder of division by zero equals the
dividend. Signed division overflow occurs only when the most-negative
integer is divided by −1. The quotient of a signed division with
overflow is equal to the dividend, and the remainder is zero. Unsigned
division overflow cannot occur.


| Condition | Dividend | Divisor | DIVU[W] | REMU[W] | DIV[W] | REM[W] |
|:- |:- |:- |:- |:- |:- |:- |
| Division by zero | x | 0 | 2^L − 1 | x | −1 | x |
| Overflow (signed only) | −2^(L−1) | −1 | – | – | −2^(L−1) | 0 |


Semantics for division by zero and division overflow. L is the width of
the operation in bits: XLEN for DIV[U] and REM[U], or 32 for
DIV[U]W and REM[U]W.
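The table can be turned into a small C model for L = XLEN = 32. C99 and later also round integer division toward zero and give the remainder the sign of the dividend, so only the two special rows need explicit handling. In C, both of those cases would otherwise be undefined behavior. The rv_ names are hypothetical.

```c
#include <stdint.h>

/* Unsigned division: divide by zero gives all ones; the remainder is the dividend. */
uint32_t rv_divu(uint32_t a, uint32_t b) { return b == 0 ? UINT32_MAX : a / b; }
uint32_t rv_remu(uint32_t a, uint32_t b) { return b == 0 ? a : a % b; }

int32_t rv_div(int32_t a, int32_t b) {
    if (b == 0) return -1;                             /* all bits set */
    if (a == INT32_MIN && b == -1) return INT32_MIN;   /* overflow: quotient = dividend */
    return a / b;                                      /* C truncates toward zero */
}

int32_t rv_rem(int32_t a, int32_t b) {
    if (b == 0) return a;                              /* remainder = dividend */
    if (a == INT32_MIN && b == -1) return 0;           /* overflow: remainder = 0 */
    return a % b;                                      /* sign follows the dividend */
}
```

The identity dividend = divisor × quotient + remainder still holds in both special cases when evaluated modulo 2^32. For division by zero, 0 × (−1) + x = x. For overflow, (−1) × (−2^31) + 0 wraps to −2^31.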


We considered raising exceptions on integer divide by zero, with these
exceptions causing a trap in most execution environments. However, this
would be the only arithmetic trap in the standard ISA (floating-point
exceptions set flags and write default values, but do not cause traps)
and would require language implementors to interact with the execution
environment’s trap handlers for this case. Further, where language
standards mandate that a divide-by-zero exception must cause an
immediate control flow change, only a single branch instruction needs to
be added to each divide operation, and this branch instruction can be
inserted after the divide and should normally be very predictably not
taken, adding little runtime overhead.
The value of all bits set is returned for both unsigned and signed
divide by zero to simplify the divider circuitry. The value of all 1s is
both the natural value to return for unsigned divide, representing the
largest unsigned number, and also the natural result for simple unsigned
divider implementations. Signed division is often implemented using an
unsigned division circuit and specifying the same overflow result
simplifies the hardware.

Zmmul Extension, Version 1.0

The Zmmul extension implements the multiplication subset of the M
extension. It adds all of the instructions defined in
Section 1.1, namely: MUL, MULH,
MULHU, MULHSU, and (for RV64 only) MULW. The encodings are identical to
those of the corresponding M-extension instructions.

The Zmmul extension enables low-cost implementations that require
multiplication operations but not division. For many microcontroller
applications, division operations are too infrequent to justify the cost
of divider hardware. By contrast, multiplication operations are more
frequent, making the cost of multiplier hardware more justifiable.
Simple FPGA soft cores particularly benefit from eliminating division
but retaining multiplication, since many FPGAs provide hardwired
multipliers but require dividers be implemented in soft logic.

“Zicsr”, Control and Status Register (CSR) Instructions, Version 2.0

RISC-V defines a separate address space of 4096 Control and Status
registers associated with each hart. This chapter defines the full set
of CSR instructions that operate on these CSRs.

While CSRs are primarily used by the privileged architecture, there are
several uses in unprivileged code including for counters and timers, and
for floating-point status.
The counters and timers are no longer considered mandatory parts of the
standard base ISAs, and so the CSR instructions required to access them
have been moved out of Chapter [rv32] into this separate chapter.

CSR Instructions

All CSR instructions atomically read-modify-write a single CSR, whose
CSR specifier is encoded in the 12-bit csr field of the instruction
held in bits 31–20. The immediate forms use a 5-bit zero-extended
immediate encoded in the rs1 field.


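| 31–20 | 19–15 | 14–12 | 11–7 | 6–0 |
|:- |:- |:- |:- |:- |
| csr | rs1 | funct3 | rd | opcode |
| 12 | 5 | 3 | 5 | 7 |
| source/dest | source | CSRRW | dest | SYSTEM |
| source/dest | source | CSRRS | dest | SYSTEM |
| source/dest | source | CSRRC | dest | SYSTEM |
| source/dest | uimm[4:0] | CSRRWI | dest | SYSTEM |
| source/dest | uimm[4:0] | CSRRSI | dest | SYSTEM |
| source/dest | uimm[4:0] | CSRRCI | dest | SYSTEM |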


The CSRRW (Atomic Read/Write CSR) instruction atomically swaps values in
the CSRs and integer registers. CSRRW reads the old value of the CSR,
zero-extends the value to XLEN bits, then writes it to integer register
rd. The initial value in rs1 is written to the CSR. If rd=x0,
then the instruction shall not read the CSR and shall not cause any of
the side effects that might occur on a CSR read.
The CSRRS (Atomic Read and Set Bits in CSR) instruction reads the value
of the CSR, zero-extends the value to XLEN bits, and writes it to
integer register rd. The initial value in integer register rs1 is
treated as a bit mask that specifies bit positions to be set in the CSR.
Any bit that is high in rs1 will cause the corresponding bit to be set
in the CSR, if that CSR bit is writable. Other bits in the CSR are not
explicitly written.
The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
value of the CSR, zero-extends the value to XLEN bits, and writes it to
integer register rd. The initial value in integer register rs1 is
treated as a bit mask that specifies bit positions to be cleared in the
CSR. Any bit that is high in rs1 will cause the corresponding bit to
be cleared in the CSR, if that CSR bit is writable. Other bits in the
CSR are not explicitly written.
For both CSRRS and CSRRC, if rs1=x0, then the instruction will not
write to the CSR at all, and so shall not cause any of the side effects
that might otherwise occur on a CSR write, nor raise illegal instruction
exceptions on accesses to read-only CSRs. Both CSRRS and CSRRC always
read the addressed CSR and cause any read side effects regardless of
rs1 and rd fields. Note that if rs1 specifies a register other than x0
that holds a zero value, the instruction will still attempt to write
the unmodified value back to the CSR and will cause any attendant side
effects. A CSRRW with rs1=x0 will attempt to write zero to the
destination CSR.
The CSRRWI, CSRRSI, and CSRRCI variants are similar to CSRRW, CSRRS, and
CSRRC respectively, except they update the CSR using an XLEN-bit value
obtained by zero-extending a 5-bit unsigned immediate (uimm[4:0])
field encoded in the rs1 field instead of a value from an integer
register. For CSRRSI and CSRRCI, if the uimm[4:0] field is zero, then
these instructions will not write to the CSR, and shall not cause any of
the side effects that might otherwise occur on a CSR write, nor raise
illegal instruction exceptions on accesses to read-only CSRs. For
CSRRWI, if rd=x0, then the instruction shall not read the CSR and
shall not cause any of the side effects that might occur on a CSR read.
Both CSRRSI and CSRRCI will always read the CSR and cause any read side
effects regardless of rd and rs1 fields.
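The read and write rules above can be captured in a small behavioral model. The sketch below makes several simplifying assumptions:

- XLEN=32.
- A CSR is modeled as a value plus a mask of writable bits, which is a simplification; real CSRs can have more complex legal-value rules.
- Reads and writes are counted, so suppressed accesses are visible.

rs1_is_x0 describes the register specifier, not the register's value. All names are hypothetical. The immediate forms would follow the same rules, using uimm[4:0] in place of the rs1 value and uimm=0 in place of rs1=x0.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { CSRRW, CSRRS, CSRRC } csr_op;

/* Simplified CSR: bits outside `writable` keep their value on writes. */
typedef struct {
    uint32_t value;
    uint32_t writable;
    unsigned reads, writes;   /* access counters, standing in for side effects */
} csr_t;

/* One register-form CSR instruction. Returns the value written to rd
   (0 when the CSR is not read). */
uint32_t csr_exec(csr_t *c, csr_op op, bool rd_is_x0, bool rs1_is_x0, uint32_t rs1_val) {
    bool do_read  = !(op == CSRRW && rd_is_x0);   /* CSRRW with rd=x0 skips the read */
    bool do_write = !(op != CSRRW && rs1_is_x0);  /* CSRRS/CSRRC with rs1=x0 skip the write */
    uint32_t old = c->value;   /* internal copy only; not counted as a CSR read */
    uint32_t result = 0;
    if (do_read) {
        result = old;
        c->reads++;
    }
    if (do_write) {
        uint32_t next = op == CSRRW ? rs1_val
                      : op == CSRRS ? (old | rs1_val)
                      : (old & ~rs1_val);
        c->value = (old & ~c->writable) | (next & c->writable);
        c->writes++;
    }
    return result;
}
```

Note that CSRRS with a nonzero rs1 specifier still performs a write, even when the register holds zero. That case is the second call in the checks below.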


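Register operand

| Instruction | rd is x0 | rs1 is x0 | Reads CSR | Writes CSR |
|:- |:- |:- |:- |:- |
| CSRRW | Yes | – | No | Yes |
| CSRRW | No | – | Yes | Yes |
| CSRRS/CSRRC | – | Yes | Yes | No |
| CSRRS/CSRRC | – | No | Yes | Yes |

Immediate operand

| Instruction | rd is x0 | uimm=0 | Reads CSR | Writes CSR |
|:- |:- |:- |:- |:- |
| CSRRWI | Yes | – | No | Yes |
| CSRRWI | No | – | Yes | Yes |
| CSRRSI/CSRRCI | – | Yes | Yes | No |
| CSRRSI/CSRRCI | – | No | Yes | Yes |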


Conditions determining whether a CSR instruction reads or writes the
specified CSR.

Table 1.1 summarizes the behavior of
the CSR instructions with respect to whether they read and/or write the
CSR.
For any event or consequence that occurs due to a CSR having a
particular value, if a write to the CSR gives it that value, the
resulting event or consequence is said to be an indirect effect of the
write. Indirect effects of a CSR write are not considered by the RISC-V
ISA to be side effects of that write.

An example of side effects for CSR accesses would be if reading from a
specific CSR causes a light bulb to turn on, while writing an odd value
to the same CSR causes the light to turn off. Assume writing an even
value has no effect. In this case, both the read and write have side
effects controlling whether the bulb is lit, as this condition is not
determined solely from the CSR value. (Note that after writing an odd
value to the CSR to turn off the light, then reading to turn the light
on, writing again the same odd value causes the light to turn off again.
Hence, on the last write, it is not a change in the CSR value that turns
off the light.)
On the other hand, if a bulb is rigged to light whenever the value of a
particular CSR is odd, then turning the light on and off is not
considered a side effect of writing to the CSR but merely an indirect
effect of such writes.
More concretely, the RISC-V privileged architecture defined in Volume II
specifies that certain combinations of CSR values cause a trap to occur.
When an explicit write to a CSR creates the conditions that trigger the
trap, the trap is not considered a side effect of the write but merely
an indirect effect.
Standard CSRs do not have any side effects on reads. Standard CSRs may
have side effects on writes. Custom extensions might add CSRs for which
accesses have side effects on either reads or writes.

Some CSRs, such as the instructions-retired counter, instret, may be
modified as side effects of instruction execution. In these cases, if a
CSR access instruction reads a CSR, it reads the value prior to the
execution of the instruction. If a CSR access instruction writes such a
CSR, the write is done instead of the increment. In particular, a value
written to instret by one instruction will be the value read by the
following instruction.
The assembler pseudoinstruction to read a CSR, CSRR rd, csr, is
encoded as CSRRS rd, csr, x0. The assembler pseudoinstruction to write
a CSR, CSRW csr, rs1, is encoded as CSRRW x0, csr, rs1, while CSRWI
csr, uimm, is encoded as CSRRWI x0, csr, uimm.
Further assembler pseudoinstructions are defined to set and clear bits
in the CSR when the old value is not required: CSRS/CSRC csr, rs1;
CSRSI/CSRCI csr, uimm.
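The pseudoinstruction expansions above can be checked with a small encoder for the I-type CSR format. The layout is csr in bits 31–20, rs1 or uimm in 19–15, funct3 in 14–12, rd in 11–7, and opcode SYSTEM. The funct3 values are those of the six CSR instructions. The CSR numbers used in the example come from the privileged specification: cycle = 0xC00, mstatus = 0x300, mscratch = 0x340. The helper names are hypothetical.

```c
#include <stdint.h>

/* funct3 values of the six CSR instructions. */
enum { F3_CSRRW = 1, F3_CSRRS = 2, F3_CSRRC = 3,
       F3_CSRRWI = 5, F3_CSRRSI = 6, F3_CSRRCI = 7 };

/* I-type CSR layout with opcode SYSTEM (1110011). The 5-bit field at 19:15
   holds rs1 for register forms and uimm[4:0] for immediate forms. */
static uint32_t csr_insn(unsigned funct3, unsigned rd, unsigned csr, unsigned src) {
    return (csr & 0xFFFu) << 20 | (src & 0x1Fu) << 15 |
           (funct3 & 0x7u) << 12 | (rd & 0x1Fu) << 7 | 0x73u;
}

uint32_t csrr(unsigned rd, unsigned csr)    { return csr_insn(F3_CSRRS, rd, csr, 0); }   /* CSRRS rd, csr, x0 */
uint32_t csrw(unsigned csr, unsigned rs1)   { return csr_insn(F3_CSRRW, 0, csr, rs1); }  /* CSRRW x0, csr, rs1 */
uint32_t csrwi(unsigned csr, unsigned uimm) { return csr_insn(F3_CSRRWI, 0, csr, uimm); } /* CSRRWI x0, csr, uimm */
uint32_t csrs(unsigned csr, unsigned rs1)   { return csr_insn(F3_CSRRS, 0, csr, rs1); }  /* CSRRS x0, csr, rs1 */
uint32_t csrc(unsigned csr, unsigned rs1)   { return csr_insn(F3_CSRRC, 0, csr, rs1); }  /* CSRRC x0, csr, rs1 */
uint32_t csrsi(unsigned csr, unsigned uimm) { return csr_insn(F3_CSRRSI, 0, csr, uimm); } /* CSRRSI x0, csr, uimm */
uint32_t csrci(unsigned csr, unsigned uimm) { return csr_insn(F3_CSRRCI, 0, csr, uimm); } /* CSRRCI x0, csr, uimm */
```

For example, CSRR a0, cycle (a0 is x10) encodes as 0xC0002573, and CSRWI mstatus, 8 encodes as 0x30045073.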

CSR Access Ordering

Each RISC-V hart normally observes its own CSR accesses, including its
implicit CSR accesses, as performed in program order. In particular,
unless specified otherwise, a CSR access is performed after the
execution of any prior instructions in program order whose behavior
modifies or is modified by the CSR state and before the execution of any
subsequent instructions in program order whose behavior modifies or is
modified by the CSR state. Furthermore, an explicit CSR read returns the
CSR state before the execution of the instruction, while an explicit CSR
write suppresses and overrides any implicit writes or modifications to
the same CSR by the same instruction.
Likewise, any side effects from an explicit CSR access are normally
observed to occur synchronously in program order. Unless specified
otherwise, the full consequences of any such side effects are observable
by the very next instruction, and no consequences may be observed
out-of-order by preceding instructions. (Note the distinction made
earlier between side effects and indirect effects of CSR writes.)
For the RVWMO memory consistency model
(Chapter [ch:memorymodel]), CSR accesses are
weakly ordered by default, so other harts or devices may observe CSR
accesses in an order different from program order. In addition, CSR
accesses are not ordered with respect to explicit memory accesses,
unless a CSR access modifies the execution behavior of the instruction
that performs the explicit memory access or unless a CSR access and an
explicit memory access are ordered by either the syntactic dependencies
defined by the memory model or the ordering requirements defined by the
Memory-Ordering PMAs section in Volume II of this manual. To enforce
ordering in all other cases, software should execute a FENCE instruction
between the relevant accesses. For the purposes of the FENCE
instruction, CSR read accesses are classified as device input (I), and
CSR write accesses are classified as device output (O).
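As an illustration of that classification, the sketch below picks the FENCE bits needed to order one access before a later one, treating CSR reads as I and CSR writes as O. The access_kind enum and the function names are hypothetical. The encoding is a FENCE with fm=0 and rs1 = rd = 0, using I=8, O=4, R=2, W=1 within the pred and succ fields.

```c
#include <stdint.h>

typedef enum { MEM_READ, MEM_WRITE, DEV_INPUT, DEV_OUTPUT, CSR_READ, CSR_WRITE } access_kind;

/* FENCE set bit for each kind: CSR reads count as I, CSR writes as O. */
static unsigned fence_bit(access_kind k) {
    switch (k) {
    case MEM_READ:  return 2;                   /* R */
    case MEM_WRITE: return 1;                   /* W */
    case DEV_INPUT:
    case CSR_READ:  return 8;                   /* I */
    default:        return 4;                   /* O: DEV_OUTPUT, CSR_WRITE */
    }
}

/* FENCE (fm = 0, rs1 = rd = 0, opcode MISC-MEM) ordering `before` ahead of `after`. */
uint32_t fence_between(access_kind before, access_kind after) {
    return fence_bit(before) << 24 | fence_bit(after) << 20 | 0x0Fu;
}
```

For example, ordering a CSR write before a later memory read needs FENCE O,R, which encodes as 0x0420000F.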

Informally, the CSR space acts as a weakly ordered memory-mapped I/O
region, as defined by the Memory-Ordering PMAs section in Volume II of
this manual. As a result, the order of CSR accesses with respect to all
other accesses is constrained by the same mechanisms that constrain the
order of memory-mapped I/O accesses to such a region.
These CSR-ordering constraints are imposed to support ordering main
memory and memory-mapped I/O accesses with respect to CSR accesses that
are visible to, or affected by, devices or other harts. Examples include
the time, cycle, and mcycle CSRs, in addition to CSRs that reflect
pending interrupts, like mip and sip. Note that implicit reads of
such CSRs (e.g., taking an interrupt because of a change in mip) are
also ordered as device input.
Most CSRs (including, e.g., the fcsr) are not visible to other harts;
their accesses can be freely reordered in the global memory order with
respect to FENCE instructions without violating this specification.

The hardware platform may define that accesses to certain CSRs are
strongly ordered, as defined by the Memory-Ordering PMAs section in
Volume II of this manual. Accesses to strongly ordered CSRs have
stronger ordering constraints with respect to accesses to both weakly
ordered CSRs and accesses to memory-mapped I/O regions.

The rules for the reordering of CSR accesses in the global memory order
should probably be moved to
Chapter [ch:memorymodel] concerning the
RVWMO memory consistency model.


## r32c.md

      
“C” Standard Extension for Compressed Instructions, Version 2.0

This chapter describes the RISC-V standard compressed instruction-set
extension, named “C”, which reduces static and dynamic code size by
adding short 16-bit instruction encodings for common operations. The C
extension can be added to any of the base ISAs (RV32, RV64, RV128), and
we use the generic term “RVC” to cover any of these. Typically, 50%–60%
of the RISC-V instructions in a program can be replaced with RVC
instructions, resulting in a 25%–30% code-size reduction.
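The two percentages can be related with a rough back-of-the-envelope model. Assume every instruction of an N-instruction program starts as 32 bits and each replaced instruction becomes exactly 16 bits. If a fraction f of the instructions is compressed, then

$$
\text{size reduction}
  = \frac{32N - \bigl(32N(1-f) + 16Nf\bigr)}{32N}
  = \frac{16Nf}{32N}
  = \frac{f}{2}
$$

so f between 0.5 and 0.6 gives a 25% to 30% reduction, consistent with the figures above.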

Overview

RVC uses a simple compression scheme that offers shorter 16-bit versions
of common 32-bit RISC-V instructions when:

the immediate or address offset is small, or
one of the registers is the zero register (x0), the ABI link register
(x1), or the ABI stack pointer (x2), or
the destination register and the first source register are identical, or
the registers used are the 8 most popular ones.
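The conditions above can be sketched as predicates for three common cases:

- C.ADDI: destination equals the first source, with a small nonzero immediate.
- C.LI: ADDI from x0.
- C.LW: both registers among x8–x15, the registers reachable through the 3-bit register fields, with a small scaled offset.

The immediate ranges are those of the standard C.ADDI, C.LI, and C.LW encodings. The function names are hypothetical, and other forms, such as the stack-pointer-relative loads, are not covered.

```c
#include <stdbool.h>
#include <stdint.h>

/* x8-x15: the registers reachable through 3-bit rd'/rs1'/rs2' fields. */
static bool is_rvc_reg(unsigned r) { return r >= 8 && r <= 15; }

/* Signed 6-bit immediate range used by C.ADDI and C.LI. */
static bool fits_simm6(int32_t v) { return v >= -32 && v <= 31; }

/* ADDI rd, rd, imm -> C.ADDI: destination equals first source, small nonzero imm. */
bool can_compress_addi(unsigned rd, unsigned rs1, int32_t imm) {
    return rd == rs1 && rd != 0 && imm != 0 && fits_simm6(imm);
}

/* ADDI rd, x0, imm -> C.LI: the source is the zero register. */
bool can_compress_li(unsigned rd, unsigned rs1, int32_t imm) {
    return rs1 == 0 && rd != 0 && fits_simm6(imm);
}

/* LW rd, off(rs1) -> C.LW: popular registers only, offset a multiple of 4 in [0, 124]. */
bool can_compress_lw(unsigned rd, unsigned rs1, int32_t off) {
    return is_rvc_reg(rd) && is_rvc_reg(rs1) && off >= 0 && off <= 124 && off % 4 == 0;
}
```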

The C extension is compatible with all other standard instruction
extensions. The C extension allows 16-bit instructions to be freely
intermixed with 32-bit instructions, with the latter now able to start
on any 16-bit boundary, i.e., IALIGN=16. With the addition of the C
extension, no instructions can raise instruction-address-misaligned
exceptions.

Removing the 32-bit alignment constraint on the original 32-bit
instructions allows significantly greater code density.

The compressed instruction encodings are mostly common across RV32C,
RV64C, and RV128C, but as shown in
Table [rvcopcodemap], a few opcodes are used
for different purposes depending on base ISA. For example, the wider
address-space RV64C and RV128C variants require additional opcodes to
compress loads and stores of 64-bit integer values, while RV32C uses the
same opcodes to compress loads and stores of single-precision
floating-point values. Similarly, RV128C requires additional opcodes to
capture loads and stores of 128-bit integer values, while these same
opcodes are used for loads and stores of double-precision floating-point
values in RV32C and RV64C. If the C extension is implemented, the
appropriate compressed floating-point load and store instructions must
be provided whenever the relevant standard floating-point extension (F
and/or D) is also implemented. In addition, RV32C includes a compressed
jump and link instruction to compress short-range subroutine calls,
where the same opcode is used to compress ADDIW for RV64C and RV128C.

Double-precision loads and stores are a significant fraction of static
and dynamic instructions, hence the motivation to include them in the
RV32C and RV64C encoding.
Although single-precision loads and stores are not a significant source
of static or dynamic compression for benchmarks compiled for the
currently supported ABIs, for microcontrollers that only provide
hardware single-precision floating-point units and have an ABI that only
supports single-precision floating-point numbers, the single-precision
loads and stores will be used at least as frequently as double-precision
loads and stores in the measured benchmarks. Hence, the motivation to
provide compressed support for these in RV32C.
Short-range subroutine calls are more likely in small binaries for
microcontrollers, hence the motivation to include these in RV32C.
Although reusing opcodes for different purposes for different base ISAs
adds some complexity to documentation, the impact on implementation
complexity is small even for designs that support multiple base ISAs.
The compressed floating-point load and store variants use the same
instruction format with the same register specifiers as the wider
integer loads and stores.

RVC was designed under the constraint that each RVC instruction expands
into a single 32-bit instruction in either the base ISA (RV32I/E, RV64I,
or RV128I) or the F and D standard extensions where present. Adopting
this constraint has two main benefits:

Hardware designs can simply expand RVC instructions during decode,
simplifying verification and minimizing modifications to existing
microarchitectures.
Compilers can be unaware of the RVC extension and leave code compression
to the assembler and linker, although a compression-aware compiler will
generally be able to produce better results.


We felt the multiple complexity reductions of a simple one-one mapping
between C and base IFD instructions far outweighed the potential gains
of a slightly denser encoding that added additional instructions only
supported in the C extension, or that allowed encoding of multiple IFD
instructions in one C instruction.

It is important to note that the C extension is not designed to be a
stand-alone ISA, and is meant to be used alongside a base ISA.

Variable-length instruction sets have long been used to improve code
density. For example, the IBM Stretch, developed in the late 1950s, had
an ISA with 32-bit and 64-bit instructions, where some of the 32-bit
instructions were compressed versions of the full 64-bit instructions.
Stretch also limited the set of registers that were addressable in some
of the shorter instruction formats, with short branch instructions that
could only refer to one of the index registers.
The later IBM 360 architecture supported a simple variable-length
instruction encoding with 16-bit, 32-bit, or 48-bit instruction formats.
In 1963, CDC introduced the Cray-designed CDC 6600, a precursor to RISC
architectures, with a register-rich load-store architecture and
instructions of two lengths, 15 bits and 30 bits. The later Cray-1
design used a very similar instruction format, with 16-bit and 32-bit
instruction lengths.
The initial RISC ISAs from the 1980s all picked performance over code
size, which was reasonable for a workstation environment, but not for
embedded systems. Hence, both ARM and MIPS subsequently created versions
of their ISAs with smaller code size, offering an alternative 16-bit-wide
instruction set instead of the standard 32-bit-wide
instructions. The compressed RISC ISAs reduced code size relative to
their starting points by about 25–30%, yielding code that was
significantly smaller than 80x86. This result surprised some, as their
intuition was that the variable-length CISC ISA should be smaller than
RISC ISAs that offered only 16-bit and 32-bit formats.
Since the original RISC ISAs did not leave sufficient opcode space free
to include these unplanned compressed instructions, they were instead
developed as complete new ISAs. This meant compilers needed different
code generators for the separate compressed ISAs. The first compressed
RISC ISA extensions (e.g., ARM Thumb and MIPS16) used only a fixed
16-bit instruction size, which gave good reductions in static code size
but caused an increase in dynamic instruction count, which led to lower
performance compared to the original fixed-width 32-bit instruction
size. This led to the development of a second generation of compressed
RISC ISA designs with mixed 16-bit and 32-bit instruction lengths (e.g.,
ARM Thumb2, microMIPS, PowerPC VLE), so that performance was similar to
pure 32-bit instructions but with significant code size savings.
Unfortunately, these different generations of compressed ISAs are
incompatible with each other and with the original uncompressed ISA,
leading to significant complexity in documentation, implementations, and
software tools support.
Of the commonly used 64-bit ISAs, only PowerPC and microMIPS currently
support a compressed instruction format. It is surprising that the most
popular 64-bit ISA for mobile platforms (ARM v8) does not include a
compressed instruction format given that static code size and dynamic
instruction fetch bandwidth are important metrics. Although static code
size is not a major concern in larger systems, instruction fetch
bandwidth can be a major bottleneck in servers running commercial
workloads, which often have a large instruction working set.
Benefiting from 25 years of hindsight, RISC-V was designed to support
compressed instructions from the outset, leaving enough opcode space for
RVC to be added as a simple extension on top of the base ISA (along with
many other extensions). The philosophy of RVC is to reduce code size for
embedded applications and to improve performance and energy-efficiency
for all applications due to fewer misses in the instruction cache.
Waterman shows that RVC fetches 25%-30% fewer instruction bits, which
reduces instruction cache misses by 20%-25%, or roughly the same
performance impact as doubling the instruction cache size.

Compressed Instruction Formats

Table 1.1 shows the nine compressed
instruction formats. CR, CI, and CSS can use any of the 32 RVI
registers, but CIW, CL, CS, CA, and CB are limited to just 8 of them.
Table 1.2 lists these popular registers, which
correspond to registers x8 to x15. Note that there is a separate
version of load and store instructions that use the stack pointer as the
base address register, since saving to and restoring from the stack are
so prevalent, and that they use the CI and CSS formats to allow access
to all 32 data registers. CIW supplies an 8-bit immediate for the
ADDI4SPN instruction.

The RISC-V ABI was changed to make the frequently used registers map to
registers x8–x15. This simplifies the decompression decoder by
having a contiguous naturally aligned set of register numbers, and is
also compatible with the RV32E base ISA, which only has 16 integer
registers.

Compressed register-based floating-point loads and stores also use the
CL and CS formats respectively, with the eight registers mapping to f8
to f15.

The standard RISC-V calling convention maps the most frequently used
floating-point registers to registers f8 to f15, which allows the
same register decompression decoding as for integer register numbers.

The formats were designed to keep bits for the two register source
specifiers in the same place in all instructions, while the destination
register field can move. When the full 5-bit destination register
specifier is present, it is in the same place as in the 32-bit RISC-V
encoding. Where immediates are sign-extended, the sign-extension is
always from bit 12. Immediate fields have been scrambled, as in the base
specification, to reduce the number of immediate muxes required.

The immediate fields are scrambled in the instruction formats instead of
in sequential order so that as many bits as possible are in the same
position in every instruction, thereby simplifying implementations.

For many RVC instructions, zero-valued immediates are disallowed and
x0 is not a valid 5-bit register specifier. These restrictions free up
encoding space for other instructions requiring fewer operand bits.


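| Format | Meaning | Fields (bit range: field) |
|---|---|---|
| CR | Register | 15–12: funct4; 11–7: rd/rs1; 6–2: rs2; 1–0: op |
| CI | Immediate | 15–13: funct3; 12: imm; 11–7: rd/rs1; 6–2: imm; 1–0: op |
| CSS | Stack-relative Store | 15–13: funct3; 12–7: imm; 6–2: rs2; 1–0: op |
| CIW | Wide Immediate | 15–13: funct3; 12–5: imm; 4–2: rd′; 1–0: op |
| CL | Load | 15–13: funct3; 12–10: imm; 9–7: rs1′; 6–5: imm; 4–2: rd′; 1–0: op |
| CS | Store | 15–13: funct3; 12–10: imm; 9–7: rs1′; 6–5: imm; 4–2: rs2′; 1–0: op |
| CA | Arithmetic | 15–10: funct6; 9–7: rd′/rs1′; 6–5: funct2; 4–2: rs2′; 1–0: op |
| CB | Branch/Arithmetic | 15–13: funct3; 12–10: offset; 9–7: rd′/rs1′; 6–2: offset; 1–0: op |
| CJ | Jump | 15–13: funct3; 12–2: jump target; 1–0: op |

Table 1.1: Compressed 16-bit RVC instruction formats.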


| RVC Register Number | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 |
|---|---|---|---|---|---|---|---|---|
| Integer Register Number | x8 | x9 | x10 | x11 | x12 | x13 | x14 | x15 |
| Integer Register ABI Name | s0 | s1 | a0 | a1 | a2 | a3 | a4 | a5 |
| Floating-Point Register Number | f8 | f9 | f10 | f11 | f12 | f13 | f14 | f15 |
| Floating-Point Register ABI Name | fs0 | fs1 | fa0 | fa1 | fa2 | fa3 | fa4 | fa5 |

Table 1.2: Registers specified by the three-bit rs1′, rs2′, and rd′ fields
of the CIW, CL, CS, CA, and CB formats.
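Because the eight popular registers form the contiguous, naturally aligned block x8–x15 (f8–f15), expanding a 3-bit rs1′, rs2′, or rd′ field only requires prepending the bits 01. A minimal Python sketch of that mapping follows; `expand_prime_reg` and `prime_reg_abi_name` are hypothetical helper names, not part of any RISC-V toolchain.

```python
# Hypothetical helpers illustrating the 3-bit -> 5-bit register mapping of RVC.
INT_ABI_NAMES = ["s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5"]
FP_ABI_NAMES = ["fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5"]


def expand_prime_reg(field: int) -> int:
    """Map a 3-bit rs1'/rs2'/rd' field to a register number in 8..15."""
    if not 0 <= field <= 0b111:
        raise ValueError("RVC prime register fields are 3 bits wide")
    # Prepending binary 01 to the 3-bit field is the same as adding 8.
    return 0b01000 | field


def prime_reg_abi_name(field: int, floating_point: bool = False) -> str:
    """Return the ABI name of the register selected by a 3-bit field."""
    number = expand_prime_reg(field)
    names = FP_ABI_NAMES if floating_point else INT_ABI_NAMES
    return names[number - 8]


for f in range(8):
    print(f"{f:03b} -> x{expand_prime_reg(f)} ({prime_reg_abi_name(f)}), "
          f"f{expand_prime_reg(f)} ({prime_reg_abi_name(f, floating_point=True)})")
```

The same decoder logic serves integer and floating-point fields, which is the point of choosing x8–x15 and f8–f15 in the calling convention.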


Load and Store Instructions

To increase the reach of 16-bit instructions, data-transfer instructions
use zero-extended immediates that are scaled by the size of the data in
bytes: ×4 for words, ×8 for double words, and
×16 for quad words.
RVC provides two variants of loads and stores. One uses the ABI stack
pointer, x2, as the base address and can target any data register. The
other can reference one of 8 base address registers and one of 8 data
registers.
Stack-Pointer-Based Loads and Stores


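| Instruction | funct3 (15–13) | imm (12) | rd (11–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|
| C.LWSP | 010 | offset[5] | dest≠0 | offset[4:2\|7:6] | C2 |
| C.LDSP | 011 | offset[5] | dest≠0 | offset[4:3\|8:6] | C2 |
| C.LQSP | 001 | offset[5] | dest≠0 | offset[4\|9:6] | C2 |
| C.FLWSP | 011 | offset[5] | dest | offset[4:2\|7:6] | C2 |
| C.FLDSP | 001 | offset[5] | dest | offset[4:3\|8:6] | C2 |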


These instructions use the CI format.
C.LWSP loads a 32-bit value from memory into register rd. It computes
an effective address by adding the zero-extended offset, scaled by 4,
to the stack pointer, x2. It expands to lw rd, offset(x2). C.LWSP is
only valid when rd ≠ x0; the code points with rd = x0 are
reserved.
C.LDSP is an RV64C/RV128C-only instruction that loads a 64-bit value
from memory into register rd. It computes its effective address by
adding the zero-extended offset, scaled by 8, to the stack pointer,
x2. It expands to ld rd, offset(x2). C.LDSP is only valid when
rd ≠ x0; the code points with rd = x0 are reserved.
C.LQSP is an RV128C-only instruction that loads a 128-bit value from
memory into register rd. It computes its effective address by adding
the zero-extended offset, scaled by 16, to the stack pointer, x2. It
expands to lq rd, offset(x2). C.LQSP is only valid when rd ≠ x0;
the code points with rd = x0 are reserved.
C.FLWSP is an RV32FC-only instruction that loads a single-precision
floating-point value from memory into floating-point register rd. It
computes its effective address by adding the zero-extended offset,
scaled by 4, to the stack pointer, x2. It expands to
flw rd, offset(x2).
C.FLDSP is an RV32DC/RV64DC-only instruction that loads a
double-precision floating-point value from memory into floating-point
register rd. It computes its effective address by adding the
zero-extended offset, scaled by 8, to the stack pointer, x2. It
expands to fld rd, offset(x2).
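The scrambled immediate is easiest to see in a decoder. The sketch below assumes an instruction is held as a 16-bit Python integer and uses the hypothetical helpers `bits` and `decode_c_lwsp`. It reassembles the C.LWSP offset from inst[12] (offset[5]), inst[6:4] (offset[4:2]), and inst[3:2] (offset[7:6]), giving word offsets from 0 to 252.

```python
from typing import Tuple


def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_c_lwsp(insn: int) -> Tuple[int, int]:
    """Decode C.LWSP (quadrant 2, funct3 010) into (rd, byte offset)."""
    if bits(insn, 1, 0) != 0b10 or bits(insn, 15, 13) != 0b010:
        raise ValueError("not a C.LWSP encoding")
    rd = bits(insn, 11, 7)
    if rd == 0:
        raise ValueError("rd = x0 is a reserved code point")
    offset = ((bits(insn, 12, 12) << 5)    # inst[12]  -> offset[5]
              | (bits(insn, 6, 4) << 2)    # inst[6:4] -> offset[4:2]
              | (bits(insn, 3, 2) << 6))   # inst[3:2] -> offset[7:6]
    return rd, offset


rd, offset = decode_c_lwsp(0x4532)
print(f"lw x{rd}, {offset}(x2)")  # lw x10, 12(x2)
```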


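| Instruction | funct3 (15–13) | imm (12–7) | rs2 (6–2) | op (1–0) |
|---|---|---|---|---|
| C.SWSP | 110 | offset[5:2\|7:6] | src | C2 |
| C.SDSP | 111 | offset[5:3\|8:6] | src | C2 |
| C.SQSP | 101 | offset[5:4\|9:6] | src | C2 |
| C.FSWSP | 111 | offset[5:2\|7:6] | src | C2 |
| C.FSDSP | 101 | offset[5:3\|8:6] | src | C2 |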


These instructions use the CSS format.
C.SWSP stores a 32-bit value in register rs2 to memory. It computes an
effective address by adding the zero-extended offset, scaled by 4, to
the stack pointer, x2. It expands to sw rs2, offset(x2).
C.SDSP is an RV64C/RV128C-only instruction that stores a 64-bit value in
register rs2 to memory. It computes an effective address by adding the
zero-extended offset, scaled by 8, to the stack pointer, x2. It
expands to sd rs2, offset(x2).
C.SQSP is an RV128C-only instruction that stores a 128-bit value in
register rs2 to memory. It computes an effective address by adding the
zero-extended offset, scaled by 16, to the stack pointer, x2. It
expands to sq rs2, offset(x2).
C.FSWSP is an RV32FC-only instruction that stores a single-precision
floating-point value in floating-point register rs2 to memory. It
computes an effective address by adding the zero-extended offset,
scaled by 4, to the stack pointer, x2. It expands to
fsw rs2, offset(x2).
C.FSDSP is an RV32DC/RV64DC-only instruction that stores a
double-precision floating-point value in floating-point register rs2
to memory. It computes an effective address by adding the
zero-extended offset, scaled by 8, to the stack pointer, x2. It
expands to fsd rs2, offset(x2).
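The CSS stores place their 6-bit immediate in inst[12:7], in a different bit order for each access size. As an illustrative sketch (hypothetical function names; instructions assumed to be 16-bit integers), the following recovers the source register and byte offset for the word and doubleword variants, whose reach is 0–252 and 0–504 bytes respectively.

```python
from typing import Tuple


def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_css_store(insn: int, size: int) -> Tuple[int, int]:
    """Return (rs2, offset) for a CSS-format store of `size` bytes (4 or 8)."""
    if bits(insn, 1, 0) != 0b10:
        raise ValueError("CSS stores live in quadrant 2")
    rs2 = bits(insn, 6, 2)
    if size == 4:    # C.SWSP / C.FSWSP: inst[12:7] = offset[5:2|7:6]
        offset = (bits(insn, 12, 9) << 2) | (bits(insn, 8, 7) << 6)
    elif size == 8:  # C.SDSP / C.FSDSP: inst[12:7] = offset[5:3|8:6]
        offset = (bits(insn, 12, 10) << 3) | (bits(insn, 9, 7) << 6)
    else:
        raise ValueError("only the 4- and 8-byte variants are sketched here")
    return rs2, offset


print(decode_css_store(0xC606, 4))  # (1, 12): sw x1, 12(x2)
print(decode_css_store(0xE406, 8))  # (1, 8): sd x1, 8(x2)
```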

Register save/restore code at function entry/exit represents a
significant portion of static code size. The stack-pointer-based
compressed loads and stores in RVC are effective at reducing the
save/restore static code size by a factor of 2 while improving
performance by reducing dynamic instruction bandwidth.
A common mechanism used in other ISAs to further reduce save/restore
code size is load-multiple and store-multiple instructions. We
considered adopting these for RISC-V but noted the following drawbacks
to these instructions:


These instructions complicate processor implementations.


For virtual memory systems, some data accesses could be resident in
physical memory and some could not, which requires a new restart
mechanism for partially executed instructions.


Unlike the rest of the RVC instructions, there is no IFD equivalent
to Load Multiple and Store Multiple.


Unlike the rest of the RVC instructions, the compiler would have to
be aware of these instructions, both to generate them and to
allocate registers in an order that maximizes the chance of them
being saved and restored, since they would be saved and restored
in sequential order.


Simple microarchitectural implementations will constrain how other
instructions can be scheduled around the load and store multiple
instructions, leading to a potential performance loss.


The desire for sequential register allocation might conflict with
the featured registers selected for the CIW, CL, CS, CA, and CB
formats.


Furthermore, much of the gains can be realized in software by replacing
prologue and epilogue code with subroutine calls to common prologue and
epilogue code, a technique described in Section 5.6 of .
While reasonable architects might come to different conclusions, we
decided to omit load and store multiple and instead use the
software-only approach of calling save/restore millicode routines to
attain the greatest code size reduction.

Register-Based Loads and Stores


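| Instruction | funct3 (15–13) | imm (12–10) | rs1′ (9–7) | imm (6–5) | rd′ (4–2) | op (1–0) |
|---|---|---|---|---|---|---|
| C.LW | 010 | offset[5:3] | base | offset[2\|6] | dest | C0 |
| C.LD | 011 | offset[5:3] | base | offset[7:6] | dest | C0 |
| C.LQ | 001 | offset[5\|4\|8] | base | offset[7:6] | dest | C0 |
| C.FLW | 011 | offset[5:3] | base | offset[2\|6] | dest | C0 |
| C.FLD | 001 | offset[5:3] | base | offset[7:6] | dest | C0 |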


These instructions use the CL format.
C.LW loads a 32-bit value from memory into register rd′. It computes
an effective address by adding the zero-extended offset, scaled by 4,
to the base address in register rs1′. It expands to
lw rd′, offset(rs1′).
C.LD is an RV64C/RV128C-only instruction that loads a 64-bit value from
memory into register rd′. It computes an effective address by adding
the zero-extended offset, scaled by 8, to the base address in register
rs1′. It expands to ld rd′, offset(rs1′).
C.LQ is an RV128C-only instruction that loads a 128-bit value from
memory into register rd′. It computes an effective address by adding
the zero-extended offset, scaled by 16, to the base address in
register rs1′. It expands to lq rd′, offset(rs1′).
C.FLW is an RV32FC-only instruction that loads a single-precision
floating-point value from memory into floating-point register rd′. It
computes an effective address by adding the zero-extended offset,
scaled by 4, to the base address in register rs1′. It expands to
flw rd′, offset(rs1′).
C.FLD is an RV32DC/RV64DC-only instruction that loads a double-precision
floating-point value from memory into floating-point register rd′. It
computes an effective address by adding the zero-extended offset,
scaled by 8, to the base address in register rs1′. It expands to
fld rd′, offset(rs1′).
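For the register-based loads, both the base and destination registers come from 3-bit fields, and the 5-bit immediate is split between inst[12:10] and inst[6:5]. Below is a minimal expansion sketch for C.LW and C.LD, assuming 16-bit integer inputs; `bits` and `expand_cl_load` are hypothetical helper names.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def expand_cl_load(insn: int, xlen: int = 32) -> str:
    """Expand C.LW (any XLEN) or C.LD (RV64/RV128) to its 32-bit assembly form."""
    if bits(insn, 1, 0) != 0b00:
        raise ValueError("CL-format loads live in quadrant 0")
    funct3 = bits(insn, 15, 13)
    rd = 8 + bits(insn, 4, 2)     # rd'  -> x8..x15
    rs1 = 8 + bits(insn, 9, 7)    # rs1' -> x8..x15
    if funct3 == 0b010:           # C.LW: inst[12:10]=offset[5:3], inst[6]=offset[2], inst[5]=offset[6]
        offset = (bits(insn, 12, 10) << 3) | (bits(insn, 6, 6) << 2) | (bits(insn, 5, 5) << 6)
        return f"lw x{rd}, {offset}(x{rs1})"
    if funct3 == 0b011 and xlen >= 64:  # C.LD: inst[12:10]=offset[5:3], inst[6:5]=offset[7:6]
        offset = (bits(insn, 12, 10) << 3) | (bits(insn, 6, 5) << 6)
        return f"ld x{rd}, {offset}(x{rs1})"
    raise ValueError("not C.LW/C.LD for this XLEN (funct3 011 is C.FLW on RV32)")


print(expand_cl_load(0x4148))           # lw x10, 4(x10)
print(expand_cl_load(0x6508, xlen=64))  # ld x10, 8(x10)
```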


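| Instruction | funct3 (15–13) | imm (12–10) | rs1′ (9–7) | imm (6–5) | rs2′ (4–2) | op (1–0) |
|---|---|---|---|---|---|---|
| C.SW | 110 | offset[5:3] | base | offset[2\|6] | src | C0 |
| C.SD | 111 | offset[5:3] | base | offset[7:6] | src | C0 |
| C.SQ | 101 | offset[5\|4\|8] | base | offset[7:6] | src | C0 |
| C.FSW | 111 | offset[5:3] | base | offset[2\|6] | src | C0 |
| C.FSD | 101 | offset[5:3] | base | offset[7:6] | src | C0 |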


These instructions use the CS format.
C.SW stores a 32-bit value in register rs2′ to memory. It computes an
effective address by adding the zero-extended offset, scaled by 4, to
the base address in register rs1′. It expands to
sw rs2′, offset(rs1′).
C.SD is an RV64C/RV128C-only instruction that stores a 64-bit value in
register rs2′ to memory. It computes an effective address by adding
the zero-extended offset, scaled by 8, to the base address in register
rs1′. It expands to sd rs2′, offset(rs1′).
C.SQ is an RV128C-only instruction that stores a 128-bit value in
register rs2′ to memory. It computes an effective address by adding
the zero-extended offset, scaled by 16, to the base address in
register rs1′. It expands to sq rs2′, offset(rs1′).
C.FSW is an RV32FC-only instruction that stores a single-precision
floating-point value in floating-point register rs2′ to memory. It
computes an effective address by adding the zero-extended offset,
scaled by 4, to the base address in register rs1′. It expands to
fsw rs2′, offset(rs1′).
C.FSD is an RV32DC/RV64DC-only instruction that stores a
double-precision floating-point value in floating-point register rs2′
to memory. It computes an effective address by adding the
zero-extended offset, scaled by 8, to the base address in register
rs1′. It expands to fsd rs2′, offset(rs1′).
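Going the other way, an assembler can compress `sw` only when both registers are in x8–x15 and the offset is a multiple of 4 in 0–124. The hypothetical `try_encode_c_sw` below sketches that check and the C.SW field packing, returning `None` when the 32-bit form must be kept. Real assemblers implement this differently and handle many more cases.

```python
from typing import Optional


def try_encode_c_sw(rs2: int, offset: int, rs1: int) -> Optional[int]:
    """Return the C.SW encoding of `sw x<rs2>, offset(x<rs1>)`, or None if it cannot be compressed."""
    if not (8 <= rs1 <= 15 and 8 <= rs2 <= 15):
        return None                    # 3-bit fields reach only x8..x15
    if offset % 4 != 0 or not 0 <= offset <= 124:
        return None                    # 5-bit zero-extended immediate, scaled by 4
    return ((0b110 << 13)                      # funct3 for C.SW
            | (((offset >> 3) & 0b111) << 10)  # offset[5:3] -> inst[12:10]
            | ((rs1 - 8) << 7)                 # rs1'        -> inst[9:7]
            | (((offset >> 2) & 1) << 6)       # offset[2]   -> inst[6]
            | (((offset >> 6) & 1) << 5)       # offset[6]   -> inst[5]
            | ((rs2 - 8) << 2)                 # rs2'        -> inst[4:2]
            | 0b00)                            # quadrant C0


print(hex(try_encode_c_sw(10, 4, 10)))  # 0xc148
print(try_encode_c_sw(10, 128, 10))     # None: keep the 32-bit sw
```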
Control Transfer Instructions

RVC provides unconditional jump instructions and conditional branch
instructions. As with base RVI instructions, the offsets of all RVC
control transfer instructions are in multiples of 2 bytes.


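| Instruction | funct3 (15–13) | imm (12–2) | op (1–0) |
|---|---|---|---|
| C.J | 101 | offset[11\|4\|9:8\|10\|6\|7\|3:1\|5] | C1 |
| C.JAL | 001 | offset[11\|4\|9:8\|10\|6\|7\|3:1\|5] | C1 |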


These instructions use the CJ format.
C.J performs an unconditional control transfer. The offset is
sign-extended and added to the pc to form the jump target address. C.J
can therefore target a ±2 KiB range. C.J expands to jal x0, offset.
C.JAL is an RV32C-only instruction that performs the same operation as
C.J, but additionally writes the address of the instruction following
the jump (pc+2) to the link register, x1. C.JAL expands to
jal x1, offset.
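The CJ immediate holds offset[11:1] in the scrambled order offset[11|4|9:8|10|6|7|3:1|5], with the sign in inst[12]. The sketch below uses hypothetical names and assumes 16-bit integer inputs. It reassembles and sign-extends the offset, which makes the ±2 KiB reach concrete: offsets run from -2048 to 2046.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_cj_offset(insn: int) -> int:
    """Return the signed byte offset of C.J (funct3 101) or C.JAL (funct3 001).

    funct3 001 is C.JAL only on RV32C; RV64C/RV128C use it for C.ADDIW.
    """
    if bits(insn, 1, 0) != 0b01 or bits(insn, 15, 13) not in (0b001, 0b101):
        raise ValueError("not a CJ-format jump")
    imm = ((bits(insn, 12, 12) << 11)    # inst[12]   -> offset[11] (sign)
           | (bits(insn, 11, 11) << 4)   # inst[11]   -> offset[4]
           | (bits(insn, 10, 9) << 8)    # inst[10:9] -> offset[9:8]
           | (bits(insn, 8, 8) << 10)    # inst[8]    -> offset[10]
           | (bits(insn, 7, 7) << 6)     # inst[7]    -> offset[6]
           | (bits(insn, 6, 6) << 7)     # inst[6]    -> offset[7]
           | (bits(insn, 5, 3) << 1)     # inst[5:3]  -> offset[3:1]
           | (bits(insn, 2, 2) << 5))    # inst[2]    -> offset[5]
    return imm - (1 << 12) if imm & (1 << 11) else imm


def jump_target(insn: int, pc: int) -> int:
    """Target address of a C.J/C.JAL located at pc (C.JAL also links pc + 2)."""
    return pc + decode_cj_offset(insn)


print(decode_cj_offset(0xBFFD))          # -2
print(hex(jump_target(0xA801, 0x1000)))  # 0x1010
```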


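| Instruction | funct4 (15–12) | rs1 (11–7) | rs2 (6–2) | op (1–0) |
|---|---|---|---|---|
| C.JR | 1000 | src≠0 | 0 | C2 |
| C.JALR | 1001 | src≠0 | 0 | C2 |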


These instructions use the CR format.
C.JR (jump register) performs an unconditional control transfer to the
address in register rs1. C.JR expands to jalr x0, 0(rs1). C.JR is
only valid when rs1 ≠ x0; the code point with rs1 = x0 is
reserved.
C.JALR (jump and link register) performs the same operation as C.JR, but
additionally writes the address of the instruction following the jump
(pc+2) to the link register, x1. C.JALR expands to
jalr x1, 0(rs1). C.JALR is only valid when rs1 ≠ x0; the code
point with rs1 = x0 corresponds to the C.EBREAK instruction.

Strictly speaking, C.JALR does not expand exactly to a base RVI
instruction as the value added to the pc to form the link address is 2
rather than 4 as in the base ISA, but supporting both offsets of 2 and 4
bytes is only a very minor change to the base microarchitecture.


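| Instruction | funct3 (15–13) | imm (12–10) | rs1′ (9–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|
| C.BEQZ | 110 | offset[8\|4:3] | src | offset[7:6\|2:1\|5] | C1 |
| C.BNEZ | 111 | offset[8\|4:3] | src | offset[7:6\|2:1\|5] | C1 |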


These instructions use the CB format.
C.BEQZ performs a conditional control transfer. The offset is
sign-extended and added to the pc to form the branch target address.
It can therefore target a ±256 B range. C.BEQZ takes the branch if the
value in register rs1′ is zero. It expands to beq rs1′, x0, offset.
C.BNEZ is defined analogously, but it takes the branch if rs1′
contains a nonzero value. It expands to bne rs1′, x0, offset.
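The CB branches carry offset[8:1] split across inst[12:10] and inst[6:2]. Here is a minimal sketch, assuming 16-bit integer inputs and a hypothetical `decode_cb_branch` helper, that returns the expanded mnemonic, rs1, and the sign-extended offset (-256 to 254).

```python
from typing import Tuple


def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_cb_branch(insn: int) -> Tuple[str, int, int]:
    """Return (mnemonic, rs1, signed offset) for C.BEQZ/C.BNEZ."""
    funct3 = bits(insn, 15, 13)
    if bits(insn, 1, 0) != 0b01 or funct3 not in (0b110, 0b111):
        raise ValueError("not C.BEQZ/C.BNEZ")
    rs1 = 8 + bits(insn, 9, 7)            # rs1' -> x8..x15
    imm = ((bits(insn, 12, 12) << 8)      # inst[12]    -> offset[8] (sign)
           | (bits(insn, 11, 10) << 3)    # inst[11:10] -> offset[4:3]
           | (bits(insn, 6, 5) << 6)      # inst[6:5]   -> offset[7:6]
           | (bits(insn, 4, 3) << 1)      # inst[4:3]   -> offset[2:1]
           | (bits(insn, 2, 2) << 5))     # inst[2]     -> offset[5]
    offset = imm - (1 << 9) if imm & (1 << 8) else imm
    return ("beq" if funct3 == 0b110 else "bne"), rs1, offset


mnemonic, rs1, offset = decode_cb_branch(0xFD7D)
print(f"{mnemonic} x{rs1}, x0, {offset}")  # bne x10, x0, -2
```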
Integer Computational Instructions

RVC provides several instructions for integer arithmetic and constant
generation.
Integer Constant-Generation Instructions

The two constant-generation instructions both use the CI instruction
format and can target any integer register.


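| Instruction | funct3 (15–13) | imm (12) | rd (11–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|
| C.LI | 010 | imm[5] | dest≠0 | imm[4:0] | C1 |
| C.LUI | 011 | nzimm[17] | dest≠{0,2} | nzimm[16:12] | C1 |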


C.LI loads the sign-extended 6-bit immediate, imm, into register rd.
C.LI expands into addi rd, x0, imm. C.LI is only valid when rd≠x0;
the code points with rd=x0 encode HINTs.
C.LUI loads the non-zero 6-bit immediate field into bits 17–12 of the
destination register, clears the bottom 12 bits, and sign-extends bit 17
into all higher bits of the destination. C.LUI expands into
lui rd, nzimm. C.LUI is only valid when rd ≠ {x0,x2}, and when
the immediate is not equal to zero. The code points with nzimm=0 are
reserved; the remaining code points with rd=x0 are HINTs; and the
remaining code points with rd=x2 correspond to the C.ADDI16SP
instruction.
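The following is a short sketch of the CI-format immediates for C.LI and C.LUI, assuming 16-bit integer inputs. The helper names are hypothetical, and HINT or reserved code points simply raise an error here rather than being executed. The value returned is what the expanded instruction writes to rd, as a signed Python integer.

```python
from typing import Tuple


def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext(value: int, width: int) -> int:
    """Sign-extend a `width`-bit field."""
    sign = 1 << (width - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_c_li_lui(insn: int) -> Tuple[str, int, int]:
    """Return (mnemonic, rd, value written to rd) for C.LI or C.LUI."""
    funct3, rd = bits(insn, 15, 13), bits(insn, 11, 7)
    if bits(insn, 1, 0) != 0b01 or funct3 not in (0b010, 0b011):
        raise ValueError("not a C.LI/C.LUI encoding")
    imm6 = sext((bits(insn, 12, 12) << 5) | bits(insn, 6, 2), 6)
    if funct3 == 0b010:
        if rd == 0:
            raise ValueError("rd=x0: HINT code point")
        return "c.li", rd, imm6            # addi rd, x0, imm
    if rd == 2:
        raise ValueError("rd=x2: this is C.ADDI16SP, not C.LUI")
    if imm6 == 0:
        raise ValueError("nzimm=0: reserved")
    if rd == 0:
        raise ValueError("rd=x0: HINT code point")
    return "c.lui", rd, imm6 << 12         # bits 17-12 set, low 12 bits cleared


print(decode_c_li_lui(0x557D))  # ('c.li', 10, -1)
print(decode_c_li_lui(0x757D))  # ('c.lui', 10, -4096)
```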
Integer Register-Immediate Operations

These integer register-immediate operations are encoded in the CI format
and perform operations on an integer register and a 6-bit immediate.


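| Instruction | funct3 (15–13) | imm (12) | rd (11–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|
| C.ADDI | 000 | nzimm[5] | dest≠0 | nzimm[4:0] | C1 |
| C.ADDIW | 001 | imm[5] | dest≠0 | imm[4:0] | C1 |
| C.ADDI16SP | 011 | nzimm[9] | 2 | nzimm[4\|6\|8:7\|5] | C1 |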


C.ADDI adds the non-zero sign-extended 6-bit immediate to the value in
register rd then writes the result to rd. C.ADDI expands into
addi rd, rd, nzimm. C.ADDI is only valid when rd≠x0 and
nzimm≠0. The code points with rd=x0 encode the C.NOP
instruction; the remaining code points with nzimm=0 encode HINTs.
C.ADDIW is an RV64C/RV128C-only instruction that performs the same
computation but produces a 32-bit result, then sign-extends the result
to 64 bits. C.ADDIW expands into addiw rd, rd, imm. The immediate can be
zero for C.ADDIW, where this corresponds to sext.w rd. C.ADDIW is
only valid when rd≠x0; the code points with rd=x0 are reserved.
C.ADDI16SP shares the opcode with C.LUI, but has a destination field of
x2. C.ADDI16SP adds the non-zero sign-extended 6-bit immediate to the
value in the stack pointer (sp=x2), where the immediate is scaled to
represent multiples of 16 in the range [-512, 496]. C.ADDI16SP is used to
adjust the stack pointer in procedure prologues and epilogues. It
expands into addi x2, x2, nzimm. C.ADDI16SP is only valid when
nzimm≠0; the code point with nzimm=0 is reserved.
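C.ADDI16SP scrambles its immediate as nzimm[9] in inst[12] and nzimm[4|6|8:7|5] in inst[6:2]. The following hypothetical decoder is a sketch that assumes 16-bit integer inputs and recovers the signed stack adjustment; for example, the encoding 0x717D decodes to -16.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_c_addi16sp(insn: int) -> int:
    """Return the signed stack-pointer adjustment encoded by C.ADDI16SP."""
    if bits(insn, 1, 0) != 0b01 or bits(insn, 15, 13) != 0b011 or bits(insn, 11, 7) != 2:
        raise ValueError("not a C.ADDI16SP encoding")
    imm = ((bits(insn, 12, 12) << 9)    # inst[12]  -> nzimm[9] (sign)
           | (bits(insn, 6, 6) << 4)    # inst[6]   -> nzimm[4]
           | (bits(insn, 5, 5) << 6)    # inst[5]   -> nzimm[6]
           | (bits(insn, 4, 3) << 7)    # inst[4:3] -> nzimm[8:7]
           | (bits(insn, 2, 2) << 5))   # inst[2]   -> nzimm[5]
    if imm == 0:
        raise ValueError("nzimm=0 is reserved")
    return imm - (1 << 10) if imm & (1 << 9) else imm


print(decode_c_addi16sp(0x717D))  # -16, i.e. addi x2, x2, -16
```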

In the standard RISC-V calling convention, the stack pointer sp is
always 16-byte aligned.


| Instruction | funct3 (15–13) | imm (12–5) | rd′ (4–2) | op (1–0) |
|---|---|---|---|---|
| C.ADDI4SPN | 000 | nzuimm[5:4\|9:6\|2\|3] | dest | C0 |


C.ADDI4SPN is a CIW-format instruction that adds a zero-extended
non-zero immediate, scaled by 4, to the stack pointer, x2, and writes
the result to rd′. This instruction is used to generate pointers to
stack-allocated variables, and expands to addi rd′, x2, nzuimm.
C.ADDI4SPN is only valid when nzuimm≠0; the code points with
nzuimm=0 are reserved.
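For C.ADDI4SPN, the eight immediate bits are laid out as nzuimm[5:4|9:6|2|3] in inst[12:5]. A minimal decoding sketch follows, with hypothetical names and 16-bit integer inputs assumed. It returns the destination register and the byte offset added to sp (4 to 1020 in steps of 4). Rejecting nzuimm=0 also covers the all-zero defined illegal instruction.

```python
from typing import Tuple


def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_c_addi4spn(insn: int) -> Tuple[int, int]:
    """Return (rd, nzuimm) for C.ADDI4SPN, i.e. addi rd', x2, nzuimm."""
    if bits(insn, 1, 0) != 0b00 or bits(insn, 15, 13) != 0b000:
        raise ValueError("not quadrant 0 with funct3 000")
    imm = ((bits(insn, 12, 11) << 4)    # inst[12:11] -> nzuimm[5:4]
           | (bits(insn, 10, 7) << 6)   # inst[10:7]  -> nzuimm[9:6]
           | (bits(insn, 6, 6) << 2)    # inst[6]     -> nzuimm[2]
           | (bits(insn, 5, 5) << 3))   # inst[5]     -> nzuimm[3]
    if imm == 0:
        raise ValueError("nzuimm=0 is reserved")
    return 8 + bits(insn, 4, 2), imm


print(decode_c_addi4spn(0x0028))  # (10, 8): addi x10, x2, 8
```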


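| Instruction | funct3 (15–13) | imm (12) | rd (11–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|
| C.SLLI | 000 | shamt[5] | dest≠0 | shamt[4:0] | C2 |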


C.SLLI is a CI-format instruction that performs a logical left shift of
the value in register rd then writes the result to rd. The shift
amount is encoded in the shamt field. For RV128C, a shift amount of
zero is used to encode a shift of 64. C.SLLI expands into
slli rd, rd, shamt, except for RV128C with shamt=0, which expands to
slli rd, rd, 64.
For RV32C, shamt[5] must be zero; the code points with
shamt[5]=1 are designated for custom extensions. For RV32C and
RV64C, the shift amount must be non-zero; the code points with shamt=0
are HINTs. For all base ISAs, the code points with rd=x0 are HINTs,
except those with shamt[5]=1 in RV32C.
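The C.SLLI rules above depend on the base ISA. The following sketch applies them in order, assuming 16-bit integer inputs and a hypothetical `c_slli_shamt` helper. HINT, reserved, and custom code points are reported as errors rather than executed.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def c_slli_shamt(insn: int, xlen: int) -> int:
    """Return the effective left-shift amount of a C.SLLI encoding for RV32/64/128."""
    if bits(insn, 1, 0) != 0b10 or bits(insn, 15, 13) != 0b000:
        raise ValueError("not a C.SLLI encoding")
    shamt = (bits(insn, 12, 12) << 5) | bits(insn, 6, 2)
    rd = bits(insn, 11, 7)
    if xlen == 32 and shamt & 0b100000:
        raise ValueError("shamt[5]=1 is designated for custom extensions on RV32C")
    if rd == 0:
        raise ValueError("rd=x0 is a HINT code point")
    if shamt == 0:
        if xlen == 128:
            return 64  # RV128C reuses shamt=0 to encode a shift of 64
        raise ValueError("shamt=0 is a HINT code point on RV32C/RV64C")
    return shamt


print(c_slli_shamt(0x050A, 64))   # 2, i.e. slli x10, x10, 2
print(c_slli_shamt(0x0502, 128))  # 64
```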


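| Instruction | funct3 (15–13) | imm (12) | funct2 (11–10) | rd′/rs1′ (9–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|---|
| C.SRLI | 100 | shamt[5] | 00 | dest | shamt[4:0] | C1 |
| C.SRAI | 100 | shamt[5] | 01 | dest | shamt[4:0] | C1 |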


C.SRLI is a CB-format instruction that performs a logical right shift of
the value in register rd′, then writes the result to rd′. The shift
amount is encoded in the shamt field. For RV128C, a shift amount of
zero is used to encode a shift of 64. Furthermore, the shift amount is
sign-extended for RV128C, and so the legal shift amounts are 1–31, 64,
and 96–127. C.SRLI expands into srli rd′, rd′, shamt, except for
RV128C with shamt=0, which expands to srli rd′, rd′, 64.
For RV32C, shamt[5] must be zero; the code points with
shamt[5]=1 are designated for custom extensions. For RV32C and
RV64C, the shift amount must be non-zero; the code points with shamt=0
are HINTs.
C.SRAI is defined analogously to C.SRLI, but instead performs an
arithmetic right shift. C.SRAI expands to srai rd′, rd′, shamt.
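The RV128C sign-extension of the 6-bit right-shift amount is easiest to check by enumeration. The hypothetical function below sketches the rules for C.SRLI and C.SRAI. For RV128 it maps field values 1–31 unchanged, 0 to 64, and 32–63 to 96–127.

```python
def c_right_shift_amount(field6: int, xlen: int) -> int:
    """Effective shift amount for C.SRLI/C.SRAI given the 6-bit shamt field."""
    if not 0 <= field6 < 64:
        raise ValueError("shamt field is 6 bits")
    if xlen != 128:
        if field6 == 0:
            raise ValueError("shamt=0 is a HINT on RV32C/RV64C")
        if xlen == 32 and field6 & 0b100000:
            raise ValueError("shamt[5]=1 is designated for custom extensions on RV32C")
        return field6
    if field6 == 0:
        return 64                                              # zero encodes a shift of 64
    signed = field6 - 64 if field6 & 0b100000 else field6     # sign-extend the 6-bit field
    return signed % 128                                        # -32..-1 become 96..127


legal = sorted(c_right_shift_amount(f, 128) for f in range(64))
print(legal == list(range(1, 32)) + [64] + list(range(96, 128)))  # True
```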

Left shifts are usually more frequent than right shifts, as left shifts
are frequently used to scale address values. Right shifts have therefore
been granted less encoding space and are placed in an encoding quadrant
where all other immediates are sign-extended. For RV128, the decision
was made to have the 6-bit shift-amount immediate also be sign-extended.
Apart from reducing the decode complexity, we believe right-shift
amounts of 96–127 will be more useful than 64–95, to allow extraction of
tags located in the high portions of 128-bit address pointers. We note
that RV128C will not be frozen at the same point as RV32C and RV64C, to
allow evaluation of typical usage of 128-bit address-space codes.


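| Instruction | funct3 (15–13) | imm (12) | funct2 (11–10) | rd′/rs1′ (9–7) | imm (6–2) | op (1–0) |
|---|---|---|---|---|---|---|
| C.ANDI | 100 | imm[5] | 10 | dest | imm[4:0] | C1 |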


C.ANDI is a CB-format instruction that computes the bitwise AND of the
value in register rd′ and the sign-extended 6-bit immediate, then
writes the result to rd′. C.ANDI expands to andi rd′, rd′, imm.
Integer Register-Register Operations


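| Instruction | funct4 (15–12) | rd/rs1 (11–7) | rs2 (6–2) | op (1–0) |
|---|---|---|---|---|
| C.MV | 1000 | dest≠0 | src≠0 | C2 |
| C.ADD | 1001 | dest≠0 | src≠0 | C2 |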


These instructions use the CR format.
C.MV copies the value in register rs2 into register rd. C.MV expands
into add rd, x0, rs2. C.MV is only valid when rs2 ≠ x0; the code
points with rs2 = x0 correspond to the C.JR instruction. The code
points with rs2 ≠ x0 and rd = x0 are HINTs.

C.MV expands to a different instruction than the canonical MV
pseudoinstruction, which instead uses ADDI. Implementations that handle
MV specially, e.g. using register-renaming hardware, may find it more
convenient to expand C.MV to MV instead of ADD, at slight additional
hardware cost.

C.ADD adds the values in registers rd and rs2 and writes the result
to register rd. C.ADD expands into add rd, rd, rs2. C.ADD is only
valid when rs2 ≠ x0; the code points with rs2 = x0 correspond to
the C.JALR and C.EBREAK instructions. The code points with rs2 ≠ x0
and rd = x0 are HINTs.
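C.JR, C.MV, C.EBREAK, C.JALR, and C.ADD all share quadrant 2 with funct3 100, so a decoder separates them using inst[12] and the two 5-bit register fields. Below is a minimal sketch, assuming 16-bit integer inputs and a hypothetical `decode_q2_100` helper that returns the 32-bit expansion as text.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_q2_100(insn: int) -> str:
    """Expand quadrant-2, funct3=100 encodings (C.JR/C.MV/C.EBREAK/C.JALR/C.ADD)."""
    if bits(insn, 1, 0) != 0b10 or bits(insn, 15, 13) != 0b100:
        raise ValueError("not quadrant 2 with funct3 100")
    bit12, rd_rs1, rs2 = bits(insn, 12, 12), bits(insn, 11, 7), bits(insn, 6, 2)
    if bit12 == 0:
        if rs2 == 0:  # C.JR
            return f"jalr x0, 0(x{rd_rs1})" if rd_rs1 != 0 else "reserved"
        return f"add x{rd_rs1}, x0, x{rs2}" if rd_rs1 != 0 else "HINT"  # C.MV
    if rs2 == 0:      # C.JALR, or C.EBREAK when rs1 is also x0
        return f"jalr x1, 0(x{rd_rs1})" if rd_rs1 != 0 else "ebreak"
    return f"add x{rd_rs1}, x{rd_rs1}, x{rs2}" if rd_rs1 != 0 else "HINT"  # C.ADD


for word in (0x8082, 0x852E, 0x9002):
    print(f"{word:#06x}: {decode_q2_100(word)}")
```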


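| Instruction | funct6 (15–10) | rd′/rs1′ (9–7) | funct2 (6–5) | rs2′ (4–2) | op (1–0) |
|---|---|---|---|---|---|
| C.AND | 100011 | dest | 11 | src | C1 |
| C.OR | 100011 | dest | 10 | src | C1 |
| C.XOR | 100011 | dest | 01 | src | C1 |
| C.SUB | 100011 | dest | 00 | src | C1 |
| C.ADDW | 100111 | dest | 01 | src | C1 |
| C.SUBW | 100111 | dest | 00 | src | C1 |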


These instructions use the CA format.
C.AND computes the bitwise AND of the values in registers rd′ and
rs2′, then writes the result to register rd′. C.AND expands into
and rd′, rd′, rs2′.
C.OR computes the bitwise OR of the values in registers rd′ and
rs2′, then writes the result to register rd′. C.OR expands into
or rd′, rd′, rs2′.
C.XOR computes the bitwise XOR of the values in registers rd′ and
rs2′, then writes the result to register rd′. C.XOR expands into
xor rd′, rd′, rs2′.
C.SUB subtracts the value in register rs2′ from the value in register
rd′, then writes the result to register rd′. C.SUB expands into
sub rd′, rd′, rs2′.
C.ADDW is an RV64C/RV128C-only instruction that adds the values in
registers rd′ and rs2′, then sign-extends the lower 32 bits of the
sum before writing the result to register rd′. C.ADDW expands into
addw rd′, rd′, rs2′.
C.SUBW is an RV64C/RV128C-only instruction that subtracts the value in
register rs2′ from the value in register rd′, then sign-extends
the lower 32 bits of the difference before writing the result to
register rd′. C.SUBW expands into subw rd′, rd′, rs2′.
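The six CA-format operations are distinguished by inst[12] (the low bit of funct6) together with funct2. The following is a sketch of the expansion, with hypothetical names and 16-bit integer inputs assumed; on RV32 the SUBW and ADDW code points are treated as reserved.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# (inst[12], funct2) -> base-ISA mnemonic; inst[12]=1 selects the *W forms.
CA_OPS = {
    (0, 0b00): "sub", (0, 0b01): "xor", (0, 0b10): "or", (0, 0b11): "and",
    (1, 0b00): "subw", (1, 0b01): "addw",
}


def expand_ca(insn: int, xlen: int) -> str:
    """Expand a CA-format instruction (quadrant 1, inst[15:13]=100, inst[11:10]=11)."""
    if bits(insn, 1, 0) != 0b01 or bits(insn, 15, 13) != 0b100 or bits(insn, 11, 10) != 0b11:
        raise ValueError("not a CA-format encoding")
    key = (bits(insn, 12, 12), bits(insn, 6, 5))
    if key not in CA_OPS or (key[0] == 1 and xlen == 32):
        raise ValueError("reserved CA encoding for this XLEN")
    rd = 8 + bits(insn, 9, 7)    # rd'/rs1' -> x8..x15
    rs2 = 8 + bits(insn, 4, 2)   # rs2'     -> x8..x15
    return f"{CA_OPS[key]} x{rd}, x{rd}, x{rs2}"


print(expand_ca(0x8D0D, 32))  # sub x10, x10, x11
print(expand_ca(0x9D2D, 64))  # addw x10, x10, x11
```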

This group of six instructions does not provide large savings
individually, but the instructions do not occupy much encoding space,
are straightforward to implement, and as a group provide a worthwhile
improvement in static and dynamic compression.

Defined Illegal Instruction


| funct3 (15–13) | imm (12) | bits 11–7 | bits 6–2 | op (1–0) |
|---|---|---|---|---|
| 000 | 0 | 0 | 0 | 00 |


A 16-bit instruction with all bits zero is permanently reserved as an
illegal instruction.

We reserve all-zero instructions to be illegal instructions to help trap
attempts to execute zeroed or non-existent portions of the memory
space. The all-zero value should not be redefined in any non-standard
extension. Similarly, we reserve instructions with all bits set to 1
(corresponding to very long instructions in the RISC-V variable-length
encoding scheme) as illegal to capture another common value seen in
non-existent memory regions.

NOP Instruction


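| Instruction | funct3 (15–13) | imm[5] (12) | rd/rs1 (11–7) | imm[4:0] (6–2) | op (1–0) |
|---|---|---|---|---|---|
| C.NOP | 000 | 0 | 0 | 0 | C1 |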


C.NOP is a CI-format instruction that does not change any user-visible
state, except for advancing the pc and incrementing any applicable
performance counters. C.NOP expands to nop. C.NOP is only valid when
imm=0; the code points with imm≠0 encode HINTs.
Breakpoint Instruction


| Instruction | funct4 (15–12) | bits 11–2 | op (1–0) |
|---|---|---|---|
| C.EBREAK | 1001 | 0 | C2 |


Debuggers can use the C.EBREAK instruction, which expands to ebreak,
to cause control to be transferred back to the debugging environment.
C.EBREAK shares the opcode with the C.ADD instruction, but with rd and
rs2 both zero, and thus also uses the CR format.
Usage of C Instructions in LR/SC Sequences

On implementations that support the C extension, compressed forms of the
I instructions permitted inside constrained LR/SC sequences, as
described in the A extension chapter, are also permitted
inside constrained LR/SC sequences.

The implication is that any implementation that claims to support both
the A and C extensions must ensure that LR/SC sequences containing valid
C instructions will eventually complete.

HINT Instructions

A portion of the RVC encoding space is reserved for microarchitectural
HINTs. Like the HINTs in the RV32I base ISA (see the HINT Instructions
section of the RV32I chapter), these
instructions do not modify any architectural state, except for advancing
the pc and any applicable performance counters. HINTs are executed as
no-ops on implementations that ignore them.
RVC HINTs are encoded as computational instructions that do not modify
the architectural state, either because rd=x0 (e.g.
C.ADD x0, t0), or because rd is overwritten with a copy of itself
(e.g. C.ADDI t0, 0).

This HINT encoding has been chosen so that simple implementations can
ignore HINTs altogether, and instead execute a HINT as a regular
computational instruction that happens not to mutate the architectural
state.

RVC HINTs do not necessarily expand to their RVI HINT counterparts. For
example, C.ADD x0, a0 might not encode the same HINT as
ADD x0, x0, a0.

The primary reason to not require an RVC HINT to expand to an RVI HINT
is that HINTs are unlikely to be compressible in the same manner as the
underlying computational instruction. Also, decoupling the RVC and RVI
HINT mappings allows the scarce RVC HINT space to be allocated to the
most popular HINTs, and in particular, to HINTs that are amenable to
macro-op fusion.

Table 1.3 lists all RVC HINT code points.
For RV32C, 78% of the HINT space is reserved for standard HINTs. The
remainder of the HINT space is designated for custom HINTs: no standard
HINTs will ever be defined in this subspace.


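| Instruction | Constraints | Code Points | Purpose |
|---|---|---|---|
| C.NOP | nzimm≠0 | 63 | Reserved for future standard use |
| C.ADDI | rd≠x0, nzimm=0 | 31 | Reserved for future standard use |
| C.LI | rd=x0 | 64 | Reserved for future standard use |
| C.LUI | rd=x0, nzimm≠0 | 63 | Reserved for future standard use |
| C.MV | rd=x0, rs2≠x0 | 31 | Reserved for future standard use |
| C.ADD | rd=x0, rs2≠x0, rs2≠x2–x5 | 27 | Reserved for future standard use |
| C.ADD | rd=x0, rs2=x2–x5 | 4 | (rs2=x2) C.NTL.P1; (rs2=x3) C.NTL.PALL; (rs2=x4) C.NTL.S1; (rs2=x5) C.NTL.ALL |
| C.SLLI | rd=x0, nzimm≠0 | 31 (RV32), 63 (RV64/128) | Designated for custom use |
| C.SLLI64 | rd=x0 | 1 | Designated for custom use |
| C.SLLI64 | rd≠x0, RV32 and RV64 only | 31 | Designated for custom use |
| C.SRLI64 | RV32 and RV64 only | 8 | Designated for custom use |
| C.SRAI64 | RV32 and RV64 only | 8 | Designated for custom use |

Table 1.3: RVC HINT instructions.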

RVC Instruction Set Listings

The RVC opcode map shows the major opcodes for RVC. Each quadrant of
the encoding space occupies one group of rows in the table. The last
quadrant, which has the two least-significant bits set, corresponds to
instructions wider than 16 bits, including those in the base ISAs.
Several instructions are only valid for certain operands; when invalid,
they are marked either RES to indicate that the opcode is reserved for
future standard extensions; Custom to indicate that the opcode is
designated for custom extensions; or HINT to indicate that the opcode is
reserved for microarchitectural hints (see Section 1.7).
The instruction listings for Quadrants 0, 1, and 2 list the RVC
instructions.
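A fetch unit can tell a compressed instruction from a wider one by looking only at the low bits of the first 16-bit parcel. The sketch below is a hypothetical helper that also flags the all-zero defined illegal instruction. It assumes the base ISA's length rule that a 32-bit instruction has inst[1:0]=11 and inst[4:2]≠111, and it does not attempt to size longer encodings.

```python
def classify_parcel(parcel: int) -> str:
    """Classify the first 16-bit parcel of an instruction (sketch)."""
    parcel &= 0xFFFF
    if parcel == 0x0000:
        return "illegal (all-zero)"
    if (parcel & 0b11) != 0b11:
        return f"16-bit RVC, quadrant {parcel & 0b11}"
    if (parcel & 0b11100) != 0b11100:
        return "32-bit"
    return "longer than 32 bits (not handled here)"


for p in (0x0000, 0x8082, 0x0513, 0xFFFF):
    print(f"{p:#06x}: {classify_parcel(p)}")
```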

  
## r32cinsn.md

      
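| inst[1:0] / inst[15:13] | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 | Base |
|---|---|---|---|---|---|---|---|---|---|
| 00 | ADDI4SPN | FLD | LW | FLW | Reserved | FSD | SW | FSW | RV32 |
|  |  | FLD |  | LD |  | FSD |  | SD | RV64 |
|  |  | LQ |  | LD |  | SQ |  | SD | RV128 |
| 01 | ADDI | JAL | LI | LUI/ADDI16SP | MISC-ALU | J | BEQZ | BNEZ | RV32 |
|  |  | ADDIW |  |  |  |  |  |  | RV64 |
|  |  | ADDIW |  |  |  |  |  |  | RV128 |
| 10 | SLLI | FLDSP | LWSP | FLWSP | J[AL]R/MV/ADD | FSDSP | SWSP | FSWSP | RV32 |
|  |  | FLDSP |  | LDSP |  | FSDSP |  | SDSP | RV64 |
|  |  | LQSP |  | LDSP |  | SQSP |  | SDSP | RV128 |
| 11 | >16b |  |  |  |  |  |  |  |  |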
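Reading the RV32 rows of the map as a lookup table gives a first-level decoder. The sketch below is illustrative only (`RVC_RV32_MAP` and `rvc_major_opcode` are hypothetical names). The example parcel `0x4501` is `c.li x10, 0`, worked out by hand from the Quadrant 1 listing.

```python
# RV32 rows of the RVC opcode map: RVC_RV32_MAP[inst[1:0]][inst[15:13]].
RVC_RV32_MAP = {
    0b00: ["ADDI4SPN", "FLD", "LW", "FLW", "Reserved", "FSD", "SW", "FSW"],
    0b01: ["ADDI", "JAL", "LI", "LUI/ADDI16SP", "MISC-ALU", "J", "BEQZ", "BNEZ"],
    0b10: ["SLLI", "FLDSP", "LWSP", "FLWSP", "J[AL]R/MV/ADD", "FSDSP", "SWSP", "FSWSP"],
}


def rvc_major_opcode(parcel: int) -> str:
    """Name the RV32 major opcode of a 16-bit instruction parcel."""
    parcel &= 0xFFFF
    quadrant = parcel & 0b11            # inst[1:0]
    if quadrant == 0b11:
        return ">16b"                   # not a compressed instruction
    if parcel == 0:
        return "Illegal instruction"    # the all-zero parcel is defined as illegal
    funct3 = (parcel >> 13) & 0b111     # inst[15:13]
    return RVC_RV32_MAP[quadrant][funct3]


print(rvc_major_opcode(0x4501))  # c.li x10, 0 -> "LI"
```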


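| 15–13 | 12–10 | 9–7 | 6–5 | 4–2 | 1–0 | Instruction |
|---|---|---|---|---|---|---|
| 000 | 0 |  |  | 0 | 00 | Illegal instruction |
| 000 | nzuimm[5:4\|9:6\|2\|3] |  |  | rd′ | 00 | C.ADDI4SPN (RES, nzuimm=0) |
| 001 | uimm[5:3] | rs1′ | uimm[7:6] | rd′ | 00 | C.FLD (RV32/64) |
| 001 | uimm[5:4\|8] | rs1′ | uimm[7:6] | rd′ | 00 | C.LQ (RV128) |
| 010 | uimm[5:3] | rs1′ | uimm[2\|6] | rd′ | 00 | C.LW |
| 011 | uimm[5:3] | rs1′ | uimm[2\|6] | rd′ | 00 | C.FLW (RV32) |
| 011 | uimm[5:3] | rs1′ | uimm[7:6] | rd′ | 00 | C.LD (RV64/128) |
| 100 | — |  |  |  | 00 | Reserved |
| 101 | uimm[5:3] | rs1′ | uimm[7:6] | rs2′ | 00 | C.FSD (RV32/64) |
| 101 | uimm[5:4\|8] | rs1′ | uimm[7:6] | rs2′ | 00 | C.SQ (RV128) |
| 110 | uimm[5:3] | rs1′ | uimm[2\|6] | rs2′ | 00 | C.SW |
| 111 | uimm[5:3] | rs1′ | uimm[2\|6] | rs2′ | 00 | C.FSW (RV32) |
| 111 | uimm[5:3] | rs1′ | uimm[7:6] | rs2′ | 00 | C.SD (RV64/128) |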


Instruction listing for RVC, Quadrant 0.
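Reading the C.LW row bit by bit gives the following Python sketch of a field decoder. It assumes that the three-bit rd′ and rs1′ fields name registers x8–x15, as the RVC chapter specifies for its compressed register fields. The example parcel `0x41C8` (`c.lw x10, 4(x11)`) was assembled by hand from the row above.

```python
def decode_c_lw(parcel: int):
    """Decode a C.LW parcel into (rd, rs1, offset) with full x-register numbers."""
    if (parcel & 0b11) != 0b00 or ((parcel >> 13) & 0b111) != 0b010:
        raise ValueError("not a C.LW parcel")
    rd_p = (parcel >> 2) & 0b111                 # rd′ in bits 4:2
    rs1_p = (parcel >> 7) & 0b111                # rs1′ in bits 9:7
    offset = (((parcel >> 10) & 0b111) << 3      # bits 12:10 -> uimm[5:3]
              | ((parcel >> 6) & 1) << 2         # bit 6     -> uimm[2]
              | ((parcel >> 5) & 1) << 6)        # bit 5     -> uimm[6]
    # Three-bit register fields select x8..x15.
    return 8 + rd_p, 8 + rs1_p, offset


print(decode_c_lw(0x41C8))  # (10, 11, 4): c.lw x10, 4(x11)
```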


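| 15–13 | 12 | 11–10 | 9–7 | 6–5 | 4–2 | 1–0 | Instruction |
|---|---|---|---|---|---|---|---|
| 000 | nzimm[5] | 0 |  | nzimm[4:0] |  | 01 | C.NOP (HINT, nzimm≠0) |
| 000 | nzimm[5] | rs1/rd≠0 |  | nzimm[4:0] |  | 01 | C.ADDI (HINT, nzimm=0) |
| 001 | imm[11\|4\|9:8\|10\|6\|7\|3:1\|5] |  |  |  |  | 01 | C.JAL (RV32) |
| 001 | imm[5] | rs1/rd≠0 |  | imm[4:0] |  | 01 | C.ADDIW (RV64/128; RES, rd=0) |
| 010 | imm[5] | rd≠0 |  | imm[4:0] |  | 01 | C.LI (HINT, rd=0) |
| 011 | nzimm[9] | 2 |  | nzimm[4\|6\|8:7\|5] |  | 01 | C.ADDI16SP (RES, nzimm=0) |
| 011 | nzimm[17] | rd≠{0, 2} |  | nzimm[16:12] |  | 01 | C.LUI (RES, nzimm=0; HINT, rd=0) |
| 100 | nzuimm[5] | 00 | rs1′/rd′ | nzuimm[4:0] |  | 01 | C.SRLI (RV32 Custom, nzuimm[5]=1) |
| 100 | 0 | 00 | rs1′/rd′ | 0 |  | 01 | C.SRLI64 (RV128; RV32/64 HINT) |
| 100 | nzuimm[5] | 01 | rs1′/rd′ | nzuimm[4:0] |  | 01 | C.SRAI (RV32 Custom, nzuimm[5]=1) |
| 100 | 0 | 01 | rs1′/rd′ | 0 |  | 01 | C.SRAI64 (RV128; RV32/64 HINT) |
| 100 | imm[5] | 10 | rs1′/rd′ | imm[4:0] |  | 01 | C.ANDI |
| 100 | 0 | 11 | rs1′/rd′ | 00 | rs2′ | 01 | C.SUB |
| 100 | 0 | 11 | rs1′/rd′ | 01 | rs2′ | 01 | C.XOR |
| 100 | 0 | 11 | rs1′/rd′ | 10 | rs2′ | 01 | C.OR |
| 100 | 0 | 11 | rs1′/rd′ | 11 | rs2′ | 01 | C.AND |
| 100 | 1 | 11 | rs1′/rd′ | 00 | rs2′ | 01 | C.SUBW (RV64/128; RV32 RES) |
| 100 | 1 | 11 | rs1′/rd′ | 01 | rs2′ | 01 | C.ADDW (RV64/128; RV32 RES) |
| 100 | 1 | 11 | — | 10 | — | 01 | Reserved |
| 100 | 1 | 11 | — | 11 | — | 01 | Reserved |
| 101 | imm[11\|4\|9:8\|10\|6\|7\|3:1\|5] |  |  |  |  | 01 | C.J |
| 110 | imm[8\|4:3] |  | rs1′ | imm[7:6\|2:1\|5] |  | 01 | C.BEQZ |
| 111 | imm[8\|4:3] |  | rs1′ | imm[7:6\|2:1\|5] |  | 01 | C.BNEZ |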


Instruction listing for RVC, Quadrant 1.
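The scrambled jump-offset field imm[11|4|9:8|10|6|7|3:1|5] in bits 12:2, which C.J and C.JAL share, is easiest to check in code. The following is a minimal Python sketch with hypothetical helper names. The test parcels were built by hand: `0xA009` is `c.j +2`, `0xBFFD` is `c.j -2`, and `0xA005` is `c.j +32`.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as two's complement."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_cj_offset(parcel: int) -> int:
    """Return the signed byte offset encoded in a C.J or C.JAL parcel."""
    def bit(n: int) -> int:
        return (parcel >> n) & 1

    imm = (bit(12) << 11                        # bit 12    -> imm[11]
           | bit(11) << 4                       # bit 11    -> imm[4]
           | ((parcel >> 9) & 0b11) << 8        # bits 10:9 -> imm[9:8]
           | bit(8) << 10                       # bit 8     -> imm[10]
           | bit(7) << 6                        # bit 7     -> imm[6]
           | bit(6) << 7                        # bit 6     -> imm[7]
           | ((parcel >> 3) & 0b111) << 1       # bits 5:3  -> imm[3:1]
           | bit(2) << 5)                       # bit 2     -> imm[5]
    return sign_extend(imm, 12)


print(decode_cj_offset(0xA009), decode_cj_offset(0xBFFD))  # 2 -2
```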


| 15–13 | 12 | 11–7 | 6–2 | 1–0 | Instruction |
|---|---|---|---|---|---|
| 000 | nzuimm[5] | rs1/rd≠0 | nzuimm[4:0] | 10 | C.SLLI (HINT, rd=0; RV32 Custom, nzuimm[5]=1) |
| 000 | 0 | rs1/rd≠0 | 0 | 10 | C.SLLI64 (RV128; RV32/64 HINT; HINT, rd=0) |
| 001 | uimm[5] | rd | uimm[4:3\|8:6] | 10 | C.FLDSP (RV32/64) |
| 001 | uimm[5] | rd≠0 | uimm[4\|9:6] | 10 | C.LQSP (RV128; RES, rd=0) |
| 010 | uimm[5] | rd≠0 | uimm[4:2\|7:6] | 10 | C.LWSP (RES, rd=0) |
| 011 | uimm[5] | rd | uimm[4:2\|7:6] | 10 | C.FLWSP (RV32) |
| 011 | uimm[5] | rd≠0 | uimm[4:3\|8:6] | 10 | C.LDSP (RV64/128; RES, rd=0) |
| 100 | 0 | rs1≠0 | 0 | 10 | C.JR (RES, rs1=0) |
| 100 | 0 | rd≠0 | rs2≠0 | 10 | C.MV (HINT, rd=0) |
| 100 | 1 | 0 | 0 | 10 | C.EBREAK |
| 100 | 1 | rs1≠0 | 0 | 10 | C.JALR |
| 100 | 1 | rs1/rd≠0 | rs2≠0 | 10 | C.ADD (HINT, rd=0) |
| 101 | uimm[5:3\|8:6] |  | rs2 | 10 | C.FSDSP (RV32/64) |
| 101 | uimm[5:4\|9:6] |  | rs2 | 10 | C.SQSP (RV128) |
| 110 | uimm[5:2\|7:6] |  | rs2 | 10 | C.SWSP |
| 111 | uimm[5:2\|7:6] |  | rs2 | 10 | C.FSWSP (RV32) |
| 111 | uimm[5:3\|8:6] |  | rs2 | 10 | C.SDSP (RV64/128) |


Instruction listing for RVC, Quadrant 2.
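Each RVC instruction expands to a base instruction. As an example, the Python sketch below expands C.LWSP into the equivalent I-type `LW rd, uimm(x2)` using the LW row of the RV32I listing. The helper name is hypothetical. The test pair `0x40B2` / `0x00C12083` (`lw x1, 12(x2)`) was assembled by hand from the two tables.

```python
def expand_c_lwsp(parcel: int) -> int:
    """Expand a C.LWSP parcel into the equivalent 32-bit LW instruction word."""
    if (parcel & 0b11) != 0b10 or ((parcel >> 13) & 0b111) != 0b010:
        raise ValueError("not a C.LWSP parcel")
    rd = (parcel >> 7) & 0x1F
    if rd == 0:
        raise ValueError("C.LWSP with rd=0 is reserved")
    uimm = (((parcel >> 12) & 1) << 5            # bit 12   -> uimm[5]
            | ((parcel >> 4) & 0b111) << 2       # bits 6:4 -> uimm[4:2]
            | ((parcel >> 2) & 0b11) << 6)       # bits 3:2 -> uimm[7:6]
    # LW: imm[11:0] | rs1 = x2 (stack pointer) | funct3 = 010 | rd | opcode = 0000011
    return (uimm << 20) | (2 << 15) | (0b010 << 12) | (rd << 7) | 0b0000011


print(hex(expand_c_lwsp(0x40B2)))  # 0xc12083
```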


## r32insn.md

      
            
          
| inst[6:5] / inst[4:2] | 000 | 001 | 010 | 011 | 100 | 101 | 110 | 111 (>32b) |
|---|---|---|---|---|---|---|---|---|
| 00 | LOAD | LOAD-FP | custom-0 | MISC-MEM | OP-IMM | AUIPC | OP-IMM-32 | 48b |
| 01 | STORE | STORE-FP | custom-1 | AMO | OP | LUI | OP-32 | 64b |
| 10 | MADD | MSUB | NMSUB | NMADD | OP-FP | OP-V | custom-2/rv128 | 48b |
| 11 | BRANCH | JALR | reserved | JAL | SYSTEM | reserved | custom-3/rv128 | ≥80b |
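The map is indexed by inst[6:5] (row) and inst[4:2] (column), and applies only when inst[1:0] = 11. The Python sketch below performs that lookup. The names are hypothetical, and the test words are the standard encodings of `addi x0, x0, 0` (`0x00000013`) and `add x10, x11, x12` (`0x00C58533`).

```python
# MAJOR_OPCODES[inst[6:5]][inst[4:2]]; valid only when inst[1:0] == 0b11.
MAJOR_OPCODES = [
    ["LOAD", "LOAD-FP", "custom-0", "MISC-MEM", "OP-IMM", "AUIPC", "OP-IMM-32", "48b"],
    ["STORE", "STORE-FP", "custom-1", "AMO", "OP", "LUI", "OP-32", "64b"],
    ["MADD", "MSUB", "NMSUB", "NMADD", "OP-FP", "OP-V", "custom-2/rv128", "48b"],
    ["BRANCH", "JALR", "reserved", "JAL", "SYSTEM", "reserved", "custom-3/rv128", ">=80b"],
]


def major_opcode(inst: int) -> str:
    """Name the major opcode of a 32-bit instruction word."""
    if (inst & 0b11) != 0b11:
        return "compressed (16b)"       # quadrants 0-2 belong to RVC
    row = (inst >> 5) & 0b11            # inst[6:5]
    col = (inst >> 2) & 0b111           # inst[4:2]
    return MAJOR_OPCODES[row][col]


print(major_opcode(0x00000013))  # OP-IMM
```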


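| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Format |
|---|---|---|---|---|---|---|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:0] |  | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[12\|10:5] | rs2 | rs1 | funct3 | imm[4:1\|11] | opcode | B-type |
| imm[31:12] |  |  |  | rd | opcode | U-type |
| imm[20\|10:1\|11\|19:12] |  |  |  | rd | opcode | J-type |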


RV32I Base Instruction Set

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[31:12] |  |  |  | rd | 0110111 | LUI |
| imm[31:12] |  |  |  | rd | 0010111 | AUIPC |
| imm[20\|10:1\|11\|19:12] |  |  |  | rd | 1101111 | JAL |
| imm[11:0] |  | rs1 | 000 | rd | 1100111 | JALR |
| imm[12\|10:5] | rs2 | rs1 | 000 | imm[4:1\|11] | 1100011 | BEQ |
| imm[12\|10:5] | rs2 | rs1 | 001 | imm[4:1\|11] | 1100011 | BNE |
| imm[12\|10:5] | rs2 | rs1 | 100 | imm[4:1\|11] | 1100011 | BLT |
| imm[12\|10:5] | rs2 | rs1 | 101 | imm[4:1\|11] | 1100011 | BGE |
| imm[12\|10:5] | rs2 | rs1 | 110 | imm[4:1\|11] | 1100011 | BLTU |
| imm[12\|10:5] | rs2 | rs1 | 111 | imm[4:1\|11] | 1100011 | BGEU |
| imm[11:0] |  | rs1 | 000 | rd | 0000011 | LB |
| imm[11:0] |  | rs1 | 001 | rd | 0000011 | LH |
| imm[11:0] |  | rs1 | 010 | rd | 0000011 | LW |
| imm[11:0] |  | rs1 | 100 | rd | 0000011 | LBU |
| imm[11:0] |  | rs1 | 101 | rd | 0000011 | LHU |
| imm[11:5] | rs2 | rs1 | 000 | imm[4:0] | 0100011 | SB |
| imm[11:5] | rs2 | rs1 | 001 | imm[4:0] | 0100011 | SH |
| imm[11:5] | rs2 | rs1 | 010 | imm[4:0] | 0100011 | SW |
| imm[11:0] |  | rs1 | 000 | rd | 0010011 | ADDI |
| imm[11:0] |  | rs1 | 010 | rd | 0010011 | SLTI |
| imm[11:0] |  | rs1 | 011 | rd | 0010011 | SLTIU |
| imm[11:0] |  | rs1 | 100 | rd | 0010011 | XORI |
| imm[11:0] |  | rs1 | 110 | rd | 0010011 | ORI |
| imm[11:0] |  | rs1 | 111 | rd | 0010011 | ANDI |
| 0000000 | shamt | rs1 | 001 | rd | 0010011 | SLLI |
| 0000000 | shamt | rs1 | 101 | rd | 0010011 | SRLI |
| 0100000 | shamt | rs1 | 101 | rd | 0010011 | SRAI |
| 0000000 | rs2 | rs1 | 000 | rd | 0110011 | ADD |
| 0100000 | rs2 | rs1 | 000 | rd | 0110011 | SUB |
| 0000000 | rs2 | rs1 | 001 | rd | 0110011 | SLL |
| 0000000 | rs2 | rs1 | 010 | rd | 0110011 | SLT |
| 0000000 | rs2 | rs1 | 011 | rd | 0110011 | SLTU |
| 0000000 | rs2 | rs1 | 100 | rd | 0110011 | XOR |
| 0000000 | rs2 | rs1 | 101 | rd | 0110011 | SRL |
| 0100000 | rs2 | rs1 | 101 | rd | 0110011 | SRA |
| 0000000 | rs2 | rs1 | 110 | rd | 0110011 | OR |
| 0000000 | rs2 | rs1 | 111 | rd | 0110011 | AND |
| fm \| pred \| succ |  | rs1 | 000 | rd | 0001111 | FENCE |
| 1000 \| 0011 \| 0011 |  | 00000 | 000 | 00000 | 0001111 | FENCE.TSO |
| 0000 \| 0001 \| 0000 |  | 00000 | 000 | 00000 | 0001111 | PAUSE |
| 000000000000 |  | 00000 | 000 | 00000 | 1110011 | ECALL |
| 000000000001 |  | 00000 | 000 | 00000 | 1110011 | EBREAK |
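The R-type rows reduce to shifting six fields into place. A minimal Python packer (hypothetical name `encode_r`) is sketched below. It reproduces the well-known words for `add x10, x11, x12` and `sub x10, x11, x12` (a0, a1, a2 in the standard ABI); only funct7 bit 30 separates the two.

```python
def encode_r(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack R-type fields into a 32-bit instruction word."""
    return ((funct7 & 0x7F) << 25 | (rs2 & 0x1F) << 20 | (rs1 & 0x1F) << 15
            | (funct3 & 0x7) << 12 | (rd & 0x1F) << 7 | (opcode & 0x7F))


OP = 0b0110011
add_word = encode_r(0b0000000, 12, 11, 0b000, 10, OP)  # add x10, x11, x12
sub_word = encode_r(0b0100000, 12, 11, 0b000, 10, OP)  # sub x10, x11, x12
print(hex(add_word), hex(sub_word))  # 0xc58533 0x40c58533
```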




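RV64I Base Instruction Set (in addition to RV32I)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[11:0] |  | rs1 | 110 | rd | 0000011 | LWU |
| imm[11:0] |  | rs1 | 011 | rd | 0000011 | LD |
| imm[11:5] | rs2 | rs1 | 011 | imm[4:0] | 0100011 | SD |
| 000000 \| shamt[5] | shamt[4:0] | rs1 | 001 | rd | 0010011 | SLLI |
| 000000 \| shamt[5] | shamt[4:0] | rs1 | 101 | rd | 0010011 | SRLI |
| 010000 \| shamt[5] | shamt[4:0] | rs1 | 101 | rd | 0010011 | SRAI |
| imm[11:0] |  | rs1 | 000 | rd | 0011011 | ADDIW |
| 0000000 | shamt | rs1 | 001 | rd | 0011011 | SLLIW |
| 0000000 | shamt | rs1 | 101 | rd | 0011011 | SRLIW |
| 0100000 | shamt | rs1 | 101 | rd | 0011011 | SRAIW |
| 0000000 | rs2 | rs1 | 000 | rd | 0111011 | ADDW |
| 0100000 | rs2 | rs1 | 000 | rd | 0111011 | SUBW |
| 0000000 | rs2 | rs1 | 001 | rd | 0111011 | SLLW |
| 0000000 | rs2 | rs1 | 101 | rd | 0111011 | SRLW |
| 0100000 | rs2 | rs1 | 101 | rd | 0111011 | SRAW |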
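In RV64I the immediate shifts take a 6-bit shamt in bits 25:20 and leave only a 6-bit funct6 in bits 31:26. The word-sized W variants keep the 5-bit shamt. A minimal Python sketch with a hypothetical helper name:

```python
OP_IMM = 0b0010011


def encode_rv64_shift_imm(funct6: int, shamt: int, rs1: int, funct3: int, rd: int) -> int:
    """RV64I SLLI/SRLI/SRAI: funct6 in bits 31:26, 6-bit shamt in bits 25:20."""
    if not 0 <= shamt < 64:
        raise ValueError("RV64 shift amount must be in 0..63")
    return (funct6 << 26) | (shamt << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OP_IMM


slli = encode_rv64_shift_imm(0b000000, 32, 10, 0b001, 10)  # slli x10, x10, 32
srai = encode_rv64_shift_imm(0b010000, 32, 10, 0b101, 10)  # srai x10, x10, 32
print(hex(slli), hex(srai))  # 0x2051513 0x42055513
```

A shift amount of 32 sets bit 25, which RV32I's SLLI layout would treat as part of funct7. This is why the RV32I and RV64I rows differ.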


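RV32/RV64 Zifencei Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[11:0] |  | rs1 | 001 | rd | 0001111 | FENCE.I |

RV32/RV64 Zicsr Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| csr |  | rs1 | 001 | rd | 1110011 | CSRRW |
| csr |  | rs1 | 010 | rd | 1110011 | CSRRS |
| csr |  | rs1 | 011 | rd | 1110011 | CSRRC |
| csr |  | uimm | 101 | rd | 1110011 | CSRRWI |
| csr |  | uimm | 110 | rd | 1110011 | CSRRSI |
| csr |  | uimm | 111 | rd | 1110011 | CSRRCI |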
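The Zicsr rows reuse the I-type layout, with the 12-bit CSR address in place of the immediate. For the immediate forms, bits 19:15 hold a 5-bit zero-extended uimm instead of rs1. In the Python sketch below, the helper name is hypothetical, and the `mstatus` address 0x300 comes from the privileged specification, not from this listing.

```python
SYSTEM = 0b1110011
MSTATUS = 0x300  # assumption: mstatus address from the privileged spec


def encode_csr(csr: int, rs1_or_uimm: int, funct3: int, rd: int) -> int:
    """Pack a Zicsr instruction: csr[11:0] | rs1 or uimm | funct3 | rd | SYSTEM."""
    return ((csr & 0xFFF) << 20 | (rs1_or_uimm & 0x1F) << 15
            | (funct3 & 0x7) << 12 | (rd & 0x1F) << 7 | SYSTEM)


csrr = encode_csr(MSTATUS, 0, 0b010, 10)   # CSRRS x10, mstatus, x0 (read only)
csrwi = encode_csr(MSTATUS, 8, 0b101, 0)   # CSRRWI x0, mstatus, 8
print(hex(csrr), hex(csrwi))  # 0x30002573 0x30045073
```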


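RV32M Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 0000001 | rs2 | rs1 | 000 | rd | 0110011 | MUL |
| 0000001 | rs2 | rs1 | 001 | rd | 0110011 | MULH |
| 0000001 | rs2 | rs1 | 010 | rd | 0110011 | MULHSU |
| 0000001 | rs2 | rs1 | 011 | rd | 0110011 | MULHU |
| 0000001 | rs2 | rs1 | 100 | rd | 0110011 | DIV |
| 0000001 | rs2 | rs1 | 101 | rd | 0110011 | DIVU |
| 0000001 | rs2 | rs1 | 110 | rd | 0110011 | REM |
| 0000001 | rs2 | rs1 | 111 | rd | 0110011 | REMU |

RV64M Standard Extension (in addition to RV32M)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 0000001 | rs2 | rs1 | 000 | rd | 0111011 | MULW |
| 0000001 | rs2 | rs1 | 100 | rd | 0111011 | DIVW |
| 0000001 | rs2 | rs1 | 101 | rd | 0111011 | DIVUW |
| 0000001 | rs2 | rs1 | 110 | rd | 0111011 | REMW |
| 0000001 | rs2 | rs1 | 111 | rd | 0111011 | REMUW |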




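RV32A Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 00010 \| aq \| rl | 00000 | rs1 | 010 | rd | 0101111 | LR.W |
| 00011 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | SC.W |
| 00001 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOSWAP.W |
| 00000 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOADD.W |
| 00100 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOXOR.W |
| 01100 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOAND.W |
| 01000 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOOR.W |
| 10000 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOMIN.W |
| 10100 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOMAX.W |
| 11000 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOMINU.W |
| 11100 \| aq \| rl | rs2 | rs1 | 010 | rd | 0101111 | AMOMAXU.W |

RV64A Standard Extension (in addition to RV32A)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 00010 \| aq \| rl | 00000 | rs1 | 011 | rd | 0101111 | LR.D |
| 00011 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | SC.D |
| 00001 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOSWAP.D |
| 00000 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOADD.D |
| 00100 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOXOR.D |
| 01100 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOAND.D |
| 01000 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOOR.D |
| 10000 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOMIN.D |
| 10100 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOMAX.D |
| 11000 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOMINU.D |
| 11100 \| aq \| rl | rs2 | rs1 | 011 | rd | 0101111 | AMOMAXU.D |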
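In the A-extension rows, the old funct7 is split into funct5, aq (bit 26), and rl (bit 25). funct3 selects the .W (010) or .D (011) width. The following is a minimal Python packer with hypothetical names; the expected words were computed by hand from the rows above.

```python
AMO = 0b0101111
W, D = 0b010, 0b011  # funct3: 32-bit (.W) or 64-bit (.D) operands


def encode_amo(funct5: int, aq: int, rl: int, rs2: int, rs1: int, width: int, rd: int) -> int:
    """Pack an A-extension word: funct5 | aq | rl | rs2 | rs1 | funct3 | rd | AMO."""
    return (funct5 << 27 | aq << 26 | rl << 25 | rs2 << 20 | rs1 << 15
            | width << 12 | rd << 7 | AMO)


lr_w = encode_amo(0b00010, 0, 0, 0, 10, W, 11)           # lr.w x11, (x10)
amoswap_w_aq = encode_amo(0b00001, 1, 0, 11, 12, W, 10)  # amoswap.w.aq x10, x11, (x12)
amoadd_d = encode_amo(0b00000, 0, 0, 11, 12, D, 10)      # amoadd.d x10, x11, (x12)
print(hex(lr_w), hex(amoswap_w_aq), hex(amoadd_d))
```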


| 31–27 | 26–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Format |
|---|---|---|---|---|---|---|---|
| rs3 | funct2 | rs2 | rs1 | funct3 | rd | opcode | R4-type |


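RV32F Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[11:0] |  | rs1 | 010 | rd | 0000111 | FLW |
| imm[11:5] | rs2 | rs1 | 010 | imm[4:0] | 0100111 | FSW |
| rs3 \| 00 | rs2 | rs1 | rm | rd | 1000011 | FMADD.S |
| rs3 \| 00 | rs2 | rs1 | rm | rd | 1000111 | FMSUB.S |
| rs3 \| 00 | rs2 | rs1 | rm | rd | 1001011 | FNMSUB.S |
| rs3 \| 00 | rs2 | rs1 | rm | rd | 1001111 | FNMADD.S |
| 0000000 | rs2 | rs1 | rm | rd | 1010011 | FADD.S |
| 0000100 | rs2 | rs1 | rm | rd | 1010011 | FSUB.S |
| 0001000 | rs2 | rs1 | rm | rd | 1010011 | FMUL.S |
| 0001100 | rs2 | rs1 | rm | rd | 1010011 | FDIV.S |
| 0101100 | 00000 | rs1 | rm | rd | 1010011 | FSQRT.S |
| 0010000 | rs2 | rs1 | 000 | rd | 1010011 | FSGNJ.S |
| 0010000 | rs2 | rs1 | 001 | rd | 1010011 | FSGNJN.S |
| 0010000 | rs2 | rs1 | 010 | rd | 1010011 | FSGNJX.S |
| 0010100 | rs2 | rs1 | 000 | rd | 1010011 | FMIN.S |
| 0010100 | rs2 | rs1 | 001 | rd | 1010011 | FMAX.S |
| 1100000 | 00000 | rs1 | rm | rd | 1010011 | FCVT.W.S |
| 1100000 | 00001 | rs1 | rm | rd | 1010011 | FCVT.WU.S |
| 1110000 | 00000 | rs1 | 000 | rd | 1010011 | FMV.X.W |
| 1010000 | rs2 | rs1 | 010 | rd | 1010011 | FEQ.S |
| 1010000 | rs2 | rs1 | 001 | rd | 1010011 | FLT.S |
| 1010000 | rs2 | rs1 | 000 | rd | 1010011 | FLE.S |
| 1110000 | 00000 | rs1 | 001 | rd | 1010011 | FCLASS.S |
| 1101000 | 00000 | rs1 | rm | rd | 1010011 | FCVT.S.W |
| 1101000 | 00001 | rs1 | rm | rd | 1010011 | FCVT.S.WU |
| 1111000 | 00000 | rs1 | 000 | rd | 1010011 | FMV.W.X |

RV64F Standard Extension (in addition to RV32F)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 1100000 | 00010 | rs1 | rm | rd | 1010011 | FCVT.L.S |
| 1100000 | 00011 | rs1 | rm | rd | 1010011 | FCVT.LU.S |
| 1101000 | 00010 | rs1 | rm | rd | 1010011 | FCVT.S.L |
| 1101000 | 00011 | rs1 | rm | rd | 1010011 | FCVT.S.LU |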




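RV32D Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[11:0] |  | rs1 | 011 | rd | 0000111 | FLD |
| imm[11:5] | rs2 | rs1 | 011 | imm[4:0] | 0100111 | FSD |
| rs3 \| 01 | rs2 | rs1 | rm | rd | 1000011 | FMADD.D |
| rs3 \| 01 | rs2 | rs1 | rm | rd | 1000111 | FMSUB.D |
| rs3 \| 01 | rs2 | rs1 | rm | rd | 1001011 | FNMSUB.D |
| rs3 \| 01 | rs2 | rs1 | rm | rd | 1001111 | FNMADD.D |
| 0000001 | rs2 | rs1 | rm | rd | 1010011 | FADD.D |
| 0000101 | rs2 | rs1 | rm | rd | 1010011 | FSUB.D |
| 0001001 | rs2 | rs1 | rm | rd | 1010011 | FMUL.D |
| 0001101 | rs2 | rs1 | rm | rd | 1010011 | FDIV.D |
| 0101101 | 00000 | rs1 | rm | rd | 1010011 | FSQRT.D |
| 0010001 | rs2 | rs1 | 000 | rd | 1010011 | FSGNJ.D |
| 0010001 | rs2 | rs1 | 001 | rd | 1010011 | FSGNJN.D |
| 0010001 | rs2 | rs1 | 010 | rd | 1010011 | FSGNJX.D |
| 0010101 | rs2 | rs1 | 000 | rd | 1010011 | FMIN.D |
| 0010101 | rs2 | rs1 | 001 | rd | 1010011 | FMAX.D |
| 0100000 | 00001 | rs1 | rm | rd | 1010011 | FCVT.S.D |
| 0100001 | 00000 | rs1 | rm | rd | 1010011 | FCVT.D.S |
| 1010001 | rs2 | rs1 | 010 | rd | 1010011 | FEQ.D |
| 1010001 | rs2 | rs1 | 001 | rd | 1010011 | FLT.D |
| 1010001 | rs2 | rs1 | 000 | rd | 1010011 | FLE.D |
| 1110001 | 00000 | rs1 | 001 | rd | 1010011 | FCLASS.D |
| 1100001 | 00000 | rs1 | rm | rd | 1010011 | FCVT.W.D |
| 1100001 | 00001 | rs1 | rm | rd | 1010011 | FCVT.WU.D |
| 1101001 | 00000 | rs1 | rm | rd | 1010011 | FCVT.D.W |
| 1101001 | 00001 | rs1 | rm | rd | 1010011 | FCVT.D.WU |

RV64D Standard Extension (in addition to RV32D)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 1100001 | 00010 | rs1 | rm | rd | 1010011 | FCVT.L.D |
| 1100001 | 00011 | rs1 | rm | rd | 1010011 | FCVT.LU.D |
| 1110001 | 00000 | rs1 | 000 | rd | 1010011 | FMV.X.D |
| 1101001 | 00010 | rs1 | rm | rd | 1010011 | FCVT.D.L |
| 1101001 | 00011 | rs1 | rm | rd | 1010011 | FCVT.D.LU |
| 1111001 | 00000 | rs1 | 000 | rd | 1010011 | FMV.D.X |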




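RV32Q Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[11:0] |  | rs1 | 100 | rd | 0000111 | FLQ |
| imm[11:5] | rs2 | rs1 | 100 | imm[4:0] | 0100111 | FSQ |
| rs3 \| 11 | rs2 | rs1 | rm | rd | 1000011 | FMADD.Q |
| rs3 \| 11 | rs2 | rs1 | rm | rd | 1000111 | FMSUB.Q |
| rs3 \| 11 | rs2 | rs1 | rm | rd | 1001011 | FNMSUB.Q |
| rs3 \| 11 | rs2 | rs1 | rm | rd | 1001111 | FNMADD.Q |
| 0000011 | rs2 | rs1 | rm | rd | 1010011 | FADD.Q |
| 0000111 | rs2 | rs1 | rm | rd | 1010011 | FSUB.Q |
| 0001011 | rs2 | rs1 | rm | rd | 1010011 | FMUL.Q |
| 0001111 | rs2 | rs1 | rm | rd | 1010011 | FDIV.Q |
| 0101111 | 00000 | rs1 | rm | rd | 1010011 | FSQRT.Q |
| 0010011 | rs2 | rs1 | 000 | rd | 1010011 | FSGNJ.Q |
| 0010011 | rs2 | rs1 | 001 | rd | 1010011 | FSGNJN.Q |
| 0010011 | rs2 | rs1 | 010 | rd | 1010011 | FSGNJX.Q |
| 0010111 | rs2 | rs1 | 000 | rd | 1010011 | FMIN.Q |
| 0010111 | rs2 | rs1 | 001 | rd | 1010011 | FMAX.Q |
| 0100000 | 00011 | rs1 | rm | rd | 1010011 | FCVT.S.Q |
| 0100011 | 00000 | rs1 | rm | rd | 1010011 | FCVT.Q.S |
| 0100001 | 00011 | rs1 | rm | rd | 1010011 | FCVT.D.Q |
| 0100011 | 00001 | rs1 | rm | rd | 1010011 | FCVT.Q.D |
| 1010011 | rs2 | rs1 | 010 | rd | 1010011 | FEQ.Q |
| 1010011 | rs2 | rs1 | 001 | rd | 1010011 | FLT.Q |
| 1010011 | rs2 | rs1 | 000 | rd | 1010011 | FLE.Q |
| 1110011 | 00000 | rs1 | 001 | rd | 1010011 | FCLASS.Q |
| 1100011 | 00000 | rs1 | rm | rd | 1010011 | FCVT.W.Q |
| 1100011 | 00001 | rs1 | rm | rd | 1010011 | FCVT.WU.Q |
| 1101011 | 00000 | rs1 | rm | rd | 1010011 | FCVT.Q.W |
| 1101011 | 00001 | rs1 | rm | rd | 1010011 | FCVT.Q.WU |

RV64Q Standard Extension (in addition to RV32Q)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 1100011 | 00010 | rs1 | rm | rd | 1010011 | FCVT.L.Q |
| 1100011 | 00011 | rs1 | rm | rd | 1010011 | FCVT.LU.Q |
| 1101011 | 00010 | rs1 | rm | rd | 1010011 | FCVT.Q.L |
| 1101011 | 00011 | rs1 | rm | rd | 1010011 | FCVT.Q.LU |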




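RV32Zfh Standard Extension

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| imm[11:0] |  | rs1 | 001 | rd | 0000111 | FLH |
| imm[11:5] | rs2 | rs1 | 001 | imm[4:0] | 0100111 | FSH |
| rs3 \| 10 | rs2 | rs1 | rm | rd | 1000011 | FMADD.H |
| rs3 \| 10 | rs2 | rs1 | rm | rd | 1000111 | FMSUB.H |
| rs3 \| 10 | rs2 | rs1 | rm | rd | 1001011 | FNMSUB.H |
| rs3 \| 10 | rs2 | rs1 | rm | rd | 1001111 | FNMADD.H |
| 0000010 | rs2 | rs1 | rm | rd | 1010011 | FADD.H |
| 0000110 | rs2 | rs1 | rm | rd | 1010011 | FSUB.H |
| 0001010 | rs2 | rs1 | rm | rd | 1010011 | FMUL.H |
| 0001110 | rs2 | rs1 | rm | rd | 1010011 | FDIV.H |
| 0101110 | 00000 | rs1 | rm | rd | 1010011 | FSQRT.H |
| 0010010 | rs2 | rs1 | 000 | rd | 1010011 | FSGNJ.H |
| 0010010 | rs2 | rs1 | 001 | rd | 1010011 | FSGNJN.H |
| 0010010 | rs2 | rs1 | 010 | rd | 1010011 | FSGNJX.H |
| 0010110 | rs2 | rs1 | 000 | rd | 1010011 | FMIN.H |
| 0010110 | rs2 | rs1 | 001 | rd | 1010011 | FMAX.H |
| 0100000 | 00010 | rs1 | rm | rd | 1010011 | FCVT.S.H |
| 0100010 | 00000 | rs1 | rm | rd | 1010011 | FCVT.H.S |
| 0100001 | 00010 | rs1 | rm | rd | 1010011 | FCVT.D.H |
| 0100010 | 00001 | rs1 | rm | rd | 1010011 | FCVT.H.D |
| 0100011 | 00010 | rs1 | rm | rd | 1010011 | FCVT.Q.H |
| 0100010 | 00011 | rs1 | rm | rd | 1010011 | FCVT.H.Q |
| 1010010 | rs2 | rs1 | 010 | rd | 1010011 | FEQ.H |
| 1010010 | rs2 | rs1 | 001 | rd | 1010011 | FLT.H |
| 1010010 | rs2 | rs1 | 000 | rd | 1010011 | FLE.H |
| 1110010 | 00000 | rs1 | 001 | rd | 1010011 | FCLASS.H |
| 1100010 | 00000 | rs1 | rm | rd | 1010011 | FCVT.W.H |
| 1100010 | 00001 | rs1 | rm | rd | 1010011 | FCVT.WU.H |
| 1110010 | 00000 | rs1 | 000 | rd | 1010011 | FMV.X.H |
| 1101010 | 00000 | rs1 | rm | rd | 1010011 | FCVT.H.W |
| 1101010 | 00001 | rs1 | rm | rd | 1010011 | FCVT.H.WU |
| 1111010 | 00000 | rs1 | 000 | rd | 1010011 | FMV.H.X |

RV64Zfh Standard Extension (in addition to RV32Zfh)

| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Instruction |
|---|---|---|---|---|---|---|
| 1100010 | 00010 | rs1 | rm | rd | 1010011 | FCVT.L.H |
| 1100010 | 00011 | rs1 | rm | rd | 1010011 | FCVT.LU.H |
| 1101010 | 00010 | rs1 | rm | rd | 1010011 | FCVT.H.L |
| 1101010 | 00011 | rs1 | rm | rd | 1010011 | FCVT.H.LU |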


Instruction listing for RISC-V
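Comparing the F, D, Zfh, and Q tables row by row shows a regular pattern. The low two bits of funct7 (the funct2 field in R4-type) select the format: 00 = S, 01 = D, 10 = H, 11 = Q. For float-to-float FCVT, funct7 carries the destination format and rs2 carries the source format. This pattern is read off the tables above, not stated by them. The Python sketch below (hypothetical names) encodes it and checks it against a few rows.

```python
FMT = {"S": 0b00, "D": 0b01, "H": 0b10, "Q": 0b11}

# funct7 with the two fmt bits cleared, taken from the .S rows of the tables.
BASE_FUNCT7 = {
    "FADD": 0b0000000, "FSUB": 0b0000100, "FMUL": 0b0001000, "FDIV": 0b0001100,
    "FSQRT": 0b0101100, "FSGNJ": 0b0010000, "FMIN": 0b0010100,
    "FEQ": 0b1010000, "FCLASS": 0b1110000,
}


def fp_funct7(op: str, fmt: str) -> int:
    """funct7 for an OP-FP instruction: base operation code OR'd with the fmt bits."""
    return BASE_FUNCT7[op] | FMT[fmt]


def fcvt_float_to_float(dst: str, src: str):
    """(funct7, rs2) for FCVT.<dst>.<src>: 0100000 | dst fmt, and rs2 = src fmt."""
    return 0b0100000 | FMT[dst], FMT[src]


print(bin(fp_funct7("FADD", "D")), fcvt_float_to_float("S", "D"))  # 0b1 (32, 1)
```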

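| 31–25 | 24–20 | 19–15 | 14–12 | 11–7 | 6–0 | Format |
|---|---|---|---|---|---|---|
| funct7 | rs2 | rs1 | funct3 | rd | opcode | R-type |
| imm[11:0] |  | rs1 | funct3 | rd | opcode | I-type |
| imm[11:5] | rs2 | rs1 | funct3 | imm[4:0] | opcode | S-type |
| imm[31:12] |  |  |  | rd | opcode | U-type |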

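| Immediate | 31 | 30–20 | 19–12 | 11 | 10–5 | 4–1 | 0 |
|---|---|---|---|---|---|---|---|
| I-immediate | inst[31] | inst[31] | inst[31] | inst[31] | inst[30:25] | inst[24:21] | inst[20] |
| S-immediate | inst[31] | inst[31] | inst[31] | inst[31] | inst[30:25] | inst[11:8] | inst[7] |
| B-immediate | inst[31] | inst[31] | inst[31] | inst[7] | inst[30:25] | inst[11:8] | 0 |
| U-immediate | inst[31] | inst[30:20] | inst[19:12] | 0 | 0 | 0 | 0 |
| J-immediate | inst[31] | inst[31] | inst[19:12] | inst[20] | inst[30:25] | inst[24:21] | 0 |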
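Reading each row of the immediate table from right to left gives the following Python decoder sketch. The function names are hypothetical. The test words were assembled by hand, for example `0xFFF00513` is `addi x10, x0, -1` and `0x00B50463` is `beq x10, x11, +8`.

```python
def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value`."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def imm_i(inst: int) -> int:
    return sext(inst >> 20, 12)                                   # inst[31:20]


def imm_s(inst: int) -> int:
    return sext(((inst >> 25) << 5) | ((inst >> 7) & 0x1F), 12)   # inst[31:25], inst[11:7]


def imm_b(inst: int) -> int:
    return sext(((inst >> 31) & 1) << 12        # inst[31]    -> imm[12]
                | ((inst >> 7) & 1) << 11       # inst[7]     -> imm[11]
                | ((inst >> 25) & 0x3F) << 5    # inst[30:25] -> imm[10:5]
                | ((inst >> 8) & 0xF) << 1, 13) # inst[11:8]  -> imm[4:1]


def imm_u(inst: int) -> int:
    return sext(inst & 0xFFFFF000, 32)                            # inst[31:12], low 12 bits zero


def imm_j(inst: int) -> int:
    return sext(((inst >> 31) & 1) << 20        # inst[31]    -> imm[20]
                | ((inst >> 12) & 0xFF) << 12   # inst[19:12] -> imm[19:12]
                | ((inst >> 20) & 1) << 11      # inst[20]    -> imm[11]
                | ((inst >> 21) & 0x3FF) << 1, 21)  # inst[30:21] -> imm[10:1]


print(imm_i(0xFFF00513), imm_b(0x00B50463))  # -1 8
```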
| imm[11:0] (12) | rs1 (5) | funct3 (3) | rd (5) | opcode (7) |
|---|---|---|---|---|
| I-immediate[11:0] | src | ADDI/SLTI[U] | dest | OP-IMM |
| I-immediate[11:0] | src | ANDI/ORI/XORI | dest | OP-IMM |
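| imm[11:5] (7) | imm[4:0] (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7) |
|---|---|---|---|---|---|
| 0000000 | shamt[4:0] | src | SLLI | dest | OP-IMM |
| 0000000 | shamt[4:0] | src | SRLI | dest | OP-IMM |
| 0100000 | shamt[4:0] | src | SRAI | dest | OP-IMM |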
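| funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7) |
|---|---|---|---|---|---|
| 0000000 | src2 | src1 | ADD/SLT[U] | dest | OP |
| 0000000 | src2 | src1 | AND/OR/XOR | dest | OP |
| 0000000 | src2 | src1 | SLL/SRL | dest | OP |
| 0100000 | src2 | src1 | SUB/SRA | dest | OP |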
| rd is `x1`/`x5` | rs1 is `x1`/`x5` | rd=rs1 | RAS action |
|---|---|---|---|
| No | No | – | None |
| No | Yes | – | Pop |
| Yes | No | – | Push |
| Yes | Yes | No | Pop, then push |
| Yes | Yes | Yes | Push |
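The return-address-stack hint table for JALR maps directly onto a small function. The Python sketch below uses a hypothetical name. It treats `x1` and `x5` as the link registers, exactly as the table's column headings do.

```python
LINK_REGISTERS = {1, 5}  # x1 and x5


def ras_action(rd: int, rs1: int) -> str:
    """RAS action hinted by a JALR with the given rd and rs1 register numbers."""
    rd_link = rd in LINK_REGISTERS
    rs1_link = rs1 in LINK_REGISTERS
    if not rd_link and not rs1_link:
        return "None"
    if not rd_link:
        return "Pop"                    # e.g. jalr x0, 0(x1), a function return
    if not rs1_link:
        return "Push"                   # e.g. jalr x1, 0(x10), an indirect call
    return "Push" if rd == rs1 else "Pop, then push"


print(ras_action(0, 1), ras_action(1, 5))  # Pop Pop, then push
```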
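| imm[12\|10:5] (7) | rs2 (5) | rs1 (5) | funct3 (3) | imm[4:1\|11] (5) | opcode (7) |
|---|---|---|---|---|---|
| offset[12\|10:5] | src2 | src1 | BEQ/BNE | offset[11\|4:1] | BRANCH |
| offset[12\|10:5] | src2 | src1 | BLT[U] | offset[11\|4:1] | BRANCH |
| offset[12\|10:5] | src2 | src1 | BGE[U] | offset[11\|4:1] | BRANCH |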
| fm field | Mnemonic | Meaning |
|---|---|---|
| 0000 | none | Normal Fence |
| 1000 | TSO | With FENCE RW,RW: exclude write-to-read ordering. Otherwise: reserved for future use. |
| other |  | Reserved for future use. |
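The fm values above, together with the FENCE.TSO and PAUSE rows of the RV32I listing, can be reproduced with a small encoder. In the Python sketch below, the order of the I, O, R, W flags inside the 4-bit pred and succ fields is taken from the FENCE description in the base ISA chapter. It is consistent with FENCE.TSO using 0011 (RW) and PAUSE using pred = 0001 (W). The names are hypothetical.

```python
# Flag bits inside the 4-bit pred and succ fields (I, O, R, W from high to low).
PI, PO, PR, PW = 0b1000, 0b0100, 0b0010, 0b0001
MISC_MEM = 0b0001111


def encode_fence(pred: int, succ: int, fm: int = 0b0000, rs1: int = 0, rd: int = 0) -> int:
    """Pack a FENCE word: fm | pred | succ | rs1 | funct3 = 000 | rd | MISC-MEM."""
    return (fm << 28) | (pred << 24) | (succ << 20) | (rs1 << 15) | (rd << 7) | MISC_MEM


fence_rw_rw = encode_fence(PR | PW, PR | PW)            # fm = 0000: normal fence
fence_tso = encode_fence(PR | PW, PR | PW, fm=0b1000)   # fm = 1000 with RW,RW
pause = encode_fence(PW, 0)                             # pred = W, succ = 0
print(hex(fence_rw_rw), hex(fence_tso), hex(pause))     # 0x330000f 0x8330000f 0x100000f
```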
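| Instruction | Constraints | Code Points | Purpose |
|---|---|---|---|
| LUI | rd=`x0` | 2²⁰ | Reserved for future standard use |
| AUIPC | rd=`x0` | 2²⁰ | Reserved for future standard use |
| ADDI | rd=`x0`, and either rs1≠`x0` or imm≠0 | 2¹⁷ − 1 | Reserved for future standard use |
| ANDI | rd=`x0` | 2¹⁷ | Reserved for future standard use |
| ORI | rd=`x0` | 2¹⁷ | Reserved for future standard use |
| XORI | rd=`x0` | 2¹⁷ | Reserved for future standard use |
| ADD | rd=`x0`, rs1≠`x0` | 2¹⁰ − 32 | Reserved for future standard use |
| ADD | rd=`x0`, rs1=`x0`, rs2≠`x2`–`x5` | 28 | Reserved for future standard use |
| ADD | rd=`x0`, rs1=`x0`, rs2=`x2`–`x5` | 4 | (rs2=`x2`) NTL.P1; (rs2=`x3`) NTL.PALL; (rs2=`x4`) NTL.S1; (rs2=`x5`) NTL.ALL |
| SUB | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| AND | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| OR | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| XOR | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| SLL | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| SRL | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| SRA | rd=`x0` | 2¹⁰ | Reserved for future standard use |
| FENCE | rd=`x0`, rs1≠`x0`, fm=0, and either pred=0 or succ=0 | 2¹⁰ − 63 | Reserved for future standard use |
| FENCE | rd≠`x0`, rs1=`x0`, fm=0, and either pred=0 or succ=0 | 2¹⁰ − 63 | Reserved for future standard use |
| FENCE | rd=rs1=`x0`, fm=0, pred=0, succ≠0 | 15 | Reserved for future standard use |
| FENCE | rd=rs1=`x0`, fm=0, pred≠W, succ=0 | 15 | Reserved for future standard use |
| FENCE | rd=rs1=`x0`, fm=0, pred=W, succ=0 | 1 | PAUSE |
| SLTI | rd=`x0` | 2¹⁷ | Designated for custom use |
| SLTIU | rd=`x0` | 2¹⁷ | Designated for custom use |
| SLLI | rd=`x0` | 2¹⁰ | Designated for custom use |
| SRLI | rd=`x0` | 2¹⁰ | Designated for custom use |
| SRAI | rd=`x0` | 2¹⁰ | Designated for custom use |
| SLT | rd=`x0` | 2¹⁰ | Designated for custom use |
| SLTU | rd=`x0` | 2¹⁰ | Designated for custom use |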
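Part of the RV32I HINT table can be turned into a classifier. The following is a minimal Python sketch with a hypothetical name. It covers only the LUI/AUIPC, OP-IMM, and OP rows (not FENCE), treats `addi x0, x0, 0` as the canonical NOP rather than a HINT, and assumes the input is already a valid RV32I encoding.

```python
def rv32i_hint_class(inst: int):
    """Return "standard", "custom", "NTL", or None for a 32-bit RV32I word."""
    opcode = inst & 0x7F
    rd = (inst >> 7) & 0x1F
    funct3 = (inst >> 12) & 0b111
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    funct7 = (inst >> 25) & 0x7F
    if rd != 0:
        return None                                   # every row requires rd = x0
    if opcode in (0b0110111, 0b0010111):              # LUI, AUIPC
        return "standard"
    if opcode == 0b0010011:                           # OP-IMM
        if inst == 0x00000013:                        # addi x0, x0, 0 is the NOP
            return None
        if funct3 in (0b001, 0b010, 0b011, 0b101):    # SLLI, SLTI, SLTIU, SRLI/SRAI
            return "custom"
        return "standard"                             # ADDI, XORI, ORI, ANDI
    if opcode == 0b0110011:                           # OP
        if funct7 == 0 and funct3 == 0 and rs1 == 0 and 2 <= rs2 <= 5:
            return "NTL"                              # add x0, x0, x2..x5
        if funct3 in (0b010, 0b011):                  # SLT, SLTU
            return "custom"
        return "standard"                             # ADD, SUB, AND, OR, XOR, SLL, SRL, SRA
    return None


print(rv32i_hint_class(0x00200033))  # NTL (the NTL.P1 encoding)
```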
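| funct7 (7) | rs2 (5) | rs1 (5) | funct3 (3) | rd (5) | opcode (7) |
|---|---|---|---|---|---|
| MULDIV | multiplier | multiplicand | MUL/MULH[[S]U] | dest | OP |
| MULDIV | multiplier | multiplicand | MULW | dest | OP-32 |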