sorear/capnotes.txt Secret

## capnotes.txt
1.1. CHERI Concepts and Terminology

Paragraphs 2-4 discuss CHERI mechanisms in an abstract sense, but are not very
meaningful until capabilities as unforgeable tokens are introduced.

Introduction mentions "temporal memory protection" but the temporal mechanisms
from ISAv9 appear to be absent.

1.2. CHERI Extensions to RISC-V

Extension names are bogus (_ is not a legal character, Zc* should be a
compressed extension, Zcheri_pte should be a S* extension because it is only
visible in the privileged architecture).

1.3. Risks and Known Uncertainty

XLENMAX used before definition.

1.3.2. Incompatible Extensions

AFAICT, landing pads are completely orthogonal to CHERI and mitigate certain
type confusion vulnerabilities that CHERI does not.  I see no reason to forbid
their use.

Pointer masking and shadow stacks are used for spatial memory safety and as
such are redundant with CHERI purecap mode, however they are low (hardware)
cost mechanisms that are still useful for hybrid software.  Forbidding
simultaneous hardware support will make it more difficult for hardware vendors
to enable CHERI support in some cases.  Can we simply say that in capability
mode, xSSE is treated as 0 and pointer masking is disabled regardless of other
registers, and shadow stack instructions in legacy mode are authorized by ddc?

1.3.3. Suggested Mnemonic Renaming

P is used by the P extension, in particular PADD.[BH].

C suffix might work, but prefix C is fine.  SC is not ambiguous as long as it
lacks a dotted length.

Chapter 2. Anatomy of Capabilities in Zcheri_purecap

"XLENMAX" as defined is identical to "DXLEN" in the debug spec: "DXLEN: Debug
XLEN, which is the widest XLEN a hart supports, ignoring the current value of
MXL in misa."

2.1.1. Tag

Mention that CLEN-aligned blocks of vector data have tags if Zcheri_vectorcap.

Clarify that if the tag is cleared, a location contains CLEN bits of binary
data uninterpreted by hardware with no restrictions on "reserved" fields, i.e.
it is inappropriate to treat reserved bits in the register file as read-only
zero.

2.1.2. Architectural Permissions (AP)

Clarify that ASR applies while executing with a capability in pcc.

"Reserved encoding" fails to communicate who is responsible for what - what
should hardware do if a reserved encoding exists in memory or is passed to
CBUILDCAP?

2.1.5. Bounds

The spec is expected to serve as a programmer's manual, which means that there
needs to be a concise and comprehensible summary of what combinations of top,
base, and address are representable.  It took me almost an entire day to figure
out from the encoding pseudocode what the rules are, and there should be an
explanation of the data model separate from the encoding.

Clarify that the top can be 2^XLENMAX.

The following is my current best theory about the behavior.

    Every length value L between 0 and 2^XLENMAX (inclusive) determines a
    bounds alignment BA(L) and a representable region alignment RA(L).
    A capability with base B and length L is representable if B % BA = 0,
    L % BA = 0, and -1 <= A / RA - B / RA <= 6; if RA > 2^XLEN/8, the last
    check is skipped.  The CRAM instruction given L rounds L up to a
    representable value, then returns BA(L).

    EXPONENT    LEN[32]     BA[32]  RA[32]  LEN[64]     BA[64]  RA[64]
    embedded    [0,512)     1       128     [0,4K)      1       2K
    0                                       [4K,8K)     8       2K
    1           [512,1K)    8       256     [8K,16K)    16      4K
    2           [1K,2K)     16      512     [16K,32K)   32      8K
    ...
    24          [4G,8G)     64M     2G      [64G,128G)  128M    32G
    ...
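To make the theory above checkable against an implementation, here is a
minimal Python sketch of the three conditions.  It takes BA and RA as inputs
read from the table row containing the length rather than deriving them; the
function name is hypothetical, and this encodes my reading, not the spec's
pseudocode.

```python
def is_representable(base, length, addr, ba, ra, xlen):
    """Check B % BA = 0, L % BA = 0, and -1 <= A/RA - B/RA <= 6.

    ba and ra are BA(L) and RA(L) taken from the table row for `length`.
    Divisions are integer (floor) divisions.
    """
    if base % ba != 0 or length % ba != 0:
        return False
    # The address-window check is skipped when RA > 2^XLEN / 8.
    if ra > (1 << xlen) // 8:
        return True
    delta = addr // ra - base // ra
    return -1 <= delta <= 6


# RV32 row for exponent 2: LEN in [1K, 2K), BA = 16, RA = 512.
in_window = is_representable(0x1000, 0x400, 0x1C00, ba=16, ra=512, xlen=32)
past_window = is_representable(0x1000, 0x400, 0x1E00, ba=16, ra=512, xlen=32)
```

With base 0x1000 the address may move from one RA block below the base up to
six RA blocks above it before the check fails.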

2.1.7. Reserved Bits

Clarify expected behavior if one of these is set by the memory system, and that
they can be used by future specifications.  At least we don't have to worry
about CBUILDCAP setting them.

2.2. Capability Encoding

The graph is close enough to scale to be misleading (AP is not twice as wide as
EF, etc).

This entire section is extremely confusing.  Please pick _one_ of pseudo-code,
tables, and prose, and define all symbols locally (MW is missed).  Explain both
the forward and reverse processes, and do so in terms of the unique expression
of the length as a floating point value; writing the check entirely without
reference to the length is useful in hardware optimization but serves only to
obfuscate here.  Adopt the C convention that X < Y is 1 or 0 and then the
tables can be replaced with ct := (Tc < R) - (Ac < R); cb := (Bc < R) - (Ac <
R).
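As a sketch of the suggested convention (Python here, with int() standing in
for C's 0-or-1 comparison result), the two correction terms reduce to one line
each.  Tc, Bc, Ac, and R are the symbols from the spec section; the function
name is hypothetical.

```python
def corrections(tc, bc, ac, r):
    """Return (ct, cb), treating each comparison as 1 or 0."""
    ct = int(tc < r) - int(ac < r)
    cb = int(bc < r) - int(ac < r)
    return ct, cb


# Every result is -1, 0, or 1, which is what the tables enumerate.
all_values = {v for tc in range(4) for bc in range(4) for ac in range(4)
              for v in corrections(tc, bc, ac, 2)}
```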

I think "arithmetic is signed" should be "arithmetic is unsigned".

If t and b are consistently treated as XLENMAX+2 bit values the overflow
pseudocode can be simplified.

2.4. Representable Limit Check

The encoding depends on the leading 1 of the length, not the exponent.

2.5. Malformed Capability Bounds

General question: are capabilities interpreted as unbounded intervals, or as
sets of modularly reduced addresses?  Should two capabilities be able to exist
which represent the same set of addresses?  Are subset checks, in CTESTSUBSET
and CBUILDCAP, performed using a subinterval check or a set-theoretic subset
check?

A capability with B != T and E = CAP_MAX_E has a length greater than 2^XLENMAX
and contains some addresses more than once.  A capability with B = T != 0 and E
= CAP_MAX_E includes the entire address space as a single interval, and is
set-theoretically equivalent to Infinity.  Neither case can be constructed as
a subinterval of Infinity, but both are set-theoretic subsets of Infinity.
Since they are redundant as sets and not meaningful outside a context where
Infinity is present, I propose to make both malformed.
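The difference between the two readings can be shown by brute force on a toy
address width.  This is only an illustration: XLEN = 8 is chosen so the sets
can be enumerated, capabilities are modelled as (base, length) pairs, and all
names are hypothetical.

```python
XLEN = 8  # toy width, not a real configuration


def address_set(base, length):
    """Addresses covered, reduced modulo 2^XLEN."""
    return {(base + i) % (1 << XLEN) for i in range(length)}


def is_subinterval(inner, outer):
    """Interval reading: compare unreduced base and top."""
    (ib, il), (ob, ol) = inner, outer
    return ob <= ib and ib + il <= ob + ol


def is_subset(inner, outer):
    """Set reading: compare the modularly reduced address sets."""
    return address_set(*inner) <= address_set(*outer)


INFINITY = (0, 1 << XLEN)
shifted_full = (0x40, 1 << XLEN)  # B = T != 0: whole space from a nonzero base
overlong = (0x40, 0x180)          # length > 2^XLEN: some addresses repeat
```

Both shifted_full and overlong pass is_subset against INFINITY and fail
is_subinterval, which is the ambiguity the spec needs to resolve.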

A capability can exist with LEN < 2^XLEN, BASE + LEN >= 2^XLEN.  Such a
capability is capable of representing a nontrivial subset of the address space
and may be useful in some applications, however extremely few environments will
have memory regions spanning the address space wraparound and allowing
nontrivial capabilities to wrap around the address space substantially
complicates subset checks.  I propose to disallow them for simplicity.

Under RV32, a capability with EF = 0 and E = 0 (with length 256 .. 508 in steps
of 4) is redundant with a capability with EF = 1 and T8 = 1 (representing a
length of 256 .. 511 in steps of 1), both as a set and as an interval, and
there is no reason to use the former.  I propose to make it malformed.

Incorporating all of the above checks,

    malformedMSB = (E == CAP_MAX_E   && B[MW-1:MW-2] != 0)
                || (E == CAP_MAX_E-1 && B[MW-1] != 0)
                || (t[XLENMAX] != 0 && t[XLENMAX-1:0] != 0)
    malformedLSB = !EF && (XLEN == 32 ? E <= 0 : E < 0)
    malformed = malformedMSB || malformedLSB
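For experimenting with the proposal, a direct Python transcription of the
check above.  E, EF, B, and t are the decoded fields as used in the
pseudocode; the bits() helper and the parameter names are hypothetical, and
the RV32 values MW = 10 and CAP_MAX_E = 24 in the example are my assumptions
about the spec's constants.

```python
def bits(value, hi, lo):
    """Return value[hi:lo], inclusive on both ends."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def malformed(E, EF, B, t, xlen, xlenmax, mw, cap_max_e):
    malformed_msb = (
        (E == cap_max_e and bits(B, mw - 1, mw - 2) != 0)
        or (E == cap_max_e - 1 and bits(B, mw - 1, mw - 1) != 0)
        # top beyond 2^XLENMAX: bit XLENMAX set and any lower bit set
        or (bits(t, xlenmax, xlenmax) != 0 and bits(t, xlenmax - 1, 0) != 0)
    )
    malformed_lsb = (not EF) and (E <= 0 if xlen == 32 else E < 0)
    return malformed_msb or malformed_lsb


RV32 = dict(xlen=32, xlenmax=32, mw=10, cap_max_e=24)  # assumed constants
infinity_ok = not malformed(E=24, EF=0, B=0, t=1 << 32, **RV32)
```

Under these assumptions Infinity (B = 0, top = 2^32) stays well-formed, while
a top of 2^32 + 4 or a nonzero B[MW-1:MW-2] at E = CAP_MAX_E is rejected.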

Are zero-length bounds well-formed?  They are meaningful as intervals and
useful in applications in a wide variety of environments (as unforgeable
tokens, as an alternative to sealed capabilities) and emerge naturally from
CSETBOUNDS; I think they should remain valid despite representing the empty
set, but this should be called out explicitly.

We need a precise definition of the subset (subinterval?) check performed by
CTESTSUBSET and CBUILDCAP; it should be documented here since checks can be
moved between malformed-bounds and subset.

3. Integrating Zcheri_purecap with the RISC-V Base Integer Instruction Set

It is unclear if Zcheri_purecap can be considered an extension or if it is
better considered a new base ISA.  A privileged architecture has a great deal
of freedom to say what a memory instruction is allowed to do.

3.3.3. Capability Load and Store Instructions

The purpose of load/store address misaligned exceptions is to allow a M-mode
trap handler to perform a minimum of checks before emulating an operation which
would succeed except for alignment.  Since CLC/CSC to a misaligned address is a
fatal exception, it should be generated as an access fault (mcause 5 or 7).

3.3.4. Unconditional Integer Address Jumps

Clarify a specific purpose for this if it has one; it appears to be a relic of
the pre-modal design.  Calls between the M=0 and M=1 worlds can be done with
cjalr and cmodeswitch.

3.4.1. Integer Computational Instructions

auipcc instructions used for addressing generate a value within +-2 KiB of the
target symbol.  For RV32, for pcc lengths no less than 2 KiB and less than
4 KiB, the representable region may not extend the full 2 KiB before the base
of the pcc, and so the auipcc intermediate value may not be representable.
This is not an issue for code segments of less than 2 KiB, as any internal
reference will have an auipcc immediate of 0, which is always representable.
A 2 KiB code segment is always followed by at least 2 KiB of representable
addresses; the only problematic references are those between -2 KiB and
-3 KiB of pcc, which will be rounded to -4 KiB and exceed the representable
region.
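The rounding comes from the usual PCREL_HI20 / PCREL_LO12 split, where the
upper immediate is the offset rounded to the nearest multiple of 4 KiB and the
signed 12-bit low part makes up the difference.  A small Python sketch of the
arithmetic (function name hypothetical):

```python
def pcrel_split(offset):
    """Split a pc-relative offset into (hi20, lo12) as a linker would.

    hi20 is the auipcc immediate in units of 4 KiB; lo12 is the signed
    12-bit immediate of the following instruction.
    """
    hi20 = (offset + 0x800) >> 12  # arithmetic shift, rounds to nearest
    lo12 = offset - (hi20 << 12)
    return hi20, lo12


hi, lo = pcrel_split(-2500)  # a target 2500 bytes before pcc
intermediate = hi << 12      # the auipcc result, relative to pcc
```

For an offset of -2500 the auipcc intermediate lands 4 KiB before pcc, even
though the final address is under 3 KiB away.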

A linker can work around this by adding up to 1 KiB of padding to the beginning
of code segments between 2 KiB and 4 KiB, to a target size of 4 KiB, but this
may result in significant wasted space for some distributions of code segment
sizes.  Alternatively it can be addressed in the ISA by adding a "s2048pcc cd"
instruction which is the same size as an auipcc and can be used instead of
auipcc to handle PCREL_HI20 relocations -2 to -3 KiB before pcc.

3.4.2. Control Transfer Instructions

Add "the target capability is sealed and the immediate is not zero" to the list
of exception causes.

3.6. Control and Status Registers (CSRs)

Do we need to renumber all of these in purecap?  rv64 extends them while
keeping the number the same; the standard CSR space is finite and allocating 17
of them (inc. vstvecc vsscratchc vsepcc) seems a bit much.

Specify that writes to WARL capability registers (mtvecc, stvecc, mepcc, sepcc,
ddc, jvtc) may only be legalized to a capability value which is either validly
derived from the written capability by a CSETADDR-like operation, or to a
capability with a clear tag.  Propose that any implementation may, at its
discretion, clear the tag if the written value was not legal.

3.7.4. Machine Trap-Vector Base-Address Capability Registers (mtvecc)

Note that if an implementation has no interrupts higher than 31 (RV32) or 511
(RV64), and imposes an alignment of 2 + ceil(log2(interrupts)) bits on
vectored-mode mtvecc, the required representability check is vacuous.
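To spell out the arithmetic: assuming the base ISA's vectored-mode target of
base + 4 * cause, an aligned base means the vector offset only ever changes
the low alignment bits.  A Python sketch with hypothetical names:

```python
def alignment_bits(num_interrupts):
    """2 + ceil(log2(num_interrupts)), computed exactly with integers."""
    return 2 + (num_interrupts - 1).bit_length()


def targets_stay_in_block(base, num_interrupts):
    """True if every vectored target shares base's bits above the alignment."""
    nbits = alignment_bits(num_interrupts)
    if base % (1 << nbits) != 0:
        raise ValueError("base does not meet the imposed alignment")
    return all((base + 4 * cause) >> nbits == base >> nbits
               for cause in range(num_interrupts))


rv32_bits = alignment_bits(32)   # interrupts 0..31
rv64_bits = alignment_bits(512)  # interrupts 0..511
```

These come out to 7 and 11 bits respectively.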

3.7.6. Machine Scratch Register Capability (mscratchc)

Do we need to specify a reset value?  We don't (can't) specify reset values for
DRAM; it's expected that software will initialize most state, and most CSRs in
the base ISA do not have specified reset values.

Similar comments apply to sscratchc, stvecc, sepcc, and mepcc, since they will
not be used until M-mode executes mret and it can ensure they are initialized
first.  mtvecc should either not be initialized (since M-mode will initialize
it before enabling interrupts or executing code that may take exceptions) or
initialize to Null to trap errors; the initial Infinity capability is provided
by pcc, which must have a reset value in any case.

3.7.8. Machine Exception Program Counter Capability (mepcc)

Am I correct to understand that no change to the bottom MW-3 bits of a
capability's address (7 on rv32, 11 on rv64) can affect representability?  So
clearing bit 1 if IALIGN=32 can never create an unrepresentable capability.

Clarify that xepcc will never be implicitly written with a sealed capability,
so trap handlers need not support receiving one.

3.7.11. Machine Trap Value Register (mtval)

Is the 16-bit structure a MIPSism?  The CHERI use of mtval is inconsistent with
the handling of all traps in the base ISA; the consistent approach would be to
decide how many causes to distinguish, and then provide that information - and
only that information - directly in mcause.  It is likely that some form of
"extended cause" will be added later for things like page faults; no
CHERI-specific extended cause should be added without consulting ARC.  Under
base ISA rules, mtval must be able to hold any address if it has any nonzero
values, so using mtval for this instead of a cause or extended cause increases
architectural state.

3.9. Unprivileged CSRs

"to accesss unprivileged CSRs"

3.9.1. Program Counter Capability (pcc)

Strange to put this in the CSR space when there is no unprivileged pc CSR, and
no precedent for CSRs that read pc or otherwise synchronize on read with the
instruction stream as a whole.  An encoding for auipcc 0?

3.10. CHERI Exception handling

"any byte of 16-bit instruction at target out of pcc bounds" -> "minimal length
instruction"

Per 8.2.5, CJALR unseals the target capability, but the table claims that a
sealed indirect jump target causes an exception.

CBO.ZERO should not require R-permission, since it is used to optimize pure
write sequences.

3.11. Physical Memory Attributes (PMA)

"When the hart attempts to store or load data with the tag set to memory
regions that are not taggable," If the memory is not taggable, it does not make
sense to "load data with the tag set". Is this intended to forbid the use of
the LC instruction on untaggable memory, or to say that untaggable memory
always reads tag bits as zero?

3.12.1. Invalid Address Handling

The mention of Sv* modes is a red herring; "virtual addresses" for the purposes
of mepc legal values include physical addresses, and valid physical addresses
are platform dependent.  It would also be possible to remove this entire
facility; the base ISA feature exists to reduce architectural state in the very
smallest implementations, like a rv32e with 16 bit physical (thus virtual)
addresses, but for the smallest CHERI implementation the proportional savings
is necessarily much smaller.

In any event the base ISA licenses implementations to lose all information when
storing an out of range value, and we should do the same thing - if A's address
is not valid and cannot be held or if A is invalid for other reasons, replace
the entire capability with one that can be held and has the tag clear, such as
Null.  Note that this authorizes implementations to not store reserved bits for
CSRs that hold a capability.

This also regularizes the handling of sealed capabilities in xepcc; if the CSR
value changes, it will be replaced by something with the tag clear and thus not
a sealing violation.

4.1. Debug Mode

(RISC-V, YEAR) is a very unhelpful way to format a reference.

4.2.2. Debug Program Counter Capability (dpcc)

pcc cannot be used as a source of Infinity because accessing the actual PC
during debug execution is not allowed.  This is necessary to abstract over
implementations where debugging feeds the pipeline directly and dpcc is pcc, or
implementations where the program buffer is visible in physical memory and
subject to normal instruction fetches, and dpcc is a separate register storing
pcc prior to debug entry.

5.1. Extending the Page Table Entry Format

The language around CD and memory ordering needs to be harmonized with Svadu.

Type-agnostic data copy routines may be called implicitly by compilers for
non-volatile data, and such routines will use SC for all aligned regions
(adding CGETTAG and branches doubles the instruction count).  On
implementations where CW=0 pages fault on all SC instructions, CW=0 is
therefore inappropriate for allowing applications access to non-tagged or
address-space-shared memory.  Is the intention to use CW=0 pages for
revocation and C=0 capabilities for access control?  If so this can be made
explicit.

6. "Zcheri_legacy" Extension for CHERI Legacy Mode

This extension is trying to do two different things and succeeding at neither
of them.

You cannot use Zcheri_legacy to run a capability-naive software stack in the
widest supported XLEN, because even with CME=0 U-mode can write something
unexpected into ddc, which will then affect S-mode execution because stdc is
not automatically used.  Instant compromise.

You cannot use Zcheri_legacy to implement a sensible hybrid ABI because the
capability-base loads and stores were all removed; the only way to access
capability data in the hybrid ABI is to constantly change ddc.  The cost of
implementing Zcheri_mode is likely to be much lower than the hybrid-ABI
features of Zcheri_legacy (only requiring one new state bit, one instruction,
and a few single wires for plumbing, as opposed to three full width CSRs and
a new class of jumps).

I propose to commit to the former purpose exclusively.  At any privilege level
where Legacy mode is in effect, no capability state is explicitly accessible.
There is no need for CSR aliases since the current mode determines whether they
are accessed as capabilities or as addresses, no need for Legacy-specific trap
handling, and no need for the pcc CSR since in Purecap mode it can be accessed
using auipcc; this saves 0.6% of the standard CSR space, and allows running
capability-naive user and system software safely.  Zcheri_mode is required to
run a capability-aware system using a legacy ABI; the new Zcheri_mode +
Zcheri_legacy is likely to be cheaper than the current Zcheri_legacy.

It would be possible to salvage the current Zcheri_legacy design by adding a
full CHERI feature disable bit to xenvcfg; I do not consider this further due
to the superiority of Zcheri_mode.

6.1. CHERI Execution Mode

The current CHERI execution mode depends on XLEN in all privilege modes.

Execution of capability instructions in Legacy mode is reserved, and will
result in an illegal instruction exception if Ssstrict is implemented.
CMODESWITCH and C.CMODESWITCH are exceptions.

6.2.1. Capability Load and Store Instructions

Since capability state cannot be accessed, the new Zcheri_legacy does not
provide LC and SC in Legacy mode; the encoding is reserved.

6.2.2. Unconditional Capability Jumps

Since capability state cannot be accessed, the new Zcheri_legacy does not
provide JALR.CAP in Legacy mode; the encoding is reserved.

6.3.4. CSR Instructions

The new Zcheri_legacy does not introduce aliases for CSRs.  The write width for
CLEN-bit CSRs depends on the execution mode.

XLEN-bit writes to a CLEN-bit CSR perform a CSETADDR-like operation followed by
WARL legalization, including invalid address conversion and tag clearing if
applicable.  The presence of a CSETADDR is regrettably necessary here, since a
capability-naive S-mode may perform writes to stvecc and sepcc, and modifying
traps and trap returns depending on the execution mode would introduce far too
much complexity.  We may be able to simplify this by allowing the CSETADDR-like
operation to always clear the tag unless the old capability had maximal bounds.

XLEN-bit results from CSR instructions are sign extended to XLENMAX, not zero
extended.  (This is an erratum applicable to the full Zcheri_legacy.)
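For concreteness, a Python sketch of the intended extension; the function name
is hypothetical and the result is shown as an unsigned XLENMAX-bit pattern.

```python
def sign_extend(value, xlen, xlenmax):
    """Sign extend the low xlen bits of value to an xlenmax-bit pattern."""
    value &= (1 << xlen) - 1
    if value >> (xlen - 1):  # top bit of the XLEN-bit result is set
        value -= 1 << xlen
    return value & ((1 << xlenmax) - 1)


extended = sign_extend(0x80000000, 32, 64)
```

A 32-bit CSR result of 0x80000000 becomes 0xFFFFFFFF80000000 on a 64-bit hart,
where zero extension would have given 0x0000000080000000.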

6.5. Debug Default Data Capability (dddc)

I do not believe this is needed.  Program buffer execution is always in
Capability mode, since DXLEN and XLENMAX are synonyms and debug mode is not
affected by menvcfg, and as such does not depend on ddc, so ddc can be
manipulated by the debugger as if it were any other CSR.

6.6. Disabling CHERI Features

This section is redundant with the new Zcheri_legacy since "CHERI features
disabled" and "Legacy mode" are collapsed into a single concept, and
XLEN < XLENMAX causes the use of Legacy mode automatically.

6.7. Added CLEN-wide CSRs

ddc exists but is only directly accessible in Capability mode.  dddc, stdc, and
mtdc are no longer needed.

6.7.1. Machine ISA Register (misa)

Changing MXLEN changes the execution mode.

6.7.2. Machine Status Registers (mstatus and mstatush)

The base ISA zeros bits above XLEN in memory addresses to present a full
XLEN-bit address space to low-width privilege modes, untroubled by the
existence of higher and wider modes.  Ironically, this is unnecessary in a
paged environment - nothing in Sv39 prohibits assigning 0-2GiB and -2-0GiB to
user processes - but CHERI cannot span the gap easily.

Since pcc must be representable when it is written into xepcc, pcc must be kept
in sign extended form during use, performing bounds checks under sign
extension, despite the fact that the bits above XLEN are masked off for memory
system access.

This is likely to be a major verification pitfall.  Full-width JAL and JALR can
use the CINCOFFSET pipeline to perform representability checks but there is no
CINCWOFFSET otherwise, and handling of the gap is likely to significantly
complicate bounds checks in the pc incrementer.

Software should not execute with XLEN < XLENMAX if pcc or ddc does not have
full bounds; the bounds are likely to be handled wrong anyway.  Formalizing
this in the architecture could have advantages.

6.7.3. Machine Trap Default Capability Register (mtdc)

Delete.

6.7.4. Machine Environment Configuration Register (menvcfg)

Software should set CME=0 if mstatus.SXL=0 for forward compatibility.

6.7.5. Supervisor Trap Default Capability Register (stdc)

Delete.

6.7.6. Supervisor Environment Configuration Register (senvcfg)

Software should set CME=0 if sstatus.UXL=0 for forward compatibility.

6.7.7. Default Data Capability (ddc)

There is no reason for this to hold non-addresses; make it WARL and subject to
invalid address conversion.  It does not need a reset value as discussed for
mscratchc.

7. "Zcheri_mode" Extension for CHERI Execution Mode

This is the mechanism used for the hybrid ABI.  It is much cheaper than
Zcheri_legacy and there is little reason to support one but not the other.

7.1. CHERI Execution Mode

XLEN < XLENMAX also forces Legacy mode.

Define Semi-legacy mode as the mode resulting from XLEN = XLENMAX, effective
CME = 1, pcc.M = 0.  Semi-legacy mode acts as legacy mode for the purposes of
base ISA instructions including CSR accesses, but allows new Zcheri_purecap and
Zcheri_mode instructions to be executed.  Semi-legacy mode can switch to (and
from) Capability mode using the CMODESWITCH instruction, and can use CLC/CSC to
access capabilities in memory authorized by ddc.
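The resulting three-way split can be summarized as a small decision function.
This is a sketch of the proposal, not of the current spec: it assumes that an
effective CME = 0 yields Legacy mode and that pcc.M = 1 with CME = 1 yields
Capability mode, neither of which is spelled out above.

```python
def execution_mode(xlen, xlenmax, cme, pcc_m):
    """Classify the proposed execution mode (names are hypothetical)."""
    if xlen < xlenmax:
        return "Legacy"       # XLEN < XLENMAX forces Legacy mode
    if not cme:
        return "Legacy"       # assumption: CHERI disabled means Legacy
    if pcc_m:
        return "Capability"   # assumption: M = 1 selects Capability mode
    return "Semi-legacy"      # XLEN = XLENMAX, CME = 1, pcc.M = 0


mode = execution_mode(xlen=64, xlenmax=64, cme=1, pcc_m=0)
```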

8.1. "Zcheri_purecap", "Zcheri_legacy" and "Zcheri_mode" Extensions for CHERI

Has anyone made an attempt to optimize the opcode assignments?

Please alphabetize this list.

8.1.2. JALR.CAP

I don't think either of these is needed.  JALR.CAP goes away with the
simplified Zcheri_legacy, JALR.PCC lacks a clear use case other than occupying
the same opcode.  Remove.

8.1.4. CMODESWITCH

This is likely to be a heavily used instruction in hybrid code: since all
loads and stores are modal, a CMODESWITCH is needed between ddc-authorized
accesses and explicit capability-authorized accesses.  Implementations are
advised not to require a pipeline flush or other heavyweight operation.

8.1.8. CSETMODE

This instruction is likely not needed for Zcheri_mode.  Since function pointers
always carry their correct mode, the only time the mode must be set on a
capability is when the dynamic linker is initializing function pointers, which
can be done almost as easily with CBUILDCAP.  It could be useful for future
mode-like extensions.

[Filed #31]

8.1.13. CTESTSUBSET

The subset relationship on bounds needs a precise definition (is it a true
subset relationship or actually a subinterval); see 2.5.

8.1.16. CGETPERM

Use of bit position 16 may be a MIPSism.  Moving SDP below bit 12, a la FCLASS,
would permit more efficient code sequences using ANDI.  If such a change is
adopted, change CANDPERM to match.

8.1.19. CGETLEN

Clarify that extracting the length of Infinity or Null gives 0.

8.1.25. CLC

Reserved in Legacy mode; the LC operation only applies in Semi-legacy mode.

All base ISA loads support a destination of x0 (unproductively; this is not a
prefetch).  Do we have a reason for being inconsistent?

8.1.27. CSC

Reserved in Legacy mode; the SC operation only applies in Semi-legacy mode.

8.2.2. AUIPCC

Add a RV32-only instruction S2048PCC which uses a fixed -2048 instead of an
upper immediate if we want to avoid section padding in linkers.

Otherwise, document that ABI consideration.

8.7.12. PREFETCH.R.CAP

Should not generate bounds exceptions.

[already done #7]

8.8.6. SH3ADD

rs1 and rs2 are reversed relative to their definitions in the bitmanip spec;
likely unintentional breaking change.

[Filed #28]

8.8.12. SH3ADD.UW

rs1 and rs2 are reversed relative to their definitions in the bitmanip spec;
likely unintentional breaking change.

[Filed #28]

8.6. "C" Standard Extension for Compressed Instructions

Floating point loads and stores are incorrectly described as "C or Zca" instead
of "C or Zcf" or "C or Zcd".  Both are incompatible with rv64 Zcheri.

[Filed #29]

8.10. "Zcmp" Standard Extension For Code-Size Reduction

This uses the same opcode as C.SCSP and is not compatible with rv64 Zcheri.

[Filed #29]

8.11. "Zcmt" Standard Extension For Code-Size Reduction

This uses the same opcode as C.SCSP and is not compatible with rv64 Zcheri.

[Filed #29]