@laanwj
Last active February 4, 2021 09:37
Alphanumeric instructions on RISC-V

Alphanumeric shellcode on RISC-V

Alphanumeric shellcode is common on x86, but it was initially believed to be impossible on ARM. Later it turned out that it was possible.

Similarly, I wondered whether alphanumeric shellcode was possible on RISC-V.

(Basic shellcode in RISC-V Linux provides a good introduction to shellcode for RISC-V, including how to avoid NUL bytes.)

First, I enumerated all the possible instructions that could be formed from alphanumeric characters with a little Rust program, and generated some statistics.
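The enumeration program itself is not shown. As a hypothetical sketch of that first step (not the author's code; all function names are made up), the Rust below generates the candidate encodings: 16-bit halfwords whose first byte in memory does not end in 0b11 (the compressed encoding space) and 32-bit words whose first byte does. Decoding each candidate into a mnemonic would need a RISC-V disassembler, which is assumed here and not shown.

```rust
/// The 62 alphanumeric byte values: '0'..='9', 'A'..='Z', 'a'..='z'.
fn alnum_bytes() -> Vec<u8> {
    (b'0'..=b'9').chain(b'A'..=b'Z').chain(b'a'..=b'z').collect()
}

/// All alphanumeric halfwords that are compressed (RVC) encodings:
/// the low two bits of the first byte in memory must not be 0b11.
fn compressed_candidates() -> Vec<u16> {
    let bytes = alnum_bytes();
    let mut out = Vec::new();
    for &lo in &bytes {
        if lo & 0b11 == 0b11 {
            continue; // 0b11 marks a 32-bit (or longer) instruction
        }
        for &hi in &bytes {
            // RISC-V is little-endian: the first byte in memory is the low byte.
            out.push(u16::from_le_bytes([lo, hi]));
        }
    }
    out
}

/// Number of alphanumeric 32-bit candidates (first byte ends in 0b11).
/// Counted rather than collected, since there are millions of them.
fn count_32bit_candidates() -> usize {
    let bytes = alnum_bytes();
    let first = bytes.iter().filter(|&&b| b & 0b11 == 0b11).count();
    first * bytes.len().pow(3)
}

fn main() {
    println!("alphanumeric bytes: {}", alnum_bytes().len());
    println!("16-bit candidates:  {}", compressed_candidates().len());
    println!("32-bit candidates:  {}", count_32bit_candidates());
}
```

With 62 alphanumeric byte values this yields 2976 compressed candidates and 3,336,592 32-bit candidates, few enough to decode exhaustively.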

Alphanumeric instructions

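These are all the valid instructions in the RV32 ISA whose binary encoding consists only of alphanumeric bytes ('a'..'z', 'A'..'Z', '0'..'9'), in summarized form. Patterns are written most significant bit first, so the rightmost byte is the first one in memory (RISC-V is little-endian). An x marks a bit that the pattern does not fix, although each byte must still be alphanumeric.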
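A minimal Rust sketch of the membership test behind these tables, using only the standard library (the helper names are hypothetical): an encoding qualifies if every byte of its little-endian representation is ASCII alphanumeric. The sample encodings are hand-assembled for illustration: 0x41414137 is lui sp, 0x41414 and the halfword 0x4141 is c.li sp, 16.

```rust
/// True if every byte of a 32-bit instruction word, stored little-endian
/// as RISC-V does, is ASCII alphanumeric ('0'..='9', 'A'..='Z', 'a'..='z').
fn is_alnum_insn32(word: u32) -> bool {
    word.to_le_bytes().iter().all(|b| b.is_ascii_alphanumeric())
}

/// Same check for a 16-bit compressed (RVC) instruction.
fn is_alnum_insn16(half: u16) -> bool {
    half.to_le_bytes().iter().all(|b| b.is_ascii_alphanumeric())
}

/// Show the in-memory byte sequence as text (non-printable bytes become '?').
fn as_text(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| if b.is_ascii_graphic() { b as char } else { '?' })
        .collect()
}

fn main() {
    // Hand-assembled example: lui sp, 0x41414 (opcode 0x37 is '7').
    let lui = 0x4141_4137u32;
    // ecall is 0x00000073: three 0x00 bytes, so it can never qualify.
    let ecall = 0x0000_0073u32;
    // Hand-assembled example: c.li sp, 16 (quadrant 1, funct3 010).
    let c_li = 0x4141u16;

    for &word in &[lui, ecall] {
        let text = as_text(&word.to_le_bytes());
        println!("{:#010x} -> {:?} alnum={}", word, text, is_alnum_insn32(word));
    }
    let text = as_text(&c_li.to_le_bytes());
    println!("{:#06x} -> {:?} alnum={}", c_li, text, is_alnum_insn16(c_li));
}
```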

16-bit instructions

These require the 'RV32C' extension.

addi       01xxxxxx0xxxxx01
fld        0011xxxx0xxxxxx0
flw        011xxxxx0xxxxxx0
jal        0011xxxx0xxxxx01
lui        011xxxxx0xxxxx01
lw         010xxxxx0xxxxxx0
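Why only these names appear can be sketched in Rust (hypothetical function names). In an RVC halfword, funct3 is bits 15:13, the top three bits of the second byte in memory, and the quadrant is bits 1:0 of the first byte. Alphanumeric bytes have top bits 001 (digits), 010 (upper case) or 011 (lower case), so funct3 is limited to 001..011. The (quadrant, funct3) to RV32C mapping is transcribed from the RVC opcode map; c.li and c.addi16sp both expand to addi, which is presumably why the table lists them under addi.

```rust
use std::collections::BTreeSet;

/// The 62 alphanumeric byte values.
fn alnum_bytes() -> impl Iterator<Item = u8> {
    (b'0'..=b'9').chain(b'A'..=b'Z').chain(b'a'..=b'z')
}

/// Distinct (quadrant, funct3) pairs found among alphanumeric RVC halfwords.
fn reachable_groups() -> BTreeSet<(u16, u16)> {
    let mut groups = BTreeSet::new();
    for lo in alnum_bytes() {
        for hi in alnum_bytes() {
            // Little-endian: `lo` is the first byte in memory.
            let half = u16::from_le_bytes([lo, hi]);
            let quadrant = half & 0b11; // bits 1:0
            if quadrant == 0b11 {
                continue; // not a compressed instruction
            }
            let funct3 = half >> 13; // bits 15:13
            groups.insert((quadrant, funct3));
        }
    }
    groups
}

/// RV32C instruction group for a (quadrant, funct3) pair.
fn rv32c_group(quadrant: u16, funct3: u16) -> &'static str {
    match (quadrant, funct3) {
        (0, 1) => "c.fld",
        (0, 2) => "c.lw",
        (0, 3) => "c.flw",
        (1, 1) => "c.jal",
        (1, 2) => "c.li",
        (1, 3) => "c.lui / c.addi16sp",
        (2, 1) => "c.fldsp",
        (2, 2) => "c.lwsp",
        (2, 3) => "c.flwsp",
        _ => "(not covered by this sketch)",
    }
}

fn main() {
    for (q, f) in reachable_groups() {
        println!("quadrant {} funct3 {:03b}: {}", q, f, rv32c_group(q, f));
    }
}
```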

32-bit instructions

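Covers 'RV32G' (a shorthand for 'RV32IMAFD' plus Zicsr and Zifencei). The '.q' entries additionally require the 'Q' (quad-precision floating point) extension.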

bgt        0xxxxxxx0xxxxxxx0100xxxx01100011
bgtu       0xxxxxxx0xxxxxxx0110xxxx01100011
bgtz       0xxxxxxx0xx100000100xxxx01100011
ble        0xxxxxxx0xxxxxxx0101xxxx01100011
bleu       0xxxxxxx0xxxxxxx0111xxxx01100011
blez       0xxxxxxx0xx100000101xxxx01100011
csrrc      0xxxxxxx0xxxxxxx0011xxxx01110011
csrrci     0xxxxxxx0xxxxxxx0111xxxx01110011
csrrsi     0xxxxxxx0xxxxxxx0110xxxx01110011
csrrwi     0xxxxxxx0xxxxxxx0101xxxx01110011
fcvt.d.q   010000100011xxxx0xxxxxxx01010011
fmadd.d    0xxxx01x0xxxxxxx0xxxxxxx01000011
fmadd.q    0xxxx11x0xxxxxxx0xxxxxxx01000011
fmadd.s    0xxxx00x0xxxxxxx0xxxxxxx01000011
fmsub.d    0xxxx01x0xxxxxxx0xxxxxxx01000111
fmsub.q    0xxxx11x0xxxxxxx0xxxxxxx01000111
fmsub.s    0xxxx00x0xxxxxxx0xxxxxxx01000111
fnmadd.d   0xxxx01x0xxxxxxx0xxxxxxx01001111
fnmadd.q   0xxxx11x0xxxxxxx0xxxxxxx01001111
fnmadd.s   0xxxx00x0xxxxxxx0xxxxxxx01001111
fnmsub.d   0xxxx01x0xxxxxxx0xxxxxxx01001011
fnmsub.q   0xxxx11x0xxxxxxx0xxxxxxx01001011
fnmsub.s   0xxxx00x0xxxxxxx0xxxxxxx01001011
j          0xxxxxxx0xxxxxxx0xx1000001101111
jal        0xxxxxxx0xxxxxxx0xxxxxxx01101111
lui        0xxxxxxx0xxxxxxx0xxxxxxx00110111
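The shape of this table can be sketched the same way, again as a hypothetical Rust example: a 32-bit instruction's major opcode is bits 6:0 of its first byte in memory, which must end in 0b11, and only 14 alphanumeric bytes do. The opcode names are transcribed from the base opcode map of the RISC-V unprivileged specification (0x57 is the vector extension's OP-V). A reachable first byte is necessary but not sufficient: JALR, for example, requires funct3 = 000, which forces its second byte into 0x00..0x0F or 0x80..0x8F, and none of those bytes is alphanumeric.

```rust
/// Major opcode class for bits 6:0 of a 32-bit instruction, covering only
/// the values an alphanumeric first byte can produce. Because bit 7 of an
/// alphanumeric byte is 0, the byte value equals the opcode.
fn major_opcode(op: u8) -> &'static str {
    match op {
        0x33 => "OP",
        0x37 => "LUI",
        0x43 => "MADD",
        0x47 => "MSUB",
        0x4B => "NMSUB",
        0x4F => "NMADD",
        0x53 => "OP-FP",
        0x57 => "OP-V (vector)",
        0x63 => "BRANCH",
        0x67 => "JALR",
        0x6B | 0x77 => "reserved",
        0x6F => "JAL",
        0x73 => "SYSTEM",
        _ => "(not covered by this sketch)",
    }
}

/// Alphanumeric bytes whose low two bits are 0b11, i.e. that can be the
/// first byte of a 32-bit instruction.
fn alnum_32bit_first_bytes() -> Vec<u8> {
    (0u8..=0x7F)
        .filter(|&b| b.is_ascii_alphanumeric() && b & 0b11 == 0b11)
        .collect()
}

fn main() {
    for b in alnum_32bit_first_bytes() {
        println!("{:#04x} {:?} -> {}", b, b as char, major_opcode(b));
    }
}
```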
sra        010000010xxxxxxx0101xxxx00110011

RV64

The reachable instructions on 64-bit RISC-V are the same, except for some patterns in the compressed instruction space that have a different meaning (lines marked - give the RV32 meaning, lines marked + the RV64 one):

 addi       01xxxxxx0xxxxx01
+addiw      0011xxxx0xxxxx01
 fld        0011xxxx0xxxxxx0
-flw        011xxxxx0xxxxxx0
-jal        0011xxxx0xxxxx01
+ld         011xxxxx0xxxxxx0
 lui        011xxxxx0xxxxx01
 lw         010xxxxx0xxxxxx0
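A hypothetical Rust sketch of this comparison: classify every alphanumeric RVC halfword by (quadrant, funct3) under each XLEN and count the ones whose meaning changes. The mnemonic mapping is transcribed from the RVC opcode map and covers only the pairs reachable from alphanumeric bytes; the `Xlen` type and the function names are made up for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Xlen {
    Rv32,
    Rv64,
}

/// RVC instruction group for a halfword, for the (quadrant, funct3)
/// pairs that alphanumeric bytes can produce.
fn rvc_group(half: u16, xlen: Xlen) -> &'static str {
    let quadrant = half & 0b11; // bits 1:0
    let funct3 = half >> 13; // bits 15:13
    match (quadrant, funct3, xlen) {
        // Groups whose meaning depends on XLEN.
        (0, 3, Xlen::Rv32) => "c.flw",
        (0, 3, Xlen::Rv64) => "c.ld",
        (1, 1, Xlen::Rv32) => "c.jal",
        (1, 1, Xlen::Rv64) => "c.addiw",
        (2, 3, Xlen::Rv32) => "c.flwsp",
        (2, 3, Xlen::Rv64) => "c.ldsp",
        // Groups that are the same on RV32 and RV64.
        (0, 1, _) => "c.fld",
        (0, 2, _) => "c.lw",
        (1, 2, _) => "c.li",
        (1, 3, _) => "c.lui / c.addi16sp",
        (2, 1, _) => "c.fldsp",
        (2, 2, _) => "c.lwsp",
        _ => "(not covered by this sketch)",
    }
}

/// Every alphanumeric halfword that is a compressed encoding.
fn alnum_rvc_halfwords() -> Vec<u16> {
    let alnum = || (b'0'..=b'9').chain(b'A'..=b'Z').chain(b'a'..=b'z');
    let mut out = Vec::new();
    for lo in alnum() {
        if lo & 0b11 == 0b11 {
            continue; // 32-bit instruction, not RVC
        }
        for hi in alnum() {
            out.push(u16::from_le_bytes([lo, hi]));
        }
    }
    out
}

/// Number of alphanumeric RVC halfwords that decode differently on RV64.
fn count_xlen_dependent() -> usize {
    alnum_rvc_halfwords()
        .into_iter()
        .filter(|&h| rvc_group(h, Xlen::Rv32) != rvc_group(h, Xlen::Rv64))
        .count()
}

fn main() {
    let total = alnum_rvc_halfwords().len();
    println!("{} of {} halfwords change meaning on RV64", count_xlen_dependent(), total);
}
```

The groups that change are the ones marked - and + above: funct3 011 in quadrants 0 and 2 (c.flw/c.flwsp become c.ld/c.ldsp) and funct3 001 in quadrant 1 (c.jal becomes c.addiw). By this sketch's count, that is 976 of the 2976 alphanumeric compressed halfwords.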

Conclusions

My first response was "oh no! no store instructions". Although there are some instructions to get values into registers, and some to make jumps, that leaves no way

  • to do a system call directly: ecall encodes as 0x00000073, which needs the byte 0x73 followed by three 0x00 bytes
  • to write an 'ecall' instruction on the stack to jump to
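Both points can be checked mechanically. In this hypothetical Rust sketch, the 32-bit memory-writing major opcodes (STORE 0x23, STORE-FP 0x27, AMO 0x2F) are '#', '\'' and '/'; the first byte of such an instruction is either that value or that value plus 0x80, and neither is alphanumeric. Every RVC store uses funct3 101, 110 or 111, which would need the top bit of the second byte set. And ecall contains 0x00 bytes. With no store available, the shellcode also cannot assemble an ecall in memory, which is the second point.

```rust
/// Major opcodes (bits 6:0) of the 32-bit instruction classes that write memory.
const STORE_OPCODES: [(u8, &str); 3] = [(0x23, "STORE"), (0x27, "STORE-FP"), (0x2F, "AMO")];

/// funct3 values used by RVC stores: c.fsd/c.fsdsp (101), c.sw/c.swsp (110),
/// and c.fsw/c.fswsp on RV32 or c.sd/c.sdsp on RV64 (111).
const RVC_STORE_FUNCT3: [u8; 3] = [0b101, 0b110, 0b111];

/// The ecall instruction word.
const ECALL: u32 = 0x0000_0073;

/// A 32-bit opcode is usable only if some first byte carrying it is
/// alphanumeric; bit 7 of that byte belongs to another field (rd or an
/// immediate bit), so both values of bit 7 are tried.
fn opcode_reachable(op: u8) -> bool {
    [op, op | 0x80].iter().any(|b| b.is_ascii_alphanumeric())
}

/// An RVC funct3 is usable only if some alphanumeric byte has it as its
/// top three bits (funct3 is bits 15:13 of the halfword).
fn rvc_funct3_reachable(funct3: u8) -> bool {
    (0u8..=255).any(|hi| hi.is_ascii_alphanumeric() && hi >> 5 == funct3)
}

fn main() {
    for (op, name) in STORE_OPCODES.iter() {
        println!("{:<8} opcode {:#04x}: reachable={}", name, op, opcode_reachable(*op));
    }
    for f in RVC_STORE_FUNCT3.iter() {
        println!("RVC funct3 {:03b}: reachable={}", f, rvc_funct3_reachable(*f));
    }
    let ecall_ok = ECALL.to_le_bytes().iter().all(|b| b.is_ascii_alphanumeric());
    println!("ecall alphanumeric: {}", ecall_ok);
}
```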

Approaches that might work are:

  • if the offset of an existing ecall instruction is known, the shellcode can jump there!
  • similarly, if it knows the offset of a memory write followed by a ret, it could use that
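The first approach also depends on whether the jump itself can be encoded. This hypothetical Rust sketch (`encode_jal` and the gadget offset are illustrative, not from the original) encodes `jal x0, offset` (the `j` row in the 32-bit table) with the standard J-type immediate layout and checks the result. Because imm[20] is the sign bit and lands in bit 31, a backward 32-bit jal is never alphanumeric, so the target would have to lie at a higher address than the jump, at an offset whose immediate bits happen to form alphanumeric bytes.

```rust
/// Encode `jal rd, offset` (J-type). `offset` is relative to the jal itself,
/// must be even, and must fit in the signed 21-bit immediate.
fn encode_jal(rd: u32, offset: i32) -> u32 {
    assert!(rd < 32, "rd must be x0..x31");
    assert!(offset % 2 == 0 && (-(1 << 20)..(1 << 20)).contains(&offset));
    let imm = offset as u32; // two's complement; bit 20 is the sign
    (((imm >> 20) & 1) << 31)          // imm[20]
        | (((imm >> 1) & 0x3FF) << 21) // imm[10:1]
        | (((imm >> 11) & 1) << 20)    // imm[11]
        | (((imm >> 12) & 0xFF) << 12) // imm[19:12]
        | (rd << 7)
        | 0x6F // JAL major opcode
}

/// True if the word's little-endian bytes are all ASCII alphanumeric.
fn is_alnum_word(word: u32) -> bool {
    word.to_le_bytes().iter().all(|b| b.is_ascii_alphanumeric())
}

/// Smallest positive offset for which `j offset` (jal x0) is alphanumeric.
fn smallest_alnum_jump() -> Option<i32> {
    (2..(1 << 20)).step_by(2).find(|&off| is_alnum_word(encode_jal(0, off)))
}

fn main() {
    // Hypothetical: suppose an ecall gadget is known to sit 0x13414 bytes ahead.
    let gadget_offset = 0x13414;
    let word = encode_jal(0, gadget_offset);
    let text: String = word.to_le_bytes().iter().map(|&b| b as char).collect();
    println!("j +{:#x} -> {:#010x} {:?} alnum={}", gadget_offset, word, text, is_alnum_word(word));
    println!("backward j -4 alnum={}", is_alnum_word(encode_jal(0, -4)));
    println!("smallest alphanumeric forward j: {:?}", smallest_alnum_jump());
}
```

With this encoding, an offset of 0x13414 comes out as the bytes `o0AA`, and the smallest forward offset that works is 0x3B02 bytes (`o000` in memory), so short forward jumps are not available either.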

This ventures into Return Oriented Programming (ROP) territory, which means the resulting shellcode can never be self-contained but depends on the host application.

That's disappointing! Maybe someone has a better idea or some trick up their sleeve, but I don't see it.
