Some of the biggest concerns I have about correctness:
- Errors in assembly code. These are unfortunately hard to find in an automated way.
- Boundary conditions, especially in every field of every instruction, but also in memory accesses.
- Alignment. Alignment errors are extremely common, and I haven't convinced myself that CKB-VM can't be coerced into e.g. executing native assembly instructions on misaligned data.
- Out-of-bounds accesses. One of the worst.
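One way to probe instruction-field boundary conditions is to test the exact edges of each field's range. As a standalone sketch (the helper below is hypothetical, not CKB-VM's internal API), the sign extension of a 12-bit I-type immediate flips exactly at 0x800:

```rust
// Standalone sketch: sign-extend an N-bit immediate field the way an
// RV64 decoder must. `sign_extend` is a hypothetical helper name.
fn sign_extend(value: u64, bits: u32) -> i64 {
    let shift = 64 - bits;
    ((value << shift) as i64) >> shift
}

fn main() {
    // 12-bit I-type immediate: 0x7ff is the largest positive value...
    assert_eq!(sign_extend(0x7ff, 12), 2047);
    // ...and 0x800 is the boundary where the sign bit flips.
    assert_eq!(sign_extend(0x800, 12), -2048);
    assert_eq!(sign_extend(0xfff, 12), -1);
    println!("boundary checks passed");
}
```

Tests like these are cheap to enumerate mechanically for every field width the decoder uses.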
Here are some concrete, actionable tasks to validate CKB-VM, in a reasonable order of priority.
Both riscv-tests and riscv-compliance are pinned to 2019 revisions.
Important tests have probably been added since.
I also ran into problems building the test suite that newer revisions would have solved.
miri is an obvious one, but with so much assembly it's not going to get great coverage. Valgrind might turn up something interesting.
If miri can give useful coverage, it might be added to CI.
The in-tree fuzz project only has the simplest test case. More test cases would probably help the fuzzer dig deeper.
Fuzzers work best with a good starting corpus. Fuzzing under CI, at least, is not using any corpus, so it is probably not exercising the VM much.
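A cheap way to build a starting corpus is to emit files of valid instruction encodings rather than random bytes, so the fuzzer starts from inputs the decoder actually accepts. A minimal sketch (the encoder covers only the I-type format, and the seed-file name and layout are my assumptions, not anything libFuzzer requires):

```rust
use std::fs;

// Encode an RV32/RV64 I-type instruction: imm[11:0] | rs1 | funct3 | rd | opcode.
fn encode_i(imm: u32, rs1: u32, funct3: u32, rd: u32, opcode: u32) -> u32 {
    (imm & 0xfff) << 20
        | (rs1 & 0x1f) << 15
        | (funct3 & 0x7) << 12
        | (rd & 0x1f) << 7
        | (opcode & 0x7f)
}

fn main() -> std::io::Result<()> {
    // addi x1, x0, 1 -- a well-known encoding, useful as a sanity check.
    let addi = encode_i(1, 0, 0b000, 1, 0b0010011);
    assert_eq!(addi, 0x0010_0093);

    // Seed file: a handful of valid instructions, little-endian.
    // "corpus_seed_0" is an arbitrary name for illustration.
    let seed: Vec<u8> = [addi, encode_i(0xfff, 1, 0b000, 2, 0b0010011)]
        .iter()
        .flat_map(|i| i.to_le_bytes())
        .collect();
    fs::write("corpus_seed_0", &seed)?;
    Ok(())
}
```

Generating a few files per instruction format would give the fuzzer valid decodings to mutate instead of having to discover them.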
The in-tree fuzz project only uses LLVM's libFuzzer. AFL- and honggfuzz-based fuzzers may exercise different code paths.
To find unknown bugs we can compare CKB-VM's behavior to other RISC-V VMs. mohanson is exploring this in his rfuzzing project, but we can also add it to the in-tree fuzz project.
The simplest way to compare is just to check the final result: did the program succeed or fail? But there are other things we could imagine comparing:
- the complete memory space after execution
- register state after execution
- register state after every instruction step
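For the register-state comparisons, a small diff helper makes mismatches easy to report per step. This is a standalone sketch; wiring it up to pull actual state out of CKB-VM's machine (and out of the other VM) is the real work:

```rust
// A RISC-V register file snapshot: x0..x31.
type Regs = [u64; 32];

// Report every register where the two implementations disagree,
// as (register index, value in A, value in B).
fn diff_regs(a: &Regs, b: &Regs) -> Vec<(usize, u64, u64)> {
    a.iter()
        .zip(b.iter())
        .enumerate()
        .filter(|(_, (x, y))| x != y)
        .map(|(i, (x, y))| (i, *x, *y))
        .collect()
}

fn main() {
    let mut a: Regs = [0; 32];
    let mut b: Regs = [0; 32];
    a[10] = 42; // a0 differs between the two runs
    b[10] = 43;
    let diffs = diff_regs(&a, &b);
    assert_eq!(diffs, vec![(10, 42, 43)]);
    for (reg, x, y) in diffs {
        eprintln!("x{} mismatch: {:#x} vs {:#x}", reg, x, y);
    }
}
```

Comparing after every instruction step pinpoints the first divergent instruction, which is far easier to debug than a divergent final result.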
MOPs (macro-op fusions) are an optimization, and the VM needs to behave exactly the same with and without them.
Compare CKB-VM against spike on code pulled from crates.io or other sources.
I haven't thought too much about this, but I notice that the assembly for each interpreter op follows patterns, and I have seen patches from mohanson that fix errors in those patterns (like missing .p2align directives). We could try to write a custom analysis to enforce those patterns.
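As a sketch of what such an analysis could look like: scan the generated assembly and flag any op label that is not immediately preceded by a `.p2align` directive. The label prefix matched below is a guess at the shape of the interpreter's labels, not CKB-VM's actual naming:

```rust
// Flag op labels that are not immediately preceded by a .p2align directive.
// The ".CKB_VM_ASM_LABEL_OP_" prefix is a hypothetical label shape.
fn missing_p2align(asm: &str) -> Vec<String> {
    let mut prev_was_align = false;
    let mut missing = Vec::new();
    for line in asm.lines() {
        let line = line.trim();
        if line.starts_with(".CKB_VM_ASM_LABEL_OP_") && line.ends_with(':') {
            if !prev_was_align {
                missing.push(line.trim_end_matches(':').to_string());
            }
        }
        prev_was_align = line.starts_with(".p2align");
    }
    missing
}

fn main() {
    let asm = "\
.p2align 3
.CKB_VM_ASM_LABEL_OP_ADD:
    ret
.CKB_VM_ASM_LABEL_OP_SUB:
    ret
";
    // OP_ADD is aligned; OP_SUB is missing its directive.
    assert_eq!(missing_p2align(asm), vec![".CKB_VM_ASM_LABEL_OP_SUB"]);
}
```

A check like this could run in CI against the generated assembly, turning a pattern convention into an enforced invariant.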
The VM uses a lot of low level bit manipulation. A lot of that is for performance or C/asm interop, but probably not all of it.
It might be worthwhile to look for places that could benefit from stronger typing.
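For example, a register index that today travels as a bare integer could become a newtype whose constructor enforces the 0..=31 range, so an out-of-range index is impossible to construct rather than merely unlikely. A standalone sketch (these are hypothetical types, not CKB-VM's actual ones):

```rust
// Hypothetical newtype: a register index proven to be in 0..=31.
#[derive(Copy, Clone, Debug, PartialEq)]
struct RegIndex(u8);

impl RegIndex {
    fn new(i: u8) -> Option<RegIndex> {
        if i < 32 { Some(RegIndex(i)) } else { None }
    }
}

// The register file can index without a bounds check here,
// because the type system already performed it at construction.
fn read_reg(regs: &[u64; 32], idx: RegIndex) -> u64 {
    regs[idx.0 as usize]
}

fn main() {
    let regs = [7u64; 32];
    let a0 = RegIndex::new(10).expect("valid register");
    assert_eq!(read_reg(&regs, a0), 7);
    assert_eq!(RegIndex::new(32), None); // cannot construct an OOB index
}
```

The same idea applies to immediates, memory addresses, and page indices: validate once at the boundary, then let the type carry the proof.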
Porting to other architectures can often reveal incorrect assumptions, sometimes with an impact on the correctness of the original architecture.
These nodes would do expensive extra validation in order to catch unknown bugs live. For example:
- run both the asm and Rust interpreters, and verify they get the same result
- run both ckb-vm and a spike backend ...
- run ckb-vm itself inside a VM and look for suspicious memory accesses