Recently I have been working on making it possible to fuzz Soroban contracts. Having made some progress and gained some experience, I am writing up what I have learned and prototyped in the hope of getting feedback.
The work is currently done in the arbitrary branches of three forks:
The primary code for this work, and its documentation, is in the arbitrary module of soroban-sdk.
I am especially hoping for feedback on:
- the design of the two traits involved
- the ergonomics of writing fuzz tests for Soroban contracts
- future priorities
But I am happy to receive any other feedback.
The problem I am initially focusing on is making it ergonomic for Soroban contract authors to write fuzz tests using the cargo-fuzz tool. Similar types of testing tools, most notably quickcheck and proptest, may be desirable to use with Soroban contracts. For the moment I am focused on cargo-fuzz, but this work may lead to compatibility with those other tools as well.
These fuzz tests run in the host Soroban environment, like Soroban unit tests, not the guest WASM environment. Fuzzing within WASM could be desirable but is a distant goal. Fuzzing of other parts of Soroban, in particular soroban-sdk and the interface between host and guest, may be desirable, but is not something I am focused on yet.
With cargo-fuzz one writes standalone programs that run a single fuzz test. These tests are instrumented by the cargo-fuzz tool, and run repeatedly, fed input bytes that progressively exercise more of the program as new branches are discovered. The effectiveness of this process depends on how the tests are crafted and how the input bytes are interpreted.
The cargo-fuzz driver feeds bytes to the test program, and the test program must usefully interpret these bytes. To help with this, the cargo-fuzz ecosystem provides the Arbitrary trait (from the arbitrary crate), which consumes bytes and produces Rust values of any type that implements it. My work so far has focused on implementing Arbitrary for Soroban contract types.
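To make "interpreting bytes" concrete, here is a minimal standard-library sketch of the idea. The names ByteSource, Transfer, and decode_transfer are hypothetical and exist only for this illustration; this is not the arbitrary crate's Unstructured type, which is considerably more capable.

```rust
/// Hypothetical stand-in for the byte source a fuzzer hands to a test.
/// This is NOT the `arbitrary` crate's `Unstructured`; it only mimics the idea.
pub struct ByteSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        ByteSource { data }
    }

    /// Take the next byte, or 0 once the input is exhausted.
    pub fn next_u8(&mut self) -> u8 {
        match self.data.split_first() {
            Some((first, rest)) => {
                self.data = rest;
                *first
            }
            None => 0,
        }
    }

    /// Build a u32 from the next four bytes (little-endian).
    pub fn next_u32(&mut self) -> u32 {
        let bytes = [
            self.next_u8(),
            self.next_u8(),
            self.next_u8(),
            self.next_u8(),
        ];
        u32::from_le_bytes(bytes)
    }
}

/// A structured test input decoded from raw fuzzer bytes.
#[derive(Debug, PartialEq)]
pub struct Transfer {
    pub amount: u32,
    pub urgent: bool,
}

/// Interpret raw bytes as a `Transfer`: four bytes of amount, one flag byte.
pub fn decode_transfer(src: &mut ByteSource<'_>) -> Transfer {
    Transfer {
        amount: src.next_u32(),
        urgent: src.next_u8() & 1 == 1,
    }
}
```

Every byte string decodes to some valid Transfer, which is the property that lets a fuzzer mutate bytes freely while the test still receives well-formed values.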
A fuzz test that receives arbitrary types as input might look like this:
#![no_main]
use libfuzzer_sys::fuzz_target;
use soroban_hello_world_contract::*;
use soroban_sdk::{symbol, vec, Env, Symbol};
fuzz_target!(|to: Symbol| {
    let env = Env::default();
    let contract_id = env.register_contract(None, HelloContract);
    let client = HelloContractClient::new(&env, &contract_id);
    let words = client.hello(&to);
    assert_eq!(words, vec![&env, symbol!("Hello"), to]);
});
If the fuzz test panics or otherwise crashes it is considered a failure.
Most of the types that Soroban contracts might accept as input do not implement Arbitrary, so the obvious first step in making Soroban contracts fuzzable is to implement Arbitrary for every type that might be used as input to a contract.
A simple implementation of Arbitrary looks like this:
impl<'a> Arbitrary<'a> for BitSet {
    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
        let bits = u64::arbitrary(u)?;
        let bits = bits & 0x0fff_ffff_ffff_ffff;
        let bitset = BitSet::try_from_u64(bits).expect("BitSet");
        Ok(bitset)
    }
}
Most types, though, can simply #[derive(Arbitrary)].
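The masking step in the BitSet implementation is what turns every possible u64 into a valid value. Here is a small standard-library sketch of that pattern. Bits60 and bits60_from_raw are hypothetical stand-ins rather than the real BitSet, and the 60-bit limit is an assumption read off the mask in the example.

```rust
/// Hypothetical stand-in for a 60-bit set; not the real Soroban `BitSet`.
#[derive(Debug, PartialEq)]
pub struct Bits60(u64);

impl Bits60 {
    /// Assumption taken from the mask in the example: only the low 60 bits may be set.
    pub const MASK: u64 = 0x0fff_ffff_ffff_ffff;

    /// Fallible constructor: rejects values with any of the top four bits set.
    pub fn try_from_u64(bits: u64) -> Result<Self, u64> {
        if bits & !Self::MASK == 0 {
            Ok(Bits60(bits))
        } else {
            Err(bits)
        }
    }
}

/// Map any raw u64 from the fuzzer onto a valid value by clearing the high bits,
/// so the fallible constructor can never fail here.
pub fn bits60_from_raw(raw: u64) -> Bits60 {
    Bits60::try_from_u64(raw & Bits60::MASK).expect("masked value always fits")
}
```

Clearing the invalid bits, rather than returning an error for them, means no fuzzer input is wasted on values the constructor would reject.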
There are two barriers to implementing Arbitrary for Soroban contract types:
- Many Soroban types, including object types, cannot be constructed without access to an Env environment; and Arbitrary constructors do not have access to Env.
- Users can define their own types with #[contracttype] that can be serialized to storage or accepted as method arguments. These types should be able to implement Arbitrary.
I have prototyped a design that allows users to fuzz with arbitrary values of nearly all types a contract might accept as input. It requires two new traits, one to surmount each of the above barriers, defined in the arbitrary::api module of soroban-sdk.
The gist is that Soroban contract types all have a corresponding arbitrary prototype, defined by the SorobanArbitraryPrototype trait. This prototype does not require an Env, so can be generated from random bytes. A prototype can be instantiated into an Env with the existing IntoVal trait.
The trait has a somewhat complex definition:
pub trait SorobanArbitraryPrototype: IntoVal<Env, Self::Into> {
    type Into: IntoVal<Env, RawVal>
        + TryFromVal<Env, RawVal>;
}
The main thing to understand here is that prototypes have an associated Into type that represents the final desired Soroban contract type, and that SorobanArbitraryPrototype implements IntoVal<Env, Self::Into> so that it can be converted to that type.
The IntoVal and TryFromVal bounds on SorobanArbitraryPrototype::Into are required because those bounds are also on the Vec and Map element types.
Some Soroban contract types that do not require Env are their own prototype, like Symbol.
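The prototype idea can be modelled without any Soroban dependencies. The sketch below is a simplification under stated assumptions: MockEnv, HostBytes, ArbitraryHostBytes, and the Prototype trait are hypothetical names, and the conversion is folded into an instantiate method instead of reusing IntoVal as the real design does.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for `Env`: it only hands out increasing object handles.
#[derive(Default)]
pub struct MockEnv {
    next_handle: Cell<u32>,
}

/// Stand-in for an Env-bound type such as `Bytes`: it cannot be built without an env.
#[derive(Debug, PartialEq)]
pub struct HostBytes {
    pub handle: u32,
    pub data: Vec<u8>,
}

impl HostBytes {
    pub fn new(env: &MockEnv, data: Vec<u8>) -> Self {
        let handle = env.next_handle.get();
        env.next_handle.set(handle + 1);
        HostBytes { handle, data }
    }
}

/// Simplified analogue of `SorobanArbitraryPrototype`: an Env-free value that
/// can later be instantiated into its final type once an env is available.
pub trait Prototype {
    type Into;
    fn instantiate(self, env: &MockEnv) -> Self::Into;
}

/// Env-free prototype for `HostBytes`; it holds only plain Rust data.
#[derive(Debug)]
pub struct ArbitraryHostBytes {
    pub vec: Vec<u8>,
}

impl Prototype for ArbitraryHostBytes {
    type Into = HostBytes;
    fn instantiate(self, env: &MockEnv) -> HostBytes {
        HostBytes::new(env, self.vec)
    }
}

/// A type that needs no environment can be its own prototype.
impl Prototype for u32 {
    type Into = u32;
    fn instantiate(self, _env: &MockEnv) -> u32 {
        self
    }
}
```

The point of the split is that ArbitraryHostBytes holds only plain Rust data, so a fuzzer can construct it from bytes before any environment exists.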
An easy example of how this trait is implemented is for Bytes:
#[derive(Arbitrary, Debug)]
pub struct ArbitraryBytes {
    vec: RustVec<u8>,
}
impl SorobanArbitraryPrototype for ArbitraryBytes {
    type Into = Bytes;
}
impl IntoVal<Env, Bytes> for ArbitraryBytes {
    fn into_val(self, env: &Env) -> Bytes {
        self.vec.into_val(env)
    }
}
With this definition one could write a fuzz test that accepts ArbitraryBytes and converts it into Bytes:
#![no_main]
use libfuzzer_sys::fuzz_target;
use soroban_sdk::arbitrary::ArbitraryBytes;
use soroban_sdk::Bytes;
use soroban_sdk::{Env, IntoVal};
fuzz_target!(|input: ArbitraryBytes| {
    let env = Env::default();
    // The annotation selects the `IntoVal<Env, Bytes>` conversion.
    let input: Bytes = input.into_val(&env);
    // do something with `input`
});
But that's probably not quite how we would identify the Bytes prototype because of the other trait, SorobanArbitrary.
Where SorobanArbitraryPrototype identifies a prototype that can be turned into a Soroban contract type, the SorobanArbitrary trait is implemented by the Soroban contract type itself, and its entire reason for existing is to provide an associated type that names the prototype:
pub trait SorobanArbitrary {
    type Arbitrary: SorobanArbitraryPrototype;
}
This solves two problems. The first is related to the previous example of naming the Bytes prototype: the programmer only needs to know the name of the contract type they want to generate, and that it has an Arbitrary associated type; i.e. nobody has to know the name of every arbitrary prototype to fuzz contracts, just that SorobanArbitrary::Arbitrary exists and gives them the name of every possible prototype.
So the previous Bytes fuzzer could be written:
fuzz_target!(|input: <Bytes as SorobanArbitrary>::Arbitrary| {
    let env = Env::default();
    // The annotation selects the `IntoVal<Env, Bytes>` conversion.
    let input: Bytes = input.into_val(&env);
    // do something with `input`
});
This is more verbose, but it is consistent for every possible type.
A similar consistency could be achieved through naming conventions,
but I think using the type system is probably more idiomatic.
For simple types like Symbol the associated type doesn't need to be used:
fuzz_target!(|input: Symbol| {
    // do something with `input`
});
The second problem this solves is related to user-defined types. Contract authors can define their own storage types with the contracttype macro:
#[contracttype]
pub struct State {
    pub count: u32,
    pub last_incr: u32,
}
The macro now automatically derives a corresponding prototype that looks roughly like:
#[derive(Arbitrary, Debug)]
pub struct ArbitraryState {
    pub count: <u32 as SorobanArbitrary>::Arbitrary,
    pub last_incr: <u32 as SorobanArbitrary>::Arbitrary,
}
The SorobanArbitrary::Arbitrary associated type provides a way to easily name the correct type for each field.
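To show how the two traits cooperate for a user-defined type, here is a dependency-free sketch. Everything in it is a hypothetical simplification: MockEnv, Prototype, HasPrototype, and from_prototype are illustrative names, the conversion is a plain method instead of IntoVal, and the prototype struct is written by hand where the macro would generate it.

```rust
/// Hypothetical stand-in for `Env`; it carries no state in this sketch.
pub struct MockEnv;

/// Simplified analogue of `SorobanArbitraryPrototype`.
pub trait Prototype {
    type Into;
    fn instantiate(self, env: &MockEnv) -> Self::Into;
}

/// Simplified analogue of `SorobanArbitrary`: names the prototype of a type.
pub trait HasPrototype: Sized {
    type Arbitrary: Prototype<Into = Self>;
}

// `u32` needs no environment, so it is its own prototype.
impl Prototype for u32 {
    type Into = u32;
    fn instantiate(self, _env: &MockEnv) -> u32 {
        self
    }
}

impl HasPrototype for u32 {
    type Arbitrary = u32;
}

/// A user-defined contract type.
#[derive(Debug, PartialEq)]
pub struct State {
    pub count: u32,
    pub last_incr: u32,
}

/// Roughly what a macro could generate: each field's type is named through
/// the associated type, without knowing any prototype's concrete name.
pub struct ArbitraryState {
    pub count: <u32 as HasPrototype>::Arbitrary,
    pub last_incr: <u32 as HasPrototype>::Arbitrary,
}

impl Prototype for ArbitraryState {
    type Into = State;
    fn instantiate(self, env: &MockEnv) -> State {
        State {
            count: self.count.instantiate(env),
            last_incr: self.last_incr.instantiate(env),
        }
    }
}

impl HasPrototype for State {
    type Arbitrary = ArbitraryState;
}

/// Generic helper: callers only ever name the contract type `T`.
pub fn from_prototype<T: HasPrototype>(proto: T::Arbitrary, env: &MockEnv) -> T {
    proto.instantiate(env)
}
```

Because field types are spelled as associated types, the same generated code works whether a field is its own prototype (like u32 here) or has a separate prototype type.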
Since the contract types are just reflections of the XDR types, which also don't require an Env, it might seem obvious to generate those XDR types and convert them to the contract types.
This would be reasonable, but I found a few reasons that dissuaded me from that path.
First, it seems desirable for contract authors to write contracts in terms of the contract types, and not think about the serialized form of the type and why they should be converting between the two to write fuzz tests.
Second, there are some types for which the allowed values of the XDR types and the allowed values of the contract types are not the same. An example is Symbol, which in the XDR definitions is a typedef of StringM<10>, allowing characters that are not allowed in the contract Symbol type. Another is Status, which in XDR has a limited set of error codes allowed for each error type variant, but in contract code allows all error codes to be paired with all error types. These could be considered bugs.
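The Symbol mismatch can be illustrated with a small sketch of a generator that only ever produces values the contract type would accept. The character set used here (ASCII letters, digits, and underscore, at most 10 characters) is my assumption for illustration, and is_valid_symbol and symbol_from_bytes are hypothetical helpers, not soroban-sdk functions.

```rust
/// Assumed rule, for illustration only: a contract `Symbol` holds at most
/// 10 characters drawn from `a-z`, `A-Z`, `0-9`, and `_`.
pub fn is_valid_symbol(s: &str) -> bool {
    s.len() <= 10 && s.bytes().all(|b| b.is_ascii_alphanumeric() || b == b'_')
}

/// The 63 characters assumed to be permitted in a symbol.
const ALPHABET: &[u8] = b"_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/// Map arbitrary bytes onto the permitted alphabet, so every fuzzer input
/// yields a valid symbol instead of being rejected.
pub fn symbol_from_bytes(raw: &[u8]) -> String {
    raw.iter()
        .take(10)
        .map(|b| ALPHABET[*b as usize % ALPHABET.len()] as char)
        .collect()
}
```

A generator based on the XDR string type would instead have to discard or repair inputs such as strings containing spaces, which is one reason to generate the contract type directly.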
In some of the Arbitrary implementations I have written, namely Static and Status, I do first create the corresponding XDR type and convert it to the contract type, but I don't expose that detail in the interface.
This is pretty long already. For additional, less-organized thoughts I've had on this subject, see this other gist.
Feedback that would be helpful:
- Concerns about this general approach?
- Should I reuse IntoVal this way?
- Where should a practical example of Soroban contract fuzzing live?
- Where should a tutorial on Soroban contract fuzzing live?
Yes, I think so. IntoVal and TryIntoVal are intended for any conversion that needs to introduce an Env that isn't already available in the converting type.