brson/soroban-fuzzing-rfc.md

# Fuzzing Soroban smart contracts RFC

Recently I have been working on making it possible to fuzz Soroban contracts.
Having made some progress and gained some experience, I am writing up what I
have learned and prototyped in hopes of getting feedback.

The work is currently done in the `arbitrary` branches of three forks:

- rs-soroban-sdk
- rs-soroban-env
- soroban-examples

The primary code for this work, and its documentation,
is in the `arbitrary` module of `soroban-sdk`.

I am especially hoping for feedback on:

- the design of the two traits involved
- the ergonomics of writing fuzz tests for Soroban contracts
- future priorities

I am happy to receive any other feedback as well.

## The scope

The problem I am initially focusing on is making it ergonomic for Soroban
contract authors to write fuzz tests using the cargo-fuzz tool. Similar
types of testing tools, most notably quickcheck and proptest, may be
desirable to use with Soroban contracts. For the moment I am focused on
cargo-fuzz, but this work may lead to compatibility with those other tools as
well.

These fuzz tests run in the host Soroban environment, like Soroban unit tests,
not the guest WASM environment. Fuzzing within WASM could be desirable but is a
distant goal. Fuzzing of other parts of Soroban, in particular soroban-sdk and
the interface between host and guest, may be desirable, but is not something I
am focused on yet.

## About cargo-fuzz

With cargo-fuzz one writes standalone programs that run a single fuzz test.
These tests are instrumented by the cargo-fuzz tool, and run repeatedly, fed
input bytes that progressively exercise more of the program as new branches are
discovered. The effectiveness of this process depends on how the tests are
crafted and how the input bytes are interpreted.

The cargo-fuzz driver feeds bytes to the test program, and the test program
must usefully interpret these bytes. To help with this, cargo-fuzz works with
the `Arbitrary` trait (from the `arbitrary` crate), which consumes bytes and
produces Rust values for any type that implements it. My work so far has
focused on implementing `Arbitrary` for Soroban contract types.
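
To make "usefully interpreting bytes" concrete, here is a minimal sketch of the idea behind `Arbitrary`, using only the standard library. `ByteSource`, `FromFuzzBytes`, and `Transfer` are hypothetical stand-ins invented for this illustration; the real crate's `Unstructured` and `Arbitrary` are richer than this.

```rust
/// Hypothetical stand-in for the byte cursor the real crate calls `Unstructured`.
pub struct ByteSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        ByteSource { data }
    }

    /// Consume one byte, or yield 0 once the fuzzer input is exhausted.
    pub fn next_u8(&mut self) -> u8 {
        match self.data.split_first() {
            Some((first, rest)) => {
                self.data = rest;
                *first
            }
            None => 0,
        }
    }
}

/// Hypothetical stand-in for the `Arbitrary` trait: build a value from raw bytes.
pub trait FromFuzzBytes: Sized {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Self;
}

impl FromFuzzBytes for u32 {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Self {
        // Four input bytes become one little-endian integer.
        let bytes = [src.next_u8(), src.next_u8(), src.next_u8(), src.next_u8()];
        u32::from_le_bytes(bytes)
    }
}

impl FromFuzzBytes for bool {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Self {
        // Only the lowest bit of the next byte matters.
        src.next_u8() & 1 == 1
    }
}

/// A structured input type; its fields are filled in declaration order.
#[derive(Debug, PartialEq)]
pub struct Transfer {
    pub amount: u32,
    pub urgent: bool,
}

impl FromFuzzBytes for Transfer {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Self {
        Transfer {
            amount: u32::from_fuzz_bytes(src),
            urgent: bool::from_fuzz_bytes(src),
        }
    }
}
```

Each new byte string from the fuzzer maps deterministically to one structured value, which is what lets coverage feedback steer the inputs.
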
A fuzz test that receives arbitrary types as input might look like

    #![no_main]

    use libfuzzer_sys::fuzz_target;
    use soroban_hello_world_contract::*;
    use soroban_sdk::{symbol, vec, Env, Symbol};

    // The fuzzer supplies an arbitrary `Symbol` for each run.
    fuzz_target!(|to: Symbol| {
        // Register the contract in a fresh test environment and create a client.
        let env = Env::default();
        let contract_id = env.register_contract(None, HelloContract);
        let client = HelloContractClient::new(&env, &contract_id);

        // The contract must greet whatever symbol it was given.
        let words = client.hello(&to);
        assert_eq!(words, vec![&env, symbol!("Hello"), to]);
    });

If the fuzz test panics or otherwise crashes it is considered a failure.

## The main problem

Most of the types that Soroban contracts might accept as input do not implement
Arbitrary, so the obvious first step in making Soroban contracts fuzzable is
to implement Arbitrary for every type that might be used as input to a
contract.

A simple implementation of `Arbitrary` looks like

    impl<'a> Arbitrary<'a> for BitSet {
        fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
            // Start from any u64, then clear the top four bits so the value
            // is always accepted by the fallible constructor below.
            let bits = u64::arbitrary(u)?;
            let bits = bits & 0x0fff_ffff_ffff_ffff;

            let bitset = BitSet::try_from_u64(bits).expect("BitSet");

            Ok(bitset)
        }
    }

Though most types can simply `#[derive(Arbitrary)]`.

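
The masking step in the hand-written implementation above is an instance of a general pattern: take raw fuzzer input, clamp it into the type's valid domain, then call the fallible constructor knowing it cannot fail. Here is a stand-alone sketch of that pattern. `BitSet60` and its 60-bit limit are hypothetical, inferred only from the mask constant in the snippet, and are not the SDK's actual type.

```rust
/// Hypothetical stand-in for a bit set whose payload is limited to 60 bits.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct BitSet60(u64);

/// Mask selecting the low 60 bits, the same constant as in the snippet above.
const PAYLOAD_MASK: u64 = 0x0fff_ffff_ffff_ffff;

impl BitSet60 {
    /// Fallible constructor: rejects values that use any of the top 4 bits.
    pub fn try_from_u64(bits: u64) -> Result<Self, &'static str> {
        if bits & !PAYLOAD_MASK == 0 {
            Ok(BitSet60(bits))
        } else {
            Err("value does not fit in 60 bits")
        }
    }
}

/// Turn any raw `u64` from the fuzzer into a valid `BitSet60`.
pub fn bitset_from_raw(raw: u64) -> BitSet60 {
    // Clamp first, so the constructor below can never reject the value.
    let bits = raw & PAYLOAD_MASK;
    BitSet60::try_from_u64(bits).expect("masked value always fits")
}
```

Clamping instead of rejecting means no fuzzer input is wasted on values the constructor would refuse.
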
There are two barriers to implementing `Arbitrary` for Soroban contract types:

1. Many Soroban types, including object types, cannot be constructed without
   access to an `Env` environment; and `Arbitrary` constructors do not have
   access to `Env`.
2. Users can define their own types with `#[contracttype]` that can be
   serialized to storage or accepted as method arguments.
   These types should be able to implement `Arbitrary`.

## Design - the `SorobanArbitraryPrototype` trait

I have prototyped a design that allows users to fuzz with arbitrary values of
nearly all types a contract might accept as input. It requires two new traits,
one to surmount each of the above barriers, defined in the `arbitrary::api`
module of `soroban-sdk`.

The gist is that Soroban contract types all have a corresponding arbitrary
prototype, defined by the `SorobanArbitraryPrototype` trait. This prototype
does not require an `Env`, so it can be generated from random bytes. A prototype
can be instantiated into an `Env` with the existing `IntoVal` trait.

The trait has a somewhat complex definition:

    pub trait SorobanArbitraryPrototype: IntoVal<Env, Self::Into> {
        type Into: IntoVal<Env, RawVal>
            + TryFromVal<Env, RawVal>;
    }

The main thing to understand here is that prototypes have an associated `Into`
type that represents the final desired Soroban contract type, and that
`SorobanArbitraryPrototype` implements `IntoVal<Env, Self::Into>` so that it can
be converted to that type.

The `IntoVal` and `TryFromVal` bounds on `SorobanArbitraryPrototype::Into` are
required because the `Vec` and `Map` types place those same bounds on their
element types.

Some Soroban contract types that do not require `Env` are their
own prototype, like `Symbol`.
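
The relationship between a prototype and an environment can be modelled without the SDK. The sketch below uses hypothetical stand-ins (`MockEnv`, `IntoHostVal`, `Prototype`, `HostBytes`, `ProtoBytes`) that only mirror the shape of `Env`, `IntoVal`, and `SorobanArbitraryPrototype`. It shows one type that needs the environment to exist and one that is its own prototype.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for `Env`: a host that owns objects and hands out handles.
#[derive(Default)]
pub struct MockEnv {
    objects: RefCell<Vec<Vec<u8>>>,
}

/// Hypothetical stand-in for `IntoVal<Env, T>`.
pub trait IntoHostVal<T> {
    fn into_host_val(self, env: &MockEnv) -> T;
}

/// Hypothetical stand-in for `SorobanArbitraryPrototype`.
pub trait Prototype: IntoHostVal<Self::Into> + Sized {
    type Into;
}

/// A host-side value: just a handle, meaningless without its environment.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct HostBytes {
    handle: usize,
}

impl HostBytes {
    /// Reading the value requires going back through the environment.
    pub fn len(&self, env: &MockEnv) -> usize {
        env.objects.borrow()[self.handle].len()
    }
}

/// The prototype: plain data that can be generated with no environment at all.
pub struct ProtoBytes {
    pub vec: Vec<u8>,
}

impl IntoHostVal<HostBytes> for ProtoBytes {
    fn into_host_val(self, env: &MockEnv) -> HostBytes {
        // Move the data into the host and return a handle to it.
        let mut objects = env.objects.borrow_mut();
        objects.push(self.vec);
        HostBytes { handle: objects.len() - 1 }
    }
}

impl Prototype for ProtoBytes {
    type Into = HostBytes;
}

/// A type that never needed the environment is its own prototype.
impl IntoHostVal<u32> for u32 {
    fn into_host_val(self, _env: &MockEnv) -> u32 {
        self
    }
}

impl Prototype for u32 {
    type Into = u32;
}

/// Generic helper: instantiate any prototype into its final type.
pub fn instantiate<P: Prototype>(proto: P, env: &MockEnv) -> P::Into {
    proto.into_host_val(env)
}
```

In this model the identity conversion for `u32` plays the role that `Symbol` plays in the real design.
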

An easy example of how this trait is implemented is for `Bytes`:

    #[derive(Arbitrary, Debug)]
    pub struct ArbitraryBytes {
        vec: RustVec<u8>,
    }

    impl SorobanArbitraryPrototype for ArbitraryBytes {
        type Into = Bytes;
    }

    impl IntoVal<Env, Bytes> for ArbitraryBytes {
        fn into_val(self, env: &Env) -> Bytes {
            self.vec.into_val(env)
        }
    }

With this definition one could write a fuzz test
that accepts `ArbitraryBytes` and converts it into `Bytes`:

    #![no_main]

    use libfuzzer_sys::fuzz_target;
    use soroban_sdk::arbitrary::ArbitraryBytes;
    use soroban_sdk::Bytes;
    use soroban_sdk::{Env, IntoVal};

    fuzz_target!(|input: ArbitraryBytes| {
        let env = Env::default();
        // Name the target type so the right `IntoVal` impl is selected.
        let input: Bytes = input.into_val(&env);
        // do something with `input`
    });

But that's probably not quite how we would identify
the `Bytes` prototype because of the other trait,
`SorobanArbitrary`.

## Design - the `SorobanArbitrary` trait

Where `SorobanArbitraryPrototype` identifies a prototype that can be turned into
a Soroban contract type, the `SorobanArbitrary` trait is implemented by the
Soroban contract type itself, and its entire reason for existing is to provide
an associated type that names the prototype:

    pub trait SorobanArbitrary {
        type Arbitrary: SorobanArbitraryPrototype;
    }

This solves two problems. The first is related to the previous example of naming
the `Bytes` prototype: the programmer only needs to know the name of the
contract type they want to generate, and that it has an `Arbitrary` associated
type; i.e. nobody has to know the name of every arbitrary prototype to fuzz
contracts, just that `SorobanArbitrary::Arbitrary` exists and gives them the
name of the prototype for any contract type.

So the previous `Bytes` fuzzer could be written

    fuzz_target!(|input: <Bytes as SorobanArbitrary>::Arbitrary| {
        let env = Env::default();
        // Name the target type so the right `IntoVal` impl is selected.
        let input: Bytes = input.into_val(&env);
        // do something with `input`
    });

This is more verbose, but it is consistent for every possible type.
A similar consistency could be achieved through naming conventions,
but I think using the type system is probably more idiomatic.

For simple types like `Symbol` the associated type doesn't need to be used:

    fuzz_target!(|input: Symbol| {
        // do something with `input`
    });

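
The naming mechanism can be tried out in isolation. The following sketch uses hypothetical stand-ins (`MockEnv`, `IntoHostVal`, `Prototype`, `HasPrototype`, `HostBytes`, `ProtoBytes`) in place of the SDK items. The `from_prototype` helper and its `Into = T` where-clause are additions for illustration and are not part of the prototyped API.

```rust
/// Hypothetical stand-in for `Env`.
pub struct MockEnv;

/// Hypothetical stand-in for `IntoVal<Env, T>`.
pub trait IntoHostVal<T> {
    fn into_host_val(self, env: &MockEnv) -> T;
}

/// Hypothetical stand-in for `SorobanArbitraryPrototype`.
pub trait Prototype: IntoHostVal<Self::Into> + Sized {
    type Into;
}

/// Hypothetical stand-in for `SorobanArbitrary`: names the prototype of a type.
pub trait HasPrototype {
    type Arbitrary: Prototype;
}

/// Stand-in contract type and its separately named prototype.
#[derive(Debug, PartialEq)]
pub struct HostBytes(pub Vec<u8>);

pub struct ProtoBytes {
    pub vec: Vec<u8>,
}

impl IntoHostVal<HostBytes> for ProtoBytes {
    fn into_host_val(self, _env: &MockEnv) -> HostBytes {
        HostBytes(self.vec)
    }
}

impl Prototype for ProtoBytes {
    type Into = HostBytes;
}

impl HasPrototype for HostBytes {
    type Arbitrary = ProtoBytes;
}

/// A simple type that is its own prototype.
impl IntoHostVal<u32> for u32 {
    fn into_host_val(self, _env: &MockEnv) -> u32 {
        self
    }
}

impl Prototype for u32 {
    type Into = u32;
}

impl HasPrototype for u32 {
    type Arbitrary = u32;
}

/// Illustrative helper: the caller names only the contract type `T`;
/// the prototype type is looked up through the associated type.
pub fn from_prototype<T>(proto: <T as HasPrototype>::Arbitrary, env: &MockEnv) -> T
where
    T: HasPrototype,
    T::Arbitrary: Prototype<Into = T>,
{
    proto.into_host_val(env)
}
```

A caller can write `let bytes: HostBytes = from_prototype(proto, &env);` without ever spelling `ProtoBytes`, which is the consistency argument made above.
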
The second problem this solves is related to user-defined types.

## Design - deriving `Arbitrary` for UDTs

Contract authors can define their own storage types with the `contracttype` macro:

    #[contracttype]
    pub struct State {
        pub count: u32,
        pub last_incr: u32,
    }

The macro now automatically derives a corresponding prototype that looks roughly like:

    #[derive(Arbitrary, Debug)]
    pub struct ArbitraryState {
        pub count: <u32 as SorobanArbitrary>::Arbitrary,
        pub last_incr: <u32 as SorobanArbitrary>::Arbitrary,
    }

The `SorobanArbitrary::Arbitrary` associated type provides
a way to easily name the correct type for each field.
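
Besides the prototype struct itself, a derived prototype also needs a conversion into the user's type. The sketch below shows what a field-by-field conversion could look like. It uses hypothetical stand-ins (`MockEnv`, `IntoHostVal`, `Prototype`, `HasPrototype`) for the SDK items, and it is a guess at the shape of the generated code rather than the macro's actual output.

```rust
/// Hypothetical stand-in for `Env`.
pub struct MockEnv;

/// Hypothetical stand-in for `IntoVal<Env, T>`.
pub trait IntoHostVal<T> {
    fn into_host_val(self, env: &MockEnv) -> T;
}

/// Hypothetical stand-in for `SorobanArbitraryPrototype`.
pub trait Prototype: IntoHostVal<Self::Into> + Sized {
    type Into;
}

/// Hypothetical stand-in for `SorobanArbitrary`.
pub trait HasPrototype {
    type Arbitrary: Prototype;
}

/// `u32` needs no environment, so it is its own prototype.
impl IntoHostVal<u32> for u32 {
    fn into_host_val(self, _env: &MockEnv) -> u32 {
        self
    }
}

impl Prototype for u32 {
    type Into = u32;
}

impl HasPrototype for u32 {
    type Arbitrary = u32;
}

/// The user-defined type, as written by the contract author.
#[derive(Debug, PartialEq)]
pub struct State {
    pub count: u32,
    pub last_incr: u32,
}

/// The prototype: every field is replaced by that field's own prototype.
pub struct ArbitraryState {
    pub count: <u32 as HasPrototype>::Arbitrary,
    pub last_incr: <u32 as HasPrototype>::Arbitrary,
}

/// Conversion proceeds field by field, delegating to each field's conversion.
impl IntoHostVal<State> for ArbitraryState {
    fn into_host_val(self, env: &MockEnv) -> State {
        State {
            count: self.count.into_host_val(env),
            last_incr: self.last_incr.into_host_val(env),
        }
    }
}

impl Prototype for ArbitraryState {
    type Into = State;
}

impl HasPrototype for State {
    type Arbitrary = ArbitraryState;
}
```

Because `State` itself gets a `HasPrototype` implementation, it can in turn appear as a field of another user-defined type and the same scheme applies recursively.
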

## Why not use the existing XDR types as prototypes?

Since the contract types are just reflections of the XDR types, which also don't
require an Env, it might seem obvious to generate those XDR types and convert
them to the contract types.

This would be reasonable, but I found a few reasons that dissuaded me from that path.

First, it seems desirable for contract authors to write contracts in terms of
the contract types, and not think about the serialized form of the type and why
they should be converting between the two to write fuzz tests.

Second, there are some types for which the allowed values of the XDR types and
the allowed values of the contract types are not the same. An example is
`Symbol`, which in the XDR definitions is a typedef of `StringM<10>`, allowing
characters that are not allowed in the contract `Symbol` type. Another is
`Status`, which in XDR has a limited set of error codes allowed for each error
type variant, but in contract code allows all error codes to be paired with all
error types. These could be considered bugs.

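
The `Symbol` mismatch can be illustrated with two validity checks. This sketch assumes the contract-side symbol accepts only ASCII letters, digits, and underscore. That character set is an assumption made for illustration, not something stated above, and all function names are hypothetical. The last function shows one way a dedicated prototype could map raw bytes so that every generated value is valid for the contract type.

```rust
/// Length limit taken from the `StringM<10>` typedef mentioned above.
const MAX_SYMBOL_LEN: usize = 10;

/// What the XDR-level type enforces in this sketch: only the length limit.
pub fn xdr_accepts(bytes: &[u8]) -> bool {
    bytes.len() <= MAX_SYMBOL_LEN
}

/// What the contract-level type enforces in this sketch.
/// ASSUMPTION: only `a-z`, `A-Z`, `0-9` and `_` are allowed.
pub fn contract_accepts(bytes: &[u8]) -> bool {
    xdr_accepts(bytes)
        && bytes
            .iter()
            .all(|b| b.is_ascii_alphanumeric() || *b == b'_')
}

/// Map arbitrary raw bytes onto the assumed alphabet, truncating to the
/// length limit, so the result always passes `contract_accepts`.
pub fn symbol_bytes_from_raw(raw: &[u8]) -> Vec<u8> {
    const ALPHABET: &[u8] =
        b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";
    raw.iter()
        .take(MAX_SYMBOL_LEN)
        .map(|b| ALPHABET[*b as usize % ALPHABET.len()])
        .collect()
}
```

Under these assumptions, a string such as `he llo!` passes the XDR-level check but fails the contract-level one, which is the kind of gap a prototype generated from the XDR type would have to handle.
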
In some of the `Arbitrary` implementations I have written, namely `Static` and
`Status`, I do first create the corresponding XDR type and convert it to the
contract type, but I don't expose that detail in the interface.

## Questions

This is pretty long already. For additional, less-organized thoughts I've had
on this subject, see this other gist.

Feedback that would be helpful:

- Concerns about this general approach?
- Should I reuse `IntoVal` this way?
- Where should a practical example of Soroban contract fuzzing live?
- Where should a tutorial on Soroban contract fuzzing live?