@brson
Created January 8, 2023 23:16

Fuzzing Soroban smart contracts RFC

Recently I have been working on making it possible to fuzz Soroban contracts. Having some progress and experience now, I am writing up what I have learned and prototyped in hopes of getting feedback.

The work is currently done in the arbitrary branches of three forks:

The primary code for this work, and its documentation, is in the arbitrary module of soroban-sdk.

I am especially hoping for feedback on:

  • the design of the two traits involved
  • the ergonomics of writing fuzz tests for Soroban contracts
  • future priorities

But I am happy to receive any other feedback.

The scope

The problem I am initially focusing on is making it ergonomic for Soroban contract authors to write fuzz tests using the cargo-fuzz tool. Similar types of testing tools, most notably quickcheck and proptest, may be desirable to use with Soroban contracts. For the moment I am focused on cargo-fuzz, but this work may lead to compatibility with those other tools as well.

These fuzz tests run in the host Soroban environment, like Soroban unit tests, not the guest WASM environment. Fuzzing within WASM could be desirable but is a distant goal. Fuzzing of other parts of Soroban, in particular soroban-sdk and the interface between host and guest, may be desirable, but is not something I am focused on yet.

About cargo-fuzz

With cargo-fuzz one writes standalone programs that run a single fuzz test. These tests are instrumented by the cargo-fuzz tool, and run repeatedly, fed input bytes that progressively exercise more of the program as new branches are discovered. The effectiveness of this process depends on how the tests are crafted and how the input bytes are interpreted.

The cargo-fuzz driver feeds bytes to the test program, and the test program must usefully interpret these bytes. To help with this, cargo-fuzz includes a trait Arbitrary that accepts bytes and outputs Rust values for any type that implements it. My work so far has focused on implementing Arbitrary for Soroban contract types.
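To make "interpreting bytes" concrete, here is a minimal std-only sketch of the idea. `ByteSource` and `FromFuzzBytes` are hypothetical stand-ins for the `Unstructured` type and `Arbitrary` trait, not their real definitions, and `Transfer` is an invented input type. The real crate is more sophisticated, but the shape is similar: each type consumes some bytes, and composite types are built field by field.

```rust
/// Hypothetical stand-in for `Unstructured`: a cursor over the fuzzer's bytes.
pub struct ByteSource<'a> {
    data: &'a [u8],
}

impl<'a> ByteSource<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        ByteSource { data }
    }

    /// Take the next `n` bytes, or `None` if the input is exhausted.
    pub fn take(&mut self, n: usize) -> Option<&'a [u8]> {
        if self.data.len() < n {
            return None;
        }
        let (head, tail) = self.data.split_at(n);
        self.data = tail;
        Some(head)
    }
}

/// Hypothetical stand-in for the `Arbitrary` trait.
pub trait FromFuzzBytes: Sized {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Option<Self>;
}

impl FromFuzzBytes for u32 {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Option<Self> {
        let b = src.take(4)?;
        Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

impl FromFuzzBytes for bool {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Option<Self> {
        let b = src.take(1)?;
        Some(b[0] & 1 == 1)
    }
}

/// An invented structured input, built field by field as a derive would do.
#[derive(Debug, PartialEq)]
pub struct Transfer {
    pub amount: u32,
    pub urgent: bool,
}

impl FromFuzzBytes for Transfer {
    fn from_fuzz_bytes(src: &mut ByteSource<'_>) -> Option<Self> {
        Some(Transfer {
            amount: u32::from_fuzz_bytes(src)?,
            urgent: bool::from_fuzz_bytes(src)?,
        })
    }
}
```

In this sketch an input that is too short simply yields `None`, so the harness can skip it rather than report a failure.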

A fuzz test that receives arbitrary types as input might look like

#![no_main]

use libfuzzer_sys::fuzz_target;
use soroban_hello_world_contract::*;
use soroban_sdk::{symbol, vec, Env, Symbol};

fuzz_target!(|to: Symbol| {
    let env = Env::default();
    let contract_id = env.register_contract(None, HelloContract);
    let client = HelloContractClient::new(&env, &contract_id);

    let words = client.hello(&to);
    assert_eq!(words, vec![&env, symbol!("Hello"), to]);
});

If the fuzz test panics or otherwise crashes it is considered a failure.

The main problem

Most of the types that Soroban contracts might accept as input do not implement Arbitrary, so the obvious first step in making Soroban contracts fuzzable is to implement Arbitrary for every type that might be used as input to a contract.

A simple implementation of Arbitrary looks like

impl<'a> Arbitrary<'a> for BitSet {
    fn arbitrary(u: &mut Unstructured<'a>) -> arbitrary::Result<Self> {
        let bits = u64::arbitrary(u)?;
        let bits = bits & 0x0fff_ffff_ffff_ffff;

        let bitset = BitSet::try_from_u64(bits).expect("BitSet");

        Ok(bitset)
    }
}

Though most types can simply #[derive(Arbitrary)].
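The BitSet example follows a common pattern: narrow the raw input to the valid range first, then call the fallible constructor knowing it will succeed. The sketch below shows that pattern with a toy `BitSet`. It assumes the constructor rejects any value with bits set above the low 60, which is what the mask in the example suggests but is not confirmed here. The helper `bitset_from_raw` is hypothetical.

```rust
/// Toy stand-in for a type whose valid range is narrower than `u64`.
/// Assumption: only the low 60 bits may be set.
#[derive(Debug, PartialEq)]
pub struct BitSet(u64);

/// Mask selecting the low 60 bits, as in the example above.
pub const MASK: u64 = 0x0fff_ffff_ffff_ffff;

impl BitSet {
    /// Fallible constructor: rejects values with any of the top 4 bits set.
    pub fn try_from_u64(bits: u64) -> Result<BitSet, String> {
        if bits & !MASK != 0 {
            Err(format!("bits out of range: {bits:#x}"))
        } else {
            Ok(BitSet(bits))
        }
    }

    pub fn bits(&self) -> u64 {
        self.0
    }
}

/// Hypothetical helper: clamp the raw value, then construct.
/// After masking, the constructor cannot fail, so `expect` never fires.
pub fn bitset_from_raw(raw: u64) -> BitSet {
    BitSet::try_from_u64(raw & MASK).expect("masked value is always in range")
}
```

Without the mask, most random `u64` values would be rejected and the fuzzer would spend its time on inputs that never reach the contract.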

There are two barriers to implementing Arbitrary for Soroban contract types:

  1. Many Soroban types, including object types, cannot be constructed without access to an Env environment; and Arbitrary constructors do not have access to Env.
  2. Users can define their own types with #[contracttype] that can be serialized to storage or accepted as method arguments. These types should be able to implement Arbitrary.

Design - the SorobanArbitraryPrototype trait

I have prototyped a design that allows users to fuzz with arbitrary values of nearly all types a contract might accept as input. It requires two new traits, one to surmount each of the above barriers, defined in the arbitrary::api module of soroban-sdk.

The gist is that Soroban contract types all have a corresponding arbitrary prototype, defined by the SorobanArbitraryPrototype trait. This prototype does not require an Env, so it can be generated from random bytes. A prototype can be instantiated into an Env with the existing IntoVal trait.

The trait has a somewhat complex definition:

    pub trait SorobanArbitraryPrototype: IntoVal<Env, Self::Into> {
        type Into: IntoVal<Env, RawVal>
            + TryFromVal<Env, RawVal>;
    }

The main thing to understand here is that prototypes have an associated Into type that represents the final desired Soroban contract type, and that SorobanArbitraryPrototype implements IntoVal<Env, Self::Into> so that it can be converted to that type.

The IntoVal and TryFromVal bounds on SorobanArbitraryPrototype::Into are required because those bounds are also on the Vec and Map element types.

Some Soroban contract types that do not require Env are their own prototype, like Symbol.
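The prototype idea can be modeled without the SDK. In the std-only sketch below, `Env`, `Bytes`, `IntoVal` and `Prototype` are simplified stand-ins for the real items (the real `IntoVal` also takes the environment as a type parameter, and the real trait carries extra bounds). It shows a type that cannot exist without an `Env`, a plain-data prototype that can, and a type that is its own prototype.

```rust
use std::cell::RefCell;

/// Toy stand-in for the Soroban `Env`: owns all "host objects".
#[derive(Default)]
pub struct Env {
    objects: RefCell<Vec<Vec<u8>>>,
}

/// Toy stand-in for `Bytes`: only a handle into an `Env`,
/// so it cannot be constructed without one.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Bytes {
    handle: usize,
}

impl Bytes {
    pub fn new(env: &Env, data: &[u8]) -> Bytes {
        let mut objects = env.objects.borrow_mut();
        objects.push(data.to_vec());
        Bytes { handle: objects.len() - 1 }
    }

    pub fn to_vec(&self, env: &Env) -> Vec<u8> {
        env.objects.borrow()[self.handle].clone()
    }
}

/// Simplified stand-in for `IntoVal<Env, T>`.
pub trait IntoVal<T> {
    fn into_val(self, env: &Env) -> T;
}

/// Simplified stand-in for `SorobanArbitraryPrototype`.
pub trait Prototype: IntoVal<Self::Into> {
    type Into;
}

/// Prototype for `Bytes`: plain data, constructible from fuzzer bytes alone.
#[derive(Debug, Clone, PartialEq)]
pub struct ArbitraryBytes {
    pub vec: Vec<u8>,
}

impl IntoVal<Bytes> for ArbitraryBytes {
    fn into_val(self, env: &Env) -> Bytes {
        Bytes::new(env, &self.vec)
    }
}

impl Prototype for ArbitraryBytes {
    type Into = Bytes;
}

/// A type that needs no `Env` can be its own prototype.
impl IntoVal<u32> for u32 {
    fn into_val(self, _env: &Env) -> u32 {
        self
    }
}

impl Prototype for u32 {
    type Into = u32;
}
```

The fuzzer only ever produces `ArbitraryBytes` or `u32` values; the `Env` enters the picture inside the test body, at the `into_val` call.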

An easy example of how this trait is implemented is for Bytes:

    #[derive(Arbitrary, Debug)]
    pub struct ArbitraryBytes {
        vec: RustVec<u8>,
    }

    impl SorobanArbitraryPrototype for ArbitraryBytes {
        type Into = Bytes;
    }

    impl IntoVal<Env, Bytes> for ArbitraryBytes {
        fn into_val(self, env: &Env) -> Bytes {
            self.vec.into_val(env)
        }
    }

With this definition one could write a fuzz test that accepts ArbitraryBytes and converts it into Bytes:

#![no_main]

use libfuzzer_sys::fuzz_target;
use soroban_sdk::arbitrary::ArbitraryBytes;
use soroban_sdk::Bytes;
use soroban_sdk::{Env, IntoVal};

fuzz_target!(|input: ArbitraryBytes| {
    let env = Env::default();
    let input: Bytes = input.into_val(&env);
    // do something with `input`
});

But that is probably not how we would name the Bytes prototype in practice, because of the other trait, SorobanArbitrary.

Design - the SorobanArbitrary trait

Where SorobanArbitraryPrototype identifies a prototype that can be turned into a Soroban contract type, the SorobanArbitrary trait is implemented by the Soroban contract type itself, and its entire reason for existing is to provide an associated type that names the prototype:

    pub trait SorobanArbitrary {
        type Arbitrary: SorobanArbitraryPrototype;
    }

This solves two problems. The first is related to the previous example of naming the Bytes prototype: the programmer only needs to know the name of the contract type they want to generate, and that it has an Arbitrary associated type; i.e. nobody has to know the name of every arbitrary prototype to fuzz contracts, just that SorobanArbitrary::Arbitrary exists and gives them the name of every possible prototype.

So the previous Bytes fuzzer could be written

fuzz_target!(|input: <Bytes as SorobanArbitrary>::Arbitrary| {
    let env = Env::default();
    let input: Bytes = input.into_val(&env);
    // do something with `input`
});

This is more verbose, but it is consistent for every possible type. A similar consistency could be achieved through naming conventions, but I think using the type system is probably more idiomatic. For simple types like Symbol the associated type doesn't need to be used:

fuzz_target!(|input: Symbol| {
    // do something with `input`
});
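The benefit of naming prototypes through an associated type shows up in generic code. The std-only sketch below uses simplified stand-ins (`Env`, `IntoVal`, `Prototype`, `ContractArbitrary`, and a toy `Bytes`) rather than the real SDK items, and `instantiate` is a hypothetical helper. The caller names only the contract type; the prototype type is looked up through the trait.

```rust
/// Toy stand-in for the Soroban `Env`.
pub struct Env;

/// Simplified stand-in for `IntoVal<Env, T>`.
pub trait IntoVal<T> {
    fn into_val(self, env: &Env) -> T;
}

/// Simplified stand-in for `SorobanArbitraryPrototype`.
pub trait Prototype: IntoVal<Self::Into> {
    type Into;
}

/// Simplified stand-in for `SorobanArbitrary`: names a type's prototype.
pub trait ContractArbitrary {
    type Arbitrary: Prototype;
}

/// Toy `Bytes`; the real type is a host object, not a `Vec`.
#[derive(Debug, PartialEq)]
pub struct Bytes(pub Vec<u8>);

pub struct ArbitraryBytes {
    pub vec: Vec<u8>,
}

impl IntoVal<Bytes> for ArbitraryBytes {
    fn into_val(self, _env: &Env) -> Bytes {
        Bytes(self.vec)
    }
}
impl Prototype for ArbitraryBytes {
    type Into = Bytes;
}
impl ContractArbitrary for Bytes {
    type Arbitrary = ArbitraryBytes;
}

// A simple type is its own prototype.
impl IntoVal<u32> for u32 {
    fn into_val(self, _env: &Env) -> u32 {
        self
    }
}
impl Prototype for u32 {
    type Into = u32;
}
impl ContractArbitrary for u32 {
    type Arbitrary = u32;
}

/// Hypothetical helper: turn the prototype of `T` into a value,
/// without the caller ever spelling out the prototype's name.
pub fn instantiate<T: ContractArbitrary>(
    proto: T::Arbitrary,
    env: &Env,
) -> <T::Arbitrary as Prototype>::Into {
    proto.into_val(env)
}
```

A call such as `instantiate::<Bytes>(proto, &env)` reads the same for every contract type, which is the consistency argument made above.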

The second problem this solves is related to user-defined types.

Design - deriving Arbitrary for UDTs

Contract authors can define their own storage types with the contracttype macro:

#[contracttype]
pub struct State {
    pub count: u32,
    pub last_incr: u32,
}

The macro now automatically derives a corresponding prototype that looks roughly like:

#[derive(Arbitrary, Debug)]
pub struct ArbitraryState {
    pub count: <u32 as SorobanArbitrary>::Arbitrary,
    pub last_incr: <u32 as SorobanArbitrary>::Arbitrary,
}

The SorobanArbitrary::Arbitrary associated type provides a way to easily name the correct type for each field.
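A derived prototype also needs a conversion back to the user's type. The sketch below shows what that conversion could look like, written by hand with simplified std-only stand-ins instead of the real SDK traits. It is not the macro's actual output. A hypothetical `memo: Bytes` field is added to `State` so that at least one field has a prototype that differs from the field type.

```rust
/// Toy stand-in for the Soroban `Env`.
pub struct Env;

/// Simplified stand-in for `IntoVal<Env, T>`.
pub trait IntoVal<T> {
    fn into_val(self, env: &Env) -> T;
}

/// Simplified stand-in for `SorobanArbitrary` (bounds omitted for brevity).
pub trait ContractArbitrary {
    type Arbitrary;
}

/// Toy `Bytes` and its prototype.
#[derive(Debug, PartialEq)]
pub struct Bytes(pub Vec<u8>);

#[derive(Debug, Clone, PartialEq)]
pub struct ArbitraryBytes {
    pub vec: Vec<u8>,
}

impl IntoVal<Bytes> for ArbitraryBytes {
    fn into_val(self, _env: &Env) -> Bytes {
        Bytes(self.vec)
    }
}
impl ContractArbitrary for Bytes {
    type Arbitrary = ArbitraryBytes;
}

// `u32` is its own prototype.
impl IntoVal<u32> for u32 {
    fn into_val(self, _env: &Env) -> u32 {
        self
    }
}
impl ContractArbitrary for u32 {
    type Arbitrary = u32;
}

/// The user-defined contract type (`memo` is a hypothetical extra field).
#[derive(Debug, PartialEq)]
pub struct State {
    pub count: u32,
    pub last_incr: u32,
    pub memo: Bytes,
}

/// The prototype: every field is replaced by that field's own prototype.
#[derive(Debug, Clone, PartialEq)]
pub struct ArbitraryState {
    pub count: <u32 as ContractArbitrary>::Arbitrary,
    pub last_incr: <u32 as ContractArbitrary>::Arbitrary,
    pub memo: <Bytes as ContractArbitrary>::Arbitrary,
}

/// Field-by-field conversion, as a macro could plausibly generate it.
impl IntoVal<State> for ArbitraryState {
    fn into_val(self, env: &Env) -> State {
        State {
            count: self.count.into_val(env),
            last_incr: self.last_incr.into_val(env),
            memo: self.memo.into_val(env),
        }
    }
}
impl ContractArbitrary for State {
    type Arbitrary = ArbitraryState;
}
```

Because each field is named through the associated type, the generated struct does not need to know which fields require an `Env` and which do not.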

Why not use the existing XDR types as prototypes?

Since the contract types are just reflections of the XDR types, which also don't require an Env, it might seem obvious to generate those XDR types and convert them to the contract types.

This would be reasonable, but I found a few reasons that dissuaded me from that path.

First, it seems desirable for contract authors to write contracts in terms of the contract types, and not think about the serialized form of the type and why they should be converting between the two to write fuzz tests.

Second, there are some types for which the allowed values of the XDR types and the allowed values of the contract types are not the same. An example is Symbol, which in the XDR definitions is a typedef of StringM<10>, allowing characters that are not allowed in the contract Symbol type. Another is Status, which in XDR has a limited set of error codes allowed for each error type variant, but in contract code allows all error codes to be paired with all error types. These could be considered bugs.
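The Symbol mismatch can be illustrated with two validity checks. The sketch below assumes that a contract Symbol accepts only the characters a-z, A-Z, 0-9 and `_`. That character set is an assumption about the SDK, not something stated above; only the length limit of 10 comes from the `StringM<10>` definition. All function names are hypothetical, and `sanitize` is just one possible way to map arbitrary bytes into the narrower set.

```rust
/// Length limit taken from the `StringM<10>` typedef mentioned above.
pub const MAX_LEN: usize = 10;

/// What a `StringM<10>`-style type accepts: any bytes, up to 10 of them.
pub fn valid_xdr_symbol(bytes: &[u8]) -> bool {
    bytes.len() <= MAX_LEN
}

/// Assumed contract-side rule: same length limit, restricted alphabet.
pub fn valid_contract_symbol(bytes: &[u8]) -> bool {
    bytes.len() <= MAX_LEN
        && bytes.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'_')
}

/// One possible way to derive a valid symbol from arbitrary bytes:
/// truncate to the limit and fold every byte into the allowed alphabet.
pub fn sanitize(bytes: &[u8]) -> String {
    const ALPHABET: &[u8] =
        b"_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    bytes
        .iter()
        .take(MAX_LEN)
        .map(|b| ALPHABET[*b as usize % ALPHABET.len()] as char)
        .collect()
}
```

Under these assumptions, a value such as `he llo!` passes the XDR-style check but fails the contract-side one, so a fuzzer that generated XDR values directly would produce inputs that cannot be converted.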

In some of the Arbitrary implementations I have written, namely Static and Status, I do first create the corresponding XDR type and convert it to the contract type, but I don't expose that detail in the interface.

Questions

This is pretty long already. For additional, less-organized thoughts I've had on this subject, see this other gist.

Feedback that would be helpful:

  • Concerns about this general approach?
  • Should I reuse IntoVal this way?
  • Where should a practical example of Soroban contract fuzzing live?
  • Where should a tutorial on Soroban contract fuzzing live?
@leighmcculloch

Should I reuse IntoVal this way?

Yes, I think so. IntoVal and TryIntoVal are intended for any conversion that needs to introduce an Env that isn't already available in the converting type.


leighmcculloch commented Jan 9, 2023

Where should a practical example of Soroban contract fuzzing live?

I think we should have a minimal example in the stellar/rs-soroban-sdk repo, as a test vector in the tests/ directory. We have contracts in two places in this repo:

  • tests/ - Test vectors that compile as standalone programs.
  • soroban-sdk/src/tests/ - Test contracts that aren't compiled as standalone programs.

Where should a tutorial on Soroban contract fuzzing live?

Then I think we should have a more complete example in the stellar/soroban-examples repo. Ideally that example would build on the increment example that is used in other examples, but if that's not a good fit, let's not force it, and it can be a new contract.

Thoughts @paulbellamy?
