mayoff/libfuzzer.md

Using LLVM's libfuzzer with Swift

Background

Of the many tools available for fuzzing, I've found that LLVM's libfuzzer is the easiest to use for Swift on macOS. Other tools seem to have lists of requirements that are difficult to meet. libfuzzer, on the other hand, is built into LLVM and is available on macOS in the custom Swift toolchains: https://www.swift.org/download/
In this document I'll describe how to use libfuzzer with Swift and Swift Packages.
I used this setup to fuzz an SVG Renderer package that I am building. I was able to find and fix a number of bugs in my SVG parsing code using libfuzzer in basically no time at all.
How to use libfuzzer

libfuzzer is a special LLVM sanitizer mode that you enable with a compiler flag when building your project. libfuzzer then repeatedly calls a C function you provide, named LLVMFuzzerTestOneInput, passing it a different fuzzed input each time. In my case I used this function to parse SVG documents and render them to (offscreen) images. The built product is a command line tool that acts as a front end to libfuzzer: you pass flags to libfuzzer directly on its command line.
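To make that calling pattern concrete, here is a minimal sketch that runs without libfuzzer at all: an ordinary Swift function with the same shape as the entry point, plus a hypothetical driver, callTarget(with:), that hands it one input at a time the way libfuzzer does in its loop over mutated data. The acceptance rule is a made-up toy. Per the libFuzzer documentation, the entry point should return 0, and returning -1 asks libfuzzer not to add that input to the corpus.

```swift
/// Stand-in with the same shape as the C entry point libfuzzer calls:
///   int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);
/// It is a plain Swift function here (no @_cdecl), so this sketch runs
/// without linking libfuzzer. The acceptance rule is a toy.
func testOneInput(_ start: UnsafeRawPointer, _ count: Int) -> CInt {
    let bytes = UnsafeRawBufferPointer(start: start, count: count)
    // Pretend only inputs starting with "<" parse successfully.
    return bytes.first == UInt8(ascii: "<") ? 0 : -1
}

/// Hypothetical driver: hands one input to the target, as libfuzzer does
/// for each input it generates.
func callTarget(with input: [UInt8]) -> CInt {
    // The target takes a non-optional pointer, but an empty array may not
    // have a base address, so copy into a buffer of at least one byte.
    let buffer = UnsafeMutableRawBufferPointer.allocate(
        byteCount: max(input.count, 1), alignment: 1)
    defer { buffer.deallocate() }
    buffer.copyBytes(from: input)
    return testOneInput(UnsafeRawPointer(buffer.baseAddress!), input.count)
}

let inputs: [[UInt8]] = [Array("<svg/>".utf8), [], Array("hello".utf8)]
let results = inputs.map(callTarget(with:))
print(results) // [0, -1, -1]
```

The real target differs only in being exported with @_cdecl and doing real work; libfuzzer supplies the loop and the inputs.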
Swift Toolchain

libfuzzer is not included in Xcode's LLVM toolchain, but it is included in the custom toolchains available on swift.org. Download and install the latest macOS Swift toolchain. (I installed the latest Xcode 14.3-compatible toolchain.)
Swift Package Manager

To integrate fuzzing with my project I created a custom executable target in my Package.swift. The actual logic of my SVG parser lives in the sibling SVG target, which is listed as a dependency of the fuzzer target.
```swift
.executableTarget(
    name: "SVGFuzzer",
    dependencies: [
        "SVG",
    ]
),
```
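That fragment sits inside the targets array of the manifest. For context, a complete Package.swift might look like the sketch below. The package name, tools version, platforms setting and library product are assumptions, not details from the original project; macOS 13 simply mirrors the @available(macOS 13, *) annotation on the fuzz function.

```swift
// swift-tools-version:5.7
// Sketch only: package name, platform and products are assumptions.
import PackageDescription

let package = Package(
    name: "SVG",
    platforms: [.macOS(.v13)],
    products: [
        .library(name: "SVG", targets: ["SVG"]),
    ],
    targets: [
        // The library under test (SVG parsing and rendering).
        .target(name: "SVG"),
        // The fuzzing front end: contains SVGFuzzer.swift only.
        .executableTarget(
            name: "SVGFuzzer",
            dependencies: [
                "SVG",
            ]
        ),
    ]
)
```

Keeping the fuzzer in its own executable target means the library and its regular tests build normally; only this target needs the fuzzer flags.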
Swift LLVMFuzzerTestOneInput

The executable target contains one source file: SVGFuzzer.swift. This file contains the fuzzing function and uses the @_cdecl attribute to export it to libfuzzer under the expected C name.
This source file also contains a main entry point. Note that this entry point is only used when NOT building against libfuzzer, because libfuzzer provides its own main. To control this I define a FUZZ compilation condition that I pass in from the command line (note to self: this may not be necessary).
One important thing to note is that your Swift code needs to be wrapped in an autoreleasepool. libfuzzer calls your fuzzing function repeatedly in the same process, and without a pool, autoreleased objects accumulate across runs instead of being freed after each one. Your fuzzing sessions will quickly run out of memory if you don't do this. Similarly, it is best to test for and fix leaks before running libfuzzer.
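```swift
import Foundation
import SVG
import SwiftUI

// libfuzzer calls this once per generated input. @_cdecl exports it under the
// C name libfuzzer expects: int LLVMFuzzerTestOneInput(const uint8_t *, size_t).
@available(macOS 13, *)
@_cdecl("LLVMFuzzerTestOneInput")
@MainActor
public func testOneInput(_ start: UnsafeRawPointer, _ count: Int) -> CInt {
    // Drain autoreleased objects after every input so memory doesn't
    // accumulate over thousands of runs. autoreleasepool returns the
    // closure's value.
    autoreleasepool {
        // View the input bytes without copying them.
        let bytes = UnsafeRawBufferPointer(start: start, count: count)
        do {
            // Parse the fuzzed bytes, then render so the renderer is exercised too.
            let document = try SVGDocument(data: Data(bytes))
            _ = Image(svg: document, size: CGSize(width: 1, height: 1)).nsImage
            return 0
        }
        catch {
            // Parse errors are expected for most fuzzed inputs. Returning -1
            // tells libfuzzer not to add this input to the corpus.
            return -1
        }
    }
}

// Only compiled when FUZZ is not defined, i.e. when not building against
// libfuzzer, which supplies its own main.
#if !FUZZ
@main
enum SVGFuzzer {
    static func main() {
        print("You should not run SVGFuzzer.main")
    }
}
#endif
```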
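Two helpers in the spirit of that advice, as a sketch rather than part of the original project. withAutoreleasePoolIfAvailable (a hypothetical name) wraps work in autoreleasepool on Apple platforms and simply runs it elsewhere. isReleased is a cheap weak-reference leak check you could run on the objects your parser builds before starting a long fuzzing session. It only catches objects kept alive by strong reference cycles or stray references, so it complements rather than replaces tools like Instruments or leaks. Node and LeakyNode are hypothetical types.

```swift
#if canImport(ObjectiveC)
import ObjectiveC
#endif

/// Runs `body` inside an autorelease pool where one exists (Apple platforms),
/// so autoreleased objects from one input are freed before the next.
/// Hypothetical helper name.
func withAutoreleasePoolIfAvailable<T>(_ body: () throws -> T) rethrows -> T {
    #if canImport(ObjectiveC)
    return try autoreleasepool(invoking: body)
    #else
    return try body()
    #endif
}

/// Returns true if the object built by `make` is deallocated once nothing
/// holds a strong reference to it. A strong reference cycle makes it false.
func isReleased<T: AnyObject>(_ make: () -> T) -> Bool {
    weak var weakObject: T?
    withAutoreleasePoolIfAvailable {
        let object = make()
        weakObject = object
    }
    return weakObject == nil
}

/// Hypothetical tree node: parent links are weak, so no cycle forms.
final class Node {
    var children: [Node] = []
    weak var parent: Node?
}

/// Hypothetical buggy variant: a strong back-reference creates a cycle.
final class LeakyNode {
    var children: [LeakyNode] = []
    var parent: LeakyNode?
}

let treeIsReleased = isReleased { () -> Node in
    let root = Node()
    let child = Node()
    child.parent = root
    root.children.append(child)
    return root
}
let leakyTreeIsReleased = isReleased { () -> LeakyNode in
    let root = LeakyNode()
    let child = LeakyNode()
    child.parent = root
    root.children.append(child)
    return root
}
print(treeIsReleased, leakyTreeIsReleased) // true false
```

In a real check, make would parse a sample document with your library and return the root object it produces.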
Building

I built my project with the following command line. The -parse-as-library flag tells the compiler not to treat the source file as a script with top-level code, so it does not generate a main entry point of its own; as noted above, libfuzzer provides main.
Note that the --toolchain flag selects the toolchain downloaded and installed previously.
```
xcrun --toolchain swift swift build -v -c debug -Xswiftc -DFUZZ -Xswiftc -sanitize=fuzzer,address -Xswiftc -parse-as-library
```
Running

Once built, you can run the fuzzer with the following command line (run from a Fuzzing subdirectory of the package, hence the ../.build path).
```
../.build/debug/SVGFuzzer -fork=16 -ignore_ooms=1 -ignore_timeouts=1 -dict=XML.dict Corpus-New Corpus
```
See the libfuzzer documentation for more information on the various ways you can invoke libfuzzer. One thing to note is that I provide two corpus directories. The first corpus directory passed to libfuzzer is the one where it saves newly generated test cases. The second corpus directory contains a few hundred existing test cases (in my case, sample SVG files I've been testing against manually plus all the SVG files from a standard SVG test suite). libfuzzer mutates its existing corpus to generate new test cases. You can choose not to provide a corpus, in which case libfuzzer starts from purely random inputs, but it is much more effective when it has an existing corpus to work with.
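Here the seed corpus is simply a folder of sample SVG files. If your seed inputs come from somewhere else, a small Swift script along these lines can write them out, one input per file. This is a sketch: the function name, the file naming scheme and the seed strings are hypothetical.

```swift
import Foundation

/// Writes each distinct seed to its own file in `directory`. libfuzzer treats
/// every file in a corpus directory as one input, so the names are arbitrary;
/// numbering them is a hypothetical choice. Returns the number of files written.
@discardableResult
func writeSeedCorpus(_ seeds: [String], to directory: URL) throws -> Int {
    try FileManager.default.createDirectory(
        at: directory, withIntermediateDirectories: true)
    var seen = Set<String>()
    var written = 0
    for seed in seeds {
        // Skip exact duplicates: they add nothing to the corpus.
        guard seen.insert(seed).inserted else { continue }
        let file = directory.appendingPathComponent("seed-\(written)")
        try Data(seed.utf8).write(to: file)
        written += 1
    }
    return written
}

// Example: a tiny corpus of hypothetical SVG seeds in a temporary directory.
let corpusDir = FileManager.default.temporaryDirectory
    .appendingPathComponent("Corpus-\(UUID().uuidString)")
let seedCount = try writeSeedCorpus(
    ["<svg/>", "<svg viewBox=\"0 0 10 10\"></svg>", "<svg/>"], to: corpusDir)
print(seedCount) // 2
```

The resulting directory can then be passed to the fuzzer as one of the corpus arguments.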
I also provide a dictionary file: a text file listing tokens that libfuzzer uses when mutating inputs to generate new test cases. libfuzzer dictionaries use the same format as AFL (American Fuzzy Lop) dictionaries, and I copied this one from the AFL repository.
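The format is simple enough to generate yourself, for example for the SVG-specific dictionary mentioned under Take Aways. Each line is an optional name and =, followed by a double-quoted value in which backslashes and quotes are escaped and other bytes can be written as \xNN. A minimal sketch; the function name and the entries are hypothetical.

```swift
/// Formats one AFL/libFuzzer dictionary line: an optional `name=` prefix and a
/// double-quoted value. Backslashes and quotes are escaped, and bytes outside
/// printable ASCII are written as \xNN.
func dictionaryEntry(name: String? = nil, value: String) -> String {
    var escaped = ""
    for byte in value.utf8 {
        switch byte {
        case UInt8(ascii: "\\"):
            escaped += "\\\\"
        case UInt8(ascii: "\""):
            escaped += "\\\""
        case 0x20...0x7E:
            escaped.append(Character(Unicode.Scalar(byte)))
        default:
            let hex = String(byte, radix: 16, uppercase: true)
            escaped += "\\x" + (hex.count == 1 ? "0" + hex : hex)
        }
    }
    let quoted = "\"" + escaped + "\""
    if let name = name {
        return "\(name)=\(quoted)"
    }
    return quoted
}

// Hypothetical SVG-specific entries.
let lines = [
    dictionaryEntry(name: "tag_svg", value: "<svg"),
    dictionaryEntry(name: "attr_viewbox", value: "viewBox=\""),
    dictionaryEntry(value: "</svg>"),
]
print(lines.joined(separator: "\n"))
// tag_svg="<svg"
// attr_viewbox="viewBox=\""
// "</svg>"
```

Writing the joined lines to a file gives something you can pass with -dict= in place of, or alongside, the XML dictionary.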
Once running, libfuzzer starts generating test cases and saves them to the first corpus directory. Because libfuzzer uses code coverage to decide whether a test case is interesting, it only saves test cases that reach new code paths (i.e. it won't save multiple test cases that exercise the same code).
In my case it generates a few hundred new test cases in roughly 15 minutes of running on 16 cores.
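The "only keep inputs that reach new code" rule can be illustrated with a toy model. This is not how libfuzzer is implemented (its real signal combines edge coverage, hit counters, value profiles and more); here a hypothetical instrumented target reports hard-coded branch IDs, purely to show the selection logic.

```swift
import Foundation

/// Hypothetical "instrumented" target: returns the IDs of the branches an input reaches.
func branchesHit(by input: String) -> Set<Int> {
    var hit: Set<Int> = [0]               // function entry
    if input.hasPrefix("<") {
        hit.insert(1)                     // looks like markup
        if input.hasPrefix("<svg") {
            hit.insert(2)                 // looks like an SVG root element
        }
    }
    if input.contains("viewBox") {
        hit.insert(3)                     // has a viewBox attribute
    }
    return hit
}

/// Runs inputs in order and keeps only those that reach at least one
/// branch that no earlier input reached.
func selectCorpus(_ inputs: [String]) -> [String] {
    var covered: Set<Int> = []
    var corpus: [String] = []
    for input in inputs {
        let hit = branchesHit(by: input)
        if !hit.isSubset(of: covered) {
            covered.formUnion(hit)
            corpus.append(input)
        }
    }
    return corpus
}

let kept = selectCorpus(["hello", "world", "<p>", "<svg>", "<svg/>", "<svg viewBox='0 0 1 1'>"])
// "world" and "<svg/>" reach nothing new, so they are dropped.
print(kept.count) // 4
```

This is also why the corpus grows quickly at first and then slowly: once the easy branches are covered, most new inputs add nothing.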
When libfuzzer finds a test case that causes a problem (a hang, an out of memory event, or a crash), it will (or should: see Issues) save the test case to the current working directory and exit. You can then use the crash logs (in ~/Library/Logs/DiagnosticReports/ or via Console.app) together with the test case to debug and fix the problem. Then rebuild the fuzzer and run it again.
Example output (just started a run):
```
cd Fuzzing; ../.build/debug/SVGFuzzer -fork=16 -ignore_ooms=1 -ignore_timeouts=1 -dict=XML.dict Corpus-New Corpus ../"Sample Data"
Dictionary: 60 entries
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2711971732
INFO: Loaded 1 modules   (32710 inline 8-bit counters): 32710 [0x100c2c700, 0x100c346c6),
INFO: Loaded 1 PC tables (32710 PCs): 32710 [0x100c346c8,0x100cb4328),
INFO: -fork=16: fuzzing in separate process(s)
INFO: -fork=16: 720 seed inputs, starting to fuzz in /var/folders/8r/g2kfb_dd1p1b5vbnnlvh0d0m0000gn/T//libFuzzerTemp.FuzzWithFork43488.dir
#61: cov: 25711 ft: 25711 corp: 720 exec/s 30 oom/timeout/crash: 0/0/0 time: 32s job: 1 dft_time: 0
#137: cov: 25711 ft: 25711 corp: 720 exec/s 25 oom/timeout/crash: 0/0/0 time: 33s job: 2 dft_time: 0
  NEW_FUNC: 0x100907b60 in __swift_instantiateConcreteTypeFromMangledName <compiler-generated>
  NEW_FUNC: 0x100907cbc in __swift_instantiateConcreteTypeFromMangledNameAbstract <compiler-generated>
  NEW_FUNC: 0x10090a5fc in outlined init with copy of String? <compiler-generated>
  NEW_FUNC: 0x10090a68c in outlined destroy of String? <compiler-generated>
  NEW_FUNC: 0x10090a6c4 in outlined destroy of String <compiler-generated>
  NEW_FUNC: 0x10090bd48 in lazy protocol witness table accessor for type String and conformance String <compiler-generated>
  NEW_FUNC: 0x10090bdec in type metadata accessor for NSScanner <compiler-generated>
  NEW_FUNC: 0x10090be8c in NSScanner.__allocating_init(string:) <compiler-generated>
  NEW_FUNC: 0x10090bee0 in outlined copy of CSSSelector? <compiler-generated>
  NEW_FUNC: 0x10090bf78 in outlined destroy of CSSSelector? <compiler-generated>
  NEW_FUNC: 0x10090c0a0 in outlined consume of CSSSelector? <compiler-generated>
  NEW_FUNC: 0x10090e97c in outlined destroy of DefaultStringInterpolation <compiler-generated>
  NEW_FUNC: 0x10090f79c in outlined destroy of [String] <compiler-generated>
```

Throughout a run you'll see log lines like this:
```
#212798: cov: 29473 ft: 25950 corp: 756 exec/s 256 oom/timeout/crash: 0/0/0 time: 126s job: 46 dft_time: 0
```

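This is libfuzzer reporting its progress: it has executed 212,798 inputs so far (#212798), the corpus now holds 756 test cases (corp), it is currently running about 256 inputs per second (exec/s), and it has been running for 126 seconds. cov and ft are libfuzzer's coverage measures: covered code edges and coverage "features", respectively.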
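If you want to track these numbers over a long run (for example, to notice when coverage stops growing), the fields are easy to extract. A minimal sketch, assuming the fork-mode layout shown above; the type and function names are hypothetical, and libfuzzer's regular (non-fork) status lines use a different layout that this does not handle.

```swift
/// Fields from one fork-mode libfuzzer status line (hypothetical type).
struct ForkStatus: Equatable {
    var executions: Int      // "#N": inputs executed so far
    var coverage: Int        // "cov": covered code blocks/edges
    var features: Int        // "ft": coverage features
    var corpusSize: Int      // "corp": inputs in the corpus
    var execsPerSecond: Int  // "exec/s": current execution rate
    var seconds: Int         // "time": elapsed seconds
}

/// Parses a line like "#212798: cov: 29473 ft: 25950 corp: 756 exec/s 256 ...".
/// Returns nil for lines that are not fork-mode status lines.
func parseForkStatus(_ line: String) -> ForkStatus? {
    let tokens = line.split(separator: " ").map(String.init)
    // The first token looks like "#212798:".
    guard let first = tokens.first, first.hasPrefix("#"), first.hasSuffix(":"),
          let executions = Int(first.dropFirst().dropLast()) else { return nil }

    // Returns the token that follows `label`, e.g. "29473" for "cov:".
    func value(after label: String) -> String? {
        guard let i = tokens.firstIndex(of: label), i + 1 < tokens.count else { return nil }
        return tokens[i + 1]
    }

    guard let cov = value(after: "cov:").flatMap({ Int($0) }),
          let ft = value(after: "ft:").flatMap({ Int($0) }),
          let corp = value(after: "corp:").flatMap({ Int($0) }),
          let rate = value(after: "exec/s").flatMap({ Int($0) }),
          let time = value(after: "time:"), time.hasSuffix("s"),
          let seconds = Int(time.dropLast()) else { return nil }

    return ForkStatus(executions: executions, coverage: cov, features: ft,
                      corpusSize: corp, execsPerSecond: rate, seconds: seconds)
}

let line = "#212798: cov: 29473 ft: 25950 corp: 756 exec/s 256 oom/timeout/crash: 0/0/0 time: 126s job: 46 dft_time: 0"
if let status = parseForkStatus(line) {
    print(status.corpusSize, status.execsPerSecond) // 756 256
}
```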
Resource Usage

Although it found issues for me within just a few minutes, expect fuzzing to be a task that can take hours. I run this on a 20-core M1 Ultra Mac Studio with 128GB of RAM, and it runs hundreds of test cases per second.
Results

For me it found 9 crashing bugs in a few minutes of running. Some of these issues were obvious (fatalError() calls left in place to handle unimplemented features) but some were more subtle. All of the more subtle issues were in lines that had full coverage from traditional unit tests. Now that these 9 low-hanging-fruit issues are fixed, libfuzzer runs for much longer before hitting out of memory issues. I'm unsure how to progress past this (see below).
Issues

Out of Memory Errors

Despite wrapping my code in an autoreleasepool and testing for leaks with other tools, libfuzzer still reports out of memory errors and stops running after ~15 minutes. The test case it outputs does not cause an out of memory error when run manually, and I believe it is merely the last test case libfuzzer ran before it ran out of memory.
Update: After more testing and instrumentation I've confirmed that my code isn't leaking under test. Increasing libfuzzer's memory limit to 4GB (libfuzzer's -rss_limit_mb option, which defaults to 2048 MB) seems to solve the issue, and I've been able to run for over an hour without an out of memory error.
Crash files aren't written

For me, libfuzzer does not write the test case that causes a crash. I have been able to diagnose and fix issues from the crash logs, but it seems like a bug in libfuzzer is preventing these crashing inputs from being written. Out of memory and hang test cases are written correctly.
Takeaways/Questions

libfuzzer seems to be the easiest fuzzing method to use for Swift, with fewer dependencies and requirements than the alternatives. It is relatively straightforward to get working, but there are some subtleties that I hope this document helps with.
If I can get past the out of memory issues and the lack of output for crashes I believe I should be able to find more bugs. Even with the relatively short amount of time I've spent on this I've found a few bugs I may not have found otherwise.
It's unclear whether providing only a dictionary of XML tags and attributes will help libfuzzer find more bugs in SVG code. I should probably create a dictionary of SVG-specific tags and attributes.
It's unclear if -ignore_ooms=1 works.
It may make sense to fuzz different parts of the codebase separately. For example, the CSS parsing and processing code is a good candidate for independent fuzzing (and a CSS dictionary could be provided).
If anyone with more experience with libfuzzer has any suggestions on how to get past these issues I'd love to hear them.
Links

General Wikipedia page on Fuzzing: https://en.wikipedia.org/wiki/Fuzzing
LLVM libFuzzer documentation: https://www.llvm.org/docs/LibFuzzer.html
Using libfuzzer with Swift: https://github.com/apple/swift/blob/main/docs/libFuzzerIntegration.md
A somewhat confusing blog post on using libfuzzer with Swift: https://grayson.github.io/ipspatcherFuzzer/