Marcondiro/google_summer_of_code_2024_libafl.md

## google_summer_of_code_2024_libafl.md

      
Integrate Intel PT tracing into LibAFL QEMU

A Google Summer of Code 2024 Project with the AFLplusplus Organization

Mentor: @rmalmain

Project repository: https://github.com/AFLplusplus/LibAFL/
Note: This report is not intended to be purely technical documentation of the developed code.
Instead, its goal is to describe my GSoC 2024 contributor experience, link to the developed code, summarize the current state of the project, and outline the challenges and lessons learned.
Project Description

LibAFL is a fuzzing library that covers a wide range of use cases. One of its components, LibAFL QEMU¹, provides an API for fuzzing programs executed within QEMU, a popular open-source emulator and virtualizer.
Until now, LibAFL QEMU has only supported emulation via QEMU's Tiny Code Generator (TCG) emulator. This project aims to enhance the LibAFL QEMU component by integrating Intel® Processor Trace (PT)² tracing capabilities. Intel PT is a hardware feature that captures various runtime information, including the outcomes of conditional branches. This allows the tracer to reconstruct the execution flow of a program, making it a valuable tool for fuzzing. With Intel PT, LibAFL QEMU will be able to use virtualization through QEMU-KVM, leveraging the power of hardware-assisted virtualization for faster and more efficient fuzzing.
What work was done

Study of Related Work

The first step was to study different projects related to fuzzing and/or Intel PT. Initially, the plan was to adapt the fuzzer kAFL/Nyx³ to LibAFL. However, after investigating the Intel PT driver available in the Linux kernel, it was decided to take a different approach and build on top of it. This driver is part of the perf suite and can be accessed from userspace through the perf syscalls.
Leveraging Intel PT is not a novel idea in fuzzing: besides kAFL/Nyx³, several projects already take advantage of it⁴⁵⁶.
While investigating the Linux kernel driver code, I found a small bug and sent a patch that was merged (and back-ported as well!).
It was my first contribution to the Linux kernel! 🎉
This gave me the chance to learn how the Linux kernel development process works.
First Proof of Concept (POC)

To test the feasibility of using the built-in Linux kernel driver, a POC was developed by working directly on the QEMU codebase in C. The code, available in this fork, interacts with the kernel through syscalls, memory mappings (mmap), and ioctls to manage Intel PT throughout the VM lifecycle. The modified QEMU can trace the execution of a KVM VM (with a single CPU core) using Intel PT. As a test, a small bootloader was written and run in the modified QEMU, and the collected Intel PT traces accurately showed the execution flow. This confirmed that it is possible to trace a QEMU-KVM instance with Intel PT, using the perf driver under the hood. Compared with kAFL/Nyx, the main advantages of this approach are that it does not require a custom kernel, avoids the need to write and maintain a complex driver, and relies on a mature implementation.
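
As a rough illustration of the first step of that interaction: the perf subsystem exposes Intel PT as a dynamic PMU, and its numeric type (the value later placed in the `type` field of `perf_event_attr` for `perf_event_open`) is published in sysfs. The sketch below only covers that discovery step and uses the Rust standard library alone. The function names are made up for this example and are not taken from the POC or from LibAFL.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sysfs file where the kernel publishes the dynamic PMU type of Intel PT.
const INTEL_PT_TYPE_PATH: &str = "/sys/bus/event_source/devices/intel_pt/type";

/// Parses the content of a sysfs PMU `type` file: a decimal number,
/// usually followed by a newline.
fn parse_pmu_type(content: &str) -> Option<u32> {
    content.trim().parse().ok()
}

/// Reads a PMU type from `path`.
/// Returns `Ok(None)` when the file does not exist (for example, when the
/// CPU or the kernel has no Intel PT support) or when it cannot be parsed.
fn read_pmu_type(path: &Path) -> io::Result<Option<u32>> {
    match fs::read_to_string(path) {
        Ok(content) => Ok(parse_pmu_type(&content)),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}

/// Convenience wrapper (hypothetical name) for the Intel PT PMU.
fn intel_pt_pmu_type() -> io::Result<Option<u32>> {
    read_pmu_type(Path::new(INTEL_PT_TYPE_PATH))
}
```

On a machine without Intel PT, `intel_pt_pmu_type()` simply yields `Ok(None)`, which lets a caller fall back to another tracing method instead of failing later in `perf_event_open`.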
Preparing LibAFL to support different accelerators

As mentioned earlier, LibAFL QEMU only supported QEMU's TCG emulator. To support the KVM accelerator, the LibAFL QEMU API was enhanced to let users set up the QEMU instance programmatically and in a more structured way, including the option to choose an accelerator. This enhancement was implemented in PR 2339.
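
To give an idea of what "structured" configuration means here, the following is a minimal sketch of a builder that turns typed settings into QEMU command-line arguments. It is not the `QemuConfig` API from PR 2339: the `VmConfig` and `Accelerator` types and their methods are hypothetical, and only the QEMU flags themselves (`-accel`, `-m`, `-smp`) are real.

```rust
/// Hypothetical accelerator choice (not the actual LibAFL QEMU type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Accelerator {
    Tcg,
    Kvm,
}

impl Accelerator {
    /// Value passed to QEMU's `-accel` option.
    fn as_arg(self) -> &'static str {
        match self {
            Accelerator::Tcg => "tcg",
            Accelerator::Kvm => "kvm",
        }
    }
}

/// Hypothetical, heavily simplified VM configuration.
#[derive(Debug, Clone)]
struct VmConfig {
    accelerator: Accelerator,
    memory_mib: u32,
    cpu_cores: u32,
}

impl VmConfig {
    /// Defaults chosen for this example only.
    fn new() -> Self {
        Self { accelerator: Accelerator::Tcg, memory_mib: 128, cpu_cores: 1 }
    }

    fn accelerator(mut self, accelerator: Accelerator) -> Self {
        self.accelerator = accelerator;
        self
    }

    fn memory_mib(mut self, memory_mib: u32) -> Self {
        self.memory_mib = memory_mib;
        self
    }

    /// Renders the configuration as a QEMU argument vector.
    fn to_args(&self) -> Vec<String> {
        vec![
            "qemu-system-x86_64".to_string(),
            "-accel".to_string(),
            self.accelerator.as_arg().to_string(),
            "-m".to_string(),
            format!("{}M", self.memory_mib),
            "-smp".to_string(),
            self.cpu_cores.to_string(),
        ]
    }
}
```

With such a type, `VmConfig::new().accelerator(Accelerator::Kvm).memory_mib(512).to_args()` replaces a hand-written argument string, and an invalid accelerator name becomes a compile-time error rather than a runtime one.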
Porting the code to Rust in LibAFL

The next step was to port the code developed in C within the QEMU project to Rust within the LibAFL project. This was addressed in PR 2471. The kernel's C-based interface made this task challenging, especially when dealing with shared memory and code from the Linux kernel. The goal of this implementation was to create a safe and general-purpose interface rather than a target-specific Proof of Concept. At this stage, trace decoding was introduced: it converts compressed branch information (taken/not taken, destination address, ...) into the sequence of executed basic blocks.
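
The decoding idea can be illustrated with a toy example. The sketch below is not the decoder used in PR 2471: it assumes a hand-written control-flow graph instead of a disassembled binary, and it models only two kinds of trace events, a taken/not-taken bit and a target address for indirect jumps. All type and function names are hypothetical.

```rust
use std::collections::HashMap;

/// How a basic block ends in the toy control-flow graph.
#[derive(Debug, Clone, Copy)]
enum Terminator {
    /// Conditional branch: the trace supplies one taken/not-taken bit.
    Conditional { taken: u64, not_taken: u64 },
    /// Direct jump: the target is known statically, the trace holds nothing.
    Direct(u64),
    /// Indirect jump: the trace supplies the target address.
    Indirect,
    /// End of the traced code.
    Exit,
}

/// Simplified trace events (real Intel PT packets carry more information).
#[derive(Debug, Clone, Copy, PartialEq)]
enum TraceEvent {
    Tnt(bool),
    Tip(u64),
}

/// Walks the graph from `entry`, consuming trace events whenever the next
/// block cannot be determined statically. Returns the visited block addresses.
/// Assumes the graph has no cycle made only of direct jumps.
fn decode(
    cfg: &HashMap<u64, Terminator>,
    entry: u64,
    events: &[TraceEvent],
) -> Result<Vec<u64>, String> {
    let mut blocks = vec![entry];
    let mut current = entry;
    let mut events = events.iter();
    loop {
        let terminator = cfg
            .get(&current)
            .ok_or_else(|| format!("unknown block {:#x}", current))?;
        let next = match *terminator {
            Terminator::Exit => return Ok(blocks),
            Terminator::Direct(target) => target,
            Terminator::Conditional { taken, not_taken } => match events.next() {
                Some(TraceEvent::Tnt(true)) => taken,
                Some(TraceEvent::Tnt(false)) => not_taken,
                Some(other) => return Err(format!("expected TNT, got {:?}", other)),
                None => return Ok(blocks), // the trace ends here
            },
            Terminator::Indirect => match events.next() {
                Some(TraceEvent::Tip(target)) => *target,
                Some(other) => return Err(format!("expected TIP, got {:?}", other)),
                None => return Ok(blocks), // the trace ends here
            },
        };
        blocks.push(next);
        current = next;
    }
}
```

The point of the sketch is that the trace alone is not enough: the decoder needs the program's code to know where each block ends, and it only consults the trace at the points where the code leaves the next address open.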
LibAFL QEMU defines a module system that lets library users compose their fuzzer from the modules they need.
Therefore, I started setting up the module responsible for installing the hooks (a sort of callback), whose goal is essentially to call the lower-level functions at the right moment during fuzzing.
This part is still to be refined and completed, and I will continue working on this in the coming months after GSoC.
If you want to know more, you can have a look at the code in Contributions!
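
The hook mechanism described above can be pictured with the following sketch. It does not reproduce the LibAFL QEMU module traits: `Tracer`, `ExecutionHooks`, `TracingModule`, and `run_once` are hypothetical names, and `MockTracer` stands in for the real Intel PT code so that the example stays self-contained.

```rust
/// Hypothetical low-level tracer interface (e.g. a wrapper around Intel PT).
trait Tracer {
    fn enable(&mut self);
    fn disable(&mut self);
    fn take_trace(&mut self) -> Vec<u8>;
}

/// Hypothetical hooks that a fuzzer calls around each target execution.
trait ExecutionHooks {
    fn pre_exec(&mut self);
    fn post_exec(&mut self);
}

/// Module that drives a tracer from the hooks.
struct TracingModule<T: Tracer> {
    tracer: T,
    last_trace: Vec<u8>,
}

impl<T: Tracer> TracingModule<T> {
    fn new(tracer: T) -> Self {
        Self { tracer, last_trace: Vec::new() }
    }
}

impl<T: Tracer> ExecutionHooks for TracingModule<T> {
    /// Start tracing right before the target runs.
    fn pre_exec(&mut self) {
        self.tracer.enable();
    }

    /// Stop tracing and collect the raw trace right after the target ran.
    fn post_exec(&mut self) {
        self.tracer.disable();
        self.last_trace = self.tracer.take_trace();
    }
}

/// Stand-in for one fuzzing iteration: hooks wrap the target execution.
fn run_once<H: ExecutionHooks>(hooks: &mut H, target: impl FnOnce()) {
    hooks.pre_exec();
    target();
    hooks.post_exec();
}

/// Mock tracer that records which calls it received, in order.
#[derive(Default)]
struct MockTracer {
    calls: Vec<&'static str>,
}

impl Tracer for MockTracer {
    fn enable(&mut self) {
        self.calls.push("enable");
    }
    fn disable(&mut self) {
        self.calls.push("disable");
    }
    fn take_trace(&mut self) -> Vec<u8> {
        self.calls.push("take_trace");
        vec![0xAA, 0xBB]
    }
}
```

Keeping the tracer behind a trait is a design choice made for this sketch: it lets the call ordering be tested with a mock, without Intel PT hardware.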
Contributions


- LibAFL: qemu: Add QemuConfig to set qemu args via a struct
- LibAFL: WIP qemu: Add kvm+Intel PT fuzzing capability

Other minor contributions:

- Linux: perf/x86/intel/pt: Fix topa_entry base length
- LibAFL: bolts: fix warning about error_in_core now stable
- LibAFL: QEMU fix failing Doc-tests
- My Gists: How to Enable Intel PT (Processor Trace) in QEMU-KVM VMs
- LibAFL: Uniform deps versions: do not use caret requirements


What's left to do

There are still some tasks left to do, and several improvements can be made:

- Complete and test the IntelPTModule
- Handle full PT buffers to avoid losing traces
- Implement Intel PT filtering by CR3 value
- Use memory snapshots for faster executions
- Improve trace decoding
  - by tweaking Intel PT parameters (like PSB frequency)
  - by trying to use libxdc
- Support parallel fuzzing
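
As an illustration of the buffer-handling item above, here is a minimal sketch of draining a ring buffer before it fills up. It assumes the usual perf ring-buffer convention, in which the kernel advances a free-running head offset and userspace advances a tail offset, both reduced modulo the buffer size when indexing. The function name is hypothetical, and the memory barriers and the write-back of the new tail that real code needs are left out.

```rust
/// Copies the unread bytes between `tail` and `head` out of a ring buffer.
/// `head` and `tail` are free-running offsets (they never wrap themselves).
/// Returns `None` when the offsets are inconsistent or when more data was
/// produced than the buffer can hold, meaning part of the trace was lost.
fn drain_ring(buf: &[u8], head: u64, tail: u64) -> Option<Vec<u8>> {
    let size = buf.len() as u64;
    let pending = head.checked_sub(tail)?;
    if size == 0 || pending > size {
        return None;
    }
    let start = (tail % size) as usize;
    let pending = pending as usize;
    // The unread region may wrap around the end of the buffer:
    // copy the part up to the end first, then the part from the start.
    let first = pending.min(buf.len() - start);
    let mut out = Vec::with_capacity(pending);
    out.extend_from_slice(&buf[start..start + first]);
    out.extend_from_slice(&buf[..pending - first]);
    Some(out)
}
```

After a successful drain, the caller would publish `head` as the new tail so that the producer can reuse the space.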

Acknowledgments

I'd like to thank my mentor, Romain, and all the LibAFL people.
I look forward to continuing to work together!
Additional References

Here are some additional references, more or less related to the project, that I found useful or came across during my GSoC:

- AFL++ Coresight mode
- Honggfuzz + Intel PT (blogpost)
- Microsoft Windows IntelPT
- Honeybee: faster PT for Honggfuzz
- perf_event_open syscall
- Perf wiki
- perf-intel-pt man page
- lldb intel PT using perf kernel module
- related RFC
- C bitfields & integer promotion
- QEMU
  - Build system
  - Qemu Object Model
  - TCG
  - C standard, implementation defined and undefined behaviors
  - Invocation (command line args)
- kAFL/Nyx
  - QEMU-Nyx
  - kAFL.qemu
  - KVM-Nyx
  - kAFL github
  - kAFL docs
  - kAFL linux
  - nyx-fuzz
  - KVM-nyx intel pt

Footnotes


1. LibAFL QEMU: A Library for Fuzzing-oriented Emulation, 2024, Romain Malmain, Andrea Fioraldi, and Aurélien Francillon
2. Intel® 64 and IA-32 Architectures Software Developer Manuals, Volume 3C, Chapter 33
3. Nyx: Greybox Hypervisor Fuzzing using Fast Snapshots and Affine Types, 2021, Sergej Schumilo, Cornelius Aschermann, Ali Abbasi, Simon Wörner, and Thorsten Holz
4. Honggfuzz fuzzer, Intel PT code
5. ptfuzzer
6. Ptrix