GSoC20 final work product

SVmiDbg: Hypervisor VM Exit Events needed for stealthy debugging with LibVMI and Bareflank Boxy/MicroV

1. Introduction

SVmiDbg aims to be a GDB server / debugger that leverages hypervisor features for stealthy breakpoints. To achieve stealthy traps, I chose to leverage the following CPU features:

  • Monitor Trap Flag (MTF), which allows single-stepping a VM's vCPU.
  • Extended Page Tables (EPT), which allow changing page-mapping permissions and trapping on execute, read, or write access.

A stealthy breakpoint can be made by setting up EPT to trap on page execution. When an EPT violation occurs, we re-enable execute permission and then use MTF to single-step until the desired location is reached, at which point we generate a breakpoint. This process is simple and does not require enabling interrupt exiting on the host when the host VM is the target (a possibility unique to Bareflank).
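
Below is a minimal sketch of this algorithm expressed with LibVMI's event API. It assumes a backend that supports generic memory events, vmi_set_mem_event and the single-step toggle response; the target address, frame number, and callback names are illustrative and not part of SVmiDbg.

```c
/* Sketch: EPT execute-trap + MTF single-step breakpoint (illustrative names). */
#include <libvmi/libvmi.h>
#include <libvmi/events.h>

static addr_t target_va;   /* virtual address we want to break on */
static addr_t target_gfn;  /* guest frame number backing target_va */
static vmi_event_t mem_event, ss_event;

static event_response_t ss_cb(vmi_instance_t vmi, vmi_event_t *event)
{
    /* Single-step until the vCPU reaches the desired location. */
    if (event->ss_event.gla == target_va) {
        /* ... report the breakpoint hit to the GDB client here ... */
        /* Re-arm the execute trap on the page and stop single-stepping. */
        vmi_set_mem_event(vmi, target_gfn, VMI_MEMACCESS_X, 0);
        return VMI_EVENT_RESPONSE_TOGGLE_SINGLESTEP;
    }
    return VMI_EVENT_RESPONSE_NONE;
}

static event_response_t mem_cb(vmi_instance_t vmi, vmi_event_t *event)
{
    /* EPT violation on execute: lift the restriction and start single-stepping. */
    vmi_set_mem_event(vmi, event->mem_event.gfn, VMI_MEMACCESS_N, 0);
    return VMI_EVENT_RESPONSE_TOGGLE_SINGLESTEP;
}

static status_t arm_breakpoint(vmi_instance_t vmi, addr_t va, addr_t gfn)
{
    target_va  = va;
    target_gfn = gfn;

    /* Single-step event registered disabled; toggled on/off from the callbacks. */
    SETUP_SINGLESTEP_EVENT(&ss_event, 1, ss_cb, 0);
    if (VMI_FAILURE == vmi_register_event(vmi, &ss_event))
        return VMI_FAILURE;

    /* Generic memory event: deliver EPT violations for pages we restrict. */
    SETUP_MEM_EVENT(&mem_event, ~0ULL, VMI_MEMACCESS_X, mem_cb, 1);
    if (VMI_FAILURE == vmi_register_event(vmi, &mem_event))
        return VMI_FAILURE;

    /* Remove execute permission from the target page. */
    return vmi_set_mem_event(vmi, gfn, VMI_MEMACCESS_X, 0);
}
```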

In comparison, DRAKVUF achieves stealthy traps slightly differently: it uses the technologies mentioned above, but it also uses interrupt exiting to trap on int3 and switches between multiple EPT mappings, a feature currently only available on the Xen hypervisor with altp2m.

There are pros and cons to both breakpoint techniques, but in both cases the entire process is invisible to the VM's operating system and user-space applications.

By using LibVMI as a dependency, we have access to some interesting features (see the short sketch after this list), such as:

  • Semantic understanding of guest physical memory (kernel structures and pointers)
  • Multiple hypervisor support (Xen, KVM, Bareflank)
  • VMI events (LibVMI's name for VM-exit notifications from the hypervisor, which enable interposition)
  • Interposition (once an event is received, we can change the state of the VM or otherwise act on the event)
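
The sketch below shows what a consumer of these features looks like. The VM name "vm0" and the Linux symbol "init_task" are purely illustrative, and the Bareflank/Boxy backend may resolve the target VM differently.

```c
/* Sketch: basic LibVMI usage, combining semantic access with event polling. */
#include <stdio.h>
#include <inttypes.h>
#include <libvmi/libvmi.h>
#include <libvmi/events.h>

int main(void)
{
    vmi_instance_t vmi = NULL;

    /* Initialize against a VM named "vm0" with event support enabled. */
    if (VMI_FAILURE == vmi_init_complete(&vmi, "vm0",
                                         VMI_INIT_DOMAINNAME | VMI_INIT_EVENTS, NULL,
                                         VMI_CONFIG_GLOBAL_FILE_ENTRY, NULL, NULL))
        return 1;

    /* Semantic access: resolve a kernel symbol to a virtual address. */
    addr_t init_task = 0;
    if (VMI_SUCCESS == vmi_translate_ksym2v(vmi, "init_task", &init_task))
        printf("init_task @ 0x%" PRIx64 "\n", init_task);

    /* Interposition: deliver any registered VMI events to their callbacks. */
    while (VMI_SUCCESS == vmi_events_listen(vmi, 500))
        ;

    vmi_destroy(vmi);
    return 0;
}
```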

The Bareflank hypervisor has some very interesting properties. One of them is its late launch feature, which allows demoting the currently running host OS into a VM, a technique also used by rootkits. Bareflank Boxy / MicroV is an extension of the base hypervisor that adds support for multiple guest VMs. With Boxy / MicroV support, LibVMI would be able to introspect a host Windows OS for the first time. Combined with late launch, interesting new types of LibVMI-based security applications could emerge: for example, one could imagine a LibVMI-based anti-virus that can simply be deployed and installed on a host Windows OS, without requiring any complicated environment to be set up.

In preparation for this GSoC, I fixed issues that prevented the latest Boxy from working on Windows (PR#49) by working around a Cygwin toolchain limitation (a missing timespec_get).

At the start of GSoC20 I had a minimal implementation of SVmiDbg working with Xen / DRAKVUF. The state of Bareflank support in LibVMI was minimal and did not support guest VMs or any VMI events (see the previous work section).

2. LibVMI with Bareflank hypervisor / Boxy / MicroV

2.1. Previous work

The previous work made some features of the Bareflank hypervisor (without guest VM support) usable from LibVMI. Two previous GSoC projects, and a contribution between them, added Bareflank hypervisor support to LibVMI.

  • First GSoC: Initial Bareflank driver for LibVMI (memory / register access) for the Bareflank hypervisor (i.e. Bareflank, LibVMI)
  • Intermediate contribution: Based on the previous work, it made the Bareflank driver support for LibVMI official (PR: Bareflank driver)
  • Second GSoC: Updated to support the newer Bareflank hypervisor API and other work in progress. (LibVMI and Bareflank)

2.1.1. Features

  • Memory access
  • Limited register read/write support

2.1.2. Limitations

  • Limited number of registers with read/write support
  • No guest support: LibVMI can only introspect its current running operating system.
  • No Windows support: due to the previous point, LibVMI needs to run on the host, which has to be Linux
  • No VMI event support

2.2. My work during GSoC20

My work during GSoC focused first on adding guest VM support to LibVMI (Boxy/MicroV with Buildroot), then on adding the VMI events needed by SVmiDbg to the Boxy/MicroV hypervisor and the LibVMI driver.

2.2.1. Work

2.2.1.1. Porting the previous work to Boxy

The previous work's ABI had to be ported to Boxy to add guest VM support. In doing so, we stopped supporting the base hypervisor as-is. What this means is simply that LibVMI now needs its own Linux VM, whereas before it ran directly on the host. However, we gain the ability to introspect a Windows host, as mentioned before.

See the challenges section for the difficulty encountered with this work.

2.2.1.2. VMILinux: A minimal Linux VM for LibVMI with Buildroot

I created VMILinux (a service VM): a tiny Linux environment built with Buildroot that cross-compiles LibVMI and its dependencies.

At the request of my backup mentor, Dr. Tamas K. Lengyel, I posted a demo of LibVMI's process-list example running under Boxy in a tiny Linux VM on a Windows host.

2.2.1.3. MicroV: A new ABI replacing Boxy

Boxy is in the process of being replaced by MicroV. After feedback from Dr. Rian Quinn on my first Boxy PR, we defined a new MicroV ABI to help with the long-term support of this project.

Following the new MicroV ABI, I reimplemented some hypercalls, which removed the need to use JSON within the hypervisor as the old ABI did; since there weren't many hypercalls left to convert, I simply reimplemented those too. This will allow LibVMI to support the upcoming MicroV hypervisor.

2.2.1.4. VMI Events: MTF (Single-steps), EPT (Memory events) and Control Register events

I was able to reach the VMI events milestone which was probably the hardest part of this project (see the Challenges of VMI Events section).

The set of MicroV APIs that I have defined is loosely based on VMX exiting and adds vmread and vmwrite hypercalls for direct access to the VMCS. The next_exit and end_of_exit hypercalls are used to process VM exit events and to tell the hypervisor how to proceed. I also made this set of hypercalls usable either from user space (bypassing the kernel) for less overhead, or from a kernel driver (via vIRQ injection) if desired. Even when used from user space, it remains compatible with PCIe passthrough and vIRQ injection.
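
The sketch below illustrates how a user-space listener might drive these two hypercalls. The hypercall numbers, the register convention of the vmcall wrapper, and the exit-info layout are placeholders of my own, not the actual MicroV ABI.

```c
/* Illustrative sketch of a user-space next_exit / end_of_exit loop. */
#include <stdint.h>

#define HCALL_NEXT_EXIT   0x100U  /* placeholder hypercall ID */
#define HCALL_END_OF_EXIT 0x101U  /* placeholder hypercall ID */

struct exit_info {
    uint64_t reason;   /* e.g. EPT violation, MTF, CR access */
    uint64_t qual;     /* exit qualification */
    uint64_t rip;      /* target vCPU RIP at the time of the exit */
};

/* Placeholder vmcall wrapper: hypercall ID in RAX, argument pointer in RDI,
 * hypervisor status returned in RAX. */
static inline uint64_t vmcall1(uint64_t id, void *arg)
{
    uint64_t ret;
    __asm__ __volatile__("vmcall"
                         : "=a"(ret)
                         : "a"(id), "D"(arg)
                         : "memory");
    return ret;
}

void event_loop(void)
{
    struct exit_info info;

    for (;;) {
        /* Blocks (the vCPU yields back to the host) until a monitored
         * VM exit occurs in the target VM; see section 2.4.2. */
        if (vmcall1(HCALL_NEXT_EXIT, &info) != 0)
            break;

        /* ... decode info.reason, access the VMCS via the vmread / vmwrite
         * hypercalls, adjust EPT or MTF state as needed ... */

        /* Tell the hypervisor how to resume the target vCPU. */
        vmcall1(HCALL_END_OF_EXIT, &info);
    }
}
```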

2.2.1.5. Multiple serial passthrough

SVmiDbg will need this to add interactivity, using serial polling between the server in the VM and a GDB client on a secondary machine.

2.2.1.6. Extra work

Control register events were originally planned as future extra work, but I needed a way to make events fire while requiring the least amount of setup. I couldn't rely on LibVMI's semantic abilities, and since CR3 writes happen all the time (every time a process is scheduled), using them made the most sense to me. This will also allow user-space applications to be targeted by SVmiDbg in the future.
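
For reference, this is roughly what registering such an event looks like with LibVMI's generic register-event API; the callback name and surrounding setup are illustrative.

```c
/* Sketch: registering a CR3 write event with LibVMI (illustrative names). */
#include <stdio.h>
#include <inttypes.h>
#include <libvmi/libvmi.h>
#include <libvmi/events.h>

static event_response_t cr3_cb(vmi_instance_t vmi, vmi_event_t *event)
{
    (void)vmi;
    /* Fires on every address-space switch, i.e. whenever a process is
     * scheduled, which is why CR3 needs no semantic setup to trigger. */
    printf("CR3 write: 0x%" PRIx64 "\n", event->reg_event.value);
    return VMI_EVENT_RESPONSE_NONE;
}

static status_t watch_cr3(vmi_instance_t vmi)
{
    /* The event must outlive registration, hence static storage here. */
    static vmi_event_t cr3_event;
    SETUP_REG_EVENT(&cr3_event, CR3, VMI_REGACCESS_W, 0, cr3_cb);
    return vmi_register_event(vmi, &cr3_event);
}
```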

2.2.1.7. Documentation

It is available here: LibVMI with Boxy / MicroV documentation.

2.2.1.8. Final GSoC20 branch

I have added a chp-gsoc20-final branch in all of the main projects that I have contributed to, which marks the end of my GSoC and contains my latest work at this time. It includes many more commits than are mentioned here. I tried to keep the commit history as clean as possible by rebasing / squashing and writing useful commit messages.

2.2.2. Features

  • Full read/write access to general-purpose registers and model-specific registers
  • Guest support: LibVMI can run in a tiny Linux VM with Boxy / MicroV
  • Windows support: LibVMI can introspect the root VM (Windows or Linux) from its guest VM (Linux).
  • Initial VMI event support: Control register, single step (MTF), memory access (EPT violations)
  • Multiple serial passthrough with Boxy

2.2.3. Limitations

  • 1 virtual CPU support only (requires IPIs in MicroV and support in the MicroV spec)
  • Limited event support (currently limited to what SVmiDbg needs)

2.2.4. Demos

My future Honeynet Project blog post will likely have more.

2.3. What is needed

The VMI events needed by SVmiDbg are now implemented in LibVMI and the Boxy/MicroV hypervisor. Very little is needed to make the debugging server work:

  • Serial polling for interactivity
  • Stealthy breakpoint using the now implemented MTF + EPT Violation events, following the algorithm described in the introduction

For the Bareflank driver to be complete in LibVMI:

  • Multi-vCPU support
  • The missing VMI events (CPUID, interrupts, etc.)

2.4. Challenges

Most of these challenges are due to the nature of the project.

2.4.1. Challenges of porting the previous work to support Boxy

After porting the previous work to Boxy and building a tiny Linux VM with Buildroot, I had it working on a Linux host. The next step was to make it work on Windows, but I was getting random BSoD errors and had no useful error messages to debug them.

It turned out that during LibVMI's initialization, if the "smart" initialization methods failed to find the Windows kernel location, LibVMI would fall back to memory-sweep algorithms that search for patterns across guest physical memory. With the host as the target, and Boxy having no safety mechanisms, LibVMI was allowed to touch any physical memory region, which is very unsafe and was enough to cause random BSoDs. At one point LibVMI must have touched the GPU's mapped memory, because I saw some beautiful artifacts on my screen; I also heard weird noises on reboot, yet my machine survived.

Once I described the GPU artifact to my mentors, they were quick to point me in the right direction. It turned out that Dr. Tamas had also encountered this issue with the base hypervisor and wrote a note on the e820 memory map.

To resolve this, I disabled all memory-sweep algorithms when Bareflank is the hypervisor. This forces LibVMI to only use Rekall or Volatility profiles during initialization, and to fail otherwise. A set of hypercalls has been defined to allow, in the future, an e820 memory map of the host to be captured at launch time, passed to the LibVMI VM, and used by LibVMI during initialization.
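
A minimal sketch of what profile-only initialization looks like from LibVMI's side, assuming a LibVMI version with the JSON-profile configuration mode; the VM name and profile path are placeholders.

```c
/* Sketch: initialize LibVMI strictly from a Rekall/Volatility JSON profile,
 * so no memory-sweep fallback is ever attempted. */
#include <libvmi/libvmi.h>

static vmi_instance_t init_from_profile(void)
{
    vmi_instance_t vmi = NULL;
    vmi_init_error_t error;

    if (VMI_FAILURE == vmi_init_complete(&vmi, "vm0",
                                         VMI_INIT_DOMAINNAME | VMI_INIT_EVENTS, NULL,
                                         VMI_CONFIG_JSON_PATH,
                                         (void *)"/path/to/windows-profile.json",
                                         &error))
        return NULL;  /* no profile, no sweep: initialization simply fails */

    return vmi;
}
```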

Another solution, proposed by Dr. Rian, was to check the cacheability of the mapped memory and deny access to anything that is not write-back cacheable.

To fix the initialization problem, I had to implement the missing MSR register access (commit 04bbac) and modify the kpcr_find2 algorithm to use KiSystemCall64 as a fallback to KiSystemCall (commit d2f971). The algorithm issue had likely never been encountered before because Windows guest VMs were always the target, not a Windows root VM (or host), which initializes differently on bare metal and uses KiSystemCall64.

2.4.2. Challenges of VMI Events

Architecture-wise, we originally thought the implementation of events would require the hypervisor to inject interrupts into the monitoring VM (where LibVMI runs), which would have required a kernel driver and then a way to notify the LibVMI user-space application from the kernel. This is how other hypervisors do it, and we thought it was necessary for compatibility with PCIe passthrough (which also requires IRQ injections).

I spent time studying how the Xen hypervisor does it, and I also looked at Intel's ACRN project. One thing that tends to separate Bareflank from other projects is the minimal approach it takes in the design of its architecture. This is visible in how Boxy schedules its VMs: it lets the host OS do the heavy lifting by having a simple "hollow" process with a thread looping on a run_op hypercall to donate its running time to its associated VM, handing control back to the host on interrupt exiting from the guest. This minimalist approach influenced me a lot while implementing the hypercalls for VM exit events.

I had the idea of bypassing the kernel altogether by having LibVMI loop on a next_exit hypercall for VM-exit event notifications. The problem was that we also needed to be compatible with PCIe passthrough (which also uses interrupt injections) and to avoid spinning the CPU at 100%. While implementing this, I had the idea of not advancing the instruction pointer on the hypercall if no events are pending; instead, I return execution to the host. When a VM exit that LibVMI is listening for occurs, I return execution to the LibVMI guest, which is still sitting on the VMCALL instruction, causing the hypercall to trigger again, but this time the IP is advanced so LibVMI can process the event. This has the benefits of being simple, having less overhead than going through the kernel (there is only one process running, LibVMI as init), not spinning the CPU at 100%, and still being compatible with PCIe passthrough or other interrupt injections. Dr. Rian told me that he loved this approach.
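
Conceptually, the hypervisor-side handling of that hypercall looks something like the sketch below; the function names and structures are illustrative stand-ins for Boxy/MicroV internals, not actual code.

```c
/* Conceptual sketch of the next_exit VMCALL handler (illustrative names). */
#include <stdbool.h>

struct vcpu;                              /* the monitoring (LibVMI) vCPU */
bool event_pending(void);                 /* has a monitored VM exit been queued? */
void copy_exit_info(struct vcpu *v);      /* expose the exit info to the guest */
void advance_rip(struct vcpu *v);         /* step past the VMCALL instruction */
void yield_to_host(struct vcpu *v);       /* hand the pCPU back to the host */
void resume_guest(struct vcpu *v);        /* re-enter the LibVMI guest */

void handle_next_exit_vmcall(struct vcpu *v)
{
    if (!event_pending()) {
        /* Do NOT advance RIP: the guest stays parked on the VMCALL and the
         * pCPU is returned to the host, so nothing spins at 100%. */
        yield_to_host(v);
        return;
    }

    /* A monitored VM exit occurred: the LibVMI guest re-executes the same
     * VMCALL, and this time we advance RIP and deliver the event data. */
    copy_exit_info(v);
    advance_rip(v);
    resume_guest(v);
}
```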

2.4.3. Challenges due to outside events

I was unable to work for a week due to power outages caused by Tropical Storm Isaias during the first week of August.

2.5. Usefulness of this work

Hopefully others will find my contributions useful. I can think of the following scenarios:

  • This is the first time that LibVMI is able to introspect a Windows host!
  • Other projects dependent on LibVMI will benefit from the support of a new hypervisor
  • The Bareflank late launch feature will allow new kinds of security applications with LibVMI
  • Other introspection libraries will be able to add support for Boxy/MicroV more easily

3. Conclusion

This was an incredible learning experience. I became very efficient at navigating the Bareflank and LibVMI projects. More importantly, my C++, assembly, and debugging skills have improved. Even more importantly, I have learned a lot about systems programming with hypervisor technologies (Bareflank and Xen on Intel CPUs), the Linux kernel, and the Windows kernel.

4. Acknowledgements

I want to thank my mentors, Dr. Rian Quinn and Dr. Tamas K. Lengyel, for their support and their quick responses whenever I had a technical question. I felt very lucky to be under their mentorship and to benefit from their genius and technical knowledge.

The Bareflank Slack was very welcoming, and technical questions on systems programming get asked and answered there all the time. I would like to mention and thank a special visitor, Andrew Cooper of Citrix, for providing historical context on top of some interesting technical answers, and the rest of the Bareflank team, Connor and Jared, for providing useful feedback.

Thanks to the Honeynet Project for providing support and incredible GSoC project ideas and opportunities.

Thank you, Google, for the Google Summer of Code program and for the opportunity you have given to me and to the open-source community as a whole.
