malbecki/Edk2DriverTests.md

## Edk2DriverTests.md

      
    Introduction

The EDK2 testing framework allows running and testing code that is conventionally only able to run at boot time.
This can be used to run automated tests on the code without booting the target platform, executing them on the same system
that was used for development or from a CI/CD environment. For the remainder of this discussion, I will refer to this method of testing/development as running the tests on the host system, as opposed to running tests on the target system, which is the system you are developing the BIOS for. The goal of this proposal is to extend host test coverage to device driver code or any other code
that communicates with a device via memory writes.
Benefits of testing (and development) on host system

Before we discuss how to implement host-based tests for device drivers, I want to quickly go over why I think host-based tests are inherently better than the ones that run on the target. I see the following advantages:

Access to normal developer tools and libraries that can be used under OS (example: gdb)
Isolation from the target SW. For instance, you don't depend on target services to write/read log files, connect to the internet to publish results etc.
Isolation from the target HW. Especially early in the platform lifecycle problems with new HW might cause the target system to be unable to boot even to EFI shell (for instance memory is not training, PCI bus can't be enumerated or maybe even BIOS is not fetching).
Ability to perform destructive tests. For instance, a block I/O write test on a target system might destroy the file system that the test code depends on to save log files, or a PCI test might corrupt the PCI hierarchy so that all of the device drivers make invalid accesses to physical memory. Such issues are avoided if nothing is booted on the target.
Speed. Because you don't have to boot the target system the tests should be significantly faster.

Issue with testing device drivers

While the current testing framework works fine for libraries that do not have a lot of dependencies, it is still not suitable for testing
device drivers. Device drivers have 2 main dependencies which are currently problematic for developing tests:

EDK2 framework dependencies (gBS, gRT etc.)
Device itself

I will skip over the EDK2 framework dependencies, as those are purely software and can be provided fairly easily in the host environment.
The problematic dependency is the device, for which we have to provide some kind of working model simulating device registers and
device behavior. The following is a discussion of different methods that can be employed to model a device.
Mocking device in the host

Usually, our driver code uses protocols to access device registers. Here is a diagram that shows how a typical device driver is
structured (simplified view):

To fully mock a device, we need to be able to mock all of those interfaces, and moreover they need to point to the same device mock to
allow the device driver to mix and match which ones it uses. To accommodate that, I propose we add a new interface which can be used to
implement all of the above interfaces. Here is the view of the dependencies with the new interface:

Once we have this new interface it is just a matter of writing libraries which can translate it to standard interfaces used by our drivers.

Implementation of PCI_IO: https://github.com/matalbec/edk2/blob/qemu_unit_tests/UnitTestFrameworkPkg/RegisterMock/Library/MockPcioLib/MockPciLib.c
Implementation of PciSegmentLib: https://github.com/matalbec/edk2/blob/qemu_unit_tests/UnitTestFrameworkPkg/RegisterMock/Library/MockPciSegmentLib/MockPciSegmentLib.c
Implementation of IoLib: https://github.com/matalbec/edk2/blob/qemu_unit_tests/UnitTestFrameworkPkg/RegisterMock/Library/MockIoLib/MockIoLib.c
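
To make the idea more concrete, here is a minimal sketch of what a shared register-space interface and two thin translation shims could look like. It is not the code from the linked repository: the names `REG_SPACE`, `RAM_REG_SPACE`, `ShimMmioRead32`, `ShimMmioWrite32` and `ShimPciRead16` are hypothetical, and plain C types are used instead of EDK2 types so that the snippet builds on its own. The real `REGISTER_SPACE_MOCK` may be shaped differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the shared register-space interface. */
typedef struct REG_SPACE REG_SPACE;
struct REG_SPACE {
  uint64_t (*Read)  (REG_SPACE *Self, uint64_t Address, uint32_t Size);
  void     (*Write) (REG_SPACE *Self, uint64_t Address, uint32_t Size, uint64_t Value);
};

/* Trivial backend: the registers behave like little-endian memory. */
typedef struct {
  REG_SPACE  Base;      /* first member, so a REG_SPACE* can be cast back */
  uint8_t    Regs[64];
} RAM_REG_SPACE;

static uint64_t RamRead (REG_SPACE *Self, uint64_t Address, uint32_t Size)
{
  RAM_REG_SPACE  *Ram  = (RAM_REG_SPACE *)Self;
  uint64_t       Value = 0;
  uint32_t       Index;

  if (Size > 8 || Address > sizeof (Ram->Regs) - Size) {
    return ~(uint64_t)0;  /* out-of-range reads return all ones */
  }
  for (Index = 0; Index < Size; Index++) {
    Value |= (uint64_t)Ram->Regs[Address + Index] << (8 * Index);
  }
  return Value;
}

static void RamWrite (REG_SPACE *Self, uint64_t Address, uint32_t Size, uint64_t Value)
{
  RAM_REG_SPACE  *Ram = (RAM_REG_SPACE *)Self;
  uint32_t       Index;

  if (Size > 8 || Address > sizeof (Ram->Regs) - Size) {
    return;               /* out-of-range writes are dropped */
  }
  for (Index = 0; Index < Size; Index++) {
    Ram->Regs[Address + Index] = (uint8_t)(Value >> (8 * Index));
  }
}

static void RamRegSpaceInit (RAM_REG_SPACE *Ram)
{
  memset (Ram->Regs, 0, sizeof (Ram->Regs));
  Ram->Base.Read  = RamRead;
  Ram->Base.Write = RamWrite;
}

/* IoLib-style shims: 32-bit MMIO accesses routed to the mock. */
static uint32_t ShimMmioRead32 (REG_SPACE *Space, uint64_t Address)
{
  return (uint32_t)Space->Read (Space, Address, 4);
}

static void ShimMmioWrite32 (REG_SPACE *Space, uint64_t Address, uint32_t Value)
{
  Space->Write (Space, Address, 4, Value);
}

/* PCI_IO-style shim: a 16-bit access at an offset inside the same space. */
static uint16_t ShimPciRead16 (REG_SPACE *Space, uint32_t Offset)
{
  return (uint16_t)Space->Read (Space, Offset, 2);
}
```

Because both shims receive the same `REG_SPACE` pointer, a value written through the IoLib-style path is visible through the PCI_IO-style path, which is the property the driver relies on when it mixes access methods.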

NOTE: At the time of writing, the implementation of those libs is very much incomplete, as my only goal was to make the framework work
with the SD card driver.
The question remains of how to implement this new interface to get meaningful device behavior. Below I will discuss a couple
of approaches that can be used.
Implementing device model under host

Writing your own device model

Conceptually the simplest approach: just write the device model, or the part of it that you need to execute a test, inside the test code
itself. To help with writing local device models I have provided a library, LocalMockRegisterSpaceLib, which acts as a software
implementation of a bus. The main goal of that lib is to align unaligned transactions initiated by SW. Right now only DWORD alignment is
supported. A guarantee on read/write alignment is important because it simplifies writing side-effect code.
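
The alignment step can be pictured with the following sketch. It is not the code from LocalMockRegisterSpaceLib: `DWORD_ACCESS` and `SplitAccess` are hypothetical names, and the byte-enable encoding (bit N set means byte N of the DWORD takes part in the access) is an assumption made for the example.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
  uint64_t  Address;     /* DWORD-aligned address */
  uint32_t  ByteEnable;  /* assumed encoding: bit N set => byte N is accessed */
} DWORD_ACCESS;

/* Splits the byte range [Address, Address + Size) into DWORD-aligned
   accesses. Returns the number of entries written to Out (at most MaxOut). */
static size_t SplitAccess (uint64_t Address, uint32_t Size, DWORD_ACCESS *Out, size_t MaxOut)
{
  size_t    Count = 0;
  uint64_t  End   = Address + Size;

  while (Address < End && Count < MaxOut) {
    uint64_t  Base      = Address & ~(uint64_t)3;          /* round down to DWORD */
    uint64_t  Next      = Base + 4;                        /* start of next DWORD */
    uint32_t  FirstByte = (uint32_t)(Address - Base);
    uint32_t  LastByte  = (uint32_t)(((End < Next) ? End : Next) - Base);  /* exclusive */
    uint32_t  Mask      = 0;
    uint32_t  Byte;

    for (Byte = FirstByte; Byte < LastByte; Byte++) {
      Mask |= 1u << Byte;
    }
    Out[Count].Address    = Base;
    Out[Count].ByteEnable = Mask;
    Count++;
    Address = Next;
  }
  return Count;
}
```

With this encoding a 2-byte access at 0x0A becomes one access to 0x08 with byte enable 0xC, while a 4-byte access at 0x03 is split into an access to 0x00 with byte enable 0x8 and an access to 0x04 with byte enable 0x7.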
Overview of local device model

To implement a device model we need a device context to hold the device state and read/write functions for each of its register spaces. For
the SD card host controller the following were provided:
typedef struct {
  BOOLEAN  PioTransferStart;
  UINTN    CurrentPioIndex;
  UINT8    Block[SD_CONTROLLER_MODEL_NUM_OF_BLOCKS][SD_CONTROLLER_MODE_BLOCK_SIZE];
  BOOLEAN  LedWasEnabled;
  UINT32   NormalErrorInterruptStatus;
  UINT16   TransferMode;
  UINT32   SdmaAddress;
} SD_LOCAL_DEVICE_MODEL;

VOID
SdMmcBarLocalModelRead (
  IN VOID                   *Context,
  IN  UINT64                Address,
  IN  UINT32                ByteEnable,
  OUT UINT32                *Value
  )
{
  // Process the read
}

VOID
SdMmcBarLocalModelWrite (
  IN VOID                  *Context,
  IN UINT64                Address,
  IN UINT32                ByteEnable,
  IN UINT32                Value
  )
{
  // Process the write
}
Those functions need to be passed to the LocalRegisterSpaceLib, which will produce a REGISTER_SPACE_MOCK from them. The returned
REGISTER_SPACE_MOCK needs to be used when creating the MOCK_PCI_DEVICE, which is then used to construct the EFI_PCI_IO_PROTOCOL.
  MockPciDeviceInitialize (&SdControllerPciSpace, &MockPciDevice);

  LocalRegisterSpaceCreate (L"SD BAR", SdMmcBarLocalModelWrite, SdMmcBarLocalModelRead, *Device, &SdBar);

  MockPciDeviceRegisterBar (MockPciDevice, (REGISTER_SPACE_MOCK*) SdBar, 0);

  MockPciIoCreate (MockPciDevice, &MockPciIo);
Again the only reason to use LocalRegisterSpaceCreate is to simplify writing the side-effects. LocalRegisterSpaceLib will guarantee that
all accesses that come to SdMmcBarLocalModelRead/Write are aligned to DWORD. Accesses that cross the DWORD boundary are split by the
LocalRegisterSpaceLib.
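
As an illustration of why aligned accesses simplify side-effect code, below is a small sketch of a model with a single write-1-to-clear status register. The `TOY_MODEL` type, the register offset, and the interpretation of `ByteEnable` as one bit per byte are assumptions made for this example; they are not taken from the SD host controller model in the linked test code.

```c
#include <stdint.h>

#define TOY_STATUS_OFFSET  0x30u   /* hypothetical DWORD-aligned register */

typedef struct {
  uint32_t  Status;                /* write-1-to-clear status bits */
} TOY_MODEL;

/* Expands a 4-bit byte-enable mask into a 32-bit data mask. */
static uint32_t ByteEnableToMask (uint32_t ByteEnable)
{
  uint32_t  Mask = 0;
  uint32_t  Byte;

  for (Byte = 0; Byte < 4; Byte++) {
    if (ByteEnable & (1u << Byte)) {
      Mask |= 0xFFu << (8 * Byte);
    }
  }
  return Mask;
}

static void ToyModelRead (void *Context, uint64_t Address, uint32_t ByteEnable, uint32_t *Value)
{
  TOY_MODEL  *Model = (TOY_MODEL *)Context;
  uint32_t   Raw    = (Address == TOY_STATUS_OFFSET) ? Model->Status : 0;

  /* Only the enabled byte lanes carry data. */
  *Value = Raw & ByteEnableToMask (ByteEnable);
}

static void ToyModelWrite (void *Context, uint64_t Address, uint32_t ByteEnable, uint32_t Value)
{
  TOY_MODEL  *Model = (TOY_MODEL *)Context;

  if (Address == TOY_STATUS_OFFSET) {
    /* Side effect: every 1 written to an enabled byte lane clears that bit. */
    Model->Status &= ~(Value & ByteEnableToMask (ByteEnable));
  }
}
```

Because the handler only ever sees DWORD-aligned addresses, the register decode is a plain equality check and partial writes are handled entirely by the byte-enable mask.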
LocalRegisterSpaceLib: https://github.com/matalbec/edk2/blob/qemu_unit_tests/UnitTestFrameworkPkg/RegisterMock/Library/LocalMockRegisterSpaceLib/LocalMockRegisterSpaceLib.c
Full test code implementation: https://github.com/matalbec/edk2/blob/qemu_unit_tests/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/UnitTest/SdMmcPciHcHostTest.c
Summary of writing your own device model

I see the following advantages of writing your own device model:

You can instantiate the device model in whatever state you wish. For instance, there is no need to initialize the SD card
in my test code. I simply ignore that aspect of SD host controller and provide functions which can immediately service block io.
Instantiating and testing the code is very fast as everything is purely software.
You control the code. If your change breaks the unit test you can just go and see how the model behaves and either fix the model
or fix your change.

Following are the disadvantages that I see:

Labour intensive - you have to write device behavior. Even simple device behavior can be time consuming.
Inaccurate - or rather most likely inaccurate as the accuracy depends on the amount of work you put into the device model. I've put
only a minimum amount to make block io work and if the driver added a behavior to, for instance, request SD card state during the
block io execution (perfectly valid thing to do) my test will simply fail. This makes tests written using this method brittle.
Biased - in this method same set of people are responsible for writing both device model and the driver. Any misunderstandings of
the spec will not be caught with such tests.

I believe we should avoid this method and only use it when:

There is no other way
Device is very simple

Using a virtual machine - explained with QEMU

For me, the number one problem with the previous method is the time required to write the software device model. Spending time on writing
dubious device models that are not going to be used outside of the test case doesn't seem like a proper allocation of programmers' time.
Instead, maybe we can leverage work that somebody else has already done. This is where virtual machines come in, and for my example
specifically, I will be using the QEMU virtual machine. Several things make QEMU a very good target both as an example and as an eventual
CI/CD target for EDK2:

QEMU is open source so whenever there is a failure in test case we can go and inspect device behavior
QEMU is a target for OVMF, one of EDK2 platforms. This means that we are testing against an actual target platform
QEMU supports QMP (QEMU Machine Protocol). It's a fairly powerful protocol which makes the whole idea of running the tests on a host
possible. It allows the test process to start and connect to the QEMU process via a socket. With this connection test code can do
IO/MEM writes/reads (in QEMU's own test harness these accesses travel over the companion qtest socket that is opened alongside QMP). QMP also supports some more advanced functionality such as device hot plug

QEMU actually already uses QMP to perform its own unit tests via the QTest framework. Here is what the test setup looks like in
QTest:

We can simply replace QTest with an EDK2 test app and execute our driver code. The only thing that we need to do is to link to the
already existing libqos from QEMU, which has all of the QMP support.

Read more about QTest: https://wiki.qemu.org/Features/QTest
My implementation of libqos and QEMU based testing in EDK2 code: https://github.com/matalbec/edk2/blob/qemu_unit_tests/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/UnitTest/SdMmcPciHcHostTestQemu.c
Check function SdMmcSignleBlockReadShouldReturnDataBlockFromQemuDeviceModel

QEMU based test explanation

This section is a code walkthrough for people who want to better understand how testing with QEMU works. Please bear in mind that I am not super familiar with libqos myself, so I may be doing some things inefficiently (for instance, I am not using Qgraph).
The first step is to create a QEMU process and connect to the QMP socket. Libqos provides a convenience function for this:
  QOSState *qs;

  const char*  cli = "-M q35 -device sdhci-pci -device sd-card,drive=mydrive -drive id=mydrive,if=none,format=raw,file=/home/matalbec/vm_images/sdcard.img";
  qs = qtest_pc_boot(cli);
This call will create a Q35 machine with SD card host controller. For simplicity I have hardcoded a path to the image file, but this can be provided via env variables or via cmd line parameters.
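
A minimal sketch of taking the image path from the environment instead of hardcoding it could look as follows. The variable name `SD_IMAGE_PATH`, the fallback file name, and the helper names `GetSdImagePath` and `BuildQemuCli` are hypothetical; only the QEMU options are copied from the command line above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the SD card image path from the environment, or a default.
   SD_IMAGE_PATH is a hypothetical variable name chosen for this sketch. */
static const char *GetSdImagePath (void)
{
  const char  *Path = getenv ("SD_IMAGE_PATH");

  return (Path != NULL && strlen (Path) > 0) ? Path : "sdcard.img";
}

/* Builds the QEMU command line used for qtest_pc_boot().
   Returns 0 on success, -1 if the buffer is too small. */
static int BuildQemuCli (char *Buffer, size_t BufferSize, const char *ImagePath)
{
  int  Length = snprintf (
                  Buffer,
                  BufferSize,
                  "-M q35 -device sdhci-pci -device sd-card,drive=mydrive "
                  "-drive id=mydrive,if=none,format=raw,file=%s",
                  ImagePath
                  );

  return (Length < 0 || (size_t)Length >= BufferSize) ? -1 : 0;
}
```

The test setup would then call `BuildQemuCli (Cli, sizeof (Cli), GetSdImagePath ())` and pass `Cli` to `qtest_pc_boot`, failing the test early if the helper returns -1.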
Next step is to find a device which we want to test and create a PCI IO protocol for it. Here is a loop which iterates over all devices
and finds SD card host controller:
QEMU_REGISTER_SPACE_MOCK  *QemuRegisterSpace;
QPCIBus                   *PciBus;
QPCIDevice                *SdhciDevice;
UINTN                     Device;
UINTN                     Function;

PciBus = qpci_new_pc (Qs->qts, NULL);
if (PciBus == NULL) {
  DEBUG ((DEBUG_INFO, "Failed to get pci bus\n"));
}
for (Device = 0; Device < 32; Device++) {
  for (Function = 0; Function < 8; Function++) {
    SdhciDevice = qpci_device_find (PciBus, QPCI_DEVFN (Device, Function));
    if (SdhciDevice == NULL) {
      continue;
    }
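    // Offset 0xA holds the subclass (low byte) and base class (high byte); 0x0805 is an SD host controller.
    if (qpci_config_readw (SdhciDevice, 0xA) == 0x0805) {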
      DEBUG ((DEBUG_INFO, "Found SDHCI at Dev %X, Fun %X\n", Device, Function));
      if (Type == QemuBar) {
        QemuRegisterSpace->Bar = qpci_iomap (SdhciDevice, BarNo, NULL);
        qpci_device_enable (SdhciDevice);
      }
      QemuRegisterSpace->Device = SdhciDevice;
      break;
    }
    g_free (SdhciDevice);
  }
  if (QemuRegisterSpace->Device != NULL) {
    break;
  }
}
This code first creates a handle to the PCI bus in QEMU and then goes over all possible devices until it finds a device with the SD
host controller class code. The full code can be seen in function QemuRegisterSpaceInit here: https://github.com/matalbec/edk2/blob/qemu_unit_tests/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/UnitTest/SdMmcPciHcHostTestQemu.c
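
The two pieces of PCI arithmetic used in the loop are the device/function encoding and the class code check. The sketch below shows both against a fake in-memory configuration space, so it runs without QEMU. The helper names `PciDevFn`, `ConfigRead16` and `IsSdHostController` are hypothetical and are not part of libqos.

```c
#include <stdint.h>

#define PCI_SUBCLASS_OFFSET  0x0Au    /* subclass at 0x0A, base class at 0x0B */
#define PCI_CLASS_SD_HOST    0x0805u  /* base class 0x08, subclass 0x05 */

/* Packs a device number (0..31) and function number (0..7) into one byte. */
static uint8_t PciDevFn (uint8_t Device, uint8_t Function)
{
  return (uint8_t)(((Device & 0x1Fu) << 3) | (Function & 0x07u));
}

/* Reads a little-endian 16-bit value from a fake configuration space. */
static uint16_t ConfigRead16 (const uint8_t *Config, uint32_t Offset)
{
  return (uint16_t)(Config[Offset] | (Config[Offset + 1] << 8));
}

/* Returns nonzero when the class code identifies an SD host controller. */
static int IsSdHostController (const uint8_t *Config)
{
  return ConfigRead16 (Config, PCI_SUBCLASS_OFFSET) == PCI_CLASS_SD_HOST;
}
```

Matching on the class code rather than on a vendor/device ID keeps the search independent of which SD host controller model the virtual machine exposes.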
When this is done, we are ready to access the QEMU device over the QMP socket. So far, I was able to initialize the SD card and execute a block
I/O transfer (both write and read) via the PIO method (DMA support is still WIP).
How to integrate libqos into EDK2

It's actually pretty hard to link those libraries to EDK2, as the libraries are a part of the QEMU tree. Here are the steps just to
make my change build (gcc only):

Clone the QEMU source code and build QEMU (any machine should do, but I tested only on x86-64)
Add include paths to the .inf file of the test code. An example for my system is in the BuildOptions section here: https://github.com/matalbec/edk2/blob/qemu_unit_tests/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/UnitTest/SdMmcPciHcHostTestQemu.inf
Do the build like you would build a normal EDK2 platform. This will fail during linking, which is expected
After the build has failed, you need to add the missing static QEMU libraries. To do that, go to static_library_files.lst and
add the QEMU libraries you can find here: https://github.com/matalbec/edk2/blob/qemu_unit_tests/qemubuild (change the paths to fit your system, of course)
Execute the build command yourself. You can use the one in my qemubuild file above, but you need to change the paths to fit your system.

So, in summary to make this a nice experience we need 3 things in the long term:

libqos released separately from QEMU
An ability in our build system to link to pre-built libraries for host targets
An ability to allow dynamic linking for host targets

Summary of testing with virtual machines (based on QEMU)

Advantages:

We are testing an actual target platform
We are still not running on the target system which means that we do not need to boot, and we do not depend on the HW which we are
testing (for functionalities other than the test itself).
We leverage an actual full device behavior
Tests done this way are very fast. The difference in time between this method and the first one is not
noticeable.

Disadvantages:

We need significant prework in EDK2 to be able to link to libqos from QEMU and probably some talk with QEMU community to release
this library separately (or release it ourselves)
QEMU doesn't simulate all of the devices with all of the capabilities. We would still need to put in the work to write missing
support. This is fine however as, at the very least, the work can be used in actual product.

I think this is the best method to provide automated tests for EDK2. It's fast, robust, complete, and surprisingly lightweight after we
put in the initial effort to enable libqos in EDK2.
Connecting to RTL simulation

This section looks into how we could run the EDK2 driver on an RTL simulation. To my understanding, a typical setup for validating an EDK2
driver on RTL would involve a full system simulation/emulation on which we could execute. This is problematic, as the execution times for
a full system are prohibitively long.

When writing a device driver we are typically not concerned with other IPs on the system, CPU architecture, bus architecture etc., so all
of those elements can be removed without losing much accuracy. The problem then remains how to execute memory cycles to the device
without a core.
One solution to that problem is to use VPI (Verilog Procedural Interface) to develop a SW plugin which would expose the protocol over
a socket, allowing the driver to execute memory cycles directly on the RTL simulation. In such a setup we only need to simulate the IP
itself; the role of the CPU and fabric is taken over by SW which instruments IP signals to execute a transaction.

The full setup consists of:

IP RTL itself. It needs to implement some interface to a fabric such as Wishbone, OCP or PCIe
Simulated components to make it functional. For instance, for an SD card, flash memory might be simulated using a file on the host system
Cosimulation plugin which creates a socket for the communication with the driver and instruments the IP's fabric interface to generate cycles
The driver itself.

Here is how a full message cycle looks between the different components:
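
On the driver side, one cycle boils down to formatting a request line and parsing the reply. The sketch below uses the message strings shown in XCosimulationMessages.puml ("pci read 0x0 2" answered by "value 0x8081"). The helper names `FormatPciRead` and `ParseValueReply` are hypothetical, and details such as line terminators or error replies are not specified here, so they are left out rather than guessed.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Formats a request such as "pci read 0x0 2".
   Returns 0 on success, -1 if the buffer is too small. */
static int FormatPciRead (char *Buffer, size_t BufferSize, uint32_t Offset, uint32_t Width)
{
  int  Length = snprintf (Buffer, BufferSize, "pci read 0x%x %u",
                          (unsigned)Offset, (unsigned)Width);

  return (Length < 0 || (size_t)Length >= BufferSize) ? -1 : 0;
}

/* Parses a reply such as "value 0x8081".
   Returns 0 and stores the number in *Value, or -1 on a malformed reply. */
static int ParseValueReply (const char *Reply, uint32_t *Value)
{
  static const char  Prefix[] = "value ";
  const char         *Number;
  char               *End;
  unsigned long      Parsed;

  if (strncmp (Reply, Prefix, sizeof (Prefix) - 1) != 0) {
    return -1;
  }
  Number = Reply + sizeof (Prefix) - 1;
  Parsed = strtoul (Number, &End, 0);   /* base 0 accepts the 0x prefix */
  if (End == Number) {
    return -1;                          /* no digits after the prefix */
  }
  *Value = (uint32_t)Parsed;
  return 0;
}
```

A register-space backend for the RTL connection would send the formatted request over the socket, block until the reply arrives, and hand the parsed value back to the driver-facing shim.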

Advantages:

Model is very accurate as this is the RTL based on which HW will be created
You can write assertions based on the pin/signal state. This is useful if your programming affects the controller in a SW-invisible way
For a company developing RTL it removes the necessity of writing a pure SW model of a device

Disadvantages:

I wasn't able to fully gauge performance, but it can be safely assumed that it will be worse than a pure SW model.
RTLs are not easily available. The only big open source project with HW IPs is the OpenCores project. In general this method would be most useful for companies that develop the RTL themselves.

A full example of such a setup is provided in the following files:

Sample SD card RTL + VPI based cosimulation plugin which creates a socket and instruments IP pins: https://github.com/matalbec/sdhci-cosim
EDK2 test that connects to simulation and executes block io flow: https://github.com/matalbec/edk2/blob/driver_tests/MdeModulePkg/Bus/Pci/SdMmcPciHcDxe/UnitTest/SdMmcPciHostTestVpi.c

Summary

Below is a visualization of how the stack would look in the end:


## FullSystemSim.puml
@startuml
[CPU] --> [Fabric]
[CPU] -> [Driver] : executes
[Fabric] --> [IP1]
[Fabric] --> [IP under test]
[Fabric] --> [IP2]
@enduml

## IsolatedIpSim.puml
@startuml
[Software fabric] <- [Driver] : communication over socket
[Software fabric] --> [IP under test]
@enduml

## StackSummary.puml
@startuml
object DeviceDriver
object PciIo
object IoLib
object PciSegmentLib
object CpuIo
object RegisterSpaceMock
object LocalDeviceModelRegisterSpace
object QmpRegisterSpace
object Qemu
object RtlConnectionRegisterSpace
object RtlSim

DeviceDriver *-- PciIo
DeviceDriver *-- IoLib
DeviceDriver *-- PciSegmentLib
DeviceDriver *-- CpuIo

PciIo *-- RegisterSpaceMock
IoLib *-- RegisterSpaceMock
PciSegmentLib *-- RegisterSpaceMock
CpuIo *-- RegisterSpaceMock

RegisterSpaceMock <|-- LocalDeviceModelRegisterSpace
RegisterSpaceMock <|-- QmpRegisterSpace
RegisterSpaceMock <|-- RtlConnectionRegisterSpace

QmpRegisterSpace --> Qemu : over socket
RtlConnectionRegisterSpace --> RtlSim : over socket

@enduml

## XCosimulationMessages.puml
@startuml
Driver -> SimulationListenerThread : over socket "pci read 0x0 2"
SimulationListenerThread -> SimulationInstrumentationThread : over shared variable "type: pci, op: read, offset: 0, width: 2bytes"
SimulationInstrumentationThread -> SimulationInstrumentationThread : set interface pin values to execute pci read cycle
SimulationInstrumentationThread -> SimulationInstrumentationThread : wait for response from simulation and read the value
SimulationInstrumentationThread -> SimulationListenerThread : over shared variable "cycle_complete: 1, value: 0x8081"
SimulationListenerThread -> Driver: over socket "value 0x8081"
@enduml