@i3abghany
Last active August 22, 2021 20:22
A presentation of the work done for my GSoC 2021 project with the QEMU organization.

QEMU Cache Modelling TCG Plugin

This repository contains all the work done for the project "TCG Plugin: Cache modelling", in which a multi-core, multi-level cache modelling TCG plugin was developed. I also wrote a QEMU blog post that covers the internals of the plugin in more technical detail, along with an example demonstrating how to use it. The plugin can optionally be attached to QEMU in either user-mode emulation or full-system emulation. When execution finishes, the plugin outputs cache performance statistics for the working set defined by the emulated target's memory access pattern.

Introduction

QEMU, as a multi-architecture emulator, uses TCG (the Tiny Code Generator) as one means of translating the target architecture's instructions into host instructions that run on the processor executing QEMU. TCG can register subscribers for several events, and this mechanism makes up the TCG plugin subsystem. Through plugins, one can subscribe to events such as instruction translation, instruction execution, and memory access, and instrument them through registered callbacks.

TCG plugins can observe the system down to the granularity of individual instruction executions and memory accesses. Using the QEMU plugin API, we intercept those events and emulate CPU caches whose parameters are configured before run time.

Scope and Basic Organization

While different microarchitectures often take different approaches at the lowest level, the core concepts of caching are universal. As QEMU is not a microarchitectural emulator, we model an idealized caching system with a few simple parameters. By doing so, we can adequately emulate the behaviour of a caching system without diverging significantly from real-hardware behaviour.

The plugin aims to capture trends that depend on the memory access pattern, which is largely determined by the executable code, without delving into details that change from one microarchitecture to another.

That said, we limit ourselves to private per-core L1 instruction caches and private per-core L1 data caches, with the option to also emulate unified (data + instruction) per-core L2 caches.

To keep things simple, no inter-core interaction (such as cache coherence traffic) is taken into account, since such behaviour is largely implementation-dependent and varies from one microarchitecture to another.

Configurability

Emulated caches are configurable in terms of the following parameters:

  1. Overall cache size
  2. Block (line) size
  3. Set-associativity
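Together, these three parameters fix the cache geometry: the number of sets equals the cache size divided by the block size times the associativity, and each address splits into a block offset, a set index, and a tag. The following minimal C sketch assumes power-of-two parameters; the type and function names are hypothetical and not taken from the plugin.

```c
#include <stdint.h>
#include <stdbool.h>
#include <assert.h>

/* Hypothetical configuration; field names are illustrative. */
typedef struct {
    uint64_t cache_size;  /* total capacity in bytes */
    uint64_t block_size;  /* bytes per line */
    uint64_t assoc;       /* ways per set */
} cache_config;

typedef struct {
    uint64_t num_sets;
    unsigned offset_bits; /* log2(block_size) */
    unsigned index_bits;  /* log2(num_sets) */
} cache_geometry;

static bool is_pow2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

static unsigned log2_u64(uint64_t x)
{
    unsigned n = 0;
    while (x > 1) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Derive the set count and address bit-field widths from the parameters. */
static cache_geometry derive_geometry(cache_config c)
{
    assert(is_pow2(c.block_size) && is_pow2(c.assoc));
    assert(c.cache_size % (c.block_size * c.assoc) == 0);
    cache_geometry g;
    g.num_sets = c.cache_size / (c.block_size * c.assoc);
    assert(is_pow2(g.num_sets));
    g.offset_bits = log2_u64(c.block_size);
    g.index_bits = log2_u64(g.num_sets);
    return g;
}

/* Set index: the bits just above the block offset. */
static uint64_t addr_set(cache_geometry g, uint64_t addr)
{
    return (addr >> g.offset_bits) & (g.num_sets - 1);
}

/* Tag: everything above the offset and index bits. */
static uint64_t addr_tag(cache_geometry g, uint64_t addr)
{
    return addr >> (g.offset_bits + g.index_bits);
}
```

For example, a 16 KiB, 8-way cache with 64-byte blocks has 32 sets, so the low 6 bits of an address are the offset, the next 5 bits select the set, and the remaining bits form the tag. Address 0x12345 therefore falls in set 13 with tag 36.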

A single cache eviction policy can also be specified as a plugin argument. This policy can be one of the following:

  1. Least Recently Used (LRU)
  2. First-in, first-out (FIFO)
  3. Random eviction
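The policies differ only in which way of a full set is chosen as the victim. The sketch below shows one possible way to implement them for a single set, assuming per-way timestamps: LRU refreshes a way's stamp on every hit, FIFO sets it only on insertion, and random eviction picks any way. The data layout and names (`cache_set`, `choose_victim`, `access_set`) are illustrative and not taken from the plugin.

```c
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>

enum policy { POLICY_LRU, POLICY_FIFO, POLICY_RAND };

#define MAX_WAYS 16 /* illustrative upper bound; ways must not exceed it */

/* One set of a set-associative cache (hypothetical layout). */
typedef struct {
    uint64_t tag[MAX_WAYS];
    bool valid[MAX_WAYS];
    uint64_t stamp[MAX_WAYS]; /* last use (LRU) or insertion time (FIFO) */
    int ways;
} cache_set;

/* Pick the way to fill: a free way if any, otherwise a victim per policy. */
static int choose_victim(const cache_set *s, enum policy p)
{
    for (int w = 0; w < s->ways; w++) {
        if (!s->valid[w]) {
            return w;
        }
    }
    if (p == POLICY_RAND) {
        return rand() % s->ways;
    }
    /* LRU and FIFO both evict the oldest stamp; they differ only in
       when the stamp is updated (see access_set). */
    int victim = 0;
    for (int w = 1; w < s->ways; w++) {
        if (s->stamp[w] < s->stamp[victim]) {
            victim = w;
        }
    }
    return victim;
}

/* Returns true on a hit. `now` is a monotonically increasing access counter. */
static bool access_set(cache_set *s, uint64_t tag, enum policy p, uint64_t now)
{
    for (int w = 0; w < s->ways; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            if (p == POLICY_LRU) {
                s->stamp[w] = now; /* LRU: refresh on every use */
            }
            return true;
        }
    }
    int v = choose_victim(s, p);
    s->valid[v] = true;
    s->tag[v] = tag;
    s->stamp[v] = now; /* both LRU and FIFO record the insertion time */
    return false;
}
```

With a 2-way set and the access sequence A, B, A, C, LRU evicts B (A was used more recently), while FIFO evicts A (it was inserted first), so a following access to A hits under LRU and misses under FIFO.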

For multi-threaded user-space programs, and for full-system emulation with more than one core, we can specify the number of "cores" to emulate caches for.

Multi-Core Cache Emulation

QEMU can emulate multi-threaded user-space applications, and it can provide more than one CPU to a guest kernel in full-system emulation. Both kinds of workload are supported through the following mechanisms.

Full-System Multi-Core Cache Emulation

TCG plugins have access to basic information about the system, such as the number of cores available to the guest. By default, this information is used to construct a cache emulation system for each available core (i.e. an L1 instruction cache, an L1 data cache, and optionally a unified L2 cache).

A callback subscribed to memory access events receives the index of the vCPU that initiated the access. This index is used to select which core's caches to access.
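The per-core arrangement can be sketched as one array slot per emulated core, allocated at startup and indexed directly by the vCPU index. This is a hypothetical sketch: `on_mem_access` is a plain stand-in for the real memory callback (it does not use the QEMU plugin API), and it only counts accesses instead of simulating the cache itself.

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/* Illustrative per-core counters; a full model would also hold the sets. */
typedef struct {
    uint64_t accesses;
    uint64_t misses;
} core_cache;

typedef struct {
    core_cache *l1d; /* one private L1 data cache per emulated core */
    int cores;
} cache_system;

/* Allocate one cache per core (hypothetical helper). Returns 0 on success. */
static int cache_system_init(cache_system *cs, int cores)
{
    assert(cores > 0);
    cs->l1d = calloc((size_t)cores, sizeof(*cs->l1d));
    cs->cores = cores;
    return cs->l1d != NULL ? 0 : -1;
}

/* Stand-in for a memory-access callback: the vCPU index selects the cache. */
static void on_mem_access(cache_system *cs, unsigned vcpu_index)
{
    /* Full-system mode: there is exactly one cache per guest vCPU. */
    assert(vcpu_index < (unsigned)cs->cores);
    cs->l1d[vcpu_index].accesses++;
}
```

With four cores, two accesses from vCPU 2 and one from vCPU 0 are counted only in those cores' caches; the other cores' counters stay at zero.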

User-Space Multi-Threaded Cache Emulation

User-space emulation mirrors the thread structure of the emulated program and is bounded only by how many threads the host kernel allows it to create. This means we cannot know in advance how many threads will be created.

To mitigate this, the plugin tracks a fixed number of cores (1 by default), which can be configured as a plugin argument. If the number of threads exceeds the number of cores, threads share caches and may thrash each other.

This mirrors how kernels let user-space applications create as many threads as they want, while those threads must eventually be scheduled on the available physical cores, where they may thrash each other's caches.
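The toy simulation below illustrates the thrashing effect. It assumes that threads are mapped onto the configured cores by taking the thread index modulo the core count; that mapping, the direct-mapped caches, and all names here are assumptions made for illustration. Two threads repeatedly touch different addresses that map to the same set: with one core they evict each other on every access, while with two cores each thread keeps its own line.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <assert.h>

#define CORES_MAX 8
#define SETS 64
#define BLOCK_BITS 6

/* Toy direct-mapped L1 per core, storing block numbers (illustrative). */
typedef struct {
    uint64_t block[SETS];
    bool valid[SETS];
    uint64_t misses;
} toy_cache;

static toy_cache caches[CORES_MAX];

/* Assumed mapping: threads beyond the core count wrap around onto cores. */
static int core_for_thread(unsigned thread_index, int cores)
{
    return (int)(thread_index % (unsigned)cores);
}

static void toy_access(unsigned thread_index, int cores, uint64_t addr)
{
    toy_cache *c = &caches[core_for_thread(thread_index, cores)];
    uint64_t b = addr >> BLOCK_BITS;
    uint64_t set = b % SETS;
    if (!c->valid[set] || c->block[set] != b) {
        c->misses++;
        c->valid[set] = true;
        c->block[set] = b;
    }
}

/* Two threads alternate 100 times; returns total misses across cores. */
static uint64_t run_two_threads(int cores)
{
    assert(cores >= 1 && cores <= CORES_MAX);
    memset(caches, 0, sizeof caches);
    for (int round = 0; round < 100; round++) {
        toy_access(0, cores, 0x0000); /* block 0 -> set 0 */
        toy_access(1, cores, 0x1000); /* block 64 -> also set 0 */
    }
    uint64_t total = 0;
    for (int i = 0; i < cores; i++) {
        total += caches[i].misses;
    }
    return total;
}
```

Here `run_two_threads(1)` counts 200 misses for 200 accesses, whereas `run_two_threads(2)` counts only the 2 compulsory misses.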

Future Work

All the goals defined in the project proposal were met, and the work was merged, either upstream or into the maintainer's tree (QEMU was in a release cycle when GSoC 2021 ended). However, some convenience features could still improve the plugin's input and output.

The plugin has many parameters, and a typical invocation sets most of them, which makes the command line cluttered. Hence, the plugin could benefit from reading its parameters from a configuration file.
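Such a configuration file could be as simple as one `key = value` pair per line. The sketch below parses lines of that form in C. The file format is hypothetical, and the key names (`dcachesize`, `dassoc`, `dblksize`, `cores`, `evict`) are illustrative rather than a description of the plugin's actual argument set.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings; the key names are illustrative, not a spec. */
typedef struct {
    long dcachesize, dassoc, dblksize;
    int cores;
    char evict[32];
} plugin_config;

/* Parse one "key = value" line; blank lines and '#' comments are ignored.
   Returns 0 on success, -1 on an unknown key or a malformed line. */
static int parse_config_line(plugin_config *cfg, const char *line)
{
    char key[32], val[32];
    while (*line == ' ' || *line == '\t') {
        line++;
    }
    if (*line == '\0' || *line == '\n' || *line == '#') {
        return 0;
    }
    if (sscanf(line, "%31[^= \t\n] = %31s", key, val) != 2) {
        return -1;
    }
    if (strcmp(key, "dcachesize") == 0) {
        cfg->dcachesize = strtol(val, NULL, 0);
    } else if (strcmp(key, "dassoc") == 0) {
        cfg->dassoc = strtol(val, NULL, 0);
    } else if (strcmp(key, "dblksize") == 0) {
        cfg->dblksize = strtol(val, NULL, 0);
    } else if (strcmp(key, "cores") == 0) {
        cfg->cores = (int)strtol(val, NULL, 0);
    } else if (strcmp(key, "evict") == 0) {
        snprintf(cfg->evict, sizeof cfg->evict, "%s", val);
    } else {
        return -1;
    }
    return 0;
}

/* Read a whole configuration file line by line. */
static int parse_config_file(plugin_config *cfg, FILE *f)
{
    char line[128];
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_config_line(cfg, line) != 0) {
            return -1;
        }
    }
    return 0;
}
```

Rejecting unknown keys, rather than silently skipping them, surfaces typos in the file instead of quietly falling back to defaults.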

It would also be useful to support output in a standard format such as YAML or JSON, so that plugin output can be fed into other programs for post-processing.
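As a sketch of what machine-readable output might look like, the function below prints per-core access counts, miss counts, and miss rates (as percentages) as a JSON array. The `core_stats` type and the JSON field names form a hypothetical schema, not an existing QEMU output format.

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

/* Illustrative per-core counters (hypothetical type). */
typedef struct {
    uint64_t insn_accesses, insn_misses;
    uint64_t data_accesses, data_misses;
} core_stats;

/* Miss rate as a percentage; 0 when there were no accesses. */
static double miss_rate(uint64_t misses, uint64_t accesses)
{
    return accesses ? 100.0 * (double)misses / (double)accesses : 0.0;
}

/* Write the statistics as a JSON array with one object per core. */
static void print_stats_json(FILE *out, const core_stats *s, int cores)
{
    fputs("[", out);
    for (int i = 0; i < cores; i++) {
        fprintf(out,
                "%s{\"core\":%d,"
                "\"dcache\":{\"accesses\":%" PRIu64 ",\"misses\":%" PRIu64
                ",\"miss_rate\":%.2f},"
                "\"icache\":{\"accesses\":%" PRIu64 ",\"misses\":%" PRIu64
                ",\"miss_rate\":%.2f}}",
                i ? "," : "", i,
                s[i].data_accesses, s[i].data_misses,
                miss_rate(s[i].data_misses, s[i].data_accesses),
                s[i].insn_accesses, s[i].insn_misses,
                miss_rate(s[i].insn_misses, s[i].insn_accesses));
    }
    fputs("]\n", out);
}
```

Emitting one object per core keeps the structure flat enough for common JSON tools to aggregate or plot across cores.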

Acknowledgements

I'd like to show my sincere gratitude to Alex Bennée (stsquad on IRC) for mentoring me, patiently reviewing my patches, and answering my questions.

I'd also like to thank the QEMU community for helping me on various occasions during my GSoC participation.

Proposed Changes

Only changes accepted or on-going are listed in this section.

Patches Related to the Cache TCG Plugin

[PATCH v4 0/5] plugins: New TCG plugin for cache modelling

Misc. updates and bug fixes

[PATCH v5 0/2] plugins/cache: multicore cache modelling

[PATCH v4 00/13] new plugin argument passing scheme

[PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix

Patches Related to the QEMU CLI

[PATCH v4 00/13] new plugin argument passing scheme

Patches Related to Other Plugins

[PATCH v4] plugins/syscall: Added a table-like summary output
[PATCH] plugins/execlog: removed unintended "s" at the end of log lines.

[PATCH v4 00/13] new plugin argument passing scheme

Patches Related to QEMU-Web

[PATCH v3] blog: add a post for the new TCG cache modelling plugin

Getting The Code

At the time of this post, single-core cache emulation has been merged into QEMU upstream, so a fresh clone of the QEMU source includes it. Multi-core emulation has been accepted by the maintainer, but because the QEMU project is in a release cycle and not taking major updates, that code will only be merged once the release has stabilized. To get it now, you can manually apply the relevant patches or fetch the plugins/next tree, which also contains the new plugin argument passing scheme. The L2 support is still experimental and awaiting review; it must be applied manually from the patches listed above.
