i3abghany/GSoC-QEMU-Cache-Modelling-TCG-Plugin.md

## GSoC-QEMU-Cache-Modelling-TCG-Plugin.md

      
QEMU Cache Modelling TCG Plugin

This repository contains all the work done for the GSoC 2021 project "TCG
Plugin: Cache modelling", in which a multi-core, multi-level cache modelling TCG
plugin was developed. I also wrote a QEMU blog post that contains more technical
information about the internals of the plugin, along with an example
demonstrating how to use it. The plugin can optionally be attached to QEMU in
either user-mode emulation or full-system emulation. When execution finishes,
the plugin outputs cache performance statistics for the working set produced by
the emulation target's memory access pattern.
Introduction

QEMU, as a multi-arch emulator, uses TCG as one means to translate the target
architecture's instructions to host instructions that run on the processor
executing QEMU. TCG allows subscribers to register for several events, and this
mechanism makes up the TCG plugins subsystem. Through plugins, one can subscribe
to events such as instruction translation, instruction execution, and memory
access, and instrument those events through registered callbacks.
TCG plugins can observe the system down to the granularity of individual
instruction execution and memory access. By utilizing the QEMU plugin API, we've
intercepted those events and emulated CPU caches that are configurable before
run time.
Scope and Basic Organization

While different microarchitectures often take different approaches at the very
low level, the core concepts of caching are universal. As QEMU is not a
microarchitectural emulator, we model an idealized caching system with a few
simple parameters. By doing so, we can approximate the behaviour of a caching
system without diverging far from real-hardware behaviour.
The scope of the plugin is to catch the trends that depend on the memory access
pattern, which is largely determined by the executable code, without delving
into details that change from one microarchitecture to another.
That said, we limit ourselves to private per-core L1 instruction caches and
private per-core L1 data caches, with the option of emulating unified
(data + instructions) per-core L2 caches.
To keep it simple, no inter-core interaction was taken into account, since such
considerations are largely implementation-dependent and will vary from one
microarchitecture to another.
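The organization described above can be sketched roughly as follows. This is an illustration only, not the plugin's code: it assumes direct-mapped caches for brevity, it assumes that an L1 miss is simply forwarded to the optional L2, and the names `Cache`, `Core`, `cache_access`, and `core_access` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LINES 64

/* Hypothetical direct-mapped cache: one tag per line, plus counters. */
typedef struct {
    uint64_t block_size;      /* bytes per line, must be non-zero */
    uint64_t lines;           /* number of lines, 1..MAX_LINES */
    bool valid[MAX_LINES];
    uint64_t tag[MAX_LINES];
    uint64_t accesses;
    uint64_t misses;
} Cache;

/* Look up addr; on a miss, install the block. Returns true on a hit. */
static bool cache_access(Cache *c, uint64_t addr)
{
    assert(c->block_size > 0 && c->lines > 0 && c->lines <= MAX_LINES);
    uint64_t block = addr / c->block_size;
    uint64_t idx = block % c->lines;
    uint64_t tag = block / c->lines;

    c->accesses++;
    if (c->valid[idx] && c->tag[idx] == tag) {
        return true;
    }
    c->misses++;
    c->valid[idx] = true;
    c->tag[idx] = tag;
    return false;
}

/* Per-core hierarchy: private L1I and L1D, optional unified L2. */
typedef struct {
    Cache l1i;
    Cache l1d;
    Cache l2;
    bool use_l2;
} Core;

/* Instruction fetches go to L1I, data accesses to L1D.
 * Assumption: only L1 misses reach the L2. */
static void core_access(Core *core, uint64_t addr, bool is_insn)
{
    Cache *l1 = is_insn ? &core->l1i : &core->l1d;

    if (cache_access(l1, addr)) {
        return;
    }
    if (core->use_l2) {
        cache_access(&core->l2, addr);
    }
}
```

Since no inter-core interaction is modelled, each `Core` in such a sketch can be updated independently of the others.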
Configurability

Emulated caches are configurable in terms of the following parameters:

Overall cache size
Block (line) size
Set-associativity
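To make the relation between these three parameters concrete, here is a small sketch of how they determine the number of sets and how an address splits into tag, set index, and block offset. The struct and function names are hypothetical, and the sketch assumes that the block size and the resulting set count are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical configuration record; not taken from the plugin source. */
typedef struct {
    uint64_t cache_size;  /* total capacity in bytes */
    uint64_t block_size;  /* bytes per block (line) */
    uint64_t assoc;       /* blocks per set */
} CacheConfig;

typedef struct {
    uint64_t tag;
    uint64_t set;
    uint64_t offset;
} AddrParts;

static int is_pow2(uint64_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* sets = cache_size / (block_size * associativity) */
static uint64_t num_sets(const CacheConfig *c)
{
    assert(c->block_size > 0 && c->assoc > 0);
    assert(c->cache_size % (c->block_size * c->assoc) == 0);
    return c->cache_size / (c->block_size * c->assoc);
}

/* Split an address into block offset, set index, and tag. */
static AddrParts split_addr(const CacheConfig *c, uint64_t addr)
{
    uint64_t sets = num_sets(c);
    assert(is_pow2(c->block_size) && is_pow2(sets));

    AddrParts p;
    uint64_t block = addr / c->block_size;  /* block number */
    p.offset = addr & (c->block_size - 1);  /* byte within the block */
    p.set = block & (sets - 1);             /* which set the block maps to */
    p.tag = block / sets;                   /* identifies the block in its set */
    return p;
}
```

For example, a 16 KiB cache with 64-byte blocks and 8-way associativity has 16384 / (64 * 8) = 32 sets.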

A single cache eviction policy can also be specified as a plugin argument. This
policy can be one of the following:

Least Recently Used (LRU)
First In, First Out (FIFO)
Random eviction
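The difference between the three policies shows up only when a full set needs a victim. The following sketch models a single 2-way set and uses one timestamp per way; this is an assumption made for illustration (all names are hypothetical) and says nothing about how the plugin stores its policy metadata.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define WAYS 2

typedef enum { POLICY_LRU, POLICY_FIFO, POLICY_RAND } Policy;

/* One set of a set-associative cache (hypothetical layout). */
typedef struct {
    bool valid[WAYS];
    uint64_t tag[WAYS];
    uint64_t stamp[WAYS];  /* LRU: time of last use; FIFO: time of insertion */
    uint64_t clock;        /* counts accesses to this set */
} CacheSet;

/* Returns true on a hit. On a miss, picks a victim and installs the tag. */
static bool set_access(CacheSet *s, Policy p, uint64_t tag)
{
    int w;
    int victim = -1;

    s->clock++;
    for (w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            if (p == POLICY_LRU) {
                s->stamp[w] = s->clock;  /* only LRU refreshes on a hit */
            }
            return true;
        }
    }

    /* Prefer an empty way before evicting anything. */
    for (w = 0; w < WAYS; w++) {
        if (!s->valid[w]) {
            victim = w;
            break;
        }
    }
    if (victim < 0) {
        if (p == POLICY_RAND) {
            victim = rand() % WAYS;
        } else {
            /* LRU and FIFO both evict the way with the oldest stamp. */
            victim = 0;
            for (w = 1; w < WAYS; w++) {
                if (s->stamp[w] < s->stamp[victim]) {
                    victim = w;
                }
            }
        }
    }
    s->valid[victim] = true;
    s->tag[victim] = tag;
    s->stamp[victim] = s->clock;
    return false;
}
```

With the access sequence A, B, A, C on a full 2-way set, LRU evicts B (A was touched more recently), while FIFO evicts A (it was inserted first), so a following access to A hits under LRU and misses under FIFO.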

For multi-threaded user-space programs, and for full-system emulation with more
than one core, we can specify the number of "cores" to emulate caches for.
Multi-Core Cache Emulation

QEMU can emulate multi-threaded user-space applications, and it can provide more
than one CPU for a guest kernel in full-system emulation. These kinds of working
sets are supported through the following mechanisms.
Full-System Multi-Core Cache Emulation

TCG plugins have access to basic information about the system, such as the
number of cores available to the guest. By default, this information is used to
construct a cache hierarchy for each available core (i.e. an L1 instruction
cache, an L1 data cache, and optionally a unified L2 cache).
A callback subscribed to a memory access event has access to the index of the
vCPU that initiated the access. This index is used to identify the cache to
access.
User-Space Multi-Threaded Cache Emulation

User-space emulation mirrors the thread structure of the emulated program, and
is bound only by how many threads the host kernel will allow it to create. This
means that we cannot know how many threads will be created prior to running.
To mitigate this, the plugin tracks a static number of cores (1 by default) that
can be configured as a plugin argument. If the number of threads is more than
the number of available cores, several threads share one core's caches and may
thrash each other.
This mirrors how kernels allow user-space applications to make as many threads
as they want, but eventually, those threads must be scheduled on an available
physical core and subsequently may thrash each other.
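One simple way to map an arbitrary vCPU (thread) index onto a fixed number of emulated cores is a modulo. The sketch below assumes that mapping for illustration; the text above does not spell out the exact rule, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: choose which emulated core's caches a vCPU uses.
 * Assumption: indices wrap around with a modulo, so vCPUs beyond the
 * configured core count share caches with earlier ones. */
static int cache_index(unsigned int vcpu_index, int cores)
{
    assert(cores > 0);
    return (int)(vcpu_index % (unsigned int)cores);
}
```

With the default of one core, every thread lands on index 0 and competes for the same caches, which is the thrashing scenario described above. With four cores, vCPUs 1 and 5 would share the caches at index 1.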
Future Work

All the goals defined by the project proposal were successfully met, and the
work was merged (either upstream or into the maintainer's tree, since the QEMU
project was in a release cycle when GSoC 2021 ended). However, some convenience
features could still be added to aid input and output.
The plugin has various parameters, and usually the great majority of them are
used, which clutters the invocation command. Hence, the plugin could benefit
from reading its parameters from a configuration file.
It would also be nice to support output in a standard format such as YAML or
JSON, so that the plugin's output can be fed into another program for
post-processing.
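As a rough idea of what machine-readable output could look like, the sketch below formats per-cache counters as a JSON object. Nothing here exists in the plugin: the `CacheStats` struct, the field names, and both functions are hypothetical, and the cache name is assumed to need no JSON escaping.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical counters for one cache. */
typedef struct {
    uint64_t accesses;
    uint64_t misses;
} CacheStats;

/* Miss rate as a percentage; 0 when the cache was never accessed. */
static double miss_rate(const CacheStats *s)
{
    if (s->accesses == 0) {
        return 0.0;
    }
    return 100.0 * (double)s->misses / (double)s->accesses;
}

/* Write one JSON object into buf. Returns snprintf's result, i.e. the
 * length the full string needs (excluding the terminating NUL). */
static int stats_to_json(const CacheStats *s, const char *name,
                         char *buf, size_t len)
{
    return snprintf(buf, len,
                    "{\"cache\": \"%s\", \"accesses\": %" PRIu64
                    ", \"misses\": %" PRIu64 ", \"miss_rate\": %.4f}",
                    name, s->accesses, s->misses, miss_rate(s));
}
```

A post-processing script could then parse one such object per cache instead of scraping a human-oriented table.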
Acknowledgements

I'd like to show my sincere gratitude to Alex Bennée (stsquad on IRC) for
mentoring me, patiently reviewing my patches, and answering my questions.
I'd also like to thank the QEMU community for helping me on various occasions
during my GSoC participation.
Proposed Changes

Only accepted or ongoing changes are listed in this section.
Patches Related to the Cache TCG Plugin

[PATCH v4 0/5] plugins: New TCG plugin for cache modelling

[PATCH v4 1/5] plugins: Added a new cache modelling plugin
[PATCH v4 2/5] plugins/cache: Enable cache parameterization
[PATCH v4 3/5] plugins/cache: Added FIFO and LRU eviction policies
[PATCH v5] docs/devel: Added cache plugin to the plugins docs
[PATCH v5] MAINTAINERS: Added myself as a reviewer for TCG Plugins

Misc. Updates and bug fixes

[PATCH 1/6] plugins/cache: Fixed a bug with destroying FIFO metadata
[PATCH 2/6] plugins/cache: limited the scope of a mutex lock
[PATCH 6/6] plugins/cache: Fixed "function decl. is not a prototype" warnings

[PATCH v5 0/2] plugins/cache: multicore cache modelling

[PATCH v5 1/2] plugins/cache: supported multicore cache modelling
[PATCH v5 2/2] docs/devel/tcg-plugins: added cores arg to cache plugin

[PATCH v4 00/13] new plugin argument passing scheme

[PATCH v4 08/13] docs/tcg-plugins: new passing parameters scheme for cache docs

[PATCH 0/5] plugins/cache: L2 cache modelling and a minor leak fix

[PATCH 1/5] plugins/cache: freed heap-allocated mutexes
[PATCH 2/5] plugins/cache: implement unified L2 cache emulation
[PATCH 3/5] plugins/cache: split command line arguments into name and value
[PATCH 4/5] plugins/cache: make L2 emulation optional through args
[PATCH 5/5] docs/tcg-plugins: add L2 arguments to cache docs

Patches Related to the QEMU CLI

[PATCH v4 00/13] new plugin argument passing scheme

[PATCH v4 01/13] plugins: allow plugin arguments to be passed directly
[PATCH v4 02/13] plugins/api: added a boolean parsing plugin api
[PATCH v4 13/13] docs/deprecated: deprecate passing plugin args through arg=

Patches Related to Other Plugins

[PATCH v4] plugins/syscall: Added a table-like summary output

[PATCH] plugins/execlog: removed unintended "s" at the end of log lines.
[PATCH v4 00/13] new plugin argument passing scheme

[PATCH v4 03/13] plugins/hotpages: introduce sortby arg and parsed bool args correctly
[PATCH v4 04/13] plugins/hotblocks: Added correct boolean argument parsing
[PATCH v4 05/13] plugins/lockstep: make socket path not positional & parse bool arg
[PATCH v4 06/13] plugins/hwprofile: adapt to the new plugin arguments scheme
[PATCH v4 07/13] plugins/howvec: adapting to the new argument passing scheme
[PATCH v4 09/13] tests/plugins/bb: adapt to the new arg passing scheme
[PATCH v4 10/13] tests/plugins/insn: made arg inline not positional and parse it as bool
[PATCH v4 11/13] tests/plugins/mem: introduce "track" arg and make args not positional
[PATCH v4 12/13] tests/plugins/syscalls: adhere to new arg-passing scheme

Patches Related to QEMU-Web

[PATCH v3] blog: add a post for the new TCG cache modelling plugin
Getting The Code

At the time of this post, single-core cache emulation is merged into QEMU
upstream, so a fresh clone of the QEMU source includes that code. Multi-core
emulation has been accepted by the maintainer, but since the QEMU project is in
a release cycle and is not taking any major updates, that code will have to
wait until the project stabilizes before it is merged. To get it in the
meantime, you may manually apply the relevant patches or fetch the
plugins/next tree. The
plugins/next tree also contains the new plugin argument passing scheme. The L2
part is waiting to be reviewed and is still experimental. It must be applied
manually using the appropriate patches listed above.