
@jonludlam
Created September 29, 2015 14:43
XenVM testing plan
The purpose of this part of the CAR is to improve the reliability and
robustness of the XenVM component of the Thin LVHD feature. This will
be achieved by three main activities: expanding the existing
dev tests, implementing some new features, and applying some formal
methods to prove that models of how parts of xenvmd work are correct.
Expanding dev tests:
There are already many dev tests being run on every single build and
pull request going into xenvm. Currently, these mainly cover the
functionality of xenvmd and xenvm, and only partly cover the
activities of the local allocator. In order to increase the coverage
we propose to do the following:
- Extend the mock device-mapper component such that it can be used
between processes.
Using the real device-mapper is limiting, in that udev becomes
involved and is a source of delays. Additionally, using the single
system device-mapper means we can't do multi-host testing: by
extending the mock to work between processes, we can simulate a
pool of as many hosts as we like using only one real host. This is a
small amount of work to change the mock to use 'read-modify-write'
with filesystem locks on each call rather than keeping state in
memory as it currently does (a sketch of this locking scheme appears
after this list).
- Functorize high-level logic over the lower-level modules.
This is a neat trick that we already use today, but it can be
extended. The idea is to make mock modules that simulate parts of
the code. As a concrete example, we can functorize over the
'shared-block-ring' code in order to use a more convenient on-disk
layout of the messages, so that each message sent over the ring
becomes a file on disk that is easily examined. This would be
particularly useful in testing invariants over the set of all
messages sent over the ring, as in the 'real' shared-block-ring the
messages get overwritten. Another example is functorizing over the
'Time' module, so that the current 5-second poll interval can be
made much shorter, much longer or randomized in order to make the
tests run more quickly or to explore differences in thread
interleaving (a sketch of this appears after this list).
- Implement new component tests:
Once the above two are completed, the new functionality will be used
to enable the following tests:
- Stress test
This would be a multi-host, many LV test that would quickly
simulate thousands of VDIs across 16 hosts, with lots of
allocations.
- Restart tests
This would test restarts of xenvmd and the local allocator. Both
are built to be 'crash-only software' (not as bad as it sounds,
honest!) https://en.m.wikipedia.org/wiki/Crash-only_software. The
idea is to put FIST points in to cause an exit at particularly
important points and to verify that operations either succeed or
fail cleanly, without leaving the system in an intermediate state
(a sketch of a FIST point appears after this list).
- Invariant post-processing
Having written out the journals and rings into easily readable
files, we can post-process them to check that the required
invariants do indeed hold. These are statements such as:
'For all FreeAllocation messages sent from xenvmd to all
local allocators, the blocks allocated must be unique unless
the messages have the same generation count'
(a checker for this invariant is sketched after this list).
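
The read-modify-write scheme for the mock device-mapper could look
something like the following OCaml sketch; the function and the
string-based state serialization are hypothetical, not the actual
mock's API:

  (* Sketch: keep the mock device-mapper state in a file and serialise
     every call with an exclusive lock, so that several processes
     (simulated hosts) share one consistent view of the state. *)
  let with_state_file path (f : string -> string) =
    let fd = Unix.openfile path [ Unix.O_RDWR; Unix.O_CREAT ] 0o644 in
    Unix.lockf fd Unix.F_LOCK 0;  (* lock the whole file; blocks until free *)
    let len = (Unix.fstat fd).Unix.st_size in
    let buf = Bytes.create len in
    let rec read_all off =
      if off < len then begin
        let n = Unix.read fd buf off (len - off) in
        if n > 0 then read_all (off + n)
      end
    in
    read_all 0;                              (* read *)
    let state' = f (Bytes.to_string buf) in  (* modify *)
    ignore (Unix.lseek fd 0 Unix.SEEK_SET);
    Unix.ftruncate fd 0;
    ignore (Unix.write_substring fd state' 0 (String.length state'));
    Unix.lockf fd Unix.F_ULOCK 0;            (* write, then unlock *)
    Unix.close fd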
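
The 'Time' functorization might look like this minimal sketch; the
module names and signatures are invented, and the real code (which is
Lwt-based) would differ in the details:

  (* Sketch: the poll loop is a functor over a clock, so production
     uses the real 5s interval while tests substitute their own. *)
  module type TIME = sig
    val sleep : float -> unit
  end

  module PollLoop (T : TIME) = struct
    (* Run [poll] forever, pausing [interval] seconds between calls. *)
    let rec run ~interval poll =
      poll ();
      T.sleep interval;
      run ~interval poll
  end

  (* Production: the real clock. *)
  module Prod = PollLoop (struct let sleep = Unix.sleepf end)

  (* Tests: no delay at all, or a randomized delay to explore
     different thread interleavings. *)
  module Fast = PollLoop (struct let sleep _ = () end)
  module Randomized = PollLoop (struct
    let sleep max = Unix.sleepf (Random.float max)
  end)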
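
For the restart tests, a FIST point amounts to little more than a
guarded exit. This sketch is modelled on xapi's marker-file
convention; the path, helper name and placement are all assumptions:

  (* Sketch of a fault-injection (FIST) point: if a tester has created
     the marker file for this point, crash on the spot. *)
  let fist_point name =
    if Sys.file_exists ("/tmp/fist_" ^ name) then begin
      Printf.eprintf "FIST: exiting at %s\n%!" name;
      exit 1
    end

  (* Example placement in an allocation path: the restart test then
     checks that the journalled operation either completed or is
     recovered cleanly on restart, never left half-done. *)
  let allocate ~journal_write ~apply =
    journal_write ();                  (* persist the intent first *)
    fist_point "after_journal_write";  (* simulated crash between steps *)
    apply ()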
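
A post-processing check for the invariant quoted above could look
like this; the message record is invented for illustration:

  (* Sketch: scan all FreeAllocation messages and fail if any block is
     handed out twice by messages with different generation counts. *)
  type free_allocation = {
    generation : int;               (* re-sent messages share this *)
    blocks : (int64 * int64) list;  (* (start, length) extents *)
  }

  let check_unique (msgs : free_allocation list) =
    let seen = Hashtbl.create 1024 in
    List.iter (fun m ->
      List.iter (fun block ->
        match Hashtbl.find_opt seen block with
        | Some g when g <> m.generation ->
          failwith "invariant violated: block allocated twice"
        | _ -> Hashtbl.replace seen block m.generation)
        m.blocks)
      msgs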
Targeted Formal Methods
We already have a Promela model of the shared-block-ring
suspend/resume protocol, and a model of a previous (broken) version of
the xenvmd -> local allocator messages. We would like to spend some
time examining some of the more critical aspects of the system to try
to find any other lurking issues. This is to be done alongside the
functorization work outlined above in order to simplify the logic in
xenvmd and the local allocator such that it is more obviously
performing the same logic as the models are testing.
As a team, we have limited exposure to these methods, so it's hard to
predict how long this will take and what the benefit will be.
However, I believe knowledge of these methods will be very beneficial
not only to Thin LVHD but to XenServer Engineering as a whole. I
suggest our approach should be to time-box the CP ticket for this
aspect of the work.
New feature implementation
- Watchdog. Xapi has a watchdog that makes sure xapi is running and
restarts it if it crashes. Since both xenvmd and the local allocator
were designed to be crash-only, a watchdog for them is almost trivial
to implement (a sketch follows this list).
- PV resize. I'm unsure whether this should go in this CAR or whether
it should just be done under the CA ticket. We need to implement PV
resize in the underlying LVM library, mirage-block-volume.
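
A watchdog for a crash-only daemon can be as simple as the following
sketch; the binary path is a placeholder, and a real version would
also want rate-limiting and proper logging:

  (* Sketch: supervise a crash-only daemon, restarting it on any exit
     other than a clean shutdown. *)
  let rec watchdog prog args =
    match Unix.fork () with
    | 0 -> Unix.execv prog (Array.append [| prog |] args)
    | pid ->
      let _, status = Unix.waitpid [] pid in
      (match status with
       | Unix.WEXITED 0 -> ()  (* clean shutdown: stop supervising *)
       | _ ->
         prerr_endline "daemon exited unexpectedly; restarting";
         watchdog prog args)

  let () = watchdog "/usr/sbin/xenvmd" [||]  (* path is a placeholder *)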
@simonjbeaumont

Looks good!

In the New feature implementation section, can we make xenvmd and the local-allocator a service or is this too much work because we currently have one per SR?
