@mgerdts
Created April 13, 2020 16:14

This describes Mike Gerdts' suggestion for how to complete the work on OS-6632. It is an expansion of this comment.

Overview

The OS-6632 branch has a prototype fix that demonstrates that it is possible for the guest to recognize disk size changes when a zvol changes size. To detect the size change, the bhyve process has an mevent that fstat()s each virtio-blk device every 5 seconds. Polling this often is excessive for an event that is unlikely to happen even once in the life of a particular VM.

Problem statement

A better approach for alerting the bhyve process of device size changes is needed. The first priority should be development of interfaces that are portable between SmartOS and FreeBSD. Secondarily, the experience on SmartOS may be optimized.

Currently there is no way for a process in a zone to be automatically made aware of a device size change. How a device size change happens will be dependent on at least the following:

  • Operating System
  • Backing store type (disk, zvol, file, lofi, qcow, etc.)
  • Privileges/capabilities of the bhyve process (e.g. bhyve can't listen for sysevents in a zone)

What's missing?

A generic mechanism is needed so that arbitrary user-space utilities may alert the bhyve process that it needs to perform a size check. The most obvious place for this to happen is as an extension to bhyvectl and the related ioctl interface.

Existing vmm ioctl calls operate either on state that exists within the vmm module or on vcpu state stored in the bhyve process. When vmm needs vcpu state from the bhyve process, it injects a vmexit into the appropriate vcpu. What is needed here is different: there is no need to interrupt a vcpu thread to notify the bhyve process of a disk change.

I propose the introduction of an event delivery mechanism that serves the needs of disk resizes and can be readily extended to other needs.

The Solution

The solution includes:

  • A generic event delivery mechanism that allows vmm to communicate events to bhyve.
  • A disk resize event type.
  • An enhancement to bhyvectl to say "check the size of device X".
  • A SmartOS-specific enhancement that allows automatic size change detection for some backing stores.

Event delivery from vmm to bhyve

A new ioctl, VM_GET_EVENT, will be added. It is only valid on minors associated with a particular VM, not on VMM_CTL_MINOR.

typedef enum vm_event_type_t {
	VM_EVENT_FOO,
	/* Insert others here */
	VM_EVENT_LAST
} vm_event_type_t;

typedef struct vm_event {
	size_t		vme_size;
	vm_event_type_t	vme_type;
	// XXX Maybe add a timestamp for debugging
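} vm_event_t;

/* Example event used below. The type of vmef_val is an assumption. */
typedef struct vm_event_foo {
	vm_event_t	vmef_event;
	int		vmef_val;
} vm_event_foo_t;

void vmm_event_add(void *event);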

The vmm module will maintain a ring buffer containing event pointers. An event is added to the ring buffer with:

	vm_event_foo_t *ev;
	ev = kmem_zalloc(sizeof (*ev), KM_SLEEP);
	ev->vmef_event.vme_size = sizeof (*ev);
	ev->vmef_event.vme_type = VM_EVENT_FOO;
	ev->vmef_val = 42;

	vmm_event_add(ev);
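
As a rough illustration of the ring buffer of event pointers, here is a minimal userland sketch. The names vmm_event_ring_t, vmm_event_ring_add(), vmm_event_ring_remove() and the capacity VMM_EVENT_RING_SIZE are hypothetical, as is the choice to reject new events when the ring is full; locking is omitted, and the real vmm_event_add() would live in the kernel and take only the event pointer.

```c
#include <assert.h>
#include <stddef.h>

#define	VMM_EVENT_RING_SIZE	8	/* hypothetical capacity */

typedef enum vm_event_type { VM_EVENT_FOO, VM_EVENT_LAST } vm_event_type_t;

typedef struct vm_event {
	size_t		vme_size;
	vm_event_type_t	vme_type;
} vm_event_t;

typedef struct vm_event_foo {
	vm_event_t	vmef_event;	/* first member, so casts are valid */
	int		vmef_val;	/* assumed type */
} vm_event_foo_t;

/* ver_head is the next slot to read; ver_count is the number queued. */
typedef struct vmm_event_ring {
	void	*ver_slots[VMM_EVENT_RING_SIZE];
	size_t	ver_head;
	size_t	ver_count;
} vmm_event_ring_t;

/* Queue an event pointer. Returns 0 on success, -1 if the ring is full. */
int
vmm_event_ring_add(vmm_event_ring_t *r, void *event)
{
	/* Every event must begin with a filled-in vm_event_t header. */
	assert(((vm_event_t *)event)->vme_size >= sizeof (vm_event_t));

	if (r->ver_count == VMM_EVENT_RING_SIZE)
		return (-1);
	r->ver_slots[(r->ver_head + r->ver_count) % VMM_EVENT_RING_SIZE] =
	    event;
	r->ver_count++;
	return (0);
}

/* Dequeue the oldest event pointer, or return NULL if the ring is empty. */
void *
vmm_event_ring_remove(vmm_event_ring_t *r)
{
	void *event;

	if (r->ver_count == 0)
		return (NULL);
	event = r->ver_slots[r->ver_head];
	r->ver_head = (r->ver_head + 1) % VMM_EVENT_RING_SIZE;
	r->ver_count--;
	return (event);
}
```

Events come out in the order they were added. What should happen when the ring fills (drop the new event, drop the oldest, or block the producer) is an open design question that this sketch does not settle.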

The bhyve process will have a thread that does something like the following:

void *
vmm_event_thread(void *fdp)
{
	int fd = *(int *)fdp;
	uchar_t data[XXX_LARGEST_EVENT_POSSIBLE];
	vm_event_t *event = (vm_event_t *)data;
	int err;

	while (!exiting) {
		err = ioctl(fd, VM_GET_EVENT, data, sizeof (data));
		if (err != 0) {
			// Handle error
			continue;
		}
		switch (event->vme_type) {
		case VM_EVENT_FOO:
			handle_event_foo(event);
			break;
		default:
			// Handle error
			break;
		}
	}
	return (NULL);
}

The ioctl will block until an event is available, the bhyve process is exiting, or the vmm instance is being torn down.

Within the kernel, vmm_handle_get_event() (a new function) will watch the event ring buffer for new entries. At most one event will be returned with each ioctl call.
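
The blocking behaviour described above might be modelled in userland roughly as follows. This is only a sketch under several assumptions: a single pending slot stands in for the ring buffer, POSIX threads primitives stand in for kernel mutexes and condition variables, and the names vmm_event_queue_t, vmm_event_post(), vmm_event_teardown() and vmm_event_get() are hypothetical. The functions return errno values directly, whereas a real ioctl would return -1 and set errno.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct vm_event {
	size_t	vme_size;
	int	vme_type;
} vm_event_t;

typedef struct vmm_event_queue {
	pthread_mutex_t	veq_lock;
	pthread_cond_t	veq_cv;
	vm_event_t	*veq_pending;	/* one slot stands in for the ring */
	bool		veq_teardown;
} vmm_event_queue_t;

void
vmm_event_queue_init(vmm_event_queue_t *q)
{
	(void) pthread_mutex_init(&q->veq_lock, NULL);
	(void) pthread_cond_init(&q->veq_cv, NULL);
	q->veq_pending = NULL;
	q->veq_teardown = false;
}

/* Producer: queue a malloc()ed event. Returns EBUSY if the slot is taken. */
int
vmm_event_post(vmm_event_queue_t *q, vm_event_t *ev)
{
	int ret = 0;

	assert(ev->vme_size >= sizeof (vm_event_t));
	(void) pthread_mutex_lock(&q->veq_lock);
	if (q->veq_pending != NULL)
		ret = EBUSY;
	else
		q->veq_pending = ev;
	(void) pthread_cond_broadcast(&q->veq_cv);
	(void) pthread_mutex_unlock(&q->veq_lock);
	return (ret);
}

/* Wake any waiter so that it can notice the instance is going away. */
void
vmm_event_teardown(vmm_event_queue_t *q)
{
	(void) pthread_mutex_lock(&q->veq_lock);
	q->veq_teardown = true;
	(void) pthread_cond_broadcast(&q->veq_cv);
	(void) pthread_mutex_unlock(&q->veq_lock);
}

/*
 * Consumer: block until an event or teardown, then copy out at most one
 * event. Returns 0, ENXIO (torn down, nothing pending) or EOVERFLOW
 * (buffer too small; the event stays queued).
 */
int
vmm_event_get(vmm_event_queue_t *q, void *buf, size_t buflen)
{
	vm_event_t *ev;

	(void) pthread_mutex_lock(&q->veq_lock);
	while (q->veq_pending == NULL && !q->veq_teardown)
		(void) pthread_cond_wait(&q->veq_cv, &q->veq_lock);
	ev = q->veq_pending;
	if (ev == NULL) {
		(void) pthread_mutex_unlock(&q->veq_lock);
		return (ENXIO);
	}
	if (ev->vme_size > buflen) {
		(void) pthread_mutex_unlock(&q->veq_lock);
		return (EOVERFLOW);
	}
	(void) memcpy(buf, ev, ev->vme_size);
	q->veq_pending = NULL;
	(void) pthread_mutex_unlock(&q->veq_lock);
	free(ev);
	return (0);
}
```

The vme_size field is what lets the consumer side copy a variable-sized event into a fixed-size caller buffer and detect when that buffer is too small.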

Disk resize event

A disk resize event adds VM_EVENT_DISKRESIZE to vm_event_type_t.

typedef enum vm_event_type_t {
	VM_EVENT_DISKRESIZE,
	/* Insert others here */
	VM_EVENT_LAST
} vm_event_type_t;

typedef struct vm_event_disksize {
	vm_event_t	vmed_event;
	/*
	 * XXX TBD - something to uniquely identify the backing store.  The FD?
	 */
} vm_event_disksize_t;

Now, when vmm becomes aware of a disk resize, it will queue an event of type vm_event_disksize_t.
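
On the bhyve side, handling this event amounts to performing the size check only when asked, rather than on a timer. The sketch below assumes, as the prototype does, that fstat() reports the current size of the backing store; whether st_size is meaningful for device nodes depends on the operating system. blk_backing_t, blk_backing_open() and blk_backing_check_size() are hypothetical names, and because the event's backing-store identifier is still TBD, the handler here is simply given the backing store directly.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-device state kept by bhyve's block backend. */
typedef struct blk_backing {
	int	bb_fd;		/* open descriptor for the backing store */
	off_t	bb_size;	/* size last reported to the guest */
} blk_backing_t;

/* Open the backing store and record its initial size. 0 or -1. */
int
blk_backing_open(blk_backing_t *bb, const char *path)
{
	struct stat st;

	assert(bb != NULL && path != NULL);
	if ((bb->bb_fd = open(path, O_RDONLY)) < 0)
		return (-1);
	if (fstat(bb->bb_fd, &st) != 0) {
		(void) close(bb->bb_fd);
		bb->bb_fd = -1;
		return (-1);
	}
	bb->bb_size = st.st_size;
	return (0);
}

/*
 * Called when a disk resize event arrives. Returns 1 if the size changed
 * (the cached size is updated), 0 if it did not, -1 if fstat() failed.
 */
int
blk_backing_check_size(blk_backing_t *bb)
{
	struct stat st;

	assert(bb != NULL);
	if (fstat(bb->bb_fd, &st) != 0)
		return (-1);
	if (st.st_size == bb->bb_size)
		return (0);
	bb->bb_size = st.st_size;
	return (1);
}
```

A return value of 1 is the point at which the device emulation would tell the guest about the new capacity.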

bhyvectl disk resize notification

bhyvectl will be enhanced to support:

bhyvectl --vm=<vm> --notify-disk-resize=<path-to-backing-store>

With that command, bhyvectl will open the appropriate minor device and call:

	// XXX not sure if passing the path is the best approach here.
	ioctl(fd, VM_NOTIFY_DISKRESIZE, path, strlen(path));

In the kernel, vmm_handle_notify_diskresize() (a new function) will then queue the disk resize event described in the previous section.
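
For illustration, parsing the two proposed options with getopt_long() could look something like the sketch below. notify_args_t and parse_notify_args() are hypothetical names, --notify-disk-resize is the option proposed above rather than an existing one, and the sketch stops short of opening the minor device or issuing the ioctl.

```c
#include <assert.h>
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the two options of interest. */
typedef struct notify_args {
	const char	*na_vm;		/* value of --vm */
	const char	*na_path;	/* value of --notify-disk-resize */
} notify_args_t;

/* Returns 0 if both options were given with non-empty values, else -1. */
int
parse_notify_args(int argc, char **argv, notify_args_t *na)
{
	static const struct option opts[] = {
		{ "vm", required_argument, NULL, 'v' },
		{ "notify-disk-resize", required_argument, NULL, 'r' },
		{ NULL, 0, NULL, 0 }
	};
	int c;

	assert(na != NULL);
	na->na_vm = NULL;
	na->na_path = NULL;
	optind = 1;	/* start scanning at the first argument */

	while ((c = getopt_long(argc, argv, "", opts, NULL)) != -1) {
		switch (c) {
		case 'v':
			na->na_vm = optarg;
			break;
		case 'r':
			na->na_path = optarg;
			break;
		default:
			return (-1);
		}
	}
	if (na->na_vm == NULL || strlen(na->na_vm) == 0)
		return (-1);
	if (na->na_path == NULL || strlen(na->na_path) == 0)
		return (-1);
	return (0);
}
```

With arguments such as --vm=testvm --notify-disk-resize=/dev/zvol/rdsk/zones/testvm/disk0 (an illustrative path), na_vm and na_path end up pointing at the two values, ready to be passed to the ioctl.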

SmartOS automatic detection for device changes

When the backing store is a device, it may be feasible to add automatic size change detection. This would involve the following changes:

  • spec_size_invalidate() would issue a sysevent saying that the device size has been invalidated.
  • vmm would have an in-kernel sysevent listener that would listen for the events emitted by spec_size_invalidate(). When a relevant sysevent is received, a vmm event is generated.

If the in-kernel sysevent listener is not feasible, it would be quite straightforward to enhance vminfod to listen for the sysevent and invoke bhyvectl when relevant sysevents are received.
