@z0marlin
Last active August 23, 2021 15:19
GSoC 21 work product for final evaluation

GSoC 21: Work Product Submission

Org: CNCF

Project: OpenEBS: Update mount-points and capacity of Block Devices without restarting NDM

Proposal: https://bit.ly/3D1qitD

Introduction

GSoC 2021 has come to an end, and it has been a fantastic experience for me. First, I would like to thank Akhil Mohan for being an amazing mentor. I would also like to thank the OpenEBS and CNCF communities for being extremely helpful and welcoming, and finally the GSoC organisers for making this program possible.

During GSoC, I worked on the Node Disk Manager (NDM) project in the OpenEBS ecosystem. In a typical OpenEBS setup, NDM runs as a DaemonSet on every node in the Kubernetes cluster, where it discovers and monitors the storage devices connected to the node. It exports these devices as BlockDevice (BD) custom resources on the Kubernetes cluster, which are then used by other components of the OpenEBS stack. My task was to add a feature to NDM that detects changes to certain properties of block devices while NDM is running: the size, the filesystem, and the mount points. The corresponding GitHub issue for this task can be found here.

Work Done

Overview

Detect changes to block device mount-points and filesystem

The scope of this task was to enable NDM to detect changes to the mount-points and the filesystem associated with a block device. This was achieved using procfs, provided by the Linux kernel. The kernel exposes the mounts visible to the init process (PID 1) through the file /proc/1/mounts. To detect changes to the mount-points or the filesystem on a block device, NDM watches this mounts file through the epoll API and reacts to the IO events the kernel sends when the file changes. A new handler was also introduced to propagate the detected changes to etcd through the Kubernetes API server.
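As a rough illustration of the epoll mechanism described above (a minimal sketch, not the code that went into NDM), the snippet below waits for a change notification on a mounts file using Go's standard syscall package on Linux. The helper name waitForMountChange is hypothetical, and the demo watches /proc/self/mounts instead of /proc/1/mounts so that it runs without special privileges. The kernel reports a changed mount table on this file as EPOLLPRI and EPOLLERR.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// waitForMountChange is a hypothetical helper. It opens the mounts file at
// path, registers it with a fresh epoll instance, and reports whether the
// kernel signalled a change within timeoutMs milliseconds.
func waitForMountChange(path string, timeoutMs int) (bool, error) {
	f, err := os.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	epfd, err := syscall.EpollCreate1(0)
	if err != nil {
		return false, err
	}
	defer syscall.Close(epfd)

	fd := int(f.Fd())
	// A changed mount table shows up as EPOLLPRI|EPOLLERR on the mounts file.
	ev := syscall.EpollEvent{Events: syscall.EPOLLPRI | syscall.EPOLLERR, Fd: int32(fd)}
	if err := syscall.EpollCtl(epfd, syscall.EPOLL_CTL_ADD, fd, &ev); err != nil {
		return false, err
	}

	events := make([]syscall.EpollEvent, 1)
	for {
		n, err := syscall.EpollWait(epfd, events, timeoutMs)
		if err == syscall.EINTR {
			continue // interrupted by a signal; wait again
		}
		if err != nil {
			return false, err
		}
		return n > 0, nil
	}
}

func main() {
	// Assumption: /proc/self/mounts stands in for /proc/1/mounts in this demo.
	changed, err := waitForMountChange("/proc/self/mounts", 1000)
	fmt.Println("changed:", changed, "err:", err)
}
```

Note that the event only says that the mount table changed, not what changed in it; the file has to be re-read to find out.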

In the initial design, change detection only tracked whether the file /proc/1/mounts had changed, without knowing what had changed in it. Because of some issues discovered while testing this implementation, it became necessary to also find out exactly what changed in the file. To do this, I wrote a package, libmount, which is a partial Go port of the C libmount library and provides an API to parse a mounts file and perform various operations on it. This allows NDM to keep the latest known version of the mounts file in memory and compare it against a newer version of the file once a change is detected. A detailed design of the mount-point and filesystem change feature can be found in this design doc.
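The idea of comparing an in-memory copy of the mounts file against a newer version can be sketched as follows. This is a simplified illustration and not the API of the libmount package; the names mountEntry, parseMounts and diffMounts are made up for this example, and details such as unescaping of special characters in mount paths are left out.

```go
package main

import (
	"fmt"
	"strings"
)

// mountEntry holds the fields of one mounts-file line that matter here.
type mountEntry struct {
	Device     string
	MountPoint string
	FsType     string
}

// parseMounts parses content in the mounts-file format:
// "<device> <mount point> <fs type> <options> <dump> <pass>".
func parseMounts(content string) []mountEntry {
	var entries []mountEntry
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue // skip blank or malformed lines
		}
		entries = append(entries, mountEntry{
			Device:     fields[0],
			MountPoint: fields[1],
			FsType:     fields[2],
		})
	}
	return entries
}

// diffMounts returns the entries present only in next (added) and the
// entries present only in prev (removed).
func diffMounts(prev, next []mountEntry) (added, removed []mountEntry) {
	unmatched := make(map[mountEntry]int)
	for _, e := range prev {
		unmatched[e]++
	}
	for _, e := range next {
		if unmatched[e] > 0 {
			unmatched[e]-- // present in both versions
		} else {
			added = append(added, e)
		}
	}
	for _, e := range prev {
		if unmatched[e] > 0 {
			removed = append(removed, e)
			unmatched[e]--
		}
	}
	return added, removed
}

func main() {
	prev := parseMounts("/dev/sda1 / ext4 rw 0 0\n/dev/sdb1 /mnt/data ext4 rw 0 0\n")
	next := parseMounts("/dev/sda1 / ext4 rw 0 0\n/dev/sdb1 /mnt/backup xfs rw 0 0\n")
	added, removed := diffMounts(prev, next)
	fmt.Println("added:", added)
	fmt.Println("removed:", removed)
}
```

The devices that appear in either list are the ones whose mount-points or filesystem need to be refreshed.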

Implement size change detection

The second part of the project was to add size change detection to NDM. Unlike the first phase, there were two possible designs for this feature. In the first design, the existing integration with kernel udev events is used to learn about changes to a block device. On receiving a change event from udev, the probe system is used to fetch the size of the block device, and an update is sent to etcd using the change handler implemented in phase 1. It is worth noting that the update request is sent only if the size has actually changed. The alternative design replaces udev-based change detection with polling: the size files provided in sysfs are read at regular intervals for all detected block devices to check whether they have changed.
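Both designs share one step: compare the freshly read size with the last known one and send an update only when they differ. A minimal sketch of that step is shown below. The names parseSysfsSize and sizeCache are hypothetical and not taken from NDM. The sketch relies on the fact that the sysfs size file of a block device (for example /sys/class/block/sdb/size) reports the size in 512-byte sectors; here the file content is passed in as a string to keep the example self-contained.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sysfs reports block device sizes in units of 512-byte sectors.
const sectorSize = 512

// parseSysfsSize converts the content of a sysfs size file to bytes.
func parseSysfsSize(content string) (uint64, error) {
	sectors, err := strconv.ParseUint(strings.TrimSpace(content), 10, 64)
	if err != nil {
		return 0, err
	}
	return sectors * sectorSize, nil
}

// sizeCache remembers the last known size in bytes for each device.
type sizeCache map[string]uint64

// update stores the new size and reports whether a previously known size
// has changed. The first observation of a device is not treated as a change.
func (c sizeCache) update(dev string, size uint64) bool {
	old, known := c[dev]
	c[dev] = size
	return known && old != size
}

func main() {
	cache := sizeCache{}
	// Simulated successive reads of the size file for one device.
	for _, content := range []string{"2097152\n", "2097152\n", "4194304\n"} {
		size, err := parseSysfsSize(content)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		if cache.update("/dev/sdb", size) {
			fmt.Println("size changed, sending update:", size)
		} else {
			fmt.Println("no change, skipping update")
		}
	}
}
```

In the udev-based design this check would run when a change event arrives; in the polling design it would run on every tick of a timer.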

Although an event-based design seems better than polling at first glance, both designs have pros and cons that made it difficult to choose between them on theoretical grounds alone. Hence, I implemented both designs and profiled them to pick the more suitable one. The tests were run on a Kubernetes cluster on GCP. The results showed that the two implementations did not differ much in performance. After discussing with my mentor, I chose the udev-based implementation since it fits well with the existing integration. Both the udev and sysfs implementations used for profiling can be found in the following PRs:

  1. Size change detection using udev events
  2. Size change detection using sysfs size file polling

The design doc can be found in this PR.

List of changes

  1. Implement mount-points and fs change detection using epoll
  2. Add change handler for propagating changes to etcd
  3. Integration tests for fs and mount-point change detection
  4. Update controller to allow selectively running ndm probes
  5. Decouple udevevent package from controller
  6. Implement size change detection using udev events
  7. Implement size change detection by sysfs size file polling
  8. Profile size change detection implementations
  9. Integration tests for size change detection

Pull requests

  1. docs: add mount change detection design (merged)
  2. feat(probe): add mount change detection to mount probe (merged)
  3. feat(controller): selectively run probes using allowlist in event msg (merged)
  4. fix(probe): update bd cache after generating uuid (merged)
  5. refactor(pkg, udevevent): decouple udevevent from controller (merged)
  6. docs: add design docs for device size change detection (in review)
  7. feat(probe): add size change detection (in review)

Future Work

Most projects have some room for future improvements and this project is no different. There are quite a number of optimisations that can be made. One such optimisation is to use the parsed mounts file stored in memory in mountprobe to fill in the new mount-points and filesystem for the affected block devices. This can replace the rescanning of the mounts file in mountprobe.FillBlockDeviceDetails, which is currently an avoidable overhead.

Another improvement is to apply filters to the events received from udev, and to the difference generated between the new and old mounts files when epoll sends an event. These filters can be based on basic properties, such as the dev path of the device, which can be determined without the probes. The filters can be used to propagate downstream, for further processing, only the events related to devices that NDM is concerned about. This can be integrated with the existing device filtering framework in NDM. There is also a need for better error handling for some of the edge cases that might occur.
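To make the event filtering idea more concrete, here is one possible shape it could take. This is purely illustrative: the types changeEvent and eventFilter and the prefix-based rule are assumptions made for this sketch and do not reflect NDM's existing filter framework.

```go
package main

import (
	"fmt"
	"strings"
)

// changeEvent is a hypothetical event that carries only the dev path,
// which is known before any probe has run.
type changeEvent struct {
	DevPath string
}

// eventFilter returns true if the event should be processed further.
type eventFilter func(changeEvent) bool

// excludeDevPathPrefixes builds a filter that drops events whose dev path
// starts with any of the given prefixes.
func excludeDevPathPrefixes(prefixes ...string) eventFilter {
	return func(e changeEvent) bool {
		for _, p := range prefixes {
			if strings.HasPrefix(e.DevPath, p) {
				return false
			}
		}
		return true
	}
}

// filterEvents keeps only the events accepted by every filter.
func filterEvents(events []changeEvent, filters ...eventFilter) []changeEvent {
	var kept []changeEvent
	for _, e := range events {
		keep := true
		for _, f := range filters {
			if !f(e) {
				keep = false
				break
			}
		}
		if keep {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	events := []changeEvent{{"/dev/sda"}, {"/dev/loop0"}, {"/dev/sdb1"}}
	// Assumption for the demo: loop devices are not of interest.
	kept := filterEvents(events, excludeDevPathPrefixes("/dev/loop"))
	fmt.Println("events passed downstream:", kept)
}
```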

Moreover, the work done in this project can be used as a starting point for building a framework within NDM to detect changes to other properties of a block device and propagate them to etcd. It could take the form of a pluggable system in which the logic for detecting changes to specific properties is plugged in, and the change events it generates are processed by the change handler. There is also a need to specify which properties have changed and which properties are expected to change. This information can be used to prevent unnecessary API calls to the Kubernetes API server (a very primitive example can be found here).
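One way such a pluggable system might look is sketched below. Everything in it is hypothetical (the blockDevice and detector types, the property names, and the helper functions); it only illustrates how per-property detectors could report which properties changed so that the API call can be skipped when nothing did.

```go
package main

import "fmt"

// blockDevice is a hypothetical, trimmed-down snapshot of a block device.
type blockDevice struct {
	Size        uint64
	FsType      string
	MountPoints []string
}

// detector is a hypothetical plug-in: it names a property and reports
// whether that property differs between two snapshots of a device.
type detector struct {
	Property string
	Changed  func(prev, cur blockDevice) bool
}

// sameStrings reports whether two string slices have equal contents.
func sameStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// defaultDetectors returns detectors for the three properties covered
// in this project.
func defaultDetectors() []detector {
	return []detector{
		{"size", func(p, c blockDevice) bool { return p.Size != c.Size }},
		{"filesystem", func(p, c blockDevice) bool { return p.FsType != c.FsType }},
		{"mountPoints", func(p, c blockDevice) bool { return !sameStrings(p.MountPoints, c.MountPoints) }},
	}
}

// changedProperties runs every detector and lists the properties that differ.
func changedProperties(detectors []detector, prev, cur blockDevice) []string {
	var changed []string
	for _, d := range detectors {
		if d.Changed(prev, cur) {
			changed = append(changed, d.Property)
		}
	}
	return changed
}

func main() {
	prev := blockDevice{Size: 1 << 30, FsType: "ext4", MountPoints: []string{"/mnt/data"}}
	cur := prev
	cur.Size = 2 << 30

	changed := changedProperties(defaultDetectors(), prev, cur)
	if len(changed) == 0 {
		fmt.Println("nothing changed, skipping API call")
		return
	}
	fmt.Println("changed properties:", changed)
}
```

Adding detection for a new property would then only require registering one more detector.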
