@ShyamsundarR
Last active March 18, 2020 00:11
RBD-CSI Snapshots/Clones and Futures

RBD-CSI Snapshots and clone: Design and uses

Terminology

  • CSI-Volume: A volume created by the CreateVolume CSI call that can be used in any related CSI call
  • RBD-Image: An image created on the RBD pool that is related to a CSI-Volume, CSI-Snapshot, or CSI-Clone
  • CSI-Snapshot: A snapshot created by the CreateSnapshot CSI call that can be used in any related CSI call
  • CSI-Clone: A CSI-Volume created by the CreateVolume CSI call, providing a CSI-Snapshot or a CSI-Volume as the DataSource
  • DataSource: Source of data for a CSI-Volume CreateVolume request
  • CO: Container orchestrator, for example Kubernetes

RBD-CSI snapshot and clone design (in a nutshell)

NOTE: The design and implementation are detailed in the following PRs, and are reproduced here for completeness

  1. CSI-Snapshots:
    • CSI-Snapshot is a snapshot of a CSI-Volume
    • CSI-Snapshot translates to a clone of an RBD-Image
      • Snapshot the RBD-Image, clone the snapshot, delete the snapshot
    • Why:
      • Keeps CSI-Snapshots independent of CSI-Volume
        • A CSI-Volume and its backing RBD-Image can be deleted even if CSI-Snapshots of that volume still exist
  2. CSI-Clone:
    • A CSI-Clone is simply a CSI-Volume created from a CSI-Snapshot or from another CSI-Volume (referred to as the DataSource)
    1. CSI-Clone from CSI-Snapshot as the DataSource
      • CSI-Clone from CSI-Snapshot translates to a clone of an RBD-Image representing the CSI-Snapshot
        • Snapshot the RBD-Image, clone the snapshot, delete the snapshot
    2. CSI-Clone from CSI-Volume as the DataSource
      • CSI-Clone from CSI-Volume translates to a two-step cloning of the RBD-Image representing the CSI-Volume
        • Snapshot the RBD-Image, clone the snapshot (intermediate clone), delete the snapshot
        • Snapshot the intermediate clone, clone the snapshot (final clone), delete the snapshot
      • Why:
        • This enables retaining an intermediate clone for any flattening needs
          • Flattening at clone depth limits requires an image that is not in use, and the intermediate clone satisfies that requirement
  3. RBD flattening requirements:
    • Why/When TBD
    • Caveats: when multiple CSI-Snapshot or CSI-Clone objects are created from a CSI-Volume or a CSI-Snapshot which is at flatten depth TBD

Another advantage of the above design is that the implementation of every operation above is a function of the same snap-clone-snap operations against an RBD-Image, which keeps the implementation consistent across CSI requests.
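
The three operations above can be sketched with a toy in-memory model. This is only an illustration of the design, not the ceph-csi implementation and not the librbd API: the `Pool` class, its methods, and the `-intermediate` naming are all hypothetical.

```python
import itertools


class Pool:
    """Toy in-memory stand-in for an RBD pool (hypothetical, not librbd)."""

    def __init__(self):
        self.images = {}        # image name -> parent image name (None if no parent)
        self.snapshots = set()  # (image name, snapshot name) pairs that currently exist
        self._ids = itertools.count(1)

    def create_image(self, name):
        self.images[name] = None

    def snap_clone_delete(self, source, target):
        """Snapshot `source`, clone the snapshot into `target`, delete the snapshot."""
        if source not in self.images:
            raise KeyError(source)
        snap = (source, f"tmp-{next(self._ids)}")
        self.snapshots.add(snap)      # 1. snapshot the RBD-Image
        self.images[target] = source  # 2. clone the snapshot
        self.snapshots.remove(snap)   # 3. delete the snapshot
        return target

    def depth(self, name):
        """Number of ancestors in the clone chain of `name`."""
        depth = 0
        while self.images[name] is not None:
            name = self.images[name]
            depth += 1
        return depth


def create_snapshot(pool, volume, snap_name):
    # CSI-Snapshot: a single clone of the volume's image.
    return pool.snap_clone_delete(volume, snap_name)


def clone_from_snapshot(pool, snapshot, clone_name):
    # CSI-Clone with a CSI-Snapshot DataSource: a single clone of the snapshot's image.
    return pool.snap_clone_delete(snapshot, clone_name)


def clone_from_volume(pool, volume, clone_name):
    # CSI-Clone with a CSI-Volume DataSource: two clones, retaining the intermediate one.
    intermediate = pool.snap_clone_delete(volume, clone_name + "-intermediate")
    return pool.snap_clone_delete(intermediate, clone_name)


pool = Pool()
pool.create_image("vol")
create_snapshot(pool, "vol", "snap")
clone_from_snapshot(pool, "snap", "clone-a")
clone_from_volume(pool, "vol", "clone-b")
```

In this model every CSI request reduces to one or two calls of the same `snap_clone_delete` helper, and no RBD snapshot outlives the request that created it.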

CSI Snapshot current and possible future uses and their potential workflows

  1. CSI-Snapshot logical restore
    • What:
      • A user/admin needs to restore a CSI-Volume to the state of a CSI-Snapshot, due to user-caused or other data loss/corruption
    • Current workflow:
      • Delete CSI-Volume
      • Recreate the same CSI-Volume as a CSI-Clone from the CSI-Snapshot
    • Possible future:
      • CSI-Rollback operations on a CSI-Volume with a provided DataSource
        • NOTE: This is not a supported or a designed API in CSI yet
      • Internally, this can be a reassociation of the RBD-Image representing the DataSource with the CSI-Volume
        • The older RBD-Image is subsequently deleted
  2. Snapshot backup and restore
    1. What:
      • Scheduled (or user/admin driven) snapshots, leveraged for creating consistent off-cluster backups
      • Snapshots may be backed up entirely, in other words backing up the entire image representation as of the time of the snapshot
        • This is currently done at the logical level: the storage-level blocks are not backed up, only the logical contents of the file system on them (say, with tar)
        • Makes restores easy across different storage backends when needed
      • It could be at the block layer, where allocated storage blocks are backed up
        • Snapshots may be backed up in their entirety, OR
        • Snapshots may be backed up incrementally (or differentially), to store a base image and increments for use at time of restore
        • Each has its own costs and benefits: more bandwidth/time during backups when storing entire copies, versus longer time to recovery when restoring incremental copies
    2. Current workflow for backup:
      • Create CSI-Clone from CSI-Snapshot
      • Stage/Publish (mount) the CSI-Clone, and read out the data to back up
      • In practice, these will mostly be filesystem-level logical backups rather than block-level backups
    3. Potential workflows for backup in the future may require/support snap-delta based data streams
      • Differential snapshot streams [1]
        • Requires computation of a snapshot delta stream between a snapshot and its base image
        • Hence needs the ability to generate a snap-diff/diff between 2 RBD-Images where one is a CSI-Snapshot of a CSI-Volume
      • Incremental snapshot streams [2]
        • Requires computation of a snapshot delta stream between a snapshot and a prior backup target
        • Prior backup target could be the base RBD-Image or a CSI-Snapshot of the same
        • Hence needs the ability to generate a snap-diff/diff between two RBD-Images that are CSI-Snapshots of the same CSI-Volume or, as in the differential case, between a CSI-Snapshot and its base CSI-Volume
      • Bottom line: if we can generate the above deltas using two RBD-Images in the future, we should be able to support this scheme when needed
    4. Workflow for restores
      • Restoring logical backups
        • This would typically involve creating a CSI-Volume and restoring from the backups
          • This works for filesystem level logical backups
      • If/when block-based backups or backups based on snapshot delta streams need restoration, the storage provider would need to give CSI a mechanism to write the data streams and recreate the volume (akin to providing a mechanism to read such data streams for backup purposes)
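
The difference between differential and incremental delta streams, and the matching restore step, can be sketched as follows. This is a minimal illustration under stated assumptions: an image is modeled as a plain dict of block index to bytes, and `diff` and `apply_delta` are hypothetical helpers, not an RBD or CSI API.

```python
def diff(base, target):
    """Delta that turns `base` into `target`: changed or added blocks, plus removed block indexes."""
    changed = {i: data for i, data in target.items() if base.get(i) != data}
    removed = sorted(i for i in base if i not in target)
    return {"changed": changed, "removed": removed}


def apply_delta(base, delta):
    """Return a new image equal to `base` with `delta` applied."""
    image = dict(base)
    for i in delta["removed"]:
        image.pop(i, None)
    image.update(delta["changed"])
    return image


# Hypothetical images: allocated blocks only, keyed by block index.
base = {0: b"aaaa", 1: b"bbbb"}
snap1 = {0: b"aaaa", 1: b"BBBB", 2: b"cccc"}
snap2 = {0: b"AAAA", 1: b"BBBB", 2: b"cccc"}

# Differential: every delta is computed against the same base image.
differential = [diff(base, snap1), diff(base, snap2)]
# Incremental: every delta is computed against the prior backup target.
incremental = [diff(base, snap1), diff(snap1, snap2)]

# Restoring snap2 needs one differential delta, or the whole incremental chain.
restored_from_differential = apply_delta(base, differential[1])
restored_from_incremental = apply_delta(apply_delta(base, incremental[0]), incremental[1])
```

The second incremental delta carries only the one block that changed since `snap1`, while the second differential delta carries every block that differs from the base. That is the bandwidth versus time-to-recovery trade-off noted above.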

RBD mirroring for DR and CSI-Snapshot/Clone

  1. Snapshots on the primary site need to be reflected in the DR site
    • The RBD-Image created for the CSI-Snapshot will also be mirrored (or gain the mirror attributes implicitly/explicitly when created)
    • This ensures that the DR site also has the snapshots
    • Further, RBD is already optimized for fast-diff based mirroring when images have a (non-flattened) clone, which is a useful optimization
    • Caveat would be in the CO system, where the respective CSI-Snapshot object needs to be created and present
      • This is no different than the CSI-Volume object that needs to be (re)created at the DR site
  2. Restores:
    • Logical restores would again be mirrored, so when data is restored back to a new CSI-Volume, the mirror will also catch up
    • Restore based on a CSI-Snapshot as the DataSource
      • This assumes CSI-Volume is deleted and then recreated from a CSI-Snapshot
      • All data required to reflect the same operations on the DR site is present; how it is orchestrated may need more thought
    • Restore based on CSI-Rollback
      • If the mechanism is to associate the CSI-Volume with a CSI-Snapshot image, the same may need to be orchestrated on the DR site
      • Again, all required data is present on the DR site; the orchestration needs more thought
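
One possible reading of the CSI-Rollback case is sketched below, purely as an illustration. CSI has no rollback API, so everything here is hypothetical: the `Site` class, the `rollback` method, and the image and volume names. The sketch assumes both sites already hold the same images through mirroring and that only the CSI-Volume to RBD-Image association needs to be replayed on each site.

```python
class Site:
    """Toy model of one site: RBD image names plus the CSI-Volume -> image mapping."""

    def __init__(self, images, volumes):
        self.images = set(images)
        self.volumes = dict(volumes)

    def rollback(self, volume_id, snapshot_image):
        """Point `volume_id` at the snapshot's image, then delete the older image."""
        if snapshot_image not in self.images:
            raise KeyError(snapshot_image)
        old_image = self.volumes[volume_id]
        self.volumes[volume_id] = snapshot_image  # reassociate the CSI-Volume
        self.images.discard(old_image)            # delete the older RBD-Image
        return old_image


# Hypothetical state: both sites hold the volume image and the snapshot image.
primary = Site({"img-vol", "img-snap"}, {"pvc-1": "img-vol"})
dr = Site({"img-vol", "img-snap"}, {"pvc-1": "img-vol"})

# The same rollback is orchestrated on the primary and on the DR site.
for site in (primary, dr):
    site.rollback("pvc-1", "img-snap")
```

Whether the DR site would delete the older image itself or simply receive the deletion through mirroring is one of the orchestration questions the text leaves open; the sketch makes no claim either way.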

Future CSI and DR/Backup directions

  1. Freeze/Unfreeze of CSI-Volumes prior to taking snapshots
    • This is outside the purview of CSI as such, and is orchestrated by the CO
    • As long as we can support the required Freeze/Unfreeze operations in the future, we should be fine on the primary site
    • The DR site is inactive, so there is no reason to reflect the Freeze/Unfreeze operations on the DR site; the created snapshot will be mirrored anyway
  2. Consistency groups of volumes, to enable coordinated snapshots across a set of volumes
    • Again this is outside the purview of CSI, and orchestrated by the CO
    • Nothing for CSI or RBD to provide in addition to the above, for both primary and DR sites
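
A CO-side coordinated snapshot across a set of volumes could look roughly like the sketch below. It is an assumption-laden illustration, not a CO or CSI API: `frozen`, `group_snapshot`, and the `freeze`, `unfreeze`, and `snapshot` callables are hypothetical placeholders for whatever the CO actually provides.

```python
from contextlib import contextmanager


@contextmanager
def frozen(volumes, freeze, unfreeze):
    """Freeze every volume, yield, then unfreeze in reverse order even on error."""
    done = []
    try:
        for volume in volumes:
            freeze(volume)
            done.append(volume)
        yield
    finally:
        for volume in reversed(done):
            unfreeze(volume)


def group_snapshot(volumes, freeze, unfreeze, snapshot):
    """Snapshot all volumes while the whole group is frozen."""
    with frozen(volumes, freeze, unfreeze):
        return [snapshot(volume) for volume in volumes]


# Hypothetical callables that only record the order of operations.
log = []
snaps = group_snapshot(
    ["vol-a", "vol-b"],
    freeze=lambda v: log.append(("freeze", v)),
    unfreeze=lambda v: log.append(("unfreeze", v)),
    snapshot=lambda v: log.append(("snapshot", v)) or f"{v}-snap",
)
```

All of this runs on the primary site only. The resulting snapshots are ordinary CSI-Snapshots, so nothing extra is needed from CSI or RBD.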

References
