@rektide
Created March 31, 2014 17:36
Checkpoint Restore

Plenty of options for saving/restoring state:

For programs, there's CRIU and linuxpmi. CRIU has been very active getting changes pushed into the Linux kernel: there are plenty of people in this AskCS talking about device driver states that need to get saved, disk images that need to be put back in a "just so" state, file handles and sockets that need to be open for everything to happen. (Comparatively, emulators have relatively self-contained devices, with saved-game storage that is written to very infrequently, and generally fairly simple device drivers, but crucially device drivers that have already been well described by software; in fact, the device drivers are software in the case of emulators.)

CRIU has been taking on the task of getting Linux to accurately describe its own state, and the state of the processes and devices it runs, and of providing hooks so that userland can directly instruct Linux to enter a certain, very precise state (for example: setting up an already-in-use TCP socket, restoring exact video modes and buffers, &c). This work was pioneered by OpenVZ, but the patches OpenVZ was building have never been favored for upstream consideration, and CRIU is really about looping back and making a proper job of getting Linux able to record and report its state, and to have that state set.

For VMs, KVM and Xen both allow one to save an entire OS. The challenges are similar, but like a game console emulator, the hypervisor typically has a very complete view of the system resources: it has a lot of flexibility to bring up the virtualized peripherals in a "just so" state, and it knows what state the virtualized peripherals are in when a checkpoint is taken.

Often this capability is used for live migrating processes or VMs between machines. "Live Migration of Virtual Machines" (2005) is somewhat of a classic on the subject, discussing live migration of a Quake 3 server with downtime of roughly 60 ms. There's also ongoing work in continuous state streaming, being pursued in QEMU by an IBM'er, called MicroCheckpointing, designed for continuous replication of an app (the High Availability holy grail). MicroCheckpointing in particular would be a promising approach for making an ongoing "undo": if, instead of only replaying the state deltas on a replica, one persisted them, one could arbitrarily fast forward or rewind the program or machine.

You're expressing interest derived from an existing application that exemplifies this kind of work, and that's fine, but I do want to say that your phrasing makes me a little cautious: you seem to have a mechanism conceived of at an application level, whereas the works I'm citing merely provide capabilities one might build applications on top of. All well and good, but some additional work might be required to make ongoing, regular use of the checkpoint/restore/snapshotting capabilities fit the application you have in mind. But yeah, one can "save" state by having something one ring "in" in the runtime hierarchy, watching and doing crazy magic with page tables, typically implementing copy-on-write memory for the running program while doing something with the other copy.

Checkpoint/snapshot operations on state are closely akin to what has to be done when the fork system call is hit. Although fork is performed by the program itself, while checkpointing/snapshotting is typically done by inner-ring managers (OS, hypervisor), both mark the point where the inner rings need to take an outer-layer application and turn it into two identical sets. What's done with the designated "snapshot" set is largely a matter of preference: send it to disk, send it to the network.
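To make the fork analogy concrete, here's a rough sketch in Python for a Unix-like system. It's an illustration under assumptions, not how CRIU or a hypervisor does it: `snapshot_via_fork` and the `state` dict are made-up names, and the snapshot is taken by the program itself rather than by something one ring in. The child gets a copy-on-write copy of memory at the moment of the fork and ships it out over a pipe, while the parent carries on mutating its own copy.

```python
import json
import os


def snapshot_via_fork(state):
    """Fork, and let the child serialize its frozen copy of `state`.

    Hypothetical helper: returns (child_pid, read_fd). The parent can keep
    changing `state`; the child's copy-on-write memory still holds the
    values as they were at the moment of the fork.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this is the designated "snapshot" set.
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                json.dump(state, out)  # could just as well go to disk or a socket
        finally:
            os._exit(0)  # leave without running the parent's cleanup handlers
    # Parent: close the write end so reading the pipe hits EOF once the child is done.
    os.close(write_fd)
    return pid, read_fd


state = {"score": 10, "level": 1}
pid, read_fd = snapshot_via_fork(state)

state["score"] = 999  # the running program moves on immediately

with os.fdopen(read_fd) as src:
    saved = json.load(src)
os.waitpid(pid, 0)

print("snapshot:", saved)
print("live:    ", state)
```

The snapshot still shows the score as 10 even though the live program has already moved on to 999, which is the "two identical sets" point: after the split, each copy goes its own way.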

On the other hand, MicroCheckpointing must continuously record the operations done to state: its goal is to track the diffs across time, not just get another working copy.
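As a toy illustration of why persisted deltas would give you rewind and fast forward, here's a small Python sketch. To be clear, this is not how QEMU's MicroCheckpointing works (that tracks dirtied memory, not dictionary keys); `DeltaLog`, `write` and `seek` are hypothetical names, and the "state" is just a dict. The only point is that once every change is kept as an (old, new) pair, any point in time can be reached by undoing or redoing deltas.

```python
_MISSING = object()  # marks "key did not exist before this write"


class DeltaLog:
    """A base state plus an ordered, persisted list of (key, old, new) deltas."""

    def __init__(self, base):
        self.state = dict(base)
        self.deltas = []  # every recorded write, oldest first
        self.cursor = 0   # how many deltas are currently applied

    def write(self, key, value):
        # Writing after a rewind discards the "future" that was rewound past.
        del self.deltas[self.cursor:]
        self.deltas.append((key, self.state.get(key, _MISSING), value))
        self.state[key] = value
        self.cursor += 1

    def seek(self, position):
        """Rewind or fast forward so that exactly `position` deltas are applied."""
        position = max(0, min(position, len(self.deltas)))
        while self.cursor > position:  # rewind: undo the newest applied delta
            self.cursor -= 1
            key, old, _new = self.deltas[self.cursor]
            if old is _MISSING:
                del self.state[key]
            else:
                self.state[key] = old
        while self.cursor < position:  # fast forward: redo the next delta
            key, _old, new = self.deltas[self.cursor]
            self.state[key] = new
            self.cursor += 1
        return dict(self.state)


log = DeltaLog({"x": 0})
log.write("x", 1)
log.write("y", 5)
log.write("x", 2)

print(log.seek(1))  # state after the first write only
print(log.seek(3))  # fast forward to the latest state
print(log.seek(0))  # all the way back to the base state
```

A plain snapshot, by contrast, only gives you the one point in time at which it was taken.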

http://www.reddit.com/r/compsci/comments/21jhlv/is_there_a_way_to_make_a_pc_save_state_like_a/cggknej
