Deterministic save load in Factorio

One of the key parts of the save/load process in Factorio is that it must be deterministic. This means that for a given save file (when no external factors change) saving, exiting, and loading the save shouldn't change any observable behavior.

There are a few reasons and benefits for this strict requirement:

  • Without it: You couldn't join a running multiplayer game (and by proxy save, exit, and resume one)
  • Without it: the replay system wouldn't work if you ever saved, exited, and resumed playing.
  • With it: we can easily test that saving and loading produces no observable change letting us know we implemented save/load correctly.
  • With it: you won't see things change randomly as a result of "reloading" like you do in so many other games.

The premise sounds simple enough: make 'running game -> save -> exit -> load -> running game' produce the same results every time (when nothing else changes). However, in practice this ends up being quite complex.
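
The round-trip property described above can be checked mechanically. The following is a minimal sketch, not Factorio's actual test code: `ToyMap`, `saveMap`, `loadMap`, and `roundTripIsStable` are hypothetical names, and the byte format is invented for illustration. It saves a tiny state, loads it, advances both copies by the same number of ticks, and compares the resulting saves.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy game state; the real map holds far more than this.
struct ToyMap {
  std::uint32_t tick = 0;
  std::vector<std::uint32_t> energy;

  // One deterministic simulation step.
  void update() {
    ++tick;
    for (auto& e : energy) e += tick;
  }
};

// Fixed-width little-endian helpers so the bytes do not depend on the host.
void writeU32(std::vector<std::uint8_t>& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

std::uint32_t readU32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
  std::uint32_t v = 0;
  for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(in.at(pos++)) << (8 * i);
  return v;
}

std::vector<std::uint8_t> saveMap(const ToyMap& map) {
  std::vector<std::uint8_t> out;
  writeU32(out, map.tick);
  writeU32(out, static_cast<std::uint32_t>(map.energy.size()));
  for (std::uint32_t e : map.energy) writeU32(out, e);
  return out;
}

ToyMap loadMap(const std::vector<std::uint8_t>& in) {
  std::size_t pos = 0;
  ToyMap map;
  map.tick = readU32(in, pos);
  const std::uint32_t count = readU32(in, pos);
  for (std::uint32_t i = 0; i < count; ++i) map.energy.push_back(readU32(in, pos));
  assert(pos == in.size());  // the whole buffer must be consumed
  return map;
}

// True when "save -> load -> keep running" is indistinguishable from
// "keep running" after the given number of ticks.
bool roundTripIsStable(ToyMap original, int ticksAfter) {
  ToyMap reloaded = loadMap(saveMap(original));
  for (int i = 0; i < ticksAfter; ++i) {
    original.update();
    reloaded.update();
  }
  return saveMap(original) == saveMap(reloaded);
}
```

Comparing serialized bytes is only one possible definition of "observable behavior"; it is used here because it is easy to automate.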

Factorio uses serialization and deserialization (see: https://en.wikipedia.org/wiki/Serialization). The exact format of the serialized save file changes from map version to map version. The point of this document is not to describe the format but to describe the process and the reasons why the process is what it is.

The save process from beginning to end (this process must be repeated in the same order during loading):

  • Force a full garbage collection sweep of every Lua state

    • Lua supports incremental garbage collection, which allows spreading the garbage collection CPU load over many ticks, avoiding stalls when a large number of things need to be collected. However, when a "LuaObject" (a Factorio class linked to the Lua state) is collected it signals to the C++ logic that the backing class can be freed. When it's freed we no longer need to include it in the save file. So, we need to ensure that any lingering garbage is collected by every Lua state to make sure we free every "LuaObject" that isn't actually meant to be in the save.
  • Create a ScopedCallback class: when this class is destroyed it calls "postSaveHook()" on the map.

    • This goes over the map data and clears temporary data created by the save process. By creating this class before saving starts we ensure that the hook always runs, regardless of whether the save process finishes successfully or fails. We don't want to leave the running map data in a corrupt state in the event of a save failure.
  • Save the current ID mapping (See 'Stable prototype IDs' https://www.factorio.com/blog/post/fff-259)

    • The ID mapping is required to load any data which uses IDs. It needs to come before anything that uses IDs, otherwise the resulting map will fail to load should the IDs change for any reason (mods changing, items being added/removed).
  • Save the applied prototype migrations

    • The prototype migrations already applied need to be known at load time to know if any new migrations should be applied.
  • Save everything else that needs to be saved

  • Save SaveHelpers
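
The ScopedCallback step above is a scope-guard pattern. Below is a minimal sketch of how such a guard could look; only the names `ScopedCallback` and `postSaveHook()` come from the text, while `ToyMap`, `temporarySaveData`, `saveMap`, and `trySave` are hypothetical stand-ins.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Runs the stored callback when the object goes out of scope,
// whether the scope is left normally or through an exception.
class ScopedCallback {
public:
  explicit ScopedCallback(std::function<void()> fn) : callback(std::move(fn)) {
    assert(callback);  // an empty callback would make the guard pointless
  }
  ~ScopedCallback() { callback(); }

  ScopedCallback(const ScopedCallback&) = delete;
  ScopedCallback& operator=(const ScopedCallback&) = delete;

private:
  std::function<void()> callback;
};

// Hypothetical stand-in for the map.
struct ToyMap {
  int temporarySaveData = 0;
  void postSaveHook() { temporarySaveData = 0; }  // clear save-only data
};

void saveMap(ToyMap& map, bool simulateFailure) {
  // Created before any temporary data exists, so cleanup is guaranteed.
  ScopedCallback cleanup([&map] { map.postSaveHook(); });

  map.temporarySaveData = 42;  // pretend the save process needs scratch data
  if (simulateFailure) throw std::runtime_error("simulated write failure");
}

// Returns true when saving succeeded; the map is clean either way.
bool trySave(ToyMap& map, bool simulateFailure) {
  try {
    saveMap(map, simulateFailure);
    return true;
  } catch (const std::runtime_error&) {
    return false;
  }
}
```

Because the cleanup lives in a destructor, no failure path inside the save logic has to remember to call the hook.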

Notable things:

  • Entities are saved in the reverse order they're listed so when they're loaded and put back into the world the order is preserved.
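
One way the reverse-order trick can preserve ordering is sketched below. It assumes, and the text does not confirm this, that loading re-inserts each entity at the front of its container. `EntityId`, `saveEntities`, and `loadEntities` are hypothetical names.

```cpp
#include <cassert>
#include <list>
#include <vector>

using EntityId = int;

// Write the entities back to front.
std::vector<EntityId> saveEntities(const std::list<EntityId>& entities) {
  return std::vector<EntityId>(entities.rbegin(), entities.rend());
}

// Assumption: each loaded entity is prepended, so the last one read
// ends up first and the original order is restored.
std::list<EntityId> loadEntities(const std::vector<EntityId>& saved) {
  std::list<EntityId> entities;
  for (EntityId id : saved) entities.push_front(id);
  assert(entities.size() == saved.size());
  return entities;
}
```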

Common save related questions:

  • What is a Targeter?

    • A Targeter is a special class that serves 2 purposes:
      1. It's a disappear-aware pointer to some "Targetable" thing. When the thing it points at is deleted it is cleared (set to nullptr).
      2. It provides a way to save a pointer in the save file via the TargetSerialiser class.
  • What is the TargetSerialiser?

    • The TargetSerialiser records which Targetable classes have been included in the save file by assigning them an (at-save-time) unique ID. When a Targeter is saved it checks with the TargetSerialiser to see what the ID is of the thing it's pointing at and records that in the save file. At load time the IDs are matched back up and the Targeter pointer is restored to point at the corresponding Targetable class.
  • What is the SaveHelper system?

    • The SaveHelper system provides a way to tell the saving logic "I want to perform some save-related logic after everything else has finished saving". You store an instance of a SaveHelper in the system and when everything else has finished saving the save logic goes over each helper in the SaveHelper system and calls "save".
  • How does a SaveHelper work (when is data actually saved)?

    • Creating a SaveHelper doesn't have any immediate side effect. By putting it into the SaveHelper system it queues it to have "save" called after everything else is saved.
  • How do you save a pointer to something if you don't have a Targeter to it?

    • You need a Targeter to some Targetable thing in order to save and load that pointer. Because a Targeter has some runtime overhead and may not be needed at runtime, not everything wants or needs to use one. In this case a temporary Targeter can be created by using the SaveHelper system. Once the save process is finished the temporary Targeters are cleared.
  • How do you create a temporary targeter for saving using a SaveHelper?

    • Depending on what exactly needs to be saved (a single pointer, multiple, some container of pointers) the answer varies. The basic logic is: make some class which inherits from SaveHelper that stores the Targeters you need and override save(...) to save those Targeters.
  • Why would a temporary-Targeter-for-saving need to exist until the save process has finished?

    • Things which are Targetable during saving can detect if anything is actually targeting it. If it detects that nothing is, it doesn't need to generate and save an ID for itself. This allows the save file format to be measurably smaller since every Entity in the game is Targetable but the majority never have anything targeting them.
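
The interaction between Targeters, the TargetSerialiser, and SaveHelpers can be sketched as follows. This is an illustration under stated assumptions rather than the real classes: the member names (`targeterCount`, `idForTargetable`, `idForTargeter`), the use of 0 as "no ID", `PointerSaveHelper`, and `saveToyWorld` are all hypothetical, and the "cleared when the target is deleted" behavior of a real Targeter is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

struct Targetable {
  int targeterCount = 0;  // how many Targeters currently point here
};

// Simplified Targeter: only tracks that something is targeting the object.
class Targeter {
public:
  explicit Targeter(Targetable* t) : target(t) {
    if (target) ++target->targeterCount;
  }
  ~Targeter() {
    if (target) {
      assert(target->targeterCount > 0);
      --target->targeterCount;
    }
  }
  Targeter(const Targeter&) = delete;
  Targeter& operator=(const Targeter&) = delete;
  Targetable* get() const { return target; }

private:
  Targetable* target;
};

class TargetSerialiser {
public:
  // Used while saving a Targetable: 0 means "nothing targets me, no ID".
  std::uint32_t idForTargetable(const Targetable& t) {
    return t.targeterCount == 0 ? 0 : assign(&t);
  }
  // Used while saving a Targeter: the ID of whatever it points at.
  std::uint32_t idForTargeter(const Targeter& t) {
    return t.get() ? assign(t.get()) : 0;
  }

private:
  std::uint32_t assign(const Targetable* t) {
    auto result = ids.emplace(t, nextId);
    if (result.second) ++nextId;  // newly inserted, so consume the ID
    return result.first->second;
  }
  std::unordered_map<const Targetable*, std::uint32_t> ids;
  std::uint32_t nextId = 1;
};

struct SaveHelper {
  virtual ~SaveHelper() = default;
  virtual void save(TargetSerialiser& s, std::vector<std::uint32_t>& out) = 0;
};

// Holds a temporary Targeter for code that only keeps a raw pointer at runtime.
struct PointerSaveHelper : SaveHelper {
  explicit PointerSaveHelper(Targetable* t) : targeter(t) {}
  void save(TargetSerialiser& s, std::vector<std::uint32_t>& out) override {
    out.push_back(s.idForTargeter(targeter));
  }
  Targeter targeter;
};

// Saves two objects; only `pointedAt` is referenced, through a raw pointer.
std::vector<std::uint32_t> saveToyWorld(Targetable& pointedAt, Targetable& unreferenced) {
  TargetSerialiser serialiser;
  std::vector<std::unique_ptr<SaveHelper>> helpers;
  std::vector<std::uint32_t> out;

  // Queued up front: the temporary Targeter marks pointedAt as targeted.
  helpers.push_back(std::make_unique<PointerSaveHelper>(&pointedAt));

  // "Standard" data: each Targetable writes its ID, or 0 when untargeted.
  out.push_back(serialiser.idForTargetable(pointedAt));
  out.push_back(serialiser.idForTargetable(unreferenced));

  // SaveHelpers run after everything else has been saved.
  for (auto& helper : helpers) helper->save(serialiser, out);
  return out;  // the temporary Targeters die together with `helpers`
}
```

If the temporary Targeter were destroyed before `pointedAt` was saved, `pointedAt` would see no targeters and skip its ID, and the pointer written by the helper could not be matched at load time. That is why the temporary Targeter has to live until the save process finishes.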

Loading has far more things it needs to deal with compared to saving because it has to handle the following:

  • Convert binary data into the expected classes
  • Restore Targeter/Targetable pointers
  • Migrating migratable class types to other types (entities, items, fluids, ...)
  • Handle when some entity/item/fluid no longer exists and should be removed
  • Map version changes (saved as one format, now a different format)
  • Handle if the input stream of data is invalid in some way and abort loading gracefully if possible

The load process from begin to end:

  • Check if the save file map version is different from the one it's going to be loaded into.

    • Not every map version can be loaded by every version of Factorio. The game can't load a map version newer than the newest one it knows about, nor one older than the oldest one it still supports. We keep a range of supported versions and with each major version drop support for one or more older versions.
  • Check if the active mods have changed.

    • Active mods changing will mean special events need to happen after loading is finished.
  • Check if the startup mod-settings have changed

    • Startup mod settings changing will mean a special event needs to happen after loading is finished.
  • Record if any of the above have happened and set a flag on the loading class: "prototype data changed".

    • This is a signal to the rest of the loading logic for 2 reasons:
      • Something has changed such that the replay system will be disabled and save/load determinism doesn't need to be maintained for this specific load instance. It still needs to load into a state such that if that state is taken, and the save/load process is run again (with nothing else changing), it is deterministic.
      • The loading logic could be expensive or could be intrusive and during standard "load save in the version it was created when nothing else has changed" we don't want to perform that logic.
    • An example of something done when this flag is set to true during loading:
      • Any entity with a module inventory goes over the inventory and checks to make sure that no non-module items are in the inventory. If any are found it clears them. Normally this would cause save/load instability if it happened every time the game was loaded, but because the "prototype data changed" flag was set to true it's a signal that things are already unstable (for this instance of loading) and performing otherwise non-deterministic things is ok (for this instance of loading).
  • Load the ID mapping from the save file, recording what everything in the save file was saved as, what things were changed, and what things were removed.

    • This needs to happen before anything that uses IDs is loaded to ensure they migrate correctly and to ensure that the loading process knows what the binary data is.
  • Load the applied migrations and apply any that haven't already been applied.

    • This tells the loading system that some mod wanted to migrate some ID to another ID and needs to be done before anything with IDs is loaded.
  • Load the standard map data (entities/chunks/forces/trains)

  • Call load on all LoadHelpers after everything else is loaded.

    • This must mirror the SaveHelpers ordering such that:
      • During saving all "standard" data is saved, then SaveHelpers are saved.
      • During loading all "standard" data is loaded, then LoadHelpers are loaded.
    • The order that the SaveHelpers were created in must be mirrored in the order that LoadHelpers are created in.
  • Apply staged changes after everything has been loaded and after load helpers have been run.

    • Staged changes are "things that need to be deleted but can't be deleted during loading because other things may have pointers to them"
    • Things such as Trains, or Entities
  • Add any entities that failed to add themselves to their surface

    • During loading the bounding box of an entity can change if the prototype data for that entity was changed (mod changed, mod added/removed). When the bounding box changes it can change which chunks the entity will overlap with. During loading it's expected that the number of chunks doesn't change. When the bounding box of an entity changes it can overlap with a chunk which didn't exist during saving. When this happens it's detected during loading and put into this list of "entities to be inserted before setup". After loading finishes the number of chunks can be changed freely and the entities that failed to add themselves to their surface are told to "do it now, and create the chunk(s) if you need to".
  • Restore any Targeters so they point at the Targetable they should point at (if the Targetable still exists after loading)

  • Call setup on all LoadHelpers

    • If a temporary SaveHelper was used to save a pointer this step will call the LoadHelper created during loading to restore the temporary Targeters created for save/load purposes so they point at the corresponding Targetable class.
    • LoadHelpers can also perform other logic during "setup" if need be. This can be useful if some logic needs to happen after loading, but before entities are setup, but after Targeters are restored.
  • If prototype data was flagged as changed clear electric network data

    • When prototype data changes it can mean that entity bounding boxes have changed and what electric networks an entity was in could have changed. When this happens the game clears and re-sets up every entity's electric network connection to make sure everything is in a correct state.
    • This needs to be cleared before entities are actually setup because during setup when prototype data was flagged as changed an entity will attempt to re-setup its electric network connection(s).
  • Go over every loaded entity and if the entity is a transport belt connectible call setup on it.

    • Transport belt connectible entities need to be setup before other entities so the merging logic can be finished by the time normal entity setup is called.
  • Go over every loaded entity and if the entity isn't a transport belt connectible call setup on it.

  • If any entity was created during loading that didn't exist when the map was saved call setup on it.

    • When an entity is migrated from one type to another (inserter migrated to a tree) the original entity is created and put into a list of "dummy entities", and the new replacement entity is created and stored in this list of "entities created during loading" to be handled later.
  • Call "postLoadHook()" for the map

    • This goes over every surface in the game and every chunk on the surfaces and restores the active entities lists.
    • Normally this could be done with a LoadHelper; however, due to the previous migration step of "entities created during setup" this can't just "restore what was there when it was saved" since the final list may contain more entities than were saved. Additionally, when an entity changes from being updatable to not updatable a special check has to be done to make sure it isn't put back into the active-entities list.
  • After all entities are loaded and setup fix any tile migrations that may have happened.

    • Tile migrations can invalidate the tiles that exist in the world and by fixing the tile(s) it can cause an entity built on them to be destroyed.
    • Entities need to be fully setup before one can be killed/destroyed because all of the logic around destroying them is setup to expect that invariant.
    • Additionally, when tiles change the path finding system needs to know so it can record that maybe it can path through that chunk now if it previously thought it couldn't (due to it being all water or some other impassable tile).
  • Go over any dummy electric energy sources created during loading and destroy and free them.

    • This can modify the electric network system in general when a network is removed and needs to be done after all entities are setup and valid in the world.
  • Up until this point if any error happens during loading all of the temporary data created for the loading process needs to be cleared explicitly so it doesn't reference the failed-to-load map data. If loading succeeds then the process continues.

  • Go over each surface and call setup() on them.

    • This restores AI related tasks and map generation related tasks adding them back into the "need to be processed" queues.
  • Setup the train manager

    • The rail system doesn't save which rails are connected to what signals in the save file but instead re-creates this data at setup time (this call).
    • The reason for this is: it would add a lot of extra data to the save which isn't strictly needed (increasing save time, load time, and save file size). Additionally rails, signals, and trains can change as a result of loading and can only be setup correctly once every other rail/signal/train is setup.
  • Go over any entity marked as needing to be "alarmed" and alarm it.

    • During setup of entities when prototype data has been marked as changed an entity may need to turn itself into the active state. This can't be done during setting up entities because the "alarm" system expects that all other entities in the world are already setup and at that stage of "set up all loaded entities" that isn't true. When this happens the entity adds itself into a list of "call alarm on me once everything else is setup".
  • Go over every "PreFinalLoadHelper" and call setup on them.

  • Go over every dummy object and destroy/delete them.

    • When something is migrated or removed during loading it's placed in a 'dummy ...' list to be destroyed/deleted after the setup process has finished. This allows the same "load and migrate" logic to be done on a standard entity and one that will be removed after loading finishes instead of needing to have 2 or more different loading systems.
    • Dummy 'objects' need to be destroyed/deleted in a specific order:
      • Electric energy sources first (they can have references to other entities/items)
      • Equipment before equipment grids (equipment can point at an electric network owned by a grid)
      • Everything else (order doesn't matter at the time of writing this)
  • Go over every "FinalLoadHelper" and call setup on them.

  • Call setup on the ElectricNetworkManager

    • This goes over every energy source that the manager knows about and, if the "prototype data changed" flag has been set, makes sure that energy buffers don't have more energy stored than the buffer is meant to hold.
  • Go over each surface and call "postSetup()"

    • postSetup() notifies the chunk generation system to start generating chunks in another thread should it have chunks it needs to generate.
    • The chunk generation system can't run until all of the tiles on the chunks are finalized and so this needs to be done after fixing tile migrations for the chunks and after entities stop changing.
  • Load and restore Lua data

    • Loading Lua data calls on_load for every mod that was loaded or on_init for any mod that was added that didn't exist when the map was saved.
    • Any time a Lua event is called the entire world needs to be in a valid state since mods have near unlimited access to modify the world during events.
  • Run the "configuration changed" Lua event if "prototype data changed" was set

    • This needs to happen after the entire map state has been setup correctly so Lua mods can react accordingly and modify the world if they need to.
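
Two pieces of the load sequence above lend themselves to a short sketch: deciding whether the "prototype data changed" flag is set, and setting up transport belt connectibles before all other entities. The types and functions below (`SaveHeader`, `GameConfig`, `LoadCheck`, `checkSave`, `ToyEntity`, `setupOrder`) are hypothetical and greatly simplified; in particular, mods and startup settings are reduced to lists of strings.

```cpp
#include <cassert>
#include <string>
#include <vector>

// What the save file says about the environment it was written in.
struct SaveHeader {
  int mapVersion;
  std::vector<std::string> activeMods;
  std::vector<std::string> startupSettings;
};

// What the running game supports and currently has enabled.
struct GameConfig {
  int oldestSupportedMapVersion;
  int currentMapVersion;
  std::vector<std::string> activeMods;
  std::vector<std::string> startupSettings;
};

enum class LoadCheck { Unsupported, Unchanged, PrototypeDataChanged };

LoadCheck checkSave(const SaveHeader& save, const GameConfig& game) {
  assert(game.oldestSupportedMapVersion <= game.currentMapVersion);

  // Outside the supported range: the save can't be loaded at all.
  if (save.mapVersion > game.currentMapVersion ||
      save.mapVersion < game.oldestSupportedMapVersion) {
    return LoadCheck::Unsupported;
  }

  // Any difference sets the "prototype data changed" flag.
  const bool changed = save.mapVersion != game.currentMapVersion ||
                       save.activeMods != game.activeMods ||
                       save.startupSettings != game.startupSettings;
  return changed ? LoadCheck::PrototypeDataChanged : LoadCheck::Unchanged;
}

struct ToyEntity {
  std::string name;
  bool isTransportBeltConnectible;
};

// Returns the order in which setup would be called: belts first, then the rest,
// each group keeping its loaded order.
std::vector<std::string> setupOrder(const std::vector<ToyEntity>& loaded) {
  std::vector<std::string> order;
  for (const auto& e : loaded) {
    if (e.isTransportBeltConnectible) order.push_back(e.name);
  }
  for (const auto& e : loaded) {
    if (!e.isTransportBeltConnectible) order.push_back(e.name);
  }
  return order;
}
```

The rest of the loading logic would then consult the flag to decide whether expensive or intrusive fix-ups (such as clearing non-module items from module inventories) are allowed for this load.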

Common load related questions:

  • What is a LoadHelper?

    • A LoadHelper is the loading equivalent of a SaveHelper however it has 2 stages: load(...) and setup(...)
  • How does a LoadHelper work (when is data actually loaded)?

    • A LoadHelper, like a SaveHelper, doesn't do anything when created. By putting it into the LoadHelper system it will have load(...) called on it after everything else has been loaded and will have setup(...) called on it after everything else has been setup.
  • Can you create a LoadHelper and not load anything?

    • Yes. Migrations are a typical case when you would want to do this: run some logic after everything else is setup.
  • What is a PreFinalLoadHelper?

    • These are LoadHelpers that have setup(...) called after all entities have been setup but before 'dummy objects' have been destroyed. These do not have load(...) called on them.
    • When some object is being loaded it can create a load helper that it wants to run, for example: resize my inventory to be smaller. This can't be done at load time because it could potentially delete items that are in the inventory which might themselves have created LoadHelpers which point at them. Additionally, other things being loaded may have references to the items/entities, which could cause problems if they simply stopped existing before the full destroy and delete system could be performed.
  • What is a FinalLoadHelper?

    • These are the same as PreFinalLoadHelpers except they're processed after dummy objects have been processed and deleted and are a general purpose "I need something to be done after everything else is finished".
    • Currently these are used for:
      • Setting up anything migrated to a rolling-stock entity type. Rolling-stock entity types can't be setup until every other entity is setup in the world since they connect to rails, rail signals, and modify other trains in the world.
      • Handling when logistic containers change type (from requester to storage/provider/other)
      • Clearing invalid crafting-queue orders in players.
  • Why are entities "setup" after they're loaded?

    • "setting up" an entity is essentially taking the set of data that makes up the entity and hooking it into the rest of the game state. Examples:
      • Setting up a rail entity connects the rail to the neighboring rails, rail signals, and train stops.
      • Setting up an inserter adds the inserter to the list of active entities on that chunk
      • Setting up a logistic storage chest adds that chest to the logistic network that it overlaps with
    • The "setup" process links the entity with other entities/structures and as such it needs the other entities to exist and be ready to be connected.
  • Why can't an inventory be resized while loading/while setting up entities?

    • Resizing an inventory (smaller or larger) will generate a "post-transfer-hook" event stating what about the inventory has changed. The rest of the game expects these events in order to track what items are in what inventories. During the loading and entity-setup process these hooks aren't fully setup, so if one of these events was fired during that not-fully-setup state it wouldn't be received by everyone who was expecting to receive it.
  • Why can't a Targeter or Targetable be destroyed during loading?

    • The TargetDeserialiser system stores a pointer to the loaded Targetable class so it can restore any Targeters loaded. If the Targetable class is deleted/invalidated before the TargetDeserialiser has had a chance to finish it will contain dangling pointers and crash if it ever tries to use them.
    • It is almost always an error when something is destroyed/deleted during loading so we enforce that it never happens and have a system to handle the edge case when we actually need it to happen (when a map fails to load and all data created up until that point needs to be thrown out).
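
The two-stage LoadHelper and the role of the TargetDeserialiser can be sketched together. As with the other examples this is a hedged illustration: only the names `LoadHelper`, `TargetDeserialiser`, `load(...)`, and `setup(...)` come from the text, while the method signatures, `registerTargetable`, `resolve`, `PointerLoadHelper`, `loadToyWorld`, and the three-number stream layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

struct Targetable {};

// Remembers which loaded object was saved under which ID.
// It stores raw pointers, so nothing registered here may be deleted
// until every pointer has been restored.
class TargetDeserialiser {
public:
  void registerTargetable(std::uint32_t id, Targetable* t) {
    if (id != 0) byId[id] = t;  // 0 means the object was saved without an ID
  }
  Targetable* resolve(std::uint32_t id) const {
    auto it = byId.find(id);
    return it == byId.end() ? nullptr : it->second;
  }

private:
  std::unordered_map<std::uint32_t, Targetable*> byId;
};

struct LoadHelper {
  virtual ~LoadHelper() = default;
  // Stage 1: read this helper's own data, after all standard data is loaded.
  virtual void load(const std::vector<std::uint32_t>& in, std::size_t& pos) = 0;
  // Stage 2: act on the loaded world once targets can be resolved.
  virtual void setup(const TargetDeserialiser& targets) = 0;
};

// Restores a raw pointer that was saved through a temporary Targeter.
struct PointerLoadHelper : LoadHelper {
  explicit PointerLoadHelper(Targetable*& dest) : destination(dest) {}
  void load(const std::vector<std::uint32_t>& in, std::size_t& pos) override {
    savedId = in.at(pos++);
  }
  void setup(const TargetDeserialiser& targets) override {
    destination = targets.resolve(savedId);
  }
  Targetable*& destination;
  std::uint32_t savedId = 0;
};

// Reads the stream [id of a, id of b, id the helper pointed at]
// and returns the restored pointer (nullptr when nothing was targeted).
Targetable* loadToyWorld(const std::vector<std::uint32_t>& in, Targetable& a, Targetable& b) {
  TargetDeserialiser targets;
  std::vector<std::unique_ptr<LoadHelper>> helpers;
  Targetable* restored = nullptr;
  std::size_t pos = 0;

  // Helpers are created in the same order the SaveHelpers were created in.
  helpers.push_back(std::make_unique<PointerLoadHelper>(restored));

  // "Standard" data first.
  targets.registerTargetable(in.at(pos++), &a);
  targets.registerTargetable(in.at(pos++), &b);

  // Then load(...) on every helper, then setup(...) on every helper.
  for (auto& helper : helpers) helper->load(in, pos);
  assert(pos == in.size());
  for (auto& helper : helpers) helper->setup(targets);
  return restored;
}
```

Deleting `a` or `b` between registration and `setup(...)` would leave a dangling pointer inside the deserialiser, which is the failure mode described in the last question above.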

khlorghaal commented Nov 24, 2018

this is the sort of thing a pure-functional orientation is perfect for
