WarpspeedSCP/GSoC_mid_report.md

## GSoC_mid_report.md

      
Project: Godot cache module
Student: Raghav Shankar
Mentors: Ariel Manzur, Hein-Pieter van Braam
Repository: https://github.com/WarpspeedSCP/godot/tree/wip-patch

## About this project

Godot engine is pretty easy to use for most things and is becoming a better competitor to Unreal Engine and Unity by the day. But one area where it lags behind is the way it handles file and network IO on various platforms, especially on systems like consoles.
Nowadays, most IO is cached to speed up access to data from hard drives or the network. The cache sits in RAM and holds on to frequently accessed information so we don't have to wait long for the data. For desktop and mobile systems (like Android and iOS), this is less of a problem because the OS can provide caching for disk and network IO. But consoles and other more specialised systems may not have an OS that does this for us, which means we may need to do the caching ourselves.
The current mechanism that Godot engine provides for such cases is pretty limited: it only reads ahead, it doesn't allow seeking backwards, and it only works for files on disk. My project aims to provide a more flexible solution, in the form of a C++ module that can be dropped in at compile time, which manages caching centrally and allows different caching strategies to be used as the situation demands.

## Current progress

As of now, about half of the project's goals are complete. I have created a cache structure that can hold data from multiple files which may be cached under different policies like FIFO (first in, first out) or LRU (least recently used).
The system is designed so that the actual interaction with the network/file system, which may take time, takes place on a separate thread alongside the rest of the engine. If data is already in the cache because we've read ahead a little, access will be much faster. Otherwise, we only need to wait a little while the bits of the file that are needed are loaded in the background.
The engine sees a normal file interface which it can use to read, write and seek within a file. Behind this front end, the module keeps track of what parts of the file are in the cache, and loads more of the file into the cache on demand.
The cache breaks each file into a bunch of equally sized parts which can be easily shifted in and out of the cache.
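
To make the idea of a file front end over equally sized parts more concrete, here is a minimal sketch. It is not the module's actual code: `CachedReader`, `load_part` and the in-memory `backing_` string (standing in for the real file or network source) are all hypothetical names chosen for illustration, and eviction is left out entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical front end: reads go through a map of cached parts,
// and a part is only fetched from the backing store on first use.
class CachedReader {
public:
    // Assumes part_size > 0. `backing` stands in for a real file.
    CachedReader(std::string backing, std::size_t part_size)
        : backing_(std::move(backing)), part_size_(part_size) {}

    // Move the read position, clamped to the end of the data.
    void seek(std::size_t pos) { pos_ = std::min(pos, backing_.size()); }

    // Read up to `len` bytes from the current position.
    std::string read(std::size_t len) {
        std::string out;
        while (len > 0 && pos_ < backing_.size()) {
            const std::string &part = load_part(pos_ / part_size_);
            std::size_t in_part = pos_ % part_size_;
            std::size_t n = std::min(len, part.size() - in_part);
            out.append(part, in_part, n);
            pos_ += n;
            len -= n;
        }
        return out;
    }

    // Number of times a part had to be fetched from the backing store.
    std::size_t loads() const { return loads_; }

private:
    // Return the cached part, fetching it first if it is missing.
    const std::string &load_part(std::size_t index) {
        auto it = parts_.find(index);
        if (it == parts_.end()) {
            ++loads_;
            std::string part = backing_.substr(index * part_size_, part_size_);
            it = parts_.emplace(index, std::move(part)).first;
        }
        return it->second;
    }

    std::string backing_;
    std::size_t part_size_;
    std::size_t pos_ = 0;
    std::size_t loads_ = 0;
    std::unordered_map<std::size_t, std::string> parts_;
};
```

With a part size of 4, reading 6 bytes from the start touches two parts; seeking back and reading within them again causes no further loads.
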
I've set up three different caching policies so far.

### FIFO (First In First Out)

FIFO is a straightforward algorithm that just reads ahead by some number of parts. If the cache runs out of space, this algorithm discards the oldest part first. This may be OK for a file that never needs seeking and is only read sequentially.
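
A rough sketch of FIFO with read-ahead, under my own simplifying assumptions: parts are identified by a plain index, only residency is tracked (no data), and the `FifoCache` class and its method names are hypothetical rather than taken from the module.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <unordered_set>

// Hypothetical FIFO policy: tracks which part indices are resident
// and evicts the part that was loaded earliest when the cache is full.
class FifoCache {
public:
    explicit FifoCache(std::size_t capacity) : capacity_(capacity) {}

    bool contains(std::uint64_t part) const {
        return resident_.count(part) != 0;
    }

    // Load one part, discarding the oldest part if there is no room.
    void insert(std::uint64_t part) {
        if (capacity_ == 0 || contains(part)) {
            return;
        }
        if (order_.size() == capacity_) {
            resident_.erase(order_.front());
            order_.pop_front();
        }
        order_.push_back(part);
        resident_.insert(part);
    }

    // Access `part` and read ahead by `read_ahead` further parts.
    void access(std::uint64_t part, std::size_t read_ahead) {
        for (std::uint64_t p = part; p <= part + read_ahead; ++p) {
            insert(p);
        }
    }

private:
    std::size_t capacity_;
    std::deque<std::uint64_t> order_;            // front = oldest part
    std::unordered_set<std::uint64_t> resident_; // fast membership test
};
```

Note that re-reading a part does not change its position in the queue, which is why this policy suits sequential reads better than files that get revisited.
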

### LRU (Least Recently Used)

LRU is a great caching algorithm for cases where we may read old data again. My particular implementation keeps a list of parts in order of the time they were accessed. When we run out of space, we discard the part that was accessed least recently (hence the name).
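
The list-ordered-by-access-time idea can be sketched as follows. This is a generic textbook-style LRU, not the module's implementation: the `LruCache` name, the `touch` method and the choice of `std::list` plus `std::unordered_map` are assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

// Hypothetical LRU policy: a list ordered by recency plus an index
// into that list so that lookups and reordering stay cheap.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    bool contains(std::uint64_t part) const {
        return index_.count(part) != 0;
    }

    // Record an access to `part`, loading it if it is not resident
    // and evicting the least recently used part if the cache is full.
    void touch(std::uint64_t part) {
        auto it = index_.find(part);
        if (it != index_.end()) {
            // Already cached: move it to the front (most recent).
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (capacity_ == 0) {
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(part);
        index_[part] = order_.begin();
    }

private:
    std::size_t capacity_;
    std::list<std::uint64_t> order_; // front = most recently used
    std::unordered_map<std::uint64_t,
                       std::list<std::uint64_t>::iterator> index_;
};
```

For example, with room for two parts, touching 1, 2, 1 and then 3 discards part 2, because part 1 was used more recently.
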

### Permanent store

Sometimes we may want to use a file for a really long time, to the point where it's probably going to be open the entire time the game is running. Maybe it's for logging, or for autosaving progress. I've included a caching policy for this use case as well. Parts of files that are accessed with this policy are cached in the same manner as with LRU, except that permanently cached parts cannot be replaced by parts of other files.
For example, if we have a choice between a permanent part and an LRU part which may be replaced, we must choose to replace the LRU part instead of the permanent one.
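
One way to express that replacement rule is a victim-selection function like the sketch below. The details are my own reading of the rule rather than the module's code: `CachedPart`, `choose_victim` and the logical `last_access` timestamp are hypothetical, and I'm assuming a permanent part may still be replaced by another part of the same file when nothing else is available.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class Policy { FIFO, LRU, PERMANENT };

// Hypothetical bookkeeping record for one cached part.
struct CachedPart {
    int file;                  // identifier of the owning file
    std::uint64_t part;        // part index within that file
    Policy policy;             // policy the owning file was opened with
    std::uint64_t last_access; // logical timestamp, larger = more recent
};

// Pick the index of the part to replace on behalf of `requesting_file`,
// or -1 if nothing may be replaced. Non-permanent parts are always
// preferred; permanent parts of other files are never considered.
int choose_victim(const std::vector<CachedPart> &parts, int requesting_file) {
    int victim = -1;
    bool victim_permanent = true;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        const CachedPart &p = parts[i];
        const bool permanent = (p.policy == Policy::PERMANENT);
        if (permanent && p.file != requesting_file) {
            continue; // protected from other files
        }
        const bool better =
            victim == -1 ||
            (victim_permanent && !permanent) ||
            (victim_permanent == permanent &&
             p.last_access <
                 parts[static_cast<std::size_t>(victim)].last_access);
        if (better) {
            victim = static_cast<int>(i);
            victim_permanent = permanent;
        }
    }
    return victim;
}
```

If the cache holds only permanent parts belonging to other files, the function returns -1 and the caller has to cope with not being able to cache the new part.
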

## In summary

Currently, files can be read from and I've got some GDScript integration set up to test things. I want to be able to write to files as well, but right now there are still a few bugs in the read logic that I need to squash.

## What's next?

For now, the module uses standard library file handling functions in the backend. These can be switched out for platform-specific unbuffered IO functions so that data isn't cached twice, once by the OS and once by the engine. For this reason, I plan to add unbuffered versions of the FileAccess class specific to each platform.
By August, this module should be feature complete with regard to my proposal.

## What more can be done?

I want to provide support for something like magic streams, which are a way to store assets as a contiguous stream of data, where assets are stored in the order they are accessed in a game. So for a game, if the splash screen asset is loaded first, and then a character model, the magic stream for the game would have the contents of the splash image file appearing first, then the contents of the model, and so on.
Magic streams are great because they reduce the amount of time spent seeking for hard drives (since all the assets are stored contiguously in one file), and could also be useful for streaming assets over the network (just one request for all the data).
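
Since none of this exists yet, the following is only an illustration of the layout idea, not a proposed format: assets are appended to one contiguous buffer in access order, with a small index recording where each one starts. `MagicStream`, `StreamEntry`, `pack` and `extract` are all made-up names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical index entry: where one asset lives inside the stream.
struct StreamEntry {
    std::string name;
    std::size_t offset;
    std::size_t size;
};

// Hypothetical magic stream: one contiguous blob plus its index.
struct MagicStream {
    std::vector<StreamEntry> index; // entries in access order
    std::string data;               // asset contents, back to back
};

// Build a stream from (name, contents) pairs given in access order.
MagicStream pack(
    const std::vector<std::pair<std::string, std::string>> &assets) {
    MagicStream stream;
    for (const auto &asset : assets) {
        stream.index.push_back(
            StreamEntry{asset.first, stream.data.size(), asset.second.size()});
        stream.data += asset.second;
    }
    return stream;
}

// Copy the i-th asset back out of the stream.
std::string extract(const MagicStream &stream, std::size_t i) {
    const StreamEntry &entry = stream.index.at(i);
    return stream.data.substr(entry.offset, entry.size);
}
```

Reading the assets back in index order walks through `data` from start to finish, which is the property that keeps seeking to a minimum.
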
I also think this module could serve as a basis for streamlining the asset loading pipeline for Godot engine, which is currently single-threaded.