
@mosra

mosra/fs.md Secret

Last active January 7, 2017 18:57
Filesystem abstraction API proposal

Filesystem library

Goal: cross-platform abstraction over file enumeration, reading and writing.

Example usage

  • Use the disk filesystem and memory-mapping OS functionality on desktop for shorter iteration times and easier debugging, while having the same client code treat memory-mapped package resources on Android or HTTP URLs on Emscripten as a filesystem.
  • Use system-provided filesystem location on Unix-like OSes, registry on Windows and HTML5 client storage on Emscripten to store local user configuration.
  • Cherry-pick files from a ZIP archive at an HTTP URL without downloading the whole archive.

Base functionality

  • A new Corrade::Filesystem library that provides an AbstractFilesystem plugin interface.
  • Each plugin statically defines a set of features it supports:
    • file reading (might not be supported in e.g. HTTP PUT)
    • file writing (might not be supported in read-only filesystems)
    • file listing (might not be supported when opening e.g. HTTP URL)
    • file memory mapping
    • more?
  • Each plugin implements a subset of the API:
    • Opening a URI
    • Opening a memory location
    • Listing a directory
    • Reading full contents of a file
    • Mapping a file to memory (returning array with custom deleter)
  • A bunch of (platform-specific) plugins implementing filesystem functionality:
    • Filesystem (disk-based)
    • Resource (compiled-in resources using Utility::Resource)
    • Zip, Tar, ... (file archives)
    • EmscriptenHttp (reading HTTP URLs, Emscripten-specific)
    • AndroidApk (accessing Android package contents, Android-specific)
    • EmscriptenClientStorage (R/W in-browser client storage, Emscripten-specific)
    • more?
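
A minimal sketch of how the feature set and the plugin interface could fit together. All names here (`Feature`, `supports()`, `MemoryFilesystem`, the `read()` / `write()` / `list()` signatures) are assumptions made for illustration, not a settled API. Every operation has a failing default, so a plugin overrides only what its advertised features cover:

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical feature flags, mirroring the list above
enum class Feature: std::uint8_t {
    Read = 1 << 0,
    Write = 1 << 1,
    List = 1 << 2,
    Map = 1 << 3
};

constexpr std::uint8_t operator|(Feature a, Feature b) {
    return static_cast<std::uint8_t>(std::uint8_t(a) | std::uint8_t(b));
}

// Hypothetical plugin interface. Unsupported operations simply fail.
class AbstractFilesystem {
    public:
        virtual ~AbstractFilesystem() = default;

        bool supports(Feature feature) const {
            return (features() & std::uint8_t(feature)) != 0;
        }

        virtual bool list(const std::string&, std::vector<std::string>&) { return false; }
        virtual bool read(const std::string&, std::string&) { return false; }
        virtual bool write(const std::string&, const std::string&) { return false; }

    private:
        // Each plugin reports a fixed set of features
        virtual std::uint8_t features() const = 0;
};

// Toy read-only plugin backed by an in-memory map (illustration only)
class MemoryFilesystem: public AbstractFilesystem {
    public:
        explicit MemoryFilesystem(std::map<std::string, std::string> files):
            _files(std::move(files)) {}

        bool list(const std::string&, std::vector<std::string>& out) override {
            for(const auto& file: _files) out.push_back(file.first);
            return true;
        }

        bool read(const std::string& path, std::string& out) override {
            const auto found = _files.find(path);
            if(found == _files.end()) return false;
            out = found->second;
            return true;
        }

    private:
        std::uint8_t features() const override {
            return Feature::Read|Feature::List;
        }

        std::map<std::string, std::string> _files;
};
```

Client code would query `supports(Feature::Write)` before attempting a write instead of relying on the failure return alone.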

Cherries on top

Path handling

  • In a URI passed to an open function: forward slashes; anything before the first / has platform-specific meaning (e.g. http://, C:/)
  • In file listing: forward slashes, all characters allowed, a leading / denotes an absolute path, .. and . have their usual meaning

Working directory

  • Working directory get/set functions (wd(), cwd())
    • Names?
  • Opening a filesystem makes opened path the root (it's not possible to cwd() up)
  • Paths passed to all functions are relative to the current working directory; /, .. and . work as expected
    • Handle this in the plugin interface and pass only absolute paths to plugin implementations to reduce code duplication?
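
If the plugin interface did the path resolution itself, it could look roughly like the sketch below. The `resolve()` helper is hypothetical; it assumes the working directory is always stored as an absolute path and that `..` at the filesystem root stays at the root, which matches the "not possible to go up" rule above:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: turns `path` into a normalized absolute path,
// interpreting it relative to `cwd` unless it already starts with `/`.
std::string resolve(const std::string& cwd, const std::string& path) {
    // Assumption: the working directory is always absolute
    assert(!cwd.empty() && cwd[0] == '/');

    const std::string full =
        !path.empty() && path[0] == '/' ? path : cwd + '/' + path;

    // Split on `/` and fold `.` and `..` as we go
    std::vector<std::string> parts;
    std::string::size_type begin = 0;
    while(begin <= full.size()) {
        std::string::size_type end = full.find('/', begin);
        if(end == std::string::npos) end = full.size();
        const std::string part = full.substr(begin, end - begin);
        if(part == "..") {
            // Going above the root is silently clamped to the root
            if(!parts.empty()) parts.pop_back();
        } else if(!part.empty() && part != ".") {
            parts.push_back(part);
        }
        begin = end + 1;
    }

    std::string out;
    for(const std::string& part: parts) out += '/' + part;
    if(out.empty()) return "/";
    return out;
}
```

With this in the base class, plugin implementations would only ever see paths such as `/a/c`, never `../c`.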

Byte serving API

  • Alternative to memory mapping on platforms that don't support it
  • Needs a concept of currently opened file in addition to a concept of currently opened filesystem
  • Functionality for getting file size and getting a range of bytes out of currently active file
    • Might be a bottleneck on some platforms if used too extensively (while probably okay with disk-based filesystem, virtual call overhead would be very visible w/ memory-based filesystems)
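
A rough sketch of what the byte-serving part could look like, assuming a hypothetical `AbstractByteSource` interface with `size()` and a ranged `read()`. The names and signatures are made up for illustration. The `readAll()` helper shows one way to keep the virtual call count low: ask for large chunks instead of single bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical interface for the currently opened file
class AbstractByteSource {
    public:
        virtual ~AbstractByteSource() = default;

        virtual std::size_t size() const = 0;

        // Copies at most `count` bytes starting at `offset` into `out` and
        // returns how many bytes were actually copied
        virtual std::size_t read(std::size_t offset, std::size_t count, char* out) = 0;
};

// Toy memory-backed implementation (illustration only)
class MemoryByteSource: public AbstractByteSource {
    public:
        explicit MemoryByteSource(std::string data): _data(std::move(data)) {}

        std::size_t size() const override { return _data.size(); }

        std::size_t read(std::size_t offset, std::size_t count, char* out) override {
            if(offset >= _data.size()) return 0;
            const std::size_t n = std::min(count, _data.size() - offset);
            std::copy(_data.begin() + offset, _data.begin() + offset + n, out);
            return n;
        }

    private:
        std::string _data;
};

// Reads the whole source in chunks of `chunkSize` bytes, so the number of
// virtual calls is roughly size/chunkSize rather than one per byte
std::string readAll(AbstractByteSource& source, std::size_t chunkSize) {
    std::string out(source.size(), '\0');
    std::size_t offset = 0;
    while(offset < out.size()) {
        const std::size_t got = source.read(offset, chunkSize, &out[offset]);
        if(!got) break;
        offset += got;
    }
    out.resize(offset);
    return out;
}
```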

Feeding filesystems into each other

  • New function for opening a file inside another filesystem
  • HTTP URL -> XZ archive -> tar archive -> a file
  • Lazy data download
  • May save memory compared to the obvious approach of loading whole files, extracting them fully and then opening those as in-memory locations with other filesystem plugins
  • Would depend on byte-serving API
    • Virtual calls might be a bottleneck
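
To illustrate the layering idea, here is a sketch in which one byte source is fed into another. `WindowSource` is a hypothetical stand-in for an uncompressed archive entry that lives at a known offset inside its parent; a real XZ or tar plugin would need actual decoding on top. The byte counter on the outermost source shows that only the requested range is ever pulled through the chain:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical byte-serving interface
class AbstractByteSource {
    public:
        virtual ~AbstractByteSource() = default;
        virtual std::size_t size() const = 0;
        virtual std::size_t read(std::size_t offset, std::size_t count, char* out) = 0;
};

// Outermost source; counts the bytes it had to serve
class MemorySource: public AbstractByteSource {
    public:
        explicit MemorySource(std::string data): _data(std::move(data)) {}

        std::size_t size() const override { return _data.size(); }

        std::size_t read(std::size_t offset, std::size_t count, char* out) override {
            if(offset >= _data.size()) return 0;
            const std::size_t n = std::min(count, _data.size() - offset);
            std::copy(_data.begin() + offset, _data.begin() + offset + n, out);
            _served += n;
            return n;
        }

        std::size_t bytesServed() const { return _served; }

    private:
        std::string _data;
        std::size_t _served = 0;
};

// A file inside another source: a window of `size` bytes at `offset`
class WindowSource: public AbstractByteSource {
    public:
        WindowSource(AbstractByteSource& parent, std::size_t offset, std::size_t size):
            _parent(parent), _offset(offset), _size(size)
        {
            assert(offset + size <= parent.size());
        }

        std::size_t size() const override { return _size; }

        std::size_t read(std::size_t offset, std::size_t count, char* out) override {
            if(offset >= _size) return 0;
            // Forward only the clamped range to the parent
            return _parent.read(_offset + offset, std::min(count, _size - offset), out);
        }

    private:
        AbstractByteSource& _parent;
        std::size_t _offset, _size;
};
```

Each layer adds one virtual call per request, which is where the bottleneck concern above comes from.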

Asynchronous file access

To make things simpler on all fronts, threading should be done completely on client side and all APIs should be blocking and synchronous. I might revise this statement later in case I come across filesystem APIs that can't be persuaded to be used synchronously (but even Emscripten provides a way, so I doubt it).

Synchronous URL loading in Emscripten requires quite a lot of boilerplate for setting up the web worker. The emscripten_async_wget() family of functions should be used instead.

Not sure how to design the async API yet:

  • callbacks?
  • something else (message passing)?

More ideas (a braindump):

  • In http://flohofwoe.blogspot.cz/2013/12/asset-loading-in-emscripten-and-pnacl.html there is a good point that having both sync and async API makes things pretty bloated, but dropping the sync API and having just the async would make writing of tools unnecessarily hard
  • Some default implementation of the async version so it's easy to use it?
  • Separate plugin interface for async file loading (i.e. what's actually used in apps) and directory listing and sync stuff (i.e. what's used in tools)? Can the distinction even be made? What about an app that actually wants to use the tooley APIs?
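
One possible shape for the "default implementation of the async version" idea, sketched with callbacks. `AbstractLoader`, `readAsync()` and `InstantLoader` are hypothetical names. The default just calls the blocking `read()` and fires the callback right away, so tools and simple plugins get the async entry point for free; a plugin with a native asynchronous facility would override `readAsync()` instead:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical loader interface with both a sync and an async entry point
class AbstractLoader {
    public:
        virtual ~AbstractLoader() = default;

        // Blocking read, returns false on failure
        virtual bool read(const std::string& path, std::string& out) = 0;

        // Default async version: completes immediately on the calling thread
        // by delegating to the blocking read()
        virtual void readAsync(const std::string& path,
            std::function<void(bool, const std::string&)> done)
        {
            assert(done);
            std::string data;
            const bool ok = read(path, data);
            done(ok, data);
        }
};

// Toy plugin that only implements the blocking call (illustration only)
class InstantLoader: public AbstractLoader {
    public:
        bool read(const std::string& path, std::string& out) override {
            if(path != "greeting.txt") return false;
            out = "hello";
            return true;
        }
};
```

This keeps the plugin side small, at the cost of the async call not actually being asynchronous unless a plugin opts in.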

More ideas

  • AnyFilesystem that would pass through the URLs to various plugins based on prefix (file://, http:// etc.). What about file extensions (*.zip), handle them as well? Support crazy things like http://serv.er/path/to/file.zip/path/to/image.png? :O
  • Concatenating more filesystems into one (e.g. on-disk with compiled-in resources)
  • Magnum importer APIs making use of this
    • streaming parts of WAV audio files out of a huge all-radios-160hours.zip archive
    • opening internet radio streams as files
  • Support for endless files / non-seekable files (internet radio streams, webcam input...)
  • (crazy thinking) abstraction of device filesystem viewed through Android ADB commands for easier tool writing
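
The AnyFilesystem dispatch could start out as simple as the sketch below. The `pluginForUri()` function and the particular prefix-to-plugin mapping are assumptions for illustration; the plugin names are the ones listed under Base functionality. A scheme prefix wins over the file extension here, and nested cases such as a path continuing after `file.zip/` are deliberately not handled:

```cpp
#include <string>

// Hypothetical helper: true if `text` ends with `suffix`
bool endsWith(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size() &&
        text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical dispatch: picks a plugin name for a URI, first by scheme
// prefix, then by file extension. Returns an empty string if the scheme is
// not known.
std::string pluginForUri(const std::string& uri) {
    const std::string::size_type scheme = uri.find("://");
    if(scheme != std::string::npos) {
        const std::string prefix = uri.substr(0, scheme);
        if(prefix == "http") return "EmscriptenHttp";
        if(prefix == "file") return "Filesystem";
        return "";
    }

    if(endsWith(uri, ".zip")) return "Zip";
    if(endsWith(uri, ".tar")) return "Tar";

    // No prefix and no known extension: treat it as a plain disk path
    return "Filesystem";
}
```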

Links
