@kevinwright
Created September 17, 2015 18:21

What is a file?

It's more than just a path: you can have a well-formed path that refers to a non-existent file.

It's not just a sequence of bytes on a storage medium either; it also involves other metadata, such as a path, a modification timestamp, etc.

Then we have "files" under procfs on Linux systems, and named pipes, and network streams - all accessible via "file handles". There are different and deeply interwoven concepts that are all known as "file" in different contexts. Some of these can be immutable.

First, there's the path. This is completely immutable. Paths can have many operations: you can get the parent path, find the path of some subdirectory (as a new immutable path instance), etc. A path needn't have an operation to determine if it's well-formed, as this can be checked at construction time.
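As a rough sketch of what such a path could look like (the name ImmutablePath, its segments field and the validation rules are all made up for illustration, not taken from any real library):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ImmutablePath:
    """Hypothetical immutable path value, validated once at construction."""

    segments: tuple[str, ...]

    def __post_init__(self) -> None:
        # Well-formedness is checked here, so no separate is_valid() is needed.
        # The rules below are an assumption chosen for the example.
        for segment in self.segments:
            if not segment or "/" in segment or "\0" in segment:
                raise ValueError(f"malformed path segment: {segment!r}")

    @property
    def parent(self) -> ImmutablePath:
        # Returns a new instance; the parent of the root is the root itself.
        return ImmutablePath(self.segments[:-1])

    def child(self, name: str) -> ImmutablePath:
        # Returns a new instance; self is never modified.
        return ImmutablePath(self.segments + (name,))

    def __str__(self) -> str:
        return "/" + "/".join(self.segments)
```

Nothing here touches the filesystem: ImmutablePath(("home", "kevin")).child("docs") is just a new value, whether or not anything exists at that location.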

In the face of named pipes et al., paths should really just be represented as URIs. Path can then be introduced as a subclass of Uri, with an additional method to return the "native" representation.
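One way this relationship might be expressed, assuming Path is limited to the file scheme (that restriction, and the native method name, are my assumptions for the sketch); urlsplit and url2pathname are from the Python standard library:

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit
from urllib.request import url2pathname


@dataclass(frozen=True)
class Uri:
    """Hypothetical immutable URI value."""

    text: str

    def __post_init__(self) -> None:
        # Validation happens at construction time.
        if not urlsplit(self.text).scheme:
            raise ValueError(f"not an absolute URI: {self.text!r}")

    @property
    def scheme(self) -> str:
        return urlsplit(self.text).scheme


@dataclass(frozen=True)
class Path(Uri):
    """A Uri that also has a native, OS-specific representation."""

    def __post_init__(self) -> None:
        super().__post_init__()
        # Assumption: only file: URIs have a native form.
        if self.scheme != "file":
            raise ValueError(f"not a file URI: {self.text!r}")

    def native(self) -> str:
        # Converts the URI path component to the local path syntax.
        return url2pathname(urlsplit(self.text).path)
```

A Path is still usable anywhere a Uri is expected; native() is the only addition.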

There are also functions that operate on paths but don't possess referential transparency, such as determining whether a file exists at some path. Because side effects are involved, it's not valid for such functions to be methods on paths.
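To illustrate, here is a minimal sketch in which the existence check lives outside the path type. The Path class here is a stand-in holding a native path string, and exists and demo are names invented for the example:

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Stand-in immutable path; holds a native path string for brevity."""

    native: str


def exists(path: Path) -> bool:
    # Deliberately a free function: the answer depends on the state of the
    # filesystem at call time, not only on the argument.
    return os.path.exists(path.native)


def demo() -> tuple:
    # Create a real temporary file, then ask the same question twice.
    fd, name = tempfile.mkstemp()
    os.close(fd)
    path = Path(name)
    before = exists(path)
    os.remove(name)
    after = exists(path)
    return before, after
```

demo() asks exists about the same immutable Path value twice and gets two different answers, which is exactly why it doesn't belong on the path itself.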

The next layer is some sort of File reference (or handle, moniker, etc. I've seen them all used before). I favour FileRef because it's concise and clear in meaning. A FileRef might be immutable, though often isn't; a good copy-on-write filesystem would also be able to provide immutable FileRefs. To go from a Uri to a FileRef, there needs to be a factory method capable of returning different FileRef subclasses. One such subclass would be NonExistentFileRef (which is immutable), another would be DirectoryRef. Given that we're using URIs in lieu of paths, a FileRef could also represent a web page, remote FTP file, named pipe, etc. It's doubtful that "file" ref is even a good name here, but it's historic and has a lot of sticking power.
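A minimal sketch of such a factory follows. FileRef, NonExistentFileRef and DirectoryRef come from the text above; RegularFileRef, RemoteRef and the function name file_ref are assumptions added so the example has something to return in the other cases:

```python
import os
from dataclasses import dataclass
from urllib.parse import urlsplit
from urllib.request import url2pathname


@dataclass(frozen=True)
class FileRef:
    uri: str


class NonExistentFileRef(FileRef):
    """Nothing exists at the URI."""


class DirectoryRef(FileRef):
    """The URI names a local directory."""


class RegularFileRef(FileRef):
    """Assumed name: the URI names some other local file."""


class RemoteRef(FileRef):
    """Assumed name: anything that is not a file: URI (web page, FTP, ...)."""


def file_ref(uri: str) -> FileRef:
    # Side-effecting factory: inspects the filesystem to pick the subclass.
    parts = urlsplit(uri)
    if parts.scheme != "file":
        return RemoteRef(uri)
    native = url2pathname(parts.path)
    if os.path.isdir(native):
        return DirectoryRef(uri)
    if os.path.exists(native):
        return RegularFileRef(uri)
    return NonExistentFileRef(uri)
```

The factory is where the side effects live; the refs it hands back are plain immutable values. A fuller version would presumably distinguish named pipes, sockets and so on as further subclasses.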

As FileRefs can be immutable, they mustn't contain methods that have side effects. So delete won't be an available operation - it has to be a separate function. All you can do with a FileRef is obtain the path and metadata, and open the contents. In the case of a directory, the contents would be a collection of contained FileRefs; for a network connection, the contents would be a read-once stream; for a "standard" file, they could be a stream or random-access buffer. The exact subclass of FileRef (maybe combined with traits or interfaces) then determines exactly what form of "contents" is available.
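Here's a rough sketch of how the subclass could decide the shape of the contents. It only covers directories and "standard" files, holds a native path string instead of a URI for brevity, and reads a file's bytes in one go rather than offering a stream or buffer; all of those are simplifications on my part:

```python
from __future__ import annotations

import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    native: str  # assumption: a native path string, to keep the sketch short


class RegularFileRef(FileRef):
    def contents(self) -> bytes:
        # For a "standard" file the contents are its bytes.
        with open(self.native, "rb") as handle:
            return handle.read()


class DirectoryRef(FileRef):
    def contents(self) -> tuple[FileRef, ...]:
        # For a directory the contents are the contained FileRefs.
        refs = []
        for name in sorted(os.listdir(self.native)):
            child = os.path.join(self.native, name)
            kind = DirectoryRef if os.path.isdir(child) else RegularFileRef
            refs.append(kind(child))
        return tuple(refs)


def demo() -> tuple:
    # Build a small directory tree and look at it through FileRefs.
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "sub"))
        with open(os.path.join(tmp, "a.txt"), "wb") as handle:
            handle.write(b"hello")
        listing = DirectoryRef(tmp).contents()
        kinds = [type(ref).__name__ for ref in listing]
        return kinds, listing[0].contents()
```

Note there is no delete or write method anywhere on these classes; a ref can only be inspected and opened.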

To ensure that COW (copy-on-write) FileRefs are immutable, writing/modifying would then have to be done via a function that transforms one instance of file contents to another instance. This function is then passed to a method on FileRef that applies the transformation and returns the modified FileRef instance.
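A toy version of that idea, using an in-memory snapshot of the bytes as a stand-in for a real copy-on-write filesystem (CowFileRef and the method name updated are invented for the example):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CowFileRef:
    """Hypothetical immutable ref; data stands in for a COW snapshot."""

    uri: str
    data: bytes

    def contents(self) -> bytes:
        return self.data

    def updated(self, transform: Callable[[bytes], bytes]) -> CowFileRef:
        # Apply the pure function to the old contents and return a new ref.
        # The original ref, and the contents it points at, stay untouched.
        return CowFileRef(self.uri, transform(self.data))
```

For example, v2 = v1.updated(bytes.upper) gives a second ref with upper-cased contents, while v1.contents() still returns what it did before.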

DCI isn't necessary here, as all the relevant differences can be captured in the subclass hierarchy of FileRef. This also neatly sidesteps any considerations about equality on roles :)

As for methods like delete or rename... they'd need to go in a utility class of static methods or a singleton. I'd call it FileSystem.
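Something along these lines, perhaps. The FileRef here is again a stand-in holding a native path string, and the exact signatures of delete and rename are my guesses at what would be useful:

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRef:
    native: str  # assumption: a native path string, to keep the sketch short


class FileSystem:
    """Namespace for the side-effecting operations; never instantiated."""

    @staticmethod
    def delete(ref: FileRef) -> None:
        os.remove(ref.native)

    @staticmethod
    def rename(ref: FileRef, new_native: str) -> FileRef:
        # The old ref is left as it was; a new ref describes the new location.
        os.rename(ref.native, new_native)
        return FileRef(new_native)


def demo() -> bool:
    # Create a file, rename it, delete it, and check nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        old = FileRef(os.path.join(tmp, "a.txt"))
        with open(old.native, "w") as handle:
            handle.write("x")
        new = FileSystem.rename(old, os.path.join(tmp, "b.txt"))
        FileSystem.delete(new)
        return os.listdir(tmp) == []
```

After a rename the old FileRef value still exists but now points at nothing, which is the sort of case NonExistentFileRef would cover if the ref were looked up again.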
